Home > Administration, SQL Server, SQL Server Transaction Log, T-SQL > Stored Procedure to Delete Rows in Batches Based on a Date Column

Stored Procedure to Delete Rows in Batches Based on a Date Column

I’ve posted previously about log space reservation (here and here). I’ve also posted about running dynamic T-SQL in a WHILE loop (here). This morning, I wrote a procedure that combines the two topics.

Because SQL Server may have to roll an active transaction back, it reserves space in the log file in addition to the space used to hold the log records. This reserved space is released when the transaction commits.

One of the places I’ve seen this cause problems is a routine that deletes historical data from a table. Imagine you have an application that has been active for two years. It contains 150 million rows and has never been purged. A business decision is made to purge all data older than two months, which accounts for approximately 140 million rows. The simplest way to code this is as follows:

USE [databasename]
GO

DELETE FROM historytable
WHERE insertdate < DATEADD(MONTH,-2,GETDATE());
GO

This will work, but it will consume a huge amount of log space because all 140 million deletes will happen in a single transaction. So, in addition to the log space required for the deletes and any associated allocations, SQL Server will hold on to the reserved space for all 140 million records for the entirety of the transaction. A better solution is to delete records in smaller batches, with each delete wrapped in an explicit transaction.

That brings us to the procedure I wrote this morning for a situation very similar to the one I described above. The procedure will delete rows from a table based on the value of a date column. It allows you to set the number of rows to delete in each batch along with the threshold for deletion. See below for a sample execution.

-- =============================================
-- Create uspBatchDelete
-- Created : 2013-08-08
-- Created By: Frank Gill
-- Performs a batch delete of rows from a table 
-- Inputs - @tablename - the table to delete from
--			@datecolumn - the date column to compare against
--			@increment - the number of rows to delete in each batch
--          @numberofdays - the number of days of data you want to maintain
--			(unless your database has a time machine, the last parm should be negative)
-- =============================================

-- Drop stored procedure if it already exists
IF EXISTS (
  SELECT * 
    FROM INFORMATION_SCHEMA.ROUTINES 
   WHERE SPECIFIC_SCHEMA = N'dbo'
     AND SPECIFIC_NAME = N'usp_Batch_Delete' 
)
   DROP PROCEDURE dbo.uspBatchDelete
GO

CREATE PROCEDURE dbo.uspBatchDelete
	@tablename SYSNAME, 
	@datecolumn SYSNAME,
	@increment BIGINT,
	@numberofdays INT
AS
	
	BEGIN TRY
	
	-- Declare local variables
	DECLARE @sqlstr NVARCHAR(2000);
	DECLARE @rowcount BIGINT;
	DECLARE @loopcount BIGINT;
	DECLARE @ParmDefinition nvarchar(500);
	
	-- Set the parameters for the sp_executesql statement
	SET @ParmDefinition = N'@rowcountOUT BIGINT OUTPUT';
	-- Initialize the loop counter
	SET @loopcount = 1;
	
	-- Build the dynamic SQL string to return the row count
	-- Note that the input parameters are concatenated into the string, while the output parameter is contained in the string
	-- Also note that running a COUNT(*) on a large table can take a long time

	SET @sqlstr = N'SELECT @rowcountOUT = COUNT(*)
    FROM ' + @tablename + ' WITH (NOLOCK)
    WHERE ' + @datecolumn + ' < DATEADD(DAY,' + CAST(@numberofdays AS VARCHAR(4)) + ',GETDATE())';
	
	-- Execute the SQL String using sp_executesql, passing in the parameter definition and defining the output variable
	EXECUTE sp_executesql
    @sqlstr
    ,@ParmDefinition
    ,@rowcountOUT = @rowcount OUTPUT;
	 
	-- Perform the loop while there are rows to delete
	WHILE @loopcount <= @rowcount
	BEGIN
		 
		 -- Build a dynamic SQL string to delete rows
		 SET @sqlstr = 'DELETE TOP (' + CAST(@increment AS VARCHAR(10)) + ') FROM ' + @tablename + '
		 WHERE ' + @datecolumn + ' < DATEADD(DAY,' + CAST(@numberofdays AS VARCHAR(4)) + ',GETDATE())';
		 
		 -- Execute the dynamic SQL string to delete a batch of rows
		 EXEC(@sqlstr);
		 
		 -- Add the @increment value to @loopcount
		 SET @loopcount = @loopcount + @increment;
		 
		 PRINT CAST(@increment AS VARCHAR(10)) + ' rows deleted.'
		 
	END
	
	END TRY
	
	BEGIN CATCH
		
		SELECT
		ERROR_NUMBER() AS ErrorNumber
		,ERROR_SEVERITY() AS ErrorSeverity
		,ERROR_STATE() AS ErrorState
		,ERROR_PROCEDURE() AS ErrorProcedure
		,ERROR_LINE() AS ErrorLine
		,ERROR_MESSAGE() AS ErrorMessage;

	END CATCH
GO

In this example, the procedure will delete rows from tablename in batches of 50,000 where datecolumnname is older than two months. If you are dealing with a scenario similar to the one described above, deleting 22 months of data in a single run may take longer than you want. In that case, you can run the procedure multiple times changing the @numberofdays parm with each run. You will need to do some testing to determine the optimal values for the @increment and @numberofdays parameters.

-- =============================================
-- Example to execute the stored procedure
-- =============================================
EXECUTE dbo.uspBatchDelete @tablename = 'tablename',
@datecolumn = 'datecolumnname', 
@increment = 50000, 
@numberofdays = -60;
GO

I hope you can put this code to use.

  1. Martin Zwarthoed
    April 1, 2014 at 07:37

    I’ve noticed that the script has one minor error: PRINT @increment + ‘ rows deleted.’
    The @increment variable has to be converted to varchar, otherwise the procedure will fail. The rest of the procedure works fine. Thanks! It was exactly what I was looking for.

    • April 1, 2014 at 09:24

      Martin,

      Thank you for pointing that out. I’ve updated the code with a CAST and tested it. I am glad you could use the proc.

      Frank

  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: