Archive

Archive for August, 2013

Stored Procedure to Delete Rows in Batches Based on a Date Column

August 8, 2013 2 comments

I’ve posted previously about log space reservation (here and here). I’ve also posted about running dynamic T-SQL in a WHILE loop (here). This morning, I wrote a procedure that combines the two topics.

Because SQL Server may have to roll an active transaction back, it reserves space in the log file in addition to the space used to hold the log records. This reserved space is released when the transaction commits.

One of the places I’ve seen this cause problems is a routine that deletes historical data from a table. Imagine you have an application that has been active for two years. It contains 150 million rows and has never been purged. A business decision is made to purge all data older than two months, which accounts for approximately 140 million rows. The simplest way to code this is as follows:

USE [databasename]
GO

DELETE FROM historytable
WHERE insertdate < DATEADD(MONTH,-2,GETDATE());
GO

This will work, but it will consume a huge amount of log space because all 140 million deletes will happen in a single transaction. So, in addition to the log space required for the deletes and any associated allocations, SQL Server will hold on to the reserved space for all 140 million records for the entirety of the transaction. A better solution is to delete records in smaller batches, with each delete wrapped in an explicit transaction.

That brings us to the procedure I wrote this morning for a situation very similar to the one I described above. The procedure will delete rows from a table based on the value of a date column. It allows you to set the number of rows to delete in each batch along with the threshold for deletion. See below for a sample execution.

-- =============================================
-- Create uspBatchDelete
-- Created : 2013-08-08
-- Created By: Frank Gill
-- Performs a batch delete of rows from a table 
-- Inputs - @tablename - the table to delete from
--			@datecolumn - the date column to compare against
--			@increment - the number of rows to delete in each batch
--          @numberofdays - the number of days of data you want to maintain
--			(unless your database has a time machine, the last parm should be negative)
-- =============================================

-- Drop stored procedure if it already exists
IF EXISTS (
  SELECT * 
    FROM INFORMATION_SCHEMA.ROUTINES 
   WHERE SPECIFIC_SCHEMA = N'dbo'
     AND SPECIFIC_NAME = N'usp_Batch_Delete' 
)
   DROP PROCEDURE dbo.uspBatchDelete
GO

CREATE PROCEDURE dbo.uspBatchDelete
	@tablename SYSNAME, 
	@datecolumn SYSNAME,
	@increment BIGINT,
	@numberofdays INT
AS
	
	BEGIN TRY
	
	-- Declare local variables
	DECLARE @sqlstr NVARCHAR(2000);
	DECLARE @rowcount BIGINT;
	DECLARE @loopcount BIGINT;
	DECLARE @ParmDefinition nvarchar(500);
	
	-- Set the parameters for the sp_executesql statement
	SET @ParmDefinition = N'@rowcountOUT BIGINT OUTPUT';
	-- Initialize the loop counter
	SET @loopcount = 1;
	
	-- Build the dynamic SQL string to return the row count
	-- Note that the input parameters are concatenated into the string, while the output parameter is contained in the string
	-- Also note that running a COUNT(*) on a large table can take a long time

	SET @sqlstr = N'SELECT @rowcountOUT = COUNT(*)
    FROM ' + @tablename + ' WITH (NOLOCK)
    WHERE ' + @datecolumn + ' < DATEADD(DAY,' + CAST(@numberofdays AS VARCHAR(4)) + ',GETDATE())';
	
	-- Execute the SQL String using sp_executesql, passing in the parameter definition and defining the output variable
	EXECUTE sp_executesql
    @sqlstr
    ,@ParmDefinition
    ,@rowcountOUT = @rowcount OUTPUT;
	 
	-- Perform the loop while there are rows to delete
	WHILE @loopcount <= @rowcount
	BEGIN
		 
		 -- Build a dynamic SQL string to delete rows
		 SET @sqlstr = 'DELETE TOP (' + CAST(@increment AS VARCHAR(10)) + ') FROM ' + @tablename + '
		 WHERE ' + @datecolumn + ' < DATEADD(DAY,' + CAST(@numberofdays AS VARCHAR(4)) + ',GETDATE())';
		 
		 -- Execute the dynamic SQL string to delete a batch of rows
		 EXEC(@sqlstr);
		 
		 -- Add the @increment value to @loopcount
		 SET @loopcount = @loopcount + @increment;
		 
		 PRINT CAST(@increment AS VARCHAR(10)) + ' rows deleted.'
		 
	END
	
	END TRY
	
	BEGIN CATCH
		
		SELECT
		ERROR_NUMBER() AS ErrorNumber
		,ERROR_SEVERITY() AS ErrorSeverity
		,ERROR_STATE() AS ErrorState
		,ERROR_PROCEDURE() AS ErrorProcedure
		,ERROR_LINE() AS ErrorLine
		,ERROR_MESSAGE() AS ErrorMessage;

	END CATCH
GO

In this example, the procedure will delete rows from tablename in batches of 50,000 where datecolumnname is older than two months. If you are dealing with a scenario similar to the one described above, deleting 22 months of data in a single run may take longer than you want. In that case, you can run the procedure multiple times changing the @numberofdays parm with each run. You will need to do some testing to determine the optimal values for the @increment and @numberofdays parameters.

-- =============================================
-- Example to execute the stored procedure
-- =============================================
EXECUTE dbo.uspBatchDelete @tablename = 'tablename',
@datecolumn = 'datecolumnname', 
@increment = 50000, 
@numberofdays = -60;
GO

I hope you can put this code to use.

Using Policy-Based Management to Update PAGE_VERIFY Settings – Part Two

August 2, 2013 2 comments

In my last post, I showed how to define a policy using Policy-Based Management (PBM). In this post, I will show how to evaluate this policy. While a policy can be evaluated against a single instance of SQL Server, PBM is especially powerful when used with Central Management Server. For more information about setting up Central Management Server, see my previous post on the topic.


5 - CMS

To begin evaluating your policy, open the Registered Servers tab in Management Studio and expand the Central Management Servers node. Right-click on the server group you want to evaluate and select New Policy. In the example, I will be using the 2005-2008 group.


6 - Evaluate Policies

In the Evaluate Policies dialog, click the lookup button next to the Source box.


7 - Select Source

Click the radio button next to Server and enter the name of the instance where you defined your policy. Click OK.


8 - Evaluate Policies 2

You should see your policy in the Policies list. To evaluate it, check the checkbox next to the policy and click Evaluate. Central Management Server will connect to each of the instances defined in the selected server group and will evaluate the policy, returning the results in the Target Details pane.


9 - Evaluate Policies Results

In my example, a number of databasesfailed the test. As a result, a red X is displayed next to the policy name in the Results pane. The Target Details pane will show the status for each database. I’ve obscured the results, to protect the innocent (and my job), but the Server column contains the name of the instance connected to and the Target column contains the following: SQLSERVER:\SQL\<Machine Name>\<Instance Name>\Database\<Database Name>.


A - Evaluate Plices Result Detail

Clicking the View link next to a failed database will show you the reason for the failure. In this case, the PAGE_VERIFY setting for the database is set to NONE. That, in technical terms, is real bad! Fortunately, Policy-Based Management gives us an easy method of fixing all databases in violation of the policy.


B - Apply Policy Changes

To apply the policy against all databases in violation, check the checkbox next to the policy name in the Results pane. This will select all of the databases in violation. If you want to apply the policy to a subset of the violators, you can select those manually in the Target Details pane. Once you’ve selected the databases you want to apply the policy to, click the Apply button.


C - Policy Apply Warning

Management Studio warns you that you are going to make changes to the databases in violation of the policy. If you are sure that this is what you want to do, click Yes.


D - Policy Applied

Central Management Server will connect to the instances containing databases in violation of the policy and change the PAGE_VERIFY setting to CHECKSUM. When it completes, you should see a green check mark next to the policy name in the Results pane, as well as green check marks next to all databases in the detail pane. If so, congratulations, you’ve successfully evaluated and applied your policy and PAGE_VERIFY is set to CHECKSUM for all databases.

These posts have demonstrated how to define and evaluate a Policy-Based Management policy with a single condition. There are many other facets and conditions that can be evaluated. You can get more information on Policy-Based Management at Books Online. And if you are really interested in learning more, I recommend Pro SQL Server 2008 Policy-Based Management.

Policy-Based Management provides a powerful tool for evaluating and standardizing your environment. I hope this introduction is helpful to you.