Archive
Identifying Corrupt Rows Using DBCC CHECKDB and DBCC PAGE
Four months ago, when I blogged about giving all the presentations, things were starting to ramp up at my new job. I've really enjoyed the last few months, but the combination of my presenting schedule and the responsibilities of the new gig has left little time for blogging.
This week, I ran into a corruption issue that gave me a chance to make a deeper dive into internals than I have in a while. It also gave me the opportunity to develop a script to dump the results of DBCC PAGE to a table.
The corruption turned up in a client's database that we were migrating. Digging deeper into the results of DBCC CHECKDB, I found that six objects in the database were corrupted badly enough that repairing them would mean losing data. One of those objects contained 47 irreparable pages.
In an effort to provide the client as much information as possible about the corrupted data, I set out to identify the rows contained on the corrupt pages. Here are the scripts I used.
First, I needed to create tables to hold the results of DBCC CHECKDB and DBCC PAGE. For the PageResults table, I guessed at the column definitions. For DBCCResults, I made all of the numeric columns BIGINT to prevent conversion errors.
USE [Your_Database]
GO

/* Create table to hold the results of DBCC CHECKDB */
/****** Object: Table [dbo].[DBCCResults] Script Date: 9/30/2014 11:00:47 AM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO

IF OBJECT_ID('Your_Database..DBCCResults') IS NOT NULL
BEGIN
    DROP TABLE DBCCResults;
END

IF OBJECT_ID('Your_Database..PageResults') IS NOT NULL
BEGIN
    DROP TABLE PageResults;
END

CREATE TABLE PageResults
    (ParentObject VARCHAR(100)
    ,[Object] VARCHAR(1000)
    ,[Field] VARCHAR(100)
    ,[VALUE] VARCHAR(1000))

CREATE TABLE [dbo].[DBCCResults](
    [Error] [bigint] NULL,
    [Level] [bigint] NULL,
    [State] [bigint] NULL,
    [MessageText] [varchar](7000) NULL,
    [RepairLevel] [varchar](7000) NULL,
    [Status] [bigint] NULL,
    [DbId] [bigint] NULL,
    [DbFragID] [bigint] NULL,
    [ObjId] [bigint] NULL,
    [IndId] [bigint] NULL,
    [PartID] [bigint] NULL,
    [AllocID] [bigint] NULL,
    [RidDbid] [bigint] NULL,
    [RidPruid] [bigint] NULL,
    [File] [bigint] NULL,
    [Page] [bigint] NULL,
    [Slot] [bigint] NULL,
    [RefDbid] [bigint] NULL,
    [RefPruId] [bigint] NULL,
    [RefFile] [bigint] NULL,
    [RefPage] [bigint] NULL,
    [RefSlot] [bigint] NULL,
    [Allocation] [bigint] NULL
) ON [PRIMARY]
GO
After creating the tables, I needed to generate a DBCC CHECKDB statement for each database. The script below uses sys.databases as the driver, and you can change which databases are included with an appropriate WHERE clause. As written, the script only prints the DBCC CHECKDB commands. Replacing the PRINT statement with EXEC sp_executesql will execute each statement as it is generated. Upon execution, the results of DBCC CHECKDB are inserted into the DBCCResults table.
USE master;
GO

/* Drop temp table if it exists */
IF OBJECT_ID('tempdb..#databases') IS NOT NULL
BEGIN
    DROP TABLE #databases
END

/* Declare local variables and create temp tables */
DECLARE @sqlstr NVARCHAR(2000);
DECLARE @dbname SYSNAME;
DECLARE @loopcount INT = 1;
DECLARE @looplimit INT;

CREATE TABLE #databases (dbname SYSNAME);

/* This will generate statements for all user databases
   (database_id 1 through 4 are the system databases)
   Change the WHERE clause to limit or expand the results */
INSERT INTO #databases
SELECT name
FROM master.sys.databases
WHERE database_id > 4;

/* Get the loop limit */
SELECT @looplimit = @@ROWCOUNT;

/* Build the DBCC CHECKDB statement for each database
   This code will print each statement
   Replace the PRINT statement with EXEC sp_executesql @sqlstr to execute the code */
WHILE @loopcount <= @looplimit
BEGIN
    SELECT TOP 1 @dbname = dbname FROM #databases;

    /* QUOTENAME brackets the name so databases with spaces or other special characters still work */
    SELECT @sqlstr = 'INSERT INTO Your_Database..DBCCResults EXEC(''DBCC CHECKDB(' + QUOTENAME(@dbname) + ') WITH TABLERESULTS, NO_INFOMSGS'')';

    PRINT @sqlstr;

    SELECT @loopcount += 1;

    DELETE FROM #databases WHERE dbname = @dbname;
END
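Once the statements have been executed, a quick summary of DBCCResults shows how widespread the damage is. The query below is a minimal sketch rather than one of the scripts I ran at the time. It assumes the DBCCResults table defined above and that only the rows describing actual errors carry a RepairLevel.

```sql
USE Your_Database;

/* Summarize the DBCC CHECKDB output by object, index and repair level.
   Assumption: rows without a RepairLevel are summary or informational rows. */
SELECT  DbId,
        ObjId,
        /* OBJECT_NAME accepts an optional database id, so names resolve
           even when the object lives in a different database */
        OBJECT_NAME(CAST(ObjId AS INT), CAST(DbId AS INT)) AS ObjectName,
        IndId,
        RepairLevel,
        /* Rows with a NULL File or Page are not counted here */
        COUNT(DISTINCT CAST([File] AS VARCHAR(20)) + ':' + CAST([Page] AS VARCHAR(20))) AS DistinctPages,
        COUNT(*) AS ErrorRows
FROM    dbo.DBCCResults
WHERE   RepairLevel IS NOT NULL
GROUP BY DbId, ObjId, IndId, RepairLevel
ORDER BY DbId, ObjId, IndId, RepairLevel;
```

Grouping by RepairLevel keeps the pages that can be rebuilt separate from those that can only be repaired with data loss.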
Once the DBCCResults table was populated, I needed to dump the contents of the irreparable pages. For corrupted pages, DBCC CHECKDB returns two possible RepairLevels: repair_rebuild indicates the page can be rebuilt to fix the corruption, while repair_allow_data_loss means that the page is corrupted so badly it cannot be repaired. In that case, the only option is to delete the page and restore consistency to the object structure. This link from Paul Randal provides a much more detailed explanation.
USE Your_Database;

SET NOCOUNT ON;

/* Drop temp table if it exists */
IF OBJECT_ID('tempdb..#DBCCPages') IS NOT NULL
BEGIN
    DROP TABLE #DBCCPages;
END

/* Declare local variables */
DECLARE @sqlstr NVARCHAR(4000);
DECLARE @filenum BIGINT;
DECLARE @pagenum BIGINT;

/* Select information about the corrupt data
   This example selects rows with a RepairLevel of repair_allow_data_loss for a single object */
SELECT DISTINCT O.name, T.RepairLevel, T.IndId, T.PartID, T.[File], T.Page
INTO #DBCCPages
FROM DBCCResults T
INNER JOIN Your_Database.sys.objects O ON T.ObjId = O.object_id
WHERE T.RepairLevel = 'repair_allow_data_loss'
AND O.name = 'Your_Table'
AND T.[File] IS NOT NULL
AND T.Page IS NOT NULL;

/* Build a DBCC PAGE statement for each corrupt page and execute it
   Insert the results into the PageResults table
   The file number comes from DBCCResults rather than being hard-coded to 1 */
WHILE EXISTS (SELECT 1 FROM #DBCCPages)
BEGIN
    SELECT TOP 1 @filenum = [File], @pagenum = Page FROM #DBCCPages;

    SET @sqlstr = 'DBCC PAGE (Your_Database,' + CAST(@filenum AS NVARCHAR(20)) + ',' + CAST(@pagenum AS NVARCHAR(20)) + ',3) WITH TABLERESULTS';

    INSERT INTO PageResults EXEC sp_executesql @sqlstr;

    DELETE FROM #DBCCPages WHERE [File] = @filenum AND Page = @pagenum;
END
Because I needed to provide the client with a list of rows affected by the corruption, I had to dump all pages identified as repair_allow_data_loss. The script above queries DBCCResults to return a list of PageIDs for the irreparable pages. It runs DBCC PAGE...WITH TABLERESULTS for each PageID and writes the results to the PageResults table. Here is a sample of the output:
You can see that the results contain a set of rows for each slot, or table row, contained on the page, with one result row per column. I needed to take each set of those rows and pivot them into a single row. The script below does that. You will need to add a MIN(CASE ...) expression for each column you want to return.
USE Your_Database;

/* Select data from PageResults to return the key value for each row contained on a corrupt page
   PageResults already exists, so SELECT ... INTO needs a different target;
   PivotedPageResults is a placeholder name */
SELECT ParentObject,
    MIN(CASE Field WHEN 'col1' THEN VALUE END) AS col1,
    MIN(CASE Field WHEN 'col2' THEN VALUE END) AS col2,
    MIN(CASE Field WHEN 'col3' THEN VALUE END) AS col3,
    MIN(CASE Field WHEN 'col4' THEN VALUE END) AS col4,
    MIN(CASE Field WHEN 'col5' THEN VALUE END) AS col5
INTO Your_Database..PivotedPageResults
FROM PageResults
GROUP BY ParentObject
ORDER BY col1, col2, col3, col4, col5;
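One caveat with grouping on ParentObject alone: DBCC PAGE labels each record by its slot, offset, and length, so records on different pages can end up with the same ParentObject value and be collapsed into one row by the GROUP BY. A possible workaround, sketched below, is to stage each page's output and stamp it with the file and page it came from. The table names PageResultsByPage and #PageStage, and the file and page numbers, are placeholders for illustration and were not part of my original scripts.

```sql
USE Your_Database;

/* Hypothetical table: same shape as PageResults plus the source file and page */
IF OBJECT_ID('dbo.PageResultsByPage') IS NOT NULL
    DROP TABLE dbo.PageResultsByPage;

CREATE TABLE dbo.PageResultsByPage
    (FileId BIGINT
    ,PageId BIGINT
    ,ParentObject VARCHAR(100)
    ,[Object] VARCHAR(1000)
    ,[Field] VARCHAR(100)
    ,[VALUE] VARCHAR(1000));

/* Staging table that receives the raw DBCC PAGE output for one page */
IF OBJECT_ID('tempdb..#PageStage') IS NOT NULL
    DROP TABLE #PageStage;

CREATE TABLE #PageStage
    (ParentObject VARCHAR(100)
    ,[Object] VARCHAR(1000)
    ,[Field] VARCHAR(100)
    ,[VALUE] VARCHAR(1000));

/* Placeholder values; in practice these come from DBCCResults inside the loop */
DECLARE @filenum BIGINT = 1;
DECLARE @pagenum BIGINT = 12345;
DECLARE @sqlstr NVARCHAR(4000);

SET @sqlstr = N'DBCC PAGE (Your_Database,' + CAST(@filenum AS NVARCHAR(20)) + N','
            + CAST(@pagenum AS NVARCHAR(20)) + N',3) WITH TABLERESULTS';

/* Capture the page, then copy it across with its file and page attached */
INSERT INTO #PageStage EXEC sp_executesql @sqlstr;

INSERT INTO dbo.PageResultsByPage (FileId, PageId, ParentObject, [Object], [Field], [VALUE])
SELECT @filenum, @pagenum, ParentObject, [Object], [Field], [VALUE]
FROM #PageStage;

/* Empty the staging table before the next page is captured */
TRUNCATE TABLE #PageStage;

/* Pivot as before, but keep rows from different pages apart */
SELECT FileId, PageId, ParentObject,
    MIN(CASE Field WHEN 'col1' THEN VALUE END) AS col1,
    MIN(CASE Field WHEN 'col2' THEN VALUE END) AS col2
FROM dbo.PageResultsByPage
GROUP BY FileId, PageId, ParentObject
ORDER BY FileId, PageId, ParentObject;
```

Adding FileId and PageId to the GROUP BY is the only change to the pivot itself; the extra staging step exists because INSERT...EXEC cannot add columns to the result set it captures.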
I hope this post provides more insight into the tools and methods available for analyzing corruption and determining its extent.
T-SQL Tuesday – Extended Events in SQL Server 2012
I started as a SQL Server DBA in November of 2007 and attended my first PASS Summit in 2008. While there, I saw Jerome Halmans present on Advanced Troubleshooting with SQL Server Extended Events. It was one of my favorite presentations of the summit and I wanted to start using Extended Events in my environment. However, every time I tried to use them, the overhead of getting them set up was greater than their usefulness. I found myself using other troubleshooting tools that I was more familiar with.
Despite these difficulties, I remained interested in using Extended Events. Several months ago, I downloaded Jonathan Kehayias' SQL Server 2008 Extended Events SSMS Addin but couldn't get any momentum going.
That brings me to my pick for the most useful feature in SQL Server 2012. This week's Brent Ozar PLF newsletter contained a link to Bob Dorr's article on SQL Server 2012: RML, XEvent Viewer and Distributed Replay. I just started playing with Extended Events in 2012 today and I am really excited about it.
Extended Events are now integrated into Management Studio, which removes the manual scripting that was previously necessary to use them. The interface looks and feels a lot like SQL Profiler, but Extended Events has much less overhead than a Profiler trace. And they are one of Thomas LaRock's top 3 things you should learn in SQL Server 2012.
I will be at the SQLskills immersion event on internals and performance next week, but Extended Events is at the top of my list for when I get back.
Using Central Management Server
Central Management Server (CMS) allows a query to be executed against multiple instances of SQL Server at one time. It is a very useful tool for returning information about your environment and creating the same object on several instances. I will use one of the queries from my previous post in this example.
To set up CMS, open SQL Server Management Studio 2008 or later, click View, and select Registered Servers. In the Registered Servers pane, right-click Central Management Servers and select Register Central Management Server.
Enter a SQL Server name into the Server name field. You can test the connection using the Test button. Once you are satisfied, click Save. Your server will appear in the tree on the left.
Right-click the server name and select New Server Registration.
In the New Server Registration dialog, enter a server you want to add. After testing the connection, click Save. The registered server will appear under the Central Management Server. Repeat this for all of the servers you want to add. Also, note that you can create multiple levels in the CMS tree structure. In my environment, we have the clustered servers grouped together by both data center and environment. This allows queries to be executed against the entire environment as well as any subset of the instances.
To execute a query against multiple instances, right-click the level in the tree you want to run against and select New Query.
A query window will open. You will notice that the status bar is pink and the lower left-hand corner will show Connected (n/n), displaying the number of instances you are connected to.
For this example, I am going to use the xp_fixeddrives query from my last post, but you can use CMS to execute any T-SQL query.
IMPORTANT NOTE: USE EXTREME CAUTION WHEN EXECUTING ANY INSERTS, UPDATES, DELETES, DROPS OR ANY OTHER POTENTIALLY DESTRUCTIVE QUERIES.
Once you click Execute, there is no going back. If you execute
DROP DATABASE master
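against all of the instances in your environment, SQL Server will reject it because master is a system database, but a DROP DATABASE aimed at a user database would succeed on every instance that has a database by that name. Additionally, there is no way to control the order in which the servers return their results, so canceling a query may have unexpected results.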
Once you've absorbed that, click Execute. Your query results will return with one result set per instance. The first column in each row will always contain the name of the instance that returned that row.
There are a few things I recommend when running CMS queries.
First, always include a USE statement. This will prevent problems when running various system stored procedures against 2005 and 2008 systems at the same time.
Second, when running a SELECT statement, always use a column list. This will prevent problems if the number of columns returned has changed between versions of SQL Server.
Third, keep in mind CMS connects to each instance and executes your query, returning each result set individually. This means you have no control over the order the result sets return and cannot order the entire result set. If you code an ORDER BY in your query, each result set will be sorted. If I need to sort the entire result set, I will copy it into Excel and do the sorting there.
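As an illustration of the first two recommendations, here is a minimal sketch of a query written to behave the same way across instances. It is only an example and not the query from my previous post. It relies on sys.databases, which is available from SQL Server 2005 onward, and on columns that exist in every version since.

```sql
/* Explicit database context, so every instance runs in the same place */
USE master;

/* Explicit column list, so every instance returns the same shape */
SELECT  CAST(SERVERPROPERTY('ProductVersion') AS VARCHAR(32)) AS ProductVersion,
        name,
        database_id,
        state_desc,
        recovery_model_desc
FROM    sys.databases
ORDER BY name;  /* sorts within each instance only */
```

The ORDER BY sorts each instance's rows by database name, but it does not sort across instances.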
I use CMS on a daily basis to return information from my environment. It also makes creating and modifying code used in MSX/TSX jobs very simple. With a single execution I can push a new stored procedure out to every instance in my environment. Have fun playing with CMS to see what uses you can find for it.