Archive
Identifying Corrupt Rows Using DBCC CHECKDB and DBCC PAGE
Four months ago, when I blogged about giving all the presentations, things were starting to ramp up pat my new job. I've really enjoyed the last few months, but the combination of my presenting schedule and the responsibilities of the new gig has left little time for blogging.
This week, I ran into a corruption issue that gave me a chance to make a deeper dive into internals than I have in a while. It also gave me the opportunity to develop a script to dump the results of DBCC PAGE to a table.
This week, we discovered corruption in a client's database we were migrating. Digging deeper into the results of DBCC CHECKDB, I found that six objects in the database were corrupted badly enough to require data loss. One of those objects contained 47 irreparable pages.
In an effort to provide the client as much information as possible about the corrupted data, I set out to identify the rows contained on the corrupt pages. Here are the scripts I used.
First, I needed to create tables to hold the results of DBCC CHECKDB and DBCC PAGE. For the PageResults table, I guessed at the column definitions. For DBCCResults, I made all of the numeric columns BIGINT to prevent conversion errors.
USE [Your_Database] GO /* Create table to hold the results of DBCC CHECKDB */ /****** Object: Table [dbo].[DBCCResults] Script Date: 9/30/2014 11:00:47 AM ******/ SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO SET ANSI_PADDING ON GO IF OBJECT_ID('Your_Database..DBCCResults') IS NOT NULL BEGIN DROP TABLE DBCCResults; END IF OBJECT_ID('Your_Database..PageResults') IS NOT NULL BEGIN DROP TABLE PageResults; END CREATE TABLE PageResults (ParentObject VARCHAR(100) ,[Object] VARCHAR(1000) ,[Field] VARCHAR(100) ,[VALUE] VARCHAR(1000)) CREATE TABLE [dbo].[DBCCResults]( [Error] [bigint] NULL, [Level] [bigint] NULL, [State] [bigint] NULL, [MessageText] [varchar](7000) NULL, [RepairLevel] [varchar](7000) NULL, [Status] [bigint] NULL, [DbId] [bigint] NULL, [DbFragID] [bigint] NULL, [ObjId] [bigint] NULL, [IndId] [bigint] NULL, [PartID] [bigint] NULL, [AllocID] [bigint] NULL, [RidDbid] [bigint] NULL, [RidPruid] [bigint] NULL, [File] [bigint] NULL, [Page] [bigint] NULL, [Slot] [bigint] NULL, [RefDbid] [bigint] NULL, [RefPruId] [bigint] NULL, [RefFile] [bigint] NULL, [RefPage] [bigint] NULL, [RefSlot] [bigint] NULL, [Allocation] [bigint] NULL ) ON [PRIMARY] GO
After creating the tables, I needed to generate DBCC CHECKDB statements for each database. The script below uses sys.databases as the driver, and you can modify the results with an appropriate WHERE clause. The script is set to print the DBCC CHECKDB commands. Replacing the PRINT statement with EXEC sp_executesql will execute the statement after generation. Upon execution, the results of DBCC CHECKDB will be inserted into the DBCC CHECKDB table.
USE master; GO /* Drop temp table if it exists */ IF OBJECT_ID('tempdb..#databases') IS NOT NULL BEGIN DROP TABLE #databases END /* Declare local variables and create temp tables */ DECLARE @sqlstr NVARCHAR(2000); DECLARE @dbname SYSNAME; DECLARE @loopcount TINYINT = 1; DECLARE @looplimit TINYINT; CREATE TABLE #databases (dbname SYSNAME); /* This will generate statements for all user databases Change the WHERE clause to limit or expand the results */ INSERT INTO #databases SELECT name FROM master.sys.databases WHERE name > 4; /* Get the loop limit */ SELECT @looplimit = @@ROWCOUNT; /* Build the DBCC CHECKDB statement for each database This code will print each statement Replace the PRINT statement with EXEC sp_executesql @sql str to execute the code */ WHILE @loopcount <= @looplimit BEGIN SELECT TOP 1 @dbname = dbname FROM #databases; SELECT @sqlstr = 'INSERT INTO Your_Database..DBCCResults EXEC(''DBCC CHECKDB(' + @dbname + ') WITH TABLERESULTS, NO_INFOMSGS'')'; PRINT @sqlstr; SELECT @loopcount += 1; DELETE FROM #databases WHERE dbname = @dbname; END
Once, the DBCCResults table was populated, I needed to dump the contents of the irreparable pages. For corrupted pages, DBCC CHECKDB returns two possible RepairLevels -- repair_rebuild indicates the page can be rebuilt to fix the corruption; repair_allow_data_loss means that the page is corrupted so badly it cannot be repaired. The only option is to delete the page and restore consistency to the object structure. This link from Paul Randal provides a much more detailed explanation.
USE Your_Database; /* Drop temp tables if they exist */ SET NOCOUNT ON; IF OBJECT_ID('tempdb..#DBCCPages') IS NOT NULL BEGIN DROP TABLE #DBCCPages; END IF OBJECT_ID('tempdb..#PageResults') IS NOT NULL BEGIN DROP TABLE #PageResults; END /* Create temp tables */ CREATE TABLE #PageResults (ParentObject VARCHAR(100) ,[Object] VARCHAR(1000) ,[Field] VARCHAR(100) ,[VALUE] VARCHAR(1000)) /* Declare local variables */ DECLARE @loopcount INT = 1; DECLARE @looplimit INT; DECLARE @sqlstr NVARCHAR(4000); DECLARE @pagenum BIGINT; /* Select information about the corrupt data This example selects rows with a RepairLevel of repair_allow_data_loss for a single object */ SELECT DISTINCT O.name, T.RepairLevel, T.IndId, T.PartID, T.Page INTO #DBCCPages FROM DBCCResults T INNER JOIN Your_Database.sys.objects O ON t.ObjId = O.object_id WHERE RepairLevel = 'repair_allow_data_loss' AND O.name = 'Your_Table' ORDER BY O.name, T.Page; /* Set the loop limit */ SET @looplimit = @@ROWCOUNT; /* Build a DBCC PAGE statement for each corrupt page and execute it Insert the results into the #PageResults temp table */ WHILE @loopcount <= @looplimit BEGIN SELECT TOP 1 @pagenum = Page FROM #DBCCPages SET @sqlstr = 'DBCC PAGE (Your_Database,1,' + CAST(@pagenum AS NVARCHAR) + ',3) WITH TABLERESULTS' INSERT INTO PageResults EXEC sp_executesql @sqlstr; SET @loopcount += 1; DELETE FROM #DBCCPages WHERE Page = @pagenum; END
Because I needed to provide the client with a list of rows affected by the corruption, I had to dump all pages identified as repair_allow_data_loss. The script below queries DBCC Results to return a list of PageIDs for the irreparable pages. It runs DBCC PAGE...WITH TABLERESULTS for each PageID and writes the results to the PageResults table. Here is a sample of the output:
You can see that the results contain a set of rows for each slot, or table row, contained on the page. I needed to take each set of those rows and pivot them into a single row. The script below does that. You will need to add a MIN(CASE statement for each row you want to pivot.
USE Your_Database; /* Select data from PageResults to return the key value for each row contained on a corrupt page */ SELECT ParentObject, MIN(CASE Field WHEN 'col1' THEN VALUE END) AS col1, MIN(CASE Field WHEN 'col2' THEN VALUE END) AS col2, MIN(CASE Field WHEN 'col3' THEN VALUE END) AS col3, MIN(CASE Field WHEN 'col4' THEN VALUE END) AS col4, MIN(CASE Field WHEN 'col5' THEN VALUE END) AS col5 INTO Your_Database..PageResults FROM #PageResults GROUP BY ParentObject ORDER BY col1, col2, col3, col4, col5;
I hope this post provides more insight into the tools and methods available for analyzing corruption and determining its extent.
Database Corruption on the Principal AND the Mirror
Over the past year, I've read a number of Paul Randal's (b|t) blog posts regarding database corruption. I knew he had corrupt databases available for download at sqlskills.com, but I'd never found time to look at them.
That changed last week when the newest member of my team, Nick Schwab (b|t), was looking at mirroring. He asked about Automatic Page Repair, specifically what would happen if a page was corrupted on the principal and the mirror at the same time. I told him I did not know and asked him to run some tests.
Using these instructions from Kendra Little's (b|t) blog, Nick was set to start corrupting pages.
Here is what he found. If the page was corrupted on the principal and was read from disk, the initial attempt would error but Automatic Page Repair would be triggered and subsequent access attempts would succeed. If a page was corrupted on the mirror, SQL Server would take no action until a role switch occurred and the page was read from disk on the new principal. At that point Automatic Page Repair would kick in.
That brings us to Nick's question. If the same page was corrupted on both the principal and the mirror, Automatic Page Repair would pull the mirror copy of the page over and still fail on corruption. I understand that the possibility of this happening in a live environment is extremely small, but it is worth keeping in mind for mirrored applications. At the least, it is a good argument for having a restore strategy (see Myth Number 30-30 here).
Many thanks to Kendra and Paul for making this information available at their blogs.