Archive

Archive for the ‘Corruption’ Category

Identifying Corrupt Rows Using DBCC CHECKDB and DBCC PAGE

October 6, 2014 1 comment

Four months ago, when I blogged about giving all the presentations, things were starting to ramp up pat my new job. I've really enjoyed the last few months, but the combination of my presenting schedule and the responsibilities of the new gig has left little time for blogging.

This week, I ran into a corruption issue that gave me a chance to make a deeper dive into internals than I have in a while. It also gave me the opportunity to develop a script to dump the results of DBCC PAGE to a table.

This week, we discovered corruption in a client's database we were migrating. Digging deeper into the results of DBCC CHECKDB, I found that six objects in the database were corrupted badly enough to require data loss. One of those objects contained 47 irreparable pages.

In an effort to provide the client as much information as possible about the corrupted data, I set out to identify the rows contained on the corrupt pages. Here are the scripts I used.

First, I needed to create tables to hold the results of DBCC CHECKDB and DBCC PAGE. For the PageResults table, I guessed at the column definitions. For DBCCResults, I made all of the numeric columns BIGINT to prevent conversion errors.

USE [Your_Database]
GO

/* Create table to hold the results of DBCC CHECKDB */

/****** Object:  Table [dbo].[DBCCResults]    Script Date: 9/30/2014 11:00:47 AM ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

SET ANSI_PADDING ON
GO

IF OBJECT_ID('Your_Database..DBCCResults') IS NOT NULL
BEGIN

	DROP TABLE DBCCResults;

END

IF OBJECT_ID('Your_Database..PageResults') IS NOT NULL
BEGIN

	DROP TABLE PageResults;

END

CREATE TABLE PageResults
(ParentObject VARCHAR(100)
,[Object] VARCHAR(1000)
,[Field] VARCHAR(100)
,[VALUE] VARCHAR(1000))

CREATE TABLE [dbo].[DBCCResults](
	[Error] [bigint] NULL,
	[Level] [bigint] NULL,
	[State] [bigint] NULL,
	[MessageText] [varchar](7000) NULL,
	[RepairLevel] [varchar](7000) NULL,
	[Status] [bigint] NULL,
	[DbId] [bigint] NULL,
	[DbFragID] [bigint] NULL,
	[ObjId] [bigint] NULL,
	[IndId] [bigint] NULL,
	[PartID] [bigint] NULL,
	[AllocID] [bigint] NULL,
	[RidDbid] [bigint] NULL,
	[RidPruid] [bigint] NULL,
	[File] [bigint] NULL,
	[Page] [bigint] NULL,
	[Slot] [bigint] NULL,
	[RefDbid] [bigint] NULL,
	[RefPruId] [bigint] NULL,
	[RefFile] [bigint] NULL,
	[RefPage] [bigint] NULL,
	[RefSlot] [bigint] NULL,
	[Allocation] [bigint] NULL
) ON [PRIMARY]

GO

After creating the tables, I needed to generate DBCC CHECKDB statements for each database. The script below uses sys.databases as the driver, and you can modify the results with an appropriate WHERE clause. The script is set to print the DBCC CHECKDB commands. Replacing the PRINT statement with EXEC sp_executesql will execute the statement after generation. Upon execution, the results of DBCC CHECKDB will be inserted into the DBCC CHECKDB table.

USE master;
GO

/* Drop temp table if it exists */

IF OBJECT_ID('tempdb..#databases') IS NOT NULL
BEGIN

	DROP TABLE #databases

END

/* Declare local variables and create temp tables */
DECLARE @sqlstr NVARCHAR(2000);
DECLARE @dbname SYSNAME;
DECLARE @loopcount TINYINT = 1;
DECLARE @looplimit TINYINT;

CREATE TABLE #databases
(dbname SYSNAME);

/* This will generate statements for all user databases 
   Change the WHERE clause to limit or expand the results */
INSERT INTO #databases
SELECT name FROM master.sys.databases
WHERE name > 4;

/* Get the loop limit */
SELECT @looplimit = @@ROWCOUNT;

/* Build the DBCC CHECKDB statement for each database 
   This code will print each statement
   Replace the PRINT statement with EXEC sp_executesql @sql str to execute the code */
WHILE @loopcount <= @looplimit
BEGIN

	SELECT TOP 1 @dbname = dbname FROM #databases;

	SELECT @sqlstr = 'INSERT INTO Your_Database..DBCCResults EXEC(''DBCC CHECKDB(' + @dbname + ') WITH TABLERESULTS, NO_INFOMSGS'')';

	PRINT @sqlstr;

	SELECT @loopcount += 1;

	DELETE FROM #databases WHERE dbname = @dbname;

END


Once, the DBCCResults table was populated, I needed to dump the contents of the irreparable pages. For corrupted pages, DBCC CHECKDB returns two possible RepairLevels -- repair_rebuild indicates the page can be rebuilt to fix the corruption; repair_allow_data_loss means that the page is corrupted so badly it cannot be repaired. The only option is to delete the page and restore consistency to the object structure. This link from Paul Randal provides a much more detailed explanation.

USE Your_Database;

/* Drop temp tables if they exist */
SET NOCOUNT ON;

IF OBJECT_ID('tempdb..#DBCCPages') IS NOT NULL
BEGIN
	
	DROP TABLE #DBCCPages;

END

IF OBJECT_ID('tempdb..#PageResults') IS NOT NULL
BEGIN
	
	DROP TABLE #PageResults;

END

/* Create temp tables */
CREATE TABLE #PageResults
(ParentObject VARCHAR(100)
,[Object] VARCHAR(1000)
,[Field] VARCHAR(100)
,[VALUE] VARCHAR(1000))

/* Declare local variables */
DECLARE @loopcount INT = 1;
DECLARE @looplimit INT;
DECLARE @sqlstr NVARCHAR(4000);
DECLARE @pagenum BIGINT;

/* Select information about the corrupt data
   This example selects rows with a RepairLevel of repair_allow_data_loss for a single object */
SELECT DISTINCT O.name, T.RepairLevel, T.IndId, T.PartID, T.Page 
INTO #DBCCPages 
FROM DBCCResults T
INNER JOIN Your_Database.sys.objects O
ON t.ObjId = O.object_id
WHERE RepairLevel = 'repair_allow_data_loss'
AND O.name = 'Your_Table'
ORDER BY O.name, T.Page;

/* Set the loop limit */
SET @looplimit = @@ROWCOUNT;

/* Build a DBCC PAGE statement for each corrupt page and execute it
   Insert the results into the #PageResults temp table */
WHILE @loopcount <= @looplimit
BEGIN

	SELECT TOP 1 @pagenum = Page FROM #DBCCPages

	SET @sqlstr = 'DBCC PAGE (Your_Database,1,' + CAST(@pagenum AS NVARCHAR) + ',3) WITH TABLERESULTS'

	INSERT INTO PageResults
	EXEC sp_executesql @sqlstr;

	SET @loopcount += 1;

	DELETE FROM #DBCCPages WHERE Page = @pagenum;
	
END


Because I needed to provide the client with a list of rows affected by the corruption, I had to dump all pages identified as repair_allow_data_loss. The script below queries DBCC Results to return a list of PageIDs for the irreparable pages. It runs DBCC PAGE...WITH TABLERESULTS for each PageID and writes the results to the PageResults table. Here is a sample of the output:

DBCCPAGEWithTableResults

You can see that the results contain a set of rows for each slot, or table row, contained on the page. I needed to take each set of those rows and pivot them into a single row. The script below does that. You will need to add a MIN(CASE statement for each row you want to pivot.

USE Your_Database;
/* Select data from PageResults to return the key value for each row contained on a corrupt page */

SELECT ParentObject,
     MIN(CASE Field WHEN 'col1' THEN VALUE END) AS col1,
     MIN(CASE Field WHEN 'col2' THEN VALUE END) AS col2,
     MIN(CASE Field WHEN 'col3' THEN VALUE END) AS col3,
     MIN(CASE Field WHEN 'col4' THEN VALUE END) AS col4,
     MIN(CASE Field WHEN 'col5' THEN VALUE END) AS col5
INTO Your_Database..PageResults
   FROM #PageResults
   GROUP BY ParentObject
   ORDER BY col1, col2, col3, col4, col5; 

I hope this post provides more insight into the tools and methods available for analyzing corruption and determining its extent.

Database Corruption on the Principal AND the Mirror

August 15, 2012 Leave a comment

Over the past year, I've read a number of Paul Randal's (b|t) blog posts regarding database corruption. I knew he had corrupt databases available for download at sqlskills.com, but I'd never found time to look at them.

That changed last week when the newest member of my team, Nick Schwab (b|t), was looking at mirroring. He asked about Automatic Page Repair, specifically what would happen if a page was corrupted on the principal and the mirror at the same time. I told him I did not know and asked him to run some tests.

Using these instructions from Kendra Little's (b|t) blog, Nick was set to start corrupting pages.

Here is what he found. If the page was corrupted on the principal and was read from disk, the initial attempt would error but Automatic Page Repair would be triggered and subsequent access attempts would succeed. If a page was corrupted on the mirror, SQL Server would take no action until a role switch occurred and the page was read from disk on the new principal. At that point Automatic Page Repair would kick in.

That brings us to Nick's question. If the same page was corrupted on both the principal and the mirror, Automatic Page Repair would pull the mirror copy of the page over and still fail on corruption. I understand that the possibility of this happening in a live environment is extremely small, but it is worth keeping in mind for mirrored applications. At the least, it is a good argument for having a restore strategy (see Myth Number 30-30 here).

Many thanks to Kendra and Paul for making this information available at their blogs.