I am using Microsoft SQL Server 2016 (SP2-GDR) (KB4505220) – 13.0.5101.9 (X64) Jun 15 2019 23:15:58 Copyright (c) Microsoft Corporation Standard Edition (64-bit) on Windows Server 2012 R2 Standard 6.3 (Build 9600: )
Yesterday, I got two entries in “suspect_pages” for the same database. One of event type 1 and one of type 2
1 = An 823 error that causes a suspect page (such as a disk error) or an 824 error other than a bad checksum or a torn page (such as a bad page ID).
2 = Bad checksum.
database_id file_id page_id eventtype error_count last_update_date 8 1 1482057 1 1 2019-11-14 14:40 8 1 1482057 2 1 2019-11-14 14:40
I found the object related and they both point to the same table on the database.
DBCC TRACEON (3604); DBCC PAGE (8, 1, 14823057, 0); DBCC TRACEOFF (3604);
I had a valid backup of before the corruption and couldn’t afford a down time so I took a backup of the corrupted database, restored my backup on a new name. I dropped the corrupted table and then recreated it from the valid backup.
Today, I restored the corrupted database backup that I took yesterday on a test server and when I run a full checkdb, it detects no corruption.
DBCC CheckDB() WITH No_INFOMSGS, ALL_ERRORMSGS
How is it possible that the backup I took from a corrupted database (according to suspect_pages) doesn’t have any problems? Can those entries in suspect_pages be a false positive?
Database Compatibility Level is 130 (SQL 2016) Our SQL Server is running on Windows Server 2012.
Is it recommended to run DBCC CheckDB on tempdb? If yes, can someone please list out the reasons to do so? This could be on any SQL Server version. The one I have is SQL Server 2017.
I am executing checkdb with physical_only option and it fails with multiple errors like below:
Msg 8965, Level 16, State 1, Line 1
Table error: Object ID 1557580587, index ID 1, partition ID 72057594088456192, alloc unit ID 72057594177454080 (type In-row data). The off-row data node at page (1:13282192), slot 3, text ID 6370769698816 is referenced by page (0:0), slot 0, but was not seen in the scan.
Msg 8965, Level 16, State 1, Line 1
Table error: Object ID 1557580587, index ID 1, partition ID 72057594088456192, alloc unit ID 72057594177454080 (type In-row data). The off-row data node at page (1:13282192), slot 5, text ID 6370769764352 is referenced by page (0:0), slot 0, but was not seen in the scan.
CHECKDB found 0 allocation errors and 5255 consistency errors in table ‘TableX’ (object ID 1557580587).
CHECKDB found 0 allocation errors and 5255 consistency errors in database ‘DatabaseX’. repair_allow_data_loss is the minimum repair level for the errors found by DBCC CHECKDB (DWH_LAND).
However full checkdb is successfull:
CHECKDB found 0 allocation errors and 0 consistency errors in database ‘DatabaseX’. DBCC execution completed. If DBCC printed error messages, contact your system administrator.
TableX has around 200 000 rows and has clustered columnstore index on it.
We are using following version of SQL Server:
Microsoft SQL Server 2017 (RTM-CU13) (KB4466404) – 14.0.3048.4
Should I be worried?
I run a nightly Integrity Check (set up via Maintenance Plans) on all of my databases. For the last two nights, the final CHECKDBCC has failed… at least according to the SQL Server Agent Job.
Progress: 2019-01-25 03:51:01.15 Source: Check Database Integrity Executing query “USE [CellTestData_QC] “.: 50% complete End Progress
Error: 2019-01-25 05:14:43.79 Code: 0xC002F210 Source: Check Database Integrity Execute SQL Task Description: Executing the query “DBCC CHECKDB(N’CellTestData_QC’) WITH NO_INFOMSGS…” failed with the following error: “A severe error occurred on the current command. The results if any should be disca… The package execution fa… The step failed.
But if I check the SQL Server Logs, I get two conflicting messages.
First I get:
01/25/2019 05:00:39,spid105,Unknown,DBCC CHECKDB (CellTestData_QC) WITH no_infomsgs executed by sa terminated abnormally due to error state 6. Elapsed time: 1 hours 9 minutes 38 seconds.
But then I get:
01/25/2019 05:32:06,spid110,Unknown,CHECKDB for database ‘CellTestData_QC’ finished without errors on 2019-01-24 23:01:46.507 (local time). This is an informational message only; no user action is required.
So what exactly is going on here? Is my process completing? Or erroring out?
For a bit more context…
- We restore [CellTestData_QC] every morning at 5AM (likely why I’m getting the SQL Server Log message of ‘terminated abnormally at 5AM).
- Additionally, my backups of [CellTestData] and integrity checks of [CellTestData] and [CellTestData_QC] have been taking longer over the last few days – possibly because of issues with our SAN, which is why we’re getting the IntegrityCheck of [CellTestData_QC] beginning to run into the restore of [CellTestData_QC]
- Integrity checks run every night at 11PM
- DB backups (full and partial) run nightly
- Transaction Log backups run every 15min
- Currently working with 8 databases, that range in size from 4MB to 100GB, with a total of 330GB of databases
- I’m running SQL Server Standard 2012 on a 64-bit Windows Server VM (using vSphere) with 64GB RAM, 1TB SAN storage, 175GB NAS storage, 4CPU, 4.9GHz
Any ideas what might be going on here?