SQL Server DBCC Check Fail - sql-server

If I set DBCC CHECK to run as part of a job, will the job fail (and subsequently alert me) if an allocation error or consistency error is found?
USE [mydb]
GO
DBCC CHECKDB(N'mydb') WITH NO_INFOMSGS

By the way, if the job fails, then there is something wrong with either database consistency or resource availability; in either case an administrator has to rectify the issue. There is also an option to configure "Alerts" on the job that will trigger an email (or similar) to inform a certain group or person, if proper monitoring is not available.
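One way to confirm how such a job actually behaved is to look at its history in msdb. The query below is a minimal sketch, not a definitive monitoring solution; the job name 'Nightly CHECKDB' is an assumption made for illustration.

```sql
-- Sketch: list recent failed outcomes for a hypothetical Agent job.
-- The job name N'Nightly CHECKDB' is an assumption; substitute your own.
SELECT TOP (20)
    j.name AS job_name,
    h.step_id,
    h.step_name,
    h.run_date,        -- integer in YYYYMMDD form
    h.run_time,        -- integer in HHMMSS form
    h.sql_message_id,  -- last SQL error number reported by the step
    h.sql_severity,
    h.message
FROM msdb.dbo.sysjobhistory AS h
JOIN msdb.dbo.sysjobs AS j
    ON j.job_id = h.job_id
WHERE j.name = N'Nightly CHECKDB'
  AND h.run_status = 0     -- 0 = failed
ORDER BY h.instance_id DESC;
```

Rows with step_id = 0 describe the overall job outcome; the other rows describe individual steps, and their message column carries the text of any errors the step raised.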

Related

Can't Run DBCC CHECKDB on master DB - Azure Files

Storing SQL Server database files on new Azure Files share. Cannot run full / comprehensive CHECKDB against these databases - I think this has something to do with user account not having permissions to create snapshots. As a result, I offloaded these checks to an alternate server where I can also test .baks. Everything works fine except for the master db, which registers corruption when you restore it as a user db and run CHECKDB against it (https://www.itprotoday.com/my-master-database-really-corrupt), even though it's not corrupt.
Questions:
1) Has anyone run into the same problem running CHECKDB on SQL db files stored on an Azure Files share? Is there a workaround?
2) What's an alternative to running CHECKDB on master if I cannot run it in PROD? Can I somehow restore master to another SQL instance and check it there?
Error when I execute DBCC CHECKDB (master) in PROD:
Msg 5030, Level 16, State 12, Line 4
The database could not be exclusively locked to perform the operation.
Msg 7926, Level 16, State 1, Line 4
Check statement aborted. The database could not be checked as a database snapshot could not be created and the database or table could not be locked. See Books Online for details of when this behavior is expected and what workarounds exist. Also see previous errors for more details.
Message when I run DBCC CHECKDB on user db in PROD:
DBCC CHECKDB will not check SQL Server catalog or Service Broker consistency because a database snapshot could not be created or because WITH TABLOCK was specified.
Please refer to this Azure Support document: Error message when you run any of the DBCC CHECK commands in SQL Server: "The database could not be exclusively locked to perform the operation"
In Microsoft SQL Server, you may receive an error message when you run any of the following DBCC commands:
DBCC CHECKDB
DBCC CHECKTABLE
DBCC CHECKALLOC
DBCC CHECKCATALOG
DBCC CHECKFILEGROUP
The error message contains the following text:
Msg 5030, Level 16, State 12, Line 1
The database could not be exclusively locked to perform the operation.
Msg 7926, Level 16, State 1, Line 1
Check statement aborted. The database could not be checked as a database snapshot could not be created and the database or table could not be locked. See Books Online for details of when this behavior is expected and what workarounds exist. Also see previous errors for more details.
Cause:
This problem occurs if the following conditions are true:
- At least one other connection is using the database against which you run the DBCC CHECK command.
- The database contains at least one file group that is marked as read-only.
Starting with SQL Server 2005, DBCC CHECK commands create and use an internal database snapshot for consistency purposes when the command performs any checks. If a read-only file group exists in the database, the internal database snapshot is not created. To continue to perform the checks, the DBCC CHECK command tries to acquire an EX database lock. If other users are connected to this database, this attempt to acquire an EX lock fails. Therefore, you receive an error message.
Resolution
To resolve this problem, follow these steps instead of running the DBCC CHECK command against the database:
1) Create a database snapshot of the database for which you want to perform the checks. For more information about how to create a database snapshot, see the "Create a Database Snapshot (Transact-SQL)" topic in SQL Server Books Online.
2) Run the DBCC CHECK command against the database snapshot.
3) Drop the database snapshot after the DBCC CHECK command is completed.
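The resolution steps above can be sketched in T-SQL as follows. This is an illustration under stated assumptions rather than the support document's own script: the user database is called mydb, it has a single data file with the logical name mydb_data, and the snapshot name and sparse-file path are hypothetical.

```sql
-- Look up the logical names of the data files; a snapshot must list every one.
SELECT name, physical_name
FROM mydb.sys.database_files
WHERE type_desc = 'ROWS';
GO
-- Step 1: create the snapshot (name, logical file name and path are assumptions).
CREATE DATABASE mydb_check_snap
ON ( NAME = mydb_data, FILENAME = N'D:\Snapshots\mydb_data.ss' )
AS SNAPSHOT OF mydb;
GO
-- Step 2: run the consistency check against the snapshot.
DBCC CHECKDB (N'mydb_check_snap') WITH NO_INFOMSGS;
GO
-- Step 3: drop the snapshot once the check has completed.
DROP DATABASE mydb_check_snap;
GO
```

This only applies to user databases, because database snapshots cannot be created for master, model, or tempdb. If the underlying storage cannot host the snapshot's sparse files, the CREATE DATABASE step itself will fail.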
This document may help you solve the problem.
Updates:
For the system databases it does not use database snapshots, but it will hold table locks.
You can also refer to this blog post: Checkdb giving error for master database.
Mike Walsh gives more detail about the error there.
Hope this helps.

SQL Server detected a logical consistency-based I/O error

I am using SharePoint Foundation 2010. I got error message 824 in the event logs while running the regularly scheduled job that backs up the databases.
WSS_Logging shows the error below:
"SQL Server detected a logical consistency-based I/O error: incorrect checksum (expected: 0xa691e24a; actual: 0xb68ce671). It occurred during a read of page (1:6095) in database ID 9 at offset 0x00000002f9e000 in file 'C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\WSS_Logging.mdf'. Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). "
Please help..
From MSDN:
What does this error mean:
This error indicates that Windows reports that the page is successfully read from disk, but SQL Server has discovered something wrong with the page. This error is similar to error 823 except that Windows did not detect the error. This usually indicates a problem in the I/O subsystem, such as a failing disk drive, disk firmware problems, faulty device driver, and so on.
Simply put, run CHKDSK and see whether it reports any errors:
CHKDSK [volume[[path]filename]] [/F] [/V] [/R] [/X] [/I] [/C] [/L[:size]]
Also change the PAGE_VERIFY option to CHECKSUM if you haven't already.
Read below for more details from the MSDN link.
What are the next steps:
Look for Hardware Failure
Run hardware diagnostics and correct any problems. Also examine the Microsoft Windows system and application logs and the SQL Server error log to see whether the error occurred because of hardware failure. Fix any hardware-related problems that are contained in the logs.
If you have persistent data corruption problems, try to swap out different hardware components to isolate the problem. Check to make sure that the system does not have write-caching enabled on the disk controller. If you suspect write-caching to be the problem, contact your hardware vendor.
Finally, you might find it useful to switch to a new hardware system. This switch may include reformatting the disk drives and reinstalling the operating system.
Restore from Backup
If the problem is not hardware-related and a known clean backup is available, restore the database from the backup.
Consider changing the databases to use the PAGE_VERIFY CHECKSUM option.
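As a sketch of the PAGE_VERIFY suggestion above, the statements below check the current setting, switch it to CHECKSUM, and list any pages SQL Server has already recorded as suspect. The database name WSS_Logging is taken from the question. The "incorrect checksum" wording in the error suggests the option may already be CHECKSUM there, which the first query would confirm.

```sql
-- Check the current page verification setting.
SELECT name, page_verify_option_desc
FROM sys.databases
WHERE name = N'WSS_Logging';

-- Switch to CHECKSUM if it is not already set.
-- Existing pages only receive a checksum the next time they are written.
ALTER DATABASE WSS_Logging SET PAGE_VERIFY CHECKSUM;

-- Pages that hit 823/824-type errors are logged in msdb.
SELECT database_id, file_id, page_id, event_type, error_count, last_update_date
FROM msdb.dbo.suspect_pages
ORDER BY last_update_date DESC;
```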
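Run
DBCC CHECKDB (yourdatabasename)
and it will report errors against the affected tables.
You can then repair each table with DBCC CHECKTABLE('tablename1', REPAIR_ALLOW_DATA_LOSS). As the option name indicates, this repair may discard data, so restoring from a known clean backup is preferable when one is available.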
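USE yourdatabasename;
GO
-- Repair options require the database to be in single-user mode
ALTER DATABASE yourdatabasename
SET SINGLE_USER;
GO
-- REPAIR_ALLOW_DATA_LOSS may discard damaged data
DBCC CHECKTABLE('tablename1', REPAIR_ALLOW_DATA_LOSS);
GO
DBCC CHECKTABLE('tablename2', REPAIR_ALLOW_DATA_LOSS);
GO
-- Return the database to normal multi-user access
ALTER DATABASE yourdatabasename
SET MULTI_USER;
GO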
For example, for a database named dbreckitInventory:
USE dbreckitInventory;
GO
ALTER DATABASE dbreckitInventory
SET SINGLE_USER;
GO
DBCC CHECKTABLE('tblpurchasedetails', REPAIR_ALLOW_DATA_LOSS);
GO
DBCC CHECKTABLE('TblSalesDetails', REPAIR_ALLOW_DATA_LOSS);
GO
ALTER DATABASE dbreckitInventory
SET MULTI_USER;
GO

SQL Job Agent DB Restore fails with error #6107: Only user processes can be killed

We have a SQL Server Agent job that runs in the "wee hours" to restore our local database (FooData) from a production backup.
First, the database is set to SINGLE_USER mode and any open processes are killed. Second, the database is restored.
But the 3rd step fails occasionally with Error 6107: "Only User Processes Can Be Killed"
This happens about once or twice a week at seemingly random intervals. Here is the code for step 3 where the failure occasionally occurs:
USE master;
go
exec msdb.dbo.KillSpids FooData;
go
ALTER DATABASE FooData SET MULTI_USER;
go
Does anybody have any ideas what might be occurring to cause this error? I'm thinking there might be some automated process starting up during step 3 or possibly some user trying to log in during that time? I'm not a DBA, so I'm guessing at this point, although I believe that a user should not be able to log in while the DB is in SINGLE_USER mode.
A user probably isn't logged in. The system is probably performing some task. The output of exec sp_who or sp_who2 will show which sessions are open. SPIDs of 50 or below are system processes and cannot be killed with KILL (on newer versions system sessions can also have higher IDs, so the is_user_process column of sys.dm_exec_sessions is the more reliable indicator). The only way to stop them is to stop the SQL Server service or issue a SHUTDOWN command (which does the same thing).
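To see which sessions on the database are system processes (and therefore cannot be killed), a query along these lines may help. It is a sketch that assumes SQL Server 2012 or later, where sys.dm_exec_sessions exposes a database_id column; FooData is the database name from the question.

```sql
-- List sessions currently using FooData and flag system processes.
SELECT
    s.session_id,
    s.is_user_process,   -- 0 = system session, which KILL will reject
    s.login_name,
    s.host_name,
    s.program_name,
    s.status
FROM sys.dm_exec_sessions AS s
WHERE s.database_id = DB_ID(N'FooData')
ORDER BY s.is_user_process, s.session_id;
```

A kill routine could filter on is_user_process = 1 so that it never attempts to kill a system session.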
I found the answer to my problem by changing one line of code which worked like a charm.
As mentioned in the original question, the 'KillSpids' line is used in Step 1 of the job (along with SET SINGLE_USER). The 'KillSpids' call made sense in Step 1 because there may be unwanted processes still active on the database.
The 'KillSpids' line was then added again into Step 3, but it was unnecessary, and was also causing the 6107 error.
I replaced the 'KillSpids' line with the one shown below. Setting the freshly restored database to single user mode takes care of the concern that a user might try to log in before all the job steps have been completed. Here is the updated code:
USE master;
go
ALTER DATABASE [FooData] SET SINGLE_USER WITH ROLLBACK IMMEDIATE
go
ALTER DATABASE FooData SET MULTI_USER;
go

Extreme wait-time when taking a SQL Server database offline

I'm trying to perform some offline maintenance (dev database restore from live backup) on my dev database, but the 'Take Offline' command via SQL Server Management Studio is performing extremely slowly - on the order of 30 minutes plus now. I am just about at my wits' end and I can't seem to find any references online as to what might be causing the speed problem, or how to fix it.
Some sites have suggested that open connections to the database cause this slowdown, but the only application that uses this database is my dev machine's IIS instance, and the service is stopped - there are no more open connections.
What could be causing this slowdown, and what can I do to speed it up?
After some additional searching (new search terms inspired by gbn's answer and u07ch's comment on KMike's answer) I found this, which completed successfully in 2 seconds:
ALTER DATABASE <dbname> SET OFFLINE WITH ROLLBACK IMMEDIATE
(Update)
If this still fails with the following error, you can fix it as inspired by this blog post:
ALTER DATABASE failed because a lock could not be placed on database 'dbname'. Try again later.
You can run the following command to find out who is keeping a lock on your database:
EXEC sp_who2
And use whatever SPID you find in the following command:
KILL <SPID>
Then run the ALTER DATABASE command again. It should now work.
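Rather than scanning the sp_who2 output by eye, the blocked request and the session blocking it can be listed directly. This is a sketch to run from a second query window while the ALTER DATABASE statement is still waiting.

```sql
-- Show requests that are currently blocked and which session blocks them.
SELECT
    r.session_id,
    r.command,              -- the stuck statement typically shows as ALTER DATABASE
    r.status,
    r.wait_type,
    r.wait_time,            -- milliseconds spent waiting so far
    r.blocking_session_id   -- candidate SPID for KILL, if it is a user session
FROM sys.dm_exec_requests AS r
WHERE r.blocking_session_id <> 0;
```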
There is most likely a connection to the DB from somewhere (a rare example: asynchronous statistic update)
To find connections, use sys.sysprocesses
USE master
SELECT * FROM sys.sysprocesses WHERE dbid = DB_ID('MyDB')
To force disconnections, use ROLLBACK IMMEDIATE
USE master
ALTER DATABASE MyDB SET SINGLE_USER WITH ROLLBACK IMMEDIATE
Do you have any open SQL Server Management Studio windows that are connected to this DB?
Put it in single user mode, and then try again.
In my case, after waiting so much for it to finish I had no patience and simply closed management studio. Before exiting, it showed the success message, db is offline. The files were available to rename.
Execute the stored procedure
sp_who2
This lets you see whether there are any blocking locks. Killing the sessions that hold them should fix it.
In SSMS: right-click on the SQL Server icon and choose Activity Monitor. Open Processes. Find the connected processes. Right-click on the process and choose Kill.
In my case I had looked at some tables in the DB prior to executing this action. My user account was holding an active connection to this DB in SSMS. Once I disconnected from the server in SSMS (leaving the 'Take database offline' dialog box open) the operation succeeded.
Any time you run into this type of thing, you should think of your transaction log. The ALTER DATABASE statement with ROLLBACK IMMEDIATE indicates this to be the case. Check this out: http://msdn.microsoft.com/en-us/library/ms189085.aspx
Bone up on checkpoints, etc. You need to decide whether the transactions in your log are worth saving or not and then pick the mode to run your db in accordingly. There's really no reason for you to have to wait, but also no reason for you to lose data either - you can have both.
Closing the instance of SSMS (SQL Server Management Studio) from which the request was made solved the problem for me.
To get around this I stopped the website that was connected to the db in IIS and immediately the 'frozen' 'take db offline' panel became unfrozen.
Also, close any query windows you may have open that are connected to the database in question ;)
I tried all the suggestions below and nothing worked.
EXEC sp_who
KILL <SPID>
ALTER DATABASE <dbname> SET SINGLE_USER WITH ROLLBACK IMMEDIATE
ALTER DATABASE <dbname> SET OFFLINE WITH ROLLBACK IMMEDIATE
Result: Both of the above commands were also stuck.
4. Right-click the database -> Properties -> Options
Set Database Read-Only to True
Click 'Yes' at the dialog warning SQL Server will close all connections to the database.
Result: The window was stuck on executing.
As a last resort, I restarted the SQL Server service from Configuration Manager and then ran ALTER DATABASE <dbname> SET OFFLINE WITH ROLLBACK IMMEDIATE. It worked like a charm.
In SSMS, set the database to read-only then back. The connections will be closed, which frees up the locks.
In my case there was a website that had open connections to the database. This method was easy enough:
Right-click the database -> Properties -> Options
Set Database Read-Only to True
Click 'Yes' at the dialog warning SQL Server will close all connections to the database.
Re-open Options and turn read-only back off
Now try renaming the database or taking it offline.
For me, I just had to go into the Job Activity Monitor and stop two things that were processing. Then it went offline immediately. In my case though I knew what those 2 processes were and that it was ok to stop them.
In my case, the database was related to an old Sharepoint install. Stopping and disabling related services in the server manager "unhung" the take offline action, which had been running for 40 minutes, and it completed immediately.
You may wish to check if any services are currently utilizing the database.
Next time, from the Take Offline dialog, remember to check the 'Drop All Active Connections' checkbox. I was also on SQL_EXPRESS on local machine with no connections, but this slowdown happened for me unless I checked that checkbox.
SSMS, especially if running it from your own desktop remotely and not directly within the database server, can be a reason for the long delays in detaching a database. For some reason SSMS may not be able to disconnect any existing "connections" to the database.
We found the process was almost instant when we did it directly from the database server itself. And in fact it killed the attempt from my own desktop SSMS session, and it "took over" and detached the database.
Nothing else suggested here worked.
Thanks
In my case I stopped the Tomcat server, and the DB went offline immediately.

Test Before Attempting Exclusive Lock

I've written some code to upgrade a SQL Server database. Before I upgrade the database, I obtain an exclusive lock via:
ALTER DATABASE Test SET SINGLE_USER WITH NO_WAIT
However, I'd like to test the database to see if the exclusive lock is possible before I run the above code. The test doesn't have to be 100% perfect, I'd just like to avoid the possibility of a timeout when attempting to gain an exclusive lock.
To that end, I've written the code below:
SELECT
*
FROM
sys.dm_tran_locks
WHERE
resource_database_id = DB_ID('Test') AND
request_session_id <> @@SPID
I'm assuming that if there's 1 or more row returned, then the database must be in use. Is this true? Or is it not that simple?
UPDATE: Taking @gbn's comments into account, I've decided to force rollback of existing connections using the following statement:
ALTER DATABASE Test SET SINGLE_USER WITH ROLLBACK IMMEDIATE
Before running this code, I'll give the user an opportunity to opt out. However, I'd like the user to be able to see the list of active connections to the database - so they can make an informed decision. Which leads me onto this question.
Mainly, a DB lock is just to show it is in use. Databases don't really have many exclusive locks situations compared to code/table objects.
Single user mode is not a lock, but the number of connections allowed.
I'd wrap the ALTER DATABASE in a TRY/CATCH block because there is no guarantee the state won't change between check and ALTER DB.
However, I could be wrong or misunderstanding the question... so you'll also have to test for the exclusive lock mode on the database resource in the query above. Your code above will show you any lock, which could be someone having a blank query window open in SSMS...
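The TRY/CATCH suggestion above might look like the following minimal sketch. It assumes the failure of ALTER DATABASE ... WITH NO_WAIT is raised as an ordinary catchable error, and the PRINT messages are placeholders for whatever the upgrade code would actually do.

```sql
BEGIN TRY
    -- Fails immediately instead of waiting if other sessions are using the database.
    ALTER DATABASE Test SET SINGLE_USER WITH NO_WAIT;
    PRINT 'Single-user mode obtained; the upgrade can start.';
END TRY
BEGIN CATCH
    -- The database was in use (or the state changed between check and ALTER).
    PRINT 'Could not switch to SINGLE_USER: ' + ERROR_MESSAGE();
END CATCH;
```

Handling the failure this way avoids relying on a separate "is it free?" check, which can go stale before the ALTER DATABASE runs.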
Edit, based on comment
You can detect who is using it by these:
sys.dm_exec_connections
sys.dm_exec_sessions
sys.dm_exec_requests
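A query combining these three views could give the user the list of active connections before they decide. This is a sketch assuming SQL Server 2012 or later (for the database_id column on sys.dm_exec_sessions) and the database name Test from the question.

```sql
-- Other user sessions currently attached to the Test database.
SELECT
    s.session_id,
    s.login_name,
    s.host_name,
    s.program_name,
    c.client_net_address,
    c.connect_time,
    r.command,                  -- NULL when the session is idle
    r.status AS request_status
FROM sys.dm_exec_sessions AS s
JOIN sys.dm_exec_connections AS c
    ON c.session_id = s.session_id
LEFT JOIN sys.dm_exec_requests AS r
    ON r.session_id = s.session_id
WHERE s.is_user_process = 1
  AND s.database_id = DB_ID(N'Test')
  AND s.session_id <> @@SPID;   -- exclude the current session
```

The result is only a snapshot in time, so a session can still connect between running this query and issuing the ALTER DATABASE.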
To be honest, it's difficult to stop auto stats update or a user taking the single connection. Normally, you'd use this, which disconnects all other users, and not bother waiting:
ALTER DATABASE MYDB SET SINGLE_USER WITH ROLLBACK IMMEDIATE
What if the database becomes busy from the time you determined that it was not in use, until the point where you try to acquire the exclusive lock?

Resources