We have a transaction containing a fair number of updates and inserts issued by our server-side code. We've run into an issue where all work in the transaction up to a given point is rolled back, while later updates/inserts still run and end up being committed when the transaction is closed.
We narrowed it down to a bit of code where it would always happen and pulled that code out. Then the same behavior started happening elsewhere in the transaction. Nothing in our code is telling the transaction to roll back, and our code on the prod servers hadn't changed for quite a while before this started happening.
We finally restarted our prod db server and the problem went away, for a little while. Then it started happening again, and happened consistently after that.
We're on SQL Server 2016 and our web server is running ColdFusion 11 hotfix 18. Queries are being issued via <cfquery> inside a <cftransaction>.
Has anyone run into anything like this or have any way to diagnose the issue?
Update with more info: it turned out SQL Server was processing hardly any updates at all. We finally discovered that a log file had run out of space. This usually produces notifications/errors, but for some reason it didn't this time. I realize this question is probably not of the highest quality, but in the hope that anyone who runs into this will find it and gain some benefit, I'm leaving it here.
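For anyone who hits the same thing, a quick sketch of how to check log space before hunting elsewhere (standard commands; run them against your own instance):

-- Log size and percent used for every database on the instance
DBCC SQLPERF(LOGSPACE);

-- Or, per database (SQL Server 2012 and later), from within the database
SELECT total_log_size_in_bytes, used_log_space_in_percent
FROM sys.dm_db_log_space_usage;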
Related
I'm a new "accidental" DBA and I'm currently trying to resolve a lockup caused by a trigger I created on a production database supporting a front end application.
I created a trigger, and then decided I'd be better off creating a job to do the work instead, so I tried to delete the trigger in Object Explorer. The delete failed with the message:
An exception occurred while executing a Transact-SQL statement or batch.
Lock request time out period exceeded.
I then tried to drop it manually, and that failed too, stuck at 0% with 0s left to go. I checked for the longest-running transaction and then tried to kill the process in Activity Monitor. Since then the process has been stuck on "Task State: RUNNING, Command: KILLED/ROLLBACK". After some googling it sounds like I have two options.
Option 1: Restart DTC on the SQL server.... didn't work, still stuck.
Option 2: Restart the SQL service. Uh-oh.
This is the first time I've ever had to do anything like this and I'm pretty nervous being the only SQL guy in the office. Please can anyone let me know what the potential implications of restarting the service are, in terms of data loss and impact to front end users? Am I better off waiting to restart after business hours?
Thanks, and apologies if I've asked this question badly, first time for everything.
Cheers
Wait. It's rolling back and has to finish the rollback. Don't restart SQL; that will just result in the rollback continuing after the restart, possibly with the database offline while it does.
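If you want to gauge progress, KILL ... WITH STATUSONLY reports an estimate for a session already in rollback (the spid below is just an example):

-- Replace 52 with the spid stuck in KILLED/ROLLBACK
KILL 52 WITH STATUSONLY;
-- Returns something like: SPID 52: transaction rollback in progress.
-- Estimated rollback completion: 80%. Estimated time remaining: 600 seconds.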
If this is a production system and you do bounce the database, all users of your user interface will get weird and wonderful errors. Unless your application can handle it, your users will have a bad experience and then you will start getting phone calls from the boss....
As a side note, check for locking/blocking processes. The message in the question, "Lock request time out period exceeded.", suggests there is locking/blocking happening.
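A minimal sketch of a query to spot current blocking (assumes you have VIEW SERVER STATE permission):

-- Sessions that are currently blocked, and who is blocking them
SELECT session_id, blocking_session_id, wait_type, wait_time, command
FROM sys.dm_exec_requests
WHERE blocking_session_id <> 0;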
I have some very strange problems. I have an application running on Windows 2003 terminal server from multiple clients. The application uses SQL Server 2008 Express as its database.
Yesterday, I connected to the app, closed some sessions on the server that were not responding, and to my surprise saw that some data was missing from the database. After a further search I found that all the database changes made over the last week were lost.
It's like the database rolled back all the changes and returned to its state of one week ago! I can confirm that all the changes were lost. In fact, I had inserted a record into a table with IDENTITY_INSERT ON (to manually insert an ID into an identity column), and that record is missing, so there is no way this is a program failure.
Does anyone have any idea what could have happened here?
EDIT
I have a suspect: could a transaction initiated by a session have stayed uncommitted for a week, holding back all the database changes, and then rolled them all back when I closed the session?
EDIT II
Found this in the log:
SQL Server never rolls back a database to a previous state like this. Either the database was restored, the entire disk/VM was rolled back, or DML was executed to create the impression that a rollback happened (when it really didn't). Maybe someone ran a sync tool in the wrong direction.
The question does not have information that allows for finding the problem. But it certainly isn't SQL Server rolling back a database.
You can try examining the log using fn_dblog.
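For example (fn_dblog is undocumented, so treat this as a diagnostic sketch only):

-- List log records with their transactions; LOP_ABORT_XACT operations
-- mark transactions that were rolled back
SELECT [Current LSN], Operation, [Transaction ID], [Transaction Name], [Begin Time]
FROM fn_dblog(NULL, NULL);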
From the log it looks like the server has only just started up after a reboot or service restart.
If a database is not cleanly shut down, it can be left with partially applied transactions. If this happens, the database is recovered on start-up.
Any transactions that are incomplete are rolled back. Committed transactions that were not yet applied are rolled forward. How long this recovery takes depends on the size of the transactions in the log that have not yet been applied to the database.
The transactions may not show up in the log after they have been rolled back following a crash. This depends on their location in the log and the database's recovery model.
If the transaction is at the end of the log, it is likely the log will just be rolled back past it and the transaction's records removed.
If the transaction is in the middle of the log, you might see a LOP_ABORT_XACT entry in the log.
Under the simple recovery model there is a good chance the log will be cleared after recovery, since log records are only retained until their transactions complete and a checkpoint occurs.
See "Are log records removed from ldf file for rollbacks?" for more details.
We are using SQL Server 2012 Enterprise edition.
Normally we get hardly any blocked processes, but last weekend we experienced a very unusual situation. Within 2 hours we got more "blocked process" alerts than we had in the whole previous year combined; there were a few hundred alerts in that window. Then suddenly, without any intervention from anyone, everything went back to normal, and we haven't had any blocked processes since. I want to prevent this situation from occurring again.
I am well aware of how to find what is causing blocking at present, but I have very little idea how to find what caused blocking in the past, once it has already resolved.
I checked error logs in SQL Server Management Studio, but there is nothing there under the date when blocking occurred. There is also nothing unusual in the Windows event viewer. Where else should I check?
Could you please help?
From what you describe, I'm not too sure you will actually find the cause of the previous blocking if you did not actively set up tracing beforehand, i.e. have the blocked process threshold set and an alert configured to capture the trace information. The situation you described is interesting and definitely worth monitoring.
Here is an article on blocked process threshold configuration in SQL Server and a link through to Alerts configuration.
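If you want to set it up now, the configuration itself is short. A sketch (the 20-second threshold is just an example value):

-- Raise a blocked process report whenever a task waits longer than 20 seconds
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'blocked process threshold (s)', 20;
RECONFIGURE;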
Hope this helps
I'm dealing with a very strange problem. Initially, I thought this was a problem with cleaning up test data... but after completely refactoring my test data cleanup code and still seeing the exact same behavior... I'm at a loss.
I have 245 unit test methods in various classes. Each class has its own unique test data, and I initialize those objects, and then in each test method that data is usually inserted into the database and then manipulated with the tests. Each test class has a ClassCleanup method that cleans out all of the test data from the database, and that ClassCleanup is also run on TestInitialize, to ensure everything is cleaned up before any other test method is run.
When I "Run All" using VS 2012 Test Explorer, 22 tests fail. They all fail with some variation of Primary Key constraint violations. Meaning, when they are initializing data for these tests, the data was not cleaned up from a previous test method in that class. If I re-run all tests, I get the same tests failing every time. This is fairly reproducible. No matter how many times I run all tests. These same 27 tests fail with primary key violations.
However, the weird thing is that if I re-run ONLY those failed tests, only 9 tests fail. This is ALSO reproducible: no matter how many times I run only those 27 previously failing tests, the same 9 fail and the rest pass. This continues as I run only failed tests, until I get to a point where none of the tests fail. It should also be noted that if I run each test class individually, everything passes.
I know how this looks.
"You obviously are not cleaning up your test data." If that were the case, then I should see the same tests fail on every run, no matter what. Those 27 tests should fail EVERY run, not just when I run everything else.
"You must not have unique primary keys between classes and things aren't being cleaned up." See above. Even IF I had primary keys repeated in my classes (which I don't, because I have personally triple-checked the uniqueness of the primary keys on test data within the various classes), since these tests are not run concurrently on separate threads (which has been verified by logging the ThreadId), the cleanup code for any given test would clean out the duplicated data, regardless.
"You must not be using connection pooling." No, actually I am. And I've verified using SQL Profiler that the requests are definitely pooled. Also, because these tests are not running in parallel, there is only ever the one connection thread.
"You shouldn't use connection pooling." Well, yes, I should, since the underlying codebase supports various web projects, but for the sake of argument I tried running all the tests with connection pooling disabled (using Pooling=false in the connection string) and I get the exact same results. No change in behavior, whatsoever.
"There must be something wrong with your local environment." I get the same results running these tests on other colleagues' dev boxes (which incidentally use SQL 2012), as well. This is not unique to my environment, or even my version of SQL Server.
"You should try running mstest from the command line." Already did that. Same results.
If anyone has encountered something like this, please let me know. I know there must be something simple I'm missing, as that's usually the case with these kinds of problems, but I've covered as many bases as I possibly can in trying to sort this out.
The following is based on the assumption that your database is in the full recovery model and you do not perform any restores or other trickery during your tests (such as detaching/reattaching the db, etc.).
Here is a fairly tedious approach to investigating your problem, but is guaranteed to provide the data needed to figure this out.
Take a full backup of the database. Do this right before starting the test suite. We're going to be restoring the database, so also make sure you've got enough disk space for 2-3 copies of the database files.
Create a SQL Profiler trace. For events, select RPC:Starting/Completed, SQL:BatchStarting/Completed, SQL:StmtStarting/Completed, SP:StmtStarting/Completed, the TM:* Completed events, SQLTransaction, DTCTransaction, and User Error Message. Capture all the columns.
Reproduce the issue. Run the minimum number of tests needed to produce a failure. Let the tests finish so you capture all the cleanup code, then stop the Profiler trace.
Take a transaction log backup. We may need this for point-in-time restores later.
Locate the failure in the trace. If you're getting a primary key failure, it should be easy to track down: just look for the User Error Message. Write down the exact time the error occurred.
Examine the trace for obvious issues. Start from the error and work backwards until you find the start of the test that failed. Write down the exact time the setup started for the last failed test. Examine all the SQL in this range. Is the SQL exactly what you expect? Are the row counts correct? Is the TransactionID correct? (The TransactionID column should be different for every statement not in a transaction, and the same for every statement inside a transaction.) If you have mismatched BEGIN TRAN/COMMIT TRAN/ROLLBACK TRANs, the TransactionID will let you know.
Restore the DB to right before the failed test setup. Restore it to a new database so we can compare the original and the copy. First restore the full backup using "RESTORE DATABASE .... WITH NORECOVERY". Then restore the transaction log backup using "RESTORE LOG .. WITH STOPAT, RECOVERY" and specify a time immediately before the failed test setup.
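A sketch of that restore pair (the database names, file paths, and STOPAT time below are placeholders):

-- Restore the full backup to a copy, leaving it ready for log restores
RESTORE DATABASE TestDb_Copy
FROM DISK = N'C:\Backups\TestDb_Full.bak'
WITH MOVE N'TestDb' TO N'C:\Data\TestDb_Copy.mdf',
     MOVE N'TestDb_log' TO N'C:\Data\TestDb_Copy_log.ldf',
     NORECOVERY;

-- Roll forward to just before the failed test setup
RESTORE LOG TestDb_Copy
FROM DISK = N'C:\Backups\TestDb_Log.trn'
WITH STOPAT = N'2013-06-01 10:15:00', RECOVERY;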
Verify the database state. Check for test data that may not have been cleaned up. Is everything as it should be? If not, you can restore the database again to an earlier point. You're looking for a point in time just before a test starts where the database is in a good, known state.
Restore the DB to right before the error occurred. If you have room, restore to another new DB. Check for the data that caused the PK violation. Would the error occur if you ran the problematic statement again? Verify that it does or doesn't occur.
If it doesn't occur, your problem is likely mismatched transaction handling. If you were missing a COMMIT earlier, you may have had a transaction still open; when you restore with STOPAT, any uncommitted transactions are rolled back. This would also explain why the tests pass individually but fail when run together.
If it does occur, then work backwards until you find the issue. You may need to restore the DB multiple times before you figure it out. Your process will be Restore DB, examine trace, examine data, restore to different point, examine trace, examine data, etc.
If after all this you are still at a loss, you may want to investigate using database snapshots as part of your unit tests. Basically: create the snapshot, set up and run the test, and replace the teardown with reverting the database back to the snapshot. This guarantees an identical database before and after each test.
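A sketch of that approach (names and paths are placeholders; note that reverting requires exclusive access to the database, and snapshots need Enterprise/Developer edition before SQL Server 2016 SP1):

-- Create a snapshot before the test; NAME is the source data file's logical name
CREATE DATABASE TestDb_Snap
ON ( NAME = TestDb, FILENAME = N'C:\Data\TestDb_Snap.ss' )
AS SNAPSHOT OF TestDb;

-- Revert after the test, rolling the database back to the snapshot state
USE master;
RESTORE DATABASE TestDb FROM DATABASE_SNAPSHOT = 'TestDb_Snap';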
2012 Management Studio has an improved database restore wizard that makes point-in-time restores very easy. Good luck!
Not sure why yours is failing the way it is, but I had something similar, and now I put a TransactionScope in the setup like this:
private TransactionScope _transactionScope;

[TestInitialize]
public void SetUp()
{
    // RequiresNew gives each test its own ambient transaction
    _transactionScope = new TransactionScope(TransactionScopeOption.RequiresNew);
}
And dispose it in the teardown without ever calling Complete(), so everything each test wrote is rolled back automatically. That got rid of my database issues and saved me from writing manual cleanup code.
I'm currently experiencing some problems on my DotNetNuke SQL Server 2005 Express site on a Win2k8 server. It runs smoothly most of the time. However, occasionally (on the order of once or twice an hour) it runs very slowly indeed; from a user perspective it's almost like there's a deadlock of some description when this occurs.
To try to work out what the problem is I've run SQL Profiler against the SQL Express database.
Looking at the results, some specific questions I have are:
The SQL trace shows an Audit Logon and Audit Logoff for every RPC:Completed - does this mean Connection Pooling isn't working?
When I look in Performance Monitor at ".NET CLR Data", then none of the "SQL client" counters have any instances - is this just a SQL Express lack-of-functionality problem or does it suggest I have something misconfigured?
The queries running when the slowness occurs don't seem unusual; they run fast at other times. What other perfmon counters or other trace/log files can you suggest as useful tools for my further investigation?
Jumping straight to Profiler is probably the wrong first step. First, try checking the Perfmon stats on the server. I've got a tutorial online here:
http://www.brentozar.com/perfmon
Start capturing those metrics, and then after it's experienced one of those slowdowns, stop the collection. Look at the performance metrics around that time, and the bottleneck will show up. If you want to send me the csv output from Perfmon at brento#brentozar.com I can give you some insight as to what's going on.
You might still need to run Profiler afterwards, but I'd rule out the OS and hardware first. Also, just a thought - have you checked the server's System and Application event logs to make sure nothing's happening during those times? I've seen instances where, say, the antivirus client downloads new patches too often, and does a light scan after each update.
My spidey sense tells me that you may have SQL Server blocking issues. Read this article to help you monitor blocking on your server and check whether it's the cause.
If you think the issues may be performance related and want to see what your hardware bottleneck is, gather some CPU, disk, and memory stats using perfmon and then correlate them with your Profiler trace to see if the slow responses line up.
No. Audit Logon/Logoff events fire even for pooled connections (the EventSubClass column distinguishes pooled from nonpooled logins), so seeing them doesn't mean pooling is broken.
Nothing wrong with that; it just shows that you're not using the .NET functionality embedded in SQL Server.
You can check http://www.xsqlsoftware.com/Product/xSQL_Profiler.aspx for a more detailed analysis of the Profiler trace. It has reports that show top queries by time or CPU (not one single query, but the sum of all executions of a single query).
Some other things to check:
Make sure your data files or log files are not auto-extending.
Make sure your anti-virus is set to ignore your SQL data and log files.
When looking at the Profiler output, be sure to check the queries that finished just prior to your targets; they could have been blocking.
Make sure you've turned off auto-close on the database; re-opening after closing takes some time.
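A quick sketch to check those settings in one pass (standard catalog views, available on SQL Server 2005 and later):

-- Auto-close and auto-shrink flags for every database
SELECT name, is_auto_close_on, is_auto_shrink_on
FROM sys.databases;

-- Growth settings per file; growth = 0 means the file does not auto-grow
SELECT DB_NAME(database_id) AS db, name, growth, is_percent_growth
FROM sys.master_files;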