I'm dealing with a very strange problem. Initially, I thought this was a problem with cleaning up test data... but after completely refactoring my test data cleanup code and still seeing the exact same behavior... I'm at a loss.
I have 245 unit test methods in various classes. Each class has its own unique test data; I initialize those objects, and in each test method that data is usually inserted into the database and then manipulated by the test. Each test class has a ClassCleanup method that removes all of the test data from the database, and that same cleanup is also run from TestInitialize, to ensure everything is cleaned up before any other test method is run.
When I "Run All" using VS 2012 Test Explorer, 22 tests fail. They all fail with some variation of Primary Key constraint violations. Meaning, when they are initializing data for these tests, the data was not cleaned up from a previous test method in that class. If I re-run all tests, I get the same tests failing every time. This is fairly reproducible. No matter how many times I run all tests. These same 27 tests fail with primary key violations.
However, the weird thing is that if I re-run ONLY those 27 failed tests, only 14 fail. This is ALSO reproducible: no matter how many times I run only those 27 previously failing tests, the same 14 fail and the rest pass. This continues as I keep re-running only the failed tests, until I reach a point where none of them fail. It should also be noted that if I run each test class individually, everything passes.
I know how this looks.
"You obviously are not cleaning up your test data." If that were the case, then I should see the same tests fail on every run, no matter what. Those 27 tests should fail EVERY run, not just when I run everything else.
"You must not have unique primary keys between classes and things aren't being cleaned up." See above. Even IF I had primary keys repeated in my classes (which I don't, because I have personally triple-checked the uniqueness of the primary keys on test data within the various classes), since these tests are not run concurrently on separate threads (which has been verified by logging the ThreadId), the cleanup code for any given test would clean out the duplicated data, regardless.
"You must not be using connection pooling." No, actually I am. And I've verified using SQL Profiler that the requests are definitely pooled. Also, because these tests are not running in parallel, there is only ever the one connection thread.
"You shouldn't use connection pooling." Well, yes, I should, since the underlying codebase supports various web projects, but for the sake of argument I tried running all the tests with connection pooling disabled (using Pooling=false in the connection string) and I get the exact same results. No change in behavior, whatsoever.
"There must be something wrong with your local environment." I get the same results running these tests on other colleagues' dev boxes (which incidentally use SQL 2012), as well. This is not unique to my environment, or even my version of SQL Server.
"You should try running mstest from the command line." Already did that. Same results.
If anyone has encountered something like this, please let me know. I know there must be something simple I'm missing, as that's usually the case with these kinds of problems, but I've covered as many bases as I possibly can in trying to sort this out.
The following is based on the assumption that your database is in full recovery mode and that you do not perform any restores or other trickery during your tests (such as detaching/reattaching the DB, etc.).
Here is a fairly tedious approach to investigating your problem, but it is guaranteed to provide the data needed to figure this out.
Take a full backup of the database. Do this right before starting the test suite. We're going to be restoring the database, so also make sure you have enough disk space for 2-3 copies of the database files.
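As a minimal sketch of that first step (the database name TestDb and the backup path are placeholders for whatever your environment uses):

-- Full backup taken immediately before starting the test run
BACKUP DATABASE TestDb
    TO DISK = N'C:\Backups\TestDb_full.bak'
    WITH INIT, CHECKSUM;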
Create a SQL Profiler trace. For events, select RPC:Starting/Completed, SQL:BatchStarting/Completed, SQL:StmtStarting/Completed, SP:StmtStarting/Completed, the TM:* Completed events, SQLTransaction, DTCTransaction, and User Error Message. Capture all the columns.
Reproduce the issue. Run the minimum number of tests needed to produce a failure. Let the tests finish so you capture all the cleanup code, then stop the Profiler trace.
Take a transaction log backup. We may need this for point-in-time restores later.
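Again as a sketch with placeholder names (this relies on the full recovery model assumed above):

-- Log backup taken after the failing run, so we can restore to any point inside it
BACKUP LOG TestDb
    TO DISK = N'C:\Backups\TestDb_log.trn'
    WITH CHECKSUM;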
Locate the failure in the trace. If you're getting a primary key failure, it should be easy to track down: just look for the User Error Message event. Write down the exact time the error occurred.
Examine the trace for obvious issues. Start from the error and work backwards until you find the start of the test that failed. Write down the exact time the setup started for that failed test. Examine all the SQL in this range. Is the SQL exactly what you expect? Are the row counts correct? Is the TransactionID correct? (The TransactionID column should be different for every statement not in a transaction, and the same for every statement inside a transaction.) If you have mismatched BEGIN TRAN/COMMIT TRAN/ROLLBACK TRANs, the TransactionID will let you know.
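If you saved the trace to a file, one way to slice out that time window is to query it with fn_trace_gettable; this is a sketch, and the file path and times are placeholders for the values you wrote down:

-- Load the saved trace and inspect everything between the test setup and the error
SELECT t.StartTime, t.EventClass, t.SPID, t.TransactionID, t.Error,
       CAST(t.TextData AS NVARCHAR(MAX)) AS TextData
FROM sys.fn_trace_gettable(N'C:\Traces\test_run.trc', DEFAULT) AS t
WHERE t.StartTime BETWEEN '2013-01-15 10:41:00'   -- placeholder: setup start time
                      AND '2013-01-15 10:42:30'   -- placeholder: error time
ORDER BY t.StartTime;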
Restore the DB to right before the failed test setup. Restore it to a new database so we can compare the original and the copy. First restore the full backup using "RESTORE DATABASE .... WITH NORECOVERY". Then restore the transaction log backup using "RESTORE LOG .. WITH STOPAT, RECOVERY" and specify a time immediately before the failed test setup.
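A sketch of that point-in-time restore into a copy; the logical file names, paths, and STOPAT time are placeholders you will need to adjust:

-- Restore the full backup into a new database, leaving it ready for log restores
RESTORE DATABASE TestDb_Copy
    FROM DISK = N'C:\Backups\TestDb_full.bak'
    WITH NORECOVERY,
         MOVE N'TestDb'     TO N'C:\Data\TestDb_Copy.mdf',
         MOVE N'TestDb_log' TO N'C:\Data\TestDb_Copy_log.ldf';

-- Roll the log forward to just before the failed test's setup started
RESTORE LOG TestDb_Copy
    FROM DISK = N'C:\Backups\TestDb_log.trn'
    WITH STOPAT = '2013-01-15 10:40:55', RECOVERY;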
Verify the database state. Check for test data that may not have been cleaned up. Is everything as it should be? If not, you can restore the database again to an earlier point. You're looking for a point in time just before a test starts where the database is in a good, known state.
Restore the DB to right before the error occurred. If you have room, restore to another new database. Check for the data that caused the PK violation. Would the error occur if you ran the problematic statement again? Verify whether or not it does.
If it doesn't occur, your problem is likely mismatched transaction handling. If you were missing a COMMIT earlier, you may have had a transaction still open. When you restore with STOPAT, any uncommitted transactions are rolled back. This would also explain why the tests pass when run individually but fail when run together.
If it does occur, then work backwards until you find the issue. You may need to restore the DB multiple times before you figure it out. Your process will be: restore the DB, examine the trace, examine the data, restore to a different point, examine again, and so on.
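In either case, one quick check for a transaction accidentally left open on the live test database while the suite is mid-run (a sketch; both statements are read-only):

-- Oldest active transaction in the current database, if any
DBCC OPENTRAN;

-- Sessions that currently have an open user transaction
SELECT st.session_id, at.transaction_id, at.name, at.transaction_begin_time
FROM sys.dm_tran_session_transactions AS st
JOIN sys.dm_tran_active_transactions AS at
    ON at.transaction_id = st.transaction_id;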
If after all this you are still at a loss, then you may want to investigate using database snapshots as part of your unit tests. Basically: create a snapshot of the database, set up and run the test, and replace the teardown with a revert of the database back to the snapshot. This guarantees an identical database before and after each test.
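A sketch of the create/revert pair (names and paths are placeholders; note that database snapshots require Enterprise/Developer edition on older SQL Server versions, and the NAME value must match the logical name of the source data file):

-- Before the test: snapshot the database in its known-good state
CREATE DATABASE TestDb_Snap
    ON (NAME = N'TestDb', FILENAME = N'C:\Data\TestDb_Snap.ss')
    AS SNAPSHOT OF TestDb;

-- Teardown: revert to the snapshot, then drop it
USE master;
RESTORE DATABASE TestDb FROM DATABASE_SNAPSHOT = 'TestDb_Snap';
DROP DATABASE TestDb_Snap;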
SQL Server 2012 Management Studio has an improved database restore wizard that makes point-in-time restores very easy. Good luck!
Not sure why yours is failing like it is, but I had something similar, and now I put a TransactionScope in the setup like this:
private TransactionScope _transactionScope;

public void SetUp()
{
    // Each test runs in its own transaction; disposing the scope in teardown
    // without calling Complete() rolls back anything the test wrote.
    _transactionScope = new TransactionScope(TransactionScopeOption.RequiresNew);
}
And dispose it in the teardown. That got rid of my database issues and prevented me from having to write manual cleanup code.
Related
We have a transaction that has a fair number of updates and inserts that run inside it as sent by our server-side code. We've run into an issue where all work in the transaction up to a given point is rolled back, and then later updates/inserts are run and end up being committed when the transaction is closed.
We narrowed it down to a bit of code where it would always happen and pulled that code out. Then the same behavior started happening elsewhere in the transaction. Nothing in our code is telling the transaction to roll back, and we haven't changed our code on our prod servers for quite a while before it started happening.
We finally restarted our prod db server and the problem went away, for a little while. Then it started happening again, and happened consistently after that.
We're on SQL Server 2016 and our web server is running ColdFusion 11 hotfix 18. Queries are being issued via <cfquery> inside a <cftransaction>.
Has anyone run into anything like this or have any way to diagnose the issue?
Update with more info: apparently SQL Server was hardly processing any updates at all. We finally discovered that a log file had run out of space. This usually produces notifications/errors when it happens, but this time it didn't for some reason. I realize this question is probably not of the very highest quality, but in the hope that anyone who runs into this will find it and gain some benefit, I'm leaving it here.
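For anyone who hits something similar, a quick way to check whether the transaction log is the culprit (a sketch, not specific to the setup described above):

-- Percentage of each transaction log currently in use
DBCC SQLPERF(LOGSPACE);

-- Why each database's log cannot be truncated (e.g. LOG_BACKUP, ACTIVE_TRANSACTION)
SELECT name, log_reuse_wait_desc
FROM sys.databases;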
It is very easy to make mistakes when it comes to UPDATE and DELETE statements in SQL Server Management Studio. You can easily delete far more than you intended if you make a mistake in the WHERE condition or, even worse, empty the whole table if you mistakenly write an expression that evaluates to TRUE for every row.
Is there any way to disallow queries that affect a large number of rows from within SQL Server Management Studio? I know there is a feature like that in MySQL Workbench, but I couldn't find one in SQL Server Management Studio.
No.
It is your responsibility to ensure that:
Your data is properly backed up, so you can restore it after making inadvertent changes.
You are not writing a new query from scratch and executing it directly on a production database without testing it first.
You execute your query in a transaction, and review the changes before committing the transaction.
You know how to properly filter your query to avoid issuing a DELETE/UPDATE statement against your entire table. If in doubt, always issue a SELECT * or SELECT COUNT(*) with the same WHERE clause first, to see which records will be affected.
You don't rely on some silly feature in the front-end that might save you at times, but that will completely screw you over at other times.
A lot of good points have already been made. Just one tiny addition: I have created a solution that prevents accidental execution of DELETE or UPDATE statements without any WHERE condition at all. It is implemented as the "Fatal actions guard" in my add-in, SSMSBoost.
(My comments were getting rather unwieldy)
One simple option, if you are uncertain, is to BEGIN TRAN, do the update, and if the rows-affected count is significantly different from what you expected, ROLLBACK; otherwise, do a few checks, e.g. SELECTs to ensure just the intended data was updated, and then COMMIT. The caveat here is that this will lock rows until you commit or roll back, and potentially require escalation to a table lock if a large number of rows are updated, so you will need to have the checking scripts planned in advance.
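A sketch of that pattern; the table, predicate, and expected count are placeholders:

BEGIN TRAN;

UPDATE dbo.Orders
SET Status = 'Cancelled'
WHERE OrderDate < '20200101';

-- @@ROWCOUNT reports how many rows the UPDATE just touched
SELECT @@ROWCOUNT AS RowsAffected;

-- Review the affected data with a few SELECTs, then run exactly one of these:
-- COMMIT TRAN;   -- keep the changes
-- ROLLBACK TRAN; -- undo them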
That said, in any half-serious system, no one, not even senior DBAs, should really be executing direct ad-hoc DML statements on a prod DB (and arguably the formal UAT DB too); this is what tested applications are meant for (or tested, verified patch scripts executed only after the change control process has been followed).
In less formal dev environments, does it really matter if things get broken? In fact, if you are an advocate of Chaos Monkey, having juniors break your data might be a good thing in the long run: it will ensure that your processes for scripting, migration, static data deployment, and integrity checking are all in good order.
My suggestion is to disable autocommit, so that you can review your changes before committing them; just remember to commit before ending the session.
For more details, see the MSDN documentation:
http://msdn.microsoft.com/en-us/library/ms187807.aspx
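As a sketch, the session-level version of that suggestion (the DELETE is a placeholder statement; the option can also be turned on per connection in SSMS's query execution options):

-- With implicit transactions on, the first data modification opens a transaction
-- that stays open until you explicitly COMMIT or ROLLBACK
SET IMPLICIT_TRANSACTIONS ON;

DELETE FROM dbo.Orders WHERE OrderDate < '20100101';

-- Review the result, then finish with one of:
-- COMMIT;
-- ROLLBACK;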
When I run the test cases inside soapUI, everything works fine. But when I run the tests from Jenkins, the assertions fail, as you can see in this gist.
I am not having issues when connecting to the database. Any tips for those asserts?
I suspect what you have is a timing issue. If your JDBC checks occur before the transaction has been committed by your server, then you will see the sort of behaviour you describe. This is particularly an issue where the web service does not issue a response before you commence your JDBC checks.
Three possible solutions are:
add a delay step before the assertions to allow the server time to commit the transactions
add a response for your request to indicate when the database checks should occur
check the max id of the table and wait for a row with a greater id to arrive before performing the checks.
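If you go with the last option, the wait can be expressed on the database side; a minimal sketch, assuming a table dbo.Orders with an identity column Id and a previously captured max id:

DECLARE @LastId INT = 12345;   -- placeholder: max id captured before the request
DECLARE @Tries INT = 0;

-- Poll until a newer row shows up, or give up after roughly 30 seconds
WHILE NOT EXISTS (SELECT 1 FROM dbo.Orders WHERE Id > @LastId) AND @Tries < 30
BEGIN
    WAITFOR DELAY '00:00:01';
    SET @Tries = @Tries + 1;
END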
When running a stored procedure, we're getting the error 297
"The user does not have permission to perform this action"
This occurs during times of heavy load (regularly, when a trim job is running concurrently). The error clears up when the service accessing SQL Server is restarted (by which time the trim job has very likely finished as well), so it's obviously not a real permissions problem. The error is reported on a line of a stored procedure that accesses a function, which in turn accesses dynamic management views.
What kind of situations could cause an error like this, when it's not really a permissions problem?
Might turning on trace flag 4616 potentially fix this, as per this article? I'd like to be able to just try it, but I need more info. Also, I'm baffled by the fact that this is an intermittent problem, only happening under periods of high activity.
I was trying to reproduce this same error in other situations (that were also not real permissions problems), and I found that when running this on SQL Server 2005 I do get the permissions problem:
select * from sys.dm_db_index_physical_stats(66,null,null, null, null)
(66 is an invalid DBID.)
However, we're not using dm_db_index_physical_stats with an incorrect DBID. We ARE using dm_tran_session_transactions and dm_tran_active_transactions, but they don't accept parameters so I can't get the error to happen with them. But I was thinking perhaps that the issue is linked.
Thanks for any insights.
Would it be related to concurrency issues?
For example, is the same data being processed, or a global temp table being accessed? If so, you may want to consider sp_getapplock.
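A sketch of serializing the contended section with sp_getapplock; the resource name is arbitrary and the timeout is illustrative:

BEGIN TRAN;

-- Take an exclusive app lock so only one caller runs this section at a time
EXEC sp_getapplock @Resource = 'TrimJobCriticalSection',
                   @LockMode = 'Exclusive',
                   @LockOwner = 'Transaction',
                   @LockTimeout = 10000;   -- wait up to 10 seconds

-- ... the contended work goes here ...

COMMIT TRAN;   -- committing releases the transaction-owned app lock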
And does each connection use different credentials with a different set of permissions? Have all the relevant users been granted VIEW SERVER STATE (GRANT VIEW SERVER STATE TO xxx)?
Finally, and related to both ideas above, do you use EXECUTE AS anywhere that might not get reverted?
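To test the permissions angle, a sketch that impersonates one of the accounts involved and queries the same DMVs the function uses (the login name is a placeholder):

-- Grant the permission the dm_tran_* views require, if it turns out to be missing
-- GRANT VIEW SERVER STATE TO [svc_account];

-- Impersonate the account and see whether the DMV query fails under it
EXECUTE AS LOGIN = 'svc_account';
SELECT COUNT(*) FROM sys.dm_tran_session_transactions;
REVERT;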
Completely random idea: I've seen this before, but only when I've omitted a GO between the end of the stored proc definition and the following GRANT statement, so the SP tried to set its own permissions. Is it possible that a timeout or concurrency issue causes some code to run that wouldn't normally?
If this occurs only during periods of heavy activity maybe you can run Profiler and watch for what locks are being held.
Also, is this always being run the same way? For example, is it run as a SQL Agent job, or are you sometimes running it manually and sometimes as a job? My thinking is that maybe it is running as different users at different times.
Maybe also take a look at this Blog Post
Thanks everyone for your input. What I did (which looks like it has fixed the problem for now) is to alter the daily trim job. It now waits substantially longer between deletes, and also deletes a much smaller chunk of records at a time.
I'll update this later on with more info as I get it.
Thanks again.
I have been told that SQL Profiler makes changes to the MSDB when it is run. Is this true and if so what changes does it make?
MORE INFO
The reason I ask is that we have a DBA who wants us to raise a change request whenever we run Profiler on a live server. Her argument is that it makes changes to the databases, which should be change-controlled.
Starting a trace adds a row into msdb.sys.traces; stopping the trace removes the row. However, msdb.sys.traces is a view over an internal table-valued function and is not backed by any physical storage. To prove this, set msdb to read_only, start a trace, observe the new row in msdb.sys.traces, stop the trace, and remember to turn msdb back to read_write. Since a trace can be started from Profiler even while msdb is read-only, it is clear that normally no write into msdb occurs.
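That check, as a sketch (setting msdb read-only needs exclusive access, so only try this on a quiet, non-production instance):

-- Make msdb read-only, confirm a running trace still shows up, then put it back
ALTER DATABASE msdb SET READ_ONLY WITH ROLLBACK IMMEDIATE;

SELECT id, path, start_time, is_rowset
FROM msdb.sys.traces;        -- a Profiler trace started now still appears here

ALTER DATABASE msdb SET READ_WRITE WITH ROLLBACK IMMEDIATE;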
Now, before you go and grin at your DBA, she is actually right. Profiler traces can put significant stress on a live system, because the traced events must block until they can generate the trace record. Live, busy systems may experience blocking on resources of type SQLTRACE_BUFFER_FLUSH, SQLTRACE_LOCK, TRACEWRITE and others. Live traces (Profiler) are usually worse; file traces (sp_trace_create) are better, but can still cause issues. So starting new traces should definitely be something the DBA is informed about, and it should be very carefully considered.
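If you want to see whether tracing is already contributing to waits on a live box, a sketch against the wait-stats DMV:

-- Cumulative SQL Trace related waits since the last restart (or stats clear)
SELECT wait_type, waiting_tasks_count, wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type LIKE 'SQLTRACE%' OR wait_type = 'TRACEWRITE'
ORDER BY wait_time_ms DESC;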
The only changes I know of happen when you schedule a trace to gather periodic information: a job is added.
That's not the case as far as I'm aware (other than the trivial change noted by others).
What changes are you referring to?
Nothing I have ever read, heard, or seen says that SQL Profiler or anything it does or uses has any impact on the MSDB database. (SQL Profiler is, essentially, a GUI wrapped around the trace routines.) It is of course possible to configure a specific setup/implementation to do, well, anything, and perhaps that's what someone is thinking of.
This sounds like a kind of "urban legend". I recommend that you challenge it -- get the people who claim it to be true to provide proof.