We are running a website on a VPS with SQL Server 2008 R2 x64. We are being bombarded with 17886 errors, namely:
The server will drop the connection, because the client driver has
sent multiple requests while the session is in single-user mode. This
error occurs when a client sends a request to reset the connection
while there are batches still running in the session, or when the
client sends a request while the session is resetting a connection.
Please contact the client driver vendor.
This causes SQL statements to return corrupt results. I have tried pretty much all of the suggestions I have found on the net, including:
with MARS, and without
with pooling, and without
with async=true, and without
We only have one database, and it is absolutely multi-user.
Everything has been installed recently, so it is up to date. The errors may be correlated with high CPU (though not exclusively, according to the monitors I have seen), and also with high request rates from search engines. However, high CPU or request rates shouldn't cause SQL connections to reset - at worst we should see high response times or IIS refusing to send a response.
Any suggestions? I am only a developer, not a DBA - do I need a DBA to solve this problem?
Not sure, but some of your queries might be causing deadlocks on the server.
At the point you detect this error again:
Open Management Studio (on the server; install it if necessary)
Open a new query window
Run sp_who2
Check the BlkBy column, which is short for "Blocked By". If there is any data in that column, you have a deadlock problem (normally it should be completely empty, as in the screenshot I attached).
If you have a deadlock, then we can continue with the next steps. But right now, please check that.
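If you prefer a query to eyeballing the sp_who2 output, the DMVs expose the same information (a minimal sketch; SQL Server 2005 or later):

-- Sessions that are currently blocked, and who is blocking them
SELECT session_id, blocking_session_id, wait_type, wait_time
FROM sys.dm_exec_requests
WHERE blocking_session_id <> 0;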
To fix the error above, "MultipleActiveResultSets=True" needs to be added to the connection string.
via Event ID 17886 MSSQLServer – The server will drop the connection
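For reference, a minimal connection string with MARS enabled might look like this (server, database, and credentials are placeholders):

Server=myServer;Database=myDatabase;Integrated Security=True;MultipleActiveResultSets=True;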
I would create an event log task to email you whenever 17886 is thrown. Then go immediately to the database and execute sp_who2, get the BlkBy SPID, and run DBCC INPUTBUFFER against it. Hopefully the EventInfo output will give you something a bit more tangible to go on.
sp_who2
-- 62 stands in for the blocking SPID found in sp_who2's BlkBy column
DBCC INPUTBUFFER(62)
GO
Use a "Instance Per Request" strategy in your DI-instantiation code and your problem will be solved
Most probably you are using dependency injection. During web development you have to take into account the possibility of concurrent requests. Therefor you have to make sure every request gets new instances during DI, otherwise you will get into concurrency issues. Don't be cheap by using ".SingleInstance" for services and contexts.
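As a sketch (assuming Autofac, since ".SingleInstance" is Autofac syntax; MyDbContext is a hypothetical stand-in for your contexts and services):

using Autofac;

public static class ContainerConfig
{
    public static IContainer Build()
    {
        var builder = new ContainerBuilder();

        // One instance per HTTP request, disposed when the request ends.
        // (InstancePerRequest requires one of the Autofac web integration packages.)
        builder.RegisterType<MyDbContext>().InstancePerRequest();

        // NOT this: a single shared instance breaks under concurrent requests.
        // builder.RegisterType<MyDbContext>().SingleInstance();

        return builder.Build();
    }
}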
Enabling MARS will probably decrease the number of errors, but the errors that remain will be less clear. Enabling MARS is almost never the solution; do not use it unless you know what you're doing.
Related
We have some NServiceBus handlers (6.4.0) using SQL Server Transport (3.1.2) that run fine, but their expired message purge cycle always fails to remove any rows, and a WARN is always logged about this. Contrary to the message, I don't see any messages accumulating in the endpoint table. The image below shows the handler running as a console app, logging the WARN.
WARN when running as console app
Environment oddities: our transport (and user data) database is in compatibility mode 80 (i.e. SQL Server 2000 mode) even though the server instance is 2008 R2. We had some trouble with the transport tables because the server complained that ARITHABORT had to be ON to support the index used on those tables, but our corporate software demands it be OFF by default. To get around changing it globally, in EndpointConfig we use 'UseCustomConnectionFactory()' to supply a function which creates new SqlConnections and, after creation, runs SET ARITHABORT ON on the connection before returning it for use by the application (see the sketch below). That seemed to solve that issue - but now we get the purge failure and WARN. The actual error message mentions "timeout" and 'server not responding' - but the database is continually available, queryable, and in use when this is occurring. Additionally, this occurs when volume is very low - as low as 2 or 3 messages per minute.
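The factory looks roughly like this (a reconstructed sketch; the exact API name may differ - I believe the 3.x transport spells it UseCustomSqlConnectionFactory - and connectionString is defined elsewhere):

var transport = endpointConfiguration.UseTransport<SqlServerTransport>();
transport.UseCustomSqlConnectionFactory(async () =>
{
    // Create and open a fresh connection for the transport.
    var connection = new System.Data.SqlClient.SqlConnection(connectionString);
    await connection.OpenAsync();

    // The transport's index requires ARITHABORT ON, but our corporate
    // default is OFF, so we set it per connection instead of globally.
    using (var command = connection.CreateCommand())
    {
        command.CommandText = "SET ARITHABORT ON";
        command.ExecuteNonQuery();
    }

    return connection;
});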
Any ideas on what might be wrong, how we might debug further, or how to resolve the issues, are very much appreciated.
When NHibernate's log level is set to "DEBUG", we start seeing a bunch of "Internal Connection Fatal" errors in our logs. It looks like NHibernate dies about halfway through processing a particular result set. According to the logs, the last column NHibernate reads appears to contain garbage that isn't in the underlying data.
The issue seems to go away when either:
The log level is set back to “ERROR”.
The view being queried is changed to return less data (either fewer rows, OR null or blank values for various columns).
We're using ASP.NET MVC, IIS 7, .NET Framework 4.5, SQL Server 2012, log4net 2.0.2, and NHibernate 3.3.3.4001.
I guess my real concern is that there is some hidden issue in the code that the added strain of logging is bringing to light, but I'm not sure what it could be. I have double-checked the NHibernate mappings and they look good. I've also checked that I'm disposing of the NHibernate session at the end of each request. I also tried bumping up the command timeout, which didn't seem to make a difference.
If one of the columns is of a non-simple type (binary, text, etc.), NHibernate may be having problems populating the property.
Turns out the connection from our dev app server to our dev database server was wonky.
From the dev app server, open SSMS and try to connect to the dev database server.
Sometimes we get the "internal connection fatal error", sometimes we don't.
Issue is possibly caused by TCP Chimney Offload/SQL Server incompatibility.
Check the following KB Article for possible solutions:
http://support.microsoft.com/kb/942861
For Windows 7/2008 R2:
By default, the TCP Chimney Offload feature is set to Auto. This means that the chimney does not offload all connections; instead, it selectively offloads the connections that meet the following criteria:
The connection is established through a 10 gigabits per second (Gbps) Ethernet adapter.
The mean round-trip link latency is less than 20 milliseconds.
At least 130 kilobytes (KB) of data were exchanged over the connection.
The last condition is triggered in the middle of the data set, so you see garbage instead of real data.
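If the KB article confirms this is your cause, one workaround is to disable TCP Chimney Offload on the affected machine from an elevated command prompt (standard netsh syntax on Windows 7/2008 R2; test before rolling out):

netsh int tcp set global chimney=disabled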
I have 3 servers set up for SQL mirroring and automatic failover using a witness server. This works as expected.
Now my application that connects to the database seems to have a problem when a failover occurs - I need to manually intervene and change connection strings for it to connect again.
The best solution I've found so far involves using the Failover Partner parameter of the connection string; however, it's neither intuitive nor complete: Data Source="Mirror";Failover Partner="Principal" (found here).
From the example in the blog above (scenario #3), when the first failover occurs and the principal (failover partner) is unavailable, the data source is used instead (which is the new principal). If it fails again (and I only tried within a limited period), it then comes up with an error message. This happens because the connection string is cached, so until it is refreshed, it will keep producing the error (the connection string seems to refresh ~5 minutes after it encounters an error). If after failover I swap the data source and failover partner, I get one more silent failover.
Is there a way to achieve fully automatic failover for applications that use mirroring databases too (without ever seeing the error)?
I can see potential workarounds using custom scripts that would poll the currently active database node name and adjust the connection string accordingly, but that seems like overkill at the moment.
Read the blog post here:
http://blogs.msdn.com/b/spike/archive/2010/12/15/running-a-database-mirror-setup-with-the-sqlbrowser-service-off-may-produce-unexpected-results.aspx
It explains what is happening: the failover partner is actually being read from the SQL Server, not from your config. Run the query in that post (sketched below) to find out what is actually being used as the failover server. It will probably be a machine name that is not discoverable from where your client is running.
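The query in that post reads the partner from the mirroring metadata; it is along these lines (a sketch, run against the mirrored database's server):

SELECT DB_NAME(database_id) AS database_name,
       mirroring_partner_name,
       mirroring_partner_instance
FROM sys.database_mirroring
WHERE mirroring_guid IS NOT NULL;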
You can clear the application's connection pool when a failover has happened. Not very nice, I know ;-)
// ClearAllPools resets (or empties) the connection pool.
// If there are connections in use at the time of the call,
// they are marked appropriately and will be discarded
// (instead of being returned to the pool) when Close is called on them.
System.Data.SqlClient.SqlConnection.ClearAllPools();
We use it when we change an underlying server via a SQL Server alias, to enforce a "refresh" of the server name.
http://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlconnection.clearallpools.aspx
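A sketch of how this might be wired in (the surrounding retry policy is up to you):

try
{
    // ... open a SqlConnection and run your command as usual ...
}
catch (System.Data.SqlClient.SqlException)
{
    // After a failover, pooled connections still point at the old
    // principal; clearing the pool forces fresh connections on retry.
    System.Data.SqlClient.SqlConnection.ClearAllPools();
    throw; // let the caller retry against the new principal
}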
The solution is to turn connection pooling off with Pooling=false in the connection string.
While this has minimal impact on small applications, I haven't tested it with applications that receive hundreds of requests per minute (or more), and I'm not sure what the implications are. Anyone care to comment?
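For illustration, a mirrored connection string with pooling turned off might look like this (all names are placeholders):

Data Source=PrincipalServer;Failover Partner=MirrorServer;Initial Catalog=MyDatabase;Integrated Security=True;Pooling=false;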
Try this connectionString:
connectionString="Data Source=[MSSQLPrincipalServerIP,MSSQLPORT];Failover Partner=[MSSQLMirrorServerIP,MSSQLPORT];Initial Catalog=DatabaseName;Persist Security Info=True;User Id=userName;Password=userPassword;Connection Timeout=15;"
If you are developing in .NET, you can try ObjAdoDBLib, or PigSQLSrvLib and PigSQLSrvCoreLib, and the code will become simple.
Example code:
Create the connection object.
With ObjAdoDBLib:
Me.ConnSQLSrv = New ConnSQLSrv(Me.DBSrv, Me.MirrDBSrv, Me.CurrDB, Me.DBUser, Me.DBPwd, Me.ProviderSQLSrv)
With PigSQLSrvLib or PigSQLSrvCoreLib:
Me.ConnSQLSrv = New ConnSQLSrv(Me.DBSrv, Me.MirrDBSrv, Me.CurrDB, Me.DBUser, Me.DBPwd)
Execute this method to automatically connect to the online database after the mirror database fails over:
Me.ConnSQLSrv.OpenOrKeepActive
For more information, see the relevant links.
https://www.nuget.org/packages/ObjAdoDBLib/
https://www.nuget.org/packages/PigSQLSrvLib/
https://www.nuget.org/packages/PigSQLSrvCoreLib/
In my development environment, I seek to recreate a production issue we face with MSSQL 2005. This issue has two parts:
The Problem
1) A deadlock occurs and MSSQL selects one connection ("Connection X") as the 'victim'.
2) All subsequent attempts to use "Connection X" fail (we use connection pooling). MSSQL says "The server failed to resume the transaction"
Of the two, #2 is more serious: since "Connection X" is whacked, every "round robin" attempt to re-use "Connection X" fails, and mysterious "random" errors appear to the user. We must restart the server.
Why I Write
At this point, however, I wish to recreate problem #1. I can create a deadlock easily.
But here's my issue: whereas in production MSSQL chooses one connection (SPID) as the 'deadlock victim', in my test environment the deadlock just hangs... and hangs and hangs. Forever? I'm not sure, but I left it hanging overnight and it was still hanging in the morning.
So here's the question: how can I make SQL Server "choose a deadlock victim" when a deadlock occurs?
Attempts so Far
I tried setting the "lock_timeout" parameter via the JDBC URL ("lockTimeout=5000"); however, I got a different message than in production (in test, "Lock request time out period exceeded." instead of production's "Transaction (Process ID 59) was deadlocked on lock resources with another process and has been chosen as the deadlock victim.").
Some details on problem #2
I've researched this "unable to resume the transaction" problem and found a few things:
Bad exception handling may cause this problem, e.g. the Java code does not close the Statement/PreparedStatement and the driver's implementation of "Connection" is stuck with a bad/stale/old "transaction ID".
A JDBC driver upgrade may make the problem go away.
For now, however, I just want to recreate a deadlock and make SQL Server "choose a deadlock victim".
thanks in advance!
Appendix A. Technical Environment
Development:
SQL Server 2005 SP3 (9.00.4035.00)
Driver: sqljdbc.jar version 1.0
JBoss 3.2.6
JDBC URL: jdbc:sqlserver://<>;
Production:
SQL Server 2005 SP2 (9.00.3042.00)
Driver: sqljdbc.jar version 1.0
JBoss 3.2.6
JDBC URL: jdbc:sqlserver://<>;
Appendix B. Steps to force a deadlock
get connection A
get connection B
run sql1 with connection A
run sql2 with connection B
run sql1 with connection B
run sql2 with connection A
where
sql1:
update member set name = name + 'x' WHERE member_id = 71
sql2:
update member set name = name + 'x' WHERE member_id = 72
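For comparison, a repro that reliably makes SQL Server pick a victim keeps both transactions explicitly open (a sketch for two SSMS windows, reusing the member table above; without the BEGIN TRAN lines each UPDATE commits immediately and you only ever get simple blocking):

-- Window A:
BEGIN TRAN;
UPDATE member SET name = name + 'x' WHERE member_id = 71;

-- Window B:
BEGIN TRAN;
UPDATE member SET name = name + 'x' WHERE member_id = 72;

-- Window B (blocks, waiting on A's lock):
UPDATE member SET name = name + 'x' WHERE member_id = 71;

-- Window A (closes the cycle; within seconds one session gets
-- error 1205, "... chosen as the deadlock victim"):
UPDATE member SET name = name + 'x' WHERE member_id = 72;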
The explanation of why the JDBC connection enters the incorrect state is given here: The server failed to resume the transaction... Why?. You should upgrade to the JDBC SQL driver v2.0 before anything else. The link also contains advice on how to fix the application's processing to avoid this situation, most importantly about avoiding mixing the JDBC transaction API with native Transact-SQL transactions.
As for the deadlock repro: you did not recreate a deadlock in test. You just blocked waiting for a transaction to commit. A deadlock is a different thing, and SQL Server will choose a victim; you do not have to set deadlock priority, lock timeouts, or anything else. Deadlock priorities are a completely different topic and are used to choose the victim in certain scenarios, like high-priority work vs. low-priority overnight batch processing.
Any deadlock investigation should start with understanding the deadlock, if you want to eliminate it. The Deadlock Graph event class in Profiler is the perfect starting point. With the deadlock graph info you can see which resources the deadlock is occurring on and which statements are involved. Most times the solution is either to fix the order of updates in the application (always follow the same order) or to fix the access path (i.e. add an index).
Update
The UPDATE ... WHERE key IN (SELECT ...) pattern usually deadlocks because the operation is not atomic. Multiple threads can get back the same IN list because the SELECT part does not lock anything. This is just a guess; to properly validate it you must look at the deadlock info.
To validate your hand-made test for deadlocks, you should verify that the blocking SPIDs form a loop. Look at SELECT session_id, blocking_session_id FROM sys.dm_exec_requests WHERE blocking_session_id <> 0. If the result contains a loop (e.g. A blocked by B and B blocked by A) and the server does not trigger a deadlock, that's a bug. However, what you will find is that the blocking list does not form a loop; it will be something like A blocked by B, B blocked by C, and C not in the list, which means you have done something wrong in the repro test.
You can specify a deadlock priority for the session using
SET DEADLOCK_PRIORITY LOW | MEDIUM | HIGH
See this MSDN link for details.
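For example, to make the current session the one SQL Server prefers to kill (a minimal sketch):

-- Run in the session that should lose any deadlock it gets into
SET DEADLOCK_PRIORITY LOW;
BEGIN TRAN;
-- ... statements that may deadlock with another session ...
COMMIT;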
You can also use the following command to view the open transactions:
DBCC OPENTRAN (db_name)
This command may help you identify what is causing the deadlock. See MSDN for more info.
What are the queries being run? What is actually causing the deadlock?
You say you have two connections A and B. A runs sql1 then sql2, while B runs sql2 then sql1. So, what is the work (queries) being done? More importantly, where are the transactions? What isolation level are you using? What opens/closes the transactions? (Yes, this leads to questioning the exception processing used by your drivers--if they don't detect and properly process a returned "it didn't work" message, then you absolutely need to take them out back and shoot them--bullets or penicillin, your call.)
Understanding the explicit details underlying the deadlock will allow you to recreate it. I'd first try to recreate it "below" your application -- that is, open up two windows in SSMS, and recreate the application's actions step by step, by hand if/as necessary. Once you can do this, step back and replicate that in your application--all on your development servers, of course!
(A thought--are your Dev databases copies of your Production DBs? If Dev DBs are orders of magnitude smaller than Prod ones, your queries may be the same but what SQL does "under the hood" will be vastly different.)
A last thought: SQL will detect and process deadlocks automatically (I really don't think you can disable this). If yours are running overnight, then I don't think you have a deadlock, but rather just a conventional locking/blocking issue.
[Posting this now -- going to look something up, will check back later.]
[Later]
Interesting - SQL Server 2005 Compact Edition does not detect deadlocks; it only does timeouts. You're not using that in Dev, are you?
I see no way to "turn off" or otherwise control the deadlock timeout period. I hit and messed with deadlocks just last week, and some arbitrary testing then indicated that deadlocks are detected and resolved in (for our dev server) under 5 seconds. It truly seems like you don't have deadlocks on your Dev machine, just blocking. But realize that this stuff is hard for "armchair DBAs" to analyze; you'd really need to sit down and do some serious analysis of what's going on within the system when this problem is occurring.
[This is a response to the answers. The UI does not allow longer 'comments' on answers.]
What are the queries being run? What is actually causing the deadlock?
In my test environment, I ran very simple queries:
sql1:
UPDATE principal SET name = name + '.' WHERE principal_id = 71
sql2:
UPDATE principal SET name = name + '.' WHERE principal_id = 72
Then I executed them in chiastic/criss-cross order, i.e. without any commits:
connection A: sql1
connection B: sql2
connection B: sql1
connection A: sql2
This to me seems like a basic example of a deadlock. If this is a "mere lock", however, and not a deadlock, please disabuse me of this notion.
In production, our 'problematic query' ("prodbad") looked like this:
UPDATE post SET lock_flag = ?
WHERE thread_id IN (SELECT thread_id FROM POST WHERE post_id = ?)
Note a few things:
1) This "prod problem query" actually works. AFAIK it had a
deadlock this one time
2) I suspect that the problem lies in page locking, i.e. pessimistic locking due to reads elsewhere in the transaction
3) I do not know what sql this transaction executed prior to this query.
4 )This query is an example of "I can do that in one sql statement"
processing, which while seems clever to the programmer ultimately causes much more IO than running two queries:
queryM:SELECT thread_id FROM POST WHERE post_id = ?
queryN: UPDATE post SET lock_flag = ? WHERE thread_id = <>
*> (A thought - are your Dev databases copies of your Production DBs? If Dev DBs are orders of magnitude smaller than Prod ones, your queries may be the same but what SQL does "under the hood" will be vastly different.)*
In this case the prod and dev DBs differ: the "prod server" had tons of data, the "dev db" had little data. The queries were very different. All I wanted to do was recreate a deadlock.
*> The server failed to resume the transaction... Why?. You should upgrade to JDBC SQL driver v2.0 before anything else.*
Thanks. We plan on this change. Switching drivers introduces a little bit of risk, so we'll need to run some tests.
To recap:
I had the "bright idea" to force a simple deadlock and see if my connection was "whacked/hosed/borked/etc." The deadlock, however, behaved differently than in production.
Occasionally, on an ASP (classic) site, users will get this error:
[DBNETLIB][ConnectionRead (recv()).]General network error.
It seems to be random and not connected to any particular page. The SQL Server is separate from the web server, and my guess is that every once in a while the "link" goes down between the two. Router/switch issue... or has someone else run into this problem before?
Using the same setup as yours (i.e. separate web and database servers), I've seen it from time to time, and it has always been a connection problem between the servers - typically when the database server is being rebooted, but sometimes when there's a comms problem somewhere in the system. I've not seen it triggered by any problems with the ASP code itself, which is why you're seeing it apparently at random and not connected to any particular page.
I've seen this error many times. It can be caused by many things, including network errors :).
But one of the reasons could be a built-in feature of MS-SQL: it detects DoS attacks - in this case, too many requests from the web server :).
But I have no idea how we fixed it :(.
In SQL Server Configuration Manager:
Disable TCP/IP; enable Shared Memory & Named Pipes.
Good luck!
Not a solution exactly, and not the same environment; however, I get this error in a VBA/Excel program, and the problem is a hanging transaction which has not been committed in SQL Server Management Studio (SSMS). After closing SSMS, everything works. So the lesson is that a hanging transaction can block sprocs from proceeding (an obvious fact, I know!). Hope this helps someone here.
Open a command prompt (Run as administrator) and type the following command on the client side:
netsh advfirewall set allprofiles state off
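If that makes the error disappear, the culprit is the firewall configuration; remember to turn it back on afterwards and allow the SQL Server port specifically:

netsh advfirewall set allprofiles state on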
FWIW, I had this error from Excel, which would hang on an EXEC that worked fine within SSMS. I've seen problem queries before which were also OK within SSMS, due to 'parameter sniffing' and unsuitable cached query plans. Making a minor edit to the SP cured the problem, and it worked OK afterwards in its original form. I'd be interested to hear if anyone else has encountered this scenario. Try the good old OPTION (OPTIMIZE FOR UNKNOWN) :)
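For anyone trying that, the hint goes on the statement inside the stored procedure; a hypothetical example (table and parameter names made up):

SELECT OrderId, OrderDate
FROM dbo.Orders
WHERE CustomerId = @CustomerId
OPTION (OPTIMIZE FOR UNKNOWN);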