What causes NHibernate “Internal Connection Fatal” errors?

When NHibernate’s log level is set to “DEBUG”, we start seeing a bunch of “Internal Connection Fatal” errors in our logs. It looks like NHibernate dies about halfway through processing a particular result set. According to the logs, the last column NHibernate reads appears to contain garbage that isn’t in the underlying data.
The issue seems to go away when either:
The log level is set back to “ERROR”.
The view being queried is changed to return less data (either fewer rows OR null or blank values for various columns).
We’re using ASP.NET MVC, IIS7, .NET Framework 4.5, SQL Server 2012, log4net.2.0.2 and NHibernate.3.3.3.4001.
I guess my real concern is that there is some hidden issue with the code that the added strain of logging is bringing to light, but I'm not sure what it could be. I have double checked the NHibernate mappings and they look good. I've also checked to ensure that I'm disposing of the NHibernate session at the end of each request. I also tried bumping up the command timeout, which didn't seem to make a difference.

If one of the columns is of a non-simple type (binary, text, etc.), NHibernate may be having problems populating a property.

Turns out the connection from our dev app server to our dev database server was wonky.
From the dev app server, open SSMS and try to connect to the dev database server.
Sometimes we get the "internal connection fatal error", sometimes we don't.

The issue is possibly caused by a TCP Chimney Offload/SQL Server incompatibility.
Check the following KB Article for possible solutions:
http://support.microsoft.com/kb/942861
For Windows 7/2008 R2:
By default, the TCP Chimney Offload feature is set to Auto. This means that the chimney does not offload all connections. Instead, it selectively offloads the connections that meet the following criteria:
The connection is established through a 10 gigabits per second (Gbps) Ethernet adapter.
The mean round-trip link latency is less than 20 milliseconds.
At least 130 kilobytes (KB) of data were exchanged over the connection.
That last condition can be triggered in the middle of a result set, so you see garbage instead of real data.
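Per the KB article, the workaround is to disable TCP Chimney Offload. On Windows 7/2008 R2 this is done with netsh from an elevated prompt (a sketch; check the KB article for the variant that matches your OS):
netsh int tcp show global
netsh int tcp set global chimney=disabled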

Related

SQL Server equivalent to Oracle error numbers

I'm working on a .NET application migration from Oracle to SQL Server database. The application was developed in the 2000s by a third party, so we intend to modify it as little as possible in order to avoid introducing new bugs.
I replaced the Oracle references with SqlClient ones (OracleConnection to SqlConnection, OracleTransaction to SqlTransaction, etc.) and everything worked fine. However, I'm having trouble with logic that tries to reconnect to the DB in case of errors.
If a problem occurs when trying to read/write to the database, method TryReconnect is called. This method checks whether the Oracle exception number is 3114 or 12571; if so, it tries to reopen the connection.
I checked these error codes:
ORA-03114: Not Connected to Oracle
ORA-12571: TNS: packet writer failure
I searched for the equivalent error codes for SQL Server but I couldn't find them. I checked the MSSQL and .NET SqlClient documentation but I'm not sure that any of those is equivalent to ORA-3114 and ORA-12571.
Can somebody help me decide which error numbers should be checked in this logic? I thought about checking for codes 0 (I saw it happen when I stopped the database to force an error and test this) and -2 (timeout expired), but I'm not really sure about it.
The behavior is different. You can't base your SQL Server retry logic on Oracle semantics. For starters, SqlConnection will retry to connect even in the old System.Data.SqlClient library. Its replacement, Microsoft.Data.SqlClient, includes configurable retry logic to handle connections to cloud databases from on-premise applications, e.g. an on-prem application connecting to Azure SQL. This retry logic is on by default in the current RTM version, 3.0.0.
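For illustration, a minimal sketch of that configurable retry logic in Microsoft.Data.SqlClient (the error numbers, timings, and connection string are assumptions for the example; some 3.x builds also gate the feature behind an AppContext switch):
using System;
using Microsoft.Data.SqlClient;

// May be required on some 3.x versions to enable the feature.
AppContext.SetSwitch("Switch.Microsoft.Data.SqlClient.EnableRetryLogic", true);

var options = new SqlRetryLogicOption
{
    NumberOfTries = 3,                           // total attempts, not retries
    DeltaTime = TimeSpan.FromSeconds(1),         // base gap between attempts
    MaxTimeInterval = TimeSpan.FromSeconds(20),  // cap on any single gap
    TransientErrors = new[] { 4060, 40197, 40501, 40613 } // assumed transient errors
};

using var connection = new SqlConnection("Server=.;Database=MyDb;Integrated Security=True");
connection.RetryLogicProvider = SqlConfigurableRetryFactory.CreateExponentialRetryProvider(options);
connection.Open(); // Open() is retried on the listed error numbers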
You can also look at high-level resiliency libraries like Polly, a very popular resiliency package that implements recovery strategies like retries with backoff, circuit breakers etc. This article describes Cadru.Polly which contains strategies for handling several SQL Server transient faults. You could use this directly or you can handle the transient error numbers described in that article:
Each exception handling strategy and the error numbers it handles:
SqlServerTransientExceptionHandlingStrategy: 40501, 49920, 49919, 49918, 41839, 41325, 41305, 41302, 41301, 40613, 40197, 10936, 10929, 10928, 10060, 10054, 10053, 4221, 4060, 12015, 233, 121, 64, 20
SqlServerTransientTransactionExceptionHandlingStrategy: 40549, 40550
SqlServerTimeoutExceptionHandlingStrategy: -2
NetworkConnectivityExceptionHandlingStrategy: 11001
Polly allows you to combine policies and specify different retry strategies for them (a sketch follows this list), e.g.:
Using a cached response in some cases (lookup data?)
Retrying with backoff (even random delays) in other cases (deadlocks?). Random delays can be very useful if you run into timeouts because too many concurrent operations cause deadlocks or timeouts. Without it, all failing requests would retry at the same time, causing yet another failure
Using a circuit breaker to switch to a different service or server.
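A minimal Polly sketch under those assumptions (the error numbers come from the table above; the policy shape is illustrative, not Cadru.Polly's actual implementation):
using System;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Data.SqlClient;
using Polly;

// Assumed transient error numbers (a subset of the table above).
int[] transientErrors = { 4060, 40197, 40501, 40613, 49918, 49919, 49920 };

var retryPolicy = Policy
    .Handle<SqlException>(ex => transientErrors.Contains(ex.Number))
    .WaitAndRetryAsync(
        retryCount: 3,
        // Exponential backoff plus random jitter, so concurrent failures don't all retry in lockstep.
        sleepDurationProvider: attempt =>
            TimeSpan.FromSeconds(Math.Pow(2, attempt)) +
            TimeSpan.FromMilliseconds(Random.Shared.Next(0, 250)));

await retryPolicy.ExecuteAsync(async () =>
{
    await using var connection = new SqlConnection("Server=.;Database=MyDb;Integrated Security=True");
    await connection.OpenAsync();
    // ... run the command that may hit a transient fault ...
});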
You could create an Oracle strategy so you can use Polly throughout your projects and handle all recoverable failures, not just database retries.

What does `test-on-borrow` do?

What does it do? How does it work? Why am I supposed to test the database connection before "borrowing it from the pool"?
I was not able to find any related information as to why I should be using it. Just how to use it. And it baffles me.
Can anyone provide some meaningful definition and possibly resources to find out more?
"test-on-borrow" indicates that a connection from the pool has to be validated usually by a simple SQL validation query defined in "validationQuery". These two properties are commonly used in conjunction to make sure that the current connections in the pool are not stale (no longer connected to the DB actively as a result of a DB restart, or timeouts enforced by the DB, or whatever other reason that might cause stale connections). By testing the connections on borrow, the application can automatically reconnect to the DB using new connections (and dropping the invalid ones) without a manual restart of the app and thus preventing DB connection errors in the app.
You can find more information on jdbc connection pool attributes here:
https://tomcat.apache.org/tomcat-8.0-doc/jdbc-pool.html#Common_Attributes
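As a sketch, a Tomcat JDBC pool resource using these two attributes might look like the following (the resource name, URL, and credentials are placeholders; SELECT 1 is a common validation query for SQL Server):
<Resource name="jdbc/MyDataSource"
          auth="Container"
          type="javax.sql.DataSource"
          factory="org.apache.tomcat.jdbc.pool.DataSourceFactory"
          driverClassName="com.microsoft.sqlserver.jdbc.SQLServerDriver"
          url="jdbc:sqlserver://dbhost:1433;databaseName=MyDb"
          username="appUser"
          password="appPassword"
          testOnBorrow="true"
          validationQuery="SELECT 1"
          validationInterval="30000"/>
The validationInterval attribute throttles how often the validation query actually runs; connections borrowed within that window are assumed to still be valid.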

Error 17886 - The server will drop the connection

We are running a website on a VPS with SQL Server 2008 R2 x64. We are being bombarded with 17886 errors, namely:
The server will drop the connection, because the client driver has sent multiple requests while the session is in single-user mode. This error occurs when a client sends a request to reset the connection while there are batches still running in the session, or when the client sends a request while the session is resetting a connection. Please contact the client driver vendor.
This causes SQL statements to return corrupt results. I have tried pretty much all of the suggestions I have found on the net, including:
with MARS, and without
with pooling, and without
with async=true, and without
We only have one database and it is absolutely multi-user.
Everything has been installed recently, so it is up to date. The errors may be correlated with high CPU (though not exclusively, according to the monitors I have seen), and also with high request rates from search engines. However, high CPU/request rates shouldn't cause SQL connections to reset; at worst we should see high response times or IIS refusing to send a response.
Any suggestions? I am only a developer, not a DBA. Do I need a DBA to solve this problem?
I'm not sure, but some of your queries might be causing deadlocks on the server.
The next time you detect this error:
Open Management Studio (on the server; install it if necessary).
Open a new query window.
Run sp_who2.
Check the BlkBy column, which is short for Blocked By. If there is any data in that column you have a blocking problem (normally it should be completely empty).
If you have blocking, then we can continue with the next steps. But right now, please check that.
To fix the error above, "MultipleActiveResultSets=True" needs to be added to the connection string.
via Event ID 17886 MSSQLServer – The server will drop the connection
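For example (the server, database, and security settings here are placeholders):
Server=myServer;Database=myDb;Integrated Security=True;MultipleActiveResultSets=True;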
I would create an event log task to email you whenever 17886 is thrown. Then go immediately to the DB, execute sp_who2, get the BlkBy SPID, and run DBCC INPUTBUFFER on it. Hopefully the EventInfo column will give you something a bit more tangible to go on.
sp_who2                -- look for sessions with a value in the BlkBy column
DBCC INPUTBUFFER(62)   -- 62 is an example SPID; use the blocking SPID from sp_who2
GO
Use an "Instance Per Request" strategy in your DI-instantiation code and your problem will be solved.
Most probably you are using dependency injection. During web development you have to take into account the possibility of concurrent requests, so you have to make sure every request gets new instances from DI; otherwise you will run into concurrency issues. Don't be cheap by using ".SingleInstance" for services and contexts.
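As a sketch, assuming Autofac (which the ".SingleInstance" naming suggests) and ASP.NET MVC, the registration might look like this; the type names are placeholders:
using Autofac;

var builder = new ContainerBuilder();

// One instance per HTTP request, never shared across concurrent requests.
builder.RegisterType<MyDbContext>()
       .As<IMyDbContext>()
       .InstancePerRequest();

// Avoid this for anything that holds a connection or session:
// builder.RegisterType<MyDbContext>().As<IMyDbContext>().SingleInstance();

var container = builder.Build();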
Enabling MARS will probably decrease the number of errors, but the errors that remain will be less clear. Enabling MARS is almost never the solution; do not use it unless you know what you're doing.

Automatic failover with SQL mirroring and connection strings

I have 3 servers set up for SQL mirroring and automatic failover using a witness server. This works as expected.
Now my application that connects to the database seems to have a problem when a failover occurs: I need to manually intervene and change connection strings for it to connect again.
The best solution I've found so far involves using the Failover Partner parameter of the connection string; however, it's neither intuitive nor complete: Data Source="Mirror";Failover Partner="Principal" found here.
From the example in the blog above (scenario #3), when the first failover occurs and the principal (failover partner) is unavailable, the data source is used instead (which is the new principal). If it fails again (and I only tried within a limited period), it then comes up with an error message. This happens because the connection string is cached, so until it is refreshed, it will keep producing the error (the connection string seems to refresh about 5 minutes after it encounters an error). If after failover I swap data source and failover partner, I get one more silent failover again.
Is there a way to achieve fully automatic failover for applications that use mirroring databases too (without ever seeing the error)?
I can see potential workarounds using custom scripts that would poll the currently active database node name and adjust the connection string accordingly, but it seems like overkill at the moment.
Read the blog post here
http://blogs.msdn.com/b/spike/archive/2010/12/15/running-a-database-mirror-setup-with-the-sqlbrowser-service-off-may-produce-unexpected-results.aspx
It explains what is happening: the failover partner is actually being read from the SQL Server, not from your config. Run the query in that post to find out what is actually being used as the failover server. It will probably be a machine name that is not discoverable from where your client is running.
You can clear the connection pool when a failover has happened. Not very nice, I know ;-)
// ClearAllPools resets (or empties) the connection pool.
// If there are connections in use at the time of the call,
// they are marked appropriately and will be discarded
// (instead of being returned to the pool) when Close is called on them.
System.Data.SqlClient.SqlConnection.ClearAllPools();
We use it when we change an underlying server via SQL Server alias, to enforce a "refresh" of the server name.
http://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlconnection.clearallpools.aspx
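A sketch of how that might be used around a failing operation (the retry-once shape, names, and connection string are illustrative assumptions):
using System.Data.SqlClient;

var connectionString = "Data Source=Principal;Failover Partner=Mirror;Initial Catalog=MyDb;Integrated Security=True";
try
{
    using (var connection = new SqlConnection(connectionString))
    {
        connection.Open();
        // ... run queries ...
    }
}
catch (SqlException)
{
    // After a failover, pooled connections may still point at the old principal.
    // Dropping them forces fresh connections that honor the current principal.
    SqlConnection.ClearAllPools();
    // ... retry the operation once ...
}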
The solution is to turn connection pooling off with Pooling="false".
Whilst this has minimal impact on small applications, I haven't tested it with applications that receive hundreds of requests per minute (or more), and I'm not sure what the implications are. Anyone care to comment?
Try this connection string:
connectionString="Data Source=[MSSQLPrincipalServerIP,MSSQLPORT];Failover Partner=[MSSQLMirrorServerIP,MSSQLPORT];Initial Catalog=DatabaseName;Persist Security Info=True;User Id=userName;Password=userPassword;Connection Timeout=15;"
If you are developing in .NET, you can try ObjAdoDBLib, or PigSQLSrvLib and PigSQLSrvCoreLib, and the code becomes simple.
Example code:
Create the connection object. With ObjAdoDBLib:
Me.ConnSQLSrv = New ConnSQLSrv(Me.DBSrv, Me.MirrDBSrv, Me.CurrDB, Me.DBUser, Me.DBPwd, Me.ProviderSQLSrv)
With PigSQLSrvLib or PigSQLSrvCoreLib:
Me.ConnSQLSrv = New ConnSQLSrv(Me.DBSrv, Me.MirrDBSrv, Me.CurrDB, Me.DBUser, Me.DBPwd)
Then execute this method to automatically connect to the online database after the mirror database fails over:
Me.ConnSQLSrv.OpenOrKeepActive
For more information, see the relevant links.
https://www.nuget.org/packages/ObjAdoDBLib/
https://www.nuget.org/packages/PigSQLSrvLib/
https://www.nuget.org/packages/PigSQLSrvCoreLib/

What causes this SqlException: A transport-level error has occurred when receiving results from the server

Here is the full error: SqlException: A transport-level error has occurred when receiving results from the server. (provider: Shared Memory Provider, error: 1 - I/O Error detected in read/write operation)
I've started seeing this message intermittently for a few of the unit tests in my application (there are over 1100 unit & system tests). I'm using the test runner in ReSharper 4.1.
One other thing: my development machine is a VMWare virtual machine.
I ran into this many moons ago. The bottom line is that you are running out of available ports.
First, make sure your calling application has connection pooling turned on.
If it does, then check the number of available ports on the SQL Server machine.
What happens is that if pooling is off, every call takes a port, a released port takes 4 minutes to expire by default, and you run out of ports.
If pooling is on, then you need to profile all the ports of SQL Server, make sure you have enough, and expand them if necessary.
When I came across this error, connection pooling was off and it caused this issue whenever a decent load was put on the website. We did not see it in development because the load was 2 or 3 people at max, but once the number grew over 10 we kept seeing this error. We turned pooling on, and it fixed it.
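As a sketch, pooling is controlled from the connection string; the values below are illustrative placeholders, not recommendations:
Server=myServer;Database=myDb;Integrated Security=True;Pooling=True;Max Pool Size=200;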
I ran into this many moons ago as well. However, not to discount @Longhorn213's explanation, we had the exact opposite behavior. We received the error in development and testing, but not in production, where obviously the load was much greater. We ended up tolerating the issue in development as it was sporadic and didn't materially slow down progress. I think there could be several reasons for this error, but I was never able to pinpoint the cause myself.
We've also run across this error and figured out that we were killing a SQL Server connection from the database server. The client application is under the impression that the connection is still active and tries to make use of that connection, but fails because it was terminated.
We saw this in our environment, and traced part of it down to the "NOLOCK" hint in our queries. We removed the NOLOCK hint and set our servers to use Snapshot Isolation mode, and the frequency of these errors was reduced quite a bit.
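For reference, a sketch of switching a database to snapshot-based isolation (the database name is a placeholder; this changes locking behavior database-wide, so test before rolling out):
-- Readers see the last committed row version instead of blocking (or taking dirty reads via NOLOCK).
ALTER DATABASE MyDb SET READ_COMMITTED_SNAPSHOT ON;
-- Optionally also allow sessions to request SNAPSHOT isolation explicitly.
ALTER DATABASE MyDb SET ALLOW_SNAPSHOT_ISOLATION ON;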
We have seen this error a few times and tried different resolutions with varying success. One common underlying theme has been that the system giving the error was running low on memory. This is especially true if the server hosting SQL Server is running ANY other non-OS process. By default SQL Server will grab any memory it can, leaving little for other processes/drivers. This can cause erratic behavior and intermittent messages. It is good practice to configure your SQL Server with a maximum memory setting that leaves some headroom if other processes might need it. Example: Visual Studio on a dev machine that is running a copy of SQL Server Developer Edition on the same machine.
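That cap is set with sp_configure; a sketch (the 4096 MB figure is an illustrative assumption, size it to your machine):
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
-- Leave headroom for the OS and any other processes on the box.
EXEC sp_configure 'max server memory (MB)', 4096;
RECONFIGURE;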
