Intermittent Read Timeouts to SQL Server DB - com.microsoft.sqlserver.jdbc.SQLServerException: Read timed out - sql-server

I have a software service running via Tomcat which performs some analytics on data from my SQLServer database using the SQLServer JDBC driver.
Around 50% of the time when running SQL queries I receive the error:
com.microsoft.sqlserver.jdbc.SQLServerException: Read timed out
I have set the connectiontimeout property in the connection strings that the software uses to connect to my database. I have also set the connection timeout within the software itself, in its settings. Neither has any effect on the error: I set the connection timeout to 120 seconds and it still throws a read timeout within 30 seconds.
The intermittent nature of the timeouts is very strange. I'll get this error for a few minutes, and then it will work for a few minutes, refreshing easily.
Is anyone familiar with the error? Are there any more settings I could set in the connection string to try to resolve the read timeout, as it seems to be a different thing from the connection timeout?
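For reference, a minimal sketch of how the separate timeouts could be spelled out in a JDBC URL. It assumes the Microsoft JDBC driver, where loginTimeout and queryTimeout are given in seconds and socketTimeout in milliseconds. The host, port, database name and the helper buildUrl are placeholders invented for illustration, and the values are examples, not recommendations.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: build a SQL Server JDBC URL that sets each timeout separately.
// Host, port and database name are placeholders, not values from the question.
class TimeoutUrlSketch {

    static String buildUrl(String host, int port, String database,
                           int loginTimeoutSec, int socketTimeoutMs, int queryTimeoutSec) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("databaseName", database);
        props.put("loginTimeout", String.valueOf(loginTimeoutSec));   // seconds, connection setup
        props.put("socketTimeout", String.valueOf(socketTimeoutMs));  // milliseconds, socket reads
        props.put("queryTimeout", String.valueOf(queryTimeoutSec));   // seconds, per statement

        StringBuilder url = new StringBuilder("jdbc:sqlserver://" + host + ":" + port);
        for (Map.Entry<String, String> e : props.entrySet()) {
            url.append(';').append(e.getKey()).append('=').append(e.getValue());
        }
        return url.toString();
    }

    public static void main(String[] args) {
        // Hypothetical values for illustration only.
        System.out.println(buildUrl("dbhost.example.com", 1433, "Analytics", 30, 120_000, 120));
    }
}
```

If the pool or framework in front of the driver has its own timeout settings, those may still apply on top of whatever the URL specifies.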

Related

Every 1 in 30 connections I get Win32Exception: Unknown location error. Azure web app to AWS SQL DB

We have a couple of .NET Core 3.0 Web Apps (UK South) that connect to a MS SQL 2016 database which is running on an Amazon Windows Server 2016 Datacenter (EC2 instance). We connect via an Azure Relay/Hybrid Connection which is installed on the SQL Server.
It has been working fine for over a year with no errors, but recently we've started getting the following error, about 1 in every 30 connections:
An unhandled exception occurred while processing the request.
Win32Exception: An existing connection was forcibly closed by the remote host.
Unknown location
SqlException: A connection was successfully established with the server, but
then an error occurred during the pre-login handshake. (provider: TCP Provider,
error: 0 - An existing connection was forcibly closed by the remote host.)
If you try again it usually works.
After reading a lot of posts on this, I added transient error handling (resilience) to the DB connection in code using EnableRetryOnFailure().
I also tried adding Trusted_Connection=False to the connection string.
After this, you could see the connection retrying multiple times until it worked, sometimes taking 20 seconds or more. Still, in maybe 1 in 100 connections it eventually fails with the same error.
We also looked at the TLS_DHE bug https://learn.microsoft.com/en-us/troubleshoot/windows-server/identity/apps-forcibly-closed-tls-connection-errors but the TLS_DHE ciphers are not installed on the server at all.
There's nothing in the event logs on the Windows server, or in the database logs at the time of the error.
Recent changes in the infrastructure: Panda antivirus, moved web apps to a different Azure region.
I've been reading posts on this for days now, mostly really old and slightly different. I'm looking for any ideas of things to try to pinpoint the error. Thanks.
edit: I found some event logs in Microsoft/ServiceBus/Client
HybridConnectionManager Trace: Microsoft.Azure.Relay.RelayException: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host. ---> System.Net.WebSockets.WebSocketException: An internal WebSocket error occurred. Please see the innerException, if present, for more details. ---> System.IO.IOException: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host. ---> System.Net.Sockets.SocketException: An existing connection was forcibly closed by the remote host
at System.Net.Sockets.Socket.EndReceive(IAsyncResult asyncResult)
at System.Net.Sockets.NetworkStream.EndRead(IAsyncResult asyncResult)
--- End of inner exception stack trace ---
Well, this took three months to resolve and it involved our network support team, AWS support, and Azure support.
I've come back three times to edit this answer. The problem returned on a different server, so we tried the fixes that had worked on the first one, and they didn't work!
In Azure Relay/Hybrid Connections, under the connection in question, we saw there were TWO listeners when there should only be one. Each Hybrid Connection Manager you install and connect shows up there as a listener.
So where was the second listener? Nowhere. It seemed to be a hanging orphan link from a previously deleted connection.
The only way to delete the phantom listener was to:
1. uninstall HCM on the database server
2. remove the connection from all Azure apps using it
3. delete the hybrid connection completely in Azure
4. recreate the connection in Azure afresh
5. reconnect the apps
6. reinstall HCM on the database server
7. connect HCM to the new hybrid connection
After this, Azure showed only one listener under the connection, and things worked immediately.
When you have two listeners the data is load balanced between them, so in my case half the time the data was being routed to a non-existent listener and failing. This is why no logs appeared on the database server - it wasn't getting there at all!
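The effect of a phantom listener can be illustrated with a toy simulation. This is only a sketch: strict round-robin between listeners is an assumption made here to keep the numbers deterministic, not a documented Azure Relay policy, and the class and method names are invented.

```java
// Sketch: requests alternate between two registered listeners, one of which
// is an orphan that never answers. Strict round-robin is an assumption made
// for determinism; the real balancing policy may differ.
class ListenerBalanceSketch {

    static int countFailures(boolean[] listenerAlive, int requests) {
        int failures = 0;
        for (int i = 0; i < requests; i++) {
            boolean alive = listenerAlive[i % listenerAlive.length];
            if (!alive) {
                failures++; // routed to the phantom listener, never reaches the DB server
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        // One live Hybrid Connection Manager plus one orphaned listener.
        int failed = countFailures(new boolean[] {true, false}, 100);
        System.out.println(failed + " of 100 requests failed"); // prints "50 of 100 requests failed"
    }
}
```

With a single live listener the same function reports zero failures, which matches what was seen after the connection was recreated.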

SSRS error "Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding."

I have a SQL Server Reporting Services (SSRS) report which works fine in the Visual Studio 2015 report designer, version 13.0.1100.286, but once deployed to a report server it keeps throwing the following error:
An error has occurred during report processing. (rsProcessingAborted)
Cannot create a connection to data source 'DummyDataSource'. (rsErrorOpeningConnection)
Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
When I attempt to test the connection of the data source in SQL Server Report Builder, I get the same error:
And here is how I set the credentials for my data source. Note that 'myUser' can connect to database 'MyDatabase' in SQL Server Management Studio, and in addition the report works fine in Visual Studio report designer with the same credentials:
I have tried setting the report timeout to 1800 (from initial setting of "Use the system default setting"), but that didn't solve the problem:
I also tried setting the timeout for the data set in the report to 30 and 60 seconds, also with no success. It seems it can't connect to the database at all, because it fails on "Test Connection", before I even attempt to run the report itself.
Any idea why that's happening?
So after much digging and trying everything I came across while googling this, the answer turned out to be the following. Our IT people had recently added a ton of new IPs to the database server. When a report on the report server attempted to connect to the database, it enumerated all those IPs and tried to connect to each of them, which resulted in it failing to connect to the right one, hence the error above.
In order to address this, we added a new DNS entry, mapped solely to the IP address of the database and that finally fixed the issue.
So to summarize, my old connection string (that was trying to connect to all IPs) was:
Data Source=MyDatabase;Initial Catalog=DummyDataSource
The new DNS entry is: sql.MyDatabase.CompanyName.com
And finally, the new connection string that works fine is:
Data Source=sql.MyDatabase.CompanyName.com;Initial Catalog=DummyDataSource
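To check how many addresses a given name resolves to from the client machine, a small diagnostic along these lines can help. It is a sketch using the standard java.net.InetAddress API; the class name and the default host are placeholders.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Sketch: list every IP address a host name resolves to, to spot a name
// that maps to many addresses. The default host is a placeholder.
class ResolveAllSketch {

    static String[] resolveAll(String host) {
        try {
            InetAddress[] addresses = InetAddress.getAllByName(host);
            String[] result = new String[addresses.length];
            for (int i = 0; i < addresses.length; i++) {
                result[i] = addresses[i].getHostAddress();
            }
            return result;
        } catch (UnknownHostException e) {
            return new String[0]; // name could not be resolved at all
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "localhost";
        String[] ips = resolveAll(host);
        System.out.println(host + " resolves to " + ips.length + " address(es)");
        for (String ip : ips) {
            System.out.println("  " + ip);
        }
    }
}
```

Running it against both the old server name and the new DNS entry should show whether the dedicated entry maps to a single address.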

Getting the connection pool error even when setting the connection string property Pooling to false

I am having some connection pool issues in my SharePoint application. Every time my application tries to fetch data from SQL Server 2008 R2, I get this exception:
"Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use and max pool size was reached."
I know that I probably have a connection leak, but I have checked every part of my code that accesses the database, and all of them use the using() pattern. My SQL Server edition is 2008 R2 Express, so I don't have the Profiler tool to see how many connections my application is actually creating.
I have tried disabling pooling by setting Pooling=false; in my connection string, but I get the same error. I have also tried increasing the connection timeout and the max pool size, with no success.
Inspecting the User Connections counter on my SQL Server instance with perfmon just before my application fails, I found that the number of connections is not even close to 100 (the default max pool size).
One important piece of information: if I run this application on another computer here in my office, it works perfectly.
Note: I am using an entity context to access the database, and this application is not published; I am just running it locally with VS2013.
If you know a good way to inspect the connection behavior of my application, or have an idea of what could be happening in my development environment, please share it with me.
Thanks.

SQL Server Connection Timeout Not working

I've changed the Connection Timeout to 10 seconds.
"...;Connection Timeout=10;..."
The code uses EF 6 and .NET 4.5.1.
The SQL Server client has both TCP/IP and Named Pipes enabled.
I disconnected from the network and then started debugging.
The timeout error happens at the 1 minute mark.
This seems to indicate the error happens when the SQL Server cannot be located.
Is the SQL Server ADO.NET client code setting its own network timeout?
I had thought this was fixed a while ago, like back in 2.0 days.

IIS + Kerberos + SQL Server + EF Initial connection failure

I have a web server on my domain that I'm trying to use Kerberos delegation to allow access to my SQL Server. They are all Server 2008 R2 servers with IIS 7.5 and SQL 2008 R2 (the DC is also Server 2008 R2).
Everything is working, in that I see transactions being executed on my SQL Server under the user's account. However, the first time I access the site after an extended period of time (30 mins or so) I get the following error thrown by my EF DataContext object:
Exception: The underlying provider failed on Open
at System.Data.EntityClient.EntityConnection.OpenStoreConnectionIf...
Inner Exception: A network-related or instance-specific error occurred while
establishing a connection to SQL Server. The server was not found or was not
accessible. Verify that the instance name is correct and that SQL Server is
configured to allow remote connections. (provider: Named Pipes Provider,
error: 40 - Could not open a connection to SQL Server)
Inner Inner Exception: The system cannot find the file specified
The error page takes ~20 to 30 seconds to be served. After receiving this error, if I hit refresh in my browser, I get the page with all of the data almost instantly (around 200ms)
What would be causing this initial connection to fail, but all subsequent connections to succeed?
Misc information:
EF 6.0
IIS 7.5, Windows Auth & ASP.NET Impersonation enabled, Extended Protection Off, Kernel-mode auth Off, Providers - Negotiate:Kerberos
AppPool uses service account (all SPNs are registered to that account)
If there is any more information that you need, let me know and I'll update this list!
UPDATE:
After doing several network traces, I'm seeing the following pattern:
HTTP Request 1
6 frames of KerberosV5 traffic
HTTP Response: No SQL Data
HTTP Request 2
2 frames of KerberosV5 traffic
TDS Prelogin
TDS Response
2 more frames KerberosV5 traffic (TGS MSSQLSvc request and response)
6 frames of TDS Traffic (SQL Data)
HTTP Response: Success!!
I'm thinking this is a Kerberos issue...
I can't really tell what is causing your issue, but here is a tip on how you can deal with it, just in case you don't manage to find the cause:
EF CodePlex Link on Connection Resiliency
MSDN article on Connection Resiliency
This is a feature introduced with Entity Framework 6.x. By default, when EF encounters an issue like the one you've brought up, it throws an exception, and if you want a retry you have to write quite messy code and duplicate it everywhere.
With Connection Resiliency, you can write the DbExecutionStrategy that suits you best. DbExecutionStrategy has a method you can override that lets you decide whether the query should be executed again when a specific exception type occurs. To the executing code and the end user, this just looks like a slight delay in execution; no error appears.
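The idea behind such a strategy can be sketched independently of EF. The Java snippet below is only an illustration of the pattern (retry an operation while a predicate says the exception is worth retrying); it is not the EF API, and the names executeWithRetry and shouldRetry, the attempt limit and the delays are all invented for the example.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

// Sketch of a retry strategy: the caller supplies the operation and a
// predicate that decides which exceptions count as transient.
class RetrySketch {

    static <T> T executeWithRetry(Supplier<T> operation,
                                  Predicate<RuntimeException> shouldRetry,
                                  int maxAttempts, long baseDelayMs) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return operation.get();
            } catch (RuntimeException e) {
                // Give up on the last attempt or on non-transient errors.
                if (attempt >= maxAttempts || !shouldRetry.test(e)) {
                    throw e;
                }
            }
            try {
                Thread.sleep(baseDelayMs * attempt); // simple linear back-off
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting to retry", ie);
            }
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated operation: fails twice, then succeeds.
        String result = executeWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("simulated transient failure");
            }
            return "connected";
        }, e -> e instanceof IllegalStateException, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts"); // connected after 3 attempts
    }
}
```

The caller only sees a short delay when the failure is transient, while errors the predicate rejects are rethrown immediately.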
From my personal experience, what you're seeing can be caused by many things, including some setting at your hosting provider (if you're not hosting it on premises). I'd look into the SQL logs or Event Viewer to see whether SQL Server is for some reason going into a state where it is not available.
