IIS + Kerberos + SQL Server + EF Initial connection failure - sql-server

I have a web server on my domain, and I'm trying to use Kerberos delegation to give it access to my SQL Server. They are all Server 2008 R2 servers with IIS 7.5 and SQL 2008 R2 (the DC is also Server 2008 R2).
Everything is working, in that I see transactions being executed on my SQL Server under the user's account. However, the first time I access the site after an extended period of time (30 mins or so) I get the following error thrown by my EF DataContext object:
Exception: The underlying provider failed on Open
at System.Data.EntityClient.EntityConnection.OpenStoreConnectionIf...
Inner Exception: A network-related or instance-specific error occurred while
establishing a connection to SQL Server. The server was not found or was not
accessible. Verify that the instance name is correct and that SQL Server is
configured to allow remote connections. (provider: Named Pipes Provider,
error: 40 - Could not open a connection to SQL Server)
Inner Inner Exception: The system cannot find the file specified
The error page takes ~20 to 30 seconds to be served. After receiving this error, if I hit refresh in my browser, I get the page with all of the data almost instantly (around 200ms)
What would be causing this initial connection to fail, but all subsequent connections to succeed?
Misc information:
EF 6.0
IIS 7.5, Windows Auth & ASP.NET Impersonation enabled, Extended Protection Off, Kernel-mode auth Off, Providers - Negotiate:Kerberos
AppPool uses service account (all SPNs are registered to that account)
If there is any more information that you need, let me know and I'll update this list!
UPDATE:
After doing several network traces, I'm seeing the following pattern:
HTTP Request 1
6 frames of KerberosV5 traffic
HTTP Response: No SQL Data
HTTP Request 2
2 frames of KerberosV5 traffic
TDS Prelogin
TDS Response
2 more frames KerberosV5 traffic (TGS MSSQLSvc request and response)
6 frames of TDS Traffic (SQL Data)
HTTP Response: Success!!
I'm thinking this is a Kerberos issue...

I can't really tell what is causing your issue, but here is a tip on how you can deal with it, just in case you don't manage to find the cause:
EF CodePlex Link on Connection Resiliency
MSDN article on Connection Resiliency
This is a feature introduced in Entity Framework 6.x. By default, when EF encounters an issue like the one you've described, it throws an exception, and if you want a retry you have to write fairly messy code and duplicate it everywhere.
With Connection Resiliency, you can write a DbExecutionStrategy that suits you best. DbExecutionStrategy has a method you can override to decide whether the query should be executed again when a specific exception type occurs. To the executing code and the end user, this just looks like a slight delay in execution; no error appears.
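The idea behind such a strategy can be sketched outside EF. The following is a minimal Python illustration, not the EF API: TransientConnectionError, should_retry_on and execute_with_retry are hypothetical names invented for this sketch, and the retry count and delay are arbitrary assumptions.

```python
import time


class TransientConnectionError(Exception):
    """Hypothetical stand-in for a provider error such as 'failed on Open'."""


def should_retry_on(exc):
    # The overridable decision point: retry only errors considered transient.
    return isinstance(exc, TransientConnectionError)


def execute_with_retry(operation, max_attempts=3, delay_seconds=0.5, sleep=time.sleep):
    """Run operation(); re-run it while should_retry_on() approves the error."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception as exc:
            # Give up on the last attempt or on a non-transient error.
            if attempt == max_attempts or not should_retry_on(exc):
                raise
            # Simple linear back-off before the next attempt.
            sleep(delay_seconds * attempt)
```

The caller wraps each database call in execute_with_retry, so the retry logic lives in one place instead of being duplicated around every query.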
From my personal experience, what you're seeing can be caused by many things, including some setting at your hosting provider (if you're not hosting on premises). I'd look into the SQL logs or Event Viewer to see if SQL Server is for some reason going into a state where it is not available.

Related

Every 1 in 30 connections I get Win32Exception: Unknown location error. Azure web app to AWS SQL DB

We have a couple of .NET Core 3.0 Web Apps (UK South) that connect to a MS SQL 2016 database which is running on an Amazon Windows Server 2016 Datacenter (EC2 instance). We connect via an Azure Relay/Hybrid Connection which is installed on the SQL Server.
It has been working fine for over a year with no errors, but recently we've started getting the following error, about 1 in every 30 connections:
An unhandled exception occurred while processing the request.
Win32Exception: An existing connection was forcibly closed by the remote host.
Unknown location
SqlException: A connection was successfully established with the server, but
then an error occurred during the pre-login handshake. (provider: TCP Provider,
error: 0 - An existing connection was forcibly closed by the remote host.)
If you try again it usually works.
After reading a lot of posts on this I added transient error handling (connection resilience) to the DB connection using EnableRetryOnFailure().
I also tried adding Trusted_Connection=False to the connection string.
After this you could see the connection retrying multiple times until it worked, sometimes taking 20 seconds or more. Still, in maybe 1 in 100 connections it eventually fails with the same error.
We also looked at the TLS_DHE bug https://learn.microsoft.com/en-us/troubleshoot/windows-server/identity/apps-forcibly-closed-tls-connection-errors but the TLS_DHE ciphers are not installed on the server at all.
There's nothing in the event logs on the Windows server, or in the database logs at the time of the error.
Recent changes in the infrastructure: Panda antivirus, moved web apps to a different Azure region.
I've been reading posts on this for days now, mostly really old and slightly different. I'm looking for any ideas of things to try to pinpoint the error. Thanks.
edit: I found some event logs in Microsoft/ServiceBus/Client
HybridConnectionManager Trace: Microsoft.Azure.Relay.RelayException: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host. ---> System.Net.WebSockets.WebSocketException: An internal WebSocket error occurred. Please see the innerException, if present, for more details. ---> System.IO.IOException: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host. ---> System.Net.Sockets.SocketException: An existing connection was forcibly closed by the remote host
at System.Net.Sockets.Socket.EndReceive(IAsyncResult asyncResult)
at System.Net.Sockets.NetworkStream.EndRead(IAsyncResult asyncResult)
--- End of inner exception stack trace ---
Well, this took three months to resolve and it involved our network support team, AWS support, and Azure support.
I've come back three times to edit this answer. The problem returned on a different server, so we tried the fixes that had worked on the first one, and they didn't work!
In Azure Relay/Hybrid connections, under the connection in question we saw there were TWO listeners, when there should only be one. Each Hybrid Connection Manager you install and connect shows up there as a listener.
So where was the second listener? Nowhere. It seemed to be a hanging orphan link from a previously deleted connection.
The only way to delete the phantom listener was to
uninstall HCM on the database server
remove the connection from all azure apps using it
delete the hybrid connection completely in azure
recreate the connection in azure afresh
reconnect the apps
reinstall HCM on the database server
connect HCM to the new hybrid connection
After this we showed one listener under the connection in Azure, and things worked immediately.
When you have two listeners the data is load balanced between them, so in my case half the time the data was being routed to a non-existent listener and failing. This is why no logs appeared on the database server - it wasn't getting there at all!
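The effect of a phantom listener can be illustrated with a small simulation. This is only a sketch: the real distribution policy of Azure Relay is not described here, so strict round-robin is an assumption, and the listener names are hypothetical.

```python
from itertools import cycle


def simulate(listeners, requests):
    """Send requests round-robin over listeners; return how many fail."""
    rotation = cycle(listeners)
    failures = 0
    for _ in range(requests):
        listener = next(rotation)
        # A request routed to a listener that no longer exists never
        # reaches the database server, so nothing is logged there.
        if not listener["alive"]:
            failures += 1
    return failures


# Hypothetical listeners: the real HCM and the orphaned entry.
real = {"name": "hcm-on-db-server", "alive": True}
phantom = {"name": "orphaned-listener", "alive": False}
```

Under this assumption, simulate([real, phantom], 100) reports that half the requests are lost, while simulate([real], 100) reports none.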

MS Access front end to SQL Server intermittent ODBC connection issues

We have an MS Access front-end application that uses data from a SQL Server back end, with ODBC connections for the data communication. The application randomly freezes or crashes with the different errors attached below, which very likely relate to the ODBC connection. The problems can be reproduced by running different queries in quick succession.
SQL Server Version: SQL Express 14.0.3356.20
Access Version: Microsoft 365 Apps for Business Version 2011 (Build 13426.20204)
Most common error codes:
10054, 10060, 3146, 3151
ODBC Data Source Type: System DSN
ODBC Driver: ODBC Driver 17 for SQL Server (Version 2017.176.01.01)
Example Connection String for a linked table: ODBC;Description=Test Description;DRIVER=ODBC Driver 17 for SQL Server;SERVER=MY-SERVER;UID=MyUser;Trusted_Connection=No;APP=Microsoft Office;DATABASE=TestDatabase;;TABLE=dbo.TableName
Largest Table Size: 53000 records (Doesn't really matter, fails for even smaller queries, but if you try faster getting results from different queries you can break it)
[Screenshots: Error1, Error2, Error3]
We have gone through hundreds of searches and articles about these errors and tried the fixes listed below, but none of them solved the problem:
Created firewall rules (Inbound - TCP Port 1433, Outbound - TCP all ports)
Added TCP Port 1433 - TCP/IP protocol
Enabled Named Pipes
Updated SQL Server Express from 14.0.3335.7 to 14.0.3356.20
Made sure IPv4 and IPv6 are both enabled
Enabled and checked ODBC Trace Logs
Checked Windows Event Logs
Hope I have included everything that might be useful to get some help resolving this issue.
Note: Not sure if it's related, but the SQL Server itself (on the back end) gives the error below on rare occasions, in case it helps.
Connection Timeout Expired. The timeout period elapsed while attempting to consume the pre-login handshake acknowledgement. This could be because the pre-login handshake failed or the server was unable to respond back in time. The duration spent while attempting to connect to this server was - [Pre-Login] initialization=29278; handshake=26244; (.Net SqlClient Data Provider)
Thanks everyone who tried to help. This issue has now been fixed.
For anyone who will come here in future looking for a solution for a similar issue, below is what worked for us.
Added a new instance of SQL Server 2019 (in the same virtual machine that 2017 sits on)
Enabled UDP Port 1434
Changed virtual port from blank to zero
The 2017 instance was still crashing with the same settings before we took it offline.
No issues with the 2019 version.

Datasource verification problems after Windows updates

Yesterday windows updates were installed on my laptop, and afterwards many features of ColdFusion were out of configuration.
I am using ColdFusion 2016 and SQL Server 2016 RC.
I fixed a number of issues (see below) but still get the message
Connection verification failed for data source: MT_EL
java.sql.SQLNonTransientConnectionException: [Macromedia][SQLServer
JDBC Driver]Error establishing socket to host and port: 8500:1433.
Reason: Network is unreachable: connect The root cause was that:
java.sql.SQLNonTransientConnectionException: [Macromedia][SQLServer
JDBC Driver]Error establishing socket to host and port: 8500:1433.
Reason: Network is unreachable: connect.
The DSNs had been verifying for at least a year before the problems occurred.
So far I have done the following:
Both SQL Server and CF Server had to be started again. SQL Server was not a problem but the CF Server would not start. I went to the jvm.config file and reduced the -Xms setting. This did not solve anything, so I looked at the logs. From the logs it was apparent that the neo-security.xml file was corrupted, and upon checking I saw that neo-security.xml was now empty. neo-datasource, neo-drivers and one or two other files were also empty. The back-ups of these files were also empty, but I found some old versions in another place, and copied them over. Now I was able to start the CF Server and get into the CF Administrator, but had to set up user names/passwords and also DSNs again.
SQL Server Configuration Manager had been moved to a different folder, but I found it and soon saw an error message saying that SQL Server Configuration Manager could not connect to the WMI provider. I fixed this by opening a command prompt in administrator mode and typing mofcomp "%programfiles(x86)%\Microsoft SQL Server\13\Shared\sqlmgmproviderxpsp2up.mof".
Now I could get into SQL Server Configuration Manager, but for some reason it is listed twice. The malfunctioning one still says it cannot connect to the WMI provider, but expanding the functioning one, I found that TCP/IP is enabled and the default port is 1433.
I checked the firewall and could not see any issues there.
SQL permissions + log in/password credentials are the same as before, when there were no DSN verification problems.
I have tried ports 8501 and 8502, but the above error persists.
I have checked the SQL Server logs. It is apparent that a number of errors occurred yesterday and certain features were disabled. However it is evident that these issues have now been resolved, and the most recent messages are of informational type and state that no user action is necessary.
Anyone any ideas? Thank you in advance for any comments/assistance.

Intermittent Azure database connectivity from TFS build agents

I have the following setup:
Azure MSSQL database
TFS build server
Build server in one of its steps contacts Azure database and every so often I get an error message like below:
Invoke-Sqlcmd : A network-related or instance-specific error occurred
while establishing a connection to SQL Server. The server was not
found or was not accessible. Verify that the instance name is correct
and that SQL Server is configured to allow remote connections.
(provider: Named Pipes Provider, error: 40 - Could not open a
connection to SQL Server)
The rate of these failures varies. Sometimes it's one failed build for ten successful ones. Sometimes, I get five failed builds in a row.
The error occurs irrespective of whether the build server is connected to mymssqlserver.database.windows.net or mymssqlserver.database.secure.windows.net
Azure Resource health logs are telling me that the database does indeed go offline late at night or early in the morning every few days for about 5 to 10 minutes but these offline times do not overlap with the connectivity issues.
The error always occurs on remote, TFS hosted build agents, never on a local one.
Autoclose is turned off.
Your problem is probably related to the Firewall Settings of your SQL Database on Azure.
I would suggest to:
verify whether the Azure SQL Database Deployment task has the parameter "Specify Firewall Rule Using" set to "Auto-Detect"; this is a must when using Hosted Build Agent;
enable verbose logging on the Build Definition (by setting System.Debug to True in the variable section), run the Build and verify whether the Azure SQL Database Deployment task is successfully setting the Firewall Rules;
verify whether the on premise build agent machine is already enlisted as allowed client in the Firewall Settings of your SQL Database, this would be the reason the deployment always works on that 'local' build agent;
read the Troubleshoot section at SQL DB Firewall Configuration and spot any possible culprit of your setup.
The solution that worked for me was to change the network protocol from the default (named pipes) to TCP.
My connection string now looks like:
tcp:mymssqlserver.database.windows.net
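As a small illustration of the prefix convention, the helper below normalizes a server name so that the client is told to use TCP instead of trying named pipes. It is a sketch only: force_tcp is a hypothetical helper name, and the optional ",port" suffix follows the usual SqlClient "tcp:server,port" form.

```python
def force_tcp(server, port=None):
    """Return the server name with a 'tcp:' prefix and an optional ',port'."""
    # Leave the name alone if the protocol prefix is already present.
    name = server if server.lower().startswith("tcp:") else "tcp:" + server
    # Append the port only when one is given and none is present yet.
    if port is not None and "," not in name:
        name = f"{name},{port}"
    return name
```

For example, force_tcp("mymssqlserver.database.windows.net") yields the value shown above, and passing port=1433 appends ",1433".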

SQL Server: "a connection was successfully established with server....existing connection was forcibly closed by the remote host."

Yes folks, it's this one again.
"A connection was successfully established with the server, but then
an error occurred during the login process (provider: TCP Provider,
error: 0 - An existing connection was forcibly closed by the remote
host.)"
I'm sorry... I have Google'd this, I have read the other StackOverflow articles on this problem, and I have tried all kinds of suggestions, but nothing works.
Here's a few notes about what we're seeing.
This issue occurs occasionally in SQL Server Management Studio itself (doing any kind of database activity... getting a list of tables in a database, having a look at a Stored Procedure, etc)
It also happens in Visual Studio 2010 itself, when it is trying to get data from the servers (e.g. when creating a .dbml file, etc)
It also sometimes happens in our .Net (ASP, WPF, Silverlight) applications.
Our SQL Server 2005 & 2008 servers are all based on virtual machines in data centres around the world, and we sometimes see this error on each of them. But most of the time, they all work absolutely fine.
When the error does occur, we can just "retry" what caused the error, and then it'll work fine.
We think.. if we have an IIS Web Server in a data centre in a particular city, and it accesses a SQL Server in the same data centre, then we don't see the issue.
We think.. if we connect to the servers, and specify the UserID and Password to use, it causes this error much more frequently than if we just use Active Directory authentication.
Put all that together, and it sounds to me like some kind of network issue.
But can anyone suggest what to look for ?
This isn't a bug in our .Net applications, as even SQL Server Management Studio "trips up" with this error.
It's baffling us.
Just in case anyone else hits this issue, we finally found the solution.
Our company uses Riverbed software to compress data, when it's being passed between locations, and this was somehow causing some connections to get dropped.
Our IT gurus found a configuration setting which finally fixed this issue.
I believe there's a setting in there to turn off compressing results from SQL Server (or something like that). That fixed it for us.
It could be any number of network issues. ANYTHING that prevents the code from reaching the server even for the few milliseconds it takes to make one query.
It could also be the result of a failover. When we went from a single SQL Server to a clustered environment, we'd see this happen during a failover. In this case, it turned out to be our Connection Pooling. In essence, the SQL cluster has a controller and two servers behind it, A and B.
Say our web app is using server A just fine, Connection pooling creates a connection on both sides. The server is aware of it, and the web app is aware of it. Once the cluster fails over to the second server, the web app is aware of the connection but server B is not, so we get an error.
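The stale-connection situation described above can be sketched with a toy pool that checks a connection before handing it out. This is an illustration under stated assumptions, not how any particular driver's pool is implemented: FakeConnection and ValidatingPool are hypothetical classes, and "validity" is reduced to whether the node that created the connection is still the active one.

```python
class FakeConnection:
    """Hypothetical connection; `server` records which node created it."""

    def __init__(self, server):
        self.server = server

    def is_valid(self, active_server):
        # Only the node that created the connection knows about it.
        return self.server == active_server


class ValidatingPool:
    """Toy pool that discards connections the active node does not know."""

    def __init__(self):
        self._idle = []

    def release(self, conn):
        self._idle.append(conn)

    def acquire(self, active_server):
        # Drop stale pooled connections instead of returning them.
        while self._idle:
            conn = self._idle.pop()
            if conn.is_valid(active_server):
                return conn
        # Nothing usable in the pool: open a fresh connection.
        return FakeConnection(active_server)
```

After a failover from A to B, acquire("B") throws away the pooled connection created against A and opens a new one, rather than surfacing the error to the caller.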
The point is, any possible cause of network issues imaginable may be the cause. DOS attacks on the server, man-in-the-middle attacks intercepting and changing traffic. Someone trips on an ethernet cable and it's loose in the jack. You name it, if it can cause a connection issue, it could be the cause.
Your issue also sounds like one we had recently - we also have a virtual environment, with software that moves VMs from one host to another as needed for load balancing. Every so often, we'd get bombarded with the same error. It turned out to be an issue with the NIC drivers on one of the hosts, so whenever a VM moved to that particular host, errors would occur.
It's really not a programming issue. It's an environment issue, and you need trained professionals with direct access to your environment to research and resolve this.
My problem was that I was inadvertently using a wireless network to connect to our network because the Ethernet cable was faulty. This after repairing SQL Server, running a Winsock reset as recommended elsewhere ...
I am experiencing the same issue, and our app interfaces with several Azure SQL DBs. I believe (same as you) that there is no bug in the C# code causing this issue. We've solved it with a simple for loop that makes extra attempts to connect to Azure SQL if the previous attempt fails, and then runs the query.
Most of the time everything runs fine but sometimes we can see the loop kicked-in and on the 2nd or 3rd time it executed properly without the below mentioned error. After that we see in the log file the error below for all the unsuccessful attempts:
A connection was successfully established with the server,
but then an error occurred during the login process. (provider: TCP
Provider, error: 0 - An existing connection was forcibly closed by the
remote host.)
Even though this is a less-than-pretty solution, it allowed us to run our app without interruptions. I know you've mentioned that trying to connect again (to introduce some connection-failure tolerance) solves the problem, and unfortunately this is the only working solution I have found so far as well.
I should mention that we have tried many debugging strategies to figure this out. Right now it all points to the availability of the database we are trying to connect to i.e.: It happens if the number of allowed DB connections is exceeded. (or so it seems at this time)
Turn off your VPN.
My problem was fixed by turning off the VPN.
It was happening in our code when we were opening the DB connection for Oracle while passing the DB type as SQL in our database object.
In my case, the cause was Microsoft's first suggestion:
Client is connecting with an unsupported version of the SQL Server Native Client.
In our case, we got this error when we updated SQL Server to SP3. We were not able to connect to the database from an SSIS package.
We updated the native client and configurations. We were able to connect.
link to download the native client - https://www.microsoft.com/en-us/download/confirmation.aspx?id=50402
Link for configurations settings and further troubleshooting - https://learn.microsoft.com/en-us/previous-versions/sql/sql-server-2008-r2/ms187005(v=sql.105)
Hope it helps.
Cheers!
Had the same type of issue. In my case it was a bit more complicated... I could connect to “ServerA” from “ServerB” via SSMS, but it would fail with sqlcmd. The error was the same:
Sqlcmd: Error: Microsoft SQL Server Native Client 11.0 : TCP Provider: An existing connection was forcibly closed by the remote host.
I could also connect from “ServerC” with both SSMS and sqlcmd. The following are the versions on the VMs:
ServerA: Microsoft Windows Server 2012 R2 Datacenter / Microsoft SQL Server 2012 (SP3-CU10) (KB4025925) - 11.0.6607.3 (X64)
ServerB: Microsoft Windows Server 2012 R2 Datacenter / Microsoft SQL Server 2012 - 11.0.5058.0 (X64)
ServerC: Microsoft Windows Server 2012 R2 Datacenter / Microsoft SQL Server 2012 (SP3-CU10) (KB4025925) - 11.0.6607.3 (X64)
Bottom line was the “unsupported version”. I noticed a mismatch of “sqlncli11.dll” between ServerC and ServerB, so I copied it to the System32 folder. After this, sqlcmd worked like a charm. Below were the versions in my case:
Failed:
FileVersion: 2011.0110.5058.00
ProductVersion: 11.0.5058.0
Worked:
FileVersion: 2011.0110.6607.03
ProductVersion: 11.0.6607.3
I was working on 2 projects at the same time (on 2 different machines) and both used SQL Server.
When I disconnected SQL on one machine, the error message went away. You can probably also fix the problem by adjusting the IP addresses.
In my case I was seeing this error intermittently from a .Net application connecting to a SQL server located in the same server room. It turned out that some of the databases had auto close turned on which caused the server to close the connections in the pool. When the application tried to use one of the pool connections that had been closed, it would throw this error and I saw a log message on the SQL server that the database it was trying to connect to was being started. Auto-close has now been turned off on those databases and the error hasn't been seen since.
Also, having auto-close on is the default behavior for SQL Express databases and these were originally created on an Express instance during testing before being migrated to the production server where we were seeing the errors.
This answer is for those who have this problem with an Azure SQL Server database.
It happens when you reach the max pool size.
First, remove Persist Security Info=False from the connection string.
Second, check your database plan in the Azure portal and increase the DTUs of your database plan.
In the SSMS "Connect to Server" screen, click Options, then on the "Connection Properties" tab change "Network protocol" to "Named Pipes".
Try this -
Click Start, point to All Programs, and click SQL Server Configuration Manager.
Click to expand SQL Server Network Configuration and then click Client Protocols.
Right-click the TCP/IP protocol and then click Enable.
Right-click the Named Pipes protocol and then click Enable.
Restart the SQL server service if prompted to do so.
I have had this issue a couple of times already, and I've fixed it by reducing the MTU size on my network interface, often to 1350, 1250, etc.

Resources