We have a very strange intermittent issue which has started coming up over the last month or so whereby some connections to mssql server fail with the error:
System.Data.SqlClient.SqlException: A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: Named Pipes Provider, error: 40 - Could not open a connection to SQL Server)
The error does not bring down the site, nor does it require a db restart - if you simply rerun the same query will work the second time. This means a lot of users will hit an error every now and then and have to refresh the error page for things to work.
Now, my initial knee-jerk reaction was this could be due to:
Resource related issue - so I started running SQL profiler and perfmon, but did not find any issues with the serve struggling to keep up with the number of connections / sec. I've been looking at MSSQL:SQL Errors, MSSQL:Wait Statistics, MSSQL:Exec Statistics, MSSQL:Locks. Does anyone have any guidance on other stats I should be poking and prodding here?
Unclosed DB connections - I ruled this one out after going through all the data-tier code. We have all the fail safes in place to stop this from happening.
Connection / Network related issue: our SQL server sits on a separate server (MS SQL Server Standard 2008) to our application server (running ASP.Net on IIS7) - both servers run on xlarge Amazon EC2 instances with all security policies configured (as per Amazons direction). Anyone got guidance on how to test the connectivity between the two servers or if this could be the issue?
Is it a possible issue with the IIS connection string? I have not tested this but should we be fully qualifying the server with the computer name we are connecting to (just thought of it)? We use a connection string in the format: server=xxxxx;Database=xxxx;uid=xxxx;password=xxx;
Your thoughts and insight is very much appreciated!
Thanks in advance
Solved. After testing almost every possible performance metric and examining every piece of code, I discovered that the error was caused by a bit of deprecated database code. The main issue was being caused by code using:
SqlConnection.ClearPools;
For future reference, any other developers looking to debug their code and manage connection pools, an excellent resource can be found here: http://www.codeproject.com/KB/dotnet/ADONET_ConnectionPooling.aspx
Try changing the connection string to the FQDN+port
server=xxxxx.domain.tld,1234;
Note: you don't need any instance name if you use port
On our global corporate intranet... we had a similar issue that happened to remote clients: more often if they were further away, never in the same building as the server.
After some poking around, chatting to the DBAs and MS, it was said to be caused by timing/Kerberos/too many firewalls etc. Adding FQDN+port removed all our issues.
It may be solved by switching to TCP/IP instead of Named Pipes, if you can.
Perhaps you can test this by changing the server name to the server IP address.
I use server=tcp:servername in my connection string to force TCP.
KB313295
It seems like connection are not being closed correctly, and after some time you can't open any more new connections. As the total allowed connections to database is a constant digit.
If you are using C#/VB.net
Are you using "Using" statements to open the connections ?
using (System.Data.SqlClient.SqlConnection con = new SqlConnection("YourConnection string"))
{
con.Open();
}
Related
We discovered very strange behavior with Entity Framework last night. It was probably the result of my misguided approach to code first against an existing database, I solved it by turning off the initializer.
Database.SetInitializer<DataContext>(null);
But I would still like to know how such behavior was possible and if it's something we need to be worried about for future EF deployments. Maybe EF code first is not stable across all environments.
We have an existing database that supports the functionality of an existing legacy app. I added new functionality to the app with "code first" and ended up needing to share the same database. So we created the new table with a SQL script and turned on EF Migrations. This all worked fine for me locally, and also worked against the SQL Server instance on my colleague's machine.
However when he tried to run it all from his machine, against the same database, the EF call always timed out with the error:
A network-related or instance-specific error occurred while
establishing a connection to SQL Server. The server was not found or
was not accessible. Verify that the instance name is correct and that
SQL Server is configured to allow remote connections. (provider: Named
Pipes Provider, error: 40 - Could not open a connection to SQL Server
We even hard coded a different connection string to see if it would build its own separate database. Same error.
We finally figured out it was the Initializer because while inspecting the DbContext at a break point in the debugger, we triggered the initializer. If we paused execution and expanded the DbContext in a watch window, the database would initialize and the process would continue without error!
It's also worth noting that things were working on his machine until EF noticed model changes. That's when we enabled migrations and updated the table via script and permanently broke the Initializer on his machine.
This is one of the most bizarre things I've experienced. What could cause the EF initializer to hang and throw false network errors about an otherwise functional database...but then work from the debugger? And only on one machine!
Context: The Cloud
We have a java-based web application that we normally host on our own servers. Recently we used Amazon Web Services (AWS EC2) cloud to host an instance.
This "cloud setup" matches our typical "on site" setup: one server for the app server, another server for the database server. (Several app servers point to the same database server)
The problem
In this cloud setup, we receive intermittent "connection reset by peer errors" between the database and the jdbc driver, where at (seemingly) random intervals and at random points in the codebase, the database connection fails.
Here are a few error excerpts for the log
Stack Trace Example 1:
at com.participate.pe.genericdisplay.client.taglib.GenDisplayViewTag.doStartTag(GenDisplayViewTag.java:77)
... 75 more
Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: The connection is closed.
at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDriverError(SQLServerException.java:170)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.checkClosed(SQLServerConnection.java:304)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.getMetaData(SQLServerConnection.java:1734)
at org.jboss.resource.adapter.jdbc.WrappedConnection.getMetaData(WrappedConnection.java:354)
Stack Trace Example 2
at java.lang.Thread.run(Thread.java:619)
Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: Connection reset
at com.microsoft.sqlserver.jdbc.SQLServerConnection.terminate(SQLServerConnection.java:1368)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.terminate(SQLServerConnection.java:1355)
at com.microsoft.sqlserver.jdbc.TDSChannel.read(IOBuffer.java:1532)
at com.microsoft.sqlserver.jdbc.TDSReader.readPacket(IOBuffer.java:3274)
at com.microsoft.sqlserver.jdbc.TDSCommand.startResponse(IOBuffer.java:4437)
at com.microsoft.sqlserver.jdbc.TDSCommand.startResponse(IOBuffer.java:4389)
at com.microsoft.sqlserver.jdbc.SQLServerConnection$1ConnectionCommand.doExecute(SQLServerConnection.java:1457)
at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:4026)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:1416)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectionCommand(SQLServerConnection.java:1462)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.setAutoCommit(SQLServerConnection.java:1610)
at org.jboss.resource.adapter.jdbc.BaseWrapperManagedConnection.checkTransaction(BaseWrapperManagedConnection.java:429)
Technical Environment
Jboss 4.2.2.GA (Jboss-Web 2.0/ Tomcat 6)
MSSQL 2005 2.0 jdbc driver
Some points
We have never seen this problem in
our own environment (i.e. own data centers) running the application for several years
This led me to conclude "something funny is going on with Amazon network environment". I may be wrong/missing something/etc.
This problem only occurs with our application. We have other java and php applications which have not had this problem. The other java application uses a different jdbc driver (jtds, afaik)
It doesn't seem like a simple connection timeout
Questions
-Has anyone seen this before?
-If it's an EC2 "known issue", can we configure our way around the problem (i.e. make sure everything is on its own subnet or virtual private cloud (vpc) ?
-Any jdbc driver settings to get past this problem?
** Update **
I've extended and increased the bounty on this question.
On extra bit of information: the two virtual servers (database and application server) were on different subnets--i.e. one hop between the two servers.
In a non-cloud environment we have "zero hops" bewtewn the two servers.
Our hosting admins said we had no control over the subnets of our EC2 instances. This made me wonder if virtual private cloud would help.
thanks in advance
will
Not sure if this is related or not. We experienced something similar with an app that we were running in the EC2 environment. Same symptom, that the database connection would intermittently close. We were using MSSQL 1.2 driver. Also, we would see the errors usually after a delay or idle time with the connection. Our assumption (never proven) was that something in the network layer was closing the connection and the client wasn't detecting it, so it became stale.
We were able to work around it because we were using commons connection pools, and had the pool recreate the connection on failure. We eventually moved the application out of EC2 and didn't see the issue again.
Just a word of caution on usind DBCP/connection pool features to mitigate the issue - the more you enable 'testOnBorrow' and other features, the more you can introduce latency or other performance changing affects on the system. I don't know if DBCP still does this or not, but a few years ago it would generate actual test queries to test the connection - full stack, database responses - not just at the network layer. The above link from Brian brings back horrific memories from the early 2000s on surrounding re-try logic for JDBC connection management.
Anyway, it's tough to really root cause this, other than gather evidence and eliminate the 'seemingly random' to a specific set of conditions:
You could try to throw up a Wireshark/PCAP trace, find when it happens, and send the results to both Amazon and Microsoft to see if they can root cause it
You could try the above with certain test harnesses to isolate the problem (JMeter tests to get concurrency up), bounce the network connection, watch for recovery, etc
You could try alternative versions of SQL Server to discount a SQL Server/JDBC driver bug that has since been fixed.
If DNS is used in connection strings, could use IP addresses to validate nslookup issues
I'm not a SQL Server expert, but another route for research could be within the related products domain - e.g. see if anyone experienced similar issues with TFS/Sharepoint (e.g. such as http://nickhoggard.wordpress.com/2009/12/07/further-experiences-with-tfs-2010-beta-2-on-amazon-ec2/ )
I have seen this issue in both the EC2 environment and the Windows Azure environment. I think connection retry logic needs to be a standard part of your design when working in a distributed computing environment.
This article is for SQL Azure - but I think it equally applies to EC2 and all drivers.
I can also confirm that this happens and will spin up a lower priority investigation since it's not production critical.
Our production servers are in our data center. We use developer laptops to run our applications. Neither of these get this issue once we configured c3p0 connection pool timeouts and test period (see article: http://www.codefin.net/2007/05/hibernate-and-mysql-connection-timeouts.html).
However, we do have a development staging server that is in EC2 and it does indeed happen there. If I find something that seems to work, I'll ping back. Also, I'm using mysql. I see that you are using MS SQL Server so it is across database vendors.
We have some scripts that we run as part of our unit tests.
This worked fine until today.
We have tried running scripts with both windows and sql authentication.
We have no problems logging in using sql manager
Anybody have any ideas why we get the following error:
Shared Memory Provider: No process is on the other end of the pipe.
Sqlcmd: Error: Microsoft SQL Native Client : Communication link failure.
Thanks
Shiraz
EDIT
Thanks for the replys. The actual appears to be a password problem, which used up all the connections. The process was not listening because there were no available connections.
Look in SQL Server Configuration Manager and make sure the protocols you are using to connect to it are setup correctly. I suggest you enable "Shared Memory" and "TCP/IP".
Ask around, try and determine what was changed on your environment--by who, and how--that caused a working process to stop working. If succesful, you will (a) have a strong lead on discovering the details of what's going wrong, and (b) be in a position to ensure it doesn't recur. (Just solving the tech side might not prevent it from happening again...)
Today we had a lot more activity than normal between our Ruby on Rails application and our remote legacy SQL Server 2005 database, and we started getting the error below intermittently. What is is? How can I prevent it (besides avoiding the situation, which we're working on)?
Error Message:
ActiveRecord::StatementInvalid: DBI::DatabaseError: 08S01 (20020) [unixODBC][FreeTDS][SQL Server]
Bad token from the server: Datastream processing out of sync: SELECT * FROM [marketing] WHERE ([marketing].[contact_id] = 832085)
You need to use some sort of connection pooling.
Windows itself (including Windows Server X) will only allow a certain number of socket connections to be created in a given time frame, even if you close them. All others will fail after that.
A connection pool will keep the same sockets open, avoiding the problem. Also, new connections are real slow.
This Microsoft article says:
Often caused by an abruptly terminated network connection, which causes a damaged Tabular Data Stream token to be read by the client.
Was the server network bound? I have no experience with SQLÂ Server, but does it have a limit on the number of connections you can make?
Add the following to your script; the first statement you run:
SET NO_BROWSETABLE OFF
Here is the full error: SqlException: A transport-level error has occurred when receiving results from the server. (provider: Shared Memory Provider, error: 1 - I/O Error detected in read/write operation)
I've started seeing this message intermittently for a few of the unit tests in my application (there are over 1100 unit & system tests). I'm using the test runner in ReSharper 4.1.
One other thing: my development machine is a VMWare virtual machine.
I ran into this many moons ago. Bottom line is you are running out of available ports.
First make sure your calling application has connection pooling on.
If that does then check the number of available ports for the SQL Server.
What is happening is that if pooling is off then every call takes a port and it takes by default 4 minutes to have the port expire, and you are running out of ports.
If pooling is on then you need to profile all the ports of SQL Server and make sure you have enough and expand them if necessary.
When I came across this error, connection pooling was off and it caused this issue whenever a decent load was put on the website. We did not see it in development because the load was 2 or 3 people at max, but once the number grew over 10 we kept seeing this error. We turned pooling on, and it fixed it.
I ran into this many moons ago as well. However, not to discount #Longhorn213s explanation, but we had the exact opposite behavior. We received the error in development and testing, but not production where obviously the load was much greater. We ended up tolerating the issue in development as it was sporadic and didn't materially slow down progress. I think there could be several reasons for this error, but was never able to pin point the cause myself.
We've also run across this error and figured out that we were killing a SQL server connection from the database server. The client application is under the impression that the connection is still active and tries make use of that connection, but fails because it was terminated.
We saw this in our environment, and traced part of it down to the "NOLOCK" hint in our queries. We removed the NOLOCK hint and set our servers to use Snapshot Isolation mode, and the frequency of these errors was reduced quite a bit.
We have seen this error a few times and tried different resolutions with varying success. One common underlying theme has been that the system giving the error was running low on memory. This is especially true if the server that is hosting Sql Server is running ANY other non-OS process. By default SQL Server will grab any memory that it can, then if leaving little for other processes/drivers. This can cause erratic behavior and intermittent messages. It is good practice to configure your SQL Server for a maximum memory that leaves some headroom is there are other processes that might need it. Example: Visual Studio on a dev machine that is running a copy of SQL Server developers edition on the same machine.