I've recently moved a classic ASP site from a single-server setup running IIS6 (Windows Server 2003) and SQL Server 2005 to a Hyper-V setup: Windows Server 2012 on the host, with two VMs on a single physical machine.
Here is a diagram of the current setup:
My problem is that I am getting the following error intermittently:
Named Pipes Provider: Could not open a connection to SQL Server [53].
I've been told, and was able to confirm, that the web-to-DB traffic never uses the physical NIC, so that should rule out any issues with the NIC or its drivers/configuration.
I've also made sure that there are no IP conflicts (the host and VM IPs are all different).
The only pattern I can detect is that it seems more likely to happen during peak periods. The odd thing is it can go 7 days without an error, and then on a single day, the error will happen on 50-100 requests, often within the same 30 seconds, or in groups of 30-second intervals.
I've been trying to figure this out for weeks -- since migrating to the new server over 3 weeks ago. If no one here can help, my last resort is to open a ticket with Microsoft. However, I'm not optimistic they will be able to help as I'm not able to reproduce it.
As a last resort, I'm considering moving them back to a single instance, which I'm trying my best to avoid.
Update:
Here is the connection string I'm using:
Provider=SQLNCLI11;Server=[my DB VM IP address];Integrated Security=SSPI;
I have a bunch of legacy Access-based databases that I've been using for years without issue; queries have been running between them for years using ODBC/DAO/ADO. Now, suddenly, in the last few days, I've started getting the "The database has been placed in a state by user...." error on a bunch of them.
I have tried to narrow the problem down, but it seems to be getting worse. I made a local copy of the database file, opened it, and then tried to create an ODBC connection to it on the same machine, and got the error. I have also tried running successive queries against that local copy, where mine is the only connection: connect to the database, run a query, close the connection, wait 2 minutes, then try to open a new connection. It fails, so it is definitely not a multi-user limit problem or anything like that.
The issue is consistent across multiple platforms: directly in MS Access (2010 and 2013), with Excel (2010 and 2013) queries to the Access DB, and with Windows Forms VB.NET applications querying the Access DB (through datasets, OLEDB, and ADO).
Until this week all of these applications were working as designed, and had been for years. I am the only dev working on this stuff, so I know that nothing in the programming has changed; it must be an external issue.
The back-end databases reside on a shared server drive (the server is running Windows Server 2008), and we have had no other connection issues with the server or network; the problem is limited to connections to Access database files.
Does anyone know if something has changed lately (in the last week or so) with the ODBC drivers? Maybe an MS update?
Thanks in advance!
It seems that you can fix this issue by buffering the Access binary. Use the Binary.Buffer function in a query that defines your Access database, then reference that query from each query that pulls a table. Note: I also define parameters for my folder path and file names.
For example:
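// myDbBinary
let
    // Buffer the file contents in memory so later reads use a stable copy.
    // Binary.Buffer takes only the binary; the options record belongs to Access.Database.
    Source = Binary.Buffer(File.Contents(DataFolder_param & FileName_param))
in
    Source
// Table1 Query
let
    // Open the buffered binary as an Access database
    Source = Access.Database(myDbBinary, [CreateNavigationProperties=true]),
    _Table1 = Source{[Schema="",Item="Table1"]}[Data]
in
    _Table1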
The setup is this:
Machine 1
Windows Server 2008
SQL Server 2008
The database. Contains all the information our sites use.
Machine 2
Windows Server 2012
IIS 8
The webserver. Uses IIS to host two sites:
Production site: (default) Has the most up-to-date UI and features
Backup site: Older UI, but still using the latest data from Machine 1
Here's how it works:
User goes to one of the sites hosted on Machine 2 and enters their company information
Machine 1 is queried for that company's connection string.
The site uses the connection string to connect to the correct database on Machine 1.
The problem is that about 1/3 of the connection strings use the network name (e.g. "Data Source='Machine1';") while the other 2/3 use the IP address (e.g. "Data Source=192.168.1.200;"). When connecting via the Production site, a timeout occurs if the site uses a connection string with a network name. However, if the same user, using the same credentials, logs in to the Backup site, everything works fine regardless of which 'Data Source' is used.
I created a simple PowerShell script to test the connection from Machine 2; network names and IP addresses both work, which makes me suspect it is an IIS or web.config issue. I've gone through both extensively, and these are the only differences I've noted:
Different Application Pools in IIS: However, when I ran "Get-CimInstance Win32_Process" it showed both instances of w3wp.exe had been started with the same command and arguments (with the exception of different pipes).
Slightly different web.config: The Backup site has an entirely self-contained web.config, while the Production one stores its connection strings in a separate file.
Been banging my head against this for several days. I'm very limited in the steps I can take, considering this is a production website and database. Any advice is appreciated.
Try putting the network library in the connection string to force TCP.
See connectionstrings.com/define-sql-server-network-protocol
;Network Library=DBMSSOCN;
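As a rough illustration of the suggestion above, here is a minimal Python sketch that appends the Network Library=DBMSSOCN fragment to an existing connection string. The helper name force_tcp is hypothetical, and the naive split on ";" assumes no value in the string contains a semicolon. It treats "Network" and "Net" as synonyms of "Network Library", which is my understanding of the common SQL Server keywords; check that against the provider you actually use.

```python
def force_tcp(connection_string: str) -> str:
    """Return the connection string with Network Library=DBMSSOCN set exactly once."""
    # Naive parsing: assumes no value contains a ';' character.
    parts = [p.strip() for p in connection_string.split(";") if p.strip()]
    # Keywords assumed to select the network library (synonyms may vary by provider).
    network_keys = {"network library", "network", "net"}
    kept = [p for p in parts if p.split("=", 1)[0].strip().lower() not in network_keys]
    kept.append("Network Library=DBMSSOCN")  # DBMSSOCN selects TCP/IP
    return ";".join(kept) + ";"


# Example using the machine name mentioned in the question.
print(force_tcp("Data Source=Machine1;Integrated Security=SSPI;"))
```

Replacing an existing setting instead of blindly appending keeps the keyword from appearing twice, which matters if some stored connection strings already specify a protocol.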
PS
Yep. Been there, done that. 4 days of "on site" client visit, and it was the protocol. That's how I learned to force it via the connection string. You can also try this:
Create a (temporary) System DSN (ODBC in Control Panel) with a weird name like "peanutbutter". There is a client connection button in there somewhere. Force it to TCP. Then search your registry for "peanutbutter" and find out how the network library gets stored.
A picture is worth a thousand words. See left side of image below. (a random image from the old interweb)
Not sure if this is the correct forum for this, but here goes.
I'm looking for any suggestions as to what I can try to resolve this...
I have an Access 2003 front end (on each client) with a SQL 2008 database. I've gone round each user and set up the ODBC connection on each PC.
For most users it's fine and has been working well for a year, but for a few, every now and then when running a query (either an update or a select when opening a form), the SQL connection seems to have been dropped and they can't go any further.
I can't think of any glaring difference between those who have it working and those who don't.
Any ideas where I should start with this?
thanks
I've had such cases before: Access frontend, SQL Server backend. On one or some of the customer's PCs, the connection suddenly drops (throwing some ODBC or SQL Server connection error). Happens randomly and rarely (e.g. once per hour/day/week), and the Access application needs to be restarted to continue working.
In all of these cases, one of the following was the culprit:
Broken network cable
Broken network card
Buggy network card driver
Unstable network protocol (yes, this one was in the old days of NetBIOS)
The thing is: Access is extremely sensitive to network errors. A simple glitch in the network, a few seconds of lost connectivity -- something which you won't even notice with other applications -- will cause an Access frontend application to lose its database connection and crash horribly. It's very frustrating, because the customer will say "I don't experience any network trouble with Word/Windows Explorer/etc., so my network is fine, and it's your application that's broken." It's not true. If Access experiences sporadic and unpredictable network errors, it's usually really a network problem.
So, the first thing I'd do is to replace (a) the network card, (b) the network cable and (c) use another switch port for one of the machines experiencing problems. If the problems are gone on that machine, you know that one of these components was the faulty one.
Context: The Cloud
We have a java-based web application that we normally host on our own servers. Recently we used Amazon Web Services (AWS EC2) cloud to host an instance.
This "cloud setup" matches our typical "on site" setup: one server for the app server, another server for the database server. (Several app servers point to the same database server)
The problem
In this cloud setup, we receive intermittent "connection reset by peer" errors between the database and the JDBC driver: at (seemingly) random intervals and at random points in the codebase, the database connection fails.
Here are a few error excerpts from the log:
Stack Trace Example 1:
at com.participate.pe.genericdisplay.client.taglib.GenDisplayViewTag.doStartTag(GenDisplayViewTag.java:77)
... 75 more
Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: The connection is closed.
at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDriverError(SQLServerException.java:170)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.checkClosed(SQLServerConnection.java:304)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.getMetaData(SQLServerConnection.java:1734)
at org.jboss.resource.adapter.jdbc.WrappedConnection.getMetaData(WrappedConnection.java:354)
Stack Trace Example 2
at java.lang.Thread.run(Thread.java:619)
Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: Connection reset
at com.microsoft.sqlserver.jdbc.SQLServerConnection.terminate(SQLServerConnection.java:1368)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.terminate(SQLServerConnection.java:1355)
at com.microsoft.sqlserver.jdbc.TDSChannel.read(IOBuffer.java:1532)
at com.microsoft.sqlserver.jdbc.TDSReader.readPacket(IOBuffer.java:3274)
at com.microsoft.sqlserver.jdbc.TDSCommand.startResponse(IOBuffer.java:4437)
at com.microsoft.sqlserver.jdbc.TDSCommand.startResponse(IOBuffer.java:4389)
at com.microsoft.sqlserver.jdbc.SQLServerConnection$1ConnectionCommand.doExecute(SQLServerConnection.java:1457)
at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:4026)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:1416)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.connectionCommand(SQLServerConnection.java:1462)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.setAutoCommit(SQLServerConnection.java:1610)
at org.jboss.resource.adapter.jdbc.BaseWrapperManagedConnection.checkTransaction(BaseWrapperManagedConnection.java:429)
Technical Environment
Jboss 4.2.2.GA (Jboss-Web 2.0/ Tomcat 6)
MSSQL 2005 2.0 jdbc driver
Some points
We have never seen this problem in our own environment (i.e. our own data centers), where we have been running the application for several years.
This led me to conclude "something funny is going on with Amazon network environment". I may be wrong/missing something/etc.
This problem only occurs with our application. We have other Java and PHP applications which have not had this problem. The other Java application uses a different JDBC driver (jTDS, afaik).
It doesn't seem like a simple connection timeout
Questions
-Has anyone seen this before?
-If it's an EC2 "known issue", can we configure our way around the problem (i.e. make sure everything is on its own subnet or virtual private cloud (VPC))?
-Are there any JDBC driver settings to get past this problem?
** Update **
I've extended and increased the bounty on this question.
One extra bit of information: the two virtual servers (database and application server) were on different subnets, i.e. one hop between the two servers.
In a non-cloud environment we have "zero hops" between the two servers.
Our hosting admins said we had no control over the subnets of our EC2 instances. This made me wonder if virtual private cloud would help.
thanks in advance
will
Not sure if this is related or not. We experienced something similar with an app that we were running in the EC2 environment. Same symptom, that the database connection would intermittently close. We were using MSSQL 1.2 driver. Also, we would see the errors usually after a delay or idle time with the connection. Our assumption (never proven) was that something in the network layer was closing the connection and the client wasn't detecting it, so it became stale.
We were able to work around it because we were using commons connection pools, and had the pool recreate the connection on failure. We eventually moved the application out of EC2 and didn't see the issue again.
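The workaround described above (a pool that replaces a dead connection rather than handing it out) can be sketched roughly as follows. This is not how commons connection pools are implemented; ValidatingPool, FakeConnection, and the is_alive callback are hypothetical names used only for illustration, and a real check would be driver-specific.

```python
import queue


class ValidatingPool:
    """Tiny connection pool that checks a connection before handing it out."""

    def __init__(self, connect, is_alive, size=2):
        self._connect = connect      # callable that opens a new connection
        self._is_alive = is_alive    # callable returning False for a stale connection
        self._idle = queue.LifoQueue()
        for _ in range(size):
            self._idle.put(connect())

    def borrow(self):
        """Return a live connection, discarding any stale idle ones."""
        while True:
            try:
                conn = self._idle.get_nowait()
            except queue.Empty:
                return self._connect()   # nothing usable left: open a fresh one
            if self._is_alive(conn):
                return conn
            # Stale connection: drop it and try the next idle one.

    def give_back(self, conn):
        self._idle.put(conn)


class FakeConnection:
    """Stand-in for a real driver connection (hypothetical, demo only)."""

    def __init__(self):
        self.open = True


pool = ValidatingPool(connect=FakeConnection, is_alive=lambda c: c.open, size=2)
first = pool.borrow()
first.open = False        # simulate the network silently dropping the connection
pool.give_back(first)
second = pool.borrow()    # the stale connection is skipped
print(second is not first, second.open)
```

The liveness check runs on every borrow, so its cost is paid on every request; a cheap check keeps that overhead small.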
Just a word of caution on using DBCP/connection pool features to mitigate the issue: the more you enable 'testOnBorrow' and similar features, the more latency or other performance-changing effects you can introduce on the system. I don't know if DBCP still does this or not, but a few years ago it would generate actual test queries to test the connection (full stack, database responses), not just a check at the network layer. The above link from Brian brings back horrific memories from the early 2000s surrounding retry logic for JDBC connection management.
Anyway, it's tough to really root cause this, other than gather evidence and eliminate the 'seemingly random' to a specific set of conditions:
You could try to throw up a Wireshark/PCAP trace, find when it happens, and send the results to both Amazon and Microsoft to see if they can root cause it
You could try the above with certain test harnesses to isolate the problem (JMeter tests to get concurrency up), bounce the network connection, watch for recovery, etc
You could try alternative versions of SQL Server to discount a SQL Server/JDBC driver bug that has since been fixed.
If DNS names are used in connection strings, you could switch to IP addresses to rule out name-resolution (nslookup) issues
I'm not a SQL Server expert, but another route for research could be within the related products domain - e.g. see if anyone experienced similar issues with TFS/Sharepoint (e.g. such as http://nickhoggard.wordpress.com/2009/12/07/further-experiences-with-tfs-2010-beta-2-on-amazon-ec2/ )
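For the name-resolution item in the list above, a small Python sketch like the following could help gather evidence by timing how long a host name takes to resolve and recording any failure. The function name time_resolution and the injectable resolver and clock parameters are my own choices so the check can be tried without a network; by default it uses the standard socket.getaddrinfo.

```python
import socket
import time


def time_resolution(host, resolver=socket.getaddrinfo, clock=time.perf_counter):
    """Resolve a host name and report addresses, elapsed seconds, and any error."""
    start = clock()
    try:
        infos = resolver(host, None)
        # Each getaddrinfo entry ends with a sockaddr tuple whose first item is the address.
        addresses = sorted({info[4][0] for info in infos})
        error = None
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        addresses, error = [], str(exc)
    return {"host": host, "addresses": addresses,
            "seconds": clock() - start, "error": error}


if __name__ == "__main__":
    # "localhost" is only a placeholder; substitute the name from your connection string.
    print(time_resolution("localhost"))
```

Running it in a loop during a bad period and comparing slow or failed lookups against the connection errors would show whether name resolution is involved.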
I have seen this issue in both the EC2 environment and the Windows Azure environment. I think connection retry logic needs to be a standard part of your design when working in a distributed computing environment.
This article is for SQL Azure - but I think it equally applies to EC2 and all drivers.
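A minimal sketch of connection retry logic, in Python and independent of any particular driver, might look like this. The names with_retry and is_transient are hypothetical, the backoff values are arbitrary, and which exceptions count as transient is an assumption you would need to adapt to your driver.

```python
import time


def with_retry(operation, is_transient, attempts=4, base_delay=0.5, sleep=time.sleep):
    """Run operation(), retrying transient failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            return operation()
        except Exception as exc:
            # Give up on the last attempt or on errors that are not transient.
            if attempt == attempts - 1 or not is_transient(exc):
                raise
            sleep(base_delay * (2 ** attempt))  # 0.5 s, 1 s, 2 s, ...


def looks_transient(exc):
    # Assumption: connection-level errors are worth retrying.
    return isinstance(exc, ConnectionError)
```

Retrying is only safe when the operation can be repeated without side effects (for example, a read, or a write wrapped in a transaction that was rolled back), so the caller should reopen the connection inside operation and keep the retried unit small.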
I can also confirm that this happens and will spin up a lower priority investigation since it's not production critical.
Our production servers are in our data center. We use developer laptops to run our applications. Neither of these has had this issue since we configured the c3p0 connection pool timeouts and test period (see article: http://www.codefin.net/2007/05/hibernate-and-mysql-connection-timeouts.html).
However, we do have a development staging server that is in EC2 and it does indeed happen there. If I find something that seems to work, I'll ping back. Also, I'm using MySQL. I see that you are using MS SQL Server, so the problem spans database vendors.
Here is the full error: SqlException: A transport-level error has occurred when receiving results from the server. (provider: Shared Memory Provider, error: 1 - I/O Error detected in read/write operation)
I've started seeing this message intermittently for a few of the unit tests in my application (there are over 1100 unit & system tests). I'm using the test runner in ReSharper 4.1.
One other thing: my development machine is a VMWare virtual machine.
I ran into this many moons ago. Bottom line is you are running out of available ports.
First make sure your calling application has connection pooling on.
If it does, then check the number of available ports for the SQL Server.
What is happening is that if pooling is off then every call takes a port and it takes by default 4 minutes to have the port expire, and you are running out of ports.
If pooling is on then you need to profile all the ports of SQL Server and make sure you have enough and expand them if necessary.
When I came across this error, connection pooling was off and it caused this issue whenever a decent load was put on the website. We did not see it in development because the load was 2 or 3 people at max, but once the number grew over 10 we kept seeing this error. We turned pooling on, and it fixed it.
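The arithmetic behind this explanation can be sketched in a few lines of Python. The 4-minute expiry comes from the answer above; the port range of 1025 to 5000 is an assumption about older Windows defaults and should be checked on your own system.

```python
def max_sustained_connects_per_sec(ephemeral_ports, time_wait_seconds):
    """Upper bound on new connections per second before ports run out.

    Without pooling, every call uses a fresh port that stays unavailable
    for time_wait_seconds after the connection closes.
    """
    return ephemeral_ports / time_wait_seconds


# Assumed port range (older Windows default); verify on your OS.
ports = 5000 - 1025 + 1          # 3976 usable ports
expiry = 4 * 60                  # 4 minutes, as described above
print(round(max_sustained_connects_per_sec(ports, expiry), 1), "connections/sec")
```

Under those assumptions the ceiling is well under twenty new connections per second, which fits the observation that the error only appeared once real load arrived and disappeared when pooling let calls reuse existing connections.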
I ran into this many moons ago as well. However, not to discount @Longhorn213's explanation, but we had the exact opposite behavior. We received the error in development and testing, but not in production, where obviously the load was much greater. We ended up tolerating the issue in development as it was sporadic and didn't materially slow down progress. I think there could be several reasons for this error, but I was never able to pinpoint the cause myself.
We've also run across this error and figured out that we were killing a SQL Server connection from the database server. The client application is under the impression that the connection is still active and tries to make use of that connection, but fails because it was terminated.
We saw this in our environment, and traced part of it down to the "NOLOCK" hint in our queries. We removed the NOLOCK hint and set our servers to use Snapshot Isolation mode, and the frequency of these errors was reduced quite a bit.
We have seen this error a few times and tried different resolutions with varying success. One common underlying theme has been that the system giving the error was running low on memory. This is especially true if the server that is hosting SQL Server is running ANY other non-OS process. By default SQL Server will grab any memory that it can, leaving little for other processes/drivers. This can cause erratic behavior and intermittent messages. It is good practice to configure your SQL Server with a maximum memory that leaves some headroom if there are other processes that might need it. Example: Visual Studio on a dev machine that is running a copy of SQL Server Developer Edition on the same machine.
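As a rough sketch of the headroom idea above, the following Python helper computes a memory cap and prints the T-SQL that would apply it through sp_configure's 'max server memory (MB)' option. The helper name and the example figures (16 GB of RAM, 4 GB reserved) are assumptions for illustration, not a sizing recommendation.

```python
def max_memory_tsql(total_ram_mb, reserve_mb):
    """Build T-SQL that caps SQL Server memory, leaving reserve_mb for everything else."""
    if reserve_mb >= total_ram_mb:
        raise ValueError("reserve must be smaller than total RAM")
    limit = total_ram_mb - reserve_mb
    return "\n".join([
        # 'max server memory (MB)' is an advanced option, so expose those first.
        "EXEC sp_configure 'show advanced options', 1;",
        "RECONFIGURE;",
        f"EXEC sp_configure 'max server memory (MB)', {limit};",
        "RECONFIGURE;",
    ])


# Hypothetical dev machine: 16 GB total, 4 GB kept free for other processes.
print(max_memory_tsql(16384, 4096))
```

How much to reserve depends on what else runs on the machine, so treat the numbers as a starting point to adjust after watching actual memory use.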