As part of debugging some other issues on our server I noticed some really odd behavior with respect to connections that I'm hoping to understand.
I have two Go servers, one of which talks to a SQL Server RDS instance, and another that talks to a managed SQL Server instance in Azure.
I believe there is a slight difference in the way the 2 backends work - RDS has a single port (1433) on which the client authenticates and subsequently establishes the connection. Azure SQL seems to authenticate on port 1433 and then redirect the client to another service that actually handles the connections.
In both cases I've got substantial load running against the servers: at least 500 requests/s, with peaks of about 2k req/s. Each of these requests results in a SELECT query that returns a single row via a primary key lookup, so each connection is only used very briefly. The average time per query is 50-80ms on both, with p95 in the 100-150ms range.
Behavior I'm trying to understand:
I'm using Go's database/sql package with a SQL Server driver (specifically go-mssqldb).
I've set Max Idle connections and Max Open connections to 64.
What I would expect: 64 long running Established connections that are occasionally idle but quickly reused.
What I'm seeing: Generally 64 Established connections, with the number often dropping down to somewhere between 50 and 64. This also results in 200-400 connections in the TIME_WAIT state at any given time.
What could be causing this behavior? Is it just that the Go driver lazily closes connections? If so, why would the number drop below 64?
I'm happy to provide any more details!
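One way to narrow this down is to read the pool counters that database/sql exposes through db.Stats(). The sketch below is only an illustration: stubConn, stubConnector, stubDriver and poolChurn are hypothetical names, and the stub stands in for go-mssqldb so the example runs without a database. It shows that connections returned beyond the idle limit are closed and counted in MaxIdleClosed. With both limits set to 64, as in the question, that counter should stay at zero, which would point at another cause, for example connections the pool discards after the driver reports them as bad.

```go
package main

import (
	"context"
	"database/sql"
	"database/sql/driver"
	"errors"
	"fmt"
)

// stubConn is a hypothetical no-op connection. It stands in for a real
// go-mssqldb connection so the sketch runs without a database.
type stubConn struct{}

func (stubConn) Prepare(string) (driver.Stmt, error) { return nil, errors.New("stub: no statements") }
func (stubConn) Close() error                        { return nil }
func (stubConn) Begin() (driver.Tx, error)           { return nil, errors.New("stub: no transactions") }

// stubConnector hands out stubConn values (also hypothetical).
type stubConnector struct{}

func (stubConnector) Connect(context.Context) (driver.Conn, error) { return stubConn{}, nil }
func (stubConnector) Driver() driver.Driver                        { return stubDriver{} }

type stubDriver struct{}

func (stubDriver) Open(string) (driver.Conn, error) { return stubConn{}, nil }

// poolChurn checks out maxOpen connections at once, hands them all back,
// and returns the pool counters observed afterwards.
func poolChurn(maxOpen, maxIdle int) (sql.DBStats, error) {
	db := sql.OpenDB(stubConnector{})
	defer db.Close()
	db.SetMaxOpenConns(maxOpen)
	db.SetMaxIdleConns(maxIdle)

	ctx := context.Background()
	conns := make([]*sql.Conn, 0, maxOpen)
	for i := 0; i < maxOpen; i++ {
		c, err := db.Conn(ctx)
		if err != nil {
			return sql.DBStats{}, err
		}
		conns = append(conns, c)
	}
	for _, c := range conns {
		c.Close() // hands the connection back to the pool
	}
	return db.Stats(), nil
}

func main() {
	for _, idle := range []int{2, 4} {
		s, err := poolChurn(4, idle)
		if err != nil {
			panic(err)
		}
		fmt.Printf("maxIdle=%d open=%d idle=%d closedByMaxIdle=%d closedByMaxLifetime=%d\n",
			idle, s.OpenConnections, s.Idle, s.MaxIdleClosed, s.MaxLifetimeClosed)
	}
}
```

Against the real servers, logging db.Stats() periodically and watching which counter grows while the TIME_WAIT count rises would be the equivalent check.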
Related
I have a question about how a connection to SQL Server 2012 should be handled.
Is it opened per request for data,
or
is it opened per session, staying open while each user is connected (or while the session is alive)?
I need to know how many connections can stay alive depending on which approach is used.
You should use one connection per request.
If you keep a connection per session, then you will limit the number of sessions to the maximum number of active connections. By using a connection per request you only need as many connections as there are threads handling requests, so the number of concurrent users is virtually unlimited.
Also, the server session ends a long time after the user actually left the site, which would further limit the number of concurrent users.
Even if the database can handle a lot of connections, it's a waste of resources to use a connection per session, and it causes a limitation that is completely unnecessary.
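The point about request-scoped connections can be illustrated with a small simulation. This is a sketch under assumptions, not a real data access layer: servePerRequest is a hypothetical name and a counter stands in for opening and closing a pooled connection. However many requests arrive, the number of connections held at once never exceeds the number of workers handling requests.

```go
package main

import (
	"fmt"
	"sync"
)

// servePerRequest simulates `requests` requests handled by `workers`
// goroutines. Each request holds a connection only while it runs.
// It returns the peak number of connections held at the same time.
func servePerRequest(requests, workers int) int {
	var (
		mu          sync.Mutex
		inUse, peak int
		wg          sync.WaitGroup
	)
	jobs := make(chan int)
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for range jobs {
				mu.Lock() // "open" a connection for this request
				inUse++
				if inUse > peak {
					peak = inUse
				}
				mu.Unlock()

				// ... the query would run here ...

				mu.Lock() // "close" it as soon as the request is done
				inUse--
				mu.Unlock()
			}
		}()
	}
	for i := 0; i < requests; i++ {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return peak
}

func main() {
	peak := servePerRequest(10000, 8)
	fmt.Println("peak connections <= 8:", peak <= 8)
}
```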
To give some context, I'm currently running a SQL Server 2012 instance on Amazon RDS and I've had to move to a larger instance twice already. The first time SQLAzureMW was the way to go, but at the time no table was particularly large. The second time, SQLAzureMW always timed out the source server on the bcp command with large tables (a few over 5 GB). Similarly, the SSIS Import/Export Wizard also timed out. The source server always seemed to be the problem, so I tried increasing the instance's class from an m1.medium to an m1.xlarge, to no avail: the source server still timed out before making any significant progress on the large tables.
In the end I wrote my own .NET program that simply ran a "SELECT * FROM [table] ORDER BY [id] OFFSET {0} ROWS" on the large source tables and pushed the results into SqlBulkCopy on the destination server. Again the source server timed out repeatedly, but I wrapped the try/catch in a loop that would simply resume the query from the last point where SqlBulkCopy left off. That being said, I'm not exactly thrilled with this solution.
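The resume loop described above can be sketched roughly as follows. This is not the original .NET program: copyWithResume, errSimulated and the fetch/write callbacks are hypothetical names, with fetch standing in for the OFFSET query and write for the bulk insert. The sketch also assumes each write either fully succeeds or fully fails, otherwise a retry could insert rows twice.

```go
package main

import (
	"errors"
	"fmt"
)

// errSimulated is a hypothetical error used to imitate a source timeout.
var errSimulated = errors.New("simulated source timeout")

// copyWithResume copies rows in batches. fetch stands in for the
// "SELECT ... ORDER BY [id] OFFSET n ROWS" query and write for the bulk
// insert. After an error the same batch is retried, so the copy resumes
// from the last offset that was written successfully.
func copyWithResume(
	fetch func(offset, limit int) ([]int, error),
	write func(rows []int) error,
	batch, maxRetries int,
) (int, error) {
	offset, failures := 0, 0
	for {
		rows, err := fetch(offset, batch)
		if err == nil && len(rows) > 0 {
			err = write(rows)
		}
		if err != nil {
			failures++
			if failures > maxRetries {
				return offset, fmt.Errorf("giving up at offset %d: %w", offset, err)
			}
			continue // retry from the same offset
		}
		if len(rows) == 0 {
			return offset, nil // source exhausted
		}
		offset += len(rows)
		failures = 0 // only consecutive failures count
	}
}

func main() {
	source := make([]int, 10)
	for i := range source {
		source[i] = i
	}
	calls := 0
	fetch := func(offset, limit int) ([]int, error) {
		calls++
		if calls%3 == 0 { // every third call fails
			return nil, errSimulated
		}
		end := offset + limit
		if end > len(source) {
			end = len(source)
		}
		return source[offset:end], nil
	}
	var dest []int
	write := func(rows []int) error {
		dest = append(dest, rows...)
		return nil
	}
	copied, err := copyWithResume(fetch, write, 4, 5)
	fmt.Println(copied, len(dest), err) // prints "10 10 <nil>"
}
```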
I'm considering building a solution around the Microsoft.SqlServer.Management.Smo.Transfer class but I'm afraid there might be the same problems with lack of recovery from a broken source connection.
I'd much rather have an out-of-the-box solution for this, as SQLAzureMW was before the tables got too large and as I'd expect the SSIS Import/Export Wizard to be. There has to be a better way.
We were running into a similar situation: running SQLAzureMW on a Windows Server 2012 EC2 instance connecting to a SQL Server 2012 RDS instance. AWS support suggested the following changes on our EC2 instance, and they seem to have solved all of our issues:
Increase the TCP/IP timeout value as described here (I'm not sure this is actually necessary): http://docs.aws.amazon.com/redshift/latest/mgmt/connecting-firewall-guidance.html
Disable all TCP offloading for the network adapter.
Instructions from AWS:
Here are the steps to disable TCP offloading:
Go to the properties of the Citrix PV Ethernet adapter.
Click Configure.
Go to Advanced.
Disable all of the following properties: IPv4 Checksum Offload, Large Receive Offload (IPv4), Large Send Offload Version 2 (IPv4), TCP Checksum Offload (IPv4), UDP Checksum Offload (IPv4).
Then, as a final step, run the following commands from the command prompt:
netsh int ip set global taskoffload=disabled
netsh int tcp set global chimney=disabled
netsh int tcp set global rss=disabled
netsh int tcp set global netdma=disabled
This issue is known and has been reported to MSFT. The problem here is not with SQL Server (your source). The NIC drivers have a feature called TCP chimney, which offloads bulk data movement from the CPU to the network card. That is, for large data movements the CPU does not get involved and instead relies on the network card to process the data. While doing so, however, the NIC sometimes runs out of memory (a known bug).
You can simply turn the chimney feature off and give it another try. If your source is a production box, you may want to create a backup of the DB before doing anything with that machine (just to be on the safe side). People have reported resolving this problem by turning the feature off. Here is a link you can follow.
I thought I had answered this already, but it turns out the problem was the instances I chose. I believe the m1 class of instances shared the same hardware network device for SAN storage and networking. As a result, enough network activity caused the system drive, and thus the virtual memory, to become inaccessible, at least for an instant. Spending the money on newer hardware, m2 and above, solved the problem.
SUMMARY: if sites have separate application pools, can their traffic avoid contention through "NIC teaming"?
((Let me know if this is better posted on http://networkengineering.stackexchange.com))
DETAILS:
Our hosting provider has priced a scenario where NIC teaming could be done, between the server hosting our websites, and the server hosting our databases.
Tech details (in case they matter):
(1) The websites are hosted on a server running Windows Server 2008, with IIS 7.0.
(2) The databases are hosted on a server running Windows Server 2003, with SQL Server 2005.
(3) The NIC teaming scenario they described would involve each of the two servers having a 10GbE dual-port NIC, with crossover cables between.
(4) Each site has its own web.config, and its own application pool in IIS.
(5) Currently, the connection strings to SQL Server, for each website, all look exactly the same, but we could make each website use a different connection string.
HOWEVER, the hosting provider told us we will only see "bandwidth aggregation" if
(A) Our application is coded to use the NIC teaming (it is not), or
(B) Our communication goes over more than one TCP stream.
So, here's my first 2 questions... call this "PLAN A" --
(I) because our sites all have separate application pools (detail #4 above -- resulting in "w3wp.exe" appearing over 10 times, in Task Manager),
would that mean we have more than one TCP stream?
(II) could there be any effective decrease in network contention -- that is, could the traffic from the different sites / different application pools travel on separate tNICs?
My third question... call this "PLAN B":
(III) If the answer to both the above is "No", then I still see a possibility of giving ONE of our sites a separate SQL Server connection string, to give it a separate NIC, or separate tNIC. Does that make sense?
It sounds like it does, if I'm understanding another post here at StackOverflow:
.NET SqlConnection NIC usage
But I'd still PREFER Plan A -- automatic decrease of contention, based on separate application pools -- because I trust a NIC teaming solution to direct traffic much more intelligently, based on varying demand, than exclusively dedicating a port to one site's SQL Server traffic would.
Please forgive me if this is TMI... feedback welcome.
Thanks for your interest...
The number of application pools does not determine the number of TCP streams. Each HTTP request to your server will be a separate TCP stream, unless a client reuses an existing connection (HTTP keep-alive).
If you are experiencing network contention, using a teamed NIC should help you decrease it. You are creating another physical path to the server, but the router or switch will have to know to use it.
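Why the number of TCP streams matters can be sketched as follows. The assumption here, which the hosting provider's condition (B) suggests but which I have not verified for any particular teaming driver, is that the team picks a link per TCP stream, for example by hashing the stream's source and destination address and port. The flow type and the pickLink function are hypothetical, and FNV is just a convenient stand-in hash.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// flow identifies one TCP stream by its 4-tuple.
type flow struct {
	srcIP   string
	srcPort uint16
	dstIP   string
	dstPort uint16
}

// pickLink maps a flow to one of n team members with a hash. This is an
// illustrative policy, not the algorithm of any specific teaming driver.
func pickLink(f flow, n int) int {
	h := fnv.New32a()
	fmt.Fprintf(h, "%s:%d>%s:%d", f.srcIP, f.srcPort, f.dstIP, f.dstPort)
	return int(h.Sum32() % uint32(n))
}

func main() {
	// Several connections from the web server to SQL Server differ only in
	// their source port, which is enough to make them separate streams.
	for port := uint16(50000); port < 50004; port++ {
		f := flow{"10.0.0.1", port, "10.0.0.2", 1433}
		fmt.Printf("source port %d -> link %d\n", port, pickLink(f, 2))
	}
}
```

Under such a policy a single stream always stays on one link, so only traffic spread over several connections can use more than one port of the team.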
For a project I need to read the incoming and outgoing bytes per second of a SQL Server 2012 instance or database (either will do). For this I found the following performance counters:
SQL Server, Broker / DBM Transport Object
Receive I/O bytes/sec
Send I/O bytes/sec
When I start SQL Server Management Studio and execute some select statements, the values of the performance counters stay at 0. However, when I include the client statistics I see that "Bytes sent from client" and "Bytes received from server" are not 0. I'm executing these select statements against a default instance installed on the same PC.
Does anyone know how to solve this issue?
Thanks in advance
The documentation explains what SQL Server, Broker / DBM Transport Object measures:
The Broker / DBM Transport performance object contains performance
counters that report networking information for Service Broker and
database mirroring.
There is no performance counter for Transact-SQL traffic. If it helps, the DMV sys.dm_exec_connections will aggregate the traffic size for a connection. If the traffic occurs over a network interface then you could use the system network counters, that is, the Network Interface Object. But a local test would not register anything, because the connection will use the shared memory protocol.
That being said, it is unusual to have to measure SQL Server Transact-SQL network traffic. If the question ever arises, then you're doing it wrong. Network traffic should always be negligible. The dimension everybody is interested in is I/O, for which there is support in the SQL Server, Buffer Manager Object, the SQL Server, Databases Object and DMVs like sys.dm_io_virtual_file_stats.
We have an application that uses NHibernate to connect to our database on SQL Server. We use connection pooling and a session-per-request approach to execute our queries against SQL Server.
We used SQL Server Activity Monitor to monitor the connection count and noticed there were 25-30 connections involved whenever a user logged in to the system.
So here's my question: can a large number of connections to SQL Server lead to performance issues?
Each connection to SQL Server requires the allocation of a certain amount of memory, so there is a performance consideration in this regard.
In the scheme of things however, 20-30 connections is a very small number.
Have you validated that all connections belong to your application? I ask because SQL Server itself will establish and maintain a certain number of connections/sessions as part of the server's overall operation.
Some useful DMVs for you to monitor:
select * from sys.dm_exec_connections
select * from sys.dm_exec_sessions
Session IDs above 50 are from outside of SQL Server, so to speak, i.e. user sessions.
Further to comments:
SQL Server 2005 can support up to 32,767 connections. To check your capacity execute:
select @@MAX_CONNECTIONS
If connection pooling is being used then connections will remain open and in a sleep state until required for processing requests. Alternatively, perhaps the application is not closing connections when requests have finished processing.
I can only comment from a SQL Server perspective as I am not familiar with the mechanics of NHibernate.