Not long ago, I found out that my MS SQL Server sends data in packets of 590 bytes, fragmenting large responses into about 30-50 separate packets, as shown in this Wireshark screenshot.
I tried changing the max packet size to 1500, but it had no effect on the actual size of the TCP packets, even after rebooting the server. I ran some diagnostics, and this is what I found:
The problem isn't in the clients' or the server's MTU.
There are no devices with an MTU smaller than 1500 bytes on the packets' path.
The ping -f -l 1472 command from the server to any client is successful.
TCP connections between the server and the clients almost always have a window size of 1024 bytes.
So, the question: why does the max packet size setting not affect the actual size of the network packets? I'm out of ideas about an incorrect hardware or OS setup. Perhaps the problem is in MS SQL Server itself? Has anyone encountered a similar problem?
The server is running Windows Server 2008 R2 Enterprise.
Clients - Windows 7.
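The packet sizes mentioned in the question can be checked with a little arithmetic. This is a minimal sketch, assuming the standard minimum header sizes (no IP or TCP options) and assuming that the 590 bytes shown by Wireshark include the 14-byte Ethernet header; the function names are made up for illustration.

```python
# Standard minimum header sizes in bytes (no IP or TCP options assumed).
IPV4_HEADER = 20
ICMP_HEADER = 8
TCP_HEADER = 20
ETHERNET_HEADER = 14


def ping_packet_size(payload: int) -> int:
    """Size of the IP packet produced by a ping with the given -l payload."""
    return payload + ICMP_HEADER + IPV4_HEADER


def tcp_payload_in_frame(frame_size: int) -> int:
    """TCP payload carried by an Ethernet frame of frame_size bytes."""
    return frame_size - ETHERNET_HEADER - IPV4_HEADER - TCP_HEADER


print(ping_packet_size(1472))      # a 1472-byte ping fills a 1500-byte MTU
print(tcp_payload_in_frame(590))   # payload left in a 590-byte frame
```

Under those assumptions a 590-byte frame carries 536 bytes of TCP payload, which is the default TCP maximum segment size used when no MSS option is negotiated. That may be worth checking in the SYN packets of the capture; it is an observation, not a diagnosis.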
Related
I am having a real headache with a slow SQL query. I have two connection strings, which are as follows:
ConnectString1 = "Driver={SQL Server};Server=MSSQLSERVER5;Database=SchoolMain;Uid=Admin;Pwd=admin101;"
and
ConnectString2 = "Provider=SQLOLEDB.1;Persist Security Info=False;User ID=Admin;Password=admin101;Initial Catalog=SchoolMain;Data Source=192.168.1.2,1433"
Both connect to a SQL Server database instance on the same system. The first connects using the database instance name, while the second connects using the IP address (over the internet to the system) and port number.
The query with ConnectString1 is very fast and takes less than 2 seconds to execute, while the query using ConnectString2 is extremely slow and most of the time comes back with a timeout expired error.
I have searched everywhere on the internet and still cannot find where the issue is. I read about turning off the LLMNR protocol and adding entries to the hosts file to tackle reverse DNS, and I followed the steps, but nothing changed: the query with ConnectString2 is still very slow.
However, when I changed the IP address in ConnectString2 from 192.168.1.2 to 127.0.0.1, the query ran very fast, exactly as it does with ConnectString1.
Is there a way to route all IP addresses to 127.0.0.1 on the machine?
I need ConnectString2 to work because queries will be sent over the IP address from other systems outside the LAN.
Note: I am using SQL Server 2008.
Please help.
The main reason this may happen is reverse DNS, meaning SQL Server needs to translate the IP address to a host name, and that takes time.
To fix this problem, add an entry for your machine's address to your hosts file and then use the name given in the hosts file instead of the IP.
If you do not want to tamper with the hosts file, look at the alternative solution of enabling TCP/IP, which is disabled by default.
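Before editing the hosts file, it may help to measure whether reverse lookups are actually slow on the machine. The following is a rough sketch using Python's standard `socket.gethostbyaddr`; the helper name `time_reverse_lookup` and its `resolver` parameter are assumptions made for illustration, not part of any SQL Server tooling.

```python
import socket
import time


def time_reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return (hostname or None, elapsed seconds) for a reverse lookup of ip."""
    start = time.perf_counter()
    try:
        # gethostbyaddr returns (hostname, aliaslist, ipaddrlist)
        hostname = resolver(ip)[0]
    except OSError:
        # socket.herror and socket.gaierror are subclasses of OSError
        hostname = None
    elapsed = time.perf_counter() - start
    return hostname, elapsed
```

Calling `time_reverse_lookup("192.168.1.2")` on the client and comparing the elapsed time with `time_reverse_lookup("127.0.0.1")` would show whether name resolution accounts for a noticeable part of the delay. If both return quickly, the cause is probably elsewhere.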
The solution is an ASP.NET MVC application using Entity Framework, hosted in IIS on a Windows Server 2012 R2 Standard VM hosted in a Hyper-V environment. The same VM is running SQL Server 2012.
The hosting environment is hosting 30 other solutions, and there is plenty of free disk space and no known disk problems with the hosting environment or the VM (chkdsk and sfc have been run on the VM and did not report any problems).
The problem is that the solution/server stops working for short periods of 5-10 minutes, and every time we see event ID 508/533 from ESENT and a message about writing to "C:\Windows\system32\LogFiles\Sum".
A similar message has been seen with sqlsvr but this was solved by giving everyone all rights to C:\Windows\system32\LogFiles\Sum.
While the problem persists, it affects the whole VM, and sometimes it is not even possible to connect via Remote Desktop.
We have seen a high number of open SQL Server connections when the problem occurs, and before we introduced caching for a specific Web API method we were actually able to exhaust the SQL Server connection pool. Just in case, we have increased the connection pool from 100 to 200 connections, even though we have not seen this particular problem since we introduced the cache.
All DbContext instances are disposed by "using", an ApiController.Dispose override or a Controller.Dispose override, and only one SqlConnection is used (for the logging system).
I suspect that the problem lies outside the solution and that the high number of SQL Server connections is related to the fact that SQL Server is unable to write to the disk.
Below are some Windows Event Log excerpts for three recent breakdowns, with some additional info about the number of web requests prior to the problem and after the server automatically recovered.
Any suggestions?
web requests during the 10 minutes right before the problem: 1399
web requests during the first 10 minutes after the server has recovered: 1630
18-03-2015 20:07:20 833 MSSQLSERVER
SQL Server has encountered 1 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\Xxx.mdf] in database [Xxx] (5). The OS file handle is 0x0000000000000A7C. The offset of the latest long I/O is: 0x000003e104e000
18-03-2015 20:07:40 833 MSSQLSERVER
SQL Server has encountered 1 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\Xxx_log.ldf] in database [Xxx] (5). The OS file handle is 0x0000000000000A8C. The offset of the latest long I/O is: 0x0000007f203000
18-03-2015 20:08:16 533 ESENT
svchost (1740) A request to write to the file "C:\Windows\system32\LogFiles\Sum\Svc.log" at offset 1806336 (0x00000000001b9000) for 4096 (0x00001000) bytes has not completed for 36 second(s). This problem is likely due to faulty hardware. Please contact your hardware vendor for further assistance diagnosing the problem.
18-03-2015 20:17:14 508 ESENT
svchost (1740) A request to write to the file "C:\Windows\system32\LogFiles\Sum\Svc.log" at offset 1806336 (0x00000000001b9000) for 4096 (0x00001000) bytes succeeded, but took an abnormally long time (36 seconds) to be serviced by the OS. This problem is likely due to faulty hardware. Please contact your hardware vendor for further assistance diagnosing the problem.
web requests during the 10 minutes right before the problem: 696
web requests during the first 10 minutes after the server has recovered: 614
19-03-2015 01:17:19 533 ESENT
svchost (1740) A request to write to the file "C:\Windows\system32\LogFiles\Sum\Svc.log" at offset 3067904 (0x00000000002ed000) for 4096 (0x00001000) bytes has not completed for 36 second(s). This problem is likely due to faulty hardware. Please contact your hardware vendor for further assistance diagnosing the problem.
19-03-2015 01:33:02 508 ESENT
svchost (1740) A request to write to the file "C:\Windows\system32\LogFiles\Sum\Svc.log" at offset 3067904 (0x00000000002ed000) for 4096 (0x00001000) bytes succeeded, but took an abnormally long time (983 seconds) to be serviced by the OS. This problem is likely due to faulty hardware. Please contact your hardware vendor for further assistance diagnosing the problem.
19-03-2015 01:33:03 833 MSSQLSERVER
SQL Server has encountered 5 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\Xxx_log.ldf] in database [Xxx] (5). The OS file handle is 0x0000000000000A8C. The offset of the latest long I/O is: 0x000000a389d000
web requests during the 10 minutes right before the problem: 555
web requests during the first 10 minutes after the server has recovered: 784
19-03-2015 03:33:51 833 MSSQLSERVER
SQL Server has encountered 1 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\Xxx_log.ldf] in database [Xxx] (5). The OS file handle is 0x0000000000000A8C. The offset of the latest long I/O is: 0x000000aa95f000
19-03-2015 03:40:48 533 ESENT
svchost (1740) A request to write to the file "C:\Windows\system32\LogFiles\Sum\Svc.log" at offset 3846144 (0x00000000003ab000) for 4096 (0x00001000) bytes has not completed for 36 second(s). This problem is likely due to faulty hardware. Please contact your hardware vendor for further assistance diagnosing the problem.
19-03-2015 03:40:48 833 MSSQLSERVER
SQL Server has encountered 1 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\MSDBLog.ldf] in database [msdb] (4). The OS file handle is 0x0000000000000A90. The offset of the latest long I/O is: 0x00000000108000
19-03-2015 03:40:49 508 ESENT
svchost (1740) A request to write to the file "C:\Windows\system32\LogFiles\Sum\Svc.log" at offset 3846144 (0x00000000003ab000) for 4096 (0x00001000) bytes succeeded, but took an abnormally long time (36 seconds) to be serviced by the OS. This problem is likely due to faulty hardware. Please contact your hardware vendor for further assistance diagnosing the problem.
19-03-2015 03:40:49 17894 MSSQLSERVER
Dispatcher (0x1a88) from dispatcher pool 'XE Engine main dispatcher pool' Worker 0x00000000F03B8160 appears to be non-yielding on Node 0. Approx CPU Used: kernel 0 ms, user 0 ms, Interval: 336140.
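The stall durations in the ESENT excerpts above can be pulled out mechanically when comparing incidents. This is a rough sketch, assuming the message wording matches the excerpts exactly; `stalled_seconds` is a hypothetical helper name.

```python
import re

# Matches the duration in ESENT 533 messages ("has not completed for N second(s)")
# and in ESENT 508 messages ("abnormally long time (N seconds)").
ESENT_SECONDS = re.compile(r"for (\d+) second\(s\)|long time \((\d+) seconds\)")


def stalled_seconds(message):
    """Return the stall duration in seconds from an ESENT 508/533 message, or None."""
    match = ESENT_SECONDS.search(message)
    if match is None:
        return None
    # Exactly one of the two groups is set, depending on which wording matched.
    return int(match.group(1) or match.group(2))


event_533 = ('for 4096 (0x00001000) bytes has not completed for 36 second(s). '
             'This problem is likely due to faulty hardware.')
event_508 = ('for 4096 (0x00001000) bytes succeeded, but took an abnormally '
             'long time (983 seconds) to be serviced by the OS.')
print(stalled_seconds(event_533), stalled_seconds(event_508))
```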
Disk I/O problems were my initial thought, but the "funny" thing is that it has never actually happened during peak hours, and the server is not stressed on CPU or disk I/O during peak hours.
I cannot find any VM disk errors. I have no access to the hosting environment but I am told that there are no disk problems. The hosting environment is performing VM backups and if this is the problem, there is nothing to do about it, as it is required. I might try to have the VM moved to another disk but I do not know if this is possible.
Currently we have set up some detailed disk I/O monitoring on the VM and hopefully this will give us some information about the problem but I rather doubt it.
Maybe the VM is just "sick" and the next step might be to create a new one from scratch…
It sounds like your disk is just plain overloaded, since I/Os are taking so long. Ideally they should take around 10 milliseconds. Instead, they're taking over 1000x that long.
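To put "over 1000x" in numbers, here is a small sketch that compares the durations reported in the event log excerpts with the 10 millisecond figure. The labels and the `slowdown` helper are illustrative only.

```python
EXPECTED_IO_MS = 10  # "around 10 milliseconds"

# Durations taken from the event log excerpts in the question, in seconds.
observed_seconds = {
    "SQL Server event 833 threshold": 15,
    "ESENT event 533/508": 36,
    "ESENT event 508, longest": 983,
}


def slowdown(seconds, expected_ms=EXPECTED_IO_MS):
    """How many times longer than expected an I/O of the given duration took."""
    return seconds * 1000 / expected_ms


for label, seconds in observed_seconds.items():
    print(f"{label}: {slowdown(seconds):,.0f}x the expected latency")
```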
Since you're running in a VM, though, tracking down the problem can be a bit more tricky. Is it due to the I/O load in the virtual machine, or on the host? Your VM disk may be shared with other I/O load of the host.
Can you move the database to a different volume in the VM, hosted on a different physical spindle of the host?
Another possibility is that the underlying storage is going bad, and the I/Os are being retried by the underlying hardware.
-martin
I'm trying to establish a TCP connection with a PostgreSQL 9.1 server via Microsoft telnet, but when the connection has been established I receive
Jconnector 3.6 1 ♥
What does it mean? Is it possible at all to establish such a connection manually to communicate with the database via TCP?
When a TCP connection to a port is opened, whatever is listening on the port sometimes announces itself: in this case, what is listening is Jconnector 3.6.1. The heart shape is some binary data.
TCP connections tend to be used only by program code, as 'conversations' at that level quite often involve binary data. I don't know what Postgres does, but if you get it to run a select and return the data, it will very likely be all binary and quite unreadable by a human.
If you search for what the wire protocol is for Postgres, you may be able to make it do something via a telnet session, but I expect you'd have a lot of difficulty.
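To see exactly which bytes the listener announces itself with, a short script is easier to read than a telnet window. This is a minimal sketch using Python's standard `socket` module; `read_banner` is a hypothetical helper, and the host and port you pass in are whatever you were pointing telnet at.

```python
import socket


def read_banner(host, port, timeout=3.0, max_bytes=256):
    """Connect to host:port and return the first bytes the listener sends."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        try:
            return sock.recv(max_bytes)
        except socket.timeout:
            # Nothing arrived: the listener expects the client to speak first.
            return b""
```

Printing the result with `repr()` shows control bytes as escapes such as `\x03` instead of console glyphs. As far as I know, a Windows console using the OEM code page draws byte 0x03 as a heart, which would fit the output in the question. An empty result would suggest a protocol in which the client has to send the first message.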
For some time now our flagship application has been having mysterious errors. The error message is the generic
[DBNETLIB][ConnectionWrite (send()).]General network error. Check your network documentation.
This is reliably reproduced by leaving the app open for the night and resuming work in the morning. Since it's a backend server app this is a normal scenario.
The funny thing is - we've migrated from SQL Server 7 to 2000 to 2008 and the issue is present on all of them. But what seems to matter is the OS on which we run the app. On WinXP it works fine, on Vista/7 it fails. So the problem is at the client end.
Google results for the error message cover a very wide spectrum of different causes (since this is a very generic error), and none of the scenarios found there are similar to ours.
So perhaps someone around here will know what the problem is in our case?
You should be able to reproduce this error condition on demand by:
1. Opening a database connection (in your client application)
2. Unplugging the network cable
3. Plugging network cable back in (wait until the network connection is restored)
4. Using the previously opened connection to query the database
As far as I can tell from experience, client side ADO code is not able to consistently determine if an underlying network connection is actually valid or not. Checking if the database connection is open (in the client code) returns true. However, performing any operations on that connection results in a General network error.
The connection pool appears to be able to determine when a connection goes 'bad' so it never returns a bad connection to the application. It simply opens a new connection instead.
So, if a database connection is kept alive for a long time (used or unused) by the application, the underlying TCP/IP connectivity can get broken.
The bottom line is that database connections should be closed and returned back to the connection pool when not in use.
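The "open late, close early" pattern can be sketched as follows. The example is in Python and uses `sqlite3` purely as a stand-in so that it runs anywhere; the real application uses ADO against SQL Server, and `run_query` and its `connect` parameter are names invented for this illustration.

```python
import sqlite3
from contextlib import closing


def run_query(connect, sql, params=()):
    """Open a connection, run one query, and close the connection right away."""
    with closing(connect()) as conn:
        with closing(conn.cursor()) as cur:
            cur.execute(sql, params)
            return cur.fetchall()


# sqlite3 is only a stand-in for the real database driver.
rows = run_query(lambda: sqlite3.connect(":memory:"), "SELECT 1 + 1")
print(rows)
```

Because no connection object outlives a single call, there is nothing left open overnight to go stale. With a pooling driver, the close call hands the connection back to the pool rather than tearing down the socket.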
Edit
Also, depending on the number of clients connecting to the db, not using the connection pool can cause another issue. You may hit the maximum number of sockets open on the server side. This is from memory. Once a connection is closed on the client side, the connection on the server goes into a TIME_WAIT state. By default, the server socket takes about 4 minutes to close, so it is not available to other clients during that time. The bottom line is that there is a limited number of available sockets on the server. Keeping too many connections open can create a problem.
One project I worked on easily hit this socket limit with around 120 users. A new 'feature' was added that absolutely hammered the server, and after a few hours of using the app, things would suddenly slow to a crawl for everyone. SQL server was not closing enough sockets in time for new connection requests. Although there are 65K sockets altogether, only the first 5000 are made available to the ADO (this is a default registry setting thing, so can be changed).
The number of sockets in TIME_WAIT state would slowly build up until the OS would not allocate any more. So clients had to wait until server side sockets closed and a new connection could then be created.
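The build-up described above can be estimated with simple arithmetic. This is a back-of-the-envelope sketch using the figures quoted in this answer (5000 sockets and about 4 minutes in TIME_WAIT, both recalled from memory). It assumes every closed connection holds one socket for the full TIME_WAIT period, and the function names are illustrative.

```python
AVAILABLE_SOCKETS = 5000      # figure quoted in the answer
TIME_WAIT_SECONDS = 4 * 60    # "about 4 minutes"


def max_new_connections_per_second(sockets=AVAILABLE_SOCKETS,
                                   time_wait=TIME_WAIT_SECONDS):
    """Highest sustained rate of short-lived connections before sockets run out."""
    return sockets / time_wait


def sockets_in_time_wait(connections_per_second, time_wait=TIME_WAIT_SECONDS):
    """Steady-state number of sockets sitting in TIME_WAIT at a given rate."""
    return connections_per_second * time_wait


print(round(max_new_connections_per_second(), 1))
print(sockets_in_time_wait(30))
```

With these figures the limit is roughly 21 new connections per second. A sustained 30 per second would need 7200 sockets in TIME_WAIT, which is more than the 5000 available.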
Have you tried disabling SNP/TCP Chimneying?
Had a similar error. For me it was indirectly caused by mismatched calls to WSACleanup and WSAStartup.
The program called WSACleanup more times than WSAStartup. This would cause a reference counter (somewhere in the sockets library) to reach zero too early.
I think effectively from that moment on all sockets owned by the process are broken.
And this would also kill the SQL client since it uses sockets to 'talk' to the SQL server as well.
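The reference-counting mismatch can be illustrated with a toy model. This Python sketch does not call Winsock at all; the class `SocketLibraryModel` and its methods are invented to mimic the paired WSAStartup/WSACleanup counting described above.

```python
class SocketLibraryModel:
    """Toy model of a library whose init/cleanup calls are reference counted."""

    def __init__(self):
        self.ref_count = 0

    def startup(self):
        self.ref_count += 1

    def cleanup(self):
        if self.ref_count == 0:
            raise RuntimeError("cleanup called without a matching startup")
        self.ref_count -= 1

    @property
    def usable(self):
        # Sockets only work while at least one startup is still outstanding.
        return self.ref_count > 0


lib = SocketLibraryModel()
lib.startup()   # the application initialises sockets
lib.startup()   # the SQL client library initialises sockets too
lib.cleanup()   # the application cleans up once ...
lib.cleanup()   # ... and once more by mistake
print(lib.usable)  # the SQL client still needs sockets, but the count is zero
```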
SQL Server has encountered 1 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [T:\MSSQL\DATA\%file_name%] in database [%DB_name%] (2). The OS file handle is 0x00000838. The offset of the latest long I/O is: 0x000000ebdc0000
Has anyone encountered and solved this?
Please see this - you may have IO issues - and physical drive issues
http://blogs.msdn.com/chrissk/archive/2008/06/19/i-o-requests-taking-longer-than-15-seconds-to-complete-on-file.aspx