Database Network Latency

I am currently working on an n-tier system and battling some database performance issues.
One area we have been investigating is the latency between the database server and the application server. In our test environment the average ping time between the two boxes is in the region of 0.2 ms, but on the client's site it's more in the region of 8.2 ms. Is that something we should be worried about?
For your average system, what do you guys consider a reasonable latency, and how would you go about testing/measuring it?

Yes, network latency (measured by ping) can make a huge difference.
If your database response time is .001 ms, then you will see a huge impact going from a 0.2 ms to an 8 ms ping. I've heard that database protocols are chatty, which, if true, means they would be affected more by network latency than HTTP would be.
And more than likely, if you are running 1 query, then adding 8 ms to get the reply from the DB is not going to matter. But if you are doing 10,000 queries, which generally happens with bad code or non-optimized use of an ORM, then you will have to wait an extra 80 seconds with an 8 ms ping, where with a 0.2 ms ping you would only wait 2 seconds.
As a matter of policy for myself, I never let client applications contact the database directly. I require that client applications always go through an application server (e.g. a REST web service). That way, if I accidentally have an "N+1" ORM issue, it is not nearly as impactful. I would still try to fix the underlying problem...
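To illustrate with a pair of hypothetical Orders/OrderLines tables, the "N+1" pattern is one query for the parent rows plus one more query per row, where a single join would do:
-- what a naive ORM often emits: one query for the parents...
SELECT Id FROM dbo.Orders WHERE CustomerId = 42;
-- ...then one more query per order id it got back:
DECLARE @OrderId int = 1; -- repeated N times, once per id
SELECT * FROM dbo.OrderLines WHERE OrderId = @OrderId;

-- the set-based alternative: one round trip instead of N+1
SELECT o.Id, l.OrderId, l.ProductId, l.Quantity
FROM dbo.Orders AS o
JOIN dbo.OrderLines AS l ON l.OrderId = o.Id
WHERE o.CustomerId = 42;
At 8 ms per round trip, the difference between 1 round trip and N+1 of them is exactly the multiplication above.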

In short: no!
What you should monitor is the global performance of your queries (i.e. transport to the DB + execution + transport back to your server).
What you could do is use a performance counter to monitor the time your queries usually take to execute.
You'll probably see your results are well over a millisecond anyway.
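On SQL Server, for instance, you can pull a rough average elapsed time per cached query from the DMVs; note this is measured server-side and includes time spent waiting for the client to fetch the results:
-- top queries by average elapsed time, from the plan cache
SELECT TOP 10
    qs.execution_count,
    qs.total_elapsed_time / 1000.0 / qs.execution_count AS avg_elapsed_ms,
    SUBSTRING(st.text, 1, 200) AS query_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY avg_elapsed_ms DESC;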
There's no such thing as "reasonable latency". You should rather consider the "reasonable latency for your project", which will vary a lot depending on what you're working on.
People don't have the same expectations for a real-time trading platform as for a read-only amateur website.

On a Linux-based server you can test the effect of latency yourself by using the tc command.
For example, this command will add a 10 ms delay to all packets going via eth0:
tc qdisc add dev eth0 root netem delay 10ms
Use this command to remove the delay:
tc qdisc del dev eth0 root
More details available here:
http://devresources.linux-foundation.org/shemminger/netem/example.html
All applications will differ, but I have definitely seen situations where 10ms latency has had a significant impact on the performance of the system.

One of the head honchos at answers.com said that according to their studies, 400 ms of wait time for a web page load is about the point where they first start seeing people cancel the page load and go elsewhere. My advice is to look at the whole process, from original client request to fulfillment, and if you're doing well there, there's no need to optimize further. 8.2 ms is about 40 times larger than 0.2 ms in a mathematical sense, but from a human sense, no one can really perceive an 8.0 ms difference. It's why they have photo finishes in races ;)

Related

What happens when bandwidth is exceeded with a Microsoft SQL Server connection?

The application used by a group of 100+ users was made with VB6 and RDO. A replacement is coming, but the old one is still maintained. Users moved to a different building across the street and the problems began. My opinion is that the problem is bandwidth, but I've had to argue with others who say it's the database. Users regularly experience network slowness using the application, but also with workstation tasks in general. The application moves large audio files and occasionally indexes them, among other things. Occasionally the database becomes hung.

We have many top-end, robust SQL Servers, so it is not a server problem. What I figured out is that a transaction is begun on a connection but fails to complete properly because of a communication error. Updates from other connections become blocked, they continue stacking up, and users are down for half a day. What I've begun doing the moment I'm told of a problem, after verifying the database is hung, is to set the database to single user and then back to multi-user to clear the connections. They must all restart their applications.

Today I found out there is a bandwidth limit at their new location which they regularly max out. I think in the old location there was a big pipe servicing many people, but now they are on a small pipe servicing a small number of people, which is also less tolerant of momentary high bandwidth demands.
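The reset I run looks roughly like this (the database name is a placeholder):
-- kick everyone out, rolling back open transactions, then reopen
ALTER DATABASE [MyAppDb] SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
ALTER DATABASE [MyAppDb] SET MULTI_USER;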
What I want to know is exactly what happens to packets, both coming and going, when a bandwidth limit is reached. Also I want to know what happens in SQL Server communication. Do some packets get dropped? Do they start arriving more out of sequence? Do timing problems occur?
I plan to start controlling such things as file moves through the application. But I also want to know what configurations are usually present on network nodes regarding transient high demand.
This is a very broad question. Networking is very key (especially in Availability Groups or any sort of mirroring setup) to good performance. When transactions complete on the SQL Server, the results are placed in the output buffer. The app then needs to 'pick up' that data, clear the output buffer, and continue on. I think (without knowing your configuration) that your apps aren't able to complete the round trip because the network pipe is inundated with requests, so the apps can't get what they need to successfully finish and close out. This causes havoc, since the network can't keep up with what the apps and the SQL Server are trying to do. Then you have a 200-car pileup on a 1-lane highway.
Hindsight being what it is, there should have been extensive testing of the network capacity before everyone moved across the street. Clearly that didn't happen, so you are kind of left to do what you can with what you have. If the company can't provide a stable network connection, the situation may be out of your control. If you're the DBA, I highly recommend you speak to your higher-ups and explain to them the consequences of the reduced network capacity. Oftentimes, showing the consequences of inaction can lead to action.
Out of curiosity, is there any way you can analyze what waits are happening when the pileup occurs? I'm thinking it will be something along the lines of ASYNC_NETWORK_IO, which usually indicates that SQL Server is waiting on the app to come back and pick up its data.
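A quick sketch of that check against the standard DMVs (run it while the pileup is happening):
-- what active sessions are waiting on, and who is blocking whom
SELECT r.session_id,
       r.wait_type,
       r.wait_time,            -- ms spent in the current wait
       r.blocking_session_id,
       st.text AS current_sql
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS st
WHERE r.session_id > 50;      -- skip system sessions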

Why is the CPU not maxed out?

I have an application that I'd like to make more efficient - it isn't taxing any one resource enough that I can identify it as a bottleneck, so perhaps the app is doing something that is preventing full efficiency.
The application pulls data from a database on one SQL Server instance, does some manipulation on it, then writes it to a database on another SQL Server instance - all on one machine. It doesn't do anything in parallel.
While the app is running (it can take several hours), none of the 4 CPU cores are maxed out (they hover around 40-60% utilization each), the disks are almost idle and very little RAM is used.
Reported values:
Target SQL Server instance: ~10% CPU utilization, 1.3GB RAM
Source SQL Server instance: ~10% CPU utilization, 300MB RAM
Application: ~6% CPU utilization, 45MB RAM
All the work is happening on one disk, which writes around 100 KB/s during the operation, on average. 'Active time' according to Task Manager is usually 0%, occasionally flickering up to between 1 and 5% for a second or so. Average response time, again according to Task Manager, moves between 0 ms and 20 ms, mainly showing between 0.5 and 2 ms.
Databases are notorious for IO limitations. Now, seriously, as you say:
The application pulls data from a database on one SQL Server instance,
does some manipulation on it, then writes it to a database on another
SQL Server instance - all on one machine.
I somehow get the idea this is an end-user-level machine, maybe a workstation. Your linear code (a bad idea for getting full utilization, by the way, since you never run all 3 parts - read, process, write - in parallel) will be seriously limited by whatever IO subsystem you have.
But that will not come into play as long as you can state:
It doesn't do anything in parallel.
What it must do is run these things in parallel:
One task is reading the next data
One task does the data processing
One task does the data writing
You can definitely max out a lot more than your 4 cores. The last time I did something like that (read / manipulate / write) we were maxing out 48 cores with around 96 or so processing threads running in parallel (and a smaller number doing the writes). But the core of it is that your application must actually start using multiple CPUs.
If you do not parallelize:
You will only ever max out one core,
You basically waste time waiting for the databases on both ends. The latency while you wait for data to be read or committed is time in which you are not processing anything.
;) And once you fix that you will get IO problems. Promised.
I recommend reading How to analyse SQL Server performance. You need to capture and analyze the wait stats. These will tell you what the execution is doing that prevents it from maxing out the CPU. You already have a feeling that the workload is causing the SQL engine to wait rather than run, but only after you understand the wait stats will you be able to get a feel for what it is waiting on. Follow the linked article for specific analysis techniques.
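A first pass at the wait stats can be as simple as this (filtering out a few waits that are normally benign):
-- top accumulated waits since the last restart
SELECT TOP 10
    wait_type,
    waiting_tasks_count,
    wait_time_ms / 1000.0 AS wait_time_s,
    signal_wait_time_ms / 1000.0 AS signal_wait_s
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN ('SLEEP_TASK', 'LAZYWRITER_SLEEP',
                        'BROKER_TASK_STOP', 'XE_TIMER_EVENT')
ORDER BY wait_time_ms DESC;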

SQL Server 100% CPU Utilization - One database shows high CPU usage than others

We have a SQL Server with about 40 different databases (about 1-5 GB each). The server is an 8-core 2.3 GHz machine with 32 GB of RAM; 27 GB is pinned to SQL Server. CPU utilization is almost always close to 100% and memory consumption is about 95%. The problem here is the CPU, which is constantly close to 100%, and I am trying to understand the reason.
I have run an initial check to see which database contributes to high CPU by using this script, but I could not pin down in detail what's really consuming the CPU. The top query (from all DBs) only takes about 4 seconds to complete. IO is also not a bottleneck.
Would memory be the culprit here? I have checked the memory split, and the object cache occupies about 80% of the memory allocated (27 GB) to SQL Server. I hope that is normal, provided there are a lot of stored procedures involved. Running Profiler, I do see a lot of recompiles, but mostly due to "temp table changed", "deferred compile", etc., and I am not clear whether these recompiles are a result of plans getting thrown out of the cache due to memory pressure.
Appreciate any thoughts.
You can see some reports in SSMS:
Right-click the instance name / Reports / Standard Reports / Top Sessions
You can see the top CPU-consuming sessions. This may shed some light on which SQL processes are using resources. There are a few other CPU-related reports if you look around. I was going to point to some more DMVs, but if you've looked into those already I'll skip it.
You can use sp_BlitzCache to find the top CPU-consuming queries. You can also sort by IO and other things as well. It uses DMV info, which accumulates between restarts.
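For example, assuming the procedure is installed on the instance:
-- top plans in cache sorted by CPU; @Top is optional
EXEC dbo.sp_BlitzCache @SortOrder = 'cpu', @Top = 10;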
This article looks promising.
Some stackoverflow goodness from Mr. Ozar.
edit:
A little more advice...
A query running for 'only' 5 seconds can be a problem. It could be using all your cores, really running 8 cores times 5 seconds - 40 seconds of 'virtual' time. I like to use some DMVs to see how many executions have happened for that code, to see what those 5 seconds add up to.
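Something along these lines, against the plan cache DMVs, shows how often a statement runs and what it adds up to in CPU:
-- statements by total CPU, with execution counts
SELECT TOP 10
    qs.execution_count,
    qs.total_worker_time / 1000 AS total_cpu_ms,
    qs.total_worker_time / 1000.0 / qs.execution_count AS avg_cpu_ms,
    SUBSTRING(st.text, 1, 200) AS query_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_worker_time DESC;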
According to this article on sqlserverstudymaterial:
Remember that "% Privileged Time" is not based on 100%. It is based on the number of processors. If you see 200 for sqlserver.exe and the system has 8 CPUs, then the CPU consumed by sqlserver.exe is 200 out of 800 (only 25%).
If "% Privileged Time" value is more than 30% then it's generally caused by faulty drivers or anti-virus software. In such situations make sure the BIOS and filter drives are up to date and then try disabling the anti-virus software temporarily to see the change.
If "% User Time" is high then there is something consuming of SQL Server.
There are several known patterns which can be caused high CPU for processes running in SQL Server including

Is relational database appropriate for soft real-time system?

I'm working on a real-time video analysis system which processes the video stream frame by frame. At each frame it can generate several events which should be recorded and some delivered to another system via network. The system is soft real-time, i.e. message latencies higher than 25ms are highly undesirable, but not fatal.
Are relational databases (specifically, MySQL and Postgres) appropriate as the datastore for such system?
Can I expect the DB to work well when it is installed on its own server and has ~50 25 fps streams of single-row SQL inserts (~1,250 inserts per second) coming in over the network?
EDIT: I think in general performance would not be a problem, but I worry about the latency variance. If it will occasionally delay for 1000 ms, that would be very bad.
Oh, and the system runs 24/7 so the DB could grow arbitrarily big. Does that degrade the insert latency?
I wouldn't worry too much about performance when choosing a relational database over another type of datastore; choose the solution that best meets your requirements for accessing that data later. However, if you do choose not only an RDBMS but one over the network, then you might want to consider buffering events briefly to a local disk on their way over to the DB. Use a separate thread or process to push events into the DB, to keep the real-time system unaffected.
The biggest problems are how unpredictable the latency will be, and that it only ever goes up, never down. But modern hardware comes to the rescue: specify a machine with enough CPU cores. You can count on at least two; getting four is easy. So you can spin up a thread and dedicate one core to the dbase updates, isolating them from your soft real-time code. Now you don't care about the variability in the delays, at least as long as the dbase updates don't take so long that you generate data faster than that thread can consume it.
Set up a dbase server and load it up with fake data, double the amount you think it will ever need to store. Test continuously while you develop, and add the instrumentation code you need to measure how it is doing at an early stage in the project.
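A rough sketch of that instrumentation, in T-SQL like the lock example below (the table and numbers are made up; the same idea ports to MySQL or Postgres):
-- hypothetical event table for the fake-data test
CREATE TABLE dbo.FrameEvents (
    Id        bigint IDENTITY PRIMARY KEY,
    StreamId  int NOT NULL,
    FrameNo   int NOT NULL,
    CreatedAt datetime2 NOT NULL DEFAULT SYSUTCDATETIME()
);

-- time one second's worth of single-row inserts (50 streams x 25 fps)
DECLARE @i int = 0, @t0 datetime2 = SYSUTCDATETIME();
WHILE @i < 1250
BEGIN
    INSERT INTO dbo.FrameEvents (StreamId, FrameNo)
    VALUES (@i % 50, @i / 50);
    SET @i += 1;
END
SELECT DATEDIFF(millisecond, @t0, SYSUTCDATETIME()) AS batch_ms;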
As I've written, if you queue the rows that need to be saved and save them in an async way (so as not to stall the "main" thread), there shouldn't be any problem... BUT!!!
You want to save them in a DB... so someone else may be reading the rows AT THE SAME TIME they are being written. Sadly, it's normally quite difficult to tell a DB "this work is very high priority; everything else can be stalled, but not this". So if someone does:
BEGIN TRANSACTION
SELECT COUNT(*) FROM SomeTable WITH (TABLOCK, HOLDLOCK)
WAITFOR DELAY '01:00:00'
(I'm using T-SQL here... but I think it's quite clear: take a shared lock on the table via the COUNT(*), hold it for the whole transaction with HOLDLOCK, and then WAITFOR an hour)
then the writes could be stalled and time out. In general, if you configure everyone but the app to be able to do only reads, these problems shouldn't be present.

Load Testing a Database attached to a Network storage device

We are looking at stress testing a NAS system for a database; basically we want to see how much abuse it can take and how much it affects database performance. Here is what we have planned:
I have a test tool I'm building that will kick off a configurable number of threads that run a SQL query (also configurable, and I'm thinking about having it able to run multiple queries)
using the SQLIOSim utility to simulate SQL Server activity
Copying very large amounts of Data onto and off of the device (at the same time)
Can anyone think of anything else we could do (that's repeatable) for placing a load on the system?
You'll also want to simulate network conditions between your database and the NAS. As more traffic hits a network, its realizable utilization drops, and this will seriously affect your performance.
By way of example: if you have 50 machines on a 1 Gbps network and the network is approaching 100% utilization, packet collisions and retries at the data link layer mean that your effective total transfer is a fraction of the network potential that would be realized if you had only two communicators on the net. Worse yet, as retries increase, so does the effective load. You get an ugly feedback loop in the face of peak demand.
There are a number of network traffic simulators and generators out there, though I'm afraid I've never used any of them.
You could look at Pole Position, a generic database performance testing suite.
http://www.polepos.org/
Depending on the goals you wish to achieve with your load testing, you may also want to look at using SQLIO and not just SQLIOSim. SQLIOSim is very good for stress testing and simulating SQL Server load, and will go from green to red if it detects any IO errors. Its output is a bit cryptic, though KKline gives some insight.
SQLIO is useful if you want to perform one operation continuously, such as large random reads or large sequential writes, and anything in between. It will also give you some useful output stats which you can graph and use as a comparison.
You can try IoMeter from Intel
