SQL Server long running query taking hours but using low CPU - sql-server

I'm running some stored procedures in SQL Server 2012 under Windows Server 2012 in a dedicated server with 32 GB of RAM and 8 CPU cores. The CPU usage is always below 10% and the RAM usage is at 80% because SQL Server has 20 GB (of 32 GB) assigned.
There are some stored procedures that are taking 4 hours some days and other days, with almost the same data, are taking 7 or 8 hours.
I'm using the least restrictive isolation level so I think this should not be a locking problem. The database size is around 100 GB and the biggest table has around 5 million records.
The processes have bulk inserts, updates and deletes (in some cases I can use truncate to avoid generating logs and save some time). I'm making some full-text-search queries in one table.
I have full control of the server so I can change any configuration parameter.
I have a few questions:
Is it possible to improve the performance of the queries using
parallelism?
Why is the CPU usage so low?
What are the best practises for configuring SQL Server?
What are the best free tools for auditing the server? I tried one
from Microsoft called SQL Server 2012 BPA but the report is always
empty with no warnings.
EDIT:
I checked the log and I found this:
03/18/2015 11:09:25,spid26s,Unknown,SQL Server has encountered 82 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [C:\Program Files\Microsoft SQL Server\MSSQL11.HLSQLSERVER\MSSQL\DATA\templog.ldf] in database [tempdb] (2). The OS file handle is 0x0000000000000BF8. The offset of the latest long I/O is: 0x00000001fe4000

Bump up max memory to 24 gb.
Move tempdb off the c drive and consider mult tempdb files, with auto grow at least 128 Mbps or 256 Mbps.
Install performance dashboard and run performance dashboard report to see what queries are running and check waits.
If you are using auto grow on user data log and log files of 10%, change that to something similar to tempdb growth above.
Using performance dashboard check for obvious missing indexes that predict 95% or higher improvement impact.
Disregard all the nay Sayers who say not to do what I'm suggesting. If you do these 5 things and you're still having trouble post some of the results from performance dashboard, which by the way is free.
One more thing that may be helpful, download and install the sp_whoisactive stored proc, run it and see what processes are running. Research the queries that you find after running sp_whoisactive.

query taking hours but using low CPU
You say that as if CPU would matter for most db operations. HINT: They do not.
Databases need IO. RAM sin some cases helps mitigate this, but at the end it runs down to IO.
And you know what I see in your question? CPU, Memory (somehow assuming 32gb is impressive) but NO WORD ON DISC LAYOUT.
And that is what matters. Discs, distribution of files to spread the load.
If you look into performance counters then you will see latency being super high on discs - because whatever "pathetic" (in sql server terms) disc layout you have there, it simply is not up to the task.
Time to start buying. SSD are a LOT cheaper than discs. You may say "Oh, how are they cheaper". Well, you do not buy GB - you buy IO. And last time I checked SSD did not cost 100 times the price of discs - but they have 100 times or more the IO. and we talk always of random IO.
Then isolate Tempdb on separate SSD - tempdb either does no a lot or a TON and you want to see this.
Then isolate the log file.
Make multiple data files, for database and tempdb (particularly tempdb - as many as you have cores).
And yes, this will cost money. But at the end - you need IO and like most developers you got CPU. Bad for a database.

Related

SQL Server 2008 sudden high IO Stall and queries dead in water

SQL Server 2008 Enterprise SP4 0.0.6547.0 x64
Running on Windows 2012R2 patched current.
A VM running on Cisco UCM blades and 6.0 Update 3 plus patches.
A Nimble CS700 SAN for the storage.
This is a large OLTP server with 12 vCPU. Normal CPU usage hovers around 6-11%
What happens is that, without warning, the IO Stall times will go through the roof (2000-1000ms) and most queries will stop returning results. Adam Machanic's sp_whoisactive will show dozens of active queries. CPU is at 90+%.
SAN shows almost zero activity and all other VMs on the same SAN are operating optimally.
We see massive blocking as the stalled processes hold blocks, with some timing out and sleeping with blocks hanging on the SPID. Killing the SPIDs in question provides temporary relief, but seconds later we are right back where we started.
The only thing that provides relief is a reboot of the server.
Management is rightly demanding an actual root cause. When this happened last summer, with visibility to the CEO level, we engaged Microsoft support, who were dumbfounded and offered no actual root cause.
What I can't do is upgrade the SQL server. The machine hosts a packaged application and the package publisher refuses to support their software if we implement any newer SQL Server version. I desperately want to go to 2014/2016/2017, and would feel that it would solve this problem and others.
In any event, I searched the bug reports and did not see anything that matched.
Has anyone run into this issue? If so did you suss out a root cause? I have a gut feel that there is a bug in either SQL 2008, Windows 2012R2 or how they interact. But I don't want to write that into the RCA without having some corroboration.
Would appreciate any pointers.
Here is my approach
1.) Try eliminate storage issues.We once had a storage issue(SAN) and root cause seemed to be some HBA.You can further check if your storage is performing with in acceptable limits
You should start with below counters and see if they are less than 15ms
Avg. Disk sec/Read - is the average time, in seconds, of a read of data from the disk.
Avg. Disk sec/Write - is the average time, in seconds, of a write of data to the disk.
There is more info here :https://www.mssqltips.com/sqlservertip/2460/perfmon-counters-to-identify-sql-server-disk-bottlenecks/
2.) Once you have eliminated storage issues, you can further check if SQLSERVER is the only causing IO spikes or if there are any other applications causing IO.You can use resource monitor to find this
3.) If you have reached here, SQLSERVER may be culprit..Go with below steps and try following same sequence and see if problem persists after each step.
Remember HIGH IO can be caused due to
Stale stats and missing indexes:You might not be updating stats regularly or some type of queries might need more frequent index rebuilds/stats update
gather queries causing HIGH IO and try tuning them,you can observe number of reads done and try adding indexes to minimize number of reads
Further Check memory pressure,some times high memory usage can cause Buffer pool flush and there by queries will go to disk..You can look for a counter called PLE and see what is good for your environment
Further research pointed to VMWare. The machine was allocated 304GB of RAM, 264GB of which was assigned to SQL Server. However the underlying host was overcommitted on RAM by a large amount. We suspect thrashing as page life drops, and as other VMs also need real RAM.
Thanks
John.

Dynamics GP 2010 Awful Report Performance

We are running Dynamics GP 2010 on 2 load balanced citrix servers. For the past 3 weeks we have had severe performance hits when users are running Fixed Assets reporting.
The database is large in size, but when I run the reports locally on the SQL server, they run great. The SQL server seems to be performing adequately even when users are seeing slow performance.
Any ideas?
Just because your DB seems un-stressed, it does not mean that it is fine. It could contain other bottlenecks. Typically, if a DB server is not maxing-out its CPUs occasionally, it means there is a much bigger problem.
Standard process for troubleshooting performance problems on a data driven app go like this:
Tune DB indexes. The Tuning Wizard in SSMS is a great starting point. If you haven't tried this yet, it is a great starting point.
Check resource utilization: CPU, RAM. If your CPU is maxed-out, then consider adding/upgrading CPU or optimize code or split your tiers. If your RAM is maxed-out, then consider adding RAM or split your tiers.
Check HDD usage: if your queue length goes above 1 very often (more than once per 10 seconds), upgrade disk bandwidth or scale-out your disk (RAID, multiple MDF/LDFs, DB partitioning).
Check network bandwidth
Check for problems on your app (Dynamics) server
Shared report dictionaries are the bane of reporting in GP. they do tend to slow things down. also, modifying reports becomes impossible as somebody has it open all the time.
use local report dictionaries and have a system to keep them synced with a "master" reports.dic

Performance tuning of ERP System called axavia

I'm working here in a small company and one of my jobs is the administration of the ERP system 'AXAVIA' (www.axavia.com)
There are .NET Clients and a MSSQL Server 2005 Database with a size of about 10GB.
The system works on a metadata model, this means they have very few tables (one for each datatype and some for the relations) and this data is computed with adhoc queries. Up to 2000 batches / sec...
I guess they don't really hava a database specialist, because the didn't know anything about index fragmentation and i allready deleted a lot of unused indexes - now the db is about 30% smaller...
What else can i do for more performance?
- I rebuild now the indexes every night
I think, there are no 'missing indexes' and also the primary keys are at least 'ok'
The filesystem is a fast 10 raid - and with 6,6 GB Ram there is very little IO
The Server is a VM Ware with one virtual CPU - here i guess is the beste possibility: The huge ammount of small batches would benefit from a phyical cpu with 4 cores?!
I'm also thinking about partitioned tables, but in the moment the database isn't big enough to benefit much from this.
So - any other ideas?
Add a CPU, at lesat for test. I Would say you likely run into a problem here. Generally - and I mean really in general - I never have one core VMS anymore. Even the smallest machine has 2 cores. Makes thigns a lot faster even on windows level (OS operations ahppen on the second core).
10gb is tiny today. Still there is no database crappy programming can not kill (and it is likely in your case that is a lot of crappy programming going on, from your explanations). Start a full analysis of why things are waiting. If they are just hitting he server with a lot of sequential SQL for any operation the only thing you can do is make sure (a) you have as little waits as possible and (b) you have as fast a CPU as possible. In a sdatabase like you describe it the problem is seriously in the program - and basically there is only so much you can tune down at the database level.
If not already, have your data and log files on seperate drives. You can also move your tempdb to it's own drive, and also split it into multiple files. Read Brent's piece on tempdb here: Brent Ozar
I suggest you to use Glenn Berry's script to determine troubles in your server:
https://dl.dropboxusercontent.com/u/13748067/SQL%20Server%202005%20Diagnostic%20Information%20Queries(September%202014).sql
There are many another potential problems, not only missing indexes.
I was used this script as knowledge database to create my own tool to check my ERP health. And I can tell you it works well.

SQL Server Performance Problem

Our primary database server is an 8 core box with 8GB of RAM. The CPU is a Xeon E7330 # 2.4GHz. It runs Windows Server 2003 R2 (x64 edition) and SQL Server 2005
I wanted to do some testing so I set up SQL Server 2005 on another brand-new server which is an 8 core box with 4 GB of RAM. It has a Xeon X5460 # 3.16GHz and runs Windows Server 2003 R2 Standard. I Installed SQL Server 2005 out of the box and restored a backup of the primary database on to it, and did an UPDATE STATISTICS on all the tables.
The process I was testing executes the same stored proc many times. I was astounded to find from the profiler that this proc which executes with duration=0 or 1 on the primary server, was consistently executing with durations in excess of 130. This essentially makes the secondary server useless for testing, because it's just too slow.
No other apps run on either of these two boxes, just SQL server. And unlike the primary database server, the test server only had me accessing it.
I can't believe the difference in spec between these two machines explains this colossal difference in performance. Can anybody suggest any settings I may need to change?
Updates in answers to questions:
Second server is 32 bit Windows
I'm inquiring now about the disk arrays and how comparable they are
On the primary server, the data and logs are on the same drive (!) and it works fine
Looking in task manager on the test server, the CPU is running at like 10%, only one core even showing activity
Task manager on the test server (4GB RAM) shows "PF Usage 2.01GB" with SQL Server running. On the primary server (8GB RAM) it shows "PF Usage 6.67GB". How would I make SQL Server on the test box use more of the RAM? Maybe that would make a difference
Another update:
The primary server has a RAID-5 with 15,000 RPM drives. The test box has a RAID-5 with 10,000 RPM drives.
32 bit OS means 2 GB Virtual Address Space for your processes. Standard edition OS mean no AWE extensions either. So your test machine will be severely RAM deprived compared with the production one. Your buffer pool will suffer from premature eviction of the pages, your execution plans will not have the option to choose hash-joins for a lot of queries and so on and so forth. I doubt this explains the entire difference, I'm sure there must be something more at play. You say only 10 CPU usage during the query, is your MAXDOP setting 1 by any chance on the test server? Have you compared the output of sp_configure on the two machines? (make sure you enable 'advanced options' too).
Can you run the same problem query on the two machines, from a SSMS query window, with SET STATISTICS IO ON and SET STATISTICS TIME ON? Run it 2-3 times on each and write down the results. Does it show the same number of logical reads but vastly different number of physical reads? This would point to the RAM being insufficient to cache the needed pages. IS the number of logical reads very different? It probably means you get a bad execution plan on test.
Is the query write intensive by any chance? If so did you pre-grow the test database or is your execution blocked by log growth and database growth events?
There are plenty of places to look at to narrow down the issue, like SQL performance counters, sys.dm_os_wait_stats, check the sys.dm_exec_requests wait_type and wait_resource.
was the data in the memory cache yet? or was it all read from disk
You either have a different plan being generated or some hardware differences. For hardware you can check the disk seconds/[read,write] (edit to clarify - you do this in perfmon) and see if you have some massive differences from caching (e.g. high perf raid controller).
For the plan difference just check out the execution plans.
Also do set statistics io on and see if you are getting physical reads instead of logical reads. Maybe the mem difference is keeping your dataset from fitting in memory in secondary but not primary machine.
Although you may not be able to use AWE on your 32-bit server, you can provide SQL Server with a little more memory by adding the /3GB switch to the boot.ini file. Check out Books Online, it should give you more information.

SQL Server 2005 Memory Pressure and tempdb writes problem

We are having some issues with our production SQL Server.
Server: Dual Quad Core Xeon
8 GB RAM
Single RAID 10 Array
Windows 2003 Server 64-bit
SQL Server 2005 Standard 64-Bit
There is about 250MB of free RAM on the machine right now. SQL Server has around 6GB of RAM, and our monitoring software says that only half of the SQL Server allocated RAM is actually being used.
Our main database is approximately 20GB, with about 12GB being used with any frequency. Our tempdb is at 700MB. Both are located on the same physical disk array.
Additionally, using Filemon, I was able to see that the tempdb file had 100's or 1000's of writes of length 65536. Disk queue length was over 100 nearly 80% of the time.
So, here are my questions-
What would cause all those writes on the tempdb? I'm not sure if we have always had that much activity, but it seems excessive and these problems are recent.
Should I just add more memory to the server?
On high load servers, should tempdb and db files be located on separate arrays?
A high disk queue length does not mean you have an I/O bottleneck if you have a SAN or NAS, you may want to look at other additional counters. Check out SQL Server Urban Legends discussed for more details.
1: The following operations heavily utilize tempdb
Repeated create and drop of temporary tables (local or global)
Table variables that use tempdb for storage purposes
Work tables associated with CURSORS
Work tables associated with an ORDER BY clause
Work tables associated with an GROUP BY clause
Work files associated with HASH PLANS
These SQL Server 2005 features also use tempdb heavily:
row level versioning (snapshotisolation)
online index re-building
As mentioned in other SO answers read this article on best practice for increasing tempdb performance.
2: Looking at the amount of free RAM on the server i.e. looking at the WMI counter Memory->Available Mbytes doesn't help as SQL Server will cache data pages in RAM, so any db server that's running long enough will have little free RAM.
The counters you should look at that are more meaningful in telling you if adding RAM to the server will help are:
SQL Server Instance:Buffer Manager->Page Life Expectancy (in seconds)
A value below 300-400 seconds will mean that Pages are not in memory very long and data continually is being read in from disks. Servers that have a low page life expectancy will benefit from additional RAM.
and
SQL Server Instance:Buffer Manager->Buffer Cache hit Ratio
This tells you the percentage of pages that were read from RAM that didn't have to incur a read from disk, a cache hit ratio lower then 85 will mean that the server will benefit from additional RAM
3: Yes, can't go wrong here. Having tempdb on a separate set of disks is recommended. Look at this KB article under the heading: Moving the tempdb database on how to do this.
Yes, the recommendation on high load servers is to put TempDB on a separate set of drives from the user databases:
SQL Server 2005 Books Online: Optimizing tempdb Performance
Not directly an answer on your question but this might be a good tip: Restarting your SQL Server instance will clear the tempdb, this might be a good start when investigating the actions which are done on the tempdb.
Excellent question, +1
tempdb is used far more heavily in SQL 2005+.
At least: Snapshot isolation levels, online index rebuild, reading INSERTED/DELETED in triggers(used to read the log file!)
This in addition to the usual order by clauses, temp tables etc.
You'd probably be better splitting your log and data files (also for recoverability).
More memory is always good but see this 64 bit specific stuff, Grumpy Old DBA below.
Finally, and maybe most important probably, you can have contention of space allocation in tempdb:
Explanations from Linchi Shea and SQL Server storage team
Late edit:
Paul Randall added an entry "Comprehensive tempdb blog post series" which offers good links
Writes to the tempdb can be anything. Internal hash tables, temp tables, table variable, stored procedure calls, etc.
If you only have 250 Megs of free RAM, then yes more RAM would be good.
It is always recommended that you split tempdb and user databases to different disks.
All writes to the tempdb will be 64k in size as that's the size of each database extent.

Resources