I can't find anything on this with Google. My SQL Server is on a VM, and for some reason the system clock wanders from the domain time, up to ~30 seconds. This happens randomly 0 to 3 times per week. I have been hounding my VM admin for months about this and he can't seem to find the cause. He has set the server to check with the domain time every 30 minutes, but this does not stop the wandering; it just corrects it sooner.
Luckily the system generates only a few transactions per hour, so a 30-second time jump is unlikely to put any records out of order based on their DATETIME fields.
The VM side is out of my hands and this has been going on for months, so my question is: can changing the system time cause corruption in the SQL data files, or some other problem I should be keeping an eye out for?
Timekeeping in virtual machines is quite different from physical machines. Basically, a physical machine's system clock works by counting processor cycles, but a virtual machine can't do it that way. More info here. So what you are seeing is normal behaviour for a VM; it's one of the fundamentals of virtualisation, and although it's annoying there is nothing you can do about it. We run plenty of SQL Servers on VMs and yes, the clock jumps when it syncs, but to my knowledge it has never caused an issue.
Related
In my application we store the created datetime (in UTC) in the database. This works correctly when the application runs on the local machine, but when the same application runs from Azure there is roughly a +2 minute difference compared with the locally executed app.
The same issue occurs between SQL Server (on-premises) and Azure SQL.
A "+2 min difference" sounds like it may be due to differences in the system clocks between the two systems.
Your question doesn't specify the source of the "created datetime (in UTC)".
Is that from a database function, or from your application?
The most likely explanation for the behavior you observe is that system clocks on the two different systems are not synchronized using the same time service.
A four-dollar Timex watch keeps better time than the hardware clock in a $4,000 server. (I'm surprised the drift is only two minutes.) If you want the clocks on the two systems to match, there needs to be a mechanism keeping them synchronized with each other.
FOLLOWUP
I believe the answer above addressed the question you asked.
You may have some additional questions. I think the question you may be looking for an answer to might be... "How do I configure multiple Windows servers so the system clocks are synchronized?"
Some suggestions
Windows Time Service (Does Microsoft provide a mechanism?)
NTP = Network Time Protocol (Does Azure support NTP?)
time.windows.com (What is the default time source on Azure?)
once a week - (What is the default frequency ...
etc.
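As a concrete starting point for those questions, Windows exposes its time configuration through the built-in w32tm utility. A sketch, run from an elevated prompt (the peer list value is an example; check your domain's policy before changing anything):

```shell
# Show the current sync status and the configured time source
w32tm /query /status
w32tm /query /source

# Point the machine at an explicit NTP source and force a resync
w32tm /config /manualpeerlist:"time.windows.com" /syncfromflags:manual /update
w32tm /resync
```

On a domain-joined machine the default is usually to sync from the domain hierarchy rather than a manual peer list, so the `/config` step may not be appropriate there; the `/query` commands are safe either way.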
I'm running soak tests at the moment and keep coming up against a weird issue that I've never seen before. I've spent quite a while investigating it and so far haven't got to the bottom of it.
At some point during the test (sometimes 1 hour in, other times 4+ hours) the SQL Server machine starts maxing out its CPU. This always corresponds with a sharp decrease in DB cache memory and an increase in free memory.
The signs obviously point at memory pressure and it seems that I can sometimes trigger this event by running a particularly heavy query.
I can understand why the plan cache is being flushed however the aspects of this that are confusing me are:
After the plan cache is flushed and my meaty query finishes, there is plenty of free memory (even after further increasing the amount of memory SQL Server is allowed), yet the plan cache doesn't seem to recover. I'm left with loads of free memory which isn't helping anyone.
If I stop my soak test and then re-run it immediately, things go back to normal and the plan cache grows as expected. SQL Server does not need to be restarted or have any settings altered.
After the cache flush the cache hit ratio is still OK-ish, ~90%; however, this is much lower than the ~99% I see before the flush, and it's really hurting the CPU.
Before the flush, a trace of cache misses, inserts and hits looks normal enough. The only issue I see pre-flush is a non-parameterised ad-hoc query that's being inserted into the cache very frequently; even so, it's a very simple, low-cost query, so I'd expect it to be flushed from the cache ahead of most other things.
Post flush I'm seeing a very high number of inserts followed immediately by numerous misses on the same object (i.e. stored procedures), and thus memory consumption for the cache remains low.
You can see from the yellow line in the shot of my counters below that the cache memory usage drops off and stays low yet the free memory (royal blue) stays fairly high.
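For anyone wanting to see where that memory is actually sitting while the cache stays deflated, the memory clerk DMV breaks it down by consumer. A sketch (the DMV is standard; the `pages_kb` column applies to SQL Server 2012 and later, as noted in the comment):

```sql
-- Top memory consumers; CACHESTORE_SQLCP and CACHESTORE_OBJCP hold the plan cache
SELECT TOP (10)
       [type],
       SUM(pages_kb) / 1024 AS memory_mb  -- on 2008 R2 and earlier use single_pages_kb + multi_pages_kb
FROM sys.dm_os_memory_clerks
GROUP BY [type]
ORDER BY memory_mb DESC;
```

Watching those two cache stores alongside the free-memory counter would confirm whether the plan cache really is refusing to grow rather than being repeatedly evicted.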
EDIT
After looking into this issue for another good while, a pattern that keeps appearing is that if I push the server to its limit for a short time (adding load above what the soak test is producing), then SQL Server seems to get itself into a mess it can't recover from on its own.
The number of connections to the server sharply increases when it hits the point of maximum pressure (I'm assuming due to it not being able to deal with requests quickly enough so new connections are needed to deal with the "constant" flow of requests). This backlog is then placing further pressure on the server which it doesn't appear to be able to recover from.
Now, I'm still puzzled by the metrics. I could accept this as purely a server resource issue if the new connections seemed to be eating up memory, further slowing processing, causing new connections, etc. What I am seeing though is that there is plenty of free memory but SQL Server isn't using it for the plan cache. Because of this it's spending more time compiling, upping CPU and things spiral out of control.
It feels like the connections are a key part of this problem. As mentioned before, if I restart the test everything goes back to normal. I've since found that putting the DB into single-user mode for a few seconds (so that all test-related connections die), waiting a few seconds, and then going back to multi-user mode resolves the issue. I've tried just killing all active connections by SPID, however it seems there needs to be a pause of a few seconds for the server to recover and start using the plan cache properly.
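For reference, that single-user bounce can be scripted. A sketch (the database name is a placeholder, and ROLLBACK IMMEDIATE kills in-flight transactions, so only do this against a test database):

```sql
ALTER DATABASE [MySoakTestDb] SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
WAITFOR DELAY '00:00:05';  -- the pause that seems to be needed before recovery
ALTER DATABASE [MySoakTestDb] SET MULTI_USER;
```

Run it from a connection that is not itself using the target database, otherwise the ALTER will block on your own session.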
See screenshot below of my counters. I'm trying to push the server over the top up to ~02:33:15 and I set to single user mode at ~02:34:30 and then multi-user mode a few seconds after.
Purple line is user connections, thick red is compilations p/s, bright green is cache memory, aqua connection memory, greyish/brown is free memory.
OK, it's been a long circular road but the best answer I currently have for this is that this issue is due to resource constraints and the unfortunate choices that SQL Server makes in relation to the plan cache for my particular circumstances. I'm not saying SQL Server is wrong, just that for my needs at this time I don't think it's making the right decisions.
I've adjusted my soak test so that if the DB server comes under pressure it pulls on the reins a bit and drops some connections, until the server comes back under control and the additional connections can be re-established. The process of SQL Server getting itself back in order can take a few minutes, but it does happen!
It seems that the server was getting itself into a vicious cycle, where it was coming under pressure, dropping cached plans, and then having to spend more on recompiling those plans later than it gained by dropping them in the first place. This led to things spiraling out of control and everything grinding to a halt.
In my particular case there is a very high cache hit ratio (above 99.5%) and due to the soak test basically doing the same thing repeatedly for hours for loads of users the cache is very well used. If the cache weren't so well used then SQL Server would have quite possibly made the right choice by dropping plans but I don't think it did here.
We are experiencing seemingly random timeouts on a two app (one ASP.Net and one WinForms) SQL Server application. I had SQL Profiler run during an hour block to see what might be causing the problem. I then isolated the times when the timeouts were occurring.
There are a large number of reads, but there is no big difference in the reads between when the timeout errors occur and when they don't. There are virtually no writes during this period (primarily because everyone is getting timeouts and can't write).
Example:
Timeout occurs at 11:37. There are an average of 1,500 transactions a minute leading up to the timeout, with about 5,709,219 reads.
That seems high, EXCEPT that during a period in between timeouts (over a ten-minute span) there are just as many transactions per minute and the reads are just as high. The reads do spike a little before the timeout (jumping up to over 6,005,708), but during the non-timeout period they go as high as 8,251,468. The timeouts are occurring in both applications.
The bigger problem here is that this only started occurring in the past week and the application has been up and running for several years. So yes, the Profiler has given us a lot of data to work with but the current issue is the timeouts.
Is there something else that I should be possibly looking for in the Profiler or should I move to Performance Monitor (or another tool) over on the server?
One possible culprit might be the Database Size. The database is fairly large (>200 GB) but the AutoGrow setting was set to 1MB. Could it be that SQL Server is resizing itself and that transaction doesn't show itself in the profiler?
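A 1 MB autogrow on a >200 GB database would mean very frequent grow events, so it is worth checking and fixing. A sketch (database and logical file names are placeholders for yours):

```sql
-- growth is in 8 KB pages unless is_percent_growth = 1
SELECT DB_NAME(database_id) AS db,
       name AS logical_file,
       growth * 8 / 1024 AS growth_mb,
       is_percent_growth
FROM sys.master_files
WHERE DB_NAME(database_id) = 'MyBigDb';

-- Bump the data file to grow in larger fixed increments
ALTER DATABASE [MyBigDb]
    MODIFY FILE (NAME = MyBigDb_Data, FILEGROWTH = 512MB);
```

Enabling instant file initialization for the service account also makes data-file growth far cheaper, though log-file growth always has to be zeroed.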
Many thanks
Thanks to the assistance here, I was able to identify a few bottlenecks but I wanted to outline my process to possibly help anyone going through this.
The #1 problem was found to be a high number of LCK_M_S (shared lock) waits, found via SQLDiag and other tools.
Run the Trace Profiler over two different periods of time. Comparing durations for similar methods led me to find that certain UPDATE calls were always taking the same amount of time, over 10 seconds.
Further investigation found that these UPDATE stored procs were updating a table with a trigger that was taking too much time. Since a trigger may hold locks on the table while it completes, it was affecting every other query. (See the comment section - I had incorrectly stated that the trigger would always lock the table; in our case, the trigger was preventing the lock from being released.)
Watch the use of Triggers for doing major updates.
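If you suspect something like a trigger is holding locks, the blocking chain is visible live in the DMVs. A sketch of the kind of query that helps here (standard DMVs; run it while the timeouts are happening):

```sql
-- Who is blocked, by whom, and what they are running
SELECT r.session_id,
       r.blocking_session_id,
       r.wait_type,   -- LCK_M_S etc. for lock waits
       r.wait_time,   -- ms spent waiting so far
       t.text AS running_sql
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.blocking_session_id <> 0;
```

Following `blocking_session_id` up the chain usually lands you on the one session (here, the UPDATE firing the slow trigger) that everything else is queued behind.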
I have an application that I'd like to make more efficient - it isn't taxing any one resource enough that I can identify it as a bottleneck, so perhaps the app is doing something that is preventing full efficiency.
The application pulls data from a database on one SQL Server instance, does some manipulation on it, then writes it to a database on another SQL Server instance - all on one machine. It doesn't do anything in parallel.
While the app is running (it can take several hours), none of the 4 CPU cores are maxed out (they hover around 40-60% utilization each), the disks are almost idle and very little RAM is used.
Reported values:
Target SQL Server instance: ~10% CPU utilization, 1.3GB RAM
Source SQL Server instance: ~10% CPU utilization, 300MB RAM
Application: ~6% CPU utilization, 45MB RAM
All the work is happening on one disk, which writes around 100KB/s during the operation, on average. 'Active time' according to Task Manager is usually 0%, occasionally flickering up to between 1 and 5% for a second or so. Average response time, again according to Task Manager, moves between 0ms and 20ms, mainly sitting between 0.5 and 2ms.
Databases are notorious for IO limitations. Now, seriously, as you say:
The application pulls data from a database on one SQL Server instance,
does some manipulation on it, then writes it to a database on another
SQL Server instance - all on one machine.
I somehow get the idea this is an end-user-level machine, maybe a workstation. Your linear code (a bad idea for getting full utilization, by the way, as you never run all 3 parts - read, process, write - in parallel) will be seriously limited by whatever IO subsystem you have.
But that will not come into play as long as you can state:
It doesn't do anything in parallel.
What it must do is do things in parallel:
One task is reading the next data
One task does the data processing
One task does the data writing
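The three-stage split above can be sketched with nothing more than threads and bounded queues; this is an illustrative skeleton, not the poster's actual code, and the `rows`, `process`, and `write` callables stand in for the real database reader, transformation, and writer:

```python
import queue
import threading

def run_pipeline(rows, process, write, buffer_size=100):
    """Run read -> process -> write as three concurrent stages.

    `rows` is any iterable standing in for the source-DB cursor;
    `process` transforms one row; `write` stands in for the target-DB
    insert. A None sentinel marks end-of-stream on each queue.
    """
    to_process = queue.Queue(maxsize=buffer_size)  # read -> process hand-off
    to_write = queue.Queue(maxsize=buffer_size)    # process -> write hand-off
    results = []

    def reader():
        for row in rows:             # in reality: fetch from the source instance
            to_process.put(row)
        to_process.put(None)

    def processor():
        while (row := to_process.get()) is not None:
            to_write.put(process(row))
        to_write.put(None)

    def writer():
        while (item := to_write.get()) is not None:
            results.append(write(item))  # in reality: INSERT into the target instance

    threads = [threading.Thread(target=f) for f in (reader, processor, writer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

The bounded queues give you back-pressure, so a slow writer throttles the reader instead of buffering the whole table in RAM; to use more cores, the single `processor` stage can be replicated into a pool.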
You can definitely max out a lot more than your 4 cores. Last time I did something like that (read / manipulate / write) we were maxing out 48 cores with around 96 or so processing threads running in parallel (and a smaller number doing the writes). But the core of that is that your application must actually start using multiple CPUs.
If you do not parallelize:
You will only max out one core at most,
You basically waste time waiting for the databases on both ends. The latency while you wait for data to be read or committed is time in which you are not processing anything.
;) And once you fix that you will get IO problems. Promised.
I recommend reading How to analyse SQL Server performance. You need to capture and analyze the wait stats. These will tell you what is the execution doing that prevents it from going max out on CPU. You already have a feeling that the workload is causing the SQL engine to wait rather than run, but only after you understand the wait stats you'll be able to get a feel what is waiting for. Follow the article linked for specific analysis techniques.
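As a first pass at the wait stats, the instance-wide accumulated view is sys.dm_os_wait_stats. A sketch (the excluded wait types are a few common benign/idle ones, not an exhaustive list; the linked article has a much fuller filter):

```sql
SELECT TOP (10)
       wait_type,
       wait_time_ms,
       signal_wait_time_ms,  -- time spent waiting for CPU after the resource was ready
       waiting_tasks_count
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN ('SLEEP_TASK', 'LAZYWRITER_SLEEP',
                        'BROKER_TO_FLUSH', 'CLR_AUTO_EVENT',
                        'WAITFOR', 'XE_TIMER_EVENT')
ORDER BY wait_time_ms DESC;
```

These counters accumulate from instance startup, so snapshot them before and after a run and diff the two to see what your workload specifically waited on.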
We have a SQL Server with about 40 different databases (about 1-5GB each). The server has an 8-core 2.3GHz CPU with 32GB of RAM, 27GB of which is pinned to SQL Server. CPU utilization is almost always close to 100% and memory consumption is about 95%. The problem here is the CPU, which is constantly close to 100%, and I'm trying to understand the reason.
I have run an initial check to see which database contributes to high CPU by using this script, but I could not substantiate in detail what's really consuming CPU. The top query (from all DBs) only takes about 4 seconds to complete. IO is also not a bottleneck.
Would memory be the culprit here? I have checked the memory split, and the OBJECT CACHE occupies about 80% of the memory allocated (27GB) to SQL Server. I hope that is normal given there are a lot of SPs involved. Running Profiler, I do see a lot of recompiles, but mostly due to "temp table changed", "deferred compile", etc., and I am not clear whether these recompiles are a result of plans getting thrown out of cache due to memory pressure.
Appreciate any thoughts.
You can see some reports in SSMS:
Right-click the instance name / reports / standard / top sessions
You can see top CPU consuming sessions. This may shed some light on what SQL processes are using resources. There are a few other CPU related reports if you look around. I was going to point to some more DMVs but if you've looked into that already I'll skip it.
You can use sp_BlitzCache to find the top CPU consuming queries. You can also sort by IO and other things as well. This is using DMV info which accumulates between restarts.
This article looks promising.
Some stackoverflow goodness from Mr. Ozar.
edit:
A little more advice...
A query running for 'only' 5 seconds can be a problem. It could be using all your cores and really running 8 cores times 5 seconds - 40 seconds of 'virtual' time. I like to use some DMVs to see how many executions have happened for that code to see what that 5 seconds adds up to.
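The kind of DMV check described above can be done against sys.dm_exec_query_stats, which accumulates per cached plan. A sketch:

```sql
-- Total worker (CPU) time per cached query since the plan was cached
SELECT TOP (10)
       qs.execution_count,
       qs.total_worker_time / 1000 AS total_cpu_ms,
       qs.total_worker_time / qs.execution_count / 1000 AS avg_cpu_ms,
       SUBSTRING(t.text, 1, 200) AS query_start
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS t
ORDER BY qs.total_worker_time DESC;
```

A query with a modest per-execution cost but a huge `execution_count` will float to the top here, which is exactly the "5 seconds times thousands of runs" situation.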
According to this article on sqlserverstudymaterial;
Remember that "% Privileged Time" is not based on 100%. It is based on the number of processors. If you see 200 for sqlserver.exe and the system has 8 CPUs, then the CPU consumed by sqlserver.exe is 200 out of 800 (only 25%).
If the "% Privileged Time" value is more than 30%, it's generally caused by faulty drivers or anti-virus software. In such situations, make sure the BIOS and filter drivers are up to date, and then try disabling the anti-virus software temporarily to see whether anything changes.
If "% User Time" is high, then something inside SQL Server is consuming the CPU.
There are several known patterns which can cause high CPU for processes running in SQL Server, including: