I have an issue where a full backup on an MSSQL server randomly takes ~2.5× the usual time to complete. Is there any option in SQL Server Management Studio, or a stored procedure, that would tell me what causes this giant slowdown?
I have many MSSQL servers running, and that particular one is internal only; the backup happens at 5 AM, while there is no one in the office until 7:30-8:00. The backup typically takes a steady 14 minutes 20 seconds (plus or minus 10 seconds), but once or twice per week it suddenly takes upward of 45 minutes.
The backup size is growing, but only slightly, around 40-50 MB per day, while the backup currently sits at around 21 GB. The daily transaction volume is stable in size too. When I have a slow backup, the previous day's transactions are no different in size from the day before, or from the other "normal" days.
The only logs I see simply give me the start time and end time of the maintenance plan, which is useless, as that is just the total runtime.
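One place to get per-backup timings beyond the maintenance-plan log is the backup history in msdb (assuming that history hasn't been purged). A sketch that compares duration and throughput of recent full backups, so the slow runs stand out:

```sql
-- Compare duration and throughput of recent full backups from msdb's history.
SELECT bs.database_name,
       bs.backup_start_date,
       bs.backup_finish_date,
       DATEDIFF(SECOND, bs.backup_start_date, bs.backup_finish_date) AS duration_s,
       bs.backup_size / 1048576.0 AS backup_size_mb,
       bs.backup_size / 1048576.0
         / NULLIF(DATEDIFF(SECOND, bs.backup_start_date, bs.backup_finish_date), 0) AS mb_per_sec
FROM msdb.dbo.backupset AS bs
WHERE bs.type = 'D'            -- 'D' = full database backup
ORDER BY bs.backup_start_date DESC;
```

If the slow runs show a similar size but a much lower MB/s figure, the bottleneck is likely the I/O path (disk, SAN, or network) rather than the database itself.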
MSSQL V18.7.1
The transaction log on our databases is backed up every hour.
The log file is set to auto-grow in 128 MB increments, up to a 5 GB maximum.
This runs smoothly, but sometimes we get an error in our application:
'The transaction log for database Borculo is full due to 'LOG_BACKUP'
We got this message at 8:15 AM, while the log backup had run (and emptied the log) at 8:01 AM.
I would really like a script or command to check what caused this exponential growth.
We could back up more often (every 30 minutes) or change the size limit, but that wouldn't solve the underlying problem.
Basically, this problem should not occur with the number of transactions we have.
Probably some task is running (in our ERP) which causes this.
This does not happen every day; in the last month this is the second time.
The transaction log I want to get info from is a backed-up one, not the active one.
Can anyone point me in the right direction?
Thanks
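To see which hourly interval actually produced the unusually large log, the log backup sizes recorded in msdb can be listed per backup. A sketch, using the database name "Borculo" from the error message:

```sql
-- List log backups for the affected database; the interval that produced
-- an abnormally large log backup will stand out in log_backup_mb.
SELECT bs.backup_start_date,
       bs.backup_size / 1048576.0 AS log_backup_mb
FROM msdb.dbo.backupset AS bs
WHERE bs.database_name = N'Borculo'
  AND bs.type = 'L'            -- 'L' = transaction log backup
ORDER BY bs.backup_start_date DESC;
```

Once you know the interval, you can cross-reference it against scheduled jobs (Agent, ERP batch tasks) that run in that window.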
An hourly transaction log backup means in case of a disaster you could lose up to an hour's worth of data.
It is usually advised to keep your transaction log backups as frequent as possible.
Every 15 mins is usually a good starting point. But if it is a business critical database consider a transaction log backup every minute.
Also why would you limit the size for your transaction log file? If you have more space available on the disk, allow your file to grow if it needs to grow.
It is possible that the transaction log file is getting full because some maintenance task is running (index/statistics maintenance, etc.), and because the log file is not backed up for an entire hour, the log doesn't get truncated for an hour and the file reaches 5 GB in size. Hence the error message.
Things I would do to sort this out:
Remove the file size limit, or at least increase the limit to allow it to grow bigger than 5 GB.
Take transaction Log backups more frequently, maybe every 5 minutes.
Set the log file growth increment to at least 1 GB instead of 128 MB (to reduce the number of VLFs).
Monitor closely what is running on the server when the log file gets full, it is very likely to be a maintenance task (or maybe a bad hung connection).
Instead of setting a max limit on the log file size, set up some alerts to inform you when the log file is growing too much; this will allow you to investigate the issue without any interference or potential downtime for the end users.
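One way to implement such an alert is a small Agent job that polls log usage and raises a logged error that an alert/operator can pick up. A sketch; the 80% threshold is an arbitrary placeholder, and `sys.dm_db_log_space_usage` requires SQL Server 2012 or later:

```sql
-- Run in the context of the monitored database, e.g. from a scheduled Agent job.
DECLARE @pct int;

SELECT @pct = CONVERT(int, used_log_space_in_percent)
FROM sys.dm_db_log_space_usage;

IF @pct > 80   -- placeholder threshold
    RAISERROR(N'Transaction log is %d%% full.', 16, 1, @pct) WITH LOG;
```

Writing the error to the log (`WITH LOG`) lets a SQL Server Agent alert on that severity or message fire a notification to an operator.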
I run a database that has a table with several NTEXT columns containing short-lived data. We would normally keep only a few weeks of data in those tables, but in an effort to reduce disk usage this was cut down to only 72 hours.
While I expected to see some increase in backup performance, I was not expecting such a large one. The database dropped from 105 GB of storage to 99 GB with the change in data retention. Before the change, backups would take roughly 60 minutes; after the change, that dropped to 40 minutes.
I assumed that a 6% reduction in storage would bring an equal reduction in backup time, but it appears to have shaved off a third of the time required for the backup.
Because the majority of the data that was removed was NTEXT, does this have a much larger impact on backup performance than other data types?
I have done some searching but I haven't been able to find any connection between the two things.
Edit: I left out that these are full backups.
I take a transaction log backup every 30 minutes, but every day at about 05:30 the backup takes 02:30, and this increases each day. Can anyone help?
From the size of the backup, I would guess that there is a scheduled process running between 5:00 and 5:30 AM that generates roughly 100× as many transactions as any other 30-minute period, possibly some processing over the entire database, which increases in complexity (and thus generates more transaction log) as the database grows.
Check the schedule of your maintenance plans. You might be running a re-indexing job at that time.
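To find out what is scheduled in that window, the Agent schedule tables in msdb can be queried directly. A sketch that lists jobs with a start time between 05:00 and 06:00 (schedule times are stored as HHMMSS integers):

```sql
-- List Agent jobs whose schedules start between 05:00:00 and 06:00:00.
SELECT j.name  AS job_name,
       s.name  AS schedule_name,
       s.active_start_time      -- HHMMSS as an integer, e.g. 53000 = 05:30:00
FROM msdb.dbo.sysjobs AS j
JOIN msdb.dbo.sysjobschedules AS js ON js.job_id = j.job_id
JOIN msdb.dbo.sysschedules AS s ON s.schedule_id = js.schedule_id
WHERE s.active_start_time BETWEEN 50000 AND 60000
ORDER BY s.active_start_time;
```

Note this only covers SQL Server Agent; jobs scheduled outside the server (Windows Task Scheduler, an ERP scheduler) won't appear here.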
I have an agent job set to run log backups every two hours from 2:00 AM to 11:59 PM (leaving a window for running a full or differential backup). A similar job is set up in every one of my 50 or so instances. I may be adding several hundred instances over time (we host SQL Servers for some of our customers). They all back up to the same SAN disk volume. This is causing latency issues and otherwise impacting performance.
I'd like to offset the job run times on each instance by 5 minutes, so that instance one would run the job at 2:00, 4:00, etc., instance two would run it at 2:05, 4:05, etc., instance three would run it at 2:10, 4:10, etc. and so on. If I offset the start time for the job on each instance (2:00 for instance one, 2:05 for instance two, 2:10 for instance three, etc.), can I reasonably expect that I will get my desired result of not having all the instances run the job at the same time?
If this is the same conversation we just had on Twitter: when you tell SQL Server Agent to run every n minutes or every n hours, the next run is based on the start time, not the finish time. So if you set a job on instance 1 to run at 2:00 and repeat every 2 hours, the second run will start at 4:00, whether the first run took 1 minute, 12 minutes, or 45 minutes.
There are some caveats:
there can be minor delays due to internal agent synchronization, but I've never seen this off by more than a few seconds
if the first run at 2:00 takes more than 2 hours (but less than 4 hours), the next time the job runs will be at 6:00 (the 4:00 run is skipped, it doesn't run at 4:10 or 4:20 to "catch up")
There was another suggestion to add a WAITFOR to offset the start time (and we should discard a random WAITFOR, because that is probably not what you want: random <> unique). If you want to hard-code a different delay on each instance (1 minute, 2 minutes, etc.), then it is much more straightforward to do that with a schedule than by adding steps to all of your jobs, IMHO.
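The schedule-based offset amounts to one call per instance. A sketch for "instance two", shifting an existing schedule (the schedule name here is assumed) to start at 02:05:

```sql
-- On instance two: shift the log-backup schedule to 02:05:00.
-- @active_start_time is an HHMMSS integer.
EXEC msdb.dbo.sp_update_schedule
     @name = N'LogBackupEvery2Hours',   -- assumed schedule name
     @active_start_time = 020500;
```

Because each subsequent run is computed from the schedule's start time, the 5-minute stagger persists across every occurrence (02:05, 04:05, 06:05, and so on).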
Perhaps you could set up a centralized DB that manages the "schedule" and have the jobs add/update a row when they run. That way, each subsequent server's job polls for when it can start. Any latency in the jobs will cause the others to wait, so you don't get a disparity in your timings when one of the servers is thrown off.
Being a little paranoid, I'd add a catch-all scenario that says after "x" minutes of waiting, proceed anyway, so that a delay doesn't cascade far enough that the jobs don't run.
I have a production system running on an SQL Server 2008 DBMS, in Full recovery mode.
I take a full backup each day at midnight, a log backup every two hours, and a differential backup every 6 hours (06:00, 12:00, 18:00, but not midnight as the full is taken then).
In recent days, however, I've noticed that the 18:00 diff backup file is smaller than the 12:00 one. And sometimes the 12:00 one is smaller than the 06:00 one...
I did not experience this behavior until recent days.
Reading the Microsoft docs, the file size of a diff backup should always be larger than the previous one, until a new full backup is taken.
Does anyone have a possible explanation for what could be causing this?
Thanks.
I had a similar situation. I thought it over a bit and reckoned it was caused by multiple backup jobs existing for the database, which meant the differential backup was misled by the wrong backup flag: another job's full backup had reset the differential base, so the next diff was measured against a newer full and came out smaller.
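One way to check for this is to list every recent full backup recorded in msdb, including copy-only status and the device written to; an unexpected non-copy-only full (from a second job or a third-party tool) resets the differential base. A sketch, with the database name as a placeholder:

```sql
-- Find all full backups for the database; a non-copy-only full from an
-- unexpected device or time resets the differential base.
SELECT bs.backup_start_date,
       bs.is_copy_only,            -- copy-only fulls do NOT reset the diff base
       bmf.physical_device_name
FROM msdb.dbo.backupset AS bs
JOIN msdb.dbo.backupmediafamily AS bmf
  ON bmf.media_set_id = bs.media_set_id
WHERE bs.database_name = N'YourDatabase'   -- placeholder
  AND bs.type = 'D'                        -- full backups
ORDER BY bs.backup_start_date DESC;
```

If an extra full appears between your scheduled midnight fulls, that would explain why a later diff is smaller than an earlier one.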