SQL Server Agent Job Monitoring (did a Job even start?) - sql-server

I am a BI developer and have been given a DBA task: monitoring SQL Server Agent jobs and collecting the following key figures, which will later be presented in a frontend:
Failed Server Agent Jobs and Jobsteps
Run-time of jobs and Jobsteps
Find out if a job has even started
Storage usage: evaluation and monitoring of the data volumes moved and stored
The goal is to monitor and, if necessary, provide an indication of any discrepancies in the key figures.
I have the first two points covered. In each case, a table shows whether a job or its steps are enabled or disabled. I have also recorded the run time of each step, so thresholds can be used to warn when critical ranges are reached.
The biggest problem for me is jobs that never started (point 3). To my knowledge, these are not recorded in the msdb tables. I would like to know when a job or job step has not started at all. Can you tell me how to get this information? Maybe someone already has a suitable script for this problem?
On the subject of storage usage, I'm interested in how much free space is left on the hard disk, how big the partition is, and how consumption changes over time.
I could not find anything online about points 3 and 4. I would be very grateful for your help! (And please forgive my bad English :) )
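For the disk space part, a minimal sketch, assuming SQL Server 2008 R2 SP1 or later (where sys.dm_os_volume_stats is available) and that only volumes holding database files matter. Running it on a schedule and storing the rows in a history table of your own would show how consumption changes over time.

```sql
-- Free and total space for every volume that hosts at least one database file.
SELECT DISTINCT
       vs.volume_mount_point,
       vs.total_bytes     / 1048576 AS total_mb,
       vs.available_bytes / 1048576 AS free_mb,
       GETDATE()                    AS sampled_at
FROM sys.master_files AS mf
CROSS APPLY sys.dm_os_volume_stats(mf.database_id, mf.file_id) AS vs;
```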

The result tells me when the next run of job xy is planned, but the job itself does not appear in the table.
It does, however, appear in the job history in SQL Server Agent. There, though, the information about when it should run next is missing.
My plan was to take the next_scheduled_run_date column from the sysjobactivity table and, one run later, compare it to the run_requested_date column.
But some records are obviously missing from the sysjobactivity table. The other table that contains target start times is sysjobschedules.
Unfortunately, it only holds the currently scheduled date. I have not found another table that contains a history of the target run dates.
Of course, one could maintain manual tables (analogous to the target values), but that would be too much effort.
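One way to approximate this without hand-maintained target tables is sketched below. It is only a rough check, assuming that next_scheduled_run_date in sysjobactivity still holds the missed time, and it only sees the current SQL Server Agent session. The @grace_minutes tolerance is a hypothetical value, not something from the thread.

```sql
-- Sketch: enabled jobs whose scheduled time has passed without a matching start.
DECLARE @grace_minutes int = 5;  -- hypothetical tolerance before a run counts as missed

SELECT j.name AS job_name,
       ja.next_scheduled_run_date,
       ja.start_execution_date
FROM msdb.dbo.sysjobactivity AS ja
JOIN msdb.dbo.sysjobs AS j
  ON j.job_id = ja.job_id
WHERE ja.session_id = (SELECT MAX(session_id) FROM msdb.dbo.syssessions)  -- current Agent session
  AND j.enabled = 1
  AND ja.next_scheduled_run_date < DATEADD(MINUTE, -@grace_minutes, GETDATE())
  AND (ja.start_execution_date IS NULL
       OR ja.start_execution_date < ja.next_scheduled_run_date);
```

Runs missed while the Agent service itself was stopped would probably not show up here, because a new session recalculates the next run dates.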

Related

Best Way to Monitor SQL Server Agent Job Step

What is the best practice to monitor SQL Server job steps?
I have a single job with 50 steps. Sometimes, some of the steps take longer to complete. I want a monitoring process that notifies me when a step takes longer than usual. The problem is that I can't find any information about job steps while a step is running and before it completes (at least not in sysjobsteps and sysjobs). In particular, I'm interested in the step_id, step_name and step_start_time.
I would appreciate any ideas.
The system table you're looking for is actually dbo.sysjobhistory, and specifically the run_date, run_time, and run_duration columns:
run_date - Date the job or step started execution. For an In Progress history, this is the date/time the history was written.
run_time - Time the job or step started in HHMMSS format.
run_duration - Elapsed time in the execution of the job or step in HHMMSS format.
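As an illustration of those columns, here is a small sketch that lists completed steps and converts the HHMMSS-encoded run_duration into seconds so it can be compared against a threshold. The column aliases are my own.

```sql
-- Completed job steps with run_duration (HHMMSS as an int) converted to seconds.
SELECT j.name AS job_name,
       h.step_id,
       h.step_name,
       h.run_date,       -- int, YYYYMMDD
       h.run_time,       -- int, HHMMSS
       h.run_duration,   -- int, HHMMSS
       (h.run_duration / 10000) * 3600
         + (h.run_duration / 100 % 100) * 60
         + (h.run_duration % 100) AS duration_seconds
FROM msdb.dbo.sysjobhistory AS h
JOIN msdb.dbo.sysjobs AS j
  ON j.job_id = h.job_id
WHERE h.step_id > 0      -- step_id 0 is the overall job outcome row
ORDER BY h.run_date DESC, h.run_time DESC;
```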
I do realize you asked for information while a job step is in process, but as per the Microsoft Books Online:
In most cases the data is updated only after the job step completes and the table typically contains no records for job steps that are currently in progress, but in some cases underlying processes do provide information about in progress job steps.
So only in certain cases will you have access to that information.
Alternatively, I can only think to recommend monitoring the running queries on your server instead, if you want a more realtime approach. You can use tools like Adam Machanic's sp_WhoIsActive to help you accomplish that. You may find it difficult to correlate the job step itself to the query it's executing, unless the query is tagged with an identifiable comment in the beginning, but I also realize this is not ideal either.
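Another place worth looking is msdb.dbo.sysjobactivity, which records the last step that was executed for a running job. The sketch below guesses the current step as "last executed step + 1", which only holds if every step simply continues to the next one on completion. Treat it as an approximation, not as what SQL Server Agent reports.

```sql
-- Jobs running in the current Agent session, with a guess at the step in progress.
SELECT j.name AS job_name,
       ja.start_execution_date,
       ja.last_executed_step_id,
       ja.last_executed_step_date,
       s.step_id   AS probable_current_step_id,
       s.step_name AS probable_current_step_name
FROM msdb.dbo.sysjobactivity AS ja
JOIN msdb.dbo.sysjobs AS j
  ON j.job_id = ja.job_id
LEFT JOIN msdb.dbo.sysjobsteps AS s
  ON s.job_id = ja.job_id
 AND s.step_id = ISNULL(ja.last_executed_step_id, 0) + 1  -- assumes sequential step flow
WHERE ja.session_id = (SELECT MAX(session_id) FROM msdb.dbo.syssessions)
  AND ja.start_execution_date IS NOT NULL
  AND ja.stop_execution_date IS NULL;
```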

How many SQL jobs a sql server can handle?

I am building a medical database system and have reached the point of adding a notification feature, for which I plan to use SQL jobs. The job's responsibility is to check some tables and, for each entity that needs to be notified about a change in certain data, put its id into a table called Notification. A trigger will then prompt the app to check that table and send the notification.
What I want to ask is: how many SQL jobs can a SQL Server handle?
Does the number of SQL jobs running in the background affect the performance of my application or the database in one way or another?
NOTE: the SQL job will run every 10 seconds.
I couldn't find any useful information online.
Thanks in advance.
This question really doesn't have enough background to get a definitive answer. What are the considerations?
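Do the queries in your ten-second job actually complete in ten seconds, even when your DBMS is under its peak transactional workload? If the job routinely takes longer than that, runs will fall behind their schedule: SQL Server Agent does not start a second instance of a job that is still running, so your notifications will arrive late.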
Do the queries in your job lock up tables and/or indexes so the transactional load can't run efficiently? (Where dirty reads are acceptable, you should use SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED; as much as you can so database reads won't lock things unnecessarily.)
Do the queries in your job do a lot of rows' worth of inserts and updates, and so swamp the SQL Server transaction logs?
How big is your server? (CPU cores? RAM? IO capacity?) How big is your database?
If your project succeeds and you get many users, will your answers to the above questions remain the same? (Hint: no.)
You should spend some time on the execution plans for the queries in your job, and try to make them as efficient as possible. Add the necessary indexes. If necessary refactor the queries to make them more efficient. SSMS will show you the execution plans and suggest appropriate indexes.
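The index suggestions SSMS shows in an execution plan can also be read from the missing-index DMVs. A minimal sketch follows. The ordering expression is just one rough way to rank the suggestions, and each one should be reviewed rather than created blindly.

```sql
-- Index suggestions collected by the optimizer since the last restart.
SELECT TOP (20)
       d.statement AS table_name,
       d.equality_columns,
       d.inequality_columns,
       d.included_columns,
       s.user_seeks,
       s.avg_user_impact          -- estimated % improvement if the index existed
FROM sys.dm_db_missing_index_details AS d
JOIN sys.dm_db_missing_index_groups AS g
  ON g.index_handle = d.index_handle
JOIN sys.dm_db_missing_index_group_stats AS s
  ON s.group_handle = g.index_group_handle
ORDER BY s.user_seeks * s.avg_user_impact DESC;
```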
If your job is doing things like deleting expired rows, you may want to build the expiration into your data model. For example, suppose your job does
DELETE FROM readings WHERE expiration_date <= GETDATE()
and your application does this, relying on your job to avoid getting expired readings.
SELECT something FROM readings
You can refactor your application query to say
SELECT something FROM readings WHERE expiration_date > GETDATE()
and then run your job overnight, at a quiet time, rather than every ten seconds.
A ten-second job is not the greatest idea in the world. If you can rework your application so it will function correctly with a ten-second, ten-minute, or twelve-hour job, you'll have a more resilient production system. At any rate if something goes wrong with the job when your system is very busy you'll have more than ten seconds to fix it.

SQL Server Using TableDiff on large tables

We have a process which uses SQL Server's amazing tableDiff via:
Microsoft SQL Server\100\COM\Tablediff.exe
It's SQL Server 2008 R2. It connects from one instance to another identical instance. It works very well!
I have a situation where a table which now has 10767594 records is taking 2.5 hours to complete, even though it is the only table in the job. How can I improve this?
The process is triggered by a Windows Scheduled Task, this calls a .bat file, the .bat file contains the recommended code which has no issue. We have a couple of these in place and have had for some time. It's just the one job that deals with the big table from instance to instance that is taking too long.
I have realised that the source table does have an index but the destination table does not. I will put an index on this table, what else can I do?
Does table diff run better with indexes?
Is there a way to use table diff more effectively?
E.g. if I capture the lastProcessedID can I run tableDiff next time for all records where id > lastProcessedID?
Any advice would be great. Thank you in advance
EDITED:
MY SOLUTION - This was a very big surprise. As I mentioned above, the 10 million+ record table was identical on the source and destination except for 2 indexes (on the source). After waiting until out of hours, since this is an internal production server, I applied the indexes to the destination. Now I run the tableDiff job, which has not been changed at all, and it completes in under 2 minutes. 2.5 hours to 2 mins!
I have accepted the answer below because it was very helpful. I did go down the Merge Replication path; however, after setting up replication and publishing I found out that the production instance could not be a subscriber because replication was not selected during installation. As Jason says, it's a reasonable amount of research, learning and setting up. Since I am not a DBA and had not looked at this before, it was a worthwhile experience.
The performance issue is because the remote queries pull every record from each place to do the comparison to generate the output. Indexes can help slightly to make the pull a little faster from each location, but it's not likely to be significant.
An incremental approach is definitely better. I don't believe tablediff directly supports comparing 2 queries. If it did, you could do something like EXCEPT or INTERSECT to do the comparisons. If you're trying to keep these databases in sync, why not consider other solutions, like log shipping, mirroring, SSIS, replication, clustering, etc.
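If the destination is reachable as a linked server, the incremental idea could look roughly like the sketch below. This is plain T-SQL rather than tablediff, and every object name in it (DEST_SERVER, DestDb, dbo.BigTable, the columns) is hypothetical.

```sql
-- Sketch: rows added to the source since the last processed id that are
-- missing or different on the destination.
DECLARE @lastProcessedID int = 0;  -- hypothetical: load this from wherever you persist it

SELECT id, col1, col2
FROM dbo.BigTable
WHERE id > @lastProcessedID
EXCEPT
SELECT id, col1, col2
FROM [DEST_SERVER].[DestDb].dbo.BigTable
WHERE id > @lastProcessedID;
```

This only covers newly inserted rows. Updates or deletes to rows at or below @lastProcessedID would go unnoticed.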

How do I ensure SQL Server replication is running?

I have two SQL Server 2005 instances that are geographically separated. Important databases are replicated from the primary location to the secondary using transactional replication.
I'm looking for a way that I can monitor this replication and be alerted immediately if it fails.
We've had occasions in the past where the network connection between the two instances has gone down for a period of time. Because replication couldn't occur and we didn't know, the transaction log blew out and filled the disk causing an outage on the primary database as well.
My google searching some time ago led to us monitoring the MSrepl_errors table and alerting when there were any entries but this simply doesn't work. The last time replication failed (last night hence the question), errors only hit that table when it was restarted.
Does anyone else monitor replication and how do you do it?
Just a little bit of extra information:
It seems that last night the problem was that the Log Reader Agent died and didn't start up again. I believe this agent is responsible for reading the transaction log and putting records in the distribution database so they can be replicated on the secondary site.
As this agent runs inside SQL Server, we can't simply make sure a process is running in Windows.
We have emails sent to us for Merge Replication failures. I have not used Transactional Replication but I imagine you can set up similar alerts.
The easiest way is to set it up through Replication Monitor.
Go to Replication Monitor and select a particular publication. Then select the Warnings and Agents tab and then configure the particular alert you want to use. In our case it is Replication: Agent Failure.
For this alert, we have the Response set up to Execute a Job that sends an email. The job can also do some work to include details of what failed, etc.
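The same alert can also be wired up in T-SQL. The sketch below notifies an operator by e-mail instead of executing a job, which is a simpler variant of what is described above. It assumes Database Mail is already configured for SQL Server Agent, that it runs where the replication alerts are defined, and that the predefined alert carries this name on your instance. The operator name and address are hypothetical.

```sql
-- Hypothetical operator to receive the e-mail.
EXEC msdb.dbo.sp_add_operator
     @name          = N'DBA Team',
     @email_address = N'dba-team@example.com';

-- Enable the predefined replication alert.
EXEC msdb.dbo.sp_update_alert
     @name    = N'Replication: agent failure',
     @enabled = 1;

-- Send an e-mail to the operator when the alert fires.
EXEC msdb.dbo.sp_add_notification
     @alert_name          = N'Replication: agent failure',
     @operator_name       = N'DBA Team',
     @notification_method = 1;  -- 1 = e-mail
```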
This works well enough for alerting us to the problem so that we can fix it right away.
You could run a regular check that data changes are taking place, though this could be complex depending on your application.
If you have some form of audit trail table that is very regularly updated (e.g. our main product has a base audit table that lists all actions that result in data being updated or deleted) then you could query that table on both servers and make sure the result you get back is the same. Something like:
SELECT CHECKSUM_AGG(CHECKSUM(*))
FROM audit_base
WHERE action_timestamp BETWEEN <time1> AND <time2>
where <time1> and <time2> are round values to allow for different delays in contacting the databases. For instance, if you are checking at ten past the hour you might check items from the start of the last hour to the start of this hour. You now have two small values that you can transmit somewhere and compare. If they are different then something has most likely gone wrong in the replication process - have whatever process does the check/comparison send you a mail and an SMS so you know to check and fix any problem that needs attention.
By using SELECT CHECKSUM_AGG(CHECKSUM(*)) the amount of data for each table is very small, so the bandwidth use of the checks will be insignificant. You just need to make sure your checks are not too expensive in the load they apply to the servers, and that you don't check data that might be part of open replication transactions and so might be expected to be different at that moment (hence checking the audit trail a few minutes back in time instead of now in my example), otherwise you'll get too many false alarms.
Depending on your database structure the above might be impractical. For tables that are not insert-only (no updates or deletes) within the timeframe of your check (like an audit-trail as above), working out what can safely be compared while avoiding false alarms is likely to be both complex and expensive if not actually impossible to do reliably.
You could manufacture a rolling insert-only table if you do not already have one, by having a small table (containing just an indexed timestamp column) to which you add one row regularly - this data serves no purpose other than to exist so you can check updates to the table are getting replicated. You can delete data older than your checking window, so the table shouldn't grow large. Testing only one table does not prove that all the other tables are replicating (or any other tables for that matter), but finding an error in this one table would be a good "canary" check (if this table isn't updating in the replica, then the others probably aren't either).
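A minimal sketch of such a canary table, assuming it is added to the publication, that the server clocks are roughly in sync, and that a 15-minute window suits your replication latency. The table name and the time windows are hypothetical.

```sql
-- Publisher: tiny heartbeat table (transactional replication needs a primary key).
CREATE TABLE dbo.replication_heartbeat (
    beat_time datetime NOT NULL PRIMARY KEY
);

-- Publisher, scheduled every few minutes: add a beat and trim old ones.
INSERT INTO dbo.replication_heartbeat (beat_time) VALUES (GETDATE());
DELETE FROM dbo.replication_heartbeat
WHERE beat_time < DATEADD(HOUR, -1, GETDATE());

-- Subscriber, scheduled check: complain if no recent beat has arrived.
IF NOT EXISTS (SELECT 1
               FROM dbo.replication_heartbeat
               WHERE beat_time >= DATEADD(MINUTE, -15, GETDATE()))
    RAISERROR('No replicated heartbeat row in the last 15 minutes.', 16, 1);
```

If the subscriber check runs as a SQL Server Agent job step, the raised error makes the step fail, so the job's normal failure notification can do the alerting.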
This sort of check has the advantage of being independent of the replication process - you are not waiting for the replication process to record exceptions in logs, you are instead proactively testing some of the actual data.

SQL Server Maintenance Suggestions?

I run an online photography community and it seems that the site slows to a crawl on database access, sometimes hitting timeouts.
I consider myself to be fairly competent at writing SQL queries and designing tables, but am by no means a DBA... hence the problem.
Some background:
My site and SQL server are running on a remote host. I update the ASP.NET code from Visual Studio and the SQL via SQL Server Mgmt. Studio Express. I do not have physical access to the server.
All my stored procs (I think I got them all) are wrapped in transactions.
The main table is only 9400 records at this time. I add 12 new records to this table nightly.
There is a view on this main table that brings together data from several other tables into a single view.
Secondary tables have smaller records, but more of them: 70,000 in one, 115,000 in another. These are comments and ratings records for the items in the main table.
Indexes are on the most needed fields. And I set them to Auto Recompute Statistics on the big tables.
When the site grinds to a halt, if I run code to clear the transaction log, update statistics, rebuild the main view, as well as rebuild the stored procedure to get the comments, the speed returns. I have to do this manually however.
Sadly, my users get frustrated at these issues and their participation dwindles.
So my question is... in a remote environment, what is the best way to set up and schedule a maintenance plan to keep my SQL db running at its peak?
My gut says you are doing something wrong. It sounds a bit like those stories you hear where some system cannot stay up unless you reboot the server nightly :-)
Something is wrong with your queries. The number of rows you have is almost always irrelevant to performance, and your database is very small anyway. I'm not too familiar with SQL Server, but I imagine it has some pretty sweet query analysis tools. I also imagine it has a way of logging slow queries.
It really sounds like you have a missing index. Sure, you might think you've added the right indexes, but until you verify they are being used, it doesn't matter. Maybe you think you have the right ones, but your queries suggest otherwise.
First, figure out how to log your queries. Odds are very good you've got a killer in there doing some sequential scan that an index would fix.
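On SQL Server, one way to get at this without setting up a trace is the plan cache statistics. The sketch below lists the statements with the highest total elapsed time. It assumes your login has been granted VIEW SERVER STATE, which a shared host may not allow.

```sql
-- Cached statements ranked by total elapsed time (times are stored in microseconds).
SELECT TOP (20)
       qs.execution_count,
       qs.total_elapsed_time / qs.execution_count / 1000 AS avg_elapsed_ms,
       qs.total_logical_reads / qs.execution_count       AS avg_logical_reads,
       SUBSTRING(st.text,
                 qs.statement_start_offset / 2 + 1,
                 (CASE qs.statement_end_offset
                       WHEN -1 THEN DATALENGTH(st.text)
                       ELSE qs.statement_end_offset
                  END - qs.statement_start_offset) / 2 + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_elapsed_time DESC;
```

A high avg_logical_reads on a query that returns few rows is a typical sign of the scan described above.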
Second, you might have a bunch of small queries that are killing it instead. For example, you might have some "User" object that hits the database every time you look up a username from a user_id. Look for spots where you are querying the database a hundred times and replace it with a cache--even if that "cache" is nothing more than a private variable that gets wiped at the end of a request.
Bottom line is, I really doubt it is something mis-configured in SQL Server. I mean, if you had to reboot your server every night because the system ground to a halt, would you blame the system or your code? Same deal here... learn the tools provided by SQL Server, I bet they are pretty slick :-)
That all said, once you accept you are doing something wrong, enjoy the process. Nothing, to me, is more fun than optimizing slow database queries. It is simply amazing that you can take a query with a 10 second runtime and turn it into one with a 50ms runtime with a single, well-placed index.
You do not need to set up your maintenance tasks as a maintenance plan.
Simply create a stored procedure that carries out the maintenance tasks you wish to perform, index rebuilds, statistics updates etc.
Then create a job that calls your stored procedure/s. The job can be configured to run on your desired schedule.
To create a job, use the procedure sp_add_job.
To create a schedule use the procedure sp_add_schedule.
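Put together, the calls could look roughly like this. It is only a sketch: it assumes SQL Server Agent is available on your host (the Express edition does not include it), and the job, schedule, database and procedure names are all hypothetical.

```sql
USE msdb;
GO

-- Create the job and a single T-SQL step that calls the maintenance procedure.
EXEC dbo.sp_add_job
     @job_name = N'Nightly maintenance';

EXEC dbo.sp_add_jobstep
     @job_name      = N'Nightly maintenance',
     @step_name     = N'Run maintenance procedure',
     @subsystem     = N'TSQL',
     @database_name = N'YourDatabase',                     -- hypothetical database
     @command       = N'EXEC dbo.usp_nightly_maintenance;'; -- hypothetical procedure

-- Daily schedule at 02:00 (freq_type 4 = daily, interval 1 = every day).
EXEC dbo.sp_add_schedule
     @schedule_name     = N'Daily at 02:00',
     @freq_type         = 4,
     @freq_interval     = 1,
     @active_start_time = 020000;

EXEC dbo.sp_attach_schedule
     @job_name      = N'Nightly maintenance',
     @schedule_name = N'Daily at 02:00';

-- Target the local server so the Agent actually runs the job.
EXEC dbo.sp_add_jobserver
     @job_name = N'Nightly maintenance';
```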
I hope what I have detailed is clear and understandable but feel free to drop me a line if you need further assistance.
Cheers, John

Resources