Query runs for hours - sql-server

Some of the queries that execute overnight run for exactly 5 hours before logging a failure message. These queries retrieve records from a linked server, which is also a SQL Server instance. When I run the same query in the morning, it executes in a minute. The timeout property on both servers is set to 1 hour, so I'm trying to understand how the query can execute for more than an hour, and how it can stop after exactly 5 hours every time. Please help me understand what I'm missing.
Thanks.

One possibility is specific to the session/window in which that query runs: if your queries use explicit transactions and a transaction was left uncommitted, it has to be rolled back before the query is executed again.
Reason 2: the linked server may not have been working at the time the query was executed.
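For the timeout question itself, it may help to check which timeout actually governs the linked-server call, because a timeout set in a client tool or job is separate from the one SQL Server applies to remote queries. The sketch below reads the server-wide "remote query timeout (s)" option and the per-linked-server values in sys.servers; as far as I know, a query_timeout of 0 means the server-wide value is used. REMOTESRV is a placeholder linked server name, and the last statement is optional.

```sql
-- Server-wide default timeout (seconds) for outgoing remote/linked-server queries.
EXEC sp_configure 'remote query timeout (s)';

-- Per-linked-server settings; 0 means the server-wide value above applies.
SELECT name,
       data_source,
       connect_timeout,
       query_timeout
FROM sys.servers
WHERE is_linked = 1;

-- Optional, hypothetical: give the linked server REMOTESRV an explicit 1-hour query timeout.
EXEC sp_serveroption @server = N'REMOTESRV',
                     @optname = 'query timeout',
                     @optvalue = '3600';
```

Running the first two statements on the server that issues the overnight query shows whether the 1-hour setting is really the one in effect for that linked server.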

Related

SQL Server - job history for sql job running less than 15 seconds

I am trying to figure out an issue.
We have a couple of SQL Server Agent jobs that run for less than 15 seconds every day to execute stored procedures.
Whenever I try to view the job history, there is no information about them and it shows "never executed".
I did a simple test and let one run for more than 60 seconds, and it then appeared in the job history.
Is there a setting I need to change so that jobs that run for less than 15 seconds show up in the job history?
Thanks.
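One way to check whether the short runs are recorded at all, independent of the SSMS views, is to query the msdb tables directly. This is a sketch; N'Nightly SP job' is a placeholder job name. If rows show up here but not in the GUI, the issue is in how the history is displayed; if none show up, it may be worth checking the SQL Server Agent history size limits (maximum rows per job), which purge older rows.

```sql
-- Recorded outcomes for one job, newest first.
SELECT j.name AS job_name,
       h.step_id,        -- 0 = overall job outcome row
       h.step_name,
       h.run_date,       -- int, yyyymmdd
       h.run_time,       -- int, hhmmss
       h.run_duration,   -- int, hhmmss
       h.run_status,     -- 0 = failed, 1 = succeeded, 2 = retry, 3 = canceled
       h.message
FROM msdb.dbo.sysjobs AS j
JOIN msdb.dbo.sysjobhistory AS h
  ON h.job_id = j.job_id
WHERE j.name = N'Nightly SP job'
ORDER BY h.instance_id DESC;

-- Start/stop times tracked by SQL Server Agent itself, separate from the history log.
SELECT j.name,
       a.start_execution_date,
       a.stop_execution_date
FROM msdb.dbo.sysjobactivity AS a
JOIN msdb.dbo.sysjobs AS j
  ON j.job_id = a.job_id
WHERE j.name = N'Nightly SP job'
ORDER BY a.session_id DESC;
```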

Many KILLED/ROLLBACK tasks running on SQL Server with 0% progress after 2 days

I created a stored procedure that sends an email and accidentally called the stored procedure within itself, creating an endless loop. Within a few seconds of executing it I realized what I had done and fixed the loop, but it had already created 517 processes. I killed all the SPIDs, but they are stuck in a KILLED/ROLLBACK state.
This code shows me the processes:
```sql
select session_id, handle.percent_complete, *
from sys.dm_exec_requests handle
outer apply sys.fn_get_sql(handle.sql_handle) spname
where cast(handle.start_time as date) = '2022-01-10'
```
spname.text shows 'xp_sysmail_format_query' for all the SPIDs. It's been two days, and all 517 processes are still stuck in this rollback state with 0% progress. We can still use all our business applications and execute queries, with the exception of EXEC msdb.dbo.sp_send_dbmail: even a test email gets stuck executing and has to be cancelled. This is not good because no automatically generated email warnings will be sent, and all other SQL email functions are blocked. I'm not sure what other jobs are being blocked at this time.
This is a huge problem and I cannot find a solution. I've read every post I can find about this and tried everything I can think of except restarting SQL Server. Some posts state that restarting SQL Server can fix this; others say not to restart it, or that the tasks will just resume in the KILLED/ROLLBACK state after the restart. I ran KILL with the STATUSONLY option on the SPIDs again, but that just reports that they are in a rollback state with 0% complete.
Should I restart the server, and will this fix anything? Is there another solution besides restoring the database from a backup that is more than 2 days old and losing all the work the entire business has done in the last couple of days?
Any assistance will be greatly appreciated.
As robust as it is, SQL Server sometimes (fortunately rarely) leaves us no choice, when the rollback of a killed process does not complete in a timely fashion, but to adopt the IT mantra of turning it off and on again.
This is more common when a transaction enlists external methods or functions; email is notorious for this in particular.
As unwelcome as it is, a restart is often the least expensive option in terms of time, and it should be considered early in the diagnosis once the low-hanging-fruit options have been exhausted.
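Before deciding on a restart, a read-only check like the sketch below shows what the rolling-back sessions are actually waiting on and which other sessions (for example a stuck sp_send_dbmail call) they are blocking. It assumes sys.dm_exec_requests reports these sessions with a command starting with KILLED, as sp_who2 does. A wait type beginning with PREEMPTIVE_ would suggest the thread is stuck in an external call such as the mail component, which SQL Server cannot interrupt; that is the situation in which a restart is usually what's left.

```sql
-- Sessions still rolling back after KILL: progress, current wait, and their own blocker.
SELECT r.session_id,
       r.command,
       r.status,
       r.percent_complete,
       r.wait_type,
       r.wait_time,            -- ms spent in the current wait
       r.blocking_session_id,
       t.text
FROM sys.dm_exec_requests AS r
OUTER APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.command LIKE N'KILLED%';

-- Requests that are blocked by one of those rolling-back sessions.
SELECT w.session_id,
       w.blocking_session_id,
       w.command,
       w.wait_type,
       w.wait_resource
FROM sys.dm_exec_requests AS w
WHERE w.blocking_session_id IN (
        SELECT k.session_id
        FROM sys.dm_exec_requests AS k
        WHERE k.command LIKE N'KILLED%');
```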

Repeating SQL Query takes long pause once or twice a minute

I used an ADO function from another website to query my SQL Server Express database (on the same workstation) from Excel VBA 2016. The database is continuously updated with new records from a feed provider.
The query runs very smoothly and takes around 70 milliseconds to execute. I call it every 3 seconds from a routine in Excel VBA. However, once or twice a minute the query takes 6000-12000 milliseconds; it seems to "hang" at these moments for some reason.
When I turn off the external feed, the issue is not there!
I also used Express Profiler to get insight into the cause. I have added the output below this post. Lines 33 and 173 are lines with a long interval; as you can see, their duration is much longer than that of the other, identical queries.
I have set READ_COMMITTED_SNAPSHOT ON in the database options.
I have also set ALLOW_SNAPSHOT_ISOLATION ON in the options.
I did not close any connections myself, although SQL Server did display a message that it was closing all connections.
After testing, however, the problem is still there.
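To confirm that both options actually took effect on the database the Excel query reads, a quick check against sys.databases may help. YourDb is a placeholder. The ALTER DATABASE statements are the T-SQL equivalent of the GUI settings and are only needed if the check shows them off; WITH ROLLBACK IMMEDIATE disconnects other sessions, which is likely why a message about closing all connections appeared.

```sql
-- Check the current isolation-related settings.
SELECT name,
       is_read_committed_snapshot_on,   -- 1 = READ_COMMITTED_SNAPSHOT is on
       snapshot_isolation_state_desc    -- ON when ALLOW_SNAPSHOT_ISOLATION is on
FROM sys.databases
WHERE name = N'YourDb';

-- Only if the check above shows 0 / OFF.
ALTER DATABASE YourDb SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;
ALTER DATABASE YourDb SET ALLOW_SNAPSHOT_ISOLATION ON;
```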
After reading https://learn.microsoft.com/en-us/sql/t-sql/statements/set-transaction-isolation-level-transact-sql, I tried using the command SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED, but after testing the problem is still there.
Does anyone know a solution to this problem?
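Since the pauses disappear when the feed is turned off, one possibility worth ruling out (an assumption, not a diagnosis) is that the feed's inserts trigger data or log file autogrowth, which can stall activity while the file grows. Assuming the default trace is still enabled (it is by default; if it was disabled, @path will be NULL and the query will fail), this sketch lists recent autogrow events so their timestamps can be compared with the slow queries in the Express Profiler output.

```sql
-- Locate the default trace file.
DECLARE @path nvarchar(260);

SELECT @path = path
FROM sys.traces
WHERE is_default = 1;

-- Recent autogrow events recorded in that trace.
SELECT te.name            AS event_name,   -- Data File Auto Grow / Log File Auto Grow
       t.DatabaseName,
       t.FileName,
       t.StartTime,
       t.Duration / 1000  AS duration_ms   -- Duration is stored in microseconds
FROM sys.fn_trace_gettable(@path, DEFAULT) AS t
JOIN sys.trace_events AS te
  ON te.trace_event_id = t.EventClass
WHERE t.EventClass IN (92, 93)             -- 92 = data file, 93 = log file autogrow
ORDER BY t.StartTime DESC;
```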

How To Improve Delete Timeout Issues In CRM 2011 On Prem Dev Environment?

Background
I have a unit test framework that creates entities for my unit tests, performs the test, then automagically deletes the entities. It had been working fine, except that some entities take 15-30 seconds to delete in our dev environment.
I recently received a VM in the Amazon cloud to perform some long-term changes that will take a couple of release cycles to complete. When I run a unit test on the VM, I continually get SQL timeout errors when attempting to delete the entity.
Steps
I've gone down this set of discovery / action steps:
Turned on tracing and saw that the timeout was occurring in fn_CollectForCascadeWrapper, which is used to handle cascading deletes. My unit test only has 6 entities in it, and they are deleted in such a way that no cascading deletes are needed. Ran an estimated execution plan on it and added some of the indexes it requested. This still didn't fix the timeout issue.
Turned on Resource Monitor on the VM to look at disk access, memory, and CPU. When I attempt a delete, the CPU hits 20% for 2 seconds, then drops to near 0. Memory is unchanged, but disk read activity in Resource Monitor goes crazy high and stays that way for 7-10 minutes.
Hard-coded fn_CollectForCascadeWrapper to return a result meaning nothing needs to be cascaded for the 6 entities in my unit test. Ran the unit test and again got the SQL timeout error. According to the tracing, the actual delete statement was timing out:
```sql
delete from [New_inquiryExtensionBase] where ([New_inquiryId] = '7e250a5f-890e-40ae-9d2d-c55bbd7250cd');
delete from [New_inquiryBase]
OUTPUT DELETED.[New_inquiryId], 10012
into SubscriptionTrackingDeletedObject (ObjectId, ObjectTypeCode)
where ([New_inquiryId] = '7e250a5f-890e-40ae-9d2d-c55bbd7250cd')
```
Ran the query manually in SQL Server Management Studio. It took around 3 minutes to complete. There are no triggers on the tables, so I thought the time must be due to the insert. Looked at the SubscriptionTrackingDeletedObject table and noticed it had 2100 records in it. Deleted all records in the table and reran my unit test. It actually worked, within the normal 15-30 second time frame for deletes.
Researched and discovered what the SubscriptionTrackingDeletedObject is used for, and that the Async Service cleans it up. Noticed that the Async Service was not running on the server. Turned the service on, waited 10 minutes and queried the table again. My 6 entities were still listed there. Looked in trace log and saw timeout errors: Error cleaning up Principal Object Access Table
Researched POA and performed a SELECT COUNT(*) on the table; 7 minutes later it returned 261 million records! Researched how to clean up the table, and the only thing I found was for Update Rollup 6 (we're currently on 11).
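As an aside on the 7-minute SELECT COUNT(*): a row count taken from metadata returns almost instantly and is usually accurate enough for sizing a table like this. A sketch, assuming the table is dbo.PrincipalObjectAccess in the CRM organization database and that the account has VIEW DATABASE STATE permission:

```sql
-- Approximate row count from partition metadata instead of scanning the table.
SELECT SUM(ps.row_count) AS approx_rows
FROM sys.dm_db_partition_stats AS ps
WHERE ps.object_id = OBJECT_ID(N'dbo.PrincipalObjectAccess')
  AND ps.index_id IN (0, 1);   -- heap or clustered index only, so rows are not double-counted
```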
What Next?
Could the POA be affecting the Delete? Or is it just the POA that is affecting the Async Service that is affecting the delete? Could inserting into the SubscriptionTrackingDeletedObject really be causing my problem?
I ended up turning on SQL Server Profiler and running the delete statement listed in my question. It took 3.5 minutes to execute. I was expecting it to kick off something else that hit the POA table, but nope, it was just deleting those records.
I took a second look at the query execution plan and noticed there were lots of nested loops that were looking at the child tables that contain a reference to it (see the 13 tiny branches in the bottom right of the plan's tree structure). So all the reads were being performed on the indexes themselves, and those indexes were taking forever to be loaded into memory on my uber-slow VM.
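The nested-loop branches in a delete plan are typically SQL Server checking each table that references the deleted row through a foreign key. As a sketch, assuming the table is dbo.New_inquiryBase and its references are declared as foreign key constraints, the query below lists those child tables and whether the referencing column is the leading column of some index; a "no" would mark a child table that each delete has to scan rather than seek.

```sql
-- Foreign keys that reference New_inquiryBase, one row per referencing column.
SELECT OBJECT_SCHEMA_NAME(fk.parent_object_id)              AS child_schema,
       OBJECT_NAME(fk.parent_object_id)                     AS child_table,
       COL_NAME(fkc.parent_object_id, fkc.parent_column_id) AS child_column,
       fk.name                                              AS fk_name,
       CASE WHEN EXISTS (
                SELECT 1
                FROM sys.index_columns AS ic
                WHERE ic.object_id   = fkc.parent_object_id
                  AND ic.column_id   = fkc.parent_column_id
                  AND ic.key_ordinal = 1)                   -- leading key column of an index
            THEN 'yes' ELSE 'no' END                        AS leading_index_column
FROM sys.foreign_keys AS fk
JOIN sys.foreign_key_columns AS fkc
  ON fkc.constraint_object_id = fk.object_id
WHERE fk.referenced_object_id = OBJECT_ID(N'dbo.New_inquiryBase')
ORDER BY child_table;
```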
I ended up running the same query for a different id, and it ran in 2 seconds. I then attempted my unit test, and finally it completed successfully.
I'm guessing that each time I attempted a delete, a transaction was started, and then the CRM timeout rolled back the transaction, never allowing the child entity indexes to load. So my current fix is to ensure the child indexes are loaded in memory before actually performing the delete. How I'm going to do that, I'm not sure (perform a query by ID for each of the child entities?).
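Before building a warm-up step, one way to test the cold-cache guess is to run the delete inside a transaction that is rolled back, with SET STATISTICS IO and TIME turned on, and compare two consecutive runs. If the first run reports many physical and read-ahead reads on the child tables and the second run mostly logical reads and a much shorter elapsed time, that supports the idea that loading the indexes is what takes so long. The sketch reuses the ID from the traced statement and leaves out the OUTPUT INTO clause, on the assumption that the SubscriptionTrackingDeletedObject insert is not the slow part.

```sql
-- Per-table logical/physical/read-ahead reads and CPU/elapsed time appear in the Messages tab.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

BEGIN TRANSACTION;

DELETE FROM [New_inquiryExtensionBase]
WHERE [New_inquiryId] = '7e250a5f-890e-40ae-9d2d-c55bbd7250cd';

DELETE FROM [New_inquiryBase]
WHERE [New_inquiryId] = '7e250a5f-890e-40ae-9d2d-c55bbd7250cd';

-- Undo the test so the entity is untouched; pages already read normally stay in the buffer pool.
ROLLBACK TRANSACTION;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```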
Edit
We had a performance analyst from Microsoft come out, and they wrote up a report that was over 200 pages long. 98% of it said the POA table was too large. Over Christmas we ended up turning off CRM and running some scripts to clean up the POA table. This has been extremely helpful.

SQL Server job (stored proc) trace

I need your suggestion on tracing the issue.
We run data load jobs in the early morning that load data from Excel files into a SQL Server 2005 database. When the job runs on the production server, it often takes 2 to 3 hours to complete. We drilled down to one job step that takes 99% of the total time.
Running that job step (stored procs) in the staging environment (with the same production database restored) takes 9 to 10 minutes, but the same step takes hours on the production server when it runs in the early morning as part of the job. The production server always gets stuck at that job step.
I would like to run a trace on that job step (around 10 stored procs run for each user in a while loop within the job step) and collect information to figure out the issue.
What options are available in SQL Server 2005 to achieve this? I want to trace only these SPs rather than everything for a period of time on the production server, because a trace gives lots of information and it becomes very difficult for me (not being a DBA) to analyze that much trace output and figure out the issue. So I want to collect information about specific SPs only.
Let me know what you suggest.
Appreciate your time and help.
Thanks.
Use SQL Profiler. It allows you to trace plenty of events, including stored procedures, and even apply filters to the trace.
Create a new trace
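Select just the stored procedure events: SP:Completed for procedures called from inside a batch such as a job step (RPC:Completed only fires for procedures called directly by a client)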
Check "TextData" for columns to read
Click the "Column Filters" button
Select "TextData" from the left hand nav
Expand the "Like" tree view, and type in your procedure name
Check "Exclude Rows that Do Not Contain Values"
Click "OK", then "Run"
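Profiler's GUI adds noticeable overhead on a busy production server, so a commonly used alternative is a server-side trace defined with the sp_trace_* procedures, which exist in SQL Server 2005. This is a sketch, assuming a writable D:\Traces folder and that the load procedures share a usp_Load name prefix; both are placeholders.

```sql
-- Server-side trace: SP:Completed for the load procedures only, written to a file.
DECLARE @TraceID int, @rc int, @on bit, @maxfilesize bigint;
SET @on = 1;
SET @maxfilesize = 100;   -- MB per rollover file

-- Hypothetical path without extension (.trc is appended); the file must not already exist.
-- Option 2 = TRACE_FILE_ROLLOVER; NULL stop time = run until stopped.
EXEC @rc = sp_trace_create @TraceID OUTPUT, 2, N'D:\Traces\load_job', @maxfilesize, NULL;
IF @rc <> 0
BEGIN
    RAISERROR('sp_trace_create failed with code %d', 16, 1, @rc);
    RETURN;
END;

-- Event 43 = SP:Completed. Columns: 1 TextData, 12 SPID, 13 Duration, 14 StartTime,
-- 16 Reads, 17 Writes, 18 CPU, 34 ObjectName.
EXEC sp_trace_setevent @TraceID, 43, 1,  @on;
EXEC sp_trace_setevent @TraceID, 43, 12, @on;
EXEC sp_trace_setevent @TraceID, 43, 13, @on;
EXEC sp_trace_setevent @TraceID, 43, 14, @on;
EXEC sp_trace_setevent @TraceID, 43, 16, @on;
EXEC sp_trace_setevent @TraceID, 43, 17, @on;
EXEC sp_trace_setevent @TraceID, 43, 18, @on;
EXEC sp_trace_setevent @TraceID, 43, 34, @on;

-- Keep only procedures whose name matches the prefix (column 34, 0 = AND, 6 = LIKE).
EXEC sp_trace_setfilter @TraceID, 34, 0, 6, N'usp_Load%';

-- Start the trace and note its ID for stopping it later.
EXEC sp_trace_setstatus @TraceID, 1;
SELECT @TraceID AS TraceID;

-- After the job step finishes (replace 2 with the TraceID returned above):
-- EXEC sp_trace_setstatus 2, 0;   -- stop
-- EXEC sp_trace_setstatus 2, 2;   -- close and remove the trace definition
-- SELECT ObjectName, StartTime, Duration / 1000 AS duration_ms, CPU, Reads, Writes
-- FROM fn_trace_gettable(N'D:\Traces\load_job.trc', DEFAULT)
-- ORDER BY Duration DESC;
```

Because the filter is applied on the server, only the matching procedure calls are written to the file, which keeps the output small enough to read without DBA tooling.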
When a job is fast on other servers but slow on production, what else is happening on the server at that time matters. Maybe you are running into the daily backup, or the statistics or index maintenance jobs?
