I am in the middle of developing a report, and the main data set of the report is populated by a stored procedure. One particular update statement is given below:
UPDATE T1
SET
T1.Status=T2.Status,
T1.ErrorMessage=T2.ErrorMessage,
T1.IssueStatus=T2.IssueStatus
FROM
#tmptblOtherSongs T1
INNER JOIN #tmpOtherSongStatus T2 ON
T1.SongCode= T2.SongCode AND
T1.SocietyCode= T2.SocietyCode AND
T1.TableName=T2.TableName
WHERE
T1.SessionID='TRYFBGHk' AND
T2.Status IS NOT NULL
The sp goes into a suspended state when it reaches this particular update query. I tried running the query alone, but the result was the same. It runs fine for small amounts of data (thousands of rows), but the issue occurs with larger volumes.
Is there any way I can prevent this? I have no clue why it is happening.
When a process is in a suspended state, it is waiting for something. Quite often it is either waiting for I/O (reading or writing data, possibly tempdb related) or waiting for blocking to end.
If there is no blocking, it is most likely I/O. Look at the execution plan for anything that doesn't look right, and/or SET STATISTICS IO ON and check which tables are causing most of the I/O.
I/O can most often be improved by adding indexes. In your case I would start with T1.SessionID, and after that the SongCode, SocietyCode and maybe TableName columns on T1 and T2, depending on how many rows there are per single value.
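For example, something along these lines (a rough sketch only; the index names and key column order are guesses based on your query, not taken from your procedure):
SET STATISTICS IO ON;

-- Support the SessionID filter on T1
CREATE NONCLUSTERED INDEX IX_tmptblOtherSongs_SessionID
ON #tmptblOtherSongs (SessionID, SongCode, SocietyCode, TableName);

-- Support the join columns on T2
CREATE NONCLUSTERED INDEX IX_tmpOtherSongStatus_JoinCols
ON #tmpOtherSongStatus (SongCode, SocietyCode, TableName)
INCLUDE (Status, ErrorMessage, IssueStatus);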
I'm running into an interesting issue in production, which I cannot replicate in our QA/Staging environments.
I have a query that is doing dirty reads on a fairly large table (around 6 million rows, but we only keep the last 90 days of data in it, older records are warehoused in a different database). This table has lots of writes to it, as it logs page views, but only occasionally is data read from the table.
Recently I noticed that when one specific query is running, SQL Server 2019 starts generating a ton of WRITELOG waits and appears to hold up any other requests that are trying to write to the database.
Now the query itself has nolock hints on all the tables, because it's okay if dirty data is returned. We use the nolock hints because the writes to the table are extremely frequent and queries against it can be slow, since a lot of page scans are required.
The query itself looks something like this:
select
clt.ViewDate, clt.UserId, clt.RemoteAddress, clt.LibraryId, clt.Parameters
, u.Fullname
, cl.Id as VideoId, cl.Title
-- we need a compound key for each row, so we can count the unique rows
, case
when clt.ViewDate is null then null
else row_number() over (order by clt.ViewDate, clt.UserId, clt.LibraryId, clt.Parameters)
end as compoundKey
from
ContentLibrary as cl (nolock)
left join
(
ContentLibraryTracking as clt (nolock)
inner join
[User] as u (nolock)
on
clt.UserId = u.UserId
)
on
clt.ViewDate between @startDate and @endDate
and
clt.Parameters like @filter
where
1 = 1
and
cl.ContentType = @contentType
order by
clt.ViewDate
The problem table appears to be ContentLibraryTracking. This is the table with millions of rows and lots of inserts, and we warehouse rows nightly, so there can be a lot of page fragmentation. We defragment the indexes and update statistics on the table weekly.
When this query is running, sp_BlitzWho reports the query has entered a CXCONSUMER wait. I then see SQL Server 2019 starting to queue processes with a WRITELOG wait. These processes remain in that state until the query has finished running.
Since our application performs some kind of write transaction with every page view, this means the query is holding up execution for the entire application, which is obviously bad.
While I know page scans are bad for a query plan, the query has to search for patterns in a varchar column, which is why they happen. Since reads are very infrequent, the table is optimized for writes, which are extremely frequent. And while the query could perform better, considering the work it's doing, even when it's slow it runs within 15 seconds or so.
One thing I do see in the sp_BlitzWho results is that the query uses parallelism, and it also states the transaction isolation level is Read Committed (I would have expected Read Uncommitted, since all the tables have a nolock hint).
What would cause a query with dirty reads to be forcing the database to queue up WRITELOG events?
I could see this happening if the query were altering data and generating its own transaction log entries, but that should not be happening with this query. That's the whole reason we are using the nolock hint on the tables.
Also, our database, log files and tempdb are all on their own logical storage devices, so reads from the database should not be causing I/O problems when writing to the transaction log files.
A couple of notes on the environment:
We are running Microsoft SQL Server 2019 (RTM-CU8-GDR) (KB4583459) - 15.0.4083.2 (X64)
The database is running in a VM
We backup transaction logs every 5 minutes (could this be the issue?)
Memory and CPU usage appear fine while the query runs
SQL Monitor 11 only really shows spikes in the log flushes and waits (which would match the behavior). Page splits and buffer cache/page metrics are all normal. I do see "disk read bytes/sec" go up on the logical drive that holds the database, but the writes on all drives (including the transaction logs) look okay.
Any thoughts would be greatly appreciated as I'm really scratching my head over this issue.
Right after I posted my question I started looking at the sp_BlitzWho results in more detail. I noticed the parallelism was using all the CPUs, so I changed MAXDOP to half the CPUs/cores, and this appears to have resolved the issue. I'm going to keep monitoring the situation, but it looks like an instance where MAXDOP was not set correctly.
It makes sense that if a query is eating up all the available cores, other threads would be waiting. I was just thrown off by the WRITELOG waits.
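For anyone else hitting this, the change looks roughly like this (the value 4 is only illustrative; pick about half of the cores available to your instance):
-- Instance-wide setting (requires appropriate permissions)
EXEC sys.sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sys.sp_configure 'max degree of parallelism', 4;
RECONFIGURE;

-- Alternatively, cap only the problem query without changing the instance setting:
-- SELECT ... OPTION (MAXDOP 4);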
In this specific scenario, I have two threads:
One is polling for new files on disk in a specific folder. It uses metadata from the file to insert jobs into the database (across multiple tables). Autocommit is set to false, and I am committing only at the end of the loop (if more than one file is found at a time, it commits once per file).
Another thread is constantly polling for new "jobs" to process. It queries primarily against one table, but there are a couple of left joins. The query uses the with (updlock, readpast) table hints only on the primary table.
I have noticed that some of the jobs that come back are missing values for one or more of the left joins, but only the first time (it causes a failure, which then triggers that job to be picked up again seconds later). I know for sure that the value is not actually null, because I have added log statements before and after each query that inserts the data. I have made sure that autocommit is disabled, and I only commit once all pieces of data are there.
One thing I have noticed though, which seems to be a pattern, is that the polling thread starts its query shortly before the other thread finishes. It could be one or two operations behind, or it could be just about to commit. It returns the result after the first thread commits (based on timestamps in my log).
I don't understand how this could happen. I have a readpast hint on the primary table, so it should not even come back in the query. But it comes back without the associated data from the joining table, which does not have any hints on it. If it took a few seconds to run (it's generally pretty quick though) it should theoretically encounter the locked row that was not committed yet, and just skip it.
Here is a simple example of the query that I am using to poll for new jobs.
SELECT distinct
j.id,
p.request_path,
j.revision_id,
j.status,
j.creation_date,
dr.dependent_revision_id
from incoming_jobs j WITH (updlock, readpast)
left join file_paths p on j.revision_id = p.revision_id
left join dependent_revisions dr on j.id = dr.job_id
where j.status in ('WAITING')
and dr.dependent_revision_id is null
order by j.creation_date offset 0 rows fetch next 3 rows only
In this case, there is a non-null value in file_paths for the job, but it comes back null anyhow. That value gets inserted immediately after the job is inserted into the incoming_jobs table by the other thread.
Environment is SQL Server 2014, 64gb RAM, 6 processors. 2TB disk, with almost 400gb free space.
I have a procedure that is called by a job. It creates a temp table, then joins several dimension tables to that table and inserts into a fact table. It worked cleanly until Monday, running between 2 and 10 minutes. On Monday it ran for nearly 5 hours without doing anything: the process was about 98% idle, no reads, no writes, state suspended. There are no locks, no blocking sessions, literally nothing that I can pin down as the culprit.
As soon as it's called it immediately goes into a suspended state and I cannot find out why. It's supposed to be waiting for something, but I can't find what it's waiting for. It's blocking the entire process and no data is being loaded.
I would really appreciate help.
A process goes into suspended mode because it's waiting for a system resource to become available. What specifically that resource is in your case, I'm not sure. If you re-run it and it continues to happen, I'd run a profiler trace on the procedure and see what it's doing at the moment it becomes suspended.
@Obi Mark,
In short, you'll need to look at the wait types and the query plan. Here's a query to capture the details of the query plan:
SELECT dm_ws.wait_duration_ms,
dm_ws.wait_type,
dm_es.status,
dm_t.TEXT,
dm_qp.query_plan,
dm_ws.session_ID,
dm_es.cpu_time,
dm_es.memory_usage,
dm_es.logical_reads,
dm_es.total_elapsed_time,
dm_es.program_name,
DB_NAME(dm_r.database_id) DatabaseName,
-- Optional columns
dm_ws.blocking_session_id,
dm_r.wait_resource,
dm_es.login_name,
dm_r.command,
dm_r.last_wait_type
FROM sys.dm_os_waiting_tasks dm_ws
INNER JOIN sys.dm_exec_requests dm_r ON dm_ws.session_id = dm_r.session_id
INNER JOIN sys.dm_exec_sessions dm_es ON dm_es.session_id = dm_r.session_id
CROSS APPLY sys.dm_exec_sql_text (dm_r.sql_handle) dm_t
CROSS APPLY sys.dm_exec_query_plan (dm_r.plan_handle) dm_qp
WHERE dm_es.is_user_process = 1
To analyze wait types, follow the advice on this link from Marcello Miorelli and steoleary.
How to find out why the status of a spid is suspended? What resources the spid is waiting for?
Thanks all for your input. I tried the above query and checked the helpful link you provided. It gave me a lot of info.
I finally traced the cause. It appears the procedure in question had a very inefficient way of getting dates from the date table. It used
SELECT max(date) FROM d_date WHERE date_id = @myDateFrom
In the table, date_id is an integer, while date is, well, a date.
This max was paralyzing the query. I realize it's normally used to ensure only a single row is returned, but in this case even without max only one row would be retrieved from the d_date table. Removing the max returned the execution time to roughly its previous values.
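For illustration, the change was essentially this (variable and column names as in the original procedure):
-- Before: the aggregate was crippling the plan
-- SELECT max(date) FROM d_date WHERE date_id = @myDateFrom
-- After: date_id already identifies a single row, so the aggregate can simply be dropped
SELECT date FROM d_date WHERE date_id = @myDateFrom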
Thank you all for your effort.
Obi Mark
I have a stored procedure, and I want to ensure it cannot be executed concurrently.
My (multi-threaded) application does all necessary work on the underlying table via this stored procedure.
IMO, locking the table itself is an unnecessarily drastic action to take, and so when I found out about sp_GetAppLock, which essentially enforces a critical section, this sounded ideal.
My plan was to encase the stored procedure in a transaction and to set up sp_getapplock with transaction scope. The code was written and tested successfully.
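For context, the pattern I used looks roughly like this (the resource name, lock mode and timeout here are illustrative, not my actual code):
BEGIN TRAN;

DECLARE @rc int;
-- Transaction-scoped app lock: released automatically at COMMIT or ROLLBACK
EXEC @rc = sp_getapplock
    @Resource = 'MessagePairProcessing',
    @LockMode = 'Exclusive',
    @LockOwner = 'Transaction',
    @LockTimeout = 10000;

IF @rc < 0
BEGIN
    ROLLBACK;   -- could not acquire the lock within the timeout
    RETURN;
END;

-- log that Mx has arrived (insert), then check whether My is present (select)
-- and trigger the processing only if both are there

COMMIT;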
The code has now been put forward for review and I have been told that I should not call this function. However when asking the obvious question "why not?", the only reasons I am getting are highly subjective, to do with any form of locking being complicated.
I don't necessarily buy this, but I was wondering whether anyone had any objective reasons why I should avoid this construct. Like I say, given my circumstances a critical section sounds like an ideal approach to me.
Further info: An application sits on top of this with 2 threads T1 and T2. Each thread is waiting for a different message M1 and M2. The business logic involved says that processing can only happen once both M1 and M2 have arrived. The stored procedure logs that Mx has arrived (insert) and then checks whether My is present (select). The built-in locking is fine to make sure the inserts happen serially. But the selects need to happen serially too and I think I need to do something over and above the built-in functionality here.
Just for clarity, I want the "processing" to happen exactly once. So I can't afford for the stored procedure to return either false positives or false negatives. I'm worried that if the stored proc runs twice in very quick succession, then both "selects" might return data which indicates that it is appropriate to perform processing.
What is the procedure doing that you cannot rely on SQL Server's built-in concurrency control mechanisms? Often queries can be rewritten to allow real concurrency.
But if this procedure indeed has to be executed "alone", locking the table itself on first access is most likely going to be a lot faster than using the call to sp_GetAppLock. It sounds like this procedure is going to be called often. If that is the case you should look for a way to achieve the goal with minimal impact.
If the table contains no other rows besides M1 and M2, a table lock is still your best bet.
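A minimal sketch of that table-lock variant (the table name, column and processing procedure are assumptions mirroring the example further down):
DECLARE @hasOther int;
BEGIN TRAN;
-- TABLOCKX + HOLDLOCK takes an exclusive table lock and keeps it until the end of the
-- transaction, so only one caller at a time gets past this statement
SELECT @hasOther = COUNT(*)
FROM messages WITH (TABLOCKX, HOLDLOCK)
WHERE msg_type = 'M2';       -- the "other" message for this caller
INSERT INTO messages (msg_type) VALUES ('M1');
IF @hasOther > 0 EXEC do_other_stuff_and_delete_messages;
COMMIT;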
If you have multiple threads sending multiple messages, you can get more fine-grained by using SERIALIZABLE as the transaction isolation level and checking whether the other message is there before you do the insert, but within the same transaction. To prevent deadlocks in this case, make sure you check for both messages, for example like this:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRAN;
DECLARE @hasM1 bit, @hasM2 bit;
SELECT
@hasM1 = MAX(CASE WHEN msg_type='M1' THEN 1 ELSE 0 END),
@hasM2 = MAX(CASE WHEN msg_type='M2' THEN 1 ELSE 0 END)
FROM messages WITH(UPDLOCK)
WHERE msg_type IN ('M1','M2')
INSERT ...
IF(??) EXEC do_other_stuff_and_delete_messages;
COMMIT
In the IF statement before(!) the COMMIT you can use the information collected before the insert, together with the information that you inserted, to decide whether additional processing is necessary.
In that processing step, make sure to either mark those messages as processed or delete them, all still within the same transaction. That will make sure you do not process those messages twice.
SERIALIZABLE is the only transaction isolation level that allows locking rows that do not exist yet, so the first SELECT statement with WITH(UPDLOCK) effectively prevents the other row from being inserted while the first execution is still running.
Finally, these are a lot of things to be aware of that could go wrong. You might want to have a look at Service Broker instead. You could use three queues with that: one for type M1 and one for type M2. Every time a message arrives in those queues, a procedure can automatically be called to insert a token into the third queue. The third queue could then activate a process to check whether both messages exist and do the work. That would make the entire process asynchronous, but it would be easy to restrict the queue 3 activation to only do one check at a time.
Service Broker on MSDN; also look at "activation" for the automatic message processing.
sp_GetAppLock is just like many other tools and as such it can be misused, overused, or correctly used. It is an exact match for the type of problem described by the original poster.
This is a good MSSQL Tips post on the usage
Prevent multiple users from running the same SQL Server stored procedure at the same time
http://www.mssqltips.com/sqlservertip/3202/prevent-multiple-users-from-running-the-same-sql-server-stored-procedure-at-the-same-time/
We use sp_getapplock all the time, because we support some legacy applications that have been re-worked to use a SQL back-end, and the SQL Server locking model is not an exact match for our application logic.
We tend to go for a 'pessimistic' locking model, where we lock an entity before allowing a user to edit it, and use the (NOLOCK) hint extensively when reading data to bypass any blocking from the native locks on the actual tables. sp_getapplock is a good match for this. We also use it to enforce critical paths in large multi-user systems. You have to be systematic about how you name the locks you place.
We've found no performance problems with large numbers of users/locks via this route, so I see no reason why it wouldn't work well for you. Just be aware that you can get blocking and deadlocks if you have processes that place the same named locks, but not necessarily in the same order.
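In other words, if several procedures need more than one app lock, have them all acquire the locks in the same order, something like this (the lock names are just examples):
BEGIN TRAN;
-- Every procedure takes 'Lock_A' before 'Lock_B'; consistent ordering means callers
-- may block each other but cannot deadlock on these two locks
EXEC sp_getapplock @Resource = 'Lock_A', @LockMode = 'Exclusive', @LockOwner = 'Transaction';
EXEC sp_getapplock @Resource = 'Lock_B', @LockMode = 'Exclusive', @LockOwner = 'Transaction';
-- ... do the protected work ...
COMMIT;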
You can create a table with a flag for each set of messages, so that whichever thread starts processing first marks the flag as 'processing'.
To make sure the record is blocked properly once one of the threads reaches it, use:
SELECT ... FROM <your table> WITH (XLOCK, ROWLOCK, READCOMMITTED) ... WHERE ...
This piece of code will put an exclusive lock on the record, meaning whoever gets to it first owns the row.
Then you make your changes and update the flag; the other thread will get the updated value because it will be blocked by the exclusive lock until the first thread commits or rolls back its transaction.
For this to work you always need to select records from the table with XLOCK; that way it behaves as expected.
Hope this helps.
Exclusive lock proof:
USE master
GO
IF OBJECT_ID('dbo.tblTest') IS NOT NULL
DROP TABLE dbo.tblTest
CREATE TABLE tblTest ( id int PRIMARY KEY )
;WITH cteNumbers AS (
SELECT 1 N
UNION ALL
SELECT N + 1 FROM cteNumbers WHERE N<1000
)
INSERT INTO
tblTest
SELECT
N
FROM
cteNumbers
OPTION (MAXRECURSION 0)
BEGIN TRANSACTION
SELECT * FROM dbo.tblTest WITH(XLOCK,ROWLOCK,READCOMMITTED) WHERE id = 1
SELECT * FROM sys.dm_tran_locks WHERE resource_database_id = DB_ID('master')
ROLLBACK TRANSACTION
I'm wondering what is the benefit to use SELECT WITH (NOLOCK) on a table if the only other queries affecting that table are SELECT queries.
How is that handled by SQL Server? Would a SELECT query block another SELECT query?
I'm using SQL Server 2012 and a Linq-to-SQL DataContext.
(EDIT)
About performance:
Would a 2nd SELECT have to wait for a 1st SELECT to finish if using a locked SELECT?
Versus a SELECT WITH (NOLOCK)?
A SELECT in SQL Server will place a shared lock on a table row - and a second SELECT would also require a shared lock, and those are compatible with one another.
So no - one SELECT cannot block another SELECT.
What the WITH (NOLOCK) query hint is used for is to be able to read data that's in the process of being inserted (by another connection) and that hasn't been committed yet.
Without that query hint, a SELECT might be blocked reading a table by an ongoing INSERT (or UPDATE) statement that places an exclusive lock on rows (or possibly a whole table), until that operation's transaction has been committed (or rolled back).
Problem of the WITH (NOLOCK) hint is: you might be reading data rows that aren't going to be inserted at all, in the end (if the INSERT transaction is rolled back) - so your e.g. report might show data that's never really been committed to the database.
There's another query hint that might be useful - WITH (READPAST). This instructs the SELECT command to just skip any rows that it attempts to read and that are locked exclusively. The SELECT will not block, and it will not read any "dirty" un-committed data - but it might skip some rows, e.g. not show all your rows in the table.
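A small illustration of the difference, using a hypothetical dbo.Orders table and an uncommitted insert in another session:
-- Session 1 (transaction left open, insert not yet committed):
-- BEGIN TRAN;
-- INSERT INTO dbo.Orders (Id, Status) VALUES (42, 'New');

-- Session 2:
SELECT * FROM dbo.Orders WITH (NOLOCK);    -- returns the uncommitted row 42 (dirty read)
SELECT * FROM dbo.Orders WITH (READPAST);  -- skips row 42, returns only committed rows
SELECT * FROM dbo.Orders;                  -- default READ COMMITTED: blocks until session 1 commits or rolls back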
Regarding performance, you keep focusing on SELECT.
Shared does not block reads.
Shared lock blocks update.
If you have hundreds of shared locks it is going to take an update a while to get an exclusive lock as it must wait for shared locks to clear.
By default a select (read) takes a shared lock.
Shared (S) locks allow concurrent transactions to read (SELECT) a resource.
A shared lock has no effect on other selects (1 or 1000).
The difference is how nolock versus a shared lock affects update or insert operations.
No other transactions can modify the data while shared (S) locks exist on the resource.
A shared lock blocks an update!
But nolock does not block an update.
This can have huge impacts on the performance of updates. It also impacts inserts.
Dirty read (nolock) just sounds dirty. You are never going to get partial data. If an update is changing John to Sally you are never going to get Jolly.
I use shared locks a lot for concurrency. Data is stale as soon as it is read. A read of John that changes to Sally the next millisecond is stale data. A read of Sally that gets rolled back to John the next millisecond is stale data. That is on the millisecond level. I have a data loader that takes 20 hours to run if users are taking shared locks and 4 hours if users are taking no locks. Shared locks in this case cause the data to be 16 hours stale.
Don't use nolock incorrectly. But it does have a place. If you are going to cut a check when a byte is set to 1 and then set it to 2 when the check is cut, that's not a time for nolock.
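To make the blocking concrete, a sketch with a hypothetical dbo.Accounts table:
-- Session 1: a read that holds its shared locks for the whole transaction
BEGIN TRAN;
SELECT * FROM dbo.Accounts WITH (HOLDLOCK) WHERE Balance > 0;
-- (transaction still open)

-- Session 2: blocked while the row it needs is still share-locked by session 1
UPDATE dbo.Accounts SET Balance = Balance + 10 WHERE Id = 1;

-- If session 1 read WITH (NOLOCK) instead, it would take no shared locks
-- and the UPDATE would not have to wait.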
I have to add one important comment. Everyone is mentioning that NOLOCK only reads dirty data. This is not precise. It is also possible that you'll get the same row twice, or that a whole row is skipped during your read. The reason is that you could ask for some data at the same time that SQL Server is re-balancing the b-tree.
Check these other threads:
https://stackoverflow.com/a/5469238/2108874
http://www.sqlmag.com/article/sql-server/quaere-verum-clustered-index-scans-part-iii.aspx
With the NOLOCK hint (or setting the isolation level of the session to READ UNCOMMITTED) you tell SQL Server that you don't expect consistency, so there are no guarantees. Bear in mind though that "inconsistent data" does not only mean that you might see uncommitted changes that were later rolled back, or data changes in an intermediate state of the transaction. It also means that in a simple query that scans all table/index data SQL Server may lose the scan position, or you might end up getting the same row twice.
At my work, we have a very big system that runs on many PCs at the same time, with very big tables with hundreds of thousands of rows, and sometimes many millions of rows.
When you make a SELECT on a very big table, let's say you want to know every transaction a user has made in the past 10 years, and the primary key of the table is not built in an efficient way, the query might take several minutes to run.
Then, our application might be running on many users' PCs at the same time, accessing the same database. So if someone tries to insert into the table that the other SELECT is reading (in pages that SQL is trying to read), a LOCK can occur and the two transactions block each other.
We had to add NOLOCK to our SELECT statement, because it was a huge SELECT on a table that is used a lot by many users at the same time, and we had LOCKS all the time.
I don't know if my example is clear enough, but this is a real-life example.
SELECT WITH (NOLOCK) allows reads of uncommitted data, which is equivalent to running the statement under the READ UNCOMMITTED isolation level. The NOLOCK hint allows finer-grained control than setting the isolation level for the entire session.
Wikipedia has a useful article: Wikipedia: Isolation (database systems)
It is also discussed at length in other stackoverflow articles.
SELECT with nolock will return records that may or may not end up being committed; you will read dirty data.
For example, let's say a transaction inserts 1000 rows and then fails (rolls back).
When you SELECT with nolock, you can get those 1000 rows.
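A sketch of that scenario, with a hypothetical dbo.Staging table:
-- Session 1:
BEGIN TRAN;
INSERT INTO dbo.Staging (Id)
SELECT TOP (1000) object_id FROM sys.all_objects;   -- 1000 rows inserted, nothing committed yet
-- ... this transaction later fails and rolls back

-- Session 2, while session 1 is still open:
SELECT COUNT(*) FROM dbo.Staging WITH (NOLOCK);     -- can report the 1000 uncommitted rows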