We have a problem with random sessions that leave their transactions open with uncommitted changes, eventually blocking other users. Is it possible to query the server metadata to check whether a certain transaction_id holds uncommitted changes? (Our ERP is written so that it permanently has a transaction open, but it should always commit any change as soon as possible.)
I want to write a module in the ERP to log uncommitted changes when none should exist (closing windows, etc.), to detect those bugs as soon as possible.
Try using this as a starting point:
select st.session_id,
       dt.database_id,
       dt.database_transaction_log_bytes_used,
       dt.database_transaction_log_bytes_reserved,
       datediff(second, s.last_request_end_time, getdate()) as idle_seconds
from sys.dm_tran_session_transactions as st
join sys.dm_tran_database_transactions as dt
    on dt.transaction_id = st.transaction_id
join sys.dm_exec_sessions as s
    on s.session_id = st.session_id;
You'll probably want to put some sort of threshold on the amount of log used/reserved and/or on how long the session has been idle.
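For example, a minimal sketch of such a filter (the 64 KB log threshold and 60-second idle threshold are arbitrary assumptions; tune them to your workload):
select st.session_id,
       dt.database_id,
       dt.database_transaction_log_bytes_used,
       dt.database_transaction_log_bytes_reserved,
       datediff(second, s.last_request_end_time, getdate()) as idle_seconds
from sys.dm_tran_session_transactions as st
join sys.dm_tran_database_transactions as dt
    on dt.transaction_id = st.transaction_id
join sys.dm_exec_sessions as s
    on s.session_id = st.session_id
where dt.database_transaction_log_bytes_used > 65536                 -- has logged real changes
  and datediff(second, s.last_request_end_time, getdate()) > 60;     -- idle for over a minute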
I'm running into an interesting issue in production, which I cannot replicate in our QA/Staging environments.
I have a query that is doing dirty reads on a fairly large table (around 6 million rows; we only keep the last 90 days of data in it, and older records are warehoused in a different database). This table gets lots of writes, as it logs page views, but data is only occasionally read from it.
Recently I noticed that when one specific query is running, SQL Server 2019 starts generating a ton of WRITELOG waits and appears to hold up any other requests that are trying to write to the database.
Now, the query itself has nolock hints on all the tables, because it's okay if dirty data is returned. We use the nolock hints because writes to the table are extremely frequent, and queries against it can be slow because a lot of page scans are required.
The query itself looks something like this:
select
clt.ViewDate, clt.UserId, clt.RemoteAddress, clt.LibraryId, clt.Parameters
, u.Fullname
, cl.Id as VideoId, cl.Title
-- we need a compound key for each row, so we can count the unique rows
, case
when clt.ViewDate is null then null
else row_number() over (order by clt.ViewDate, clt.UserId, clt.LibraryId, clt.Parameters)
end as compoundKey
from
ContentLibrary as cl (nolock)
left join
(
ContentLibraryTracking as clt (nolock)
inner join
[User] as u (nolock)
on
clt.UserId = u.UserId
)
on
clt.ViewDate between @startDate and @endDate
and
clt.Parameters like @filter
where
1 = 1
and
cl.ContentType = @contentType
order by
clt.ViewDate
The problem table appears to be ContentLibraryTracking. This is the table that has millions of rows and lots of inserts, and we warehouse rows nightly, so there can be a lot of page fragmentation. We defragment the indexes and update statistics on the table weekly.
When this query is running, sp_BlitzWho reports that the query has entered a CXCONSUMER wait. I will then see SQL Server 2019 start to queue processes with a WRITELOG wait. These processes remain in this state until the query has finished running.
Since our application performs some kind of write transaction with every page view, this means the query is holding up execution for the entire application, which is obviously bad.
While I know page scans are bad for a query plan, the query requires searching for patterns in a varchar column, which is why the page scans happen. Since reads are very infrequent, the table is optimized for writes, which are extremely frequent. And while the query could perform better, considering the work it's doing, even when it's slow it runs within 15 seconds or so.
One thing I do see from the sp_BlitzWho results is that the query is using parallelism, and it also states the Transaction Isolation Level is Read Committed (where I would expect Read Uncommitted, since all the tables have a nolock hint).
What would cause a query with dirty reads to be forcing the database to queue up WRITELOG events?
I could see this happening if the query were altering data and generating its own transaction log entries, but that should not be happening with this query. That's the whole reason we are using the nolock hint on the tables.
Also, our database, log files and tempdb are all on their own logical storage devices, so reads from the database should not be causing I/O problems for writes to the transaction log files.
A couple of notes on the environment:
We are running Microsoft SQL Server 2019 (RTM-CU8-GDR) (KB4583459) - 15.0.4083.2 (X64)
The database is running in a VM
We backup transaction logs every 5 minutes (could this be the issue?)
Memory and CPU usage appear fine while the query runs
SQL Monitor 11 only really shows spikes in log flushes and log waits (which would match the behavior). Page splits and buffer cache/page metrics are all normal. I do see "disk read bytes/sec" go up on the logical drive that holds the database, but the writes on all drives (including the transaction logs) look okay.
Any thoughts would be greatly appreciated as I'm really scratching my head over this issue.
Right after I posted my question I started looking at the sp_BlitzWho results in more detail. I noticed the parallelism was using all the CPUs, so I changed MAXDOP to half the CPUs/cores, and this appears to have resolved the issue. I'm going to keep monitoring the situation, but it looks like an instance where MAXDOP was not set correctly.
It makes sense that if a query is eating up all the available cores, other threads will be left waiting. I was just thrown off by the WRITELOG waits.
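For reference, a minimal sketch of capping MAXDOP at the instance level (the value 4 is an assumed example; pick roughly half your cores as described above):
-- requires sysadmin; 'show advanced options' must be enabled first
exec sp_configure 'show advanced options', 1;
reconfigure;
exec sp_configure 'max degree of parallelism', 4;   -- hypothetical value
reconfigure;
The same cap can also be applied to a single statement with an OPTION (MAXDOP 4) query hint instead of changing the instance-wide setting.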
Environment: SQL Server 2014, 64 GB RAM, 6 processors, 2 TB disk with almost 400 GB free space.
I have a procedure that is called by a job. It creates a temp table, then joins several dimension tables to that table and inserts into a fact table. It worked cleanly until Monday, running between 2 and 10 minutes. On Monday it ran for nearly 5 hours without doing anything: the idle process was at 98%, no reads, no writes, and the state was suspended. There were no locks, no blocking sessions, literally nothing that I could pin down as the culprit.
As soon as it's called it immediately goes into the suspended state and I cannot find out why. It's supposed to be waiting for something, but I can't find what it's waiting for. It's blocking the entire process and no data is being loaded.
I would really appreciate help.
A process goes into suspended mode because it's waiting for a system resource to become available. What specifically that resource is in your case, I'm not sure. If you re-run it and it continues to happen, I'd run a Profiler trace on the procedure and see what it's doing at the moment it becomes suspended.
@Obi Mark,
In short, you'll need to look at the wait types and the query plan. Here's a query to capture the details of the query plan:
SELECT dm_ws.wait_duration_ms,
dm_ws.wait_type,
dm_es.status,
dm_t.TEXT,
dm_qp.query_plan,
dm_ws.session_ID,
dm_es.cpu_time,
dm_es.memory_usage,
dm_es.logical_reads,
dm_es.total_elapsed_time,
dm_es.program_name,
DB_NAME(dm_r.database_id) DatabaseName,
-- Optional columns
dm_ws.blocking_session_id,
dm_r.wait_resource,
dm_es.login_name,
dm_r.command,
dm_r.last_wait_type
FROM sys.dm_os_waiting_tasks dm_ws
INNER JOIN sys.dm_exec_requests dm_r ON dm_ws.session_id = dm_r.session_id
INNER JOIN sys.dm_exec_sessions dm_es ON dm_es.session_id = dm_r.session_id
CROSS APPLY sys.dm_exec_sql_text (dm_r.sql_handle) dm_t
CROSS APPLY sys.dm_exec_query_plan (dm_r.plan_handle) dm_qp
WHERE dm_es.is_user_process = 1
To analyze wait types, follow the advice on this link from Marcello Miorelli and steoleary.
How to find out why the status of a spid is suspended? What resources the spid is waiting for?
Thanks all for your input. I tried the above query and checked the helpful link you provided. It gave me a lot of info.
I finally traced the cause. It appears that the procedure in question had a very inefficient way of getting dates from the date table. It used:
SELECT max(date) FROM d_date WHERE date_id = @myDateFrom
In the table, date_id is an integer, while date is, well, a date.
This max was paralyzing the query. I realize it's normally used to ensure only a single row is returned, but in this case even without max only one row should be retrieved from the d_date table. Removing the max from the query returned execution time to roughly its previous values.
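For clarity, the rewritten lookup as described above (a sketch using the original parameter name):
SELECT date FROM d_date WHERE date_id = @myDateFrom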
Thank you all for your effort.
Obi Mark
We have a new sp getting released, and during testing we found that when it runs it's blocking other OLTP transactions. We found that initially this was because the new sp was causing lock escalation on a table; we reduced the batch size and were able to avoid that. But even after avoiding lock escalation, it is still blocking incoming OLTP transactions.
I think it's locking the same rows the OLTP transactions are updating.
I need to find a way to track all the locks held and released by the new sp. I tried a trace/xevents session (lock acquired/released) and it does not look like it's capturing all the locks, maybe because it happens so fast.
Just to understand what lock-acquired events look like, I tested by doing a select * from atable, but it gives me different results. When we do a select *, doesn't it take a series of page locks, so I should be seeing shared page locks in the trace? But all I see is IS locks acquired and released.
What is the best way to track all the locks for a given transaction?
I ran the query below in one session:
begin tran
update orderstst
set unitprice=unitprice+1
waitfor delay '00:00:20'
and ran the DMV below while the query was running in another session:
select resource_database_id,request_mode,request_type,request_status,txt.text
from sys.dm_tran_locks lck
join
sys.dm_exec_requests ec
on ec.session_id=lck.request_session_id
cross apply
sys.dm_exec_Sql_text(ec.sql_handle) txt
This returned the lock details (mode, type, status) for the running session.
When the transaction had completed but was still not committed, I ran the DMV above again but got no output, since the statement was no longer executing.
But running the DMV below will still give you the lock info for all sessions holding locks, so you will be able to identify which session is holding the most locks:
select resource_database_id,request_mode,request_type,request_status
from sys.dm_tran_locks lck
join
sys.dm_exec_sessions ec
on ec.session_id=lck.request_session_id
The query above returns the lock info per session.
So in summary, you have to run DMV1 or DMV2 periodically through a SQL Agent job and insert the results into a table for later analysis.
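A minimal sketch of that logging step (the dbo.lock_log table and its schema are hypothetical):
-- one-time setup
create table dbo.lock_log (
    capture_time         datetime not null,
    resource_database_id int,
    request_mode         nvarchar(60),
    request_type         nvarchar(60),
    request_status       nvarchar(60),
    session_id           int
);

-- run this on a schedule from the SQL Agent job
insert into dbo.lock_log
select getdate(), resource_database_id, request_mode, request_type, request_status, request_session_id
from sys.dm_tran_locks;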
Further, from SQL Server 2012 on, you can also use Extended Events.
Go to Management -> Extended Events, right-click and choose the New Session Wizard.
Give it a name and check "start at server startup".
The next screen gives you the option to use a default template or not; I chose the default template for locks and clicked next.
In the next screen you can choose different events: in channel, select all channels, do the same in categories, and then select the events of your interest.
On this screen you can select actions; I chose text and session_id.
In the next screen, set filters: for example, gather events only for a database name like 'somename' or a query like some text.
The next screen is where you can save the file to disk for later analysis.
Complete the rest of the screens and finally select the "start event session immediately" option.
When you are done gathering data, go to Extended Events and stop the session you created. Right-click it and choose "View Target Data" to see the captured events.
EDIT: as of 12/3/2019 the New Session Wizard is found by right-clicking the Sessions node under Management -> Extended Events.
I am running a SQL Server Profiler trace on deadlocks because users are getting query timeouts. In the Profiler, the EventClass column shows Lock:Escalation and Lock:Cancel. How do I find out what would cause a query to be canceled? Basically, the same queries are being run by a bunch of users and things zoom right through, but off and on throughout the day users are timing out. I am running sqldiag as well; however, unfortunately I am not a DBA and am muddling my way through to discover the problem. Any suggestions?
thanks community
nick
Query timeouts and deadlocks are pretty much mutually exclusive.
A deadlock situation will be discovered very quickly by the deadlock monitor background thread and dealt with in such a manner that one of the deadlocked processes (usually the one with the lower cost of rolling back) is chosen as the deadlock victim and its work up to that point rolled back.
A query timeout can happen with livelocks: a high number of concurrent processes trying to access the same resource and thus blocking one another. When the time elapsed exceeds the timeout value (set by the client), the query is canceled (and this is the reason you're seeing the Lock:Cancel events in the trace).
It is very important that the client handles this condition, because all the resources taken inside a transaction which timed out will remain taken as long as the connection stays alive and the transaction is not rolled back.
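One server-side mitigation (a sketch, not from the original post; table and column names are hypothetical) is to run batches with SET XACT_ABORT ON, so a client timeout's attention signal rolls the open transaction back instead of leaving its locks held:
set xact_abort on;   -- a timeout/attention now rolls back the open transaction automatically
begin tran;
    update dbo.SomeTable set SomeCol = 1 where Id = 42;   -- hypothetical statement
commit tran;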
To diagnose blocking situations, you can do several things.
If you happen to be monitoring at the time when a process is blocked, run the following query to find out the head of the blocking chain so you can investigate further:
select r.session_id, r.host_name, r.program_name,
r.login_name, r.nt_domain, r.nt_user_name,
r.total_elapsed_time/1000 as total_elapsed_time_sec, getdate() as vrijeme,
(select text from sys.dm_exec_sql_text(c.most_recent_sql_handle)) as sql_text
from sys.dm_exec_connections c
inner join sys.dm_exec_sessions r on r.session_id = c.session_id
where r.is_user_process = 1
and exists (
select *
from sys.dm_os_waiting_tasks r2
where r2.blocking_session_id = r.session_id
)
and not exists (
select *
from sys.dm_os_waiting_tasks r3
where r3.session_id = r.session_id
)
and r.total_elapsed_time/1000 > 10
This query has a 10 second threshold.
Furthermore, you can use the Profiler to capture the blocking process event and then analyze it later on. Check this link for a detailed explanation:
https://www.simple-talk.com/sql/sql-tools/how-to-identify-blocking-problems-with-sql-profiler/
There will usually be a handful of queries responsible for the large majority of blocking. Identify those and try to optimize them (rewriting, indexing...). In addition, you can enable the read committed snapshot isolation level for the database to avoid readers waiting on writers.
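Enabling that is a one-liner (sketch with a placeholder database name; WITH ROLLBACK IMMEDIATE kicks out open transactions so the change can take effect):
alter database YourDatabase        -- placeholder name
set read_committed_snapshot on
with rollback immediate;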
I'm wondering what the benefit of using SELECT WITH (NOLOCK) on a table is if the only other queries affecting that table are SELECT queries.
How is that handled by SQL Server? Would a SELECT query block another SELECT query?
I'm using SQL Server 2012 and a Linq-to-SQL DataContext.
(EDIT)
About performance:
Would a 2nd SELECT have to wait for a 1st SELECT to finish when using a locking SELECT, versus a SELECT WITH (NOLOCK)?
A SELECT in SQL Server will place a shared lock on a table row - and a second SELECT would also require a shared lock, and those are compatible with one another.
So no - one SELECT cannot block another SELECT.
What the WITH (NOLOCK) query hint is used for is to be able to read data that's in the process of being inserted (by another connection) and that hasn't been committed yet.
Without that query hint, a SELECT might be blocked reading a table by an ongoing INSERT (or UPDATE) statement that places an exclusive lock on rows (or possibly a whole table), until that operation's transaction has been committed (or rolled back).
The problem with the WITH (NOLOCK) hint is: you might be reading data rows that aren't going to be inserted at all in the end (if the INSERT transaction is rolled back) - so your report, for example, might show data that's never really been committed to the database.
There's another query hint that might be useful - WITH (READPAST). This instructs the SELECT command to just skip any rows it attempts to read that are locked exclusively. The SELECT will not block, and it will not read any "dirty" uncommitted data - but it might skip some rows, e.g. not show all the rows in the table.
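For illustration, the two hints side by side (the dbo.Orders table is a hypothetical example):
-- returns uncommitted ("dirty") rows rather than blocking
select * from dbo.Orders with (nolock);

-- skips exclusively locked rows rather than blocking; never returns dirty data, but may miss rows
select * from dbo.Orders with (readpast);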
On performance: you keep focusing on the SELECT side.
Shared does not block reads.
Shared lock blocks update.
If there are hundreds of shared locks, it is going to take an update a while to get an exclusive lock, as it must wait for the shared locks to clear.
By default a select (read) takes a shared lock.
Shared (S) locks allow concurrent transactions to read (SELECT) a resource.
A shared lock has no effect on other selects (1 or 1000).
The difference is how nolock versus a shared lock affects update or insert operations.
No other transactions can modify the data while shared (S) locks exist on the resource.
A shared lock blocks an update!
But nolock does not block an update.
This can have a huge impact on the performance of updates. It also impacts inserts.
Dirty read (nolock) just sounds dirty. You are never going to get partial data. If an update is changing John to Sally you are never going to get Jolly.
I use shared locks a lot for concurrency. Data is stale as soon as it is read. A read of John that changes to Sally the next millisecond is stale data. A read of Sally that gets rolled back to John the next millisecond is stale data. That is on the millisecond level. I have a dataloader that takes 20 hours to run if users are taking shared locks and 4 hours to run if users are taking no locks. Shared locks in this case cause data to be 16 hours stale.
Don't use nolock wrongly, but it does have a place. If you are going to cut a check when a byte is set to 1, and then set it to 2 when the check is cut - that is not a time for nolock.
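A minimal two-session sketch of the blocking described above (table and column names are hypothetical; the REPEATABLEREAD hint is used so the shared lock is held until commit, since plain read committed releases it as soon as the row is read):
-- Session 1: take and hold a shared lock
begin tran;
select Name from dbo.People with (repeatableread) where Id = 1;
-- (transaction left open)

-- Session 2: this update blocks until session 1 commits or rolls back
update dbo.People set Name = 'Sally' where Id = 1;

-- A nolock reader, by contrast, would neither block the update nor be blocked by it:
select Name from dbo.People with (nolock) where Id = 1;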
I have to add one important comment. Everyone is mentioning that NOLOCK only reads dirty data. This is not precise. It is also possible that you'll get the same row twice, or that a whole row is skipped during your read. The reason is that you could be asking for data at the same time that SQL Server is re-balancing the b-tree.
Check these other threads:
https://stackoverflow.com/a/5469238/2108874
http://www.sqlmag.com/article/sql-server/quaere-verum-clustered-index-scans-part-iii.aspx
With the NOLOCK hint (or setting the isolation level of the session to READ UNCOMMITTED) you tell SQL Server that you don't expect consistency, so there are no guarantees. Bear in mind though that "inconsistent data" does not only mean that you might see uncommitted changes that were later rolled back, or data changes in an intermediate state of the transaction. It also means that in a simple query that scans all table/index data SQL Server may lose the scan position, or you might end up getting the same row twice.
At my work, we have a very big system that runs on many PCs at the same time, with very big tables with hundreds of thousands of rows, and sometimes many millions of rows.
When you run a SELECT on a very big table - let's say you want to know every transaction a user has made in the past 10 years - and the primary key of the table is not built in an efficient way, the query might take several minutes to run.
Then, our application might be running on many users' PCs at the same time, accessing the same database. So if someone tries to insert into the table that the other SELECT is reading (in pages that SQL is trying to read), a LOCK can occur and the two transactions block each other.
We had to add NOLOCK to our SELECT statement, because it was a huge SELECT on a table that is used a lot by a lot of users at the same time, and we had LOCKS all the time.
I don't know if my example is clear enough? This is a real-life example.
The SELECT WITH (NOLOCK) allows reads of uncommitted data, which is equivalent to running the query under the READ UNCOMMITTED isolation level. The NOLOCK hint allows finer-grained (per-table) control than setting the isolation level for the whole session.
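The two equivalent forms (the table name is a hypothetical example):
-- per-table hint
select * from dbo.Orders with (nolock);

-- session-wide equivalent
set transaction isolation level read uncommitted;
select * from dbo.Orders;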
Wikipedia has a useful article: Wikipedia: Isolation (database systems)
It is also discussed at length in other stackoverflow articles.
A select with nolock will select records which may or may not end up being inserted; you will read dirty data.
For example, let's say a transaction inserts 1000 rows and then fails.
When you select, you will get the 1000 rows.
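A sketch of that scenario across two sessions (the dbo.Staging table is hypothetical):
-- Session 1: insert 1000 rows inside a transaction that will eventually fail
begin tran;
insert into dbo.Staging (Id)
select top (1000) row_number() over (order by (select null))
from sys.all_objects;
-- ... not yet committed ...

-- Session 2: a dirty read sees the 1000 uncommitted rows
select count(*) from dbo.Staging with (nolock);

-- Session 1 rolls back: the rows the reader counted never officially existed
rollback tran;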