When does LOCK start during INSERT - sql-server

I have an INSERT statement that gets data by executing a stored procedure in SQL Server 2012.
INSERT INTO Employee
EXEC tp_GetEmployees
The stored procedure takes about 30 sec to provide the result.
Will the lock taken for the INSERT be held for those 30 sec (while the SP runs)?
Or does the lock start only after the SP has provided its result?

The INSERT statement needs to acquire necessary stability locks before it starts executing (or, strictly speaking, as a first step of execution). The INSERT statement contains the EXEC call, so the EXEC is executed as part of executing the INSERT, which means that the INSERT has already acquired any necessary stability locks (for INSERT this necessarily has to be at least an Intent-Exclusive IX mode, see intent mode locks).
Data locks are acquired as the data is being inserted; it would be impossible otherwise, because one cannot simply guess what keys will be inserted, or on what pages. As the data is provided by the EXEC, it is not possible for INSERT to lock the data being inserted before actually having the data available.
For more detailed information, you can monitor lock acquisition and release in real time, e.g. see Using XEVENT in SQL Server. Such monitoring will also show you whether the EXEC result is buffered or inserted as it comes (which is the natural next question to ask).
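As a rough sketch of the kind of monitoring meant here, an Extended Events session along the following lines could capture lock activity for one session. The session name LockTrace and the session id 54 are placeholders, not values from the question. The lock_acquired and lock_released events can be very chatty, so this is best tried on a test server.

```sql
-- Hypothetical session name; 54 is a placeholder for the SPID running the INSERT ... EXEC.
CREATE EVENT SESSION LockTrace ON SERVER
ADD EVENT sqlserver.lock_acquired (WHERE sqlserver.session_id = 54),
ADD EVENT sqlserver.lock_released (WHERE sqlserver.session_id = 54)
ADD TARGET package0.ring_buffer;
GO
ALTER EVENT SESSION LockTrace ON SERVER STATE = START;
GO

-- Run the INSERT ... EXEC in session 54, then read what was captured so far.
SELECT CAST(t.target_data AS xml) AS captured_events
FROM sys.dm_xe_sessions AS s
JOIN sys.dm_xe_session_targets AS t
    ON t.event_session_address = s.address
WHERE s.name = 'LockTrace';
GO

-- Clean up (the ring buffer contents are lost once the session stops).
ALTER EVENT SESSION LockTrace ON SERVER STATE = STOP;
DROP EVENT SESSION LockTrace ON SERVER;
```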

In one query window, run this:
create procedure DoLittle
as
WAITFOR DELAY '00:05:00'
select 1 as a
go
create table T (a int not null)
go
insert into T(a)
exec DoLittle
go
In another query window, run this:
sp_lock
Which should produce output like this (for whichever SPID the first query window is running as):
spid dbid ObjId IndId Type Resource Mode Status
------ ------ ----------- ------ ---- -------------------------------- -------- ------
54 1 1623676832 0 TAB IX GRANT
That is, an Intent eXclusive lock is already taken and held by the INSERT ... EXEC. But no other locks are held. No other query will be able to obtain an eXclusive lock against the table, but they may be able to acquire lower level locks.
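sp_lock is an older procedure; the same check can probably be made against the sys.dm_tran_locks view instead. A minimal sketch, assuming 54 is the SPID of the first query window (a placeholder) and that it is run in the same database as table T:

```sql
-- 54 is a placeholder for the SPID of the window running INSERT ... EXEC.
SELECT request_session_id,
       resource_type,
       resource_associated_entity_id,
       request_mode,
       request_status
FROM sys.dm_tran_locks
WHERE request_session_id = 54
  AND resource_database_id = DB_ID();
```

While the WAITFOR inside the procedure is still running, the row to look for is the OBJECT-level entry with request_mode IX.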

Related

SELECT statement is not blocked by an existing exclusive table lock

For testing, I am trying to simulate a condition in which a query from our web application to our SQL Server backend would time out. The web application is configured so this happens if the query runs longer than 30 seconds. I felt the easiest way to do this would be to take and hold an exclusive lock on the table that the web application wants to query. As I understand it, an exclusive lock should prevent any additional locks (even the shared locks taken by a SELECT statement).
I used the following methodology:
CREATE A LONG-HELD LOCK
Open a first query window in SSMS and run
BEGIN TRAN;
SELECT * FROM MyTable WITH (TABLOCKX);
WAITFOR DELAY '00:02:00';
ROLLBACK;
(see https://stackoverflow.com/a/25274225/2824445 )
CONFIRM THE LOCK
I can EXEC sp_lock and see results with ObjId matching MyTable, Type of TAB, Mode of X
TRY TO GET BLOCKED BY THE LOCK
Open a second query window in SSMS and run SELECT * FROM MyTable
I would expect this to sit and wait, not returning any results until after the lock is released by the first query. Instead, the second query returns with full results immediately.
STUFF I TRIED
In the second query window, if I SET TRANSACTION ISOLATION LEVEL SERIALIZABLE, then the second query waits until the first completes as expected. However, the point is to simulate a timeout in our web application, and I do not have any easy way to alter the transaction isolation level of the web application's connections away from the default of READ COMMITTED.
In the first window, I tried modifying the table's values inside the transaction. In this case, when the second query returns immediately, the values it shows are the unmodified values.
Figured it out. We had READ_COMMITTED_SNAPSHOT turned on, which is how the second query was able to return the previous, unmodified values in part 2 of "Stuff I tried". I was able to determine this with SELECT is_read_committed_snapshot_on FROM sys.databases WHERE name = 'MyDatabase'. Once it was turned off with ALTER DATABASE MyDatabase SET READ_COMMITTED_SNAPSHOT OFF, I began to see the expected behavior in which the second query would wait for the first to complete.

How to properly truncate a staging table in an ETL pipeline?

We have an ETL pipeline that runs for each CSV uploaded into a storage account (Azure). It runs some transformations on the CSV and writes the outputs to another location, also as CSV, and calls a stored procedure on the database (SQL Azure) which ingests (BULK INSERT) this resulting CSV into a staging table.
This pipeline can have concurrent executions as multiple resources can be uploading files to the storage. Hence, the staging table is getting data inserted pretty often.
Then, we have a scheduled SQL job (Elastic Job) that triggers an SP that moves the data from the staging table into the final table.
At this point, we would want to truncate/empty the staging table so that we do not re-insert the same rows in the next execution of the job.
Problem is, we cannot be sure that between the load from the staging table to the final table and the truncate command, there has not been any new data written into the staging table that could be truncated without first being inserted in to the final table.
Is there a way to lock the staging table while we're copying the data into the final table so that the SP (called from the ETL pipeline) trying to write to it will just wait until the lock is released? Is this achievable by using transactions or maybe some manual lock commands?
If not, what's the best approach to handle this?
I would propose a solution with two identical staging tables. Let's name them StageLoading and StageProcessing.
The load process would have the following steps:
1. At the beginning both tables are empty.
2. We load some data into StageLoading table (I assume each load is a transaction).
3. When Elastic job starts it will do:
- ALTER TABLE SWITCH to move all data from StageLoading to StageProcessing. This makes StageLoading empty and ready for the next loads. It is a metadata operation, so it takes milliseconds, and it is fully blocking, so it will be done between loads.
- load the data from StageProcessing to final tables.
- truncate table StageProcessing.
4. Now we are ready for next Elastic job.
If we try to do the SWITCH when StageProcessing is not empty, the ALTER will fail, which means that the last load process failed.
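The Elastic job's steps above could be sketched roughly as follows. dbo.FinalTable and the column names Col1 and Col2 are placeholders, and the sketch assumes both staging tables have identical structure, which ALTER TABLE ... SWITCH requires.

```sql
SET XACT_ABORT ON;

-- Metadata-only move; raises an error if StageProcessing is not empty.
ALTER TABLE dbo.StageLoading SWITCH TO dbo.StageProcessing;

-- Copy and clear in one transaction so a failure leaves StageProcessing intact.
BEGIN TRANSACTION;

INSERT INTO dbo.FinalTable (Col1, Col2)   -- placeholder target and columns
SELECT Col1, Col2
FROM dbo.StageProcessing;

TRUNCATE TABLE dbo.StageProcessing;

COMMIT TRANSACTION;
```

Wrapping the copy and the truncate in one transaction is a choice made for this sketch rather than part of the answer; it means a failed run leaves the rows in StageProcessing, so the next SWITCH fails visibly instead of losing data.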
I like sp_getapplock and use this method myself in a few places for its flexibility, and because you have full control over the locking logic and wait times.
The only problem that I see is that in your case concurrent processes are not all equal.
You have SP1 that moves data from the staging table into the main table. Your system never tries to run several instances of this SP.
Another SP2 that inserts data into the staging table can be run several times simultaneously and it is fine to do it.
It is easy to implement the locking that would prevent any concurrent run of any combination of SP1 or SP2. In other words, it is easy if the locking logic is the same for SP1 and SP2 and they are treated equal. But, then you can't have several instances of SP2 running simultaneously.
It is not obvious how to implement the locking that would prevent concurrent run of SP1 and SP2, while allowing several instances of SP2 to run simultaneously.
There is another approach that doesn't attempt to prevent concurrent run of SPs, but embraces and expects that simultaneous runs are possible.
One way to do it is to add an IDENTITY column to the staging table. Or an automatically populated datetime if you can guarantee that it is unique and never decreases, which can be tricky. Or rowversion column.
The logic inside SP2 that inserts data into the staging table doesn't change.
The logic inside SP1 that moves data from the staging table into the main table needs to use these identity values.
First, read the current maximum value of the identity column from the staging table and remember it in a variable, say, @MaxID. All subsequent SELECTs, UPDATEs and DELETEs from the staging table in that SP1 should include a filter WHERE ID <= @MaxID.
This ensures that if a new row happens to be added to the staging table while SP1 is running, that row is not processed and remains in the staging table until the next run of SP1.
The drawback of this approach is that you can't use TRUNCATE; you need to use DELETE with WHERE ID <= @MaxID.
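A minimal sketch of what SP1's body might look like with this approach. It assumes the staging table has an IDENTITY column named ID; dbo.FinalTable and the columns Col1 and Col2 are placeholders.

```sql
SET XACT_ABORT ON;

DECLARE @MaxID bigint;

-- Remember the high-water mark before touching any rows.
SELECT @MaxID = MAX(ID) FROM dbo.StagingTable;

IF @MaxID IS NOT NULL   -- NULL means the staging table is empty
BEGIN
    BEGIN TRANSACTION;

    INSERT INTO dbo.FinalTable (Col1, Col2)   -- placeholder target and columns
    SELECT Col1, Col2
    FROM dbo.StagingTable
    WHERE ID <= @MaxID;

    -- Rows added after the high-water mark stay for the next run.
    DELETE FROM dbo.StagingTable
    WHERE ID <= @MaxID;

    COMMIT TRANSACTION;
END;
```

Under row-versioning isolation the INSERT and the DELETE may each see a different snapshot of the table, so it is worth testing this against rows that commit late with a lower ID. A single DELETE ... OUTPUT ... INTO statement is one variation that avoids the gap between the two statements.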
If you are OK with several instances of SP2 waiting for each other (and SP1), then you can use sp_getapplock similar to the following. I have this code in my stored procedure. You should put this logic into both SP1 and SP2.
I'm not calling sp_releaseapplock explicitly here, because the lock owner is set to Transaction and the engine will release the lock automatically when the transaction ends.
You don't have to put retry logic in the stored procedure, it can be within external code that runs these stored procedures. In any case, your code should be ready to retry.
CREATE PROCEDURE SP2 -- or SP1
AS
BEGIN
    SET NOCOUNT ON;
    SET XACT_ABORT ON;
    BEGIN TRANSACTION;
    BEGIN TRY
        -- Maximum number of retries
        DECLARE @VarCount int = 10;
        WHILE (@VarCount > 0)
        BEGIN
            SET @VarCount = @VarCount - 1;
            DECLARE @VarLockResult int;
            EXEC @VarLockResult = sp_getapplock
                @Resource = 'StagingTable_app_lock',
                -- this resource name should be the same in SP1 and SP2
                @LockMode = 'Exclusive',
                @LockOwner = 'Transaction',
                @LockTimeout = 60000,
                -- I'd set this timeout to be about twice the time
                -- you expect SP to run normally
                @DbPrincipal = 'public';
            IF @VarLockResult >= 0
            BEGIN
                -- Acquired the lock
                -- for SP2
                -- INSERT INTO StagingTable ...
                -- for SP1
                -- SELECT FROM StagingTable ...
                -- TRUNCATE TABLE StagingTable ...
                -- don't retry any more
                BREAK;
            END ELSE BEGIN
                -- wait for 5 seconds and retry
                WAITFOR DELAY '00:00:05';
            END;
        END;
        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        ROLLBACK TRANSACTION;
        -- log error
    END CATCH;
END
This code guarantees that only one procedure is working with the staging table at any given moment. There is no concurrency. All other instances will wait.
Obviously, if you try to access the staging table not through these SP1 or SP2 (which try to acquire the lock first), then such access will not be blocked.
Is there a way to lock the staging table while we're copying the data into the final table so that the SP (called from the ETL pipeline) trying to write to it will just wait until the lock is released? Is this achievable by using transactions or maybe some manual lock commands?
It looks like you are searching for a mechanism that is wider than transaction-level locking. SQL Server/Azure SQL DB has one, and it is called an application lock:
sp_getapplock
Places a lock on an application resource.
Locks placed on a resource are associated with either the current transaction or the current session. Locks associated with the current transaction are released when the transaction commits or rolls back. Locks associated with the session are released when the session is logged out. When the server shuts down for any reason, all locks are released.
Locks can be explicitly released with sp_releaseapplock. When an application calls sp_getapplock multiple times for the same lock resource, sp_releaseapplock must be called the same number of times to release the lock. When a lock is opened with the Transaction lock owner, that lock is released when the transaction is committed or rolled back.
It basically means that your ETL tool should open a single session to the DB, acquire the lock, and release it when finished. Other sessions should try to acquire the lock before doing anything else (they cannot while it is already taken), wait until it is released, and then continue their work.
Assuming you have a single outbound job
Add an OutboundProcessing BIT DEFAULT 0 to the table
In the job, SET OutboundProcessing = 1 WHERE OutboundProcessing = 0 (claim the rows)
For the ETL, incorporate WHERE OutboundProcessing = 1 in the query that sources the data (transfer the rows)
After the ETL, DELETE FROM TABLE WHERE OutboundProcessing = 1 (remove the rows you transferred)
If the ETL fails, SET OutboundProcessing = 0 WHERE OutboundProcessing = 1
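The claim, transfer and remove steps above might look roughly like this. dbo.StagingTable, dbo.FinalTable, the constraint name and the columns Col1 and Col2 are placeholders chosen for the sketch.

```sql
-- One-time setup: the flag column, defaulting to "not claimed".
ALTER TABLE dbo.StagingTable
    ADD OutboundProcessing bit NOT NULL
        CONSTRAINT DF_StagingTable_OutboundProcessing DEFAULT 0;
GO

-- 1. Claim the rows that exist right now.
UPDATE dbo.StagingTable
SET OutboundProcessing = 1
WHERE OutboundProcessing = 0;

-- 2. Transfer only the claimed rows.
INSERT INTO dbo.FinalTable (Col1, Col2)
SELECT Col1, Col2
FROM dbo.StagingTable
WHERE OutboundProcessing = 1;

-- 3. Remove the rows that were transferred.
DELETE FROM dbo.StagingTable
WHERE OutboundProcessing = 1;

-- If step 2 fails, release the claim instead of deleting:
-- UPDATE dbo.StagingTable SET OutboundProcessing = 0 WHERE OutboundProcessing = 1;
```

Rows inserted by the pipeline after step 1 keep the default of 0, so they are ignored by steps 2 and 3 and picked up by the next run.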
I always prefer to "ID" each file I receive. If you can do this, you can associate the records from a given file throughout your load process. You haven't called out a need for this, but just saying.
However, with each file having an identity (just an int/bigint identity value should do), you can then dynamically create as many load tables as you like from a "template" load table.
When a file arrives, create a new load table named with the ID of the file.
Process your data from load to final table.
drop the load table for the file being processed.
This is somewhat similar to the other solution about using 2 tables (load and stage), but even in that solution you are still limited to having 2 files "loaded" (you're still only applying one file to the final table though?)
Last, it is not clear whether your "Elastic Job" is detached from the actual "load" pipeline/processing or included in it. Being a job, I assume it is not included; and if it is a job, can you only run a single instance at a time? So it's not clear why it's important to load multiple files at once if you can only move one from load to final at a time. Why the rush to get files into load?

SQL Server transaction isolation problem - global variable

SQL Server 2008 R2 (Data Center edition - I think)
I have a very specific requirement for the database.
I need to insert a row marked with timestamp [ChangeTimeStamp]. Timestamp value is passed as a parameter. Timestamp has to be unique.
Two processes can insert values at the same time, and I happen to run into duplicate key insertion once in a while. To avoid this, I am trying:
declare @maxChangeStamp bigint
set transaction isolation level read committed
begin transaction
    select @maxChangeStamp = MAX(MaxChangeTimeStamp) from TSMChangeTimeStamp
    if (@maxChangeStamp > @changeTimeStamp)
        set @maxChangeStamp = @maxChangeStamp + 1
    else
        set @maxChangeStamp = @changeTimeStamp
    update TSMChangeTimeStamp
    set MaxChangeTimeStamp = @maxChangeStamp
commit
set @changeTimeStamp = @maxChangeStamp
-- insert statement goes here
REPEATABLE READ - causes deadlock
READ COMMITTED - causes duplicate key inserts
@changeTimeStamp is my parameter.
TSMChangeTimeStamp holds only one value.
If anyone has a good idea how to solve this I will appreciate any help.
Don't read-increment-update; this will fail no matter what you try. Always update and use the OUTPUT clause to return the new value:
update TSMChangeTimeStamp
set MaxChangeTimeStamp += 1
output inserted.MaxChangeTimeStamp;
You can capture the output value if you need it in T-SQL. But although this will do what you're asking, you most definitely do not want to do this, especially on a system that is high end enough to run DC edition. Generating the next timestamp will place an X lock on the timestamp resource, and thus will prevent every other transaction from generating a new timestamp until the current transaction commits. You achieve complete serialization of work with only one transaction being active at a moment. The performance will tank to the bottom of the abyss.
You must revisit your requirement and come up with a more appropriate one. As it is now your requirement can also be expressed as 'My system is too fast, how can I make it really really really slow?'.
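For completeness, capturing the OUTPUT value in T-SQL could look something like the sketch below. The table variable @new is a name invented here, and @changeTimeStamp is declared locally only so the snippet stands alone (in the question it is a procedure parameter).

```sql
DECLARE @new TABLE (MaxChangeTimeStamp bigint);
DECLARE @changeTimeStamp bigint;

-- Increment and read back the new value in a single statement.
UPDATE TSMChangeTimeStamp
SET MaxChangeTimeStamp += 1
OUTPUT inserted.MaxChangeTimeStamp INTO @new (MaxChangeTimeStamp);

-- TSMChangeTimeStamp holds only one row, so @new holds one value.
SELECT @changeTimeStamp = MaxChangeTimeStamp FROM @new;
```

The serialization cost described above still applies if this runs inside a longer transaction, because the X lock is held until that transaction ends.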
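Inside the transaction, the SELECT statement acquires a shared lock and, at REPEATABLE READ or a higher locking isolation level, holds it until the transaction ends (under READ COMMITTED it is released as soon as the read completes, and under snapshot isolation no shared lock is taken). If two processes both run the SELECT at the same time, they will both acquire a shared lock.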
Later, the UPDATE statement attempts to acquire an exclusive lock (or update lock). Unfortunately, neither one can acquire an exclusive lock, because the other process has a shared lock.
Try using the WITH (UPDLOCK) table hint on the SELECT statement. From MSDN:
UPDLOCK
Specifies that update locks are to be taken and held until the
transaction completes. UPDLOCK takes update locks for read operations
only at the row-level or page-level. If UPDLOCK is combined with
TABLOCK, or a table-level lock is taken for some other reason, an
exclusive (X) lock will be taken instead.
When UPDLOCK is specified, the READCOMMITTED and READCOMMITTEDLOCK
isolation level hints are ignored. For example, if the isolation level
of the session is set to SERIALIZABLE and a query specifies (UPDLOCK,
READCOMMITTED), the READCOMMITTED hint is ignored and the transaction
is run using the SERIALIZABLE isolation level.
For example:
begin transaction
select @maxChangeStamp = MAX(MaxChangeTimeStamp) from TSMChangeTimeStamp with (updlock)
Note that update locks may be promoted to a table lock if there is no index for your table (Microsoft KB article 179362).
Explicitly requesting an XLOCK may also work.
Also note your UPDATE statement does not have a WHERE clause. This causes the UPDATE to lock and update every record in the table (if applicable in your case).

SQL Server deadlock issue

I am using SQL Server 2008 Enterprise. I am wondering whether deadlocks are caused only by cross dependencies (e.g. task A has a lock on L1 but waits for a lock on L2, and at the same time, task B has a lock on L2 but waits for a lock on L1)? Are there any other reasons or scenarios that cause a deadlock?
Are there other situations that are reported as a deadlock? For example, a timeout (a select/insert/delete/update statement does not return for a very long time, and a deadlock error is returned), or a lock that cannot be acquired for a long time for reasons other than cross-dependencies (e.g. task C needs a lock on table T, but another task D acquired a lock on table T and has not released it, so task C cannot get its lock on table T for a long time)?
EDIT 1: will this stored procedure cause a deadlock if executed by multiple threads at the same time?
create PROCEDURE [dbo].[FooProc]
(
    @Param1 int
    ,@Param2 int
    ,@Param3 int
)
AS
DELETE FooTable WHERE Param1 = @Param1
INSERT INTO FooTable
(
    Param1
    ,Param2
    ,Param3
)
VALUES
(
    @Param1
    ,@Param2
    ,@Param3
)
DECLARE @ID bigint
SET @ID = ISNULL(@@Identity,-1)
IF @ID > 0
BEGIN
    SELECT IdentityStr FROM FooTable WHERE ID = @ID
END
thanks in advance,
George
Deadlocks require a cycle where resources are locked by processes that are waiting on locks held by other processes to release the locks. Any number of processes can participate in a deadlock, and the normal method for detecting deadlocks is to take a graph of the dependencies on the locks and search for cycles in that graph.
You need to have that cycle for a deadlock to exist. Anything else is just a process held up waiting for a lock to be released. A quick way to see what processes are being blocked by others is sp_who2.
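As an alternative to sp_who2, a short query against sys.dm_exec_requests shows roughly the same blocking information. This is only a sketch of one way to look at it:

```sql
-- Requests that are currently waiting on another session.
SELECT session_id,
       blocking_session_id,
       wait_type,
       wait_time,
       wait_resource
FROM sys.dm_exec_requests
WHERE blocking_session_id <> 0;
```

A session that appears here is blocked, which is not necessarily a deadlock; it only becomes one if the chain of blocking_session_id values loops back on itself.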
If you want to troubleshoot deadlocks, the best way is to run a trace, picking up 'deadlock graph' events. This will allow you to see what's going on by telling you what queries are holding the locks.
Also there are conversion deadlocks: both processes A and B have shared locks on resource C. Both want to get exclusive locks on C.
Even if two processes compete on only one resource, they still can embrace in a deadlock. The following scripts reproduce such a scenario. In one tab, run this:
CREATE TABLE dbo.Test ( i INT ) ;
GO
INSERT INTO dbo.Test
( i )
VALUES ( 1 ) ;
GO
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE ;
BEGIN TRAN
SELECT i
FROM dbo.Test ;
--UPDATE dbo.Test SET i=2 ;
After this script has completed, we have an outstanding transaction holding a shared lock. In another tab, let another connection acquire a shared lock on the same resource:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE ;
BEGIN TRAN
SELECT i
FROM dbo.Test ;
--UPDATE dbo.Test SET i=2 ;
This script completes and renders a result set, just like the first script did. Now let us highlight and execute the commented update commands in both tabs. To perform an update, each connection needs an exclusive lock. Neither connection can acquire that exclusive lock, because the other one is holding a shared lock. Although both connections are competing on only one resource, they have embraced in a conversion deadlock:
Msg 1205, Level 13, State 56, Line 1
Transaction (Process ID 59) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
Also note that more than two connections may embrace in a deadlock.
Simply not releasing a lock for a long time is not a deadlock.
A deadlock is a situation where you can never go forward. It's caused by 2 (or more) processes that are waiting for others to finish but all those involved are holding a lock that is preventing the other(s) from continuing.
The only way out of a deadlock is to kill processes to free the locks, because no matter how long you wait, it cannot complete on its own.

Does SQL Server wrap Select...Insert Queries into an implicit transaction?

When I perform a select/Insert query, does SQL Server automatically create an implicit transaction and thus treat it as one atomic operation?
Take the following query that inserts a value into a table if it isn't already there:
INSERT INTO Table1 (FieldA)
SELECT 'newvalue'
WHERE NOT EXISTS (Select * FROM Table1 where FieldA='newvalue')
Is there any possibility of 'newvalue' being inserted into the table by another user between the evaluation of the WHERE clause and the execution of the INSERT clause if it isn't explicitly wrapped in a transaction?
You are confusing transactions with locking. A transaction reverts your data back to the original state if there is any error; if not, it moves the data to the new state. You will never have your data in an intermediate state when the operations are transacted. Locking, on the other hand, is what allows or prevents multiple users from accessing the data simultaneously. To answer your question, select...insert is atomic, and as long as no granular locks are explicitly requested, no other user will be able to insert while select...insert is in progress.
John, the answer to this depends on your current isolation level. If you're set to READ UNCOMMITTED you could be looking for trouble, but with a higher isolation level, you should not get additional records in the table between the select and insert. With a READ COMMITTED (the default), REPEATABLE READ, or SERIALIZABLE isolation level, you should be covered.
Using SSMS 2016, it can be verified that the Select/Insert statement requests a lock (and so most likely operates atomically):
Open a new query/connection for the following transaction and set a break-point on ROLLBACK TRANSACTION before starting the debugger:
BEGIN TRANSACTION
INSERT INTO Table1 (FieldA) VALUES ('newvalue');
ROLLBACK TRANSACTION --[break-point]
While at the above break-point, execute the following from a separate query window to show any locks (may take a few seconds to register any output):
SELECT * FROM sys.dm_tran_locks
WHERE resource_database_id = DB_ID()
AND resource_associated_entity_id = OBJECT_ID(N'dbo.Table1');
There should be a single lock associated with the BEGIN TRANSACTION/INSERT above (since by default it runs at the READ COMMITTED isolation level):
OBJECT ** ********** * IX LOCK GRANT 1
From another instance of SSMS, open up a new query and run the following (while still stopped at the above break-point):
INSERT INTO Table1 (FieldA)
SELECT 'newvalue'
WHERE NOT EXISTS (Select * FROM Table1 where FieldA='newvalue')
This should hang with the string "(Executing)..." being displayed in the tab title of the query window (since @@LOCK_TIMEOUT is -1 by default).
Re-run the sys.dm_tran_locks query from the second step.
Another lock corresponding to the Select/Insert should now show:
OBJECT ** ********** 0 IX LOCK GRANT 1
OBJECT ** ********** 0 IX LOCK GRANT 1
ref: How to check which locks are held on a table
