I'm starting to work with SQL Server database and I'm having a hard time trying to understand Transaction Isolation Levels and how they lock data.
I'm trying to accomlish the following simple task:
Accept a pair of integers [ID, counter] in a SQL stored procedure
Determine whether ID exists in a certain table: SELCT COUNT(*) FROM MyTable WHERE Id = {idParam}
If the previous COUNT statement returns 0, insert this ID and counter:
INSERT INTO MyTable(Id, Counter) VALUES({idParam}, {counterParam})
If the COUNT statement returns 1, update the existing record: UPDATE MyTable SET Counter = Counter + {counterParam} WHERE Id = {idParam}
Now, I understand I have to wrap this whole stored procedure in a transaction, and according to this MS article the appropriate isolation level would be SERIALIZABLE (it says: No other transactions can modify data that has been read by the current transaction until the current transaction completes). Please correct me if I'm wrong here.
Suppose I called the procedure with ID=1, so the first query woluld be SELCT COUNT(*) FROM MyTable WHERE SomeId=1 (1st transaction began). Then, immediately after this query was executed, the procedure is called with ID=2 (2nd transaction began).
What I fail to understand is how much data would be locked during the execution of my stored procedure in this case:
If the 1st query of the 1st transaction returns 0 records, does this mean that 1st transaction locks nothing and other transactions are able to INSERT ID=1 before 1st transaction tries it?
Or does the 1st transaction lock the whole table making the 2nd transaction wait even though those 2 transactions can never try to read/update the same row?
Or does 1st transaction somehow forbid anyone else to read/write only records with ID=1 until it is comleted?
If your filter is on an index, that's what's going to get locked. So regardless of whether the row already exists or not, it's locked for the duration of the transaction. Take care, though - it's very easy to turn a row lock into something nastier, especially full table locks. And of course, it's easy to introduce deadlocks this way :)
However, I'd suggest a different approach. First, try to do an insert. If it works, you're done - if it doesn't, you know you can safely do an atomic update. Very fast, very cheap, very reliable :)
Related
I have a database table with thousands of entries. I have multiple worker threads which pick up one row at a time, does some work (takes roughly one second each). While picking up the row, each thread updates a flag on the database row (like a timestamp) so that the other threads do not pick it up. But the problem is that I end up in a scenario where multiple threads are picking up the same row.
My general question is that what general design approach should I follow here to ensure that each thread picks up unique rows and does their task independently.
Note : Multiple threads are running in parallel to hasten the processing of the database rows. So I would like to have a as small as possible critical segment or exclusive lock.
Just to give some context, below is the stored proc which picks up the rows from the table after it has updated the flag on the row. Please note that the stored proc is not compilable as I have removed unnecessary portions from it. But generally that's the structure of it.
The problem happens when multiple threads execute the stored proc in parallel. The change made by the update statement (note that the update is done after taking up a lock) in one thread is not visible to the other thread unless the transaction is committed. And as there is a SELECT statement (which takes around 50ms) between the UPDATE and the TRANSACTION COMMIT, on 20% cases the UPDATE statement in a thread picks up a row which has already been processed.
I hope I am clear enough here.
USE ['mydatabase']
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER PROCEDURE [dbo].[GetRequest]
AS
BEGIN
-- some variable declaration here
BEGIN TRANSACTION
-- check if there are blocking rows in the request table
-- FM: Remove records that don't qualify for operation.
-- delete operation on the table to remove rows we don't want to process
delete FROM request where somecondition = 1
-- Identify the requests to process
DECLARE #TmpTableVar table(TmpRequestId int NULL);
UPDATE TOP(1) request
WITH (ROWLOCK)
SET Lock = DateAdd(mi, 5, GETDATE())
OUTPUT INSERTED.ID INTO #TmpTableVar
FROM request tur
WHERE (Lock IS NULL OR GETDATE() > Lock) -- not locked or lock expired
AND GETDATE() > NextRetry -- next in the queue
IF(##RowCount = 0)
BEGIN
ROLLBACK TRANSACTION
RETURN
END
select #RequestID = TmpRequestId from #TmpTableVar
-- Get details about the request that has been just updated
SELECT somerows
FROM request
WHERE somecondition = 1
COMMIT TRANSACTION
END
The analog of a critical section in SQL Server is sp_getapplock, which is simple to use. Alternatively you can SELECT the row to update with (UPDLOCK,READPAST,ROWLOCK) table hints. Both of these require a multi-statement transaction to control the duration of the exclusive locking.
You need start a transaction isolation level on sql for isolation your line, but this can impact on your performance.
Look the sample:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
GO
BEGIN TRANSACTION
GO
SELECT ID, NAME, FLAG FROM SAMPLE_TABLE WHERE FLAG=0
GO
UPDATE SAMPLE_TABLE SET FLAG=1 WHERE ID=1
GO
COMMIT TRANSACTION
Finishing, not exist a better way for use isolation level. You need analyze the positive and negative point for each level isolation and test your system performance.
More information:
https://learn.microsoft.com/en-us/sql/t-sql/statements/set-transaction-isolation-level-transact-sql
http://www.besttechtools.com/articles/article/sql-server-isolation-levels-by-example
https://en.wikipedia.org/wiki/Isolation_(database_systems)
I am dubbing some SP at work and I have discover that whoever wrote the code used a transaction on a single update statement like this
begin transaction
*single update statment:* update table whatever with whatever
commit transaction
I understand that this is wrong because transaction is used when you want to update multiple updates.
I want to understand from the theoretical point, what are the implications of using the code as above?
Is there any difference in updating the whatever table with and without the transaction? Are there any extra locks or something?
Perhaps the transaction was included due to prior or possible future code which may involve other data. Perhaps that developer simply makes a habit of wrapping code in transactions, to be 'safe'?
But if the statement literally involves only a single update to a single row, there really is no benefit to that code being there in this case. A transaction does not necessarily 'lock' anything, though the actions performed inside it may, of course. It just makes sure that all the actions contained therein are performed all-or-nothing.
Note that a transaction is not about multiple tables, it's about multiple updates. It assures multiple updates happen all-or-none.
So if you were updating the same table twice, there would be a difference with or without the transaction. But your example shows only a single update statement, presumably updating only a single record.
In fact, it's probably pretty common that transactions encapsulate multiple updates to the same table. Imagine the following:
INSERT INTO Transactions (AccountNum, Amount) VALUES (1, 200)
INSERT INTO Transactions (AccountNum, Amount) values (2, -200)
That should be wrapped into a transaction, to assure that the money is transferred correctly. If one fails, so must the other.
I understand that this is wrong because transaction is used when you want to update multiple tables.
Not necessarily. This involves one table only - and just 2 rows:
--- transaction begin
BEGIN TRANSACTION ;
UPDATE tableX
SET Balance = Balance + 100
WHERE id = 42 ;
UPDATE tableX
SET Balance = Balance - 100
WHERE id = 73 ;
COMMIT TRANSACTION ;
--- transaction end
Hopefully your colleague's code looks more like this, otherwise SQL will issue a syntax error.
As per Ypercube's comment, there is no real purpose in placing one statement inside a transaction, but possibly this is a coding standard or similar.
begin transaction -- Increases ##TRANCOUNT to 1
update table whatever with whatever
commit transaction -- DECREMENTS ##TRANCOUNT to 0
Often, when issuing adhoc statements directly against SQL, it is a good idea to wrap your statements in a transaction, just in case something goes wrong and you need to rollback, i.e.
begin transaction -- Just in case my query goofs up
update table whatever with whatever
select ... from table ... -- check that the correct updates / deletes / inserts happened
-- commit transaction -- Only commit if the above check succeeds.
I have a program that connects to an Oracle database and performs operations on it. I now want to adapt that program to also support an SQL Server database.
In the Oracle version, I use "SELECT FOR UPDATE WAIT" to lock specific rows I need. I use it in situations where the update is based on the result of the SELECT and other sessions can absolutely not modify it simultaneously, so they must manually lock it first. The system is highly subject to sessions trying to access the same data at the same time.
For example:
Two users try to fetch the row in the database with the highest priority, mark it as busy, performs operations on it, and mark it as available again for later use.
In Oracle, the logic would go basically like this:
BEGIN TRANSACTION;
SELECT ITEM_ID FROM TABLE_ITEM WHERE ITEM_PRIORITY > 10 AND ITEM_CATEGORY = 'CT1'
ITEM_STATUS = 'available' AND ROWNUM = 1 FOR UPDATE WAIT 5;
UPDATE [locked item_id] SET ITEM_STATUS = 'unavailable';
COMMIT TRANSACTION;
Note that the queries are built dynamically in my code. Also note that when the previously most favorable row is marked as unavailable, the second user will automatically go for the next one and so on. Furthermore, different users working on different categories will not have to wait for each other's locks to be released. Worst comes to worst, after 5 seconds, an error would be returned and the operation would be cancelled.
So finally, the question is: how do I achieve the same results in SQL Server? I have been looking at locking hints which, in theory, seem like they should work. However, the only locks that prevents other locks are "UPDLOCK" AND "XLOCK" which both only work at a table level.
Those locking hints that do work at a row level are all shared locks, which also do not satisfy my needs (both users could lock the same row at the same time, both mark it as unavailable and perform redundant operations on the corresponding item).
Some people seem to add a "time modified" column so sessions can verify that they are the ones who modified it, but this sounds like there would be a lot of redundant and unnecessary accesses.
You're probably looking forwith (updlock, holdlock). This will make a select grab an exclusive lock, which is required for updates, instead of a shared lock. The holdlock hint tells SQL Server to keep the lock until the transaction ends.
FROM TABLE_ITEM with (updlock, holdlock)
As documentation sayed:
XLOCK
Specifies that exclusive locks are to be taken and held until the
transaction completes. If specified with ROWLOCK, PAGLOCK, or TABLOCK,
the exclusive locks apply to the appropriate level of granularity.
So solution is using WITH(XLOCK, ROWLOCK):
BEGIN TRANSACTION;
SELECT ITEM_ID
FROM TABLE_ITEM
WITH(XLOCK, ROWLOCK)
WHERE ITEM_PRIORITY > 10 AND ITEM_CATEGORY = 'CT1' AND ITEM_STATUS = 'available' AND ROWNUM = 1;
UPDATE [locked item_id] SET ITEM_STATUS = 'unavailable';
COMMIT TRANSACTION;
In SQL Server there are locking hints but they do not span their statements like the Oracle example you provided. The way to do it in SQL Server is to set an isolation level on the transaction that contains the statements that you want to execute. See this MSDN page but the general structure would look something like:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;
select * from ...
update ...
COMMIT TRANSACTION;
SERIALIZABLE is the highest isolation level. See the link for other options. From MSDN:
SERIALIZABLE Specifies the following:
Statements cannot read data that has been modified but not yet
committed by other transactions.
No other transactions can modify data that has been read by the
current transaction until the current transaction completes.
Other transactions cannot insert new rows with key values that would
fall in the range of keys read by any statements in the current
transaction until the current transaction completes.
Have you tried WITH (ROWLOCK)?
BEGIN TRAN
UPDATE your_table WITH (ROWLOCK)
SET your_field = a_value
WHERE <a predicate>
COMMIT TRAN
SQL Server 2008 R2 (Data Center edition - I think)
I have a very specific requirement for the database.
I need to insert a row marked with timestamp [ChangeTimeStamp]. Timestamp value is passed as a parameter. Timestamp has to be unique.
Two processes can insert values at the same time, and I happen to run into duplicate key insertion once in a while. To avoid this, I am trying:
declare #maxChangeStamp bigint
set transaction isolation level read committed
begin transaction
select #maxChangeStamp = MAX(MaxChangeTimeStamp) from TSMChangeTimeStamp
if (#maxChangeStamp > #changeTimeStamp)
set #maxChangeStamp = #maxChangeStamp + 1
else
set #maxChangeStamp = #changeTimeStamp
update TSMChangeTimeStamp
set MaxChangeTimeStamp = #maxChangeStamp
commit
set #changeTimeStamp = #maxChangeStamp
insert statment
REPEATABLE READ - causes deadlock
READ COMMITTED - causes duplicate key inserts
#changeTimeStamp is my parameter.
TSMChangeTimeStamp holds only one value.
If anyone has a good idea how to solve this I will appreciate any help.
You don't read-increment-update, this will fail no matter what you try. Alway update and use the OUTPUT clause to the new value:
update TSMChangeTimeStamp
set MaxChangeTimeStamp += 1
output inserted.MaxChangeTimeStamp;
You can capture the output value if you need it in T-SQL. But although this will do what you're asking, you most definitely do not want to do this, specially on a system that is high end enough to run DC edition. Generating the next timestamp will place an X lock on the timestamp resource, and thus will prevent every other transaction from generating a new timestamp until the current transaction commits. You achieve complete serialization of work with only one transaction being active at a moment. The performance will tank to the bottom of the abyss.
You must revisit your requirement and come up with a more appropriate one. As it is now your requirement can also be expressed as 'My system is too fast, how can I make is really really really slow?'.
Inside the transaction, the SELECT statement will acquire a shared lock if the mode is not READ COMMITTED or snapshot isolation. If two processes both start the SELECT at the same time, they will both acquire a shared lock.
Later, the UPDATE statement attempts to acquire an exclusive lock (or update lock). Unfortunately, neither one can acquire an exclusive lock, because the other process has a shared lock.
Try using the WITH (UPDLOCK) table hint on the SELECT statement. From MSDN:
UPDLOCK
Specifies that update locks are to be taken and held until the
transaction completes. UPDLOCK takes update locks for read operations
only at the row-level or page-level. If UPDLOCK is combined with
TABLOCK, or a table-level lock is taken for some other reason, an
exclusive (X) lock will be taken instead.
When UPDLOCK is specified, the READCOMMITTED and READCOMMITTEDLOCK
isolation level hints are ignored. For example, if the isolation level
of the session is set to SERIALIZABLE and a query specifies (UPDLOCK,
READCOMMITTED), the READCOMMITTED hint is ignored and the transaction
is run using the SERIALIZABLE isolation level.
For example:
begin transaction
select #maxChangeStamp = MAX(MaxChangeTimeStamp) from TSMChangeTimeStamp with (updlock)
Note that update locks may be promoted to a table lock if there is no index for your table (Microsoft KB article 179362).
Explicitly requesting an XLOCK may also work.
Also note your UPDATE statement does not have a WHERE clause. This causes the UPDATE to lock and update every record in the table (if applicable in your case).
When I perform a select/Insert query, does SQL Server automatically create an implicit transaction and thus treat it as one atomic operation?
Take the following query that inserts a value into a table if it isn't already there:
INSERT INTO Table1 (FieldA)
SELECT 'newvalue'
WHERE NOT EXISTS (Select * FROM Table1 where FieldA='newvalue')
Is there any possibility of 'newvalue' being inserted into the table by another user between the evaluation of the WHERE clause and the execution of the INSERT clause if I it isn't explicitly wrapped in a transaction?
You are confusing between transaction and locking. Transaction reverts your data back to the original state if there is any error. If not, it will move the data to the new state. You will never ever have your data in an intermittent state when the operations are transacted. On the other hand, locking is the one that allows or prevents multiple users from accessing the data simultaneously. To answer your question, select...insert is atomic and as long as no granular locks are explicitly requested, no other user will be able to insert while select..insert is in progress.
John, the answer to this depends on your current isolation level. If you're set to READ UNCOMMITTED you could be looking for trouble, but with a higher isolation level, you should not get additional records in the table between the select and insert. With a READ COMMITTED (the default), REPEATABLE READ, or SERIALIZABLE isolation level, you should be covered.
Using SSMS 2016, it can be verified that the Select/Insert statement requests a lock (and so most likely operates atomically):
Open a new query/connection for the following transaction and set a break-point on ROLLBACK TRANSACTION before starting the debugger:
BEGIN TRANSACTION
INSERT INTO Table1 (FieldA) VALUES ('newvalue');
ROLLBACK TRANSACTION --[break-point]
While at the above break-point, execute the following from a separate query window to show any locks (may take a few seconds to register any output):
SELECT * FROM sys.dm_tran_locks
WHERE resource_database_id = DB_ID()
AND resource_associated_entity_id = OBJECT_ID(N'dbo.Table1');
There should be a single lock associated to the BEGIN TRANSACTION/INSERT above (since by default runs in an ISOLATION LEVEL of READ COMMITTED)
OBJECT ** ********** * IX LOCK GRANT 1
From another instance of SSMS, open up a new query and run the following (while still stopped at the above break-point):
INSERT INTO Table1 (FieldA)
SELECT 'newvalue'
WHERE NOT EXISTS (Select * FROM Table1 where FieldA='newvalue')
This should hang with the string "(Executing)..." being displayed in the tab title of the query window (since ##LOCK_TIMEOUT is -1 by default).
Re-run the query from Step 2.
Another lock corresponding to the Select/Insert should now show:
OBJECT ** ********** 0 IX LOCK GRANT 1
OBJECT ** ********** 0 IX LOCK GRANT 1
ref: How to check which locks are held on a table