SQL Server - Is there any such thing called 'dirty write'? - sql-server

Does SQL Server allow a transaction to modify the data that is currently being modified by another transaction but hasn't yet been committed? Is this possible under any of the isolation levels, let's say READ UNCOMMITTED since that is the least restrictive? Or does it completely prevent that from happening? Would you call that a 'dirty write' if it is possible?

Any RDBMS that provides transactions and transactional atomicity cannot allow dirty writes.
SQL Server must ensure that all writes can be rolled back. This holds even for a single statement, because a single statement can cause many writes and run for hours.
Imagine a row was written but needed to be rolled back, while in the meantime another write to that row had already been committed. Now we cannot roll back, because that would violate the durability guarantee given to the other transaction: its write would be lost. (It could also violate the atomicity guarantee given to that other transaction, if the row to be rolled back was only one of several rows it wrote.)
The only solution is to always stabilize written but uncommitted data using exclusive (X) locks.
SQL Server never allows dirty writes or lost writes.
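To see that X-lock stabilization in practice, here is a minimal T-SQL sketch (the table dbo.person and its rows are hypothetical, borrowed from the experiment further down):
-- Session 1: modify a row but do not commit yet.
BEGIN TRAN;
UPDATE dbo.person SET name = 'Tom' WHERE id = 2;

-- From any session: the modified key is protected by an exclusive (X) lock
-- that is held until COMMIT or ROLLBACK, so no other writer can touch it.
SELECT request_session_id, resource_type, request_mode, request_status
FROM sys.dm_tran_locks
WHERE resource_database_id = DB_ID() AND request_mode = 'X';

-- Session 2: this statement blocks until session 1 ends its transaction.
-- UPDATE dbo.person SET name = 'Lisa' WHERE id = 2;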

No, you can't, unless you update it in the same transaction. Setting the isolation level to READ UNCOMMITTED only lets you read data from the table even if it has not been committed; it does not let you update it.
The READ UNCOMMITTED isolation level and the NOLOCK table hint are ignored for UPDATE and DELETE statements, so the statement will wait until the other transaction commits.

In my experiment with READ UNCOMMITTED, the least restrictive isolation level, a dirty write did not occur in SQL Server. *Basically, dirty writes are disallowed at every isolation level in most databases.
A dirty write is when a transaction updates or deletes (overwrites) uncommitted data that another transaction has inserted, updated, or deleted.
I tested for dirty writes with MSSQL (SQL Server) using 2 command prompts.
First, I set the READ UNCOMMITTED isolation level:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
Then, I created a person table with id and name as shown below.
person table:
| id | name  |
|----|-------|
| 1  | John  |
| 2  | David |
Now, I ran the steps below with MSSQL queries. *I used 2 command prompts:
| Flow | Transaction 1 (T1) | Transaction 2 (T2) | Explanation |
|------|--------------------|--------------------|-------------|
| Step 1 | BEGIN TRAN; GO; | | T1 starts. |
| Step 2 | | BEGIN TRAN; GO; | T2 starts. |
| Step 3 | UPDATE person SET name = 'Tom' WHERE id = 2; GO; | | T1 updates David to Tom, so this row is locked by T1 until T1 commits. |
| Step 4 | | UPDATE person SET name = 'Lisa' WHERE id = 2; GO; | T2 cannot update Tom to Lisa before T1 commits because the row is locked by T1, so T2 has to wait for T1 to release the lock by committing. *A dirty write is not allowed. |
| Step 5 | COMMIT; GO; | Waiting... | T1 commits. |
| Step 6 | | UPDATE person SET name = 'Lisa' WHERE id = 2; GO; | Now T2 can update Tom to Lisa because T1 has already committed (unlocking the row). |
| Step 7 | | COMMIT; GO; | T2 commits. |
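While T2 is waiting at Step 4, a third connection can confirm the blocking; this is just a rough verification sketch using sys.dm_exec_requests (the exact wait type depends on the lock being requested, typically LCK_M_U here):
SELECT session_id, blocking_session_id, wait_type, wait_time
FROM sys.dm_exec_requests
WHERE blocking_session_id <> 0;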

Related

PostgreSQL's Repeatable Read allows phantom reads, but its documentation says that it does not

I have a problem with PostgreSQL's repeatable read isolation level.
I ran an experiment on the repeatable read isolation level's behavior when a phantom read occurs.
PostgreSQL's manual says: "The table also shows that PostgreSQL's Repeatable Read implementation does not allow phantom reads."
But a phantom read occurred:
CREATE TABLE public.testmodel
(
id bigint NOT NULL
);
--Session 1 --
BEGIN TRANSACTION ISOLATION LEVEL Repeatable Read;
INSERT INTO TestModel(ID)
VALUES (10);
Select sum(ID)
From TestModel
where ID between 1 and 100;
--COMMIT;
--Session 2--
BEGIN TRANSACTION ISOLATION LEVEL Repeatable Read;
INSERT INTO TestModel(ID)
VALUES (10);
Select sum(ID)
From TestModel
where ID between 1 and 100;
COMMIT;
Steps I followed:
Create the table.
Run session 1 (I commented out the COMMIT statement).
Run session 2.
Run the COMMIT statement in session 1.
To my surprise, both of them (session 1 and session 2) completed without any exceptions.
As far as I understand from the documentation, that shouldn't have happened.
I was expecting session 1 to throw an exception when committing after session 2.
What is the reason for this? I am confused.
The docs you referenced define a "phantom read" as a situation where:
A transaction re-executes a query returning a set of rows
that satisfy a search condition and finds that the set of rows
satisfying the condition has changed due to another recently-committed
transaction.
In other words, a phantom read has occurred if you run the same query twice (or two queries seeking the same data), and you get different results. The REPEATABLE READ isolation level prevents this from happening, i.e. if you repeat the same read, you will get the same answer. It does not guarantee that either of those results reflects the current state of the database.
Since you are only reading data once in each transaction, this cannot be an example of a phantom read. It falls under the more general category of a "serialization anomaly", i.e. behaviour which could not occur if the transactions were executed serially. This type of anomaly is only avoided at the SERIALIZABLE isolation level.
There is an excellent set of examples on the Postgres wiki, describing anomalies which are allowed under REPEATABLE READ, but prevented under SERIALIZABLE isolation:
https://wiki.postgresql.org/wiki/SSI
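For contrast, here is a hedged sketch of the question's scenario re-run at SERIALIZABLE: with PostgreSQL's serializable snapshot isolation, the second transaction to commit is expected to be rejected with a serialization failure (SQLSTATE 40001) instead of silently exhibiting the anomaly.
-- Session 1
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
INSERT INTO TestModel(ID) VALUES (10);
SELECT sum(ID) FROM TestModel WHERE ID BETWEEN 1 AND 100;
-- leave the transaction open for now

-- Session 2
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
INSERT INTO TestModel(ID) VALUES (10);
SELECT sum(ID) FROM TestModel WHERE ID BETWEEN 1 AND 100;
COMMIT;

-- Session 1
COMMIT;
-- expected: ERROR: could not serialize access due to read/write dependencies among transactions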
Phantom read doesn't occur in REPEATABLE READ in PostgreSQL as the documentation says. *I explain more about phantom read in my answer on Stack Overflow .
I experimented if phantom read occurs in REPEATABLE READ in PostgreSQL with 2 command prompts.
First, to set REPEATABLE READ, I rans the query below and log out and log in again:
ALTER DATABASE postgres SET DEFAULT_TRANSACTION_ISOLATION TO 'repeatable read';
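As a quick sanity check (a verification step, not part of the experiment itself), you can confirm the new default from a fresh session:
SHOW default_transaction_isolation;
-- should return: repeatable read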
Then, I created a person table with id and name as shown below.
person table:
| id | name  |
|----|-------|
| 1  | John  |
| 2  | David |
Then, I ran the steps below with PostgreSQL queries:
| Flow | Transaction 1 (T1) | Transaction 2 (T2) | Explanation |
|------|--------------------|--------------------|-------------|
| Step 1 | BEGIN; | | T1 starts. |
| Step 2 | | BEGIN; | T2 starts. |
| Step 3 | SELECT * FROM person; (returns 1 John, 2 David) | | T1 reads 2 rows. |
| Step 4 | | INSERT INTO person VALUES (3, 'Tom'); | T2 inserts the row (3, Tom) into the person table. |
| Step 5 | | COMMIT; | T2 commits. |
| Step 6 | SELECT * FROM person; (returns 1 John, 2 David) | | T1 still reads 2 rows instead of 3 rows after T2 commits. *A phantom read doesn't occur! |
| Step 7 | COMMIT; | | T1 commits. |
You are misunderstanding what "does not allow phantom reads" means.
This simply means that phantom reads can not happen, not that there will be an error.
Session 2 will not see any committed changes to the table until the transaction from session 2 is committed as well.
REPEATABLE READ guarantees a consistent state of the database for the duration of the transaction, in which only changes made by that transaction itself are visible, but no other changes. There is no need to throw an error.

Lost update in snapshot vs all the rest isolation levels

Let's suppose we create a new table and enable snapshot isolation for our database:
alter database database_name set allow_snapshot_isolation on
create table marbles (id int primary key, color char(5))
insert marbles values (1, 'Black')
insert marbles values (2, 'White')
Next, in session 1 begin a snapshot transaction:
set transaction isolation level snapshot
begin tran
update marbles set color = 'Blue' where id = 2
Now, before committing the changes, run the following in session 2:
set transaction isolation level snapshot
begin tran
update marbles set color = 'Yellow' where id = 2
Then, when we commit session 1, session 2 fails with an error saying the transaction was aborted due to an update conflict. I understand that this is preventing a lost update.
If we follow these steps one by one but with any other isolation level, such as serializable, repeatable read, read committed, or read uncommitted, session 2 gets executed and applies a new update to our table.
Could someone please explain why this is happening?
To me this looks like some kind of lost update, but it seems that only snapshot isolation prevents it.
Could someone please explain why this is happening?
Because under all the other isolation levels the point-in-time at which the second session first sees the row is after the first transaction commits. Locking is a kind of time travel. A session enters a lock wait and is transported forward in time to when the resource is eventually available.
To me this looks like some kind of lost update
No. It's not. Both updates were properly completed, and the final state of the row would have been the same if the transactions had been 10 minutes apart.
In a lost update scenario, each session will read the row before attempting to update it, and the results of the first transaction are needed to properly complete the second transaction, e.g. if each is incrementing a column by 1.
Under locking READ COMMITTED, REPEATABLE READ, and SERIALIZABLE, the SELECT would be blocked, and no lost update would occur. Under READ_COMMITTED_SNAPSHOT, the SELECT should have an UPDLOCK hint, and it would block too.
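To make the read-then-update case concrete, here is a minimal sketch of the increment pattern with the UPDLOCK hint; the table dbo.counters and its columns are hypothetical:
DECLARE @current int;

BEGIN TRAN;

-- UPDLOCK makes the read take a U lock, so a concurrent session running the
-- same block waits here instead of reading the soon-to-be-stale value.
SELECT @current = counter_value
FROM dbo.counters WITH (UPDLOCK)
WHERE id = 1;

UPDATE dbo.counters
SET counter_value = @current + 1
WHERE id = 1;

COMMIT;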

In SQL Server, how can I lock a single row in a way similar to Oracle's "SELECT FOR UPDATE WAIT"?

I have a program that connects to an Oracle database and performs operations on it. I now want to adapt that program to also support an SQL Server database.
In the Oracle version, I use "SELECT FOR UPDATE WAIT" to lock specific rows I need. I use it in situations where the update is based on the result of the SELECT and other sessions can absolutely not modify it simultaneously, so they must manually lock it first. The system is highly subject to sessions trying to access the same data at the same time.
For example:
Two users try to fetch the row in the database with the highest priority, mark it as busy, perform operations on it, and mark it as available again for later use.
In Oracle, the logic would go basically like this:
BEGIN TRANSACTION;
SELECT ITEM_ID FROM TABLE_ITEM WHERE ITEM_PRIORITY > 10 AND ITEM_CATEGORY = 'CT1'
AND ITEM_STATUS = 'available' AND ROWNUM = 1 FOR UPDATE WAIT 5;
UPDATE [locked item_id] SET ITEM_STATUS = 'unavailable';
COMMIT TRANSACTION;
Note that the queries are built dynamically in my code. Also note that when the previously most favorable row is marked as unavailable, the second user will automatically go for the next one and so on. Furthermore, different users working on different categories will not have to wait for each other's locks to be released. Worst comes to worst, after 5 seconds, an error would be returned and the operation would be cancelled.
So finally, the question is: how do I achieve the same results in SQL Server? I have been looking at locking hints which, in theory, seem like they should work. However, the only locks that prevent other locks are "UPDLOCK" and "XLOCK", which both only work at the table level.
Those locking hints that do work at the row level are all shared locks, which do not satisfy my needs either (both users could lock the same row at the same time, both mark it as unavailable, and perform redundant operations on the corresponding item).
Some people seem to add a "time modified" column so sessions can verify that they are the ones who modified it, but this sounds like there would be a lot of redundant and unnecessary accesses.
You're probably looking for with (updlock, holdlock). This makes the SELECT take an update lock, which is what an update would need, instead of a shared lock. The HOLDLOCK hint tells SQL Server to keep the lock until the transaction ends.
FROM TABLE_ITEM with (updlock, holdlock)
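Applied to the query from the question, a rough translation could look like the following; TOP (1) stands in for Oracle's ROWNUM = 1, SET LOCK_TIMEOUT 5000 approximates WAIT 5 (error 1222 after 5 seconds instead of waiting indefinitely), and the variable's type is an assumption:
DECLARE @lockedItemId int;  -- assumed type for ITEM_ID

SET LOCK_TIMEOUT 5000;      -- roughly Oracle's WAIT 5

BEGIN TRANSACTION;

SELECT TOP (1) @lockedItemId = ITEM_ID
FROM TABLE_ITEM WITH (UPDLOCK, HOLDLOCK)
WHERE ITEM_PRIORITY > 10
  AND ITEM_CATEGORY = 'CT1'
  AND ITEM_STATUS = 'available'
ORDER BY ITEM_PRIORITY DESC;

UPDATE TABLE_ITEM
SET ITEM_STATUS = 'unavailable'
WHERE ITEM_ID = @lockedItemId;

COMMIT TRANSACTION;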
As the documentation says:
XLOCK
Specifies that exclusive locks are to be taken and held until the
transaction completes. If specified with ROWLOCK, PAGLOCK, or TABLOCK,
the exclusive locks apply to the appropriate level of granularity.
So the solution is to use WITH (XLOCK, ROWLOCK):
BEGIN TRANSACTION;
SELECT TOP (1) ITEM_ID
FROM TABLE_ITEM
WITH (XLOCK, ROWLOCK)
WHERE ITEM_PRIORITY > 10 AND ITEM_CATEGORY = 'CT1' AND ITEM_STATUS = 'available';
UPDATE [locked item_id] SET ITEM_STATUS = 'unavailable';
COMMIT TRANSACTION;
In SQL Server there are locking hints, but they do not span statements the way the Oracle example you provided does. The way to do it in SQL Server is to set an isolation level on the transaction that contains the statements you want to execute. See this MSDN page, but the general structure would look something like:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;
select * from ...
update ...
COMMIT TRANSACTION;
SERIALIZABLE is the highest isolation level. See the link for other options. From MSDN:
SERIALIZABLE Specifies the following:
Statements cannot read data that has been modified but not yet
committed by other transactions.
No other transactions can modify data that has been read by the
current transaction until the current transaction completes.
Other transactions cannot insert new rows with key values that would
fall in the range of keys read by any statements in the current
transaction until the current transaction completes.
Have you tried WITH (ROWLOCK)?
BEGIN TRAN
UPDATE your_table WITH (ROWLOCK)
SET your_field = a_value
WHERE <a predicate>
COMMIT TRAN

SQL Server transaction isolation problem - global variable

SQL Server 2008 R2 (Data Center edition - I think)
I have a very specific requirement for the database.
I need to insert a row marked with a timestamp [ChangeTimeStamp]. The timestamp value is passed as a parameter, and the timestamp has to be unique.
Two processes can insert values at the same time, and I happen to run into duplicate key insertion once in a while. To avoid this, I am trying:
declare @maxChangeStamp bigint
set transaction isolation level read committed
begin transaction
select @maxChangeStamp = MAX(MaxChangeTimeStamp) from TSMChangeTimeStamp
if (@maxChangeStamp > @changeTimeStamp)
set @maxChangeStamp = @maxChangeStamp + 1
else
set @maxChangeStamp = @changeTimeStamp
update TSMChangeTimeStamp
set MaxChangeTimeStamp = @maxChangeStamp
commit
set @changeTimeStamp = @maxChangeStamp
-- insert statement
REPEATABLE READ - causes deadlock
READ COMMITTED - causes duplicate key inserts
@changeTimeStamp is my parameter.
TSMChangeTimeStamp holds only one value.
If anyone has a good idea how to solve this I will appreciate any help.
Don't read-increment-update; that will fail no matter what you try. Always update first and use the OUTPUT clause to return the new value:
update TSMChangeTimeStamp
set MaxChangeTimeStamp += 1
output inserted.MaxChangeTimeStamp;
You can capture the output value if you need it in T-SQL. But although this will do what you're asking, you most definitely do not want to do this, especially on a system that is high end enough to run DC edition. Generating the next timestamp will place an X lock on the timestamp resource, and thus will prevent every other transaction from generating a new timestamp until the current transaction commits. You achieve complete serialization of work, with only one transaction being active at a time. Performance will tank to the bottom of the abyss.
You must revisit your requirement and come up with a more appropriate one. As it is now, your requirement can also be expressed as 'My system is too fast, how can I make it really, really slow?'.
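If you do need the generated value back in a variable, the OUTPUT clause cannot assign to a scalar variable directly, so one sketch (assuming @changeTimeStamp is the bigint parameter from the question) routes it through a table variable:
DECLARE @out TABLE (MaxChangeTimeStamp bigint);

UPDATE TSMChangeTimeStamp
SET MaxChangeTimeStamp += 1
OUTPUT inserted.MaxChangeTimeStamp INTO @out;

SELECT @changeTimeStamp = MaxChangeTimeStamp FROM @out;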
Inside the transaction, the SELECT statement will acquire and hold a shared lock if the isolation level is not READ COMMITTED or a snapshot-based level. If two processes both start the SELECT at the same time, they will both acquire a shared lock.
Later, the UPDATE statement attempts to acquire an exclusive lock (or update lock). Unfortunately, neither one can acquire an exclusive lock, because the other process has a shared lock.
Try using the WITH (UPDLOCK) table hint on the SELECT statement. From MSDN:
UPDLOCK
Specifies that update locks are to be taken and held until the
transaction completes. UPDLOCK takes update locks for read operations
only at the row-level or page-level. If UPDLOCK is combined with
TABLOCK, or a table-level lock is taken for some other reason, an
exclusive (X) lock will be taken instead.
When UPDLOCK is specified, the READCOMMITTED and READCOMMITTEDLOCK
isolation level hints are ignored. For example, if the isolation level
of the session is set to SERIALIZABLE and a query specifies (UPDLOCK,
READCOMMITTED), the READCOMMITTED hint is ignored and the transaction
is run using the SERIALIZABLE isolation level.
For example:
begin transaction
select @maxChangeStamp = MAX(MaxChangeTimeStamp) from TSMChangeTimeStamp with (updlock)
Note that update locks may be promoted to a table lock if there is no index for your table (Microsoft KB article 179362).
Explicitly requesting an XLOCK may also work.
Also note your UPDATE statement does not have a WHERE clause. This causes the UPDATE to lock and update every record in the table (if applicable in your case).
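Putting the hint into the original block, the read-compare-update section would look like this (a sketch of the poster's own code with only the UPDLOCK hint added, so the read and the update are serialized across sessions):
declare @maxChangeStamp bigint

begin transaction

select @maxChangeStamp = MAX(MaxChangeTimeStamp)
from TSMChangeTimeStamp with (updlock)

if (@maxChangeStamp > @changeTimeStamp)
    set @maxChangeStamp = @maxChangeStamp + 1
else
    set @maxChangeStamp = @changeTimeStamp

update TSMChangeTimeStamp
set MaxChangeTimeStamp = @maxChangeStamp

commit

set @changeTimeStamp = @maxChangeStamp
-- insert statement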

Does SQL Server wrap Select...Insert Queries into an implicit transaction?

When I perform a select/Insert query, does SQL Server automatically create an implicit transaction and thus treat it as one atomic operation?
Take the following query that inserts a value into a table if it isn't already there:
INSERT INTO Table1 (FieldA)
SELECT 'newvalue'
WHERE NOT EXISTS (Select * FROM Table1 where FieldA='newvalue')
Is there any possibility of 'newvalue' being inserted into the table by another user between the evaluation of the WHERE clause and the execution of the INSERT clause if it isn't explicitly wrapped in a transaction?
You are confusing transactions with locking. A transaction reverts your data back to the original state if there is any error; if not, it moves the data to the new state. You will never have your data in an intermediate state when the operations are transacted. Locking, on the other hand, is what allows or prevents multiple users from accessing the data simultaneously. To answer your question, SELECT...INSERT is atomic, and as long as no granular locks are explicitly requested, no other user will be able to insert while the SELECT...INSERT is in progress.
John, the answer to this depends on your current isolation level. If you're set to READ UNCOMMITTED you could be looking for trouble, but with a higher isolation level, you should not get additional records in the table between the select and insert. With a READ COMMITTED (the default), REPEATABLE READ, or SERIALIZABLE isolation level, you should be covered.
Using SSMS 2016, it can be verified that the Select/Insert statement requests a lock (and so most likely operates atomically):
Open a new query/connection for the following transaction and set a break-point on ROLLBACK TRANSACTION before starting the debugger:
BEGIN TRANSACTION
INSERT INTO Table1 (FieldA) VALUES ('newvalue');
ROLLBACK TRANSACTION --[break-point]
While at the above break-point, execute the following from a separate query window to show any locks (may take a few seconds to register any output):
SELECT * FROM sys.dm_tran_locks
WHERE resource_database_id = DB_ID()
AND resource_associated_entity_id = OBJECT_ID(N'dbo.Table1');
There should be a single lock associated with the BEGIN TRANSACTION/INSERT above (since by default it runs at the READ COMMITTED isolation level):
OBJECT ** ********** * IX LOCK GRANT 1
From another instance of SSMS, open up a new query and run the following (while still stopped at the above break-point):
INSERT INTO Table1 (FieldA)
SELECT 'newvalue'
WHERE NOT EXISTS (Select * FROM Table1 where FieldA='newvalue')
This should hang with the string "(Executing)..." displayed in the tab title of the query window (since @@LOCK_TIMEOUT is -1 by default).
Re-run the query from Step 2.
Another lock corresponding to the Select/Insert should now show:
OBJECT ** ********** 0 IX LOCK GRANT 1
OBJECT ** ********** 0 IX LOCK GRANT 1
ref: How to check which locks are held on a table
