I have a long running function that should be inserting new rows. How do I check the progress of this function?
I was thinking dirty reads would work so I read http://www.postgresql.org/docs/8.4/interactive/sql-set-transaction.html and came up with the following code and ran it in a new session:
SET SESSION CHARACTERISTICS AS SERIALIZABLE;
SELECT * FROM MyTable;
Postgres gives me a syntax error. What am I doing wrong? If I do it right, will I see the inserted records while that long function is still running?
Thanks.
PostgreSQL does not implement a way for you to see this from outside the function, aka READ UNCOMMITTED isolation level. Your basic two options are:
Have the function use RAISE NOTICE every now and then to show you how far along you are
Use something like dblink from the function back to the same database, and update a counter table from there. Since that's a completely separate transaction, the counter will be visible as soon as that transaction commits - you don't have to wait for the main transaction (around the function call) to finish.
For versions up to 9.0: PostgreSQL Transaction Isolation
In PostgreSQL, you can request any of the four standard transaction isolation levels. But internally, there are only two distinct isolation levels, which correspond to the levels Read Committed and Serializable. When you select the level Read Uncommitted you really get Read Committed, and when you select Repeatable Read you really get Serializable, so the actual isolation level might be stricter than what you select. This is permitted by the SQL standard: the four isolation levels only define which phenomena must not happen, they do not define which phenomena must happen.
For versions from 9.1 to current(15): PostgreSQL Transaction Isolation
In PostgreSQL, you can request any of the four standard transaction isolation levels, but internally only three distinct isolation levels are implemented, i.e., PostgreSQL's Read Uncommitted mode behaves like Read Committed. This is because it is the only sensible way to map the standard isolation levels to PostgreSQL's multiversion concurrency control architecture.
Dirty read doesn't occur in PostgreSQL even the isolation level is READ UNCOMMITTED. And, the documentation says below:
PostgreSQL's Read Uncommitted mode behaves like Read Committed.
So, READ UNCOMMITTED has the same characteristics of READ COMMITTED in PostgreSQL different from other databases so in short, READ UNCOMMITTED and READ COMMITTED are the same in PostgreSQL.
And, this table below shows which anomaly occurs in which isolation level in PostgreSQL according to my experiments:
Anomaly
Read Uncommitted
Read Committed
Repeatable Read
Serializable
Dirty Read
No
No
No
No
Non-repeatable Read
Yes
Yes
No
No
Phantom Read
Yes
Yes
No
No
Lost Update
Yes
Yes
No
No
Write Skew(Serialization Anomaly)
Yes
Yes
Yes
No
With SELECT FOR UPDATE:
Anomaly
Read Uncommitted
Read Committed
Repeatable Read
Serializable
Dirty Read
No
No
No
No
Non-repeatable Read
No
No
No
No
Phantom Read
No
No
No
No
Lost Update
No
No
No
No
Write Skew(Serialization Anomaly)
No
No
No
No
Related
Recently I was thinking about query consistency in various SQL and NoSQL databases. What happens, when I have a (long running) query and rows are inserted or updated while the query is running? A simple theoretic example:
Let’s assume the following query takes a long time:
SELECT SUM(salary) FROM emp;
And while this query is running, another transaction does:
UPDATE emp SET salary = salary * 1.05 WHERE salary > 10000;
COMMIT;
When the SUM query has read half of the updated employees before the update, and the other half after the update, I would get an inconsistent nonsense result. Does this phenomenon have a name? By definition, it is not really a phantom read, because just one query is involved.
How do various DBs handle this situation? I am especially interested in SQL Server, MongoDB, RavenDB and Azure Table Storage.
Oracle for example guarantees statement-level read consistency, which says that the data returned by a single query is committed and consistent for a single point in time.
UPDATE: SQL Server seems to only prevent this kind of problem when READ_COMMITTED_SNAPSHOT is set to ON.
I believe the term you're looking for is "Dirty Read"
I can answer this one for SQL server.
You get 5 options for transaction isolation level, where the default is READ COMMITTED.
Only READ UNCOMMITTED allows dirty reads. You'll have to specifically enable that using SET TRANSACTION LEVEL READ UNCOMMITTED.
READ UNCOMMITTED is equivalent to NOLOCK, but syntactically nicer (opinion) as it doesn't need to be repeated for each table in your query.
Possible isolation levels are as below. I've linked the docs for more detail, if future readers find the link stale please edit.
https://learn.microsoft.com/en-us/sql/t-sql/statements/set-transaction-isolation-level-transact-sql
READ UNCOMMITTED
READ COMMITTED
REPEATABLE READ
SNAPSHOT
SERIALIZABLE
By default (read committed), you get your query and the update is blocked by the shared lock taken by your SELECT, until it completes.
If you enable Read Committed Snapshot Isolation Level (RCSI) as a database option, you continue to see the previous version of the data but the update isn't blocked.
Similarly, if the update was running first, when you have RSCI enabled, it doesn't block you, but you see the data before the update started.
RCSI is generally (but not 100% always) a good thing. I always design with it on. In Azure SQL DB, it's on by default.
The specification for the Repeatable-Read isolation level defines that a transaction with this IL will prevent other transactions from updating any rows that this transaction has read until this transaction has completed. Thus, repeatable reads are guaranteed.
Consider the following order of operations for two concurrent transactions T1 and T2, both using repeatable read IL:
T1: Read row
T2: Read row
T1: Update row
T2: Update row
I think that the update in step 3 would violate the specification for the isolation level, since T2 would read a different value if it read the row again.
The converse can be said for the update in step 4.
So, what different options are available for RDBMSs in general resolve this conflict?
More specifically, how is this handled in SQL Server 2017+?
Will this result in a deadlock since neither transaction can complete its operations?
Or would one transaction be rolled back?
I've seen that Lost Updates are prevented in SQL Server. What does this mean for the resolution of this specific case?
I have perused the answers to these questions:
Repeatable read and lock compatibility table
Repeatable Read - am I understanding this right?
repeatable read and second lost updates issue
MySQL Repeatable Read isolation level and Lost Update phenomena
And although the last one asks a similar question but doesn't include any specific info about how RDBMSs which prevent lost updates for txs with this isolation level handle this case.
I feel like I should know this, but I can't find anything that specifically outlines this, so here goes.
The documentation for SQL Server describes REPEATABLE READ as:
Specifies that statements cannot read data that has been modified but
not yet committed by other transactions and that no other transactions
can modify data that has been read by the current transaction until
the current transaction completes
This makes sense, but what actually happens when one of these situation arises? If, for example, Transaction A reads row 1, and then Transaction B attempts to update row 1, what happens? Does Transaction B wait until Transaction A has finished and then try again? Or is an exception thrown?
REPEATABLE READ takes S-locks on all rows that have been read by query plan operators for the duration of the transaction. The answer to your question follows from that:
If the read comes first it S-locks the row and the write must wait.
If the write comes first the S-lock waits for the write to commit.
Under Hekaton it works differently because there are no locks.
I have read committed snapshot isolation and allow isolation ON for my database. I'm still receiving a deadlock error. I'm pretty sure I know what is happening...
First transaction gets a sequence number at the beginning of its transaction.
Second one gets a later sequence number at the beginning of its transaction, but after the first transaction has already gotten its (second sequence number is more recent than first).
Second transaction makes it to the update statement first. When it checks the row versioning it sees the record that precedes both transactions since the first one hasn't reached the update yet. It finds that the row's sequence number is in a committed state and moves on it's merry way.
The first transaction takes it's turn and like the second transaction finds the same committed sequence number because it won't see the second one because it is newer than itself. When it tries to commit it finds that another transaction has already updated records that are trying to be committed and has to roll itself back.
Here is my question: Will this rollback appear as a deadlock in a trace?
In a comment attached to the original question you said: "I'm just wondering if an update conflict will appear as a deadlock or if it will appear as something different." I actually had exactly these types of concerns when I started looking into using snapshot isolation. Eventually I realized that there is significant difference between READ_COMMITTED_SNAPSHOT and isolation level SNAPSHOT.
The former uses row versioning for reads, but continues to use exclusive locking for writes. So, READ_COMMITTED_SNAPHOT is actually something in between pure pessimistic and pure optimistic concurrency control. Because it uses locks for writing, update conflicts are not possible, but deadlocks are. At least in SQL Server those deadlocks will be reported as deadlocks just as they are with 'normal' pessimistic locking.
The latter (isolation level SNAPSHOT) is pure optimistic concurrency control. Row versioning is used for both reads and writes. Deadlocks are not possible, but update conflicts are. The latter are reported as update conflicts and not as deadlocks.
The snapshot transaction is rolled back, and it receives the following error message:
Msg 3960, Level 16, State 4, Line 1
Snapshot isolation transaction aborted due to update conflict. You cannot use snapshot
isolation to access table 'Test.TestTran' directly or indirectly in database 'TestDatabase' to
update, delete, or insert the row that has been modified or deleted by another transaction.
Retry the transaction or change the isolation level for the update/delete statement.
To prevent deadlock enable both
ALLOW_SNAPSHOT_ISOLATION and READ_COMMITTED_SNAPSHOT
ALTER DATABASE [BD] SET READ_COMMITTED_SNAPSHOT ON;
ALTER DATABASE [BD] SET ALLOW_SNAPSHOT_ISOLATION ON;
here explain the differences
http://technet.microsoft.com/en-us/sqlserver/gg545007.aspx
When using ColdFusion 8 with MSSQL, when tracing, my DBA noticed the cfquery calls are getting appended with SET TRANSACTION ISOLATION LEVEL READ COMMITTED which is not in the query itself. He recommended to remove it or change to uncommitted for performance reasons.
Is this something that ColdFusion is adding and is that by default in ColdFusion and/or MSSQL?
I am using ColdFusion's default MSSQL drivers and I am able to temporary change by using <cftransaction isolation="read_uncommitted"> tag around each of the cfquerys.
Are there any other ways to stop that from being appended in ColdFusion or is cftransaction the best method?
Last question, when using isolation="read_uncommitted" why is it adding SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED before but right after the query adding SET TRANSACTION ISOLATION LEVEL READ COMMITTED?
Thank you in advance.
Committed is the default isolation level for any query to the DB that does not otherwise have an isolation level specified. You are changing it for the duration of your execution and then it reverts back to "committed". The creation of the statement is a part of what goes on "under the hood" as CF and the JDBC Driver work together. Using "read_uncommitted" is faster because it reads without preventing any other connection or query from altering or reading the data. So it opens up the possibility of a "dirty read" (where you are reading uncommitted and therefore possibly incorrect data) but in many cases that's not much of an issue so your DBA could be right.
This is not being interpreted correctly, read_committed is an 'isolation' issue, if some other task has the table open for update/insert/delete the transaction that is in 'read_committed' will be held waiting for locks to be released from the table until transactions ARE committed. If the transaction is set for 'read_uncommitted' it will read directly from the existing data and will NOT wait for the pending update/insert/delete . Hence the term, 'Dirty' meaning that anything pending, not committed will not be returned, but it won't be locked and delayed either.