I'm interested in how I can implement pessimistic locking, with very specific behavior.
(The reason I tagged the question with Sybase+Oracle+MSSQL is that I'd be happy with a solution, or with "that's impossible!", for any one of them.)
What I want is this:
1 - be able to lock a row (so that the process can later do an update, but no other process can lock the row)
2 - when another process tries to lock the same row, it should get a notification that the record is locked - I don't want this process to hang (I believe a simple timeout can be used here)
3 - when another process tries to read the record, it should be able to read it as it currently is in the database (but I don't want to use dirty reads).
The above 3 requirements are currently solved by the application using shared memory, performing the record-locking outside the database. I'd like to move the locking into the database.
So far, I'm having conflicts between #1 and #3 - if I lock the record by doing an 'update ...' that sets a field to its current value, then a 'select' from another process hangs.
Edit:
I'm having some luck now with snapshot isolation level on MSSQL. I can do both the locking, and reads without using dirty reads.
The reason I don't want to use dirty reads is that if a report is running, it might read multiple tables and issue multiple queries. Snapshot gives me a consistent snapshot of the database. With a dirty read, I'd have mismatched data if there were any updates in the middle.
I think Oracle has snapshot as well, so now I'm most interested in Sybase.
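For reference, a minimal sketch of enabling and using snapshot isolation on MSSQL (MyDb and SomeTable are placeholder names):
ALTER DATABASE MyDb SET ALLOW_SNAPSHOT_ISOLATION ON;

-- in the reporting session:
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;
SELECT * FROM SomeTable;   -- sees a consistent snapshot, doesn't block on row locks
COMMIT;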
In Oracle you can use select for update nowait to lock a record.
select * from tab where id=1234 for update nowait;
If another process tries to execute the same statement, it gets an exception:
ORA-00054: resource busy and acquire with NOWAIT specified
until the first process (session) performs a commit or rollback.
Normally, Oracle doesn't permit dirty reads.
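A minimal two-session sketch of that behavior, using the same statement as above:
-- Session 1: lock the row
SELECT * FROM tab WHERE id = 1234 FOR UPDATE NOWAIT;

-- Session 2: trying to lock the same row fails immediately with ORA-00054
SELECT * FROM tab WHERE id = 1234 FOR UPDATE NOWAIT;

-- Session 2: a plain SELECT still works and returns the last committed version
SELECT * FROM tab WHERE id = 1234;

-- Session 1: release the lock
COMMIT;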
The conflict you describe between #1 and #3 is a logical one: you can either let the database do dirty reads OR you block the reads. If you could read the locked row, it would be a dirty read by definition. That has nothing to do with the specific database system you use!
So if you want it that way: yes, what you want is impossible with all 3 systems, because it contradicts the definition of a "dirty read".
Related
I'm wondering what is the benefit to use SELECT WITH (NOLOCK) on a table if the only other queries affecting that table are SELECT queries.
How is that handled by SQL Server? Would a SELECT query block another SELECT query?
I'm using SQL Server 2012 and a Linq-to-SQL DataContext.
(EDIT)
About performance :
Would a 2nd SELECT have to wait for a 1st SELECT to finish if using a locked SELECT?
Versus a SELECT WITH (NOLOCK)?
A SELECT in SQL Server will place a shared lock on a table row - and a second SELECT would also require a shared lock, and those are compatible with one another.
So no - one SELECT cannot block another SELECT.
What the WITH (NOLOCK) query hint is used for is to be able to read data that's in the process of being inserted (by another connection) and that hasn't been committed yet.
Without that query hint, a SELECT might be blocked reading a table by an ongoing INSERT (or UPDATE) statement that places an exclusive lock on rows (or possibly a whole table), until that operation's transaction has been committed (or rolled back).
The problem with the WITH (NOLOCK) hint is: you might be reading data rows that aren't going to be inserted at all in the end (if the INSERT transaction is rolled back) - so your report, for instance, might show data that's never really been committed to the database.
There's another query hint that might be useful - WITH (READPAST). This instructs the SELECT command to just skip any rows that it attempts to read and that are locked exclusively. The SELECT will not block, and it will not read any "dirty" un-committed data - but it might skip some rows, e.g. not show all your rows in the table.
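As a quick illustration of the two hints (dbo.Orders and CustomerId are made-up names):
-- may return uncommitted ("dirty") rows, but never blocks:
SELECT * FROM dbo.Orders WITH (NOLOCK) WHERE CustomerId = 42;

-- never returns uncommitted rows and never blocks, but silently skips
-- any rows that are currently locked exclusively:
SELECT * FROM dbo.Orders WITH (READPAST) WHERE CustomerId = 42;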
On performance, you keep focusing on SELECT.
Shared does not block reads.
Shared lock blocks update.
If you have hundreds of shared locks it is going to take an update a while to get an exclusive lock as it must wait for shared locks to clear.
By default a select (read) takes a shared lock.
Shared (S) locks allow concurrent transactions to read (SELECT) a resource.
A shared lock has no effect on other selects (1 or 1000).
The difference is how nolock versus a shared lock affects an update or insert operation.
No other transactions can modify the data while shared (S) locks exist on the resource.
A shared lock blocks an update!
But nolock does not block an update.
This can have a huge impact on the performance of updates. It also impacts inserts.
Dirty read (nolock) just sounds dirty. You are never going to get partial data. If an update is changing John to Sally you are never going to get Jolly.
I use shared locks a lot for concurrency. Data is stale as soon as it is read. A read of John that changes to Sally the next millisecond is stale data. A read of Sally that gets rolled back to John the next millisecond is stale data. That is on the millisecond level. I have a dataloader that takes 20 hours to run if users are taking shared locks and 4 hours to run if users are taking no locks. Shared locks in this case cause the data to be 16 hours stale.
Don't use nolock wrong. But it does have a place. If you are going to cut a check when a byte is set to 1 and then set it to 2 when the check is cut - that is not a time for nolock.
I have to add one important comment. Everyone is mentioning that NOLOCK reads only dirty data. This is not precise. It is also possible that you'll get the same row twice, or that a whole row is skipped during your read. The reason is that you could ask for some data at the same time that SQL Server is re-balancing the b-tree.
Check these other threads:
https://stackoverflow.com/a/5469238/2108874
http://www.sqlmag.com/article/sql-server/quaere-verum-clustered-index-scans-part-iii.aspx
With the NOLOCK hint (or setting the isolation level of the session to READ UNCOMMITTED) you tell SQL Server that you don't expect consistency, so there are no guarantees. Bear in mind though that "inconsistent data" does not only mean that you might see uncommitted changes that were later rolled back, or data changes in an intermediate state of the transaction. It also means that in a simple query that scans all table/index data SQL Server may lose the scan position, or you might end up getting the same row twice.
At my work, we have a very big system that runs on many PCs at the same time, with very big tables with hundreds of thousands of rows, and sometimes many millions of rows.
When you make a SELECT on a very big table, let's say you want to know every transaction a user has made in the past 10 years, and the primary key of the table is not built in an efficient way, the query might take several minutes to run.
Then, our application might be running on many users' PCs at the same time, accessing the same database. So if someone tries to insert into the table that the other SELECT is reading (in pages that SQL is trying to read), then a LOCK can occur and the two transactions block each other.
We had to add a "NO LOCK" to our SELECT statement, because it was a huge SELECT on a table that is used a lot by a lot of users at the same time and we had LOCKS all the time.
I don't know if my example is clear enough; this is a real-life example.
The SELECT WITH (NOLOCK) allows reads of uncommitted data, which is equivalent to having the READ UNCOMMITTED isolation level set on your database. The NOLOCK keyword allows finer grained control than setting the isolation level on the entire database.
Wikipedia has a useful article: Wikipedia: Isolation (database systems)
It is also discussed at length in other stackoverflow articles.
A select with no lock will select records which may or may not end up being inserted - you will read dirty data.
For example - let's say a transaction inserts 1000 rows and then fails.
When you select - you will get the 1000 rows.
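A sketch of that scenario (table and column names are made up):
-- Session 1:
BEGIN TRANSACTION;
INSERT INTO dbo.Staging (Col1) SELECT Col1 FROM dbo.Source;   -- inserts, say, 1000 rows

-- Session 2, before session 1 commits or rolls back:
SELECT COUNT(*) FROM dbo.Staging WITH (NOLOCK);   -- already counts the 1000 uncommitted rows

-- Session 1:
ROLLBACK;   -- the rows vanish; session 2 read data that was never committed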
Say I want to do the following transactions in READ COMMITTED mode (in Postgres).
T1: r(A) -> w(A)
T2: r(A) -> w(A)
If the operations where called in this order:
r1(A)->r2(A)->w1(A)->c1->w2(A)->c2
I would expect that T2 has to wait at r(A), because T1 would set an exclusive lock on A at the first read, since it wants to write it later. But with MVCC there are no read locks?
Now I've got 2 questions:
If I use JDBC to read some data and then execute a separate command for inserting the read data, how does the database know that it has to take an exclusive lock when it is only reading? Upgrading a read lock to a write lock is not allowed in 2PL, as far as I know.
I think my assumptions are wrong... Where does this scenario wait, or is one transaction killed? Read uncommitted shouldn't allow lost updates, but I can't see how this can work.
I would be happy if someone could help me. Thanks
I would expect that T2 has to wait at r(A), because T1 would set an exclusive lock on A at the first read, since it wants to write it later. But with MVCC there are no read locks?
There are write locks if you specify for update in your select statements. In that case, r2(A) would wait to read if it's trying to lock the same rows as r1(A).
http://www.postgresql.org/docs/9.0/interactive/explicit-locking.html
A deadlock occurs if two transactions start and end up requesting each other's already-locked rows:
r11(A) -> r22(A) -> r12(A) (same as r22) vs r21(A) (same as r11) -> deadlock
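A sketch of that in Postgres (the accounts table is hypothetical):
-- Transaction 1:
BEGIN;
SELECT balance FROM accounts WHERE id = 1 FOR UPDATE;   -- takes a row-level write lock

-- Transaction 2 (concurrently):
BEGIN;
SELECT balance FROM accounts WHERE id = 1 FOR UPDATE;   -- blocks until transaction 1 ends
-- or, to fail immediately instead of waiting:
-- SELECT balance FROM accounts WHERE id = 1 FOR UPDATE NOWAIT;

-- Transaction 1:
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
COMMIT;   -- transaction 2's blocked SELECT can now proceed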
"But with MVCC there are are no read locks?"
MVCC is a different beast. There are no "locks" in MVCC because in that scenario, the system maintains as many versions of a single row as might be needed by the transactions that are running concurrently. "Former contents" of a row are not "lost by an update" (i.e. physically overwritten and destroyed), and thus making sure that a reader does not get to see "new updates", is addressed by "redirecting" that reader's inquiries to the "former content", which is not locked (hence the term "snapshot isolation"). Note that MVCC, in principle, cannot be applied to updating transactions.
"If I use JDBC to read some data and then execute a separate command for inserting the read data. How does the database know that it has to make an exclusive lock when it is only reading? Increasing an read lock to a write lock is not allowed in 2PL, as far as I know."
You are wrong about 2PL. 2PL means that acquired locks are never released until commit time. It does not mean that an existing lock cannot be strengthened. Incidentally : that is why isolation levels such as "cursor stability" are not 2PL : they do release read locks prior to commit time.
The default transaction mode in PostgreSQL is READ COMMITTED, however READ COMMITTED does not provide the level of serialization that you are looking for.
You are looking for the SERIALIZABLE transaction level. Look at the SET TRANSACTION command after reading PostgreSQL's documentation on Transaction Serialization Levels, specifically the SERIALIZABLE mode. PostgreSQL's MVCC docs are also worth reading.
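A minimal sketch of running T1 that way (the accounts table is hypothetical):
BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT balance FROM accounts WHERE id = 1;                  -- r(A)
UPDATE accounts SET balance = balance + 10 WHERE id = 1;    -- w(A)
COMMIT;  -- a concurrent serializable transaction touching the same row may fail
         -- here with a serialization error and has to be retried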
Cheers.
I have a server application, and a database. Multiple instances of the server can run at the same time, but all data comes from the same database (on some servers it is postgresql, in other cases ms sql server).
In my application, there is a process that is performed which can take hours. I need to ensure that this process is only executed one at a time. If one server is processing, no other server instance can process until the first one has completed.
The process depends on one table (let's call it 'ProcessTable'). What I do is, before any server starts the hour-long process, I set a boolean flag in the ProcessTable which indicates that this record is 'locked' and is being processed (not all records in this table are processed / locked, so I need to specifically mark each record which is needed by the process). So when the next server instance comes along while the previous instance is still processing, it sees the boolean flags and throws an exception.
The problem is that 2 server instances might both be activated at nearly the same time, and when both check the ProcessTable, there may not be any flags set; both servers are actually in the process of setting the flags, but since the transaction hasn't yet committed for either process, neither process will see the locking done by the other. This is because the locking mechanism itself may take a few seconds, so there is that window of opportunity where 2 servers might still be able to process at the same time.
It appears that what I need is a single record in my 'Settings' table which should store a boolean flag called 'LockInProgress'. So before even a server can lock the needed records in the ProcessTable, it first must make sure that it has full rights to do the locking by checking the 'LockInProgress' column in the Settings table.
So my question is, how do I prevent two servers from both modifying that LockInProgress column in the settings table, at the same time... or am I going about this in the wrong manner?
Please note that I need to support both postgresql and ms sql server as some servers use one database, and some servers use the other.
Thanks in advance...
How about obtaining a lock on the record first and then updating the record to show "locked"? This would prevent the 2nd instance from getting the lock, and its attempt to update the record would fail.
The point is to make sure the lock and the update happen as one atomic step.
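A sketch of that idea against the question's ProcessTable (the Id/Locked column names and the id value 42 are assumptions):
-- SQL Server: read the flag with an update lock so no one else can grab it in between
BEGIN TRANSACTION;
SELECT Locked FROM ProcessTable WITH (UPDLOCK, ROWLOCK) WHERE Id = 42;
-- if Locked = 0, flip it inside the same transaction:
UPDATE ProcessTable SET Locked = 1 WHERE Id = 42;
COMMIT;

-- PostgreSQL equivalent:
BEGIN;
SELECT locked FROM ProcessTable WHERE id = 42 FOR UPDATE;
UPDATE ProcessTable SET locked = true WHERE id = 42;
COMMIT;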
Make a stored procedure that hands out the lock, and run it under 'serializable' isolation. This will guarantee that one and only one process can get at the resource at any given time.
Note that this means that the second process trying to get at the lock will block until the first process releases it. Also, if you have to get multiple locks in this manner, make sure that the design of the process guarantees that the locks will be acquired and released in the same order. This will avoid deadlock situations where two processes hold resources while waiting for each other to release locks.
Unless you can't deal with your other processes blocking, this would probably be easier to implement and more robust than attempting to implement 'test and set' semantics.
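A rough T-SQL sketch of such a procedure (the Settings table and LockInProgress column are taken from the question; the key name, SettingsKey column and return-code convention are assumptions, and a PostgreSQL version would be a similar plpgsql function):
CREATE PROCEDURE dbo.AcquireProcessLock
AS
BEGIN
    SET NOCOUNT ON;
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

    BEGIN TRANSACTION;

    -- blocks briefly if another caller is updating the same row right now
    UPDATE dbo.Settings
       SET LockInProgress = 1
     WHERE SettingsKey = 'ProcessLock'
       AND LockInProgress = 0;

    IF @@ROWCOUNT = 0
    BEGIN
        ROLLBACK TRANSACTION;
        RETURN 1;   -- someone else already holds the lock
    END

    COMMIT TRANSACTION;
    RETURN 0;       -- lock acquired; a matching "release" procedure would set LockInProgress back to 0
END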
I've been thinking about this, and I think this is the simplest way of doing things; I just execute a command like this:
update settings set settingsValue = '333' where settingsKey = 'ProcessLock' and settingsValue = '0'
'333' would be a unique value which each server process gets (based on date/time, server name, + random value etc).
If no other process has locked the table, then the settingsValue would be = to 0, and that statement would adjust the settingsValue.
If another process has already locked the table, then that statement becomes a no-op, and nothing gets modified.
I then immediately commit the transaction.
Finally, I requery the table for the settingsValue, and if it is the correct value, then our lock succeeded and we continue on, otherwise an exception is thrown, etc. When we're done with the lock, we reset the value back down to 0.
Since I'm using the SERIALIZABLE transaction mode, I can't see this causing any issues... please correct me if I'm wrong.
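Spelled out, the pattern described above looks roughly like this ('333' stands in for whatever unique value the server generated):
-- try to take the lock (becomes a no-op if another server already holds it)
UPDATE settings
   SET settingsValue = '333'
 WHERE settingsKey = 'ProcessLock'
   AND settingsValue = '0';
-- commit immediately

-- check whether we actually got the lock
SELECT settingsValue FROM settings WHERE settingsKey = 'ProcessLock';
-- value = our '333'      -> lock acquired, continue
-- value = something else -> another server holds it, raise an exception

-- ... long-running processing ...

-- release the lock when done
UPDATE settings
   SET settingsValue = '0'
 WHERE settingsKey = 'ProcessLock'
   AND settingsValue = '333';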
We're using a SQL Server 2005 database (no row versioning) with a huge select statement, and we're seeing it block other statements from running (seen using sp_who2). I didn't realise SELECT statements could cause blocking - is there anything I can do to mitigate this?
SELECT can block updates. A properly designed data model and query will only cause minimal blocking and not be an issue. The 'usual' WITH NOLOCK hint is almost always the wrong answer. The proper answer is to tune your query so it does not scan huge tables.
If the query is untunable then you should first consider SNAPSHOT ISOLATION level, second you should consider using DATABASE SNAPSHOTS and last option should be DIRTY READS (and is better to change the isolation level rather than using the NOLOCK HINT). Note that dirty reads, as the name clearly states, will return inconsistent data (eg. your total sheet may be unbalanced).
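For the DATABASE SNAPSHOT option, a rough sketch (database, file and table names are placeholders):
-- NAME must match the logical data file name of the source database
CREATE DATABASE MyDb_Reporting
ON (NAME = MyDb_Data, FILENAME = 'D:\Snapshots\MyDb_Reporting.ss')
AS SNAPSHOT OF MyDb;

-- point the big read-only query at the snapshot instead of the live database
SELECT COUNT(*) FROM MyDb_Reporting.dbo.BigTable;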
From documentation:
Shared (S) locks allow concurrent transactions to read (SELECT) a resource under pessimistic concurrency control. For more information, see Types of Concurrency Control. No other transactions can modify the data while shared (S) locks exist on the resource. Shared (S) locks on a resource are released as soon as the read operation completes, unless the transaction isolation level is set to repeatable read or higher, or a locking hint is used to retain the shared (S) locks for the duration of the transaction.
A shared lock is compatible with another shared lock or an update lock, but not with an exclusive lock.
That means that your SELECT queries will block UPDATE and INSERT queries and vice versa.
A SELECT query will place a temporary shared lock when it reads a block of values from the table, and remove it when it is done reading.
For the time the lock exists, you will not be able to do anything with the data in the locked area.
Two SELECT queries will never block each other (unless they are SELECT FOR UPDATE)
You can enable SNAPSHOT isolation level on your database and use it, but note that it will not prevent UPDATE queries from being locked by SELECT queries (which seems to be your case).
It, though, will prevent SELECT queries from being locked by UPDATE.
Also note that SQL Server, unlike Oracle, uses a lock manager and keeps its locks in an in-memory linked list.
That means that under heavy load, the mere fact of placing and removing a lock may be slow, since the linked list itself has to be locked by the transaction thread.
To perform dirty reads you can either:
using (new TransactionScope(TransactionScopeOption.Required,
    new TransactionOptions { IsolationLevel = System.Transactions.IsolationLevel.ReadUncommitted }))
{
    // Your code here
}
or
SelectCommand = "SELECT * FROM Table1 WITH (NOLOCK) INNER JOIN Table2 WITH (NOLOCK) ..."
Remember that you have to write WITH (NOLOCK) after every table you want to dirty-read.
You could set the transaction level to Read Uncommitted
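That is, instead of adding the hint to every table, something along these lines:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT * FROM Table1 INNER JOIN Table2 ...   -- now behaves like the WITH (NOLOCK) version above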
You might also get deadlocks:
"deadlocks involving only one table"
http://sqlblog.com/blogs/alexander_kuznetsov/archive/2009/01/01/reproducing-deadlocks-involving-only-one-table.aspx
and or incorrect results:
"Selects under READ COMMITTED and REPEATABLE READ may return incorrect results."
http://www2.sqlblog.com/blogs/alexander_kuznetsov/archive/2009/04/10/selects-under-read-committed-and-repeatable-read-may-return-incorrect-results.aspx
You can use the WITH (READPAST) table hint. It's different from WITH (NOLOCK): instead of reading uncommitted data, it simply skips over rows that are currently locked, so it will not block anyone and will not return dirty data.
SELECT * FROM table1 WITH (READPAST)
In an ASP.NET application I've got this process:
Start a connection
Start a transaction
Insert into a table "LoadData" a lot of values with the SqlBulkCopy class with a column that contains a specific LoadId.
Call a stored procedure that:
reads the table "LoadData" for the specific LoadId.
For each line, does a lot of calculations which involve reading dozens of tables, and writes the results into a temporary (#temp) table (a process that lasts several minutes).
Deletes the lines in "LoadData" for the specific LoadId.
Once everything is done, write the result in the result table.
Commit transaction or rollback if something fails.
My problem is that if I have 2 users that start the process, the second one will have to wait until the first has finished (because the insert seems to put an exclusive lock on the table), and my application sometimes times out (and the users are not happy to wait :) ).
I'm looking for a way to let the users do everything in parallel, as there is no interaction except for the last step: writing the result. I think that what is blocking me is the inserts / deletes in the "LoadData" table.
I checked the other transaction isolation levels but it seems that nothing could help me.
What would be perfect would be to be able to release the exclusive lock on the "LoadData" table once the insert is finished, but without ending the transaction (is it possible to force SQL Server to only lock rows and not the whole table?).
Any suggestion?
Look up SET TRANSACTION ISOLATION LEVEL READ COMMITTED SNAPSHOT in Books OnLine.
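A minimal sketch of turning it on (MyDb is a placeholder; the ALTER needs exclusive access to the database):
ALTER DATABASE MyDb SET READ_COMMITTED_SNAPSHOT ON;
-- after this, the default READ COMMITTED level reads row versions instead of
-- taking shared locks, so readers stop blocking writers (and vice versa)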
Transactions should cover small and fast-executing pieces of SQL / code. They have a tendency to be implemented differently on different platforms. They will lock tables and then expand the lock as the modifications grow, thus locking out other users from querying or updating the same row / page / table.
Why not forget the transaction, and handle processing errors in another way? Is your data integrity truly being secured by the transaction, or can you do without it?
If you're sure that there is no issue with concurrent operations except the last part, why not start the transaction just before those last statements (whichever they are that DO require isolation), and commit immediately after they succeed. Then all the upfront read operations will not block each other...