weird phenomena during deadlocks involving IMAGE or TEXT columns - sybase

This is something very disturbing I stumbled upon while stress-testing an application using Sybase ASE 15.7.
We have the following table:
CREATE TABLE foo
(
i INT NOT NULL,
blob IMAGE
);
ALTER TABLE foo ADD PRIMARY KEY (i);
The table has, even before starting the test, a single row with some data in the IMAGE column. No rows are deleted or inserted during the test, so the table always contains exactly one row. Column blob is only updated (in transaction T1 below) to some non-NULL value.
Then, we have the following two transactions:
T1: UPDATE foo SET blob=<some not null value> WHERE i=1
T2: SELECT * FROM foo WHERE i=1
For some reason, the above transactions may deadlock under load (approx. 10 threads doing T1 20 times in a loop and another 10 threads doing T2 20 times in a loop).
This is already weird enough, but there's more to come. T1 is always chosen as the deadlock victim. So, in the event of a deadlock (error code 1205), the application logic simply retries T1. This should work and should normally be the end of the story. However …
… it happens that sometimes T2 will retrieve a row in which the value of the blob column is NULL! This is even though the table already starts with a row and the updates simply reset the previous (non-NULL) value to some other (non-NULL) value. This is 100% reproducible in every test run.
This is observed with the READ COMMITTED isolation level.
I verified that the above behavior also occurs with the TEXT column type but not with VARCHAR.
I've also verified that obtaining an exclusive lock on table foo in transaction T1 makes the issue go away.
So I'd like to understand how something that so fundamentally breaks transaction isolation can even be possible. In fact, I think this is worse than a plain isolation violation, as T1 never sets the value of the blob column to NULL.
The test code is written in Java using the jconn4.jar driver (class com.sybase.jdbc4.jdbc.SybDriver) so I don't rule out that this may be a JDBC driver bug.
update
This is reproducible simply using isql and spawning several shells in parallel that continuously execute T1 in a loop. So I am removing the Java and JDBC tags as this is definitely server-related.
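For reference, a minimal sketch of the T1 loop as it can be run from several parallel isql sessions (the binary literal and the loop count are just placeholders; T2 sessions run the plain SELECT in a similar loop):
-- T1, repeated 20 times; run this script concurrently from ~10 isql sessions
DECLARE @n INT
SELECT @n = 0
WHILE @n < 20
BEGIN
    BEGIN TRAN
    UPDATE foo SET blob = 0xDEADBEEF WHERE i = 1   -- some non-NULL value
    COMMIT TRAN
    SELECT @n = @n + 1
END
go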

Your example CREATE TABLE code would by default create an allpages-locked table, unless your DBA has changed the system-wide 'lock scheme' parameter via sp_configure to another value (you can check this yourself, as any user, via sp_configure 'lock scheme').
Unless you have a very large number of rows, they will all sit on a single data page, because an INT is only 4 bytes long and the blob data is stored at the end of the table (unless you use the in-row LOB functionality in ASE 15.7 and up). This is why you are getting deadlocks: you have by definition created a single hotspot where all the data is being accessed at the page level. This is even more likely where page sizes larger than 2K are used, since by their nature they hold even more rows per page and, with allpages locking, there is an even higher likelihood of contention.
Change your locking scheme to datarows (unless you are planning to have very high rowcounts), as has been said above, and your problem should go away. I will add that your blob column appears to allow NULLs, so you should also consider setting the 'dealloc_first_txtpg' attribute for your table to avoid wasted space if you have NULLs in your image column.
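A sketch of those two changes (verify the attribute name and exact syntax against your ASE version's documentation):
-- switch the table to row-level locking
ALTER TABLE foo LOCK DATAROWS
go
-- optionally avoid allocating a first text page for NULL image values
exec sp_chgattribute 'foo', 'dealloc_first_txtpg', 1
go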

We've seen all kinds of weird stuff with isolation level 1. I'm under the impression that when T2 is in progress, T1 can change data and T2 might return an intermediate result of T1.
Try isolation level 2 and see if it helps (does for us).
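A minimal sketch of that suggestion (in ASE, level 2 is repeatable read; the same level can also be set through the JDBC connection):
-- run T2 at isolation level 2 instead of 1 (read committed)
SET TRANSACTION ISOLATION LEVEL 2
go
BEGIN TRAN
SELECT * FROM foo WHERE i = 1
COMMIT TRAN
go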

Related

Primary key conflict even when using TABLOCKX and HOLDLOCK hints

I have a table which is used to create locks with unique key to control execution of a critical section over multiple servers, i.e. only one thread at a time from all the web servers can enter that critical section.
The lock mechanism starts by trying to add a record to the database, and if successful it enters the region, otherwise it waits. When it exits the critical section, it removes that key from the table. I have the following procedure for this:
SET TRANSACTION ISOLATION LEVEL READ COMMITTED
BEGIN TRANSACTION
DECLARE @startTime DATETIME2
DECLARE @lockStatus INT
DECLARE @lockTime INT
SET @startTime = GETUTCDATE()
IF EXISTS (SELECT * FROM GuidLocks WITH (TABLOCKX, HOLDLOCK) WHERE Id = @lockName)
BEGIN
    SET @lockStatus = 0
END
ELSE
BEGIN
    INSERT INTO GuidLocks VALUES (@lockName, GETUTCDATE())
    SET @lockStatus = 1
END
SET @lockTime = (SELECT DATEDIFF(millisecond, @startTime, GETUTCDATE()))
SELECT @lockStatus AS Status, @lockTime AS Duration
COMMIT TRANSACTION GetLock
So I do a SELECT on the table and use TABLOCKX and HOLDLOCK so I get an exclusive lock on the complete table and hold it until the end of the transaction. Then depending on the result, I either return fail status (0), or create a new record and return (1).
However, I am getting this exception from time to time and I just don't know how it is happening:
System.Data.SqlClient.SqlException: Violation of PRIMARY KEY constraint 'PK_GuidLocks'. Cannot insert duplicate key in object 'dbo.GuidLocks'. The duplicate key value is (XXXXXXXXX). The statement has been terminated.
Any idea how this is happening? How is it possible that two threads managed to obtain an exclusive lock on the same table and tried to insert rows at the same time?
UPDATE: It looks like readers might not have fully understood my question, so I would like to elaborate. My understanding is that using TABLOCKX obtains an exclusive lock on the table. I also understood from the documentation (and I could be mistaken) that if I use the HOLDLOCK hint, the lock will be held until the end of the transaction, which in this case I assume (and apparently my assumption is wrong, but that's what I understood from the documentation) is the outer transaction initiated by the BEGIN TRANSACTION statement and ended by the COMMIT TRANSACTION statement. So the way I understand things, by the time SQL Server reaches the SELECT statement with TABLOCKX and HOLDLOCK, it will try to obtain an exclusive lock on the whole table and will not release it until COMMIT TRANSACTION executes. If that's the case, how come two threads seem to be trying to execute the same INSERT statement at the same time?
If you look up the documentation for tablock and holdlock, you'll see that it is not doing what you think it is:
Tablock: Specifies that the acquired lock is applied at the table level. The
type of lock that is acquired depends on the statement being executed.
For example, a SELECT statement may acquire a shared lock. By
specifying TABLOCK, the shared lock is applied to the entire table
instead of at the row or page level. If HOLDLOCK is also specified,
the table lock is held until the end of the transaction.
So the reason that your query is not working is that you are only getting a shared lock on the table. What Frisbee is attempting to point out is that you don't need to re-implement all of the transaction isolation and locking code, because there is a more natural syntax that handles this implicitly. His version is better than yours because it's much easier to avoid a mistake that introduces bugs.
More generally, when ordering statements in your query, you should place the statements requiring the more restrictive lock first.
In my concurrent programming text many years ago, we read the parable of the blind train engineers who needed to transport trains in both directions through a single-track pass across the Andes. In the first mutex model, an engineer would walk up to a synchronization bowl at the top of the pass and, if it was empty, place a pebble in it to lock the pass. After driving through the pass he would remove his pebble to unlock the pass for the next train. This is the mutex model you have implemented, and it doesn't work. In the parable a crash occurred soon after implementation, and sure enough there were two pebbles in the bowl - a READ-READ-WRITE-WRITE anomaly due to the multi-threaded environment.
The parable then describes a second mutex model, where there is already a single pebble in the bowl. Each engineer walks up to the bowl and removes the pebble if one is there, placing it in his pocket while he drives through the pass. Then he restores the pebble to unlock the pass for the next train. If an engineer finds the bowl empty he keeps trying (or blocks for some length of time) until a pebble is available. This is the model that works.
You can implement this (correct) model by having (only ever) a single row in the GuidLocks table with a (by default) NULL value for the lock holder. In a suitable transaction, each process UPDATEs (in place) this single row with its SPID exactly when the old value IS NULL, returning 1 if this succeeds and 0 if it fails. It updates the column back to NULL when it releases the lock.
This will ensure that the resource being locked actually includes the row being modified, which in your case is clearly not always true.
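A minimal sketch of that single-row mutex (the Holder column and the one-row table layout here are illustrative, not the question's actual schema):
-- one-time setup: exactly one row, lock initially free
-- CREATE TABLE GuidLocks (Holder INT NULL);
-- INSERT INTO GuidLocks (Holder) VALUES (NULL);

-- acquire: succeeds only if the pebble is still in the bowl
UPDATE GuidLocks SET Holder = @@SPID WHERE Holder IS NULL;
SELECT CASE WHEN @@ROWCOUNT = 1 THEN 1 ELSE 0 END AS LockStatus;

-- release: put the pebble back
UPDATE GuidLocks SET Holder = NULL WHERE Holder = @@SPID;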
See the answer by usr to this question for an interesting example.
I believe that you are being confused by the error message: clearly the engine is locating the row of a potential conflict before testing for the existence of a lock, resulting in a misleading error message. And since (due to implementing model 1 above instead of model 2) the TABLOCK is being held on the resource used by the SELECT instead of the resource used by the INSERT/UPDATE, a second process is able to sneak in.
Note that, especially in the presence of support for snapshot isolation, the resource on which you have taken your TABLOCKX (the table snapshot before any inserts) is not guaranteed to include the resource to which you have written the lock specifics (the table snapshot after an insert).
Use an app lock.
exec sp_getapplock @Resource = @lockName,
                   @LockMode = 'Exclusive',
                   @LockOwner = 'Session';
Your approach is incorrect from many points of view: granularity (a table lock), scope (a transaction which commits), leakage (it will leak locks). Session-scoped app locks are what you actually intend to use.
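A hedged sketch of the full acquire/release pattern with session-owned app locks (@lockName is carried over from the question; the timeout value is illustrative and error handling is omitted):
-- acquire (waits up to the timeout; sp_getapplock returns >= 0 on success)
DECLARE @result INT;
EXEC @result = sp_getapplock @Resource = @lockName,
                             @LockMode = 'Exclusive',
                             @LockOwner = 'Session',
                             @LockTimeout = 10000;
IF @result >= 0
BEGIN
    -- ... critical section ...
    EXEC sp_releaseapplock @Resource = @lockName, @LockOwner = 'Session';
END
The guarded INSERT that follows is the alternative approach shown without app locks.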
INSERT INTO GuidLocks
SELECT @lockName, GETUTCDATE()
WHERE NOT EXISTS ( SELECT *
                   FROM GuidLocks
                   WHERE Id = @lockName );
IF @@ROWCOUNT = 0 ...
To be safe about optimization, use SELECT 1 rather than SELECT * inside the NOT EXISTS:
SELECT 1
FROM GuidLocks

Avoiding Locking Contention on DB2 zOS

I want to place DB2 Triggers for Insert, Update and Delete on DB2 Tables heavily used in parallel online Transactions. The tables are shared by several members on a Sysplex, DB2 Version 10.
In each of the DB2 Triggers I want to insert a row into a central table and have one background process calling a Stored Procedure to read this table every second to process the newly inserted rows, ordered by sequence of the insert (sequence number or timestamp).
I'm very concerned about DB2 Index locking contention and want to make sure that I do not introduce Deadlocks/Timeouts to the applications with these Triggers.
Obviously I would take advantage of DB2 features to reduce locking, like row-level locking, but I still see no really good approach to avoid index contention.
I see three different options to select the newly inserted rows.
Put a sequence number in the table and store the last processed sequence number in the background process. I would do the following SELECT statement:
SELECT COLUMN_1, .... Column_n
FROM CENTRAL_TABLE
WHERE SEQ_NO > 'last-seq-number'
ORDER BY SEQ_NO;
Locking level must be CS to avoid selecting uncommitted rows which will later be rolled back.
I think I need one Index on the table with SEQ_NO ASC
Pro: Background process only reads rows and makes no updates/deletes (only shared locks)
Con: Index contention because of the ascending key used.
I can clean-up processed records later (e.g. by rolling partions).
Put a Status field in the table (processed and unprocessed) and change the Select as follows:
SELECT COLUMN_1, .... Column_n
FROM CENTRAL_TABLE
WHERE STATUS = 'unprocessed'
ORDER BY TIMESTAMP;
Later I would update the STATUS on the selected rows to "processed"
I think I need an Index on STATUS
Pro: No ascending sequence number in the index and no direct deletes
Cons: Concurrent updates by online transactions and the background process
Clean-up would happen in off-hours
DELETE the processed records instead of the status field update.
SELECT COLUMN_1, .... Column_n
FROM CENTRAL_TABLE
ORDER BY TIMESTAMP;
Since the table contains very few records, no index is required which could create a hot spot.
Also I think I could SELECT with Isolation Level UR, because I would detect potential uncommitted data on the later delete of this row.
For a primary key index I could use GENERATE_UNIQUE, which is random and not ascending.
Pro: No Index hot spot and the Inserts can be spread across the tablespace by random UNIQUE_ID
Con: Tablespace scan and sort on every call of the Stored Procedure and deleting records in parallel to the online inserts.
Looking forward what the community thinks about this problem. This must be a pretty common problem e.g. SAP should have a similar issue on their Batch Input tables.
I tend to favour Option 3, because it avoids index contention.
Maybe someone out there has yet another solution in mind.
I think you are going to have numerous performance problems with your various solutions.
(I know premature optimization is a sin, but experience tells us that some things are just not going to work in a busy system.)
You should be able to use DB2's identity-column (autoincrement) feature to get your sequence number, with little or no performance implication.
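A minimal sketch of what that could look like (column names are illustrative; the identity clause is standard DB2 DDL, check the exact options available on your z/OS version):
CREATE TABLE CENTRAL_TABLE
( SEQ_NO   BIGINT NOT NULL GENERATED ALWAYS AS IDENTITY
           (START WITH 1, INCREMENT BY 1, CACHE 100),
  COLUMN_1 VARCHAR(100)
  -- ... remaining payload columns ...
);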
For the rest, perhaps you should look at a queue-based solution.
Have your trigger drop the operation (INSERT/UPDATE/DELETE) and the keys of the row into an MQ queue.
Then have a long-running background task (in CICS?) do your post-processing; as it processes one update at a time, you should not trip over yourself. Having a single loaded and active task with the ability to batch up units of work should give you a throughput on the order of three to five hundred updates a second.

Updating Identity with DELETE - OUTPUT - INSERT

I need to update an identity column in a very specific scenario (most of the time the identity will be left alone). When I do need to update it, I simply need to give it a new value and so I'm trying to use a DELETE + INSERT combo.
At present I have a working query that looks something like this:
DELETE Test_Id
OUTPUT DELETED.Data,
DELETED.Moredata
INTO Test_id
WHERE Id = 13
(This is only an example, the real query is slightly more complex.)
A colleague brought up an important point: she asked whether this won't cause a deadlock, since we are writing to and reading from the same table. Although in the example it works fine (half a dozen rows), in a real-world scenario with tens of thousands of rows it might not.
Is this a real issue? If so, is there a way to prevent it?
I set up an SQL Fiddle example.
Thanks!
My first thought was: yes, it can. And maybe it is still possible, but in this simplified version of the statement it would be very hard to hit a deadlock. You're affecting a single row, for which row-level locks are probably acquired, and the locks required for the delete and the insert are acquired in very quick succession.
I did some testing against a table holding a million rows, executing the statement 5 million times on 6 different connections in parallel, and did not hit a single deadlock.
But add the real-life query, a table with indexes and foreign keys, and you just might have a winner. I've had a similar statement which did cause deadlocks.
I have encountered deadlock errors with a similar statement.
UPDATE A
SET x=0
OUTPUT INSERTED.ID, 'a' INTO B
So for this statement to complete, MSSQL needs to take locks for the updates on table A, locks for the inserts on table B, and shared (read) locks on table A to validate the foreign key that table B has to table A.
And last but not least, MSSQL decided it would be wise to use parallelism on this particular query, causing the statement to deadlock on itself. To resolve this I simply set the "MAXDOP 1" query hint on the statement to prevent parallelism.
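A minimal sketch of that hint applied to the statement above (table and column names as in the example):
UPDATE A
SET x = 0
OUTPUT INSERTED.ID, 'a' INTO B
OPTION (MAXDOP 1);   -- disable parallelism for this statement only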
There is, however, no definite answer on preventing deadlocks. As they say with MSSQL ever so often: it depends. You could take an exclusive lock using the TABLOCKX table hint. That would prevent the deadlock, but it's probably not desirable for other reasons.

SQLServer when is UPDLOCK Applied in Select

I am issuing the following query with an UPDLOCK applied:
select @local_var = Column
from table (UPDLOCK)
where OtherColumn = @parameter
What happens is that multiple connections hit this routine, which is used inside a stored procedure to compute a unique id. Once the lock is acquired, we compute the next id, update the value in the row and commit. This is done because the client has a specific formatting requirement for certain object IDs in their system.
The UPDLOCK locks the correct row and blocks the other processes, but every now and then we get a duplicate id. It seems the local variable is given the current value before the row is locked. I had assumed that the lock would be obtained before the select portion of the statement was processed.
I am using SQLServer 2012 and the isolation level is set to read committed.
If there is other information required, just let me know. Or if I am doing something obviously stupid, that information is also welcome.
From the SQL Server documentation on UPDLOCK:
Use update locks instead of shared locks while reading a table, and hold locks until the end of the statement or transaction. UPDLOCK has the advantage of allowing you to read data (without blocking other readers) and update it later with the assurance that the data has not changed since you last read it.
That means that other processes can still read the values.
Try using XLOCK instead, that will lock other reads out as well.
I think the issue is that your lock is only being held during this SELECT.
So once your stored proc has the value, it releases the lock BEFORE it goes on to update the id (or insert a new row or whatever).
This means that another query running in parallel is able to read the same value and then update/insert the same row.
You should additionally add a HOLDLOCK to your WITH clause so that the lock gets held a little longer.
This is treated quite well in this Answer
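A hedged sketch of the suggested combination (placeholder names are kept from the question, with the table name bracketed; @new_value is a hypothetical variable for the next id; the key point is that the locking read and the update run in one transaction):
BEGIN TRANSACTION;

-- read and lock the counter row; the lock is held until COMMIT
select @local_var = Column
from [table] WITH (UPDLOCK, HOLDLOCK)
where OtherColumn = @parameter;

-- compute the next id from @local_var, then persist it
UPDATE [table]
SET Column = @new_value
WHERE OtherColumn = @parameter;

COMMIT TRANSACTION;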

Understanding SQL Server LOCKS on SELECT queries

I'm wondering what is the benefit to use SELECT WITH (NOLOCK) on a table if the only other queries affecting that table are SELECT queries.
How is that handled by SQL Server? Would a SELECT query block another SELECT query?
I'm using SQL Server 2012 and a Linq-to-SQL DataContext.
(EDIT)
About performance :
Would a 2nd SELECT have to wait for a 1st SELECT to finish if using a locked SELECT?
Versus a SELECT WITH (NOLOCK)?
A SELECT in SQL Server will place a shared lock on a table row - and a second SELECT would also require a shared lock, and those are compatible with one another.
So no - one SELECT cannot block another SELECT.
What the WITH (NOLOCK) query hint is used for is to be able to read data that's in the process of being inserted (by another connection) and that hasn't been committed yet.
Without that query hint, a SELECT might be blocked reading a table by an ongoing INSERT (or UPDATE) statement that places an exclusive lock on rows (or possibly a whole table), until that operation's transaction has been committed (or rolled back).
Problem of the WITH (NOLOCK) hint is: you might be reading data rows that aren't going to be inserted at all, in the end (if the INSERT transaction is rolled back) - so your e.g. report might show data that's never really been committed to the database.
There's another query hint that might be useful - WITH (READPAST). This instructs the SELECT command to just skip any rows that it attempts to read and that are locked exclusively. The SELECT will not block, and it will not read any "dirty" un-committed data - but it might skip some rows, e.g. not show all your rows in the table.
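A small illustration of the hints discussed above (dbo.Orders is a hypothetical table; run these while another connection holds uncommitted changes on it):
-- default behavior: may block behind an uncommitted INSERT/UPDATE
SELECT COUNT(*) FROM dbo.Orders;

-- NOLOCK: does not block, but may count rows that are later rolled back
SELECT COUNT(*) FROM dbo.Orders WITH (NOLOCK);

-- READPAST: does not block and reads only committed rows,
-- but silently skips rows that are exclusively locked
SELECT COUNT(*) FROM dbo.Orders WITH (READPAST);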
On performance you keep focusing on select.
Shared does not block reads.
Shared lock blocks update.
If you have hundreds of shared locks it is going to take an update a while to get an exclusive lock as it must wait for shared locks to clear.
By default a select (read) takes a shared lock.
Shared (S) locks allow concurrent transactions to read (SELECT) a resource.
A shared lock has no effect on other selects (1 or 1000).
The difference is how NOLOCK versus a shared lock affects update or insert operations.
No other transactions can modify the data while shared (S) locks exist on the resource.
A shared lock blocks an update!
But nolock does not block an update.
This can have huge impacts on the performance of updates. It also impacts inserts.
Dirty read (nolock) just sounds dirty. You are never going to get partial data. If an update is changing John to Sally you are never going to get Jolly.
I use shared locks a lot for concurrency. Data is stale as soon as it is read. A read of John that changes to Sally the next millisecond is stale data. A read of Sally that gets rolled back to John the next millisecond is stale data. That is on the millisecond level. I have a dataloader that takes 20 hours to run if users are taking shared locks and 4 hours to run if users are taking no locks. Shared locks in this case make the data 16 hours stale.
Don't misuse nolock, but it does have a place. If you are going to cut a check when a byte is set to 1, and then set it to 2 when the check is cut, that is not a time for nolock.
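A small sketch of the blocking difference described above (dbo.BigTable, Flag and Id are hypothetical; the blocks run on separate connections):
-- Connection 1: hold shared locks for the whole transaction
BEGIN TRAN;
SELECT COUNT(*) FROM dbo.BigTable WITH (HOLDLOCK);  -- shared locks kept until COMMIT

-- Connection 2: blocked until connection 1 commits
UPDATE dbo.BigTable SET Flag = 1 WHERE Id = 42;

-- Connection 3: a NOLOCK read is not blocked by either connection
SELECT Flag FROM dbo.BigTable WITH (NOLOCK) WHERE Id = 42;

-- Connection 1: releasing the shared locks unblocks the update
COMMIT;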
I have to add one important comment. Everyone is mentioning that NOLOCK reads only dirty data. This is not precise. It is also possible that you'll get the same row twice, or that a whole row is skipped during your read. The reason is that you could ask for some data at the same time that SQL Server is re-balancing the b-tree.
Check these other threads:
https://stackoverflow.com/a/5469238/2108874
http://www.sqlmag.com/article/sql-server/quaere-verum-clustered-index-scans-part-iii.aspx
With the NOLOCK hint (or setting the isolation level of the session to READ UNCOMMITTED) you tell SQL Server that you don't expect consistency, so there are no guarantees. Bear in mind though that "inconsistent data" does not only mean that you might see uncommitted changes that were later rolled back, or data changes in an intermediate state of the transaction. It also means that in a simple query that scans all table/index data SQL Server may lose the scan position, or you might end up getting the same row twice.
At my work, we have a very big system that runs on many PCs at the same time, with very big tables with hundreds of thousands of rows, and sometimes many millions of rows.
When you make a SELECT on a very big table, let's say you want to know every transaction a user has made in the past 10 years, and the primary key of the table is not built in an efficient way, the query might take several minutes to run.
Then, our application might be running on many users' PCs at the same time, accessing the same database. So if someone tries to insert into the table that the other SELECT is reading (in pages that SQL is trying to read), then a LOCK can occur and the two transactions block each other.
We had to add a "NO LOCK" to our SELECT statement, because it was a huge SELECT on a table that is used a lot by a lot of users at the same time and we had LOCKS all the time.
I don't know if my example is clear enough, but this is a real-life example.
The SELECT WITH (NOLOCK) allows reads of uncommitted data, which is equivalent to having the READ UNCOMMITTED isolation level set on your database. The NOLOCK keyword allows finer grained control than setting the isolation level on the entire database.
Wikipedia has a useful article: Wikipedia: Isolation (database systems)
It is also discussed at length in other stackoverflow articles.
A select with NOLOCK will return records which may or may not end up being committed; you will read dirty data.
For example, let's say a transaction inserts 1000 rows and then fails (rolls back).
When you select with NOLOCK during that window, you will get the 1000 rows.
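A minimal sketch of that scenario (dbo.Orders and its Amount column are hypothetical; the two blocks run on different connections):
-- Connection 1: insert 1000 rows but do not commit yet
BEGIN TRAN;
INSERT INTO dbo.Orders (Amount)
SELECT TOP (1000) 1 FROM sys.all_objects;    -- any convenient row source

-- Connection 2: a dirty read sees the uncommitted rows
SELECT COUNT(*) FROM dbo.Orders WITH (NOLOCK);   -- includes the 1000 rows

-- Connection 1: the transaction fails
ROLLBACK;                                    -- the 1000 rows disappear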
