We are trying to determine more efficient ways to perform some database operations.
One of the issues we have is an ancient primary key scheme in which the primary key for a new record is selected by finding the MAX value in the table and adding 1 (we cannot change this implementation, so please don't suggest that as an answer).
There are different approaches we could take to resolve this (table-valued parameters, temp tables, etc.), but we can never assume that another process won't insert a record while this is happening, and business rules will not allow us to lock the table.
So, the crux of my question is, if we get the current MAX value in a sub-query using an UPDLOCK hint, will the lock hint last for the life of the containing query?
For example:
INSERT
INTO table1
( PKColumn,
DataColumn1,
DataColumn2 )
SELECT ( SELECT MAX(ISNULL(PKColumn, 0) + 1) FROM table1 WITH (UPDLOCK)) + RowNumber ,
DataColumn1 ,
DataColumn2
FROM #Table1Temp
If we use this to insert 100,000 records, for example, will the UPDLOCK hint hold on the table until all records are inserted or is it released as soon as the initial value is retrieved?
From the SQL Server documentation on table hints:
Specifies that update locks are to be taken and held until the
transaction completes. UPDLOCK takes update locks for read operations
only at the row-level or page-level. If UPDLOCK is combined with
TABLOCK, or a table-level lock is taken for some other reason, an
exclusive (X) lock will be taken instead.
So yes. The transaction will last at least as long as that statement, and the update locks are held until it completes (possibly longer if you aren't using auto-commit transactions and have multiple statements in the transaction).
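If you need the reserved key range protected beyond that single statement, a rough sketch (the same insert wrapped in an explicit transaction, using the tables from the question) looks like this:

BEGIN TRANSACTION;

INSERT INTO table1 (PKColumn, DataColumn1, DataColumn2)
SELECT (SELECT MAX(ISNULL(PKColumn, 0) + 1) FROM table1 WITH (UPDLOCK)) + RowNumber,
       DataColumn1,
       DataColumn2
FROM   #Table1Temp;

-- any further statements that rely on the reserved keys go here;
-- the update lock is held until the COMMIT below
COMMIT TRANSACTION;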
Related
I want to place DB2 Triggers for Insert, Update and Delete on DB2 Tables heavily used in parallel online Transactions. The tables are shared by several members on a Sysplex, DB2 Version 10.
In each of the DB2 Triggers I want to insert a row into a central table and have one background process calling a Stored Procedure to read this table every second to process the newly inserted rows, ordered by sequence of the insert (sequence number or timestamp).
I'm very concerned about DB2 Index locking contention and want to make sure that I do not introduce Deadlocks/Timeouts to the applications with these Triggers.
Obviously I would take advantage of DB2 features to reduce locking, like row-level locking, but I still see no really good approach to avoiding index contention.
I see three different options to select the newly inserted rows.
Put a sequence number in the table and store the last processed sequence number in the background process. I would use the following SELECT statement:
SELECT COLUMN_1, .... Column_n
FROM CENTRAL_TABLE
WHERE SEQ_NO > 'last-seq-number'
ORDER BY SEQ_NO;
Locking level must be CS to avoid selecting uncommitted rows that might later be rolled back.
I think I need one Index on the table with SEQ_NO ASC
Pro: Background process only reads rows and makes no updates/deletes (only shared locks)
Neg: Index contention because of ascending key used.
I can clean-up processed records later (e.g. by rolling partions).
Put a Status field in the table (processed and unprocessed) and change the Select as follows:
SELECT COLUMN_1, .... Column_n
FROM CENTRAL_TABLE
WHERE STATUS = 'unprocessed'
ORDER BY TIMESTAMP;
Later I would update the STATUS on the selected rows to "processed" (see the sketch after this option).
I think I need an Index on STATUS
Pro: No ascending sequence number in the index and no direct deletes
Cons: Concurrent updates by online transactions and the background process
Clean-up would happen in off-hours
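A minimal sketch of that status flip (the column and key names are illustrative, not the real table definition):

-- executed by the background process for each row it has just handled;
-- KEY_COL stands in for whatever uniquely identifies the row
UPDATE CENTRAL_TABLE
SET    STATUS = 'processed'
WHERE  STATUS = 'unprocessed'
  AND  KEY_COL = ?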
DELETE the processed records instead of the status field update.
SELECT COLUMN_1, .... Column_n
FROM CENTRAL_TABLE
ORDER BY TIMESTAMP;
Since the table contains very few records, no index is required that could create a hot spot.
Also I think I could SELECT with Isolation Level UR, because I would detect potential uncommitted data on the later delete of this row.
For a primary key index I could use GENERATE_UNIQUE, which is random and not ascending (see the sketch after this option).
Pro: No Index hot spot and the Inserts can be spread across the tablespace by random UNIQUE_ID
Con: Tablespace scan and sort on every call of the Stored Procedure and deleting records in parallel to the online inserts.
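A minimal sketch of option 3's key generation, assuming DB2's GENERATE_UNIQUE() built-in and purely illustrative column names:

CREATE TABLE CENTRAL_TABLE (
    UNIQUE_ID  CHAR(13) FOR BIT DATA NOT NULL PRIMARY KEY,  -- holds the GENERATE_UNIQUE() value
    KEY_COL    CHAR(20) NOT NULL,
    OPERATION  CHAR(1)  NOT NULL
);

-- each trigger insert supplies the generated key
INSERT INTO CENTRAL_TABLE (UNIQUE_ID, KEY_COL, OPERATION)
VALUES (GENERATE_UNIQUE(), 'some-key', 'I');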
Looking forward to what the community thinks about this problem. This must be a pretty common problem; e.g. SAP should have a similar issue with their batch input tables.
I tend to favour Option 3, because it avoids index contention.
Maybe there is still another solution out there in your minds.
I think you are going to have numerous performance problems with your various solutions.
(I know premature optimization is a sin, but experience tells us that some things are just not going to work in a busy system.)
You should be able to use DB2's autoincrement feature to get your sequence number, with little or no performance implication.
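For example, a sketch of the central table with DB2 generating the sequence number itself (column names are illustrative):

CREATE TABLE CENTRAL_TABLE (
    SEQ_NO    BIGINT NOT NULL
              GENERATED ALWAYS AS IDENTITY (START WITH 1, INCREMENT BY 1, CACHE 50),
    KEY_COL   CHAR(20) NOT NULL,
    OPERATION CHAR(1)  NOT NULL
);

-- the triggers only supply the payload; DB2 fills in SEQ_NO
INSERT INTO CENTRAL_TABLE (KEY_COL, OPERATION)
VALUES ('some-key', 'I');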
For the rest, perhaps you should look at a queue-based solution.
Have your trigger drop the operation (INSERT/UPDATE/DELETE) and the keys of the row onto an MQ queue.
Then have a long-running background task (in CICS?) do your post-processing; as it is processing one update at a time, you should not trip over yourself. Having a single loaded and active task with the ability to batch up units of work should give you a throughput on the order of 300 to 500 updates a second.
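A very rough sketch of such a trigger - this assumes the DB2 MQ functions (e.g. DB2MQ.MQSEND) are installed and configured with a default service and policy, and the table and column names are purely illustrative:

-- fires on the heavily used online table and pushes a small message
-- (operation code plus row key) onto the queue instead of into a DB2 table
CREATE TRIGGER APP_TABLE_AI
  AFTER INSERT ON APP_TABLE
  REFERENCING NEW AS N
  FOR EACH ROW MODE DB2SQL
BEGIN ATOMIC
  VALUES (DB2MQ.MQSEND('I|' || CHAR(N.KEY_COL)));
END

The background task then drains the queue in arrival order, so the ordering requirement is met without any DB2 index on an ascending key.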
My boss keeps forcing me to write SELECT queries with WITH (NOLOCK) to prevent deadlocks. But AFAIK, SELECT statements by default do not take locks, so selecting with WITH (NOLOCK) and selecting without it shouldn't make any difference. Please correct me if I am wrong.
The two queries:
SELECT * from EMP with (nolock)
SELECT * from EMP
Aren't both the same? If I don't put NOLOCK, will it be prone to deadlocks? Please tell me what I should use.
NOLOCK should be used with extreme caution. The most common understanding of the NOLOCK (read uncommitted) hint is that it reads data that has not been committed yet. However, there are other side effects that can be very dangerous (search for "nolock" and "page splits").
There's a really good write up here... https://www.itprotoday.com/sql-server/beware-nolock-hint
In short, "nolocking"ing everything is not always a good idea... if ever.
Assuming we have the default transaction isolation level READ COMMITTED, there is a chance of a deadlock even with a very simple SELECT statement. Imagine a scenario where User1 is only reading data, User2 tries to update some data, and there is a non-clustered index on that table; it is possible.
User1 is reading some data and obtains a shared lock on the non-clustered index in order to perform a lookup, and then tries to obtain a shared lock on the page containing the data in order to return the data itself.
User2, who is writing/updating, first obtains an exclusive lock on the database page containing the data, and then attempts to obtain an exclusive lock on the index in order to update the index.
SELECT statements do indeed take locks unless SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED appears at the top of the query.
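In other words, these two are roughly equivalent (and carry the same risks):

-- per-table hint
SELECT * FROM EMP WITH (NOLOCK);

-- session-level setting: every SELECT that follows behaves like WITH (NOLOCK)
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT * FROM EMP;
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;  -- back to the default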
By all means use WITH (NOLOCK) in SELECT statements on tables that have a clustered index, but it would be wiser to only do so if there's a need to.
Hint: The easiest way to add a clustered index to a table is to add an Id Primary Key column.
The result set can contain rows that have not yet been committed, that are often later rolled back.
If WITH(NOLOCK) is applied to a table that has a non-clustered index then row-indexes can be changed by other transactions as the row data is being streamed into the result-table. This means that the result-set can be missing rows or display the same row multiple times.
READ COMMITTED adds an additional issue where data within a single column can be corrupted when multiple users change the same cell simultaneously.
Bearing in mind the issues WITH(NOLOCK) causes will help you tune your database.
As for your boss, just think of them as a challenge.
I am issuing the following query with an UPDLOCK applied:
select @local_var = Column
from table with (UPDLOCK)
where OtherColumn = @parameter
What happens is that multiple connections hit this routine which is used inside a stored procedure to compute a unique id. Once the lock acquires we compute the next id, update the value in the row and commit. This is done because the client has a specific formatting requirement for certain Object ID's in their system.
The UPDLOCK locks the correct row and blocks the other processes, but every now and then we get a duplicate id. It seems the local variable is given the current value before the row is locked. I had assumed that the lock would be obtained before the select portion of the statement was processed.
I am using SQL Server 2012 and the isolation level is set to READ COMMITTED.
If there is other information required, just let me know. Or if I am doing something obviously stupid, that information is also welcome.
From the SQL Server documentation on UPDLOCK:
Use update locks instead of shared locks while reading a table, and hold locks until the end of the statement or transaction. UPDLOCK has the advantage of allowing you to read data (without blocking other readers) and update it later with the assurance that the data has not changed since you last read it.
That means that other processes can still read the values.
Try using XLOCK instead; that will lock other readers out as well.
I think the issue is that your lock is only being held for the duration of that SELECT.
So once your stored proc has the value, it releases the lock BEFORE it goes on to update the id (or insert a new row or whatever).
This means that another query running in parallel is able to query for the same value and then update/insert the same row.
You should additionally add a HOLDLOCK to your 'with' statement so that the lock gets held a little longer.
This is treated quite well in this Answer
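As a rough sketch of the pattern with the extra hint (the counter table and column names here are made up for illustration; @parameter stands for the procedure's existing parameter):

DECLARE @local_var int;
DECLARE @parameter int = 1;   -- normally the stored procedure's parameter

BEGIN TRANSACTION;

    -- UPDLOCK + HOLDLOCK keeps the update lock until the COMMIT,
    -- so a second caller cannot read the same value in between
    SELECT @local_var = NextId
    FROM   IdTable WITH (UPDLOCK, HOLDLOCK)
    WHERE  OtherColumn = @parameter;

    UPDATE IdTable
    SET    NextId = NextId + 1
    WHERE  OtherColumn = @parameter;

COMMIT TRANSACTION;   -- locks released here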
After reading this interesting article I have some questions.
This table shows a deadlock situation:
T1 holds X lock on all rows with c1=5 on table t_lock1 while T2 holds
X lock on all rows with C1=1 on table t_lock2.
Now each of these transactions wants to update the rows previously
locked by the other. This results in a deadlock.
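For concreteness, the sequence looks roughly like this (assuming a second column c2 that both transactions update):

-- session 1 (T1)
BEGIN TRAN;
UPDATE t_lock1 SET c2 = c2 + 1 WHERE c1 = 5;   -- X locks all rows with c1 = 5
UPDATE t_lock2 SET c2 = c2 + 1 WHERE c1 = 1;   -- blocked by T2

-- session 2 (T2), running at the same time
BEGIN TRAN;
UPDATE t_lock2 SET c2 = c2 + 1 WHERE c1 = 1;   -- X locks all rows with c1 = 1
UPDATE t_lock1 SET c2 = c2 + 1 WHERE c1 = 5;   -- blocked by T1, so we deadlock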
Question #1
Do transactions obtain locks? I know that reading from a table is done with a shared lock, and writing to a table is done with an exclusive lock (I'm talking about the default locking settings).
So it seems from this example that a transaction also holds a lock... is that correct?
Question #2
...T1 holds X lock on all rows with c1=5 on table t_lock1...
IMHO, as I've said, the locking is not per row (although it can be made per row, but the author didn't mention it) - so why does he say: on all rows with C1=5?
Do transactions obtain locks?
No. The statement that you execute - a SELECT or an UPDATE - will acquire the locks. Depending on your transaction isolation level setting, only the duration for which the (shared) locks (for a reading SELECT) are held differs - that's all. Shared locks are normally held only very briefly, while update and exclusive locks are held until the transaction ends. The transaction might hold the locks - but it's not the transaction that acquires them...
*...T1 holds X lock on all rows with c1=5 on table t_lock1...*
IMHO, as I've said, the locking is not per row (although it can be made per row, but the author didn't mention it) - so why does he say: on all rows with C1=5?
The locking is per row - by default. But why do you think there's only a single row with C1=5? There could be multiple - possibly thousands - and the UPDATE statement will lock all of the rows it affects.
For question 1: SQL Server reads the source table's rows using U-locks, then converts them to X-locks only on those rows which qualify for the update. Notice the distinction between reading many rows and then filtering them down to those which get written; those two sets are locked differently.
As there are no SELECTs in your queries, only U and X locks are taken. S-locks are not taken for update queries on the table being updated. This is a heuristic deadlock-avoidance scheme.
Question 2: Locking can be done at different granularity but for low row counts it is usually per row (and this can be forced). Maybe the author assumes an index on C1 which would mean that only the rows with C1=1 need to be read and locked. All other rows wouldn't be touched.
If there were no index, SQL Server would indeed read all rows of the table, U-lock them while doing that, and then X-lock those which satisfy C1=1. The author indeed mentions that only rows with C1=1 are X-locked.
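For example (using the article's t_lock2), an index like this lets the engine read and lock only the C1=1 rows instead of scanning everything:

CREATE INDEX ix_t_lock2_c1 ON t_lock2 (c1);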
Consider this statement:
update TABLE1
set FormatCode = case when T.FormatCode is null then TABLE1.FormatCode else T.FormatCode end,
    CountryCode = case when T.CountryCode is null then TABLE1.CountryCode else T.CountryCode end
<SNIP ... LOTS of similar fields being updated>
FROM TABLE2 AS T
WHERE TABLE1.KEYFIELD = T.KEYFIELD
TABLE1 is used by other applications and so locking on it should be minimal
TABLE2 is not used by anybody else so I do not care about it.
TABLE1 and TABLE2 contain 600K rows each.
Would the above statement cause a table lock on TABLE1?
How can I modify it so that it locks as little as possible?
Maybe use a cursor to read the rows of TABLE2 one by one and then for each row update the respective row of TABLE1?
SQL Server will use row locks first. If enough rows in an index page are locked, SQL will issue a page lock. If enough pages are locked, SQL will issue a table lock.
So it really depends on how many locks are issued. You could use the locking hint ROWLOCK in your update statement. The downside is that you will probably have thousands of row locks instead of hundreds of page locks or one table lock. Locks use resources, so while a ROWLOCK hint will probably prevent a table lock, it might even be worse, as it could starve your server of resources and slow it down in any case.
You could batch the update, say 1000 rows at a time. Cursors are really going to mess things up even more. Experiment, monitor, analyse the results, and make a choice based on the data you have gathered.
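A rough batching sketch - only the two columns from the question are shown, COALESCE is shorthand for the CASE expressions above, and the EXISTS/EXCEPT test is one null-safe way to skip rows that would not actually change, so each pass shrinks the remaining work:

DECLARE @rows int = 1;

WHILE @rows > 0
BEGIN
    UPDATE TOP (1000) TABLE1
    SET    FormatCode  = COALESCE(T.FormatCode,  TABLE1.FormatCode),
           CountryCode = COALESCE(T.CountryCode, TABLE1.CountryCode)
    FROM   TABLE2 AS T
    WHERE  TABLE1.KEYFIELD = T.KEYFIELD
      AND  EXISTS (SELECT COALESCE(T.FormatCode,  TABLE1.FormatCode),
                          COALESCE(T.CountryCode, TABLE1.CountryCode)
                   EXCEPT
                   SELECT TABLE1.FormatCode, TABLE1.CountryCode);

    SET @rows = @@ROWCOUNT;
END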
As marc_s has suggested, introducing a more restrictive WHERE clause to reduce the number of rows should help here.
Since your update occurs nightly, it seems you'll be looking to update only the records that have been updated since the previous run (i.e. a day's worth of updates). But this will only benefit you if a subset of the records has changed rather than all of them.
I'd probably try to SELECT the Ids for the rows that have changed into a temp table and then join the temp table as part of the update. To determine the list of Ids, a couple of options come to mind: make use of a last-changed column on TABLE2 (if TABLE2 has one); alternatively, compare each field between TABLE1 and TABLE2 to see if they differ (watch for nulls), although this would be a lot of extra SQL to include and probably a maintenance pain. A third option would be to have an UPDATE trigger against TABLE2 that inserts the KEYFIELD of rows as they are updated during the day into the temp table; the temp table could be cleared down following your nightly update.
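A sketch of the compare-the-fields variant (again only two columns shown, and the EXISTS/EXCEPT trick is just one null-safe way to spot a difference):

-- capture the keys whose data actually differs
SELECT T.KEYFIELD
INTO   #ChangedKeys
FROM   TABLE2 AS T
       JOIN TABLE1 ON TABLE1.KEYFIELD = T.KEYFIELD
WHERE  EXISTS (SELECT T.FormatCode, T.CountryCode
               EXCEPT
               SELECT TABLE1.FormatCode, TABLE1.CountryCode);

-- then drive the nightly update from that much smaller set
UPDATE TABLE1
SET    FormatCode  = COALESCE(T.FormatCode,  TABLE1.FormatCode),
       CountryCode = COALESCE(T.CountryCode, TABLE1.CountryCode)
FROM   TABLE2 AS T
       JOIN #ChangedKeys AS C ON C.KEYFIELD = T.KEYFIELD
WHERE  TABLE1.KEYFIELD = T.KEYFIELD;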