Database Row Locking - database

My C# program is doing the following (pseudo code):
START TRANSACTION READ COMMITTED
Select isOriginal, * from myTable where tnr = x;
//Record found?
Yes:
//isOriginal?
Yes:
update myTable set is_active = 0 where tnr = x;
No:
delete from myTable where tnr = x;
//Do some simple logic on the values
Insert into myTable (newvalues)
No:
return record_not_found;
END TRANSACTION
However, when I start two instances of my program and both edit the same record at the same time, two records are inserted, because both find the record in the SELECT query.
What should happen is that the first transaction finds the record and inserts a new row, while the second transaction returns record_not_found.
How can I fix this? Set my transaction to SERIALIZABLE? Check the return value of the UPDATE/DELETE? What's the best way?
Edit:
It should work on Sybase, Oracle & SQL Server.

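Without knowing which DB you are using: you could set up a lock field in the table,
where each concurrent thread writes its pid, thread id, or at least a unique timestamp.
Just do:
update myTable set lock = <pid> where lock is null limit 1;
select isOriginal, * from myTable where lock = <pid>
(LIMIT 1 is MySQL syntax. In SQL Server use UPDATE TOP (1) myTable ..., and in Oracle add AND ROWNUM = 1 to the WHERE clause. Note that "= null" never matches, so the test must be "is null".)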
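Here is a minimal sketch of the lock-field idea. It uses Python's built-in sqlite3 module with an in-memory database as a stand-in for the real server. The schema, the worker ids and the helpers `make_demo_db` and `claim_one` are hypothetical. SQLite has no row locks, but it serializes writers, so the single UPDATE either stamps one unclaimed row or changes nothing. The affected row count tells the worker which of the two happened.

```python
import sqlite3


def make_demo_db(tnrs):
    """Hypothetical table: "lock" holds the id of the worker that claimed the row."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE myTable (id INTEGER PRIMARY KEY, tnr INTEGER, "lock" INTEGER)')
    conn.executemany("INSERT INTO myTable (tnr) VALUES (?)", [(t,) for t in tnrs])
    conn.commit()
    return conn


def claim_one(conn, worker_id):
    """Stamp one unclaimed row with worker_id, then read it back by that stamp."""
    with conn:  # commits on success, rolls back on an exception
        cur = conn.execute(
            'UPDATE myTable SET "lock" = ? '
            'WHERE "lock" IS NULL '  # "= NULL" would never match
            'AND id = (SELECT id FROM myTable WHERE "lock" IS NULL ORDER BY id LIMIT 1)',
            (worker_id,),
        )
    if cur.rowcount == 0:
        return None  # nothing left to claim
    return conn.execute(
        'SELECT id, tnr FROM myTable WHERE "lock" = ?', (worker_id,)
    ).fetchone()


conn = make_demo_db([10, 20])
print(claim_one(conn, 101))  # (1, 10)
print(claim_one(conn, 102))  # (2, 20)
print(claim_one(conn, 103))  # None
```

The worker ids must be unique per claim, because the read-back selects by the stamp.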

If you are using MS SQL Server, you will need to look at the NOLOCK and ROWLOCK table hints.
NOLOCK tells SQL Server to ignore locks and read directly from the tables. The pro is good performance. The con is that you are bypassing the locking system, so you can read uncommitted (dirty) data. ROWLOCK, on the other hand, asks SQL Server to take row-level locks instead of page or table locks. It can cost performance, so you need to decide whether you actually need it on your UPDATEs/DELETEs.
In your case: SELECT isOriginal, * FROM myTable WITH (NOLOCK) WHERE tnr=x
Then: UPDATE myTable WITH (ROWLOCK) SET is_active=0 WHERE tnr=x
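The question's own idea of checking the return value of the UPDATE/DELETE can be sketched as follows. It again uses Python's sqlite3 as a stand-in, and the helper names are hypothetical. It makes two assumptions:

- Only rows with is_active = 1 count as "found".
- Each row has a surrogate id.

The two `read_record` calls before any write replay the race in which both instances see the same row. The write only goes through if the row is still in the state that was read, and a row count of 0 means another instance got there first. On SQL Server, Oracle and Sybase under READ COMMITTED, the second instance's UPDATE/DELETE would normally wait on the first transaction's row lock. Once the first transaction commits, the second statement matches nothing, which is what lets this check work without SERIALIZABLE.

```python
import sqlite3


def make_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE myTable (id INTEGER PRIMARY KEY, tnr INTEGER, "
        "is_original INTEGER, is_active INTEGER, payload TEXT)"
    )
    conn.executemany(
        "INSERT INTO myTable (tnr, is_original, is_active, payload) VALUES (?, ?, 1, ?)",
        [(42, 1, "original"), (43, 0, "copy")],
    )
    conn.commit()
    return conn


def read_record(conn, tnr):
    """SELECT step: returns (id, is_original) of the active row, or None."""
    return conn.execute(
        "SELECT id, is_original FROM myTable WHERE tnr = ? AND is_active = 1", (tnr,)
    ).fetchone()


def replace_record(conn, tnr, row, new_payload):
    """UPDATE/DELETE the row that was read, but only if it is still unchanged."""
    if row is None:
        return "record_not_found"
    row_id, is_original = row
    with conn:  # one transaction: commit on success, roll back on error
        if is_original:
            cur = conn.execute(
                "UPDATE myTable SET is_active = 0 WHERE id = ? AND is_active = 1", (row_id,)
            )
        else:
            cur = conn.execute("DELETE FROM myTable WHERE id = ?", (row_id,))
        if cur.rowcount == 0:
            return "record_not_found"  # another instance changed it first
        conn.execute(
            "INSERT INTO myTable (tnr, is_original, is_active, payload) VALUES (?, 0, 1, ?)",
            (tnr, new_payload),
        )
    return "inserted"


conn = make_demo_db()
for tnr in (42, 43):  # 42 takes the UPDATE path, 43 the DELETE path
    row_a = read_record(conn, tnr)  # instance A reads the row
    row_b = read_record(conn, tnr)  # instance B reads it too, before A writes
    print(replace_record(conn, tnr, row_a, "A"))  # inserted
    print(replace_record(conn, tnr, row_b, "B"))  # record_not_found
```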

Related

Insert from select or update from select with commit every 1M records

I've already seen a dozen such questions, but most of the answers don't apply to my case.
First off, the database I am trying to get the data from is on a very slow network, and I connect to it over a VPN.
I am accessing it through a database link.
I have full read/write access to my schema's tables, but I don't have DBA rights, so I can't create dumps, and I don't have grants for creating new tables etc.
I've been trying to copy the database locally, and all is well except for one table.
It has 6.5 million records and 16 columns.
There was no problem getting 14 of them, but the remaining two are CLOBs with huge XML in them.
The data transfer is so slow it is painful.
I tried
insert based on select
insert all 14 then update the other 2
create table as
insert based on select conditional so I get only so many records and manually commit
The issue is mainly that the connection is lost before the transaction finishes (or power loss or VPN drops or random error etc) and all the GBs that have been downloaded are discarded.
As I said, I tried adding conditions so that I only get part of the records at a time, but even this is hit-and-miss and requires constant attention from me.
Something like:
Insert into TableA
Select * from TableA@DB_RemoteDB1
WHERE CREATION_DATE BETWEEN to_date('01-JAN-2016', 'DD-MON-YYYY') AND to_date('31-DEC-2016', 'DD-MON-YYYY')
Sometimes it works, sometimes it doesn't. After just a few GB, Toad appears stuck running, and when I look at its throughput it is 0 KB/s or a few bytes/s.
What I am looking for is a loop or a cursor that can fetch maybe 100,000 or 1,000,000 rows at a time, commit them, and then go for the rest until it is done.
This is a one-time operation that I am doing because we need the data locally for testing. So I don't care if it is inefficient, as long as the data is brought in in chunks and each commit saves me from retrieving those rows again.
I can count about 15 GB of failed downloads over the last 3 days, and my local table still has 0 records because all my attempts have failed.
Server: Oracle 11g
Local: Oracle 11g
Attempted Clients: Toad/Sql Dev/dbForge Studio
Thanks.
You could do something like:
begin
loop
insert into tablea
select * from tablea@DB_RemoteDB1 a_remote
where not exists (select null from tablea where id = a_remote.id)
and rownum <= 100000; -- or whatever number makes sense for you
exit when sql%rowcount = 0;
commit;
end loop;
end;
/
This assumes that there is a primary/unique key you can use to check whether a row in the remote table already exists in the local one. In this example I've used a generic ID column; replace that with your actual key column(s).
For each iteration of the loop it will identify rows in the remote table which do not exist in the local table - which may be slow, but you've said performance isn't a priority here - and then, via rownum, limit the number of rows being inserted to a manageable subset.
The loop then terminates when no rows are inserted, which means there are no rows left in the remote table that don't exist locally.
This should be restartable, due to the commit and where not exists check. This isn't usually a good approach - as it kind of breaks normal transaction handling - but as a one off and with your network issues/constraints it may be necessary.
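The same restartable, commit-per-chunk loop can be tried out locally with Python's sqlite3. An attached in-memory database named `remote` stands in for the DB link. The table layout, the `id`/`payload` columns and the `max_batches` parameter (which simulates a dropped connection) are all hypothetical. SQLite's LIMIT plays the role of `rownum <= 100000`.

```python
import sqlite3


def make_demo_db(n_rows):
    local = sqlite3.connect(":memory:")
    local.execute("ATTACH DATABASE ':memory:' AS remote")  # stand-in for the DB link
    for schema in ("main", "remote"):
        local.execute(f"CREATE TABLE {schema}.tablea (id INTEGER PRIMARY KEY, payload TEXT)")
    local.executemany(
        "INSERT INTO remote.tablea (id, payload) VALUES (?, ?)",
        [(i, f"<xml>{i}</xml>") for i in range(1, n_rows + 1)],
    )
    local.commit()
    return local


def copy_in_chunks(local, chunk_size, max_batches=None):
    """Insert rows missing locally, chunk by chunk, committing after each chunk."""
    batches = 0
    while max_batches is None or batches < max_batches:
        cur = local.execute(
            "INSERT INTO main.tablea (id, payload) "
            "SELECT r.id, r.payload FROM remote.tablea AS r "
            "WHERE NOT EXISTS (SELECT 1 FROM main.tablea AS l WHERE l.id = r.id) "
            "LIMIT ?",
            (chunk_size,),
        )
        if cur.rowcount == 0:  # like "exit when sql%rowcount = 0"
            break
        local.commit()  # a completed chunk survives a later failure
        batches += 1
    local.commit()  # close the final (empty) transaction
    return batches


local = make_demo_db(250)
print(copy_in_chunks(local, 100, max_batches=1))  # 1 (simulated drop after one chunk)
print(copy_in_chunks(local, 100))  # 2 (resumes with the remaining 150 rows)
print(local.execute("SELECT COUNT(*) FROM main.tablea").fetchone()[0])  # 250
```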
Toad is right, using bulk collect would be (probably significantly) faster in general as the query isn't repeated each time around the loop:
declare
cursor l_cur is
select * from tablea@dblink3 a_remote
where not exists (select null from tablea where id = a_remote.id);
type t_tab is table of l_cur%rowtype;
l_tab t_tab;
begin
open l_cur;
loop
fetch l_cur bulk collect into l_tab limit 100000;
forall i in 1..l_tab.count
insert into tablea values l_tab(i);
commit;
exit when l_cur%notfound;
end loop;
close l_cur;
end;
/
This time you would change the limit 100000 to whatever number you think sensible. There is a trade-off here though, as the PL/SQL table will consume memory, so you may need to experiment a bit to pick that value - you could get errors or affect other users if it's too high. Lower is less of a problem here, except the bulk inserts become slightly less efficient.
But because you have a CLOB column (holding your XML), this won't work for you, as @BobC pointed out. The insert ... select is supported over a DB link, but the collection version will get an error from the fetch:
ORA-22992: cannot use LOB locators selected from remote tables
ORA-06512: at line 10
22992. 00000 - "cannot use LOB locators selected from remote tables"
*Cause: A remote LOB column cannot be referenced.
*Action: Remove references to LOBs in remote tables.
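Setting the ORA-22992 LOB restriction aside, the fetch-in-batches idea behind BULK COLLECT ... LIMIT can be sketched with the DB-API `fetchmany` method. Two separate sqlite3 connections stand in for the remote and local databases; the table layout and helper names are hypothetical. The two connections cannot see each other's tables, so the NOT EXISTS check is replaced by resuming after the highest id already copied. That assumes ids are copied in ascending order. `limit` carries the same memory trade-off as the PL/SQL collection size.

```python
import sqlite3


def make_db(n_rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tablea (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.executemany(
        "INSERT INTO tablea VALUES (?, ?)", [(i, f"row {i}") for i in range(1, n_rows + 1)]
    )
    conn.commit()
    return conn


def copy_with_fetchmany(remote, local, limit):
    """Fetch `limit` rows at a time from remote, insert them locally, commit."""
    (last_id,) = local.execute("SELECT COALESCE(MAX(id), 0) FROM tablea").fetchone()
    src = remote.execute(
        "SELECT id, payload FROM tablea WHERE id > ? ORDER BY id", (last_id,)
    )
    batches = 0
    while True:
        rows = src.fetchmany(limit)  # like FETCH ... BULK COLLECT ... LIMIT
        if not rows:  # stop once the source cursor is exhausted
            break
        local.executemany("INSERT INTO tablea (id, payload) VALUES (?, ?)", rows)  # like FORALL
        local.commit()
        batches += 1
    return batches


remote, local = make_db(250), make_db(0)
print(copy_with_fetchmany(remote, local, 100))  # 3
print(local.execute("SELECT COUNT(*) FROM tablea").fetchone()[0])  # 250
```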

SQL Server : incrementing non identity int column by procedure call

I have a column in a DB table that has to be incremented when, let's say, an item is selected. But items can be selected in parallel, and for every record it has to start from 0. My solution is to increment the value in a stored procedure, but can I be sure that the first procedure manages to increment the value before another procedure loads the value to increment it? I mean:
t0 Value is 10
t1 Procedure1 valueToInc = Value
t2 Procedure2 valueToInc = Value
t3 Procedure1 valueToInc ++
t4 Procedure2 valueToInc ++
t5 Value = 11
t6 Value = 11
The value written back by Procedure1 is 11, but the value from Procedure2 is obviously also 11 (it needs to be 12).
I have also looked at IDENTITY (Property) and SEQUENCE (Transact-SQL), but neither seems suitable for me.
Edit
What I'm trying to solve: I have a console application (a TCP server) and an MSSQL database with a User table. Each time a user logs in, I have to increment that user's loginCount field. I know parallel logins for a single user shouldn't be possible, or could be managed in code, but I was told I have to handle parallel access in the database, not just use an UPDATE query. This is a job interview project...
I hoped my first explanation would make the problem easier to understand, but it didn't.
You can just use
UPDATE Users
SET LoginCount = ISNULL(LoginCount,0) + 1
WHERE UserId = @UserId
This is entirely safe under conditions of concurrency.
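The question's t0-t6 timeline can be replayed step by step to show why the single UPDATE is safe where the read-then-write version is not. This sketch uses Python's sqlite3 and a hypothetical one-row Users table. It uses COALESCE because SQLite has no ISNULL function; in SQL Server, either works here.

```python
import sqlite3


def make_users(start):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Users (UserId INTEGER PRIMARY KEY, LoginCount INTEGER)")
    conn.execute("INSERT INTO Users VALUES (1, ?)", (start,))
    conn.commit()
    return conn


def get_count(conn):
    return conn.execute("SELECT LoginCount FROM Users WHERE UserId = 1").fetchone()[0]


def read_then_write(start):
    """Replays the question's timeline: both procedures read before either writes."""
    conn = make_users(start)
    v1 = get_count(conn)  # t1: Procedure1 loads the value
    v2 = get_count(conn)  # t2: Procedure2 loads the same value
    conn.execute("UPDATE Users SET LoginCount = ? WHERE UserId = 1", (v1 + 1,))  # t5
    conn.execute("UPDATE Users SET LoginCount = ? WHERE UserId = 1", (v2 + 1,))  # t6
    conn.commit()
    return get_count(conn)


def atomic_increment(start):
    """Each procedure runs one UPDATE that reads and writes the row in one step."""
    conn = make_users(start)
    for _ in range(2):
        conn.execute(
            "UPDATE Users SET LoginCount = COALESCE(LoginCount, 0) + 1 WHERE UserId = 1"
        )
        conn.commit()
    return get_count(conn)


print(read_then_write(10))  # 11: Procedure2 overwrites Procedure1's increment
print(atomic_increment(10))  # 12
```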
Use a transaction with the isolation level set to SERIALIZABLE.
SERIALIZABLE
Statements cannot read data that has been modified but not yet committed by other transactions.
No other transactions can modify data that has been read by the current transaction until the current transaction completes.
Other transactions cannot insert new rows with key values that would fall in the range of keys read by any statements in the current transaction until the current transaction completes.
Don't load the value and then increment it: increment it, then select it (within the same transaction). The UPDATE locks the row (or the table, depending on indexes and lock escalation) against updates and selects from other transactions until you commit.
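Below is a sketch of "increment first, then select" inside one transaction. It uses Python's sqlite3 against a temporary file, so several threads, each with its own connection, can run it concurrently. In SQLite, BEGIN IMMEDIATE takes the database's write lock at the start of the transaction. That stands in for the row lock the UPDATE takes in SQL Server. `increment_and_get` and `run_concurrent_logins` are hypothetical names.

```python
import os
import sqlite3
import tempfile
import threading


def increment_and_get(conn, user_id):
    """Increment the counter, then read the new value, in one transaction."""
    conn.execute("BEGIN IMMEDIATE")  # waits (up to the timeout) for other writers
    try:
        conn.execute(
            "UPDATE Users SET LoginCount = COALESCE(LoginCount, 0) + 1 WHERE UserId = ?",
            (user_id,),
        )
        (value,) = conn.execute(
            "SELECT LoginCount FROM Users WHERE UserId = ?", (user_id,)
        ).fetchone()
        conn.execute("COMMIT")
        return value
    except Exception:
        conn.execute("ROLLBACK")
        raise


def run_concurrent_logins(n_threads=4, per_thread=25):
    """Return the sorted values seen by all threads; each should appear once."""
    seen, seen_lock = [], threading.Lock()
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "logins.db")
        setup = sqlite3.connect(path, isolation_level=None)  # autocommit; BEGIN is explicit
        setup.execute("CREATE TABLE Users (UserId INTEGER PRIMARY KEY, LoginCount INTEGER)")
        setup.execute("INSERT INTO Users VALUES (1, 0)")
        setup.close()

        def worker():
            conn = sqlite3.connect(path, timeout=30, isolation_level=None)
            try:
                for _ in range(per_thread):
                    value = increment_and_get(conn, 1)
                    with seen_lock:
                        seen.append(value)
            finally:
                conn.close()

        threads = [threading.Thread(target=worker) for _ in range(n_threads)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    return sorted(seen)


values = run_concurrent_logins()
print(len(values), len(set(values)))
```

Every returned value should be distinct, because no two transactions can sit between the UPDATE and the SELECT at the same time.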

Trouble with SQL Server locks

I am running into an issue where SQL Server is causing a significant number of locks (95 to 150) on our main table. They are typically short duration locks, lasting under 3 seconds, but I would like to eliminate those if I possibly can. We have also noticed that typically there are no blocks, but occasionally we have a situation where the blocks seem to "cascade" and then the entire system slows down considerably.
Background
We have up to 600 virtual machines processing data and we loaded a table in SQL so we could monitor any records that got stalled and records that were marked complete. We typically have between 200,000 and 1,000,000 records in this table during our processing.
What we are trying to accomplish
We are attempting to get the next available record (Status = 0). However, since there can be multiple hits on the stored proc simultaneously, we are trying to make sure each VM gets a unique record. This is important because processing takes between 1.5 and 2.5 minutes per record and we want to make this as clean as possible.
Our thought process to this point
UPDATE TOP (1) dbo.Test WITH (ROWLOCK)
SET Status = 1,
VMID = @VMID,
ReadCount = ReadCount + 1,
ProcessDT = GETUTCDATE()
OUTPUT INSERTED.RowID INTO @retValue
WHERE Status = 0
This update was causing us a few issues with locks, so we reworked the process a little and changed the WHERE clause to a sub-query that returns the top 1 RowID (primary key) from the table. This seemed to help things run a little more smoothly, but we still occasionally get overloaded in the database again.
UPDATE TOP (1) dbo.Test WITH (ROWLOCK)
SET Status = 1,
VMID = @VMID,
ReadCount = ReadCount + 1,
ProcessDT = GETUTCDATE()
OUTPUT INSERTED.RowID INTO @retValue
-- WHERE Status = 0
WHERE RowID IN (SELECT TOP 1 RowID FROM dbo.Test WHERE Status = 0 ORDER BY RowID)
We discovered that having a significant number of Status 1 and 2 records in the table causes slowdowns. We figured it was from a table scan on the Status column. We added the following index, but it did not solve the locking.
CREATE NONCLUSTERED INDEX IX_Test_Status_RowID
ON [dbo].[Test] ([Status])
INCLUDE ([RowID])
As the final step after the UPDATE, we use the returned RowID to select the details:
SELECT 'Test' as FileName, *, @Nick as [Nickname]
FROM Test WITH (NOLOCK)
WHERE RowID IN (SELECT id from @retValue)
Types of locks
The majority of the blocks are LCK_M_U and LCK_M_S, which I would expect with that UPDATE and SELECT query. We did have 1 or 2 LCK_M_X locks as well occasionally. That made me think we may still be getting collisions on our "unique" record code.
Questions
Are these locks and the number of locks just normal SQL operations for this type load?
Is the sub-query causing more issues than a TOP(1) in the UPDATE we started with? I am trying to get confirmation I can remove the ORDER BY statement and remove that extra step of processing.
Would a different index help? I wondered if the index updating was a possible cause of the locks initially, but now I am not sure.
Is there a better or more efficient way to get a unique RowID?
Is the WITH (ROWLOCK) causing more locks than leaving it off would cause? The idea is ROWLOCK would only lock the 1 specific record and allow another proc to update another record and select without locking the table or page.
Does anyone have any tools they recommend to stress test and run 100 queries simultaneously in order to test any potential solutions?
Sorry for all the questions, just trying to make sure I am as clear as possible on our process and the questions we have.
Thanks in advance for any insight as this is a really frustrating issue for us.
Hardware
We are running SQL Server 2008 R2 on a Dual Xeon CPU with 24 GB of RAM. So we should have plenty of horsepower for this process.
It looks like the best solution to the issue was to create a separate table with an identity column and use the @@IDENTITY from the insert to determine the next row to process. That has solved all my lock issues so far in my stress testing. Thanks to all who pointed me in the right direction!
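Here is a minimal sketch of the separate-identity-table approach, using Python's sqlite3 with hypothetical table and helper names. It assumes the work rows have contiguous RowIDs starting at 1, so the Nth ticket maps straight to RowID N and the VMs never compete for the same row. Marking the row as claimed is also an assumption, since the post only says the identity determines the next row. SQLite's `lastrowid` stands in for the identity value. In SQL Server, SCOPE_IDENTITY() is generally preferred over @@IDENTITY, because @@IDENTITY can return a value generated by a trigger in a different scope.

```python
import sqlite3


def make_demo_db(n_rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Test (RowID INTEGER PRIMARY KEY, Status INTEGER, VMID TEXT)")
    conn.execute("CREATE TABLE Ticket (TicketID INTEGER PRIMARY KEY AUTOINCREMENT, VMID TEXT)")
    conn.executemany(
        "INSERT INTO Test (RowID, Status) VALUES (?, 0)", [(i,) for i in range(1, n_rows + 1)]
    )
    conn.commit()
    return conn


def next_row(conn, vm_id):
    """Draw a ticket; its identity value is the RowID this VM should process."""
    with conn:
        ticket = conn.execute("INSERT INTO Ticket (VMID) VALUES (?)", (vm_id,)).lastrowid
        cur = conn.execute(
            "UPDATE Test SET Status = 1, VMID = ? WHERE RowID = ? AND Status = 0",
            (vm_id, ticket),
        )
    return ticket if cur.rowcount == 1 else None  # None: no rows left to hand out


conn = make_demo_db(3)
print([next_row(conn, f"VM{n}") for n in range(1, 5)])  # [1, 2, 3, None]
```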

Sql Server Ignore rowlock hint

This is a general question about how to lock a range of values (and nothing else!) when they do not exist in the table yet. The trigger for the question was that I want to do "insert if not exists"; I don't want to use MERGE because I need to support SQL Server 2005.
In the first connection I:
begin transaction
select data from the table with the (SERIALIZABLE, ROWLOCK) hints and a WHERE clause that specifies the range
wait...
In the second connection, I insert data into the table with values that do not match the WHERE clause in the first connection
I would expect that the second connection won't be affected by the first one, but it finishes only after I commit (or rollback) the first connection's transaction.
What am I missing?
Here is my test code:
First create this table:
CREATE TABLE test
(
VALUE nvarchar(100)
)
Second, open a new query window in SQL Server Management Studio and execute the following:
BEGIN TRANSACTION;
SELECT *
FROM test WITH (SERIALIZABLE,ROWLOCK)
WHERE value = N'a';
Third, open another new query window and execute the following:
INSERT INTO test VALUES (N'b');
Notice that the second query doesn't finish until the transaction in the first window ends.
You are missing an index on VALUE.
Without that SQL Server has nothing to take a key range lock on and will lock the whole table in order to lock the range.
Even when the index is added however you will still encounter blocking with the scenario in your question. The RangeS-S lock doesn't lock the specific range given in your query. Instead it locks the range between the keys either side of the selected range.
When there are no such keys either side the range lock extends to infinity. You would need to add a value between a and b (for example aa) to prevent this happening in your test and the insert of b being blocked.
See Bonus Appendix: Range Locks in this article for more about this.
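The next-key behaviour described above can be captured in a small, deliberately simplified model. It is not how SQL Server is implemented, just a way to predict the blocking in the test. The model assumes:

- an index on VALUE and an equality query under SERIALIZABLE;
- Python string ordering in place of the column's collation;
- an insert is blocked when the next existing key above the new value is one of the keys the query locked, or the open-ended range up to +infinity.

`insert_blocked` is a hypothetical helper.

```python
from bisect import bisect_left, bisect_right


def insert_blocked(index_keys, queried_value, new_key):
    """Would inserting new_key wait on a SERIALIZABLE "WHERE value = queried_value"?

    Slot i stands for the range lock on index_keys[i] (the gap below that key,
    up to and including it); slot len(index_keys) is the gap up to +infinity.
    """
    keys = sorted(index_keys)
    first = bisect_left(keys, queried_value)  # first key >= queried_value
    after = bisect_right(keys, queried_value)  # first key > queried_value
    locked = set(range(first, after + 1))  # matching keys plus the next key
    return bisect_right(keys, new_key) in locked  # slot of the next key above new_key


print(insert_blocked([], "a", "b"))  # True: the lock runs to +infinity
print(insert_blocked(["aa"], "a", "b"))  # False: "aa" bounds the locked range
print(insert_blocked(["aa"], "a", "0"))  # True: "0" falls below "aa"
```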

Modify SQL result set before returning from stored procedure

I have a simple table in my SQL Server 2008 DB:
Tasks_Table
-id
-task_complete
-task_active
-column_1
-..
-column_N
The table stores instructions for uncompleted tasks that have to be executed by a service.
I want to be able to scale my system in the future. Until now, only 1 service on 1 computer read from the table. I have a stored procedure that selects all uncompleted and inactive tasks. As the service begins to process tasks, it updates the task_active flag in all the returned rows.
To enable scaling of the system, I want to be able to deploy the service on more machines. Because I want to prevent a task from being returned to more than 1 service, I have to update the stored procedure that returns uncompleted and inactive tasks.
I figured that I have to lock the table (only 1 reader at a time; I know I have to use an appropriate ISOLATION LEVEL) and update the task_active flag in each row of the result set before returning the result set.
So my question is: how do I modify the SELECT result set in the stored procedure before returning it?
This is the typical dequeue pattern. It is implemented using the OUTPUT clause and is described on MSDN; see the Queues paragraph in OUTPUT Clause (Transact-SQL):
UPDATE TOP(1) Tasks_Table WITH (ROWLOCK, READPAST)
SET task_active = 1
OUTPUT INSERTED.id,INSERTED.column_1, ...,INSERTED.column_N
WHERE task_active = 0;
The ROWLOCK, READPAST hints allow for high throughput and high concurrency: multiple threads/processes can enqueue new tasks while multiple threads/processes dequeue tasks. There is no order guarantee.
Updated
If you want to order the result you can use a CTE:
WITH cte AS (
SELECT TOP(1) id, task_active, column_1, ..., column_N
FROM Tasks_Table WITH (ROWLOCK, READPAST)
WHERE task_active = 0
ORDER BY <order by criteria>)
UPDATE cte
SET task_active = 1
OUTPUT INSERTED.id, INSERTED.column_1, ..., INSERTED.column_N;
I discussed this and other enqueue/dequeue techniques on the article Using Tables as Queues.
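SQLite has neither READPAST nor an OUTPUT clause, so this Python sqlite3 sketch only shows the claim-and-return semantics of the ordered dequeue, not its concurrency. BEGIN IMMEDIATE makes each dequeue take the database write lock, so competing workers wait their turn instead of skipping locked rows. The column list is cut down to `id` and `column_1`, and `dequeue` is a hypothetical helper.

```python
import sqlite3


def make_queue(n_tasks):
    conn = sqlite3.connect(":memory:", isolation_level=None)  # explicit transactions
    conn.execute(
        "CREATE TABLE Tasks_Table (id INTEGER PRIMARY KEY, task_complete INTEGER DEFAULT 0, "
        "task_active INTEGER DEFAULT 0, column_1 TEXT)"
    )
    for i in range(1, n_tasks + 1):
        conn.execute("INSERT INTO Tasks_Table (column_1) VALUES (?)", (f"task {i}",))
    return conn


def dequeue(conn):
    """Mark the lowest-id inactive task active and return (id, column_1), or None."""
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT id, column_1 FROM Tasks_Table WHERE task_active = 0 ORDER BY id LIMIT 1"
        ).fetchone()
        if row is not None:
            conn.execute("UPDATE Tasks_Table SET task_active = 1 WHERE id = ?", (row[0],))
        conn.execute("COMMIT")
        return row
    except Exception:
        conn.execute("ROLLBACK")
        raise


conn = make_queue(2)
print(dequeue(conn))  # (1, 'task 1')
print(dequeue(conn))  # (2, 'task 2')
print(dequeue(conn))  # None
```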
