Row-level locking in CockroachDB

I need to take a row-level lock with FOR UPDATE and at the same time allow other SELECT queries to read rows that are not locked.
What I observed is that if I take a lock on row 1, other SELECT queries searching for different rows are blocked as well.
I have the following table schema:
CREATE TABLE lock_test(
id int NOT NULL DEFAULT unique_rowid(),
col_with_unique_index text NOT NULL,
other_col text NOT NULL,
CONSTRAINT "primary" PRIMARY KEY (id ASC),
UNIQUE INDEX unique_idx (col_with_unique_index ASC),
FAMILY "primary"(id, col_with_unique_index, other_col));
I inserted these 2 rows:
insert into lock_test(col_with_unique_index, other_col) values('val1', 'some_val');
insert into lock_test(col_with_unique_index, other_col) values('val2', 'some_val');
Opened 2 terminals -
1st terminal -
begin;
select * from lock_test where col_with_unique_index = 'val1' for update;
2nd terminal -
select * from lock_test where col_with_unique_index = 'val2';
I expected the 2nd terminal to show the result for val2, but it did not (it went into a waiting state); only after I executed COMMIT in the 1st terminal did the 2nd terminal show the result.
I then tried changing my WHERE clause from col_with_unique_index to id, which is the primary key here, and this time the 2nd terminal did not wait and displayed the expected result.
I'm unable to understand the behaviour here. Can I only take row-level locks if I have the primary key in my WHERE clause?

I suspect what happened here is that the number of rows in the table was very small, so the optimizer determined it would be cheaper to scan the primary index than to use the unique index. We have tuned these heuristics a bit lately to account for this contention. You can force a scan against the secondary index using:
select * from lock_test@unique_idx where col_with_unique_index = 'val2';
Or you can add more data and run statistics (they should also run automatically).
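A quick way to check which index the locking query plans to use, and to refresh statistics by hand, might look like this (a minimal sketch; the statistics name is arbitrary):
-- Inspect the plan for the locking query; the scan node shows which index is chosen.
EXPLAIN SELECT * FROM lock_test WHERE col_with_unique_index = 'val1' FOR UPDATE;

-- Collect fresh table statistics manually (CockroachDB also does this automatically)
-- and inspect what the optimizer currently knows about the table.
CREATE STATISTICS manual_stats FROM lock_test;
SHOW STATISTICS FOR TABLE lock_test;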

I have tested row-level locking with CockroachDB v22.1.
In terminal 1:
SELECT version();
version
----------------------------------------------------------------------------------------
CockroachDB CCL v22.1.12 (x86_64-pc-linux-gnu, built 2022/12/12 19:53:40, go1.17.11)
(1 row)
CREATE TABLE my_table (
id INT PRIMARY KEY,
number INT
);
INSERT INTO my_table(id, number)
VALUES
(1, 100),
(2, 200),
(3, 300);
BEGIN TRANSACTION;
SELECT * FROM my_table WHERE id = 1 FOR UPDATE;
-- stop here
In terminal 2:
SELECT * FROM my_table; -- It will stop here
In terminal 1:
COMMIT TRANSACTION;
In terminal 2:
The blocked query now completes and returns:
id | number
-----+---------
1 | 100
2 | 200
3 | 300
(3 rows)
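For comparison, and consistent with the behaviour the original question describes when filtering on the primary key, a point lookup on a different row should not have to wait; this is worth verifying in your own setup:
-- Terminal 2, while terminal 1 still holds the FOR UPDATE lock on id = 1:
SELECT * FROM my_table WHERE id = 2;  -- expected to return immediately; only id = 1 is locked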

Related

How to avoid SQL Server using default row estimate for missing values?

I have a table with a couple of million rows and a Status column which is the 2nd column in a non-clustered index.
The status is char(10) and contains "New", "Processing", "Processed" and "Failed".
A polling function checks for new rows:
SELECT TOP 1 ... FROM Table WHERE firstColumnInIdex = 1 AND Status = 'New' ORDER BY Id
(It is actually an update to status "Processing" and some other differences but it doesn't matter here)
The query uses the non-clustered index but the row estimate is ~30% of rows so the memory grant is in the GB-range.
My testing shows the problem is the statistics. Since there are normally no rows with status "New" in the table, "New" is not present in the statistics (which say millions of "Processed" and thousands of "Failed"). SQL Server seems to go for a default estimate, in this case ~30% of rows, if the value is not found in the statistics.
I added a row to the table with status "New" and created new statistics with FULLSCAN NORECOMPUTE. (So it becomes millions "Processed", thousands "Failed" and 1 "New")
Now the row estimate is 1 row and the query cost goes down from 82 to 6 with a small memory grant.
(dropping the statistics causes 30% again)
While this trick solves the problem, it feels like a hack which might stop working some day (e.g. some future DBA finds these outdated statistics and deletes/updates them).
Is there a better way to solve this? e.g.
using integer status instead?
Making SQL Server aware of the "New" status with a foreign key or constraint?
Version is SQL Server 2016 SP1.
One thing I find useful is a filtered index.
Assuming this is a queue and things start with status 'new', you
Select one or all of the 'new' rows (getting the PK IDs)
Act on those IDs
Update the status according to IDs
In these cases, you could create a filtered index which is basically just an up-to-date list of all the rows with status 'new'.
CREATE NONCLUSTERED INDEX ix_myindex ON [myTable]
([ID])
WHERE (Status = 'New')
Note - the index will be very 'hot', i.e., have a lot of changes (as soon as rows are no longer 'new', they get removed from the index).
However, the idea is to keep it so small that this doesn't really matter.
Make sure the index has all the fields you need to identify the relevant rows (e.g., your PK) to keep it as simple/small as possible, and see if it works.
UPDATE FOLLOWING COMMENTS
The issue is probably related to the 'ascending key problem' - feel free to research and review it.
I may have made a minor mistake above - often filtered indexes work better if you actually include the field you're filtering on. Therefore the following may be better.
CREATE NONCLUSTERED INDEX ix_myindex ON [myTable]
([ID], [Status])
WHERE (Status = 'New')
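As a rough sketch (using the table and column names from the question, and ignoring the extra firstColumnInIdex predicate for brevity), the polling query could then be as simple as:
-- With the filtered index above, this is a tiny seek because the index
-- only ever contains the rows whose status is 'New'.
SELECT TOP (1) ID
FROM [myTable]
WHERE [Status] = 'New'
ORDER BY ID;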
Regarding the approach in the solution - the idea is that we're going to completely ignore the statistics. Instead, we create a temporary table with the relevant number of rows, and that table drives the cardinality estimates.
For testing, I have a table called 'test' that has about 1.5 million rows, with an ID PK and 4 columns with UUIDs (essentially random data).
I use this to create a new table 'test2' with a status column. Approx 80% of these have status 'Processed', 10% status 'Processing', 10% status 'Failed'.
I then insert a new row with status 'New'. Note that the statistics do not update.
However, I then use the filtered index to identify the relevant rows by putting them into a temp table - and using that table for further processing.
SETUP
IF OBJECT_ID (N'test2', N'U') IS NOT NULL DROP TABLE dbo.Test2;
GO
CREATE TABLE [dbo].[test2](
[ID] [int] NOT NULL,
[Status] [varchar](12) NULL,
[col2] [varchar](100) NULL,
[col3] [varchar](100) NULL,
[col4] [varchar](100) NULL,
[col5] [varchar](100) NULL,
CONSTRAINT [PK_test2] PRIMARY KEY CLUSTERED ([ID] ASC)
);
GO
CREATE NONCLUSTERED INDEX [IX_test2_StatusNew] ON [dbo].[test2] ([ID] ASC, [Status] ASC)
WHERE ([Status]='New');
GO
INSERT INTO dbo.Test2 (ID, Status, Col2, Col3, Col4, Col5)
SELECT ID, CASE WHEN ID % 12 < 10 THEN 'Processed' WHEN ID % 12 = 10 THEN 'Processing' ELSE 'Failed' END,
Col2, Col3, Col4, Col5
FROM dbo.Test;
GO
CREATE STATISTICS [S_Status] ON [dbo].[test2]([Status]);
GO
DBCC SHOW_STATISTICS ('dbo.Test2', 'S_Status');
/*
RANGE_HI_KEY RANGE_ROWS EQ_ROWS DISTINCT_RANGE_ROWS AVG_RANGE_ROWS
Failed 0 141420 0 1
Processed 0 1417080 0 1
Processing 0 141420 0 1
*/
Here's my stored procedure - it starts with flagging the appropriate rows (changing their status to 'Processing') and recording their IDs.
The IDs are then used to process the rows in the table, and then update the status again to 'Processed'.
For brevity here, I haven't included any transactions or error-checking.
CREATE PROCEDURE UpdateTest2News
AS
BEGIN
SET NOCOUNT ON;
CREATE TABLE #IDs_to_process (ID int PRIMARY KEY);
UPDATE test2
SET Status = 'Processing'
OUTPUT deleted.ID
INTO #IDs_to_process
WHERE Status = 'New';
UPDATE test2
SET Col2 = NEWID(),
Col3 = NEWID(),
Col4 = NEWID(),
Col5 = NEWID()
FROM test2
INNER JOIN #IDs_to_Process IDs ON test2.ID = IDs.ID;
UPDATE test2
SET Status = 'Processed'
FROM test2
INNER JOIN #IDs_to_Process IDs ON test2.ID = IDs.ID;
END;
I then add a new row (with status 'New') into Test2. When checking the stats, they haven't changed (not enough changes have occurred to force an update).
SELECT TOP 1 ID FROM dbo.test2 ORDER BY ID DESC; -- Getting the latest value for next step
/* Max ID = 1699920 */
INSERT INTO dbo.Test2 (ID, Status, Col2, Col3, Col4, Col5)
SELECT 1699921, 'New', NULL, NULL, NULL, NULL;
DBCC SHOW_STATISTICS ('dbo.Test2', 'S_Status');
/* Same as above */
DBCC SHOW_STATISTICS ('dbo.Test2', 'IX_test2_StatusNew');
/* No records represented in stats */
GO
Now, the final steps
Run SET STATISTICS TIME, IO ON; to see processing stats
Also set 'Include actual execution plan' to see estimates vs actuals etc
EXEC UpdateTest2News
Here is a cleaned-up version of the stats - which are pretty darn good.
Stats summary
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 1 ms.
Table '#IDs_to_process___...________________0000000000BC'. Scan count 0, logical reads 2
Table 'test2'. Scan count 1, logical reads 7
Table 'Worktable'. Scan count 1, logical reads 5
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 14 ms.
SQL Server parse and compile time:
CPU time = 25 ms, elapsed time = 25 ms.
Table 'test2'. Scan count 0, logical reads 11
Table '#IDs_to_process________...__________0000000000BC'. Scan count 1, logical reads 2
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 593 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 1 ms.
Table 'test2'. Scan count 0, logical reads 3
Table '#IDs_to_process_____...______0000000000BC'. Scan count 1, logical reads 2
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 45 ms.
SQL Server Execution Times:
CPU time = 61 ms, elapsed time = 683 ms.
The execution plan (not reproduced here) also shows that estimates vs actuals are good.
Note - It does remember/cache the execution plans, which could turn into an issue when you have vastly different numbers of 'new' rows.
If needed, you can put OPTION (RECOMPILE) on statements 2 or 3 within the stored procedure, so it takes the new estimates of the number of rows.
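For example, statement 2 from the procedure above with the hint added would look something like this:
-- Recompile so the plan reflects the actual number of rows captured in #IDs_to_process.
UPDATE test2
SET Col2 = NEWID(),
    Col3 = NEWID(),
    Col4 = NEWID(),
    Col5 = NEWID()
FROM test2
INNER JOIN #IDs_to_Process IDs ON test2.ID = IDs.ID
OPTION (RECOMPILE);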
Also the command UPDATE STATISTICS test2 (IX_test2_StatusNew) WITH fullscan is trivial to run (as there are almost no rows in that index) if desired - that may help in your situation.

SQL Server : recreate table in appropriate order

I've deleted some records (more precisely row 4) from a table in a SQL Server database. Now the first column goes like this (1,2,3,5) without row 4:
ID Name
------------
1 Luk
2 Sky
3 Philips
5 Andrey
How can I recreate this table and insert all data again in appropriate order?
Like this:
ID Name
--------
1 Luk
2 Sky
3 Philips
4 Andrey
EDIT:
But if I have another column (Number) that is not a key, like this:
ID Number Name
------------
1 1 Luk
2 2 Sky
3 3 Philips
5 5 Andrey
Can I then renumber the Number column while keeping Name, like this:
ID Number Name
------------
1 1 Luk
2 2 Sky
3 3 Philips
5 4 Andrey
Can I do this, and if so, how?
I would make a pretty strong case for never storing this number, since it is calculated; instead, you could just create a view:
CREATE VIEW dbo.YourView
AS
SELECT ID,
Number = ROW_NUMBER() OVER(ORDER BY ID),
Name
FROM dbo.YourTable;
GO
This way after you have deleted rows, your view will already be in sync without having to perform any updates.
If you need to store the value, then almost the same query applies, but just placed inside a common table expression, which is then updated:
WITH CTE AS
( SELECT ID,
Number,
NewNumber = ROW_NUMBER() OVER(ORDER BY ID)
FROM dbo.YourTable
)
UPDATE CTE
SET Number = NewNumber;
You can use the DBCC command:
DBCC CHECKIDENT('tableName', RESEED, 0)
It will reset the identity to 0.
Note that this requires truncating the table first.
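A rough sketch of how that could look end to end, assuming a hypothetical table dbo.YourTable whose ID is an IDENTITY(1,1) column:
-- Back up the rows, truncate, and reinsert so the IDENTITY values are
-- regenerated without gaps. Assumes ID is an IDENTITY(1,1) column.
SELECT ID, Name INTO #backup FROM dbo.YourTable;
TRUNCATE TABLE dbo.YourTable;                  -- empties the table and resets the identity seed
INSERT INTO dbo.YourTable (Name)
SELECT Name FROM #backup ORDER BY ID;          -- new IDs come out as 1, 2, 3, 4, ...
DROP TABLE #backup;
-- If rows were removed with DELETE rather than TRUNCATE, reset the counter with:
-- DBCC CHECKIDENT ('dbo.YourTable', RESEED, 0);  -- next generated value will be 1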
You can make the ID column auto-increment. By default the starting value is 1, and it increments by 1 for each new record.
For example, SQL Server uses the IDENTITY keyword to auto-increment, whereas MySQL uses the AUTO_INCREMENT keyword.
MSSQL
ID int IDENTITY(1,1) PRIMARY KEY
MySQL
ID int NOT NULL AUTO_INCREMENT
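For completeness, a minimal sketch with hypothetical table and column names showing each variant in context:
-- SQL Server: IDENTITY(seed, increment) generates 1, 2, 3, ... automatically
CREATE TABLE Persons (
    ID int IDENTITY(1,1) PRIMARY KEY,
    Name varchar(100) NOT NULL
);

-- MySQL: AUTO_INCREMENT does the same, starting at 1 by default
CREATE TABLE Persons (
    ID int NOT NULL AUTO_INCREMENT,
    Name varchar(100) NOT NULL,
    PRIMARY KEY (ID)
);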

SQL Server deadlock case

I have a deadlock problem between 2 processes that insert data into the same table.
These 2 processes run exactly the same SQL statements on a table with a primary key (identity) and a unique index.
The sequence of statements is the following, for each process, inside an explicit transaction:
begin trans
select CUSTID from CUSTOMERS where CUSTNUMBER='unique value'
------- the row is never found in this case so... insert the data
insert into CUST(CUSTNUMBER) values('unique value')
------- then we must read the value generated for the pk
select CUSTID from CUSTOMERS where CUSTNUMBER='unique value'
commit
Each process works on a distinct data set; they have no common values for CUSTNUMBER.
The deadlock occurs in this case:
spid 1 : select custid... for unique value 1
spid 2 : select custid... for unique value 2
spid 1 : insert unique value 1
spid 2 : insert unique value 2
spid 2 : select custid again for value 2 <--- Deadlock Victim !
spid 1 : select custid again for value 1
The deadlock graph show that the problem occurs on the unique index on CUSTNUMBER
The killed process had a lock OwnerMode:X and was RequestMode:S on the unique index for the same HoBt ID.
The winner process was OwnerMode:X and RequestMode:S for the same HoBt ID.
I have no idea how to explain that; maybe someone can help me?
Try using OUTPUT to get rid of the final SELECT:
begin trans
select CUSTID from CUSTOMERS where CUSTNUMBER='unique value'
------- the row is never found in this case so... insert the data
insert into CUST(CUSTNUMBER) OUTPUT INSERTED.CUSTID values('unique value')
--^^^^^^^^^^^^^^^ will return a result set of CUSTIDs
commit
OR
DECLARE @x table (CUSTID int)
begin trans
select CUSTID from CUSTOMERS where CUSTNUMBER='unique value'
------- the row is never found in this case so... insert the data
insert into CUST(CUSTNUMBER) OUTPUT INSERTED.CUSTID INTO @x values('unique value')
--^^^^^^^^^^^^^^^^^^^^^^ will store a set of CUSTIDs
-- into the @x table variable
commit
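As a small follow-up to the snippet above, the captured value can then be read back from the table variable in the same batch:
select CUSTID from @x;  -- the CUSTID(s) generated by the insert above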
I have no explanation for the deadlock, only another way of doing what you are doing, using MERGE and OUTPUT. It requires SQL Server 2008 (or higher). Perhaps it will take care of your deadlock issue.
declare #dummy int;
merge CUSTOMERS as T
using (select 'unique value') as S(CUSTNUMBER)
on T.CUSTNUMBER = S.CUSTNUMBER
when not matched then
insert (CUSTNUMBER) values(S.CUSTNUMBER)
when matched then
update set #dummy = 1
output INSERTED.CUSTID;
This will return the newly created CUSTID if there was no match, and the already existing CUSTID if there was a match for CUSTNUMBER.
It would be best if you post the actual deadlock graph (the .xml file, not the picture!). Without that no one can be sure, but it is likely that you are seeing a case of the read-write deadlock that occurs due to the order in which the secondary index is read vs. updated. I cannot recommend a solution without seeing the deadlock graph and the exact table schema (clustered index and all non-clustered indexes).
On a separate note, the SELECT -> if not exists -> INSERT pattern is always wrong under concurrency; there is nothing to prevent two threads from trying to insert the same row. A much better pattern is to simply insert always and catch the duplicate key violation exception when it occurs (it is also more performant). As for your second SELECT, use the OUTPUT clause as others have already suggested. So basically this whole ordeal can be reduced to an insert in a TRY/CATCH block, as sketched below. MERGE will also work.
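A rough sketch of that insert-first pattern (using CUSTOMERS throughout, although the question's insert targets CUST, and omitting the surrounding transaction for brevity):
BEGIN TRY
    INSERT INTO CUSTOMERS (CUSTNUMBER)
    OUTPUT INSERTED.CUSTID
    VALUES ('unique value');
END TRY
BEGIN CATCH
    IF ERROR_NUMBER() IN (2601, 2627)   -- unique index / unique constraint violation
        SELECT CUSTID FROM CUSTOMERS WHERE CUSTNUMBER = 'unique value';
    ELSE
        THROW;                          -- rethrow anything else (SQL Server 2012+)
END CATCH;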
An alternative to using OUTPUT is replacing the last SELECT with a SELECT SCOPE_IDENTITY(), if the CUSTID column is an identity column.
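For example (a minimal sketch based on the question's own statements):
insert into CUST(CUSTNUMBER) values('unique value');
select scope_identity() as CUSTID;  -- identity value generated by the insert above, in this scope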

Sequential Guid and fragmentation

I'm trying to understand how a sequential GUID performs better than a regular GUID.
Is it because with a regular GUID the index uses the last byte of the GUID to sort? Since it's random, it causes a lot of fragmentation and page splits, because data often has to be moved to another page to make room for new data?
A sequential GUID, since it is sequential, will cause a lot fewer page splits and less fragmentation?
Is my understanding correct?
If anyone can shed more light on the subject, I'd appreciate it very much.
Thank you
EDIT:
Sequential GUID = NEWSEQUENTIALID()
Regular GUID = NEWID()
You've pretty much said it all in your question.
With a sequential GUID primary key, new rows are added together at the end of the table, which makes things nice and easy for SQL Server. In comparison, a random primary key means that new records can be inserted anywhere in the table. The last page of the table is fairly likely to be in the cache (since that's where all of the inserts are going), but the chance of a random page in the middle of the table being in the cache is fairly low, meaning additional IO is required.
On top of that, when inserting rows into the middle of the table there is a chance that there isn't enough room for the extra row. If that is the case, SQL Server needs to perform additional expensive IO operations (a page split) to create room for the record. The only way to avoid this is to leave gaps scattered amongst the data to allow for extra records to be inserted (controlled by the fill factor), which in itself causes performance issues because the data is spread over more pages, so more IO is required to access the entire table.
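For reference, the fill factor mentioned above is set per index; a hedged example with placeholder index and table names:
-- Leave 20% free space on each leaf page when rebuilding, so random inserts
-- are less likely to trigger page splits (at the cost of a larger index).
ALTER INDEX PK_YourTable ON dbo.YourTable
REBUILD WITH (FILLFACTOR = 80);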
I defer to Kimberly L. Tripp's wisdom on this topic:
But, a GUID that is not sequential - like one that has its values generated in the client (using .NET) OR generated by the newid() function (in SQL Server) can be a horribly bad choice - primarily because of the fragmentation that it creates in the base table but also because of its size. It's unnecessarily wide (it's 4 times wider than an int-based identity - which can give you 2 billion (really, 4 billion) unique rows). And, if you need more than 2 billion you can always go with a bigint (8-byte int) and get 2^63-1 rows.
Read more: http://www.sqlskills.com/BLOGS/KIMBERLY/post/GUIDs-as-PRIMARY-KEYs-andor-the-clustering-key.aspx#ixzz0wDK6cece
To visualize the whole picture, a utility named ostress might be used.
For example, you can create two tables: one with a normal GUID as the PK, another with a sequential GUID:
-- normal one
CREATE TABLE dbo.YourTable(
[id] [uniqueidentifier] NOT NULL,
CONSTRAINT [PK_YourTable] PRIMARY KEY NONCLUSTERED (id)
);
-- sequential one
CREATE TABLE dbo.YourTableSeq(
[id] [uniqueidentifier] NOT NULL CONSTRAINT [df_yourtable_id] DEFAULT (newsequentialid()),
CONSTRAINT [PK_YourTableSeq] PRIMARY KEY NONCLUSTERED (id)
);
Then with this utility you run a number of inserts, each followed by a query of the index fragmentation statistics:
ostress -Slocalhost -E -dYourDB -Q"INSERT INTO dbo.YourTable VALUES (NEWID()); SELECT count(*) AS Cnt FROM dbo.YourTable; SELECT AVG_FRAGMENTATION_IN_PERCENT AS AvgPageFragmentation, PAGE_COUNT AS PageCounts FROM sys.dm_db_index_physical_stats (DB_ID(), NULL, NULL , NULL, N'LIMITED') DPS INNER JOIN sysindexes SI ON DPS.OBJECT_ID = SI.ID AND DPS.INDEX_ID = SI.INDID WHERE SI.NAME = 'PK_YourTable';" -oE:\incoming\TMP\ -n1 -r10000
ostress -Slocalhost -E -dYourDB -Q"INSERT INTO dbo.YourTableSeq DEFAULT VALUES; SELECT count(*) AS Cnt FROM dbo.YourTableSeq; SELECT AVG_FRAGMENTATION_IN_PERCENT AS AvgPageFragmentation, PAGE_COUNT AS PageCounts FROM sys.dm_db_index_physical_stats (DB_ID(), NULL, NULL , NULL, N'LIMITED') DPS INNER JOIN sysindexes SI ON DPS.OBJECT_ID = SI.ID AND DPS.INDEX_ID = SI.INDID WHERE SI.NAME = 'PK_YourTableSeq';" -oE:\incoming\TMP\ -n1 -r10000
Then in file E:\incoming\TMP\query.out you will find your statistics.
My results are:
"Normal" GUID:
Records AvgPageFragmentation PageCounts
----------------------------------------------
1000 87.5 8
2000 93.75 16
3000 96.15384615384616 26
4000 96.875 32
5000 96.969696969696969 33
10000 98.571428571428584 70
Sequential GUID:
Records AvgPageFragmentation PageCounts
----------------------------------------------
1000 83.333333333333343 6
2000 63.636363636363633 11
3000 41.17647058823529 17
4000 31.818181818181817 22
5000 25.0 28
10000 12.727272727272727 55
As you can see, with sequentially generated GUIDs the index is much less fragmented, because the inserts lead to new page allocations far less often.

How to control order of Update query execution?

I have a table in MS SQL Server 2005 and would like to do:
update Table
set ID = ID + 1
where ID > 5
The problem is that ID is the primary key, and when I do this I get an error: when the query comes to the row with ID 8 it tries to change the value to 9, but there is an old row in the table with value 9, so there is a constraint violation.
Therefore I would like to control the update so that it is executed in descending order.
So not ID = 1, 2, 3, 4 and so on, but rather ID = 98574 (or whatever the maximum is), then 98573, 98572 and so on. In that order there is no constraint violation.
So how do I control the order of update execution? Is there a simple way to accomplish this programmatically?
Transact SQL defers constraint checking until the statement finishes.
That's why this query:
UPDATE mytable
SET id = CASE WHEN id = 7 THEN 8 ELSE 7 END
WHERE id IN (7, 8)
will not fail, though it swaps id's 7 and 8.
It seems that some duplicate values are left after your query finishes.
Try this:
update Table
set ID = ID * 100000 + 1
where ID > 5
update Table
set ID = ID / 100000
where ID > 500000
Don't forget the parentheses...
update Table
set ID = (ID * 100000) + 1
where ID > 5
If the IDs get too big here, you can always use a loop instead; a rough sketch follows.
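One way such a loop could look (a rough sketch, assuming ID is a plain non-identity primary key, as implied by the question's own UPDATE):
-- Shift the IDs one row at a time, starting from the highest value,
-- so each UPDATE moves a row into a slot that is guaranteed to be free.
DECLARE @id int;
SELECT @id = MAX(ID) FROM [Table] WHERE ID > 5;
WHILE @id IS NOT NULL
BEGIN
    UPDATE [Table] SET ID = ID + 1 WHERE ID = @id;
    SELECT @id = MAX(ID) FROM [Table] WHERE ID > 5 AND ID < @id;
END;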
Personally I would not update an ID field this way. I would create a work table that maps old IDs to new IDs; it stores both IDs and then all the updates are done from that. If you are not using cascade delete (which could incidentally lock your tables for a long time), then start with the child tables and work up; otherwise start with the PK table. Do not do this unless you are in single-user mode, or you can get some nasty data integrity problems if other users are changing things while the tables are not consistent with each other.
PKs are nothing to fool around with, and if at all possible they should not be changed.
Before you do any changes to production data in this way, make sure to take a full backup. Messing this up can cost you your job if you can't recover.
