Different lock behaviour when using `IN`

I am having difficulty understanding the difference in lock behaviour between the following examples, which is causing a deadlock that I have to resolve.
I use (updlock,holdlock) for the common scenario of "select first, update/delete later". It might be relevant that in this specific case what is going to happen "later" is delete.
First, let me set up a case where everything works fine.
As a "control panel query", let's create a very simple table, add some rows, and also prepare a lock selection query based on ashman786's post:
/*
create table color(id int primary key,descr nvarchar(50))
truncate table color
insert color(id,descr) values (0,'red'),(1,'green'),(2,'blue')
select * from color
*/
SELECT L.request_session_id AS SPID,
O.Name AS LockedObjectName,
L.resource_type+isnull(': '+L.resource_description,'') AS LockedResource,
L.request_mode AS LockType
FROM sys.dm_tran_locks L
JOIN sys.partitions P ON P.hobt_id = L.resource_associated_entity_id
JOIN sys.objects O ON O.object_id = P.object_id
JOIN sys.dm_exec_sessions ES ON ES.session_id = L.request_session_id
JOIN sys.dm_tran_session_transactions TST ON ES.session_id = TST.session_id
JOIN sys.dm_tran_active_transactions AT ON TST.transaction_id = AT.transaction_id
JOIN sys.dm_exec_connections CN ON CN.session_id = ES.session_id
CROSS APPLY sys.dm_exec_sql_text(CN.most_recent_sql_handle) AS ST
WHERE resource_database_id = db_id()
ORDER BY L.request_session_id
For test 1, copy-paste each of the following into a new session; each one modifies a single pk value:
select @@SPID, @@trancount
--52 is my spid
begin tran
select * from color with (updlock,holdlock) where id=1
delete color where id=1
commit
rollback
select @@SPID, @@trancount
--56 is my spid
begin tran
select * from color with (updlock,holdlock) where id=2
delete color where id=2
commit
rollback
Running the first line shows the SPID, which we conveniently write down in the following line.
If we create & insert the color table, then run the begin tran and select on both tabs, then the lock query shows:
+------+------------------+---------------------+----------+
| SPID | LockedObjectName | LockedResource | LockType |
+------+------------------+---------------------+----------+
| 52 | color | PAGE: 1:704 | IU |
| 52 | color | KEY: (8194443284a0) | U |
| 56 | color | PAGE: 1:704 | IU |
| 56 | color | KEY: (61a06abd401c) | U |
+------+------------------+---------------------+----------+
Seems fine. Now run the deletes on both sides without committing, and we get a similar result:
+------+------------------+---------------------+----------+
| SPID | LockedObjectName | LockedResource | LockType |
+------+------------------+---------------------+----------+
| 52 | color | PAGE: 1:704 | IX |
| 52 | color | KEY: (8194443284a0) | X |
| 56 | color | PAGE: 1:704 | IX |
| 56 | color | KEY: (61a06abd401c) | X |
+------+------------------+---------------------+----------+
Following this, committing works. A dataflow variation of one session doing both select and delete also works fine.
For test 2, instead of using the pk value as literal, we will use a temp table:
select @@SPID, @@trancount
--60 is my spid
begin tran
drop table if exists #t
select * into #t from color with (updlock,holdlock) where id=1
delete color where id in (select id from #t)
commit
select @@SPID, @@trancount
-- 54 is my spid
begin tran
drop table if exists #t
select * into #t from color with (updlock,holdlock) where id=2
delete color where id in (select id from #t)
commit
Running begin tran and select into on both sessions results in exactly the same lock results as before: 2xU Key locks, 2xIU Page locks. However, as soon as one of the delete statements is run (#60), the picture changes:
+------+------------------+---------------------+----------+
| SPID | LockedObjectName | LockedResource | LockType |
+------+------------------+---------------------+----------+
| 54 | color | PAGE: 1:704 | IU |
| 54 | color | KEY: (61a06abd401c) | U |
| 60 | color | PAGE: 1:704 | IX |
| 60 | color | KEY: (8194443284a0) | X |
| 60 | color | KEY: (61a06abd401c) | U |
+------+------------------+---------------------+----------+
Changing its own U/IU to X/IX is exactly what happened previously, but this time 60 also holds a U lock on the other session's key! Of course, when the other session runs its delete, a deadlock is formed for this reason. By the way, this seems to happen regardless of holdlock.
So...why this difference? Will I have to make do without the temp table in order to avoid deadlocks, or is there some workaround?

The execution plan for
delete color where id=2
is very simple
It just has a single "Clustered Index Delete" operator with an equality seek predicate on id to tell it the row to delete.
The locks taken out in this case are an IX lock on the table, an IX lock on the page containing the row, and finally an X lock on the key of the row to be deleted. No U locks are taken at all.
When using the temp table the execution plan looks as follows.
It reads each row from color in turn and acquires a U lock on it. If the semi join operator finds that there was a match for that row in the temp table the U lock is converted to an X lock and the row is deleted. Otherwise the U lock is released.
If the execution plan was driven by the temp table instead then it could avoid reading and locking unneeded rows in color.
One way of doing this would be to write the query as
DELETE color
FROM color WITH (forceseek)
WHERE id IN (SELECT #t.id
FROM #t)
The execution plan now reads the one row in #t - checks whether it exists in color (taking a U lock on just that row) and then deletes it.
As there is no constraint on #t ensuring that id is unique it removes duplicates first rather than potentially attempting to delete the same row multiple times.
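A possible variation on the rewrite above: if #t is created with a primary key on id, the optimizer knows the ids are unique and may no longer need the duplicate-removal step. This is only a minimal sketch, assuming the color table from the question. The explicit create table for #t and the alias c are illustrative choices that do not appear in the original answer, and the resulting plan and locks should be checked on your own server with the lock query from the question.

```sql
begin tran

drop table if exists #t

-- Assumption: declaring id as the primary key of #t tells the optimizer
-- that each id appears at most once.
create table #t (id int primary key)

insert #t (id)
select id
from color with (updlock, holdlock)
where id = 1

-- forceseek asks for a seek into color for each id taken from #t,
-- rather than a scan of color that U-locks every row it reads.
delete c
from color as c with (forceseek)
where c.id in (select t.id from #t as t)

commit
```

Run the statements up to and including the delete in two sessions (with id = 1 and id = 2) and inspect the locks before committing to see whether each session now touches only its own key.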

Related

Optimize SQL query Select on Select Case

I looked through some threads here that discuss query optimization, but I couldn't resolve my problem.
I need to run a query in SQL Server that uses a select case in my primary select. This is the description of the main table:
WS:
| Oid | model_code | product_code | year |
In my query, I need to select all of these columns plus an extra column that indicates whether, by some criteria, the values from my main table exist in my other table. Let me describe the other table and then explain what I mean.
TA:
| Oid | model_code | product_code | year |
Both tables have matching columns, so for example, if on my table WS I have this result:
| Oid | model_code | product_code | year |
| 1 | 13 | 123 | 2018 |
And on my TA table I have this:
| Oid | model_code | product_code | year |
| 1 | 25 | 134 | 2016 |
| 2 | 13 | 123 | 2018 |
| 3 | 67 | 582 | 2017 |
I need to print an "Exist" result on that row because the row in my main table matches exactly on these 3 column values.
So my query on that row should print something like this:
| model_code | product_code | year | Exist |
| 13 | 123 | 2018 | Yes |
The query I was trying to use to make this happen, was this:
SELECT
WS.Oid, WS.model_code, WS.product_code, Ws.year,
(SELECT
CASE
WHEN EXISTS (SELECT 1 FROM TA
WHERE TA.model_code = Ws.model_code
AND TA.product_code = Ws.product_code
AND TA.[Year] = Ws.[Year])
THEN 'Yes'
ELSE 'No'
END) as 'Exist'
FROM
Ws
And it works. The problem is that my real tables have more columns and more rows (about 960,000). For example, a query returning around 50,000 rows takes more than a minute with the select case, while the same query without the select case takes about 2 seconds, so the difference is immense.
I'm sure a faster way to achieve this exists, but I don't know how. Any recommendations?

Unless already there, an index on ta (model_code, product_code, year) might help.
CREATE INDEX ta_model_code_product_code_year
ON ta (model_code,
product_code,
year);
Though chances are that the optimizer already rewrites your query in such a way, another thing you could try is to (explicitly) rewrite the query using a left join. I assume oid is NOT NULL in ta.
SELECT ws.oid,
ws.model_code,
ws.product_code,
ws.year,
CASE
WHEN ta.oid IS NULL THEN
'No'
ELSE
'Yes'
END exist
FROM ws
LEFT JOIN ta
ON ta.model_code = ws.model_code
AND ta.product_code = ws.product_code
AND ta.year = ws.year;
With that you want the index from above, and maybe try one on ws (model_code, product_code, year) too.
CREATE INDEX ws_model_code_product_code_year
ON ws (model_code,
product_code,
year);
You might also want to play with the order of the columns in the indexes. If for a column more distinct values exist in ta, put it before a column where fewer distinct values exist in ta. But keep the order in both indexes identical, i.e. if you shift a column in the index on ta also move it in the index on ws the same way.
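One caveat with the left join rewrite: a join returns one output row per matching row in ta, whereas EXISTS only reports whether a match is present. If ta can contain the same (model_code, product_code, year) combination more than once, the join would repeat the corresponding ws row. A quick check, sketched here with the table and column names from the question, is to look for such repeats first:

```sql
-- Lists every (model_code, product_code, year) combination
-- that occurs more than once in ta.
SELECT model_code,
       product_code,
       [year],
       COUNT(*) AS occurrences
FROM ta
GROUP BY model_code,
         product_code,
         [year]
HAVING COUNT(*) > 1;
```

If this returns no rows, the left join and the EXISTS version produce the same result set. If it does return rows, keeping the EXISTS form together with the index on ta is the safer choice.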

What you want to do is join the two tables together, instead of looking for a matching record for each record. Try something like this:
SELECT
WS.model_code, WS.product_code, Ws.year,
CASE
WHEN TA.OID IS NOT NULL THEN 'Yes'
ELSE 'No'
END As 'Exist'
FROM WS LEFT OUTER JOIN TA ON
TA.model_code = Ws.model_code
AND TA.product_code = Ws.product_code
AND TA.[Year] = Ws.[Year]
That will print all of the records from the WS table, and if there's a matching record in the TA table, the 'Exist' column will say 'Yes', otherwise it will say 'No'.
This uses one query to do everything. Your original approach would do a completely separate sub-query to check the TA table, and that is creating your performance issue.
You may also want to look at putting indexes on these 3 fields in each table to make the matching go even faster.

SQL Merge Duplicate Values in Table and Related Mapping Table

I have two tables. One is the parent data table, the other is a mapping table for fulfilling a many-to-many relationship between this parent data table and the main table. My problem is that the parent and mapping tables have duplicate values that need to be merged. I can seemingly remove the duplicates from the parent table, but the mapping table needs to have the duplicate data merged in the same fashion. There is a FK and related cascading delete/update on the Mapping Table. How do I ensure the merges from the following statement also get reflected in the Mapping Table?
Before
Parent Table_A:
| ID | ProductName | MFG_ID |
|------+-------------+------------+
| 1 | ACME_123 | 123 |
| 2 | ACME_123 | 456 |
Mapping Table
| ID | MainRecordID | ParentTable.MFG_ID|
|------+--------------+-----------------------+
| 1 | 1 | 123 |
| 2 | 2 | 456 |
Desired After
Parent Table_A:
| ID | ProductName | MFG_ID|
|------+-------------+------------+
| 1 | ACME_123 | 123 |
Mapping Table
| ID | MainRecordID | ParentTable.MFG_ID|
|------+--------------+-----------------------+
| 1 | 1 | 123 |
| 2 | 2 | 123 |
Proposed Code to Merge Table_A Duplicates
MERGE Table_A
USING
(
SELECT
MIN(ID) ID,
ProductName,
MIN(MFG_ID) MFG_ID
FROM Table_A
GROUP BY ProductName
) NewData ON Table_A.ID = NewData.ID
WHEN MATCHED THEN
UPDATE SET
Table_A.ProductName = NewData.ProductName
WHEN NOT MATCHED BY SOURCE THEN DELETE;

Split it into two separate statements wrapped in an explicit transaction instead of a merge. Something like this:
declare @src table
(
Id int,
ProductName varchar(128),
MFG_ID int
)
set xact_abort on
-- one surviving row per ProductName (lowest ID)
insert into @src
select
Id = min(ID),
ProductName = ProductName,
MFG_ID = MIN(MFG_ID)
from Table_A
group by ProductName
begin tran
-- remove every row that is not a survivor
delete o
from Table_A o
where not exists
(
select 1
from @src i
where o.id = i.id
)
update t
set ProductName = s.ProductName
from Table_A t
inner join @src s
on t.Id = s.Id
commit tran
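The question also asks for the mapping table to follow the merge. With a cascading delete on the foreign key, the delete above would remove the mapping rows that point at the duplicate parents, so those rows would need to be repointed to the surviving parent before the delete runs, inside the same transaction. The following is a minimal sketch under stated assumptions: Mapping_Table and its MFG_ID column are hypothetical names standing in for the question's mapping table and its "ParentTable.MFG_ID" column, and Table_A.MFG_ID is assumed to be unique because it is the target of the foreign key.

```sql
-- Hypothetical names: Mapping_Table and Mapping_Table.MFG_ID.
-- Repoint each mapping row from a duplicate parent (dup) to the parent
-- that survives the merge (keep = lowest ID for the same ProductName).
update m
set MFG_ID = keep.MFG_ID
from Mapping_Table as m
inner join Table_A as dup
    on dup.MFG_ID = m.MFG_ID
inner join Table_A as keep
    on keep.ProductName = dup.ProductName
where keep.ID = (select min(x.ID)
                 from Table_A as x
                 where x.ProductName = dup.ProductName)
  and dup.ID <> keep.ID
```

Placed directly after begin tran, this leaves no mapping rows referencing the parents that are about to be deleted. If the mapping table enforces uniqueness on (MainRecordID, MFG_ID), rows that would collide after repointing have to be removed first.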

How is BLOB stored in an indexed view?

The Question
Assuming I make an indexed view on a table containing a varbinary(max) column, will the binary content be physically copied into the indexed view's B-Tree, or will the original fields just be "referenced" somehow, without physically duplicating their content?
In other words, if I make an indexed view on a table containing BLOBs, will that duplicate the storage needed for BLOBs?
More Details
When using a full-text index on binary data, such as varbinary(max), we need an additional "filter type" column to specify how to extract text from that binary data so it can be indexed, something like this:
CREATE FULLTEXT INDEX ON <table or indexed view> (
<data column> TYPE COLUMN <type column>
)
...
In my particular case, these fields are in different tables, and I'm trying to use indexed view to join them together, so they can be used in a full-text index.
Sure, I could copy the type field into the BLOB table and maintain it manually (keeping it synchronized with the original), but I'm wondering if I can make the DBMS do it for me automatically, which would be preferable unless there is a steep price to pay in terms of storage.
Also, merging these two tables into one would have negative consequences of its own, not to go into too much details here...

will that duplicate the storage needed for BLOBs?
Yes. The indexed view will have its own copy.
You can see this from
CREATE TABLE dbo.T1
(
ID INT IDENTITY PRIMARY KEY,
Blob VARBINARY(MAX)
);
DECLARE @vb VARBINARY(MAX) = CAST(REPLICATE(CAST('ABC' AS VARCHAR(MAX)), 1000000) AS VARBINARY(MAX));
INSERT INTO dbo.T1
VALUES (@vb),
(@vb),
(@vb);
GO
CREATE VIEW dbo.V1
WITH SCHEMABINDING
AS
SELECT ID,
Blob
FROM dbo.T1
GO
CREATE UNIQUE CLUSTERED INDEX IX
ON dbo.V1(ID)
SELECT o.NAME AS object_name,
p.index_id,
au.type_desc AS allocation_type,
au.data_pages,
partition_number,
au.total_pages,
au.used_pages
FROM sys.allocation_units AS au
JOIN sys.partitions AS p
ON au.container_id = p.partition_id
JOIN sys.objects AS o
ON p.object_id = o.object_id
WHERE o.object_id IN ( OBJECT_ID('dbo.V1'), OBJECT_ID('dbo.T1') )
Which returns
+-------------+----------+-----------------+------------+------------------+-------------+------------+
| object_name | index_id | allocation_type | data_pages | partition_number | total_pages | used_pages |
+-------------+----------+-----------------+------------+------------------+-------------+------------+
| T1 | 1 | IN_ROW_DATA | 1 | 1 | 2 | 2 |
| T1 | 1 | LOB_DATA | 0 | 1 | 1129 | 1124 |
| V1 | 1 | IN_ROW_DATA | 1 | 1 | 2 | 2 |
| V1 | 1 | LOB_DATA | 0 | 1 | 1129 | 1124 |
+-------------+----------+-----------------+------------+------------------+-------------+------------+

Prevent deleting single row in SQL Server but let other rows be deleted in same transaction

Let's say I have the following table:
PKID | UID | FKID
-----------------
1 | ABC | 1
2 | BCD | 2
3 | CDE | 2
4 | DEF | 1
5 | EFG | 3
What I want to do is block deletes with a trigger (or other way if there is a better way to do this) but only for the rows where FKID = 1 but still allow other rows to be deleted. So, if someone types DELETE FROM sampleTable I would want only rows 2, 3, and 5 to be deleted, and 1 and 4 should remain.

Try this.
CREATE TRIGGER dbo.trg_tablename_delete ON dbo.tablename
FOR DELETE
AS
SET NOCOUNT ON
IF EXISTS (SELECT * FROM deleted WHERE FKID = 1)
BEGIN
RAISERROR ('Cannot delete this record!', 0, 1) WITH NOWAIT
ROLLBACK
END
SET NOCOUNT OFF
GO
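Note that the trigger above fires after the delete and rolls back the whole transaction as soon as any row with FKID = 1 is included, so the other rows are not deleted either. To skip the protected rows while still deleting the rest, as the question asks, one option is an INSTEAD OF DELETE trigger. This is a minimal sketch, assuming the table is called dbo.sampleTable (the name used in the question's DELETE statement) with PKID as its key; the trigger name is hypothetical.

```sql
CREATE TRIGGER dbo.trg_sampleTable_instead_delete ON dbo.sampleTable
INSTEAD OF DELETE
AS
BEGIN
    SET NOCOUNT ON;

    -- "deleted" holds the rows the original statement targeted.
    -- Re-issue the delete only for rows that are not protected.
    DELETE t
    FROM dbo.sampleTable AS t
    INNER JOIN deleted AS d
        ON d.PKID = t.PKID
    WHERE d.FKID <> 1
       OR d.FKID IS NULL;
END
GO
```

With this in place, DELETE FROM sampleTable would remove rows 2, 3 and 5 from the sample data and leave rows 1 and 4 untouched. The delete inside the trigger does not fire the INSTEAD OF trigger again. One limitation to be aware of: SQL Server does not allow an INSTEAD OF DELETE trigger on a table that has a foreign key defined with a cascading delete action.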

SQL Server insertion performance

Let's suppose I have the following table with a clustered index on a column (say, a)
CREATE TABLE Tmp
(
a int,
constraint pk_a primary key clustered (a)
)
Then, let's assume that I have two sets of a very large number of rows to insert to the table.
1st set) values are sequentially increasing (i.e., {0,1,2,3,4,5,6,7,8,9,..., 999999997, 999999998, 999999999})
2nd set) values are sequentially decreasing (i.e., {999999999,999999998,999999997, ..., 3,2,1,0})
Do you think there would be a performance difference between inserting the values in the first set and the second set? If so, why?
Thanks

SQL Server will generally try and sort large inserts into clustered index order prior to insert anyway.
If the source for the insert is a table variable however then it will not take account of the cardinality unless the statement is recompiled after the table variable is populated. Without this it will assume the insert will only be one row.
The below script demonstrates three possible scenarios.
1. The insert source is already exactly in correct order.
2. The insert source is exactly in reversed order.
3. The insert source is exactly in reversed order but OPTION (RECOMPILE) is used so SQL Server compiles a plan suited for inserting 1,000,000 rows.
Execution Plans
The third one has a sort operator to get the inserted values into clustered index order first.
/*Create three separate identical tables*/
CREATE TABLE Tmp1(a int primary key clustered (a))
CREATE TABLE Tmp2(a int primary key clustered (a))
CREATE TABLE Tmp3(a int primary key clustered (a))
DBCC FREEPROCCACHE;
GO
DECLARE @Source TABLE (N INT PRIMARY KEY (N ASC))
INSERT INTO @Source
SELECT TOP (1000000) ROW_NUMBER() OVER (ORDER BY (SELECT 0))
FROM sys.all_columns c1, sys.all_columns c2, sys.all_columns c3
SET STATISTICS TIME ON;
PRINT 'Tmp1'
INSERT INTO Tmp1
SELECT TOP (1000000) N
FROM @Source
ORDER BY N
PRINT 'Tmp2'
INSERT INTO Tmp2
SELECT TOP (1000000) 1000000 - N
FROM @Source
ORDER BY N
PRINT 'Tmp3'
INSERT INTO Tmp3
SELECT 1000000 - N
FROM @Source
ORDER BY N
OPTION (RECOMPILE)
SET STATISTICS TIME OFF;
Verify Results and clean up
SELECT object_name(object_id) AS name,
page_count,
avg_fragmentation_in_percent,
fragment_count,
avg_fragment_size_in_pages
FROM
sys.dm_db_index_physical_stats(db_id(), object_id('Tmp1'), 1, NULL, 'DETAILED')
WHERE index_level = 0
UNION ALL
SELECT object_name(object_id) AS name,
page_count,
avg_fragmentation_in_percent,
fragment_count,
avg_fragment_size_in_pages
FROM
sys.dm_db_index_physical_stats(db_id(), object_id('Tmp2'), 1, NULL, 'DETAILED')
WHERE index_level = 0
UNION ALL
SELECT object_name(object_id) AS name,
page_count,
avg_fragmentation_in_percent,
fragment_count,
avg_fragment_size_in_pages
FROM
sys.dm_db_index_physical_stats(db_id(), object_id('Tmp3'), 1, NULL, 'DETAILED')
WHERE index_level = 0
DROP TABLE Tmp1, Tmp2, Tmp3
STATISTICS TIME ON results
+------+----------+--------------+
| | CPU Time | Elapsed Time |
+------+----------+--------------+
| Tmp1 | 6718 ms | 6775 ms |
| Tmp2 | 7469 ms | 7240 ms |
| Tmp3 | 7813 ms | 9318 ms |
+------+----------+--------------+
Fragmentation Results
+------+------------+------------------------------+----------------+----------------------------+
| name | page_count | avg_fragmentation_in_percent | fragment_count | avg_fragment_size_in_pages |
+------+------------+------------------------------+----------------+----------------------------+
| Tmp1 | 3345 | 0.448430493 | 17 | 196.7647059 |
| Tmp2 | 3345 | 99.97010463 | 3345 | 1 |
| Tmp3 | 3345 | 0.418535127 | 16 | 209.0625 |
+------+------------+------------------------------+----------------+----------------------------+
Conclusion
In this case all three of them ended up using exactly the same number of pages. However Tmp2 is 99.97% fragmented compared with only 0.4% for the other two. The insert to Tmp3 took the longest as this required an additional sort step first but this one time cost needs to be set against the benefit to future scans against the table of minimal fragmentation.
The reason why Tmp2 is so heavily fragmented can be seen from the below query
WITH T AS
(
SELECT TOP 3000 file_id, page_id, a
FROM Tmp2
CROSS APPLY sys.fn_PhysLocCracker(%%physloc%%)
ORDER BY a
)
SELECT file_id, page_id, MIN(a), MAX(a)
FROM T
group by file_id, page_id
ORDER BY MIN(a)
With zero logical fragmentation the page with the next highest key value would be the next highest page in the file but the pages are exactly in the opposite order of what they are supposed to be.
+---------+---------+--------+--------+
| file_id | page_id | Min(a) | Max(a) |
+---------+---------+--------+--------+
| 1 | 26827 | 0 | 143 |
| 1 | 26826 | 144 | 442 |
| 1 | 26825 | 443 | 741 |
| 1 | 26824 | 742 | 1040 |
| 1 | 26823 | 1041 | 1339 |
| 1 | 26822 | 1340 | 1638 |
| 1 | 26821 | 1639 | 1937 |
| 1 | 26820 | 1938 | 2236 |
| 1 | 26819 | 2237 | 2535 |
| 1 | 26818 | 2536 | 2834 |
| 1 | 26817 | 2835 | 2999 |
+---------+---------+--------+--------+
The rows arrived in descending order so for example values 2834 to 2536 were put into page 26818 then a new page was allocated for 2535 but this was page 26819 rather than page 26817.
One possible reason why the insert to Tmp2 took longer than the insert to Tmp1: because the rows arrive in exactly reverse order, every insert to Tmp2 means the slot array on the page needs to be rewritten, with all previous entries moved up to make room for the new arrival.

To answer this question, you only need to look up what effect clustering has on data and the manner in which it is logically ordered. By clustering ascending, higher numbers get added on to the end of the table; inserts will be very fast. When inserting in reverse, it will be inserted in between two other records (read up on page splitting); this will result in slower inserts. This actually has other negative effects as well (read up on fill factor).

It has to do with allocating pages sequentially as is done for a clustered index. With the first they would naturally cluster together. But in the second, I think you would have to keep moving the page locations to have them sequentially ascending. However, I really only understand SQL server at a conceptual level, so you'd have to test.
