I am querying a table in SQL Server DB that gets continuous inserts from other sources. The SELECT statement used to read the data from this table is used in my ETL job and it queries only a selected partition in the table.
SELECT *
FROM REALTIMESRC
WHERE PARTITION = '2018-11';
I understand that a SELECT statement by default takes a shared lock on the rows that it selects.
When this table gets inserts from other sources in the same partition that I am querying, are the inserts impacted by my SELECT operation?
I am presuming that the shared lock taken by the SELECT statement applies at the row level and doesn't apply to new inserts which happen in parallel. Can someone please clarify this?
I understand that SELECT statement by default introduces a shared lock on the rows that it selects.
That is correct, yes.
When this table gets inserts from other sources in the same partition
where I am querying, does data insert get impacted due to my Select operation?
No, since the insert only introduces new rows that you haven't selected, there shouldn't be any problem.
I am presuming that the shared lock taken by the SELECT statement applies at the row level and doesn't apply to new inserts which happen in parallel.
Yes, that is correct - the INSERT and SELECT should work just fine in parallel.
There might be some edge cases where you could run into trouble:
if the INSERT statement tries to insert more than 5000 rows in a single transaction, SQL Server might opt to escalate those 5000 individual locks into a table-level exclusive lock - at which point no more SELECT operations would be possible until the INSERT transaction completes
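To check whether the ETL SELECT and the inserts actually block each other, one option is to look at the currently blocked requests while both are running. This is only a minimal sketch; it assumes you have permission to read the server's dynamic management views.

```sql
-- Lists requests that are currently waiting on another session.
SELECT r.session_id,
       r.blocking_session_id,  -- session holding the conflicting lock
       r.wait_type,            -- e.g. LCK_M_S for a waiting reader, LCK_M_IX for a waiting insert
       r.wait_time,            -- milliseconds spent waiting so far
       r.wait_resource
FROM sys.dm_exec_requests AS r
WHERE r.blocking_session_id <> 0;
```

An empty result while both workloads are active suggests that they are not blocking each other at that moment.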
A common case for DB transactions is performing operations on multiple tables, as you can then easily rollback all operations if one fails. However, a common scenario I run into is wanting to insert records to multiple tables where the later inserts need the serial ID from the previous inserts.
Since the ID is not generated/available until the transaction is actually committed, how can one accomplish this? If you have to commit after the first insert in order to get the ID and then execute the second insert, it seems to defeat the purpose of the transaction in the first place because after committing (or if I don't use a transaction at all) I cannot rollback the first insert if the second insert fails.
This seems like such a common use case for DB transactions that I can't imagine it would not be supported in some way. How can this be accomplished?
A CTE (common table expression) with data-modifying statements should cover your need; see the manual.
Typical example:
WITH cte AS (INSERT INTO table_A (id) VALUES ... RETURNING id)
INSERT INTO table_B (id) SELECT id FROM cte
see the demo in dbfiddle
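A fuller version of the same idea, as a sketch. It assumes PostgreSQL (the RETURNING clause and the serial type suggest it), and the tables orders and order_items and their columns are hypothetical names used only for illustration.

```sql
-- Hypothetical schema, for illustration only.
CREATE TABLE orders (
    id            serial PRIMARY KEY,
    customer_name text NOT NULL
);
CREATE TABLE order_items (
    order_id integer NOT NULL REFERENCES orders (id),
    product  text NOT NULL
);

BEGIN;
WITH new_order AS (
    INSERT INTO orders (customer_name)
    VALUES ('Alice')
    RETURNING id            -- the generated id is available before COMMIT
)
INSERT INTO order_items (order_id, product)
SELECT id, 'widget'
FROM new_order;
COMMIT;  -- ROLLBACK here instead would undo both inserts
```

Both inserts run inside one transaction, so a failure in the second insert leaves nothing behind from the first.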
If I am performing SELECTS on a table, is it possible that a SELECT query on a table can block INSERTS into the same table?
A SELECT will take a shared lock on the rows, but it shouldn't affect inserts, correct?
The query uses a LIKE clause - will this block an insert? Does it have the potential?
SELECT * FROM USERS WHERE Description LIKE '%HELLO%'
Reference:
I read this response SQL Server SELECT statements causing blocking and I am confused how this would block an insert.
A SELECT will take a shared lock on the rows, but it shouldn't affect inserts, correct?
No, that's not quite right.
When you run a SELECT, it can acquire shared locks on pages and even on the whole table.
You can test this yourself by using the PAGLOCK or TABLOCK hints (of course you should use REPEATABLE READ or SERIALIZABLE to see them, as all shared locks in READ COMMITTED are released as soon as they are no longer needed).
The same situation can be modelled this way:
if object_id('dbo.t') is not null drop table dbo.t;
select top 10000 cast(row_number() over(order by getdate()) as varchar(10)) as col
into dbo.t
from sys.columns c1
cross join sys.columns c2;
set transaction isolation level serializable;
begin tran
select *
from dbo.t
where col like '%0%';
select resource_type,
request_mode,
count(*) as cnt
from sys.dm_tran_locks
where request_session_id = @@spid
group by resource_type,
request_mode;
Here you see the result of lock escalation: my query wanted more than 5000 locks for one statement, so instead of those the server took a single lock, a shared lock on the table.
Now if you try to insert any row into this table from another session, your INSERT will be blocked.
This is because any insert first needs to acquire IX on the table and IX on a page, but IX on a table is incompatible with S on the same table, so it will be blocked.
This way your select could block your insert.
To see exactly what happens on your server, you should query sys.dm_tran_locks filtered by both session IDs (the one running the SELECT and the one running the INSERT).
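Such a check could look roughly like this. It is a sketch only: 53 and 54 are placeholder session IDs standing in for the reading and the inserting session.

```sql
-- Run from a third session while the INSERT is waiting.
-- 53 and 54 are placeholder session IDs; substitute your own.
SELECT request_session_id,
       resource_type,
       request_mode,
       request_status   -- GRANT for locks that are held, WAIT for blocked requests
FROM sys.dm_tran_locks
WHERE request_session_id IN (53, 54)
ORDER BY request_session_id, resource_type;
```

In the scenario above you would expect a granted S lock on the OBJECT resource for the reader and a waiting IX request on the same resource for the insert.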
General info - this is called SQL Server Concurrency and in SQL Server you will find two models:
Pessimistic;
Optimistic.
Answering your question - yes, you can block any insert during read and this is called "Pessimistic Concurrency". However, this model comes with specific properties and you have to be careful because:
Data being read is locked, so that no other user can modify the data;
Data being modified is locked, so that no other user can read or modify the data;
The number of locks acquired is high because every data access operation (read/write) acquires a lock;
Writers block readers and other writers. Readers block writers.
The point is that you should use Pessimistic Concurrency only if the locks are held for a short period of time and only if the cost of each lock is lower than rolling back the transaction in case of a conflict, as Neeraj said.
I would recommend reading more about the isolation levels that apply the Pessimistic and Optimistic models here.
EDIT - I found a very detailed explanation about isolation levels on Stack, here.
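As an illustration of moving from the pessimistic to the optimistic model, SQL Server lets READ COMMITTED readers use row versions instead of shared locks. This is a hedged sketch: MyDatabase is a placeholder name, and the ALTER may have to wait until other connections to the database are closed.

```sql
-- MyDatabase is a placeholder name.
-- Makes READ COMMITTED readers use row versions instead of shared locks.
ALTER DATABASE MyDatabase SET READ_COMMITTED_SNAPSHOT ON;

-- Check the current setting.
SELECT name, is_read_committed_snapshot_on
FROM sys.databases
WHERE name = 'MyDatabase';
```

With this option on, plain SELECT statements no longer block writers or get blocked by them, at the cost of keeping row versions in tempdb.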
I know that NOLOCK is the default for SELECT operations. So even if I don't write the WITH (NOLOCK) hint for a SELECT query, the row won't be locked.
I couldn't find what happens if WITH (ROWLOCK) is not specified for an UPDATE or DELETE query. Is there a difference between the queries below?
UPDATE MYTABLE set COLUMNA = 'valueA';
and
UPDATE MYTABLE WITH (ROWLOCK) set COLUMNA = 'valueA';
If there is no hint, the database engine chooses the lock mode as a function of the operation (select/modify), the isolation level, the granularity, and the possibility of escalating the granularity level. Specifying ROWLOCK does not guarantee that the engine will take exclusive (X) locks on rows only. In general, this is a very large topic for such a broad question.
Read about lock modes first: https://technet.microsoft.com/en-us/library/ms175519(v=sql.105).aspx
In statement 1 (without ROWLOCK) the DBMS chooses the lock granularity itself and may lock the entire table or the page that contains the record being updated. That means that while the row is being updated, all or some of the other rows in the table may be locked and cannot be updated or deleted.
Statement 2 (WITH (ROWLOCK)) suggests that the DBMS lock only the record that is being updated. But beware that this is just a HINT and there is no guarantee that it will be accepted by the DBMS.
So even if I don't write the WITH (NOLOCK) hint for a SELECT query, the row won't be locked.
By default, SELECT queries do take a lock. It is called a shared lock, and how long it is held depends on your isolation level.
Is there a difference between below queries?
UPDATE MYTABLE set COLUMNA = 'valueA';
and
UPDATE MYTABLE WITH (ROWLOCK) set COLUMNA = 'valueA';
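Suppose your first statement needs more than 5000 locks: the locks will be escalated to the table level. With ROWLOCK, SQL Server is asked to take row locks instead of locking the whole table, although, being a hint, it does not rule out escalation.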
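One way to see what either form of the UPDATE really locks on your system is to run it inside a transaction and count the locks the session holds before rolling back. A minimal sketch, reusing the MYTABLE example from the question:

```sql
BEGIN TRAN;

UPDATE MYTABLE WITH (ROWLOCK) SET COLUMNA = 'valueA';

-- Locks held by this session, grouped by granularity and mode.
SELECT resource_type, request_mode, COUNT(*) AS cnt
FROM sys.dm_tran_locks
WHERE request_session_id = @@SPID
GROUP BY resource_type, request_mode;

ROLLBACK;  -- undo the test update
```

Running it once with and once without the hint shows whether the result is many RID/KEY locks or a single OBJECT lock.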
When executing a SELECT statement with a JOIN of two tables SQL Server seems to
lock both tables of the statement individually. For example by a query like
this:
SELECT ...
FROM
table1
LEFT JOIN table2
ON table1.id = table2.id
WHERE ...
I found out that the order of the locks depends on the WHERE condition. The
query optimizer tries to produce an execution plan that only reads as many
rows as necessary. So if the WHERE condition contains a column of table1
it will first get the result rows from table1 and then get the corresponding
rows from table2. If the column is from table2 it will do it the other way
round. More complex conditions or the use of indexes may have an effect on
the decision of the query optimizer too.
When the data read by a statement should be updated later in the transaction
with UPDATE statements it is not guaranteed that the order of the UPDATE
statements matches the order that was used to read the data from the 2 tables.
If another transaction tries to read data while a transaction is updating the
tables it can cause a deadlock when the SELECT statement is executed in
between the UPDATE statements because neither the SELECT can get the lock on
the first table nor can the UPDATE get the lock on the second table. For
example:
T1: SELECT ... FROM ... JOIN ...
T1: UPDATE table1 SET ... WHERE id = ?
T2: SELECT ... FROM ... JOIN ... (locks table2, then blocked by lock on table1)
T1: UPDATE table2 SET ... WHERE id = ?
Both tables represent a type hierarchy and are always loaded together. So it
makes sense to load an object using a SELECT with a JOIN. Loading both tables
individually would not give the query optimizer a chance to find the best
execution plan. But since UPDATE statements can only update one table at a
time this can cause deadlocks when an object is loaded while the object
is updated by another transaction. Updates of objects often cause UPDATEs on
both tables when properties of the object that belong to different types of the
type hierarchy are updated.
I have tried to add locking hints to the SELECT statement, but that does not
change the problem. It just causes the deadlock in the SELECT statements when
both statements try to lock the tables and one SELECT statement gets the lock
in the opposite order of the other statement. Maybe it would be possible to
load data for updates always with the same statement forcing the locks to be
in the same order. That would prevent a deadlock between two transactions that
want to update the data, but would not prevent a deadlock with a transaction
that only reads data and needs to use different WHERE conditions.
The only workaround for this so far seems to be that reads do not take locks
at all. With SQL Server 2005 this can be done using SNAPSHOT ISOLATION. The
only way for SQL Server 2000 would be to use the READ UNCOMMITTED isolation
level.
I would like to know if there is another possibility to prevent SQL Server
from causing these deadlocks.
This will never happen under snapshot isolation, when readers do not block writers. Other than that, there is no way to prevent such things. I wrote a lot of repro scripts here: Reproducing deadlocks involving only one table
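For reference, turning on and using snapshot isolation looks roughly like this. It is only a sketch: MyDatabase is a placeholder name, and the SELECT stands in for the join from the question.

```sql
-- Run once per database; MyDatabase is a placeholder name.
ALTER DATABASE MyDatabase SET ALLOW_SNAPSHOT_ISOLATION ON;

-- In the reading session:
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRAN;

-- Reads row versions and takes no shared locks on table1 or table2.
SELECT t1.id
FROM table1 AS t1
LEFT JOIN table2 AS t2
    ON t1.id = t2.id;

COMMIT;
```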
Edit:
I don't have access to SQL 2000, but I would try to serialize access to the object by using sp_getapplock, so that reads and modifications never run concurrently. If you cannot use sp_getapplock, roll your own mutex.
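A sketch of that serialization with sp_getapplock follows. The resource name 'object-42' is a hypothetical naming scheme for "the object being loaded or updated"; both the reading and the updating transactions would have to request the same name.

```sql
BEGIN TRAN;

DECLARE @result int;
EXEC @result = sp_getapplock
     @Resource    = 'object-42',    -- hypothetical name identifying the object
     @LockMode    = 'Exclusive',
     @LockOwner   = 'Transaction',  -- released automatically at COMMIT/ROLLBACK
     @LockTimeout = 5000;           -- milliseconds

IF @result < 0   -- negative return values mean the lock was not granted
BEGIN
    ROLLBACK;
    RAISERROR('Could not acquire application lock.', 16, 1);
    RETURN;
END;

-- Read or update table1 and table2 here.

COMMIT;
```

This trades concurrency for safety: only one transaction at a time works on a given object.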
Another way to fix this is to split the SELECT ... FROM ... JOIN into multiple SELECT statements. Set the isolation level to READ COMMITTED. Use a table variable to pipe the data from one SELECT into the next one it would have been joined to. Use DISTINCT to filter down the inserts into these table variables.
So say I have two tables, A and B. I'm inserting/updating into A and then B, whereas the query optimizer prefers to read B first and then A. I'll split the single SELECT into two SELECTs. First I'll read B, then pass this data on to the next SELECT statement, which reads A.
Here a deadlock won't happen because the read locks on table B are released as soon as the first statement is done.
PS: I've faced this issue and this worked very well. Much better than my FORCE ORDER answer.
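The split described above might look like this. It is a sketch under assumptions: tables A and B share an id column, and the filter column some_column is hypothetical.

```sql
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- Step 1: read B on its own; its shared locks are released when the statement ends.
DECLARE @b_rows TABLE (id int PRIMARY KEY);

INSERT INTO @b_rows (id)
SELECT DISTINCT b.id
FROM B AS b
WHERE b.some_column = 'some value';   -- hypothetical filter

-- Step 2: read A, joined to the ids collected above instead of to B.
SELECT a.*
FROM A AS a
INNER JOIN @b_rows AS r
    ON r.id = a.id;
```

Because no statement holds locks on A and B at the same time, the reader cannot take part in a lock-order cycle with a writer that updates A and then B.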
I was facing the same issue. Using the query hint FORCE ORDER will fix it. The downside is that you won't be able to leverage the best plan the query optimizer has for your query, but this will prevent the deadlock.
So (this is from the user "Bill the Lizard") if you have a query FROM table1 LEFT JOIN table2 and your WHERE clause only contains columns from table2, the execution plan will normally first select the rows from table2 and then look up the rows from table1. With a small result set from table2, only a few rows from table1 have to be fetched. With FORCE ORDER, all rows from table1 have to be fetched first, because it has no WHERE clause; then the rows from table2 are joined and the result is filtered using the WHERE clause, which degrades performance.
But if you know this won't be the case, use this. You might want to optimize the query manually.
The syntax is
SELECT ...
FROM
table1
LEFT JOIN table2
ON table1.id = table2.id
WHERE ...
OPTION (FORCE ORDER)
My kindergarten SQL Server taught me that a trigger may be fired with multiple rows in the inserted and deleted pseudo tables. I mostly write my trigger code with this in mind, often resulting in some cursor-based kludge. Now I'm really only able to test them firing for a single row at a time. How can I generate a multirow trigger, and will SQL Server actually ever send a multirow trigger? Can I set a flag so that SQL Server will only fire single-row triggers?
Trigger definitions should always handle multiple rows.
Taken from SQLTeam:
-- BAD Trigger code following:
CREATE TRIGGER trg_Table1
ON Table1
For UPDATE
AS
DECLARE @var1 int, @var2 varchar(50)
SELECT @var1 = Table1_ID, @var2 = Column2
FROM inserted
UPDATE Table2
SET SomeColumn = @var2
WHERE Table1_ID = @var1
The above trigger will only work for the last row in the inserted table.
This is how you should implement it:
CREATE TRIGGER trg_Table1
ON Table1
FOR UPDATE
AS
UPDATE t2
SET SomeColumn = i.SomeColumn
FROM Table2 t2
INNER JOIN inserted i
ON t2.Table1_ID = i.Table1_ID
Yes, if a statement affects more than one row, it should be handled by a single trigger call, as you might want to revert the whole transaction. It is not possible to split it to separate trigger calls logically and I don't think SQL Server provides such a flag. You can make SQL Server call your trigger with multiple rows by issuing an UPDATE or DELETE statement that affects multiple rows.
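To test a trigger with several rows at once, a single statement that touches several rows is enough. A minimal sketch, assuming Table1 and trg_Table1 from the example above exist; the key values 1, 2 and 3 are placeholders.

```sql
BEGIN TRAN;

-- One statement, several rows: the trigger fires once with all of them in "inserted".
UPDATE Table1
SET Column2 = Column2            -- no-op change, still fires the UPDATE trigger
WHERE Table1_ID IN (1, 2, 3);    -- placeholder key values

-- Inspect Table2 here to confirm that every matching row was handled.

ROLLBACK;  -- leave the test data untouched
```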
First, it concerns me that you are making the triggers handle multiple rows by using a cursor. Do not do that! Use a set-based statement instead, joining to the inserted or deleted pseudotables. Someone put one of those cursor-based triggers on our database before I came to work here. It took over forty minutes to handle a 400,00 record insert (and I often have to do inserts of over 100,000 records to this table for one client). Changing it to a set-based solution changed the time to less than a minute. While all triggers must be capable of handling multiple rows, you must not do so by creating a performance nightmare.
If you can write a SELECT statement for the cursor, you can write a set-based INSERT, UPDATE or DELETE based on the same SELECT statement.
I've always written my triggers to handle multiple rows. It was my understanding that if a single query inserted/updated/deleted multiple rows, then only one trigger would fire, and as such you would have to use a cursor to move through the records one by one.
One SQL statement always invokes one trigger execution - that's part of the definition of a trigger. (It's also a circumstance that seems to trip up everyone who writes a trigger at least once.) I believe you can discover how many records are being affected by inspecting @@ROWCOUNT.
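As an alternative to @@ROWCOUNT, which other statements in the trigger can overwrite, the rows can be counted directly in the inserted pseudo table. The trigger below is a hypothetical diagnostic sketch on the Table1 example used earlier, not something to keep in production.

```sql
-- Hypothetical diagnostic trigger: reports how many rows one call received.
CREATE TRIGGER trg_Table1_RowCount
ON Table1
FOR UPDATE
AS
BEGIN
    DECLARE @rows int;

    SELECT @rows = COUNT(*) FROM inserted;

    PRINT 'Rows in this trigger call: ' + CAST(@rows AS varchar(10));
END;
```

An UPDATE that affects several rows should then print a single line with the full row count, showing one trigger execution per statement.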