dirty reads: Different results within single query? - sql-server

In SQL Server 2014, when I issue the following SQL:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
SELECT * FROM TableA
UNION ALL
SELECT * FROM TableB
WHERE NOT EXISTS (
SELECT 1 FROM TableA WHERE TableA.ID = TableB.ID
)
Is it possible to read different versions of one table even within a single statement because of dirty reads?
Example: reading 2 rows from TableA in the first part of the union but reading just 1 row from TableA in the inner select of the second part, because another transaction deleted a row in the meantime.

Short answer: yes, depending on the execution plan generated. It doesn't matter that you're doing it in a single statement; no special privileges are associated with a statement boundary. READ UNCOMMITTED means no locking on data for any reason, and that's exactly what you'll get. This is also why using it is generally very much not recommended; it's terribly easy to get inconsistent/"impossible" results. Heck, even a single SELECT is not safe: you're not even guaranteed that rows will not be skipped or duplicated!
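To make that concrete, here is a minimal two-session sketch (using the TableA from the question) of how READ UNCOMMITTED can observe a change that is later rolled back:
-- Session 1: delete a row but leave the transaction open.
BEGIN TRANSACTION;
DELETE FROM TableA WHERE ID = 2;

-- Session 2: a dirty read already sees the uncommitted delete.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT COUNT(*) FROM TableA;

-- Session 1: undo the delete; session 2 acted on data that "never existed".
ROLLBACK;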

Seems to me it's definitely possible.
The query execution plan will look something like this (plan image not reproduced here): there are two separate reads from TableA, so it really depends on the time delay between them and the volume of concurrent modifications to those tables.
READ UNCOMMITTED is really not a great choice for such a query.

Related

How to use results from first query in later queries within a DB Transaction

A common case for DB transactions is performing operations on multiple tables, as you can then easily roll back all operations if one fails. However, a common scenario I run into is wanting to insert records into multiple tables where the later inserts need the serial ID from the previous inserts.
Since the ID is not generated/available until the transaction is actually committed, how can one accomplish this? If you have to commit after the first insert in order to get the ID and then execute the second insert, it seems to defeat the purpose of the transaction in the first place, because after committing (or if I don't use a transaction at all) I cannot roll back the first insert if the second insert fails.
This seems like such a common use case for DB transactions that I can't imagine it would not be supported in some way. How can this be accomplished?
A CTE (common table expression) with data-modifying statements should cover your need; see the PostgreSQL manual.
Typical example:
WITH cte AS (INSERT INTO table_A (id) VALUES ... RETURNING id)
INSERT INTO table_B (id) SELECT id FROM cte
See the demo in dbfiddle.
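For SQL Server, a comparable sketch uses the OUTPUT clause; the generated ID is available inside the still-open transaction, before any COMMIT. Table and column names here are hypothetical (table_A with an IDENTITY column id, table_B referencing it via a_id):
BEGIN TRANSACTION;

DECLARE @NewIds TABLE (id INT);

-- Capture the identity values generated by the first insert.
INSERT INTO table_A (name)
OUTPUT inserted.id INTO @NewIds (id)
VALUES ('example');

-- Use the captured IDs in the dependent insert.
INSERT INTO table_B (a_id)
SELECT id FROM @NewIds;

COMMIT TRANSACTION;  -- both inserts succeed or roll back together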

Which type of locking mode for INSERT, UPDATE or DELETE operations in SQL Server?

I know that NOLOCK is the default for SELECT operations. So even if I don't write the WITH (NOLOCK) hint on a SELECT query, the row won't be locked.
I couldn't find what happens if WITH (ROWLOCK) is not specified for an UPDATE or DELETE query. Is there a difference between the queries below?
UPDATE MYTABLE set COLUMNA = 'valueA';
and
UPDATE MYTABLE WITH (ROWLOCK) set COLUMNA = 'valueA';
If there is no hint, the database engine chooses the lock mode as a function of the operation (select/modify), the isolation level, the granularity, and the possibility of escalating the granularity level. Specifying ROWLOCK does not give a 100% guarantee that the lock will actually be an exclusive (X) lock on individual rows. In general, this is a very large topic for such a broad question.
First read about lock modes: https://technet.microsoft.com/en-us/library/ms175519(v=sql.105).aspx
In statement 1 (without ROWLOCK), the DBMS may decide to lock the entire table, or the page that the record being updated is in. That means that while the row is being updated, all or a number of other rows in the table are locked and cannot be updated or deleted.
Statement 2 (WITH (ROWLOCK)) suggests that the DBMS only lock the record being updated. But be aware that this is just a HINT and there is no guarantee that it will be honored by the DBMS.
So even if I don't write the WITH (NOLOCK) hint on a SELECT query, the row won't be locked.
SELECT queries do take locks by default: a shared lock, whose duration depends on your isolation level.
Is there a difference between the queries below?
UPDATE MYTABLE set COLUMNA = 'valueA';
and
UPDATE MYTABLE WITH (ROWLOCK) set COLUMNA = 'valueA';
Suppose your first statement acquires more than 5,000 locks: the locks will be escalated to a table lock. But with ROWLOCK, SQL Server won't lock the whole table.
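If you want to verify what the engine actually chose, one sketch (using the MYTABLE example above) is to hold the transaction open and inspect sys.dm_tran_locks from another session:
-- Session 1: run the update but keep the transaction open.
BEGIN TRANSACTION;
UPDATE MYTABLE WITH (ROWLOCK) SET COLUMNA = 'valueA';

-- Session 2: count the locks held, by resource type and mode.
SELECT resource_type, request_mode, COUNT(*) AS lock_count
FROM sys.dm_tran_locks
WHERE resource_database_id = DB_ID()
GROUP BY resource_type, request_mode;

-- Session 1: release everything.
ROLLBACK;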

Does UNION or UNION ALL build one massive query that locks all tables selected?

I'm being told by my lead DBA that I wrote poorly formed code because I used a UNION ALL to accumulate the results of successive queries on different tables. I thought that a query whose multiple SELECT statements are UNIONed executes them separately: each SELECT places a shared lock on its table that is released when it finishes, before the next SELECT starts.
I thought the results were accumulated in some buffer or temp table.
Would someone kindly tell me what goes on behind the scenes, and what resources are consumed, when the results of a hundred SELECT statements are UNIONed? Each SELECT operates on one table and collects schema, table, and column names.
Sorry, I don't have the query plan. The DBA complained the query was too big to show much of the plan. His comments are below the query.
SELECT 'R_Stage' as TheSchema, 'DateFrozenSectionModF63x086' as TheTable, 'PersonModTextStaffSID' as TheColumn, COUNT(*) as NullCount
FROM [R_Stage].[DateFrozenSectionModF63x086] WHERE [PersonModTextStaffSID] = -1
UNION ALL
SELECT 'R_Stage' as TheSchema, 'DateFrozenSectionModF63x086' as TheTable, 'LabDataLabSubjectSID' as TheColumn, COUNT(*) as NullCount
FROM [R_Stage].[DateFrozenSectionModF63x086] WHERE [LabDataLabSubjectSID] = -1
UNION ALL
SELECT 'R_Stage' as TheSchema, 'DateFrozenSectionModF63x086' as TheTable, 'LabDataPatientSID' as TheColumn, COUNT(*) as NullCount
FROM [R_Stage].[DateFrozenSectionModF63x086] WHERE [LabDataPatientSID] = -1
UNION ALL
SELECT 'R_Stage' as TheSchema, 'DateGrossDescChangedF63x087' as TheTable, 'PersonModTextStaffSID' as TheColumn, COUNT(*) as NullCount
FROM [R_Stage].[DateGrossDescChangedF63x087] WHERE [PersonModTextStaffSID] = -1
UNION ALL
SELECT 'R_Stage' as TheSchema, 'DateGrossDescChangedF63x087' as TheTable, 'LabDataLabSubjectSID' as TheColumn, COUNT(*) as NullCount
FROM [R_Stage].[DateGrossDescChangedF63x087] WHERE [LabDataLabSubjectSID] = -1
UNION ALL
SELECT 'R_Stage' as TheSchema, 'DateGrossDescChangedF63x087' as TheTable, 'LabDataPatientSID' as TheColumn, COUNT(*) as NullCount
FROM [R_Stage].[DateGrossDescChangedF63x087] WHERE [LabDataPatientSID] = -1
UNION ALL
In any case, the query above could certainly have been written in a much more efficient way. As written, it will scan the entire table for every UNION branch, which is 791 times. Just looking at the first few lines of the query, we can see these are just counts from the same table, which could have been done with a single scan of that table using a CASE expression for the count; you would have gotten all the counts in one pass per table.
The bottom line is that right now we only have a few users on the FRE, and processes like this are already affecting many users/jobs. Imagine when we have hundreds to thousands of users. We simply can't afford to run processes that are not vetted or properly tested like these two examples. This is nothing personal and should not be taken as such; it is all about the overall well-being of the server and all the users. It is part of my job to point out such issues so they can be addressed when I see them, and this is unquestionably one of those times. These can't be run again until they are rewritten to ensure they do what they are intended to do and that they are efficient enough to not cause issues with other processes.
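To illustrate the DBA's suggestion, here is a sketch of the single-scan rewrite for the first table in the query, using conditional aggregation (note it returns one wide row per table rather than one row per column):
SELECT
    'R_Stage' AS TheSchema,
    'DateFrozenSectionModF63x086' AS TheTable,
    COUNT(CASE WHEN [PersonModTextStaffSID] = -1 THEN 1 END) AS PersonModTextStaffSIDNullCount,
    COUNT(CASE WHEN [LabDataLabSubjectSID] = -1 THEN 1 END) AS LabDataLabSubjectSIDNullCount,
    COUNT(CASE WHEN [LabDataPatientSID] = -1 THEN 1 END) AS LabDataPatientSIDNullCount
FROM [R_Stage].[DateFrozenSectionModF63x086];  -- one scan instead of three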
The advice from your DBA seems quite reasonable. He/she doesn't mention locking, and it's not clear why you've mentioned that as the problem.
As the DBA states, you're executing 791 queries that the database engine then unions together. This will impose a load on the database. Assuming your DBA is correct about those queries being full table scans, that means the entire table is going to be read 791 times.
Regardless of any locking, that is going to thrash the disks, overrun file system and database caches, and load up the CPU running those queries.
Assuming your database is large enough that it doesn't fit in the RAM file system or database cache, that means it has to be read from disk in full each time.
If the query were rewritten as your DBA advises so that it only made 1 full table scan through the database, the impact on the file system would be 1/791 of the query as currently written.
If your database does indeed take read locks at the same time, your query will impact updaters of that table 791 times.
Your DBA's recommendations have the effect of making the proposed query roughly 791 times as efficient.
If we assume, just as a working example, that your table is 100 MB: at a disk read speed of 100 MB/s it will take around 1 second to process each of the 791 queries, so the full query would take around 13 minutes. Rewritten as your DBA advises, it would take around 1 second.
This isn't a locking problem, it's a classic I/O performance problem. If you have locking problems as well, that just makes it worse.
The exact performance characteristics of your query depend on many factors, including how large the table is, what indexes are defined (noting that indexes can make a query slower in certain circumstances), how 'wide' the table is, the types of columns in the table, what hardware the query is running on, what database system you use, how fast the disks are, how much RAM your DB has, what else is happening on the system, and so on. So it's not possible to give a definitive answer without a lot more information.
But avoiding 791 full table scans is a good start towards improved performance.
I'm sorry, that post just made my eyes hurt. It sounds like you need to write a script to clean up or identify a problem. To make this easy, you could automate a script that spits out small, testable SQL statements before you run anything against those 300 tables. This requires your DBA to let you use cursors and temp tables, both of which should be avoided when possible; however, this seems more like an identify-the-problem or clean-up task than one focused on efficiency. That said, I would not want to lock those tables on a production system for long periods, so do a lot of smaller tasks to reduce locking and reach the same goal. You can run this script in SQL Server Management Studio and hand the output to your DBA as input; maybe it helps.
SET NOCOUNT ON

-- Table variable listing the schema, table, and the three columns to scan.
DECLARE @OUTPUT TABLE
(
TheSchema NVARCHAR(45),
TheTable NVARCHAR(45),
Field1 NVARCHAR(45),
Field2 NVARCHAR(45),
Field3 NVARCHAR(45)
)
INSERT @OUTPUT SELECT 'R_Stage','DateFrozenSectionModF63x086','PersonModTextStaffSID','LabDataLabSubjectSID','LabDataPatientSID'
INSERT @OUTPUT SELECT 'R_Stage','DateFrozenSectionModF63x087','PersonModTextStaffSID','LabDataLabSubjectSID','LabDataPatientSID'
INSERT @OUTPUT SELECT 'R_Stage','DateFrozenSectionModF63x088','PersonModTextStaffSID','LabDataLabSubjectSID','LabDataPatientSID'
INSERT @OUTPUT SELECT 'R_Stage','DateFrozenSectionModF63x089','PersonModTextStaffSID','LabDataLabSubjectSID','LabDataPatientSID'
INSERT @OUTPUT SELECT 'R_Stage','DateFrozenSectionModF63x090','PersonModTextStaffSID','LabDataLabSubjectSID','LabDataPatientSID'
INSERT @OUTPUT SELECT 'R_Stage','DateFrozenSectionModF63x091','PersonModTextStaffSID','LabDataLabSubjectSID','LabDataPatientSID'
INSERT @OUTPUT SELECT 'R_Stage','DateFrozenSectionModF63x092','PersonModTextStaffSID','LabDataLabSubjectSID','LabDataPatientSID'
INSERT @OUTPUT SELECT 'R_Stage','DateFrozenSectionModF63x093','PersonModTextStaffSID','LabDataLabSubjectSID','LabDataPatientSID'
INSERT @OUTPUT SELECT 'R_Stage','DateFrozenSectionModF63x094','PersonModTextStaffSID','LabDataLabSubjectSID','LabDataPatientSID'
INSERT @OUTPUT SELECT 'R_Stage','DateFrozenSectionModF63x095','PersonModTextStaffSID','LabDataLabSubjectSID','LabDataPatientSID'
DECLARE @TheSchema NVARCHAR(45),@TheTable NVARCHAR(45),@Field1 NVARCHAR(45),@Field2 NVARCHAR(45),@Field3 NVARCHAR(45)
DECLARE LOOP CURSOR FOR
SELECT TheSchema,TheTable,Field1,Field2,Field3 FROM @OUTPUT
-- Emit the preamble that (re)creates the results table.
PRINT '
IF (EXISTS (SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = ''__MY_SCAN''))
DROP TABLE __MY_SCAN
CREATE TABLE __MY_SCAN(
TheSchema NVARCHAR(45),
TheTable NVARCHAR(45),
Field1NullCount INT,
Field2NullCount INT,
Field3NullCount INT
)'
OPEN LOOP
FETCH NEXT FROM LOOP INTO @TheSchema,@TheTable,@Field1,@Field2,@Field3
WHILE(@@FETCH_STATUS=0) BEGIN
-- Emit one single-pass statement per table: SUM the 1/0 flags so all
-- three counts come from the same scan.
PRINT
'INSERT __MY_SCAN
SELECT
'''+@TheSchema+''' AS TheSchema,
'''+@TheTable+''' AS TheTable,
SUM(Field1),
SUM(Field2),
SUM(Field3)
FROM
(
SELECT
Field1=CASE WHEN '+@Field1+'=-1 THEN 1 ELSE 0 END,
Field2=CASE WHEN '+@Field2+'=-1 THEN 1 ELSE 0 END,
Field3=CASE WHEN '+@Field3+'=-1 THEN 1 ELSE 0 END
FROM
'+@TheSchema+'.'+@TheTable+'
WHERE
'+@Field1+'=-1 OR '+@Field2+'=-1 OR '+@Field3+'=-1
)AS X
GO'
FETCH NEXT FROM LOOP INTO @TheSchema,@TheTable,@Field1,@Field2,@Field3
END
CLOSE LOOP
DEALLOCATE LOOP
PRINT '
SELECT * FROM __MY_SCAN
GO
DROP TABLE __MY_SCAN
GO
'

How to force SQL Server to process CONTAINS clauses before WHERE clauses?

I have a SQL query that uses both standard WHERE clauses and full text index CONTAINS clauses. The query is built dynamically from code and includes a variable number of WHERE and CONTAINS clauses.
In order for the query to be fast, it is very important that the full text index be searched before the rest of the criteria are applied.
However, SQL Server chooses to process the WHERE clauses before the CONTAINS clauses, and that causes table scans and makes the query very slow.
I'm able to rewrite this using two queries and a temporary table. When I do so, the query executes 10 times faster. But I don't want to do that in the code that creates the query because it is too complex.
Is there a way to force SQL Server to process the CONTAINS before anything else? I can't force a plan (USE PLAN) because the query is built dynamically and varies a lot.
Note: I have the same problem on SQL Server 2005 and SQL Server 2008.
You can signal your intent to the optimiser like this (the table and column names below are placeholders):
SELECT *
FROM
(
    SELECT *
    FROM MyTable
    WHERE CONTAINS(MyColumn, @searchTerm)
) T1
WHERE
    (normal conditions)
However, SQL is declarative: you say what you want, not how to do it. So the optimiser may decide to ignore the nesting above.
You can force the derived table with CONTAINS to be materialised before the classic WHERE clause is applied. I won't guarantee performance.
SELECT *
FROM
(
    SELECT TOP 2000000000 *
    FROM MyTable            -- placeholder name, as above
    WHERE CONTAINS(MyColumn, @searchTerm)
    ORDER BY SomeID
) T1
WHERE
    (normal conditions)
Try doing it with two queries, without temp tables:
SELECT *
FROM MyTable
WHERE id IN (
    SELECT id
    FROM MyTable
    WHERE contains_criteria
)
AND further_where_clauses
As I noted above, this is NOT as clean a way to "materialize" the derived table as the TOP clause that @gbn proposed, but a loop join hint forces an order of evaluation and has worked for me in the past (admittedly usually with two different tables involved). There are a couple of problems though:
The query is ugly
You still don't get any guarantees that the other WHERE parameters don't get evaluated until after the join (I'll be interested to see what you get)
Here it is though, given that you asked:
SELECT OriginalTable.XXX
FROM (
    SELECT XXX
    FROM OriginalTable
    WHERE CONTAINS(XXX)
) AS ContainsCheck
INNER LOOP JOIN OriginalTable
    ON ContainsCheck.PrimaryKeyColumns = OriginalTable.PrimaryKeyColumns
    AND OriginalTable.OtherWhereConditions = OtherValues

Deadlock caused by SELECT JOIN statement with SQL Server

When executing a SELECT statement with a JOIN of two tables SQL Server seems to
lock both tables of the statement individually. For example by a query like
this:
SELECT ...
FROM
table1
LEFT JOIN table2
ON table1.id = table2.id
WHERE ...
I found out that the order of the locks depends on the WHERE condition. The
query optimizer tries to produce an execution plan that reads only as many
rows as necessary. So if the WHERE condition contains a column of table1
it will first get the result rows from table1 and then get the corresponding
rows from table2. If the column is from table2 it will do it the other way
round. More complex conditions or the use of indexes may have an effect on
the decision of the query optimizer too.
When the data read by a statement should be updated later in the transaction
with UPDATE statements it is not guaranteed that the order of the UPDATE
statements matches the order that was used to read the data from the 2 tables.
If another transaction tries to read data while a transaction is updating the
tables it can cause a deadlock when the SELECT statement is executed in
between the UPDATE statements because neither the SELECT can get the lock on
the first table nor can the UPDATE get the lock on the second table. For
example:
T1: SELECT ... FROM ... JOIN ...
T1: UPDATE table1 SET ... WHERE id = ?
T2: SELECT ... FROM ... JOIN ... (locks table2, then blocked by lock on table1)
T1: UPDATE table2 SET ... WHERE id = ?
Both tables represent a type hierarchy and are always loaded together. So it
makes sense to load an object using a SELECT with a JOIN. Loading both tables
individually would not give the query optimizer a chance to find the best
execution plan. But since UPDATE statements can only update one table at a
time, this can cause deadlocks when an object is loaded while the object
is updated by another transaction. Updates of objects often cause UPDATEs on
both tables when properties of the object that belong to different types of the
type hierarchy are updated.
I have tried to add locking hints to the SELECT statement, but that does not
change the problem. It just causes the deadlock in the SELECT statements when
both statements try to lock the tables and one SELECT statement gets the lock
in the opposite order of the other statement. Maybe it would be possible to
always load data for updates with the same statement, forcing the locks to be
taken in the same order. That would prevent a deadlock between two transactions
that want to update the data, but it would not prevent a deadlock with a
transaction that only reads the data and needs different WHERE conditions.
The only workaround to this so far seems to be that reads must not take locks
at all. With SQL Server 2005 this can be done using SNAPSHOT isolation. The
only way for SQL Server 2000 would be to use the READ UNCOMMITTED isolation
level.
I would like to know if there is another possibilty to prevent the SQL Server
from causing these deadlocks?
This will never happen under snapshot isolation, where readers do not block writers. Other than that, there is no way to prevent such things. I wrote a lot of repro scripts here: Reproducing deadlocks involving only one table
Edit:
I don't have access to SQL 2000, but I would try to serialize access to the object by using sp_getapplock, so that reads and modifications never run concurrently. If you cannot use sp_getapplock, roll your own mutex.
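A minimal sketch of the sp_getapplock approach, assuming a hypothetical per-object resource name:
BEGIN TRANSACTION;

DECLARE @rc INT;
EXEC @rc = sp_getapplock
    @Resource = 'MyObject_42',   -- hypothetical resource name, one per object
    @LockMode = 'Exclusive',
    @LockOwner = 'Transaction',
    @LockTimeout = 10000;        -- wait up to 10 seconds

IF @rc >= 0
BEGIN
    -- Read and update the object here; other sessions requesting the same
    -- app lock wait, so the SELECT and the UPDATEs never interleave.
    COMMIT TRANSACTION;          -- commit also releases the app lock
END
ELSE
    ROLLBACK TRANSACTION;        -- lock not acquired in time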
Another way to fix this is to split the SELECT ... FROM ... JOIN into multiple SELECT statements. Set the isolation level to READ COMMITTED. Use a table variable to pipe data from one SELECT into the one it would have been joined to. Use DISTINCT to filter down the inserts into these table variables.
So say I have two tables A and B. I'm inserting/updating into A and then B, whereas the query optimizer prefers to read B first and then A. I'll split the single SELECT into two SELECTs: first read B, then pass that data on to the next SELECT, which reads A.
Here a deadlock won't happen, because the read locks on table B are released as soon as the first statement is done.
PS: I've faced this issue and this worked very well. Much better than my FORCE ORDER answer.
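A sketch of that split, with hypothetical tables A and B both keyed by id (the column names aValue/bValue and the filter are placeholders):
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

DECLARE @FromB TABLE (id INT PRIMARY KEY, bValue NVARCHAR(100));

-- Step 1: read B on its own; its shared locks are released as soon as
-- this statement completes under READ COMMITTED.
INSERT INTO @FromB (id, bValue)
SELECT DISTINCT id, bValue
FROM B
WHERE bValue IS NOT NULL;        -- hypothetical filter

-- Step 2: join the buffered rows to A; only A is touched here, so the
-- two tables are never locked by one statement at the same time.
SELECT a.id, a.aValue, fb.bValue
FROM A AS a
INNER JOIN @FromB AS fb ON fb.id = a.id;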
I was facing the same issue. Using query hint FORCE ORDER will fix this issue. The downside is you won't be able to leverage best plan that query optimizer has for your query, but this will prevent the deadlock.
So (this is from user "Bill the Lizard"): if you have a query FROM table1 LEFT JOIN table2, and your WHERE clause only contains columns from table2, the execution plan will normally first select the rows from table2 and then look up the rows from table1. With a small result set from table2, only a few rows from table1 have to be fetched. With FORCE ORDER, all rows from table1 have to be fetched first, because it has no WHERE clause; then the rows from table2 are joined and the result is filtered using the WHERE clause. This degrades performance.
But if you know this won't be the case, use this. You might want to optimize the query manually.
The syntax is
SELECT ...
FROM
table1
LEFT JOIN table2
ON table1.id = table2.id
WHERE ...
OPTION (FORCE ORDER)
