Using NOLOCK on a table which is being joined to itself - sql-server

I'm working with an awful view that internally joins many, many tables together, some of which appear in the join more than once.
I'm wondering, when a table is being joined to itself, how is the NOLOCK hint interpreted if it's on one of the joins and not the other? Is the NOLOCK still in effect on the table, or is the table locked altogether if NOLOCK is not included on one of the joins of the same table?
For example (this is pseudo-code; assume that there are valid JOIN ON conditions):
SELECT *
FROM Table1 t1 (NOLOCK)
JOIN Table2 t2 (NOLOCK)
JOIN Table2_Table2 tt (NOLOCK)
JOIN Table2 t22 (NOLOCK)
JOIN Table1 t11
Does Table1 get locked or stay NOLOCKed?

Yes, it does get locked by the last Table1 t11 reference. Each table locking hint applies only to the specific table reference it is attached to. If you apply a hint to just one of the references to a table, that hint covers only that reference, and the other references keep their own individual locking settings. You can test this by using BEGIN TRANSACTION and executing two different queries.
Query 1 (locks the table)
The COMMIT TRANSACTION is intentionally commented out:
BEGIN TRANSACTION
SELECT *
FROM Table1 WITH (TABLOCKX) -- exclusive table lock, held until the transaction ends
-- COMMIT TRANSACTION
Since COMMIT TRANSACTION is commented out, the transaction is not closed and still holds the lock on Table1 when the second query runs.
Query 2 (this query will hang because the lock from the first query blocks the Table1 t11 reference)
BEGIN TRANSACTION
SELECT *
FROM Table1 t1 (NOLOCK)
JOIN Table2 t2 (NOLOCK)
JOIN Table2_Table2 tt (NOLOCK)
JOIN Table2 t22 (NOLOCK)
JOIN Table1 t11
COMMIT TRANSACTION
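While the first transaction is still open, you can check which locks it holds from a separate session. A minimal sketch (sys.dm_tran_locks is a standard DMV; filter further by session id if needed):
SELECT resource_type, request_mode, request_status, request_session_id
FROM sys.dm_tran_locks
WHERE resource_database_id = DB_ID()
ORDER BY request_session_id
You should see the table lock held by Query 1 on Table1 and Query 2's blocked request on the same table (request_status = 'WAIT').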

I would guess that any reference without NOLOCK still takes ordinary shared locks (most likely row locks), regardless of whether the same table is joined elsewhere in the query with NOLOCK. So put NOLOCK next to the join that is missing it.

In very simplified terms, think of it like this: Each of the tables you reference in a query results in a physical execution plan operator accessing that table. Table hints apply to that operator. This means that you can have mixed locking hints for the same table. The locking behavior that you request is applied to those rows that this particular operator happens to read. The respective operator might scan a table, or scan a range of rows, or read a single row. Whatever it is, it is performed under the specified locking options.
Look at the execution plan for your query to find the individual operators.
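For example, one way to list those operators is SET STATISTICS PROFILE (a quick sketch; the self-join condition below is hypothetical):
SET STATISTICS PROFILE ON;

SELECT *
FROM Table1 t1 WITH (NOLOCK)
JOIN Table1 t11
    ON t11.ParentID = t1.ID;  -- hypothetical join condition

SET STATISTICS PROFILE OFF;
Each Table1 reference shows up as its own scan or seek operator in the profile output, and the NOLOCK hint affects only the operator produced by the t1 reference.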

Related

Deadlock in dependent multiple update statements

In a stored procedure (SP), three tables are updated in a single transaction. These updates are dependent on each other, and a deadlock happens intermittently during them, not consistently.
A WCF service is called, and that calls the SP. The input of the SP is an XML document. The XML is parsed using the OPENXML method and the values are used to update the tables.
#Table is a table variable, populated by applying OPENXML to the SP's input XML. The input XML contains only one ID.
<A>
<Value>XYZ</Value>
<ID>1</ID>
</A>
BEGIN TRAN
--update Table1
UPDATE Table1
SET ColumnA = A.Value
FROM Table1
JOIN #Table A
    ON Table1.ID = A.ID
--update Table2
UPDATE Table2
SET ColumnA = Table1.ColumnA
FROM Table2
JOIN Table1
    ON Table1.ID = Table2.ID
--update Table3
UPDATE Table3
SET ColumnA = Table1.ColumnA
FROM Table3
JOIN Table1
    ON Table1.ID = Table3.ID
COMMIT TRAN
In Table1, the ID column is the primary key.
In Table2, there is no index on the ID column.
The deadlock sometimes happens while updating Table2.
Receiving the error "Transaction (Process ID 100) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction."
Advice is needed on resolving this intermittent deadlock issue.
Deadlocks are often the result of more data being touched than needed by queries. Query and index tuning can help ensure only data needed by queries are accessed and locked, reducing both blocking and the likelihood of deadlocks by concurrent sessions.
Because your queries join on ID with no other criteria, an index on that column may keep the UPDATE statements from touching rows they don't need. I see from your comments that there was no index on the Table2 ID column, so a clustered index scan was performed. Not only did the scan result in suboptimal performance, it can lead to blocking and deadlocking when concurrent sessions contend for the same rows.
Adding a non-clustered index on ID changed the plan from a full clustered index scan to a non-clustered index seek. This should reduce, if not eliminate, the deadlocks going forward and improve performance considerably too. I like to say that performance and concurrency go hand-in-hand, an especially important detail with data modification statements.
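For reference, a minimal sketch of such an index, using the table and column names from the question (the index name is made up):
CREATE NONCLUSTERED INDEX IX_Table2_ID
    ON Table2 (ID);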

Efficient way to delete large number of records in SQL Server

Consider this query
Delete from T1 where T1.ID in (select ID from T2)
Both T1 and T2 are massive tables in the order of millions of records.
T1 is a "live" table and T2 is an "archive" table. After we copy records from T1 to T2, we want to clear it out from T1. T1 is read optimized with many indexes.
What is the efficient way to perform this operation ?
I'm in .net environment, so code based solution will also work.
Delete the data in batches to reduce locking, keep the transaction log from growing, and allow space in the transaction log to be reclaimed.
There is no universal method for deleting data efficiently. You should try all THREE methods against your table and database design:
IN (SELECT ...)
EXISTS ()
INNER JOIN
In most cases, for a large number of rows, EXISTS and INNER JOIN outperform IN (SELECT ...), and EXISTS often outperforms INNER JOIN (both forms are sketched below).
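For reference, hedged sketches of the EXISTS and INNER JOIN forms, using the T1/T2 names from the question:
-- EXISTS form
DELETE FROM T1
WHERE EXISTS (SELECT 1 FROM T2 WHERE T2.ID = T1.ID)

-- INNER JOIN form
DELETE T1
FROM T1
INNER JOIN T2
    ON T2.ID = T1.ID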
For performance of the database it is best to delete batches of records so that locking is minimised.
DELETE TOP (1000)
FROM T1
WHERE T1.ID IN (SELECT ID FROM T2)
You can tune the number of records deleted per batch.
Re-run the statement until no more records are deleted, as in the loop sketched below.
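A sketch of that re-run as a loop, with the same T1/T2 assumptions as above (tune the TOP value for your system):
DELETE TOP (1000) FROM T1 WHERE T1.ID IN (SELECT ID FROM T2)

WHILE @@ROWCOUNT > 0
BEGIN
    DELETE TOP (1000) FROM T1 WHERE T1.ID IN (SELECT ID FROM T2)
END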

netezza left outer join query performance

I have a question related to Netezza query performance. I have two tables, Table A and Table B, where Table B is a subset of Table A with altered data. I need to update those new values in Table A from Table B.
There are two approaches here:
1) Left outer join, select the relevant columns, and insert into the target table
2) Insert Table A's data into the target table, then update those values from Table B using a join
I tried both and logically they are the same, but the explain plan gives different costs.
For the normal select:
a)Sub-query Scan table "TM2" (cost=0.1..1480374.0 rows=8 width=4864 conf=100)
For the update:
b)Hash Join (cost=356.5..424.5 rows=2158 width=27308 conf=21)
For the left outer join:
Sub-query Scan table "TM2" (cost=51.0..101474.8 rows=10000000 width=4864 conf=100)
From this I feel the left outer join is better. Can anyone put some thought on this and guide me?
Thanks
The reason that the cost of insert into table_c select ... from table_a; update table_c set ... from table_b; is higher is that you're inserting, deleting, then inserting again. Updates in Netezza mark the records being updated as deleted and then insert new rows with the updated values. Once the data is written to an extent, it's never (to my knowledge) altered in place.
With insert into table_c select ... from table_a join table_b using (...); you're only inserting once, thereby only updating all the zone maps once. The cost will be noticeably lower.
Netezza does an excellent job of keeping you away from the disk on reads, but it will write to the disk as often as you tell it to. In the case of updates, seemingly more so. Try to only write as often as is necessary to gain benefits of new distributions and co-located joins. Any more than that, and you're just using excess commit actions.
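A hedged sketch of the single-insert approach, with hypothetical table and column names (an id key plus one value column that carries the altered data from Table B):
INSERT INTO target_table
SELECT a.id,
       COALESCE(b.val, a.val) AS val  -- take Table B's altered value when present
FROM table_a a
LEFT OUTER JOIN table_b b
    ON a.id = b.id;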

SQL Server NOLOCK with JOIN, Bulk load

Following is the scenario I have:
I have a stored procedure that returns data by joining 4 tables.
Twice in the middle of the day there is a bulk upload to one of the above 4 tables. The load continues for 10-15 minutes. I do not want the UI that invokes this stored procedure to freeze/block/slow down during this 10-15 minute window. I do not care about showing dirty/uncommitted data from the above tables. Here are my questions:
Do I need to use NOLOCK on just the table that is being loaded during the day, or does NOLOCK need to be added to all 4 tables in the join?
For e.g.
SELECT *
FROM Table1 T1 WITH (NOLOCK) --this is the table that will be bulk-loaded twice during the day
INNER JOIN Table2 T2 WITH (NOLOCK)
INNER JOIN Table3 T3 WITH (NOLOCK)
INNER JOIN Table4 T4 WITH (NOLOCK)
Or is this sufficient:
SELECT *
FROM Table1 T1 WITH (NOLOCK) --this is the table that will be bulk-loaded twice during the day
INNER JOIN Table2 T2
INNER JOIN Table3 T3
INNER JOIN Table4 T4
If I add a SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED at the beginning of the retrieval procedure and reset it back to READ COMMITTED at the end, will there be any difference?
Thanks
Vikas
You only need to add NOLOCK for the tables that will be locked for prolonged periods of time, so adding NOLOCK to only Table1 is sufficient.
If you set the isolation level to READ UNCOMMITTED, you do not need to add NOLOCK at all, since it is automatically applied to all queried tables. In other words, you create a situation similar to the first example in item 1 of your question, where NOLOCK is applied to all tables participating in the SELECT.
By the way, make sure you add ON conditions to your INNER JOIN clauses, because as presented they are not valid Transact-SQL.
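To illustrate item 2, here is a sketch of the isolation-level approach inside the procedure (the procedure name and ON conditions are hypothetical):
CREATE PROCEDURE dbo.GetJoinedData
AS
BEGIN
    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

    SELECT *
    FROM Table1 T1
    INNER JOIN Table2 T2 ON T2.Table1ID = T1.ID
    INNER JOIN Table3 T3 ON T3.Table2ID = T2.ID
    INNER JOIN Table4 T4 ON T4.Table3ID = T3.ID;
END
When the isolation level is set inside a stored procedure, it reverts to the caller's level when the procedure returns, so you do not need to reset it to READ COMMITTED at the end.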

Do the order of JOINs make a difference?

Say I have a query like the one below:
SELECT t1.id, t1.Name
FROM Table1 as t1 --800,000 records
INNER JOIN Table2 as t2 --500,000 records
ON t1.fkID = t2.id
INNER JOIN Table3 as t3 -- 1,000 records
ON t1.OtherId = t3.id
Would I see a performance improvement if I changed the order of my joins on Table2 and Table3? See below:
SELECT t1.id, t1.Name
FROM Table1 as t1 --800,000 records
INNER JOIN Table3 as t3 -- 1,000 records
ON t1.OtherId = t3.id
INNER JOIN Table2 as t2 --500,000 records
ON t1.fkID = t2.id
I've heard that the Query Optimizer will try to determine the best order, but that this doesn't always work. Does the version of SQL Server you are using make a difference?
The order of joins makes no difference.
What does make a difference is ensuring your statistics are up to date.
One way to check your statistics is to run a query in SSMS and include the Actual execution plan. If the Estimated number of rows is very different to the Actual number of rows used by any part of the execution plan, then your statistics are out of date.
Statistics are rebuilt when the related indexes are rebuilt. If your production maintenance window allows, I would update statistics every night.
This will update statistics for all tables in a database:
exec sp_MSforeachtable "UPDATE STATISTICS ?"
The order of joins makes a difference only if you specify OPTION (FORCE ORDER). Otherwise, the optimizer will rearrange your query in whichever way it deems most efficient.
There actually are certain instances where I find that I need to use FORCE ORDER, but of course they are few and far between. If you aren't sure, just SET STATISTICS [TIME|IO] ON and see for yourself. You'll probably find that your version runs slower than the optimized version in most if not all cases.
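For completeness, a sketch of forcing the written order on the second query from the question:
SELECT t1.id, t1.Name
FROM Table1 AS t1
INNER JOIN Table3 AS t3
    ON t1.OtherId = t3.id
INNER JOIN Table2 AS t2
    ON t1.fkID = t2.id
OPTION (FORCE ORDER);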
The Query Optimizer should easily handle these as exactly the same query, and work out the best way of doing it.
A lot of it is more about the statistics than the number of records. For example, if the vast majority of values in t1.fkID are identical, this information can influence the QO a lot.
