Following is the scenario I have:
I have a stored procedure that returns data by joining 4 tables.
Twice in the middle of the day there is a bulk upload to one of the above 4 tables. The load runs for 10-15 minutes. I do not want the UI that invokes this stored procedure to freeze, block, or slow down during this 10-15 minute window. I do not care about showing dirty/uncommitted data from the above tables. Here are my questions:
Do I need to use NOLOCK on just the table that is being loaded during the day, or does NOLOCK need to be added to all 4 tables in the join?
For example:
SELECT *
FROM Table1 T1 WITH (NOLOCK) --this is the table that will be bulk-loaded twice during the day
INNER JOIN Table2 T2 WITH (NOLOCK)
INNER JOIN Table3 T3 WITH (NOLOCK)
INNER JOIN Table4 T4 WITH (NOLOCK)
Or is this sufficient:
SELECT *
FROM Table1 T1 WITH (NOLOCK) --this is the table that will be bulk-loaded twice during the day
INNER JOIN Table2 T2
INNER JOIN Table3 T3
INNER JOIN Table4 T4
If I add a SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED at the beginning of the retrieval procedure and reset it back to READ COMMITTED at the end, will there be any difference?
Thanks
Vikas
You only need to add NOLOCK for the tables that will be locked for prolonged periods of time, so adding NOLOCK to only Table1 is sufficient.
If you set the isolation level to READ UNCOMMITTED, you do not need to add NOLOCK at all, since it is automatically applied to all queried tables. In other words, you will create a situation similar to the first example in your question, where NOLOCK is applied to every table participating in the SELECT.
By the way, make sure you add ON conditions to your INNER JOIN clauses, because as presented they are not valid Transact-SQL.
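For instance, the retrieval procedure could look roughly like this (the procedure name and the ON conditions are placeholders, since the question omits them):

CREATE PROCEDURE dbo.GetJoinedData -- hypothetical name
AS
BEGIN
    -- Applies dirty-read behaviour to every table referenced below
    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

    SELECT *
    FROM Table1 T1 -- the table that is bulk-loaded twice a day
    INNER JOIN Table2 T2 ON T2.Table1ID = T1.ID -- placeholder ON conditions
    INNER JOIN Table3 T3 ON T3.Table2ID = T2.ID
    INNER JOIN Table4 T4 ON T4.Table3ID = T3.ID;
END

As for your second question: an isolation level set inside a stored procedure is automatically restored to the caller's level when the procedure returns, so no explicit reset to READ COMMITTED is needed.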
Consider this query:
DELETE FROM T1 WHERE T1.ID IN (SELECT ID FROM T2)
Both T1 and T2 are massive tables, on the order of millions of records.
T1 is a "live" table and T2 is an "archive" table. After we copy records from T1 to T2, we want to clear them out of T1. T1 is read-optimized, with many indexes.
What is an efficient way to perform this operation?
I'm in a .NET environment, so a code-based solution will also work.
Delete the data in batches to avoid long-held locks, to keep the transaction log from growing, and to allow log space to be reclaimed between batches.
There is no universal method for deleting data efficiently. You have to try all THREE methods against your table and database design:
IN (SELECT ...)
EXISTS ()
INNER JOIN
In most cases, for a large number of rows, EXISTS and INNER JOIN outperform IN (SELECT ...), and EXISTS often outperforms INNER JOIN.
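Written against the T1/T2 tables from the question, the three forms are:

-- IN (SELECT ...)
DELETE FROM T1 WHERE T1.ID IN (SELECT ID FROM T2)

-- EXISTS ()
DELETE FROM T1 WHERE EXISTS (SELECT 1 FROM T2 WHERE T2.ID = T1.ID)

-- INNER JOIN
DELETE T1 FROM T1 INNER JOIN T2 ON T2.ID = T1.ID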
For database performance it is best to delete records in batches so that locking is minimised:
DELETE TOP (1000)
FROM T1
WHERE T1.ID IN (SELECT ID FROM T2)
You can tune the TOP value to optimise how many records are deleted per batch.
Re-run the statement until no more records are deleted, as in the sketch below.
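A minimal sketch of that loop, using the T1/T2 tables from the question:

DECLARE @rows int = 1;

WHILE @rows > 0
BEGIN
    DELETE TOP (1000)
    FROM T1
    WHERE T1.ID IN (SELECT ID FROM T2);

    SET @rows = @@ROWCOUNT; -- 0 once a pass deletes no rows, which ends the loop
END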
I am using SQL Server, and this is my query:
select asst.asst_id, camp_asst.amp_asst_id, asst.camp_asst_id, lyty_no, campaign_id
into camp.asst_respy
from camp.asst_resp resp
inner join camp.camp_wave wave on wave.wave_cd = resp.camp_id
inner join camp.camp_cust cust on cust.cust_lyty_no = resp.big_id
inner join camp.camp_asst asst on asst.sst_trck_url = resp.dum_url
inner join camp.camp_camp_assty camp_asst on camp_asst.camp_asst_id = asst.asst_id
inner join camp.camp_cust_assty cust_asst on cust_asst.camp_camp_asst_id = camp_asst.asst_id -- this table has about 16 billion rows
inner join camp.camp_camp_custy camp_cust on camp_cust.camp_camp_cust_id = cust_asst.cust_id
Please can somebody guide me with this join? It is taking a very long time to complete.
There are indexes defined on the tables, and I am looking at partitioning the large table to make this work; please guide me.
All of the remaining tables have more than 10 million rows.
I'm working with an awful view which internally joins many, many tables together, some of which are the same table.
I'm wondering, when a table is being joined to itself, how is the NOLOCK hint interpreted if it's on one of the joins and not the other? Is the NOLOCK still in effect on the table, or is the table locked altogether if NOLOCK is not included on one of the joins of the same table?
For example (this is pseudo-code; assume that there are valid JOIN ON conditions):
SELECT *
FROM Table1 t1 (NOLOCK)
JOIN Table2 t2 (NOLOCK)
JOIN Table2_Table2 tt (NOLOCK)
JOIN Table2 t22 (NOLOCK)
JOIN Table1 t11
Does Table1 get locked or stay NOLOCKed?
Yes, it does get locked by the last Table1 t11 reference. Each table locking hint is applied to its specific table reference. If you apply a hint to only one of the references to a table, it applies only to that reference; the others keep their own individual locking settings. You can test this using BEGIN TRANSACTION and executing two different queries.
Query 1 (locks the table)
Intentionally commenting out the COMMIT TRANSACTION
BEGIN TRANSACTION
SELECT *
FROM Table1 WITH (TABLOCKX) -- exclusive table lock, held until the transaction ends
-- COMMIT TRANSACTION
Since COMMIT TRANSACTION is commented out, the transaction stays open and continues to hold the exclusive table lock. (A plain shared TABLOCK would not demonstrate this: under the default READ COMMITTED level the shared lock is released as soon as the statement completes, and a shared lock would not block another reader anyway.) When the second query is run, the lock from the first query still applies to the table.
Query 2 (this query will hang, because the lock held by the first query blocks the Table1 t11 reference)
BEGIN TRANSACTION
SELECT *
FROM Table1 t1 (NOLOCK)
JOIN Table2 t2 (NOLOCK)
JOIN Table2_Table2 tt (NOLOCK)
JOIN Table2 t22 (NOLOCK)
JOIN Table1 t11
COMMIT TRANSACTION
I would guess that not using NOLOCK on a reference results in normal locking for that reference, regardless of whether the table is joined elsewhere in the query with NOLOCK (most likely shared row locks), so put NOLOCK next to the join that is missing it.
In very simplified terms, think of it like this: Each of the tables you reference in a query results in a physical execution plan operator accessing that table. Table hints apply to that operator. This means that you can have mixed locking hints for the same table. The locking behavior that you request is applied to those rows that this particular operator happens to read. The respective operator might scan a table, or scan a range of rows, or read a single row. Whatever it is, it is performed under the specified locking options.
Look at the execution plan for your query to find the individual operators.
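To illustrate, here is a single statement that reads the same table twice with different hints; each hint governs only the reads performed by its own operator (the table and column names here are made up):

SELECT o.ID, parent.ID
FROM Orders o WITH (NOLOCK) -- this reference reads without requesting shared locks
JOIN Orders parent -- this reference uses the session's normal locking
    ON parent.ID = o.ParentID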
This query takes 16 seconds to run
SELECT
    WO.orderid
FROM
    WebOrder AS WO
    INNER JOIN Addresses AS A ON WO.AddressID = A.AddressID
    LEFT JOIN SalesOrders AS SO ON SO.SO_Number = WO.SalesOrderID
If I comment out either of the joins, it runs in a small fraction of a second. Example:
SELECT
    WO.orderid
FROM
    WebOrder AS WO
    INNER JOIN Addresses AS A ON WO.AddressID = A.AddressID
    -- LEFT JOIN SalesOrders AS SO ON SO.SO_Number = WO.SalesOrderID
or
SELECT
    WO.orderid
FROM
    WebOrder AS WO
    -- INNER JOIN Addresses AS A ON WO.AddressID = A.AddressID
    LEFT JOIN SalesOrders AS SO ON SO.SO_Number = WO.SalesOrderID
Notes
There are about 40,000 records in each of the SalesOrders and Addresses tables.
I have indexes or primary keys on all fields used in the ON clauses.
Execution plan for the slow version (both joins included)
Execution plan for the fast version (SalesOrders join commented out)
Why do these joins when used in conjunction with one another cause this to go from ~0.01 seconds to 16 seconds?
Your execution plan doesn't show any expensive operations. I would try the following to troubleshoot the bad performance:
Rebuild Indexes
Update Stats
DBCC FREEPROCCACHE
Personally I wouldn't expect the latter to do anything -- it looks like you have a sensible query plan as it is.
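For reference, those three steps look roughly like this, assuming the tables from the question live in the dbo schema (be careful with DBCC FREEPROCCACHE on a busy server, as it clears cached plans for the whole instance):

ALTER INDEX ALL ON dbo.WebOrder REBUILD -- rebuild indexes (repeat per table)
ALTER INDEX ALL ON dbo.Addresses REBUILD
ALTER INDEX ALL ON dbo.SalesOrders REBUILD

UPDATE STATISTICS dbo.WebOrder -- refresh statistics
UPDATE STATISTICS dbo.Addresses
UPDATE STATISTICS dbo.SalesOrders

DBCC FREEPROCCACHE -- clear all cached plans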
Say I have a query like the one below:
SELECT t1.id, t1.Name
FROM Table1 as t1 --800,000 records
INNER JOIN Table2 as t2 --500,000 records
ON t1.fkID = t2.id
INNER JOIN Table3 as t3 -- 1,000 records
ON t1.OtherId = t3.id
Would I see a performance improvement if I changed the order of my joins on Table2 and Table3? See below:
SELECT t1.id, t1.Name
FROM Table1 as t1 --800,000 records
INNER JOIN Table3 as t3 -- 1,000 records
ON t1.OtherId = t3.id
INNER JOIN Table2 as t2 --500,000 records
ON t1.fkID = t2.id
I've heard that the Query Optimizer will try to determine the best order, but that this doesn't always work. Does the version of SQL Server you are using make a difference?
The order of joins makes no difference.
What does make a difference is ensuring your statistics are up to date.
One way to check your statistics is to run a query in SSMS and include the Actual execution plan. If the Estimated number of rows is very different to the Actual number of rows used by any part of the execution plan, then your statistics are out of date.
Statistics are rebuilt when the related indexes are rebuilt. If your production maintenance window allows, I would update statistics every night.
This will update statistics for all tables in a database:
exec sp_MSforeachtable "UPDATE STATISTICS ?"
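Note that sp_MSforeachtable is undocumented; a documented alternative that updates statistics for all user tables in the current database is:

exec sp_updatestats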
The order of joins makes a difference only if you specify OPTION (FORCE ORDER). Otherwise, the optimizer will rearrange your query in whichever way it deems most efficient.
There actually are certain instances where I find that I need to use FORCE ORDER, but of course they are few and far between. If you aren't sure, just SET STATISTICS [TIME|IO] ON and see for yourself. You'll probably find that your version runs slower than the optimized version in most if not all cases.
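For example, using the tables from the question, you could compare the optimizer's choice against your written join order like this:

SET STATISTICS TIME ON;
SET STATISTICS IO ON;

SELECT t1.id, t1.Name
FROM Table1 AS t1
INNER JOIN Table3 AS t3 ON t1.OtherId = t3.id
INNER JOIN Table2 AS t2 ON t1.fkID = t2.id
OPTION (FORCE ORDER); -- joins are evaluated in the order written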
The Query Optimizer should easily handle these as exactly the same query, and work out the best way of doing it.
A lot of it is more about the statistics than the number of records. For example, if the vast majority of values in t1.fkID are identical, this information can influence the Query Optimizer a lot.