Why does a LEFT JOIN increase query time so much? - sql-server

I'm using SQL Server 2012 and encountered a strange problem.
This is the original query I've been using:
DELETE FROM [TABLE_TEMP]
INSERT INTO [TABLE_TEMP]
SELECT H.*, NULL
FROM [TABLE_Accounts_History] H
INNER JOIN [TABLE_For_Filtering] A ON H.[RSIN] = A.[RSIN]
WHERE
H.[NUM] = (SELECT TOP 1 [NUM] FROM [TABLE_Accounts_History]
WHERE [RSIN] = H.[RSIN]
AND [AccountSys] = H.[AccountSys]
AND [Cl_Acc_Typ] = H.[Cl_Acc_Typ]
AND [DATE_DEAL] < @dte
ORDER BY [DATE_DEAL] DESC)
AND H.[TYPE_DEAL] <> 'D'
Table TABLE_Accounts_History consists of 3 200 000 records.
Table TABLE_For_Filtering is around 1 500 records.
Insert took me 2m 40s and inserted 1 600 000 records for further work.
But then I decided to attach a column from a pretty small table, TABLE_Additional (only around 100 recs):
DELETE FROM [TABLE_TEMP]
INSERT INTO [TABLE_TEMP]
SELECT H.*, P.[prof_type]
FROM [TABLE_Accounts_History] H
INNER JOIN [TABLE_For_Filtering] A ON H.[RSIN] = A.[RSIN]
LEFT JOIN [TABLE_Additional] P ON H.[ACCOUNTSYS] = P.[AccountSys]
WHERE H.[NUM] = ( SELECT TOP 1 [NUM]
FROM [TABLE_Accounts_History]
WHERE [RSIN] = H.[RSIN]
AND [AccountSys] = H.[AccountSys]
AND [Cl_Acc_Typ] = H.[Cl_Acc_Typ]
AND [DATE_DEAL] < @dte
ORDER BY [DATE_DEAL] DESC)
AND H.[TYPE_DEAL] <> 'D'
And now it takes ages for this query to complete. Why is that? How can such a small left join tank performance this badly? How can I improve it?
An update: no luck so far with the LEFT JOIN. Indexes, no indexes, hinted indexes... For now I've found a workaround: use my first query and run an UPDATE after it:
UPDATE [TABLE_TEMP]
SET [PROF_TYPE] = P1.[prof_type]
FROM [TABLE_TEMP] A1
LEFT JOIN
[TABLE_Additional] P1
ON A1.[ACCOUNTSYS] = P1.[AccountSys]
Takes only 5s and does pretty much the same thing I've been trying to achieve. SQL Server performance is still a mystery to me.

The 'small' left join is actually doing a lot of extra work for you. SQL Server has to go back to TABLE_Additional for each row produced by the inner join between TABLE_Accounts_History and TABLE_For_Filtering. You can help SQL Server speed this up with some indexing (a sketch follows the list). You could:
1) Ensure TABLE_Accounts_History has an index on the foreign key H.[ACCOUNTSYS].
2) If you think that TABLE_Additional will always be accessed by AccountSys, i.e. you will be requesting AccountSys in ordered groups, you could create a clustered index on TABLE_Additional.AccountSys (in other words, physically order the table on disk by AccountSys).
3) You could also ensure there is a foreign-key index on TABLE_Accounts_History.
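A minimal sketch of those indexes, using the column names from the question (the index names and the exact key/INCLUDE choices are assumptions, not a definitive design):
CREATE NONCLUSTERED INDEX IX_Accounts_History_AccountSys
    ON [TABLE_Accounts_History] ([AccountSys]);
-- Only if AccountSys is not already the clustering key of the small lookup table
CREATE CLUSTERED INDEX IX_Additional_AccountSys
    ON [TABLE_Additional] ([AccountSys]);
-- Supports the correlated TOP 1 subquery and the join to TABLE_For_Filtering
CREATE NONCLUSTERED INDEX IX_Accounts_History_RSIN
    ON [TABLE_Accounts_History] ([RSIN], [AccountSys], [Cl_Acc_Typ], [DATE_DEAL] DESC)
    INCLUDE ([NUM]);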

A LEFT OUTER JOIN selects all rows from the left table. In your case the left table has 3 200 000 rows, and each of those rows is then compared against the right table. One solution is to use indexes, which will reduce retrieval time.

Related

Why does SQL Server do a scan on joins when there are no records in the source table

The idea of the below query is to use the CTE to get the primary key of all rows in [Archive].[tia_tia_object] that meet the filter.
The execution time for the query within the CTE is 0 seconds.
The second part is supposed to do joins on other tables to filter the data some more, but only if any rows are returned by the CTE. This was the only way I could get SQL Server to use the correct indexes.
Why does it spend time (see execution plan) looking in TIA_TIA_AGREEMENT_LINE and TIA_TIA_OBJECT, when CTE returns 0 rows?
WITH cte_vehicle
AS (SELECT O.[Seq_no],
O.Object_No
FROM [Archive].[tia_tia_object] O
WHERE O.RECORD_TIMESTAMP >
(SELECT LastLoadTimeStamp FROM staging.Ufngetlastloadtimestamp('Staging.CoveredObject'))
AND O.[Meta_iscurrent] = 1
AND O.OBJECT_TYPE IN ( 'BIO01', 'CAO01', 'DKV', 'GFO01',
'KMA', 'KNO01', 'MCO01', 'VEO01',
'SVO01', 'AUO01' ))
SELECT O.[Seq_no] AS [Bkey_CoveredObject],
Cast(O.[Agr_Line_No] AS BIGINT) AS [Agr_Line_No],
O.[Cover_Start_Date] AS [CoverageFrom],
O.[Cover_End_Date] AS [CoverageTo],
O.[Timestamp] AS [TIMESTAMP],
O.[Record_Timestamp] AS [RECORD_TIMESTAMP],
O.[Newest] AS [Newest],
O.LOCATION_ID AS LocationNo,
O.[Cust_no],
O.[N01]
FROM cte_vehicle AS T
INNER JOIN [Archive].[tia_tia_object] O
ON t.Object_No = O.Object_No
AND t.Seq_No = O.Seq_No
INNER JOIN [Archive].[tia_tia_agreement_line] AL
ON O.Agr_line_no = AL.Agr_line_no
INNER JOIN [Archive].[tia_tia_policy] P
ON AL.Policy_no = P.Policy_no
WHERE P.[Transaction_type] <> 'D'
Execution plan:
Because it still needs to check and look for records. Even if there are no records in that table, it doesn't know that until it actually checks.
Much like if someone gives you a sealed box, you don't know whether it's empty until you open it.

MS SQL - Delete query taking too much time

I have the following query script:
declare @tblSectionsList table
(
SectionID int,
SectionCode varchar(255)
)
--assume @tblSectionsList has 50 section rows
DELETE td
FROM [dbo].[InventoryDocumentDetails] td
INNER JOIN [dbo].InventoryDocuments th
    ON th.Id = td.InventoryDocumentDetail_InventoryDocument
INNER JOIN @tblSectionsList ts
    ON ts.SectionID = th.InventoryDocument_Section
This script involves three tables, where @tblSectionsList is a table variable that may contain around 50 records. I use it in the join condition with the InventoryDocuments table, which is then joined further to the InventoryDocumentDetails table. All joins are based on INT foreign keys.
Over the weekend I put this query on the server and it is still running, even after 2 days and 4 hours... Can anybody tell me if I am doing something wrong, or is there any way to improve its performance? I don't even know how much more time it will take to give me the result.
Before this I also tried to create an index on the InventoryDocumentDetails table with following script:
CREATE NONCLUSTERED INDEX IX_InventoryDocumentDetails_InventoryDocument
ON dbo.InventoryDocumentDetails (InventoryDocumentDetail_InventoryDocument);
But this script also took more than a day and did not finish, so I cancelled it.
Additional info:
I am using MS SQL 2008 R2.
InventoryDocuments table contains 2,108,137 rows and has primary key 'Id'.
InventoryDocumentDetails table contains 25,055,158 rows and has primary key 'Id'.
Both tables have primary keys defined.
CPU - Intel Xeon - with 32 GB RAM.
No other indexes are defined, because whenever I try to create a new index now, that query also gets suspended.
Query Execution Plan (1):
2nd Part:
The following query gives one row for this, showing status='suspended' and wait_type='LCK_M_IX':
SELECT r.session_id AS spid, r.[status], r.command, t.[text],
       OBJECT_NAME(t.objectid, t.[dbid]) AS [object], r.logical_reads,
       r.blocking_session_id AS blocked, r.wait_type,
       s.host_name, s.host_process_id, s.program_name, r.start_time
FROM sys.dm_exec_requests AS r
LEFT OUTER JOIN sys.dm_exec_sessions s ON s.session_id = r.session_id
OUTER APPLY sys.dm_exec_sql_text(r.[sql_handle]) AS t
WHERE r.session_id <> @@SPID AND r.session_id > 50
What happens when you change the INNER JOIN to EXISTS?
DELETE td
FROM [dbo].[InventoryDocumentDetails] td
WHERE EXISTS (SELECT 1
FROM [dbo].InventoryDocuments th
WHERE EXISTS (SELECT 1
FROM @tblSectionsList ts
WHERE ts.SectionID = th.InventoryDocument_Section)
AND th.Id = td.InventoryDocumentDetail_InventoryDocument)
It can sometimes be more efficient time-wise to truncate a table and re-import the records you want to keep. A delete operation on a large table is incredibly slow compared to an insert. Of course this is only an option if you can take your table offline. Also, only do this if your recovery model is set to simple. The steps (a T-SQL sketch follows them):
Drop (or disable) triggers on table A.
Bulk copy table A to B.
Truncate table A.
Enable identity insert.
Insert into A from B, where A.ID is not in the set of IDs to delete.
Disable identity insert.
Rebuild indexes.
Re-enable triggers.
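A rough T-SQL sketch of those steps, assuming hypothetical tables dbo.A (the big table, with an IDENTITY column Id and placeholder columns Col1, Col2), dbo.B as the staging copy, and a hypothetical dbo.IdsToDelete list:
DISABLE TRIGGER ALL ON dbo.A;                      -- 1) disable triggers on A
SELECT * INTO dbo.B FROM dbo.A;                    -- 2) bulk copy A to B (B must not exist yet)
TRUNCATE TABLE dbo.A;                              -- 3) truncate A (fails if A is referenced by foreign keys)
SET IDENTITY_INSERT dbo.A ON;                      -- 4) enable identity insert
INSERT INTO dbo.A (Id, Col1, Col2)                 -- 5) re-insert only the rows you want to keep
SELECT Id, Col1, Col2
FROM dbo.B
WHERE Id NOT IN (SELECT Id FROM dbo.IdsToDelete);
SET IDENTITY_INSERT dbo.A OFF;                     -- 6) disable identity insert
ALTER INDEX ALL ON dbo.A REBUILD;                  -- 7) rebuild indexes
ENABLE TRIGGER ALL ON dbo.A;                       -- 8) re-enable triggers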
Try something like the below. It might give you some idea at least.
DELETE FROM [DBO].[INVENTORYDOCUMENTDETAILS] WHERE INVENTORYDOCUMENTDETAILS_PK IN (
SELECT INVENTORYDOCUMENTDETAILS_PK FROM
[DBO].[INVENTORYDOCUMENTDETAILS] TD
INNER JOIN [DBO].INVENTORYDOCUMENTS TH ON TH.ID = TD.INVENTORYDOCUMENTDETAIL_INVENTORYDOCUMENT
INNER JOIN @TBLSECTIONSLIST TS ON TS.SECTIONID = TH.INVENTORYDOCUMENT_SECTION
)

Prevent MORE records returned by JOINING a lookup table?

I am having a problem: my lookup table is producing MORE records than my original query.
I feel I am missing something basic. How do I prevent ending up with more records by bringing in a column or two from the 2nd table?
-- 140930
SELECT COUNT(ID)
FROM dbo.USER_ACCOUNTS AS A
-- 143324
LEFT JOIN dbo.DOMAIN AS B
ON A.Domain = B.DOMAIN
As you can see my count grows to 143324 after the join. I have tried outer joins as well. There are only 150 or so domains to join on. AND some should not even be in the results because no domain match should be found!?
This is SQL SERVER 2008 R2
Thanks
SELECT COUNT(ID)
FROM dbo.USER_ACCOUNTS AS A
WHERE EXISTS (
SELECT 1
FROM dbo.DOMAIN AS B
WHERE A.Domain = B.DOMAIN
)
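The count grows because the LEFT JOIN returns one output row per matching row in dbo.DOMAIN, so any domain that appears more than once there duplicates every matching account row; the EXISTS version above only tests for a match and never multiplies rows. A quick sketch to check which domains are duplicated in the lookup table:
SELECT B.DOMAIN, COUNT(*) AS occurrences
FROM dbo.DOMAIN AS B
GROUP BY B.DOMAIN
HAVING COUNT(*) > 1;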

How can I speed up this SQL view?

I'm a beginner at this, so I hope you can help. I'm working in SQL Server 2008 R2 and have a view built from four tables all joined together:
SELECT DISTINCT ad.award_id,
bl.funding_id,
bl.budget_line,
dd4.monthnumberofyear AS month,
dd4.yearcalendar AS year,
CASE
WHEN frb.full_value IS NULL THEN '0'
ELSE frb.full_value
END AS Expenditure_value,
bl.budget_id,
frb.accode,
'Actual' AS Type
FROM dw.dbo.dimdate5 AS dd4
LEFT OUTER JOIN dbo.award_data AS ad
ON dd4.fulldate BETWEEN ad.usethisstartdate AND
ad.usethisenddate
LEFT OUTER JOIN dbo.budget_line AS bl
ON bl.award_id = ad.award_id
LEFT OUTER JOIN dw.dbo.fctresearchbalances AS frb
ON frb.el3 = bl.award_id
AND frb.element4groupidnew = bl.budget_line
AND dd4.yearfiscal = frb.yr
AND dd4.monthnumberfiscal = frb.period
The view has 9 columns and 1.5 million rows, and growing. A SELECT * from this view was taking 20 minutes for all the rows. I added indexes on the columns the tables are joined on, and that improved it to 10 minutes. My question is: what else could I do to make the select run faster?
Many thanks, Violet.
Try getting rid of the CASE statement.
If you have 1.5 million rows and you're interested in an aggregation of those rows rather than the whole set, you might want to sum the rows in fctResearchBalances first and then do the joins (see the sketch below).
(It's a bit difficult to determine what else you might benefit from, without seeing the access plan.)
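A sketch of that idea, pre-aggregating fctresearchbalances before the joins; the table and column names are taken from the question, but the choice of grouping columns (and dropping accode) is an assumption:
WITH frb_sum AS (
    SELECT el3, element4groupidnew, yr, period,
           SUM(full_value) AS full_value
    FROM dw.dbo.fctresearchbalances
    GROUP BY el3, element4groupidnew, yr, period
)
SELECT ad.award_id, bl.funding_id, bl.budget_line,
       dd4.monthnumberofyear AS [month], dd4.yearcalendar AS [year],
       COALESCE(frb.full_value, 0) AS Expenditure_value, bl.budget_id
FROM dw.dbo.dimdate5 AS dd4
LEFT JOIN dbo.award_data AS ad
    ON dd4.fulldate BETWEEN ad.usethisstartdate AND ad.usethisenddate
LEFT JOIN dbo.budget_line AS bl
    ON bl.award_id = ad.award_id
LEFT JOIN frb_sum AS frb
    ON frb.el3 = bl.award_id
   AND frb.element4groupidnew = bl.budget_line
   AND dd4.yearfiscal = frb.yr
   AND dd4.monthnumberfiscal = frb.period;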
1- You can wrap the query in a stored procedure so its execution plan is cached and reused.
2- You can use an indexed view, i.e. create an index on a schema-bound view (a sketch follows).
3- You can use query hints in the join to tell the query optimizer to use a specific kind of join.
4- You can use table partitioning.
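For option 2, a sketch of the mechanism only; indexed views require WITH SCHEMABINDING and do not allow OUTER JOIN, DISTINCT or cross-database references, so the view in the question would have to be rewritten before it could be indexed (the view and index names here are hypothetical):
CREATE VIEW dbo.vw_award_budget_summary
WITH SCHEMABINDING
AS
SELECT bl.award_id, bl.budget_line, COUNT_BIG(*) AS row_count
FROM dbo.budget_line AS bl
INNER JOIN dbo.award_data AS ad
    ON ad.award_id = bl.award_id
GROUP BY bl.award_id, bl.budget_line;
GO
CREATE UNIQUE CLUSTERED INDEX IX_vw_award_budget_summary
    ON dbo.vw_award_budget_summary (award_id, budget_line);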
SELECT DISTINCT --#1 - potential bottleneck
ad.award_id
, bl.funding_id
, bl.budget_line
, [month] = dd4.monthnumberofyear
, [year] = dd4.yearcalendar
, Expenditure_value = ISNULL(frb.full_value, '0')
, bl.budget_id
, frb.accode
, [type] = 'Actual'
FROM dbo.dimdate5 dd4
LEFT JOIN dbo.award_data ad ON dd4.fulldate BETWEEN ad.usethisstartdate AND ad.usethisenddate
LEFT JOIN dbo.budget_line bl ON bl.award_id = ad.award_id
LEFT JOIN dbo.fctresearchbalances frb ON frb.el3 = bl.award_id --#2 - join by multiple columns
AND frb.element4groupidnew = bl.budget_line
AND dd4.yearfiscal = frb.yr
AND dd4.monthnumberfiscal = frb.period
The CASE statement can be replaced by
COALESCE(frb.full_value,'0') AS Expenditure_value
Without more info it's not possible to tell exactly what is wrong, but here are some pointers.
When you have so many LEFT JOINS the order of the joins can make a difference.
Do you have standard indexes or covering indexes with included columns?
If you don't have covering indexes, then primary keys matter in the joins. Including all the primary key columns in the join condition will speed up the query (a covering-index sketch follows below).
Then look at your data - do you need all those LEFT JOINs based on the foreign keys between those tables? Depending on your keys, a LEFT JOIN may be equivalent to an INNER JOIN.
And with all those LEFT JOINS is having a DISTINCT really useful?
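For example, a covering index for the join from award_data to budget_line, with the columns from the view's SELECT list included (the index name is hypothetical):
CREATE NONCLUSTERED INDEX IX_budget_line_award_id
    ON dbo.budget_line (award_id)
    INCLUDE (funding_id, budget_line, budget_id);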
How much RAM do you have? If you have 8GB+ then 1.5m rows is nothing for SQL Server. You need to optimise those joins.

SQL Server : pagination into Excel

I've got a large data set that I need to get into Excel for some pivot tables and analysis.
I'm normally able to do this, as the data never reaches the 1-million-row mark. I just do a SQL Server data import and specify my SQL statement.
Here is my current SQL:
WITH n AS (
Select A1.AccountID, A1.ParentAccountID, A1.Name
FROM Account AS A1
WHERE A1.ParentAccountID = 92
UNION ALL
SELECT A2.AccountID, A2.ParentAccountID, A2.Name
FROM Account AS A2
JOIN n
ON A2.ParentAccountID=n.AccountID
)
select n.*, D.DeviceID, A.*, P.*
FROM n
LEFT OUTER JOIN
Device AS D
ON D.AccountID = n.AccountID
LEFT OUTER JOIN
Audit as A
ON A.AccountID = n.AccountID
RIGHT OUTER JOIN
DeviceAudit As P
ON P.AuditID = A.AuditID
WHERE A.AuditDate > CAST('2013-03-11' AS DATETIME)
ORDER BY n.AccountID ASC, P.DeviceID ASC, A.AuditDate DESC
Right now this returns 100% of what I need: 18 million records for the past 30 days. I was hoping there would be a simple way to fetch the next 100,000 or 500,000 records.
I can use TOP 100000 to get my first chunk, though I do not seem to have an offset available to me.
At present this runs and completes in 20 minutes. This is one of many account hierarchies I have to do this for, so hopefully the pagination will not be too expensive CPU-wise.
I did try exporting to a CSV in hopes of importing it, but that just gives me a 12 GB CSV file that I do not have time to break apart.
Yes, since SQL Server 2005 you can do paginated subqueries on a row number. Add a row number to the select clause of your original query:
, ROW_NUMBER() OVER (ORDER BY {whatever id}) AS row
Then you can make your old query a derived table (it needs an alias) and filter on the row number:
SELECT TOP {results per page} *
FROM ({your previous sql statement}) AS numbered
WHERE row > {page# * results per page}
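A concrete sketch of that pattern applied to the query above. @PageSize and @PageNumber are hypothetical parameters, and the SELECT list is trimmed to explicitly named columns because a CTE cannot expose duplicate column names (the original A.* and P.* would collide):
DECLARE @PageSize int = 100000, @PageNumber int = 0;
WITH n AS (
    SELECT A1.AccountID, A1.ParentAccountID, A1.Name
    FROM Account AS A1
    WHERE A1.ParentAccountID = 92
    UNION ALL
    SELECT A2.AccountID, A2.ParentAccountID, A2.Name
    FROM Account AS A2
    JOIN n ON A2.ParentAccountID = n.AccountID
),
numbered AS (
    SELECT n.AccountID, n.ParentAccountID, n.Name,
           D.DeviceID, A.AuditID, A.AuditDate,
           P.DeviceID AS AuditedDeviceID,
           ROW_NUMBER() OVER (ORDER BY n.AccountID ASC, P.DeviceID ASC, A.AuditDate DESC) AS row
    FROM n
    LEFT OUTER JOIN Device AS D ON D.AccountID = n.AccountID
    LEFT OUTER JOIN Audit AS A ON A.AccountID = n.AccountID
    RIGHT OUTER JOIN DeviceAudit AS P ON P.AuditID = A.AuditID
    WHERE A.AuditDate > CAST('2013-03-11' AS DATETIME)
)
SELECT TOP (@PageSize) *
FROM numbered
WHERE row > @PageNumber * @PageSize
ORDER BY row;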
