Possible causes slow order by on sql server statement - sql-server

I have the next query which returns 1550 rows.
SELECT *
FROM V_InventoryMovements -- 2 seconds
ORDER BY V_InventoryMovements.TransDate -- 23 seconds
It takes about 2 seconds to return the results.
But when I include the ORDER BY clause, then it takes about 23 seconds.
It is a BIG change just for adding an ORDER BY.
I would like to know what is happening, and a way to improve the query with the ORDER BY. To quit the ORDER BY should not be the solution.
Here a bit of information, please let me know if you need more info.
V_InventoryMovements
CREATE VIEW [dbo].[V_InventoryMovements]
AS
SELECT some_fields
FROM FinTime
RIGHT OUTER JOIN V_Outbound ON FinTime.StdDate = dbo.TruncateDate(V_Outbound.TransDate)
LEFT OUTER JOIN ReasonCode_Grouping ON dbo.V_Outbound.ReasonCode = dbo.ReasonCode_Grouping.ReasonCode
LEFT OUTER JOIN Items ON V_Outbound.ITEM = Items.Item
LEFT OUTER JOIN FinTime ON V_Outbound.EventDay = FinTime.StdDate
V_Outbound
CREATE VIEW [dbo].[V_Outbound]
AS
SELECT V_Outbound_WMS.*
FROM V_Outbound_WMS
UNION
SELECT V_Transactions_Calc.*
FROM V_Transactions_Calc
V_OutBound_WMS
CREATE VIEW [dbo].[V_OutBound_WMS]
AS
SELECT some_fields
FROM Transaction_Log
INNER JOIN MFL_StartDate ON Transaction_Log.TransDate >= MFL_StartDate.StartDate
LEFT OUTER JOIN Rack ON Transaction_Log.CHARGE = Rack.CHARGE AND Transaction_Log.CHARGE_LFD = Rack.CHARGE_LFD
V_Transactions_Calc
CREATE VIEW [dbo].[V_Transactions_Calc]
AS
SELECT some_fields
FROM Transactions_Calc
INNER JOIN MFL_StartDate ON dbo.Transactions_Calc.EventDay >= dbo.MFL_StartDate.StartDate
And here I will also share a part of the execution plan (the part where you can see the main cost). I don't know exactly how to read it and improve the query. Let me know if you need to see the rest of the execution plan. But all the other parts are 0% of Cost. The main Cost is in the: Nested Loops (Left Outer Join) Cost 95%.
Execution Plan With ORDER BY
Execution Plan Without ORDER BY

I think the short answer is that the optimizer is executing in a different order in an attempt to minimize the cost of the sorting, and doing a poor job. Its job is made very hard by the views within views within views, as GuidoG suggests. You might be able to convince it to execute differently by creating some additional index or statistics, but its going to be hard to advise on that remotely.
A possible workaround might be to select into a temp table, then apply the ordering afterwards:
SELECT *
INTO #temp
FROM V_InventoryMovements;
SELECT *
FROM #temp
ORDER BY TransDate

Related

Trying to find a solution to long running SQL code where I think NESTED SQL statement is the culprit

I have a SQL statement that has a weird 2nd nested SQL statement that I think is causing this query to run for 6+ min and any suggestions/help would be appreciated. I tried creating a TEMP table for the values in the nested SQL statement and just do a simple join but there is nothing to join on in the SQL code so that is why they used a 1=1 in the ON statement for the join. Here is the SQL code:
Declare #TransactionEndDate datetime;
Select #TransactionEndDate = lastmonth_end from dbo.DTE_udfCommonDates(GETDATE());
Select ''''+TreatyName as Treaty,
cast(EndOfMonth as Date) as asOfDate,
Count(Distinct ClaimSysID) as ClaimCount,
Count(Distinct FeatureSysID) as FeatureCount,
Sum(OpenReserve) as OpenReserve
From (
Select
TreatyName,
EndOfMonth,
dbo.CMS_Claims.ClaimSysID,
FeatureSysID,
sum(IW_glGeneralLedger.TransactionAmount)*-1 as OpenReserve
From dbo.CMS_Claims
Inner Join dbo.CMS_Claimants
On dbo.CMS_Claims.ClaimSysID = dbo.CMS_Claimants.ClaimSysID
Inner Join dbo.CMS_Features
On dbo.CMS_Features.ClaimantSysID = dbo.CMS_Claimants.ClaimantSysID
Left Join dbo.IW_glGeneralLedger
On IW_glGeneralLedger.FeatureID = dbo.CMS_Features.FeatureSysID
Left Join dbo.IW_glSubChildAccount
On dbo.IW_glSubChildAccount.glSubChildAccountID = dbo.IW_glGeneralLedger.glSubChildAccountSysID
Left Join dbo.IW_glAccountGroup
On dbo.IW_glAccountGroup.glAccountGroupID = dbo.IW_glSubChildAccount.glAccountGroupSysID
Left Join dbo.IW_BankRegister
On dbo.IW_BankRegister.BankRegisterSysID = dbo.IW_glGeneralLedger.BankRegisterID
Left Join dbo.IW_BankRegisterStatus
On dbo.IW_BankRegisterStatus.BankRegisterStatusSysID = dbo.IW_BankRegister.BankRegisterStatusID
**Left Join (Select Distinct dbo.DTE_get_month_end(dt) as EndOfMonth
From IW_Calendar
Where dt Between '3/1/2004'
and #TransactionEndDate) as dates
on 1=1**
Left Join dbo.IW_ReinsuranceTreaty
On dbo.IW_ReinsuranceTreaty.TreatySysID = IW_glGeneralLedger.PolicyTreatyID
Where dbo.IW_glGeneralLedger.TransactionDate Between '1/1/2004 00:00:00' And EndOfMonth
And dbo.IW_glAccountGroup.Code In ('RESERVEINDEMNITY')
And (
(dbo.IW_glGeneralLedger.BankRegisterID Is Null)
Or (
(IW_BankRegister.PrintedDate Between '1/1/2004 00:00:00' And EndOfMonth Or dbo.IW_glGeneralLedger.BankRegisterID = 0)
And
(dbo.IW_BankRegisterStatus.EnumValue In ('Approved','Outstanding','Cleared','Void') Or dbo.IW_glGeneralLedger.BankRegisterID = 0))
)
Group By TreatyName, dbo.CMS_Claims.ClaimSysID, FeatureSysID, EndOfMonth
Having sum(IW_glGeneralLedger.TransactionAmount) <> 0
) As Data
Group By TreatyName,EndOfMonth
Order By EndOfMonth, TreatyName
This nested SQL code only provides a table of End of Month values in one column called EndOfMonth and this is what I'm trying to fix:
Select Distinct dbo.DTE_get_month_end(dt) as EndOfMonth
From IW_Calendar
Where dt Between '3/1/2004'
and #TransactionEndDate
Please use the below methods to increase the query performance.
Use temporary tables. ( load relevant data into temporary tables with necessary where conditions and then join).
Use clustered and non clustered indexes to your tables.
Create Multiple-Column Indexes.
Index the ORDER-BY / GROUP-BY / DISTINCT Columns for Better Response Time.
Use Parameterized Queries.
Use query hints accordingly.
NOLOCK: In the event that data is locked, this tells SQL Server to read data from the last known value available, also known as a dirty read. Since it is possible to use some old values and some new values, data sets can contain inconsistencies. Do not use this in any place in which data quality is important.
RECOMPILE: Adding this to the end of a query will result in a new execution plan being generated each time this query executed. This should not be used on a query that is executed often, as the cost to optimize a query is not trivial. For infrequent reports or processes, though, this can be an effective way to avoid undesired plan reuse. This is often used as a bandage when statistics are out of date or parameter sniffing is occurring.
MERGE/HASH/LOOP: This tells the query optimizer to use a specific type of join as part of a join operation. This is super-risky as the optimal join will change as data, schema, and parameters evolve over time. While this may fix a problem right now, it will introduce an element of technical debt that will remain for as long as the hint does.
OPTIMIZE FOR: Can specify a parameter value to optimize the query for. This is often used when we want performance to be controlled for a very common use case so that outliers do not pollute the plan cache. Similar to join hints, this is fragile and when business logic changes, this hint usage may become obsolete.

SQL Query is taking infinite time when using with order by

Below is my SQL Query
select top(10) ClientCode
FROM (((Branch INNER JOIN BusinessLocation ON
Branch.BranchCode=BusinessLocation.BranchCode)
INNER JOIN Center ON BusinessLocation.LocationCode = Center.LocationCode)
INNER JOIN Groups ON Center.CenterCode = Groups.CenterCode)
INNER JOIN Client ON Groups.GroupCode = Client.GroupCode
WHERE
((Client.CBStatus) IS NULL) AND ((Branch.PartnerName) in
('SVCL','Edelweiss'))
order by Client.ClientCode DESC
When i run it without order by it runs fine , but with order by it is not finishing execution. Why is this behavior ?
When you select using TOP statement, calculations and joins for every row are not necessarily calculated. When you try to order, at least one cell for all rows need to be calculated. It is a long query because your table is large and the behavior is not faulty. Don't let the fast running query without the order by mislead you about the complexity of your second query.
You can create an index on clientcode column. That would speed things up.

Force joined view not to be optimized

I have a somewhat complex view which includes a join to another view. For some reason the generated query plan is highly inefficient. The query runs for many hours. However if I select the sub-view into a temporary table first and then join with this, the same query finished in a few minutes.
My question is: Is there some kind of query hint or other trick which will force the optimizer to execute the joined sub-view in isolation before performing the join, just as when using a temp table? Clearly the default strategy chosen by the optimizer is not optimal.
I cannot use the temporary table-trick since views does not allow temporary tables. I understand I could probably rewrite everything to a stored procedure, but that would break composeability of views, and it seems also like bad for maintenance to rewrite everything just to trick the optimizer to not use a bad optimization.
Adam Machanic explained one such way at a SQL Saturday I recently attended. The presentation was called Clash of the Row Goals. The method involves using a TOP X at the beginning of the sub-select. He explained that when doing a TOP X, the query optimizer assumes it is more efficient to grab the TOP X rows one at a time. As long as you set X as a sufficiently large number (limit of INT or BIGINT?), the query will always get the correct results.
So one example that Adam provided:
SELECT
x.EmployeeId,
y.totalWorkers
FROM HumanResources.Employee AS x
INNER JOIN
(
SELECT
y0.ManagerId,
COUNT(*) AS totalWorkers
FROM HumanResources.Employee AS y0
GROUP BY
y0.ManagerId
) AS y ON
y.ManagerId = x.ManagerId
becomes:
SELECT
x.EmployeeId,
y.totalWorkers
FROM HumanResources.Employee AS x
INNER JOIN
(
SELECT TOP(2147483647)
y0.ManagerId,
COUNT(*) AS totalWorkers
FROM HumanResources.Employee AS y0
GROUP BY
y0.ManagerId
) AS y ON
y.ManagerId = x.ManagerId
It is a super cool trick and very useful.
When things get messy the query optimize often resorts to loop joins
If materializing to a temp fixed it then most likely that is the problem
The optimizer often does not deal with views very well
I would rewrite you view to not uses views
Join Hints (Transact-SQL)
You may be able to use these hints on views
Try merge and hash
Try changing the order of join
Move condition into the join whenever possible
select *
from table1
join table2
on table1.FK = table2.Key
where table2.desc = 'cat1'
should be
select *
from table1
join table2
on table1.FK = table2.Key
and table2.desc = 'cat1'
Now the query optimizer will get that correct but as the query gets more complex the query optimize goes into what I call stupid mode and loop joins. But that is also done to protect the server and have as little in memory as possible.

Why is this CTE so much slower than using temp tables?

We had an issue since a recent update on our database (I made this update, I am guilty here), one of the query used was much slower since then. I tried to modify the query to get faster result, and managed to achieve my goal with temp tables, which is not bad, but I fail to understand why this solution performs better than a CTE based one, which does the same queries. Maybe it has to do that some tables are in a different DB ?
Here's the query that performs badly (22 minutes on our hardware) :
WITH CTE_Patterns AS (
SELECT
PEL.iId_purchased_email_list,
PELE.sEmail
FROM OtherDb.dbo.Purchased_Email_List PEL WITH(NOLOCK)
INNER JOIN OtherDb.dbo.Purchased_Email_List_Email AS PELE WITH(NOLOCK) ON PELE.iId_purchased_email_list = PEL.iId_purchased_email_list
WHERE PEL.bPattern = 1
),
CTE_Emails AS (
SELECT
ILE.iId_newsletterservice_import_list,
ILE.iId_newsletterservice_import_list_email,
ILED.sEmail
FROM dbo.NewsletterService_import_list_email AS ILE WITH(NOLOCK)
INNER JOIN dbo.NewsletterService_import_list_email_distinct AS ILED WITH(NOLOCK) ON ILED.iId_newsletterservice_import_list_email_distinct = ILE.iId_newsletterservice_import_list_email_distinct
WHERE ILE.iId_newsletterservice_import_list = 1000
)
SELECT I.iId_newsletterservice_import_list,
I.iId_newsletterservice_import_list_email,
BL.iId_purchased_email_list
FROM CTE_Patterns AS BL WITH(NOLOCK)
INNER JOIN CTE_Emails AS I WITH(NOLOCK) ON I.sEmail LIKE BL.sEmail
When running both CTE queries separately, it's super fast (0 secs in SSMS, returns 122 rows and 13k rows), when running the full query, with INNER JOIN on sEmail, it's super slow (22 minutes)
Here's the query that performs well, with temp tables (0 sec on our hardware) and which does the eaxct same thing, returns the same result :
SELECT
PEL.iId_purchased_email_list,
PELE.sEmail
INTO #tb1
FROM OtherDb.dbo.Purchased_Email_List PEL WITH(NOLOCK)
INNER JOIN OtherDb.dbo.Purchased_Email_List_Email PELE ON PELE.iId_purchased_email_list = PEL.iId_purchased_email_list
WHERE PEL.bPattern = 1
SELECT
ILE.iId_newsletterservice_import_list,
ILE.iId_newsletterservice_import_list_email,
ILED.sEmail
INTO #tb2
FROM dbo.NewsletterService_import_list_email AS ILE WITH(NOLOCK)
INNER JOIN dbo.NewsletterService_import_list_email_distinct AS ILED ON ILED.iId_newsletterservice_import_list_email_distinct = ILE.iId_newsletterservice_import_list_email_distinct
WHERE ILE.iId_newsletterservice_import_list = 1000
SELECT I.iId_newsletterservice_import_list,
I.iId_newsletterservice_import_list_email,
BL.iId_purchased_email_list
FROM #tb1 AS BL WITH(NOLOCK)
INNER JOIN #tb2 AS I WITH(NOLOCK) ON I.sEmail LIKE BL.sEmail
DROP TABLE #tb1
DROP TABLE #tb2
Tables stats :
OtherDb.dbo.Purchased_Email_List : 13 rows, 2 rows flagged bPattern = 1
OtherDb.dbo.Purchased_Email_List_Email : 324289 rows, 122 rows with patterns (which are used in this issue)
dbo.NewsletterService_import_list_email : 15.5M rows
dbo.NewsletterService_import_list_email_distinct ~1.5M rows
WHERE ILE.iId_newsletterservice_import_list = 1000 retrieves ~ 13k rows
I can post more info about tables on request.
Can someone help me understand this ?
UPDATE
Here is the query plan for the CTE query :
Here is the query plan with temp tables :
As you can see in the query plan, with CTEs, the engine reserves the right to apply them basically as a lookup, even when you want a join.
If it isn't sure enough it can run the whole thing independently, in advance, essentially generating a temp table... let's just run it once for each row.
This is perfect for the recursion queries they can do like magic.
But you're seeing - in the nested Nested Loops - where it can go terribly wrong.
You're already finding the answer on your own by trying the real temp table.
Parallelism. If you noticed in your TEMP TABLE query, the 3rd Query indicates Parallelism in both distributing and gathering the work of the 1st Query. And Parallelism when combining the results of the 1st and 2nd Query. The 1st Query also incidentally has a relative cost of 77%. So the Query Engine in your TEMP TABLE example was able to determine that the 1st Query can benefit from Parallelism. Especially when the Parallelism is Gather Stream and Distribute Stream, so its allowing the divying up of work (join) because the data is distributed in such a way that allows for divying up the work then recombining. Notice the cost of the 2nd Query is 0% so you can ignore that as no cost other than when it needs to be combined.
Looking at the CTE, that is entirely processed Serially and not in Parallel. So somehow with the CTE it could not figure out the 1st Query can be run in Parallel, as well as the relationship of the 1st and 2nd query. Its possible that with multiple CTE expressions it assumes some dependency and did not look ahead far enough.
Another test you can do with the CTE is keep the CTE_Patterns but eliminate the CTE_Emails by putting that as a "subquery derived" table to the 3rd Query in the CTE. It would be curious to see the Execution Plan, and see if there is Parallelism when expressed that way.
In my experience it's best to use CTE's for recursion and temp tables when you need to join back to the data. Makes for a much faster query typically.

Why is this non-correlated query so slow?

I have this query...
SELECT Distinct([TargetAttributeID]) FROM
(SELECT distinct att1.intAttributeID as [TargetAttributeID]
FROM AST_tblAttributes att1
INNER JOIN
AST_lnkProfileDemandAttributes pda
ON pda.intAttributeID=att1.intAttributeID AND pda.intProfileID = #intProfileID
union all
SELECT distinct ca2.intAttributeID as [TargetAttributeID] FROM
AST_lnkCapturePolicyAttributes ca2
INNER JOIN
AST_lnkEmployeeCapture ec2 ON ec2.intAdminCaptureID = ca2.intAdminCaptureID AND ec2.intTeamID = 57
WHERE ec2.dteCreatedDate >= #cutoffdate) x
Execution Plan for the above query
The two inner distincts are looking at 32 and 10,000 rows respectively. This query returns 5 rows and executes in under 1 second.
If I then use the result of this query as the subject of an IN like so...
SELECT attx.intAttributeID,attx.txtAttributeName,attx.txtAttributeLabel,attx.txtType,attx.txtEntity FROM
AST_tblAttributes attx WHERE attx.intAttributeID
IN
(SELECT Distinct([TargetAttributeID]) FROM
(SELECT Distinct att1.intAttributeID as [TargetAttributeID]
FROM AST_tblAttributes att1
INNER JOIN
AST_lnkProfileDemandAttributes pda
ON pda.intAttributeID=att1.intAttributeID AND pda.intProfileID = #intProfileID
union all
SELECT Distinct ca2.intAttributeID as [TargetAttributeID] FROM
AST_lnkCapturePolicyAttributes ca2
INNER JOIN
AST_lnkEmployeeCapture ec2 ON ec2.intAdminCaptureID = ca2.intAdminCaptureID AND ec2.intTeamID = 57
WHERE ec2.dteCreatedDate >= #cutoffdate) x)
Execution Plan for the above query
Then it takes over 3 minutes! If I just take the result of the query and perform the IN "manually" then again it comes back extremely quickly.
However if I remove the two inner DISTINCTS....
SELECT attx.intAttributeID,attx.txtAttributeName,attx.txtAttributeLabel,attx.txtType,attx.txtEntity FROM
AST_tblAttributes attx WHERE attx.intAttributeID
IN
(SELECT Distinct([TargetAttributeID]) FROM
(SELECT att1.intAttributeID as [TargetAttributeID]
FROM AST_tblAttributes att1
INNER JOIN
AST_lnkProfileDemandAttributes pda
ON pda.intAttributeID=att1.intAttributeID AND pda.intProfileID = #intProfileID
union all
SELECT ca2.intAttributeID as [TargetAttributeID] FROM
AST_lnkCapturePolicyAttributes ca2
INNER JOIN
AST_lnkEmployeeCapture ec2 ON ec2.intAdminCaptureID = ca2.intAdminCaptureID AND ec2.intTeamID = 57
WHERE ec2.dteCreatedDate >= #cutoffdate) x)
Execution Plan for the above query
..then it comes back in under a second.
What is SQL Server thinking? Can it not figure out that it can perform the two sub-queries and use the result as the subject of the IN. It seems as slow as a correlated sub-query, but it isn't correlated!!!
In Show Estimate Execution plan there are three Clustered Index Scans each with a cost of 100%! (Execution Plan is here)
Can anyone tell me why the inner DISTINCTS make this query so much slower (but only when used as the subject of an IN...) ?
UPDATE
Sorry it's taken me a while to get these execution plans up...
Query 1
Query 2 (The slow one)
Query 3 - No Inner Distincts
Honestly I think it comes down to the fact that, in terms of relational operators, you have a gratuitously baroque query there, and SQL Server stops searching for alternate execution plans within the time it allows itself to find one.
After the parse and bind phase of plan compilation, SQL Server will apply logical transforms to the resulting tree, estimate the cost of each, and choose the one with the lowest cost. It doesn't exhaust all possible transformations, just as many as it can compute within a given window. So presumably, it has burned through that window before it arrives at a good plan, and it's the addition of the outer semi-self-join on AST_tblAttributes that pushed it over the edge.
How is it gratuitously baroque? Well, first off, there's this (simplified for noise reduction):
select distinct intAttributeID from (
select distinct intAttributeID from AST_tblAttributes ....
union all
select distinct intAttributeID from AST_tblAttributes ....
)
Concatenating two sets, and projecting the unique elements? Turns out there's operator for that, it's called UNION. So given enough time during plan compilation and enough logical transformations, SQL Server will realize what you really mean is:
select intAttributeID from AST_tblAttributes ....
union
select intAttributeID from AST_tblAttributes ....
But wait, you put this in a correlated subquery. Well, a correlated subquery is a semi-join, and the right relation does not require logical dedupping in a semi-join. So SQL Server may logically rewrite the query as this:
select * from AST_tblAttributes
where intAttributeID in (
select intAttributeID from AST_tblAttributes ....
union all
select intAttributeID from AST_tblAttributes ....
)
And then go about physical plan selection. But to get there, it has to see though the cruft first, and that may fall outside the optimization window.
EDIT:
Really, the way to explore this for yourself, and corroborate the speculation above, is to put both versions of the query in the same window and compare estimated execution plans side-by-side (Ctrl-L in SSMS). Leave one as is, edit the other, and see what changes.
You will see that some alternate forms are recognized as logically equivalent and generate to the same good plan, and others generate less optimal plans, as you bork the optimizer.**
Then, you can use SET STATISTICS IO ON and SET STATISTICS TIME ON to observe the actual amount of work SQL Server performs to execute the queries:
SET STATISTICS IO ON
SET STATISTICS TIME ON
SELECT ....
SELECT ....
SET STATISTICS IO OFF
SET STATISTICS TIME OFF
The output will appear in the messages pane.
** Or not--if they all generate the same plan, but actual execution time still varies like you say, something else may be going on--it's not unheard of. Try comparing actual execution plans and go from there.
El Ronnoco
First of all a possible explanation:
You say that: "This query returns 5 rows and executes in under 1 second.". But how many rows does it ESTIMATE are returned? If the estimate is very much off, using the query as part of the IN part could cause you to scan the entire: AST_tblAttributes in the outer part, instead of index seeking it (which could explain the big difference)
If you shared the query plans for the different variants (as a file, please), I think I should be able to get you an idea of what is going on under the hood here. It would also allow us to validate the explanation.
Edit: each DISTINCT keyword adds a new Sort node to your query plan. Basically, by having those other DISTINCTs in there, you're forcing SQL to re-sort the entire table again and again to make sure that it isn't returning duplicates. Each such operation can quadruple the cost of the query. Here's a good review of the effects that the DISTINCT operator can have, intended an unintended. I've been bitten by this, myself.
Are you using SQL 2008? If so, you can try this, putting the DISTINCT work into a CTE and then joining to your main table. I've found CTEs to be pretty fast:
WITH DistinctAttribID
AS
(
SELECT Distinct([TargetAttributeID])
FROM (
SELECT distinct att1.intAttributeID as [TargetAttributeID]
FROM AST_tblAttributes att1
INNER JOIN
AST_lnkProfileDemandAttributes pda
ON pda.intAttributeID=att1.intAttributeID AND pda.intProfileID = #intProfileID
UNION ALL
SELECT distinct ca2.intAttributeID as [TargetAttributeID] FROM
AST_lnkCapturePolicyAttributes ca2
INNER JOIN
AST_lnkEmployeeCapture ec2 ON ec2.intAdminCaptureID = ca2.intAdminCaptureID AND ec2.intTeamID = 57
WHERE ec2.dteCreatedDate >= #cutoffdate
) x
SELECT attx.intAttributeID,
attx.txtAttributeName,
attx.txtAttributeLabel,
attx.txtType,
attx.txtEntity
FROM AST_tblAttributes attx
JOIN DistinctAttribID attrib
ON attx.intAttributeID = attrib.TargetAttributeID

Resources