SQL Query is taking infinite time when using with order by - sql-server

Below is my SQL Query
select top(10) ClientCode
FROM (((Branch INNER JOIN BusinessLocation ON
Branch.BranchCode=BusinessLocation.BranchCode)
INNER JOIN Center ON BusinessLocation.LocationCode = Center.LocationCode)
INNER JOIN Groups ON Center.CenterCode = Groups.CenterCode)
INNER JOIN Client ON Groups.GroupCode = Client.GroupCode
WHERE
((Client.CBStatus) IS NULL) AND ((Branch.PartnerName) in
('SVCL','Edelweiss'))
order by Client.ClientCode DESC
When I run it without the ORDER BY it runs fine, but with the ORDER BY it never finishes executing. Why does it behave this way?

When you select with TOP and no ORDER BY, SQL Server does not have to evaluate the joins for every row; it can stop as soon as it has produced 10 rows. When you add an ORDER BY, it has to determine the ClientCode of every qualifying row before it can tell which 10 come first. The query takes a long time because your tables are large; the behavior is not a fault. Don't let the fast query without the ORDER BY mislead you about the cost of the second one.
You can create an index on the ClientCode column. That should speed things up.
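A minimal sketch of such an index, assuming the table and column names from the question. The index name is hypothetical, and the INCLUDE list is an assumption about which other Client columns the query touches; whether the optimizer actually uses the index depends on the data and the rest of the plan.

```sql
-- Hypothetical index name; Client, ClientCode, GroupCode and CBStatus come from the question.
-- Keying on ClientCode lets TOP (10) ... ORDER BY ClientCode DESC read the index
-- from its high end instead of sorting every joined row.
CREATE NONCLUSTERED INDEX IX_Client_ClientCode
    ON Client (ClientCode DESC)
    INCLUDE (GroupCode, CBStatus);  -- assumed: the join and filter columns used by the query
```

After creating it, compare the execution plans with and without the ORDER BY to see whether the Sort operator has disappeared.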

Related

Possible causes slow order by on sql server statement

I have the following query, which returns 1550 rows.
SELECT *
FROM V_InventoryMovements -- 2 seconds
ORDER BY V_InventoryMovements.TransDate -- 23 seconds
It takes about 2 seconds to return the results.
But when I include the ORDER BY clause, it takes about 23 seconds.
That is a big change just for adding an ORDER BY.
I would like to know what is happening and how to improve the query while keeping the ORDER BY. Removing the ORDER BY is not an option.
Here is a bit of information; please let me know if you need more.
V_InventoryMovements
CREATE VIEW [dbo].[V_InventoryMovements]
AS
SELECT some_fields
FROM FinTime
RIGHT OUTER JOIN V_Outbound ON FinTime.StdDate = dbo.TruncateDate(V_Outbound.TransDate)
LEFT OUTER JOIN ReasonCode_Grouping ON dbo.V_Outbound.ReasonCode = dbo.ReasonCode_Grouping.ReasonCode
LEFT OUTER JOIN Items ON V_Outbound.ITEM = Items.Item
LEFT OUTER JOIN FinTime ON V_Outbound.EventDay = FinTime.StdDate
V_Outbound
CREATE VIEW [dbo].[V_Outbound]
AS
SELECT V_Outbound_WMS.*
FROM V_Outbound_WMS
UNION
SELECT V_Transactions_Calc.*
FROM V_Transactions_Calc
V_OutBound_WMS
CREATE VIEW [dbo].[V_OutBound_WMS]
AS
SELECT some_fields
FROM Transaction_Log
INNER JOIN MFL_StartDate ON Transaction_Log.TransDate >= MFL_StartDate.StartDate
LEFT OUTER JOIN Rack ON Transaction_Log.CHARGE = Rack.CHARGE AND Transaction_Log.CHARGE_LFD = Rack.CHARGE_LFD
V_Transactions_Calc
CREATE VIEW [dbo].[V_Transactions_Calc]
AS
SELECT some_fields
FROM Transactions_Calc
INNER JOIN MFL_StartDate ON dbo.Transactions_Calc.EventDay >= dbo.MFL_StartDate.StartDate
Here is also a part of the execution plan (the part where you can see the main cost). I don't know exactly how to read it or how to use it to improve the query. Let me know if you need to see the rest of the execution plan, but all the other parts are 0% of the cost. The main cost is in the Nested Loops (Left Outer Join) operator: 95%.
Execution Plan With ORDER BY
Execution Plan Without ORDER BY
I think the short answer is that the optimizer is executing in a different order in an attempt to minimize the cost of the sorting, and doing a poor job. Its job is made very hard by the views within views within views, as GuidoG suggests. You might be able to convince it to execute differently by creating some additional index or statistics, but it's going to be hard to advise on that remotely.
A possible workaround might be to select into a temp table, then apply the ordering afterwards:
SELECT *
INTO #temp
FROM V_InventoryMovements;
SELECT *
FROM #temp
ORDER BY TransDate

Trying to find a solution to long running SQL code where I think NESTED SQL statement is the culprit

I have a SQL statement with an odd second nested SELECT that I think is causing the query to run for 6+ minutes; any suggestions or help would be appreciated. I tried creating a temp table for the values in the nested SELECT and doing a simple join, but there is nothing to join on in the SQL code, which is why the original authors used 1=1 in the ON clause of the join. Here is the SQL code:
Declare @TransactionEndDate datetime;
Select @TransactionEndDate = lastmonth_end from dbo.DTE_udfCommonDates(GETDATE());
Select ''''+TreatyName as Treaty,
cast(EndOfMonth as Date) as asOfDate,
Count(Distinct ClaimSysID) as ClaimCount,
Count(Distinct FeatureSysID) as FeatureCount,
Sum(OpenReserve) as OpenReserve
From (
Select
TreatyName,
EndOfMonth,
dbo.CMS_Claims.ClaimSysID,
FeatureSysID,
sum(IW_glGeneralLedger.TransactionAmount)*-1 as OpenReserve
From dbo.CMS_Claims
Inner Join dbo.CMS_Claimants
On dbo.CMS_Claims.ClaimSysID = dbo.CMS_Claimants.ClaimSysID
Inner Join dbo.CMS_Features
On dbo.CMS_Features.ClaimantSysID = dbo.CMS_Claimants.ClaimantSysID
Left Join dbo.IW_glGeneralLedger
On IW_glGeneralLedger.FeatureID = dbo.CMS_Features.FeatureSysID
Left Join dbo.IW_glSubChildAccount
On dbo.IW_glSubChildAccount.glSubChildAccountID = dbo.IW_glGeneralLedger.glSubChildAccountSysID
Left Join dbo.IW_glAccountGroup
On dbo.IW_glAccountGroup.glAccountGroupID = dbo.IW_glSubChildAccount.glAccountGroupSysID
Left Join dbo.IW_BankRegister
On dbo.IW_BankRegister.BankRegisterSysID = dbo.IW_glGeneralLedger.BankRegisterID
Left Join dbo.IW_BankRegisterStatus
On dbo.IW_BankRegisterStatus.BankRegisterStatusSysID = dbo.IW_BankRegister.BankRegisterStatusID
Left Join (Select Distinct dbo.DTE_get_month_end(dt) as EndOfMonth
From IW_Calendar
Where dt Between '3/1/2004'
and @TransactionEndDate) as dates
on 1=1
Left Join dbo.IW_ReinsuranceTreaty
On dbo.IW_ReinsuranceTreaty.TreatySysID = IW_glGeneralLedger.PolicyTreatyID
Where dbo.IW_glGeneralLedger.TransactionDate Between '1/1/2004 00:00:00' And EndOfMonth
And dbo.IW_glAccountGroup.Code In ('RESERVEINDEMNITY')
And (
(dbo.IW_glGeneralLedger.BankRegisterID Is Null)
Or (
(IW_BankRegister.PrintedDate Between '1/1/2004 00:00:00' And EndOfMonth Or dbo.IW_glGeneralLedger.BankRegisterID = 0)
And
(dbo.IW_BankRegisterStatus.EnumValue In ('Approved','Outstanding','Cleared','Void') Or dbo.IW_glGeneralLedger.BankRegisterID = 0))
)
Group By TreatyName, dbo.CMS_Claims.ClaimSysID, FeatureSysID, EndOfMonth
Having sum(IW_glGeneralLedger.TransactionAmount) <> 0
) As Data
Group By TreatyName,EndOfMonth
Order By EndOfMonth, TreatyName
This nested SQL code only provides a table of End of Month values in one column called EndOfMonth and this is what I'm trying to fix:
Select Distinct dbo.DTE_get_month_end(dt) as EndOfMonth
From IW_Calendar
Where dt Between '3/1/2004'
and @TransactionEndDate
Try the following methods to improve query performance.
Use temporary tables (load the relevant data into temporary tables with the necessary WHERE conditions, then join).
Add clustered and nonclustered indexes to your tables.
Create multiple-column indexes.
Index the ORDER BY / GROUP BY / DISTINCT columns for better response time.
Use parameterized queries.
Use query hints where appropriate:
NOLOCK: Tells SQL Server to read data without taking shared locks, so the query is not blocked by locked data and may read uncommitted changes, also known as a dirty read. Since a result can mix old values and new values, data sets can contain inconsistencies. Do not use this anywhere data quality is important.
RECOMPILE: Adding this to the end of a query causes a new execution plan to be generated each time the query is executed. It should not be used on a query that is executed often, as the cost of optimizing a query is not trivial. For infrequent reports or processes, though, it can be an effective way to avoid undesired plan reuse. It is often used as a bandage when statistics are out of date or parameter sniffing is occurring.
MERGE/HASH/LOOP: Tells the query optimizer to use a specific type of join for a join operation. This is very risky, as the optimal join will change as data, schema, and parameters evolve over time. While it may fix a problem right now, it introduces technical debt that remains for as long as the hint does.
OPTIMIZE FOR: Specifies a parameter value to optimize the query for. This is often used when we want performance to be tuned for a very common use case so that outliers do not pollute the plan cache. Like join hints, this is fragile: when business logic changes, the hint may become obsolete.
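The temporary-table suggestion could be applied to the month-end list from the question roughly as follows. This is a sketch, not a tested fix: `#MonthEnds` and the index name are hypothetical, the other object names are taken from the question, and the date literal assumes that '3/1/2004' means 1 March 2004.

```sql
-- Sketch only. #MonthEnds and IX_MonthEnds_EndOfMonth are hypothetical names.
DECLARE @TransactionEndDate datetime;
SELECT @TransactionEndDate = lastmonth_end
FROM dbo.DTE_udfCommonDates(GETDATE());

-- Evaluate the scalar function once, up front, instead of inside the big join.
SELECT DISTINCT dbo.DTE_get_month_end(dt) AS EndOfMonth
INTO #MonthEnds
FROM IW_Calendar
WHERE dt BETWEEN '20040301' AND @TransactionEndDate;  -- assumed to mean 1 March 2004

-- Gives the optimizer an ordered structure and real statistics for the date range predicates.
CREATE CLUSTERED INDEX IX_MonthEnds_EndOfMonth ON #MonthEnds (EndOfMonth);

-- In the main query, the derived table "Left Join (...) as dates on 1=1" would become:
--     CROSS JOIN #MonthEnds AS dates
-- and, for an infrequently run report, the statement could end with:
--     OPTION (RECOMPILE);

DROP TABLE #MonthEnds;
```

A join on 1=1 has no join column by design, so a CROSS JOIN against the temp table expresses the same thing. Because the WHERE clause already filters on EndOfMonth, rows with a NULL EndOfMonth would be discarded either way.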

Issue in sql server with join query

I have four tables that I am trying to join in a view in SQL Server. The join query works and retrieves data from the tables, but when I execute the same query repeatedly, SQL Server shows a different result every time.
SELECT DISTINCT
dbo.tbl_verifyFinger2.ID
, dbo.tbl_verifyCnicDetails.fID
, dbo.tbl_verifyCnicDetails.colGRName
, dbo.tbl_verifyFinger2.colCompanyID
, dbo.tbl_verifyAvailableFingers.colCNIC
, dbo.tbl_agent.agent_id
, dbo.tbl_agent.colIMSI
, dbo.tbl_verifyFinger2.colDate
, dbo.tbl_verifyFinger2.colStatusMessage
FROM dbo.tbl_verifyFinger2
INNER JOIN dbo.tbl_verifyCnicDetails
ON dbo.tbl_verifyFinger2.ID = dbo.tbl_verifyCnicDetails.fID
INNER JOIN dbo.tbl_verifyAvailableFingers
ON dbo.tbl_verifyFinger2.colCNIC = dbo.tbl_verifyAvailableFingers.colCNIC
INNER JOIN dbo.tbl_agent
ON dbo.tbl_verifyAvailableFingers.colIMSI = dbo.tbl_agent.colIMSI
SQL Server does not allow an ORDER BY clause inside a view (unless TOP, OFFSET or FOR XML is also specified, and even then the order of the rows returned from the view is not guaranteed). To get the rows in the same order every time, you must put the ORDER BY clause in your outer SELECT query, at the end of the query.
Of course, choose the columns in the ORDER BY clause carefully: the ordering must be deterministic, which guarantees that the sorted result is the same every time and that rows no longer move up and down between executions.
SELECT
*
FROM schema_name.view_name AS v
ORDER BY
v.column_name ASC -- or DESC; ASC is the default when the direction is omitted
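Applied to the query from the question, this might look like the sketch below. The table aliases and the choice of sort columns are assumptions; the idea is to add tie-breaker columns until no two rows can compare as equal. With SELECT DISTINCT, every ORDER BY column must also appear in the select list.

```sql
-- Sketch: aliases f, d, af, ag and the sort order are assumptions, not the asker's requirements.
SELECT DISTINCT
      f.ID
    , d.fID
    , d.colGRName
    , f.colCompanyID
    , af.colCNIC
    , ag.agent_id
    , ag.colIMSI
    , f.colDate
    , f.colStatusMessage
FROM dbo.tbl_verifyFinger2 AS f
INNER JOIN dbo.tbl_verifyCnicDetails AS d
    ON f.ID = d.fID
INNER JOIN dbo.tbl_verifyAvailableFingers AS af
    ON f.colCNIC = af.colCNIC
INNER JOIN dbo.tbl_agent AS ag
    ON af.colIMSI = ag.colIMSI
ORDER BY
      f.colDate DESC   -- primary sort (assumed: newest first)
    , f.ID             -- tie-breakers so that equal dates always sort the same way
    , ag.agent_id
    , d.colGRName;
```

If the listed columns still do not identify a row uniquely, extend the ORDER BY with further selected columns.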

Why is this CTE so much slower than using temp tables?

We have had an issue since a recent update to our database (I made this update, so I am the guilty one here): one of our queries has been much slower since then. I tried to modify the query to get faster results and managed to achieve my goal with temp tables, which is not bad, but I fail to understand why this solution performs better than a CTE-based one that runs the same queries. Maybe it has to do with the fact that some tables are in a different DB?
Here's the query that performs badly (22 minutes on our hardware):
WITH CTE_Patterns AS (
SELECT
PEL.iId_purchased_email_list,
PELE.sEmail
FROM OtherDb.dbo.Purchased_Email_List PEL WITH(NOLOCK)
INNER JOIN OtherDb.dbo.Purchased_Email_List_Email AS PELE WITH(NOLOCK) ON PELE.iId_purchased_email_list = PEL.iId_purchased_email_list
WHERE PEL.bPattern = 1
),
CTE_Emails AS (
SELECT
ILE.iId_newsletterservice_import_list,
ILE.iId_newsletterservice_import_list_email,
ILED.sEmail
FROM dbo.NewsletterService_import_list_email AS ILE WITH(NOLOCK)
INNER JOIN dbo.NewsletterService_import_list_email_distinct AS ILED WITH(NOLOCK) ON ILED.iId_newsletterservice_import_list_email_distinct = ILE.iId_newsletterservice_import_list_email_distinct
WHERE ILE.iId_newsletterservice_import_list = 1000
)
SELECT I.iId_newsletterservice_import_list,
I.iId_newsletterservice_import_list_email,
BL.iId_purchased_email_list
FROM CTE_Patterns AS BL WITH(NOLOCK)
INNER JOIN CTE_Emails AS I WITH(NOLOCK) ON I.sEmail LIKE BL.sEmail
When I run the two CTE queries separately, they are super fast (0 secs in SSMS, returning 122 rows and 13k rows). When I run the full query, with the INNER JOIN on sEmail, it is super slow (22 minutes).
Here's the query that performs well, with temp tables (0 sec on our hardware), and which does the exact same thing and returns the same result:
SELECT
PEL.iId_purchased_email_list,
PELE.sEmail
INTO #tb1
FROM OtherDb.dbo.Purchased_Email_List PEL WITH(NOLOCK)
INNER JOIN OtherDb.dbo.Purchased_Email_List_Email PELE ON PELE.iId_purchased_email_list = PEL.iId_purchased_email_list
WHERE PEL.bPattern = 1
SELECT
ILE.iId_newsletterservice_import_list,
ILE.iId_newsletterservice_import_list_email,
ILED.sEmail
INTO #tb2
FROM dbo.NewsletterService_import_list_email AS ILE WITH(NOLOCK)
INNER JOIN dbo.NewsletterService_import_list_email_distinct AS ILED ON ILED.iId_newsletterservice_import_list_email_distinct = ILE.iId_newsletterservice_import_list_email_distinct
WHERE ILE.iId_newsletterservice_import_list = 1000
SELECT I.iId_newsletterservice_import_list,
I.iId_newsletterservice_import_list_email,
BL.iId_purchased_email_list
FROM #tb1 AS BL WITH(NOLOCK)
INNER JOIN #tb2 AS I WITH(NOLOCK) ON I.sEmail LIKE BL.sEmail
DROP TABLE #tb1
DROP TABLE #tb2
Tables stats :
OtherDb.dbo.Purchased_Email_List : 13 rows, 2 rows flagged bPattern = 1
OtherDb.dbo.Purchased_Email_List_Email : 324289 rows, 122 rows with patterns (which are used in this issue)
dbo.NewsletterService_import_list_email : 15.5M rows
dbo.NewsletterService_import_list_email_distinct ~1.5M rows
WHERE ILE.iId_newsletterservice_import_list = 1000 retrieves ~ 13k rows
I can post more info about tables on request.
Can someone help me understand this ?
UPDATE
Here is the query plan for the CTE query :
Here is the query plan with temp tables :
As you can see in the query plan, with CTEs the engine reserves the right to apply them basically as a lookup, even when you want a join.
Unless it is sure enough that it can run the whole thing independently, in advance, essentially generating a temp table, it just runs the CTE once for each row.
This is perfect for the recursive queries that CTEs can do like magic.
But you're seeing, in the nested Nested Loops, where it can go terribly wrong.
You're already finding the answer on your own by trying a real temp table.
Parallelism. In your temp table plan, the 3rd query shows Parallelism operators both distributing and gathering the work of the 1st query, and again when combining the results of the 1st and 2nd queries. The 1st query also has a relative cost of 77%, so in the temp table example the query engine was able to determine that the 1st query can benefit from parallelism. With Distribute Streams and Gather Streams operators, the join work is divvied up and then recombined, which is possible because the data is distributed in a way that allows it. The cost of the 2nd query is 0%, so you can ignore it except where its results need to be combined.
The CTE query, by contrast, is processed entirely serially and not in parallel. Somehow, with the CTEs, the optimizer could not figure out that the 1st query can be run in parallel, nor the relationship between the 1st and 2nd queries. It's possible that with multiple CTE expressions it assumes some dependency and does not look ahead far enough.
Another test you can do is to keep CTE_Patterns but eliminate CTE_Emails by moving it into the final query as a derived table (a subquery in the FROM clause). It would be interesting to see the execution plan, and whether there is parallelism when the query is expressed that way.
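The suggested test could be written like this. It is only a rearrangement of the asker's own query, with CTE_Emails moved into a derived table; whether it changes the plan is exactly what the test is meant to find out.

```sql
-- Same tables, columns and filters as the original CTE query;
-- only CTE_Emails has been turned into the derived table I.
WITH CTE_Patterns AS (
    SELECT
        PEL.iId_purchased_email_list,
        PELE.sEmail
    FROM OtherDb.dbo.Purchased_Email_List AS PEL WITH (NOLOCK)
    INNER JOIN OtherDb.dbo.Purchased_Email_List_Email AS PELE WITH (NOLOCK)
        ON PELE.iId_purchased_email_list = PEL.iId_purchased_email_list
    WHERE PEL.bPattern = 1
)
SELECT I.iId_newsletterservice_import_list,
       I.iId_newsletterservice_import_list_email,
       BL.iId_purchased_email_list
FROM CTE_Patterns AS BL
INNER JOIN (
    SELECT
        ILE.iId_newsletterservice_import_list,
        ILE.iId_newsletterservice_import_list_email,
        ILED.sEmail
    FROM dbo.NewsletterService_import_list_email AS ILE WITH (NOLOCK)
    INNER JOIN dbo.NewsletterService_import_list_email_distinct AS ILED WITH (NOLOCK)
        ON ILED.iId_newsletterservice_import_list_email_distinct = ILE.iId_newsletterservice_import_list_email_distinct
    WHERE ILE.iId_newsletterservice_import_list = 1000
) AS I
    ON I.sEmail LIKE BL.sEmail;
```

Compare the actual execution plan of this variant with the two plans in the question, looking in particular for Parallelism operators and for the row estimates feeding the Nested Loops join.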
In my experience it's best to use CTEs for recursion and temp tables when you need to join back to the data. That typically makes for a much faster query.

Sql query optimization

I have a query that I want to execute as fast as possible.
Here it is:
select d.InvoiceDetailId,a.Fee,a.FeeTax
from InvoiceDetail d
LEFT JOIN InvoiceDetail a on a.AdjustDetailId = d.InvoiceDetailId
I put an ascending index on the AdjustDetailId column.
I then ran the query with 'Show Actual Execution Plan', and the estimated subtree cost (taken from the topmost SELECT node) was 2.07.
I then thought maybe I could do something to improve this, so I added a condition to the left join like so:
select d.InvoiceDetailId,a.Fee,a.FeeTax
from InvoiceDetail d
LEFT JOIN InvoiceDetail a on a.AdjustDetailId is not null
and a.AdjustDetailId = d.InvoiceDetailId
I re-ran it and got a subtree cost of 0.98, so I thought, great, I made it twice as fast. I then clicked show client statistics and executed both queries 4-5 times, and believe it or not the first query averaged out to be faster. I don't get it. By the way, the query returns 120K rows.
Any insight?
Maybe I get tainted results because of caching, but I don't know whether that is the case or how to reset the cache.
EDIT:
Okay I googled how to clear query cache so I added the following before the queries:
DBCC DROPCLEANBUFFERS
DBCC FREEPROCCACHE
I then ran each query 5 times and the first query was still a little faster(13%).
1st Query: Client Processing time: 239.4
2nd Query: Client Processing time: 290
So I guess the question is: why do you think that is? Could it be that when the table quadruples in size the second query will be faster? Or is the left join causing the query to hit the index twice, so that it will always be slower?
Please don't flame me, I'm just trying to get educated.
EDIT # 2:
I need to get all the InvoiceDetails, not just the adjusted ones hence the left join.
EDIT # 3:
The real problem I'm trying to solve with the query is to sum up all of the InvoiceDetail rows but at the same time adjust them as well. So ultimately it seems that the best query to perform is the following. I thought doing a join then adding the joined in table would be the only way but it seems that grouping by a conditional solves the problem most elegantly.
SELECT CASE WHEN AdjustDetailId IS NULL THEN InvoiceDetailId ELSE AdjustDetailId END AS InvoiceDetailId
,SUM(Fee + FeeTax) AS Fee
FROM dbo.InvoiceDetail d
GROUP BY CASE WHEN AdjustDetailId IS NULL THEN InvoiceDetailId ELSE AdjustDetailId END
Example: With the following rows
InvoiceDetailId|Fee|FeeTax|AdjustDetailId
1|300|0|NULL
2|-100|0|1
3|-50|0|1
4|250|0|NULL
My desire was to get the following:
InvoiceDetailId|Fee
1|150
4|250
Thanks everybody for your input.
If you want to make that query really fast, you need to
turn the LEFT JOIN into an INNER JOIN
make sure the InvoiceDetail.AdjustDetailId and InvoiceDetail.InvoiceDetailId are indexed
SELECT
d.InvoiceDetailId, a.Fee, a.FeeTax
FROM
dbo.InvoiceDetail d
INNER JOIN
dbo.InvoiceDetail a ON a.AdjustDetailId = d.InvoiceDetailId
Next, you need to make sure your statistics are up to date, so that the cost-based query optimizer can work properly.
To update the statistics, use the UPDATE STATISTICS (table) command; see the MSDN docs on UPDATE STATISTICS.
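For the table in the question, that could look like the following sketch. The `dbo` schema is taken from the queries above; the choice between default sampling and a full scan is an assumption that depends on table size.

```sql
-- Refresh all statistics on the table using the default sampling rate.
UPDATE STATISTICS dbo.InvoiceDetail;

-- Alternative: read every row for the most accurate statistics (slower on large tables).
UPDATE STATISTICS dbo.InvoiceDetail WITH FULLSCAN;
```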
I would have guessed that they would be the same (with the same execution plan), since a predicate like a.AdjustDetailId = d.InvoiceDetailId cannot be true if either side is null, so adding the IS NOT NULL condition is redundant. But maybe the query processor is executing additional unnecessary steps because of that extra predicate.
But what the other answer mentions is more important. Do you really need to output the rows that have no matching record (invoices without an adjusting invoice)? If not, change it to an inner join and it will speed up a lot.
If you really need them, however, you might try a union:
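Select d.InvoiceDetailId, a.Fee, a.FeeTax
From InvoiceDetail d
Join InvoiceDetail a
On a.AdjustDetailId = d.InvoiceDetailId
Union All
-- rows that no adjustment points at, which the outer join would return with nulls
Select d.InvoiceDetailId, null, null
From InvoiceDetail d
Where Not Exists (Select 1 From InvoiceDetail a
Where a.AdjustDetailId = d.InvoiceDetailId)
Which does the same thing without using an outer join...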
(It is debatable whether two queries with a union will run faster than the single outer join query.)
You only have 1 table in this query, right?
If you use
select InvoiceDetailId, Fee, FeeTax
from InvoiceDetail
That WILL get all the rows, not just the adjusted ones.
Assuming you are doing a self-join, and doing it for a good reason, I would index InvoiceDetailId and AdjustDetailId and see which index(es) the execution plan uses.
You could also try to "include" the Fee and FeeTax columns in your index; this will help a lot if the table is really wide.
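A covering index along those lines might be defined as in the sketch below. Both index names are hypothetical, and the filtered variant is an extra idea not mentioned in the answers; it requires SQL Server 2008 or later.

```sql
-- Hypothetical index name. The key column supports the join on AdjustDetailId;
-- the INCLUDE columns let the query read Fee and FeeTax from the index alone.
CREATE NONCLUSTERED INDEX IX_InvoiceDetail_AdjustDetailId
    ON dbo.InvoiceDetail (AdjustDetailId)
    INCLUDE (Fee, FeeTax);

-- Optional variant (an assumption, not from the answers): a filtered index that
-- stores only the adjustment rows, which keeps it small if most rows are not adjustments.
CREATE NONCLUSTERED INDEX IX_InvoiceDetail_AdjustDetailId_NotNull
    ON dbo.InvoiceDetail (AdjustDetailId)
    INCLUDE (Fee, FeeTax)
    WHERE AdjustDetailId IS NOT NULL;
```

Only one of the two is likely to be needed; check the execution plan to see which one the optimizer picks.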
For your queries, I can think of 3 different reasonable execution plans:
LOOP JOIN OUTER [a.AdjustDetailId = d.InvoiceDetailId]
    TABLE SCAN InvoiceDetail d
    TABLE SCAN InvoiceDetail a
HASH JOIN OUTER [a.AdjustDetailId = d.InvoiceDetailId]
    TABLE SCAN InvoiceDetail d
    TABLE SCAN InvoiceDetail a
LOOP JOIN OUTER
    HASH JOIN OUTER [x.AdjustDetailId = d.InvoiceDetailId] AS y
        TABLE SCAN InvoiceDetail d
        INDEX SEEK [InvoiceDetail, AdjustDetailId IS NOT NULL] x
    InvoiceDetail a [a.AdjustDetailId = y.AdjustDetailId]
Perhaps adding the IS NOT NULL condition makes the optimizer choose another one of the plans, it's hard to say.
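To compare the two variants by measured work rather than by estimated subtree cost (which is only the optimizer's estimate, not elapsed time), a test run could look like this sketch. It uses the first query from the question; the cache-clearing commands affect the whole instance, so it assumes a test server.

```sql
-- Test servers only: these commands flush caches for the whole instance.
CHECKPOINT;              -- write dirty pages so DROPCLEANBUFFERS can empty the buffer pool
DBCC DROPCLEANBUFFERS;   -- cold data cache
DBCC FREEPROCCACHE;      -- discard cached plans

SET STATISTICS IO ON;    -- logical and physical reads per table
SET STATISTICS TIME ON;  -- CPU and elapsed time per statement

select d.InvoiceDetailId, a.Fee, a.FeeTax
from InvoiceDetail d
LEFT JOIN InvoiceDetail a on a.AdjustDetailId = d.InvoiceDetailId;

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

Repeat the same steps with the second query and compare the reads and CPU time shown in the Messages tab. With 120K rows returned, client processing time is likely dominated by transferring and displaying the rows rather than by the join itself.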
