TSQL Query performance - sql-server

I have the following query which takes about 20s to complete.
declare @shoppingBasketID int
select @shoppingBasketID = [uid]
from shoppingBasket sb
where sb.requestID = 21918154 and sb.[status] > 0
select
ingredientGroup.shoppingBasketItemID as itemID,
ingredientGroup.[uid] as groupUID
from shoppingBasketItem item
left outer join shoppingBasketItemBundle itemBundle on itemBundle.primeMenuItemID = item.[uid]
left outer join shoppingBasketItem bundleItem on bundleItem.[uid] = isnull(itemBundle.linkMenuItemID, item.[uid])
left outer join shoppingBasketItemIngredientGroup ingredientGroup on ingredientGroup.shoppingBasketItemID = isnull(itemBundle.linkMenuItemID, item.[uid])
left outer join shoppingBasketItemIngredient ingredient on ingredient.shoppingBasketItemIngredientGroupID = ingredientGroup.[uid]
where item.shoppingBasketID = @shoppingBasketID
The 'shoppingBasketItemIngredient' table has 40 million rows.
When I change the last line to the following, the query returns the results almost instantly. (I moved the first select into the second select query.)
where item.shoppingBasketID = (select [uid] from shoppingBasket sb where sb.requestID = 21918154 and sb.[status] > 0)
Do you know why?

This is too long for a comment.
Queries in stored procedures are compiled the first time they are run, and the query plan is cached. So, if you test the stored procedure on an empty table, it might generate a bad query plan -- and that doesn't get updated automatically.
You can force a recompile at either the stored procedure level (WITH RECOMPILE) or the query level (OPTION (RECOMPILE)). Here is some documentation.
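As a sketch, the query-level form applied to the query from the question (the abbreviated select list stands in for the full join list above):

```sql
select
    ingredientGroup.shoppingBasketItemID as itemID,
    ingredientGroup.[uid] as groupUID
from shoppingBasketItem item
-- ... same joins as above ...
where item.shoppingBasketID = @shoppingBasketID
option (recompile); -- compile a fresh plan using the variable's current value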

You could add a query hint.
When a variable is used, the query optimizer may generate a slow execution plan, because it cannot see the variable's value at compile time.
It's easier for the query optimizer to calculate the optimal plan when a fixed value is used.
But by adding the right hint(s) it can go for a faster execution plan.
For example:
select
...
where item.shoppingBasketID = @shoppingBasketID
OPTION ( OPTIMIZE FOR (@shoppingBasketID UNKNOWN) );
The example uses UNKNOWN, but you can give a specific value instead.
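For completeness, a sketch of the value form (the literal 12345 is a placeholder, not taken from the question):

```sql
select
...
where item.shoppingBasketID = @shoppingBasketID
OPTION ( OPTIMIZE FOR (@shoppingBasketID = 12345) );
```

This builds the plan as if the variable always held 12345, which helps when one value is representative of most executions.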

Related

Trying to find a solution to long running SQL code where I think NESTED SQL statement is the culprit

I have a SQL statement that has a weird 2nd nested SQL statement that I think is causing this query to run for 6+ minutes, and any suggestions/help would be appreciated. I tried creating a TEMP table for the values in the nested SQL statement and just doing a simple join, but there is nothing to join on in the SQL code, so that is why they used a 1=1 in the ON statement for the join. Here is the SQL code:
Declare @TransactionEndDate datetime;
Select @TransactionEndDate = lastmonth_end from dbo.DTE_udfCommonDates(GETDATE());
Select ''''+TreatyName as Treaty,
cast(EndOfMonth as Date) as asOfDate,
Count(Distinct ClaimSysID) as ClaimCount,
Count(Distinct FeatureSysID) as FeatureCount,
Sum(OpenReserve) as OpenReserve
From (
Select
TreatyName,
EndOfMonth,
dbo.CMS_Claims.ClaimSysID,
FeatureSysID,
sum(IW_glGeneralLedger.TransactionAmount)*-1 as OpenReserve
From dbo.CMS_Claims
Inner Join dbo.CMS_Claimants
On dbo.CMS_Claims.ClaimSysID = dbo.CMS_Claimants.ClaimSysID
Inner Join dbo.CMS_Features
On dbo.CMS_Features.ClaimantSysID = dbo.CMS_Claimants.ClaimantSysID
Left Join dbo.IW_glGeneralLedger
On IW_glGeneralLedger.FeatureID = dbo.CMS_Features.FeatureSysID
Left Join dbo.IW_glSubChildAccount
On dbo.IW_glSubChildAccount.glSubChildAccountID = dbo.IW_glGeneralLedger.glSubChildAccountSysID
Left Join dbo.IW_glAccountGroup
On dbo.IW_glAccountGroup.glAccountGroupID = dbo.IW_glSubChildAccount.glAccountGroupSysID
Left Join dbo.IW_BankRegister
On dbo.IW_BankRegister.BankRegisterSysID = dbo.IW_glGeneralLedger.BankRegisterID
Left Join dbo.IW_BankRegisterStatus
On dbo.IW_BankRegisterStatus.BankRegisterStatusSysID = dbo.IW_BankRegister.BankRegisterStatusID
Left Join (Select Distinct dbo.DTE_get_month_end(dt) as EndOfMonth
From IW_Calendar
Where dt Between '3/1/2004'
and @TransactionEndDate) as dates
on 1=1
Left Join dbo.IW_ReinsuranceTreaty
On dbo.IW_ReinsuranceTreaty.TreatySysID = IW_glGeneralLedger.PolicyTreatyID
Where dbo.IW_glGeneralLedger.TransactionDate Between '1/1/2004 00:00:00' And EndOfMonth
And dbo.IW_glAccountGroup.Code In ('RESERVEINDEMNITY')
And (
(dbo.IW_glGeneralLedger.BankRegisterID Is Null)
Or (
(IW_BankRegister.PrintedDate Between '1/1/2004 00:00:00' And EndOfMonth Or dbo.IW_glGeneralLedger.BankRegisterID = 0)
And
(dbo.IW_BankRegisterStatus.EnumValue In ('Approved','Outstanding','Cleared','Void') Or dbo.IW_glGeneralLedger.BankRegisterID = 0))
)
Group By TreatyName, dbo.CMS_Claims.ClaimSysID, FeatureSysID, EndOfMonth
Having sum(IW_glGeneralLedger.TransactionAmount) <> 0
) As Data
Group By TreatyName,EndOfMonth
Order By EndOfMonth, TreatyName
This nested SQL code only provides a table of End of Month values in one column called EndOfMonth and this is what I'm trying to fix:
Select Distinct dbo.DTE_get_month_end(dt) as EndOfMonth
From IW_Calendar
Where dt Between '3/1/2004'
and @TransactionEndDate
Try the methods below to improve query performance.
Use temporary tables (load relevant data into temporary tables with the necessary where conditions, and then join).
Add clustered and non-clustered indexes to your tables.
Create Multiple-Column Indexes.
Index the ORDER-BY / GROUP-BY / DISTINCT Columns for Better Response Time.
Use Parameterized Queries.
Use query hints accordingly.
NOLOCK: In the event that data is locked, this tells SQL Server to read data from the last known value available, also known as a dirty read. Since it is possible to use some old values and some new values, data sets can contain inconsistencies. Do not use this in any place in which data quality is important.
RECOMPILE: Adding this to the end of a query will result in a new execution plan being generated each time the query is executed. This should not be used on a query that is executed often, as the cost to optimize a query is not trivial. For infrequent reports or processes, though, this can be an effective way to avoid undesired plan reuse. This is often used as a bandage when statistics are out of date or parameter sniffing is occurring.
MERGE/HASH/LOOP: This tells the query optimizer to use a specific type of join as part of a join operation. This is super-risky as the optimal join will change as data, schema, and parameters evolve over time. While this may fix a problem right now, it will introduce an element of technical debt that will remain for as long as the hint does.
OPTIMIZE FOR: Can specify a parameter value to optimize the query for. This is often used when we want performance to be controlled for a very common use case so that outliers do not pollute the plan cache. Similar to join hints, this is fragile and when business logic changes, this hint usage may become obsolete.
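As a concrete sketch of the temporary-table suggestion applied to the asker's nested calendar subquery (table and function names are from the question; the #MonthEnds name is mine):

```sql
Declare @TransactionEndDate datetime;
Select @TransactionEndDate = lastmonth_end from dbo.DTE_udfCommonDates(GETDATE());

-- Materialize the small month-end date list once...
Select Distinct dbo.DTE_get_month_end(dt) as EndOfMonth
Into #MonthEnds
From IW_Calendar
Where dt Between '3/1/2004' and @TransactionEndDate;

-- ...then replace the derived table in the main query with:
-- Left Join #MonthEnds as dates on 1=1
```

Materializing the date list keeps the optimizer from evaluating the subquery as part of the large join tree, and gives it cardinality information up front.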

Predicates on views: How can I evaluate the predicate before the view joins?

I have a query:
SELECT TOP 1 * FROM vwTimeBlocks
WHERE vwTimeBlocks.id = @id
AND vwTimeBlocks.organization = @org
organization is indexed in some of the tables within the view. I expected that the WHERE would be evaluated on the underlying tables in the view before any JOINs, but this isn't the case:
The execution plan suggests that the WHERE is being applied on the result of vwTimeBlocks. Inlining the vwTimeBlocks definition gives me this:
SELECT TOP 1 * FROM
( SELECT ... all the things...
FROM tblTimeBlocks
LEFT OUTER JOIN tblA
ON tblA.id = tblTimeBlocks.F
AND tblTimeBlocks.B = 1000
LEFT OUTER JOIN tblB
ON tblB.id = tblTimeBlocks.F
AND tblTimeBlocks.B = 2000
) AS subTimeBlocks
WHERE subTimeBlocks.id = @timeblock AND subTimeBlocks.organization = @org;
Comparing the execution plan of this against the original query shows they're identical. However, when I place the WHERE clause inside the subquery:
SELECT TOP 1 * FROM
( SELECT ... all the things...
FROM tblTimeBlocks
LEFT OUTER JOIN tblA
ON tblA.id = tblTimeBlocks.F
AND tblTimeBlocks.B = 1000
LEFT OUTER JOIN tblB
ON tblB.id = tblTimeBlocks.F
AND tblTimeBlocks.B = 2000
WHERE tblTimeBlocks.id = @timeblock AND tblTimeBlocks.organization = @org
) AS subTimeBlocks
The overall query cost drops, and the execution plan for the new query is much better; here's a comparison, with the top execution plan showing the WHERE outside the subquery, and the bottom plan with the WHERE inside the subquery.
Note that the relative costs of the queries are 91%-9%; the first filters on organization quite late, while the second uses two seeks.
Why is this the case? I expected the query optimizer to optimize better than it did.
Given this situation, I'd obviously like to use the second query and somehow push the predicate into the view portion of the query. Because the predicate relies on a parameter, I can't just alter the view definition. One option is to replace the view with a table-valued function, but this situation is replicated all over my database. Is there any other way?
Is there any way I can hint, or otherwise ask SQL Server to use the predicate in my view? I don't know why the query optimizer doesn't look at the query and push the WHERE as early as possible.
In case it's relevant, this is running on:
Microsoft SQL Azure (RTM) - 12.0.2000.8
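One way to get the predicate evaluated inside the definition is an inline table-valued function that takes the parameters. A sketch using the (partly hypothetical) names from the inlined query above, assuming id and organization live on tblTimeBlocks:

```sql
CREATE FUNCTION dbo.fnTimeBlocks (@id int, @org int)
RETURNS TABLE
AS RETURN
(
    SELECT ... -- all the things, as in the view definition
    FROM tblTimeBlocks
    LEFT OUTER JOIN tblA
        ON tblA.id = tblTimeBlocks.F AND tblTimeBlocks.B = 1000
    LEFT OUTER JOIN tblB
        ON tblB.id = tblTimeBlocks.F AND tblTimeBlocks.B = 2000
    WHERE tblTimeBlocks.id = @id
      AND tblTimeBlocks.organization = @org
);

-- usage: SELECT TOP 1 * FROM dbo.fnTimeBlocks(@id, @org);
```

Inline TVFs are expanded into the calling query just like views, but the parameters place the filter inside the expansion, mirroring the faster hand-rewritten query.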

Why does SQL Server do a scan on joins when there are no records in the source table

The idea of the below query is to use the CTE to get the primary key of all rows in [Archive].[tia_tia_object] that meet the filter.
The execution time for the query within the CTE is 0 seconds.
The second part is supposed to do joins on other tables, to filter the data some more, but only if there are any rows returned in the CTE. This was the only way I could get SQL Server to use the correct indexes.
Why does it spend time (see execution plan) looking in TIA_TIA_AGREEMENT_LINE and TIA_TIA_OBJECT, when CTE returns 0 rows?
WITH cte_vehicle
AS (SELECT O.[Seq_no],
O.Object_No
FROM [Archive].[tia_tia_object] O
WHERE O.RECORD_TIMESTAMP >
(SELECT LastLoadTimeStamp FROM staging.Ufngetlastloadtimestamp('Staging.CoveredObject'))
AND O.[Meta_iscurrent] = 1
AND O.OBJECT_TYPE IN ( 'BIO01', 'CAO01', 'DKV', 'GFO01',
'KMA', 'KNO01', 'MCO01', 'VEO01',
'SVO01', 'AUO01' ))
SELECT O.[Seq_no] AS [Bkey_CoveredObject],
Cast(O.[Agr_Line_No] AS BIGINT) AS [Agr_Line_No],
O.[Cover_Start_Date] AS [CoverageFrom],
O.[Cover_End_Date] AS [CoverageTo],
O.[Timestamp] AS [TIMESTAMP],
O.[Record_Timestamp] AS [RECORD_TIMESTAMP],
O.[Newest] AS [Newest],
O.LOCATION_ID AS LocationNo,
O.[Cust_no],
O.[N01]
FROM cte_vehicle AS T
INNER JOIN [Archive].[tia_tia_object] O
ON t.Object_No = O.Object_No
AND t.Seq_No = O.Seq_No
INNER JOIN [Archive].[tia_tia_agreement_line] AL
ON O.Agr_line_no = AL.Agr_line_no
INNER JOIN [Archive].[tia_tia_policy] P
ON AL.Policy_no = P.Policy_no
WHERE P.[Transaction_type] <> 'D'
Execution plan:
Because it still needs to check and look for records. Even if there are no records in that table, it doesn't know that until it actually checks.
Much like if someone gives you a sealed box: you don't know whether it's empty until you open it.
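If the goal is to skip the expensive lookups entirely when the driving set is empty, one option (my sketch, not from the answer above) is to materialize the CTE into a temp table and guard the join with IF EXISTS:

```sql
-- Open the box once, up front:
SELECT O.[Seq_no], O.Object_No
INTO #cte_vehicle
FROM [Archive].[tia_tia_object] O
WHERE O.[Meta_iscurrent] = 1; -- plus the timestamp and OBJECT_TYPE filters from the CTE

IF EXISTS (SELECT 1 FROM #cte_vehicle)
BEGIN
    -- run the joined SELECT here, reading FROM #cte_vehicle AS T
    SELECT 1 AS placeholder;
END
```

Branching before the joins means TIA_TIA_AGREEMENT_LINE and TIA_TIA_OBJECT are never touched when the temp table is empty.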

SQL query showing drastically different performance depending on the parameters

We have a stored procedure that searches products based on a number of input parameters that differ from one scenario to the next. Depending on the input parameters the search involves anywhere from two to about a dozen different tables. In order to avoid unnecessary joins we build the actual search query as dynamic SQL and execute it inside the stored procedure.
In one of the most basic scenarios the user searches products by a keyword alone (see Query 1 below), which usually takes less than a second. However, if they search by a keyword and department (Query 2 below), the execution time goes up to well over a minute, and the execution plan looks somewhat different (the attached snapshots of the plans are showing just the parts that differ).
Query 1 (fast)
SELECT DISTINCT
Product.ProductID, Product.Title
FROM
Product
INNER JOIN ProductVariant ON (ProductVariant.ProductID = Product.ProductID)
WHERE (1=1)
AND (CONTAINS((Product.*), @Keywords) OR CONTAINS((ProductVariant.*), @Keywords))
AND (Product.SourceID = @SourceID)
AND (Product.ProductStatus = @ProductStatus)
AND (ProductVariant.ProductStatus = @ProductStatus)
Query 2 (slow)
SELECT DISTINCT
Product.ProductID, Product.Title
FROM
Product
INNER JOIN ProductVariant ON (ProductVariant.ProductID = Product.ProductID)
WHERE (1=1)
AND (CONTAINS((Product.*), @Keywords) OR CONTAINS((ProductVariant.*), @Keywords))
AND (Product.SourceID = @SourceID)
AND (Product.DepartmentID = @DepartmentID)
AND (Product.ProductStatus = @ProductStatus)
AND (ProductVariant.ProductStatus = @ProductStatus)
Both the Product and ProductVariant tables have some string columns that participate in the full-text index. The Product table has a non-clustered index on the SourceID column and another non-clustered index on SourceID+DepartmentID (this redundancy is not an oversight but is intended). ProductVariant.ProductID is a FK to Product and has a non-clustered index on it. Statistics are updated for all indexes and columns, and no missing indexes are reported by SQL Management Studio.
Any suggestions on what might be causing this drastically different performance?
P.S. Forgot to mention that Product.DepartmentID is a FK to a table of departments, in case it makes any difference.
Thanks to @MartinSmith for the suggestion to break the full-text search logic out into temp tables and then use them to filter the results of the main query. The following returns in just 2 seconds:
SELECT
[Key] AS ProductID
INTO
#matchingProducts
FROM
CONTAINSTABLE(Product, *, @Keywords)
SELECT
[Key] AS VariantID
INTO
#matchingVariants
FROM
CONTAINSTABLE(ProductVariant, *, @Keywords)
SELECT DISTINCT
Product.ProductID, Product.Title
FROM
Product
INNER JOIN ProductVariant ON (ProductVariant.ProductID = Product.ProductID)
LEFT OUTER JOIN #matchingProducts ON #matchingProducts.ProductID = Product.ProductID
LEFT OUTER JOIN #matchingVariants ON #matchingVariants.VariantID = ProductVariant.VariantID
WHERE (1=1)
AND (Product.SourceID = @SourceID)
AND (Product.ProductStatus = @ProductStatus)
AND (ProductVariant.ProductStatus = @ProductStatus)
AND (Product.DepartmentID = @DepartmentID)
AND (NOT #matchingProducts.ProductID IS NULL OR NOT #matchingVariants.VariantID IS NULL)
Curiously, when I tried to simplify the above solution using nested queries as shown below, the results were somewhere in-between in terms of speed (around 25 secs). Theoretically, the query below should be identical to the one above, yet somehow SQL Server internally compiles the second one differently.
SELECT DISTINCT
Product.ProductID, Product.Title
FROM
Product
INNER JOIN ProductVariant ON (ProductVariant.ProductID = Product.ProductID)
LEFT OUTER JOIN
(
SELECT
[Key] AS ProductID
FROM
CONTAINSTABLE(Product, *, @Keywords)
) MatchingProducts
ON MatchingProducts.ProductID = Product.ProductID
LEFT OUTER JOIN
(
SELECT
[Key] AS VariantID
FROM
CONTAINSTABLE(ProductVariant, *, @Keywords)
) MatchingVariants
ON MatchingVariants.VariantID = ProductVariant.VariantID
WHERE (1=1)
AND (Product.SourceID = @SourceID)
AND (Product.ProductStatus = @ProductStatus)
AND (ProductVariant.ProductStatus = @ProductStatus)
AND (Product.DepartmentID = @DepartmentID)
AND (NOT MatchingProducts.ProductID IS NULL OR NOT MatchingVariants.VariantID IS NULL)
This may have been your mistake
In order to avoid unnecessary joins we build the actual search query as dynamic SQL and execute it inside the stored procedure.
Dynamic SQL cannot be optimized by the server in most cases. There are certain techniques to mitigate this; read more in The Curse and Blessings of Dynamic SQL.
Get rid of your dynamic SQL and build a decent query using proper indices. I assure you: SQL Server knows better than you when it comes to optimizing. Define ten different queries if you must (or a hundred).
Secondly, why would you ever expect the same execution plan when running different queries, using different columns/indices? The execution plans and results you get seem perfectly natural to me, given your approach.
You will not get the same execution plan / performance because you are not querying the 'DepartmentID' column in the first query.

select statement performance degradation when using DISTINCT with parameters

Note for bounty - START:
PARAMETER SNIFFING (the only "idea" reported in pre-bounty answers) is not the issue here, as you can read in the "update" section at the end of the question. The problem is really related to how SQL Server creates execution plans for a parametrized query when DISTINCT is used.
I uploaded a very simple database backup (it works with sql server 2008 R2) here (you must wait 20 seconds before downloading). Against this DB you can try to run the following queries:
-- PARAMETRIZED QUERY
declare @IS_ADMINISTRATOR int
declare @User_ID int
set @IS_ADMINISTRATOR = 1 -- 1 for administrator 0 for normal
set @User_ID = 50
SELECT DISTINCT -- PLEASE REMEMBER DISTINCT MAKES THE DIFFERENCE!!!
DOC.DOCUMENT_ID
FROM
DOCUMENTS DOC LEFT OUTER JOIN
FOLDERS FOL ON FOL.FOLDER_ID = DOC.FOLDER_ID LEFT OUTER JOIN
ROLES ROL ON (FOL.FOLDER_ID = ROL.FOLDER_ID)
WHERE
1 = @IS_ADMINISTRATOR OR ROL.USER_ID = @USER_ID
-- NON PARAMETRIZED QUERY
SELECT DISTINCT -- PLEASE REMEMBER DISTINCT MAKES THE DIFFERENCE!!!
DOC.DOCUMENT_ID
FROM
DOCUMENTS DOC LEFT OUTER JOIN
FOLDERS FOL ON FOL.FOLDER_ID = DOC.FOLDER_ID LEFT OUTER JOIN
ROLES ROL ON (FOL.FOLDER_ID = ROL.FOLDER_ID)
WHERE
1 = 1 OR ROL.USER_ID = 50
Final note: I noticed DISTINCT is the problem; my goal is to achieve the same speed (or at least almost the same speed) in both queries.
Note for bounty - END:
Original question:
I noticed that there is a heavy difference in performance between
-- Case A
select distinct * from table where id > 1
compared to (this is the sql generated by my Delphi application)
-- Case B1
exec sp_executesql N'select distinct * from table where id > @P1',N'@P1 int',1
that is equivalent to
-- Case B2
declare @P1 int
set @P1 = 1
select distinct * from table where id > @P1
Case A performs much faster than B1 and B2. The performance becomes the same if I remove DISTINCT.
Can you comment on this?
Here I posted a trivial query; I noticed this on a query with 3 INNER JOINs. Anyway, not a complex query.
Note: I was expecting to have THE EXACT SAME PERFORMANCE, in cases A and B1/B2.
So are there some caveats in using DISTINCT?
UPDATE:
I tried to disable parameter sniffing using DBCC TRACEON (4136, -1) (the trace flag that disables parameter sniffing), but nothing changes. So in this case the problem is NOT LINKED TO PARAMETER SNIFFING. Any idea?
The problem isn't that DISTINCT is causing a performance degradation with parameters; it's that the rest of the query isn't being optimized away in the parameterized query, because the optimizer won't just optimize away all of the joins using 1=@IS_ADMINISTRATOR like it will with 1=1. Without DISTINCT it won't optimize the joins away at all, because it needs to return duplicates based on the result of the joins.
Why? Because an execution plan that tossed out all of the joins would be invalid for any value other than @IS_ADMINISTRATOR = 1. It will never generate that plan, regardless of whether you are caching plans or not.
This performs as well as the non parameterized query on my 2008 server:
-- PARAMETRIZED QUERY
declare @IS_ADMINISTRATOR int
declare @User_ID int
set @IS_ADMINISTRATOR = 1 -- 1 for administrator 0 for normal
set @User_ID = 50
IF 1 = @IS_ADMINISTRATOR
BEGIN
SELECT DISTINCT -- PLEASE REMEMBER DISTINCT MAKES THE DIFFERENCE!!!
DOC.DOCUMENT_ID
FROM
DOCUMENTS DOC LEFT OUTER JOIN
FOLDERS FOL ON FOL.FOLDER_ID = DOC.FOLDER_ID LEFT OUTER JOIN
ROLES ROL ON (FOL.FOLDER_ID = ROL.FOLDER_ID)
WHERE
1 = 1
END
ELSE
BEGIN
SELECT DISTINCT -- PLEASE REMEMBER DISTINCT MAKES THE DIFFERENCE!!!
DOC.DOCUMENT_ID
FROM
DOCUMENTS DOC LEFT OUTER JOIN
FOLDERS FOL ON FOL.FOLDER_ID = DOC.FOLDER_ID LEFT OUTER JOIN
ROLES ROL ON (FOL.FOLDER_ID = ROL.FOLDER_ID)
WHERE
ROL.USER_ID = @USER_ID
END
What's clear from the query plan I see running your example is that @IS_ADMINISTRATOR = 1 does not get optimized out the same as 1=1. In your non-parameterized example, the JOINs are completely optimized out, and it just returns every id in the DOCUMENTS table (very simple).
There are also other optimizations missing when @IS_ADMINISTRATOR <> 1. For instance, the LEFT OUTER JOINs are automatically changed to INNER JOINs without that OR clause, but they are left as-is with the OR clause.
See also this answer: SQL LIKE % FOR INTEGERS for a dynamic SQL alternative.
Of course, this doesn't really explain the performance difference in your original question, since you don't have the OR in there. I assume that was an oversight.
But also see "parameter sniffing" issue.
Why does a parameterized query produces vastly slower query plan vs non-parameterized query
https://groups.google.com/group/microsoft.public.sqlserver.programming/msg/1e4a2438bed08aca?hl=de
Have you tried running your second (slower) query without dynamic SQL? Have you cleared the cache and rerun the first query? You may be experiencing parameter sniffing with the parameterized dynamic SQL query.
I think the DISTINCT is a red herring and not the actual issue.
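One quick way to test the sniffing hypothesis on the Case B2 shape from the question (a sketch; clear the plan cache only on a dev server):

```sql
DBCC FREEPROCCACHE; -- dev/test only: wipes all cached plans

declare @P1 int
set @P1 = 1
select distinct * from table where id > @P1
option (recompile); -- plan compiled with the variable's actual value
```

If B2 matches Case A's speed under OPTION (RECOMPILE), the slowdown is a generic-plan issue rather than DISTINCT itself.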