select top 10 ... and select top 30 follows different execution plan - sql-server

During query optimization I encounted a strange behaviour of sql server (Sql Server 2008 R2 Enterprise). I created several indexes on tables, as well as some indexed views. I have two queries, for example:
select top 10 N0."Oid",N1."ObjectType",N1."OptimisticLockField" from ((("dbo"."Issue" N0
inner join "dbo"."Article" N1 on (N0."Oid" = N1."Oid"))
inner join "dbo"."ProductLink" N2 on (N1."ProductLink" = N2."Oid"))
inner join "dbo"."Technology" N3 on (N2."Technology" = N3."Oid"))
where (N1."GCRecord" is null and (N0."IsPrivate" = 0) and ((N0."HasMarkedAnswers" = 0) or N0."HasMarkedAnswers" is null) and (N3."Name" = N'Discussions'))
order by N1."ModifiedOn" desc
and
select top 30 N0."Oid",N1."ObjectType",N1."OptimisticLockField" from ((("dbo"."Issue" N0
inner join "dbo"."Article" N1 on (N0."Oid" = N1."Oid"))
inner join "dbo"."ProductLink" N2 on (N1."ProductLink" = N2."Oid"))
inner join "dbo"."Technology" N3 on (N2."Technology" = N3."Oid"))
where (N1."GCRecord" is null and (N0."IsPrivate" = 0) and ((N0."HasMarkedAnswers" = 0) or N0."HasMarkedAnswers" is null) and (N3."Name" = N'Discussions'))
order by N1."ModifiedOn" desc
both queries are the same, except first starts with select top 10 and second with select top 30. Both queries returns the same result set - 6 rows. But the second query is 5 times faster then the first one! I looked at the actual execution plans for both queries, and of course, they differs. Second query uses indexed view, and performs great, and the first query denies to use it, using indexes on tables instead. I repeat myself - both queries are the same, to the same table, at the same server, they differs only by number in "top" part.
I tried to force optimizer to use indexed view in the first query by updating statistics, destroing indexes it used and so on. No matter how I try actual execution do not use indexed view for the first query and always use it for the second one.
I am really intrested in the reasons causing such behavior. Any suggestions?
Update I am not sure that it can help without decribing corresponding indexes and view, but this is actual execution plan diagramms:
for select top 19:
for select top 18:
another confusing fact is that for the select top 19 query sometimes indexed view is used, sometimes not

The only thing I can think of is perhaps the optimizer in the first query concluded that the specifying criteria is not selective enough for the "better" execution plan to be used.
If you are still investigating this see if TOP 60, 90, 100, ... produces the second execution plan and performs well. You could also tinker with it to see what the threshold is for the optimizer to select the second plan in this case.
Also try the queries without the order by statement to see if that is affecting the selection of the query plan (check the index on that field, etc)
Beyond that, you said you can't use index hints so perhaps a re-write where you select top X from your Article table (N1) with a bunch of exists statements in your where clause would provide better performance for you.

Related

Same query, same DB, different execution plans & dramatically different times to execute

I'm confronted with a problem I cannot get my mind around. We're running SQL Server 2012. I have run into a pair of essentially identical queries which yield different execution plans and dramatically different times to execute (1 sec vs 40+ sec)... and they even return the exact same records. The only difference between them is the category the records are queried by.
This query runs in 1 second:
SELECT P.idProduct, P.sku, P.description, P.price, P.listhidden, P.listprice, P.serviceSpec, P.bToBPrice, P.smallImageUrl,P.noprices,P.stock, P.noStock,P.pcprod_HideBTOPrice,P.pcProd_BackOrder,P.FormQuantity,P.pcProd_BTODefaultPrice,cast(P.sDesc as varchar(8000)) sDesc, 0, 0, P.pcprod_OrdInHome, P.sales, P.pcprod_EnteredOn, P.hotdeal, P.pcProd_SkipDetailsPage
FROM products P INNER JOIN categories_products CP ON P.idProduct = CP.idProduct
WHERE CP.idCategory=494 AND active=-1 AND configOnly=0 and removed=0 AND formQuantity=0
AND ((SELECT TOP 1 SP.stock FROM products SP WHERE SP.pcprod_ParentPrd = P.idProduct AND SP.description LIKE N'%(9-12 Months)' AND SP.removed=0) > 0)
ORDER BY P.description Asc
The second runs 40 seconds or more, but the ONLY difference is the idCategory queried:
SELECT P.idProduct, P.sku, P.description, P.price, P.listhidden, P.listprice, P.serviceSpec, P.bToBPrice, P.smallImageUrl,P.noprices,P.stock, P.noStock,P.pcprod_HideBTOPrice,P.pcProd_BackOrder,P.FormQuantity,P.pcProd_BTODefaultPrice,cast(P.sDesc as varchar(8000)) sDesc, 0, 0, P.pcprod_OrdInHome, P.sales, P.pcprod_EnteredOn, P.hotdeal, P.pcProd_SkipDetailsPage
FROM products P INNER JOIN categories_products CP ON P.idProduct = CP.idProduct
WHERE CP.idCategory=628 AND active=-1 AND configOnly=0 and removed=0 AND formQuantity=0
AND ((SELECT TOP 1 SP.stock FROM products SP WHERE SP.pcprod_ParentPrd = P.idProduct AND SP.description LIKE N'%(9-12 Months)' AND SP.removed=0) > 0)
ORDER BY P.description Asc
They even return the exact same records in the exact same order.
Execution plan for the 1st query:
Execution plan for the 2nd query:
[EDIT] The plans here are the actual, not the estimated, execution plans.
The categories_products table is a simple lookup table with only the two fields idCategory and idProduct. Even the records returned are exactly the same (it just happens to be that for SP.description LIKE N'%(9-12 Months)', the same products are assigned to these 2 categories). The only other difference between the two is that CP.idCategory 628 was just created this morning (but i don't see what difference that could make).
[EDIT: but that's exactly what did make the difference]
How can this be? How can simply changing the CP.idCategory queried here yield a different execution plan, and even more importantly: how is it that one takes some 40 times as long to execute?
Ultimately, I'm at a loss to figure out how to improve the dreadful performance of the 2nd query given that there's no essential difference between the two that I can understand.
This problem is description column. The length of description of idCategory[628] longer than idCategory[494]. Because you are using SP.description LIKE N'%(9-12 Months)'. Length of description is too long then you get slowly.
Also you are using ORDER BY P.description Asc.

Adding Conditional around query increases time by over 2400%

Update: I will get query plan as soon as I can.
We had a poor performing query that took 4 minutes for a particular organization. After the usual recompiling the stored proc and updating statistics didn't help, we re-wrote the if Exists(...) to a select count(*)... and the stored procedure when from 4 minutes to 70 milliseconds. What is the problem with the conditional that makes a 70 ms query take 4 minutes? See the examples
These all take 4+ minutes:
if (
SELECT COUNT(*)
FROM ObservationOrganism omo
JOIN Observation om ON om.ObservationID = omo.ObservationMicID
JOIN Organism o ON o.OrganismID = omo.OrganismID
JOIN ObservationMicDrug omd ON omd.ObservationOrganismID = omo.ObservationOrganismID
JOIN SIRN srn ON srn.SIRNID = omd.SIRNID
JOIN OrganismDrug od ON od.OrganismDrugID = omd.OrganismDrugID
WHERE
om.StatusCode IN ('F', 'C')
AND o.OrganismGroupID <> -1
AND od.OrganismDrugGroupID <> -1
AND (om.LabType <> 'screen' OR om.LabType IS NULL)) > 0
print 'records';
-
IF (EXISTS(
SELECT *
FROM ObservationOrganism omo
JOIN Observation om ON om.ObservationID = omo.ObservationMicID
JOIN Organism o ON o.OrganismID = omo.OrganismID
JOIN ObservationMicDrug omd ON omd.ObservationOrganismID = omo.ObservationOrganismID
JOIN SIRN srn ON srn.SIRNID = omd.SIRNID
JOIN OrganismDrug od ON od.OrganismDrugID = omd.OrganismDrugID
WHERE
om.StatusCode IN ('F', 'C')
AND o.OrganismGroupID <> -1
AND od.OrganismDrugGroupID <> -1
AND (om.LabType <> 'screen' OR om.LabType IS NULL))
print 'records'
This all take 70 milliseconds:
Declare #recordCount INT;
SELECT #recordCount = COUNT(*)
FROM ObservationOrganism omo
JOIN Observation om ON om.ObservationID = omo.ObservationMicID
JOIN Organism o ON o.OrganismID = omo.OrganismID
JOIN ObservationMicDrug omd ON omd.ObservationOrganismID = omo.ObservationOrganismID
JOIN SIRN srn ON srn.SIRNID = omd.SIRNID
JOIN OrganismDrug od ON od.OrganismDrugID = omd.OrganismDrugID
WHERE
om.StatusCode IN ('F', 'C')
AND o.OrganismGroupID <> -1
AND od.OrganismDrugGroupID <> -1
AND (om.LabType <> 'screen' OR om.LabType IS NULL);
IF(#recordCount > 0)
print 'records';
It doesn't make sense to me why moving the exact same Count(*) query into an if statement causes such degradation or why 'Exists' is slower than Count. I even tried the exists() in a select CASE WHEN Exists() and it is still 4+ minutes.
Given that my previous answer was mentioned, I'll try to explain again because these things are pretty tricky. So yes, I think you're seeing the same problem as the other question. Namely a row goal issue.
So to try and explain what's causing this I'll start with the three types of joins that are at the disposal of the engine (and pretty much categorically): Loop Joins, Merge Joins, Hash Joins. Loop joins are what they sound like, a nested loop over both sets of data. Merge Joins take two sorted lists and move through them in lock-step. And Hash joins throw everything in the smaller set into a filing cabinet and then look for items in the larger set once the filing cabinet has been filled.
So performance wise, loop joins require pretty much no set up and if you're only looking for a small amount of data they're really optimal. Merge are the best of the best as far as join performance for any data size, but require data to be already sorted (which is rare). Hash Joins require a fair amount of setup but allow large data sets to be joined quickly.
Now we get to your query and the difference between COUNT(*) and EXISTS/TOP 1. So the behavior you're seeing is that the optimizer thinks that rows of this query are really likely (you can confirm this by planning the query without grouping and seeing how many records it thinks it will get in the last step). In particular it probably thinks that for some table in that query, every record in that table will appear in the output.
"Eureka!" it says, "if every row in this table ends up in the output, to find if one exists I can do the really cheap start-up loop join throughout because even though it's slow for large data sets, I only need one row." But then it doesn't find that row. And doesn't find it again. And now it's iterating through a vast set of data using the least efficient means at its disposal for weeding through large sets of data.
By comparison, if you ask for the full count of data, it has to find every record by definition. It sees a vast set of data and picks the choices that are best for iterating through that entire set of data instead of just a tiny sliver of it.
If, on the other hand, it really was correct and the records were very well correlated it would have found your record with the smallest possible amount of server resources and maximized its overall throughput.

SQL query tune up while fetching data for counting the records by joining 2 tables

I am having problem in fetching a number of records from while joining tables. Please see the below query:
SELECT
H.EIN,
H.OUC,
(
SELECT
COUNT(1)
FROM
tbl_Checks C
INNER JOIN INFM_People_OR.dbo.tblHierarchy P
ON P.EIN = C.EIN_Checked
WHERE
(H.EIN IN (P.L1, P.L2)
OR H.EIN = C.EIN_Checked)
AND C.[Read] = 1
) AS [Read]
FROM
INFM_People_OR.dbo.tblHierarchy H
LEFT JOIN tbl_Checks C
ON H.EIN = C.EIN_Checked
WHERE
H.L1 = #EIN
GROUP BY
H.EIN,
H.OUC,
C.Check_Date
Even if there are just 100 records this query takes a much more time(around 1 min).
Please suggest a solution to tune up this query as it is throwing error in front end
Given just the query there are a few things that stick out as being non-optimal:
Any use of OR will be slower:
WHERE
(H.EIN IN (P.L1, P.L2)
OR H.EIN = C.EIN_Checked)
AND C.[Read] = 1
If there's any way to rework this based off of your data set so that both the IN and the OR are replaced with ANDs that would help.
Also, use of a local variable in the WHERE clause will not work well with the optimizer:
WHERE
H.L1 = #EIN
Finally, make sure you have indexes (and hopefully these are integer fields) where you are doing your joins and group bys (H.EIN, H.OUC, C.Check_Date
The size of the result set (100 records) doesn't matter as much as the size of the joined tables and whether or not they have appropriate indexes.
The Estimated number of rows affected is 1196880 is very high resulting in high execution time of query. I have also tried to join the tables only once but that it giving different output.
Please suggest any other solution than creating indices as I have already created non-clustered index for the table tbl_checks but it doesn't make any difference.
Below is the SQl execution plan.

Slow running BIDS report query...OR clauses in WHERE branch using IN sub queries

I have a report with 3 sub reports and several queries to optimize in each. The first has several OR clauses in the WHERE branch and the OR's are filtering through IN options which are pulling sub-queries.
I say this mostly from reading this SO post. Specifically LBushkin's second point.
I'm not the greatest at TSQL but I know enough to think this is very inefficient. I think I need to do two things.
I know I need to add indexes to the tables involved.
I think the query can be greatly enhanced.
So it seems that my first step would be to improve the query. From there I can look at what columns and tables are involved and thus determine the indexes.
At this point I haven't posted table schemas as I'm looking more for options / considerations such as using a cte to replace all the IN sub-queries.
If needed I will definitely post whatever would be helpful such as physical reads etc.
SELECT DISTINCT
auth.icm_authorizationid,
auth.icm_documentnumber
FROM
Filteredicm_servicecost AS servicecost
INNER JOIN Filteredicm_authorization AS auth ON
auth.icm_authorizationid = servicecost.icm_authorizationid
INNER JOIN Filteredicm_service AS service ON
service.icm_serviceid = servicecost.icm_serviceid
INNER JOIN Filteredicm_case AS cases ON
service.icm_caseid = cases.icm_caseid
WHERE
(cases.icm_caseid IN
(SELECT icm_caseid FROM Filteredicm_case AS CRMAF_Filteredicm_case))
OR (service.icm_serviceid IN
(SELECT icm_serviceid FROM Filteredicm_service AS CRMAF_Filteredicm_service))
OR (servicecost.icm_servicecostid IN
(SELECT icm_servicecostid FROM Filteredicm_servicecost AS CRMAF_Filteredicm_servicecost))
OR (auth.icm_authorizationid IN
(SELECT icm_authorizationid FROM Filteredicm_authorization AS CRMAF_Filteredicm_authorization))
EXISTS is usually much faster than IN as the query engine is able to optimize it better.
Try this:
WHERE EXISTS (SELECT 1 FROM FROM Filteredicm_case WHERE icm_caseid = cases.icm_caseid)
OR EXISTS (SELECT 1 FROM Filteredicm_service WHERE icm_serviceid = service.icm_serviceid)
OR EXISTS (SELECT 1 FROM Filteredicm_servicecost WHERE icm_servicecostid = servicecost.icm_servicecostid)
OR EXISTS (SELECT 1 FROM Filteredicm_authorization WHERE icm_authorizationid = auth.icm_authorizationid)
Furthermore, an index on Filteredicm_case.icm_caseid, an index on Filteredicm_service.icm_serviceid, an index on Filteredicm_servicecost.icm_servicecostid, and an index on Filteredicm_authorization.icm_authorizationid will increase performance of this query. They look like they should be keys already, however, so I suspect that indices already exist.
However, unless I'm misreading, there's no way this WHERE clause will ever evaluate to anything other than true.
The clause you wrote says WHERE cases.icm_caseid IN (SELECT icm_caseid FROM Filteredicm_case AS CRMAF_Filteredicm_case). However, cases is an alias to Filteredicm_case. That's the equivalent of WHERE Filteredicm_case.icm_caseid IN (SELECT icm_caseid FROM Filteredicm_case AS CRMAF_Filteredicm_case). That will be true as long as Filteredicm_case.icm_caseid isn't NULL.
The same error in logic exists for the remaining portions in the WHERE clause:
(service.icm_serviceid IN (SELECT icm_serviceid FROM Filteredicm_service AS CRMAF_Filteredicm_service))
service is an alias for Filteredicm_service. This is always true as long as icm_serviceid is not null
(servicecost.icm_servicecostid IN (SELECT icm_servicecostid FROM Filteredicm_servicecost AS CRMAF_Filteredicm_servicecost))
servicecost is an alias for Filteredicm_servicecost. This is always true as long as icm_servicecostid is not null.
(auth.icm_authorizationid IN (SELECT icm_authorizationid FROM Filteredicm_authorization AS CRMAF_Filteredicm_authorization))
auth is an alias for Filteredicm_authorization. This is always true as long as icm_authorizationid is not null.
I don't understand what you're trying to accomplish.

Short Circuit EXISTS statment in SQL Server query for Table Valued Parameters

I use stored procedures:
In my WHERE clause, I use short circuits (OR's) to speed up execution as the Query Optimiser knows that most of my inputs are defaulted to Null. This allows my query to be flexible and fast.
I have added a Table Valued Parameter to the WHERE clause. The execution time for a report has risen from 150ms to 450ms, reads from 70,000 to 200,000.
...
WHERE
--Integer value parameters
AND ((#hID is Null) OR (h.ID = #hID))
AND ((#dID is Null) OR (d.ID = #dID))
AND ((#mID is NULL) OR (m.ID = #mID))
--New table value parameter
--Execute, Processing time and read's increased.
--No additional JOIN added.
AND (NOT EXISTS (SELECT Null FROM #rIDs) OR r.ID IN (SELECT r FROM #rIDs))
How can I short circuit the NOT EXISTS or speed up this query please? I have tried adding a BIT value and checking if rows are in the Table Valued Parameter before executing the query. The only way I have found is having two queries and executing one over the other. Not great if I have to modify a whole bunch of queries or add multiple Table Valued Parameters to the mix.
Thanks in advance.
EDIT:
A comparison of table value parameter:
AND (NOT EXISTS (SELECT Null FROM #rIDs) OR r.ID IN (SELECT r FROM #rIDs))
and integer parameter:
AND ((#rID) OR (r.ID = #rID))
showed similar execution speed after compilation with TVP at 0 rows and Integer parameter null. I assume the Query Optimiser is short circuiting in the correct manor and my previous comparison was incorrect. Execution plan splits the above cost at 55% vs 45%, which is acceptable. Although the split doesn't change when there are more rows in the TVP, the time to generate the report increases because more pages have to be read from disk. Interesting.
if exists (select * from #rIDs)
begin
.... -- query with TVP
end
else
begin
.... -- query without TVP
end
This allows a separate execution plan for each query.
It looks like you are using a table variable. If you use a temporary table and index the column you are using for your criteria (r in your example), you will avoid a table scan. This however makes it a multiple step process, but they payoff can be huge.
To be more specific to your question, you can change the last line of your example to be
AND EXISTS (SELECT r FROM #rIDs WHERE r = r.ID AND NOT r IS NULL)
If you could post the execution plan, I could give you a much better answer. Click the Display Estimated Execution Plan, right click the execution plan and select Save Execution Plan As...
You could try a LEFT JOIN between your table to be queried (on the left) and your TVP.

Resources