I have the following queries:
DECLARE @application_number CHAR(8) = '37832904';
SELECT
    la.LEASE_NUMBER AS lease_number,
    la.[LEASE_APPLICATION] AS application_number,
    tnu.[FOLLOWUP_CODE] AS note_type_code -- catch codes not in codes table
FROM [dbo].[lease_applications] la
LEFT JOIN [dbo].tickler_notes_uniq tnu ON tnu.[ACCOUNT_NUMBER] = la.[ACCOUNT_NUMBER]
WHERE la.LEASE_APPLICATION = @application_number
    OR @application_number IS NULL;
SELECT
    la.LEASE_NUMBER AS lease_number,
    la.[LEASE_APPLICATION] AS application_number,
    tnu.[FOLLOWUP_CODE] AS note_type_code -- catch codes not in codes table
FROM [dbo].[lease_applications] la
LEFT JOIN [dbo].tickler_notes_uniq tnu ON tnu.[ACCOUNT_NUMBER] = la.[ACCOUNT_NUMBER]
WHERE la.LEASE_APPLICATION = @application_number;
The only difference between these two queries is that the first one also checks whether the variable is NULL.
The execution plans of these queries are:
You can find the graphical plans here
So the question is: why are the plans so different?
UPDATE:
The actual execution plan of the first query can be found here
OPTION (RECOMPILE) changed the actual execution plan to the good one. The downside is that my main goal was to wrap these queries in a TVF with these parameters, and then everybody who uses that function would be expected to supply that option.
It is also worth mentioning that my main goal is to create a TVF with two parameters. Each of them may or may not be NULL, but at least one of them is supposed to be NOT NULL. The parameters are more or less equivalent; they are just different keys into the two tables that would yield the same result anyway (the same number of rows and so on). That's why I wanted to write something like
WHERE (col1 = @param1 OR @param1 IS NULL) AND (col2 = @param2 OR @param2 IS NULL) AND (@param1 IS NOT NULL OR @param2 IS NOT NULL)
So, basically, I am not interested in returning ALL records at all.
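For reference, here is a minimal sketch of the inline TVF I have in mind (the function name is made up, and I am assuming LEASE_NUMBER is the second key):

CREATE FUNCTION dbo.fn_lease_notes -- hypothetical name
(
    @application_number CHAR(8),
    @lease_number CHAR(8)
)
RETURNS TABLE
AS
RETURN
    SELECT
        la.LEASE_NUMBER AS lease_number,
        la.[LEASE_APPLICATION] AS application_number,
        tnu.[FOLLOWUP_CODE] AS note_type_code
    FROM [dbo].[lease_applications] la
    LEFT JOIN [dbo].tickler_notes_uniq tnu ON tnu.[ACCOUNT_NUMBER] = la.[ACCOUNT_NUMBER]
    WHERE (la.LEASE_APPLICATION = @application_number OR @application_number IS NULL)
      AND (la.LEASE_NUMBER = @lease_number OR @lease_number IS NULL)
      AND (@application_number IS NOT NULL OR @lease_number IS NOT NULL);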
You have two different plans for two different queries.
It makes sense that when you have an equality condition in the WHERE clause (la.LEASE_APPLICATION = @application_number) and suitable indexes in place, you get an index seek: working as expected!
On the other hand, when you write both conditions into one WHERE clause (la.LEASE_APPLICATION = @application_number OR @application_number IS NULL), the query optimizer chooses to do a scan.
Even though the variable value has been supplied and is not null, the plan being used is the cached one, and the optimizer cannot know the actual value of your variable at compile time.
This is the case if you have a stored procedure and you are calling it with parameters. This is not the case when executing a simple query using a variable.
As @sepupic has stated, variable values do not get sniffed.
The plan is generated to handle both cases: when you have a value for your parameter as well as when you have none.
One option to fix your problem would be using OPTION (RECOMPILE), as already stated in the comments.
Another option would be to keep your queries separate (for example, two different stored procedures called by a third "wrapper" procedure), so that each one gets optimized on its own.
I would suggest you take a look at this article by Kimberly L. Tripp: Building High Performance Stored Procedures, and this one by Aaron Bertrand: An Updated "Kitchen Sink" Example. I think these are the best articles explaining this kind of scenario.
Both articles explain the situation, the problems it can cause, and possible solutions such as OPTION (RECOMPILE), dynamic SQL, or separate stored procedures.
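For illustration, here is a rough sketch of the dynamic SQL ("kitchen sink") approach from those articles, applied to your tables. The second key (@lease_number) and its column are my assumptions; adjust to your real schema:

DECLARE @application_number CHAR(8) = '37832904';
DECLARE @lease_number CHAR(8) = NULL;

DECLARE @sql NVARCHAR(MAX) = N'
SELECT la.LEASE_NUMBER, la.LEASE_APPLICATION, tnu.FOLLOWUP_CODE
FROM dbo.lease_applications la
LEFT JOIN dbo.tickler_notes_uniq tnu
       ON tnu.ACCOUNT_NUMBER = la.ACCOUNT_NUMBER
WHERE 1 = 1';

IF @application_number IS NOT NULL
    SET @sql += N' AND la.LEASE_APPLICATION = @application_number';
IF @lease_number IS NOT NULL
    SET @sql += N' AND la.LEASE_NUMBER = @lease_number';

-- Values are still passed as parameters; each WHERE variant gets its own plan.
EXEC sys.sp_executesql @sql,
     N'@application_number CHAR(8), @lease_number CHAR(8)',
     @application_number, @lease_number;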
Good luck!
Your queries do not use parameters; they use a variable. The variable is not sniffed at the moment the batch is compiled (compilation = making a plan) because the batch is seen as one whole thing. So the server has no idea whether the variable is null or not null, and it must make a plan that will be suitable in both cases.
The first query may filter no rows at all (when the variable is NULL), so the scan is selected.
The second query does filter, but the value is unknown, so if you use SQL Server 2014 and the filtered column is not unique, the estimation is C^(3/4) (C = table cardinality). For example, for a table of 1,000,000 rows the estimate would be 1,000,000^(3/4) ≈ 31,623 rows.
The situation is different if you use the RECOMPILE query option. When you add it to your query, the query is recompiled AFTER the assignment of the variable is done. In that case the variable value is known, and you'll get another plan: one based on column statistics for the known value of your filter.
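For example:

DECLARE @application_number CHAR(8) = '37832904';

SELECT
    la.LEASE_NUMBER AS lease_number,
    la.[LEASE_APPLICATION] AS application_number,
    tnu.[FOLLOWUP_CODE] AS note_type_code
FROM [dbo].[lease_applications] la
LEFT JOIN [dbo].tickler_notes_uniq tnu ON tnu.[ACCOUNT_NUMBER] = la.[ACCOUNT_NUMBER]
WHERE la.LEASE_APPLICATION = @application_number
    OR @application_number IS NULL
OPTION (RECOMPILE); -- compiled after the assignment, so the value is known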
Related
I am using an SSRS report in which I need to pass multiple parameters to some SQL code.
Based on this blog post, the best way to handle multiple parameters is to use a split function, so that is the road I am following.
However, I am seeing bad performance after following this approach.
For example, the following WHERE clause will return the data in 4 seconds:
AND DimBusinessDivision.Id IN (
22
)
This will also correctly return in 4 seconds:
DECLARE @BusinessDivisionId INT = 22

AND DimBusinessDivision.Id IN (
@BusinessDivisionId
)
However, using the split function as below, it takes 2 minutes (which is the same time it takes without a WHERE clause):
AND DimBusinessDivision.Id IN (
SELECT Item FROM dbo.FuncSplit(@BusinessDivisionId, ',')
)
I've also tried creating a temp table and a table variable before the SQL statement holding the split results, but there's no difference. I have a feeling this has to do with the fact that the values are not literal values and that SQL Server doesn't know what query plan to follow, or something similar. Does anyone know of any ways to increase the performance of this?
It simply doesn't like using a table to get the values, even if the table has the same number of rows.
UPDATE: I have used the table function as an inner join, which has fixed the issue. Any ideas why this made all the difference?
INNER JOIN
dbo.FuncSplit(@BusinessDivisionIds, ',') AS FilteredBusinessDivisions ON
FilteredBusinessDivisions.Item = DimBusinessDivision.Id
A few things to play with:
Try the non-performant query and add OPTION (RECOMPILE); at the end of the query. If it magically runs much faster, then yes, the issue was a bad cached query plan. For more information on this specific problem, you can Google "parameter sniffing" for a more thorough explanation.
You may also want to look at the function definition and toss a RECOMPILE in there too, and see what difference that makes.
Look at the estimated query plan and try to determine the difference.
But the root of the problem, I think, is that you are reinventing the wheel with this "split" function. You can have multi-valued parameters in SSRS and use "WHERE col IN (@param)": https://technet.microsoft.com/en-us/library/aa337396(v=sql.105).aspx
Unless there's a very specific reason you must split a comma-separated list and cannot use normal parameters, just use a regular parameter that accepts multiple values.
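For example, with a multi-value parameter named @BusinessDivisionIds defined in the report, the dataset query can simply be (SSRS expands the parameter into a literal value list before sending the query, so no split function is involved):

SELECT d.Id, d.Name -- column list is illustrative
FROM DimBusinessDivision AS d
WHERE d.Id IN (@BusinessDivisionIds)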
Edit: I looked at the article you linked to. It's quite easy to have a SELECT ALL option in any reporting tool (not just SSRS), though it's not obvious. Using the "magic value" approach described in the article works just fine. Can I ask what limitation is prompting you to do this string splitting?
Recently, a colleague of mine working in SQL development ran into a problem like this: a procedure ran fine on all environments except production, which has the most resources. A typical case of parameter sniffing, but the profiler indicated that only one query in the whole procedure took very long to execute:
UPDATE a
SET status_id = 6
FROM usr.tpt_udef_article_grouping_buffer a
LEFT JOIN (SELECT DISTINCT buying_domain_id, suppl_no
           FROM usr.buyingdomain_supplier_article) b
       ON a.buying_domain_id = b.buying_domain_id
      AND a.suppl_no = b.suppl_no
WHERE a.tpt_file_id = @tpt_file_id
  AND a.status_id IS NULL
  AND b.suppl_no IS NULL
As I am biased towards development (I have little administration experience), I suggested that this query should be rewritten:
replace the LEFT JOIN (SELECT DISTINCT ...) with NOT EXISTS (SELECT 1 ...) (see the sketch after this list)
put the appropriate index on table usr.tpt_udef_article_grouping_buffer (SSMS suggested a 95% effort reduction when the query was run outside the procedure)
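Here is a sketch of the NOT EXISTS rewrite I suggested; it keeps the same anti-join semantics as the LEFT JOIN ... IS NULL version:

UPDATE a
SET status_id = 6
FROM usr.tpt_udef_article_grouping_buffer a
WHERE a.tpt_file_id = @tpt_file_id
  AND a.status_id IS NULL
  AND NOT EXISTS (SELECT 1
                  FROM usr.buyingdomain_supplier_article b
                  WHERE b.buying_domain_id = a.buying_domain_id
                    AND b.suppl_no = a.suppl_no)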
Also, multiple queries from the procedure shared the same pattern.
I know that parameter sniffing is more related to plan construction when the procedure runs for the first time after its (re)creation, and I think it is also favored by high cyclomatic complexity.
My question is:
Does the way queries in the procedure are written (producing bad execution plans from the beginning) favor the appearance of parameter sniffing, or does it just worsen its effects?
Your only parameter here is a.tpt_file_id = @tpt_file_id, and if this is parameter sniffing, then the cases must be such that for certain tpt_file_id values there are thousands (or more) of records, and for others there are few (or none).
The other reason you get different plans in production than in the test environment is that the machines are different. You usually have a lot more memory and more CPUs/cores in the production environment, causing the optimizer to choose a different plan; and of course if the row counts in the tables are not the same, that can also lead to a totally different plan.
You can check this by using OPTION (RECOMPILE) to see if the plan changes, or by looking in the plan cache to see what parameter value was used to create the plan. It can be seen in the properties of the leftmost object in the plan.
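For example, something like this finds the cached plan; opening the query_plan XML in SSMS shows the compiled parameter value in the properties of the leftmost operator:

SELECT st.text, qp.query_plan
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
WHERE st.text LIKE N'%tpt_udef_article_grouping_buffer%';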
Changing the SELECT DISTINCT into an EXISTS clause is probably a good idea, as is of course indexing the tables properly.
Here is a simpler version of one of the SELECT statements from my procedure:
select
...
from
...
where
((@SearchTextList IS NULL) OR
(SomeColumn IN (SELECT SomeRelatedColumn From #SearchTextListTable)))
@SearchTextList is just a varchar variable that holds a comma-separated list of strings. #SearchTextListTable is a single-column temp table that holds the search text values.
This query takes 30 seconds to complete, which is a performance issue in my application.
If I get rid of the first condition (i.e. if I remove the OR condition), it takes just ONE second.
select
...
from
...
where
SomeColumn IN (SELECT SomeRelatedColumn From #SearchTextListTable)
Can somebody please explain why there is such a big difference?
What's going on internally in the SQL Server engine?
Thanks.
Since you said the SQL is fast when the OR is not specified, I assume the table has an index on SomeColumn and the number of rows in #SearchTextListTable is small. When that is the case, SQL Server can decide to use the index for searching the rows.
If you specify the OR clause, and the query is like this:
((@SearchTextList IS NULL) OR
(SomeColumn IN (SELECT SomeRelatedColumn From #SearchTextListTable)))
SQL Server can't create a plan where the index is used, because plans are cached and must also be usable when @SearchTextList is NULL.
There are usually two ways to improve this: either use dynamic SQL or recompile the plan for each execution.
To get the plan recompiled, just add OPTION (RECOMPILE) to the end of the query. Unless this query is executed really often, that should be an OK solution. The downside is that it causes slightly higher CPU usage because the plans can't be reused.
The other option is to create dynamic SQL and execute it with sp_executesql. Since at that point you know whether @SearchTextList is NULL, you can just omit the SomeColumn IN ... condition when it's not needed. Be aware of SQL injection in this case: don't just concatenate variable values into the SQL string, but use variables in the SQL and pass them as parameters to sp_executesql.
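A rough sketch of that approach (the SELECT list and FROM clause are placeholders for your real query; note that a temp table created in the outer scope is still visible inside sp_executesql):

DECLARE @sql NVARCHAR(MAX) = N'SELECT t.* FROM dbo.SomeTable t WHERE 1 = 1';

IF @SearchTextList IS NOT NULL
    SET @sql += N' AND t.SomeColumn IN (SELECT SomeRelatedColumn FROM #SearchTextListTable)';

EXEC sys.sp_executesql @sql; -- nothing user-supplied is concatenated into @sql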
If you only have this one column in the SQL, you could also create two separate procedures for the two cases and execute them from the original procedure depending on which case applies.
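Schematically, with made-up procedure names:

CREATE PROCEDURE dbo.SearchWrapper
    @SearchTextList VARCHAR(MAX)
AS
BEGIN
    IF @SearchTextList IS NULL
        EXEC dbo.Search_All;                    -- plan without the filter
    ELSE
        EXEC dbo.Search_ByText @SearchTextList; -- plan with the IN filter
END;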
I have a view that runs fast (< 1s) when specifying a value in the where clause:
SELECT *
FROM vwPayments
WHERE AccountId = 8155
...but runs slow (~3s) when that value is a variable:
DECLARE @AccountId BIGINT = 8155
SELECT *
FROM vwPayments
WHERE AccountId = @AccountId
Why is the execution plan different for the second query? Why is it running so much slower?
In the first case the value was known when the statement was compiled. The optimizer used the statistics histogram to generate the best plan for that particular value.
When you defined the local variable, SQL Server was not able to use its value to find the optimal plan. Since the value is unknown at compile time, the optimizer calculates an estimated number of rows based on a uniform distribution, and comes up with a plan that would be 'good enough' for any possible input value.
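You can see the density the optimizer falls back to with DBCC SHOW_STATISTICS; the estimate for an unknown value is roughly the "All density" value multiplied by the table row count. The table and statistics names below are assumptions, substitute whatever backs vwPayments:

DBCC SHOW_STATISTICS ('dbo.Payments', 'IX_Payments_AccountId') WITH DENSITY_VECTOR;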
Another interesting article that almost exactly describes your case can be found here.
In short, the statistical analysis the query optimizer uses to pick the best plan chooses a seek when the value is known and it can leverage statistics, and a scan when the value is not known. It picks a scan in the second case because the plan is compiled before the value in the WHERE clause is known.
While I rarely recommend bossing the query optimizer around, in this specific case you can use a FORCESEEK hint or other query hints to override the engine. Be aware, however, that finding a way to get an optimal plan with the engine's help is a MUCH better solution.
I did a quick Google and found a decent article that goes into the concept of local variables affecting query plans more deeply.
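As a sketch, the hint looks like this (FORCESEEK can be applied through a view; whether it actually helps depends on the indexes under vwPayments):

DECLARE @AccountId BIGINT = 8155

SELECT *
FROM vwPayments WITH (FORCESEEK) -- overrides the optimizer's choice of a scan
WHERE AccountId = @AccountId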
DECLARE @Local_AccountId BIGINT = @AccountId
SELECT *
FROM vwPayments
WHERE AccountId = @Local_AccountId
OPTION(RECOMPILE)
It works for me
It could be parameter sniffing. Try the following - I assume this is in a stored procedure?
DECLARE @Local_AccountId BIGINT = @AccountId
SELECT *
FROM vwPayments
WHERE AccountId = @Local_AccountId
For details about parameter sniffing, you can view this link : http://blogs.technet.com/b/mdegre/archive/2012/03/19/what-is-parameter-sniffing.aspx
See if the results are different. I have encountered this problem several times, especially if the query is called a lot during peaks and the cached execution plan is one that was created off-peak.
Another option, which you should not need in your case, is adding "WITH RECOMPILE" to the procedure definition. This causes the procedure to be recompiled every time it is called. See http://www.techrepublic.com/article/understanding-sql-servers-with-recompile-option/5662581
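Schematically, with a made-up procedure name:

CREATE PROCEDURE dbo.GetPaymentsByAccount
    @AccountId BIGINT
WITH RECOMPILE -- a fresh plan is compiled on every call
AS
SELECT *
FROM vwPayments
WHERE AccountId = @AccountId;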
I think @souplex made a very good point.
Basically, in the first case it's just a literal number, which the system can evaluate directly, while in the second case it's a variable, which means the system has to resolve its value before performing the check for each statement - a different method.
I'm having trouble understanding the behavior of the estimated query plans for my statement in SQL Server when changing from a parameterized query to a non-parameterized query.
I have the following query:
DECLARE @p0 UniqueIdentifier = '1fc66e37-6eaf-4032-b374-e7b60fbd25ea'
SELECT [t5].[value2] AS [Date], [t5].[value] AS [New]
FROM (
SELECT COUNT(*) AS [value], [t4].[value] AS [value2]
FROM (
SELECT CONVERT(DATE, [t3].[ServerTime]) AS [value]
FROM (
SELECT [t0].[CookieID]
FROM [dbo].[Usage] AS [t0]
WHERE ([t0].[CookieID] IS NOT NULL) AND ([t0].[ProductID] = @p0)
GROUP BY [t0].[CookieID]
) AS [t1]
OUTER APPLY (
SELECT TOP (1) [t2].[ServerTime]
FROM [dbo].[Usage] AS [t2]
WHERE ((([t1].[CookieID] IS NULL) AND ([t2].[CookieID] IS NULL))
OR (([t1].[CookieID] IS NOT NULL) AND ([t2].[CookieID] IS NOT NULL)
AND ([t1].[CookieID] = [t2].[CookieID])))
AND ([t2].[CookieID] IS NOT NULL)
AND ([t2].[ProductID] = @p0)
ORDER BY [t2].[ServerTime]
) AS [t3]
) AS [t4]
GROUP BY [t4].[value]
) AS [t5]
ORDER BY [t5].[value2]
This query is generated by a Linq2SQL expression and extracted from LINQPad. It produces a nice query plan (as far as I can tell) and executes in about 10 seconds on the database. However, if I replace the two uses of the parameter with the exact value, that is, replace the two '= @p0' parts with '= '1fc66e37-6eaf-4032-b374-e7b60fbd25ea'', I get a different estimated query plan and the query now runs much longer (more than 60 seconds; I haven't seen it through).
Why does this seemingly innocent replacement produce a much less efficient query plan and execution? I have cleared the procedure cache with DBCC FREEPROCCACHE to ensure I was not caching a bad plan, but the behavior remains.
My real problem is that I can live with the 10 second execution time (at least for a good while) but I can't live with the 60+ second execution time. My query will (as hinted above) be produced by Linq2SQL, so it is executed on the database as
exec sp_executesql N'
...
WHERE ([t0].[CookieID] IS NOT NULL) AND ([t0].[ProductID] = @p0)
...
AND ([t2].[ProductID] = @p0)
...
',N'@p0 uniqueidentifier',@p0='1FC66E37-6EAF-4032-B374-E7B60FBD25EA'
which produces the same poor execution time (which I think is doubly strange, since this seems to be using a parameterized query).
I'm not looking for advise on which indexes to create or the like, I'm just trying to understand why the query plan and execution are so dissimilar on three seemingly similar queries.
EDIT: I have uploaded execution plans for the non-parameterized and the parameterized query as well as an execution plan for a parameterized query (as suggested by Heinz) with a different GUID here
Hope it helps you help me :)
If you provide an explicit value, SQL Server can use the statistics on this field to make a "better" query plan decision. Unfortunately (as I've experienced myself recently), if the information contained in the statistics is misleading, sometimes SQL Server just makes the wrong choices.
If you want to dig deeper into this issue, I recommend checking what happens if you use other GUIDs: if a different query plan is used for different concrete GUIDs, that's an indication that statistics data is being used. In that case, you might want to look at sp_updatestats and related commands.
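For example (dbo.Usage is the table from your query; FULLSCAN is simply the most thorough sampling option):

UPDATE STATISTICS [dbo].[Usage] WITH FULLSCAN; -- refresh stats on one table
EXEC sys.sp_updatestats;                       -- or refresh database-wide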
EDIT: Have a look at DBCC SHOW_STATISTICS: the "slow" and the "fast" GUID are probably in different buckets of the histogram. I've had a similar problem, which I solved by adding an INDEX table hint to the SQL, which "guides" SQL Server towards finding the "right" query plan. Basically, I looked at which indices were used during a "fast" query and hard-coded those into the SQL. This is far from an optimal or elegant solution, but I haven't found a better one yet...
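As a sketch, the hint looks like this, using the covering index named in another answer here (treat the choice of index as illustrative):

SELECT [t0].[CookieID]
FROM [dbo].[Usage] AS [t0]
     WITH (INDEX (IX_NonCluster_ProductID_CookieID_With_ServerTime))
WHERE ([t0].[CookieID] IS NOT NULL) AND ([t0].[ProductID] = @p0)
GROUP BY [t0].[CookieID]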
I'm not looking for advise on which indexes to create or the like, I'm just trying to understand why the query plan and execution are so dissimilar on three seemingly similar queries.
You seem to have two indexes:
IX_NonCluster_Config (ProductID, ServerTime)
IX_NonCluster_ProductID_CookieID_With_ServerTime (ProductID, CookieID) INCLUDE (ServerTime)
The first index does not cover CookieID but is ordered on ServerTime, and hence is more efficient for the less selective ProductIDs (i.e. those of which you have many).
The second index does cover all columns but is not ordered, and hence is more efficient for the more selective ProductIDs (those of which you have few).
On average, your ProductID cardinality is such that SQL Server expects the second method to be efficient, which is what it uses when you use parameterized queries or explicitly provide selective GUIDs.
However, your original GUID is considered less selective, which is why the first method is used.
Unfortunately, the first method requires additional filtering on CookieID, which is why it is in fact less efficient.
My guess is that when you take the non-parameterized route, your GUID has to be converted from a varchar to a uniqueidentifier, which may cause an index not to be used, while it will be used when taking the parameterized route.
I've seen this happen with queries that have a smalldatetime in the WHERE clause against a column that uses a datetime.
It's difficult to tell without looking at the execution plans; however, if I were to guess at a reason, I'd say it's a combination of parameter sniffing and poor statistics. In the case where you hard-code the GUID into the query, the query optimiser attempts to optimise the query for that value of the parameter. I believe the same thing happens with the parameterised / prepared query (this is called parameter sniffing - the execution plan is optimised for the parameters used the first time the prepared statement is executed); however, this definitely doesn't happen when you declare the parameter as a variable and use it in the query.
Like I said, SQL Server attempts to optimise the execution plan for that value, so usually you should see better results. It seems here that the information it is basing its decisions on is incorrect or misleading, and you are better off (for some reason) when it optimises the query for a generic parameter value.
This is mostly guesswork, however; it's really impossible to tell without the execution plan. If you can upload the execution plan somewhere, I'm sure someone will be able to help you find the real reason.