Function hanging on @Variable input, but not with hard-coded integer - sql-server

Very odd problem. This function has randomly started hanging and timing out when I call it like this:
DECLARE @persId int
SET @persId = 336
SELECT * FROM [CIDER].[dbo].[SMAN_ACL_getPermissions] (
null
,@persId
,1
,null)
GO
But it returns super quickly when I call it like this:
SELECT * FROM [CIDER].[dbo].[SMAN_ACL_getPermissions] (
null
,336
,1
,null)
GO
Could someone please highlight the difference between these two for me? It's making debugging very hard...

The variable could be a null value, whereas the static value definitely is not. This can lead to different execution plans.

You could be falling prey to parameter sniffing. Take a look at the execution plan for the one that isn't performing well. In the plan XML you'll see two values in the ParameterList tag, ParameterCompiledValue and ParameterRuntimeValue, which are self-explanatory. If the data distribution is wildly different for the two, you could be getting a sub-optimal plan for your runtime value. You could try adding an OPTION (RECOMPILE) hint to the statement that is running slow within your function and see if it helps.
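If the slow run's actual plan is not at hand, one way to pull the cached plan XML for the function and inspect its ParameterList element is a DMV query along these lines (a sketch; the LIKE filter is just a convenient way to locate the plan):
SELECT st.text, qp.query_plan
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
WHERE st.text LIKE '%SMAN_ACL_getPermissions%';
-- Look for ParameterCompiledValue in the ParameterList element of query_plan;
-- ParameterRuntimeValue only appears in an actual (post-execution) plan.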

Related

DECIMAL Index on CHAR column

I am dealing with some legacy code that looks like this:
Declare @PolicyId int
;
Select top 1 @PolicyId = policyid from policytab
;
Select col002
From SOMETAB
Where (cast(Col001 as int) = @PolicyId)
;
The code is actually in a loop, but the problem is the same. Col001 is a CHAR(10).
Is there a way to specify an index on SOMETAB.Col001 that would speed up this code?
Are there other ways to speed up this code without modifying it?
The context of that question is that I am guessing that a simple index on Col001 will not speed up the code, because the select statement is doing a cast on the column.
I am looking for a solution that does not involve changing this code because this technique was used on several tables and in several scripts for each table.
Once I determine that it is hopeless to speed up this code without changing it, I have several options. I mention that only so this post can stay on the topic of speeding up the code without changing the code.
Shortcut to hopeless: if you cannot change the code, (cast(Col001 as int) = @PolicyId) is not SARGable.
Sargable
SARGable functions in SQL Server - Rob Farley
SARGable expressions and performance - Daniel Hutmacher
Shortcut aside, avoid loops when possible and keep your search arguments SARGable. An indexed, persisted computed column is an option if you must keep the char column and must compare it to an integer.
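A minimal sketch of that idea against the table from the question (the computed column and index names are mine; it assumes every Col001 value converts cleanly to int, otherwise the ALTER fails):
ALTER TABLE dbo.SOMETAB
    ADD Col001AsInt AS CAST(Col001 AS int) PERSISTED;  -- hypothetical column name

CREATE NONCLUSTERED INDEX IX_SOMETAB_Col001AsInt
    ON dbo.SOMETAB (Col001AsInt)
    INCLUDE (col002);                                  -- covers the SELECT col002
With the required SET options in place, the optimizer can sometimes match the existing cast(Col001 as int) expression to the persisted computed column and use the index without any code change, which is what makes this worth testing here.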
If you cannot change the table structure, cast your parameter to the data type you are searching on in your select statement.
Cast(@PolicyId as char(10)). This is a code change, and a good place to start if you decide to change code based on sqlZim's answer.
Zim's advice is excellent, and searching on int will always be faster than char. But, you may find this method an acceptable alternative to any schema changes.
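A sketch of what the legacy snippet would look like with that change (it assumes Col001 stores the digits left-aligned, e.g. '336' followed by trailing spaces; right-aligned or zero-padded values would not match the padded result of the CAST):
Declare @PolicyId int
;
Select top 1 @PolicyId = policyid from policytab
;
Select col002
From SOMETAB
Where Col001 = Cast(@PolicyId as char(10))  -- SARGable: the cast is now on the variable, not the column
;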
Is policy stored as an int in PolicyTab and char in SomeTab for certain?

Why is VALUES(CONVERT(XML,'...')) much slower than VALUES(@xml)?

I would like to create a subquery that produces a list of numbers as a single-column result, something like MindLoggedOut did here but without the @x xml variable, so that it can be appended to a WHERE expression as a pure string (subquery) without SQL parameters. The problem is that replacing the parameter (or variable) makes the query run 5000 times slower, and I don't understand why. What causes this big difference?
Example:
/* Create a minimalistic xml like <b><a>78</a><a>91</a>...</b> */
DECLARE @p_str VARCHAR(MAX) =
'78 91 01 12 34 56 78 91 01 12 34 56 78 91 01 12 34 56';
DECLARE @p_xml XML = CONVERT(XML,
'<b><a>'+REPLACE(@p_str,' ','</a><a>')+'</a></b>'
);
SELECT a.value('(child::text())[1]','INT')
FROM (VALUES (@p_xml)) AS t(x)
CROSS APPLY x.nodes('//a') AS x(a);
This returns one number per row and is quite fast (20x faster than the string-splitter approaches I was using so far, similar to these.
I measured the 20x speed-up in terms of SQL Server CPU time, with @p_str containing 3000 numbers.)
Now if I inline the definition of @p_xml into the query:
SELECT a.value('(child::text())[1]','INT')
FROM (VALUES (CONVERT(XML,
'<b><a>'+REPLACE(@p_str,' ','</a><a>')+'</a></b>'
))) AS t(x)
CROSS APPLY x.nodes('//a') AS x(a);
then it becomes 5000x slower (when @p_str contains thousands of numbers). Looking at the query plan, I cannot find the reason for it.
Plan of the first query (…VALUES(@p_xml)…), and the second (…VALUES(CONVERT(XML,'...'))…)
Could somebody shed some light on it?
UPDATE
Clearly the plan of the first query doesn't include the cost of the @p_xml = CONVERT(XML, ...REPLACE(...)... ) assignment, but this cost is not the culprit that could explain the 46 ms vs. 234 sec difference between the execution times of the whole script (when @p_str is large). This difference is systematic (not random) and was in fact observed on SQL Azure (S1 tier).
Furthermore, when I rewrote the query, replacing CONVERT(XML,...) with a user-defined scalar function:
SELECT a.value('(child::text())[1]','INT')
FROM (VALUES (dbo.MyConvertToXmlFunc(
'<b><a>'+REPLACE(@p_str,' ','</a><a>')+'</a></b>'
))) AS t(x)
CROSS APPLY x.nodes('//a') AS x(a);
where dbo.MyConvertToXmlFunc() is:
CREATE FUNCTION dbo.MyConvertToXmlFunc(@p_str NVARCHAR(MAX))
RETURNS XML BEGIN
RETURN CONVERT(XML, @p_str);
END;
the difference disappeared (plan). So at least I have a workaround... but I would like to understand it.
This is basically the same issue as described in this answer by Paul White.
I tried with a string of length 10,745 characters containing 3,582 items.
The execution plan with the string literal ends up performing the string replace and casting this entire string to XML twice for each item (so 7,164 times in total).
The problematic sqltses.dll!CEsExec::GeneralEval4 calls are highlighted in the traces below. The CPU time for the entire call stack was 22.38% (nearly maxing out a single core on a quad-core machine); 92% of that was taken up by these two calls.
Within each call, sqltses.dll!ConvertFromStringTypesAndXmlToXml and sqltses.dll!BhReplaceBhStrStr take nearly equal time.
I have used the same colour coding for the plan below.
The bottom branch of the execution plan is executed once for each split item in the string.
For the problematic table-valued function in the bottom right, the issue is in its Open method. The parameter list for the function is
Scalar Operator([Expr1000]),
Scalar Operator((7)),
Scalar Operator(XML Reader with XPath filter.[id]),
Scalar Operator(getdescendantlimit(XML Reader with XPath filter.[id]))
For the Stream Aggregate the issue is in its getrow method.
[Expr1010] = Scalar Operator(MIN(
SELECT CASE
WHEN [Expr1000] IS NULL
THEN NULL
ELSE
CASE
WHEN datalength([XML Reader with XPath filter].[value]) >= ( 128 )
THEN CONVERT_IMPLICIT(int, [XML Reader with XPath filter].[lvalue], 0)
ELSE CONVERT_IMPLICIT(int, [XML Reader with XPath filter].[value], 0)
END
END
))
Both of these expressions refer to Expr1000 (though the stream aggregate only does so to check if it was NULL)
This is defined in the constant scan at the top right as below.
(Scalar Operator(CONVERT(xml,'<b><a>'+replace([@p_str],' '
,CONVERT_IMPLICIT(varchar(max),'</a><a>',0))+'</a></b>',0)))
It is clear from the trace that the issue is the same as in the previously linked answer and that this expression is getting repeatedly re-evaluated in the slow plan. When it is passed as a parameter, the expensive calculation only happens once.
Edit: I just realised this is in fact almost exactly the same plan and issue as Paul White blogged about here. The only difference in my tests compared to those described there is that I found the string REPLACE and the XML conversion to be as bad as each other in the VARCHAR(MAX) case, and the string replace to outweigh the conversion cost in the non-max case.
Max
Non Max (2000 character source string with 668 items; 6010 chars after replace)
In this test the replace was nearly double the CPU cost of the XML conversion. It seems to be implemented by using code from the familiar T-SQL functions CHARINDEX and STUFF, with a large chunk of time taken up converting the string to Unicode. I think this discrepancy between my results and those reported by Paul is down to collation (switching to SQL_Latin1_General_CP1_CS_AS from Latin1_General_CS_AS reduces the cost of the string replace significantly).

T-SQL view is "optimized" but I don't understand why

I'm struggling with some sort of auto-optimization when creating a view in T-SQL.
The optimization is done both when using the Designer and when using CREATE VIEW.
The output is the same, but I don't understand why this is done.
Can anyone please explain to me why this is optimized / why the lower one is better:
[...]
WHERE
today.statusId = 7
AND yesterday.cardId IS NULL
AND NOT (
today.name LIKE 'TEST_%'
AND today.department LIKE 'T_%'
)
gets optimized into the following
[...]
WHERE (
today.statusId = 7
AND yesterday.cardId IS NULL
AND NOT (today.name LIKE 'TEST_%')
)
OR (
today.statusId = 7
AND yesterday.cardId IS NULL
AND NOT (today.department LIKE 'T_%')
)
Isn't the second where clause forcing the view to check statusId and cardId twice regardless of their values, while the first allows it to abort as soon as statusId is, e.g., 6?
Also, in the first one the parenthesized part can abort as soon as one value is FALSE.
This behavior also doesn't change when the inner parentheses contain, say, 20 values: the rewrite creates 20 blocks, checking the values of statusId and cardId over and over again...
Thanks in advance.
The visual designers never try to optimize your code.
Don't use them unless you are prepared to put up with them mangling your formatting and rewriting your queries in this manner.
SQL Server does not guarantee short-circuit evaluation in any case, but this type of rewrite can certainly act as a pessimization. Especially if one of the predicates involves a subquery, the rewritten form could end up significantly more expensive.
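As a purely hypothetical illustration of that point (the Audit table and EXISTS predicate are mine, not from the question), compare:
-- Original form: the subquery predicate appears once.
WHERE today.statusId = 7
  AND NOT ( today.name LIKE 'TEST_%'
            AND EXISTS (SELECT 1 FROM dbo.Audit AS a WHERE a.cardId = today.cardId) )

-- Designer-style rewrite: the subquery is duplicated and may be evaluated in both OR branches.
WHERE ( today.statusId = 7 AND NOT (today.name LIKE 'TEST_%') )
   OR ( today.statusId = 7
        AND NOT EXISTS (SELECT 1 FROM dbo.Audit AS a WHERE a.cardId = today.cardId) )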
Presumably the reason for the rewrite is just so the designer can present the predicates easily in a grid and allow editing at individual grid intersections, or so you can select a whole column and delete it.

sp_executesql is slow with parameters

I'm using dapper-dot-net as an ORM, and it produces the following slow-executing (1700 ms) SQL code.
exec sp_executesql N'SELECT TOP 5 SensorValue FROM "Values"
WHERE DeviceId IN (@id1,@id2) AND SensorId = @sensor
AND SensorValue != -32768 AND SensorValue != -32767',
N'@id1 bigint,@id2 bigint,@sensor int',
@id1=139,@id2=726,@sensor=178
When I modify this code by removing the parameters, the query executes blazingly fast (20 ms). Should the lack of these parameters actually make such a big difference, and why?
exec sp_executesql N'SELECT TOP 5 SensorValue FROM "Values"
WHERE DeviceId IN (139,726) AND SensorId = 178
AND SensorValue != -32768 AND SensorValue != -32767'
Add OPTION (RECOMPILE) to the end
... AND SensorValue != -32767 OPTION (RECOMPILE)
I suspect you are experiencing "parameter sniffing".
If that's the case, we can leave it with the OPTION or consider alternatives.
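Applied to the full statement from the question, only the hint is added:
exec sp_executesql N'SELECT TOP 5 SensorValue FROM "Values"
WHERE DeviceId IN (@id1,@id2) AND SensorId = @sensor
AND SensorValue != -32768 AND SensorValue != -32767
OPTION (RECOMPILE)',
N'@id1 bigint,@id2 bigint,@sensor int',
@id1=139,@id2=726,@sensor=178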
Update 1
The following article will introduce you to "parameter sniffing": http://pratchev.blogspot.be/2007/08/parameter-sniffing.html
I advise that you get to know the ins and outs, because it will make you much better at understanding SQL Server internals (which can bite).
If you understand it, you will know that the tradeoff with OPTION (RECOMPILE) can be a performance decrease if the statement is executed very often.
I personally add OPTION (RECOMPILE) after I know the root cause is parameter sniffing, and leave it in unless there is a performance issue. Rewriting a statement to avoid bad parameter sniffing leads to loss of intent, and this lowers maintainability. But there are cases where the rewrite is justified (use good comments when you do).
Update 2
The best read I had on the subject is chapter 32, "Parameter sniffing: your best friend... except when it isn't", by Grant Fritchey, in SQL Server MVP Deep Dives, Volume 2.
It's recommended.
I recently ran into the same issue. The first thing I did was add a nonclustered covering index on the columns in my WHERE clause.
This improved the execution time on the SQL Server side, but when Dapper performed the query it was still slow; in fact, it was timing out.
Then I realized that the query generated by Dapper was passing in the parameter as nvarchar(4000), whereas my table column was a varchar(80); this caused an index scan instead of a seek (I suggest you read up on indexes if that does not make sense to you). Upon realizing this, I updated my Dapper WHERE clause to be like this:
WHERE Reference = convert(varchar(80),@Reference)
Executing with the WHERE clause above resulted in an index seek and a 100% performance improvement.
Just to add: OPTION (RECOMPILE) did not work for me.
And after all this song and dance, there is a way to tell dapper to do this for you by default:
Dapper.SqlMapper.AddTypeMap(typeof(string),
System.Data.DbType.AnsiString);
This will by default map any string parameter to varchar(4000) rather than nvarchar(4000). If you do need Unicode string comparison, then you can explicitly do the conversion on the parameter.

T-SQL Where Clause Case Statement Optimization (optional parameters to StoredProc)

I've been battling this one for a while now. I have a stored proc that takes in 3 parameters that are used to filter. If a specific value is passed in, I want to filter on that. If -1 is passed in, give me all.
I've tried it the following two ways:
First way:
SELECT field1, field2...etc
FROM my_view
WHERE
parm1 = CASE WHEN @PARM1 = -1 THEN parm1 ELSE @PARM1 END
AND parm2 = CASE WHEN @PARM2 = -1 THEN parm2 ELSE @PARM2 END
AND parm3 = CASE WHEN @PARM3 = -1 THEN parm3 ELSE @PARM3 END
Second Way:
SELECT field1, field2...etc
FROM my_view
WHERE
(@PARM1 = -1 OR parm1 = @PARM1)
AND (@PARM2 = -1 OR parm2 = @PARM2)
AND (@PARM3 = -1 OR parm3 = @PARM3)
I read somewhere that the second way will short-circuit and never evaluate the second part if the first is true. My DBA said it forces a table scan. I have not verified this, but it seems to run slower in some cases.
The main table that this view selects from has somewhere around 1.5 million records, and the view proceeds to join on about 15 other tables to gather a bunch of other information.
Both of these methods are slow... taking me from instant to anywhere from 2-40 seconds, which in my situation is completely unacceptable.
Is there a better way that doesn't involve breaking it down into each separate case of specific vs -1 ?
Any help is appreciated. Thanks.
I read somewhere that the second way will short-circuit and never evaluate the second part if the first is true. My DBA said it forces a table scan.
You read wrong; it will not short-circuit. Your DBA is right; it will not play well with the query optimizer and will likely force a table scan.
The first option is about as good as it gets. Your options to improve things are dynamic SQL or a long stored procedure with every possible combination of filter columns so you get independent query plans. You might also try using the "WITH RECOMPILE" option, but I don't think it will help you.
If you are running SQL Server 2005 or above, you can use IFs to make multiple versions of the query, each with the proper WHERE, so an index can be used. Each query plan will be placed in the plan cache.
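A minimal sketch of that IF pattern with the parameters from the question (only two branches shown; a full version needs one branch per combination you care about, which is one reason the article below also covers dynamic SQL):
IF @PARM1 = -1 AND @PARM2 = -1 AND @PARM3 = -1
    SELECT field1, field2
    FROM my_view;
ELSE IF @PARM1 <> -1 AND @PARM2 = -1 AND @PARM3 = -1
    SELECT field1, field2
    FROM my_view
    WHERE parm1 = @PARM1;
-- ...and so on for the remaining combinations.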
Also, here is a very comprehensive article on this topic:
Dynamic Search Conditions in T-SQL by Erland Sommarskog
It covers all the issues and methods of trying to write queries with multiple optional search conditions.
Here is the table of contents:
Introduction
The Case Study: Searching Orders
The Northgale Database
Dynamic SQL
Introduction
Using sp_executesql
Using the CLR
Using EXEC()
When Caching Is Not Really What You Want
Static SQL
Introduction
x = @x OR @x IS NULL
Using IF statements
Umachandar's Bag of Tricks
Using Temp Tables
x = @x AND @x IS NOT NULL
Handling Complex Conditions
Hybrid Solutions – Using both Static and Dynamic SQL
Using Views
Using Inline Table Functions
Conclusion
Feedback and Acknowledgements
Revision History
If you pass in a null value when you want everything, then you can write your where clause as
Where colName = IsNull(@Parameter, ColName)
This is basically the same as your first method... it will work as long as the column itself is not nullable; NULL values in the column will mess it up slightly.
The only approach to speed it up is to add an index on the column being filtered on in the Where clause. Is there one already? If not, that will result in a dramatic improvement.
No other way I can think of than doing:
WHERE
(MyCase IS NULL OR MyCase = @MyCaseParameter)
AND ....
The second one is simpler and more readable to other developers, if you ask me.
SQL Server 2008 and later make some improvements to optimization for things like (MyCase IS NULL OR MyCase = @MyCaseParameter) AND .... If you can upgrade, and if you add an OPTION (RECOMPILE) to get a decent plan for all possible parameter combinations (this is a situation where there is no single plan that is good for every combination), you may find that this performs well.
http://blogs.msdn.com/b/bartd/archive/2009/05/03/sometimes-the-simplest-solution-isn-t-the-best-solution-the-all-in-one-search-query.aspx
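Concretely, that is just the second form from the question with the hint appended:
SELECT field1, field2
FROM my_view
WHERE
(@PARM1 = -1 OR parm1 = @PARM1)
AND (@PARM2 = -1 OR parm2 = @PARM2)
AND (@PARM3 = -1 OR parm3 = @PARM3)
OPTION (RECOMPILE);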

Resources