Does SQL Server short-circuit IF statements?

I am working on optimizing some heavily used stored procedures and ran across a scenario that raised a question I couldn't find any answers for: when evaluating T-SQL in a stored procedure, does SQL Server short-circuit the IF statement?
For example, assume a stored procedure has code similar to:
IF @condition1 = 1
    OR EXISTS (SELECT 1 FROM table1 WHERE column1 = @value1)
...
In this scenario does SQL Server short-circuit the evaluation such that the EXISTS statement is never executed when the preceding clause evaluates to true?
If it never or only sometimes does, then we have some rewriting ahead of us.

Even if it appears to work, it should not be relied upon. The CASE expression is the only construct the documentation describes as short-circuiting, but even that isn't (or at least wasn't) always the case (hee hee). Here is one bug that was fortunately fixed as of SQL Server 2012 (see the comments).
In addition to the rabbit hole of links (an interesting one, for sure) starting from the comment posted by @Martin on the question, you should also check out this article:
Understanding T-SQL Expression Short-Circuiting
and the discussion forum related to that article.
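If you need the guarantee, the safe rewrite is to make the evaluation order explicit with nested IFs instead of OR. A minimal sketch along the lines of the question's code (table and column names are the question's placeholders):
--the EXISTS branch is only ever reached when @condition1 <> 1
IF @condition1 = 1
BEGIN
    PRINT 'condition met';
END
ELSE IF EXISTS (SELECT 1 FROM table1 WHERE column1 = @value1)
BEGIN
    PRINT 'condition met';
END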

The good news is that it seems to short-circuit. Here's a minimal example:
DECLARE @condition1 bit = 1
IF (@condition1 = 1) OR EXISTS (SELECT 1 FROM sys.objects)
PRINT 'True'
ELSE
PRINT 'False'
When @condition1 is set to 1, the execution plan shows 0 rows scanned from sys.objects; when @condition1 is set to 0, the plan shows a scan of the sys.objects table.
But there is no guarantee that this will be the case every time.

Related

select ... into variable from table where 1=0 leads to the replacement of the variable with null

We are migrating a lot of code from SQL Server to PostgreSQL. We ran into the following problem, a serious difference between SQL Server and PostgreSQL.
Of course, by the expression 1 = 0 below I mean cases where the query conditions do not return a single record.
A query in SQL Server:
select @variable = t.field
from table t
where 1 = 0
saves the previous value of the variable.
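(A self-contained repro of the SQL Server behavior, using sys.objects as a stand-in table:)
DECLARE @variable int = 42;
SELECT @variable = o.object_id
FROM sys.objects o
WHERE 1 = 0;
SELECT @variable; --still 42: the assignment never ran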
A query in PostgreSQL:
select t.field
into variable
from table t
where 1 = 0
replaces the previous value of the variable with null.
We have already rewritten a lot of code without taking this feature into account.
Is there an easy way in PostgreSQL, without rewriting the code, to preserve the value of a variable in such cases? For example, maybe there is some kind of server, database, or session setting? We did not find any relevant information in the documentation. We do not understand this pattern of behavior in PostgreSQL, which requires introducing additional variables and lines of code to check the result of every query.
As far as I know, there is no way to change PostgreSQL's behavior here.
I don't have access to the SQL/PSM specifications, so I couldn't tell you which one matches the standard (if any / if SELECT INTO <variable> even is in it).
You don't need additional variables though: you can use INTO STRICT and catch the exception raised when no rows are returned:
DO $$
DECLARE
    variable int := 1;
BEGIN
    BEGIN
        SELECT 1
        INTO STRICT variable
        WHERE FALSE;
    EXCEPTION
        WHEN NO_DATA_FOUND THEN
            NULL;  -- no-op: keep the previous value
    END;
    RAISE NOTICE 'kept the previous value: %', variable;
END
$$;
shows "kept the previous value: 1".
Though it is obviously more verbose than the SQL Server version.

Why does "= ALL (subquery)" evaluate to true if the subquery returnes no results?

I would expect "= ALL (subquery)" to evaluate to false if the subquery returns no results.
However in testing I find that not to be the case:
--put one record in #Orders
SELECT 1 AS 'OrderID'
INTO #Orders;
--put one record in #OrderLines
SELECT
1 AS 'OrderID'
,1 AS 'OrderLineID'
,3 AS 'Quantity'
INTO #OrderLines;
--as expected this returns the record in #Orders
SELECT *
FROM #Orders
WHERE 3 = ALL
(
SELECT Quantity
FROM #OrderLines
);
--now delete the record in #OrderLines
DELETE FROM #OrderLines;
--this still returns the record from #Orders even though the subquery returns no results
SELECT *
FROM #Orders
WHERE 3 = ALL
(
SELECT Quantity
FROM #OrderLines
);
Execution plan for the final select statement: https://www.brentozar.com/pastetheplan/?id=H1jQ2YgIK
Tested on:
Microsoft SQL Server 2017 (RTM-CU20) (KB4541283) - 14.0.3294.2 (X64)
Microsoft SQL Server 2017 (RTM-CU25) (KB5003830) - 14.0.3401.7 (X64)
When searching I find unofficial sources which say that "= ALL (subquery)" evaluates to true if the subquery returns no results:
"The ALL must be preceded by the comparison operators and evaluates to TRUE if the query returns no rows" https://dotnettutorials.net/lesson/all-operator-sql-server/
"The ALL must be preceded by the comparison operators and evaluates to TRUE if the query returns no rows" https://www.w3resource.com/sql/special-operators/sql_all.php
But I don't see anything in the official documentation (https://learn.microsoft.com/en-us/sql/t-sql/language-elements/all-transact-sql?view=sql-server-ver15) that supports that idea, in fact it would seem to dispute it: "ALL requires the scalar_expression to compare positively to every value that is returned by the subquery"
Questions
Is it expected behavior in SQL Server to evaluate ALL as true if the subquery returns no results?
If the answer to #1 is "yes":
Is it documented somewhere?
What is the explanation for that behavior? In the code example above, 3 does not compare positively against an empty result set, so it seems highly unintuitive that the query should return results.
Thanks for any assistance and insight.
Paraphrasing the documentation:
... scalar_expression = ALL (subquery) would evaluate as FALSE if some of the values of the subquery don't meet the criteria of the expression.
It's subtle, but the intention seems to be: return FALSE if some values do not satisfy the condition, TRUE otherwise. In the edge case of there being no values, there are no values that fail the condition, so it returns TRUE.
The "problem" causing the perhaps surprising result is the word "some", which implies existence. If no values exist, there can't be "some" values that are false, so it's true.
You could say it's based on double negative logic where the edge case happens to fall in the unexpected half of the result.
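One way to see the rule is to rewrite = ALL as a NOT EXISTS over the counter-examples; a sketch against the question's temp tables (NULL handling aside, the two forms agree):
--logically, 3 = ALL (SELECT Quantity FROM #OrderLines) asks:
--"is there no row whose Quantity fails the comparison?"
SELECT *
FROM #Orders
WHERE NOT EXISTS
(
    SELECT 1
    FROM #OrderLines
    WHERE NOT (3 = Quantity)
);
--with #OrderLines empty, the inner query returns no rows,
--so NOT EXISTS is TRUE and the #Orders row comes back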
As a side note, I have written a huge amount of SQL in my career and never used this keyword, nor seen it used.
Recommendation: Do not use.

Is Sql Server's ISNULL() function lazy/short-circuited?

Is ISNULL() a lazy function?
That is, if I code something like the following:
SELECT ISNULL(MYFIELD, getMyFunction()) FROM MYTABLE
will it always evaluate getMyFunction() or will it only evaluate it in the case where MYFIELD is actually null?
This works fine
declare @X int
set @X = 1
select isnull(@X, 1/0)
But introducing an aggregate will make it fail, proving that the second argument can sometimes be evaluated before the first.
declare @X int
set @X = 1
select isnull(@X, min(1/0))
SQL Server does whichever it thinks will work best.
Now it's functionally lazy, which is the important thing. E.g. if col1 is a varchar which will always contain a number when col2 is null, then
isnull(col2, cast(col1 as int))
Will work.
However, it's not specified whether it will try the cast before or simultaneously with the null-check and eat the error if col2 isn't null, or if it will only try the cast at all if col2 is null.
At the very least, we would expect it to obtain col1 in any case because a single scan of a table obtaining 2 values is going to be faster than two scans obtaining one each.
The same SQL commands can be executed in very different ways, because the instructions we give are turned into lower-level operations based on knowledge of the indices and statistics about the tables.
For that reason, in terms of performance, the answer is "when it seems like it would be a good idea it is, otherwise it isn't".
In terms of observed behaviour, it is lazy.
Edit: Mikael Eriksson's answer shows that there are cases that may indeed error due to not being lazy. I'll stick by my answer here in terms of the performance impact, but his is vital in terms of correctness impact across at least some cases.
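If the worry is the conversion erroring when col2 is not null, one defensive option (SQL Server 2012 and later) is TRY_CAST, which returns NULL instead of raising on a failed conversion; a sketch reusing the column names above (the table name is a placeholder):
--TRY_CAST yields NULL rather than an error if col1 doesn't convert,
--so the expression is safe regardless of evaluation order
SELECT ISNULL(col2, TRY_CAST(col1 AS int)) AS merged
FROM dbo.MyTable; --placeholder table name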
Judging from the different behavior of
SELECT ISNULL(1, 1/0)
SELECT ISNULL(NULL, 1/0)
the first SELECT returns 1, while the second raises "Msg 8134, Level 16, State 1: Divide by zero error encountered." So in this simple case, ISNULL does short-circuit.
This "lazy" feature you are referring to is in fact called "short-circuiting"
And it does NOT always work especially if you have a udf in the ISNULL expression.
Check this article where tests were run to prove it:
Short-circuiting (mainly in VB.Net and SQL Server)
T-SQL is a declarative language, so it does not control the algorithm used to get the results; it just declares what results it needs. It is up to the query engine/optimizer to figure out the cost-effective plan. And in SQL Server, the optimizer uses "contradiction detection", which will never guarantee a left-to-right evaluation as you would assume in procedural languages.
For your example, I did a quick test:
I created a scalar-valued UDF to invoke the divide-by-zero error:
CREATE FUNCTION getMyFunction
( @MyValue INT )
RETURNS INT
AS
BEGIN
RETURN (1/0)
END
GO
Running the query below did not give me a "Divide by zero error encountered" error.
DECLARE @test INT
SET @test = 1
SET @test = ISNULL(@test, (dbo.getMyFunction(1)))
SELECT @test
Changing the SET to the statement below did give me the "Divide by zero error encountered" error (I introduced a SELECT inside the ISNULL):
SET @test = ISNULL(@test, (SELECT dbo.getMyFunction(1)))
But with values instead of variables, it never gave me the error.
SELECT ISNULL(1, (dbo.getMyFunction(1)))
SELECT ISNULL(1, (SELECT dbo.getMyFunction(1)))
So unless you can really figure out how the optimizer evaluates these expressions for all permutations, it is safer not to rely on the short-circuiting capabilities of T-SQL.

Issue with parameters in SQL Server stored procedures

I remember reading a while back that SQL Server can randomly slow down and/or take a stupidly long time to execute a stored procedure when it is written like:
CREATE PROCEDURE spMyExampleProc
(
@myParameter INT
)
AS
BEGIN
SELECT something FROM myTable WHERE myColumn = @myParameter
END
The way to fix this problem is to do this:
CREATE PROCEDURE spMyExampleProc
(
@myParameter INT
)
AS
BEGIN
DECLARE @newParameter INT
SET @newParameter = @myParameter
SELECT something FROM myTable WHERE myColumn = @newParameter
END
Now my question is: firstly, is it bad practice to follow the second example for all my stored procedures? This seems like a bug that could be easily prevented with little work, but would there be any drawbacks to doing this, and if so, why?
When I read about this, the problem was that the same proc would take varying times to execute depending on the value of the parameter. If anyone can tell me what this problem is called and why it occurs, I would be really grateful; I can't seem to find the link to the post anywhere, and it seems like a problem that could affect our company.
The problem is "parameter sniffing" (SO Search)
The pattern with #newParameter is called "parameter masking" (also SO Search)
You could always use this masking pattern, but it isn't always needed. For example, a simple select by unique key, with no child tables or other filters, should behave as expected every time.
Since SQL Server 2008, you can also use OPTIMIZE FOR UNKNOWN (SO). Also see Alternative to using local variables in a where clause and Experience with when to use OPTIMIZE FOR UNKNOWN
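For reference, the hint goes on the statement inside the procedure; a sketch using the question's placeholder names:
CREATE PROCEDURE spMyExampleProc
(
    @myParameter INT
)
AS
BEGIN
    SELECT something
    FROM myTable
    WHERE myColumn = @myParameter
    OPTION (OPTIMIZE FOR UNKNOWN); --compile for average density instead of the sniffed value
END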

performance issue on stored procedure/udf vs plain query from sql mgmnt console in sql server

I have a query that uses nested CTEs which is in a user defined function. I have to use nested CTEs because I want to re-use some calculations/case statements from the previous selects. The query looks similar to what is below.
;WITH cte1 AS
(
    SELECT a, b FROM Table1
),
cte2 AS
(
    SELECT CASE WHEN a = 1 THEN 1 ELSE 0 END AS c, b FROM cte1
)
SELECT * FROM cte2
I have this in a UDF that is called from multiple stored procs. There are a large number of calculations being done inside this query. I'm noticing a performance difference when the query is run outside of the function. For around 12,000 records, it runs in under 11 seconds when the query is run from SQL Server Management Studio with all the parameters applied. When the same parameters are supplied to the UDF, it takes around 55 seconds. I tried putting the query inside a stored proc instead of a UDF, but it was still the same 55 seconds. It looks like when the query is run from Management Studio it uses parallelism, but the function and stored proc versions do not.
This is not a major problem at this point, but I would like to achieve the same 11-second performance if I can. Has anyone run into a similar scenario before?
display your "settings" from within the stored procedure and from just with SSMS. I have this same thing, faster in SSMS and slower in procedure. You can sometimes resolve this because SSMS is running with differeent settings than the procedure, get them same and you might be able to see the same performance in the procedure. Here is some example code to display the settings:
SELECT SESSIONPROPERTY ('ANSI_NULLS') --Specifies whether the SQL-92 compliant behavior of equals (=) and not equal to (<>) against null values is applied.
--1 = ON
--0 = OFF
SELECT SESSIONPROPERTY ('ANSI_PADDING') --Controls the way the column stores values shorter than the defined size of the column, and the way the column stores values that have trailing blanks in character and binary data.
--1 = ON
--0 = OFF
SELECT SESSIONPROPERTY ('ANSI_WARNINGS') --Specifies whether the SQL-92 standard behavior of raising error messages or warnings for certain conditions, including divide-by-zero and arithmetic overflow, is applied.
--1 = ON
--0 = OFF
SELECT SESSIONPROPERTY ('ARITHABORT') -- Determines whether a query is ended when an overflow or a divide-by-zero error occurs during query execution.
--1 = ON
--0 = OFF
SELECT SESSIONPROPERTY ('CONCAT_NULL_YIELDS_NULL') --Controls whether concatenation results are treated as null or empty string values.
--1 = ON
--0 = OFF
SELECT SESSIONPROPERTY ('NUMERIC_ROUNDABORT') --Specifies whether error messages and warnings are generated when rounding in an expression causes a loss of precision.
--1 = ON
--0 = OFF
SELECT SESSIONPROPERTY ('QUOTED_IDENTIFIER') --Specifies whether SQL-92 rules about how to use quotation marks to delimit identifiers and literal strings are to be followed.
--1 = ON
--0 = OFF
You can just add them to your result set:
SELECT
col1, col2
,SESSIONPROPERTY ('ARITHABORT') AS ARITHABORT
,SESSIONPROPERTY ('ANSI_WARNINGS') AS ANSI_WARNINGS
,SESSIONPROPERTY ('...
FROM ...
If you try them one at a time, try ARITHABORT first.
See: Resolving an ADO timeout issue in VB6
and Why would SET ARITHABORT ON dramatically speed up a query?
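As a quick check, you can reproduce the application's connection settings in SSMS before calling the procedure and see whether the slow plan appears (the procedure and parameter names below are hypothetical):
--SSMS connects with ARITHABORT ON by default; many client libraries use OFF,
--which gives the procedure a separately cached (and possibly worse) plan
SET ARITHABORT OFF;
EXEC dbo.MyProc @param = 12345; --hypothetical procedure and parameter
SET ARITHABORT ON;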
