Is SQL Server's ISNULL() function lazy/short-circuited? - sql-server

Is ISNULL() a lazy function?
That is, if I code something like the following:
SELECT ISNULL(MYFIELD, getMyFunction()) FROM MYTABLE
will it always evaluate getMyFunction() or will it only evaluate it in the case where MYFIELD is actually null?

This works fine
declare @X int
set @X = 1
select isnull(@X, 1/0)
But introducing an aggregate makes it fail, proving that the second argument can sometimes be evaluated before the first (presumably because the aggregate is computed in its own step before the ISNULL expression).
declare @X int
set @X = 1
select isnull(@X, min(1/0))

It does whichever the optimizer thinks will work best.
It is, however, functionally lazy, which is the important thing. E.g. if col1 is a varchar that will always contain a number when col2 is null, then
isnull(col2, cast(col1 as int))
will work.
However, it's not specified whether it will try the cast before or simultaneously with the null-check and eat the error if col2 isn't null, or if it will only try the cast at all if col2 is null.
At the very least, we would expect it to obtain col1 in any case because a single scan of a table obtaining 2 values is going to be faster than two scans obtaining one each.
The same SQL commands can be executed in very different ways, because the instructions we give are turned into lower-level operations based on knowledge of the indices and statistics about the tables.
For that reason, in terms of performance, the answer is "when it seems like it would be a good idea it is, otherwise it isn't".
In terms of observed behaviour, it is lazy.
Edit: Mikael Eriksson's answer shows that there are cases that may indeed error due to not being lazy. I'll stick by my answer here in terms of the performance impact, but his is vital in terms of correctness impact across at least some cases.
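To illustrate the col1/col2 example above, here is a minimal demo sketch (table and values are hypothetical; per the caveats above, this is observed behavior, not a guarantee):
-- col1 is non-numeric only on rows where col2 is populated
declare @t table (col1 varchar(10), col2 int)
insert into @t values ('abc', 42), ('7', NULL)
-- observed to return 42 and 7; a fully eager evaluation would instead
-- fail on cast('abc' as int) for the first row
select isnull(col2, cast(col1 as int)) from @t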

Judging from the different behavior of
SELECT ISNULL(1, 1/0)
SELECT ISNULL(NULL, 1/0)
the first SELECT returns 1, while the second one raises an error:
Msg 8134, Level 16, State 1, Line 4
Divide by zero error encountered.

This "lazy" feature you are referring to is in fact called "short-circuiting"
And it does NOT always work especially if you have a udf in the ISNULL expression.
Check this article where tests were run to prove it:
Short-circuiting (mainly in VB.Net and SQL Server)
T-SQL is a declarative language, hence it cannot control the algorithm used to get the results; it just declares what results it needs. It is up to the query engine/optimizer to figure out the cost-effective plan. And in SQL Server, the optimizer uses "contradiction detection", which will never guarantee a left-to-right evaluation as you would assume in procedural languages.
For your example, did a quick test:
Created the scalar-valued UDF to invoke the Divide by zero error:
CREATE FUNCTION getMyFunction
( @MyValue INT )
RETURNS INT
AS
BEGIN
RETURN (1/0)
END
GO
Running the below query did not give me a Divide by zero error encountered error.
DECLARE @test INT
SET @test = 1
SET @test = ISNULL(@test, (dbo.getMyFunction(1)))
SELECT @test
Changing the SET to the statement below (introducing a SELECT inside the ISNULL) did give me the Divide by zero error encountered error.
SET @test = ISNULL(@test, (SELECT dbo.getMyFunction(1)))
But with values instead of variables, it never gave me the error.
SELECT ISNULL(1, (dbo.getMyFunction(1)))
SELECT ISNULL(1, (SELECT dbo.getMyFunction(1)))
So unless you really figure out how the optimizer evaluates these expressions for all permutations, it is safest not to rely on the short-circuiting capabilities of T-SQL.
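If the fallback must never run when the first argument is non-null, a procedural guard sidesteps the question entirely; a small sketch reusing the UDF above:
DECLARE @test INT
SET @test = 1
-- the UDF (and its divide by zero) only runs if the guard finds @test NULL
IF @test IS NULL
    SET @test = dbo.getMyFunction(1)
SELECT @test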


SQL SUBSTRING & PATINDEX of varying lengths

SQL Server 2017.
Given the following 3 records with a field of type nvarchar(250) called fileString:
_318_CA_DCA_2020_12_11-01_00_01_VM6.log
_319_CA_DCA_2020_12_12-01_VM17.log
_333_KF_DCA01_00_01_VM232.log
I would want to return:
VM6
VM17
VM232
Attempted thus far with:
SELECT
SUBSTRING(fileString, PATINDEX('%VM[0-9]%', fileString), 3)
FROM dbo.Table
But of course that only returns VM and 1 number.
How would I define the parameter for number of characters when it varies?
EDIT: to pre-emptively answer a question that may come up: yes, the VM pattern will always be followed immediately by .log and nothing else. But even if I took that approach and worked backwards, I still don't understand how to define the number of characters to take when the number varies.
here is one way:
DECLARE @test TABLE( fileString varchar(500))
INSERT INTO @test VALUES
('_318_CA_DCA_2020_12_11-01_00_01_VM6.log')
,('_319_CA_DCA_2020_12_12-01_00_01_VM17.log')
,('_333_KF_DCA_2020_12_15-01_00_01_VM232.log')
-- 5 is the length of the file extension '.log' plus 1, which is always the same
SELECT
REVERSE(SUBSTRING(REVERSE(fileString),5,CHARINDEX('_',REVERSE(fileString))-5))
FROM @test AS t
This will dynamically grab the length and location of the last _ and remove the .log.
It is not the most efficient; if you are able to write a CLR function using C# and import it into SQL, that will be much more efficient. Or you can use this as a starting point and tweak it as needed.
You can remove the variable and replace it with your table like below
DECLARE @TESTVariable as varchar(500)
Set @TESTVariable = '_318_CA_DCA_2020_12_11-01_00_01_VM6adf.log'
SELECT REPLACE(SUBSTRING(@TESTVariable, PATINDEX('%VM[0-9]%', @TESTVariable), PATINDEX('%[_]%', REVERSE(@TESTVariable))), '.log', '')
select *,
part = REPLACE(SUBSTRING(filestring, PATINDEX('%VM[0-9]%', filestring), PATINDEX('%[_]%', REVERSE(filestring))), '.log', '')
from table
Your lengths are consistent at the beginning. So get away from patindex and use substring to crop out the beginning. Then just replace the '.log' with an empty string at the end.
select *,
part = replace(substring(filestring,33,255),'.log','')
from table;
Edit:
Okay, from your edit you show differing prefix portions, so patindex is in fact correct. Here's my solution, which is neither better nor worse than the other answers but differs in that it avoids reverse and delegates the patindex computation to a cross apply section. You may find it a bit more readable.
select filestring,
part = replace(substring(filestring, ap.vmIx, 255),'.log','')
from table
cross apply (select
vmIx = patindex('%_vm%', filestring) + 1
) ap
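Since the question states the VM token is always immediately followed by .log, another sketch (assuming .log occurs exactly once, at the end) derives the length from the two positions:
SELECT
SUBSTRING(fileString,
          PATINDEX('%VM[0-9]%', fileString),
          CHARINDEX('.log', fileString) - PATINDEX('%VM[0-9]%', fileString)) AS part
FROM dbo.Table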

How do you use Snowflake RATIO_TO_REPORT to be precise?

I'm having a problem with the Snowflake function RATIO_TO_REPORT and its rounding. It seems like there is default rounding behavior, which causes the sum to differ from 1.
How would you address this issue?
RATIO_TO_REPORT Issue
Cheers,
Joe
Nothing going on here is technically "wrong" - as you've identified, this is a rounding issue. Each of the RATIO_TO_REPORT results is correct, and the sum of the values is also correct.
The best way to get around this is to force RATIO_TO_REPORT to output a more precise number by casting the input to NUMBER rather than INTEGER. In my testing, the below worked well:
-- Create the table for testing
CREATE TABLE R2R_TEST (activity_count integer);
-- Insert the values from the example screenshot.
INSERT INTO R2R_TEST VALUES (210),(11754),(3660),(66);
-- Create the test RATIO_TO_REPORT query with the cast to NUMBER
SELECT RATIO_TO_REPORT(activity_count::number(32,8)) OVER (ORDER BY activity_count) r2r FROM R2R_TEST;
-- Check our work.
with test as (SELECT RATIO_TO_REPORT(activity_count::number(32,8)) OVER (ORDER BY activity_count) r2r FROM R2R_TEST)
SELECT SUM(r2r) FROM test; -- 1.0000000000[...]
You can see that all I've done is cast activity_count to a NUMBER, and this gives the unrounded result.
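If you also need the rounded, displayed ratios to sum to exactly 1, a largest-remainder-style fix-up is one option. This is only a sketch against the R2R_TEST table above, and the 4-decimal display scale is an assumption:
-- round each ratio, then fold the rounding residual into the largest row
WITH r2r AS (
    SELECT activity_count,
           RATIO_TO_REPORT(activity_count::number(32,8)) OVER () AS ratio
    FROM R2R_TEST
)
SELECT activity_count,
       ROUND(ratio, 4)
         + IFF(activity_count = (SELECT MAX(activity_count) FROM r2r),
               1 - (SELECT SUM(ROUND(ratio, 4)) FROM r2r),
               0) AS display_ratio
FROM r2r;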

performance issue on stored procedure/udf vs plain query from sql mgmnt console in sql server

I have a query that uses nested CTEs which is in a user defined function. I have to use nested CTEs because I want to re-use some calculations/case statements from the previous selects. The query looks similar to what is below.
;with cte1 as
(
select a, b from Table1
),
cte2 as
(
select case when a = 1 then 1 else 0 end as c, b from cte1
)
select * from cte2
I have this in a UDF that is called from multiple stored procs. There are a large number of calculations being done inside this query. I'm noticing a performance difference when the query is run outside of the function. For around 12,000 records, it runs in under 11 seconds when the query is run from SQL Server Management Studio, applying all the parameters. When the same parameters are supplied to the UDF, it takes around 55 seconds. I tried to put the query inside a stored proc instead of the UDF, but it's still the same 55 seconds. It looks like when the query is run from Management Studio it uses parallelism, but not for the function or stored proc.
This is not a major problem at this point, but I would like to achieve the same 11-second performance if I can. Has anyone run into a similar scenario before?
display your "settings" from within the stored procedure and from just with SSMS. I have this same thing, faster in SSMS and slower in procedure. You can sometimes resolve this because SSMS is running with differeent settings than the procedure, get them same and you might be able to see the same performance in the procedure. Here is some example code to display the settings:
SELECT SESSIONPROPERTY ('ANSI_NULLS') --Specifies whether the SQL-92 compliant behavior of equals (=) and not equal to (<>) against null values is applied.
--1 = ON
--0 = OFF
SELECT SESSIONPROPERTY ('ANSI_PADDING') --Controls the way the column stores values shorter than the defined size of the column, and the way the column stores values that have trailing blanks in character and binary data.
--1 = ON
--0 = OFF
SELECT SESSIONPROPERTY ('ANSI_WARNINGS') --Specifies whether the SQL-92 standard behavior of raising error messages or warnings for certain conditions, including divide-by-zero and arithmetic overflow, is applied.
--1 = ON
--0 = OFF
SELECT SESSIONPROPERTY ('ARITHABORT') -- Determines whether a query is ended when an overflow or a divide-by-zero error occurs during query execution.
--1 = ON
--0 = OFF
SELECT SESSIONPROPERTY ('CONCAT_NULL_YIELDS_NULL') --Controls whether concatenation results are treated as null or empty string values.
--1 = ON
--0 = OFF
SELECT SESSIONPROPERTY ('NUMERIC_ROUNDABORT') --Specifies whether error messages and warnings are generated when rounding in an expression causes a loss of precision.
--1 = ON
--0 = OFF
SELECT SESSIONPROPERTY ('QUOTED_IDENTIFIER') --Specifies whether SQL-92 rules about how to use quotation marks to delimit identifiers and literal strings are to be followed.
--1 = ON
--0 = OFF
You can just add them to your result set:
SELECT
col1, col2
,SESSIONPROPERTY ('ARITHABORT') AS ARITHABORT
,SESSIONPROPERTY ('ANSI_WARNINGS') AS ANSI_WARNINGS
,SESSIONPROPERTY ('...
FROM ...
If you try them one at a time, try ARITHABORT first.
see: Resolving an ADO timeout issue in VB6
and Why would SET ARITHABORT ON dramatically speed up a query?
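In my experience the usual mismatch is ARITHABORT: SSMS sets it ON by default, while most client libraries leave it OFF, so the procedure gets a separately compiled (and possibly worse) cached plan. A quick test from the calling connection (the procedure and parameter names here are hypothetical):
-- match the SSMS default before calling the procedure
SET ARITHABORT ON;
EXEC dbo.MyCalcProc @param1 = 12000;  -- hypothetical proc and parameter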

SQL Cast Mystery

I have a real mystery with the T-SQL below. As written, it works with either the DATAP.Private=1 or the cast as int on Right(CRS,1), but not both. That is, if I uncomment the DATAP.Private=1, I get the error Conversion failed when converting the varchar value 'M' to data type int; if I then remove that cast, the query works again. With the cast in place, the query only works without the Private=1.
I cannot for the life of me see how the Private=1 can add anything to the result set that will cause the error, unless Private is ever 'M', but Private is a bit field!
SELECT
cast(Right(CRS,1) as int) AS Company
, cast(PerNr as int) AS PN
, Round(Sum(Cost),2) AS Total_Cost
FROM
DATAP
LEFT JOIN BU_Summary ON DATAP.BU=BU_Summary.BU
WHERE
DATAP.Extension Is Not Null
--And DATAP.Private=1
And Left(CRS,2)='SB'
And DATAP.PerNr Between '1' And '9A'
and Right(CRS,1) <> 'm'
GROUP BY
cast(Right(CRS,1) as int)
, cast(PerNr as int)
ORDER BY
cast(PerNr as int)
I've seen something like this in the past. It's possible the DATAP.Private = 1 clause is generating a query plan that is performing the CRS cast before the Right(CRS,1) <> 'm' filter is applied.
It sure shouldn't do that, but I've had similar problems in T-SQL I've written, particularly when views are involved.
You might be able to reorder the elements to get the query to work, or select uncast data values into a temporary table or table variable and then select and cast from there in a separate statement as a stopgap.
If you check the execution plan it might shed some light about what is being calculated where. This might give you more ideas as to what you might change.
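A rough sketch of the temp-table stopgap mentioned above (assuming CRS, PerNr, and Cost all come from DATAP):
-- materialize the filtered rows first, so the casts in the second
-- statement only ever see rows that passed Right(CRS,1) <> 'm'
SELECT CRS, PerNr, Cost
INTO #DATAP_filtered
FROM DATAP
LEFT JOIN BU_Summary ON DATAP.BU=BU_Summary.BU
WHERE DATAP.Extension Is Not Null
And DATAP.Private=1
And Left(CRS,2)='SB'
And DATAP.PerNr Between '1' And '9A'
and Right(CRS,1) <> 'm'

SELECT cast(Right(CRS,1) as int) AS Company
, cast(PerNr as int) AS PN
, Round(Sum(Cost),2) AS Total_Cost
FROM #DATAP_filtered
GROUP BY cast(Right(CRS,1) as int), cast(PerNr as int)
ORDER BY cast(PerNr as int)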
Just a guess, but it may be that when Private = 1, PerNr can be something other than a castable number in your data (as it is, PerNr can equal '9A', or whatever else, breaking the cast in the group by and order by clauses).
CAST('9A' AS int) fails when I tested it. It looks like you're doing unnecessary casts for grouping and sorting. Particularly in the GROUP BY, that will at least kill any chance of optimizing.

Inconsistency between MS SQL 2k and 2k5 with columns as function arguments

I'm having trouble getting the following to work in SQL Server 2k, but it works in 2k5:
--works in 2k5, not in 2k
create view foo as
SELECT usertable.legacyCSVVarcharCol as testvar
FROM usertable
WHERE rsrcID in
( select val
from
dbo.fnSplitStringToInt(usertable.legacyCSVVarcharCol, default)
)
--error message:
Msg 170, Level 15, State 1, Procedure foo, Line 4
Line 25: Incorrect syntax near '.'.
So, legacyCSVVarcharCol is a column containing comma-separated lists of INTs. I realize that this is a huge WTF, but this is legacy code, and there's nothing that can be done about the schema right now. Passing "testvar" as the argument to the function doesn't work in 2k either. In fact, it results in a slightly different (and even weirder) error:
Msg 155, Level 15, State 1, Line 8
'testvar' is not a recognized OPTIMIZER LOCK HINTS option.
Passing a hard-coded string as the argument to fnSplitStringToInt works in both 2k and 2k5.
Does anyone know why this doesn't work in 2k? Is this perhaps a known bug in the query planner? Any suggestions for how to make it work? Again, I realize that the real answer is "don't store CSV lists in your DB!", but alas, that's beyond my control.
Some sample data, if it helps:
INSERT INTO usertable (legacyCSVVarcharCol) values ('1,2,3');
INSERT INTO usertable (legacyCSVVarcharCol) values ('11,13,42');
Note that the data in the table does not seem to matter since this is a syntax error, and it occurs even if usertable is completely empty.
EDIT: Realizing that perhaps the initial example was unclear, here are two examples, one of which works and one of which does not, which should highlight the problem that's occurring:
--fails in sql2000, works in 2005
SELECT t1.*
FROM usertable t1
WHERE 1 in
(Select val
from
fnSplitStringToInt(t1.legacyCSVVarcharCol, ',')
)
--works everywhere:
SELECT t1.*
FROM usertable t1
WHERE 1 in
( Select val
from
fnSplitStringToInt('1,4,543,56578', ',')
)
Note that the only difference is the first argument to fnSplitStringToInt is a column in the case that fails in 2k and a literal string in the case that succeeds in both.
Passing column-values to a table-valued user-defined function is not supported in SQL Server 2000, you can only use constants, so the following (simpler version) would also fail:
SELECT *, (SELECT TOP 1 val FROM dbo.fnSplitStringToInt(usertable.legacyCSVVarcharCol, ','))
FROM usertable
It will work on SQL Server 2005, though, as you have found out.
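If you are stuck on 2k, one workaround is to skip the function for the column case and match against the delimited string directly. A sketch, assuming rsrcID is an int and the CSV contains no spaces around the commas:
-- wrap both sides in commas so '1' matches '1,2,3' but not '11,13,42'
SELECT t1.*
FROM usertable t1
WHERE ',' + t1.legacyCSVVarcharCol + ',' LIKE '%,' + CAST(t1.rsrcID AS varchar(12)) + ',%'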
I don't think function parameters can have default values in SS2K.
What happens when you run this SQL in SS2K?
select val
from dbo.fnSplitStringToInt('1,2,3', default)
