SQL Server Function Within Case Statement - sql-server

select top 10 *, case
when datediff(day,DateOfSale, getDate()) > 5 then '5'
when datediff(day,DateOfSale, getDate()) > 10 then '10'
... (more datediff clauses)
...
...
else '20'
end as jack
from Foo
Is SQL Server smart enough to evaluate the datediff function call once within the case statement and then use that value for every when clause? Or is the function called 'n' times, where 'n' is the number of when clauses?

It's hard to see how SQL Server could evaluate the call once. The call has a column as parameter and so has to be evaluated for every row.
Thus, your condition is better written like:
when DateOfSale < dateadd(day, -5, getdate()) then '5'
In this case the difference is small. Date calculations are cheap.
The classic example where the function call does matter is a where condition on a table with an index on the date column. For example, YourTable with an index on (dt). This query would allow an index to be used:
select * from YourTable where dt < dateadd(day, -5, getdate())
While this query would not:
select * from YourTable where datediff(day, dt, getdate()) > 5

It's puzzling that so many answers are mentioning indexes. Indeed, DATEDIFF is not SARGable, but that's completely irrelevant here as CASE WHEN doesn't cause the query optimizer in SQL Server to consider index usage (other than trying to find a covering scannable path). The candidacy of DATEDIFF-involved expressions for index pathing is completely irrelevant to this question, as far as I can tell.
It's pretty easy to demonstrate that SQL Server does, indeed, stop evaluating predicates inside CASE statements once the first true predicate is found.
To demonstrate that fact, let's cook up some sample data:
CREATE TABLE Diffy (SomeKey INTEGER NOT NULL IDENTITY(1,1), DateOfSale DATE);
DECLARE @ThisOne AS DATE;
SET @ThisOne = '2012-01-01';
WHILE @ThisOne < '2013-01-01'
BEGIN
INSERT INTO Diffy (DateOfSale) VALUES(@ThisOne);
SET @ThisOne = DateAdd(d, 1, @ThisOne);
END;
Then, let's SELECT it in the pattern of the original question. Note that the original question specifies a TOP 10 clause without an ORDER BY clause, so which rows we actually get back is not deterministic. But if we add a clause to the CASE that would poison evaluation, we can see what's happening:
SELECT TOP 10 *, CASE
WHEN datediff(day,DateOfSale, getDate()) > 5 then '5'
WHEN datediff(day,DateOfSale, getDate()) > 10 then '10'
WHEN 1/0 > 1 then 'boom'
ELSE '20' END
AS Jack
FROM Diffy;
Note that if we ever evaluated 1/0 > 1, then we'd expect something like a 'Divide by zero error encountered.'. However, running this query against my server yields ten rows, all with '5' in the Jack column.
If we take away the TOP 10, sure enough we get some rows and then get the Divide by zero error. Thus, we can safely conclude that SQL Server is doing early exit evaluation on the CASE statement.
On top of it, the documentation also tells us so:
The CASE statement evaluates its conditions sequentially and stops with the first condition whose condition is satisfied.
Perhaps the question is meant to ask if the common DATEDIFF() subexpression is hoisted from all the CASE statements, computed once, and then evaluated within each predicate's context. By observing the output of SET SHOWPLAN_TEXT ON, I think we can conclude that's not the case:
|--Compute Scalar(DEFINE:([Expr1004]=CASE WHEN datediff(day,CONVERT_IMPLICIT(datetimeoffset(7),[Scratch3].[dbo].[Diffy].[DateOfSale],0),CONVERT_IMPLICIT(datetimeoffset(3),getdate(),0))>(5) THEN '5' ELSE CASE WHEN datediff(day,CONVERT_IMPLICIT(datetimeoffset(7),[Scratch3].[dbo].[Diffy].[DateOfSale],0),CONVERT_IMPLICIT(datetimeoffset(3),getdate(),0))>(10) THEN '10' ELSE CASE WHEN (1)/(0)>(1) THEN 'boom' ELSE '20' END END END))
|--Index Scan(OBJECT:([Scratch3].[dbo].[Diffy].[DiffyHelper]))
From that, we can conclude that the structure of this query means that DATEDIFF() is evaluated for each row and for each predicate, so O(rows * predicates) calls, at worst. That causes some CPU load for the query, but DATEDIFF() isn't that expensive and shouldn't be much of a concern. If, in practice, it turns out to be causing a performance problem, there are ways to manually hoist the computation from the query: for example, by doing the date arithmetic on the date-relative (GETDATE()) side of the comparison, so that it no longer depends on the column.
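A minimal sketch of that kind of manual hoisting, assuming the Diffy table created above. The cutoff dates are computed once into variables (the names @Today, @Cutoff5 and @Cutoff10 are made up for this example), and each WHEN then compares the column directly. For a DATE column, DateOfSale < DATEADD(day, -5, today) selects the same rows as DATEDIFF(day, DateOfSale, GETDATE()) > 5. The branches are ordered from the oldest cutoff down, because with the order used in the question the '> 10' branch can never be reached.

```sql
-- Hypothetical variable names; the cutoffs are computed once, not per row.
DECLARE @Today    DATE = CAST(GETDATE() AS DATE);
DECLARE @Cutoff5  DATE = DATEADD(day, -5,  @Today);
DECLARE @Cutoff10 DATE = DATEADD(day, -10, @Today);

SELECT TOP 10 *,
       CASE
           WHEN DateOfSale < @Cutoff10 THEN '10'  -- more than 10 days old
           WHEN DateOfSale < @Cutoff5  THEN '5'   -- more than 5 days old
           ELSE '20'                              -- fallback label kept from the question
       END AS Jack
FROM Diffy;
```

Whether this is measurably faster than the DATEDIFF() version would need to be checked against the actual workload.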

Sure, but not in your case: the expression is based on a table column value that changes for each row, so it has to be evaluated per row. In any event, don't execute datediff on the table column value; run dateadd on the comparison value instead, so your query can still use any existing index on DateOfSale:
select top 10 *,
case When DateOfSale < dateadd(day, -20, getDate()) then '20'
When DateOfSale < dateadd(day, -15, getDate()) then '15'
When DateOfSale < dateadd(day, -10, getDate()) then '10'
When DateOfSale < dateadd(day, -5, getDate()) then '5'
else '20' end jack
from Foo

Related

SQL Server Nested CASE and OR in WHERE statement

I have been trying all day to figure out how to (properly) move a nested IIF statement from Access to SQL Server.
The query needs to evaluate a simple NULL/-1/1 (null/yes/no) combobox. If it is blank, it should bring back all records. If YES (-1) then return Demog.[LT Due Date] <= GETDATE(). If NO (1) then Demog.[LT Due Date] >= GETDATE().
Here is the ACCESS SQL that works perfectly:
SELECT Demog.ID, Demog.[Long Term Date]
FROM Demog
WHERE (
(iif(Forms!Demog_Entry!cbox_LTPD=-1,Demog.[Long Term Date]<=Date(),
(iif(Forms!Demog_Entry!cbox_LTPD=0,Demog.[Long Term Date]>=Date(),
isnull(Forms!Demog_Entry!cbox_LTPD))))));
And here is one of the SQL Server versions I've tried. It evaluates the Null correctly, but for any other value it only returns Demog.[LT Due Date] <= GETDATE().
Declare @LTPD as Bit
Set @LTPD = -1
SELECT Demog.ID, Demog.[LT Due Date]
FROM Demog
WHERE (1=(CASE WHEN @LTPD Is Null THEN 1 ELSE 0 END) OR Demog.[LT Due Date] <= GETDATE())
I have also tried doing a combination of CASE and OR statements. Here is what I have, but again, I just can't get it to evaluate all 3 states correctly.
WHERE (1=(CASE WHEN @LTPD Is Null THEN 1 ELSE 0 END) OR
(@LTPD=-1 AND Demog.[LT Due Date] <= GETDATE()) OR
(@LTPD=1 AND Demog.[LT Due Date] >= GETDATE()))
I also tried using IIF but could not make it work no matter what I did. I'm assuming IIF works in the SELECT statement but not the WHERE statement?
You might just need some ORs:
select Demog.ID, Demog.[Long Term Date]
from Demog
where (@LTPD IS NULL)
or (@LTPD = -1 AND Demog.[Long Term Date]<=GetDate())
or (@LTPD = 1 AND Demog.[Long Term Date]>=GetDate())
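One thing worth checking in the attempts above is the variable's type. A BIT in SQL Server holds only 0, 1, or NULL, and assigning any nonzero value (including -1) stores 1, so a test like @LTPD = -1 can never be true. The sketch below assumes the Demog table and column names from the question and uses an INT variable instead. The @AsBit variable is made up for the demonstration, and the NULL/-1/1 encoding is taken from the question's description rather than confirmed against the original form.

```sql
-- A BIT variable cannot hold -1: any nonzero value is stored as 1.
DECLARE @AsBit BIT = -1;
SELECT @AsBit AS StoredValue;  -- 1

-- Assumption: the combobox value arrives as NULL, -1 (yes), or 1 (no).
DECLARE @LTPD INT = -1;

SELECT Demog.ID, Demog.[LT Due Date]
FROM Demog
WHERE @LTPD IS NULL
   OR (@LTPD = -1 AND Demog.[LT Due Date] <= GETDATE())
   OR (@LTPD = 1  AND Demog.[LT Due Date] >= GETDATE());
```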

Using DateDiff in Join [duplicate]

I have a number of days variable which I want to compare against a datetime column (senddate).
I'm currently doing this:
DECLARE @RunDate datetime = '2013-01-01'
DECLARE @CalculationInterval int = 10
DELETE
FROM TableA
WHERE datediff(dd, senddate, @RunDate) > @CalculationInterval
Anything that is older than 10 days should get deleted. We have an index on the senddate column, but the query is still very slow. I know the left side should not contain a calculation for performance reasons, but what is the optimal way of solving this otherwise?
The expression
WHERE datediff(dd, senddate, @RunDate) > @CalculationInterval
won't be able to use an index on the senddate column, because of the function applied to the column senddate.
In order to make the WHERE clause 'SARGable' (i.e. able to use an index), change to the equivalent condition:
WHERE senddate < dateadd(dd, -@CalculationInterval, @RunDate)
[Thanks to @Krystian Lieber for pointing out the incorrect condition.]
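Put together, the rewritten DELETE might look like the sketch below. It assumes the TableA and senddate names from the question; the @Cutoff variable is introduced here for illustration. The two conditions match when @RunDate has no time-of-day part, as in the question. If it might have one, truncating it to midnight first keeps the result in line with DATEDIFF's day-boundary counting.

```sql
DECLARE @RunDate DATETIME = '2013-01-01';
DECLARE @CalculationInterval INT = 10;

-- Compute the cutoff once, truncated to midnight, so the column is compared as-is.
DECLARE @Cutoff DATETIME =
    DATEADD(dd, -@CalculationInterval, CAST(CAST(@RunDate AS DATE) AS DATETIME));

DELETE FROM TableA
WHERE senddate < @Cutoff;
```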

Distinct count needed within case when statement

I am creating a query of all people who were screened for smoking status and need a count of unique patients. I am pulling from an encounter table, so the same patient could have been asked multiple times. In my CASE WHEN expression I would like to limit the "Then..." result to something like "Then count distinct patients", but that gives me an error about aggregates not being allowed within an aggregate. If I remove it, the query no longer produces the total I want, and I am told the column needs to be in the GROUP BY clause, which I do not want. LIMIT is not an option in SQL Server, to the best of my knowledge.
,count(case when soc.tobacco_user_c in (1, 2, 4, 5) and dmw.SMOKING_CESS_CNSL_YN ='y' then enc.PAT_ID **Here is where I want a unique count of patients** end) Compliant
You can combine DISTINCT with a CASE expression.
Example
SELECT
COUNT(DISTINCT CASE WHEN tobacco = 1 THEN PAT_ID ELSE NULL END)
...
;
I've abbreviated your example to make it easier to read. NULLs will not be included in the final count, so there is no need to worry about off by one errors.
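A self-contained sketch of the same pattern, using a made-up table variable in place of the real encounter tables. The column counseled stands in for dmw.SMOKING_CESS_CNSL_YN, and the rows are invented for the example.

```sql
-- Hypothetical sample data: patient 1 appears twice.
DECLARE @Encounters TABLE (PAT_ID INT, tobacco_user_c INT, counseled CHAR(1));

INSERT INTO @Encounters (PAT_ID, tobacco_user_c, counseled)
VALUES (1, 1, 'y'), (1, 1, 'y'), (2, 2, 'n'), (3, 4, 'y'), (4, 3, 'y');

SELECT
    -- CASE yields PAT_ID for matching rows and NULL otherwise;
    -- COUNT(DISTINCT ...) ignores the NULLs and counts each patient once.
    COUNT(DISTINCT CASE WHEN tobacco_user_c IN (1, 2, 4, 5) AND counseled = 'y'
                        THEN PAT_ID END) AS Compliant,   -- 2 (patients 1 and 3)
    COUNT(DISTINCT PAT_ID) AS Screened                   -- 4
FROM @Encounters;
```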
case when soc.tobacco_user_c in (1, 2, 4, 5) and dmw.SMOKING_CESS_CNSL_YN ='y' then COUNT(DISTINCT enc.PAT_ID) ELSE 0 end Compliant
I ended up creating two subqueries and then doing a select count distinct on each of the max columns in those queries to limit the results to one.

SQL Datediff in seconds with decimal places

I am trying to extract the difference between two SQL DateTime values in seconds, with decimal places for some performance monitoring.
I have a table, "Pagelog" which has a "created" and "end" datetime. In the past I have been able to do the following:
SELECT DATEDIFF(ms, pagelog_created, pagelog_end)/1000.00 as pl_duration FROM pagelog
However I have started getting the following error:
Msg 535, Level 16, State 0, Line 1
The datediff function resulted in an overflow. The number of dateparts separating two date/time instances is too large. Try to use datediff with a less precise datepart.
I have seen numerous responses to this error stating that I should use a less precise unit of measurement. But this hardly helps when I need to distinguish between 2.1 seconds and 2.9 seconds, because DATEDIFF(s,..,..) will return INT results and lose the accuracy I need.
I originally thought that this had been caused by a few values in my table having a huge range but running this:
SELECT DATEDIFF(s, pagelog_created, pagelog_end) FROM pagelog
ORDER BY DATEDIFF(s, pagelog_created, pagelog_end) DESC
Returns a max value of 30837, which is 8.5 hours or 30,837,000 milliseconds, well within the range of a SQL INT as far as I know?
Any help would be much appreciated, as far as I can tell I have two options:
1. Somehow fix the problem with the data, finding the culprit values
2. Find a different way of calculating the difference between the values
Thanks!
The StackOverflow magic seems to have worked: despite spending hours on this problem last week, I re-read my question and have now solved it. I thought I'd update with the answer to help anyone else who has this problem.
The problem here was not a large range but a negative one, which results in a negative overflow. It would have been helpful if the SQL Server error were a little more descriptive, but it's not technically wrong.
So in my case, this was returning values:
SELECT * FROM pagelog
WHERE pagelog_created > pagelog_end
Either remove the values, or omit them from the initial result set!
Thanks to Ivan G and Andriy M for your responses too
You can try to avoid overflow like this:
DECLARE @dt1 DATETIME = '2013-01-01 00:00:00.000'
DECLARE @dt2 DATETIME = '2013-06-01 23:59:59.997'
SELECT DATEDIFF(DAY, CAST(@dt1 AS DATE), CAST(@dt2 AS DATE)) * 24 * 60 * 60
SELECT DATEDIFF(ms, CAST(@dt1 AS TIME), CAST(@dt2 AS TIME))/1000.0
SELECT DATEDIFF(DAY, CAST(@dt1 AS DATE), CAST(@dt2 AS DATE)) * 24 * 60 * 60
+ DATEDIFF(ms, CAST(@dt1 AS TIME), CAST(@dt2 AS TIME))/1000.0
First it gets the number of seconds in whole days from the DATE portion of the DATETIME values, then the number of seconds from the TIME portion, and finally it adds the two.
There won't be an error, because DATEDIFF between the minimum and maximum values of the TIME data type cannot overflow.
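The bounds behind that claim can be checked with a little arithmetic: DATEDIFF returns an INT, whose maximum of 2,147,483,647 milliseconds is a bit under 25 days, while two TIME values can never be a full day apart. The sketch below illustrates this, and also shows DATEDIFF_BIG, which is available on SQL Server 2016 and later and returns a BIGINT. Whether that function is an option depends on the server version, which the question does not state.

```sql
-- Largest millisecond gap between two TIME values: just under one day.
SELECT DATEDIFF(ms, CAST('00:00:00.000' AS TIME), CAST('23:59:59.999' AS TIME))
       AS max_ms_within_day;

-- INT upper bound for a millisecond difference, expressed in days (a bit under 25).
SELECT 2147483647 / 1000.0 / 86400 AS int_max_ms_in_days;

-- SQL Server 2016 and later only: DATEDIFF_BIG returns BIGINT, so a millisecond
-- difference spanning several months does not overflow.
DECLARE @dt1 DATETIME = '2013-01-01 00:00:00.000';
DECLARE @dt2 DATETIME = '2013-06-01 23:59:59.997';
SELECT DATEDIFF_BIG(ms, @dt1, @dt2) / 1000.0 AS seconds_with_fraction;
```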
You could of course do something like this:
SELECT
DATEDIFF(ms, DATEADD(s, x.sec, pagelog_created), pagelog_end) * 0.001
+ x.sec AS pl_duration
FROM pagelog
CROSS APPLY (
SELECT DATEDIFF(s, pagelog_created, pagelog_end)
) x (sec)
;
As you can see, first, the difference in seconds between pagelog_created and pagelog_end is taken, then the seconds are added back to pagelog_created and the difference in milliseconds between that value and pagelog_end is calculated and added to the seconds.
However, since, as per your investigation, the table doesn't seem to have rows that could cause the overflow, I'd also double check whether that particular fragment was the source of the error.
with cte as(
select rownum = row_number() over(partition by T.TR_ID order by T.[date]),
T.* from [dbo].[TR_Events] T
)
select cte.[date],nex.[date],convert(varchar(10),datediff(s, cte.[date], nex.[date])/3600)+':'+
convert(varchar(10),datediff(s, cte.[date], nex.[date])%3600/60)+':'+
convert(varchar(10),(datediff(s,cte.[date], nex.[date])%60))
from cte
left join cte prev on prev.rownum = cte.rownum - 1
left join cte nex on nex.rownum = cte.rownum + 1

