Using DateDiff in Join [duplicate] - sql-server

I have a number of days variable which I want to compare against a datetime column (senddate).
I'm currently doing this:
DECLARE @RunDate datetime = '2013-01-01'
DECLARE @CalculationInterval int = 10
DELETE
FROM TableA
WHERE datediff(dd, senddate, @RunDate) > @CalculationInterval
Anything older than 10 days should be deleted. We have an index on the senddate column, but the delete is still slow. I know the left side should not contain a calculation for performance reasons, but what is the optimal way of solving this otherwise?

The expression
WHERE datediff(dd, senddate, @RunDate) > @CalculationInterval
won't be able to use an index on the senddate column, because the column is wrapped in a function.
To make the WHERE clause 'SARGable' (i.e. able to use an index), change it to the equivalent condition (equivalent here because @RunDate has no time-of-day part; DATEDIFF(dd, ...) counts date boundaries, not 24-hour periods):
WHERE senddate < dateadd(dd, -@CalculationInterval, @RunDate)
[Thanks to @Krystian Lieber for pointing out the incorrect condition.]
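Put together, the rewritten delete might look like the sketch below. The @Cutoff variable is a name introduced here for illustration, and the CAST to date is an added precaution that is not part of the answer above: it keeps the result identical to the DATEDIFF version even if @RunDate carries a time-of-day part.

```sql
DECLARE @RunDate datetime = '2013-01-01';
DECLARE @CalculationInterval int = 10;

-- Compute the cutoff once, outside the predicate.
-- CAST(... AS date) drops any time-of-day part of @RunDate (assumption:
-- SQL Server 2008 or later, where the date type is available).
DECLARE @Cutoff datetime = DATEADD(dd, -@CalculationInterval, CAST(@RunDate AS date));

-- The column now stands alone on the left side, so an index on senddate
-- can be used for a range seek.
DELETE
FROM TableA
WHERE senddate < @Cutoff;
```

With the values above, @Cutoff is 2012-12-22 00:00, so every row with a senddate before that midnight is deleted, which is the same set of rows the DATEDIFF predicate selects.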

Related

Alternative to cursor when applying a list of values to a where clause?

What's an alternative to getting a distinct number of dates, say all the dates for September:
9/1/2016
9/2/2016
9/3/2016
and apply each value to a query. Say something like:
Select GuitarId,GuitarBrand
From GuitarSales
Where GuitarDate = @date
I don't want to use a cursor, is there an alternative to doing this?
I tried a CTE but even then I'd have to apply the cursor for each date.
If you want all the dates for a month you can use
Select GuitarId,GuitarBrand
From GuitarSales
Where month(GuitarDate) = 9
and year(GuitarDate) = 2016;
If I understand you correctly, you need a list of all dates in September. In your query you can use such a list as the source and LEFT JOIN your actual data to it. This is a quick way to get a gapless list of all days in September:
WITH RunningNumbers AS
(
SELECT TOP(30) ROW_NUMBER() OVER(ORDER BY (SELECT NULL))-1 AS Nr
FROM sys.objects
)
SELECT {d'2016-09-01'}+Nr AS RunningDate
FROM RunningNumbers
There are many examples of how to create a tally table on the fly. Small numbers (like 30 in this example) can easily be taken from any table with sufficient rows.
If you need this more often, you might think about a numbers table:
A related question: https://stackoverflow.com/a/39387790/5089204
Creating a persistent numbers table with a lot of useful side data: https://stackoverflow.com/a/32474751/5089204
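As a rough sketch of the LEFT JOIN mentioned in this answer, the date list can be joined to GuitarSales so that every September day appears once, even when nothing was sold. The CTE name SeptemberDays and the range join (which assumes GuitarDate may carry a time-of-day part) are illustrative choices, not part of the original answer; it also assumes sys.objects holds at least 30 rows.

```sql
WITH RunningNumbers AS
(
    -- 0..29, taken from any table with enough rows
    SELECT TOP(30) ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) - 1 AS Nr
    FROM sys.objects
),
SeptemberDays AS
(
    -- one row per calendar day of September 2016
    SELECT DATEADD(day, Nr, CAST('20160901' AS datetime)) AS RunningDate
    FROM RunningNumbers
)
SELECT d.RunningDate,
       gs.GuitarId,
       gs.GuitarBrand
FROM SeptemberDays AS d
LEFT JOIN GuitarSales AS gs
       ON gs.GuitarDate >= d.RunningDate
      AND gs.GuitarDate <  DATEADD(day, 1, d.RunningDate)
ORDER BY d.RunningDate;
```

Days without sales come back with NULL in GuitarId and GuitarBrand, and no cursor is needed because the whole list of dates is applied in a single set-based query.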
Assuming you have an index on GuitarDate here is a way you can create a SARGable where predicate so you can still leverage the speed of using an index seek.
declare @date datetime = '2016-09-10' --just to demonstrate starting with September 10, 2016
select gs.GuitarId
, gs.GuitarBrand
From GuitarSales gs
where gs.GuitarDate >= dateadd(month, datediff(month, 0, @date), 0) --beginning of the month for @date
and gs.GuitarDate < dateadd(month, datediff(month, 0, @date) + 1, 0) --beginning of next month
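To see what the two boundary expressions evaluate to, they can be selected on their own. This is only an illustration of the expressions used above; the column aliases are made up for the example.

```sql
DECLARE @date datetime = '2016-09-10';

-- 0 is implicitly converted to the datetime 1900-01-01, so
-- DATEDIFF(month, 0, @date) counts whole months since January 1900.
SELECT DATEDIFF(month, 0, @date)                        AS MonthsSince1900, -- 1400
       DATEADD(month, DATEDIFF(month, 0, @date), 0)     AS MonthStart,      -- 2016-09-01
       DATEADD(month, DATEDIFF(month, 0, @date) + 1, 0) AS NextMonthStart;  -- 2016-10-01
```

Because both boundaries are computed from the variable only, the GuitarDate column stays untouched in the WHERE clause, which is what allows the index seek.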

Convert Access query to SQL Server Query

I have an MS Access query that I want to convert to a SQL Server query. Any help will be greatly appreciated.
SELECT
dbo_Employees.*,
(SELECT top 1 dbo_attendance.attend_date
FROM dbo_Attendance
WHERE dbo_attendance.ID_Employee=dbo_attendance.ID_Employee
and dbo_attendance.attend_date > dbo_attendance.attend_date
order by dbo_attendance.attend_date asc) AS NextDate,
IIf(IsNull(NextDate),Now(),Nextdate) AS next123,
Next123-dbo_attendance.attend_date AS difference,
dbo_attendance.attend_date,
IIf(dbo_attendance.attend_date+90<Next123,1,0) AS Day90Credit,
IIf(dbo_attendance.attend_date+90<Next123,dbo_attendance.attend_date+90,dbo_attendance.attend_date+365) AS CreditDate,
IIf((Day90Credit=0 And CreditDate<Now()) Or Day90Credit=1,1,0) AS TotalCredit
FROM dbo_attendance, dbo_Employees
WHERE (((dbo_Employees.Employee_ID)=[dbo_attendance].[ID_Employee]));
In SQL Server (and most other RDBMSs) we use CASE expressions instead of IIf(); SQL Server 2012 and later also accept IIF() directly. The structure is pretty simple: CASE WHEN <condition> THEN <value if true> ELSE <value if false> END.
Changing your IIf() calls over to CASE will be the bulk of the switch. The first IIf(), however, is better represented as COALESCE(), which takes a list of fields or values and returns the first non-NULL one for that record.
The other thing that has to be switched is the date logic. In SQL Server you use DATEADD() to add days (or other date parts like years and months) to a date, and DATEDIFF() to subtract two dates and get the difference in a date part (days, months or years).
-- SQL Server cannot reference a column alias (NextDate, next123, ...) elsewhere
-- in the same SELECT list, so the derived values are computed in APPLY blocks.
SELECT e.*,
       nxt.NextDate,
       n.next123,
       DATEDIFF(day, a.attend_date, n.next123) AS difference,
       a.attend_date,
       c.Day90Credit,
       c.CreditDate,
       CASE
           WHEN (c.Day90Credit = 0 AND c.CreditDate < GETDATE())
                OR c.Day90Credit = 1
           THEN 1
           ELSE 0
       END AS TotalCredit
FROM dbo_attendance AS a
INNER JOIN dbo_Employees AS e
        ON e.Employee_ID = a.ID_Employee
OUTER APPLY (
    -- next attendance date of the same employee, NULL if there is none;
    -- the aliases a and a2 keep the inner and outer rows apart
    SELECT TOP 1 a2.attend_date AS NextDate
    FROM dbo_Attendance AS a2
    WHERE a2.ID_Employee = a.ID_Employee
      AND a2.attend_date > a.attend_date
    ORDER BY a2.attend_date ASC
) AS nxt
CROSS APPLY (SELECT COALESCE(nxt.NextDate, GETDATE()) AS next123) AS n
CROSS APPLY (
    SELECT CASE WHEN DATEADD(DAY, 90, a.attend_date) < n.next123 THEN 1 ELSE 0 END AS Day90Credit,
           CASE WHEN DATEADD(DAY, 90, a.attend_date) < n.next123
                THEN DATEADD(DAY, 90, a.attend_date)
                ELSE DATEADD(DAY, 365, a.attend_date)
           END AS CreditDate
) AS c;
Lastly... I can't remember exactly how this works in SQL Server, since it's been a while since I was in that environment, but you might have to switch instances of the dbo_ prefix to the schema-qualified form dbo. (for example, dbo_Employees becomes dbo.Employees). Your server will cry foul and let you know anyhow.
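The substitutions described in this answer can be tried in isolation before touching the full query. The sketch below uses two hypothetical variables standing in for one attendance row; the Access expression each line replaces is given in the comment.

```sql
-- Hypothetical sample values for a single attendance row.
DECLARE @attend_date datetime = '2016-01-10';
DECLARE @NextDate    datetime = NULL;   -- no later attendance found

SELECT
    -- Access: IIf(IsNull(NextDate), Now(), NextDate)
    COALESCE(@NextDate, GETDATE()) AS next123,
    -- Access: Next123 - attend_date
    DATEDIFF(day, @attend_date, COALESCE(@NextDate, GETDATE())) AS difference,
    -- Access: IIf(attend_date + 90 < Next123, 1, 0)
    CASE
        WHEN DATEADD(day, 90, @attend_date) < COALESCE(@NextDate, GETDATE()) THEN 1
        ELSE 0
    END AS Day90Credit;
```

One difference worth keeping in mind: DATEDIFF(day, ...) returns a whole number of day boundaries crossed, whereas subtracting two dates in Access yields a fractional day count.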
You can try a CTE (common table expression) in SQL Server for complex calculations; see this link: https://technet.microsoft.com/en-us/library/ms190766(v=sql.105).aspx
I refactored part of your query as below; carry on adding your calculations to the SELECT under the WITH block:
WITH Emp_CTE AS
(
    -- one row per employee attendance, plus the employee's next attendance date
    SELECT emp.*,
           att.attend_date,
           (SELECT TOP 1 nxt.attend_date
            FROM dbo_Attendance AS nxt
            WHERE nxt.ID_Employee = att.ID_Employee
              AND nxt.attend_date > att.attend_date
            ORDER BY nxt.attend_date ASC) AS [NextDate]
    FROM dbo_Employees AS emp
    INNER JOIN dbo_Attendance AS att
            ON att.ID_Employee = emp.Employee_ID
)
SELECT ISNULL(NextDate, GETDATE()) AS [next123],
       DATEDIFF(day, attend_date, ISNULL(NextDate, GETDATE())) AS [difference]
FROM Emp_CTE;

SQL Server Function Within Case Statement

select top 10 *, case
when datediff(day,DateOfSale, getDate()) > 5 then '5'
when datediff(day,DateOfSale, getDate()) > 10 then '10'
... (more datediff clauses)
...
...
else '20'
end as jack
from Foo
Is SQL Server smart enough to evaluate the datediff function call once within the case statement and then use that value for every when clause? Or is the function getting called 'n' times, where 'n' is the number of when clauses?
It's hard to see how SQL Server could evaluate the call once. The call has a column as parameter and so has to be evaluated for every row.
Thus, your condition is better written like:
when DateOfSale < dateadd(day, -5, getdate()) then '5'
In this case the difference is small. Date calculations are cheap.
The classic example where the function call does matter is a where condition on a table with an index on the date column. For example, YourTable with an index on (dt). This query would allow an index to be used:
select * from YourTable where dt < dateadd(day, -5, getdate())
While this query would not:
select * from YourTable where datediff(day, dt, getDate()) > 5
It's puzzling that so many answers are mentioning indexes. Indeed, DATEDIFF is not SARGable, but that's completely irrelevant here as CASE WHEN doesn't cause the query optimizer in SQL Server to consider index usage (other than trying to find a covering scannable path). The candidacy of DATEDIFF-involved expressions for index pathing is completely irrelevant to this question, as far as I can tell.
It's pretty easy to demonstrate that SQL Server does, indeed, stop evaluating predicates inside CASE statements once the first true predicate is found.
To demonstrate that fact, let's cook up some sample data:
CREATE TABLE Diffy (SomeKey INTEGER NOT NULL IDENTITY(1,1), DateOfSale DATE);
DECLARE @ThisOne AS DATE;
SET @ThisOne = '2012-01-01';
WHILE @ThisOne < '2013-01-01'
BEGIN
INSERT INTO Diffy (DateOfSale) VALUES(@ThisOne);
SET @ThisOne = DateAdd(d, 1, @ThisOne);
END;
Then, let's SELECT it in the pattern of the original question. Note that the original question specifies a TOP 10 clause without an ORDER BY clause, so the values we actually get back are random. But if we add a clause to the CASE that would poison evaluation, we can see what's happening:
SELECT TOP 10 *, CASE
WHEN datediff(day,DateOfSale, getDate()) > 5 then '5'
WHEN datediff(day,DateOfSale, getDate()) > 10 then '10'
WHEN 1/0 > 1 then 'boom'
ELSE '20' END
AS Jack
FROM Diffy;
Note that if we ever evaluated 1/0 > 1, then we'd expect something like a 'Divide by zero error encountered.'. However, running this query against my server yields ten rows, all with '5' in the Jack column.
If we take away the TOP 10, sure enough we get some rows and then get the Divide by zero error. Thus, we can safely conclude that SQL Server is doing early exit evaluation on the CASE statement.
On top of it, the documentation also tells us so:
The CASE statement evaluates its conditions sequentially and stops with the first condition whose condition is satisfied.
Perhaps the question is meant to ask if the common DATEDIFF() subexpression is hoisted from all the CASE statements, computed once, and then evaluated within each predicate's context. By observing the output of SET SHOWPLAN_TEXT ON, I think we can conclude that's not the case:
|--Compute Scalar(DEFINE:([Expr1004]=CASE WHEN datediff(day,CONVERT_IMPLICIT(datetimeoffset(7),[Scratch3].[dbo].[Diffy].[DateOfSale],0),CONVERT_IMPLICIT(datetimeoffset(3),getdate(),0))>(5) THEN '5' ELSE CASE WHEN datediff(day,CONVERT_IMPLICIT(datetimeoffset(7),[Scratch3].[dbo].[Diffy].[DateOfSale],0),CONVERT_IMPLICIT(datetimeoffset(3),getdate(),0))>(10) THEN '10' ELSE CASE WHEN (1)/(0)>(1) THEN 'boom' ELSE '20' END END END))
|--Index Scan(OBJECT:([Scratch3].[dbo].[Diffy].[DiffyHelper]))
From that, we can conclude that the structure of this query means that DATEDIFF() is evaluated for each row and for each predicate, so O(rows * predicates) calls at worst. That causes some CPU load for the query, but DATEDIFF() isn't that expensive and shouldn't be much of a concern. If, in practice, it turns out to be causing a performance problem, there are ways to manually hoist the computation out of the query. For example, compute the threshold dates once with DATEADD() on the GETDATE() side of the comparison and compare DateOfSale against them.
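A minimal sketch of that kind of manual hoisting follows. Foo and DateOfSale come from the question; the variable names are hypothetical. The thresholds are computed once into variables, so nothing is recomputed per row or per WHEN branch. The branches are ordered from the oldest cutoff to the newest so that each one can actually be reached, which differs from the ordering in the question.

```sql
-- Thresholds are computed once, before the query runs.
-- Casting GETDATE() to date keeps each comparison equivalent to the
-- corresponding DATEDIFF(day, DateOfSale, GETDATE()) > N test, because
-- DATEDIFF(day, ...) counts date boundaries rather than 24-hour periods.
DECLARE @Today    date = CAST(GETDATE() AS date);
DECLARE @Cutoff5  date = DATEADD(day, -5,  @Today);
DECLARE @Cutoff10 date = DATEADD(day, -10, @Today);

SELECT TOP 10 *,
       CASE
           WHEN DateOfSale < @Cutoff10 THEN '10'  -- more than 10 days old
           WHEN DateOfSale < @Cutoff5  THEN '5'   -- more than 5 days old
           ELSE '20'
       END AS jack
FROM Foo;
```

Each row now costs at most a couple of plain date comparisons; whether that is measurably faster than the DATEDIFF version depends on the workload and is worth checking against the actual plan.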
Sure, but not in your case: the expression is based on a table column value that changes for each row. In any event, don't execute the datediff on the table column value; run a dateadd on the predicate (comparison) value instead, so your query can still use any existing index on DateOfSale...
select top 10 *,
case When DateOfSale < dateadd(day, -20, getDate()) then '20'
When DateOfSale < dateadd(day, -15, getDate()) then '15'
When DateOfSale < dateadd(day, -10, getDate()) then '10'
When DateOfSale < dateadd(day, -5, getDate()) then '5'
else '20' end jack
from Foo

Return (Cast? Convert?) values as DateTime from a Distinct year, month query in a stored procedure

I have a database with a Publications table that is many-to-many joined to itself through a SubPublications table.
My stored procedure returns all of the distinct Year-Month combos from a ReleaseDate field of Publications of a specified type that are not related to a specific (by id) publication (hence the 2 params, see below).
QUESTION:
The proc works fine, but I want the return column type as DateTime2 with a dummy date of 1. As it is now, it returns 2 columns of integers. How do I do this?
I know I could do the conversion in my app code, but I'd rather have it delivered as a datetime from the DB.
My SQL ain't great. I don't even know if I should use a cast or a convert.
I can't find an example online of converting back to datetime within a query like that. Can anyone help? Here's the proc I wrote, as it stands:
ALTER PROCEDURE sp_DistinctPubMonthYears
@PubType char(1),
@PubId int = 0
AS
BEGIN
SELECT
DISTINCT TOP (100) PERCENT
DATEPART(month, ReleaseDate) AS month,
DATEPART(year, ReleaseDate) AS year
FROM(
SELECT
Publications.ReleaseDate AS ReleaseDate,
Publications.PublicationId As PubId,
Publications.PubType AS PubType,
SubPublications.PublicationId AS ParentId
FROM
Publications LEFT JOIN SubPublications
ON
Publications.PublicationId = SubPublications.PublicationId
WHERE
Publications.PubType = @PubType AND
Publications.PublicationId <> @PubId AND
(
SubPublications.PublicationId <> @PubId OR
/*either its parent is NOT the one we're searching on or */
SubPublications.PublicationId IS NULL
/*or it's not joined to anything at all */
)
) AS sub
ORDER BY year ASC, month ASC
END
GO
You don't need TOP and you may as well ORDER BY the expression.
This DATEADD/DATEDIFF expression will give you the start of the month that ReleaseDate falls in:
SELECT DISTINCT
CAST(
DATEADD(month, DATEDIFF(month, 0, ReleaseDate), 0) AS datetime2
) AS myCol
FROM(
...
ORDER BY
DATEADD(month, DATEDIFF(month, 0, ReleaseDate), 0)
Edit: As Faust mentioned, we can order on the alias if you prefer.
...
ORDER BY
myCol
In this case the result is the same.
If the CAST was to varchar then you would have different results. This is why I tend to use the expression not the alias but it's quite trivial. Surely I'd test my changes..., no?
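A small self-contained way to try the expression from this answer is to run it against a few sample dates. The table variable and its values are made up for the illustration; only the DATEADD/DATEDIFF/CAST expression comes from the answer.

```sql
-- Hypothetical sample data: two releases in March 2011, one in November 2012.
DECLARE @Sample TABLE (ReleaseDate datetime);
INSERT INTO @Sample (ReleaseDate)
VALUES ('2011-03-17T14:25:00'),
       ('2011-03-02T08:00:00'),
       ('2012-11-30T23:59:00');

-- Collapse every date to the first day of its month, then de-duplicate.
SELECT DISTINCT
       CAST(DATEADD(month, DATEDIFF(month, 0, ReleaseDate), 0) AS datetime2) AS PubMonth
FROM @Sample
ORDER BY PubMonth;
-- Two rows come back: 2011-03-01 and 2012-11-01, both at 00:00:00.
```

The result is a single datetime2 column with a dummy day of 1, which is the shape the question asks for in place of the two integer columns.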
DATEADD(MONTH, DATEDIFF(MONTH, '1600-01-01T00:00:00', ReleaseDate), '1600-01-01T00:00:00') should get you your yyyy-MM-dd 00:00:00 date. 1600-01-01T00:00:00 is just an arbitrary date that is best picked to be prior to any dates you may be storing in your ReleaseDate column.
