How to compare column values to declared variable - sql-server

I was asked this interview question.
--Without modifying the following code:
DECLARE @StartDateInput SMALLDATETIME = '1/1/2018',
@EndDateInput SMALLDATETIME = '1/1/2018'
--Modify the following query so that it will return contacts modified at any time on January 1st, 2018
SELECT *
FROM dbo.Contacts
I tried the following query, but it was not correct. I'm sure I'm supposed to use the @EndDateInput variable as well, but I wasn't sure how. I don't think this is the right way to approach the problem in general either.
SELECT *
FROM dbo.Contacts
WHERE ModifiedDate = SMALLDATETIME

It looks like the question is probing your understanding of date and datetime types, namely that a date with a time is after a date without a time (if there is even such a thing; most timeless dates are treated as midnight on the relevant date, which is a time too, in the same way that 1.0 is the same thing as 1, and 1.1 is after 1.0).
I'd use a range:
SELECT *
FROM dbo.Contacts
WHERE ModifiedDate >= @StartDateInput AND ModifiedDate < DATEADD(DAY, 1, @EndDateInput)
Why?
This caters for datetimes that have a time component.
It doesn't transform the row data just to perform the query. Doing so is nearly always a bad idea: casting a million datetimes to a date just to strip the time off, every time you query, precludes using an index on the column and is a massive waste of resources.
It converts the apparently inclusive end date, implied by both @variables being the same, into a form that lets the exclusive behavior of < work inclusively: it adds a day and then takes rows less than the following day, thereby including 23:59:59.999999...
The only thing I would say is that, strictly, the spec only calls for one day's records, which means it's not mandatory to use @EndDateInput at all. It seems logical to use it, but it could be argued that if the spec is that this query will only ever return one day, the @End variable could be discarded and a DATEADD performed on the @Start variable instead.
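To see which rows the half-open range keeps, here is a minimal sketch that swaps dbo.Contacts for a table variable. The @Contacts table, its ContactId column, and the sample rows are assumptions made up for illustration, and the literals use the compact yyyymmdd form rather than the '1/1/2018' from the question.

```sql
-- Hypothetical sample data; only ModifiedDate matters for the illustration.
DECLARE @Contacts TABLE (ContactId INT, ModifiedDate DATETIME);
INSERT INTO @Contacts (ContactId, ModifiedDate) VALUES
    (1, '20171231 23:59:59'),  -- the day before: excluded
    (2, '20180101 00:00:00'),  -- midnight at the start of the day: included
    (3, '20180101 23:59:59'),  -- last second of the day: included
    (4, '20180102 00:00:00');  -- midnight of the next day: excluded

DECLARE @StartDateInput SMALLDATETIME = '20180101',
        @EndDateInput   SMALLDATETIME = '20180101';

SELECT ContactId, ModifiedDate
FROM @Contacts
WHERE ModifiedDate >= @StartDateInput
  AND ModifiedDate <  DATEADD(DAY, 1, @EndDateInput);
-- Returns ContactId 2 and 3.
```

The column is never wrapped in a function, so an index on ModifiedDate would remain usable on a real table.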

It says "any time", meaning you must consider the time component. In T-SQL the only reliable way is a >= and < range query (exclusive upper bound):
SELECT *
FROM dbo.Contacts
WHERE ModifiedDate >= @StartDateInput and
ModifiedDate < dateadd(d, 1, @EndDateInput);
PS: The initial declaration of @StartDateInput and @EndDateInput is not robust, and points to Jan 1st, 2018 only because the day and month happen to be equal. If it were '1/2/2018', it would be ambiguous between Jan 2nd and Feb 1st. Better to use ODBC canonical and/or ISO 8601 strings like '20180101'.
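The ambiguity is easy to reproduce. The sketch below uses SET DATEFORMAT to stand in for two sessions with different regional settings; the column aliases are made up for illustration.

```sql
-- Session that reads slash-separated dates as month/day/year.
SET DATEFORMAT mdy;
SELECT CAST('1/2/2018' AS DATETIME) AS slash_literal,    -- 2018-01-02
       CAST('20180102' AS DATETIME) AS compact_literal;  -- 2018-01-02

-- Session that reads slash-separated dates as day/month/year.
SET DATEFORMAT dmy;
SELECT CAST('1/2/2018' AS DATETIME) AS slash_literal,    -- 2018-02-01
       CAST('20180102' AS DATETIME) AS compact_literal;  -- 2018-01-02
```

Only the slash-separated literal changes meaning with the session setting; the yyyymmdd form is read the same way in both.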

Related

TSQL: Tune Dynamic Query Search

In reading on tuning TSQL queries, I've seen advice on avoiding (or being careful with) functions in the WHERE clause. However, in some cases, like searches that need dynamic dates based on today's date, I'm curious whether a query can be tuned further. For instance, the query below uses the DATEADD function with the current date, which lets the user get the correct information for the past thirty days at any time:
SELECT *
FROM Zoo..Transportation
WHERE ArrivalDate BETWEEN DATEADD(DD,-30,GETDATE()) AND GETDATE()
If I try to eliminate the function, DATEADD, I could declare a variable that will pull that time and then query the data with that set value stored in the variable, such as:
DECLARE @begin DATE
SET @begin = DATEADD(DD,-30,GETDATE())
SELECT *
FROM Zoo..Transportation
WHERE ArrivalDate BETWEEN @begin AND GETDATE()
However, the Execution Plan and Statistics show the exact same number of reads, scans and batch costs.
In these instances of dynamic data (for instance, using today's date as a starting point), how do we reduce or eliminate the use of functions in the WHERE clause?
Functions in the where clause mean doing silly things like:
WHERE DATEPART(WEEK, ArrivalDate) = 1
Or
WHERE CONVERT(CHAR(10), ArrivalDate, 101) = '01/01/2012'
I.e., functions applied to columns in the WHERE clause, which in most cases destroy sargability (in other words, they rule out an index seek and force an index or table scan).
There is one exception that I know of:
WHERE CONVERT(DATE, ArrivalDate) = CONVERT(DATE, GETDATE())
But I would not rely on this for any other scenario.
IME, using functions within a WHERE clause is only an issue when the function operates on data from your query. In that case the function (which may itself be complex SQL) runs for each value in your query, which will likely cause a table scan or similar because the optimiser doesn't know which index to use (if any).
Your example above is using DATEADD with the current date - the value is probably calculated once (or if it is calculated for each row in your result set, it won't affect the query plan as it doesn't contain data from your query).
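As a rough illustration of the difference, one of the "silly" predicates above can be rewritten so the column stands alone. This sketch assumes ArrivalDate is a date or datetime column in the Zoo..Transportation table from the question.

```sql
-- Non-sargable: the function wraps the column, so every row must be evaluated.
SELECT *
FROM Zoo..Transportation
WHERE CONVERT(CHAR(10), ArrivalDate, 101) = '01/01/2012';

-- Sargable rewrite: the bare column is compared against constants,
-- so an index on ArrivalDate can be used for a range seek.
SELECT *
FROM Zoo..Transportation
WHERE ArrivalDate >= '20120101'
  AND ArrivalDate <  '20120102';
```

Both queries ask for rows from 1 January 2012; only the second leaves the optimizer free to seek on the column.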

Proper way to index date & time columns

I have a table with the following structure:
CREATE TABLE MyTable (
ID int identity,
Whatever varchar(100),
MyTime time(2) NOT NULL,
MyDate date NOT NULL,
MyDateTime AS (DATEADD(DAY, DATEDIFF(DAY, '19000101', [MyDate]),
CAST([MyTime] AS DATETIME2(2))))
)
The computed column adds date and time into a single datetime2 field.
Most queries against the table have one or more of the following clauses:
... WHERE MyDate < @filter1 and MyDate > @filter2
... ORDER BY MyDate, MyTime
... ORDER BY MyDateTime
In a nutshell, date is usually used for filtering, and full datetime is used for sorting.
Now for questions:
What is the best way to set indices on those 3 date-time columns? 2 separate on date and time or maybe 1 on date and 1 on composite datetime, or something else? Quite a lot of inserts and updates occur on this table, and I'd like to avoid over-indexing.
As I wrote this question, I noticed the long and rather ugly computed column definition. I picked it up somewhere a while ago and forgot to investigate whether there's a simpler way of doing it. Is there an easier way of combining a date and a time(2) into a datetime2? Simple addition does not work, and I'm not sure whether I should avoid casting to varchar, combining, and casting back.
Unfortunately, you didn't mention what version of SQL Server you're using ....
But if you're on SQL Server 2008 or newer, you should turn this around:
your table should have
MyDateTime DATETIME
and then define the "only date" column as
MyDate AS CAST(MyDateTime AS DATE) PERSISTED
Since you make it persisted, it's stored alongside the table data (and not calculated every time you query it), and you can easily index it.
Same applies to the MyTime column.
Having date and time in two separate columns may seem peculiar but if you have queries that use only the date (and/or especially only the time part), I think it's a valid decision. You can create an index on date only or on time or on (date, whatever), etc.
What I don't understand is why you also have the computed datetime column. There's no reason to store this value too. It can easily be calculated when needed.
And if you need to order by datetime, you can use ORDER BY MyDate, MyTime. With an index on (MyDate, MyTime) this should be ok. Range datetime queries would also be using that index.
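A minimal sketch of that suggestion follows, assuming the MyTable definition from the question. The index name and the filter values are made up for illustration, and the combined value is computed in the SELECT list rather than stored.

```sql
-- Hypothetical index name; the columns come from the table definition above.
CREATE INDEX IX_MyTable_MyDate_MyTime ON MyTable (MyDate, MyTime);

DECLARE @filter1 DATE = '20120201',
        @filter2 DATE = '20120101';

SELECT ID,
       Whatever,
       -- Combined value computed on demand: casting the time to DATETIME2
       -- yields 1900-01-01 plus the time, and DATEADD then shifts it by the
       -- number of days between 1900-01-01 and MyDate.
       DATEADD(DAY, DATEDIFF(DAY, '19000101', MyDate),
               CAST(MyTime AS DATETIME2(2))) AS MyDateTime
FROM MyTable
WHERE MyDate < @filter1 AND MyDate > @filter2
ORDER BY MyDate, MyTime;
```

Because the filter and the sort both use the leading columns of the same index, one composite index can serve both, which helps keep the index count down on a write-heavy table.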
The answer isn't in your indexing, it's in your querying.
A single DateTime field should be used, or even SmallDateTime if that provides the range of dates and time resolution required by your application.
Index that column, then use queries like this:
SELECT * FROM MyTable WHERE
MyDate >= @startfilterdate
AND MyDate < DATEADD(d, 1, @endfilterdate);
By using < on the end filter, it only includes results from sometime before midnight of that date, which is the day after the user-selected "end date". This is simpler and more accurate than adding 23:59:59, especially since stored times can include microseconds between 23:59:59 and 00:00:00.
Using persisted columns and indexes on them is a waste of server resources.

SQL Server 2005 - query arbitrary time interval in date range

I have a DateTime column. I want to extract all records from, let's say, 8:30 to 16:15 within a certain date range. My problem is that I need to compare hour and minute as a single time value. I can test the DATEPART for greater or less than some hours value, but if I then do that for minutes, my query will fail if the later-in-the-day time has a smaller minutes value.
I have looked at INTERVAL, BETWEEN, DATEPART, DATEDIFF etc., but I don't quite see how to do this without a "TimeOfDay" value that I can use across records of different dates.
I have tried subtracting the year, month and day parts of the date so that I can compare just the time of day, but when attempting to subtract, say, the year part of a date, I get an overflow error:
This part works:
select - cast( DATEPART(YEAR, CallTime) as integer) from history
This fails:
select DATEADD(YEAR, - cast( DATEPART(YEAR, CallTime) as integer), CallTime)
from history where calltime is not null
I have also tried casting the hours and minutes parts to chars, concatenating them and comparing to my target range, but this also fails.
I believe newer versions of SQL server may have a function to deal with this situation, but that's not available to me.
I hope and imagine there is a simple, obvious solution to this, but it's eluding me.
Try creating a "MinuteOfDay" function that calculates how many minutes have passed in the day based on a datetime.
CREATE FUNCTION dbo.[MinuteOfDay]
(
@dt datetime
)
RETURNS int
AS
BEGIN
RETURN (datepart(hh,@dt)*60) + datepart(mi,@dt)
END
then use the result of that function to filter.
select *
from MyTable t
where dbo.MinuteOfDay(t.SomeDateTimeColumn) between dbo.MinuteOfDay('1900-1-1 08:30:00') and dbo.MinuteOfDay('1900-1-1 16:15:00')
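The same minute-of-day arithmetic can also be written inline and combined with a plain date range, which avoids calling a scalar function per row. This is only a sketch: it assumes the history table and CallTime column from the question, the variable names and dates are made up, and it sticks to syntax available in SQL Server 2005.

```sql
DECLARE @StartDate DATETIME, @EndDate DATETIME;
SET @StartDate = '20110101';
SET @EndDate   = '20110131';

SELECT *
FROM history
WHERE CallTime >= @StartDate                    -- date range on the bare column
  AND CallTime <  DATEADD(DAY, 1, @EndDate)
  AND DATEPART(HOUR, CallTime) * 60 + DATEPART(MINUTE, CallTime)
      BETWEEN 8 * 60 + 30 AND 16 * 60 + 15;     -- 08:30 through 16:15
```

Because seconds are ignored, the upper bound keeps every row up to 16:15:59; compare against a finer unit if 16:15:00 must be the exact cutoff.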
give this a shot:
DECLARE @StartDateTime datetime
,@EndDateTime datetime
--date range is ALL of January 1st up to & including 31st
SELECT @StartDateTime='2011/01/01'
,@EndDateTime='2011/01/31'
SELECT
*
FROM TableName t
WHERE
t.ColumnDate>=@StartDateTime AND t.ColumnDate<@EndDateTime+1 --date range
AND LEFT(RIGHT(CONVERT(char(19),t.ColumnDate,120),8),5)>='08:30' --time range start
AND LEFT(RIGHT(CONVERT(char(19),t.ColumnDate,120),8),5)<='16:15' --time range end
If you have an index on t.ColumnDate, the "date range" part of the WHERE should be able to take advantage of it.
The "date range" part of the WHERE throws away rows that are not within the intended date range. The "time range start" part throws away rows that are too early in the day, and the "time range end" part throws away rows that are too late.
DATETIME values can be cast to FLOAT. (Internally a DATETIME is stored as two integers, a day count and a time-of-day tick count, not as a FLOAT, but the cast behaves as described here.)
The whole part of the FLOAT is the number of days since 1900-01-01. The fractional part is the time of day as a fraction of 24 hours, so 0.5 = 12 noon.
08:30 is 0.3541666667
16:15 is 0.6770833333
SELECT CAST(CAST('2011-03-25 08:30:00' AS DATETIME) AS FLOAT) -- 40625.3541666667
SELECT CAST(CAST('2011-03-25 16:15:00' AS DATETIME) AS FLOAT) -- 40625.6770833333
So you could write
SELECT * FROM users WHERE hire_date < 40625.3541666667
Using a DATETIME as FLOAT you can use whichever mathematical functions work best for your query.
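The "fractional part is the time of day" idea can also be applied without leaving the DATETIME type, which sidesteps floating-point rounding at the boundaries. The sketch below is an assumption-laden illustration: it reuses the history table and CallTime column from the question, picks an arbitrary date range, and relies on DATETIME minus DATETIME yielding a DATETIME offset from 1900-01-01 (this does not work for the newer datetime2 type).

```sql
SELECT *
FROM history
WHERE CallTime >= '20110101'
  AND CallTime <  '20110201'
  -- Subtracting midnight of the same day leaves 1900-01-01 plus the time of day.
  AND CallTime - DATEADD(DAY, DATEDIFF(DAY, 0, CallTime), 0) >= '19000101 08:30:00'
  AND CallTime - DATEADD(DAY, DATEDIFF(DAY, 0, CallTime), 0) <= '19000101 16:15:00';
```

The first two conditions keep the date filter on the bare column; the last two compare only the time-of-day portion.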

SQL Server: calculate working time

How I can calculate the working time in SQL Server between two datetime variables, excluding the holidays?
Any ideas?
Holidays aren't universal; they depend very much on your location. Even which days of the week count as "working" days depends on your location.
Because of that, a general, universal answer is not possible, and for the same reason there is no system-provided function in T-SQL for doing this. How would SQL Server know what holidays you have in your corner of the world?
You need to have a table of your holidays somewhere in your system and handle it yourself.
Some posts that might be of some help to you:
Calculate Number of Working Days in SQL Server: this just basically removes any Saturdays and Sundays - but doesn't include other holidays
How do I count the number of business days between two dates? : shows the same main approach, with the addition of a table that contains other holidays like Easter, 4th of July (US National Holiday) and so on
Like marc_s says, you currently need a custom solution. I really hope Microsoft adds some standard functionality: it's tough to get right, and holidays are pretty much standardized by location.
Here's an example:
declare @start_date datetime
declare @end_date datetime
set @start_date = '2010-12-20'
set @end_date = '2010-12-26'
-- A table with all non-working days. This just adds Christmas, but you
-- probably should add weekends as well.
declare @non_working_days table (dt datetime)
insert @non_working_days values ('2010-12-25'), ('2010-12-26')
-- Remove the time part
set @start_date = DATEADD(D, 0, DATEDIFF(D, 0, @start_date))
set @end_date = DATEADD(D, 0, DATEDIFF(D, 0, @end_date))
-- Find the number of non-working-days
declare @nwd_count int
select @nwd_count = count(*)
from @non_working_days
where dt >= @start_date and dt < @end_date
-- Print result
select datediff(DAY, @start_date, @end_date) - @nwd_count
This prints 5, because the 25th is not a working day.
Have a table which has a row for every date you're interested in, and, say, a "working hours" column, or just a "working day" indicator if you want to do it at day granularity. (I find this approach makes the final SQL simpler, plus enables all sorts of other useful queries, but then I'm into data warehousing, rather than operational databases, so you may find the "just list the holidays" approach better, depending...)
You will, of course, have to create that table yourself, working from some feed of holiday dates for the region you're interested in.
Typically you can project these forward at least a year, as most public holidays are agreed a long way in advance, though some pop up at the "last minute". In the UK, for example, 29 April will be an extra public holiday in 2011, as there's a royal wedding taking place, and we got less than a year's notice of that.
Then you just
SELECT
SUM(working_hours)
FROM
all_dates
WHERE
the_date BETWEEN @start_date AND @end_date
If you want to do this internationally, it gets incredibly difficult to get your data; there's no sensible source that I know of for international holiday dates, and different regions in a "country" might have different dates -- e.g. you may know that someone's in the United Kingdom, but unless you know if they're in Scotland or not, you won't know if the first two days of the year are a public holiday, or just the first...
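The calendar-table approach described above can be sketched with a table variable standing in for all_dates. The rows and the eight-hour working day are assumptions made up for illustration; a real table would hold one row per date, populated from whatever holiday feed applies to your region.

```sql
-- Hypothetical stand-in for the all_dates calendar table.
DECLARE @all_dates TABLE (the_date DATETIME PRIMARY KEY, working_hours INT NOT NULL);
INSERT INTO @all_dates (the_date, working_hours) VALUES
    ('20101220', 8),  -- Monday
    ('20101221', 8),
    ('20101222', 8),
    ('20101223', 8),
    ('20101224', 8),  -- Friday
    ('20101225', 0),  -- Saturday, Christmas Day
    ('20101226', 0);  -- Sunday

DECLARE @start_date DATETIME = '20101220',
        @end_date   DATETIME = '20101226';

SELECT SUM(working_hours) AS total_working_hours
FROM @all_dates
WHERE the_date BETWEEN @start_date AND @end_date;
-- 5 working days * 8 hours = 40
```

BETWEEN is safe here only because the_date holds whole dates with no time component; weekends and holidays need no special handling because they are simply rows with zero hours.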

SQL Server BETWEEN not as efficient

I recall hearing or reading somewhere that
SELECT * from TABLE where date >= '2009-01-01' AND date <= '2009-12-31'
is more efficient than
SELECT * from TABLE where date BETWEEN '2009-01-01' AND '2009-12-31'
Where date column is a DATETIME type and has the same index. Is this correct?
No, it's not correct.
The two syntaxes are exactly equivalent.
BETWEEN is just syntactic sugar, a shorthand for >= … AND <= …
Same is true for all major systems (Oracle, MySQL, PostgreSQL), not only for SQL Server.
However, if you want to check that the date falls within a given year (here 2009), you should use this syntax:
date >= '2009-01-01' AND date < '2010-01-01'
Note that the last condition is strict.
This is semantically different from BETWEEN and is the preferred way to query for a whole year (rather than YEAR(date) = 2009, which is not sargable).
You cannot rewrite this condition as a BETWEEN, since the last inequality is strict and a BETWEEN condition includes both range boundaries.
You need the strict condition because DATETIME values, unlike integers, have no portable notion of a "next" or "previous" value: you cannot name the "last possible datetime value in 2009" without relying on implementation-specific precision.
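The difference shows up as soon as a row carries a time on the last day of the year. A small sketch, using a made-up table variable and sample values:

```sql
-- Hypothetical sample rows around the end of 2009.
DECLARE @t TABLE (dt DATETIME);
INSERT INTO @t (dt) VALUES
    ('20091231 00:00:00'),
    ('20091231 15:30:00'),
    ('20100101 00:00:00');

-- BETWEEN stops at midnight on 31 December, so the 15:30 row is missed.
SELECT COUNT(*) FROM @t WHERE dt BETWEEN '20090101' AND '20091231';  -- 1

-- The half-open range picks up both rows from 2009 and nothing from 2010.
SELECT COUNT(*) FROM @t WHERE dt >= '20090101' AND dt < '20100101';  -- 2
```

The two forms perform the same; they differ only in which rows they return.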

Resources