SQL Server BETWEEN not as efficient?

I recall hearing or reading somewhere that
SELECT * from TABLE where date >= '2009-01-01' AND date <= '2009-12-31'
is more efficient than
SELECT * from TABLE where date BETWEEN '2009-01-01' AND '2009-12-31'
where the date column is of type DATETIME and is indexed the same way in both cases. Is this correct?

No, it's not correct.
Both syntaxes are exactly the same.
BETWEEN is just syntactic sugar, a shorthand for >= … AND <= …
The same is true for all major systems (Oracle, MySQL, PostgreSQL), not only for SQL Server.
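A quick way to convince yourself of the equivalence is to run both predicates against the same rows. The sketch below uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in (an assumption: this is SQLite, not SQL Server, and the table t and column d are hypothetical), since BETWEEN has the same inclusive definition in standard SQL.

```python
import sqlite3

# In-memory SQLite database, used only to illustrate standard SQL semantics.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (d TEXT)")
conn.executemany(
    "INSERT INTO t (d) VALUES (?)",
    [("2008-12-31",), ("2009-01-01",), ("2009-06-15",), ("2009-12-31",), ("2010-01-01",)],
)

# Shorthand form.
between = conn.execute(
    "SELECT d FROM t WHERE d BETWEEN '2009-01-01' AND '2009-12-31' ORDER BY d"
).fetchall()

# Explicit form with both boundaries included.
explicit = conn.execute(
    "SELECT d FROM t WHERE d >= '2009-01-01' AND d <= '2009-12-31' ORDER BY d"
).fetchall()

print(between == explicit)          # True
print([row[0] for row in between])  # ['2009-01-01', '2009-06-15', '2009-12-31']
```

Both queries return the same three rows, including both boundary values.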
However, if you want to check whether the date falls within the current year, you should use this syntax:
date >= '2009-01-01' AND date < '2010-01-01'
Note that the last condition is strict.
This is semantically different from BETWEEN, and it is the preferred way to query for the current year (rather than YEAR(date) = 2009, which is not sargable).
You cannot rewrite this condition as a BETWEEN, since the last inequality is strict and a BETWEEN condition includes the range boundaries.
You need the strict condition because, unlike integers, datetime values have no portable "last possible value in 2009": it depends on the type's precision (23:59:59.997 for DATETIME, 23:59:59.9999999 for DATETIME2, 23:59 for SMALLDATETIME), so any inclusive upper bound is implementation-specific.
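To see why the inclusive '2009-12-31' bound is a trap for DATETIME columns, here is a small Python model (plain datetime objects standing in for hypothetical column values, not SQL Server itself) comparing the BETWEEN range, the half-open range, and a YEAR()-style filter.

```python
from datetime import datetime

# Hypothetical DATETIME values; one 2009 value has a time-of-day component.
values = [
    datetime(2008, 12, 31, 23, 59, 59),
    datetime(2009, 1, 1, 0, 0),
    datetime(2009, 12, 31, 0, 0),
    datetime(2009, 12, 31, 14, 30),
    datetime(2010, 1, 1, 0, 0),
]

# BETWEEN '2009-01-01' AND '2009-12-31': the upper literal means midnight, inclusive.
lo, hi = datetime(2009, 1, 1), datetime(2009, 12, 31)
between = [v for v in values if lo <= v <= hi]

# Half-open range: >= '2009-01-01' AND < '2010-01-01'.
start, end = datetime(2009, 1, 1), datetime(2010, 1, 1)
half_open = [v for v in values if start <= v < end]

# YEAR(date) = 2009 picks the same rows, but must evaluate a function on every value.
by_year = [v for v in values if v.year == 2009]

print(len(between), len(half_open), half_open == by_year)  # 2 3 True
```

The BETWEEN version silently drops the 14:30 row on December 31st; the half-open range keeps it without applying any function to the column.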

Related

How to compare column values to declared variable

I was asked this interview question.
--Without modifying the following code:
DECLARE @StartDateInput SMALLDATETIME = '1/1/2018',
@EndDateInput SMALLDATETIME = '1/1/2018'
--Modify the following query so that it will return contacts modified at any time on January 1st, 2018
SELECT *
FROM dbo.Contacts
I tried the following query, but it was not correct. I'm sure I'm supposed to use the @EndDateInput variable as well, but I wasn't sure how. I don't think this is the right way to approach the problem in general, either.
SELECT *
FROM dbo.Contacts
WHERE ModifiedDate = SMALLDATETIME
It looks like the question is probing your understanding of date and datetime types, namely that a date with a time is after the same date without a time. (If there even is such a thing: most timeless dates are treated as midnight on the relevant date, which is a time too, in the same way that 1.0 is the same thing as 1, and 1.1 is after 1.0.)
I'd use a range:
SELECT *
FROM dbo.Contacts
WHERE ModifiedDate >= @StartDateInput AND ModifiedDate < DATEADD(DAY, 1, @EndDateInput)
Why?
This caters for datetimes that have a time component.
It doesn't transform the row data just to perform the query. That is always a bad idea: casting a million datetimes to dates on every query just to strip the time off typically precludes using an index on the column and is a massive waste of resources.
It converts the apparent "end date is inclusive" rule, implied by both @variables being the same, into a form where the exclusive < works inclusively: it adds a day and then gets rows less than that following day, thereby including 23:59:59.999999 and everything else on the end date.
The only thing I would add is that, strictly, the spec only calls for one day's records, so it's not mandatory to use @EndDateInput at all. It seems logical to use it, but it could be argued that if this query will only ever return one day, the @EndDateInput variable could be discarded and the DATEADD performed on @StartDateInput instead.
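The half-open range from the answer can be modeled in Python as below. This is only a sketch: datetime objects stand in for the SMALLDATETIME variables and ModifiedDate values, and the helper name inclusive_day_range is hypothetical. It also shows why the attempted equality test only finds rows stamped exactly at midnight.

```python
from datetime import datetime, timedelta


def inclusive_day_range(start_date, end_date):
    """Return (lower, upper) for the predicate: col >= lower AND col < upper.

    Models: ModifiedDate >= @StartDateInput
            AND ModifiedDate < DATEADD(DAY, 1, @EndDateInput)
    Both inputs are midnight values, like SMALLDATETIME '1/1/2018'.
    """
    return start_date, end_date + timedelta(days=1)


start_input = end_input = datetime(2018, 1, 1)
lower, upper = inclusive_day_range(start_input, end_input)

# Hypothetical ModifiedDate values around January 1st, 2018.
modified = [
    datetime(2017, 12, 31, 23, 59, 59, 999999),
    datetime(2018, 1, 1, 0, 0),
    datetime(2018, 1, 1, 9, 15),
    datetime(2018, 1, 1, 23, 59, 59, 999999),
    datetime(2018, 1, 2, 0, 0),
]

matches = [m for m in modified if lower <= m < upper]
print(len(matches))  # 3

# An equality test against the start date only matches the midnight row.
print(sum(1 for m in modified if m == start_input))  # 1
```

Because the upper bound is exclusive, the same helper also works unchanged when the two inputs describe a multi-day range.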
The question says "any time", meaning you must consider the time component. In T-SQL the only reliable way is a >= and < range query (exclusive upper bound):
SELECT *
FROM dbo.Contacts
WHERE ModifiedDate >= @StartDateInput and
ModifiedDate < dateadd(d, 1, @EndDateInput);
PS: The initial declaration of @StartDateInput and @EndDateInput is not robust; it probably points to Jan 1st, 2018 only by chance. If it were '1/2/2018', it would be ambiguous between Jan 2nd and Feb 1st, depending on the session's date format and language settings. Better to use ODBC canonical and/or ISO 8601 strings like '20180101'.
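The ambiguity can be illustrated by parsing the same literal under month-first and day-first conventions. In SQL Server the interpretation is governed by session settings such as SET DATEFORMAT and SET LANGUAGE; the Python strptime formats below merely imitate those two conventions and are not SQL Server's parser.

```python
from datetime import datetime

s = "1/2/2018"
us = datetime.strptime(s, "%m/%d/%Y")       # month first, like a mdy date format
british = datetime.strptime(s, "%d/%m/%Y")  # day first, like a dmy date format
print(us.date(), british.date())  # 2018-01-02 2018-02-01

# '1/1/2018' reads the same either way, which is why the original happens to work.
same_either_way = (
    datetime.strptime("1/1/2018", "%m/%d/%Y") == datetime.strptime("1/1/2018", "%d/%m/%Y")
)
print(same_either_way)  # True

# The unseparated ISO 8601 form has only one reading.
iso = datetime.strptime("20180101", "%Y%m%d")
print(iso.date())  # 2018-01-01
```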

How does one find rows created today if the CreatedDate column is of type datetimeoffset?

I have a column named CreatedDate of type DateTimeOffset, and I need to query to see rows created today. If this column were of type DateTime, I would do this:
SELECT *
FROM MyTable
WHERE CreatedDate >= GETDATE()
How does one accomplish this with a DateTimeOffset column, however?
Environment: SQL Server 2014
Take a look at the TODATETIMEOFFSET function that is built into SQL Server.
Here is an example of how it is used (-05:00 is my time zone offset; yours may vary). Again, this assumes you only care about rows >= the current time, as your original question suggested. You would need to adjust the use of GETDATE() if you care about the entire day (see the comment on the original question).
select * from TestingDates d where d.CreatedDate >= TODATETIMEOFFSET(GETDATE(), '-05:00')
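If the goal is really "created today", a sketch of the full-day version in Python looks like this. Assumptions: a fixed -05:00 offset (as in the answer), a fixed "now" for repeatability, and hypothetical CreatedDate values. Python compares timezone-aware datetimes by their UTC instant, which, to my understanding, matches how SQL Server compares datetimeoffset values, so rows stored with other offsets are still classified correctly.

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed offset; real code needs the correct offset for the user in question.
offset = timezone(timedelta(hours=-5))


def today_bounds(now):
    """Half-open [midnight today, midnight tomorrow) in now's own offset."""
    start = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return start, start + timedelta(days=1)


now = datetime(2015, 6, 10, 14, 0, tzinfo=offset)
start, end = today_bounds(now)

# Hypothetical CreatedDate values stored with different offsets.
created = [
    datetime(2015, 6, 10, 4, 30, tzinfo=timezone.utc),  # 23:30 on June 9 at -05:00
    datetime(2015, 6, 10, 6, 0, tzinfo=timezone.utc),   # 01:00 on June 10 at -05:00
    datetime(2015, 6, 10, 18, 0, tzinfo=offset),
    datetime(2015, 6, 11, 0, 0, tzinfo=offset),          # exactly the exclusive end
]

rows_today = [c for c in created if start <= c < end]
print(len(rows_today))  # 2
```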

Convert varchar to Date, not work with AND in WHERE Clause

I have a SQL Server table Companies which contains a column UserDefined4 of type nvarchar(100).
This column contains some text plus a date in the format DD.MM.YYYY
I want to select the records for March and April 2014.
I am running this query:
SELECT
(Right(Companies.UserDefined4, 10))
FROM
Companies
WHERE
(Right(Companies.UserDefined4, 10) not like '%[^0-9.]%'
AND (Right(Companies.UserDefined4, 10) not like ''))
AND (CONVERT(Date,Right(Companies.UserDefined4, 10),104) >= '2014-03-01'
and
CONVERT(Date,Right(Companies.UserDefined4, 10),104) <= '2014-04-30')
This query throws an error:
Error converting a string to a date and / or time.
I have checked all the records one by one, and they contain dates in the proper format. What's strange to me is that the same query runs if I use OR instead of AND in the following part of the same query:
(
CONVERT(Date,Right(Companies.UserDefined4, 10),104) >= '2014-03-01'
OR
CONVERT(Date,Right(Companies.UserDefined4, 10),104) <= '2014-04-30'
)
I know it's not a wise decision to store dates as NVARCHAR, but I have to work with this data. I am not the one who designed this database.
I've had similar problems in the past, and they were all due to some of the values not ending with the expected characters (dates, in your case) and the "guard clauses" (in your case, the first two conditions in the WHERE) not stopping the query engine from applying the "range conditions" (in your case, the last two conditions in the WHERE).
Note that I don't know exactly why that happens (maybe query optimization, no "short-circuit" evaluation, or the evaluation happening in a different order than we expect; this is me speculating), but I've noticed that if you store an intermediate result of the query (with only the "guard clauses" applied) in a temporary structure (a table variable, for example) and then apply the "range clauses" to that interim result, it works.
For example, this will work even if there are "bad" rows (rows that don't end in a date):
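-- Step 1: apply only the guard clauses and keep the interim result
DECLARE @t TABLE (userdate CHAR(10))
INSERT @t
SELECT RIGHT(Companies.UserDefined4, 10)
FROM Companies
WHERE RIGHT(Companies.UserDefined4, 10) NOT LIKE '%[^0-9.]%'
AND RIGHT(Companies.UserDefined4, 10) <> ''
-- Step 2: apply the range conditions to the interim result
SELECT *
FROM @t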
WHERE CONVERT(DATE, userdate, 104) >= '2014-03-01'
AND CONVERT(DATE, userdate, 104) <= '2014-04-30'
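The two-step approach can be modeled in Python to make the separation explicit: filter first, then convert and apply the range only to rows that survived the guard. The sample strings are hypothetical, and strptime with "%d.%m.%Y" stands in for CONVERT style 104. One caveat worth checking against the real data: the character-class guard only proves the suffix consists of digits and dots, not that it is a valid date, so a value like 31.02.2014 passes it and would still fail conversion.

```python
import re
from datetime import date, datetime

# Hypothetical UserDefined4 values; only some end in a DD.MM.YYYY date.
user_defined4 = [
    "Contract 15.03.2014",
    "Renewal 30.04.2014",
    "Notes only, no date here",
    "Start 01.05.2014",
    "",
]

# Step 1: the guard, modeling RIGHT(x, 10) NOT LIKE '%[^0-9.]%' AND RIGHT(x, 10) <> ''.
only_digits_and_dots = re.compile(r"[0-9.]+")
candidates = [
    v[-10:] for v in user_defined4
    if v[-10:] != "" and only_digits_and_dots.fullmatch(v[-10:])
]

# Step 2: convert (dd.mm.yyyy) and apply the range to the interim result only.
lo, hi = date(2014, 3, 1), date(2014, 4, 30)
converted = [datetime.strptime(c, "%d.%m.%Y").date() for c in candidates]
in_range = [d for d in converted if lo <= d <= hi]
print(in_range)  # [datetime.date(2014, 3, 15), datetime.date(2014, 4, 30)]

# Caveat: the guard checks characters, not validity.
print(bool(only_digits_and_dots.fullmatch("31.02.2014")))  # True
```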
Can you be sure that UserDefined4 really always contains a date in that format in the right-most 10 characters?
If so, you could create a computed column like this:
ALTER TABLE dbo.Companies
ADD DateFromUD4 AS CONVERT(DATE, RIGHT(UserDefined4, 10), 104) PERSISTED
and then the query becomes really simple:
SELECT
(list of columns)
FROM
dbo.Companies
WHERE
DateFromUD4 >= '20140301' AND
DateFromUD4 <= '20140430'
I like to use the ISO 8601 format (YYYYMMDD without any dashes) for specifying dates as strings, since it is guaranteed to work on any SQL Server regardless of the date and language settings.
Since the conversion goes to a DATE, you won't have to worry about time portions either: this is a date-only computed column. This works in SQL Server 2008 and newer.

TSQL: Tune Dynamic Query Search

While reading about tuning T-SQL queries, I've seen advice to avoid (or be careful with) functions in the WHERE clause. However, in some cases, like searches that need dates relative to today, I'm curious whether a query can be tuned further. For instance, the query below uses the DATEADD function with the current date, which lets the user get the correct information for the past thirty days at any time:
SELECT *
FROM Zoo..Transportation
WHERE ArrivalDate BETWEEN DATEADD(DD,-30,GETDATE()) AND GETDATE()
If I try to eliminate the DATEADD function, I could declare a variable that holds that value and then query the data using the value stored in the variable, such as:
DECLARE @begin DATE
SET @begin = DATEADD(DD,-30,GETDATE())
SELECT *
FROM Zoo..Transportation
WHERE ArrivalDate BETWEEN @begin AND GETDATE()
However, the Execution Plan and Statistics show the exact same number of reads, scans and batch costs.
In these instances of dynamic data (for instance, using today's date as a starting point), how do we reduce or eliminate the use of functions in the WHERE clause?
The advice about functions in the WHERE clause refers to doing silly things like:
WHERE DATEPART(WEEK, ArrivalDate) = 1
Or
WHERE CONVERT(CHAR(10), ArrivalDate, 101) = '01/01/2012'
That is, functions applied to columns in the WHERE clause, which in most cases destroy sargability (in other words, prevent an index seek and force an index or table scan).
There is one exception that I know of:
WHERE CONVERT(DATE, ArrivalDate) = CONVERT(DATE, GETDATE())
But I would not rely on this for any other scenario.
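The distinction drawn here, between functions applied to the column and functions that merely compute a constant bound, can be modeled in Python with a sorted list standing in for an index. This is an analogy under stated assumptions (hypothetical data, binary search as a proxy for an index seek), not a description of SQL Server internals.

```python
import bisect
from datetime import date, timedelta

# A sorted list stands in for an index on ArrivalDate (one arrival per day in 2012).
index = [date(2012, 1, 1) + timedelta(days=i) for i in range(366)]


def seek(lo, hi):
    """Sargable: lo <= ArrivalDate < hi, located by binary search (like an index seek)."""
    left = bisect.bisect_left(index, lo)
    right = bisect.bisect_left(index, hi)
    return index[left:right]


def scan(predicate):
    """Non-sargable: a function of the column is evaluated for every entry."""
    return [d for d in index if predicate(d)]


# Bounds computed once from "today", like DATEADD(DD, -30, GETDATE()) in the question.
today = date(2012, 6, 30)
recent_seek = seek(today - timedelta(days=30), today + timedelta(days=1))

# The same rows found by applying a function to each column value instead.
recent_scan = scan(lambda d: 0 <= (today - d).days <= 30)

print(recent_seek == recent_scan, len(recent_seek))  # True 31
```

Computing the bounds from the current date does not hurt: they are constants by the time the search runs. Only wrapping the column itself forces the per-row evaluation.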
IME, using functions within a WHERE clause is only an issue when they operate on data from your query. This means the function (which itself may be complex SQL) runs for each value in your query, which will likely cause a table scan or similar, as the optimiser doesn't know which index to use (if any).
Your example above is using DATEADD with the current date - the value is probably calculated once (or if it is calculated for each row in your result set, it won't affect the query plan as it doesn't contain data from your query).

Proper way to index date & time columns

I have a table with the following structure:
CREATE TABLE MyTable (
ID int identity,
Whatever varchar(100),
MyTime time(2) NOT NULL,
MyDate date NOT NULL,
MyDateTime AS (DATEADD(DAY, DATEDIFF(DAY, '19000101', [MyDate]),
CAST([MyTime] AS DATETIME2(2))))
)
The computed column adds date and time into a single datetime2 field.
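The arithmetic behind the computed column can be checked with a small Python model: count whole days from 1900-01-01 to MyDate, then add them to the time placed on 1900-01-01 (which is what converting a time value to datetime2 yields in SQL Server). The function name is hypothetical, and the expression only works if the CAST is applied to the time column. Python's datetime.combine gives the same result directly.

```python
from datetime import date, datetime, time, timedelta

BASE = date(1900, 1, 1)  # the '19000101' anchor used in the computed column


def combine_like_computed_column(my_date, my_time):
    """Models DATEADD(DAY, DATEDIFF(DAY, '19000101', MyDate), CAST(MyTime AS DATETIME2(2)))."""
    days = (my_date - BASE).days                    # DATEDIFF(DAY, '19000101', MyDate)
    time_on_base = datetime.combine(BASE, my_time)  # time cast to datetime2: 1900-01-01 + time
    return time_on_base + timedelta(days=days)      # DATEADD(DAY, days, ...)


d, t = date(2014, 7, 15), time(13, 45, 30, 120000)
print(combine_like_computed_column(d, t))                            # 2014-07-15 13:45:30.120000
print(combine_like_computed_column(d, t) == datetime.combine(d, t))  # True
```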
Most queries against the table have one or more of the following clauses:
... WHERE MyDate < @filter1 and MyDate > @filter2
... ORDER BY MyDate, MyTime
... ORDER BY MyDateTime
In a nutshell, date is usually used for filtering, and full datetime is used for sorting.
Now for questions:
What is the best way to set indices on those 3 date-time columns? 2 separate on date and time or maybe 1 on date and 1 on composite datetime, or something else? Quite a lot of inserts and updates occur on this table, and I'd like to avoid over-indexing.
As I wrote this question, I noticed the long and rather ugly computed column definition. I picked it up somewhere a while ago and never investigated whether there's a simpler way of doing it. Is there an easier way of combining a date and a time(2) into a datetime2? Simple addition does not work, and I'm not sure whether I should avoid casting to varchar, concatenating, and casting back.
Unfortunately, you didn't mention which version of SQL Server you're using.
But if you're on SQL Server 2008 or newer, you should turn this around:
your table should have
MyDateTime DATETIME
and then define the "only date" column as
MyDate AS CAST(MyDateTime AS DATE) PERSISTED
Since you make it persisted, it's stored alongside the table data (and not calculated every time you query it), and you can easily index it.
Same applies to the MyTime column.
Having date and time in two separate columns may seem peculiar but if you have queries that use only the date (and/or especially only the time part), I think it's a valid decision. You can create an index on date only or on time or on (date, whatever), etc.
What I don't understand is why you also have the computed datetime column. There's no reason to store this value, too. It can easily be calculated when needed.
And if you need to order by datetime, you can use ORDER BY MyDate, MyTime. With an index on (MyDate, MyTime) this should be ok. Range datetime queries would also be using that index.
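The claim that ORDER BY MyDate, MyTime yields the same order as ordering by the combined datetime can be sanity-checked in Python, where tuples compare element by element, much like a composite index key. The rows are hypothetical; the window filter at the end mimics a datetime range expressed on (MyDate, MyTime).

```python
from datetime import date, datetime, time

# Hypothetical (MyDate, MyTime) rows.
rows = [
    (date(2014, 3, 2), time(8, 0)),
    (date(2014, 3, 1), time(23, 30)),
    (date(2014, 3, 2), time(7, 15)),
    (date(2014, 3, 1), time(6, 45)),
]

by_date_then_time = sorted(rows)                                # ORDER BY MyDate, MyTime
by_datetime = sorted(rows, key=lambda r: datetime.combine(*r))  # ORDER BY MyDateTime
print(by_date_then_time == by_datetime)  # True

# A datetime range expressed on the (date, time) pair.
window_start = (date(2014, 3, 1), time(12, 0))
window_end = (date(2014, 3, 2), time(7, 30))
in_window = [r for r in by_date_then_time if window_start <= r < window_end]
print(len(in_window))  # 2
```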
The answer isn't in your indexing, it's in your querying.
A single DateTime field should be used, or even SmallDateTime if that provides the range of dates and time resolution required by your application.
Index that column, then use queries like this:
SELECT * FROM MyTable WHERE
MyDate >= @startfilterdate
AND MyDate < DATEADD(d, 1, @endfilterdate);
By using < on the end filter, the query only includes rows from before midnight at the start of the day after the user-selected "end date". This is simpler and more accurate than adding 23:59:59, especially since stored times can include fractional seconds between 23:59:59 and 00:00:00.
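The difference between the two end-of-range strategies shows up with a single hypothetical row stamped half a second before midnight, modeled here with Python datetime values rather than SQL Server types.

```python
from datetime import datetime, timedelta

end_date = datetime(2014, 4, 30)  # user-selected "end date" (midnight)

# Approach 1: inclusive bound at 23:59:59 on the end date.
inclusive_end = end_date + timedelta(hours=23, minutes=59, seconds=59)

# Approach 2: exclusive bound at midnight of the following day.
exclusive_end = end_date + timedelta(days=1)

late_row = datetime(2014, 4, 30, 23, 59, 59, 500000)  # hypothetical value with fractional seconds

print(late_row <= inclusive_end)  # False: the row is missed
print(late_row < exclusive_end)   # True: the row is included
```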
Using persisted columns and indexes on them is a waste of server resources.

Resources