Proper way to index date & time columns - sql-server

I have a table with the following structure:
CREATE TABLE MyTable (
ID int identity,
Whatever varchar(100),
MyTime time(2) NOT NULL,
MyDate date NOT NULL,
MyDateTime AS (DATEADD(DAY, DATEDIFF(DAY, '19000101', [MyDate]),
CAST([MyDate] AS DATETIME2(2))))
)
The computed column adds date and time into a single datetime2 field.
Most queries against the table have one or more of the following clauses:
... WHERE MyDate < #filter1 and MyDate > #filter2
... ORDER BY MyDate, MyTime
... ORDER BY MyDateTime
In a nutshell, date is usually used for filtering, and full datetime is used for sorting.
Now for questions:
What is the best way to set indices on those 3 date-time columns? 2 separate on date and time or maybe 1 on date and 1 on composite datetime, or something else? Quite a lot of inserts and updates occur on this table, and I'd like to avoid over-indexing.
As I wrote this question, I noticed the long and kind of ugly computed column definition. I picked it up from somewhere a while ago and forgot to investigate if there's a simpler way of doing it. Is there any easier way of combining a date and time2 into a datetime2? Simple addition does not work, and I'm not sure if I should avoid casting to varchar, combining and casting back.

Unfortunately, you didn't mention what version of SQL Server you're using ....
But if you're on SQL Server 2008 or newer, you should turn this around:
your table should have
MyDateTime DATETIME
and then define the "only date" column as
MyDate AS CAST(MyDateTime AS DATE) PERSISTED
Since you make it persisted, it's stored along side the table data (and now calculated every time you query it), and you can easily index it now.
Same applies to the MyTime column.

Having date and time in two separate columns may seem peculiar but if you have queries that use only the date (and/or especially only the time part), I think it's a valid decision. You can create an index on date only or on time or on (date, whatever), etc.
What I don't understand is why you also have the computed datetime column as well. There s no reason to store this value, too. It can easily be calculated when needed.
And if you need to order by datetime, you can use ORDER BY MyDate, MyTime. With an index on (MyDate, MyTime) this should be ok. Range datetime queries would also be using that index.

The answer isn't in your indexing, it's in your querying.
A single DateTime field should be used, or even SmallDateTime if that provides the range of dates and time resolution required by your application.
Index that column, then use queries like this:
SELECT * FROM MyTable WHERE
MyDate >= #startfilterdate
AND MyDate < DATEADD(d, 1, #endfilterdate);
By using < on the end filter, it only includes results from sometime before midnight of that date, which is the day after the user-selected "end date". This is simpler and more accurate than adding 23:59:59, especially since stored times can include microseconds between 23:59:59 and 00:00:00.
Using persisted columns and indexes on them is a waste of server resources.

Related

How to compare column values to declared variable

I was asked this interview question.
--Without modifying the following code:
DECLARE #StartDateInput SMALLDATETIME = '1/1/2018',
#EndDateInput SMALLDATETIME = '1/1/2018'
--Modify the following query so that it will return contacts modified at any time on January 1st, 2018
SELECT *
FROM dbo.Contacts
I tried the following query but this was not correct. I'm sure that I'm supposed to use the #EndDateInput variable as well but I wasn't sure how to use it. I don't think that this is the right way to approach this in general either.
SELECT *
FROM dbo.Contacts
WHERE ModifiedDate = SMALLDATETIME
It looks like the question is probing your understanding of date and datetime types, namely that a date with a time is after a date without a time (if there is even such a thing; most timeless dates are considered to be midnight on the relevant date, which is a time too.. in the same way that 1.0 is the same thing as 1, and 1.1 is after 1.0)
I'd use a range:
SELECT *
FROM dbo.Contacts
WHERE ModifiedDate >= #StartDateInput AND ModifiedDate < DATEADD(DAY, 1, #EndDateInput)
Why?
This caters for datetimes that have a time component.
It doesn't modify the row data (always a bad idea, e.g. to cast a million datetimes to a date just to strip the time off, every time you query - precludes using an index on the column and is a massive waste of resources) just to perform the query.
It converts the apparent "end date is inclusive" implied by both #variables being the same, to a form that allows the exclusive behavior of < to work inclusively (adds a day and then gets rows less than the following day, thereby including 23:59:59.999999 ...)
The only thing I would say is that strictly, the spec only calls for one day's records, which means it's not mandatory to use the #EndDateInput at all. It seems logical to use it, but it could be argued that if the spec is that this query will only ever return one day, the #End variable could be discarded and a DATEADD performed on the #Start instead
It is saying "any time" meaning consider the time component. With T-SQL the only reliable way is to use >= and < range query (exclusive upper range):
SELECT *
FROM dbo.Contacts
WHERE ModifiedDate >= #StartDateInput and
ModifiedDate < dateadd(d, 1, #EndDateInput);
PS: Initial declaration of #StartDateInput and #ENdDateInput is not robust and probably by chance pointing to Jan 1st, 2018. If it were '1/2/2018' then it would be ambiguous between Jan 2nd and Feb 1st. Better use ODBC canonical and\or ISO 8601 strings like '20180101'.

Using indexes when comparing datetimes

I have two tables, both of which containing millions of rows of data.
tbl_one:
purchasedtm DATETIME,
userid INT,
totalcost INT
tbl_two:
id BIGINT,
eventdtm DATETIME,
anothercol INT
The first table has a clustered index on the first two columns: CLUSTERED INDEX tbl_one_idx ON(purchasedtm, userid)
The second one has a primary key on its ID column, and also a non-clustered index on the eventdtm column.
I want to run a query which looks for rows in which purchasedtm and eventdtm are on the same day.
Originally, I wrote my query as:
WHERE CAST(tbl_one.purchasedtm AS DATE) = CAST(tbl_two.eventdtm AS DATE)
But this was not going to use either of the two indexes.
Later, I changed my query to this:
WHERE tbl_one.purchasedtm >= CAST(tbl_two.eventdtm AS DATE)
AND tbl_one.purchasedtm < DATEADD(DAY, 1, CAST(tbl_two.eventdtm AS DATE))
This way, because only one side of the comparison is wrapped in a function, the other side can still use its index. Correct?
I also have some additional questions:
I can write the query the other way around too, i.e. keeping tbl_two.eventdtm untouched and wrapping tbl_one.purchasedtm in CAST(). Would that make a difference in performance?
If the answer to the previous question is yes is it because eventdtm has its own dedicated index, while looking up purcahsedtm would only be a partial index match?
Are there other factors I can take into consideration for deciding which of the two choices is better? (For example, if there are millions of rows in tbl_one but billions of rows in tbl_two, would that impact which column I should CAST and which one I should not?)
In genera, if you compare two columns that are both indexed, would we gain any performance compared to a similar scenario in which only one of them is indexed?
And lastly, can I perform my original task without using CAST?
Note: I do not have the ability to create or modify indexes, add columns, etc.
Little. late after commenting but...
As discussed in the comments, code such as CAST(DateTimeColumn AS date) is actually SARGable. Rob Farley posted an article on some of the SARGable and non-SARGable functionality here, however, I'll cover a few things off anyway.
Firstly, applying a function to a column will normally make your query non-SARGable, and especially if it changes the order of the values or the order of them is meaningless. Take something like:
SELECT *
FROM TABLE
WHERE RIGHT(COLUMN,5) = 'value';
The order of the values in the column are utterly unhelpful here, as we're focusing on the right hand characters. Unfortunately, as Rob also discusses:
SELECT *
FROM TABLE
WHERE LEFT(COLUMN,5) = 'value';
This is also non-SARGable. However what about the following?
SELECT *
FROM TABLE
WHERE Column LIKE 'value%';
This is, as the logic isn't applied to the column and the order doesn't change. If the value wehre '%value%' then that too would be non-SARGable.
When applying logic that adds (or subtracts) what you want to find, you always want to apply that to the literal value (or function, like GETDATE()`). For example one of these expressions is SARGable the other is not:
Column + 1 = #Variable --non-SARGable
Column = #Variable - 1 --SARGable
The same applies to things like DATEADD
#DateVariable BETWEEN DateColumn AND DATEADD(DAY, 30,DateColumn) --non-SARGable
DateColumn BETWEEN DATEADD(DAY, -30, #DateVariable) AND #DateVariable --SARGable
Changing the datatype (other than to a date) rarely will keep a query SARGable. CONVERT(date,varchardate,112) will not be SARGable, even though the order of the column is unchanged. Converting an decimal to an int, however, had the same result as converting a datetime to a date, and kept SARGability:
CREATE TABLE testtab (n decimal(2,1) PRIMARY KEY CLUSTERED);
INSERT INTO testtab
VALUES(0.1),
(0.3),
(1.1),
(1.7),
(2.4);
GO
SELECT n
FROM testtab
WHERE CONVERT(int,n) = 2;
GO
DROP TABLE testtab;
Hopefully, that gives you enough to go on, but pelase do ask if you want me to add anything further.

Date range based on Column Date

I am using the latest SQL Server. I have a table with a CreatedDate column. I need to write a Query that uses dates that are plus or minus 7 from the Date in CreatedDate. I have no clue how to go about this. My thought was this:
DECLARE #Date datetime
DECLARE #SevenBefore datetime
DECLARE #SevenAfter datetime
SET #Date = CreatedDate
SET #SevenBefore = DATEADD(day,-7,#Date)
SET #SevenAfter = DATEADD(day,7,#Date)
SELECT *
FROM <table>
WHERE <table> BETWEEN #SevenBefore AND #SevenAfter
The issue with this is that I cannot use "CreatedDate" as a SET #DATE because SQL gives an error "Invalid column name 'CreatedDate'"
Any help would be appreciated. I cannot list a date because every date in that column could be different.
Thanks
In this case, you need to stop thinking as a programmer would, and start thinking as a Database programmer would.
Lets work only with this central part of your query:
SELECT *
FROM <table>
WHERE <table> BETWEEN #SevenBefore AND #SevenAfter
Now, you say that the CreatedDate is a column in a table. For this example, I will assume that the CreatedDate is in a table other than the one in your example above. For this purpose, I will give two fake names to the tables. The table with the CreatedDate, I will call tblCreated, and the one from the query above I will call tblData.
Looking above, it's pretty obvious that you can't compare an entire table row to a date. There must be a field in that table that contains a date/time value. I will call this column TargetDate.
Given these assumptions, your query would look something like:
SELECT *
FROM tblCreated tc
INNER JOIN tblData td
ON td.TargetDate BETWEEN DATEADD(day, -7, tc.CreatedDate) and DATEADD(day, 7, tc.CreatedDate)
Looking at this, it is clear that you still need some other associations between the tables. Do you only want all data rows per customer based on the Created date, or perhaps only want Creations where some work was done on them as shown in the Data records, or ??. Without a fuller specification, we can't help with that, though.

T-SQL Select where Subselect or Default

I have a SELECT that retrieves ROWS comparing a DATETIME field to the highest available value of another TABLE.
The Two Tables have the following structure
DeletedRecords
- Id (Guid)
- RecordId (Guid)
- TableName (varchar)
- DeletionDate (datetime)
And Another table which keep track of synchronizations using the following structure
SynchronizationLog
- Id (Guid)
- SynchronizationDate (datetime)
In order to get all the RECORDS that have been deleted since the last synchronization, I run the following SELECT:
SELECT
[Id],[RecordId],[TableName],[DeletionDate]
FROM
[DeletedRecords]
WHERE
[TableName] = '[dbo].[Person]'
AND [DeletionDate] >
(SELECT TOP 1 [SynchronizationDate]
FROM [dbo].[SynchronizationLog]
ORDER BY [SynchronizationDate] DESC)
The problem occurs if I do not have synchronizations available yet, the T-SQL SELECT does not return any row while it should returns all the rows cause there are no synchronization records available.
Is there a T-SQL function like COALESCE that I can use with DateTime?
Your subquery should look like something like this:
SELECT COALESCE(MAX([SynchronizationDate]), '0001-01-01')
FROM [dbo].[SynchronizationLog]
It says: Get the last date, but if there is no record (or all values are NULL), then use the '0001-01-01' date as start date.
NOTE '0001-01-01' is for DATETIME2, if you are using the old DATETIME data type, it should be '1753-01-01'.
Also please note (from https://msdn.microsoft.com/en-us/library/ms187819(v=sql.100).aspx)
Use the time, date, datetime2 and datetimeoffset data types for new work. These types align with the SQL Standard. They are more portable. time, datetime2 and datetimeoffset provide more seconds precision. datetimeoffset provides time zone support for globally deployed applications.
EDIT
An alternative solution is to use NOT EXISTS (you have to test it if its performance is better or not):
SELECT
[Id],[RecordId],[TableName],[DeletionDate]
FROM
[DeletedRecords] DR
WHERE
[TableName] = '[dbo].[Person]'
AND NOT EXISTS (
SELECT 1
FROM [dbo].[SynchronizationLog] SL
WHERE DR.[DeletionDate] <= SL.[SynchronizationDate]
)

SQL Server 2008: Get date part from datetime2

I've been using this very handy method of extracting the date portion from a datetime value:
select dateadd(day, datediff(day, 0, #inDate), 0)
This basically zeroes out the time portion of the date. It's fast, and more importantly it's deterministic - you can use this expression in a persisted computed column, indexed view, etc.
However I'm struggling to come up with something that works with the datetime2 type. The problem being that SQL Server does not permission conversions from integers to the datetime2 type.
Is there an equivalent, deterministic method of stripping the time portion from a datetime2?
Can't you just do a
ALTER TABLE dbo.YourTable
ADD DateOnly AS CAST(YourDate AS DATE) PERSISTED
The CAST(... AS DATE) turns your datetime2 column into a "date-only", and it seems to be deterministic enough to be used in a persisted, computed column definition ...

Resources