TSQL: Tune Dynamic Query Search - sql-server

In reading about tuning T-SQL queries, I've seen advice to avoid (or be careful with) functions in the WHERE clause. However, in some cases, like searches that need dates relative to today's date, I'm curious whether a query can be tuned further. For instance, the query below uses the DATEADD function with the current date, which lets the user get the correct information for the past thirty days at any time:
SELECT *
FROM Zoo..Transportation
WHERE ArrivalDate BETWEEN DATEADD(DD,-30,GETDATE()) AND GETDATE()
To eliminate the DATEADD function, I could declare a variable that captures that point in time and then query the data against the value stored in the variable, such as:
DECLARE @begin DATE
SET @begin = DATEADD(DD,-30,GETDATE())
SELECT *
FROM Zoo..Transportation
WHERE ArrivalDate BETWEEN @begin AND GETDATE()
However, the Execution Plan and Statistics show the exact same number of reads, scans and batch costs.
In these instances of dynamic data (for instance, using today's date as a starting point), how do we reduce or eliminate the use of functions in the WHERE clause?

The advice about functions in the WHERE clause is aimed at silly things like:
WHERE DATEPART(WEEK, ArrivalDate) = 1
Or
WHERE CONVERT(CHAR(10), ArrivalDate, 101) = '01/01/2012'
That is, functions applied to columns in the WHERE clause, which in most cases destroy sargability (in other words, they prevent an index seek and force an index or table scan).
There is one exception that I know of:
WHERE CONVERT(DATE, ArrivalDate) = CONVERT(DATE, GETDATE())
But I would not rely on this for any other scenario.

In my experience, using functions within a WHERE clause is only an issue when the function operates on data from your query. The function (which may itself contain complex SQL) then runs for every value in your query, which will likely cause a table scan or similar, as the optimizer doesn't know which index to use (if any).
Your example above uses DATEADD with the current date. The value is probably calculated once (and even if it were calculated for each row in your result set, it wouldn't affect the query plan, because it doesn't reference any data from your query).
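A minimal sketch of that point, assuming a hypothetical temp table #Transportation in place of Zoo..Transportation, with an index on ArrivalDate and made-up rows. The inline DATEADD(GETDATE()) form and the variable form both leave ArrivalDate bare, so either allows an index seek; the third query is logically equivalent but wraps the column, which typically rules out a seek. On a table this small the optimizer may scan regardless, so compare actual plans on real data.

```sql
-- Hypothetical stand-in for Zoo..Transportation, indexed on ArrivalDate.
CREATE TABLE #Transportation (
    ID          int IDENTITY PRIMARY KEY,
    ArrivalDate datetime NOT NULL
);
CREATE INDEX IX_Transportation_ArrivalDate ON #Transportation (ArrivalDate);

-- Made-up rows well away from the 30-day boundary.
INSERT INTO #Transportation (ArrivalDate)
VALUES (DATEADD(DAY, -5,   GETDATE())),
       (DATEADD(DAY, -20,  GETDATE())),
       (DATEADD(DAY, -45,  GETDATE())),
       (DATEADD(DAY, -100, GETDATE()));

-- 1. Inline expression: DATEADD/GETDATE() touch no column data,
--    so ArrivalDate stays bare and a seek is possible.
SELECT COUNT(*) AS InlineCount
FROM #Transportation
WHERE ArrivalDate BETWEEN DATEADD(DAY, -30, GETDATE()) AND GETDATE();

-- 2. Same predicate through a variable (datetime, so the time of day is kept).
DECLARE @begin datetime = DATEADD(DAY, -30, GETDATE());
SELECT COUNT(*) AS VariableCount
FROM #Transportation
WHERE ArrivalDate BETWEEN @begin AND GETDATE();

-- 3. Logically equivalent, but the function now wraps the column,
--    which typically forces a scan.
SELECT COUNT(*) AS WrappedCount
FROM #Transportation
WHERE DATEADD(DAY, 30, ArrivalDate) >= GETDATE()
  AND ArrivalDate <= GETDATE();
```

All three counts should agree (two rows fall in the window); only the plan shape differs.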

Related

How to compare column values to declared variable

I was asked this interview question.
--Without modifying the following code:
DECLARE @StartDateInput SMALLDATETIME = '1/1/2018',
@EndDateInput SMALLDATETIME = '1/1/2018'
--Modify the following query so that it will return contacts modified at any time on January 1st, 2018
SELECT *
FROM dbo.Contacts
I tried the following query, but it was not correct. I'm sure I'm supposed to use the @EndDateInput variable as well, but I wasn't sure how. I don't think this is the right way to approach the problem in general, either.
SELECT *
FROM dbo.Contacts
WHERE ModifiedDate = SMALLDATETIME
It looks like the question is probing your understanding of date and datetime types, namely that a datetime with a time component comes after the same date without one. (There arguably is no such thing as a timeless date: most are treated as midnight on that date, which is a time too, in the same way that 1.0 is the same as 1, and 1.1 comes after 1.0.)
I'd use a range:
SELECT *
FROM dbo.Contacts
WHERE ModifiedDate >= @StartDateInput AND ModifiedDate < DATEADD(DAY, 1, @EndDateInput)
Why?
This caters for datetimes that have a time component.
It doesn't modify the row data just to perform the query. Casting a million datetimes to a date every time you query, just to strip the time off, is always a bad idea: it precludes using an index on the column and is a massive waste of resources.
It converts the apparent "end date is inclusive" intent, implied by both @variables being the same, into a form where the exclusive < works inclusively: it adds a day and then gets rows earlier than that following day, thereby including 23:59:59.999999 and everything else on the end date.
The only thing I would add is that, strictly, the spec only calls for one day's records, so it isn't mandatory to use @EndDateInput at all. Using it seems logical, but it could be argued that if the spec is that this query will only ever return one day, @EndDateInput could be discarded and the DATEADD applied to @StartDateInput instead.
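A minimal sketch of why the range works where equality fails, assuming a hypothetical temp table #Contacts in place of dbo.Contacts, with made-up rows placed on and around the day boundaries. The variables are declared with the unambiguous '20180101' form rather than the interview's '1/1/2018'.

```sql
-- Hypothetical stand-in for dbo.Contacts with boundary values.
CREATE TABLE #Contacts (
    ContactID    int IDENTITY PRIMARY KEY,
    ModifiedDate smalldatetime NOT NULL
);
INSERT INTO #Contacts (ModifiedDate)
VALUES ('20171231 23:59'),  -- previous day: excluded
       ('20180101 00:00'),  -- midnight: included
       ('20180101 13:45'),  -- mid-day: included
       ('20180101 23:59'),  -- last minute of the day: included
       ('20180102 00:00');  -- next midnight: excluded

DECLARE @StartDateInput smalldatetime = '20180101',
        @EndDateInput   smalldatetime = '20180101';

-- Equality matches only the row stored at exactly midnight.
SELECT COUNT(*) AS EqualityMatches
FROM #Contacts
WHERE ModifiedDate = @StartDateInput;

-- Half-open range: on or after the start, before the day after the end.
SELECT ContactID, ModifiedDate
FROM #Contacts
WHERE ModifiedDate >= @StartDateInput
  AND ModifiedDate <  DATEADD(DAY, 1, @EndDateInput);
```

The equality query finds one row; the range query returns the three rows stamped on January 1st.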
It says "any time", meaning you must consider the time component. In T-SQL, the only reliable way is a range query with >= and < (an exclusive upper bound):
SELECT *
FROM dbo.Contacts
WHERE ModifiedDate >= @StartDateInput and
ModifiedDate < dateadd(d, 1, @EndDateInput);
PS: The initial declaration of @StartDateInput and @EndDateInput is not robust; it only happens to mean January 1st, 2018 because the day and month are the same. If it were '1/2/2018', it would be ambiguous between January 2nd and February 1st, depending on the session's language and DATEFORMAT settings. Better to use an unambiguous ISO 8601 string such as '20180101'.
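A minimal sketch of that ambiguity: the same literal is read under two SET DATEFORMAT settings, then the unseparated ISO 8601 form is read under one of them. The final SET DATEFORMAT mdy assumes you want the us_english default back; restore whatever your session actually used.

```sql
-- The same ambiguous literal under two DATEFORMAT settings.
SET DATEFORMAT mdy;
SELECT CAST('1/2/2018' AS smalldatetime) AS ReadAsMDY;   -- January 2, 2018
SET DATEFORMAT dmy;
SELECT CAST('1/2/2018' AS smalldatetime) AS ReadAsDMY;   -- February 1, 2018

-- The unseparated ISO 8601 form reads the same under either setting.
SELECT CAST('20180102' AS smalldatetime) AS Unambiguous; -- January 2, 2018

-- Restore mdy (the us_english default).
SET DATEFORMAT mdy;
```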

Is it possible to declare a variable that is visible across all database objects when used in a query, stored procedures, functions, etc?

I am using SQL Server 2008. I need to pull daily prorated amounts from various tables for custom periods.
For example, I need to pull values from March 1, 2018 to April 30, 2019. Some years are leap years and some are not, so I would use:
SELECT amount / CASE WHEN Year(start_date) % 4 = 0 THEN 366 ELSE 365 END..
Is it possible to declare a variable instead of that CASE expression, one that I could use in every query, stored procedure, function, trigger...
SELECT amount / @some_var...
Thank you,
Gene
UPDATE:
I know I cannot use the formula above as a leap-year calculation, because it is not correct for every year. It's just an example, and maybe not a good one. I am more interested in whether it is possible to declare some sort of variable, like an environment variable in Windows, that I can use across queries.
Most data warehouses have a date table with a row for each date and date attributes as columns, such as Date, FiscalYear, CalendarYear, FQ, CQ, and so on.
You would then use it in your query like this:
cross apply (
select Count(*) NumOfDays from DimDate
where CalendarYear = YEAR(start_date) ) ca
then you can use ca.NumOfDays wherever you need it.
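A minimal sketch of the date-table approach, assuming a hypothetical two-column #DimDate (real date dimensions carry many more attributes) and a hypothetical #Amounts table standing in for the asker's tables. It sticks to features available in SQL Server 2008, which the question targets. Because days-per-year is counted from real calendar rows, leap years need no special-case formula.

```sql
-- Hypothetical minimal date dimension.
CREATE TABLE #DimDate (
    [Date]       date     NOT NULL PRIMARY KEY,
    CalendarYear smallint NOT NULL
);

-- Populate 2018-01-01 through 2020-12-31 (2020 is a leap year).
WITH Dates AS (
    SELECT CAST('20180101' AS date) AS d
    UNION ALL
    SELECT DATEADD(DAY, 1, d) FROM Dates WHERE d < '20201231'
)
INSERT INTO #DimDate ([Date], CalendarYear)
SELECT d, YEAR(d) FROM Dates
OPTION (MAXRECURSION 0);  -- more than the default 100 recursion levels

-- Hypothetical amounts table standing in for the asker's tables.
CREATE TABLE #Amounts (
    start_date date           NOT NULL,
    amount     decimal(12, 2) NOT NULL
);
INSERT INTO #Amounts VALUES ('20190315', 3650.00), ('20200315', 3660.00);

-- Days in the year come from the date table via CROSS APPLY.
SELECT a.start_date, a.amount, ca.NumOfDays,
       a.amount / ca.NumOfDays AS DailyAmount
FROM #Amounts AS a
CROSS APPLY (
    SELECT COUNT(*) AS NumOfDays
    FROM #DimDate
    WHERE CalendarYear = YEAR(a.start_date)
) AS ca;
```

Here 2019 yields 365 days and 2020 yields 366, so both sample amounts work out to a daily figure of 10.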

Using indexes when comparing datetimes

I have two tables, both of which containing millions of rows of data.
tbl_one:
purchasedtm DATETIME,
userid INT,
totalcost INT
tbl_two:
id BIGINT,
eventdtm DATETIME,
anothercol INT
The first table has a clustered index on the first two columns: CLUSTERED INDEX tbl_one_idx ON(purchasedtm, userid)
The second one has a primary key on its ID column, and also a non-clustered index on the eventdtm column.
I want to run a query which looks for rows in which purchasedtm and eventdtm are on the same day.
Originally, I wrote my query as:
WHERE CAST(tbl_one.purchasedtm AS DATE) = CAST(tbl_two.eventdtm AS DATE)
But this was not going to use either of the two indexes.
Later, I changed my query to this:
WHERE tbl_one.purchasedtm >= CAST(tbl_two.eventdtm AS DATE)
AND tbl_one.purchasedtm < DATEADD(DAY, 1, CAST(tbl_two.eventdtm AS DATE))
This way, because only one side of the comparison is wrapped in a function, the other side can still use its index. Correct?
I also have some additional questions:
I can write the query the other way around too, i.e. keeping tbl_two.eventdtm untouched and wrapping tbl_one.purchasedtm in CAST(). Would that make a difference in performance?
If the answer to the previous question is yes, is it because eventdtm has its own dedicated index, while looking up purchasedtm would only be a partial index match?
Are there other factors I can take into consideration for deciding which of the two choices is better? (For example, if there are millions of rows in tbl_one but billions of rows in tbl_two, would that impact which column I should CAST and which one I should not?)
In general, if you compare two columns that are both indexed, would we gain any performance compared to a similar scenario in which only one of them is indexed?
And lastly, can I perform my original task without using CAST?
Note: I do not have the ability to create or modify indexes, add columns, etc.
A little late after commenting, but...
As discussed in the comments, code such as CAST(DateTimeColumn AS date) is actually SARGable. Rob Farley has written an article on some SARGable and non-SARGable functionality; however, I'll cover a few points here anyway.
Firstly, applying a function to a column will normally make your query non-SARGable, especially if the function changes the order of the values or makes their order meaningless. Take something like:
SELECT *
FROM dbo.YourTable
WHERE RIGHT(YourColumn,5) = 'value';
The order of the values in the column is utterly unhelpful here, as we're focusing on the right-hand characters. Unfortunately, as Rob also discusses:
SELECT *
FROM dbo.YourTable
WHERE LEFT(YourColumn,5) = 'value';
This is also non-SARGable. However, what about the following?
SELECT *
FROM dbo.YourTable
WHERE YourColumn LIKE 'value%';
This one is SARGable, as the logic isn't applied to the column and the order doesn't change. If the pattern were '%value%', that too would be non-SARGable.
When applying logic that adds to (or subtracts from) what you want to find, you always want to apply it to the literal value (or to a function, like GETDATE()), not to the column. For example, one of these expressions is SARGable and the other is not:
Column + 1 = @Variable --non-SARGable
Column = @Variable - 1 --SARGable
The same applies to things like DATEADD:
@DateVariable BETWEEN DateColumn AND DATEADD(DAY, 30,DateColumn) --non-SARGable
DateColumn BETWEEN DATEADD(DAY, -30, @DateVariable) AND @DateVariable --SARGable
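A minimal sketch checking that each rewrite returns the same rows as the form it replaces, assuming a hypothetical temp table #T whose Num and DateColumn columns play the role of Column and DateColumn above, with made-up values. Only the SARGable versions leave the indexed column bare.

```sql
-- Hypothetical table with indexed int and date columns.
CREATE TABLE #T (
    ID         int IDENTITY PRIMARY KEY,
    Num        int  NOT NULL,
    DateColumn date NOT NULL
);
CREATE INDEX IX_T_Num  ON #T (Num);
CREATE INDEX IX_T_Date ON #T (DateColumn);

INSERT INTO #T (Num, DateColumn)
VALUES (4, '20240101'), (9, '20240120'), (10, '20240201'), (11, '20240215');

DECLARE @Variable     int  = 10,
        @DateVariable date = '20240210';

-- Non-SARGable: arithmetic on every row's column value.
SELECT ID FROM #T WHERE Num + 1 = @Variable;
-- SARGable: arithmetic moved to the variable side; same rows (ID 2).
SELECT ID FROM #T WHERE Num = @Variable - 1;

-- Non-SARGable: DATEADD wraps the column.
SELECT ID FROM #T
WHERE @DateVariable BETWEEN DateColumn AND DATEADD(DAY, 30, DateColumn);
-- SARGable: DATEADD moved to the variable side; same rows (IDs 2 and 3).
SELECT ID FROM #T
WHERE DateColumn BETWEEN DATEADD(DAY, -30, @DateVariable) AND @DateVariable;
```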
Changing the data type (other than to date) rarely keeps a query SARGable. CONVERT(date, varchardate, 112) will not be SARGable, even though the order of the column is unchanged. Converting a decimal to an int, however, had the same result as converting a datetime to a date, and kept SARGability:
CREATE TABLE testtab (n decimal(2,1) PRIMARY KEY CLUSTERED);
INSERT INTO testtab
VALUES(0.1),
(0.3),
(1.1),
(1.7),
(2.4);
GO
SELECT n
FROM testtab
WHERE CONVERT(int,n) = 2;
GO
DROP TABLE testtab;
Hopefully that gives you enough to go on, but please do ask if you want me to add anything further.
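The asker's last question, whether the same-day match can be written without CAST, can be sketched with the DATEADD/DATEDIFF truncation idiom. This assumes hypothetical miniature copies #tbl_one and #tbl_two of the question's tables, with made-up rows. DATEDIFF(DAY, 0, x) counts whole days since 1900-01-01, and adding that count back onto 0 yields midnight of x's day, so purchasedtm stays bare on one side of each comparison.

```sql
-- Hypothetical miniature versions of tbl_one and tbl_two.
CREATE TABLE #tbl_one (
    purchasedtm datetime NOT NULL,
    userid      int      NOT NULL,
    totalcost   int      NOT NULL
);
CREATE CLUSTERED INDEX tbl_one_idx ON #tbl_one (purchasedtm, userid);

CREATE TABLE #tbl_two (
    id         bigint   PRIMARY KEY,
    eventdtm   datetime NOT NULL,
    anothercol int      NOT NULL
);
CREATE INDEX IX_tbl_two_eventdtm ON #tbl_two (eventdtm);

INSERT INTO #tbl_one VALUES ('20240301 08:15', 1, 100), ('20240302 23:59', 2, 200);
INSERT INTO #tbl_two VALUES (1, '20240301 17:30', 0), (2, '20240303 00:00', 0);

-- Midnight of eventdtm's day <= purchasedtm < midnight of the next day.
SELECT o.purchasedtm, t.eventdtm
FROM #tbl_two AS t
JOIN #tbl_one AS o
  ON o.purchasedtm >= DATEADD(DAY, DATEDIFF(DAY, 0, t.eventdtm), 0)
 AND o.purchasedtm <  DATEADD(DAY, DATEDIFF(DAY, 0, t.eventdtm) + 1, 0);
```

Only the March 1st pair matches; the 23:59 purchase on March 2nd and the midnight event on March 3rd fall on different days.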

Convert varchar to date: does not work with AND in WHERE clause

I have a SQL Server table Companies which contains a column UserDefined4 of type nvarchar(100).
This column contains some text plus a date in the format DD.MM.YYYY.
I want to select the records in the months of March and April 2014.
I am running this query:
SELECT
(Right(Companies.UserDefined4, 10))
FROM
Companies
WHERE
(Right(Companies.UserDefined4, 10) not like '%[^0-9.]%'
AND (Right(Companies.UserDefined4, 10) not like ''))
AND (CONVERT(Date,Right(Companies.UserDefined4, 10),104) >= '2014-03-01'
and
CONVERT(Date,Right(Companies.UserDefined4, 10),104) <= '2014-04-30')
This query throws an error
Error converting a string to a date and / or time.
I have checked all the records one by one, and they contain dates in the proper format. The strange thing for me is that the same query runs if I put OR instead of AND in the following part of the same query:
(
CONVERT(Date,Right(Companies.UserDefined4, 10),104) >= '2014-03-01'
OR
CONVERT(Date,Right(Companies.UserDefined4, 10),104) <= '2014-04-30'
)
I know it's not a wise decision to store a date as NVARCHAR, but I have to work with this data. I am not the one who designed this database.
I've had similar problems in the past, and they were all due to some of the values not ending with the expected characters (dates, in your case) and the "guard clauses" (in your case, the first two conditions in the WHERE) not stopping the query engine from applying the "range conditions" (in your case, the last two conditions in the WHERE).
Note that I don't know exactly why this happens (maybe query optimization, no "short-circuit" evaluation, or the evaluation order isn't what we expect; this is me speculating), but I've noticed that if you store an intermediate result of the query (with only the "guard clauses" applied) in a temporary structure (a table variable, for example) and then apply the "range clauses" to that interim result, it works.
For example, this will work even if there are "bad" rows (rows that don't end in a date):
DECLARE @t TABLE (userdate CHAR(10))
INSERT @t
SELECT RIGHT(Companies.UserDefined4, 10)
FROM Companies
WHERE RIGHT(Companies.UserDefined4, 10) NOT LIKE '%[^0-9.]%'
AND RIGHT(Companies.UserDefined4, 10) <> ''
SELECT *
FROM @t
WHERE CONVERT(DATE, userdate, 104) >= '2014-03-01'
AND CONVERT(DATE, userdate, 104) <= '2014-04-30'
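A minimal sketch of the two-step pattern with data, assuming a hypothetical table variable @Companies in place of Companies, holding made-up rows: two in-range dates, one out-of-range date, and one value whose last 10 characters are not a date. The first step applies only the guard clauses, so no conversion runs against the bad row.

```sql
-- Hypothetical sample rows.
DECLARE @Companies TABLE (UserDefined4 nvarchar(100));
INSERT INTO @Companies VALUES
    (N'Visit 15.03.2014'),
    (N'Call 02.04.2014'),
    (N'Call 20.05.2014'),
    (N'no date recorded');   -- last 10 chars contain letters

-- Step 1: guard clauses only; nothing is converted here.
DECLARE @t TABLE (userdate char(10));
INSERT INTO @t (userdate)
SELECT RIGHT(UserDefined4, 10)
FROM @Companies
WHERE RIGHT(UserDefined4, 10) NOT LIKE '%[^0-9.]%'
  AND RIGHT(UserDefined4, 10) <> '';

-- Step 2: in this sample every remaining value is a valid dd.mm.yyyy
-- date, so style 104 converts it without error.
SELECT userdate
FROM @t
WHERE CONVERT(date, userdate, 104) >= '20140301'
  AND CONVERT(date, userdate, 104) <= '20140430';
```

Note that the guard only checks for digits and dots; a value such as '31.02.2014' would pass it and still fail to convert.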
Can you be sure that UserDefined4 really always contains a date in that format in the right-most 10 characters?
If so, you could create a computed column like this:
ALTER TABLE dbo.Companies
ADD DateFromUD4 AS CONVERT(DATE, RIGHT(UserDefined4, 10), 104) PERSISTED
and then the query becomes really simple:
SELECT
(list of columns)
FROM
dbo.Companies
WHERE
DateFromUD4 >= '20140301' AND
DateFromUD4 <= '20140430'
I like to use the ISO-8601 format (YYYYMMDD without any dashes) for specifying dates as string since this is guaranteed to work on any SQL Server regardless of the date and language settings.
Since the conversion goes to a DATE, you won't have to worry about time portions either - this is a date-only computed column. This works in SQL Server 2008 and newer.
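A minimal sketch of the computed-column approach end to end, assuming a hypothetical temp table #Companies reduced to the relevant column, with made-up rows that all end in a valid date. The GO separator assumes a client such as SSMS or sqlcmd, so the new column is visible to later statements. Because the column is persisted and conversion style 104 is deterministic, it can also be indexed, which is what makes the range filter SARGable.

```sql
-- Hypothetical copy of Companies, limited to the relevant column.
CREATE TABLE #Companies (
    CompanyID    int IDENTITY PRIMARY KEY,
    UserDefined4 nvarchar(100) NOT NULL
);
INSERT INTO #Companies (UserDefined4)
VALUES (N'Visit 15.03.2014'), (N'Call 02.04.2014'), (N'Call 20.05.2014');

-- Persisted computed column parsed from the last 10 characters.
ALTER TABLE #Companies
    ADD DateFromUD4 AS CONVERT(date, RIGHT(UserDefined4, 10), 104) PERSISTED;
GO

-- An index on the computed column lets the range filter seek.
CREATE INDEX IX_Companies_DateFromUD4 ON #Companies (DateFromUD4);

SELECT CompanyID, DateFromUD4
FROM #Companies
WHERE DateFromUD4 >= '20140301'
  AND DateFromUD4 <= '20140430';
```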

Proper way to index date & time columns

I have a table with the following structure:
CREATE TABLE MyTable (
ID int identity,
Whatever varchar(100),
MyTime time(2) NOT NULL,
MyDate date NOT NULL,
MyDateTime AS (DATEADD(DAY, DATEDIFF(DAY, '19000101', [MyDate]),
CAST([MyTime] AS DATETIME2(2))))
)
The computed column adds date and time into a single datetime2 field.
Most queries against the table have one or more of the following clauses:
... WHERE MyDate < @filter1 and MyDate > @filter2
... ORDER BY MyDate, MyTime
... ORDER BY MyDateTime
In a nutshell, date is usually used for filtering, and full datetime is used for sorting.
Now for questions:
What is the best way to set indices on those 3 date-time columns? 2 separate on date and time or maybe 1 on date and 1 on composite datetime, or something else? Quite a lot of inserts and updates occur on this table, and I'd like to avoid over-indexing.
As I wrote this question, I noticed the long and somewhat ugly computed column definition. I picked it up somewhere a while ago and forgot to investigate whether there's a simpler way of doing it. Is there an easier way of combining a date and a time into a datetime2? Simple addition does not work, and I'm not sure whether I should avoid casting to varchar, concatenating, and casting back.
Unfortunately, you didn't mention which version of SQL Server you're using...
But if you're on SQL Server 2008 or newer, you should turn this around:
your table should have
MyDateTime DATETIME
and then define the "only date" column as
MyDate AS CAST(MyDateTime AS DATE) PERSISTED
Since you make it persisted, it's stored alongside the table data (and not calculated every time you query it), and you can easily index it.
Same applies to the MyTime column.
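A minimal sketch of that turned-around layout, assuming a hypothetical temp table #MyTable with made-up rows. It uses datetime2(2), matching the question's precision, instead of the answer's DATETIME. The full value is stored once, the date and time parts are persisted computed columns, and the date part is indexed for filtering.

```sql
-- Hypothetical MyTable: full value stored once, parts derived from it.
CREATE TABLE #MyTable (
    ID         int IDENTITY PRIMARY KEY,
    Whatever   varchar(100) NULL,
    MyDateTime datetime2(2) NOT NULL,
    MyDate AS CAST(MyDateTime AS date)    PERSISTED,
    MyTime AS CAST(MyDateTime AS time(2)) PERSISTED
);
CREATE INDEX IX_MyTable_MyDate ON #MyTable (MyDate);

INSERT INTO #MyTable (Whatever, MyDateTime)
VALUES ('a', '2024-03-01 08:30:00.25'),
       ('b', '2024-03-02 17:45:10.50'),
       ('c', '2024-03-05 09:00:00.00');

-- Filter on the indexed date part, sort on the full value.
DECLARE @filter1 date = '20240304',
        @filter2 date = '20240229';
SELECT ID, MyDate, MyTime
FROM #MyTable
WHERE MyDate < @filter1 AND MyDate > @filter2
ORDER BY MyDateTime;
```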
Having date and time in two separate columns may seem peculiar but if you have queries that use only the date (and/or especially only the time part), I think it's a valid decision. You can create an index on date only or on time or on (date, whatever), etc.
What I don't understand is why you also have the computed datetime column. There's no reason to store this value too; it can easily be calculated when needed.
And if you need to order by datetime, you can use ORDER BY MyDate, MyTime. With an index on (MyDate, MyTime) this should be ok. Range datetime queries would also be using that index.
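A minimal sketch of keeping the two columns with a composite index, assuming a hypothetical temp table #MyTableSplit modelled on the question's MyTable, with made-up rows. The computed MyDateTime here casts the time column, which is what the question's expression needs in order to combine the two parts. The index on (MyDate, MyTime) can serve both the date filter and the ORDER BY, so a separate sort is typically avoided; check the actual plan to confirm.

```sql
-- Hypothetical copy of the question's table with a composite index.
CREATE TABLE #MyTableSplit (
    ID       int IDENTITY PRIMARY KEY,
    Whatever varchar(100) NULL,
    MyTime   time(2) NOT NULL,
    MyDate   date    NOT NULL,
    -- Days since 1900-01-01 added to the time cast as datetime2
    -- (which lands on 1900-01-01) gives the combined value.
    MyDateTime AS DATEADD(DAY, DATEDIFF(DAY, '19000101', MyDate),
                          CAST(MyTime AS datetime2(2)))
);
CREATE INDEX IX_MyTableSplit_Date_Time ON #MyTableSplit (MyDate, MyTime);

INSERT INTO #MyTableSplit (Whatever, MyTime, MyDate)
VALUES ('a', '08:30:00.25', '20240301'),
       ('b', '17:45:10.50', '20240301'),
       ('c', '09:00:00.00', '20240302');

-- Filter on the leading key, sort on both keys.
SELECT ID, MyDate, MyTime, MyDateTime
FROM #MyTableSplit
WHERE MyDate >= '20240301' AND MyDate < '20240303'
ORDER BY MyDate, MyTime;
```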
The answer isn't in your indexing, it's in your querying.
A single DateTime field should be used, or even SmallDateTime if that provides the range of dates and time resolution required by your application.
Index that column, then use queries like this:
SELECT * FROM MyTable WHERE
MyDate >= @startfilterdate
AND MyDate < DATEADD(d, 1, @endfilterdate);
By using < on the end filter, the query only includes rows from before midnight at the start of the day after the user-selected end date. This is simpler and more accurate than adding 23:59:59, especially since stored times can include fractional seconds between 23:59:59 and 00:00:00.
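A minimal sketch of the difference, assuming a hypothetical temp table #Events whose single datetime2(3) column keeps the answer's name MyDate, with made-up rows. One row sits half a second after 23:59:59 on the end date: the "append 23:59:59" version misses it, while the half-open range catches it and still excludes the next midnight.

```sql
-- Hypothetical single-column table with boundary values.
CREATE TABLE #Events (
    ID     int IDENTITY PRIMARY KEY,
    MyDate datetime2(3) NOT NULL
);
INSERT INTO #Events (MyDate)
VALUES ('2024-03-01 00:00:00.000'),
       ('2024-03-31 23:59:59.000'),
       ('2024-03-31 23:59:59.500'),  -- fractional second after 23:59:59
       ('2024-04-01 00:00:00.000');  -- next midnight

DECLARE @startfilterdate date = '20240301',
        @endfilterdate   date = '20240331';

-- End date plus 23:59:59 (86399 seconds) misses the .500 row: 2 rows.
SELECT COUNT(*) AS ClosedRangeCount
FROM #Events
WHERE MyDate BETWEEN @startfilterdate
                 AND DATEADD(SECOND, 86399, CAST(@endfilterdate AS datetime2(3)));

-- Half-open range covers every instant of the end date: 3 rows.
SELECT COUNT(*) AS HalfOpenCount
FROM #Events
WHERE MyDate >= @startfilterdate
  AND MyDate <  DATEADD(DAY, 1, @endfilterdate);
```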
Using persisted columns and indexes on them is a waste of server resources.

Resources