Which Transact-SQL query is most efficient? - sql-server

I plan to take the exam "Querying Microsoft SQL Server 2012".
There is one question that I have trouble understanding.
The question is:
Which Transact-SQL query should you use?
Your database contains a table named Purchases. The table includes a
DATETIME column named PurchaseTime that stores the date and time each
purchase is made. There is a non-clustered index on the PurchaseTime
column. The business team wants a report that displays the total
number of purchases made on the current day. You need to write a query
that will return the correct results in the most efficient manner.
Which Transact-SQL query should you use?
Possible answers are:
A.
SELECT COUNT(*)
FROM Purchases
WHERE PurchaseTime = CONVERT(DATE, GETDATE())
B.
SELECT COUNT(*)
FROM Purchases
WHERE PurchaseTime = GETDATE()
C.
SELECT COUNT(*)
FROM Purchases
WHERE CONVERT(VARCHAR, PurchaseTime, 112) = CONVERT(VARCHAR, GETDATE(), 112)
D.
SELECT COUNT(*)
FROM Purchases
WHERE PurchaseTime >= CONVERT(DATE, GETDATE())
AND PurchaseTime < DATEADD(DAY, 1, CONVERT(DATE, GETDATE()))
This is the source: Which Transact-SQL query should you use?
According to them, the correct answer is 'D'.
But I do not see why this is more efficient than 'A'.
In 'D' we call two functions (CONVERT and DATEADD).
Thanks for your help.

D is the most efficient because it does not convert the PurchaseTime column to another data type, which means SQL Server can use an index seek on the index defined on PurchaseTime.
A predicate like this, which compares the bare column with computed values, is known as a SARGable expression.
C wraps the column in CONVERT, so SQL Server cannot seek on the PurchaseTime index and has to evaluate the conversion for every row, which results in a full scan of an index or, on a heap (a table without a clustered index), possibly a table scan.
Queries A and B simply do not return the correct results: A matches only purchases stamped exactly at midnight today, and B matches only purchases whose PurchaseTime equals the exact moment the query runs.
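To see the difference in results (not in execution plans), here is a small sketch against a table variable. The @Purchases and @today names and the sample rows are invented for illustration and are not part of the exam question; it also assumes the batch does not run across midnight.

```sql
-- Hypothetical sample data: a table variable stands in for the Purchases table.
DECLARE @Purchases TABLE (PurchaseTime DATETIME NOT NULL);
DECLARE @today DATETIME = CONVERT(DATE, GETDATE());  -- midnight at the start of today

INSERT INTO @Purchases (PurchaseTime)
VALUES (DATEADD(SECOND, -1, @today)),  -- yesterday, 23:59:59
       (@today),                       -- today, exactly 00:00:00
       (DATEADD(MINUTE, 1, @today)),   -- today, 00:01:00
       (DATEADD(DAY, 1, @today));      -- tomorrow, exactly 00:00:00

-- Query A: equality against midnight, so only the 00:00:00 row can match.
SELECT COUNT(*) AS QueryA
FROM @Purchases
WHERE PurchaseTime = CONVERT(DATE, GETDATE());

-- Query D: half-open range [today, tomorrow), so both of today's rows match
-- and the row at tomorrow's midnight is excluded.
SELECT COUNT(*) AS QueryD
FROM @Purchases
WHERE PurchaseTime >= CONVERT(DATE, GETDATE())
  AND PurchaseTime < DATEADD(DAY, 1, CONVERT(DATE, GETDATE()));
```

A table variable has no index on PurchaseTime, so this only illustrates correctness. To compare seeks and scans you would need a real table with the non-clustered index and a look at the actual execution plans.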

Related

Easier way to count users in T-SQL

I'm using this query in SQL Server 2016 to determine how many users have logged into my system.
The users.lastaccess column contains a unix timestamp, so I use DATEDIFF() to convert it to a yyyy-mm-dd hh:mm:ss date.
SELECT
    COUNT(*) AS user_logins
FROM
    (SELECT
        ROW_NUMBER() OVER (ORDER BY lastaccess DESC) AS Row
    FROM
        users
    WHERE
        lastaccess > DATEDIFF(s, '1970-01-01 02:00:00', (SELECT CONVERT(DateTime, DATEDIFF(DAY, 0, GETDATE()))))
    ) AS t -- SQL Server requires an alias on a derived table
The result is a simple number, e.g. 75, representing the number of users who have been authenticated on the system.
The following code returns the count of users. It uses cast to drop the time-of-day from the value returned by GetDate and uses ISO 8601 for the base date/time of the unix system.
select Count(*) as User_Logins
from Users
where LastAccess > DateDiff( s, '1970-01-01T02:00:00', Cast( GetDate() as Date ) );
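As a rough sketch of what that comparison value is: the snippet below computes the cutoff two ways. It assumes lastaccess holds seconds since 1970-01-01 00:00:00 UTC and that the server's local time is two hours ahead of UTC, which is only my reading of the 02:00:00 base; the question does not say so. The @localMidnight name is made up.

```sql
-- Assumption: the 02:00:00 base compensates for a server clock at UTC+2.
DECLARE @localMidnight DATETIME = CAST(CAST(GETDATE() AS DATE) AS DATETIME);

SELECT
    @localMidnight AS LocalMidnight,
    -- Seconds from the shifted base to local midnight: the value compared with lastaccess.
    DATEDIFF(SECOND, '1970-01-01T02:00:00', @localMidnight) AS CutoffUnixSeconds,
    -- The same number, computed from the true epoch minus the assumed 7200-second offset.
    DATEDIFF(SECOND, '1970-01-01T00:00:00', @localMidnight) - 7200 AS SameCutoff;
```

Both columns should show the same number. Keeping the arithmetic on the GETDATE() side leaves lastaccess untouched, so an index on that column stays usable.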
Why do you need a subquery and a ROW_NUMBER() windowing function at all? And what is that oddball date-based WHERE clause? What are you really checking for: the fact that lastaccess is not null/empty?
Just use:
SELECT
COUNT(*) AS user_logins
FROM
dbo.users
WHERE
-- your WHERE condition isn't very clear - please add code as needed
-- but *DO NOT* convert dates to string to compare! Compare proper dates!
lastaccess IS NOT NULL
Also: if you have a non-nullable, narrow, fixed-width column in your dbo.Users table, you should have a nonclustered index on it (e.g. on lastaccess, if that column is not nullable). That could speed things up quite a bit.

SQL Active Users by day

I've got a table with CustomerID, StartDate and EndDate.
I'm trying to create a table with the following columns: Date, ActiveUsers.
The Date needs to be all dates between 01/01/2016 and today. ActiveUsers is a count of CustomerID where the Date falls between the StartDate and EndDate.
I hope all that makes sense.
I found code that gives me a list of dates but I have no idea how I can join my customers table to this result.
DECLARE @StartDateTime DATE
DECLARE @EndDateTime DATE
SET @StartDateTime = '2016-01-01'
SET @EndDateTime = GETDATE();
WITH DateRange(DateData) AS
(
SELECT @StartDateTime as Date
UNION ALL
SELECT DATEADD(d,1,DateData)
FROM DateRange
WHERE DateData < @EndDateTime -- using <= here would generate one extra day
)
SELECT dr.DateData
FROM DateRange dr
OPTION (MAXRECURSION 0)
GO
This is a simple left join, group by and count:
SELECT DateData, COUNT(CustomerID) as ActiveUsers
FROM DateRange AS D
LEFT JOIN Customers AS C
ON D.DateData >= C.StartDate
AND D.DateData <= C.EndDate
GROUP BY DateData
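Put together with the CTE from the question, a complete batch might look like the sketch below. The @Customers table variable and its two rows are invented stand-ins for the real Customers table, and the end date is fixed so the output stays small.

```sql
-- Hypothetical stand-in for the Customers table, with invented rows.
DECLARE @Customers TABLE (CustomerID INT, StartDate DATE, EndDate DATE);
INSERT INTO @Customers (CustomerID, StartDate, EndDate)
VALUES (1, '2016-01-01', '2016-01-03'),
       (2, '2016-01-02', '2016-01-05');

DECLARE @StartDateTime DATE = '2016-01-01';
DECLARE @EndDateTime DATE = '2016-01-06';  -- fixed end date instead of GETDATE()

WITH DateRange (DateData) AS
(
    SELECT @StartDateTime
    UNION ALL
    SELECT DATEADD(DAY, 1, DateData)
    FROM DateRange
    WHERE DateData < @EndDateTime
)
SELECT D.DateData, COUNT(C.CustomerID) AS ActiveUsers
FROM DateRange AS D
LEFT JOIN @Customers AS C
    ON D.DateData >= C.StartDate
   AND D.DateData <= C.EndDate
GROUP BY D.DateData
ORDER BY D.DateData
OPTION (MAXRECURSION 0);
```

Because COUNT(C.CustomerID) ignores the NULLs produced by the left join, a date with no active customers (2016-01-06 in this sample) still appears, with a count of 0.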
However, here's a free tip: Using a recursive cte for things like that is fine when the range is small, but if you find yourself having to use OPTION (MAXRECURSION 0) it means you are in danger of a performance hit because of the recursive cte and should replace it with a tally table based solution.
If you don't know what a tally table is, read Jeff Moden's The "Numbers" or "Tally" Table: What it is and how it replaces a loop.
If you don't already have a tally table, read What is the best way to create and populate a numbers table?
Having said that, date-related queries often benefit from having a pre-populated calendar table - such a table can save you from calculating weekends, national holidays etc., at a storage price that's practically negligible in modern servers.
Read Aaron Bertrand's Creating a date dimension or calendar table in SQL Server for a step-by-step explanation on how to create one for yourself.
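If you just want to try the idea before building a permanent tally or calendar table, one common pattern is to number the rows of a system view on the fly. This is only a sketch: the cross join of sys.all_objects is used purely as a convenient row source, and the @StartDate, @EndDate, and Tally names are mine.

```sql
DECLARE @StartDate DATE = '2016-01-01';
DECLARE @EndDate DATE = CAST(GETDATE() AS DATE);

-- Number as many rows as there are days in the range, starting at 0,
-- then turn each number into a date. No recursion is involved.
WITH Tally (N) AS
(
    SELECT TOP (DATEDIFF(DAY, @StartDate, @EndDate) + 1)
           CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 AS INT)
    FROM sys.all_objects AS a
    CROSS JOIN sys.all_objects AS b
)
SELECT DATEADD(DAY, N, @StartDate) AS DateData
FROM Tally
ORDER BY DateData;
```

The result can take the place of the DateRange CTE in the left join above.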

SQL Optimization where clause

Instead of using a function in the WHERE clause, can we do something different?
DATEADD takes time and gives poor performance, I guess.
How can I optimize this SQL?
SELECT cust_id, order_date, price
FROM customers
WHERE DATEADD(DD,50,order_date)>=GETDATE()
Don't run your function on order_date; run the inverse on getdate() instead:
select cust_id, order_date, price
from customers
where order_date>=dateadd(Day,-50,getdate())
Function calls on order_date are going to cause an index scan. If you instead run your function on the filter criteria, getdate(), you can preserve an index seek on this column (if it has an index).
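A self-contained way to try this out is sketched below. The #orders temp table, its index name, and the sample rows are all invented, and the cutoff is computed once into a variable so that the column is compared as-is.

```sql
-- Hypothetical temp table standing in for the table in the question.
CREATE TABLE #orders (cust_id INT, order_date DATETIME, price MONEY);
CREATE INDEX ix_orders_order_date ON #orders (order_date);

INSERT INTO #orders (cust_id, order_date, price)
VALUES (1, DATEADD(DAY, -10, GETDATE()), 9.99),
       (2, DATEADD(DAY, -49, GETDATE()), 5.00),
       (3, DATEADD(DAY, -60, GETDATE()), 7.50);

-- Compute the cutoff once; order_date itself is not wrapped in a function.
DECLARE @cutoff DATETIME = DATEADD(DAY, -50, GETDATE());

SELECT cust_id, order_date, price
FROM #orders
WHERE order_date >= @cutoff;  -- returns the rows for cust_id 1 and 2

DROP TABLE #orders;
```

With only three rows the optimizer may well choose a scan anyway, so compare the actual execution plans on a realistically sized table before drawing conclusions.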
SARGable functions in SQL Server - Rob Farley

How can I optimize a SQL query that performs a count nested inside a group-by clause?

I have a charting application that dynamically generates SQL Server queries to compute values for each series on a given chart. This generally works quite well, but I have run into a particular situation in which the generated query is very slow. The query looks like this:
SELECT
[dateExpr] AS domainValue,
(SELECT COUNT(*) FROM table1 WHERE [dateExpr]=[dateExpr(maintable)] AND column2='A') AS series1
FROM table1 maintable
GROUP BY [dateExpr]
ORDER BY domainValue
I have abbreviated [dateExpr] because it's a combination of CAST and DATEPART functions that convert a datetime field to a string in the form of 'yyyy-MM-dd' so that I can easily group by all values in a calendar day. The query above returns both those yyyy-MM-dd values as labels for the x-axis of the chart and the values from the data series "series1" to display on the chart. The data series is supposed to count the number of records that fall into that calendar day that also contain a certain value in [column2]. The "[dateExpr]=[dateExpr(maintable)]" expression looks like this:
CAST(DATEPART(YEAR,dateCol) AS VARCHAR)+'-'+CAST(DATEPART(MONTH,dateCol) AS VARCHAR) =
CAST(DATEPART(YEAR,maintable.dateCol) AS VARCHAR)+'-'+CAST(DATEPART(MONTH,maintable.dateCol) AS VARCHAR)
with an additional term for the day (omitted above for the sake of space). That is the source of the slowness of the query, but I don't know how to rewrite the query so that it returns the same result more efficiently. I have complete control over the generation of the query, so if I could find more efficient SQL that returned the same results, I could modify the query generator appropriately. Any pointers would be greatly appreciated.
I haven't tested this, but I think it can be done with:
SELECT
[dateExpr] AS domainValue,
SUM (CASE WHEN column2='A' THEN 1 ELSE 0 END) AS series1
FROM table1 maintable
GROUP BY [dateExpr]
ORDER BY domainValue
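Here is a runnable sketch of that conditional aggregation. It swaps the string-building [dateExpr] for CAST(dateCol AS DATE), which is my substitution and needs SQL Server 2008 or later; the @table1 table variable and its rows are invented.

```sql
-- Hypothetical sample rows standing in for table1.
DECLARE @table1 TABLE (dateCol DATETIME, column2 CHAR(1));
INSERT INTO @table1 (dateCol, column2)
VALUES ('2024-01-01T08:00:00', 'A'),
       ('2024-01-01T17:30:00', 'B'),
       ('2024-01-01T23:59:00', 'A'),
       ('2024-01-02T09:15:00', 'B');

-- One pass over the table: each row contributes 1 to series1 only when column2 = 'A'.
SELECT CAST(dateCol AS DATE) AS domainValue,
       SUM(CASE WHEN column2 = 'A' THEN 1 ELSE 0 END) AS series1
FROM @table1
GROUP BY CAST(dateCol AS DATE)
ORDER BY domainValue;
```

With this data, 2024-01-01 gets a series1 of 2 and 2024-01-02 gets 0. Days with no matching rows still appear, just as in the original correlated-subquery version.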
The fastest way to do this would be to use a calendar table. Create a SQL table with one row for every month, covering as many years ahead as you expect to need. Then select from that calendar table, joining in the rows from table1 whose dates fall between the start and end date of the month. If the clustered index on table1 is on dateCol, the query will run very quickly.
EDIT: Example query. This assumes a months table exists with two columns, StartDate and EndDate, where EndDate is midnight on the first day of the next month. The clustered index on the months table should be on StartDate.
SELECT
months.StartDate,
COUNT(*) AS [Count]
FROM months
INNER JOIN table1
ON table1.dateCol >= months.StartDate AND table1.dateCol < months.EndDate
GROUP BY months.StartDate;
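For completeness, one possible way to build such a months table is sketched here. It uses a temp table named #months, and the first month and the ten-year span are arbitrary choices of mine; a permanent table would be created the same way.

```sql
-- The primary key gives the clustered index on StartDate that the answer asks for.
CREATE TABLE #months
(
    StartDate DATETIME NOT NULL PRIMARY KEY CLUSTERED,
    EndDate   DATETIME NOT NULL  -- midnight on the first day of the next month
);

DECLARE @first DATETIME = '2020-01-01';  -- hypothetical first month
DECLARE @i INT = 0;

WHILE @i < 120  -- ten years of months, chosen arbitrarily
BEGIN
    INSERT INTO #months (StartDate, EndDate)
    VALUES (DATEADD(MONTH, @i, @first), DATEADD(MONTH, @i + 1, @first));
    SET @i += 1;
END;
```

A loop is acceptable here because the table is populated once. Each row's EndDate equals the next row's StartDate, which is what makes the half-open join condition (>= StartDate AND < EndDate) count every row exactly once.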
With Calendar As
(
Select DateAdd(d, DateDiff(d, 0, Min( dateCol ) ), 0) As [date]
From Table1
Union All
Select DateAdd(d, 1, [date])
From Calendar
Where [date] <= (
Select Max( DateAdd(d, DateDiff(d, 0, dateCol) + 1, 0) )
From Table1
)
)
Select C.date, Count(Table1.PK) As Total
From Calendar As C
Left Join Table1
On Table1.dateCol >= C.date
And Table1.dateCol < DateAdd(d, 1, C.date )
And Table1.column2 = 'A'
Group By C.date
Option (Maxrecursion 0);
Rather than try to force the display format in SQL, you should do that in your report or chart generator. However, what you can do in the SQL is to strip the time portion from the datetime values as I've done in my solution.

Problem with creating Indexed View AND Group BY in SQL Server 2008 R2

I want to create an indexed view with T-SQL like this:
Select
Table1_ID,
cast(CONVERT(varchar(8),
t2.Object_CreationDate, 112) AS DateTime) as Object_CreationDate,
Count_BIG(*) as ObjectTotalCount
from
[dbo].Table2 t2 inner join [dbo].Table1 t1 on ...
Group BY
Table1_ID, CONVERT(varchar(8), t2.Object_CreationDate, 112)
I need to group only by the date part of the column Object_CreationDate (type datetime2).
I also want to create an index on the columns Theme_Id and Object_CreationDate of the view.
If I use cast(CONVERT(varchar(8), m.Mention_CreationDate, 112) AS DateTime) in the SELECT, I get problems with the index on this column, because the resulting column (Object_CreationDate) is not deterministic.
Is there a way to solve this problem?
Replace ...
CONVERT(varchar(8), t2.Object_CreationDate, 112)
... with
DATEADD(day, DATEDIFF(day, 0, t2.Object_CreationDate), 0)
--OR
CAST(t2.Object_CreationDate AS date)
The second form works only on SQL Server 2008 and later; the first is more general.
Both remove the time component from a datetime value while staying in the date/datetime data type domain, without any intermediate locale-dependent datetime formats.
See these answers: One and Two (comments)
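A quick way to check what the two expressions return is sketched below. The @created variable and its value are invented, and it is declared as DATETIME rather than datetime2 to keep the example simple.

```sql
DECLARE @created DATETIME = '2012-03-04T15:16:17';

SELECT
    -- Whole days since day 0 (1900-01-01), added back to day 0: a datetime at midnight.
    DATEADD(day, DATEDIFF(day, 0, @created), 0) AS ViaDateDiff,  -- 2012-03-04 00:00:00.000
    -- Direct conversion to the date type (SQL Server 2008 and later).
    CAST(@created AS date) AS ViaCast;                           -- 2012-03-04
```

Neither expression goes through a string, so the result does not depend on language or date format settings.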
