SQL Optimization where clause - sql-server

Can we avoid using a function in the WHERE clause?
I suspect the DATEADD call is what makes this query slow.
How can I optimize this SQL?
SELECT cust_id, order_date, price
FROM customers
WHERE DATEADD(DD,50,order_date)>=GETDATE()

Don't run your function on order_date; run the inverse on GETDATE() instead:
select cust_id, order_date, price
from customers
where order_date>=dateadd(Day,-50,getdate())
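Wrapping order_date in a function call makes the predicate non-SARGable, so SQL Server has to evaluate the function for every row and falls back to an index scan. If you apply the function to the filter value GETDATE() instead, the comparison is made against the bare column and an index seek remains possible (provided order_date is indexed).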
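One way to check the difference on your own data is to compare logical reads for both forms. This is only a sketch: the index name IX_customers_order_date is hypothetical, and the post does not say whether an index on order_date already exists. Table and column names are taken from the question.

```sql
-- Assumption: there is no index on order_date yet. The index name is made up
-- for this sketch; INCLUDE lets both queries be answered from the index alone.
CREATE NONCLUSTERED INDEX IX_customers_order_date
    ON customers (order_date)
    INCLUDE (cust_id, price);

SET STATISTICS IO ON;

-- Original form: the function wraps the column, so every row must be evaluated.
SELECT cust_id, order_date, price
FROM customers
WHERE DATEADD(DAY, 50, order_date) >= GETDATE();

-- Rewritten form: the function is applied to GETDATE() only,
-- so the bare column can be matched against the index.
SELECT cust_id, order_date, price
FROM customers
WHERE order_date >= DATEADD(DAY, -50, GETDATE());

SET STATISTICS IO OFF;
```

Both queries return the same rows, because shifting by a whole number of days can be moved to the other side of the comparison. The Messages tab then shows the logical reads for each statement, and the execution plans show whether a scan or a seek was chosen.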
SARGable functions in SQL Server - Rob Farley

Related

Add computed column using subquery

In SQL Server 2000, I want to add a computed column which basically is MAX(column1).
Of course I get an error because subqueries are not allowed.
What I basically try to do is to get the max(dateandtime) of a number of tables of my database.
However, my code takes too long to run because this is a very old, badly designed database with no keys or indexes.
So I believe that adding a computed column holding MAX(dateandtime) would make my query much faster, because I could query
(SELECT TOP 1 newcomputedcolumn FROM Mytable)
and I will not have to do
(SELECT TOP 1 dateandtime FROM Mytable
ORDER BY dateandtime DESC)
or
(SELECT MAX(dateandtime) FROM Mytable)
which takes too long.
Any ideas? Thanks a lot.

Datediff last and previous dates SQL

I'm learning SQL, and for an exercise I have to do several things.
I'm making a query to compare the most recent orderdate with the orderdate before. I want to use a correlated subquery for this. I have already made it using a Cross Apply and Window functions.
At the moment I have this:
select
b1.klantnr,
DATEDIFF(D, (Select MAX(b1.Besteldatum)),
(Select MAX(b1.Besteldatum)
where besteldatum not in (Select MAX(b1.besteldatum)))) as verschil
from
bestelling b1
group by
b1.klantnr, b1.besteldatum
I only get NULL values in the datediff column. It should return this:
[Image: Results]
I'm using SQL Server 2014 Management Studio.
Any help appreciated.
Here is one simple way:
select datediff(day, min(bs.Besteldatum), max(bs.Besteldatum)) as most_recent_diff
from (select top (2) bs.*
      from bestelling bs
      order by bs.Besteldatum desc
     ) bs;
This uses a subquery, but not a correlated subquery. It should perform very well if you have an index on bestelling(Besteldatum).
A correlated subquery approach:
select top 1 bs.*,datediff(day,
(select max(bs1.Besteldatum)
from bestelling bs1
where bs1.Besteldatum<bs.Besteldatum),
bs.Besteldatum
) as diff
from bestelling bs
order by bs.Besteldatum desc
This returns only the difference between the latest date and the date preceding it. If you need the difference for every row, remove top 1 from the query.

Which Transact-SQL query is most efficient?

I plan to take the exam "Querying Microsoft SQL Server 2012".
There is one question I have trouble understanding.
Question is:
Which Transact-SQL query should you use?
Your database contains a table named Purchases. The table includes a
DATETIME column named PurchaseTime that stores the date and time each
purchase is made. There is a non-clustered index on the PurchaseTime
column. The business team wants a report that displays the total
number of purchases made on the current day. You need to write a query
that will return the correct results in the most efficient manner.
Which Transact-SQL query should you use?
Possible answers are:
A.
SELECT COUNT(*)
FROM Purchases
WHERE PurchaseTime = CONVERT(DATE, GETDATE())
B.
SELECT COUNT(*)
FROM Purchases
WHERE PurchaseTime = GETDATE()
C.
SELECT COUNT(*)
FROM Purchases
WHERE CONVERT(VARCHAR, PurchaseTime, 112) = CONVERT(VARCHAR, GETDATE(), 112)
D.
SELECT COUNT(*)
FROM Purchases
WHERE PurchaseTime >= CONVERT(DATE, GETDATE())
AND PurchaseTime < DATEADD(DAY, 1, CONVERT(DATE, GETDATE()))
Source: Which Transact-SQL query should you use?
According to them, the correct answer is 'D'.
But I do not see why it is more efficient than 'A'.
In 'D' we call two functions (CONVERT and DATEADD).
Thanks for help.
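D is the most efficient because it does not convert the PurchaseTime column to another data type or wrap it in a function, so SQL Server can seek on the index defined on PurchaseTime. The two function calls in D are applied only to GETDATE(), not to the column.
Such a predicate is called SARGable (search-argument-able).
C applies CONVERT to PurchaseTime, so SQL Server cannot seek on the index and has to convert and compare the value for every row, which means scanning every row instead of seeking (a full index scan, or a table scan if the table is a heap, that is, a table without a clustered index).
Queries A and B do not return the correct results at all: A matches only rows whose PurchaseTime is exactly midnight today, and B matches only rows whose PurchaseTime equals the exact moment the query runs.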
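The difference in results between A and D can be seen with a small throwaway table. This is a sketch only: the temp table #Purchases and its three rows are invented for illustration and stand in for the exam's Purchases table.

```sql
-- Hypothetical demo data, not from the exam question.
CREATE TABLE #Purchases (PurchaseTime datetime NOT NULL);
CREATE NONCLUSTERED INDEX IX_Purchases_PurchaseTime
    ON #Purchases (PurchaseTime);

DECLARE @midnight datetime = CONVERT(date, GETDATE());

INSERT INTO #Purchases (PurchaseTime)
VALUES (@midnight),                      -- exactly 00:00 today
       (DATEADD(HOUR, 9, @midnight)),    -- 09:00 today
       (DATEADD(HOUR, -1, @midnight));   -- 23:00 yesterday

-- Query A: equality against midnight, so only the 00:00 row matches (count 1).
SELECT COUNT(*) AS count_a
FROM #Purchases
WHERE PurchaseTime = CONVERT(DATE, GETDATE());

-- Query D: half-open range [today 00:00, tomorrow 00:00), both rows from today match (count 2).
SELECT COUNT(*) AS count_d
FROM #Purchases
WHERE PurchaseTime >= CONVERT(DATE, GETDATE())
  AND PurchaseTime < DATEADD(DAY, 1, CONVERT(DATE, GETDATE()));

DROP TABLE #Purchases;
```

The range uses >= for the lower bound and < for the upper bound, so a purchase made at exactly midnight tomorrow is not counted as today's.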

SQL Server query performance when there's a dynamic condition in where clause

Let's say we have query A:
SELECT count(1) FROM MyTable WHERE Date_Created < DATEADD(DD, 3, GETDATE())
AND query B:
SELECT count(1) FROM MyTable WHERE Date_Created < '2013-05-24'
When those queries are run, how does the compiler optimize query A? Does it re-evaluate DATEADD and GETDATE for each row in MyTable?
The reason I am asking is because I ran several tests to see which queries are faster and the result seems to indicate that there's no huge difference in the performance of the two, which is kinda counter-intuitive. Thanks.
GETDATE() is a runtime constant: it is evaluated once for the query, not once per row.
Chances are that the whole DATEADD expression is evaluated only once as well, which is why query A performs about the same as query B.
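A quick way to observe this behaviour yourself is to select GETDATE() across many rows and count the distinct values. This is a rough sketch; sys.all_objects is used only as a convenient source of rows, and the column aliases are made up.

```sql
-- If GETDATE() were re-evaluated per row, a long-running scan could
-- produce several different timestamps. As a runtime constant it
-- should yield a single distinct value for the whole statement.
SELECT COUNT(*)                    AS row_count,
       COUNT(DISTINCT d.now_value) AS distinct_getdate_values
FROM (SELECT GETDATE() AS now_value
      FROM sys.all_objects) AS d;
```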

How can I efficiently compute the MAX of one column, ordered by another column?

I have a table schema similar to the following (simplified):
CREATE TABLE Transactions
(
TransactionID int NOT NULL IDENTITY(1, 1) PRIMARY KEY CLUSTERED,
CustomerID int NOT NULL, -- Foreign key, not shown
TransactionDate datetime NOT NULL,
...
)
CREATE INDEX IX_Transactions_Customer_Date
ON Transactions (CustomerID, TransactionDate)
To give a bit of background here, this transaction table is actually consolidating several different types of transactions from another vendor's database (we'll call it an ETL process), and I therefore don't have a great deal of control over the order in which they get inserted. Even if I did, transactions may be backdated, so the important thing to note here is that the maximum TransactionID for any given customer is not necessarily the most recent transaction.
In fact, the most recent transaction is a combination of the date and the ID. Dates are not unique - the vendor often truncates the time of day - so to get the most recent transaction, I have to first find the most recent date, and then find the most recent ID for that date.
I know that I can do this with a windowing query (ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY TransactionDate DESC, TransactionID DESC)), but this requires a full index scan and a very expensive sort, and thus fails miserably in terms of efficiency. It's also pretty awkward to keep writing all the time.
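For reference, the windowing approach mentioned here is usually written along the following lines. This is a sketch against the Transactions schema above; the CTE name Ranked and the alias rn are made up.

```sql
-- Number each customer's transactions from newest to oldest,
-- breaking ties on the date by the highest TransactionID.
WITH Ranked AS (
    SELECT CustomerID,
           TransactionID,
           ROW_NUMBER() OVER (
               PARTITION BY CustomerID
               ORDER BY TransactionDate DESC, TransactionID DESC
           ) AS rn
    FROM Transactions
)
SELECT CustomerID, TransactionID
FROM Ranked
WHERE rn = 1;   -- keep only the newest transaction per customer
```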
Slightly more efficient is using two CTEs or nested subqueries, one to find the MAX(TransactionDate) per CustomerID, and another to find the MAX(TransactionID). Again, it works, but requires a second aggregate and join, which is slightly better than the ROW_NUMBER() query but still rather painful performance-wise.
I've also considered using a CLR User-Defined Aggregate and will fall back on that if necessary, but I'd prefer to find a pure SQL solution if possible to simplify the deployment (there's no need for SQL-CLR anywhere else in this project).
So the question, specifically is:
Is it possible to write a query that will return the newest TransactionID per CustomerID, defined as the maximum TransactionID for the most recent TransactionDate, and achieve a plan equivalent in performance to an ordinary MAX/GROUP BY query?
(In other words, the only significant steps in the plan should be an index scan and stream aggregate. Multiple scans, sorts, joins, etc. are likely to be too slow.)
The most useful index might be:
CustomerID, TransactionDate desc, TransactionId desc
Then you could try a query like this:
select a.CustomerID
, b.TransactionID
from (
select distinct
CustomerID
from YourTable
) a
cross apply
(
select top 1
TransactionID
from YourTable
where CustomerID = a.CustomerID
order by
TransactionDate desc,
TransactionId desc
) b
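The index suggested in this answer could be created roughly as follows. The index name is hypothetical, and the statement targets the Transactions table from the question rather than the placeholder YourTable.

```sql
-- Assumed index name. With this key order, the TOP 1 ... ORDER BY
-- TransactionDate DESC, TransactionID DESC lookup for a given CustomerID
-- can read the first matching index row instead of sorting.
CREATE NONCLUSTERED INDEX IX_Transactions_Customer_Date_ID
    ON Transactions (CustomerID, TransactionDate DESC, TransactionID DESC);
```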
How about something like this, where you force the optimizer to calculate the derived table first? In my tests, this was less expensive than the two MAX comparisons.
Select T.CustomerId, T.TransactionDate, Max(TransactionId)
From Transactions As T
Join (
Select T1.CustomerID, Max(T1.TransactionDate) As MaxDate
From Transactions As T1
Group By T1.CustomerId
) As Z
On Z.CustomerId = T.CustomerId
And Z.MaxDate = T.TransactionDate
Group By T.CustomerId, T.TransactionDate
Disclaimer: Thinking out loud :)
Could you have an indexed, computed column that combines the TransactionDate and TransactionID columns into a form that means finding the latest transaction is just a case of finding the MAX of that single field?
This one seemed to have good performance statistics:
SELECT
T1.customer_id,
MAX(T1.transaction_id) AS transaction_id
FROM
dbo.Transactions T1
INNER JOIN
(
SELECT
T2.customer_id,
MAX(T2.transaction_date) AS max_dt
FROM
dbo.Transactions T2
GROUP BY
T2.customer_id
) SQ1 ON
SQ1.customer_id = T1.customer_id AND
T1.transaction_date = SQ1.max_dt
GROUP BY
T1.customer_id
I think I actually figured it out. @Ada had the right idea and I had the same idea myself, but was stuck on how to form a single composite ID and avoid the extra join.
Since both dates and (positive) integers are byte-ordered, they can not only be concatenated into a BLOB for aggregation but also separated after the aggregate is done.
This feels a little unholy, but it seems to do the trick:
SELECT
CustomerID,
CAST(SUBSTRING(MAX(
CAST(TransactionDate AS binary(8)) +
CAST(TransactionID AS binary(4))),
9, 4) AS int) AS TransactionID
FROM Transactions
GROUP BY CustomerID
That gives me a single index scan and stream aggregate. There is no need for any additional indexes either. It performs the same as just doing MAX(TransactionID), which makes sense, since all of the concatenation happens inside the aggregate itself.
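The packing trick can be tried on a few rows to see that it picks the highest TransactionID on the latest date. This is a sketch with invented sample data in a temp table. It assumes positive IDs, as the answer states, and additionally assumes dates on or after 1900-01-01, since earlier datetime values are stored as negative day counts and would likely not sort correctly as raw bytes.

```sql
-- Hypothetical sample data for illustration only.
CREATE TABLE #Transactions (
    TransactionID   int      NOT NULL PRIMARY KEY,
    CustomerID      int      NOT NULL,
    TransactionDate datetime NOT NULL
);

INSERT INTO #Transactions (TransactionID, CustomerID, TransactionDate)
VALUES (1, 100, '20130501'),
       (2, 100, '20130503'),
       (3, 100, '20130503'),   -- same date as ID 2, higher ID
       (4, 100, '20130502'),   -- highest ID for customer 100, but backdated
       (5, 200, '20130410');

-- Bytes 1-8 hold the date, bytes 9-12 hold the ID, so MAX compares
-- the date first and the ID second; SUBSTRING then extracts the ID.
SELECT CustomerID,
       CAST(SUBSTRING(MAX(
           CAST(TransactionDate AS binary(8)) +
           CAST(TransactionID AS binary(4))),
           9, 4) AS int) AS TransactionID
FROM #Transactions
GROUP BY CustomerID;
-- Customer 100 should map to TransactionID 3, customer 200 to TransactionID 5.

DROP TABLE #Transactions;
```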
