I am seeing some strange query speed results when using a view with an outer apply. I am doing a distinct count on 2 different columns in the view: one completes in less than 0.1 seconds, the other takes 4-6 seconds. Is the second query slower because its column comes from the outer apply? If so, how could I speed this query up?
The fast distinct count is -
SELECT DISTINCT ISNULL([ItemType], 'N/A') AS Items FROM vwCustomerItemDetailsFull
The slow distinct count is -
SELECT DISTINCT ISNULL([CustomerName], 'N/A') AS Items FROM vwCustomerItemDetailsFull
The view is -
SELECT I.ItemID,
IT.Name AS ItemType,
CASE
WHEN CustomerItemEndDate IS NULL
OR CustomerItemEndDate > GETDATE() THEN CustomerItems.CustomerName
ELSE NULL
END AS CustomerName,
CASE
WHEN CustomerItemEndDate IS NULL
OR CustomerItemEndDate > GETDATE() THEN CustomerItems.CustomerNumber
ELSE NULL
END AS CustomerNumber,
CASE
WHEN CustomerItemEndDate IS NULL
OR CustomerItemEndDate > GETDATE() THEN CustomerItems.CustomerItemStartDate
ELSE NULL
END AS CustomerItemStartDate
FROM tblItems I
INNER JOIN tblItemTypes IT
ON I.ItemTypeID = IT.ItemTypeID
OUTER APPLY (SELECT TOP 1 CustomerName,
CustomerNumber,
StartDate AS CustomerItemStartDate,
EndDate AS CustomerItemEndDate
FROM tblCustomerItems CI
INNER JOIN tblCustomers C
ON C.CustomerID = CI.CustomerID
WHERE CI.ItemID = I.ItemID
ORDER BY EndDate DESC) AS CustomerItems
Check the execution plan. This speed difference is not strange at all: because it is an outer apply rather than a cross apply, and its results are limited to TOP 1, the apply can neither add nor remove rows, so it has no influence on the number of rows the query returns or on the column ItemType.
Therefore, when you select from the view and don't use any columns from the outer apply, the optimiser is smart enough to know it doesn't need to execute it. So in essence your first query is:
SELECT DISTINCT ISNULL([ItemType], 'N/A') AS Items
FROM ( SELECT I.ItemID,
IT.Name AS ItemType
FROM tblItems I
INNER JOIN tblItemTypes IT
ON I.ItemTypeID = IT.ItemTypeID
) vw
Whereas your second query has to execute the outer apply.
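One way to check this on your own system, besides reading the execution plan, is to compare the I/O statistics of the two queries. This is only a sketch: it assumes you run it in SSMS against the view from the question and read the output on the Messages tab.

```sql
-- Report the tables read by each statement (output appears on the Messages tab).
SET STATISTICS IO ON;

-- If the outer apply is eliminated, tblCustomerItems and tblCustomers
-- should not be listed for this statement.
SELECT DISTINCT ISNULL([ItemType], 'N/A') AS Items FROM vwCustomerItemDetailsFull;

-- This statement needs a column from the apply, so both tables should be listed.
SELECT DISTINCT ISNULL([CustomerName], 'N/A') AS Items FROM vwCustomerItemDetailsFull;

SET STATISTICS IO OFF;
```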
I have previously posted a longer answer which could also be helpful.
EDIT
If you wanted to change your query to a JOIN it could be rewritten as follows:
SELECT I.ItemID,
IT.Name AS ItemType,
CustomerName,
CustomerNumber,
CustomerItemStartDate
FROM tblItems I
INNER JOIN tblItemTypes IT
ON I.ItemTypeID = IT.ItemTypeID
LEFT JOIN
( SELECT ci.ItemID,
CustomerName,
CustomerNumber,
StartDate AS CustomerItemStartDate,
EndDate AS CustomerItemEndDate,
RN = ROW_NUMBER() OVER (PARTITION BY ci.ItemID ORDER BY EndDate DESC)
FROM tblCustomerItems CI
INNER JOIN tblCustomers C
ON C.CustomerID = CI.CustomerID
) AS CustomerItems
ON CustomerItems.ItemID = I.ItemID
AND CustomerItems.rn = 1
AND (CustomerItems.CustomerItemEndDate IS NULL
OR CustomerItems.CustomerItemEndDate > GETDATE());
However, I don't think this will improve performance much, since you said the most costly part is the sort on EndDate. For your first query it will actually hurt performance, because the optimiser can no longer eliminate the subquery the way it eliminated the outer apply.
I expect the best way to improve performance will be adding indexes. Without knowing your data size or distribution I can't accurately guess the exact index you need; if you run the query on its own with the actual execution plan shown, SSMS will suggest an index for you, which would be better than my best guess.
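Purely as an illustration of the kind of index meant here, a starting point might look like the sketch below. The index name is made up, and it assumes StartDate and EndDate live on tblCustomerItems; treat the suggestion from the actual execution plan as more reliable than this.

```sql
-- Hypothetical index: lets the TOP 1 ... ORDER BY EndDate DESC lookup per ItemID
-- read one row from the index instead of sorting.
-- Assumes StartDate and EndDate are columns of tblCustomerItems.
CREATE NONCLUSTERED INDEX IX_tblCustomerItems_ItemID_EndDate
ON tblCustomerItems (ItemID, EndDate DESC)
INCLUDE (CustomerID, StartDate);
```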
Related
I have one source table in DB. I need to do group and sum to get one bridging table, extract supplier info on the other bridging table then join the two using part_number.
If I run the subqueries separately, T1 gives me 54699 records and T2 gives approx 10 times rows of T1.
Next, I do a left join. I expect it to return 54699 records, but the query never finishes, and it has returned 50 million records by the time I scroll down to the end, so I have to stop the query manually. I realized there must be something wrong with my query, but I cannot figure it out. I would appreciate it if you have any ideas. Thank you!
SELECT
T1.*, T2.SUPPLIER
FROM
(SELECT
T.PART_NUMBER,T.YEAR, T.WEEK,
SUM(T.QTY_FILLED) TOTAL_FILLED,
SUM(T.QTY_ORDERED) TOTAL_ORDERED,
COUNT(T.LINE_NUMBER) ORDER_TIMES
FROM
DBO.TABLE1 T
WHERE
T.YEAR IS NOT NULL
GROUP BY
PART_NUMBER, T.YEAR, T.WEEK) T1
LEFT JOIN
(SELECT
T.PART_NUMBER, T.SUPPLIER
FROM
DBO.TABLE1 T) T2 ON T1.PART_NUMBER = T2.PART_NUMBER
ORDER BY
T1.PART_NUMBER, T1.YEAR, T1.WEEK
I also tried rewriting it with CTEs, but still no luck.
WITH T1 AS
(
SELECT
T.PART_NUMBER,T.YEAR, T.WEEK,
SUM(T.QTY_FILLED) TOTAL_FILLED,
SUM(T.QTY_ORDERED) TOTAL_ORDERED,
COUNT(T.LINE_NUMBER) ORDER_TIMES
FROM
DBO.TABLE1 T
WHERE
T.YEAR IS NOT NULL
GROUP BY
PART_NUMBER, T.YEAR, T.WEEK
), T2 AS
(
SELECT T.PART_NUMBER, T.SUPPLIER
FROM DBO.TABLE1 T
)
SELECT
T1.*, T2.SUPPLIER
FROM
T1
LEFT JOIN
T2 ON T1.PART_NUMBER = T2.PART_NUMBER
ORDER BY
T1.PART_NUMBER, T1.YEAR, T1.WEEK
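First of all, the join will not return only 54699 rows. T2 is not deduplicated, so every row of T1 is matched to every row of TABLE1 that has the same PART_NUMBER. The result can therefore be many times larger than T1, and how much larger depends on how many rows each part number has in your table.
If you use SQL Server 2017 or newer, try something like this: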
SELECT
T.PART_NUMBER,T.YEAR, T.WEEK,
SUM(T.QTY_FILLED) TOTAL_FILLED,
SUM(T.QTY_ORDERED) TOTAL_ORDERED,
COUNT(T.LINE_NUMBER) ORDER_TIMES,
STRING_AGG (SUPPLIER, ', ') AS SUPPLIER
FROM
DBO.TABLE1 T
WHERE
T.YEAR IS NOT NULL
GROUP BY
PART_NUMBER, T.YEAR, T.WEEK
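If you would rather keep the join, a minimal sketch is to deduplicate the supplier side before joining. It reuses the table and column names from the question and assumes each PART_NUMBER has a single SUPPLIER; a part with several suppliers would still produce one output row per supplier.

```sql
SELECT T1.*, T2.SUPPLIER
FROM (SELECT T.PART_NUMBER, T.YEAR, T.WEEK,
             SUM(T.QTY_FILLED) AS TOTAL_FILLED,
             SUM(T.QTY_ORDERED) AS TOTAL_ORDERED,
             COUNT(T.LINE_NUMBER) AS ORDER_TIMES
      FROM DBO.TABLE1 T
      WHERE T.YEAR IS NOT NULL
      GROUP BY T.PART_NUMBER, T.YEAR, T.WEEK) T1
-- DISTINCT collapses the supplier side to one row per part/supplier pair,
-- so the join no longer multiplies rows by the number of order lines.
LEFT JOIN (SELECT DISTINCT T.PART_NUMBER, T.SUPPLIER
           FROM DBO.TABLE1 T) T2
       ON T1.PART_NUMBER = T2.PART_NUMBER
ORDER BY T1.PART_NUMBER, T1.YEAR, T1.WEEK;
```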
I have the below SQL query that is attempting to return the Last Transaction date for a specific part. The subquery that I'm left joining runs fine when I run it by itself (with the part specific criteria)
SELECT TOP 1 S1.*
FROM PartTran S1
WHERE S1.TranDate > '10/10/2016' AND S1.TranType <> 'ADJ-CST' AND S1.PartNum = '0000AAAO' ORDER BY S1.TranDate DESC
However, when I join this into my main query, it's returning null.
SELECT T1.PartNum, T2.TranDate, T2.TranType
FROM dbo.Part T1
LEFT JOIN (SELECT TOP 1 S1.* FROM PartTran S1 WHERE S1.TranDate > '10/10/2016' AND S1.TranType <> 'ADJ-CST' ORDER BY S1.TranDate DESC) T2 ON T1.Company = T2.Company AND T1.PartNum = T2.PartNum
WHERE T1.PartNum = '0000AAAO'
Am I missing something here?
Can you please check the following query:
SELECT
T1.PartNum,
T2.TranDate,
T2.TranType
FROM dbo.Part T1
LEFT JOIN
(
SELECT TOP 1 S1.*
FROM PartTran S1
WHERE S1.TranDate > '10/10/2016'
AND S1.TranType <> 'ADJ-CST'
AND S1.PartNum = '0000AAAO'
-- I think the filter above (AND S1.PartNum = '0000AAAO') is required;
-- otherwise TOP 1 can select a record that belongs to another PartNum and
-- your left join will logically return NULL
ORDER BY S1.TranDate DESC
) T2 ON T1.Company = T2.Company
AND T1.PartNum = T2.PartNum
WHERE T1.PartNum = '0000AAAO';
The reason your original query doesn't work has to do with order of operations.
The derived table T2 results in one and only one record, not one record per part number, because the derived table obtains its results BEFORE it is joined to T1. Unless that single row happened to match on part and company, you would get no data. A cross/outer apply lets you get the TOP record per join criterion, so it returns multiple records, one for each part and company, instead of just one.
I think you're after a cross or outer apply, which lets you avoid the second filter in the derived table (T2). If you want to keep parts without any transactions, use outer apply; if you only want parts with transactions, use cross apply.
SELECT T1.PartNum, T2.TranDate, T2.TranType
FROM dbo.Part T1
CROSS APPLY (SELECT TOP 1 S1.*
FROM PartTran S1
-- APPLY takes no ON clause; the correlation goes in the WHERE
WHERE S1.Company = T1.Company
AND S1.PartNum = T1.PartNum
AND S1.TranDate > '10/10/2016'
AND S1.TranType <> 'ADJ-CST'
ORDER BY S1.TranDate DESC) T2
WHERE T1.PartNum = '0000AAAO'
Alternatively, you could use ROW_NUMBER() instead of TOP: partition by Company and PartNum, order by TranDate descending, and keep only the rows numbered 1.
Here's an MSDN doc link showing how cross/outer apply works.
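The row-number alternative could be sketched roughly as follows, reusing the tables and filters from the question. The alias rn is introduced here for illustration; the LEFT JOIN keeps parts without matching transactions, and an INNER JOIN would drop them.

```sql
SELECT T1.PartNum, T2.TranDate, T2.TranType
FROM dbo.Part T1
LEFT JOIN (SELECT S1.Company, S1.PartNum, S1.TranDate, S1.TranType,
                  -- Number each part's transactions, newest first
                  ROW_NUMBER() OVER (PARTITION BY S1.Company, S1.PartNum
                                     ORDER BY S1.TranDate DESC) AS rn
           FROM PartTran S1
           WHERE S1.TranDate > '10/10/2016'
             AND S1.TranType <> 'ADJ-CST') T2
       ON T1.Company = T2.Company
      AND T1.PartNum = T2.PartNum
      AND T2.rn = 1   -- keep only the latest transaction per part and company
WHERE T1.PartNum = '0000AAAO';
```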
Try removing the 'LEFT' in your join, because it is allowing you to select rows in the subquery that don't meet the criteria of your WHERE clause. That seemed to fix the issue in my test environment at least.
I would suggest a simpler query:
SELECT TOP 1 p.PartNum, pt.TranDate, pt.TranType
FROM dbo.Part p JOIN
PartTran pt
ON pt.Company = p.Company AND
pt.PartNum = p.PartNum AND
pt.TranType <> 'ADJ-CST' AND
pt.TranDate > '2016-10-10'
WHERE p.PartNum = '0000AAAO'
ORDER BY pt.TranDate DESC;
What would be the most efficient way to eliminate records in WHERE clause using TOP 1 logic?
Table tblQuoteStatusChangeLog is not in a JOIN.
But based on value in this table I need to eliminate records that have NewQuoteStatusID = 12
It works the way it is, but I am looking for a more efficient way, since I have a Sort (Top N Sort) operator that is too expensive.
SELECT
Q.ControlNo
,sum(fid.amtbilled) as Premium
FROM
[dbo].tblQuotes Q
inner join [dbo].[tblFin_Invoices] FI on Q.QuoteID = FI.QuoteID and FI.failed = 0
inner join [dbo].[tblFin_InvoiceDetails] FID on FI.[InvoiceNum] = FID.InvoiceNum
WHERE (
SELECT TOP 1 NewQuoteStatusID
FROM tblQuoteStatusChangeLog
WHERE (ControlNo = Q.ControlNo)
ORDER BY Timestamp DESC
) <> 12
Group by
Q.ControlNo
Your code is RBAR (row by agonizing row): it performs the same subquery once per row, which is very inefficient.
You worry about "sort", but that by itself would not be a problem. Look further up and left of the plan; to the nested loop. See the fat input line at the top and thin just below. Basically you're hitting your sort very many times.
Suggestion: try to use a set-based solution. "Prepare" the data you require for the WHERE clause "in advance", so you can eliminate the RBAR. Imagine you had LatestStatus as a table with ControlNo and StatusID columns. It would be much simpler to apply your filter; and the Query Optimiser should be able to find a more efficient overall plan.
You can set this up using a CTE.
;with StatusByControlNo as (
SELECT ROW_NUMBER() OVER(PARTITION BY ControlNo ORDER BY Timestamp DESC) AS RowNo,
ControlNo, Timestamp, NewQuoteStatusID
FROM tblQuoteStatusChangeLog
) ...
/*Easy to get Latest status per ControlNo from here*/
SELECT ControlNo, NewQuoteStatusID
FROM StatusByControlNo
WHERE RowNo = 1
Now with a few tweaks your query becomes:
;with StatusByControlNo as (
SELECT ROW_NUMBER() OVER(PARTITION BY ControlNo ORDER BY Timestamp DESC) AS RowNo,
ControlNo, Timestamp, NewQuoteStatusID
FROM tblQuoteStatusChangeLog
)
SELECT
Q.ControlNo,
sum(fid.amtbilled) as Premium
FROM
tblQuotes Q
inner join tblFin_Invoices FI
on Q.QuoteID = FI.QuoteID and FI.failed = 0
inner join tblFin_InvoiceDetails FID
on FI.InvoiceNum = FID.InvoiceNum
inner join StatusByControlNo S
on S.ControlNo = Q.ControlNo and S.RowNo = 1
WHERE
S.NewQuoteStatusID <> 12
Group by Q.ControlNo
It should go without saying you could try a number of variations on this. But the core principle is to reduce RBAR and look for solutions that are more 'set-based'.
I think this will be easier to show an example first and then explain:
SELECT P.ID,
(CASE WHEN PC.NewCostPrice IS NULL
THEN P.Cost ELSE MAX(PC.Date) PC.NewCostPrice
END)
FROM price AS P
LEFT OUTER JOIN priceChange as PC
ON P.ID = PC.ID
So in the example, if the NewCostPrice IS NULL, meaning there wasn't a price change, then I want the normal cost (P.Cost). However, if there was a price change, I want the most recent (MAX(Date)) price change. I am not sure how to incorporate that into the CASE statement.
I feel like it can be done with a subquery and having clause but that didn't really work out when I tried. Any suggestions?
Thanks!
There are 2 approaches you might consider - I would test both to see which performs better for your situation.
1. Use ROW_NUMBER() in a subquery to find the most recent price change among all price changes, then join that to prices to get the correct price.
2. Use a correlated subquery (there are many ways to do this, either in the SELECT as in the other answer or with OUTER APPLY) to get only the most recent price change for each row of prices.
If your price table is very large and you are getting a large number of prices at once, method #1 will likely be better so the correlated subquery doesn't run for every single row of the result set.
If your final query pulls back a relatively small number of records instead of huge result sets for your server, then the correlated subquery could be better for you.
1. The ROW_NUMBER() approach
SELECT
P.ID,
COALESCE(PC.NewCostPrice, P.Cost) AS LatestPrice
FROM Price AS P
LEFT OUTER JOIN (
SELECT
ID,
ROW_NUMBER() OVER (PARTITION BY ID
ORDER BY [Date] DESC) AS RowId,
NewCostPrice
FROM PriceChange
) PC
ON P.ID = PC.ID
AND PC.RowId = 1 -- Only most recent
2a. Correlated subquery (SELECT)
SELECT
P.ID,
COALESCE((
SELECT TOP 1
NewCostPrice
FROM PriceChange PC
WHERE PC.ID = P.ID
ORDER BY PC.[Date] DESC
), P.Cost) AS LatestPrice
FROM Price AS P
2b. Correlated subquery with OUTER APPLY
SELECT
P.ID,
COALESCE(PC.NewCostPrice, P.Cost) AS LatestPrice
FROM Price AS P
OUTER APPLY (
SELECT TOP 1
NewCostPrice
FROM PriceChange PC
WHERE PC.ID = P.ID
ORDER BY PC.[Date] DESC
) PC
Whether you use 2a or 2b is more likely a preference in how you want to maintain the query going forward.
Easy way
SELECT distinct P.ID,
ISNULL((SELECT TOP 1 PC1.NewCostPrice FROM priceChange as PC1 WHERE PC1.ID = p.id ORDER BY PC1.Date DESC), p.cost)
FROM price AS P
Here I assume PC.ID is not a primary key, or it makes no sense to join with ID while there could be different price on the same item.
From your query I assume you just want to fetch the latest NewCostPrice sorted by Date, by joining priceChange
SELECT
P.ID,
CASE
WHEN PC.NewCostPrice IS NULL THEN P.Cost
ELSE PC.NewCostPrice
END AS NewPrice
FROM
price AS P
LEFT JOIN
(SELECT *, RANK() OVER (PARTITION BY ID ORDER BY [Date] DESC) as rk FROM priceChange) PC ON P.ID = PC.ID AND PC.rk = 1
SELECT P.ID
,(CASE
WHEN PC.NewCostPrice IS NULL
THEN P.Cost
ELSE (SELECT TOP 1 PC1.NewCostPrice
FROM priceChange PC1
WHERE PC1.ID = PC.ID
GROUP BY PC1.NewCostPrice, PC1.Date
ORDER BY PC1.Date DESC
)
END
)
FROM price AS P
LEFT OUTER JOIN priceChange as PC
ON P.ID = PC.ID
I was trying to write a query for the SQL Server sample DB Northwind. The question was: "Show the most recent five orders that were purchased by a customer who has spent more than $25,000 with Northwind."
In my query the Alias name - "Amount" is not being recognized. My query is as follows:
select top(5) a.customerid, sum(b.unitprice*b.quantity) as "Amount", max(c.orderdate) as Orderdate
from customers a join orders c
on a.customerid = c.customerid
join [order details] b
on c.orderid = b.orderid
group by a.customerid
--having Amount > 25000 --throws error
having sum(b.unitprice*b.quantity) > 25000 --works, but I don't think that this is a good solution
order by Orderdate desc
Pls let me know what I am doing wrong here, as I am a newbie in writing T Sql. Also can this query and my logic be treated as production level query?
TIA,
You must use the aggregate in the query you have. This all has to do with the order in which a SELECT statement is executed. The syntax of the SELECT statement is as follows:
SELECT
FROM
WHERE
GROUP BY
HAVING
ORDER BY
The order in which a SELECT statement is executed is as follows. Since the SELECT clause isn't executed until after the HAVING clause, you can't use the alias like you can in the ORDER BY clause.
FROM
WHERE
GROUP BY
HAVING
SELECT
ORDER BY
Reference Article: http://www.bennadel.com/blog/70-sql-query-order-of-operations.htm
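The difference in alias visibility can be seen with a small self-contained query. This is only an illustrative sketch: the inline VALUES data and the names v, CustomerId, Amount and Total are made up for the demonstration.

```sql
-- Hypothetical inline data, used only to illustrate where an alias is visible.
SELECT v.CustomerId, SUM(v.Amount) AS Total
FROM (VALUES ('A', 10), ('A', 20), ('B', 5)) AS v (CustomerId, Amount)
GROUP BY v.CustomerId
HAVING SUM(v.Amount) > 10   -- writing HAVING Total > 10 fails: the alias does not exist yet
ORDER BY Total DESC;        -- ORDER BY runs after SELECT, so the alias is visible here
```

Only customer A, with a total of 30, passes the HAVING filter.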
This is a known limitation in SQL Server, at least; I have no idea whether it's a bug, intentional, or even part of the standard. Either way, neither the WHERE nor the HAVING clause accepts an alias as part of its conditions. You must use expressions over the original source columns, which means that to filter by a calculated expression you must repeat the very same expression in both the SELECT list and the WHERE or HAVING clause.
A workaround for avoiding this duplication can be to use a subquery or cte and apply the filter on the outer query, when the alias is just an "input" table:
WITH TopOrders AS (
select a.customerid, sum(b.unitprice*b.quantity) as "Amount", max(c.orderdate) as Orderdate
from customers a join orders c
on a.customerid = c.customerid
join [order details] b
on c.orderid = b.orderid
group by a.customerid
--no filter or ordering here (ORDER BY is not allowed in a CTE without TOP)
)
SELECT TOP(5) * FROM TopOrders WHERE Amount > 25000 ORDER BY Orderdate DESC;
Interestingly enough, the ORDER BY clause does accept aliases directly.
Use WHERE b.unitprice*b.quantity > 25000 or HAVING with the aggregate, depending on what you need; HAVING Amount > 25000 is not valid because the alias is not visible there.
HAVING is used for conditions on aggregates, and your business requirement determines which condition you need. If you want to sum only the order lines whose individual value is above 25000, use WHERE b.unitprice*b.quantity > 25000; if you want to show customers whose total is above 25000, use HAVING sum(b.unitprice*b.quantity) > 25000, as in the query below.
select top(5) a.customerid, sum(b.unitprice*b.quantity) as Amount, max(c.orderdate) as Orderdate
from customers a
JOIN orders c ON a.customerid = c.customerid
join [order details] b ON c.orderid = b.orderid
group by a.customerid
having sum(b.unitprice*b.quantity) > 25000 --works, but I don't think that this is a good solution
Order by Amount
I don't have that schema at hand, so table and column names might go a little astray, but the principle is the same:
select top (5) ord2.*
from (
select top (1) ord.CustomerId
from dbo.Orders ord
inner join dbo.[Order Details] od on od.OrderId = ord.OrderId
group by ord.CustomerId
having sum(od.unitPrice * od.Quantity) > $25000
) sq
inner join dbo.Orders ord2 on ord2.CustomerId = sq.CustomerId
order by ord2.OrderDate desc;
The HAVING clause works with aggregate functions like SUM, MAX, AVG.
You may try like this
SELECT TOP 5 customerid,SUM(Amount)Amount , MAX(Orderdate) Orderdate
FROM
(
SELECT A.customerid, (B.unitprice * B.quantity) As "Amount", C.orderdate As Orderdate
FROM customers A JOIN orders C ON A.customerid = C.customerid
JOIN [order details] B ON C.orderid = B.orderid
) Tmp
GROUP BY customerid
HAVING SUM(Amount) > 25000
ORDER BY Orderdate DESC
The question is a little ambiguous.
Show the most recent five orders that were purchased by a customer who
has spent more than $25,000 with Northwind.
Is it asking to show the 5 most recent orders for each customer who has spent more than $25,000 across all of their transactions (which can be more than 5)?
The following query shows the five most recent orders for every customer who spent more than $25,000 across all of their transactions (not just the recent 5).
The subquery BigSpenders gets all the customers who spent more than $25,000.
Another subquery calculates the total amount for each order.
The query then ranks each customer's orders by OrderDate and OrderID.
Finally, it keeps the top 5 orders for each customer.
--
SELECT *
FROM (SELECT C.customerid,
C.orderdate,
C.orderid,
B3.amount,
Row_number()
OVER(
partition BY C.customerid
ORDER BY C.orderdate DESC, C.orderid DESC) Rank
FROM orders C
JOIN
--Get Amount Spend Per Order
(SELECT b2.orderid,
Sum(b2.unitprice * b2.quantity) AS Amount
FROM [order details] b2
GROUP BY b2.orderid) B3
ON C.orderid = B3.orderid
JOIN
--Get Customers who spent more than 25000
(SELECT c.customerid
FROM orders c
JOIN [order details] b
ON c.orderid = b.orderid
GROUP BY c.customerid
HAVING Sum(b.unitprice * b.quantity) > 25000) BigSpenders
ON C.customerid = BigSpenders.customerid) X
WHERE X.rank <= 5