SQL Server 2005 Syntax Help - "Select Info based upon Max Value of Sub Query"

SQL Server 2005 Syntax Help - "Select Info based upon Max Value of Sub Query" - sql-server

The objective is below the list of tables.
Tables:
Table: Job
JobID
CustomerID
Value
Year
Table: Customer
CustomerID
CustName
Table: Invoice
SaleAmount
CustomerID
The Objective
Part 1: (easy) I need to select all invoice records and sort by Customer (To place nice w/ Crystal Reports)
Select * from Invoice as A inner join Customer as B on A.CustomerID = B.CustomerID
Part 2: (hard) Now, we need to add two fields:
JobID associated with that customer's job that has the Maximum Value (from 2008)
Value associated with that job
Pseudo Code
Select * from
Invoice as A
inner join Customer as B on A.CustomerID = B.CustomerID
inner join
(select JobID, Value from Jobs where Job:JobID has the highest value out of all of THIS customer's jobs from 2008)
General Thoughts
This is fairly easy to do If I am only dealing with one specific customer:
select max(JobId), max(Value) as MaxJobID from Jobs where Value = (select max(Value) from Jobs where CustomerID = #SpecificCustID and Year = '2008') and CustomerID = SpecificCustID and CustomerID = '2008'
This subquery determines the max Value for this customer in 2008, and then its a matter of choosing a single job (can't have dupes) out of potential multiple jobs from 2008 for that customer that have the same value.
The Difficulty
What happens when we don't have a specific customer ID to compare against? If my goal is to select ALL invoice records and sort by customer, then this subquery needs access to which customer it is currently dealing with. I suppose this can "sort of" be done through the ON clause of the JOIN, but that doesn't really seem to work because the sub-sub query has no access to that.
I'm clearly over my head. Any thoughts?

How about using a CTE. Obviously, I can't test, but here is the idea. You need to replace col1, col2, ..., coln with the stuff you want to select.
Inv( col1, col2, ... coln)
AS
(
SELECT col1, col2, ... coln,
ROW_NUMBER() OVER (PARTITION BY A.CustomerID
ORDER BY A.Value DESC) AS [RowNumber]
FROM Invoice A INNER JOIN Customer B ON A.CustomerID = B.CustomerID
WHERE A.CustomerID = #CustomerID
AND A.Year = #Year
)
SELECT * FROM Inv WHERE RowNumber = 1
If you don't have a CustomerID, this will return the top value for each customer (that will hurt on performance tho).

The row_number() function can give you what you need:
Select A.*, B.*, C.JobID, C.Value
from
Invoice as A
inner join Customer as B on A.CustomerID = B.CustomerID
inner join (
select JobID, Value, CustomerID,
ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY Value DESC) AS Ordinal
from Jobs
WHERE Year = 2008
) AS C ON (A.CustomerID = C.customerID AND C.Ordinal = 1)
The ROW_NUMBER() function in this query will order by value in descending order and the PARTITION BY clause will do this separately for each different value of CustomerID. This means that the highest Value for each customer will always be 1, so we can join to that value.

The over function is an awesome, but often neglected function. You can use it in a subquery to pull back your valid jobs, like so:
select
a.*
from
invoice a
inner join customer b on
a.customerid = b.customerid
inner join (select customerid, max(jobid) as jobid, maxVal from
(select customerid,
jobid,
value,
max(value) over (partition by customerid) as maxVal
from jobs
where Year = '2008') s
where s.value = s.maxVal
group by customerid, maxVal) c on
b.customerid = c.customerid
and a.jobid = c.jobid
Essentially, that first inner query looks like this:
select
customerid,
jobid,
value,
max(value) over (partition by customerid) as maxVal
from jobs
where Year = '2008'
You'll see that this pulls back all of the jobs, but with that additional column which lets you know what the maximum value is for each customer. With the next subquery, we filter out any rows that have value and maxVal equal. Additionally, it finds the max JobID based on customerid and maxVal, because we need to pull back one and only one JobID (as per the requirements).
Now, you have a complete listing of CustomerID and JobID that meet the conditions of having the highest JobID that contains the maximum Value for that CustomerID in a given year. All that's left is to join it to Invoice and Customer, and you're good to go.

Just to be complete with the non row_number solution for those < MSSQL 2005. Personanly, I find it easier to follow myslef...but that could be biased considering how much time I spend in MSSQL 2000 vs 2005+.
SELECT *
FROM Invoice as A
INNER JOIN Customer as B ON
A.CustomerID = B.CustomerID
INNER JOIN (
SELECT
CustomerId,
--MAX in case dupe Values.
==If UC on CustomerId, Value (or CustomerId, Year, Value) then not needed
MAX(JobId) as JobId
FROM Jobs
JOIN (
SELECT
CustomerId,
MAX(Value) as MaxValue
FROM Jobs
WHERE Year = 2008
GROUP BY
CustomerId
) as MaxValue ON
Jobs.CustomerId = MaxValue.CustomerId
AND Jobs.Value = MaxValue.MaxValue
WHERE Year = 2008
GROUP BY
CustomerId
) as C ON
B.CustomerID = C.CustomerID

Related

Sql Filter table by two dates in order

I have been trying to filter one table by two dates with an order of importance (date2 > date1) as follows:
SELECT
t1.customer, t1.weights, t1.max(t1.date1) as date1, t1.date2
FROM
(SELECT *
FROM table
WHERE CAST(date2 AS smalldatetime) = '10/29/2017') t2
INNER JOIN
table t1 ON t1.customer = t2.customer
AND t1.date2 = t2.date2
GROUP BY
t1.customer, t1.date2
ORDER BY
t1.customer;
It filters the table correctly by date2 first, the max(t1.date1) doesn't what I want it to do though. I get duplicate customers, that share the same (and correct) date2, but show different date1's. These duplicate records have the following in common: The weight row is different. What would I need to do to output just the the customer records connected to the most current date1 without taking other columns into consideration?
I am still a noob, help would be greatly appreciated!
Solution for t-sql (all based on the accepted answer):
SELECT * FROM (
SELECT row_number() over(partition by t1.customer order by t1.date1 desc) as rownum, t1.customer, t1.weights, t1.date1 , t1.date2
FROM
(SELECT *
FROM table
WHERE CAST(date2 AS smalldatetime) = '10/29/2017') t2
INNER JOIN
table t1 ON t1.customer = t2.customer
AND t1.date2 = t2.date2
)t3
where rownum = 1;

If I understood correctly, then instead of a group by logic, I would just use a qualify row statement :)
Try the code below and tell me if it's what you needed - what I'm telling it to do is to bring back only one row per customer ID....but where we select the row based on the dates (by sorting them in ascending order) - however, I'm unclear of what you mean by importance of the 2 dates so I may be completely off base here...can you please give an example of input and desired output?
SELECT t1.customer, t1.weights, t1.date1, t1.date2
FROM
(
Select *
FROM table
WHERE Cast(date2 as smalldatetime)='10/29/2017'
) t2
Inner Join table t1
ON t1.customer = t2.customer
AND t1.date2 = t2.date2
Qualify row_number() over(partition by t1.customer order by date2 , date1)=1
Order By t1.customer;

Joining on unique ID and date range - must return 1 row

In my calculated data layer, I am attempting to populate a Customer's postcode at the time of the order, a sub sample of the table being populated is as follows:
CustomerOrders
(
CustomerID varchar(20),
...
OrderDate date,
...
CustomerPostcodeAtTimeOfOrder varchar(10)
)
This table is a join of the Customers table, the Orders table and the CustomerAddress table which looks like follows:
CustomerAddress
(
CustomerID varchar(20),
AddressType varchar(10),
/*
AddressDetails
*/
StartDate date,
EndDate date,
AddressRank int
)
It is quite conceivable that a customer may have recorded addresses of various types for a single date so the intention when populating the CustomerOrders table is to join as below:
SELECT *
FROM Customers c
LEFT JOIN Orders o
ON o.CustomerID = c.CustomerID
OUTER APPLY
(
SELECT TOP 1 Postcode
FROM CustomerAddress ca
WHERE ca.CustomerID = c.CustomerID
AND o.OrderDate BETWEEN ca.StartDate AND ca.EndDate
ORDER BY AddressRank
)
However, the performance hit I am getting by adding this join to the query means that returning 1000 rows goes from taking 4 seconds to taking 106 seconds.
Just to note, I have added a non-clustered index on the Address table too. The definition of which is as below:
CREATE NONCLUSTERED INDEX (IX_CustomerAddress)
ON CustomerAddress (StartDate, EndDate)
INCLUDE (AddressRank, CustomerID, Postcode)
I'm looking for any suggestions on the best way to tackle this issue please?

I'm not completely sure if this will return results faster, but you can rewrite your query like this:
;WITH OrderAddress AS
(
SELECT o.*,
ca.Postcode,
RN = ROW_NUMBER() OVER(PARTITION BY CustomerID ORDER BY AddressRank DESC)
FROM CustomerAddress ca
INNER JOIN Orders o
ON ca.CustomerID = c.CustomerID
AND o.OrderDate BETWEEN ca.StartDate AND ca.EndDate
)
SELECT *
FROM Customers c
LEFT JOIN ( SELECT *
FROM OrderAddress
WHERE RN = 1) o
ON o.CustomerID = c.CustomerID;
You should also post the index definition on the Address table.

Alias name issue in SQL

I was trying to write a query for the SQL Server sample DB Northwind. The question was: "Show the most recent five orders that were purchased by a customer who has spent more than $25,000 with Northwind."
In my query the Alias name - "Amount" is not being recognized. My query is as follows:
select top(5) a.customerid, sum(b.unitprice*b.quantity) as "Amount", max(c.orderdate) as Orderdate
from customers a join orders c
on a.customerid = c.customerid
join [order details] b
on c.orderid = b.orderid
group by a.customerid
--having Amount > 25000 --throws error
having sum(b.unitprice*b.quantity) > 25000 --works, but I don't think that this is a good solution
order by Orderdate desc
Pls let me know what I am doing wrong here, as I am a newbie in writing T Sql. Also can this query and my logic be treated as production level query?
TIA,

You must use the aggregate in the query you have. This all has to do with the order in which a SELECT statement is executed. The syntax of the SELECT statement is as follows:
SELECT
FROM
WHERE
GROUP BY
HAVING
ORDER BY
The order in which a SELECT statement is executed is as follows. Since the SELECT clause isn't executed until after the HAVING clause, you can't use the alias like you can in the ORDER BY clause.
FROM
WHERE
GROUP BY
HAVING
SELECT
ORDER BY
Reference Article: http://www.bennadel.com/blog/70-sql-query-order-of-operations.htm

This is a known limitation in SQL Server, at least, but no idea if it's a bug, intentional or even part of the standard. But the thing is, neither the WHERE or HAVING clauses accept an alias as part of their conditions, you must use only columns from the original source tables, which means that for filtering by calculated expressions, you must copy-paste the very same thing in both the SELECT and WHERE parts.
A workaround for avoiding this duplication can be to use a subquery or cte and apply the filter on the outer query, when the alias is just an "input" table:
WITH TopOrders AS (
select a.customerid, sum(b.unitprice*b.quantity) as "Amount", max(c.orderdate) as Orderdate
from customers a join orders c
on a.customerid = c.customerid
join [order details] b
on c.orderid = b.orderid
group by a.customerid
--no filter here
order by Orderdate desc
)
SELECT TOP(5) * FROM TopOrders WHERE Amount > 25000 ;
Interesting enough, the ORDER BY clause does accepts aliases directly.

You must use Where b.unitprice*b.quantity > 25000 instead of having Amount > 25000.
Having used for aggregate conditions. Your business determine your query condition. If you need to calculate sum of prices that have above value than 25000, must be use Where b.unitprice*b.quantity > 25000 and if you need to show customer that have total price above than 25000 must be use having Amount > 25000 in your query.
select top(5) a.customerid, sum(b.unitprice*b.quantity) as Amount, max(c.orderdate) as Orderdate
from customers a
JOIN orders c ON a.customerid = c.customerid
join [order details] b ON c.orderid = b.orderid
group by a.customerid
having sum(b.unitprice*b.quantity) > 25000 --works, but I don't think that this is a good solution
Order by Amount

I don't have that schema at hand, so table' and column' names might go a little astray, but the principle is the same:
select top (5) ord2.*
from (
select top (1) ord.CustomerId
from dbo.Orders ord
inner join dbo.[Order Details] od on od.OrderId = ord.OrderId
group by ord.CustomerId
having sum(od.unitPrice * od.Quantity) > $25000
) sq
inner join dbo.Orders ord2 on ord2.CustomerId = sq.CustomerId
order by ord2.OrderDate desc;

The Having Clause will works with aggregate function like SUM,MAX,AVG..
You may try like this
SELECT TOP 5 customerid,SUM(Amount)Amount , MAX(Orderdate) Orderdate
FROM
(
SELECT A.customerid, (B.unitprice * B.quantity) As "Amount", C.orderdate As Orderdate
FROM customers A JOIN orders C ON A.customerid = C.customerid
JOIN [order details] B ON C.orderid = B.orderid
) Tmp
GROUP BY customerid
HAVING SUM(Amount) > 25000
ORDER BY Orderdate DESC

The question is little ambiguos.
Show the most recent five orders that were purchased by a customer who
has spent more than $25,000 with Northwind.
Is it asking to show the 5 recent orders by all the customers who have spent more than $25,000 in all of their transactions (which can be more than 5).
The following query shows all the customers who spent $25000 in all of their transactions (not just the recent 5).
In one of the Subquery BigSpenders it gets all the Customers who spent more than $25000.
Another Subquery calculates the total amount for each order.
Then it gets rank of all the orders by OrderDate and OrderID.
Then it filters it by Top 5 orders for each customer.
--
SELECT *
FROM (SELECT C.customerid,
C.orderdate,
C.orderid,
B3.amount,
Row_number()
OVER(
partition BY C.customerid
ORDER BY C.orderdate DESC, C.orderid DESC) Rank
FROM orders C
JOIN
--Get Amount Spend Per Order
(SELECT b2.orderid,
Sum(b2.unitprice * b2.quantity) AS Amount
FROM [order details] b2
GROUP BY b2.orderid) B3
ON C.orderid = B3.orderid
JOIN
--Get Customers who spent more than 25000
(SELECT c.customerid
FROM orders c
JOIN [order details] b
ON c.orderid = b.orderid
GROUP BY c.customerid
HAVING Sum(b.unitprice * b.quantity) > 25000) BigSpenders
ON C.customerid = BigSpenders.customerid) X
WHERE X.rank <= 5

Return only 1 address for each customer

I need to query the addresses of top 500 customers (best buyers). Many companies have multiple addresses.
The tables with data:
CustomerInfo
CustomerAddress
TransactionInfo
TransactionElements
My query looks this way:
select Customername, CustomerStreet --etc
from CustomerInfo
join CustomerAddress on CustomerID = Add_CustID
join TransactionInfo on Trn_CustID = CustomerID
JOIN TransactionElements ON Trn_CustID = TrE_CustID
GROUP BY CustomerName, CustomerStreet --etc
ORDER BY SUM (TrE_TranValue) DESC
It returns multiple addresses of a single company, I need just one.

One approach would be to use a CTE (Common Table Expression) if you're on SQL Server 2005 and newer (you aren't specific enough in that regard).
With this CTE, you can partition your data by some criteria - i.e. your CustomerId - and have SQL Server number all your rows starting at 1 for each of those "partitions", ordered by some criteria.
So try something like this:
;WITH CustomerAndAddress AS
(
SELECT
c.Customername, ca.CustomerStreet ,
ROW_NUMBER() OVER(PARTITION BY c.CustomerId ORDER BY ca.AddressID DESC) AS 'RowNum'
FROM
dbo.CustomerInfo c
INNER JOIN
dbo.CustomerAddress ca ON c.CustomerID = ca.Add_CustID
WHERE
......
)
SELECT
Customername, CustomerStreet
FROM
CustomerAndAddress
WHERE
RowNum = 1
Here, I am selecting only the "first" entry for each "partition" (i.e. for each CustomerId) - ordered by some criteria (I just arbitrarily picked AddressID from the address - adapt as needed) you need to define in your CTE.
Does that approach what you're looking for??

Something like this will work from sqlserver 2005+. I also suggest adding some aliasses to your tables and refer to those.
select CI.Customername, CA.CustomerStreet --etc
from CustomerInfo CI
cross apply
(select top 1 Customername, CustomerStreet --etc
from CustomerAddress where CustomerID = CI.Add_CustID) CA
join TransactionInfo TI on TI.Trn_CustID = CI.CustomerID
JOIN TransactionElements ON CI.CustomerID = TE.TrE_CustID
GROUP BY CustomerName, CustomerStreet --etc
ORDER BY SUM (TrE_TranValue) DESC

If CustomerInfo has many CustomerAddress, this query will make CustomerInfo returned for each CustomerAddress (a cartesian product) :
join CustomerAddress on CustomerID = Add_CustID
So if you need to get only one address you have to add conditions needed to choose single CustomerAddress :
join CustomerAddress on CustomerID = Add_CustID where <conditions>

MS SQL problem: Max and GUID

select first(orderid), accountid
from [Order]
group by AccountId
order by DateCreated desc
first() is invalid function.
max() does not work for unique identifiers
How would I get the last orderid created for all accounts? Thanks.

Something like (untested):
;WITH CTE_LatestOrders AS (
select accountid, lastcreated = max(datecreated)
from [Order]
group by accountid
)
select
accountid, orderid
from
[Orders] o
join CTE_LatestOrders l
on o.AccountID = l.AccountID
and o.datecreated = l.lastcreated

Max() does work for unique identifiers as of MS SQL 2012

You can proceed with below as well.
Select Temp.orderid, T.AccountId, T.DateCreated
From
(
Select AccountId, max(DateCreated) as DateCreated
From [Order]
Group By AccountId
)T
Inner Join [Order] Temp on Temp.AccountId = T.AccountId
AND Temp.DateCreated = T.DateCreated
A CTE is not a UDT/temp table; think of a CTE as a view that is defined only for your current query. Just like a view, a CTE is expanded and folded into the overall query plan. Global optimization will still occur, but do not think that just because you use a CTE you will only execute the query once. Here is a trivial example that fits in this space: WITH vw AS ( SELECT COUNT(*) c FROM Person ) SELECT a.c, b.c FROM vw a, vw b; The query plan will clearly show two scans/aggregations and a join instead of just projecting the same result twice.