Best way to aggregate data - Update query or subquery? - sql-server

I'm trying to aggregate some customer order data into one table for analysis. The data is the number of products a customer orders, then trying to determine whether the order is a small, medium, or large order, then determine the total products they bought and the cost of the order by OrderSize.
Small order - 1 - 2 products
Medium order - 3 - 4 products
Large Order - >=5 products
Here is the data:
CustomerID OrderID OrderSize OrderTotal
1 800 1 $20
2 801 1 $10
3 802 4 $85
1 803 1 $30
2 804 8 $120
3 805 1 $40
Here is the table I am attempting to build (easier to just post an image):
I could run an update query like this to populate the table:
-- Build the table --
CREATE TABLE Customers (
CustomerID varchar(10) PRIMARY KEY,
SmallOrderCount int,
SmallOrderProducts int,
SmallOrderTotal money,
MedOrderCount int,
MedOrderProducts int,
MedOrderTotal money,
LargeOrderCount int,
LargeOrderProducts int,
LargeOrderTotal money
);
-- Insert the unique customers --
INSERT INTO Customers
SELECT CustomerID FROM Orders GROUP BY CustomerID;
-- Update to populate the order information --
UPDATE Customers
SET SmallOrderCount = (SELECT count(*) FROM Orders WHERE OrderSize BETWEEN 1 AND 2 AND Orders.CustomerID = Customers.CustomerID);
UPDATE Customers
SET SmallOrderProducts = (SELECT sum(OrderSize) FROM Orders WHERE OrderSize BETWEEN 1 AND 2 AND Orders.CustomerID = Customers.CustomerID);
UPDATE Customers
SET SmallOrderTotal = (SELECT sum(OrderTotal) FROM Orders WHERE OrderSize BETWEEN 1 AND 2 AND Orders.CustomerID = Customers.CustomerID);
Then I could repeat this for the remaining 6 columns.
However, this seems like a lot of work. Is there a way to populate my Customer table using a subquery that may be less work? Or is my approach above the most direct?

You can do all of it in one INSERT command with CASE statements:
INSERT INTO Customers
SELECT CustomerID
,sum(CASE WHEN OrderSize BETWEEN 1 AND 2 THEN 1 ELSE 0 END) AS SmallOrderCount
,sum(CASE WHEN OrderSize BETWEEN 1 AND 2 THEN OrderSize ELSE 0 END) AS SmallOrderProducts
,sum(CASE WHEN OrderSize BETWEEN 1 AND 2 THEN OrderTotal ELSE 0 END) AS SmallOrderTotal
,sum(CASE WHEN OrderSize BETWEEN 3 AND 4 THEN 1 ELSE 0 END) AS MedOrderCount
,sum(CASE WHEN OrderSize BETWEEN 3 AND 4 THEN OrderSize ELSE 0 END) AS MedOrderProducts
,sum(CASE WHEN OrderSize BETWEEN 3 AND 4 THEN OrderTotal ELSE 0 END) AS MedOrderTotal
,sum(CASE WHEN OrderSize > 4 THEN 1 ELSE 0 END) AS LargeOrderCount
,sum(CASE WHEN OrderSize > 4 THEN OrderSize ELSE 0 END) AS LargeOrderProducts
,sum(CASE WHEN OrderSize > 4 THEN OrderTotal ELSE 0 END) AS LargeOrderTotal
FROM Orders
GROUP BY CustomerID;
See this working demo for the full query on data.SE.

One insert should be enough:
INSERT INTO Customers(CustomerID,SmallOrderCount,SmallOrderProducts,
SmallOrderTotal)
SELECT a.CustomerID, COUNT(a.*) as cnt, sum(a.OrderSize) as OrderSize,
sum(a.OrderTotal) as OrderTotal
FROM Orders a
WHERE a.OrderSize BETWEEN 1 AND 2
GROUP BY a.CustomerID
The query above will insert only customers whose order size is between 1 and 2. If you need to insert others as well, you can use :
INSERT INTO Customers(CustomerID,SmallOrderCount,SmallOrderProducts,
SmallOrderTotal)
SELECT a.CustomerID,
COUNT(CASE WHEN a.OrderSize BETWEEN 1 AND 2 THEN 1 END) as cnt,
sum(CASE WHEN a.OrderSize BETWEEN 1 AND 2 THEN a.OrderSize ELSE 0 END) as OrderSize,
sum(CASE WHEN a.OrderSize BETWEEN 1 AND 2 THEN a.OrderTotal ELSE 0 END ) as OrderTotal
FROM Orders a
GROUP BY a.CustomerID

I'd only create a single query (which can be used as view) for that and not a whole new persisted table.
WITH cteOrders AS (
SELECT CustomerID,
CASE WHEN OrderSize BETWEEN 1 AND 2 THEN OrderSize END SmallOrderProducts,
CASE WHEN OrderSize BETWEEN 1 AND 2 THEN OrderTotal END SmallOrderTotal,
CASE WHEN OrderSize BETWEEN 3 AND 4 THEN OrderSize END MediumOrderProducts,
CASE WHEN OrderSize BETWEEN 3 AND 4 THEN OrderTotal END MediumOrderTotal,
CASE WHEN OrderSize > 4 THEN OrderSize END LargeOrderProducts,
CASE WHEN OrderSize > 4 THEN OrderTotal END LargeOrderTotal
FROM Orders
)
SELECT CustomerID,
COUNT(SmallOrderProducts) SmallOrderCount,
COALESCE(SUM(SmallOrderProducts), 0) SmallOrderProducts,
COALESCE(SUM(SmallOrderTotal), 0) SmallOrderTotal,
COUNT(MediumOrderProducts) MediumOrderCount,
COALESCE(SUM(MediumOrderProducts), 0) MediumOrderProducts,
COALESCE(SUM(MediumOrderTotal), 0) MediumOrderTotal,
COUNT(LargeOrderProducts) LargeOrderCount,
COALESCE(SUM(LargeOrderProducts), 0) LargeOrderProducts,
COALESCE(SUM(LargeOrderTotal), 0) LargeOrderTotal
FROM cteOrders
GROUP BY CustomerID

You can do this by using single INSERT INTO SELECT statement
INSERT INTO Customers
SELECT * ,
(SELECT count(*) FROM Orders WHERE OrderSize BETWEEN 1 AND 2 AND Orders.CustomerID = CustomerID),
(SELECT sum(OrderSize) FROM Orders WHERE OrderSize BETWEEN 1 AND 2 AND Orders.CustomerID = CustomerID)
FROM CUSTOMERS

Related

SQL Pivot to be used in a View

I am trying to create a view that will sum the total hours of different work types, as well as the quantity of different products used to a RefNumber.
I have successfully pivoted and summed the hours of work grouped by RefNumber, but now I am scratching my head on how to sum the total product use by Ref Number.
Sample Data
Download Excel of data
Keep in mind, on some RefNumbers, there will be multiple entries of the same ProductID. We may have had a few guys using the same product in different quantities.
Here is what I am trying to the Pivot the data too:
Pivot Data
So far, I have been able to accomplish summing the hours of each work type, and pivoting that data into a single line item grouped by the Ref Number using the following SQL code:
SELECT RefNumber,
SUM (CASE WHEN WorkType = 'Blast' THEN THours ELSE NULL END) AS TBlast,
SUM (CASE WHEN WorkType = 'Wheel' THEN THours ELSE NULL End) As TWheel,
SUM (CASE WHEN WorkType = 'Painting' THEN THours ELSE NULL END) AS TPainter,
SUM (CASE WHEN WorkType = 'Mask/Prep' THEN Thours ELSE NULL END) As TMask,
SUM (CASE WHEN WorkType = 'Demask/Touch Up' THEN Thours ELSE NULL END) As TDMask,
SUM (CASE WHEN WorkType = 'Handling: Raw' THEN THours ELSE NULL END) As TRHand,
SUM (CASE WHEN WorkType = 'Handling: Product' THEN Thours ELSE NULL END) As TPHand,
SUM (CASE WHEN WorkType = 'Wheel:Assist' THEN Thours ELSE NULL END) As TAWheel,
SUM (CASE WHEN WorkType = 'Metalizing' THEN Thours ELSE NULL END) As TMetal
FROM (
SELECT [RefNumber], [WorkType], SUM (Hours)/60 As THours
FROM [dbo].[Vw_Beta_CostLog]
GROUP BY RefNumber, WorkType
) sub
GROUP BY RefNumber
ORDER BY RefNumber
Any Ideas on how to modify this code base to Pivot the the distinct product ID's into there own column, and sum the use of those products in a second column?
Also, I want to be able to use this as a view, so I am trying to avoid dynamic pivots.
EDIT: Forgot to mention, there will be at most 4 unique products used per ref number.
RAW DATA
GUID EmpName RefNumber DateInt Hours WorkType ProductID PQty
P-3468 Gary Hahn 114204 20181008 132 Painting NULL NULL
P-3473 Gary Hahn 114204 20181009 204 Painting NULL NULL
P-3475 Gary Hahn 114204 20181009 120 Painting NULL NULL
F-31915 Jose Flores 114204 20181007 60 Handling: Raw NULL NULL
F-31941 Jose Flores 114204 20181008 30 Handling: Raw NULL NULL
F-31951 Chris Pollock 114204 20181008 30 Handling: Raw NULL NULL
F-32076 Chris Pollock 114204 20181010 120 Handling: Product NULL NULL
F-32109 Chris Pollock 114204 20181011 90 Handling: Product NULL NULL
F-32301 Daryl Underwood 114204 20181015 15 Handling: Product NULL NULL
B-6594 David Martinez 114204 20181007 150 Blast NULL NULL
B-6599 Emiliano Barrios 114204 20181008 66 Blast NULL NULL
B-6617 Jose Molina 114204 20181009 30 Blast NULL NULL
P-3468 Gary Hahn 114204 20181008 NULL Primer 11 3
P-3473 Gary Hahn 114204 20181009 NULL Intermediate 890 2
P-3475 Gary Hahn 114204 20181009 NULL Finish 134HG 2
Output I am looking for
RefNumber Blast Painting Handling: Raw Handling: Product Product1 P1Qty Product2 P2Qty Product3 P3Qty
114204 246 456 120 225 11 3 890 2 134HG 2
You can "pivot" non-numeric data by using MAX() instead of SUM() and continue to use case expressions as you are already doing.
SELECT
RefNumber
, SUM (CASE WHEN WorkType = 'Blast' THEN Hours/60 ELSE NULL END) AS TBlast
, SUM (CASE WHEN WorkType = 'Wheel' THEN Hours/60 ELSE NULL End) As TWheel
, SUM (CASE WHEN WorkType = 'Painting' THEN Hours/60 ELSE NULL END) AS TPainter
, SUM (CASE WHEN WorkType = 'Mask/Prep' THEN Hours/60 ELSE NULL END) As TMask
, SUM (CASE WHEN WorkType = 'Demask/Touch Up' THEN Hours/60 ELSE NULL END) As TDMask
, SUM (CASE WHEN WorkType = 'Handling: Raw' THEN Hours/60 ELSE NULL END) As TRHand
, SUM (CASE WHEN WorkType = 'Handling: Product' THEN Hours/60 ELSE NULL END) As TPHand
, SUM (CASE WHEN WorkType = 'Wheel:Assist' THEN Hours/60 ELSE NULL END) As TAWheel
, SUM (CASE WHEN WorkType = 'Metalizing' THEN Hours/60 ELSE NULL END) As TMetal
, MAX (CASE WHEN rn = 1 THEN WorkType END) As WorkType1
, MAX (CASE WHEN rn = 1 THEN ProductID END) As Product1
, MAX (CASE WHEN rn = 1 THEN PQty END) As Qty1
, MAX (CASE WHEN rn = 2 THEN WorkType END) As WorkType2
, MAX (CASE WHEN rn = 2 THEN ProductID END) As Product2
, MAX (CASE WHEN rn = 2 THEN PQty END) As Qty2
, MAX (CASE WHEN rn = 3 THEN WorkType END) As WorkType3
, MAX (CASE WHEN rn = 3 THEN ProductID END) As Product3
, MAX (CASE WHEN rn = 3 THEN PQty END) As Qty3
FROM (
SELECT
[RefNumber]
, [WorkType]
, [ProductID]
, Hours
, PQty
, ROW_NUMBER() OVER (PARTITION BY RefNumber, case when ProductID IS NULL then 0 else 1 end ORDER BY ProductID) rn
FROM [Vw_Beta_CostLog]
) sub
GROUP BY
RefNumber
ORDER BY
RefNumber
Note that you cannot create a view which dynamically increases or decreases the number of columns based on how many products are used.
see online demo: https://rextester.com/SBQUP24417

Columns created by case statement do not group by

Here is a T-SQL query:
SELECT
A.DateStamp,
CASE WHEN A.T = 10 THEN A.counts END AS HT,
CASE WHEN A.T = 98 THEN A.counts END AS BP,
CASE WHEN A.T = 94 THEN A.counts END AS MP,
CASE WHEN A.T = 12 THEN A.counts END AS SP
FROM
A
WHERE
(A.date_time BETWEEN GETDATE() - 60 AND GETDATE() - 30) -- say
--GROUP BY A.DateStamp,A.T,A.counts
ORDER BY
CONVERT(DATE, A.DateStamp) ASC
It works well. Previously I was using multiple copies of table A and all joined. over here results are correct but split in multiple rows like:
date | BP | MP | SP | HT |
-----------+----+----+----+----+
22/10/2017 12 34 56 78
Looks Like -- -- -- --
22/10/2017 12 -- -- --
22/10/2017 -- 34 -- --
22/10/2017 -- -- 56 --
22/10/2017 -- -- -- 78
You need to use conditional aggregation for this and GROUP BY just the DateStamp field:
SELECT A.DateStamp,
SUM(CASE WHEN A.T=10 THEN A.counts ELSE 0 END) AS HT,
SUM(CASE WHEN A.T=98 THEN A.counts ELSE 0 END) AS BP,
SUM(CASE WHEN A.T=94 THEN A.counts ELSE 0 END) AS MP,
SUM(CASE WHEN A.T=12 THEN A.counts ELSE 0 END) AS SP
FROM A
WHERE (A.date_time BETWEEN getdate()-60 AND getdate()-30)
GROUP BY A.DateStamp
ORDER BY convert(date,A.DateStamp) ASC
i guess you need to group by date only
SELECT convert(date,A.DateStamp) AS Date,
SUM(CASE WHEN A.T=10 THEN A.counts END) AS HT ,
SUM(CASE WHEN A.T=98 THEN A.counts END) AS BP,
SUM(CASE WHEN A.T=94 THEN A.counts END) AS MP,
SUM(CASE WHEN A.T=12 THEN A.counts END) AS SP
FROM A
WHERE A.date_time BETWEEN getdate()-60 AND getdate()-30
GROUP BY convert(date,A.DateStamp)
ORDER BY convert(date,A.DateStamp) ASC

check all the values of a column

select all the departments having students with even roll number
Dept No Roll No Student Name
1 1 lee
1 2 scott
2 2 scott
2 4 smith
1 4 smith
This should result in DEpt no 2 as it has only students with roll number divisible by 2
Another(imo easy and lightweight) way is using NOT EXISTS and DISTINCT:
SELECT DISTINCT [Dept No]
FROM dbo.TableName t
WHERE NOT EXISTS
(
SELECT 1 FROM dbo.TableName t2
WHERE t.[Dept No] = t2.[Dept No]
AND t2.[Roll No] % 2 = 1
)
Demo
If there is no odd number all must be even.
You can use GROUP BY with HAVING like this.
Query
SELECT [Dept No]
FROM departments
GROUP BY [Dept No]
HAVING SUM(CASE WHEN [Roll No] % 2 = 0 THEN 1 ELSE 0 END) > 1
AND SUM(CASE WHEN [Roll No] % 2 = 1 THEN 1 ELSE 0 END) = 0
Explanation
The query returns the departments if there a rollno which is even using SUM(CASE WHEN [Roll No] % 2 = 0 THEN 1 ELSE 0 END) > 1. If there is any rollno with odd roll no, SUM(CASE WHEN [Roll No] % 2 = 1 THEN 1 ELSE 0 END) will return non zero sum and that department will be excluded.
declare #t table (Dept int,Rno int,Student varchar(10))
insert into #t (Dept,Rno,Student)values (1,1,'lee'),(1,2,'scott'),(2,2,'scott'),(2,4,'smith'),(1,4,'smith')
SELECT Dept,Rno,Student
FROM (SELECT ROW_NUMBER () OVER (ORDER BY Rno DESC) row_number, Dept,Rno,Student
FROM #t) a WHERE (row_number%2) = 0

Count of rows for 2 days

My data is in below format:
employee order id date
a 123 01/06/2013
b 124 02/06/2013
a 125 02/06/2013
a 129 02/06/2013
I need the data in below format:
employee day 1 day 2
a 1 2
b 0 1
Try this one -
DECLARE #temp TABLE
(
dtStart DATETIME
, employees CHAR(1)
)
INSERT INTO #temp (employees, dtStart) VALUES('a','01/06/2013')
INSERT INTO #temp (employees, dtStart) VALUES('a','01/06/2013')
INSERT INTO #temp (employees, dtStart) VALUES('b','02/06/2013')
SELECT
employees
, day1 = COUNT(CASE WHEN DAY(dtStart) = 1 THEN 1 END)
, day2 = COUNT(CASE WHEN DAY(dtStart) = 2 THEN 1 END)
FROM #temp
--WHERE dtStart BETWEEN '01/06/2013' AND '30/06/2013'
GROUP BY employees
Something along these lines should work (depending on whether you have a fixed amount of days or not):
select employee,
SUM(CASE WHEN date = '01/06/2013' THEN 1 ELSE 0 END) as day1,
SUM(CASE WHEN date = '02/06/2013' THEN 1 ELSE 0 END) as day2
from table
group by employee
select distinct employees,
SUM(CASE WHEN dtStart = '01/06/2013' THEN 1 ELSE 0 END) as day1,
SUM(CASE WHEN dtStart = '02/06/2013' THEN 1 ELSE 0 END) as day2
from yourTable
group by dtStart,employees
see your demo

Add SubQuery in existing subquery

Table Structures:
tblCustomer
Customer_id created field1 field2 cardno
--------------------------------------------------------
1014 Test1 Cell Phone 123146 1234567890
1015 Test2 Email abc#xyz.com 2345678891
tbl_TransactionDishout
Trnx_id offerNo TerminalID Created cardno
-------------------------------------------------------------------
1 1014 170924690436418 2010-05-25 12:51:59.547 1234567890
Is it possible to get the result as below date-wise records:
Enrolled Enrolled as Email Enrolled as Text Deals Redeemed
<First Date> 7 5 2 6
<Next Date> 9 3 6 14
My current query is something like this:
select created,
count(field1) Enrolled,
count(case field1 when 'E-mail' then 1 end) Enrolled_as_Email,
count(case field1 when 'Cell Phone' then 1 end) Enrolled_as_Cell
from tblCustomer c
group by created
order by created desc
But It's giving me the result of the date contained only in tblCustomer table..
Now, How to get Deals_redeemed..?
relation between tbl_transaction and tblCustomer is having same cardno...
In my understanding table tbl_TransactionDishout is an offer, and if it is followed through a record will be inserted into tblCustomers; if not, nothing will change. So deals redeemed is count of non-existing records in tblCustomers for given cardno:
select t.created,
count(c.field1) Enrolled,
count(case c.field1 when 'E-mail' then 1 end) Enrolled_as_Email,
count(case c.field1 when 'Cell Phone' then 1 end) Enrolled_as_Cell,
count(case when c.field1 is null then 1 end) [Deals Redeemed]
from tbl_TransactionDishout t left join tblCustomer c
on t.cardno = c.cardno
group by t.created
order by t.created desc
EDIT: c.created changed to t.created

Resources