Aggregate function help - mssql - sql-server

I need some help getting sales by month for a customer and his agent channel.
I have a customer table that looks something like this:
customer table:
cust_id bigint,
agent_id bigint,
name varchar(200),
customer_level int,
date_signed datetime
customer_level = 1: a customer who can transact; agent_id is NULL
customer_level = 2: an agent of a level-1 customer who can also transact; the agent's agent_id equals the level-1 customer's cust_id
Transaction table:
tx_id bigint,
tx_date datetime,
description varchar(200),
amount money,
cust_id bigint
SQL
SELECT datepart(month, t.tx_date) AS TX_MONTH
,CASE
WHEN c.customer_level = 2
THEN a.NAME
ELSE c.NAME
END AS CUSTOMER
,count(t.amount)
,sum(t.amount)
FROM TRANSACTION t
INNER JOIN customer c ON t.cust_id = c.cust_id
LEFT JOIN customer a ON c.agent_id = a.cust_id
WHERE t.tx_date >= '2014-01-01 00:00:00.000'
GROUP BY datepart(month, t.tx_date)
ORDER BY 1 ASC
===
For starters, this SQL won't run as written: the CASE expression uses c.name and a.name, so they would have to appear in the GROUP BY.
Even with that fixed, it would not give the intended result, which is monthly sales for a customer together with his agent channel.
HELP!

Assumptions I'm making:
You will likely want the year in the result set in case this report is run across a year boundary.
You may want to pass parameters to this query. I'm going to use @CustID BIGINT, @StartDate DATE, and @EndDate DATE to limit the rows, which helps performance on large result sets.
You will need to group by the same expression used in the select clause.
Try this:
SELECT MONTH(t.tx_date) AS TX_MONTH, YEAR(t.tx_date) AS TX_YEAR
,CASE
WHEN c.customer_level = 2
THEN a.NAME
ELSE c.NAME
END AS CUSTOMER
,count(t.amount) AS TX_COUNT
,sum(t.amount) AS TX_TOTAL
FROM [TRANSACTION] t -- TRANSACTION is a reserved word, so it needs brackets
INNER JOIN customer c ON t.cust_id = c.cust_id
LEFT JOIN customer a ON c.agent_id = a.cust_id
WHERE CAST(t.tx_date AS DATE) BETWEEN @StartDate AND @EndDate
AND c.cust_id = @CustID
GROUP BY YEAR(t.tx_date), MONTH(t.tx_date)
,CASE
WHEN c.customer_level = 2
THEN a.NAME
ELSE c.NAME
END
ORDER BY YEAR(t.tx_date), MONTH(t.tx_date)
This should give you a result set with the year and month of the transactions, the agent or customer name, and the count and sum of the amounts, for a given customer between two dates.
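One thing to watch: filtering on c.cust_id alone returns only the customer's own transactions. Below is a rough sketch of a variant that also picks up the level-2 agents, assuming (as described in the question) that an agent's agent_id holds the level-1 customer's cust_id. The parameter values and the column aliases CHANNEL_OWNER and SOLD_BY are made up for illustration, and the parameters are written with the usual @ prefix.

```sql
-- Hypothetical parameter values, for illustration only.
DECLARE @CustID    BIGINT = 1;
DECLARE @StartDate DATE   = '2014-01-01';
DECLARE @EndDate   DATE   = '2014-12-31';

SELECT YEAR(t.tx_date)          AS TX_YEAR
      ,MONTH(t.tx_date)         AS TX_MONTH
      ,COALESCE(a.name, c.name) AS CHANNEL_OWNER  -- level-1 customer the sale rolls up to
      ,c.name                   AS SOLD_BY        -- the customer or agent who transacted
      ,COUNT(t.amount)          AS TX_COUNT
      ,SUM(t.amount)            AS TX_TOTAL
FROM [TRANSACTION] t
INNER JOIN customer c ON t.cust_id = c.cust_id
LEFT JOIN customer a ON c.agent_id = a.cust_id
WHERE t.tx_date >= @StartDate
  AND t.tx_date <  DATEADD(day, 1, @EndDate)      -- covers the whole end date
  AND (c.cust_id = @CustID OR c.agent_id = @CustID)
GROUP BY YEAR(t.tx_date), MONTH(t.tx_date), COALESCE(a.name, c.name), c.name
ORDER BY TX_YEAR, TX_MONTH, SOLD_BY;
```

Dropping SOLD_BY from both the select list and the GROUP BY collapses the output to one row per month for the whole channel.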

Related

Build timeline from start and end dates

I have a subscription table with a user ID, a subscription start date and a subscription end date. I also have a calendar table with a datestamp field, that is every single date starting from the first subscription date in my subscription table.
I am trying to write something that would give me a table with a date column and three numbers: number of total active (on that day), number of new subscribers, number of unsubscribers.
(N.B. I tried to insert sample tables using the suggested GitHub Flavoured Markdown but it just all goes into one row.)
Currently I am playing with a query that creates multiple joins between the two tables, one for each number:
select a.datestamp
,count(distinct case when b_sub.UserID is not null then b_sub.UserID end) as total_w_subscription
,count(distinct case when b_in.UserID is not null then b_in.UserID end) as total_subscribed
,count(distinct case when b_out.UserID is not null then b_out.UserID end) as total_unsubscribed
from Calendar as a
left join Subscription as b_sub -- all those with subscription on given date
on b_sub.sub_dt <= a.datestamp
and (b_sub.unsub_dt > a.datestamp or b_sub.unsub_dt is null)
left join Subscription as b_in -- all those that subscribed on given date
on b_in.sub_dt = a.datestamp
left join Subscription as b_out -- all those that unsubscribed on given date
on b_out.unsub_dt = a.datestamp
where a.datestamp > '2021-06-10'
group by a.datestamp
order by datestamp asc
;
I have indexed the date fields in both tables. If I only look at one day, it runs in 3 seconds. Two days already takes forever. The Sub table is over 2.6M records and ideally I'll need my timeline to begin sometime in 2012.
What would be the most time efficient way to do this?
You're on the right track. I created some table variables and assumed a data structure that has each subscription include a start and end date.
--Create @dates table variable for calendar
DECLARE @startDate DATETIME = '2018-01-01'
DECLARE @endDate DATETIME = '2021-06-18'
DECLARE @dates TABLE
(
reportingdate DATETIME
)
WHILE @startDate <= @endDate
BEGIN
INSERT INTO @dates SELECT @startDate
SET @startDate += 1
END
--Create @subscriptions table variable for subscriptions to join onto calendar
DECLARE @subscriptions TABLE
(
id INT
,startDate DATETIME
,endDate DATETIME
)
INSERT INTO @subscriptions
VALUES
(1,'2018-01-01 00:00:00.000','2019-10-07 00:00:00.000')
,(2,'2018-01-11 00:00:00.000','2019-12-21 00:00:00.000')
,(3,'2019-04-21 00:00:00.000','2020-03-19 00:00:00.000')
,(4,'2019-12-09 00:00:00.000','2020-05-14 00:00:00.000')
,(5,'2020-04-26 00:00:00.000','2020-07-06 00:00:00.000')
,(6,'2020-05-02 00:00:00.000',NULL)
,(7,'2020-08-31 00:00:00.000','2020-10-29 00:00:00.000')
,(8,'2020-12-13 00:00:00.000','2021-01-13 00:00:00.000')
,(9,'2021-02-12 00:00:00.000','2021-04-19 00:00:00.000')
,(10,'2021-06-10 00:00:00.000',NULL)
;
Then I join the subscription onto the calendar table.
--CTE to join subscription onto calendar and use ROW_NUMBER functions
WITH cte AS (
SELECT
s.id AS SubID
,d.ReportingDate
,ROW_NUMBER() OVER (PARTITION BY s.id ORDER BY d.ReportingDate) AS asc_rn --used to identify 1st
,ROW_NUMBER() OVER (PARTITION BY s.id ORDER BY d.ReportingDate DESC) AS desc_rn --used to identify last
,CASE WHEN s.endDate IS NULL THEN 1 ELSE 0 END AS ActiveSub
FROM @subscriptions s
LEFT JOIN @dates d ON
d.reportingdate BETWEEN s.startDate AND ISNULL(s.endDate,'9999-12-31')
)
I used ROW_NUMBER to identify the first and last date rows of the subscription, as well as checking if the subscription endDate is NULL (still active). I then query the CTE to count subscriptions grouped by day, as well as summing new and terminated subscriptions grouped by day.
--Query CTE using asc_rn, desc_rn, and ActiveSub to identify new subscribers and unsubscribers.
SELECT
ReportingDate
,COUNT(*) AS TotalSubscribers
,SUM(CASE WHEN asc_rn = 1 THEN 1 ELSE 0 END) AS NewSubscribers
,SUM(CASE WHEN desc_rn = 1 AND ActiveSub = 0 THEN 1 ELSE 0 END) AS UnSubscribers
FROM cte
GROUP BY ReportingDate
ORDER BY ReportingDate
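With millions of subscriptions, joining every subscription to every day it was active can produce a very large intermediate result. A possible alternative, sketched here with the table and column names from the question (Calendar.datestamp, Subscription.sub_dt, Subscription.unsub_dt), is to count starts and ends per day once and keep a running total. This is only a sketch under some assumptions: the date columns carry no time part, the calendar begins on or before the first sub_dt, and it counts subscription rows rather than distinct users. It needs SQL Server 2012 or newer for the ordered window aggregate.

```sql
WITH starts AS (
    -- number of subscriptions that begin on each day
    SELECT sub_dt AS dt, COUNT(*) AS new_subs
    FROM Subscription
    GROUP BY sub_dt
),
ends AS (
    -- number of subscriptions that end on each day
    SELECT unsub_dt AS dt, COUNT(*) AS unsubs
    FROM Subscription
    WHERE unsub_dt IS NOT NULL
    GROUP BY unsub_dt
)
SELECT c.datestamp
      ,SUM(ISNULL(s.new_subs, 0) - ISNULL(e.unsubs, 0))
           OVER (ORDER BY c.datestamp ROWS UNBOUNDED PRECEDING) AS total_active
      ,ISNULL(s.new_subs, 0) AS new_subscribers
      ,ISNULL(e.unsubs, 0)   AS unsubscribers
FROM Calendar AS c
LEFT JOIN starts AS s ON s.dt = c.datestamp
LEFT JOIN ends   AS e ON e.dt = c.datestamp
ORDER BY c.datestamp;
```

The running total on a given day equals starts up to that day minus ends up to that day, which matches the question's rule that a subscription is active while sub_dt <= day and unsub_dt is later or NULL. To report only a recent window, wrap the query in a CTE and filter on datestamp in the outer query, so that earlier days still feed the running total.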

SQL Query returning multiple values

I am trying to write a query that returns the time taken by an Order from start to completion.
My table looks like below.
Order No. Action DateTime
111 Start 3/23/2018 8:18
111 Complete 3/23/2018 9:18
112 Start 3/24/2018 6:00
112 Complete 3/24/2018 11:10
Now I am trying to calculate the date difference between start and completion of multiple orders and below is my query:
Declare @StartDate VARCHAR(100), @EndDate VARCHAR(100), @Operation VARCHAR(100)
declare @ORDERTable table
(
order varchar(1000)
)
insert into @ORDERTable values ('111')
insert into @ORDERTable values ('112')
Select @Operation='Boiling'
set @EndDate = (SELECT DATE_TIME from PROCESS WHERE ACTION='COMPLETE' AND ORDER in (select order from @ORDERTable) AND OPERATION=@Operation)
---SELECT @EndDate
set @StartDate = (SELECT DATE_TIME from PROCESS WHERE ACTION='START' AND ORDER in (select order from @ORDERTable) AND OPERATION=@Operation)
---SELECT @StartDate
SELECT DATEDIFF(minute, @StartDate, @EndDate) AS Transaction_Time
So I am able to input multiple orders, but I want to get multiple rows of output as well.
My second question: if I do get multiple records as output, how do I make sure which datediff belongs to which order?
Thanks in advance for your answers.
I am using MSSQL.
You can aggregate by order number and use MAX or MIN with CASE WHEN to get start or end time:
select
order_no,
max(case when action = 'Start' then date_time end) as start_time,
max(case when action = 'Complete' then date_time end) as end_time, -- the sample data uses 'Complete'
datediff(
minute,
max(case when action = 'Start' then date_time end),
max(case when action = 'Complete' then date_time end)
) as transaction_time
from process
group by order_no
order by order_no;
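To see the conditional aggregation at work, here is a small self-contained sketch that loads the four sample rows from the question into a table variable. The name @process and its column names are assumptions for the demo, and the timestamps are written in ISO format so they do not depend on the session's date settings.

```sql
-- Sample rows from the question, in a hypothetical table variable.
DECLARE @process TABLE (order_no int, action varchar(100), date_time datetime);

INSERT INTO @process (order_no, action, date_time)
VALUES (111, 'Start',    '2018-03-23T08:18:00'),
       (111, 'Complete', '2018-03-23T09:18:00'),
       (112, 'Start',    '2018-03-24T06:00:00'),
       (112, 'Complete', '2018-03-24T11:10:00');

-- One row per order: each CASE picks out a single action's timestamp.
SELECT order_no,
       DATEDIFF(minute,
                MAX(CASE WHEN action = 'Start'    THEN date_time END),
                MAX(CASE WHEN action = 'Complete' THEN date_time END)) AS transaction_time
FROM @process
GROUP BY order_no
ORDER BY order_no;
-- order 111 -> 60 minutes, order 112 -> 310 minutes
```

Because the result is grouped by order_no, each duration is tied to its order, which also answers the second question.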
You can split your table into two temp tables, CTEs, or whatever you prefer, and then join them together to find the minutes it took to complete:
DECLARE @table1 TABLE (OrderNO INT, Action VARCHAR(100), datetime datetime)
INSERT INTO @table1 (OrderNO, Action, datetime)
VALUES
(111 ,'Start' ,'3/23/2018 8:18'),
(111 ,'Complete' ,'3/23/2018 9:18'),
(112 ,'Start' ,'3/24/2018 6:00'),
(112 ,'Complete' ,'3/24/2018 11:10')
;with cte_start AS (
SELECT orderno, Action, datetime
FROM @table1
WHERE Action = 'Start')
, cte_complete AS (
SELECT orderno, Action, datetime
FROM @table1
WHERE Action = 'Complete')
SELECT
start.OrderNO, DATEDIFF(minute, start.datetime, complete.datetime) AS duration
FROM cte_start start
INNER JOIN cte_complete complete
ON start.OrderNO = complete.OrderNO
Why don't you approach this problem with a set-based solution? After all, that's what an RDBMS is for. Assuming the orders of interest are in a table variable like the one you described, @ORDERTable([Order]), it would go something along the lines of:
SELECT DISTINCT
[Order No.]
, DATEDIFF(
minute,
FIRST_VALUE([DateTime]) OVER (PARTITION BY [Order No.] ORDER BY [DateTime] ASC),
FIRST_VALUE([DateTime]) OVER (PARTITION BY [Order No.] ORDER BY [DateTime] DESC)
) AS Transaction_Time
FROM tableName
WHERE [Order No.] IN (SELECT [Order] FROM @ORDERTable); -- Order is a reserved word, hence the brackets
This query works if all the values in the Action attribute are either Start or Complete, but also if there are others in between them.
To read up more on the FIRST_VALUE() window function, check out the documentation.
NOTE: works in SQL Server 2012 or newer versions.

How to get @StartDate - 30 in SQL Server 2014?

I have a query that I use to generate statements showing the Amount Due for the month, which is calculated from the date parameters @StartDate and @EndDate.
In the statement, I would like to add the Amount Due from the previous month (the previous month's balance owing) for the date range @StartDate - 30 to @EndDate - 30. What would be the code for that?
My code:
set nocount on
Declare @S AS DateTime = ISNULL(@StartDate,DateAdd(d,-60,GETDATE()))
Declare @anum as nvarchar(8) = ISNULL(@panum,'25991275')
Declare @E AS DateTime = ISNULL(@EndDate,DateAdd(d,-0,GETDATE()))
SELECT A.AccountNumber
,C.FirstName + ' ' + C.LastName CustName
,[InvoiceNumber]
,[StatementDate]
,[NewCharges]
,[AmountDue]
,[Charges]
,[AccountFee]
,[Interest]
,[Payments]
,[Refunds]
,[DueDate]
FROM [StatementSummary] S
INNER JOIN Account A ON S.AccountID = A.Accountid
INNER JOIN Contact C ON A.AccountId = C.AccountId
WHERE A.AccountNumber = @anum
AND StatementDate >= @S
AND StatementDate <= @E
ORDER BY StatementDate DESC
I was thinking of making another Dataset to run the following code:
SELECT Top 1 AcctBalance
FROM [FinMaster]
WHERE AcctID = @anum
AND BusinessDay >= @S - 30
AND BusinessDay <= @E - 30
ORDER BY AcctBalance DESC
How do I shift the date range back to the previous month?
If I could fold this second query into the first one, I wouldn't need to create a second dataset for my report.
You can use OUTER APPLY and the EOMONTH function to get last month's values.
This is just the logic; it does not use your fields:
declare @reportdate date = getdate()
select a.*, x.field....
from table1 A
OUTER apply ( -- last month's data; can be NULL, like a left outer join evaluated per outer row
select b.field1, b.field2, b....
from table1 B
where
b.product_id = a.product_id and
b.trans_date
between -- last month, based on @reportdate
dateadd(day,1,eomonth(dateadd(month,-2,@reportdate))) -- or a.trans_date
and
eomonth(dateadd(month,-1,@reportdate))
) x
where a.trans_date
between -- your reporting month, can be any date
dateadd(day,1,eomonth(dateadd(month,-1,@reportdate)))
and eomonth(@reportdate)
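Applied to the tables named in the question, the same idea might look like the sketch below. It is not tested against the real schema: the parameter values are hypothetical, the variables @PrevS and @PrevE and the alias PrevAmountDue are invented for the example, and the FinMaster filter on AcctID is copied from the question as-is. DATEADD does the date arithmetic explicitly instead of relying on datetime minus integer.

```sql
-- Hypothetical report parameters; the report would normally supply these.
DECLARE @StartDate datetime    = '2024-03-01';
DECLARE @EndDate   datetime    = '2024-03-31';
DECLARE @anum      nvarchar(8) = '25991275';

-- Current window, and the same window moved back 30 days.
DECLARE @S     datetime = @StartDate;
DECLARE @E     datetime = @EndDate;
DECLARE @PrevS datetime = DATEADD(day, -30, @S);
DECLARE @PrevE datetime = DATEADD(day, -30, @E);

SELECT A.AccountNumber
      ,S.StatementDate
      ,S.AmountDue
      ,P.AcctBalance AS PrevAmountDue  -- NULL when no FinMaster row falls in the earlier window
FROM StatementSummary S
INNER JOIN Account A ON S.AccountID = A.AccountId
OUTER APPLY (
    SELECT TOP (1) F.AcctBalance
    FROM FinMaster F
    WHERE F.AcctID = @anum
      AND F.BusinessDay >= @PrevS
      AND F.BusinessDay <= @PrevE
    ORDER BY F.AcctBalance DESC
) P
WHERE A.AccountNumber = @anum
  AND S.StatementDate >= @S
  AND S.StatementDate <= @E
ORDER BY S.StatementDate DESC;
```

If "previous month" should mean a calendar month rather than 30 days, DATEADD(month, -1, @S) and DATEADD(month, -1, @E) can replace the two day-based expressions.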

How can I select a substring based on day within a date range

I'm trying to create a script that calculates the expected results from a stored procedure. There are several tables related to the sp that share a BatchId. x_NonFullTimeEmployees has a StatusString column whose length is equal to the number of days between MeasurementStartDate and MeasurementEndDate, e.g. for a 7-day period it may look like 'TAAAAAA'. I am selecting time cards in the same period and summing their values. My problem is that I only want to use the TimeCard values where the StartDate falls on a day represented by 'A' in the StatusString. How can I do this?
DECLARE @batchid INT = 1;
WITH CTE_ACTIVENFS
AS
(
select e.EmployeeId,e.OrganizationId,e.MeasurementStartDate, e.MeasurementEndDate
from x_VHEActiveNonFullTimeEmployees e
where BatchId = @batchid
)
,
CTE_RESULTS
AS
(
SELECT tc.OrganizationId ,tc.EmployeeId, SUM(tc.workhour) AS "Total Paid Hours",
DATEDIFF(month, (SELECT TOP 1 StartDate FROM TimeCard WHERE EmployeeId = tc.EmployeeId),(SELECT TOP 1 StartDate FROM TimeCard WHERE EmployeeId = tc.EmployeeId ORDER BY StartDate DESC))
AS "Total Paid Period",
SUM(tc.workhour)/ DATEDIFF(month, (SELECT TOP 1 StartDate FROM TimeCard WHERE EmployeeId = tc.EmployeeId),(SELECT TOP 1 StartDate FROM TimeCard WHERE EmployeeId = tc.EmployeeId ORDER BY StartDate DESC))
AS "Average Worked Hours"
FROM TimeCard tc INNER JOIN CTE_ACTIVENFS hire ON hire.EmployeeId = tc.EmployeeId
WHERE tc.EmployeeId IN (SELECT EmployeeId FROM CTE_ACTIVENFS)
AND tc.StartDate BETWEEN (SELECT TOP 1 MeasurementStartDate FROM CTE_ACTIVENFS) AND (SELECT TOP 1 MeasurementEndDate FROM CTE_ACTIVENFS)
AND tc.OrganizationId = (SELECT TOP 1 OrganizationId FROM CTE_ACTIVENFS)
GROUP BY tc.EmployeeId, tc.OrganizationId
)
SELECT * FROM CTE_RESULTS
First, I'd like to say that your query is really a mess. All the SELECT TOP 1 subqueries should be turned into joins. Now, for your question, I'd do it something like this:
select *
from x_NonFullTimeEmployees hire
inner join timecard tc
on tc.EmployeeId = hire.EmployeeId
and tc.StartDate between hire.MeasurementStartDate and hire.MeasurementEndDate
and substring(
hire.StatusString,
datediff(dd, hire.MeasurementStartDate, tc.StartDate) +1,
1) = 'A'
where hire.BatchId = @batchid
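The SUBSTRING position is simply the day offset from MeasurementStartDate plus one, because SUBSTRING is 1-based. A tiny standalone illustration, using a made-up start date and the 'TAAAAAA' string from the question:

```sql
-- Hypothetical values for illustration only.
DECLARE @MeasurementStartDate date = '2024-01-01';
DECLARE @StatusString varchar(7)   = 'TAAAAAA';

SELECT v.StartDate
      ,DATEDIFF(day, @MeasurementStartDate, v.StartDate) + 1 AS Position
      ,SUBSTRING(@StatusString,
                 DATEDIFF(day, @MeasurementStartDate, v.StartDate) + 1,
                 1) AS StatusChar
FROM (VALUES (CAST('2024-01-01' AS date)),
             (CAST('2024-01-02' AS date)),
             (CAST('2024-01-07' AS date))) AS v(StartDate);
-- 2024-01-01 -> position 1, 'T'
-- 2024-01-02 -> position 2, 'A'
-- 2024-01-07 -> position 7, 'A'
```

So a time card starting on the first day of the period is excluded here (status 'T'), while the other two would be kept.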

SQL Server - Get customers with nth order in specific date range

I'm tasked with the following:
Select a list of all customers who had their nth order during a certain date range (usually a specific month).
This list needs to contain: customer id, sum of first n orders
My tables are something like this:
[dbo.customers]: customerID
[dbo.orders]: orderID, customerID,
orderDate, orderTotal
Here is what I've tried so far:
-- Let's assume our threshold (n) is 10
-- Let's assume our date range is April 2013
-- Get customers that already had n orders before the beginning of the given date range.
DECLARE @tmpcustomers TABLE (tmpcustomerID varchar(8))
INSERT INTO
@tmpcustomers
SELECT
c.customerID
FROM
orders o
INNER JOIN customers c ON o.customerID = c.customerID
WHERE
o.orderDate < '2013-04-01'
GROUP BY c.customerID
HAVING (COUNT(o.orderID) >= 10)
-- Now get all customers that have n orders sometime within the given date range
-- but did not have n orders before the beginning of the given date range.
SELECT
a.customerID, SUM(orderTotal) AS firstTenOrderTotal
SELECT
o.customerID, o.orderID, o.orderTotal
FROM
orders o
INNER JOIN customers c ON c.customerID = o.customerID
WHERE
a.customerID NOT IN ( SELECT tmpcustomerID FROM @tmpcustomers )
AND
o.orderDate > '2013-04-01'
AND
o.orderDate < '2013-05-01'
GROUP BY c.customerID
HAVING COUNT(o.orderID) >= 10
This seems to work but it's clunky and slow. Another big problem is that the firstTenOrderTotal is actually the SUM of the total amount of orders by the end of the given date range and not necessarily the first 10.
Any suggestions for a better approach would be much appreciated.
In the insert into @tmpcustomers, why are you joining back to the customers table? The orders table already has the customerID that you want. Also, why are you looking for orders where the order date is before your date range? Don't you just want customers with more than n orders within a date range? This will make the second query easier.
By only having the customers with n or more orders in the table variable @tmpcustomers, you should be able to join it to the orders table in the second query to get the sum of all the orders for those customers, once again limiting the orders records to your date range (so you do not get orders outside of that range). This removes the HAVING clause and the join to the customers table from your final query.
Give this a try. Depending on your order distribution, it may perform better. In this query I'm assembling the list of orders in the range, and then looking back to count the number of prior orders (also grabbing their orderTotal).
Note: I am assuming the orderID increments as orders are placed.
If this isn't the case, just use a ROW_NUMBER over the order date to project the sequence into the query.
declare @orders table (orderID int primary key identity(1,1), customerID int, orderDate datetime, orderTotal int)
insert into @orders (customerID, orderDate, orderTotal)
select 1, '2013-01-01', 1 union all
select 1, '2013-01-02', 2 union all
select 1, '2013-02-01', 3 union all
select 2, '2013-01-25', 5 union all
select 2, '2013-01-26', 5 union all
select 2, '2013-02-02', 10 union all
select 2, '2013-02-02', 10 union all
select 2, '2013-02-04', 20
declare @N int, @StartDate datetime, @EndDate datetime
select @N = 3,
@StartDate = '2013-02-01',
@EndDate = '2013-02-20'
select o.customerID,
[total] = o.orderTotal + p.total --the nth order + total prior
from @orders o
cross
apply ( select count(*)+1, sum(orderTotal)
from @orders
where customerId = o.customerID and
orderID < o.orderID and
orderDate <= o.orderDate
) p(n, total)
where orderDate between @StartDate and @EndDate and p.n = @N
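The ROW_NUMBER variant mentioned in the note could look roughly like this. It is a sketch on the same sample data, not a tuned solution: it numbers each customer's orders by date (orderID only breaks ties), keeps a running total with a windowed SUM, and then picks the row where the sequence number equals N. The ordered window aggregate needs SQL Server 2012 or newer.

```sql
DECLARE @orders TABLE (orderID int IDENTITY(1,1) PRIMARY KEY, customerID int, orderDate datetime, orderTotal int);

INSERT INTO @orders (customerID, orderDate, orderTotal)
VALUES (1, '2013-01-01', 1), (1, '2013-01-02', 2), (1, '2013-02-01', 3),
       (2, '2013-01-25', 5), (2, '2013-01-26', 5), (2, '2013-02-02', 10),
       (2, '2013-02-02', 10), (2, '2013-02-04', 20);

DECLARE @N int = 3, @StartDate datetime = '2013-02-01', @EndDate datetime = '2013-02-20';

WITH ranked AS (
    SELECT customerID
          ,orderDate
          -- position of this order in the customer's history
          ,ROW_NUMBER() OVER (PARTITION BY customerID ORDER BY orderDate, orderID) AS n
          -- sum of this order and all earlier ones for the same customer
          ,SUM(orderTotal) OVER (PARTITION BY customerID ORDER BY orderDate, orderID
                                 ROWS UNBOUNDED PRECEDING) AS runningTotal
    FROM @orders
)
SELECT customerID, runningTotal AS total
FROM ranked
WHERE n = @N
  AND orderDate BETWEEN @StartDate AND @EndDate;
-- customer 1 -> 6, customer 2 -> 20
```

Each customer has at most one row with n = @N, so no grouping is needed in the outer query.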
Here is my suggestion:
Use Northwind
GO
select ords.OrderID , ords.OrderDate , '<-->' as Sep1 , derived1.* from
dbo.Orders ords
join
(
select CustomerID, OrderID, ROW_NUMBER() OVER(PARTITION BY CustomerID ORDER BY OrderId ASC) AS ThisCustomerCardinalOrderNumber from dbo.Orders /* ASC so that 3 means the customer's third order, not the third most recent */
) as derived1
on ords.OrderID = derived1.OrderID
where
derived1.ThisCustomerCardinalOrderNumber = 3
and ords.OrderDate between '06/01/1997' and '07/01/1997'
EDIT:
I took my CTE example, and reworked it for multiple Customers (seen below).
Give it the college try.
Use Northwind
GO
declare @BeginDate datetime
declare @EndDate datetime
select @BeginDate = '01/01/1900'
select @EndDate = '12/31/2010'
;
WITH
MyCTE /* http://technet.microsoft.com/en-us/library/ms175972.aspx */
( ShipName,ShipAddress,ShipCity,ShipRegion,ShipPostalCode,ShipCountry,CustomerID,CustomerName,[Address],
City,Region,PostalCode,Country,Salesperson,OrderID,OrderDate,RequiredDate,ShippedDate,ShipperName,
ProductID,ProductName,UnitPrice,Quantity,Discount,ExtendedPrice,Freight,ROWID) AS
(
SELECT
ShipName ,ShipAddress,ShipCity,ShipRegion,ShipPostalCode,ShipCountry,CustomerID,CustomerName,[Address]
,City ,Region,PostalCode,Country,Salesperson,OrderID,OrderDate,RequiredDate,ShippedDate,ShipperName
,ProductID ,ProductName,UnitPrice,Quantity,Discount,ExtendedPrice,Freight
, ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY OrderDate , ProductName ASC ) as ROWID /* Note that the ORDER BY (here) is directly related to the ORDER BY (near the very end of the query) */
FROM
dbo.Invoices inv /* “Invoices” is a VIEW, FYI */
where
(inv.OrderDate between @BeginDate and @EndDate)
)
SELECT
/*
ShipName,ShipAddress,ShipCity,ShipRegion,ShipPostalCode,ShipCountry,CustomerID,CustomerName,[Address],
City,Region,PostalCode,Country,Salesperson,OrderID,OrderDate,RequiredDate,ShippedDate,ShipperName,
ProductID,ProductName,UnitPrice,Quantity,Discount,ExtendedPrice,Freight,
*/
/*trim the list down a little for the final output */
CustomerID ,OrderID , OrderDate, (ExtendedPrice + Freight) as ComputedTotal
/*The below line is the “trick”. I reference the above CTE, but only get data that is less than or equal to the row that I am on (outerAlias.ROWID)*/
, (Select SUM (ExtendedPrice + Freight) from MyCTE innerAlias where innerAlias.ROWID <= outerAlias.ROWID and innerAlias.CustomerID = outerAlias.CustomerID) as RunningTotal
, ROWID as ROWID_SHOWN_FOR_KICKS , OrderDate as OrderDate
FROM
MyCTE outerAlias
GROUP BY CustomerID ,OrderID, OrderDate, ProductName,(ExtendedPrice + Freight) ,ROWID,OrderDate
/*Two Order By Options*/
ORDER BY outerAlias.CustomerID , outerAlias.OrderDate , ProductName
/* << Whatever the ORDER BY is here, should match the “ROW_NUMBER() OVER ( ORDER BY ________ ASC )” statement inside the CTE */
/*ORDER BY outerAlias.ROWID */ /* << Or, to keep is more “trim”, ORDER BY the ROWID, which will of course be the same as the “ROW_NUMBER() OVER ( ORDER BY” inside the CTE */
