SQL SUM filter query - sql-server

We have a table with the following information:
table1
I need T-SQL code that receives a "start date" and an "end date" and generates the invoice total, grouped by customer id and invoice type, of all invoices generated within the date range AND the total invoiced for that customer (credit and cash combined).
For instance, the result if we provide start date 01-10-2012 and end date 10-11-2012 should be:
result_table
This is what I have:
DECLARE @startdate DATE, @enddate DATE
SET @startdate = '01-10-2012'
SET @enddate = '10-11-2012'
SELECT CustomerId, InvoiceType, SUM(Total) As Total
FROM Invoices
WHERE Date BETWEEN @startdate AND @enddate
GROUP BY CustomerID, InvoiceType
It works fine, but I am unable to come up with a way to calculate the "total2" column, since I'm already grouping rows by "invoicetype".
Help, please.
Thank you.

This should work:
SELECT CustomerId, InvoiceType, SUM(Total) As Total
, (
SELECT SUM(Total) FROM Invoices t2 WHERE t2.CustomerId=t1.CustomerId
) AS Total2
FROM Invoices t1
WHERE Date BETWEEN @startdate AND @enddate
GROUP BY CustomerID, InvoiceType
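
Note that the correlated subquery above sums every invoice for the customer, not only those inside the date range. If Total2 is meant to cover the same date range as Total, a windowed aggregate over the grouped rows is one possible alternative. This is a minimal sketch that assumes the Invoices table and columns from the question; the date literals are copied from the question, and how they are read depends on the session's DATEFORMAT setting.

```sql
-- Sketch: per-customer total restricted to the same date range.
-- Assumes the Invoices table and column names shown in the question.
DECLARE @startdate DATE = '01-10-2012', @enddate DATE = '10-11-2012';

SELECT CustomerId,
       InvoiceType,
       SUM(Total) AS Total,
       -- Inner SUM is the per-group total; the outer windowed SUM adds
       -- those group totals together for each customer.
       SUM(SUM(Total)) OVER (PARTITION BY CustomerId) AS Total2
FROM Invoices
WHERE [Date] BETWEEN @startdate AND @enddate
GROUP BY CustomerId, InvoiceType;
```

Because the window is evaluated after WHERE and GROUP BY, Total2 here only counts invoices that passed the date filter.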

Related

Build timeline from start and end dates

I have a subscription table with a user ID, a subscription start date and a subscription end date. I also have a calendar table with a datestamp field, that is every single date starting from the first subscription date in my subscription table.
I am trying to write something that would give me a table with a date column and three numbers: number of total active (on that day), number of new subscribers, number of unsubscribers.
(N.B. I tried to insert sample tables using the suggested GitHub Flavoured Markdown but it just all goes into one row.)
Currently I am playing with a query that creates multiple joins between the two tables, one for each number:
select a.datestamp
,count(distinct case when b_sub.UserID is not null then b_sub.UserID end) as total_w_subscription
,count(distinct case when b_in.UserID is not null then b_in.UserID end) as total_subscribed
,count(distinct case when b_out.UserID is not null then b_out.UserID end) as total_unsubscribed
from Calendar as a
left join Subscription as b_sub -- all those with subscription on given date
on b_sub.sub_dt <= a.datestamp
and (b_sub.unsub_dt > a.datestamp or b_sub.unsub_dt is null)
left join Subscription as b_in -- all those that subscribed on given date
on b_in.sub_dt = a.datestamp
left join Subscription as b_out -- all those that unsubscribed on given date
on b_out.unsub_dt = a.datestamp
where a.datestamp > '2021-06-10'
group by a.datestamp
order by datestamp asc
;
I have indexed the date fields in both tables. If I only look at one day, it runs in 3 seconds. Two days already takes forever. The Sub table is over 2.6M records and ideally I'll need my timeline to begin sometime in 2012.
What would be the most time efficient way to do this?
You're on the right track. I created some table variables and assumed a data structure that has each subscription include a start and end date.
--Create @dates table variable for calendar
DECLARE @startDate DATETIME = '2018-01-01'
DECLARE @endDate DATETIME = '2021-06-18'
DECLARE @dates TABLE
(
reportingdate DATETIME
)
WHILE @startDate <= @endDate
BEGIN
INSERT INTO @dates SELECT @startDate
SET @startDate += 1
END
--Create @subscriptions table variable for subscriptions to join onto calendar
DECLARE @subscriptions TABLE
(
id INT
,startDate DATETIME
,endDate DATETIME
)
INSERT INTO @subscriptions
VALUES
(1,'2018-01-01 00:00:00.000','2019-10-07 00:00:00.000')
,(2,'2018-01-11 00:00:00.000','2019-12-21 00:00:00.000')
,(3,'2019-04-21 00:00:00.000','2020-03-19 00:00:00.000')
,(4,'2019-12-09 00:00:00.000','2020-05-14 00:00:00.000')
,(5,'2020-04-26 00:00:00.000','2020-07-06 00:00:00.000')
,(6,'2020-05-02 00:00:00.000',NULL)
,(7,'2020-08-31 00:00:00.000','2020-10-29 00:00:00.000')
,(8,'2020-12-13 00:00:00.000','2021-01-13 00:00:00.000')
,(9,'2021-02-12 00:00:00.000','2021-04-19 00:00:00.000')
,(10,'2021-06-10 00:00:00.000',NULL)
;
Then I join the subscription onto the calendar table.
--CTE to join subscription onto calendar and use ROW_NUMBER functions
WITH cte AS (
SELECT
s.id AS SubID
,d.ReportingDate
,ROW_NUMBER() OVER (PARTITION BY s.id ORDER BY d.ReportingDate) AS asc_rn --used to identify 1st
,ROW_NUMBER() OVER (PARTITION BY s.id ORDER BY d.ReportingDate DESC) AS desc_rn --used to identify last
,CASE WHEN s.endDate IS NULL THEN 1 ELSE 0 END AS ActiveSub
FROM @subscriptions s
LEFT JOIN @dates d ON
d.reportingdate BETWEEN s.startDate AND ISNULL(s.endDate,'9999-12-31')
)
I used ROW_NUMBER to identify the first and last date rows of the subscription, as well as checking if the subscription endDate is NULL (still active). I then query the CTE to count subscriptions grouped by day, as well as summing new and terminated subscriptions grouped by day.
--Query CTE using asc_rn, desc_rn, and ActiveSub to identify new subscribers and unsubscribers.
SELECT
ReportingDate
,COUNT(*) AS TotalSubscribers
,SUM(CASE WHEN asc_rn = 1 THEN 1 ELSE 0 END) AS NewSubscribers
,SUM(CASE WHEN desc_rn = 1 AND ActiveSub = 0 THEN 1 ELSE 0 END) AS UnSubscribers
FROM cte
GROUP BY ReportingDate
ORDER BY ReportingDate
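
The CTE above produces one row per subscription per active day, which can get large with 2.6M subscriptions over several years. One possible alternative is to count starts and ends per day and keep a running total of the difference. The following is only a sketch: the table variables @subs and @cal and their sample rows are illustrative, it needs SQL Server 2012 or later for the windowed running SUM, and it assumes the calendar begins on or before the first subscription date (as the question describes). Unlike the BETWEEN join above, a subscription is not counted as active on its end date, which matches the `unsub_dt > datestamp` condition in the question.

```sql
-- Illustrative sample data (hypothetical names and rows).
DECLARE @subs TABLE (id INT, startDate DATE, endDate DATE NULL);
INSERT INTO @subs VALUES
    (1, '2021-06-01', '2021-06-03'),
    (2, '2021-06-02', NULL),
    (3, '2021-06-02', '2021-06-04');

DECLARE @cal TABLE (reportingdate DATE PRIMARY KEY);
INSERT INTO @cal VALUES
    ('2021-06-01'), ('2021-06-02'), ('2021-06-03'),
    ('2021-06-04'), ('2021-06-05');

WITH starts AS (   -- new subscribers per day
    SELECT startDate AS d, COUNT(*) AS n
    FROM @subs
    GROUP BY startDate
), ends AS (       -- unsubscribers per day
    SELECT endDate AS d, COUNT(*) AS n
    FROM @subs
    WHERE endDate IS NOT NULL
    GROUP BY endDate
)
SELECT c.reportingdate,
       -- running total of (starts - ends) up to and including this day
       SUM(ISNULL(s.n, 0) - ISNULL(e.n, 0))
           OVER (ORDER BY c.reportingdate ROWS UNBOUNDED PRECEDING) AS TotalSubscribers,
       ISNULL(s.n, 0) AS NewSubscribers,
       ISNULL(e.n, 0) AS UnSubscribers
FROM @cal AS c
LEFT JOIN starts AS s ON s.d = c.reportingdate
LEFT JOIN ends   AS e ON e.d = c.reportingdate
ORDER BY c.reportingdate;
-- With the sample rows: totals 1, 3, 2, 1, 1; new 1, 2, 0, 0, 0; unsubscribed 0, 0, 1, 1, 0.
```

Each subscription is touched once per aggregate instead of once per active day, so the work grows with the number of subscriptions plus the number of calendar days.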

Select distinct values from one column and sum another column

I have the stored procedure shown here, which calculates the order qty, rejection qty and rejection percentage for each department. My issue is that the order qty should be summed only once per distinct order number, while the rejection qty should be summed over all rows. How can I achieve this?
Stored procedure:
CREATE PROCEDURE orderqtyper
@fromDate Date,
@toDate Date
AS
SELECT
Department,
SUM(Order_Qty) AS [Order Qty],
SUM(Rejection_Qty) AS [Rejection Qty],
FORMAT((SUM(Rejection_Qty) * 100.0 / NULLIF(SUM(Order_Qty), 0) / 100), 'P') AS Percentage
FROM
Semicon_NPD
WHERE
(Date BETWEEN @fromDate AND @toDate)
GROUP BY
Department
ORDER BY
Percentage DESC
Sample table:
Current result:
Expected result (if the order number is the same, it should take the sum of only unique values for order qty but the sum of all rows for rejection qty):
You can use DISTINCT inside the aggregate, as SUM(DISTINCT <column_name>):
create proc orderqtyper
@fromDate Date,
@toDate Date
as
SELECT Department,
SUM(DISTINCT Order_Qty)as [Order Qty],
SUM(Rejection_Qty) as [Rejection Qty],
Format((SUM(Rejection_Qty) * 100.0 / NULLIF(SUM(Order_Qty), 0)/100),'P') AS Percentage
FROM Semicon_NPD
Where (Date between @fromDate and @toDate)
GROUP BY Department
order by Percentage desc
go
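
One caveat with SUM(DISTINCT Order_Qty) is that it removes duplicate quantities, not duplicate orders, so two different orders with the same quantity would be counted once. A possible alternative is to collapse the rows to one per order first and then aggregate per department. This is a sketch under stated assumptions: Order_No is a hypothetical name for the order number column (the question does not show the real one), every row of an order is assumed to carry the same Order_Qty, and the dates are illustrative.

```sql
-- Order_No is a hypothetical column name; replace it with the real one.
DECLARE @fromDate DATE = '2021-01-01', @toDate DATE = '2021-12-31';  -- illustrative

WITH per_order AS (
    SELECT Department,
           Order_No,
           MAX(Order_Qty)     AS Order_Qty,      -- one quantity per order
           SUM(Rejection_Qty) AS Rejection_Qty   -- every rejection row counts
    FROM Semicon_NPD
    WHERE [Date] BETWEEN @fromDate AND @toDate
    GROUP BY Department, Order_No
)
SELECT Department,
       SUM(Order_Qty)     AS [Order Qty],
       SUM(Rejection_Qty) AS [Rejection Qty],
       FORMAT(SUM(Rejection_Qty) * 1.0 / NULLIF(SUM(Order_Qty), 0), 'P') AS Percentage
FROM per_order
GROUP BY Department
-- Sort on the numeric ratio rather than on the formatted string.
ORDER BY SUM(Rejection_Qty) * 1.0 / NULLIF(SUM(Order_Qty), 0) DESC;
```

Here the percentage is computed from the deduplicated order quantity, so the three output columns stay consistent with each other.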

SQL Count number of records that appear more than once a month and aggregate by month

I am going through a ton of records and want to count the number of times an ID is updated more than once a month.
SELECT COUNT(*) AS Frequency, MONTH(Date) AS MM, YEAR(Date) AS YYYY, Id
FROM data
WHERE -- [some filtering]
AND Date <= -- end date
AND Date >= -- start date
GROUP BY Date, Id
HAVING COUNT(*) > 1
ORDER BY MM DESC
Now what happens there is that I am only finding the number of Id's that are updated more than once per day. What I want to do is to group the ID's by month.
I have tried using the column MM in my GROUP BY but then I get error codes stating that these are invalid column selections.
I tried using the following GROUP BY:
GROUP BY DATEPART(MONTH, Date), Id
All I get is continuous 8120 errors and I cannot figure out how I should put this together. Any help would be greatly appreciated.
First let's fix your query so it does proper aggregation:
SELECT COUNT(*) AS Frequency, MONTH(Date) AS MM, YEAR(Date) AS YYYY, Id
FROM data with(NOLOCK)
WHERE -- [some filtering]
GROUP BY MONTH(Date), YEAR(Date), Id
HAVING COUNT(*) > 1
ORDER BY YYYY, MM DESC
This gives you a list of Ids that were updated more than once each month.
Now, if you want to know how many Ids were updated more than once each month, you can add another level of aggregation:
SELECT MM, YYYY, COUNT(*)
FROM (
SELECT COUNT(*) AS Frequency, MONTH(Date) AS MM, YEAR(Date) AS YYYY, Id
FROM data with(NOLOCK)
WHERE -- [some filtering]
GROUP BY MONTH(Date), YEAR(Date), Id
HAVING COUNT(*) > 1
) x
GROUP BY YYYY, MM
ORDER BY YYYY, MM DESC

Max Date between 2 dates

How can I find the latest date in a column, constrained to fall between two dates?
SELECT [Weight]
FROM [weighinevent] w
WHERE [Date] = (SELECT MAX([Date]) WHERE [Date] BETWEEN @StartDate AND @EndDate AND w.[userid] = @userid )
This is what I have. Is that correct?
No, it is not correct. A subquery also needs a FROM clause naming the table it selects from. Alternatively, you can order by the date and take only the first record:
SELECT top 1 Weight
FROM weighinevent
WHERE Date BETWEEN @StartDate AND @EndDate
AND userid = @userid
ORDER BY Date DESC
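
For comparison, the subquery form from the question can also be made to work once the subquery has its own FROM clause. A minimal sketch, assuming the weighinevent table from the question; the variable values and the INT type of @userid are illustrative assumptions.

```sql
-- Illustrative parameter values; the INT type for @userid is an assumption.
DECLARE @StartDate DATE = '2021-01-01',
        @EndDate   DATE = '2021-01-31',
        @userid    INT  = 1;

SELECT w.[Weight]
FROM [weighinevent] AS w
WHERE w.[userid] = @userid
  AND w.[Date] = (SELECT MAX(w2.[Date])          -- latest date in the range
                  FROM [weighinevent] AS w2      -- the FROM clause that was missing
                  WHERE w2.[userid] = @userid
                    AND w2.[Date] BETWEEN @StartDate AND @EndDate);
```

Unlike the TOP 1 version, this returns every row that shares the latest date, so it can yield more than one row if a user has several entries on that date.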

SQL Server group by / count issue

I'm trying to count holiday bookings (B.ID) for each date from 2 days before today to 2 days after.
It works, but my results are split up because I have to introduce the end date of the holiday too, which varies for each start date (holidays have different durations).
This separates out my counts. What I need is one count for each date. Is there a way of working around this? I just want to exclude vwReturnDate from the GROUP BY, but I have to put it there because I've used it in my count.
In English I want - For each [date] count the number of [B.id] where [B.Depart] <= [date] and [vwReturnDate] > [date]
DECLARE @startDate DATE
DECLARE @endDate DATE
SET @startDate = Getdate()-2
SET @endDate = Getdate()+2;
WITH dates(Date) AS
( SELECT @startdate as Date
UNION ALL
SELECT DATEADD(d,1,[Date])
FROM dates
WHERE DATE < @enddate )
SELECT
[Date] as 'Calendar Date',
--CONVERT(VARCHAR(10), [Date],103) AS 'Date'
-- ,CONVERT(CHAR(2), [Date], 113) AS 'Day'
-- ,CONVERT(CHAR(4), [Date], 100) AS 'Month'
-- ,CONVERT(CHAR(4), [Date], 120) AS 'Year',
Case when B.Depart <= [date] AND vwR.ReturnDate >=[date] then count (B.ID) end AS 'Number of holidays live on date'
FROM [dates]
left join Booking B on B.depart=[Date]
inner join Quote Q on Q.ID=B.QuoteID
inner join vwReturnDate vwR on vwR.ID=B.ID
Group by [date], B.depart, vwR.ReturnDate
order by [date]
OPTION (MAXRECURSION 0)
GO
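
One way to read the requirement stated in English above is to move the date conditions into the join itself, so that each calendar date is matched to every booking live on it and only the date needs to appear in GROUP BY. This is a sketch rather than a tested solution: it assumes the Booking and vwReturnDate objects and columns from the query above, and it leaves out the join to Quote because none of its columns are used (add it back if it is needed to filter out bookings without a quote).

```sql
DECLARE @startDate DATE = DATEADD(DAY, -2, GETDATE());
DECLARE @endDate   DATE = DATEADD(DAY,  2, GETDATE());

WITH dates([Date]) AS (
    SELECT @startDate
    UNION ALL
    SELECT DATEADD(DAY, 1, [Date])
    FROM dates
    WHERE [Date] < @endDate
)
SELECT d.[Date]    AS [Calendar Date],
       COUNT(B.ID) AS [Number of holidays live on date]  -- 0 when nothing matches
FROM dates AS d
LEFT JOIN (Booking AS B
           INNER JOIN vwReturnDate AS vwR ON vwR.ID = B.ID)
       ON B.Depart <= d.[Date]          -- holiday has started
      AND vwR.ReturnDate > d.[Date]     -- and has not yet ended
GROUP BY d.[Date]
ORDER BY d.[Date]
OPTION (MAXRECURSION 0);
```

Since COUNT(B.ID) ignores the NULLs produced by the outer join, dates with no live holidays still appear with a count of 0.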
