I am using SQL Server to solve the following question:
My table T1 has the following data:
Date Id Name Rent Number
01/01/2019 1 A 100 10
01/02/2019 1 A 200 30
01/03/2019 1 A 300 40
.
.
.
12/31/2019 1 A 150 25
The data exists for different combinations of Id and Name. I am trying to find the average Rent and Number for each 7-day period:
Final Output
Date Id Name Rent Number
01/01/2019 - 01/07/2019 1 A Avg(rent for 7 days) Avg(Number for 7 days)
01/08/2019 - 01/14/2019 1 A Avg(rent for 7 days) Avg(Number for 7 days)
The final data should be grouped by Id and Name.
My code:
SELECT min(date), Id, Name,
AVG(Rent) as Rent,
AVG(Number) Number,
AVG(AVG(Rent)) OVER (ORDER BY min(date) ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) as AvgRent,
AVG(AVG(Number)) OVER (ORDER BY min(date) ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) as AvgNumber
FROM T1
WHERE date >= '2019-01-01'
AND date < '2019-12-31'
GROUP BY Id, Name
My output has only one row.
You need to group the dates by week, which you can do by finding the difference between '01/01/2019' and your Date column with the DATEDIFF function and then dividing it by 7. Since both the dividend and the divisor are integers, the quotient is an integer as well, which has the effect of grouping your dates into weeks.
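SELECT MIN(Date) AS [Start Date]
, MAX(Date) AS [End Date]
, Id
, Name
, AVG(Rent) AS [Avg Rent]
, AVG(Number) AS [Avg Number]
FROM T1
-- Id and Name appear in the SELECT list without an aggregate, so they must be grouped on too
GROUP BY Id, Name, DATEDIFF(DAY, '01/01/2019', Date) / 7
ORDER BY Id, Name, [Start Date];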
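The bucketing arithmetic can be checked outside SQL Server. Below is a Python sketch (not T-SQL) that groups rows by Id, Name and the day offset from 2019-01-01 integer-divided by 7. The four sample rows are hypothetical, and Python's `//` floors rather than truncates, which only makes a difference for dates before the anchor.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample rows: (Date, Id, Name, Rent, Number)
rows = [
    (date(2019, 1, 1), 1, "A", 100, 10),
    (date(2019, 1, 2), 1, "A", 200, 30),
    (date(2019, 1, 3), 1, "A", 300, 40),
    (date(2019, 1, 8), 1, "A", 150, 25),
]

ANCHOR = date(2019, 1, 1)

def weekly_averages(rows, anchor=ANCHOR):
    """Mimic GROUP BY Id, Name, DATEDIFF(DAY, anchor, Date) / 7."""
    buckets = defaultdict(list)
    for d, id_, name, rent, number in rows:
        week = (d - anchor).days // 7  # 0 for Jan 1-7, 1 for Jan 8-14, ...
        buckets[(id_, name, week)].append((d, rent, number))

    result = []
    for (id_, name, _week), items in sorted(buckets.items()):
        dates = [d for d, _, _ in items]
        # Python returns floats here; T-SQL AVG over an INT column returns an INT
        avg_rent = sum(rent for _, rent, _ in items) / len(items)
        avg_number = sum(number for _, _, number in items) / len(items)
        result.append((min(dates), max(dates), id_, name, avg_rent, avg_number))
    return result

for row in weekly_averages(rows):
    print(row)
```

Each output tuple corresponds to one row of the SQL result: start date, end date, Id, Name and the two averages for that week.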
Company_Name Amount Cumulative Total
---------------------------------------------
Company 6 100 100
Company 6 200 300
Company 6 150 450
Company 7 700 700
Company 7 1100 1800
Company 7 500 2300
How can I do cumulative sum group by company as shown in this example?
First, you need a column that specifies the ordering, because SQL tables represent unordered sets. Let me assume you have such a column.
Then the function is sum() as a window function:
select t.*,
sum(amount) over (partition by company_name order by <ordering col>) as cumulative_total
from t;
Note: this returns the row's own amount (not 0) for the "first" row of each company, so it really is a cumulative sum, which matches the example above. If you instead need 0 on the first row of each company, add a conditional:
select t.*,
(case when row_number() over (partition by company_name order by <ordering col>) = 1
then 0
else sum(amount) over (partition by company_name order by <ordering col>)
end) as cumulative_total
from t;
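For comparison, here is a Python sketch (not SQL) of the same partitioned running total, using the six rows from the example and assuming their listed order is the intended ordering column. The `zero_first` flag is a hypothetical option that mimics the CASE variant.

```python
from itertools import accumulate, groupby

# Rows from the example, assumed to already be in the intended order.
rows = [
    ("Company 6", 100), ("Company 6", 200), ("Company 6", 150),
    ("Company 7", 700), ("Company 7", 1100), ("Company 7", 500),
]

def cumulative_by_company(rows, zero_first=False):
    """Running total per company, like SUM(amount) OVER (PARTITION BY ... ORDER BY ...).

    groupby only merges adjacent rows, so rows must be contiguous per company.
    With zero_first=True the first row of each company gets 0, like the CASE variant.
    """
    result = []
    for company, group in groupby(rows, key=lambda r: r[0]):
        amounts = [amount for _, amount in group]
        for i, (amount, running) in enumerate(zip(amounts, accumulate(amounts))):
            total = 0 if (zero_first and i == 0) else running
            result.append((company, amount, total))
    return result

for row in cumulative_by_company(rows):
    print(row)
```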
I am trying to calculate the number of customers who have been active in the past 3 and 6 months.
SELECT COUNT (DISTINCT CustomerNo)
FROM SalesDetail
WHERE InvoiceDate > (GETDATE() - 180) AND InvoiceDate < (GETDATE() - 90)
SELECT COUNT (DISTINCT CustomerNo)
FROM SalesDetail
WHERE InvoiceDate > (GETDATE() - 90)
However, with the queries above, a customer who has been active both in the last 3 months and in the last 6 months is counted in both results, for example:
Customer A bought once in past 3 months
Customer A bought once in past 6 months too
How do I filter out the customers, so that if customer A has been active in both past 3 and 6 months, he/she will only be counted in the 'active in past 3 months' query and not in the 'active in past 6 months' too.
I would solve this problem this way.
Let us say you have the following table. You might have more columns, but for the result you want we only need customer_id and the date the customer bought something on.
CREATE TABLE [dbo].[customer_invoice](
[id] [int] IDENTITY(1,1) NOT NULL,
[customer_id] [int] NULL,
[date] [date] NULL,
CONSTRAINT [PK_customer_invoice] PRIMARY KEY ([id])
);
I inserted this sample data into the table:
INSERT INTO [dbo].[customer_invoice]
([customer_id]
,[date])
VALUES
(1,convert(date,'2019-12-01')),
(2,convert(date,'2019-11-05')),
(2,convert(date,'2019-08-01')),
(3,convert(date,'2019-07-01')),
(4,convert(date,'2019-04-01'));
Let's not jump straight to the final solution; instead, let's take it one step at a time.
SELECT customer_id, MIN(DATEDIFF(DAY,date,GETDATE())) AS lastActiveDays
FROM customer_invoice GROUP BY customer_id;
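The above query gives you the number of days since each customer was last active. (The values depend on GETDATE(); the ones below correspond to running the query on 2019-12-16.)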
customer_id lastActiveDays
1 15
2 41
3 168
4 259
Now we will use this query as a subquery and add a new column, ActiveWithinCategory, so that in a later step we can group our data by that column.
SELECT customer_id, lastActiveDays,
CASE WHEN lastActiveDays<90 THEN 'active within 3 months'
WHEN lastActiveDays<180 THEN 'active within 6 months'
ELSE 'not active' END AS ActiveWithinCategory
FROM(
SELECT customer_id, MIN(DATEDIFF(DAY,date,GETDATE())) AS lastActiveDays
FROM customer_invoice GROUP BY customer_id
)AS temptable;
This query gives you the following result:
customer_id lastActiveDays ActiveWithinCategory
1 15 active within 3 months
2 41 active within 3 months
3 168 active within 6 months
4 259 not active
Now use the whole query above as a subquery and group the data by ActiveWithinCategory:
SELECT ActiveWithinCategory, COUNT(*) AS NumberofCustomers FROM (
SELECT customer_id, lastActiveDays,
CASE WHEN lastActiveDays<90 THEN 'active within 3 months'
WHEN lastActiveDays<180 THEN 'active within 6 months'
ELSE 'not active' END AS ActiveWithinCategory
FROM(
SELECT customer_id, MIN(DATEDIFF(DAY,date,GETDATE())) AS lastActiveDays
FROM customer_invoice GROUP BY customer_id
)AS temptable
) AS FinalResult GROUP BY ActiveWithinCategory;
And here is your final result:
ActiveWithinCategory NumberofCustomers
active within 3 months 2
active within 6 months 1
not active 1
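The same two steps (last-active days per customer, then a count per category) can be sketched in Python, not SQL, using the sample invoices above. The fixed `TODAY` of 2019-12-16 is an assumption standing in for GETDATE(); it is the date that reproduces the lastActiveDays values shown.

```python
from collections import Counter
from datetime import date

# Sample invoices from the INSERT above: (customer_id, date)
invoices = [
    (1, date(2019, 12, 1)),
    (2, date(2019, 11, 5)),
    (2, date(2019, 8, 1)),
    (3, date(2019, 7, 1)),
    (4, date(2019, 4, 1)),
]

# Assumption: a fixed "today" instead of GETDATE(); 2019-12-16 reproduces
# the lastActiveDays values 15, 41, 168 and 259 shown above.
TODAY = date(2019, 12, 16)

def last_active_days(invoices, today=TODAY):
    """MIN(DATEDIFF(DAY, date, GETDATE())) per customer."""
    days = {}
    for customer, d in invoices:
        diff = (today - d).days
        days[customer] = min(diff, days.get(customer, diff))
    return days

def category(days):
    """The CASE expression: < 90 days, < 180 days, otherwise not active."""
    if days < 90:
        return "active within 3 months"
    if days < 180:
        return "active within 6 months"
    return "not active"

def customers_per_category(invoices, today=TODAY):
    """The outer GROUP BY ActiveWithinCategory with COUNT(*)."""
    return Counter(category(d) for d in last_active_days(invoices, today).values())

print(last_active_days(invoices))
print(customers_per_category(invoices))
```

Because each customer is reduced to a single "most recent" value before bucketing, a customer can only ever land in one category, which is what removes the double counting.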
If you want to achieve the same thing in a MySQL database, here is the final query:
SELECT ActiveWithinCategory, count(*) NumberofCustomers FROM(
SELECT MIN(DATEDIFF(curdate(),date)) AS lastActiveBefore,
IF(MIN(DATEDIFF(curdate(),date))<90,
'active within 3 months',
IF(MIN(DATEDIFF(curdate(),date))<180,'active within 6 months','not active')
) ActiveWithinCategory
FROM customer_invoice GROUP BY customer_id
) AS FinalResult GROUP BY ActiveWithinCategory;
I suspect that you want to do conditional aggregation here:
SELECT
CustomerNo,
COUNT(CASE WHEN InvoiceDate > GETDATE() - 90 THEN 1 END) AS cnt_last_3,
COUNT(CASE WHEN InvoiceDate > GETDATE() - 180 AND InvoiceDate < GETDATE() - 90
THEN 1 END) AS cnt_first_3
FROM yourTable
GROUP BY
CustomerNo;
Here cnt_last_3 is the count over the immediate past 3 months, and cnt_first_3 is the count from the 3 month period starting 6 months ago and ending 3 months ago.
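A Python sketch of that conditional aggregation, assuming hypothetical invoice timestamps and a fixed `now` in place of GETDATE(). An invoice exactly 90 days old falls in neither window, because both comparisons are strict, just as in the SQL.

```python
from datetime import datetime, timedelta

def conditional_counts(invoices, now):
    """Per customer: (cnt_last_3, cnt_first_3), mirroring the two COUNT(CASE ...) columns.

    Like GROUP BY CustomerNo, every customer with any invoice gets an entry,
    even if both counts are 0.
    """
    start_3 = now - timedelta(days=90)
    start_6 = now - timedelta(days=180)
    counts = {}
    for customer, ts in invoices:
        last_3, first_3 = counts.get(customer, (0, 0))
        if ts > start_3:
            last_3 += 1
        elif start_6 < ts < start_3:
            first_3 += 1
        counts[customer] = (last_3, first_3)
    return counts

# Hypothetical invoices relative to a fixed "now"
now = datetime(2020, 1, 1)
invoices = [
    ("A", now - timedelta(days=10)),
    ("A", now - timedelta(days=100)),
    ("B", now - timedelta(days=120)),
    ("C", now - timedelta(days=200)),
]
print(conditional_counts(invoices, now))
```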
If you want distinct counts, add DISTINCT to both columns, like this:
Select
count(distinct Case when dt between getdate() - 90 and getdate() then id else null end) cnt_3_months
,count(distinct Case when dt between getdate() - 180 and getdate() - 90 then id else null end) cnt_6_months
from a
I want to generate a Payroll type query whereby the values in Payroll 1 (say for the previous month) should be included in Payroll 2 (for the current month) Year-to-Date Totals.
This can best be explained with an example:
DECLARE @MyTable TABLE(ID INT IDENTITY, PayrollID INT, Description NVARCHAR(MAX), [Current Month] MONEY)
INSERT INTO @MyTable
VALUES (1,'Basic Salary',100),
(1,'Normal Over Time',50),
(1,'Work on Saturday',150),
(1,'Work on Sunday',200),
(2,'Basic Salary',100)
SELECT * ,SUM([Current Month]) OVER (PARTITION BY Description ORDER BY PayrollID) AS [Month to Date]
FROM @MyTable
When I run the above I get
ID EmployeeID PayrollID Description Current Month Month to Date
1 1 1 Basic Salary 100 100
2 1 1 Normal Over Time 50 50
3 1 1 Work on Saturday 150 150
4 1 1 Work on Sunday 200 200
5 1 2 Basic Salary 100 200
The Year-to-Date running totals are per Description, meaning the Basic Salary category has its own running total, and so do Work on Saturday, Work on Sunday, etc. You will notice that for Basic Salary in Payroll 2 the running Year-to-Date total is 200 (i.e. 100 from Payroll 1 + 100 from Payroll 2).
The challenge I have is that Payroll 1 has data for Basic Salary, Work on Saturday and Work on Sunday, whereas Payroll 2 only has Basic Salary, as the employee did not work on Saturday or Sunday in Payroll 2 (the current month).
However, in the cumulative Year-to-Date column, the data from Payroll 1 (previous month) should still be selected and included in the Year-to-Date running total, something like this:
ID EmployeeID PayrollID Description Current Month Month to Date
1 1 1 Basic Salary 100 100
2 1 1 Normal Over Time 50 50
3 1 1 Work on Saturday 150 150
4 1 1 Work on Sunday 200 200
5 1 2 Basic Salary 100 200
2 1 1 Normal Over Time NULL 50
3 1 1 Work on Saturday NULL 150
4 1 1 Work on Sunday NULL 200
Although the employee did not work on Saturday nor Sunday in the current month (Payroll 2) the running (Year-to-Date) totals for working on a Saturday should be 150 that he/she worked in the previous month (Payroll 1). The same should apply to working on Sunday where the running total in the current month (Payroll 2) should be the 200 that he/she worked in the previous month (Payroll 1).
How do I do that with a simple Select Statement without writing a complicated Procedure?
EDIT:
I have cleaned up the code as follows:
DECLARE @MyTable TABLE(ID INT IDENTITY, EmployeeID INT, PayrollID INT, Description NVARCHAR(MAX), [Current Month] MONEY)
INSERT INTO @MyTable
VALUES (1,1,'Basic Salary',100),
(1,1,'Normal Over Time',50),
(1,1,'Work on Saturday',150),
(1,1,'Work on Sunday',200),
(1,2,'Basic Salary',100)
WITH pay_elements AS
(
SELECT Description
FROM @MyTable
GROUP BY Description
)
,pay_slips AS
(
SELECT EmployeeID, PayrollID
FROM @MyTable
GROUP BY EmployeeID, PayrollID
)
,pay_lines AS
(
SELECT
mt.ID
,PS.EmployeeID
,PS.PayrollID
,PE.Description
,ISNULL(mt.[Current Month], 0) AS [Current Month]
FROM
pay_slips AS ps
OUTER APPLY
pay_elements AS pe
LEFT JOIN
@MyTable AS mt
ON (mt.EmployeeID = ps.EmployeeID)
AND (mt.PayrollID = ps.PayrollID)
AND (mt.Description = pe.Description)
)
SELECT * ,SUM([Current Month]) OVER (PARTITION BY EmployeeID, Description ORDER BY PayrollID) AS [Month to Date]
FROM pay_lines
And I get this error:
Msg 319, Level 15, State 1, Line 10
Incorrect syntax near the keyword 'with'. If this statement is a common table expression, an xmlnamespaces clause or a change tracking context clause, the previous statement must be terminated with a semicolon.
Msg 102, Level 15, State 1, Line 17
Incorrect syntax near ','.
Msg 102, Level 15, State 1, Line 23
Incorrect syntax near ','.
You first need to build a "structure" of row headings, and then join that onto the actual data.
So for example:
-- the leading semicolon terminates the preceding INSERT (the cause of Msg 319 above)
;WITH pay_elements AS
(
SELECT Description
FROM @MyTable
GROUP BY Description
)
,pay_slips AS
(
SELECT EmployeeID, PayrollID
FROM @MyTable
GROUP BY EmployeeID, PayrollID
)
,pay_lines AS
(
SELECT
mt.ID
,ps.EmployeeID
,ps.PayrollID
,pe.Description
,ISNULL(mt.[Current Month], 0) AS [Current Month]
FROM
pay_slips AS ps
OUTER APPLY
pay_elements AS pe
LEFT JOIN
@MyTable AS mt
ON (mt.EmployeeID = ps.EmployeeID)
AND (mt.PayrollID = ps.PayrollID)
AND (mt.Description = pe.Description)
)
SELECT * ,SUM([Current Month]) OVER (PARTITION BY EmployeeID, Description ORDER BY PayrollID) AS [Month to Date]
FROM pay_lines
What we're doing here is getting a list of the different kinds of pay elements in your table. Then we're getting a list of the employees and payrolls done to date, and forcing every payroll to include a row for every possible pay element.
Once that structure is built, we join onto the base table to get the actual values (replacing NULLs with zeros, for those pay elements that weren't originally included in the base table).
Then we simply query this padded-out table in the same way you did originally.
Note, I've written this on the fly and haven't checked this code so please excuse any minor errors.
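The padding idea can be sketched in Python (not T-SQL) using the rows from the question: build every (pay slip, pay element) pair, look up the actual amount or use 0, then keep a running total per employee and description. The dictionary lookup assumes at most one row per slip and element.

```python
# Rows from the question: (EmployeeID, PayrollID, Description, Current Month)
rows = [
    (1, 1, "Basic Salary", 100),
    (1, 1, "Normal Over Time", 50),
    (1, 1, "Work on Saturday", 150),
    (1, 1, "Work on Sunday", 200),
    (1, 2, "Basic Salary", 100),
]

def padded_running_totals(rows):
    """Pad every pay slip with every pay element, then take running totals."""
    # pay_elements: the distinct descriptions
    elements = sorted({desc for _, _, desc, _ in rows})
    # pay_slips: the distinct (EmployeeID, PayrollID) pairs, in PayrollID order
    slips = sorted({(emp, payroll) for emp, payroll, _, _ in rows})
    # Lookup used as the LEFT JOIN; assumes at most one row per slip and element
    actual = {(emp, payroll, desc): amount for emp, payroll, desc, amount in rows}

    running = {}  # PARTITION BY EmployeeID, Description
    result = []
    for emp, payroll in slips:
        for desc in elements:
            amount = actual.get((emp, payroll, desc), 0)  # ISNULL(..., 0)
            running[(emp, desc)] = running.get((emp, desc), 0) + amount
            result.append((emp, payroll, desc, amount, running[(emp, desc)]))
    return result

for line in padded_running_totals(rows):
    print(line)
```

Payroll 2 now has a line for every element, so Work on Saturday and Work on Sunday carry their Payroll 1 totals (150 and 200) forward with a current-month amount of 0.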
I am a little confused by the Year-to-Date column you mention in your description. I assume this is the [Month to Date] column in your query; please correct me if I am wrong.
I think what you are trying to achieve is that the descriptions that are not present in Payroll ID 2, like Work on Saturday and Work on Sunday, should also appear at the bottom of the result set.
The problem is:
There are no Payroll 2 rows for those descriptions, so there is nothing on which to display 50, 150 and 200 in the [Month to Date] column. (SUM itself ignores NULL values, so it is the missing rows, not NULL arithmetic, that cause this.)
You can insert rows for fixed categories against each payroll ID:
Normal Over Time
Work on Saturday
Work on Sunday
Basic Salary
Query:
DECLARE @MyTable TABLE(ID INT IDENTITY, PayrollID INT, Description NVARCHAR(MAX), [Current Month] MONEY)
INSERT INTO @MyTable
VALUES (1,'Basic Salary',100),
(1,'Normal Over Time',50),
(1,'Work on Saturday',150),
(1,'Work on Sunday',200),
(2,'Basic Salary',100),
(2,'Normal Over Time',0),
(2,'Work on Saturday',0),
(2,'Work on Sunday',0)
SELECT * ,SUM([Current Month]) OVER (PARTITION BY Description ORDER BY PayrollID) AS [Month to Date]
FROM @MyTable order by ID,PayrollID
Here is a weird one for you all.
I need to determine the number of days in a Month
;WITH cteNetProfit AS
(
---- NET PROFIT
SELECT DT.CreateDate
, SUM(DT.Revenue) as Revenue
, SUM(DT.Cost) as Cost
, SUM(DT.GROSSPROFIT) AS GROSSPROFIT
FROM
(
SELECT CAST([createDTG] AS DATE) as CreateDate
, SUM(Revenue) as Revenue
, SUM(Cost) as Cost
, SUM(REVENUE - COST) AS GROSSPROFIT
FROM [dbo].[CostRevenueSpecific]
WHERE CAST([createDTG] AS DATE) > CAST(GETDATE() - 91 AS DATE)
AND CAST([createDTG] AS DATE) <= CAST(GETDATE() - 1 AS DATE)
GROUP BY createDTG
UNION ALL
SELECT CAST([CallDate] AS DATE) AS CreateDate
, SUM(Revenue) as Revenue
, SUM(Cost) as Cost
, SUM(REVENUE - COST) AS GROSSPROFIT
FROM abc.PublisherCallByDay
WHERE CAST([CallDate] AS DATE) > CAST(GETDATE() - 91 AS DATE)
AND CAST([CallDate] AS DATE) <= CAST(GETDATE() - 1 AS DATE)
GROUP BY CALLDATE
) DT
GROUP BY DT.CreateDate
)
select distinct MONTH(CREATEDATE), DateDiff(Day,CreateDate,DateAdd(month,1,CreateDate))
FROM cteNetProfit
For some reason it returns two different results for March 2016: one is 30 and the other is 31 (which, of course, is correct). I validated that the underlying data only has 31 days' worth of data for March. Since 2016 is a leap year, can this affect the DATEDIFF function? The remaining months return the correct number.
2 29
3 31
3 30
4 30
5 31
Thanks for the input, however, I found the solution elsewhere
select Distinct MONTH(CREATEDATE), Day(EOMONTH(CreateDate))
FROM cteNetProfit
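The difference comes when you hit the 2016-03-31 date. If you run the query below for 2016-03-30 and 2016-03-31, the result of adding 1 MONTH with DATEADD is 2016-04-30 in both cases: when the target month is shorter, DATEADD returns the last day of that month. So DATEDIFF(DAY, '2016-03-30', '2016-04-30') returns 31, while DATEDIFF(DAY, '2016-03-31', '2016-04-30') returns 30, which is where the extra March row comes from.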
SELECT DATEADD(MONTH,1,'2016-03-30') , DATEADD(MONTH,1,'2016-03-31')
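To see the month-end clamping in isolation, here is a Python sketch. `add_months` is a hypothetical helper written to mimic the DATEADD(MONTH, ...) behaviour described above, and `calendar.monthrange` plays the role of DAY(EOMONTH(...)).

```python
import calendar
from datetime import date

def add_months(d, months):
    """Hypothetical helper: add calendar months, clamping to the target month's last day."""
    index = d.month - 1 + months
    year, month = d.year + index // 12, index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))

def days_via_dateadd(d):
    """The question's DATEDIFF(Day, CreateDate, DateAdd(month, 1, CreateDate))."""
    return (add_months(d, 1) - d).days

def days_in_month(d):
    """The answer's DAY(EOMONTH(CreateDate))."""
    return calendar.monthrange(d.year, d.month)[1]

for d in (date(2016, 3, 30), date(2016, 3, 31)):
    print(d, add_months(d, 1), days_via_dateadd(d), days_in_month(d))
```

Both dates map to 2016-04-30, so the DATEADD-based count differs (31 vs 30) while the month-length lookup stays at 31.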
This syntax seemed to work (courtesy of https://raresql.com/2013/01/06/sql-server-get-number-of-days-in-month/).
SELECT DAY(DATEADD(ms,-2,DATEADD(MONTH, DATEDIFF(MONTH,0,@DATE)+1,0))) AS [Current Month]
This is the input table:
Customer_ID Date Amount
1 4/11/2014 20
1 4/13/2014 10
1 4/14/2014 30
1 4/18/2014 25
2 5/15/2014 15
2 6/21/2014 25
2 6/22/2014 35
2 6/23/2014 10
There is information pertaining to multiple customers and I want to get a rolling sum across a 3 day window for each customer.
The solution should be as below:
Customer_ID Date Amount Rolling_3_Day_Sum
1 4/11/2014 20 20
1 4/13/2014 10 30
1 4/14/2014 30 40
1 4/18/2014 25 25
2 5/15/2014 15 15
2 6/21/2014 25 25
2 6/22/2014 35 60
2 6/23/2014 10 70
The biggest issue is that I don't have transactions for every day, so the ROW_NUMBER()-based window doesn't work.
The closest example I found on SO was:
SQL Query for 7 Day Rolling Average in SQL Server
but even in that case there were transactions every day, which made the ROW_NUMBER()-based solutions work.
The ROW_NUMBER() query is as follows:
select customer_id, Date, Amount,
Rolling_3_day_sum = CASE WHEN ROW_NUMBER() OVER (partition by customer_id ORDER BY Date) > 2
THEN SUM(Amount) OVER (partition by customer_id ORDER BY Date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
END
from #tmp_taml9
order by customer_id
I was wondering if there is a way to replace "BETWEEN 2 PRECEDING AND CURRENT ROW" with "BETWEEN [DATE - 2] AND [DATE]".
One option would be to use a calendar table (or something similar) to get the complete range of dates and left join your table with that and use the row_number based solution.
Another option that might work (not sure about performance) would be to use an apply query like this:
select customer_id, Date, Amount, coalesce(Rolling_3_day_sum, Amount) Rolling_3_day_sum
from #tmp_taml9 t1
cross apply (
select sum(amount) Rolling_3_day_sum
from #tmp_taml9
where Customer_ID = t1.Customer_ID
-- a 3-day window is the current date plus the 2 days before it
and datediff(day, date, t1.date) <= 2
and t1.Date >= date
) o
order by customer_id, Date;
I suspect performance might not be great though.
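For reference, here is a Python sketch (not T-SQL) of the date-range window the CROSS APPLY implements: for each row, sum the same customer's amounts dated from two days before through the row's own date. It uses the question's sample data and, like the correlated subquery, rescans the rows for every row, so it is O(n²).

```python
from datetime import date, timedelta

# Sample data from the question: (Customer_ID, Date, Amount)
rows = [
    (1, date(2014, 4, 11), 20),
    (1, date(2014, 4, 13), 10),
    (1, date(2014, 4, 14), 30),
    (1, date(2014, 4, 18), 25),
    (2, date(2014, 5, 15), 15),
    (2, date(2014, 6, 21), 25),
    (2, date(2014, 6, 22), 35),
    (2, date(2014, 6, 23), 10),
]

def rolling_sum(rows, window_days=3):
    """For each row, sum that customer's amounts dated in [Date - (window_days - 1), Date]."""
    result = []
    for customer, day, amount in rows:
        start = day - timedelta(days=window_days - 1)
        total = sum(a for c, d, a in rows if c == customer and start <= d <= day)
        result.append((customer, day, amount, total))
    return result

for row in rolling_sum(rows):
    print(row)
```

The totals match the expected Rolling_3_Day_Sum column (20, 30, 40, 25, 15, 25, 60, 70). A window of `<= 3` days back would instead pull 4/11 into the 4/14 total.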