Calculating Year to Date Total - sql-server

I want to generate a Payroll type query whereby the values in Payroll 1 (say for the previous month) should be included in Payroll 2 (for the current month) Year-to-Date Totals.
This can best be explained with an example:
DECLARE @MyTable TABLE(ID INT IDENTITY, PayrollID INT, Description NVARCHAR(MAX), [Current Month] MONEY)
INSERT INTO @MyTable
VALUES (1,'Basic Salary',100),
(1,'Normal Over Time',50),
(1,'Work on Saturday',150),
(1,'Work on Sunday',200),
(2,'Basic Salary',100)
SELECT * ,SUM([Current Month]) OVER (PARTITION BY Description ORDER BY PayrollID) AS [Month to Date]
FROM @MyTable
When I run the above I get
ID EmployeeID PayrollID Description Current Month Month to Date
1 1 1 Basic Salary 100 100
2 1 1 Normal Over Time 50 50
3 1 1 Work on Saturday 150 150
4 1 1 Work on Sunday 200 200
5 1 2 Basic Salary 100 200
The Year-to-Date running totals are per each Description meaning Basic Salary Category has its own running total and so does Saturday and Sunday etc, etc. You will notice that for Basic Salary in Payroll 2 the running Year-to-Date total is 200 (i.e. 100 from Payroll 1 + 100 from Payroll 2)
The challenge I have is that Payroll 1 has data for Basic Salary, Work on Saturday and Work on Sunday whereas Payroll 2 only has Basic Salary as the employee did not work on Saturday nor on Sunday in Payroll 2 (the current month).
However, in the cumulative Year-to-Date column the data from Payroll 1 (previous month) should still be selected and included in the Year-to-Date running Total -
something like this:
ID EmployeeID PayrollID Description Current Month Month to Date
1 1 1 Basic Salary 100 100
2 1 1 Normal Over Time 50 50
3 1 1 Work on Saturday 150 150
4 1 1 Work on Sunday 200 200
5 1 2 Basic Salary 100 200
2 1 1 Normal Over Time NULL 50
3 1 1 Work on Saturday NULL 150
4 1 1 Work on Sunday NULL 200
Although the employee did not work on Saturday nor Sunday in the current month (Payroll 2) the running (Year-to-Date) totals for working on a Saturday should be 150 that he/she worked in the previous month (Payroll 1). The same should apply to working on Sunday where the running total in the current month (Payroll 2) should be the 200 that he/she worked in the previous month (Payroll 1).
How do I do that with a simple Select Statement without writing a complicated Procedure?
EDIT:
I have cleaned up the code as follows:
DECLARE @MyTable TABLE(ID INT IDENTITY, EmployeeID INT, PayrollID INT, Description NVARCHAR(MAX), [Current Month] MONEY)
INSERT INTO @MyTable
VALUES (1,1,'Basic Salary',100),
(1,1,'Normal Over Time',50),
(1,1,'Work on Saturday',150),
(1,1,'Work on Sunday',200),
(1,2,'Basic Salary',100)
WITH pay_elements AS
(
SELECT Description
FROM @MyTable
GROUP BY Description
)
,pay_slips AS
(
SELECT EmployeeID, PayrollID
FROM @MyTable
GROUP BY EmployeeID, PayrollID
)
,pay_lines AS
(
SELECT
mt.ID
,PS.EmployeeID
,PS.PayrollID
,PE.Description
,ISNULL(mt.[Current Month], 0) AS [Current Month]
FROM
pay_slips AS ps
OUTER APPLY
pay_elements AS pe
LEFT JOIN
@MyTable AS mt
ON (mt.EmployeeID = ps.EmployeeID)
AND (mt.PayrollID = ps.PayrollID)
AND (mt.Description = pe.Description)
)
SELECT * ,SUM([Current Month]) OVER (PARTITION BY EmployeeID, Description ORDER BY PayrollID) AS [Month to Date]
FROM pay_lines
And I get this error:
Msg 319, Level 15, State 1, Line 10
Incorrect syntax near the keyword 'with'. If this statement is a common table expression, an xmlnamespaces clause or a change tracking context clause, the previous statement must be terminated with a semicolon.
Msg 102, Level 15, State 1, Line 17
Incorrect syntax near ','.
Msg 102, Level 15, State 1, Line 23
Incorrect syntax near ','.

You first need to build a "structure" of row headings, and then join that onto the actual data.
So for example:
-- The statement before WITH must end with a semicolon, hence the leading ";"
;WITH pay_elements AS
(
    -- Every kind of pay element that appears anywhere in the table
    SELECT Description
    FROM @MyTable
    GROUP BY Description
)
,pay_slips AS
(
    -- Every employee/payroll combination processed to date
    SELECT EmployeeID, PayrollID
    FROM @MyTable
    GROUP BY EmployeeID, PayrollID
)
,pay_lines AS
(
    -- One row per pay slip per pay element, with 0 where no amount exists
    SELECT
        mt.ID
        ,ps.EmployeeID
        ,ps.PayrollID
        ,pe.Description
        ,ISNULL(mt.[Current Month], 0) AS [Current Month]
    FROM
        pay_slips AS ps
    CROSS JOIN
        pay_elements AS pe
    LEFT JOIN
        @MyTable AS mt
            ON (mt.EmployeeID = ps.EmployeeID)
            AND (mt.PayrollID = ps.PayrollID)
            AND (mt.Description = pe.Description)
)
SELECT * ,SUM([Current Month]) OVER (PARTITION BY EmployeeID, Description ORDER BY PayrollID) AS [Month to Date]
FROM pay_lines
What we're doing here is getting a list of the different kind of pay elements in your table. Then we're getting a list of Employees and Payrolls done to date, and manually forcing every Payroll to include a row in respect of all possible pay elements.
Once that structure is built, we join onto the base table to get the actual values (replacing NULLs with zeros, for those pay elements that weren't originally included in the base table).
Then we simply query this padded-out table in the same way you did originally.
Note, I've written this on the fly and haven't checked this code so please excuse any minor errors.
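The padding idea described above can be tried without a SQL Server instance. The following is a minimal sketch in Python, assuming the standard-library sqlite3 module as a stand-in engine (it needs SQLite 3.25 or later for window functions). The table name my_table, the snake_case column names and the function ytd_rows are hypothetical, and IFNULL takes the place of T-SQL's ISNULL.

```python
import sqlite3

def ytd_rows():
    """Pad every payroll with every pay element, then compute running totals."""
    con = sqlite3.connect(":memory:")
    # Hypothetical stand-in for the table in the question
    con.execute(
        "CREATE TABLE my_table (employee_id INT, payroll_id INT, "
        "description TEXT, current_month INT)"
    )
    con.executemany(
        "INSERT INTO my_table VALUES (?, ?, ?, ?)",
        [
            (1, 1, "Basic Salary", 100),
            (1, 1, "Normal Over Time", 50),
            (1, 1, "Work on Saturday", 150),
            (1, 1, "Work on Sunday", 200),
            (1, 2, "Basic Salary", 100),
        ],
    )
    sql = """
    WITH pay_elements AS (SELECT DISTINCT description FROM my_table),
         pay_slips AS (SELECT DISTINCT employee_id, payroll_id FROM my_table),
         pay_lines AS (
             -- one row per pay slip per pay element, 0 where nothing was paid
             SELECT ps.employee_id, ps.payroll_id, pe.description,
                    IFNULL(mt.current_month, 0) AS current_month
             FROM pay_slips AS ps
             CROSS JOIN pay_elements AS pe
             LEFT JOIN my_table AS mt
               ON mt.employee_id = ps.employee_id
              AND mt.payroll_id = ps.payroll_id
              AND mt.description = pe.description
         )
    SELECT payroll_id, description, current_month,
           SUM(current_month) OVER (
               PARTITION BY employee_id, description ORDER BY payroll_id
           ) AS year_to_date
    FROM pay_lines
    ORDER BY payroll_id, description
    """
    rows = con.execute(sql).fetchall()
    con.close()
    return rows

for row in ytd_rows():
    print(row)
```

With the sample data, Payroll 2 gets a row for each of the four descriptions, and the Saturday and Sunday rows carry the Payroll 1 amounts forward as their running totals.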

I am a little confused by the Year-to-Date column you mention in your description. I assume it is the [Month to Date] column in your query; please correct me if I am wrong.
I think what you are trying to achieve is this: the descriptions that are not present in Payroll 2, such as Work on Saturday and Work on Sunday, should also appear in the result set.
The problem is:
Payroll 2 has no rows for those descriptions, so there is no row on which the [Month to Date] column could show 50, 150 and 200. (SUM skips NULL values, so a row with a NULL [Current Month] would still pick up the earlier amounts; the row just has to exist.)
You can store a fixed set of categories against each payroll ID, with 0 where nothing was earned:
Normal Over Time
Work on Saturday
Work on Sunday
Basic Salary
Query:
DECLARE @MyTable TABLE(ID INT IDENTITY, PayrollID INT, Description NVARCHAR(MAX), [Current Month] MONEY)
INSERT INTO @MyTable
VALUES (1,'Basic Salary',100),
(1,'Normal Over Time',50),
(1,'Work on Saturday',150),
(1,'Work on Sunday',200),
(2,'Basic Salary',100),
(2,'Normal Over Time',0),
(2,'Work on Saturday',0),
(2,'Work on Sunday',0)
SELECT * ,SUM([Current Month]) OVER (PARTITION BY Description ORDER BY PayrollID) AS [Month to Date]
FROM @MyTable order by ID,PayrollID

Related

Need help selecting a record between two date ranges?

I was trying to select a record between two date ranges but I keep getting duplicate record when two date range overlaps as shown below.
Here is an example.
Policy Info
Policy #   Policy Effective Date   Policy termination date   Year
001        2018-10-01              2019-10-01                2018
002        2019-10-01              2020-10-01                2019
003        2020-10-01              2021-10-01                2020
004        2021-10-01              2022-10-01                2022
Policy Limit
LimitID   Effective Date   Termination Date   Limit
1         2018-10-01       2021-10-01         1000
2         2018-10-01       3000-01-01         2500
How can I select Limit ID 1 for Policy # 001, 002 and 003 (that is, for the years 2018, 2019 and 2020), and use Limit ID 2 for any policy with an effective date later than 2021-01-01?
I tried the following, but it keeps creating duplicates:
((limit.effective_from_date < policy.effective_to_date
AND limit.effective_to_date > policy.effective_from_date
)
OR
(limit.effective_from_date = policy.effective_from_date
AND limit.effective_to_date = CONVERT(datetime, '01/01/3000', 102)))
but the above condition creates a duplicate. Is there an effective way of selecting a record within overlapping date ranges?
Any help will be appreciated!
Your problem is that you have overlapping periods in Policy Limit and you need to choose one of them. From what I understand of your data (and I'm inferring a lot), for each policy you need the first limit whose [Policy Limit].[Effective Date] is on or before the [Policy Info].[Policy Effective Date] and whose [Policy Limit].[Termination Date] is on or after the [Policy Info].[Policy termination date].
If all my guessing is correct, you can do something like this:
drop table if exists #PolicyInfo
drop table if exists #PolicyLimit
CREATE TABLE #PolicyInfo (
Policy INT,
Policy_Effective_Date DATE,
Policy_termination_date DATE,
[Year] int
)
CREATE TABLE #PolicyLimit(
LimitID INT,
Effective_Date DATE,
Termination_Date DATE,
Limit INT
)
INSERT INTO #PolicyInfo (Policy, Policy_Effective_Date, Policy_termination_date, [Year])
VALUES
(001, '2018-10-01', '2019-10-01', 2018),
(002, '2019-10-01', '2020-10-01', 2019),
(003, '2020-10-01', '2021-10-01', 2020),
(004, '2021-10-01', '2022-10-01', 2022)
INSERT INTO #PolicyLimit (LimitID, Effective_Date, Termination_Date, Limit)
VALUES
(1, '2018-10-01','2021-10-01',1000),
(2, '2018-10-01','3000-01-01',2500)
;with cte AS (
-- Join PolicyInfo with PolicyLimit
-- condition: Policy_Effective_Date are between Effective_Date, pl.Termination_Date
-- AND
-- Policy_Termination_Date are between Effective_Date, pl.Termination_Date
SELECT *,
-- rank with partition by Policy
ROW_NUMBER() OVER (PARTITION BY [pi].Policy ORDER BY pl.Effective_Date, pl.Termination_Date) rn
FROM #PolicyInfo [pi]
INNER JOIN #PolicyLimit pl ON
[pi].Policy_Effective_Date BETWEEN pl.Effective_Date AND pl.Termination_Date
AND [pi].Policy_termination_date BETWEEN pl.Effective_Date AND pl.Termination_Date
)
SELECT Policy, LimitID
FROM cte
WHERE rn = 1 -- Select the first Limit per partition
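To check which limit the ranking picks for each policy, here is a minimal sketch in Python, assuming the standard-library sqlite3 module as a stand-in for SQL Server (SQLite 3.25 or later is needed for ROW_NUMBER). The table names policy_info and policy_limit, the shortened column names and the function first_limit_per_policy are hypothetical; dates are stored as ISO text, which compares correctly with BETWEEN.

```python
import sqlite3

def first_limit_per_policy():
    """Return (policy, limit_id) pairs, keeping the first covering limit."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE policy_info (policy INT, eff TEXT, term TEXT);
        CREATE TABLE policy_limit (limit_id INT, eff TEXT, term TEXT, amount INT);
        INSERT INTO policy_info VALUES
            (1, '2018-10-01', '2019-10-01'),
            (2, '2019-10-01', '2020-10-01'),
            (3, '2020-10-01', '2021-10-01'),
            (4, '2021-10-01', '2022-10-01');
        INSERT INTO policy_limit VALUES
            (1, '2018-10-01', '2021-10-01', 1000),
            (2, '2018-10-01', '3000-01-01', 2500);
    """)
    sql = """
    WITH ranked AS (
        SELECT p.policy, l.limit_id,
               -- earliest-ending covering limit gets rn = 1
               ROW_NUMBER() OVER (
                   PARTITION BY p.policy ORDER BY l.eff, l.term
               ) AS rn
        FROM policy_info AS p
        JOIN policy_limit AS l
          ON p.eff  BETWEEN l.eff AND l.term
         AND p.term BETWEEN l.eff AND l.term
    )
    SELECT policy, limit_id FROM ranked WHERE rn = 1 ORDER BY policy
    """
    rows = con.execute(sql).fetchall()
    con.close()
    return rows

print(first_limit_per_policy())
```

Policies 1 to 3 fall entirely inside both limits, so the ordering decides; policy 4 ends after limit 1 terminates, so only limit 2 covers it.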

Need to generate rows with missing data in a large dataset - SQL

We are comparing values between months over multiple years. As time moves on the number of years and months in the dataset increases. We are only interested in months where there were values for every year, i.e. a full set.
Consider the following example for 1 month (1) over 3 years (1,2,3) and two activities (101, 102)
Dataset:
Activity Month year Count
------- ---- ------ ------
101 1 1 2
101 1 2 3
101 1 3 1
102 1 1 1
102 1 2 1
In the example above only activity 101 will come into consideration as it satisfies the condition that there must be a count for the activity for month 1 IN year 1, 2 and 3.
Activity 102 doesn't qualify for further analysis as it has no record for year 3.
I would like to generate a record with which I can then evaluate this. The record will effectively generate the new record with the missing row (in this case 102, 1, 3 , 0) to complete the dataset
Activity Month year Count
------- ---- ------ ------
102 1 3 0
We find the problem difficult because the data keeps growing, the number of activities keeps expanding, and it is the combination of activity, year and month that needs to be evaluated.
An elegant solution will be appreciated.
As I mention in my comment, presumably you have both an Activity table and some kind of Calendar table with details of your activities and the years in your system. As such you can therefore do a CROSS JOIN between these 2 objects and then LEFT JOIN to your table to get the data set you want:
--Create sample objects/data
CREATE TABLE dbo.Activity (Activity int); --Obviously your table has more columns
INSERT INTO dbo.Activity (Activity)
VALUES (101),(102);
GO
CREATE TABLE dbo.Calendar (Year int,
Month int);--Likely your table has more columns
INSERT INTO dbo.Calendar (Year, Month)
VALUES(1,1),
(2,1),
(3,1);
GO
CREATE TABLE dbo.YourTable (Activity int,
Year int,
Month int,
[Count] int);
INSERT INTO dbo.YourTable (Activity,Month, Year, [Count])
VALUES(101,1,1,2),
(101,1,2,3),
(101,1,3,1),
(102,1,1,1),
(102,1,2,1);
GO
--Solution
SELECT A.Activity,
C.Month,
C.Year,
ISNULL(YT.[Count],0) AS [Count]
FROM dbo.Activity A
CROSS JOIN dbo.Calendar C
LEFT JOIN dbo.YourTable YT ON A.Activity = YT.Activity
AND C.[Year] = YT.[Year]
AND C.[Month] = YT.[Month]
WHERE C.Month = 1; --not sure if this is needed
If you don't have an Activity and Calendar table (I suggest, however, you should), then you can use subqueries with a DISTINCT, but note this will be far from performant with large data sets:
SELECT A.Activity,
C.Month,
C.Year,
ISNULL(YT.[Count],0) AS [Count]
FROM (SELECT DISTINCT Activity FROM dbo.YourTable) A
CROSS JOIN (SELECT DISTINCT Year, Month FROM dbo.YourTable) C
LEFT JOIN dbo.YourTable YT ON A.Activity = YT.Activity
AND C.[Year] = YT.[Year]
AND C.[Month] = YT.[Month]
WHERE C.Month = 1; --not sure if this is needed

SQL- Finding a gap that is x amount of months with the same foreign key

I am editing this to clarify my question.
Let's say I have a table that holds patient information. I need to find new patients for this year, and the date of the first prescription that made them count as new. Any time there is a six-month gap they are considered a new patient.
How do I accomplish this using SQL? I can do this in Java or any other imperative language easily enough, but I am having problems doing this in SQL. I need this script to be run in Crystal by non-SQL users.
Table:
Patient ID Prescription Date
-----------------------------------------
1 12/31/16
1 03/13/17
2 10/10/16
2 05/11/17
2 06/11/17
3 01/01/17
3 04/20/17
4 01/31/16
4 01/01/17
4 07/02/17
So Patients 2 and 4 are considered new patients. Patient 4 is considered a new patient twice, so I need dates for each time patient 4 was considered new 1/1/17 and 7/2/17. Patients 1 and 3 are not considered new this year.
So far I have the code below which tells me if they are new this year, but not if they had another six month gap this year.
SELECT DISTINCT
this_year.patient_id
,this_year.date
FROM (SELECT
patient_id
,MIN(prescription_date) as date
FROM table
WHERE prescription_date BETWEEN '2017-01-01 00:00:00.000' AND '2017-12-31 00:00:00.000'
GROUP BY [patient_id]) AS this_year
LEFT JOIN (SELECT
patient_id
,MAX(prescription_date) as date
FROM table
WHERE prescription_date BETWEEN '2016-01-01 00:00:00.000' AND '2016-12-31 00:00:00.000'
GROUP BY [patient_id]) AS last_year
ON last_year.patient_id = this_year.patient_id
WHERE DATEDIFF(month, last_year.date, this_year.date) > 6
OR last_year.date IS NULL
Patient 2 in your example does not meet the criteria you specified ... that being said ...
You can try something like this ... untested but should be similar (assuming you can put this in a stored procedure):
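WITH ordered AS
(
    -- Number each patient's prescriptions separately, in date order
    SELECT *, ROW_NUMBER() OVER (PARTITION BY [PatientID] ORDER BY [Prescription Date]) rn
    FROM table1
)
SELECT o1.[PatientID], DATEDIFF(s, o1.[Prescription Date], o2.[Prescription Date]) diff
FROM ordered o1 JOIN ordered o2
    ON o1.[PatientID] = o2.[PatientID] -- compare a prescription only with the same patient's next one
    AND o1.rn + 1 = o2.rn
WHERE DATEDIFF(m, o1.[Prescription Date], o2.[Prescription Date]) > 6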
Replace table1 with the name of your table.
I assume that you mean the patient has not been prescribed in the last 6 months.
SELECT DISTINCT user_id
FROM table_name
WHERE prescribed_date >= DATEADD(month, -6, GETDATE())
This gives you the list of users that have been prescribed in the last 6 months. You want the list of users that are not in this list.
SELECT DISTINCT user_id
FROM table_name
WHERE user_id NOT IN (SELECT DISTINCT user_id
FROM table_name
WHERE prescribed_date >= DATEADD(month, -6, GETDATE()))
You'll need to amend the field and table names.
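Since the question says the logic is easy to state imperatively, here is a minimal Python sketch of the rule it seems to describe, which can serve as a reference when checking any SQL version. It rests on two assumptions that are not confirmed in the thread: a prescription counts as "new" only when an earlier prescription exists and lies at least six calendar months back, and a month-end date is clamped when months are added. The names add_months and new_patient_dates are hypothetical.

```python
import calendar
from datetime import date
from itertools import groupby

def add_months(d, months):
    """Add calendar months, clamping the day to the end of the target month."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))

def new_patient_dates(rows, year, gap_months=6):
    """Return (patient_id, date) for prescriptions in `year` that follow a gap."""
    result = []
    for patient_id, group in groupby(sorted(rows), key=lambda r: r[0]):
        previous = None
        for _, current in group:
            # Assumption: a first-ever prescription does not count as new
            if (previous is not None
                    and current.year == year
                    and add_months(previous, gap_months) <= current):
                result.append((patient_id, current))
            previous = current
    return result

# Sample data from the question
prescriptions = [
    (1, date(2016, 12, 31)), (1, date(2017, 3, 13)),
    (2, date(2016, 10, 10)), (2, date(2017, 5, 11)), (2, date(2017, 6, 11)),
    (3, date(2017, 1, 1)), (3, date(2017, 4, 20)),
    (4, date(2016, 1, 31)), (4, date(2017, 1, 1)), (4, date(2017, 7, 2)),
]
print(new_patient_dates(prescriptions, 2017))
```

Under these assumptions the sample data yields patient 2 on 2017-05-11 and patient 4 on 2017-01-01 and 2017-07-02, which matches the outcome the question describes.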

SQL Server: How to get a rolling sum over 3 days for different customers within same table

This is the input table:
Customer_ID Date Amount
1 4/11/2014 20
1 4/13/2014 10
1 4/14/2014 30
1 4/18/2014 25
2 5/15/2014 15
2 6/21/2014 25
2 6/22/2014 35
2 6/23/2014 10
There is information pertaining to multiple customers and I want to get a rolling sum across a 3 day window for each customer.
The solution should be as below:
Customer_ID Date Amount Rolling_3_Day_Sum
1 4/11/2014 20 20
1 4/13/2014 10 30
1 4/14/2014 30 40
1 4/18/2014 25 25
2 5/15/2014 15 15
2 6/21/2014 25 25
2 6/22/2014 35 60
2 6/23/2014 10 70
The biggest issue is that I don't have transactions for every day, so a solution that partitions by row number doesn't work.
The closest example I found on SO was:
SQL Query for 7 Day Rolling Average in SQL Server
but even in that case there were transactions every day, which accommodated the ROW_NUMBER()-based solutions.
The rownumber query is as follows:
select customer_id, Date, Amount,
Rolling_3_day_sum = CASE WHEN ROW_NUMBER() OVER (partition by customer_id ORDER BY Date) > 2
THEN SUM(Amount) OVER (partition by customer_id ORDER BY Date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
END
from #tmp_taml9
order by customer_id
I was wondering if there is a way to replace "BETWEEN 2 PRECEDING AND CURRENT ROW" with "BETWEEN [DATE - 2] and [DATE]"?
One option would be to use a calendar table (or something similar) to get the complete range of dates and left join your table with that and use the row_number based solution.
Another option that might work (not sure about performance) would be to use an apply query like this:
select customer_id, Date, Amount, coalesce(Rolling_3_day_sum, Amount) Rolling_3_day_sum
from #tmp_taml9 t1
cross apply (
select sum(amount) Rolling_3_day_sum
from #tmp_taml9
where Customer_ID = t1.Customer_ID
and datediff(day, date, t1.date) <= 2 -- 3-day window: t1.date and the two days before it
and t1.Date >= date
) o
order by customer_id;
I suspect performance might not be great though.
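The date-based window can be checked against the expected output in the question with a small script. This is a minimal sketch in Python, assuming the standard-library sqlite3 module as a stand-in for SQL Server: a correlated subquery replaces CROSS APPLY, and julianday differences replace DATEDIFF. The table name tx, the column d and the function rolling_three_day_sums are hypothetical.

```python
import sqlite3

def rolling_three_day_sums():
    """Sum each customer's amounts over the current day and the two before it."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tx (customer_id INT, d TEXT, amount INT)")
    con.executemany(
        "INSERT INTO tx VALUES (?, ?, ?)",
        [
            (1, "2014-04-11", 20), (1, "2014-04-13", 10),
            (1, "2014-04-14", 30), (1, "2014-04-18", 25),
            (2, "2014-05-15", 15), (2, "2014-06-21", 25),
            (2, "2014-06-22", 35), (2, "2014-06-23", 10),
        ],
    )
    sql = """
    SELECT t1.customer_id, t1.d, t1.amount,
           (SELECT SUM(t2.amount)
              FROM tx AS t2
             WHERE t2.customer_id = t1.customer_id
               -- 0, 1 or 2 days before the current row's date
               AND julianday(t1.d) - julianday(t2.d) BETWEEN 0 AND 2
           ) AS rolling_3_day_sum
    FROM tx AS t1
    ORDER BY t1.customer_id, t1.d
    """
    rows = con.execute(sql).fetchall()
    con.close()
    return rows

for row in rolling_three_day_sums():
    print(row)
```

The window is defined by the date difference rather than by row position, so days without transactions need no filler rows.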

GROUP BY DAY, CUMULATIVE SUM

I have a table in MSSQL with the following structure:
PersonId
StartDate
EndDate
I need to be able to show the number of distinct people in the table within a date range or at a given date.
As an example, I need to show the running totals on a daily basis, e.g. if we have 2 entries on the 1st June, 3 on the 2nd June and 1 on the 3rd June, the system should show the following result:
1st June: 2
2nd June: 5
3rd June: 6
If, however, one of the entries on the 2nd June also has an end date of 2nd June, then the 3rd June result would show just 5.
Would someone be able to assist with this.
Thanks
UPDATE
This is what I have so far, which seems to work. Is there a better solution, though? My solution only gets me the employed figures. I also need the unemployed count in another column; unemployed would mean either no entry in the table, or the date not falling between the start and end dates and no other entry showing the person as employed.
CREATE TABLE #Temp(CountTotal int NOT NULL, CountDate datetime NOT NULL);
DECLARE @StartDT DATETIME
SET @StartDT = '2015-01-01 00:00:00'
WHILE @StartDT < '2015-08-31 00:00:00'
BEGIN
INSERT INTO #Temp(CountTotal, CountDate)
SELECT COUNT(DISTINCT PERSON.Id) AS CountTotal, @StartDT AS CountDate FROM PERSON
INNER JOIN DATA_INPUT_CHANGE_LOG ON PERSON.DataInputTypeId = DATA_INPUT_CHANGE_LOG.DataInputTypeId AND PERSON.Id = DATA_INPUT_CHANGE_LOG.DataItemId
LEFT OUTER JOIN PERSON_EMPLOYMENT ON PERSON.Id = PERSON_EMPLOYMENT.PersonId
WHERE PERSON.Id > 0 AND DATA_INPUT_CHANGE_LOG.Hidden = '0' AND DATA_INPUT_CHANGE_LOG.Approved = '1'
AND ((PERSON_EMPLOYMENT.StartDate <= DATEADD(MONTH,1,@StartDT) AND PERSON_EMPLOYMENT.EndDate IS NULL)
OR (@StartDT BETWEEN PERSON_EMPLOYMENT.StartDate AND PERSON_EMPLOYMENT.EndDate) AND PERSON_EMPLOYMENT.EndDate IS NOT NULL)
SET @StartDT = DATEADD(MONTH,1,@StartDT)
END
select * from #Temp
drop TABLE #Temp
You can use the following query. The cte part is to generate a set of serial dates between the start date and end date.
DECLARE @ViewStartDate DATETIME
DECLARE @ViewEndDate DATETIME
SET @ViewStartDate = '2015-01-01 00:00:00.000';
SET @ViewEndDate = '2015-02-25 00:00:00.000';
-- Recursive CTE: one row per day from the start date to the end date
;WITH Dates([Date])
AS
(
    SELECT @ViewStartDate
    UNION ALL
    SELECT DATEADD(DAY, 1,Date)
    FROM Dates
    WHERE DATEADD(DAY, 1,Date) <= @ViewEndDate
)
-- Count matched people only; COUNT(*) would report 1 for a day with no match
SELECT [Date], COUNT(PersonData.PersonId)
FROM Dates
LEFT JOIN PersonData ON Dates.Date >= PersonData.StartDate
    AND Dates.Date <= PersonData.EndDate
GROUP By [Date]
Replace PersonData with your table name.
If the StartDate and EndDate columns can be NULL, you need to add additional conditions to the join.
The query assumes that one person has only one record in the same date range.
You could do this by creating data where every start date is a +1 event and end date is -1 and then calculate a running total on top of that.
For example if your data is something like this
PersonId StartDate EndDate
1 20150101 20150201
2 20150102 20150115
3 20150101
You first create a data set that looks like this:
EventDate ChangeValue
20150101 +2
20150102 +1
20150115 -1
20150201 -1
And if you use running total, you'll get this:
EventDate Total
2015-01-01 2
2015-01-02 3
2015-01-15 2
2015-02-01 1
You can get it with something like this:
select
p.eventdate,
sum(p.changevalue) over (order by p.eventdate asc) as total
from
(
select startdate as eventdate, sum(1) as changevalue from personnel group by startdate
union all
select enddate, sum(-1) from personnel where enddate is not null group by enddate
) p
order by p.eventdate asc
Having window function with sum() requires SQL Server 2012. If you're using older version, you can check other options for running totals.
My example in SQL Fiddle
If you have dates that don't have any events and you need to show those too, then the best option is probably to create a separate table of dates for the whole range you'll ever need, for example 1.1.2000 - 31.12.2099.
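The +1/-1 event idea can also be traced by hand. Below is a minimal Python sketch using the three sample rows above; the function name headcount_by_event_date and the tuple layout are hypothetical, and dates are kept as the same yyyymmdd strings, which sort chronologically.

```python
from collections import Counter
from itertools import accumulate

def headcount_by_event_date(people):
    """Turn (start, end) pairs into (event_date, running_total) pairs."""
    changes = Counter()
    for start, end in people:
        changes[start] += 1      # a start date adds one person
        if end is not None:
            changes[end] -= 1    # an end date removes one person
    dates = sorted(changes)
    totals = accumulate(changes[d] for d in dates)
    return list(zip(dates, totals))

# Sample rows: (StartDate, EndDate); None means no end date
people = [
    ("20150101", "20150201"),
    ("20150102", "20150115"),
    ("20150101", None),
]
print(headcount_by_event_date(people))
```

This reproduces the running totals 2, 3, 2, 1 from the table above. Unlike the UNION ALL query, it merges a start and an end that fall on the same date into one net change.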
-- Edit --
To get count for a specific day, it's possible use the same logic, but just sum everything up to that day:
declare @eventdate date
set @eventdate = '20150117'
select
sum(p.changevalue)
from
(
select startdate as eventdate, 1 as changevalue from personnel
where startdate <= @eventdate
union all
select enddate, -1 from personnel
where enddate < @eventdate
) p
Hopefully this is ok, can't test since SQL Fiddle seems to be unavailable.
