Populating a list of dates without a defined end date - SQL Server

I have a list of accounts and their costs, which change every few days.
The list holds only the start date of each new cost; there is no column for the end date.
So I need to populate a list of dates in which the end date for a given account and cost is deduced from the start date of the next cost for the same account.
Something like this:
Account  start date  cost
one      1/1/2016    100$
two      1/1/2016    150$
one      4/1/2016    200$
two      3/1/2016    200$
And the result I need would be:
Account  date      cost
one      1/1/2016  100$
one      2/1/2016  100$
one      3/1/2016  100$
one      4/1/2016  200$
two      1/1/2016  150$
two      2/1/2016  150$
two      3/1/2016  200$
For example, if the cost changed in the middle of the month, then the sample data would hold only two records (one per unique combination of account, start date and cost), while the results would hold 30 records with the cost for each and every day of the month (15 for the first cost and 15 for the second). The costs are given (entered manually) and do not need to be calculated.
Note that the result contains more records because the sample data shows only a start date and the updated cost for that account as of that date, while the results show the cost for every day of the month.
Any ideas?

Solution is a bit long.
I added an extra date for test purposes:
DECLARE @t table(account varchar(10), startdate date, cost int)
INSERT @t
values
('one','1/1/2016',100),('two','1/1/2016',150),
('one','1/4/2016',200),('two','1/3/2016',200),
('two','1/6/2016',500) -- extra row
;WITH CTE as
(
  -- number each account's rows in start date order
  SELECT
    row_number() over (partition by account order by startdate) rn,
    *
  FROM @t
),N(N)AS
(
  SELECT 1 FROM(VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1))M(N)
),
tally(N) AS -- tally is limited to 1000 days
(
  SELECT ROW_NUMBER()OVER(ORDER BY N.N) - 1 FROM N,N a,N b
),GROUPED as
(
  -- pair each row with the next row for the same account
  SELECT
    cte.account, cte.startdate, cte.cost, cte2.cost cost2, cte2.startdate enddate
  FROM CTE
  JOIN CTE CTE2
    ON CTE.account = CTE2.account
    and CTE.rn = CTE2.rn - 1
)
-- used DISTINCT to avoid overlapping dates
SELECT DISTINCT
  CASE WHEN datediff(d, startdate,enddate) = N THEN cost2 ELSE cost END cost,
  dateadd(d, N, startdate) startdate,
  account
FROM grouped
JOIN tally
  ON datediff(d, startdate,enddate) >= N
Result:
cost startdate account
100 2016-01-01 one
100 2016-01-02 one
100 2016-01-03 one
150 2016-01-01 two
150 2016-01-02 two
200 2016-01-03 two
200 2016-01-04 one
200 2016-01-04 two
200 2016-01-05 two
500 2016-01-06 two

Thank you @t-clausen.dk!
It didn't solve the problem completely, but it did point me in the right direction.
Eventually I used the LEAD function to generate an end date for every cost per account, and then I was able to populate a list of dates based on that.
Here's how I generate the end dates:
DECLARE @t table(account varchar(10), startdate date, cost int)
INSERT @t
values
('one','1/1/2016',100),('two','1/1/2016',150),
('one','1/4/2016',200),('two','1/3/2016',200),
('two','1/6/2016',500)
select account
,[startdate]
,DATEADD(DAY, -1, LEAD([Startdate], 1,'2100-01-01') OVER (PARTITION BY account ORDER BY [Startdate] ASC)) AS enddate
,cost
from @t
It returned the expected result:
account startdate enddate cost
one 2016-01-01 2016-01-03 100
one 2016-01-04 2099-12-31 200
two 2016-01-01 2016-01-02 150
two 2016-01-03 2016-01-05 200
two 2016-01-06 2099-12-31 500
Please note that I set the end date of current costs to be some date in the far future which means (for me) that they are currently active.
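The step from these ranges to one row per day is not shown above. A minimal sketch of one way to do it, assuming a hypothetical cutoff variable @lastday in place of the far-future end date (so that active costs do not expand into decades of rows) and reusing the VALUES-based tally from the earlier answer. Date literals are written as yyyymmdd to avoid any dependence on the session's date format.

```sql
DECLARE @t TABLE (account varchar(10), startdate date, cost int);
INSERT @t VALUES
    ('one', '20160101', 100), ('two', '20160101', 150),
    ('one', '20160104', 200), ('two', '20160103', 200),
    ('two', '20160106', 500);

-- Hypothetical cutoff: the last day to report for costs that are still active
DECLARE @lastday date = '20160107';

;WITH ranges AS
(
    -- end date = day before the next start date for the same account,
    -- or @lastday when there is no later row
    SELECT account, startdate, cost,
           DATEADD(DAY, -1,
               LEAD(startdate, 1, DATEADD(DAY, 1, @lastday))
                   OVER (PARTITION BY account ORDER BY startdate)) AS enddate
    FROM @t
),
N(N) AS
(
    SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) M(N)
),
tally(N) AS  -- 0..999, so each range is limited to 1000 days
(
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 FROM N, N a, N b
)
-- one output row per day inside each range
SELECT r.account,
       DATEADD(DAY, t.N, r.startdate) AS [date],
       r.cost
FROM ranges AS r
JOIN tally AS t
    ON t.N <= DATEDIFF(DAY, r.startdate, r.enddate)
ORDER BY r.account, [date];
```

Because each range ends the day before the next one starts, the ranges never overlap and no DISTINCT is needed.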

Related

Need help selecting a record between two date ranges?

I am trying to select a record between two date ranges, but I keep getting duplicate records when two date ranges overlap, as shown below.
Here is an example.
Policy Info
Policy #  Policy Effective Date  Policy Termination Date  Year
001       2018-10-01             2019-10-01               2018
002       2019-10-01             2020-10-01               2019
003       2020-10-01             2021-10-01               2020
004       2021-10-01             2022-10-01               2022
Policy Limit
LimitID  Effective Date  Termination Date  Limit
1        2018-10-01      2021-10-01        1000
2        2018-10-01      3000-01-01        2500
How can I select Limit ID 1 for policies 001, 002 and 003 (the years 2018, 2019 and 2020), and use Limit ID 2 for any policy with an effective date later than 2021-01-01?
I tried the following, but it keeps creating duplicates:
((limit.effective_from_date < policy.effective_to_date
AND limit.effective_to_date > policy.effective_from_date
)
OR
(limit.effective_from_date = policy.effective_from_date
AND limit.effective_to_date = CONVERT(datetime, '01/01/3000', 102)))
The above condition creates duplicates. Is there an effective way of selecting a record within overlapping date ranges?
Any help will be appreciated!
Your problem is that the Policy Limit periods overlap, so you need to choose one. From what I understand of your data (and I'm inferring a lot), for each policy you want the first limit whose period covers the whole policy: [Policy Limit].[Effective Date] is on or before [Policy Info].[Policy Effective Date], and [Policy Limit].[Termination Date] is on or after [Policy Info].[Policy Termination Date].
If my guesses are correct, you can do something like this:
drop table if exists #PolicyInfo
drop table if exists #PolicyLimit
CREATE TABLE #PolicyInfo (
Policy INT,
Policy_Effective_Date DATE,
Policy_termination_date DATE,
[Year] int
)
CREATE TABLE #PolicyLimit(
LimitID INT,
Effective_Date DATE,
Termination_Date DATE,
Limit INT
)
INSERT INTO #PolicyInfo (Policy, Policy_Effective_Date, Policy_termination_date, [Year])
VALUES
(001, '2018-10-01', '2019-10-01', 2018),
(002, '2019-10-01', '2020-10-01', 2019),
(003, '2020-10-01', '2021-10-01', 2020),
(004, '2021-10-01', '2022-10-01', 2022)
INSERT INTO #PolicyLimit (LimitID, Effective_Date, Termination_Date, Limit)
VALUES
(1, '2018-10-01','2021-10-01',1000),
(2, '2018-10-01','3000-01-01',2500)
;with cte AS (
    -- Join PolicyInfo with PolicyLimit.
    -- Condition: Policy_Effective_Date is between pl.Effective_Date and pl.Termination_Date
    -- AND
    -- Policy_termination_date is between pl.Effective_Date and pl.Termination_Date
    SELECT *,
        -- rank the matching limits, partitioned by Policy
        ROW_NUMBER() OVER (PARTITION BY [pi].Policy ORDER BY pl.Effective_Date, pl.Termination_Date) rn
    FROM #PolicyInfo [pi]
    INNER JOIN #PolicyLimit pl ON
        [pi].Policy_Effective_Date BETWEEN pl.Effective_Date AND pl.Termination_Date
        AND [pi].Policy_termination_date BETWEEN pl.Effective_Date AND pl.Termination_Date
)
SELECT Policy, LimitID
FROM cte
WHERE rn = 1 -- Select the first Limit per partition

T-SQL Grouping Dynamic Date Ranges

Using MS SQL Server 2019
I have a set of recurring donation records. Each has a First Gift Date and a Last Gift Date associated with it. I need to add a GroupedID to these rows so that I can get the full date range, from the earliest FirstGiftDate to the latest LastGiftDate, as long as there is no break of more than 45 days between the recurring donations.
For example, Bob is a long-time supporter. His card has expired multiple times, and he has always started new gifts within 45 days. All of his gifts need to be given a single grouped ID. June, by contrast, was donating when her card expired and did not give again for 6 months; after that she kept giving even when her card expired. The first of June's gifts should get its own GroupedID, and the second and third should be grouped together. The grouping count should restart with each donor.
My initial attempt was to join the donation table back to itself, aliased as D2. This did give me an indicator of which gifts were within the 45-day mark, but I can't wrap my head around how to then link them. My only other thought was to use LEAD and LAG to analyze each scenario and work out the combinations of LEAD and LAG values needed to catch each one, but that doesn't seem as reliable or scalable as I'd like.
I appreciate any help anyone can give.
My code:
SELECT #Donation.*, D2.*
FROM #Donation
LEFT JOIN #Donation D2 ON #Donation.RecurringGiftID <> D2.RecurringGiftID
AND #Donation.Donor = D2.Donor
AND ABS(DATEDIFF(DAY, #Donation.FirstGiftDate, D2.LastGiftDate)) < 45
Table structure and sample data:
CREATE TABLE #Donation
(
RecurringGiftID int,
Donor nvarchar(25),
FirstGiftDate date,
LastGiftDate date
)
INSERT INTO #Donation
VALUES (1, 'Bob', '2017-02-15', '2018-07-01'),
(15, 'Bob', '2018-08-05', '2019-04-01'),
(32, 'Bob', '2019-04-15', '2022-06-15'),
(54, 'June', '2015-05-01', '2016-05-01'),
(96, 'June', '2016-12-15', '2018-02-01'),
(120, 'June', '2018-03-04', '2020-07-01')
Desired output:
RecurringGiftId  Donor  FirstGiftDate  LastGiftDate  GroupedID
1                Bob    2017-02-15     2018-07-01    1
15               Bob    2018-08-05     2019-04-01    1
32               Bob    2019-04-15     2022-06-15    1
54               June   2015-05-01     2016-05-01    1
96               June   2016-12-15     2018-02-01    2
120              June   2018-03-04     2020-07-01    2
Use LAG() to detect when the current row starts more than 45 days after the previous row's LastGiftDate, then take a cumulative sum of those flags to form the required group ID:
select *,
       GroupedID = sum(g) over (partition by Donor order by FirstGiftDate)
from
(
    select *,
           -- g = 1 marks the start of a new group; the '19000101' default
           -- makes each donor's first row start group 1
           g = case when datediff(day,
                                  lag(LastGiftDate, 1, '19000101') over (partition by Donor
                                                                         order by FirstGiftDate),
                                  FirstGiftDate)
                         > 45
                    then 1
                    else 0
               end
    from #Donation
) d
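The question's stated goal is the full date range per group, which takes one more aggregation on top of the GroupedID. A minimal sketch, assuming the #Donation table and sample rows from the question; the CTE names flagged and grouped and the output aliases RangeStart and RangeEnd are made up for illustration.

```sql
-- Setup copied from the question so the example runs on its own
DROP TABLE IF EXISTS #Donation;
CREATE TABLE #Donation
(
    RecurringGiftID int,
    Donor nvarchar(25),
    FirstGiftDate date,
    LastGiftDate date
);
INSERT INTO #Donation
VALUES (1, 'Bob', '2017-02-15', '2018-07-01'),
       (15, 'Bob', '2018-08-05', '2019-04-01'),
       (32, 'Bob', '2019-04-15', '2022-06-15'),
       (54, 'June', '2015-05-01', '2016-05-01'),
       (96, 'June', '2016-12-15', '2018-02-01'),
       (120, 'June', '2018-03-04', '2020-07-01');

;WITH flagged AS
(
    -- g = 1 when the gift starts a new group: the donor's first gift
    -- (LAG returns NULL) or a gap of more than 45 days
    SELECT *,
           g = CASE WHEN DATEDIFF(DAY,
                                  LAG(LastGiftDate) OVER (PARTITION BY Donor ORDER BY FirstGiftDate),
                                  FirstGiftDate) <= 45
                    THEN 0 ELSE 1 END
    FROM #Donation
),
grouped AS
(
    -- running total of the flags numbers the groups per donor
    SELECT *,
           GroupedID = SUM(g) OVER (PARTITION BY Donor ORDER BY FirstGiftDate)
    FROM flagged
)
-- collapse each group to its overall date range
SELECT Donor, GroupedID,
       MIN(FirstGiftDate) AS RangeStart,
       MAX(LastGiftDate)  AS RangeEnd
FROM grouped
GROUP BY Donor, GroupedID
ORDER BY Donor, GroupedID;
-- Bob   1  2017-02-15  2022-06-15
-- June  1  2015-05-01  2016-05-01
-- June  2  2016-12-15  2020-07-01
```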

Getting the Min(startdate) and Max(enddate) for an ID when that ID shows up multiple times

I have a table with columns for ID, StartDate and EndDate, and an indicator of whether there was a gap between the end date of that row and the next start date. If there were only one instance of each ID, I know that I could just do
SELECT min(startdate),max(enddate)
FROM table
GROUP BY ID
However, I have multiple instances of these IDs in several non-connected timespans. If I did that, I would get the very first start date and the last end date from a different block of time for that ID. How would I go about making sure I get the min and max dates for each specific block of time?
I thought about creating a new column that holds a number for each block of time: 1 for the first block with no gaps, then +1 whenever the next row follows a gap. But I am not really sure how to go about that. Here is some sample data to illustrate what I am working with:
ID StartDate EndDate NextDate Gap_ind
001 1/1/2018 1/31/2018 2/1/2018 N
001 2/1/2018 2/30/2018 3/1/2018 N
001 3/1/2018 3/31/2018 5/1/2018 Y
001 5/1/2018 5/31/2018 6/1/2018 N
001 6/1/2018 6/30/2018 6/30/2018 N
This is a classic "gaps and islands" problem, where you are trying to define the boundaries of your islands, and which you can solve by using some windowing functions.
Your initial effort is on track. Rather than getting the next start date, though, I used the previous end date to calculate the groupings.
The innermost subquery below gets the previous end date for each of your date ranges, and also assigns a row number that we use later to keep our groupings in order.
The next subquery out uses the previous end date to identify which groups of date ranges go together (overlap, or nearly so).
The outermost query is the end result you're looking for.
SELECT
Grp.ID,
MIN(Grp.StartDate) AS GroupingStartDate,
MAX(Grp.EndDate) AS GroupingEndDate
FROM
(
SELECT
PrevDt.ID,
PrevDt.StartDate,
PrevDt.EndDate,
SUM(CASE WHEN DATEADD(DAY,1,PrevDt.PreviousEndDate) >= PrevDt.StartDate THEN 0 ELSE 1 END)
OVER (PARTITION BY PrevDt.ID ORDER BY PrevDt.RN) AS GrpNum
FROM
(
SELECT
ROW_NUMBER() OVER (PARTITION BY ID ORDER BY StartDate, EndDate) as RN,
ID,
StartDate,
EndDate,
LAG(EndDate,1) OVER (PARTITION BY ID ORDER BY StartDate) AS PreviousEndDate
FROM
tbl
) AS PrevDt
) AS Grp
GROUP BY
Grp.ID,
Grp.GrpNum;
Results:
+-----+------------------+--------------+
| ID | InitialStartDate | FinalEndDate |
+-----+------------------+--------------+
| 001 | 2018-01-01 | 2018-03-01 |
| 001 | 2018-05-01 | 2018-06-01 |
+-----+------------------+--------------+
SQL Fiddle demo.
Further reading:
The SQL of Gaps and Islands in Sequences
Gaps and Islands Across Date Ranges
This is an example of a gaps-and-islands problem. A simple solution is to use lag() to determine if there are overlaps. When there is none, you have the start of a group. A cumulative sum defines the group -- and you aggregate on that.
select t.id, min(startdate), max(enddate)
from (select t.*,
sum(case when prev_enddate >= dateadd(day, -1, startdate)
then 0 else 1
end) over (partition by id order by startdate) as grp
from (select t.*, lag(enddate) over (partition by id order by startdate) as prev_enddate
from t
) t
) t
group by id, grp;
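To see the query above work end to end, here is a small self-contained sketch that loads the question's sample rows into a table variable. Two things are assumptions: the question lists an end date of 2/30/2018, which is not a valid date, so 2018-02-28 is used instead, and the aliases island_start and island_end are made up for illustration.

```sql
DECLARE @t TABLE (id char(3), startdate date, enddate date);
INSERT @t VALUES
    ('001', '20180101', '20180131'),
    ('001', '20180201', '20180228'),  -- assumed; the question shows 2/30/2018
    ('001', '20180301', '20180331'),
    ('001', '20180501', '20180531'),
    ('001', '20180601', '20180630');

SELECT id,
       MIN(startdate) AS island_start,
       MAX(enddate)   AS island_end
FROM (SELECT t.*,
             -- 1 starts a new island: first row (prev_enddate is NULL) or a gap
             SUM(CASE WHEN prev_enddate >= DATEADD(DAY, -1, startdate)
                      THEN 0 ELSE 1
                 END) OVER (PARTITION BY id ORDER BY startdate) AS grp
      FROM (SELECT t.*,
                   LAG(enddate) OVER (PARTITION BY id ORDER BY startdate) AS prev_enddate
            FROM @t AS t
           ) t
     ) t
GROUP BY id, grp
ORDER BY island_start;
-- 001  2018-01-01  2018-03-31
-- 001  2018-05-01  2018-06-30
```

The April gap is the only place where the previous end date falls more than one day before the next start date, so it is the only boundary between islands.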

Calculating Year to Date Total

I want to generate a Payroll type query whereby the values in Payroll 1 (say for the previous month) should be included in Payroll 2 (for the current month) Year-to-Date Totals.
This can best be explained with an example:
DECLARE @MyTable TABLE(ID INT IDENTITY, PayrollID INT, Description NVARCHAR(MAX), [Current Month] MONEY)
INSERT INTO @MyTable
VALUES (1,'Basic Salary',100),
(1,'Normal Over Time',50),
(1,'Work on Saturday',150),
(1,'Work on Sunday',200),
(2,'Basic Salary',100)
SELECT * ,SUM([Current Month]) OVER (PARTITION BY Description ORDER BY PayrollID) AS [Month to Date]
FROM @MyTable
When I run the above I get
ID EmployeeID PayrollID Description Current Month Month to Date
1 1 1 Basic Salary 100 100
2 1 1 Normal Over Time 50 50
3 1 1 Work on Saturday 150 150
4 1 1 Work on Sunday 200 200
5 1 2 Basic Salary 100 200
The Year-to-Date running totals are per each Description meaning Basic Salary Category has its own running total and so does Saturday and Sunday etc, etc. You will notice that for Basic Salary in Payroll 2 the running Year-to-Date total is 200 (i.e. 100 from Payroll 1 + 100 from Payroll 2)
The challenge I have is that Payroll 1 has data for Basic Salary, Work on Saturday and Work on Sunday whereas Payroll 2 only has Basic Salary as the employee did not work on Saturday nor on Sunday in Payroll 2 (the current month).
However, in the cumulative Year-to-Date column the data from Payroll 1 (previous month) should still be selected and included in the Year-to-Date running Total -
something like this:
ID EmployeeID PayrollID Description Current Month Month to Date
1 1 1 Basic Salary 100 100
2 1 1 Normal Over Time 50 50
3 1 1 Work on Saturday 150 150
4 1 1 Work on Sunday 200 200
5 1 2 Basic Salary 100 200
2 1 1 Normal Over Time NULL 50
3 1 1 Work on Saturday NULL 150
4 1 1 Work on Sunday NULL 200
Although the employee did not work on Saturday nor Sunday in the current month (Payroll 2) the running (Year-to-Date) totals for working on a Saturday should be 150 that he/she worked in the previous month (Payroll 1). The same should apply to working on Sunday where the running total in the current month (Payroll 2) should be the 200 that he/she worked in the previous month (Payroll 1).
How do I do that with a simple Select Statement without writing a complicated Procedure?
EDIT:
I have cleaned up the code as follows:
DECLARE @MyTable TABLE(ID INT IDENTITY, EmployeeID INT, PayrollID INT, Description NVARCHAR(MAX), [Current Month] MONEY)
INSERT INTO @MyTable
VALUES (1,1,'Basic Salary',100),
(1,1,'Normal Over Time',50),
(1,1,'Work on Saturday',150),
(1,1,'Work on Sunday',200),
(1,2,'Basic Salary',100)
WITH pay_elements AS
(
SELECT Description
FROM @MyTable
GROUP BY Description
)
,pay_slips AS
(
SELECT EmployeeID, PayrollID
FROM @MyTable
GROUP BY EmployeeID, PayrollID
)
,pay_lines AS
(
SELECT
mt.ID
,PS.EmployeeID
,PS.PayrollID
,PE.Description
,ISNULL(mt.[Current Month], 0) AS [Current Month]
FROM
pay_slips AS ps
OUTER APPLY
pay_elements AS pe
LEFT JOIN
@MyTable AS mt
ON (mt.EmployeeID = ps.EmployeeID)
AND (mt.PayrollID = ps.PayrollID)
AND (mt.Description = pe.Description)
)
SELECT * ,SUM([Current Month]) OVER (PARTITION BY EmployeeID, Description ORDER BY PayrollID) AS [Month to Date]
FROM pay_lines
And I get this error:
Msg 319, Level 15, State 1, Line 10
Incorrect syntax near the keyword 'with'. If this statement is a common table expression, an xmlnamespaces clause or a change tracking context clause, the previous statement must be terminated with a semicolon.
Msg 102, Level 15, State 1, Line 17
Incorrect syntax near ','.
Msg 102, Level 15, State 1, Line 23
Incorrect syntax near ','.
You first need to build a "structure" of row headings, and then join that onto the actual data.
So for example:
;WITH pay_elements AS
(
    -- every distinct pay element (the row headings)
    SELECT Description
    FROM @MyTable
    GROUP BY Description
)
,pay_slips AS
(
    -- every employee/payroll combination processed to date
    SELECT EmployeeID, PayrollID
    FROM @MyTable
    GROUP BY EmployeeID, PayrollID
)
,pay_lines AS
(
    -- one row per pay slip per pay element, zero where the base table has no row
    SELECT
        mt.ID
        ,ps.EmployeeID
        ,ps.PayrollID
        ,pe.Description
        ,ISNULL(mt.[Current Month], 0) AS [Current Month]
    FROM
        pay_slips AS ps
    CROSS JOIN
        pay_elements AS pe
    LEFT JOIN
        @MyTable AS mt
            ON (mt.EmployeeID = ps.EmployeeID)
            AND (mt.PayrollID = ps.PayrollID)
            AND (mt.Description = pe.Description)
)
SELECT * ,SUM([Current Month]) OVER (PARTITION BY EmployeeID, Description ORDER BY PayrollID) AS [Month to Date]
FROM pay_lines
What we're doing here is getting a list of the different kind of pay elements in your table. Then we're getting a list of Employees and Payrolls done to date, and manually forcing every Payroll to include a row in respect of all possible pay elements.
Once that structure is built, we join onto the base table to get the actual values (replacing NULLs with zeros, for those pay elements that weren't originally included in the base table).
Then we simply query this padded-out table in the same way you did originally.
Note, I've written this on the fly and haven't checked this code so please excuse any minor errors.
I am a little confused by the Year-to-Date column you mention in your description. I assume this is the [Month to Date] column in your query. Please correct me if I am wrong.
I think what you are trying to achieve is that the descriptions that are missing from Payroll 2, such as Work on Saturday and Work on Sunday, should still appear in the result set.
The problem is:
Payroll 2 has no rows for those descriptions, so the windowed SUM has no row on which to show the carried-forward 50, 150 and 200 in the [Month to Date] column. (SUM itself skips NULLs, so a NULL [Current Month] on an existing row would not break the running total.)
You can store a fixed set of categories against each payroll ID:
Normal Over Time
Work on Saturday
Work on Sunday
Basic Salary
Query:
DECLARE @MyTable TABLE(ID INT IDENTITY, PayrollID INT, Description NVARCHAR(MAX), [Current Month] MONEY)
INSERT INTO @MyTable
VALUES (1,'Basic Salary',100),
(1,'Normal Over Time',50),
(1,'Work on Saturday',150),
(1,'Work on Sunday',200),
(2,'Basic Salary',100),
(2,'Normal Over Time',0),
(2,'Work on Saturday',0),
(2,'Work on Sunday',0)
SELECT * ,SUM([Current Month]) OVER (PARTITION BY Description ORDER BY PayrollID) AS [Month to Date]
FROM @MyTable order by ID,PayrollID

SQL Server CTE Use Previous Computed Date as Next Start Date

I have a table that holds tasks. Each task has an allotted number of hours that it's supposed to take to complete the task.
I'm storing the data in a table, like so:
declare @fromtable table (recordid int identity(1,1), orderdate date, deptid int, task varchar(500), estimatedhours int);
I also have a function that calculates the completion date of the task, based on the start date, estimated hours, and department, and some other math that computes headcount, hours available to work, etc.
dbo.fn_getCapEndDate(aStartDate,estimatedHours,deptID)
I need to generate the start and end date for each record in @fromtable. The first record will start with column orderdate as the start date for the computation, then each subsequent record will use the previous record's computedEndDate as their start date.
What I'm trying to achieve:
Here's what I have started with:
with MyCTE as
(
select mt.recordID, mt.deptID, mt.estimatedhours, mt.JobNumber, ROW_NUMBER() over (order by recordID) as RowNum,
convert(date,mt.orderdate) as computedStart,
case when mt.recordID = 1 then convert(date,dbo.fn_getCapEndDate(mt.orderdate,mt.estimatedhours,mt.deptid)) end as computedEnd
from @fromtable mt
)
select c1.*, c2.recordID,
case when c2.recordid is null then c1.computedStart else c2.computedEnd end as StartDate,
case when c2.recordid is null then c1.computedEnd else dbo.fn_getCapEndDate(c2.computedEnd,c1.estimatedhours,c1.deptid) end as computedEnd
from MyCTE c1
left join MyCTE c2 on c1.RowNum = c2.RowNum + 1;
With this, the first two rows have the correct start/end dates. Every row after that computes NULL for its start and end values: it "loses" the previous row's computed end date.
What can I do to fix the issue and return the values as needed?
EDIT: Sample data in text format:
estimatedhours OrderDate
0 1/1/2017
0 1/1/2017
0 1/1/2017
0 1/1/2017
500 1/1/2017
32 1/1/2017
0 1/1/2017
0 1/1/2017
320 1/1/2017
0 1/1/2017
5 1/1/2017
0 1/1/2017
4 1/1/2017
You can use lead as below:
select RecordId, EstimatedHours, StartDate,
ComputedEnd = LEAD(StartDate) over (order by RecordId)
From yourTable
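LEAD reads values that already exist in other rows, so it cannot feed a freshly computed end date into the next row's calculation. A different technique, a recursive CTE, can carry the value forward one record at a time. The following is only a sketch under stated assumptions: recordid is gap-free and starts at 1, the sample rows are made up, and because dbo.fn_getCapEndDate is not shown in the question, a stand-in expression (one day per 8 estimated hours, rounded up) takes its place where the comments indicate.

```sql
-- Table variable shaped like the one in the question; rows are illustrative
DECLARE @fromtable TABLE (
    recordid int IDENTITY(1,1), orderdate date, deptid int,
    task varchar(500), estimatedhours int);
INSERT @fromtable (orderdate, deptid, task, estimatedhours)
VALUES ('20170101', 1, 'task A', 0),
       ('20170101', 1, 'task B', 16),
       ('20170101', 1, 'task C', 4);

;WITH chain AS
(
    -- anchor: the first record starts on its own orderdate
    SELECT f.recordid, f.deptid, f.estimatedhours,
           f.orderdate AS StartDate,
           -- stand-in for dbo.fn_getCapEndDate(f.orderdate, f.estimatedhours, f.deptid)
           DATEADD(DAY, (f.estimatedhours + 7) / 8, f.orderdate) AS ComputedEnd
    FROM @fromtable AS f
    WHERE f.recordid = 1
    UNION ALL
    -- recursive step: each record starts where the previous one ended
    SELECT f.recordid, f.deptid, f.estimatedhours,
           c.ComputedEnd,
           -- stand-in for dbo.fn_getCapEndDate(c.ComputedEnd, f.estimatedhours, f.deptid)
           DATEADD(DAY, (f.estimatedhours + 7) / 8, c.ComputedEnd)
    FROM chain AS c
    JOIN @fromtable AS f
        ON f.recordid = c.recordid + 1
)
SELECT recordid, estimatedhours, StartDate, ComputedEnd
FROM chain
ORDER BY recordid
OPTION (MAXRECURSION 0);  -- lift the default limit of 100 recursion levels
```

The anchor and recursive members must return matching column types, so if the real function returns something other than date, its result would need a CONVERT as in the question's own query.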
