Find customer lapse across variable subscription periods - sql-server

Hoping someone has run across this issue previously and has a solution.
I am trying to find customers who lapse based off subscription periods rather than a single order date.
Lapse is defined by us as not making a purchase/renewal within 30 days of the end of their subscription. A customer can have multiple subscriptions simultaneously and subscriptions can vary in length.
I have a data set that includes customerIDs, Orders, the subscription start date, the subscription expire date, and that order’s rank in the customer’s order history, something like this:
CREATE TABLE #Subscriptions
(CustomerID INT,
Orderid INT,
SubscriptionStart DATE,
SubscriptionEnd DATE,
OrderNumber INT);
INSERT INTO #Subscriptions
VALUES(1, 111111, '2017-01-01', '2017-12-31', 1),
(1, 211111, '2018-01-01', '2019-12-31' ,2),
(1, 311121, '2018-10-01', '2018-10-02', 3),
(1, 451515, '2019-02-01', '2019-02-28', 4),
(2, 158797, '2018-07-01', '2018-07-31', 1),
(2, 287584, '2018-09-01', '2018-12-31', 2),
(2, 387452, '2019-01-01', '2019-01-31', 3),
(3, 187498, '2019-01-01', '2019-02-28', 1),
(3, 284990, '2019-02-01', '2019-02-28', 2),
(4, 184849, '2019-02-01', '2019-02-28', 1)
Within this data set, customer 2 would have lapsed on 2018-07-31. Since Customer 1 has a subscription of 2017-01-01 - 2017-12-31 and then one that starts 2018-01-01 and ends 2019-12-31 they cannot lapse within that time period even if other orders made by the customer would qualify.
I have attempted some simple gap calculations using LEAD() and LAG(); however, I have had no success because of the variable lengths of the subscription periods, where a single subscription can span multiple other orders. Eventually, we will use this to calculate monthly churn rate across approximately 5 million records.

You're overthinking this by trying to use LEAD() and LAG(). All you need is a NOT EXISTS() predicate in the WHERE clause.
In pseudocode:
SELECT...FROM...
WHERE {SubscriptionEnd is at least 30 days in the past}
AND NOT EXISTS(
{A row for the same Customer where the StartDate is 30 days or less after this EndDate}
)
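To sanity-check this idea outside the database, here is a minimal Python sketch of one possible reading of the pseudocode, run against the sample rows from the question. The function name, the `as_of` reference date and the extra condition that the other subscription must still be running after this end date are assumptions added for illustration; without that last condition, already-expired subscriptions would also count as cover.

```python
from datetime import date, timedelta

# Sample rows from the question: (CustomerID, OrderID, start, end).
SUBSCRIPTIONS = [
    (1, 111111, date(2017, 1, 1), date(2017, 12, 31)),
    (1, 211111, date(2018, 1, 1), date(2019, 12, 31)),
    (1, 311121, date(2018, 10, 1), date(2018, 10, 2)),
    (1, 451515, date(2019, 2, 1), date(2019, 2, 28)),
    (2, 158797, date(2018, 7, 1), date(2018, 7, 31)),
    (2, 287584, date(2018, 9, 1), date(2018, 12, 31)),
    (2, 387452, date(2019, 1, 1), date(2019, 1, 31)),
    (3, 187498, date(2019, 1, 1), date(2019, 2, 28)),
    (3, 284990, date(2019, 2, 1), date(2019, 2, 28)),
    (4, 184849, date(2019, 2, 1), date(2019, 2, 28)),
]

GRACE = timedelta(days=30)


def lapses_not_exists(rows, as_of):
    """Return sorted (customer, lapse_date) pairs; hypothetical helper."""
    found = set()
    for cust, order, start, end in rows:
        if end > as_of - GRACE:
            continue  # the 30-day grace period has not run out yet
        covered = any(
            c == cust and o != order
            and s <= end + GRACE   # starts before the grace period runs out
            and e > end            # assumption: still running after this end date
            for c, o, s, e in rows
        )
        if not covered:
            found.add((cust, end))
    return sorted(found)


print(lapses_not_exists(SUBSCRIPTIONS, as_of=date(2019, 2, 28)))
```

With 2019-02-28 as the reference date, only customer 2's subscription ending 2018-07-31 is reported, and customer 1's short order inside the long 2018-2019 subscription is not.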

This one looks to be a tricky one. You are correct about the problem with using the LEAD() and LAG() functions. It stems from customers being able to have multiple subscriptions of variable length. So we need to deal with that issue first. Let's begin with creating a single list of dates instead of having a list of SubscriptionStart and SubscriptionEnd.
SELECT
CustomerId,
OrderId,
1 AS Activity,
SubscriptionStart AS ActivityDate
FROM
#Subscriptions
UNION ALL
SELECT
CustomerId,
OrderId,
-1 AS Activity,
SubscriptionEnd AS ActivityDate
FROM
#Subscriptions
ORDER BY
CustomerId,
ActivityDate
CustomerId OrderId Activity ActivityDate
----------- ----------- ----------- ------------
1 111111 1 2017-01-01
1 111111 -1 2017-12-31
1 211111 1 2018-01-01
1 311121 1 2018-10-01
1 311121 -1 2018-10-02
1 451515 1 2019-02-01
1 451515 -1 2019-02-28
1 211111 -1 2019-12-31
2 158797 1 2018-07-01
2 158797 -1 2018-07-31
2 287584 1 2018-09-01
2 287584 -1 2018-12-31
2 387452 1 2019-01-01
2 387452 -1 2019-01-31
3 187498 1 2019-01-01
3 284990 1 2019-02-01
3 187498 -1 2019-02-28
3 284990 -1 2019-02-28
4 184849 1 2019-02-01
4 184849 -1 2019-02-28
Notice the additional Activity field. It is 1 for the SubscriptionStart and -1 for the SubscriptionEnd.
Using this new Activity field it is possible to find places where there might be a lapse in the customer's subscriptions. At the same time use LEAD() to find the NextDate.
;WITH SubscriptionList AS (
SELECT
CustomerId,
OrderId,
1 AS Activity,
SubscriptionStart AS ActivityDate
FROM
#Subscriptions
UNION ALL
SELECT
CustomerId,
OrderId,
-1 AS Activity,
SubscriptionEnd AS ActivityDate
FROM
#Subscriptions
)
SELECT
CustomerId,
OrderId,
Activity,
SUM(Activity) OVER(PARTITION BY CustomerId ORDER BY ActivityDate ROWS UNBOUNDED PRECEDING) as SubscriptionCount,
ActivityDate,
LEAD(ActivityDate, 1, GETDATE()) OVER(PARTITION BY CustomerId ORDER BY ActivityDate) AS NextDate,
DATEDIFF(d, ActivityDate, LEAD(ActivityDate, 1, GETDATE()) OVER(PARTITION BY CustomerId ORDER BY ActivityDate)) AS LapsedDays
FROM
SubscriptionList
ORDER BY
CustomerId,
ActivityDate
CustomerId OrderId Activity SubscriptionCount ActivityDate NextDate LapsedDays
----------- ----------- ----------- ----------------- ------------ ---------- -----------
1 111111 1 1 2017-01-01 2017-12-31 364
1 111111 -1 0 2017-12-31 2018-01-01 1
1 211111 1 1 2018-01-01 2018-10-01 273
1 311121 1 2 2018-10-01 2018-10-02 1
1 311121 -1 1 2018-10-02 2019-02-01 122
1 451515 1 2 2019-02-01 2019-02-28 27
1 451515 -1 1 2019-02-28 2019-12-31 306
1 211111 -1 0 2019-12-31 2019-02-28 -306
2 158797 1 1 2018-07-01 2018-07-31 30
2 158797 -1 0 2018-07-31 2018-09-01 32
2 287584 1 1 2018-09-01 2018-12-31 121
2 287584 -1 0 2018-12-31 2019-01-01 1
2 387452 1 1 2019-01-01 2019-01-31 30
2 387452 -1 0 2019-01-31 2019-02-28 28
3 187498 1 1 2019-01-01 2019-02-01 31
3 284990 1 2 2019-02-01 2019-02-28 27
3 187498 -1 1 2019-02-28 2019-02-28 0
3 284990 -1 0 2019-02-28 2019-02-28 0
4 184849 1 1 2019-02-01 2019-02-28 27
4 184849 -1 0 2019-02-28 2019-02-28 0
Adding a running total on the Activity field effectively gives the number of active subscriptions. While it is greater than 0, a lapse is not possible, so focus on the rows where the SubscriptionCount is zero.
Use LEAD() to get the NextDate; if there isn't a next date, default to today. If the SubscriptionCount is 0, then the NextDate has to come from a new subscription and is the date that subscription starts. Use DATEDIFF to count the days between that SubscriptionEnd and the next SubscriptionStart; if it is more than 30 days, there was a lapse. Sounds like a good WHERE clause.
;WITH SubscriptionList AS (
SELECT
CustomerId,
OrderId,
1 AS Activity,
SubscriptionStart AS ActivityDate
FROM
#Subscriptions
UNION ALL
SELECT
CustomerId,
OrderId,
-1 AS Activity,
SubscriptionEnd AS ActivityDate
FROM
#Subscriptions
)
, FindLapse AS (
SELECT
CustomerId,
OrderId,
Activity,
SUM(Activity) OVER(PARTITION BY CustomerId ORDER BY ActivityDate ROWS UNBOUNDED PRECEDING) as SubscriptionCount,
ActivityDate,
LEAD(ActivityDate, 1, GETDATE()) OVER(PARTITION BY CustomerId ORDER BY ActivityDate) AS NextDate
FROM
SubscriptionList
)
SELECT
CustomerId,
OrderId,
Activity,
SubscriptionCount,
ActivityDate,
NextDate,
DATEDIFF(d, ActivityDate, NextDate) AS LapsedDays
FROM
FindLapse
WHERE
SubscriptionCount = 0
AND DATEDIFF(d, ActivityDate, NextDate) > 30
CustomerId OrderId Activity SubscriptionCount ActivityDate NextDate LapsedDays
----------- ----------- ----------- ----------------- ------------ ---------- -----------
2 158797 -1 0 2018-07-31 2018-09-01 32
Looks like we have a winner!
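For anyone who wants to check the logic without a database, the same running-count idea can be sketched in Python. This is only an illustration: the function name and the fixed `today` are assumptions (2019-02-28 appears to be the date the queries above were run), and on equal dates it processes starts before ends, a tie-break the SQL above leaves undefined.

```python
from datetime import date
from itertools import groupby

# (CustomerID, SubscriptionStart, SubscriptionEnd) from the question.
SUBSCRIPTIONS = [
    (1, date(2017, 1, 1), date(2017, 12, 31)),
    (1, date(2018, 1, 1), date(2019, 12, 31)),
    (1, date(2018, 10, 1), date(2018, 10, 2)),
    (1, date(2019, 2, 1), date(2019, 2, 28)),
    (2, date(2018, 7, 1), date(2018, 7, 31)),
    (2, date(2018, 9, 1), date(2018, 12, 31)),
    (2, date(2019, 1, 1), date(2019, 1, 31)),
    (3, date(2019, 1, 1), date(2019, 2, 28)),
    (3, date(2019, 2, 1), date(2019, 2, 28)),
    (4, date(2019, 2, 1), date(2019, 2, 28)),
]


def find_lapses(rows, today, grace_days=30):
    """Return (customer, lapse_date, lapsed_days) tuples; hypothetical helper."""
    # One +1 event per start and one -1 event per end, like the UNION ALL.
    events = []
    for cust, start, end in rows:
        events.append((cust, start, 1))
        events.append((cust, end, -1))
    # Assumption: on the same date, starts (+1) sort before ends (-1).
    events.sort(key=lambda e: (e[0], e[1], -e[2]))

    lapses = []
    for cust, group in groupby(events, key=lambda e: e[0]):
        group = list(group)
        active = 0  # running total of Activity = open subscriptions
        for i, (_, when, activity) in enumerate(group):
            active += activity
            # LEAD(ActivityDate, 1, today)
            nxt = group[i + 1][1] if i + 1 < len(group) else today
            gap = (nxt - when).days
            if active == 0 and gap > grace_days:
                lapses.append((cust, when, gap))
    return lapses


print(find_lapses(SUBSCRIPTIONS, today=date(2019, 2, 28)))
```

It reports the same single row as the query: customer 2, 2018-07-31, 32 days.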

Related

Number of Days Between a List of Dates

I have a temp table with deals, dates, and volumes in it. What I need is to calculate the days between the effective dates in a dynamic fashion. Complicating this, the first row for each deal is in the past but represents the current deal volume, so for that row I need to return the number of days from today to the next effective date for that deal. Furthermore, on the last effective date for each deal, I have to run a subquery to grab the contract end date from another temp table.
Sample of the temp table and the sample return needed:
[image: Sample]
Here's an option to explore.
Basically what you're after for each record:
Previous record EffectiveDate based on Deal_ID - LAG()
Next EffectiveDate after today based on Deal_ID - Sub query
Then you can evaluate and figure out days based on those values.
Here's a temp table and some sample data:
CREATE TABLE #Deal
(
[Row_ID] INT
, [Deal_ID] BIGINT
, [EffectiveDate] DATE
, [Volume] BIGINT
);
INSERT INTO #Deal (
[Row_ID]
, [Deal_ID]
, [EffectiveDate]
, [Volume]
)
VALUES ( 1, 1479209, '2018-11-01', 5203 )
, ( 2, 1479209, '2019-03-01', 2727 )
, ( 3, 1479209, '2019-04-01', 1615 )
, ( 4, 1479209, '2019-06-01', 1325 )
, ( 5, 1598451, '2018-12-01', 2000 )
, ( 6, 1598451, '2019-04-01', 4000 )
, ( 7, 1598451, '2019-08-01', 4000 );
Here's an example query using LAG() and sub-query:
SELECT *
-- LAG here partitioned by the Deal_ID, will return NULL if first record.
, LAG([dl].[EffectiveDate], 1, NULL) OVER ( PARTITION BY [dl].[Deal_ID]
ORDER BY [dl].[EffectiveDate]
) AS [PreviousRowEffectiveData]
--Sub query to get min EffectiveDate that is greater than today
, (
SELECT MIN([dl1].[EffectiveDate])
FROM #Deal [dl1]
WHERE [dl1].[Deal_ID] = [dl].[Deal_ID]
AND [dl1].[EffectiveDate] > GETDATE()
) AS [NextEffectiveDateAfterToday]
FROM #Deal [dl]
Giving you these results:
Row_ID Deal_ID EffectiveDate Volume PreviousRowEffectiveData NextEffectiveDateAfterToday
----------- -------------------- ------------- -------------------- ------------------------ ---------------------------
1 1479209 2018-11-01 5203 NULL 2019-03-01
2 1479209 2019-03-01 2727 2018-11-01 2019-03-01
3 1479209 2019-04-01 1615 2019-03-01 2019-03-01
4 1479209 2019-06-01 1325 2019-04-01 2019-03-01
5 1598451 2018-12-01 2000 NULL 2019-04-01
6 1598451 2019-04-01 4000 2018-12-01 2019-04-01
7 1598451 2019-08-01 4000 2019-04-01 2019-04-01
Now that we have that we can use that in a sub query and then implement the business rules for DAYS if I understood correctly:
First row for a Deal_ID, number of days from next effective date after today to today.
If not first, number of days from previous effective date to row effective date.
Example query:
SELECT *
--Case statement: if PreviousRowEffectiveData is NULL (first record), take the difference in days between today and NextEffectiveDateAfterToday.
--Else take the difference in days between the previous row's and this row's effective date.
, CASE WHEN [Deal].[PreviousRowEffectiveData] IS NULL THEN DATEDIFF(DAY, GETDATE(), [Deal].[NextEffectiveDateAfterToday])
ELSE DATEDIFF(DAY, [Deal].[PreviousRowEffectiveData], [Deal].[EffectiveDate])
END AS [DAYS]
FROM (
SELECT *
-- LAG here partitioned by the Deal_ID; it will return NULL if first record.
, LAG([dl].[EffectiveDate], 1, NULL) OVER ( PARTITION BY [dl].[Deal_ID]
ORDER BY [dl].[EffectiveDate]
) AS [PreviousRowEffectiveData]
--Sub query to get min EffectiveDate that is greater than today
, (
SELECT MIN([dl1].[EffectiveDate])
FROM #Deal [dl1]
WHERE [dl1].[Deal_ID] = [dl].[Deal_ID]
AND [dl1].[EffectiveDate] > GETDATE()
) AS [NextEffectiveDateAfterToday]
FROM #Deal [dl]
) AS [Deal];
Giving us the final results of:
Row_ID Deal_ID EffectiveDate Volume PreviousRowEffectiveData NextEffectiveDateAfterToday DAYS
----------- -------------------- ------------- -------------------- ------------------------ --------------------------- -----------
1 1479209 2018-11-01 5203 NULL 2019-03-01 14
2 1479209 2019-03-01 2727 2018-11-01 2019-03-01 120
3 1479209 2019-04-01 1615 2019-03-01 2019-03-01 31
4 1479209 2019-06-01 1325 2019-04-01 2019-03-01 61
5 1598451 2018-12-01 2000 NULL 2019-04-01 45
6 1598451 2019-04-01 4000 2018-12-01 2019-04-01 121
7 1598451 2019-08-01 4000 2019-04-01 2019-04-01 122
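The two business rules can also be checked with a small Python sketch over the same sample rows. The function name is hypothetical, and `today` is fixed at 2019-02-15, which is inferred from the DAYS values above (14 days before 2019-03-01) rather than stated in the answer.

```python
from collections import defaultdict
from datetime import date

# (Row_ID, Deal_ID, EffectiveDate) from the sample data.
DEALS = [
    (1, 1479209, date(2018, 11, 1)),
    (2, 1479209, date(2019, 3, 1)),
    (3, 1479209, date(2019, 4, 1)),
    (4, 1479209, date(2019, 6, 1)),
    (5, 1598451, date(2018, 12, 1)),
    (6, 1598451, date(2019, 4, 1)),
    (7, 1598451, date(2019, 8, 1)),
]


def days_per_row(rows, today):
    """Map Row_ID -> DAYS using the two rules from the answer."""
    by_deal = defaultdict(list)
    for row_id, deal_id, effective in rows:
        by_deal[deal_id].append((effective, row_id))

    result = {}
    for deal_rows in by_deal.values():
        deal_rows.sort()  # ORDER BY EffectiveDate within the deal
        # Sub-query: MIN(EffectiveDate) that is later than today.
        future = [eff for eff, _ in deal_rows if eff > today]
        next_after_today = min(future) if future else None
        previous = None  # LAG(EffectiveDate) is NULL for the first row
        for effective, row_id in deal_rows:
            if previous is None:
                result[row_id] = (
                    (next_after_today - today).days if next_after_today else None
                )
            else:
                result[row_id] = (effective - previous).days
            previous = effective
    return result


print(days_per_row(DEALS, today=date(2019, 2, 15)))
```

Under that assumed date, the values match the DAYS column in the result set above.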

Sql-Get time ranges from million+ rows for particular condition

I am working with SQL Server 2012, I have a table with approx 35 column and 10+ million rows.
I need to find time ranges from across the data where the value of any particular column is matching
E.g.
The sample data is as below
Datetime col1 col2 col3
2018-05-31 0:00 1 2 1
2018-05-31 13:00 2 2 2
2018-05-31 14:30 3 2 1
2018-05-31 15:00 4 3 1
2018-05-31 16:00 4 5 1
2018-05-31 17:00 3 2 2
2018-05-31 17:30 3 2 4
2018-05-31 18:00 2 2 4
2018-05-31 20:00 1 2 6
2018-05-31 21:00 2 2 3
2018-05-31 21:10 2 2 1
2018-05-31 22:00 1 6 3
2018-05-31 22:00 4 5 1
2018-05-31 23:59 4 7 2
Find the time ranges in the data where the col2 value is <= 2. Accordingly, my expected result set is as below:
Start Time End time Time Diff
2018-05-31 0:00 2018-05-31 14:30 14:30:00
2018-05-31 17:00 2018-05-31 21:10 4:10:00
I can achieve this with the logic below, but it is extremely slow:
I get all rows and then
Order by date_Time
Scan the rows to find the first row where the value matches, and record that timestamp as the start time.
Scan further rows until I reach the row where the condition breaks, and record that timestamp as the end time.
But as I have to deal with a huge number of rows, this makes the whole operation slow. Any input or pseudocode to improve it?
We can use a slightly modified difference-in-row-numbers method here. The purpose of the first CTE, cte1, is to add a computed column that labels the rows we want, those with col2 <= 2, as 1 and everything else as 0. Then we can compute the difference of two row numbers and aggregate over the islands to find the starting and ending times, and the difference between those times.
WITH cte1 AS (
SELECT *,
CASE WHEN col2 <= 2 THEN 1 ELSE 0 END AS class
FROM yourTable
),
cte2 AS (
SELECT *,
ROW_NUMBER() OVER (ORDER BY Datetime) -
ROW_NUMBER() OVER (PARTITION BY class ORDER BY Datetime) rn
FROM cte1
)
SELECT
MIN(Datetime) AS [Start Time],
MAX(Datetime) AS [End Time],
CONVERT(TIME, MAX(Datetime) - MIN(Datetime)) AS [Time Diff]
FROM cte2
WHERE class = 1
GROUP BY rn
ORDER BY MIN(Datetime);
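To see why the difference of the two row numbers is constant within an island, here is a small Python sketch of the same technique on the sample data (only Datetime and col2 are kept). The function and variable names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# (Datetime, col2) pairs from the question's sample.
ROWS = [
    ("2018-05-31 00:00", 2), ("2018-05-31 13:00", 2), ("2018-05-31 14:30", 2),
    ("2018-05-31 15:00", 3), ("2018-05-31 16:00", 5), ("2018-05-31 17:00", 2),
    ("2018-05-31 17:30", 2), ("2018-05-31 18:00", 2), ("2018-05-31 20:00", 2),
    ("2018-05-31 21:00", 2), ("2018-05-31 21:10", 2), ("2018-05-31 22:00", 6),
    ("2018-05-31 22:00", 5), ("2018-05-31 23:59", 7),
]


def islands(rows, limit=2):
    """Return (start, end, duration) for each run of rows with col2 <= limit."""
    parsed = sorted(
        (datetime.strptime(ts, "%Y-%m-%d %H:%M"), col2) for ts, col2 in rows
    )
    class_rn = defaultdict(int)   # ROW_NUMBER() OVER (PARTITION BY class ...)
    groups = defaultdict(list)
    for overall_rn, (ts, col2) in enumerate(parsed, start=1):
        cls = 1 if col2 <= limit else 0
        class_rn[cls] += 1
        if cls == 1:
            # Within one island both counters advance together,
            # so their difference stays the same.
            groups[overall_rn - class_rn[cls]].append(ts)
    return sorted((min(g), max(g), max(g) - min(g)) for g in groups.values())


for start, end, diff in islands(ROWS):
    print(start, end, diff)
```

On the sample this yields the two expected ranges, 00:00 to 14:30 and 17:00 to 21:10.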

MS SQL value change

I have a SQL query that looks like this.
Select Timestamp, Value From [dbo].[nro_ReadRawDataByTimeFunction](
'SV/SVTP01.BONF0335-D1-W1-BL1',
'2017-11-01 00:00',
'2017-12-01 00:00')
GO
This will return
Timestamp | Value
1 2017-11-01 10:00 | 0
2 2017-11-01 11:00 | 0
3 2017-11-01 12:00 | 0
4 2017-11-01 13:00 | 1
5 2017-11-01 14:00 | 1
6 2017-11-01 15:00 | 0
7 2017-11-01 16:00 | 0
8 2017-11-01 17:00 | 0
9 2017-11-01 18:00 | 1
10 2017-11-01 19:00 | 0
The full list is a lot larger, and I'm only interested in results where the value changes from the previous result, so in this case rows 1, 4, 6, 9 and 10.
I know how to do it if it's directly from a table, but not when it's from a function.
You can use this construction:
;WITH cte AS (
Select [Timestamp],
[Value]
From [dbo].[nro_ReadRawDataByTimeFunction](
'SV/SVTP01.BONF0335-D1-W1-BL1',
'2017-11-01 00:00',
'2017-12-01 00:00')
)
SELECT TOP 1 WITH TIES c.*
FROM cte c
OUTER APPLY (
SELECT TOP 1 *
FROM cte
WHERE [Value] != c.[Value] AND c.[Timestamp] < [Timestamp]
ORDER BY [Timestamp] ASC
) t
ORDER BY ROW_NUMBER() OVER (PARTITION BY t.[Timestamp] ORDER BY c.[Timestamp] ASC)
Output:
Timestamp Value
2017-11-01 19:00 0
2017-11-01 10:00 0
2017-11-01 13:00 1
2017-11-01 15:00 0
2017-11-01 18:00 1
Explanation:
SELECT *
FROM cte c
OUTER APPLY (
SELECT TOP 1 *
FROM cte
WHERE [Value] != c.[Value] AND c.[Timestamp] < [Timestamp]
ORDER BY [Timestamp] ASC
) t
Here we select from the main result set and, with the help of OUTER APPLY, attach to each row the next row (by timestamp) that has a different value.
ROW_NUMBER() OVER (PARTITION BY t.[Timestamp] ORDER BY c.[Timestamp] ASC)
Hopefully you are familiar with ROW_NUMBER: it returns the sequential number of a row within a partition of a result set, starting at 1 for the first row in each partition.
So, if you run the above query and add this expression to the SELECT list, you will get:
Timestamp Value Timestamp Value rn
2017-11-01 19:00 0 NULL NULL 1
2017-11-01 10:00 0 2017-11-01 13:00 1 1
2017-11-01 11:00 0 2017-11-01 13:00 1 2
2017-11-01 12:00 0 2017-11-01 13:00 1 3
2017-11-01 13:00 1 2017-11-01 15:00 0 1
2017-11-01 14:00 1 2017-11-01 15:00 0 2
2017-11-01 15:00 0 2017-11-01 18:00 1 1
2017-11-01 16:00 0 2017-11-01 18:00 1 2
2017-11-01 17:00 0 2017-11-01 18:00 1 3
2017-11-01 18:00 1 2017-11-01 19:00 0 1
As you can see, all the rows you need are marked with 1. We could put this in another CTE or sub-query and filter on rn = 1, but we can do it all in one with the help of TOP 1 WITH TIES.
Since you are referring to SQL Server 2012 you can enjoy the new features:
;WITH Hist AS (
SELECT r,
LAG(v) OVER(ORDER BY d) PreviousValue,
v,
LEAD(v) OVER(ORDER BY d) NextValue ---Just to know that also this is available
FROM #t
)
SELECT *
FROM Hist h Inner JOIN #t t ON h.r = t.r
WHERE ISNULL(h.PreviousValue, -1) != t.v
Here #t is a temp table that contains your results; the query assumes it has a row id r, a timestamp d and a value v.
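The comparison against the previous value is easy to check in Python on the ten sample rows from the question. This is only a sketch of the LAG() idea; the function name is hypothetical.

```python
# (Timestamp, Value) rows from the question.
READINGS = [
    ("2017-11-01 10:00", 0), ("2017-11-01 11:00", 0), ("2017-11-01 12:00", 0),
    ("2017-11-01 13:00", 1), ("2017-11-01 14:00", 1), ("2017-11-01 15:00", 0),
    ("2017-11-01 16:00", 0), ("2017-11-01 17:00", 0), ("2017-11-01 18:00", 1),
    ("2017-11-01 19:00", 0),
]


def value_changes(readings):
    """Keep the first row and every row whose value differs from the previous one."""
    changes = []
    previous = None  # plays the role of LAG(): nothing precedes the first row
    for ts, value in sorted(readings):  # same-format timestamps sort as text
        if previous is None or value != previous:
            changes.append((ts, value))
        previous = value
    return changes


for row in value_changes(READINGS):
    print(row)
```

It keeps rows 1, 4, 6, 9 and 10, which is what the question asks for.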
If the function is a table-valued function, you can filter its result directly in the WHERE clause, or do the filtering inside the function, in which case the filter value would be passed in as a parameter.
Select a.Timestamp, a.Value From [dbo].[nro_ReadRawDataByTimeFunction](
'SV/SVTP01.BONF0335-D1-W1-BL1',
'2017-11-01 00:00',
'2017-12-01 00:00') as a
WHERE a.Value = 1
What I ended up with is the following:
DECLARE @startDate DATE,
@tagName NVARCHAR(200);
WITH CTE AS(
SELECT Timestamp As StopTime, Value As [OFF], LAG(Value,1) OVER (order by Timestamp) As [ON], Quality
FROM [dbo].[nrp_ReadRawDataByTimeFunction] (@tagName, @startDate, DATEADD(MONTH,2,@startDate))
Where Quality & 127 = 100
),
CalenderCTE AS(
SELECT [DATE] = DATEADD(Day,Number,@startDate)
FROM master..spt_values
WHERE Type='P'
AND DATEADD(day,Number,@startDate) < DATEADD(MONTH,1,@startDate)
)
SELECT * FROM CTE
FULL OUTER JOIN
CalenderCTE on CalenderCTE.Date = CAST(CTE.StopTime as [Date])
Where CTE.[OFF] != CTE.[ON]
This is just a tiny part of the query, since it does a lot more that isn't included in the original post.
Thanks all for your input; it helped me on the way to the final result.

MSSQL 2008 Merge Contiguous Dates With Groupings

I have searched high and low for weeks now trying to find a solution to my problem.
As far as I can ascertain, my SQL Server version (2008r2) is a limiting factor on this but, I am positive there is a solution out there.
My problem is as follows:
I have a table with potentially contiguous dates in the form of Customer-Status-DateStart-DateEnd-EventID.
I need to merge contiguous dates by customer and status - the status field can shift up and down throughout a customer's pathway.
Some example data is as follows:
DECLARE @Tbl TABLE([CustomerID] INT
,[Status] INT
,[DateStart] DATE
,[DateEnd] DATE
,[EventID] INT)
INSERT INTO @Tbl
VALUES (1,1,'20160101','20160104',1)
,(1,1,'20160104','20160108',3)
,(1,2,'20160108','20160110',4)
,(1,1,'20160110','20160113',7)
,(1,3,'20160113','20160113',9)
,(1,3,'20160113',NULL,10)
,(2,1,'20160101',NULL,2)
,(3,2,'20160109','20160110',5)
,(3,1,'20160110','20160112',6)
,(3,1,'20160112','20160114',8)
Desired output:
Customer | Status | DateStart | DateEnd
---------+--------+-----------+-----------
1 | 1 | 2016-01-01| 2016-01-08
1 | 2 | 2016-01-08| 2016-01-10
1 | 1 | 2016-01-10| 2016-01-13
1 | 3 | 2016-01-13| NULL
2 | 1 | 2016-01-01| NULL
3 | 2 | 2016-01-09| 2016-01-10
3 | 1 | 2016-01-10| 2016-01-14
Any ideas / code will be gratefully received.
Thanks,
Dan
Try this:
DECLARE @Tbl TABLE([CustomerID] INT
,[Status] INT
,[DateStart] DATE
,[DateEnd] DATE
,[EventID] INT)
INSERT INTO @Tbl
VALUES (1,1,'20160101','20160104',1)
,(1,1,'20160104','20160108',3)
,(1,2,'20160108','20160110',4)
,(1,1,'20160110','20160113',7)
,(1,3,'20160113','20160113',9)
,(1,3,'20160113',NULL,10)
,(2,1,'20160101',NULL,2)
,(3,2,'20160109','20160110',5)
,(3,1,'20160110','20160112',6)
,(3,1,'20160112','20160114',8)
;WITH CTE
AS
(
SELECT CustomerID ,
Status ,
DateStart ,
-- Open-ended periods get a far-future sentinel so MAX() treats them as the latest end
COALESCE(DateEnd, '9999-01-01') AS DateEnd,
EventID,
ROW_NUMBER() OVER (ORDER BY CustomerID, EventID) RowId,
ROW_NUMBER() OVER (PARTITION BY CustomerID, Status ORDER BY EventID) StatusRowId FROM @Tbl
)
SELECT
A.CustomerID ,
A.Status ,
A.DateStart ,
CASE WHEN A.DateEnd = '9999-01-01' THEN NULL
ELSE A.DateEnd END AS DateEnd
FROM
(
SELECT
CTE.CustomerID,
CTE.Status,
MIN(CTE.DateStart) AS DateStart,
MAX(CTE.DateEnd) AS DateEnd
FROM
CTE
GROUP BY
CTE.CustomerID,
CTE.Status,
-- The difference of the two row numbers is constant within one run of the same status
CTE.StatusRowId -CTE.RowId
) A
ORDER BY A.CustomerID, A.DateStart
Output
CustomerID Status DateStart DateEnd
----------- ----------- ---------- ----------
1 1 2016-01-01 2016-01-08
1 2 2016-01-08 2016-01-10
1 1 2016-01-10 2016-01-13
1 3 2016-01-13 NULL
2 1 2016-01-01 NULL
3 2 2016-01-09 2016-01-10
3 1 2016-01-10 2016-01-14
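The grouping key used above (per-status row number minus overall row number) can be traced in a short Python sketch on the same sample rows. Names such as `merge_periods` and `OPEN_END` are hypothetical; the sentinel date mirrors the '9999-01-01' used in the query.

```python
from collections import defaultdict
from datetime import date

# (CustomerID, Status, DateStart, DateEnd, EventID); None marks an open period.
EVENTS = [
    (1, 1, date(2016, 1, 1), date(2016, 1, 4), 1),
    (1, 1, date(2016, 1, 4), date(2016, 1, 8), 3),
    (1, 2, date(2016, 1, 8), date(2016, 1, 10), 4),
    (1, 1, date(2016, 1, 10), date(2016, 1, 13), 7),
    (1, 3, date(2016, 1, 13), date(2016, 1, 13), 9),
    (1, 3, date(2016, 1, 13), None, 10),
    (2, 1, date(2016, 1, 1), None, 2),
    (3, 2, date(2016, 1, 9), date(2016, 1, 10), 5),
    (3, 1, date(2016, 1, 10), date(2016, 1, 12), 6),
    (3, 1, date(2016, 1, 12), date(2016, 1, 14), 8),
]

OPEN_END = date(9999, 1, 1)  # stand-in for NULL so max() picks it as latest


def merge_periods(events):
    """Merge consecutive rows with the same customer and status."""
    ordered = sorted(events, key=lambda e: (e[0], e[4]))  # CustomerID, EventID
    status_rn = defaultdict(int)
    merged = {}
    for row_id, (cust, status, start, end, _) in enumerate(ordered, start=1):
        status_rn[(cust, status)] += 1
        # Constant within one uninterrupted run of the same status.
        key = (cust, status, status_rn[(cust, status)] - row_id)
        end = end or OPEN_END
        lo, hi = merged.get(key, (start, end))
        merged[key] = (min(lo, start), max(hi, end))
    rows = [
        (cust, status, lo, None if hi == OPEN_END else hi)
        for (cust, status, _), (lo, hi) in merged.items()
    ]
    return sorted(rows, key=lambda r: (r[0], r[2]))


for row in merge_periods(EVENTS):
    print(row)
```

The seven rows it prints correspond to the output listed above.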

Cleaning up old record to a specific date: How to select the old record?

I posted a question here about a cleanup that I now need to perform. I edited it a few times to match the current requirement, so I am restating it here more clearly to get a final solution.
My table:
Items | Price | UpdateAt
1 | 2000 | 02/02/2015
2 | 4000 | 06/04/2015
1 | 2500 | 05/25/2015
3 | 2150 | 07/05/2015
4 | 1800 | 07/05/2015
5 | 5540 | 08/16/2015
4 | 1700 | 12/24/2015
5 | 5200 | 12/26/2015
2 | 3900 | 01/01/2016
4 | 2000 | 06/14/2016
As you can see, this is a table that keeps items' prices as well as their old prices before the last update.
Now I need to find the rows which:
Have an UpdateAt more than 1 year ago from now
Have had the price updated at least once since
Aren't the most up-to-date price
Why those conditions? Because I need to clean up that table by removing records that are older than 1 year, while still maintaining the full item list.
So with those conditions, the result from the above table should be :
Items | Price | UpdateAt
1 | 2000 | 02/02/2015
2 | 4000 | 06/04/2015
4 | 1800 | 07/05/2015
The update of item 1 on 02/02/2015 should be selected, while its second update on 05/25/2015, though still over 1 year old, should not, because it is the most up-to-date price for item 1.
Item 3 isn't in the list because it has never been updated, hence its price remains the same until now, so I don't need to clean it up.
At first I thought it wouldn't be so hard, and I thought I already had an answer, but as I proceeded it turned out not to be that easy.
@Tim Biegeleisen provided me with an answer to the last question, but it doesn't select the items whose price hasn't changed over the year at all, which is what I have to deal with now.
I need a solution to effectively clean up the table - it isn't necessary to follow the 3 conditions above as long as it produces the result I need: the records that need to be deleted.
Try this:
DECLARE @Prices TABLE(Items INT, Price DECIMAL(10,2), UpdateAt DATETIME)
INSERT INTO @Prices
VALUES
(1, 2000, '02/02/2015')
,(2, 4000, '06/04/2015')
,(1, 2500, '05/25/2015')
,(3, 2150, '07/05/2015')
,(4, 1800, '07/05/2015')
,(5, 5540, '08/16/2015')
,(4, 1700, '12/24/2015')
,(5, 5200, '12/26/2015')
,(2, 3900, '01/01/2016')
,(4, 2000, '06/14/2016')
SELECT p.Items, p.Price, p.UpdateAt
FROM @Prices p
LEFT JOIN ( SELECT
p1.Items,
p1.UpdateAt,
ROW_NUMBER() OVER (PARTITION BY p1.Items ORDER BY p1.UpdateAt DESC) AS RowNo
FROM @Prices p1
) AS hp ON hp.Items = p.Items
AND hp.UpdateAt = p.UpdateAt
WHERE hp.RowNo > 1 -- spare one price for each item at any date
AND p.UpdateAt < DATEADD(YEAR, -1, GETDATE()) -- remove only prices older than a year
The result is:
Items Price UpdateAt
----------- --------------------------------------- -----------------------
1 2000.00 2015-02-02 00:00:00.000
2 4000.00 2015-06-04 00:00:00.000
4 1800.00 2015-07-05 00:00:00.000
This query will return the dataset you're looking for:
SELECT t1.Items, t1.Price, t1.UpdateAt
FROM
(
SELECT
t2.Items,
t2.Price,
t2.UpdateAt,
ROW_NUMBER() OVER (PARTITION BY t2.Items ORDER BY t2.UpdateAt DESC) AS rn
FROM [Table] AS t2
) AS t1
WHERE t1.rn > 1
AND t1.UpdateAt < DATEADD(year, -1, GETDATE())
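Both answers apply the same rule: keep each item's newest row, and of the rest select those older than one year. A quick Python sketch of that rule on the sample data is below. The function name is hypothetical, it assumes an item never has two rows with the same UpdateAt, and `today` is fixed at 2016-08-01, an assumed run date that is consistent with the expected result in the question.

```python
from datetime import date

# (Items, Price, UpdateAt) rows from the question.
PRICES = [
    (1, 2000, date(2015, 2, 2)),
    (2, 4000, date(2015, 6, 4)),
    (1, 2500, date(2015, 5, 25)),
    (3, 2150, date(2015, 7, 5)),
    (4, 1800, date(2015, 7, 5)),
    (5, 5540, date(2015, 8, 16)),
    (4, 1700, date(2015, 12, 24)),
    (5, 5200, date(2015, 12, 26)),
    (2, 3900, date(2016, 1, 1)),
    (4, 2000, date(2016, 6, 14)),
]


def rows_to_delete(prices, today):
    """Rows that are not the newest for their item and are over a year old."""
    # Equivalent of DATEADD(YEAR, -1, today); this simple form would raise
    # on Feb 29, which is fine for a sketch.
    cutoff = today.replace(year=today.year - 1)
    latest = {}
    for item, _, updated in prices:
        latest[item] = max(latest.get(item, updated), updated)
    return sorted(
        row for row in prices
        if row[2] != latest[row[0]]   # ROW_NUMBER() ... DESC > 1
        and row[2] < cutoff           # older than one year
    )


for row in rows_to_delete(PRICES, today=date(2016, 8, 1)):
    print(row)
```

Under that assumed date it selects the rows for items 1, 2 and 4 shown in the expected result.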
