Window function in Snowflake - snowflake-cloud-data-platform

My data is structured as below:
For each ID, Month is the reporting month, Sub_created is the original subscription purchase date, status shows whether the customer was active, and tenure is the lifetime in months (it resets to 1 when the customer returns).
ID   Month       Sub_created  status   tenure
100  2017-02-01  2017-02-01   active   1
100  2017-03-01               active   2
100  2017-04-01               active   3
100  2017-05-01               churned  3
100  2021-02-01  2021-02-01   active   1
100  2021-03-01               active   2
100  2021-04-01               active   3
100  2021-05-01               active   4
100  2021-06-01               active   5
100  2021-07-01               active   6
I want Sub_created filled in on every row until a new subscription date appears. The output I am trying to get is below:
ID   Month       Sub_created  status   tenure
100  2017-02-01  2017-02-01   active   1
100  2017-03-01  2017-02-01   active   2
100  2017-04-01  2017-02-01   active   3
100  2017-05-01  2017-02-01   churned  3
100  2021-02-01  2021-02-01   active   1
100  2021-03-01  2021-02-01   active   2
100  2021-04-01  2021-02-01   active   3
100  2021-05-01  2021-02-01   active   4
100  2021-06-01  2021-02-01   active   5
100  2021-07-01  2021-02-01   active   6
Can anyone suggest Snowflake code? Thanks

You can use the last_value() window function like so:
with CTE as (
select 100 as ID, '2017-02-01' as Month, '2017-02-01' as Sub_created, 'active' as status, 1 as tenure union all
select 100 as ID, '2017-03-01' as Month, null as Sub_created, 'active' as status, 2 as tenure union all
select 100 as ID, '2017-04-01' as Month, null as Sub_created, 'active' as status, 3 as tenure union all
select 100 as ID, '2017-05-01' as Month, null as Sub_created, 'churned' as status, 3 as tenure union all
select 100 as ID, '2021-02-01' as Month, '2021-02-01' as Sub_created, 'active' as status, 1 as tenure union all
select 100 as ID, '2021-03-01' as Month, null as Sub_created, 'active' as status, 2 as tenure union all
select 100 as ID, '2021-04-01' as Month, null as Sub_created, 'active' as status, 3 as tenure union all
select 100 as ID, '2021-05-01' as Month, null as Sub_created, 'active' as status, 4 as tenure union all
select 100 as ID, '2021-06-01' as Month, null as Sub_created, 'active' as status, 5 as tenure union all
select 100 as ID, '2021-07-01' as Month, null as Sub_created, 'active' as status, 6 as tenure
)
select ID, Month, Sub_created as Sub_created_orig,
       last_value(sub_created ignore nulls) over (
           partition by id
           order by month
           rows between unbounded preceding and current row
       ) as Sub_created_new,
       status, tenure
from CTE
order by ID, month;
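For readers who want to see what the window frame is doing, here is a minimal Python sketch of the same forward fill. It is not Snowflake code; the function name `forward_fill_sub_created` and the dictionary layout are assumptions made for illustration, and the sample rows are a shortened copy of the question's data.

```python
def forward_fill_sub_created(rows):
    """Carry the last non-null sub_created forward per id, in month order."""
    last_seen = {}  # id -> most recent non-null sub_created
    filled = []
    # ISO-formatted date strings sort chronologically, like ORDER BY month.
    for row in sorted(rows, key=lambda r: (r["id"], r["month"])):
        if row["sub_created"] is not None:
            last_seen[row["id"]] = row["sub_created"]
        filled.append({**row, "sub_created": last_seen.get(row["id"])})
    return filled


rows = [
    {"id": 100, "month": "2017-02-01", "sub_created": "2017-02-01"},
    {"id": 100, "month": "2017-03-01", "sub_created": None},
    {"id": 100, "month": "2017-05-01", "sub_created": None},
    {"id": 100, "month": "2021-02-01", "sub_created": "2021-02-01"},
    {"id": 100, "month": "2021-03-01", "sub_created": None},
]

filled = forward_fill_sub_created(rows)
for row in filled:
    print(row["id"], row["month"], row["sub_created"])
```

Each row picks up the most recent non-null value seen so far for its ID, which is what "unbounded preceding to current row" combined with "ignore nulls" expresses.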

Related

Dynamically find unpaid invoices

How can I dynamically find unpaid invoices from the tables below:
Invoices Table
InvoiceID  Date        CustomerID  Amount
1          06/01/2022  1           5000.00
2          08/03/2022  1           4000.00
3          08/25/2022  1           3000.00
4          09/05/2022  1           4500.00
5          09/25/2022  1           4500.00
6          10/10/2022  1           2000.00
7          11/20/2022  1           2500.00
Payments Table:-
PaymentID Date CustomerID Amount
1 06/10/2022 1 3000.00
2 06/25/2022 1 4000.00
3 07/15/2022 1 2000.00
4 09/10/2022 1 3000.00
5 10/22/2022 1 4000.00
6 10/24/2022 1 1500.00
7 10/28/2022 1 1000.00
8 11/14/2022 1 500.00
Try to start with this:
SELECT I.CustomerID
     , I.AmountTotal - ISNULL(P.AmountTotal, 0) AS AmountDiff
FROM (
    -- total invoiced per customer
    SELECT CustomerID
         , SUM(Amount) AS AmountTotal
    FROM <invoices_table>
    GROUP BY CustomerID
) I
LEFT OUTER JOIN (
    -- total paid per customer
    SELECT CustomerID
         , SUM(Amount) AS AmountTotal
    FROM <payments_table>
    GROUP BY CustomerID
) P
    ON I.CustomerID = P.CustomerID
-- keep customers who still owe money; ISNULL covers customers with no payments
WHERE I.AmountTotal > ISNULL(P.AmountTotal, 0)
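The query above works at customer level. To get down to individual invoices, one possible reading is that payments settle the oldest invoices first. That rule is an assumption, not something the question states. Under that assumption, a minimal Python sketch for a single customer could look like this (the function name `unpaid_invoices` is hypothetical, and amounts are whole numbers to keep the arithmetic exact):

```python
def unpaid_invoices(invoices, payments):
    """Apply the total paid to invoices oldest first (assumed rule).

    invoices: list of (invoice_id, amount), already ordered by date.
    payments: list of payment amounts for the same customer.
    Returns (invoice_id, outstanding_amount) for invoices not fully paid.
    """
    remaining = sum(payments)
    unpaid = []
    for invoice_id, amount in invoices:
        applied = min(amount, remaining)
        remaining -= applied
        if applied < amount:
            unpaid.append((invoice_id, amount - applied))
    return unpaid


# Customer 1 from the question.
invoices = [(1, 5000), (2, 4000), (3, 3000), (4, 4500),
            (5, 4500), (6, 2000), (7, 2500)]
payments = [3000, 4000, 2000, 3000, 4000, 1500, 1000, 500]

outstanding = unpaid_invoices(invoices, payments)
print(outstanding)
```

With the sample data, 19000 has been paid against 25500 invoiced, so invoices 1 to 4 are covered, invoice 5 is partly covered, and invoices 6 and 7 are untouched.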

Number of Days Between a List of Dates

I have a temp table with deals, dates, and volumes in it. I need to calculate the days between the effective dates dynamically. To complicate this, the first row for each deal is in the past but represents the current deal volume, so for that row I need to return the number of days from today to the next effective date for that deal. Furthermore, on the last effective date for each deal, I have to run a subquery to grab the contract end date from another temp table.
Sample of the temp table and the sample return needed: [image: Sample]
Here's an option to explore.
Basically what you're after for each record:
Previous record EffectiveDate based on Deal_ID - LAG()
Next EffectiveDate after today based on Deal_ID - Sub query
Then you can evaluate and figure out days based on those values.
Here's a temp table and some sample data:
CREATE TABLE #Deal
(
[Row_ID] INT
, [Deal_ID] BIGINT
, [EffectiveDate] DATE
, [Volume] BIGINT
);
INSERT INTO #Deal (
[Row_ID]
, [Deal_ID]
, [EffectiveDate]
, [Volume]
)
VALUES ( 1, 1479209, '2018-11-01', 5203 )
, ( 2, 1479209, '2019-03-01', 2727 )
, ( 3, 1479209, '2019-04-01', 1615 )
, ( 4, 1479209, '2019-06-01', 1325 )
, ( 5, 1598451, '2018-12-01', 2000 )
, ( 6, 1598451, '2019-04-01', 4000 )
, ( 7, 1598451, '2019-08-01', 4000 );
Here's an example query using LAG() and sub-query:
SELECT *
-- LAG here partitioned by the Deal_ID, will return NULL if first record.
, LAG([dl].[EffectiveDate], 1, NULL) OVER ( PARTITION BY [dl].[Deal_ID]
ORDER BY [dl].[EffectiveDate]
) AS [PreviousRowEffectiveData]
--Sub query to get min EffectiveDate that is greater than today
, (
SELECT MIN([dl1].[EffectiveDate])
FROM #Deal [dl1]
WHERE [dl1].[Deal_ID] = [dl].[Deal_ID]
AND [dl1].[EffectiveDate] > GETDATE()
) AS [NextEffectiveDateAfterToday]
FROM #Deal [dl]
Giving you these results:
Row_ID Deal_ID EffectiveDate Volume PreviousRowEffectiveData NextEffectiveDateAfterToday
----------- -------------------- ------------- -------------------- ------------------------ ---------------------------
1 1479209 2018-11-01 5203 NULL 2019-03-01
2 1479209 2019-03-01 2727 2018-11-01 2019-03-01
3 1479209 2019-04-01 1615 2019-03-01 2019-03-01
4 1479209 2019-06-01 1325 2019-04-01 2019-03-01
5 1598451 2018-12-01 2000 NULL 2019-04-01
6 1598451 2019-04-01 4000 2018-12-01 2019-04-01
7 1598451 2019-08-01 4000 2019-04-01 2019-04-01
Now that we have that we can use that in a sub query and then implement the business rules for DAYS if I understood correctly:
First row for a Deal_ID, number of days from next effective date after today to today.
If not first, number of days from previous effective date to row effective date.
Example query:
SELECT *
-- If PreviousRowEffectiveData is NULL (first record), take the difference in days between today and NextEffectiveDateAfterToday.
-- Otherwise take the difference in days between the previous row's effective date and this row's effective date.
, CASE WHEN [Deal].[PreviousRowEffectiveData] IS NULL THEN DATEDIFF(DAY, GETDATE(), [Deal].[NextEffectiveDateAfterToday])
ELSE DATEDIFF(DAY, [Deal].[PreviousRowEffectiveData], [Deal].[EffectiveDate])
END AS [DAYS]
FROM (
SELECT *
-- LAG here partitioned by the Deal_ID, will return NULL if first record.
, LAG([dl].[EffectiveDate], 1, NULL) OVER ( PARTITION BY [dl].[Deal_ID]
ORDER BY [dl].[EffectiveDate]
) AS [PreviousRowEffectiveData]
--Sub query to get min EffectiveDate that is greater than today
, (
SELECT MIN([dl1].[EffectiveDate])
FROM #Deal [dl1]
WHERE [dl1].[Deal_ID] = [dl].[Deal_ID]
AND [dl1].[EffectiveDate] > GETDATE()
) AS [NextEffectiveDateAfterToday]
FROM #Deal [dl]
) AS [Deal];
Giving us the final results of:
Row_ID Deal_ID EffectiveDate Volume PreviousRowEffectiveData NextEffectiveDateAfterToday DAYS
----------- -------------------- ------------- -------------------- ------------------------ --------------------------- -----------
1 1479209 2018-11-01 5203 NULL 2019-03-01 14
2 1479209 2019-03-01 2727 2018-11-01 2019-03-01 120
3 1479209 2019-04-01 1615 2019-03-01 2019-03-01 31
4 1479209 2019-06-01 1325 2019-04-01 2019-03-01 61
5 1598451 2018-12-01 2000 NULL 2019-04-01 45
6 1598451 2019-04-01 4000 2018-12-01 2019-04-01 121
7 1598451 2019-08-01 4000 2019-04-01 2019-04-01 122
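The same rules can be sketched in Python to check the arithmetic. This is only an illustration: the function name `days_per_row` is hypothetical, and `today` is fixed at 2019-02-15, a date inferred from the DAYS values shown above rather than stated in the answer.

```python
from datetime import date
from itertools import groupby


def days_per_row(rows, today):
    """rows: (deal_id, effective_date) pairs. Returns (deal_id, date, days)."""
    out = []
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    for deal_id, group in groupby(ordered, key=lambda r: r[0]):
        group = list(group)
        # Equivalent of the sub-query: earliest effective date after today.
        future = [eff for _, eff in group if eff > today]
        next_after_today = min(future) if future else None
        previous = None  # equivalent of LAG(EffectiveDate)
        for _, eff in group:
            if previous is None:
                days = (next_after_today - today).days if next_after_today else None
            else:
                days = (eff - previous).days
            out.append((deal_id, eff, days))
            previous = eff
    return out


deals = [
    (1479209, date(2018, 11, 1)), (1479209, date(2019, 3, 1)),
    (1479209, date(2019, 4, 1)), (1479209, date(2019, 6, 1)),
    (1598451, date(2018, 12, 1)), (1598451, date(2019, 4, 1)),
    (1598451, date(2019, 8, 1)),
]

result = days_per_row(deals, today=date(2019, 2, 15))
for deal_id, eff, days in result:
    print(deal_id, eff, days)
```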

Find customer lapse across variable subscription periods

Hoping someone has run across this issue previously and has a solution.
I am trying to find customers who lapse based off subscription periods rather than a single order date.
Lapse is defined by us as not making a purchase/renewal within 30 days of the end of their subscription. A customer can have multiple subscriptions simultaneously and subscriptions can vary in length.
I have a data set that includes customerIDs, Orders, the subscription start date, the subscription expire date, and that order’s rank in the customer’s order history, something like this:
CREATE TABLE #Subscriptions
(CustomerID INT,
Orderid INT,
SubscriptionStart DATE,
SubscriptionEnd DATE,
OrderNumber INT);
INSERT INTO #Subscriptions
VALUES(1, 111111, '2017-01-01', '2017-12-31', 1),
(1, 211111, '2018-01-01', '2019-12-31' ,2),
(1, 311121, '2018-10-01', '2018-10-02', 3),
(1, 451515, '2019-02-01', '2019-02-28', 4),
(2, 158797, '2018-07-01', '2018-07-31', 1),
(2, 287584, '2018-09-01', '2018-12-31', 2),
(2, 387452, '2019-01-01', '2019-01-31', 3),
(3, 187498, '2019-01-01', '2019-02-28', 1),
(3, 284990, '2019-02-01', '2019-02-28', 2),
(4, 184849, '2019-02-01', '2019-02-28', 1)
Within this data set, customer 2 would have lapsed on 2018-07-31. Since Customer 1 has a subscription of 2017-01-01 - 2017-12-31 and then one that starts 2018-01-01 and ends 2019-12-31 they cannot lapse within that time period even if other orders made by the customer would qualify.
I have attempted some simple gap calculations using LEAD() and LAG(); however, I have had no success because of the variable lengths of the subscription periods, where a single subscription can span multiple other orders. Eventually, we will use this to calculate the monthly churn rate across approximately 5 million records.
You're overthinking this by trying to use LEAD() and LAG(). All you need is a NOT EXISTS() predicate in the WHERE clause.
In pseudocode:
SELECT...FROM...
WHERE {SubscriptionEnd is at least 30 days in the past}
AND NOT EXISTS(
{A row for the same Customer where the StartDate is 30 days or less after this EndDate}
)
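A minimal Python sketch of that NOT EXISTS check follows, using the question's sample rows. Two things here are assumptions rather than part of the answer: `today` is fixed at 2019-02-28 so the result is reproducible, and the "covering" test also accepts a subscription that started earlier but is still running past this end date, so that long overlapping subscriptions are handled. The function name `lapsed_subscriptions` is hypothetical.

```python
from datetime import date, timedelta


def lapsed_subscriptions(subs, today, grace_days=30):
    """subs: (customer_id, order_id, start, end). Returns lapsed (customer, order, end)."""
    grace = timedelta(days=grace_days)
    lapsed = []
    for cust, order_id, start, end in subs:
        if end + grace >= today:
            continue  # the grace period is not over yet
        # NOT EXISTS: no other row for this customer that starts within the
        # grace period and runs past this end date (assumed overlap rule).
        covered = any(
            c == cust and o != order_id and s <= end + grace and e > end
            for c, o, s, e in subs
        )
        if not covered:
            lapsed.append((cust, order_id, end))
    return lapsed


subs = [
    (1, 111111, date(2017, 1, 1), date(2017, 12, 31)),
    (1, 211111, date(2018, 1, 1), date(2019, 12, 31)),
    (1, 311121, date(2018, 10, 1), date(2018, 10, 2)),
    (1, 451515, date(2019, 2, 1), date(2019, 2, 28)),
    (2, 158797, date(2018, 7, 1), date(2018, 7, 31)),
    (2, 287584, date(2018, 9, 1), date(2018, 12, 31)),
    (2, 387452, date(2019, 1, 1), date(2019, 1, 31)),
    (3, 187498, date(2019, 1, 1), date(2019, 2, 28)),
    (3, 284990, date(2019, 2, 1), date(2019, 2, 28)),
    (4, 184849, date(2019, 2, 1), date(2019, 2, 28)),
]

lapsed = lapsed_subscriptions(subs, today=date(2019, 2, 28))
print(lapsed)
```

With these inputs only customer 2's first subscription is reported, which matches the lapse the question describes.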
This one looks to be a tricky one. You are correct about the problem with using the LEAD() and LAG() functions. It stems from customers being able to have multiple subscriptions of variable length. So we need to deal with that issue first. Let's begin with creating a single list of dates instead of having a list of SubscriptionStart and SubscriptionEnd.
SELECT
CustomerId,
OrderId,
1 AS Activity,
SubscriptionStart AS ActivityDate
FROM
#Subscriptions
UNION ALL
SELECT
CustomerId,
OrderId,
-1 AS Activity,
SubscriptionEnd AS ActivityDate
FROM
#Subscriptions
ORDER BY
CustomerId,
ActivityDate
CustomerId OrderId Activity ActivityDate
----------- ----------- ----------- ------------
1 111111 1 2017-01-01
1 111111 -1 2017-12-31
1 211111 1 2018-01-01
1 311121 1 2018-10-01
1 311121 -1 2018-10-02
1 451515 1 2019-02-01
1 451515 -1 2019-02-28
1 211111 -1 2019-12-31
2 158797 1 2018-07-01
2 158797 -1 2018-07-31
2 287584 1 2018-09-01
2 287584 -1 2018-12-31
2 387452 1 2019-01-01
2 387452 -1 2019-01-31
3 187498 1 2019-01-01
3 284990 1 2019-02-01
3 187498 -1 2019-02-28
3 284990 -1 2019-02-28
4 184849 1 2019-02-01
4 184849 -1 2019-02-28
Notice the additional Activity field. It is 1 for the SubscriptionStart and -1 for the SubscriptionEnd.
Using this new Activity field it is possible to find places where there might be a lapse in the customer's subscriptions. At the same time use LEAD() to find the NextDate.
;WITH SubscriptionList AS (
SELECT
CustomerId,
OrderId,
1 AS Activity,
SubscriptionStart AS ActivityDate
FROM
#Subscriptions
UNION ALL
SELECT
CustomerId,
OrderId,
-1 AS Activity,
SubscriptionEnd AS ActivityDate
FROM
#Subscriptions
)
SELECT
CustomerId,
OrderId,
Activity,
SUM(Activity) OVER(PARTITION BY CustomerId ORDER BY ActivityDate ROWS UNBOUNDED PRECEDING) as SubscriptionCount,
ActivityDate,
LEAD(ActivityDate, 1, GETDATE()) OVER(PARTITION BY CustomerId ORDER BY ActivityDate) AS NextDate,
DATEDIFF(d, ActivityDate, LEAD(ActivityDate, 1, GETDATE()) OVER(PARTITION BY CustomerId ORDER BY ActivityDate)) AS LapsedDays
FROM
SubscriptionList
ORDER BY
CustomerId,
ActivityDate
CustomerId OrderId Activity SubscriptionCount ActivityDate NextDate LapsedDays
----------- ----------- ----------- ----------------- ------------ ---------- -----------
1 111111 1 1 2017-01-01 2017-12-31 364
1 111111 -1 0 2017-12-31 2018-01-01 1
1 211111 1 1 2018-01-01 2018-10-01 273
1 311121 1 2 2018-10-01 2018-10-02 1
1 311121 -1 1 2018-10-02 2019-02-01 122
1 451515 1 2 2019-02-01 2019-02-28 27
1 451515 -1 1 2019-02-28 2019-12-31 306
1 211111 -1 0 2019-12-31 2019-02-28 -306
2 158797 1 1 2018-07-01 2018-07-31 30
2 158797 -1 0 2018-07-31 2018-09-01 32
2 287584 1 1 2018-09-01 2018-12-31 121
2 287584 -1 0 2018-12-31 2019-01-01 1
2 387452 1 1 2019-01-01 2019-01-31 30
2 387452 -1 0 2019-01-31 2019-02-28 28
3 187498 1 1 2019-01-01 2019-02-01 31
3 284990 1 2 2019-02-01 2019-02-28 27
3 187498 -1 1 2019-02-28 2019-02-28 0
3 284990 -1 0 2019-02-28 2019-02-28 0
4 184849 1 1 2019-02-01 2019-02-28 27
4 184849 -1 0 2019-02-28 2019-02-28 0
Adding running total on the Activity field will effectively give the number of active subscriptions. While it is greater than 0 a lapse is not possible. So focus in on the rows WHERE the SubscriptionCount is zero.
Use LEAD() to get the NextDate; if there isn't a next date, default to today. When the SubscriptionCount is 0, the next row has to belong to a new subscription, so NextDate is the date that subscription starts. Use DATEDIFF to count the days between that SubscriptionEnd and the next SubscriptionStart; if it is more than 30 days, there was a lapse. That makes a good WHERE clause.
;WITH SubscriptionList AS (
SELECT
CustomerId,
OrderId,
1 AS Activity,
SubscriptionStart AS ActivityDate
FROM
#Subscriptions
UNION ALL
SELECT
CustomerId,
OrderId,
-1 AS Activity,
SubscriptionEnd AS ActivityDate
FROM
#Subscriptions
)
, FindLapse AS (
SELECT
CustomerId,
OrderId,
Activity,
SUM(Activity) OVER(PARTITION BY CustomerId ORDER BY ActivityDate ROWS UNBOUNDED PRECEDING) as SubscriptionCount,
ActivityDate,
LEAD(ActivityDate, 1, GETDATE()) OVER(PARTITION BY CustomerId ORDER BY ActivityDate) AS NextDate
FROM
SubscriptionList
)
SELECT
CustomerId,
OrderId,
Activity,
SubscriptionCount,
ActivityDate,
NextDate,
DATEDIFF(d, ActivityDate, NextDate) AS LapsedDays
FROM
FindLapse
WHERE
SubscriptionCount = 0
AND DATEDIFF(d, ActivityDate, NextDate) > 30
CustomerId OrderId Activity SubscriptionCount ActivityDate NextDate LapsedDays
----------- ----------- ----------- ----------------- ------------ ---------- -----------
2 158797 -1 0 2018-07-31 2018-09-01 32
Looks like we have a winner!
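The running-count idea can also be sketched outside SQL. The Python below is an illustration only: `find_lapses` is a hypothetical name, `today` is fixed at 2019-02-28 (the date the sample output implies), and start events are sorted ahead of end events on the same date, which is an added tie-break the SQL above does not specify.

```python
from datetime import date


def find_lapses(subs, today, grace_days=30):
    """subs: (customer_id, start, end). Returns (customer, end, next_date, gap_days)."""
    events = []
    for cust, start, end in subs:
        events.append((cust, start, 0, +1))  # 0 sorts starts before ends on ties
        events.append((cust, end, 1, -1))
    events.sort()

    lapses = []
    active = 0  # running count of open subscriptions for the current customer
    for i, (cust, when, _, delta) in enumerate(events):
        if i == 0 or events[i - 1][0] != cust:
            active = 0  # new customer partition
        active += delta
        if active == 0:
            has_next = i + 1 < len(events) and events[i + 1][0] == cust
            next_date = events[i + 1][1] if has_next else today
            gap = (next_date - when).days
            if gap > grace_days:
                lapses.append((cust, when, next_date, gap))
    return lapses


subs = [
    (1, date(2017, 1, 1), date(2017, 12, 31)),
    (1, date(2018, 1, 1), date(2019, 12, 31)),
    (1, date(2018, 10, 1), date(2018, 10, 2)),
    (1, date(2019, 2, 1), date(2019, 2, 28)),
    (2, date(2018, 7, 1), date(2018, 7, 31)),
    (2, date(2018, 9, 1), date(2018, 12, 31)),
    (2, date(2019, 1, 1), date(2019, 1, 31)),
    (3, date(2019, 1, 1), date(2019, 2, 28)),
    (3, date(2019, 2, 1), date(2019, 2, 28)),
    (4, date(2019, 2, 1), date(2019, 2, 28)),
]

lapses = find_lapses(subs, today=date(2019, 2, 28))
print(lapses)
```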

Get the info of the given date / closest date to the given date

I am trying to find whether the status of an ID is Active/Backup on a given date/the closest date to the given date.
Data in my CTE :
ID StatusDate Status Order
2145 2012-04-29 n/a 1
2145 2012-08-02 Backup 2
2145 2012-09-27 Backup 3
2145 2012-11-07 Backup 4
2145 2012-11-09 Active 5
2145 2012-11-12 Backup 6
2145 2012-12-13 Pending 7
2145 2012-12-18 Sold 8
2146 2012-10-15 Pending 1
2146 2012-10-15 n/a 2
2146 2012-12-19 Sold 3
4145 2012-04-24 Active 1
4145 2012-04-24 Active 2
4145 2012-05-22 Pending 3
4145 2012-09-13 Active 4
4145 2012-09-13 Active 5
4145 2012-12-05 Pending 6
4145 2012-12-19 Sold 7
7175 2012-11-08 n/a 1
7175 2012-12-01 Backup 2
7175 2012-12-05 Active 3
7175 2012-12-06 Pending 4
7175 2012-12-19 Sold 5
Result :
Analysis 09/20/2012 12/19/2012 3/20/2013
Total Active 2 0 0
The headers for the result should be: the date 6 months before the current date, the date 3 months before the current date, and the current date.
Here is the query I am struggling with :
;WITH x AS
(
    SELECT ID, statusdate, status,
           row_number() over (partition by ID order by statusdate DESC) as RN1
    FROM
    (
        SELECT ID, statusdate, status,
               rn = row_number() over (partition by ID order by statusdate)
        FROM tblHistory (nolock)
        WHERE [statusdate] <= '20120920' AND
              ID in ('2145','2146','4145','7175')
    ) AS A
)
SELECT ID, statusdate, status
FROM x
WHERE rn1 = 1 AND status IN ('Backup','Active')

tsql grouping consecutive numbers in range

Is there any way to group these temperature measurements into ranges, keeping consecutive readings in the same range together as one group?
For each group I want the range, the time span, and the count, using the ranges 0-7, 8-12, and more than 12.
Date Heat
01/01/2012 12:00 8
01/01/2012 12:03 9
01/01/2012 12:06 5
01/01/2012 12:09 3
01/01/2012 12:12 6
01/01/2012 12:15 7
01/01/2012 12:18 1
01/01/2012 12:21 12
01/01/2012 12:24 28
01/01/2012 12:27 25
01/01/2012 12:30 20
01/01/2012 12:33 20
01/01/2012 12:36 20
01/01/2012 12:39 12
01/01/2012 12:42 6
01/01/2012 12:45 3
01/01/2012 12:48 5
01/01/2012 12:51 7
01/01/2012 12:54 11
01/01/2012 12:57 12
01/01/2012 13:00 6
The result should be:
0-7 (01/01/2012 12:06-01/01/2012 12:18) 5
/* Rows of dataset:
01/01/2012 12:06 5
01/01/2012 12:09 3
01/01/2012 12:12 6
01/01/2012 12:15 7
01/01/2012 12:18 1
*/
0-7 (01/01/2012 12:42-01/01/2012 12:51) 4
/* Rows of dataset:
01/01/2012 12:42 6
01/01/2012 12:45 3
01/01/2012 12:48 5
01/01/2012 12:51 7
*/
8-12 (01/01/2012 12:00-01/01/2012 12:03) 2
/* Rows of dataset:
01/01/2012 12:00 8
01/01/2012 12:03 9
*/
more than 12 (01/01/2012 12:24-01/01/2012 12:36) 5
/* Rows of dataset:
01/01/2012 12:24 28
01/01/2012 12:27 25
01/01/2012 12:30 20
01/01/2012 12:33 20
01/01/2012 12:36 20
*/
8-12 (01/01/2012 12:21) 1
/* Rows of dataset:
01/01/2012 12:21 12 */
Note: because the processing order for RANK/DENSE_RANK is PARTITION BY and then ORDER BY, these functions are not useful in this case. Maybe, at some point in time, MS will introduce a supplementary syntax thus:
[DENSE_]RANK() OVER(ORDER BY fields PARTITION BY fields) so ORDER BY will be processed first and then PARTITION BY.
1) First solution (SQL2005+)
CREATE TABLE #TestData
(
    Dt SMALLDATETIME PRIMARY KEY,
    Heat TINYINT NOT NULL
);
INSERT #TestData(Dt, Heat)
SELECT '2012-01-01T12:00:00', 8 UNION ALL SELECT '2012-01-01T12:03:00', 9 UNION ALL SELECT '2012-01-01T12:06:00', 5
UNION ALL SELECT '2012-01-01T12:09:00', 3 UNION ALL SELECT '2012-01-01T12:12:00', 6 UNION ALL SELECT '2012-01-01T12:15:00', 7
UNION ALL SELECT '2012-01-01T12:18:00', 1 UNION ALL SELECT '2012-01-01T12:21:00', 12 UNION ALL SELECT '2012-01-01T12:24:00', 28
UNION ALL SELECT '2012-01-01T12:27:00', 25 UNION ALL SELECT '2012-01-01T12:30:00', 20 UNION ALL SELECT '2012-01-01T12:33:00', 20
UNION ALL SELECT '2012-01-01T12:36:00', 20 UNION ALL SELECT '2012-01-01T12:39:00', 12 UNION ALL SELECT '2012-01-01T12:42:00', 6
UNION ALL SELECT '2012-01-01T12:45:00', 3 UNION ALL SELECT '2012-01-01T12:48:00', 5 UNION ALL SELECT '2012-01-01T12:51:00', 7
UNION ALL SELECT '2012-01-01T12:54:00', 11 UNION ALL SELECT '2012-01-01T12:57:00', 12 UNION ALL SELECT '2012-01-01T13:00:00', 6;
SET STATISTICS IO ON;
WITH CteSource
AS
(
SELECT a.*,
CASE
WHEN a.Heat >= 0 AND a.Heat <= 7 THEN 1
WHEN a.Heat >= 8 AND a.Heat <= 12 THEN 2
WHEN a.Heat > 12 THEN 3
END AS Grp,
ROW_NUMBER() OVER(ORDER BY a.Dt) AS RowNum
FROM #TestData a
), CteRecursive
AS
(
SELECT s.RowNum,
s.Dt,
s.Heat,
s.Grp,
1 AS DENSE_RANK_OVER_ORDERBY_PARTITIONBY
FROM CteSource s
WHERE s.RowNum = 1
UNION ALL
SELECT crt.RowNum,
crt.Dt,
crt.Heat,
crt.Grp,
CASE
WHEN crt.Grp = prev.Grp THEN prev.DENSE_RANK_OVER_ORDERBY_PARTITIONBY
ELSE prev.DENSE_RANK_OVER_ORDERBY_PARTITIONBY + 1
END
FROM CteSource crt
INNER JOIN CteRecursive prev ON crt.RowNum = prev.RowNum + 1
)
SELECT r.DENSE_RANK_OVER_ORDERBY_PARTITIONBY,
MAX(r.Grp) AS Grp,
COUNT(*) AS Cnt,
MIN(r.Dt) AS MinDt,
MAX(r.Dt) AS MaxDt
FROM CteRecursive r
GROUP BY r.DENSE_RANK_OVER_ORDERBY_PARTITIONBY;
Results:
DENSE_RANK_OVER_ORDERBY_PARTITIONBY Grp Cnt MinDt MaxDt
----------------------------------- ----------- ----------- ----------------------- -----------------------
1 2 2 2012-01-01 12:00:00 2012-01-01 12:03:00
2 1 5 2012-01-01 12:06:00 2012-01-01 12:18:00
3 2 1 2012-01-01 12:21:00 2012-01-01 12:21:00
4 3 5 2012-01-01 12:24:00 2012-01-01 12:36:00
5 2 1 2012-01-01 12:39:00 2012-01-01 12:39:00
6 1 4 2012-01-01 12:42:00 2012-01-01 12:51:00
7 2 2 2012-01-01 12:54:00 2012-01-01 12:57:00
8 1 1 2012-01-01 13:00:00 2012-01-01 13:00:00
2) Second solution (SQL2012; better performance)
SELECT d.DENSE_RANK_OVER_ORDERBY_PARTITIONBY,
MAX(d.Grp) AS Grp,
MIN(d.Dt) AS MinDt,
MAX(d.Dt) AS MaxDt
FROM
(
SELECT c.*,
1+SUM(c.IsNewGroup) OVER(ORDER BY c.Dt) AS DENSE_RANK_OVER_ORDERBY_PARTITIONBY
FROM
(
SELECT b.*,
CASE
WHEN LAG(b.Grp) OVER(ORDER BY b.Dt) <> b.Grp THEN 1
ELSE 0
END
AS IsNewGroup
FROM
(
SELECT a.*,
CASE
WHEN a.Heat >= 0 AND a.Heat <= 7 THEN 1
WHEN a.Heat >= 8 AND a.Heat <= 12 THEN 2
WHEN a.Heat > 12 THEN 3
END AS Grp
FROM #TestData a
) b
) c
) d
GROUP BY d.DENSE_RANK_OVER_ORDERBY_PARTITIONBY;
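The second solution boils down to three steps: classify each reading, flag the rows where the class differs from the previous row, and take a running sum of those flags to number the groups. A minimal Python sketch of those steps, using the Heat values from the question in time order (the helper name `bucket` is an assumption for illustration):

```python
from itertools import accumulate, groupby


def bucket(heat):
    """1 for 0-7, 2 for 8-12, 3 for more than 12 (same CASE as the query)."""
    if heat <= 7:
        return 1
    if heat <= 12:
        return 2
    return 3


heats = [8, 9, 5, 3, 6, 7, 1, 12, 28, 25, 20, 20, 20, 12, 6, 3, 5, 7, 11, 12, 6]

grps = [bucket(h) for h in heats]
# LAG comparison: 1 where the bucket changes; the first row has no predecessor.
is_new = [0] + [1 if cur != prev else 0 for prev, cur in zip(grps, grps[1:])]
# Running sum of the flags, plus 1, numbers the consecutive groups.
island_id = [1 + s for s in accumulate(is_new)]

summary = []
for island, members in groupby(zip(island_id, grps), key=lambda pair: pair[0]):
    members = list(members)
    summary.append((island, members[0][1], len(members)))  # (island, Grp, Cnt)

for row in summary:
    print(row)
```

The (island, Grp, Cnt) triples line up with the first three columns of the results table shown for the first solution.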
Here's an alternative solution for SQL Server 2005 or newer version:
WITH auxiliary (HeatID, MinHeat, MaxHeat, HeatDescr) AS (
SELECT 1, 0 , 7 , '0-7' UNION ALL
SELECT 2, 8 , 12 , '8-12' UNION ALL
SELECT 3, 13, NULL, 'more than 12'
),
datagrouped AS (
SELECT
d.*,
a.HeatDescr,
grp = ROW_NUMBER() OVER ( ORDER BY d.Date)
- ROW_NUMBER() OVER (PARTITION BY a.HeatID ORDER BY d.Date)
FROM data d
INNER JOIN auxiliary a
ON d.Heat BETWEEN a.MinHeat AND ISNULL(a.MaxHeat, 0x7fffffff)
)
SELECT
HeatDescr,
DateFrom = MIN(Date),
DateTo = MAX(Date),
ItemCount = COUNT(*)
FROM datagrouped
GROUP BY
HeatDescr, grp
ORDER BY
MIN(Date)
Where data is defined as follows:
CREATE TABLE data (Date datetime, Heat int);
INSERT INTO data (Date, Heat)
SELECT '01/01/2012 12:00', 8 UNION ALL
SELECT '01/01/2012 12:03', 9 UNION ALL
SELECT '01/01/2012 12:06', 5 UNION ALL
SELECT '01/01/2012 12:09', 3 UNION ALL
SELECT '01/01/2012 12:12', 6 UNION ALL
SELECT '01/01/2012 12:15', 7 UNION ALL
SELECT '01/01/2012 12:18', 1 UNION ALL
SELECT '01/01/2012 12:21', 12 UNION ALL
SELECT '01/01/2012 12:24', 28 UNION ALL
SELECT '01/01/2012 12:27', 25 UNION ALL
SELECT '01/01/2012 12:30', 20 UNION ALL
SELECT '01/01/2012 12:33', 20 UNION ALL
SELECT '01/01/2012 12:36', 20 UNION ALL
SELECT '01/01/2012 12:39', 12 UNION ALL
SELECT '01/01/2012 12:42', 6 UNION ALL
SELECT '01/01/2012 12:45', 3 UNION ALL
SELECT '01/01/2012 12:48', 5 UNION ALL
SELECT '01/01/2012 12:51', 7 UNION ALL
SELECT '01/01/2012 12:54', 11 UNION ALL
SELECT '01/01/2012 12:57', 12 UNION ALL
SELECT '01/01/2012 13:00', 6;
For the above sample, the query gives the following output:
HeatDescr DateFrom DateTo ItemCount
------------ ------------------- ------------------- ---------
8-12 2012-01-01 12:00:00 2012-01-01 12:03:00 2
0-7 2012-01-01 12:06:00 2012-01-01 12:18:00 5
8-12 2012-01-01 12:21:00 2012-01-01 12:21:00 1
more than 12 2012-01-01 12:24:00 2012-01-01 12:36:00 5
8-12 2012-01-01 12:39:00 2012-01-01 12:39:00 1
0-7 2012-01-01 12:42:00 2012-01-01 12:51:00 4
8-12 2012-01-01 12:54:00 2012-01-01 12:57:00 2
0-7 2012-01-01 13:00:00 2012-01-01 13:00:00 1
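The grp expression in this alternative relies on a property worth spelling out: within a run of consecutive rows in the same range, the overall row number and the per-range row number both increase by 1, so their difference stays constant, and it grows whenever rows from another range sit in between. A short Python sketch of that idea follows; the helper name `label` is an assumption, and only the Heat values (in time order) are used.

```python
from collections import Counter


def label(heat):
    """Range description, matching the auxiliary CTE."""
    if heat <= 7:
        return "0-7"
    if heat <= 12:
        return "8-12"
    return "more than 12"


heats = [8, 9, 5, 3, 6, 7, 1, 12, 28, 25, 20, 20, 20, 12, 6, 3, 5, 7, 11, 12, 6]

per_label_rn = Counter()  # ROW_NUMBER() OVER (PARTITION BY range ORDER BY date)
counts = {}               # (label, grp) -> item count, in first-seen order
for overall_rn, heat in enumerate(heats, start=1):
    name = label(heat)
    per_label_rn[name] += 1
    grp = overall_rn - per_label_rn[name]  # constant within one consecutive run
    counts[(name, grp)] = counts.get((name, grp), 0) + 1

islands = [(name, count) for (name, _), count in counts.items()]
for row in islands:
    print(row)
```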
You should be able to reach your goal using RANK():
http://msdn.microsoft.com/en-us/library/ms176102.aspx
Something like
SELECT date, heat, RANK() OVER (PARTITION BY heat ORDER BY date DESC) AS Rank
FROM tbl
You can then GROUP the result, or add more subselects and UNION them, depending on the output you need.
