Window function to count occurrences in last 10 minutes - sql-server

I can use a traditional subquery approach to count the occurrences in the last ten minutes. For example, this:
drop table if exists [dbo].[readings]
go
create table [dbo].[readings](
[server] [int] NOT NULL,
[sampled] [datetime] NOT NULL
)
go
insert into readings
values
(1,'20170101 08:00'),
(1,'20170101 08:02'),
(1,'20170101 08:05'),
(1,'20170101 08:30'),
(1,'20170101 08:31'),
(1,'20170101 08:37'),
(1,'20170101 08:40'),
(1,'20170101 08:41'),
(1,'20170101 09:07'),
(1,'20170101 09:08'),
(1,'20170101 09:09'),
(1,'20170101 09:11')
go
-- Count in the last 10 minutes - example periods 08:31 to 08:40, 09:12 to 09:21
select server, sampled,
       (select count(*)
        from readings r2
        where r2.server = r1.server
          and r2.sampled <= r1.sampled
          and r2.sampled > dateadd(minute, -10, r1.sampled)) as countinlast10minutes
from readings r1
order by server, sampled
go
How can I use a window function to obtain the same result? I've tried this:
select server,sampled,
count(case when sampled <= r1.sampled and sampled > dateadd(minute,-10,r1.sampled) then 1 else null end) over (partition by server order by sampled rows between unbounded preceding and current row) as countinlast10minutes
-- count(case when currentrow.sampled <= r1.sampled and currentrow.sampled > dateadd(minute,-10,r1.sampled) then 1 else null end) over (partition by server order by sampled rows between unbounded preceding and current row) as countinlast10minutes
from readings r1
order by server,sampled
But the result is just the running count. Is there any system variable that refers to the current row pointer, something like currentrow.sampled?

This isn't a very pleasing answer, but one possibility is to first create a helper table with all the minutes:
CREATE TABLE #DateTimes(datetime datetime primary key);
WITH E1(N) AS
(
SELECT 1 FROM (VALUES(1),(1),(1),(1),(1),
(1),(1),(1),(1),(1)) V(N)
) -- 1*10^1 or 10 rows
, E2(N) AS (SELECT 1 FROM E1 a, E1 b) -- 1*10^2 or 100 rows
, E4(N) AS (SELECT 1 FROM E2 a, E2 b) -- 1*10^4 or 10,000 rows
, E8(N) AS (SELECT 1 FROM E4 a, E4 b) -- 1*10^8 or 100,000,000 rows
,R(StartRange, EndRange)
AS (SELECT MIN(sampled),
MAX(sampled)
FROM readings)
,N(N)
AS (SELECT ROW_NUMBER()
OVER (
ORDER BY (SELECT NULL)) AS N
FROM E8)
INSERT INTO #DateTimes
SELECT TOP (SELECT 1 + DATEDIFF(MINUTE, StartRange, EndRange) FROM R) DATEADD(MINUTE, N.N - 1, StartRange)
FROM N,
R;
With that in place, you could then use ROWS BETWEEN 9 PRECEDING AND CURRENT ROW:
WITH T1 AS
( SELECT Server,
MIN(sampled) AS StartRange,
MAX(sampled) AS EndRange
FROM readings
GROUP BY Server )
SELECT Server,
sampled,
Cnt
FROM T1
CROSS APPLY
( SELECT r.sampled,
COUNT(r.sampled) OVER (ORDER BY N.datetime ROWS BETWEEN 9 PRECEDING AND CURRENT ROW) AS Cnt
FROM #DateTimes N
LEFT JOIN readings r
ON r.sampled = N.datetime
AND r.server = T1.server
WHERE N.datetime BETWEEN StartRange AND EndRange ) CA
WHERE CA.sampled IS NOT NULL
ORDER BY sampled
The above assumes that there is at most one sample per minute and that all the times are exact minutes. If this isn't true, another table expression would be needed to pre-aggregate by datetimes rounded to the minute.
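A minimal sketch of that pre-aggregation, assuming SQL Server 2016 or later (for DROP TABLE IF EXISTS) and a hypothetical #readings_sec table with made-up rows that carry seconds and include two readings in the same minute. Instead of the #DateTimes helper it generates the per-server minutes with a recursive CTE, which is fine for short ranges. Because readings are bucketed to whole minutes, the result only approximates the original subquery when timestamps are not exact minutes.

```sql
-- Hypothetical table and data (not from the question): seconds, two readings in 08:00.
DROP TABLE IF EXISTS #readings_sec;
CREATE TABLE #readings_sec ([server] int NOT NULL, sampled datetime NOT NULL);
INSERT INTO #readings_sec ([server], sampled) VALUES
    (1, '20170101 08:00:10'),
    (1, '20170101 08:00:40'),
    (1, '20170101 08:05:00'),
    (1, '20170101 08:12:30');

DROP TABLE IF EXISTS #per_minute;

WITH Counts AS (
    -- pre-aggregate: one row per server and minute (seconds truncated)
    SELECT [server],
           DATEADD(MINUTE, DATEDIFF(MINUTE, 0, sampled), 0) AS minute_start,
           COUNT(*) AS cnt
    FROM #readings_sec
    GROUP BY [server], DATEADD(MINUTE, DATEDIFF(MINUTE, 0, sampled), 0)
), Bounds AS (
    SELECT [server], MIN(minute_start) AS first_minute, MAX(minute_start) AS last_minute
    FROM Counts
    GROUP BY [server]
), Minutes AS (
    -- every minute between the first and last reading of each server
    SELECT [server], first_minute AS minute_start, last_minute
    FROM Bounds
    UNION ALL
    SELECT [server], DATEADD(MINUTE, 1, minute_start), last_minute
    FROM Minutes
    WHERE minute_start < last_minute
)
SELECT m.[server],
       m.minute_start,
       ISNULL(c.cnt, 0) AS cnt_this_minute,
       -- current minute plus the 9 before it = a 10-minute window
       SUM(ISNULL(c.cnt, 0)) OVER (PARTITION BY m.[server]
                                   ORDER BY m.minute_start
                                   ROWS BETWEEN 9 PRECEDING AND CURRENT ROW) AS cnt_last_10_minutes
INTO #per_minute
FROM Minutes AS m
LEFT JOIN Counts AS c
       ON c.[server] = m.[server]
      AND c.minute_start = m.minute_start
OPTION (MAXRECURSION 0);

SELECT * FROM #per_minute ORDER BY [server], minute_start;
```

Each row of #per_minute is one minute per server; joining it back to the readings on the truncated minute would give one count per reading.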

As far as I know, there is not a simple exact replacement for your subquery using window functions.
Window functions operate on a set of rows and allow you to work with them based on partitions and order.
What you are trying to do isn't the type of partitioning that we can work with in window functions.
Generating the partitions we would need in order to use window functions here would just result in overly complicated code.
I would suggest cross apply() as an alternative to your subquery.
I am not sure if you meant to restrict your results to within 9 minutes, but with sampled > dateadd(...) that is what is happening in your original subquery.
Here is what a window function could look like based on partitioning your samples into 10 minute windows, along with a cross apply() version.
select
r.server
, r.sampled
, CrossApply = x.CountRecent
, OriginalSubquery = (
select count(*)
from readings s
where s.server=r.server
and s.sampled <= r.sampled
/* doesn't include 10 minutes ago */
and s.sampled > dateadd(minute,-10,r.sampled)
)
, Slices = count(*) over(
/* partition by server, 10 minute slices, not the same thing*/
partition by server, dateadd(minute,datediff(minute,0,sampled)/10*10,0)
order by sampled
)
from readings r
cross apply (
select CountRecent=count(*)
from readings i
where i.server=r.server
/* changed to >= */
and i.sampled >= dateadd(minute,-10,r.sampled)
and i.sampled <= r.sampled
) as x
order by server,sampled
results: http://rextester.com/BMMF46402
+--------+---------------------+------------+------------------+--------+
| server | sampled | CrossApply | OriginalSubquery | Slices |
+--------+---------------------+------------+------------------+--------+
| 1 | 01.01.2017 08:00:00 | 1 | 1 | 1 |
| 1 | 01.01.2017 08:02:00 | 2 | 2 | 2 |
| 1 | 01.01.2017 08:05:00 | 3 | 3 | 3 |
| 1 | 01.01.2017 08:30:00 | 1 | 1 | 1 |
| 1 | 01.01.2017 08:31:00 | 2 | 2 | 2 |
| 1 | 01.01.2017 08:37:00 | 3 | 3 | 3 |
| 1 | 01.01.2017 08:40:00 | 4 | 3 | 1 |
| 1 | 01.01.2017 08:41:00 | 4 | 3 | 2 |
| 1 | 01.01.2017 09:07:00 | 1 | 1 | 1 |
| 1 | 01.01.2017 09:08:00 | 2 | 2 | 2 |
| 1 | 01.01.2017 09:09:00 | 3 | 3 | 3 |
| 1 | 01.01.2017 09:11:00 | 4 | 4 | 1 |
+--------+---------------------+------------+------------------+--------+

Thanks, Martin and SqlZim, for your answers. I'm going to raise a Connect enhancement request for something like %%currentrow that can be used in window aggregates. I think this would lead to much simpler and more natural SQL:
select count(case when sampled <= %%currentrow.sampled and sampled > dateadd(minute,-10,%%currentrow.sampled) then 1 else null end) over (...whatever the window is...)
We can already use expressions like this:
select count(case when sampled <= getdate() and sampled > dateadd(minute,-10,getdate()) then 1 else null end) over (...whatever the window is...)
so I think it would be great if we could reference a column in the current row.

Related

Partition based on specified value

I am trying to write a query which partitions based on the value 90. Below is my table:
create table #temp(StudentID char(2), Status int)
insert #temp values('S1',75 )
insert #temp values('S1',85 )
insert #temp values('S1',90)
insert #temp values('S1',85)
insert #temp values('S1',83)
insert #temp values('S1',90 )
insert #temp values('S1',85)
insert #temp values('S1',90)
insert #temp values('S1',93 )
insert #temp values('S1',93 )
insert #temp values('S1',93 )
Required output:
ID Status Result
S1 75 0
S1 85 0
S1 90 0
S1 85 1
S1 83 1
S1 90 1
S1 85 2
S1 90 2
S1 93 3
S1 93 3
S1 93 3
Does anyone have a solution to partition based on status 90? Result should be 1, 2, 3, etc., incrementing each time the value 90 occurs.
Assuming that the actual question is "How can I find ranges/islands of incrementing values", the answer could use LAG to compare the current Status value with the previous one based on some order. If the previous value is 90, you have a new island:
declare @temp table (ID int identity PRIMARY KEY, StudentID char(2), Status int)
insert into @temp (StudentID,Status)
values
('S1',75),
('S1',85),
('S1',90),
('S1',85),
('S1',83),
('S1',90),
('S1',85),
('S1',90),
('S1',93),
('S1',93),
('S1',93);
select
* ,
case LAG(Status,1,0) OVER (PARTITION BY StudentID ORDER BY ID)
when 90 then 1 else 0 end as NewIsland
from @temp
This returns:
+----+-----------+--------+-----------+
| ID | StudentID | Status | NewIsland |
+----+-----------+--------+-----------+
| 1 | S1 | 75 | 0 |
| 2 | S1 | 85 | 0 |
| 3 | S1 | 90 | 0 |
| 4 | S1 | 85 | 1 |
| 5 | S1 | 83 | 0 |
| 6 | S1 | 90 | 0 |
| 7 | S1 | 85 | 1 |
| 8 | S1 | 90 | 0 |
| 9 | S1 | 93 | 1 |
| 10 | S1 | 93 | 0 |
| 11 | S1 | 93 | 0 |
+----+-----------+--------+-----------+
You can create an island ID from this by summing all NewIsland values up to and including the current one, using SUM with the ROWS clause of OVER:
with islands as
(
select
* ,
case LAG(Status,1,0) OVER (PARTITION BY StudentID ORDER BY ID)
when 90 then 1 else 0 end as NewIsland
from @temp
)
select * ,
SUM(NewIsland) OVER (PARTITION BY StudentID ORDER BY ID ROWS UNBOUNDED PRECEDING) AS Result
from islands
This produces:
+----+-----------+--------+-----------+--------+
| ID | StudentID | Status | NewIsland | Result |
+----+-----------+--------+-----------+--------+
| 1 | S1 | 75 | 0 | 0 |
| 2 | S1 | 85 | 0 | 0 |
| 3 | S1 | 90 | 0 | 0 |
| 4 | S1 | 85 | 1 | 1 |
| 5 | S1 | 83 | 0 | 1 |
| 6 | S1 | 90 | 0 | 1 |
| 7 | S1 | 85 | 1 | 2 |
| 8 | S1 | 90 | 0 | 2 |
| 9 | S1 | 93 | 1 | 3 |
| 10 | S1 | 93 | 0 | 3 |
| 11 | S1 | 93 | 0 | 3 |
+----+-----------+--------+-----------+--------+
BTW this is a case of the wider Gaps & Islands problem in SQL.
UPDATE
LAG and OVER are available in all supported SQL Server versions, i.e. SQL Server 2012 and later. OVER is also available in SQL Server 2008, but LAG is not. In those versions, different, slower techniques were used to calculate islands; see The SQL of Gaps and Islands in Sequences.
In most cases, ROW_NUMBER() is used to calculate the row ordering, which results in one extra CTE. This can be avoided if the desired ordering is the same as the ID, or any other unique incrementing column. The following query returns the same results as the query that uses LAG:
select
* ,
case when exists (select ID
from @temp t1
where t1.StudentID=t2.StudentID
and t1.ID=t2.ID-1
and t1.Status=90) then 1
else 0 end
as NewIsland
from @temp t2
This query returns 1 if there's a row with the same StudentID, Status 90, and an ID (or ROW_NUMBER) one less than the current row, i.e. the same as LAG(Status, 1).
After that, we just need to SUM the previous values. While SUM OVER was available in 2008, it only supported PARTITION BY, so we need to use another subquery:
;with islands as
(
select
* ,
case when exists (select ID from @temp t1 where t1.StudentID=t2.StudentID and t1.ID=t2.ID-1 and t1.Status=90) then 1
else 0 end
as NewIsland
from @temp t2
)
select * ,
(select ISNULL(SUM(NewIsland),0)
from islands i1
where i1.StudentID=i2.StudentID
and i1.ID<=i2.ID) AS Result
from islands i2
This sums all NewIsland values for rows of the same StudentID with an ID less than or equal to the current one, matching ROWS UNBOUNDED PRECEDING in the LAG version.
Performance
All those subqueries result in a lot of repeated scans. Surprisingly, though, the older query is faster than the query with LAG, because the LAG query has to sort intermediate results multiple times and filter by Status; the execution plan costs are 45% vs 55%.
Things change dramatically when an index is added:
declare @temp table ( ID int identity PRIMARY KEY, StudentID char(2), Status int,
INDEX IX_TMP(StudentID,ID,Status))
The multiple sorts disappear and the costs become 80% (subquery) vs 20% (LAG). The LAG query just scans the index values once without sorting the intermediate results.
The subquery version wasn't able to take advantage of the index.
UPDATE 2
uzi suggested that removing LAG and summing only up to the previous row would be better:
select * ,
ISNULL(SUM(case when status =90 then 1 else 0 end)
OVER (PARTITION BY StudentID
ORDER BY ID ROWS
BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) AS Result
from @temp;
Semantically, this is the same thing - for each row find all previous ones, calculate 1 for the 90s and 0 for the other rows, and sum them.
The server generates similar execution plans in both cases. The LAG version used two Stream Aggregate operators, while the version without it used one. The end result for this limited data set was essentially the same, though.
For a larger data set, the results may differ, e.g. if the server has to spool data to tempdb because it doesn't fit in memory.
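A small self-contained check that the two formulations agree, assuming SQL Server 2012 or later and using a hypothetical #islands_demo table loaded with the question's data. On the first row the frame BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING is empty, so SUM returns NULL; the sketch wraps it in ISNULL to get the expected 0.

```sql
-- Hypothetical table name; data copied from the question.
DROP TABLE IF EXISTS #islands_demo;
CREATE TABLE #islands_demo (ID int IDENTITY PRIMARY KEY, StudentID char(2), Status int);
INSERT INTO #islands_demo (StudentID, Status)
VALUES ('S1',75),('S1',85),('S1',90),('S1',85),('S1',83),('S1',90),
       ('S1',85),('S1',90),('S1',93),('S1',93),('S1',93);

DROP TABLE IF EXISTS #compare;

WITH islands AS (
    -- 1 when the previous row (by ID) had Status 90
    SELECT *,
           CASE LAG(Status, 1, 0) OVER (PARTITION BY StudentID ORDER BY ID)
                WHEN 90 THEN 1 ELSE 0 END AS NewIsland
    FROM #islands_demo
)
SELECT ID,
       Status,
       -- version 1: running total of NewIsland, current row included
       SUM(NewIsland) OVER (PARTITION BY StudentID ORDER BY ID
                            ROWS UNBOUNDED PRECEDING) AS ResultLag,
       -- version 2: number of 90s strictly before the current row
       ISNULL(SUM(CASE WHEN Status = 90 THEN 1 ELSE 0 END)
                  OVER (PARTITION BY StudentID ORDER BY ID
                        ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) AS ResultNoLag
INTO #compare
FROM islands;

SELECT * FROM #compare ORDER BY ID;
```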
Perhaps this is not a very good solution, but it works.
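SELECT StudentID AS ID
, Status
, CASE
WHEN Status = 90
THEN SUM(q) OVER(ORDER BY rn) - 1
ELSE SUM(q) OVER(ORDER BY rn)
END AS Result
FROM (
-- #temp has no ordering column, so ORDER BY StudentID does not guarantee
-- the original row order; use a real ordering column if you have one
SELECT ROW_NUMBER() OVER(ORDER BY StudentID DESC) AS rn
, *
, CASE
WHEN Status = 90
THEN 1
ELSE 0
END AS q
FROM #temp
) a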
You could simply use a subquery:
select *,
coalesce((select sum(case when Status = 90 then 1 else 0 end)
from YourTable
where studentid = t.studentid and
id < t.id), 0) as Result
from YourTable t;
Here YourTable and id are placeholders: id stands for whatever column(s) define your actual data ordering.

Counts for specific properties

I have the following query which I know is incorrect syntax
SELECT Vender as Carrier,
count(IsPup WHERE IsPup = 1) as PU,
count(IsFull WHERE IsFull = 1) as FU,
count(*) as NUM, count(IsPup)/2 + Count(IsFull) as FTE
FROM Trailers WHERE Completed = 0 group by Vender order by NUM;
In particular, count(IsPup WHERE IsPup = 1) is wrong. I've searched various phrases like "How to count multiple properties of rows in SQL",
and tried other manipulations of the same query, like count(IsPup) as PU, count(IsFull) as FU.
I had the syntactically correct query:
SELECT Vender as Carrier,
count(IsPup) as PU,
count(IsFull) as FU,
count(*) as NUM,
count(IsPup)/2 + Count(IsFull) as FTE
FROM Trailers WHERE Completed = 0 group by Vender order by NUM
It runs, but PU, FU, and NUM always have the same value...
I'm trying to get a table like the one below:
+---------+----+----+-----+-----+
| Carrier | PU | FU | NUM | FTE |
+---------+----+----+-----+-----+
| Vender1 |  2 |  1 |   3 |   2 |
| Vender2 |  0 |  4 |   4 |   4 |
| TOTAL   |  2 |  5 |   7 |   6 |
+---------+----+----+-----+-----+
The Trailers table has IsPup and IsFull as BIT columns, so they are true or false (1 or 0).
I thought this query would be simple, and I feel like I am missing something obvious.
How do I get the counts of each separate property and the total count?
The question marked as a duplicate doesn't produce this format, with the total at the bottom.
SELECT
CASE WHEN GROUPING(VENDER) = 1 THEN 'TOTAL' ELSE VENDER END AS Carrier
, SUM(CAST(IsPUP AS INT)) AS PU
, SUM(CAST(IsFull AS INT)) AS FU
, COUNT(*) AS NUM
, SUM(CAST(IsPUP AS INT)) * .5 + SUM(CAST(IsFull AS INT)) AS FTE
FROM Trailers
WHERE COMPLETED = 0
GROUP BY VENDER
WITH ROLLUP
ORDER BY GROUPING(VENDER), NUM
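Why the original attempt returned the same value for PU, FU and NUM: COUNT(column) counts every non-NULL value, and a BIT 0 is not NULL. A minimal sketch with a hypothetical #trailers_demo table and made-up rows, comparing COUNT with SUM and with a conditional COUNT(CASE ...):

```sql
-- Hypothetical table and rows, shaped like the question's Trailers table.
DROP TABLE IF EXISTS #trailers_demo;
CREATE TABLE #trailers_demo (Vender varchar(20), IsPup bit, IsFull bit, Completed bit);
INSERT INTO #trailers_demo VALUES
    ('Vender1', 1, 0, 0),
    ('Vender1', 1, 0, 0),
    ('Vender1', 0, 1, 0),
    ('Vender2', 0, 1, 0);

DROP TABLE IF EXISTS #counts_demo;
SELECT Vender,
       COUNT(IsPup)                          AS CountAll,   -- counts the 0s too
       SUM(CAST(IsPup AS int))               AS SumOnes,    -- counts only the 1s
       COUNT(CASE WHEN IsPup = 1 THEN 1 END) AS CountOnes   -- same, via CASE
INTO #counts_demo
FROM #trailers_demo
WHERE Completed = 0
GROUP BY Vender;

SELECT * FROM #counts_demo ORDER BY Vender;
```

The same reasoning applies to IsFull. The CAST to int is needed because SUM does not accept a bit argument.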

Calculating sum of differences from each group

I have the following table:
Sensor | building | Date_time | Current_value
1 | 1 | 20.08.2017 | 20
1 | 1 | 21.08.2017 | 25
1 | 1 | 22.08.2017 | 35
2 | 1 | 20.08.2017 | 120
2 | 1 | 21.08.2017 | 200
2 | 1 | 22.08.2017 | 210
3 | 2 | 20.08.2017 | 20
3 | 2 | 21.08.2017 | 25
3 | 2 | 22.08.2017 | 85
5 | 2 | 20.08.2017 | 320
5 | 2 | 21.08.2017 | 400
5 | 2 | 22.08.2017 | 410
The sensor ID is assumed to be unique, as is the building ID.
I need to calculate the total value for each building for any given timeframe by subtracting the MIN value from the MAX value for each sensor, then group the sum by each building.
In the above sample, it would be:
Sensor 1: (35 - 20)=15
Sensor 2: (210-120)=90
Building 1 = 15+90 = 105
(...)
Building 2 = 65+90 = 155
Any pointers in the right direction are greatly appreciated!
You are asking how to calculate the difference between min and max values per sensor, then aggregate the differences per building.
with diffs as (
SELECT Building,Sensor, MAX(Current_Value)-MIN(Current_Value) as diff
FROM SomeTable
GROUP BY Building, Sensor
)
SELECT Building,sum(diff)
FROM diffs
GROUP BY Building
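A quick check of the CTE query against the question's sample data, using a hypothetical #sensor_readings temp table in place of SomeTable. The expected totals are the ones worked out in the question: 105 for building 1 and 155 for building 2.

```sql
-- Hypothetical temp table standing in for SomeTable; rows from the question.
DROP TABLE IF EXISTS #sensor_readings;
CREATE TABLE #sensor_readings (Sensor int, Building int, Date_time date, Current_value int);
INSERT INTO #sensor_readings VALUES
    (1, 1, '20170820',  20), (1, 1, '20170821',  25), (1, 1, '20170822',  35),
    (2, 1, '20170820', 120), (2, 1, '20170821', 200), (2, 1, '20170822', 210),
    (3, 2, '20170820',  20), (3, 2, '20170821',  25), (3, 2, '20170822',  85),
    (5, 2, '20170820', 320), (5, 2, '20170821', 400), (5, 2, '20170822', 410);

DROP TABLE IF EXISTS #building_totals;

WITH diffs AS (
    -- per sensor: highest minus lowest reading
    SELECT Building, Sensor, MAX(Current_value) - MIN(Current_value) AS diff
    FROM #sensor_readings
    GROUP BY Building, Sensor
)
SELECT Building, SUM(diff) AS Total
INTO #building_totals
FROM diffs
GROUP BY Building;

SELECT * FROM #building_totals ORDER BY Building;
```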
If you want to restrict the time period, you'll have to do so inside the CTE:
with diffs as (
SELECT Building,Sensor, MAX(Current_Value)-MIN(Current_Value) as diff
FROM SomeTable
WHERE Date_Time between @start and @end
GROUP BY Building, Sensor
)
SELECT Building,sum(diff)
FROM diffs
GROUP BY Building
You can convert this query into a user-defined function that can be used in other queries:
create function fn_TotalDiffs(@start datetime2(0), @end datetime2(0))
returns table
as
Return (
with diffs as (
select Building,Sensor, MAX(Current_Value)-MIN(Current_Value) as diff
from SomeTable
where Date_Time between @start and @end
Group by Building, Sensor
)
select Building,sum(diff) as Total
from diffs
Group by Building
)
Another option, using the window functions MIN/MAX OVER():
Example
Select Building
,Total = sum(R1)
From (
Select Distinct
Building
,R1 = max([Current_value]) over (Partition By Building,Sensor)
-min([Current_value]) over (Partition By Building,Sensor)
From YourTable
Where Date_time between @Date1 and @Date2
) A
Group By Building
Returns
Building Total
1 105
2 155

The highest value from list-distinct

Can anyone help me with a query? I have a table:
vendorid, agreementid, sales
12001 1004 700
5291 1004 20576
7596 1004 1908
45 103 345
41 103 9087
What is the goal?
When agreementid > 1, show me the row where sales is the highest:
vendorid agreementid sales
5291 1004 20576
41 103 9087
Any ideas?
Thx
Well, you could try using a CTE and ROW_NUMBER, something like:
;WITH Vals AS (
SELECT *, ROW_NUMBER() OVER(PARTITION BY AgreementID ORDER BY Sales DESC) RowID
FROM MyTable
WHERE AgreementID > 1
)
SELECT *
FROM Vals
WHERE RowID = 1
This avoids returning multiple records when several rows share the same highest sale.
If returning those ties is OK, you could try something like:
SELECT *
FROM MyTable mt INNER JOIN
(
SELECT AgreementID, MAX(Sales) MaxSales
FROM MyTable
WHERE AgreementID > 1
GROUP BY AgreementID
) MaxVals ON mt.AgreementID = MaxVals.AgreementID AND mt.Sales = MaxVals.MaxSales
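To make the difference on ties concrete, here is a sketch using the question's rows plus one hypothetical extra row (vendor 99) that ties the top sale of agreement 103; #agreements_demo and #top_rows are made-up names. Filtering on RANK() = 1 keeps both tied rows, as the MAX-and-join query does, while ROW_NUMBER() keeps exactly one row per agreement (the rn = 1 rows).

```sql
-- Question's rows plus a hypothetical tie (vendor 99) for agreement 103.
DROP TABLE IF EXISTS #agreements_demo;
CREATE TABLE #agreements_demo (vendorid int, agreementid int, sales int);
INSERT INTO #agreements_demo VALUES
    (12001, 1004, 700), (5291, 1004, 20576), (7596, 1004, 1908),
    (45, 103, 345), (41, 103, 9087),
    (99, 103, 9087);

DROP TABLE IF EXISTS #top_rows;

WITH ranked AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY agreementid ORDER BY sales DESC) AS rn,
           RANK()       OVER (PARTITION BY agreementid ORDER BY sales DESC) AS rnk
    FROM #agreements_demo
    WHERE agreementid > 1
)
SELECT vendorid, agreementid, sales, rn, rnk
INTO #top_rows
FROM ranked
WHERE rnk = 1;   -- ties share rnk = 1

-- rn = 1 marks the single row the ROW_NUMBER query would keep
SELECT * FROM #top_rows ORDER BY agreementid, rn;
```

Which of the two tied rows gets rn = 1 is not deterministic unless you add a tie-breaker column to the ORDER BY.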
SELECT TOP 1 WITH TIES *
FROM MyTable
ORDER BY DENSE_RANK() OVER(PARTITION BY agreementid ORDER BY SIGN (SIGN (agreementid - 2) + 1) * sales DESC)
Explanation
We break table MyTable into partitions by agreementid.
For each partition we construct a ranking of its rows.
If agreementid is greater than 1, the ranking is equivalent to ORDER BY sales DESC.
Otherwise, every row in the partition gets the same rank: ORDER BY 0 DESC.
Here is what that looks like:
SELECT *
, SIGN (SIGN (agreementid - 2) + 1) * sales AS x
, DENSE_RANK() OVER(PARTITION BY agreementid ORDER BY SIGN (SIGN (agreementid - 2) + 1) * sales DESC) AS rnk
FROM MyTable
+----------+-------------+-------+-------+-----+
| vendorid | agreementid | sales | x | rnk |
+----------+-------------+-------+-------+-----+
| 0 | 0 | 3 | 0 | 1 |
| -1 | 0 | 7 | 0 | 1 |
| 0 | 1 | 3 | 0 | 1 |
| -1 | 1 | 7 | 0 | 1 |
| 41 | 103 | 9087 | 9087 | 1 |
| 45 | 103 | 345 | 345 | 2 |
| 5291 | 1004 | 20576 | 20576 | 1 |
| 7596 | 1004 | 1908 | 1908 | 2 |
| 12001 | 1004 | 700 | 700 | 3 |
+----------+-------------+-------+-------+-----+
Then the TOP 1 WITH TIES construction keeps only the rows where rnk equals 1.
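The multiplier SIGN(SIGN(agreementid - 2) + 1) is what switches the ranking on or off: for integer agreementid it evaluates to 0 when agreementid <= 1 and to 1 otherwise. A quick way to check this over a few hypothetical values (#sign_demo is a made-up name):

```sql
-- Hypothetical agreementid values, chosen to cover negatives, 0, 1 and > 1.
DROP TABLE IF EXISTS #sign_demo;
SELECT v.agreementid,
       SIGN(v.agreementid - 2)           AS inner_sign,  -- -1, 0 or 1
       SIGN(SIGN(v.agreementid - 2) + 1) AS multiplier   -- 0 for <= 1, else 1
INTO #sign_demo
FROM (VALUES (-5), (0), (1), (2), (103), (1004)) AS v(agreementid);

SELECT * FROM #sign_demo ORDER BY agreementid;
```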
You can try it like this:
SELECT TOP 1 sales FROM MyTable WHERE agreementid > 1 ORDER BY sales DESC
I really do not know the business logic behind agreement_id > 1. It looks to me like you want the max sales (with ties) by agreement id, regardless of vendor_id.
First, let's create a simple sample table.
-- Sample table
create table #sales
(
vendor_id int,
agreement_id int,
sales_amt money
);
-- Sample data
insert into #sales values
(12001, 1004, 700),
(5291, 1004, 20576),
(7596, 1004, 1908),
(45, 103, 345),
(41, 103, 9087);
Second, let's solve this problem using a common table expression to get a result set that has each row paired with the max sales by agreement id.
The select statement just applies the business logic to filter the data to get your answer.
-- CTE = max sales for each agreement id
;
with cte_sales as
(
select
vendor_id,
agreement_id,
sales_amt,
max(sales_amt) OVER(PARTITION BY agreement_id) AS max_sales
from
#sales
)
-- Filter by your business logic
select * from cte_sales where sales_amt = max_sales and agreement_id > 1;
The screen shot below shows the exact result you wanted.

How do I utilize Row_Number() (partitioning) for my datapool correctly

We have the following table (the output is already ordered and separated for clarity):
| PK | FK1 | FK2 | ActionCode | CreationTS | SomeAttributeValue |
+----+-----+-----+--------------+---------------------+--------------------+
| 6 | 100 | 500 | Create | 2011-01-02 00:00:00 | H |
----------------------------------------------------------------------------
| 3 | 100 | 500 | Change | 2011-01-01 02:00:00 | Z |
| 2 | 100 | 500 | Change | 2011-01-01 01:00:00 | X |
| 1 | 100 | 500 | Create | 2011-01-01 00:00:00 | Y |
----------------------------------------------------------------------------
| 4 | 100 | 510 | Create | 2011-01-01 00:30:00 | T |
----------------------------------------------------------------------------
| 5 | 100 | 520 | CreateSystem | 2011-01-01 00:30:00 | A |
----------------------------------------------------------------------------
What is ActionCode? We use it in C#, where it represents an enum value.
What do I want to achieve?
Well, I need the following output:
| FK1 | FK2 | ActionCode | SomeAttributeValue |
+-----+-----+--------------+--------------------+
| 100 | 500 | Create | H |
| 100 | 500 | Create | Z |
| 100 | 510 | Create | T |
| 100 | 520 | CreateSystem | A |
-------------------------------------------------
Well, what is the actual logic?
We have logical groups for the composite key (FK1 + FK2). Each of these groups can be broken into partitions, which begin with Create or CreateSystem. Each partition ends with Create, CreateSystem or Change. The value of SomeAttributeValue for each partition should be the value from the last line of the partition.
It is not possible to have the following data pool:
| PK | FK1 | FK2 | ActionCode | CreationTS | SomeAttributeValue |
+----+-----+-----+--------------+---------------------+--------------------+
| 7 | 100 | 500 | Change | 2011-01-02 02:00:00 | Z |
| 6 | 100 | 500 | Create | 2011-01-02 00:00:00 | H |
| 2 | 100 | 500 | Change | 2011-01-01 01:00:00 | X |
| 1 | 100 | 500 | Create | 2011-01-01 00:00:00 | Y |
----------------------------------------------------------------------------
and then expect PK 7 to affect PK 2, or PK 6 to affect PK 1.
I don't even know how or where to start... How can I achieve this?
We are running on MSSQL 2005+.
EDIT:
There's a dump available:
instanceId: my PK
tenantId: FK 1
campaignId: FK 2
callId: FK 3
refillCounter: FK 4
ticketType: ActionCode (1 & 4 & 6 are Create, 5 is Change, 3 must be ignored)
ticketType, profileId, contactPersonId, ownerId, handlingStartTime, handlingEndTime, memo, callWasPreselected, creatorId, creationTS, changerId, changeTS should be taken from the Create (first line in partition in groups)
callingState, reasonId, followUpDate, callingAttempts and callingAttemptsConsecutivelyNotReached should be taken from the last Create (which then would be a "one-line-partition-in-group" / the same as the upper one) or Change (last line in partition in groups)
I'm assuming that each partition can only contain a single Create or CreateSystem; otherwise, your requirements are ill-defined. The following is untested, since I don't have a sample table or sample data in an easily consumed format:
;With Partitions as (
Select
t1.FK1,
t1.FK2,
t1.ActionCode,
t1.CreationTS as StartTS,
t2.CreationTS as EndTS
From
Table t1
left join
Table t2
on
t1.FK1 = t2.FK1 and
t1.FK2 = t2.FK2 and
t1.CreationTS < t2.CreationTS and
t2.ActionCode in ('Create','CreateSystem')
left join
Table t3
on
t1.FK1 = t3.FK1 and
t1.FK2 = t3.FK2 and
t1.CreationTS < t3.CreationTS and
t3.CreationTS < t2.CreationTS and
t3.ActionCode in ('Create','CreateSystem')
where
t1.ActionCode in ('Create','CreateSystem') and
t3.FK1 is null
), PartitionRows as (
SELECT
t1.FK1,
t1.FK2,
t1.ActionCode,
t2.SomeAttributeValue,
ROW_NUMBER() OVER (PARTITION BY t1.FK1,t1.FK2,t1.StartTS ORDER BY t2.CreationTS desc) as rn
from
Partitions t1
inner join
Table t2
on
t1.FK1 = t2.FK1 and
t1.FK2 = t2.FK2 and
t1.StartTS <= t2.CreationTS and
(t2.CreationTS < t1.EndTS or t1.EndTS is null)
)
select * from PartitionRows where rn = 1
(Please note that I'm using all kinds of reserved names here.)
The basic logic is: the Partitions CTE defines each partition in terms of FK1, FK2, an inclusive start timestamp, and an exclusive end timestamp. It does this with a triple join to the base table. The rows from t2 are selected to occur after the rows from t1, then the rows from t3 are selected to occur between the matching rows from t1 and t2. Then, in the WHERE clause, we exclude any rows where a match occurred from t3, with the result that the row from t1 and the row from t2 represent the starts of two adjacent partitions.
The second CTE then retrieves all rows from Table for each partition, assigning a ROW_NUMBER() within each partition based on CreationTS sorted descending, so that row number 1 within each partition is the last row to occur.
Finally, within the select, we choose those rows that occur last within their respective partitions.
This does all assume that CreationTS values are distinct within each partition. I may be able to re-work it using PK also, if that assumption doesn't hold up.
It is solvable with a recursive CTE. Here (assuming rows within partitions are ordered by CreationTS):
WITH partitioned AS (
SELECT
*,
rn = ROW_NUMBER() OVER (PARTITION BY FK1, FK2 ORDER BY CreationTS)
FROM data
),
subgroups AS (
SELECT
PK, FK1, FK2, ActionCode, CreationTS, SomeAttributeValue, rn,
Subgroup = 1,
Subrank = 1
FROM partitioned
WHERE rn = 1
UNION ALL
SELECT
p.PK, p.FK1, p.FK2, p.ActionCode, p.CreationTS, p.SomeAttributeValue, p.rn,
Subgroup = s.Subgroup + CASE p.ActionCode WHEN 'Change' THEN 0 ELSE 1 END,
Subrank = CASE p.ActionCode WHEN 'Change' THEN s.Subrank ELSE 0 END + 1
FROM partitioned p
INNER JOIN subgroups s ON p.FK1 = s.FK1 AND p.FK2 = s.FK2
AND p.rn = s.rn + 1
),
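finalranks AS (
SELECT
PK, FK1, FK2, CreationTS, SomeAttributeValue, rn,
Subgroup, Subrank,
-- take ActionCode from the first row of the subgroup (the Create/CreateSystem row)
ActionCode = MAX(CASE WHEN Subrank = 1 THEN ActionCode END) OVER (PARTITION BY FK1, FK2, Subgroup),
rank = ROW_NUMBER() OVER (PARTITION BY FK1, FK2, Subgroup ORDER BY Subrank DESC)
/* or: rank = MAX(Subrank) OVER (PARTITION BY FK1, FK2, Subgroup) - Subrank + 1 */
FROM subgroups
)
SELECT PK, FK1, FK2, ActionCode, CreationTS, SomeAttributeValue
FROM finalranks
WHERE rank = 1
OPTION (MAXRECURSION 0) -- groups longer than 100 rows would otherwise hit the default recursion limit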
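On SQL Server 2012 or later (newer than the 2005 baseline in the question), the subgroups can also be numbered without recursion, by taking a running SUM of a "starts a new subgroup" flag, the usual gaps-and-islands technique. This is a sketch, not the answer's method; it assumes a hypothetical #data_demo table loaded with the question's sample rows and that CreationTS (with PK as tie-breaker) orders rows within each FK1, FK2 group. ActionCode comes from the first row of each subgroup and SomeAttributeValue from the last.

```sql
-- Hypothetical table holding the question's sample rows.
DROP TABLE IF EXISTS #data_demo;
CREATE TABLE #data_demo (PK int PRIMARY KEY, FK1 int, FK2 int, ActionCode varchar(20),
                         CreationTS datetime, SomeAttributeValue char(1));
INSERT INTO #data_demo VALUES
    (1, 100, 500, 'Create',       '20110101 00:00', 'Y'),
    (2, 100, 500, 'Change',       '20110101 01:00', 'X'),
    (3, 100, 500, 'Change',       '20110101 02:00', 'Z'),
    (4, 100, 510, 'Create',       '20110101 00:30', 'T'),
    (5, 100, 520, 'CreateSystem', '20110101 00:30', 'A'),
    (6, 100, 500, 'Create',       '20110102 00:00', 'H');

DROP TABLE IF EXISTS #result_demo;

WITH flagged AS (
    -- every row that is not a Change starts a new subgroup
    SELECT *, CASE WHEN ActionCode = 'Change' THEN 0 ELSE 1 END AS IsStart
    FROM #data_demo
), grouped AS (
    SELECT *,
           SUM(IsStart) OVER (PARTITION BY FK1, FK2 ORDER BY CreationTS, PK
                              ROWS UNBOUNDED PRECEDING) AS Subgroup
    FROM flagged
), ordered AS (
    SELECT *,
           -- ActionCode of the subgroup's starting row
           MAX(CASE WHEN IsStart = 1 THEN ActionCode END)
               OVER (PARTITION BY FK1, FK2, Subgroup) AS StartAction,
           ROW_NUMBER() OVER (PARTITION BY FK1, FK2, Subgroup
                              ORDER BY CreationTS DESC, PK DESC) AS rn_desc
    FROM grouped
)
SELECT FK1, FK2, StartAction AS ActionCode, SomeAttributeValue
INTO #result_demo
FROM ordered
WHERE rn_desc = 1;   -- last row of each subgroup

SELECT * FROM #result_demo ORDER BY FK1, FK2, SomeAttributeValue;
```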
