SQL Server: add column for rows since value changed - sql-server

I have a table that contains 3 columns: personID, weeknumber, and event. Event is 0 if there was no event for that person in that week and 1 if there was.
I need to create a new column weekssincelastevent which will be 0 for the week where event=1 and then 1,2,3,4 etc for the weeks afterwards. If there is a later event then it starts from 0 again. E.g.
personID weeknumber event weekssincelastevent
1 1 0 NULL
1 2 0 NULL
1 3 1 0
1 4 0 1
1 5 0 2
1 6 0 3
2 1 0 NULL
2 2 1 0
2 3 0 1
2 4 1 0
2 5 0 1
The column should be NULL before a person's first event, and NULL on every row for a personID that never has an event.
I can't think how to write this in SQL.
The table has ~600m rows (60m personIDs with 100 weeknumbers each, although some personIDs don't have all the weeknumbers).
Many thanks for any insight.

This is a gaps-and-islands problem. The first part, in the CTE, puts the data into "groups": each event starts a new group. It also calculates the number of weeks that have passed since the previous row (set to 0 for rows that have an event). The outer query then takes a running SUM of those weeks within each group, which gives the number of weeks since the last event:
WITH Groups AS(
    SELECT PersonID,
           WeekNumber,
           Event,
           COUNT(CASE Event WHEN 1 THEN 1 END) OVER (PARTITION BY PersonID ORDER BY WeekNumber ASC
                                                     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Events,
           CASE Event WHEN 0 THEN WeekNumber - LAG(WeekNumber) OVER (PARTITION BY PersonID ORDER BY WeekNumber ASC) ELSE 0 END AS WeeksPassed
    FROM dbo.YourTable)
SELECT PersonID,
       WeekNumber,
       Event,
       CASE WHEN Events = 0 THEN NULL
            ELSE SUM(WeeksPassed) OVER (PARTITION BY PersonID, Events ORDER BY WeekNumber ASC
                                        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
       END AS WeekSinceLastEvent
FROM Groups;
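For anyone who wants to try the grouping approach without a SQL Server instance, here is a minimal sketch that runs essentially the same query against the question's sample rows. It assumes SQLite through Python's sqlite3 module (SQLite 3.25 or later for window functions); the table name weekly_events and the CTE name grouped are made up for this illustration.

```python
import sqlite3

# Sample rows from the question: (personID, weeknumber, event)
SAMPLE = [
    (1, 1, 0), (1, 2, 0), (1, 3, 1), (1, 4, 0), (1, 5, 0), (1, 6, 0),
    (2, 1, 0), (2, 2, 1), (2, 3, 0), (2, 4, 1), (2, 5, 0),
]

# Same shape as the answer's query; weekly_events and grouped are hypothetical names.
ISLANDS_QUERY = """
WITH grouped AS (
    SELECT personID, weeknumber, event,
           -- running count of events: each event opens a new group
           COUNT(CASE event WHEN 1 THEN 1 END) OVER (
               PARTITION BY personID ORDER BY weeknumber
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS events,
           -- weeks elapsed since the previous row, 0 on event rows
           CASE event
               WHEN 0 THEN weeknumber - LAG(weeknumber) OVER (
                   PARTITION BY personID ORDER BY weeknumber)
               ELSE 0
           END AS weeks_passed
    FROM weekly_events
)
SELECT personID, weeknumber, event,
       CASE WHEN events = 0 THEN NULL
            ELSE SUM(weeks_passed) OVER (
                PARTITION BY personID, events ORDER BY weeknumber
                ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
       END AS weekssincelastevent
FROM grouped
ORDER BY personID, weeknumber
"""


def weeks_since_event_islands(rows):
    """Load rows into an in-memory table and return the query result."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE weekly_events (personID INT, weeknumber INT, event INT)"
    )
    con.executemany("INSERT INTO weekly_events VALUES (?, ?, ?)", rows)
    result = con.execute(ISLANDS_QUERY).fetchall()
    con.close()
    return result
```

Calling weeks_since_event_islands(SAMPLE) returns one tuple per input row with the computed column last. Because the weeks are summed from differences between consecutive rows, a person with missing weeknumbers still gets the elapsed week count rather than a row count.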

You can do this with a conditional aggregate within a windowed function:
SELECT t.PersonID,
t.WeekNumber,
t.Event,
WeeksSinceLastEvent = t.WeekNumber - MAX(CASE WHEN t.Event = 1 THEN t.WeekNumber END)
OVER(PARTITION BY t.PersonID ORDER BY t.WeekNumber)
FROM dbo.T AS t;
The key parts are:
CASE WHEN t.Event = 1 THEN t.WeekNumber END - Only consider the week number of rows where there is an event. Since MAX ignores NULLs, only those rows contribute.
OVER (PARTITION BY t.PersonID ORDER BY t.WeekNumber) - Only consider rows for the current person where the week number is lower than or equal to that of the current row.
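To check the conditional MAX against the question's sample data, the following is a small sketch, again assuming SQLite via Python's sqlite3 module rather than SQL Server. The table name weekly_events is hypothetical, and the T-SQL "alias = expression" form is written as "expression AS alias" because SQLite does not accept the former.

```python
import sqlite3

# Sample rows from the question: (personID, weeknumber, event)
SAMPLE = [
    (1, 1, 0), (1, 2, 0), (1, 3, 1), (1, 4, 0), (1, 5, 0), (1, 6, 0),
    (2, 1, 0), (2, 2, 1), (2, 3, 0), (2, 4, 1), (2, 5, 0),
]

# weekly_events is a hypothetical table name used only for this sketch.
MAX_QUERY = """
SELECT personID, weeknumber, event,
       -- latest event week seen so far for this person (NULL if none yet)
       weeknumber - MAX(CASE WHEN event = 1 THEN weeknumber END) OVER (
           PARTITION BY personID ORDER BY weeknumber) AS weekssincelastevent
FROM weekly_events
ORDER BY personID, weeknumber
"""


def weeks_since_event_max(rows):
    """Load rows into an in-memory table and return the query result."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE weekly_events (personID INT, weeknumber INT, event INT)"
    )
    con.executemany("INSERT INTO weekly_events VALUES (?, ?, ?)", rows)
    result = con.execute(MAX_QUERY).fetchall()
    con.close()
    return result
```

Subtracting a NULL maximum yields NULL, which is what produces the NULLs before a person's first event without any extra CASE expression.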

Related

Assigning event id with start/stop and nulls in between

I have a large series of event data that I need to categorize upwards in granularity where one or many events occur during a larger event.
The parameters to define those events are done. What I need now is to add an ID to the larger events where the next id starts when the prior event ends (start_end). There may be many NULLS in between start and end, or none.
My requirements are:
Get the result of the right 4 columns from the 3 left columns below.
I don't care what happens before the first start or after the last end
seqnum is partitioned by user_id, restarting with each user. It doesn't matter whether new_id does.
Each seqnum has a corresponding start/end timestamp, as does each start_end that is not null (excluded for brevity)
single in start_end means that a single original event makes up a larger event
How do I populate new_id?
user_id seqnum start_end --> user_id seqnum new_id start_end
a 1 NULL --> a 1 NULL NULL
a 2 end --> a 2 NULL end
a 3 start --> a 3 1 start
a 4 NULL --> a 4 1 NULL
a 5 NULL --> a 5 1 NULL
a 6 end --> a 6 1 end
a 7 single --> a 7 2 single
b 1 start --> b 1 3 start
b 2 NULL --> b 2 3 NULL
b 3 end --> b 3 3 end
b 4 single --> b 4 4 single
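This is just a conditional cumulative COUNT. Because seqnum restarts for each user, the window is ordered by user_id first so that the numbering carries on from one user to the next, as in your expected output:
COUNT(CASE WHEN start_end IN ('end','single') THEN 1 END) OVER (ORDER BY user_id, seqnum
                                                                ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS new_id
If you want NULL instead of 0, then wrap the whole expression in a NULLIF (with 0 as the second argument) or use SUM instead of COUNT.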
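As a rough check of the cumulative count against the table in the question, here is a sketch that assumes SQLite via Python's sqlite3 module. The table name events is hypothetical. Note that the window here is ordered by user_id and then seqnum, which is an assumption made so that the numbering continues across users the way the expected output does, and the NULLIF variant is used to get NULL instead of 0.

```python
import sqlite3

# Left three columns from the question: (user_id, seqnum, start_end)
SAMPLE_EVENTS = [
    ("a", 1, None), ("a", 2, "end"), ("a", 3, "start"), ("a", 4, None),
    ("a", 5, None), ("a", 6, "end"), ("a", 7, "single"),
    ("b", 1, "start"), ("b", 2, None), ("b", 3, "end"), ("b", 4, "single"),
]

# events is a hypothetical table name used only for this sketch.
NEW_ID_QUERY = """
SELECT user_id, seqnum,
       -- count the 'end'/'single' rows strictly before the current row;
       -- NULLIF turns a count of 0 into NULL
       NULLIF(COUNT(CASE WHEN start_end IN ('end', 'single') THEN 1 END) OVER (
                  ORDER BY user_id, seqnum
                  ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) AS new_id,
       start_end
FROM events
ORDER BY user_id, seqnum
"""


def assign_new_ids(rows):
    """Load rows into an in-memory table and return the query result."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE events (user_id TEXT, seqnum INT, start_end TEXT)")
    con.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    result = con.execute(NEW_ID_QUERY).fetchall()
    con.close()
    return result
```

The frame ends at 1 PRECEDING so that a row marked end or single still belongs to the event it closes, and the id only increases on the row after it.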

How to split Row into multiple column using T-SQL

There are three columns. Wherever D_ID=13, Value_amount holds the mode of payment, and wherever D_ID=10, Value_amount holds the amount.
ID D_ID Value_amount
1 13 2
1 13 2
1 10 1500
1 10 1500
2 13 1
2 13 1
2 10 2000
2 10 2000
Now I have to add two more columns, amount and mode_of_payment, and the result should look like this:
ID amount mode_of_payment
1 1500 2
1 1500 2
2 2000 1
2 2000 1
This is too long for a comment.
Simply put, your data is severely flawed. For the example data you've given, you're "ok", because the rows have the same values for the same ID, but what about when they don't? Let's assume, for example, we have data that looks like this:
ID D_ID Value_amount
1 13 1 --1
1 13 2 --2
1 10 1500 --3
1 10 1000 --4
2 13 1 --5
2 13 2 --6
2 10 2000 --7
2 10 3000 --8
I've added a "row number" next to the data, for demonstration purposes only.
Here, what row is row "1" related to? Row "3" or row "4"? How do you know? There's no always-ascending value in your data, so row "3" could just as easily be row "4". In fact, if we were to order the data using ID ASC, D_ID DESC, Value_amount ASC then rows 3 and 4 would "swap" in order. This could mean that when you attempt a solution, the order is wrong.
Tables aren't stored in any particular order; they are unordered. What determines the order the data is presented in is the ORDER BY clause, and if you don't have a value to define that "order", then that "order" is lost as soon as you INSERT the data.
If, however, we add an always-ascending value to your data, you can achieve this.
CREATE TABLE dbo.YourTable (UID int IDENTITY, --always-ascending value that preserves insert order
                            ID int,
                            DID int,
                            Value_amount int);
GO
INSERT INTO dbo.YourTable (ID, DID, Value_amount)
VALUES (1,13,1   ),
       (1,13,2   ),
       (1,10,1500),
       (1,10,1000),
       (2,13,1   ),
       (2,13,2   ),
       (2,10,2000),
       (2,10,3000);
GO
WITH RNs AS(
    SELECT ID,
           DID,
           Value_amount,
           ROW_NUMBER() OVER (PARTITION BY ID, DID ORDER BY UID ASC) AS RN
    FROM dbo.YourTable)
SELECT ID,
       MAX(CASE DID WHEN 10 THEN Value_amount END) AS Amount, --DID 10 holds the amount
       MAX(CASE DID WHEN 13 THEN Value_amount END) AS PaymentMode --DID 13 holds the mode of payment
FROM RNs
GROUP BY RN,
         ID;
GO
DROP TABLE dbo.YourTable;
Of course, you need to fix your design to implement this, but you need to do that anyway.
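To see the pairing by row number in action without SQL Server, here is a sketch that assumes SQLite via Python's sqlite3 module. The table name payments and the uid column are hypothetical stand-ins for the redesigned table, and the column mapping follows the question's description (D_ID 10 is the amount, D_ID 13 is the mode of payment).

```python
import sqlite3

# Conflicting sample rows from the answer: (ID, D_ID, Value_amount)
SAMPLE_PAYMENTS = [
    (1, 13, 1), (1, 13, 2), (1, 10, 1500), (1, 10, 1000),
    (2, 13, 1), (2, 13, 2), (2, 10, 2000), (2, 10, 3000),
]

# payments and uid are hypothetical names; uid preserves insert order.
PIVOT_QUERY = """
WITH rns AS (
    SELECT id, d_id, value_amount,
           -- number the rows within each (id, d_id) pair in insert order
           ROW_NUMBER() OVER (PARTITION BY id, d_id ORDER BY uid) AS rn
    FROM payments
)
SELECT id,
       MAX(CASE d_id WHEN 10 THEN value_amount END) AS amount,
       MAX(CASE d_id WHEN 13 THEN value_amount END) AS mode_of_payment
FROM rns
GROUP BY id, rn
ORDER BY id, rn
"""


def pivot_payments(rows):
    """Insert rows in the given order and pair them up by row number."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE payments ("
        "uid INTEGER PRIMARY KEY AUTOINCREMENT, "
        "id INT, d_id INT, value_amount INT)"
    )
    con.executemany(
        "INSERT INTO payments (id, d_id, value_amount) VALUES (?, ?, ?)", rows
    )
    result = con.execute(PIVOT_QUERY).fetchall()
    con.close()
    return result
```

With the ascending key in place, the first mode-of-payment row for an ID is paired with the first amount row for that ID, the second with the second, and so on.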

Grouping ID while counting specific attribute values

I want to count how many occurrences there are of the value 1 in the attribute Months for each ID in a table.
Here is what I am working with
ID. Months
1000 1
1000 1
1000 2
1001 2
1002 3
1003 1
This is what I would like to have
ID. Count(Months=1)
1000 2
1003 1
If you want to count rows for just one month, you can use a WHERE clause for filtering:
select id,
       count(*) as cnt
from your_table
where months = 1
group by id;
If you want the counts for multiple months in one row (this is called pivoting), you can use conditional aggregation in most databases:
select id,
       count(case when months = 1 then 1 end) as cnt_month_1,
       count(case when months = 2 then 1 end) as cnt_month_2,
       count(case when months = 3 then 1 end) as cnt_month_3,
       . . .
from your_table
group by id;
Some databases offer a PIVOT operator for this task. For that, you'll need to specify which database you are using.
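Both variants can be tried on the question's data with the sketch below, which assumes SQLite via Python's sqlite3 module. The table name your_table comes from the answer; the column name months follows the question's header, and the helper function names are made up for this example.

```python
import sqlite3

# Sample rows from the question: (ID, Months)
SAMPLE_MONTHS = [(1000, 1), (1000, 1), (1000, 2), (1001, 2), (1002, 3), (1003, 1)]


def _load(rows):
    """Create an in-memory copy of the table and fill it with rows."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE your_table (id INT, months INT)")
    con.executemany("INSERT INTO your_table VALUES (?, ?)", rows)
    return con


def count_month_one(rows):
    """Filter first, then count: IDs without a month-1 row disappear."""
    con = _load(rows)
    result = con.execute(
        "SELECT id, COUNT(*) AS cnt FROM your_table "
        "WHERE months = 1 GROUP BY id ORDER BY id"
    ).fetchall()
    con.close()
    return result


def count_by_month(rows):
    """Conditional aggregation: every ID is kept, one count column per month."""
    con = _load(rows)
    result = con.execute(
        "SELECT id, "
        "COUNT(CASE WHEN months = 1 THEN 1 END) AS cnt_month_1, "
        "COUNT(CASE WHEN months = 2 THEN 1 END) AS cnt_month_2, "
        "COUNT(CASE WHEN months = 3 THEN 1 END) AS cnt_month_3 "
        "FROM your_table GROUP BY id ORDER BY id"
    ).fetchall()
    con.close()
    return result
```

The practical difference is that the WHERE version only returns IDs that have at least one matching row, while the conditional version returns every ID and reports 0 where nothing matched.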

How to query records based on row_num and one of the column value?

Rownum Status
1 2
2 1
3 3
4 2
5 3
6 1
The condition is to return the records that appear before the first record with status=3; in the scenario above, the expected output is rownum = 1 and 2.
If there is no record with status=3, then show everything.
I'm not sure where to start, so I have nothing to show yet.
If you are using SQL Server 2012+, then you can use the window version of SUM with an ORDER BY clause:
SELECT Rownum, Status
FROM (
    SELECT Rownum, Status,
           SUM(CASE WHEN Status = 3 THEN 1 ELSE 0 END)
               OVER (ORDER BY Rownum) AS s
    FROM mytable) t
WHERE t.s = 0
Calculated field s is a running total of Status = 3 occurrences. The query returns all records before the first occurrence of a 3 value.
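Here is a small sketch of the running-total filter on the question's rows, assuming SQLite via Python's sqlite3 module instead of SQL Server. The table name mytable is taken from the answer and the function name is made up.

```python
import sqlite3

# Sample rows from the question: (Rownum, Status)
SAMPLE_STATUS = [(1, 2), (2, 1), (3, 3), (4, 2), (5, 3), (6, 1)]

RUNNING_QUERY = """
SELECT rownum, status
FROM (
    SELECT rownum, status,
           -- running total of status = 3 rows, including the current row
           SUM(CASE WHEN status = 3 THEN 1 ELSE 0 END)
               OVER (ORDER BY rownum) AS s
    FROM mytable) t
WHERE t.s = 0
ORDER BY rownum
"""


def rows_before_first_three(rows):
    """Return the rows that come before the first status = 3 row."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE mytable (rownum INT, status INT)")
    con.executemany("INSERT INTO mytable VALUES (?, ?)", rows)
    result = con.execute(RUNNING_QUERY).fetchall()
    con.close()
    return result
```

If no row has status 3, the running total stays at 0 for every row, so the whole table comes back, which covers the second requirement without a separate branch.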

Rolling a number from rows with a flag into the next row without the flag

I'm a bit stumped about how to solve this particular piece of a problem I'm working on. I started with a much bigger problem, but I managed to simplify it into this while keeping good performance intact.
Say I have the following result set. AggregateMe is something I'm deriving from SQL conditionals.
MinutesElapsed AggregateMe ID Type RowNumber
1480 1 1 A 1
1200 0 1 A 2
1300 0 1 B 3
1550 0 1 C 4
725 1 1 A 5
700 0 1 A 6
1900 1 2 A 7
3300 1 2 A 8
4900 0 2 A 9
If AggregateMe is 1 (true) or, if you prefer, if the underlying conditions are true, I want the counts to be aggregated into the next row where AggregateMe (or the conditions) does not evaluate to true.
Aggregate functions or Subqueries are fair game as is PARTITION BY.
For example, the above result set would become:
MinutesElapsed ID Type
2680 1 A
1300 1 B
1550 1 C
1425 1 A
10100 2 A
Is there a clean way to do this? If you want, I can share more about the original problem, but it is a bit more complicated.
Edited to add: SUM and GROUP BY alone won't work, because some sums would be rolled into the wrong row. My sample data did not reflect this case, so I added rows where this case can occur. In the updated sample data, using an aggregate function in the simplest way would cause the 2680 count and the 1425 count to be rolled together, which I do not want.
EDIT: And if you're wondering how I got here in the first place, here you go. I'm going to aggregate statistics about how long our program left something in a certain ActionType, and my first step was by creating this subquery. Please feel free to criticize:
select
ROW_NUMBER() over(order by claimid, insertdate asc) as RowNbr,
DateDiff(mi, ahCurrent.InsertDate, CASE WHEN ahNext.NextInsertDate is null THEN GetDate() ELSE ahNext.NextInsertDate END) as MinutesInActionType,
ahCurrent.InsertDate, ahNext.NextInsertDate,
ahCurrent.ClaimID, ahCurrent.ActionTypeID,
case when ahCurrent.ActionTypeID = ahNext.NextActionTypeID and ahCurrent.ClaimID = ahNext.NextClaimID then 1 else 0 end as aggregateme
FROM
(
select ROW_NUMBER () over(order by claimid, insertdate asc) as RowNum, ClaimID, InsertDate, ActionTypeID
From autostatushistory
--Where AHCurrent is not AHPast
) ahCurrent
LEFT JOIN
(
select ROW_NUMBER() over(order by claimid, insertdate asc) as RowNum, ClaimID as NextClaimID, InsertDate as NextInsertDate, ActionTypeID as NextActionTypeID
FROM autostatushistory
) ahNext
ON (ahCurrent.ClaimID = ahNext.NextClaimID AND ahCurrent.RowNum = ahNext.RowNum - 1 and ahCurrent.ActionTypeID = ahNext.NextActionTypeID)
Here is a query that does what you need. It's not clean, so you may want to optimize it:
WITH cte AS( /* Create a table containing row number */
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS ROW,
MinutesElapsed,
AggregateMe,
ID,
TYPE
FROM rolling
)
SELECT MinutesElapsed + (CASE /* adding minutes from next valid records*/
WHEN cte.AggregateMe <> 1 /*if current record is 0 then */
THEN 0 /*skip it*/
ELSE
(SELECT SUM(MinutesElapsed) /* calculating sum of all -> */
FROM cte localTbl
WHERE
cte.ROW < localTbl.ROW /* next records -> */
AND
localTbl.ROW <= ( /* until we find aggregate = 0 */
SELECT MIN(ROW)
FROM cte sTbl
WHERE sTbl.AggregateMe = 0
AND
sTbl.ROW > cte.ROW
)
AND
(localTbl.AggregateMe = 0 OR /* just to be sure :) */
localTbl.AggregateMe = 1))
END) as MinutesElapsed,
AggregateMe,
ID,
TYPE
FROM cte
WHERE cte.ROW = 1 OR NOT( /* not showing records used that are used in sum, skipping 1 record*/
( /* records with agregate 0 after record with aggregate 1 */
cte.AggregateMe = 0
AND
(
SELECT AggregateMe
FROM cte tblLocal
WHERE cte.ROW = (tblLocal.ROW + 1)
)>0
)
OR
( /* record with aggregate 1 after record with aggregate 1 */
cte.AggregateMe = 1
AND
(
SELECT AggregateMe
FROM cte tblLocal
WHERE cte.ROW = (tblLocal.ROW + 1)
)= 1
)
);
I hope it helps with your problem.
Feel free to ask questions.
Looking at your result set, it seems the following would work:
SELECT ID,Type,SUM(MinutesElapsed)
FROM mytable
GROUP BY ID,Type
But I cannot tell for sure without looking at the original dataset.
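For comparison, here is a sketch of a different technique from either answer above: a conditional running count used as a grouping key, so that each run of flagged rows lands in the same group as the next unflagged row. It assumes SQLite via Python's sqlite3 module, uses the table name rolling from the first answer, and relies on the RowNumber column from the question's sample. It also assumes that a flagged row always shares its ID and Type with the row it rolls into, which holds for the sample data.

```python
import sqlite3

# Sample rows from the question:
# (MinutesElapsed, AggregateMe, ID, Type, RowNumber)
SAMPLE_ROLLING = [
    (1480, 1, 1, "A", 1), (1200, 0, 1, "A", 2), (1300, 0, 1, "B", 3),
    (1550, 0, 1, "C", 4), (725, 1, 1, "A", 5), (700, 0, 1, "A", 6),
    (1900, 1, 2, "A", 7), (3300, 1, 2, "A", 8), (4900, 0, 2, "A", 9),
]

ROLL_QUERY = """
WITH keyed AS (
    SELECT MinutesElapsed, ID, Type,
           -- number of unflagged rows strictly before this one: the key
           -- only increases on the row after a group has been closed
           COUNT(CASE WHEN AggregateMe = 0 THEN 1 END) OVER (
               ORDER BY RowNumber
               ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS grp
    FROM rolling
)
SELECT SUM(MinutesElapsed) AS MinutesElapsed,
       MAX(ID) AS ID,      -- assumes ID is constant within a group
       MAX(Type) AS Type   -- assumes Type is constant within a group
FROM keyed
GROUP BY grp
ORDER BY grp
"""


def roll_flagged_rows(rows):
    """Sum each run of flagged rows into the unflagged row that follows it."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE rolling ("
        "MinutesElapsed INT, AggregateMe INT, ID INT, Type TEXT, RowNumber INT)"
    )
    con.executemany("INSERT INTO rolling VALUES (?, ?, ?, ?, ?)", rows)
    result = con.execute(ROLL_QUERY).fetchall()
    con.close()
    return result
```

On the sample data this keeps the 2680 and 1425 totals separate even though both belong to ID 1 and Type A. A trailing run of flagged rows with no unflagged row after it would still form its own group, so that case may need separate handling.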
