SQL Server - recursively calculated column

SQL Server - recursively calculated column - sql-server

I need to calculate a column based of a seed row where each row's value uses the "previous" row's values. I feel like this should be a recursive query but I can't quite wrap my head around it.
To illustrate:
BOP EOP IN OUT Wk_Num
--------------------------------------
6 4 10 12 1
? ? 2 6 2
? ? 7 5 3
... ... ... ... ...
So the next row's BOP and EOP columns need to be calculated using the seed row. The IN and OUT values are already present in the table.
BOP = (previous row's EOP)
EOP = (Previous row's EOP) + IN - OUT [where IN and OUT are from the current row)
OUTPUT of this example should look like:
BOP EOP IN OUT Wk_num
-------------------------------------
6 4 10 12 1
4 0 2 6 2
0 2 7 5 3
2 6 4 0 4
... ... ... ... ...

You can use a Recursive CTE for this;
WITH RecursiveCTE AS (
-- Base Case
SELECT
BOP,
EOP,
[IN],
[OUT],
[WK_Num]
FROM [someTable]
WHERE BOP IS NOT NULL
UNION ALL
SELECT
r.EOP AS BOP,
r.EOP + r2.[In] - r2.[Out] AS EOP,
r2.[IN],
r2.[OUT],
r2.[WK_Num]
FROM [someTable] r2
INNER JOIN [RecursiveCTE] r
ON r2.[Wk_Num] = r.[Wk_Num] + 1
)
SELECT * FROM RecursiveCTE
Here is a SQL Fiddle: http://sqlfiddle.com/#!18/e041f/1
You basically define the base case as the first row (by saying the row with BOP != null), then join to each following week with the Wk_Num + 1 join, and reference the previous rows values

You can use SUM OVER like this:
DECLARE #TempTable AS TABLE(T_In INT, T_Out INT, T_WeekNum INT)
INSERT INTO #TempTable VALUES (6, 0, 0)
INSERT INTO #TempTable VALUES (10, 12, 1)
INSERT INTO #TempTable VALUES (2, 6, 2)
INSERT INTO #TempTable VALUES (7, 5, 3)
INSERT INTO #TempTable VALUES (4, 0, 4)
SELECT
SUM(T_In - T_Out) OVER(ORDER BY T_WeekNum ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) - T_In + T_Out AS T_Bop,
SUM(T_In - T_Out) OVER(ORDER BY T_WeekNum ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS T_Eop,
T_In,
T_Out,
T_WeekNum
FROM #TempTable
The calculation is the same for BOP and EOP but the values from the current row are subtracted from the BOP column to get the value from the last row.

Related

Calculating the stddev and avg between the most recent number and all the other numbers in a running list snowflake

I have a dataset that looks something like this:
id committed delivered timestamp stddev
1 10 8 01-02-2022 ?
2 20 15 01-14-2022 ?
3 12 12 01-30-2022 ?
4 2 0 02-14-2022 ?
.
.
99 null
I am trying to calculate the standard deviation between sprint x and all the sprints after sprint x; for example, the standard deviation and avg between sprint 1, 2, 3 & 4, 2, 3 & 4, 3 & 4, and so on. If there are no records after 4, that stddev would be null
With the current snowflake functions, I am generally unable to calculate the stddev in general, let alone do something with a lag/lead function.
Does anyone have any advice? Thanks in advance!
Update:
I've figured out how to calculate a moving avg over sprint x and the next sprint, but not for all previous sprints:
(delivered + lead(delivered) over (partition by id order by timestamp asc)) / 2
stddev can also be calculated using abs / sqrt (2)

You're looking for a frame clause -- this is part of the window function that can specify which rows in the current partition to use in the calculation.
select
id,
stddev(delivered) over (
order by id asc
rows between current row and unbounded following
) as stddev,
avg(delivered) over (
order by id asc
rows between current row and unbounded following
) as avg
from my_data

tconbeer is 100% correct, but here is the code and count to "show it working"
and you example data (moshed) into a VALUES section to avoid making a table.
I also stripped out timestamp, as it's not used in this demo, but normally I would order by that, but I could not see the pattern, so just dropped it, as it's non material to the example.
SELECT t.*
,count(delivered) over ( order by id asc rows between current row and unbounded following ) as _count
,stddev(delivered) over ( order by id asc rows between current row and unbounded following ) as stddev
,avg(delivered) over ( order by id asc rows between current row and unbounded following ) as avg
FROM VALUES
(1, 10, 8),
(2, 20, 15),
(3, 12, 12),
(4, 2, 0),
(5, 2, 0),
(6, 0, 0)
t(id, committed, delivered)
ORDER BY 1;
gives:
ID
COMMITTED
DELIVERED
_COUNT
STDDEV
AVG
1
10
8
6
6.765106577
5.833
2
20
15
5
7.469939759
5.4
3
12
12
4
6
3
4
2
0
3
0
0
5
2
0
2
0
0
6
0
0
1
null
0

you can create a dummy table which will have a id generated sequentially using the generator function for a particular range and do a LEFT join with the table. This way you will get rows with NULL values where the id is not present, and then you can use lag / leap to get the average.
--- untested
select seq4() as id1 , TMP.* from table(generator(rowcount => 10)) v
LEFT JOIN (SELECT * FROM (
SELECT 1 as id , 8 As committed , '01-02-2022' as delivered UNION ALL
SELECT 2 as id , 20 As committed , '01-14-2022' as delivered UNION ALL
SELECT 3 as id , 12 As committed , '01-30-2022' as delivered UNION ALL
SELECT 5 as id , 2 As committed , '02-14-2022' as delivered
)) TMP
ON trim(id1) = trim(tmp.id)

Number Range diff

I have a question, I have a sample data shown below: my question is I want to calculate the difference between second row starting number (150) to first row ending number (100).
Can any one please help me? This answer will reduce some manual work what I'm doing daily.
Thanks

Sample data
create table MyTable
(
Model nvarchar(5),
StartingNo int,
EndingNo int
);
insert into MyTable (Model, StartingNo, EndingNo) values
('X', 1, 100),
('X', 150, 200),
('Y', 10, 30),
('Y', 1, 5);
Solution 1
This solution uses an outer apply to fetch the previous row.
select t1.*,
t3.EndingNo as PreviousEndingNo,
t1.StartingNo - t3.EndingNo as Diff
from MyTable t1
outer apply ( select top 1 t2.EndingNo -- first row...
from MyTable t2
where t2.Model = t1.Model -- ... with same model
and t2.StartingNo < t1.StartingNo -- ... that comes before current row
order by t2.StartingNo desc ) t3 -- ... when sorting on start number
order by t1.Model, t1.StartingNo;
Solution 2
This solution uses the lag function to fetch the previous row.
select t1.*,
lag(t1.EndingNo) over(partition by t1.Model order by t1.StartingNo) as PreviousEndingNo,
t1.StartingNo - lag(t1.EndingNo) over(partition by t1.Model order by t1.StartingNo) as Diff
from MyTable t1
order by t1.Model, t1.StartingNo;
Result
Model StartingNo EndingNo PreviousEndingNo Diff
----- ----------- ----------- ---------------- -----------
X 1 100 NULL NULL
X 150 200 100 50
Y 1 5 NULL NULL
Y 10 30 5 5
Fiddle to see everything in action.

Accessing prior rows and divide its value by current row

I have the rows below, and i want to access prior row and divide its value by current row. For every row, i need to calculate the Vi value, this Vi value is equal to Vi-1/Vi which means that:
Given the table
Table T
id value out
1 100
2 200
3 10
4 50
I want to generate these values
V1 = 100
V2= 100/200 = 0.5
V3 = 0.5/10 = 0.05
V4 = 0.05/50 = 0.001
So at the end i want the following output:
id value out
1 100 100
2 200 0.5
3 10 0.05
4 50 0.001
I tried using the aggregate function SUM with OVER(), but i do not know how to solve this problem as i need to divide and not sum the value
SELECT id, value, SUM(value) OVER(ORDER BY id ROWS BETWEEN
1 PRECEDING AND 1 PRECEDING ) / value as out
FROM T
Sample data:
CREATE TABLE t(
id INT,
value INT
);
INSERT INTO t VALUES
(1, 100), (2, 200), (3, 10), (4, 50);

Unfortunately, SQL do not have Product, but it should be simple to use cte. The performance should be not bad if id was indexed
DECLARE #T table (id int identity(1,1) primary key, value int)
INSERT #T VALUES (100), (200), (10), (50)
;WITH cte AS
(
SELECT id, value, CAST(value AS decimal(20,4)) AS out FROM #T WHERE id = 1
UNION ALL SELECT T.id, T.value, CAST(cte.out / T.value AS decimal(20,4)) FROM cte INNER JOIN #T T ON cte.id = T.id - 1
)
SELECT * FROM cte

UPDATE with non-CLR product aggregate results in off-by-one

I was answering another question and ran into a strange outcome - the output of a product aggregate (without CLR) was different when used in a SELECT vs UPDATE.
This is simplified from the original question to minimally reproduce the problem:
GroupKey RowIndex A
----------- ----------- -----------
25 1 5
25 2 6
25 3 NULL
26 1 3
26 2 4
26 3 NULL
The goal is for each group key to update the A column of each row with a RowIndex = 3 to the product of the A columns of each row with RowIndex IN (1, 2), so this would produce the following changes:
GroupKey RowIndex A
----------- ----------- -----------
25 3 30
26 3 12
So this is the code I used:
UPDATE T SET
A = Products.Product
FROM #Table T
INNER JOIN (
SELECT
GroupKey,
EXP(SUM(LOG(A))) AS Product
FROM #Table
WHERE RowIndex IN (1, 2)
GROUP BY
GroupKey
) Products
ON Products.GroupKey = T.GroupKey
WHERE T.RowIndex = 3
SELECT * FROM #Table WHERE RowIndex = 3
Which then produced the off-by-one results:
GroupKey RowIndex A
----------- ----------- -----------
25 3 29
26 3 12
If I just run the sub-query, I see the correct values.
GroupKey Product
----------- ----------------------
25 30
26 12
Here's the full script to make it easy to play with. I can't figure out where the off-by-one is coming from.
DECLARE #Table TABLE (GroupKey INT, RowIndex INT, A INT)
INSERT #Table VALUES (25, 1, 5), (25, 2, 6), (25, 3, NULL), (26, 1, 3), (26, 2, 4), (26, 3, NULL)
SELECT * FROM #Table
SELECT
GroupKey,
EXP(SUM(LOG(A))) AS Product
FROM #Table
WHERE RowIndex IN (1, 2)
GROUP BY
GroupKey
UPDATE T SET
A = Products.Product
FROM #Table T
INNER JOIN (
SELECT
GroupKey,
EXP(SUM(LOG(A))) AS Product
FROM #Table
WHERE RowIndex IN (1, 2)
GROUP BY
GroupKey
) Products
ON Products.GroupKey = T.GroupKey
WHERE T.RowIndex = 3
SELECT * FROM #Table WHERE RowIndex = 3
Here are some references I came across:
Non-CLR Aggregate: http://michaeljswart.com/2011/03/the-aggregate-function-product/
Original question: Set one row fields as a multiplication of 2 others

I'd say that this cute "PRODUCT" aggregate is inherently unreliable if you want to work with ints - EXP and LOG are only defined against the float type and so we get rounding errors creeping in.
Why they're not consistently appearing, I couldn't say, except to suggest that different queries may cause changes in evaluation orders.
As a simpler example of how this can go wrong:
select CAST(EXP(LOG(5)) as int)
Can produce 4. EXP and LOG together will produce a value that is just less than 5, but of course when converting to int, SQL Server always truncates rather than applying any rounding.

Tsql group by clause with exceptions

I have a problem with a query.
This is the data (order by Timestamp):
Data
ID Value Timestamp
1 0 2001-1-1
2 0 2002-1-1
3 1 2003-1-1
4 1 2004-1-1
5 0 2005-1-1
6 2 2006-1-1
7 2 2007-1-1
8 2 2008-1-1
I need to extract distinct values and the first occurance of the date. The exception here is that I need to group them only if not interrupted with a new value in that timeframe.
So the data I need is:
ID Value Timestamp
1 0 2001-1-1
3 1 2003-1-1
5 0 2005-1-1
6 2 2006-1-1
I've made this work by a complicated query, but am sure there is an easier way to do it, just cant think of it. Could anyone help?
This is what I started with - probably could work with that. This is a query that should locate when a value is changed.
> SELECT * FROM Data d1 join Data d2 ON d1.Timestamp < d2.Timestamp and
> d1.Value <> d2.Value
It probably could be done with a good use of row_number clause but cant manage it.

Sample data:
declare #T table (ID int, Value int, Timestamp date)
insert into #T(ID, Value, Timestamp) values
(1, 0, '20010101'),
(2, 0, '20020101'),
(3, 1, '20030101'),
(4, 1, '20040101'),
(5, 0, '20050101'),
(6, 2, '20060101'),
(7, 2, '20070101'),
(8, 2, '20080101')
Query:
;With OrderedValues as (
select *,ROW_NUMBER() OVER (ORDER By TimeStamp) as rn --TODO - specific columns better than *
from #T
), Firsts as (
select
ov1.* --TODO - specific columns better than *
from
OrderedValues ov1
left join
OrderedValues ov2
on
ov1.Value = ov2.Value and
ov1.rn = ov2.rn + 1
where
ov2.ID is null
)
select * --TODO - specific columns better than *
from Firsts
I didn't rely on the ID values being sequential and without gaps. If that's the situation, you can omit OrderedValues (using the table and ID in place of OrderedValues and rn). The second query simply finds rows where there isn't an immediate preceding row with the same Value.
Result:
ID Value Timestamp rn
----------- ----------- ---------- --------------------
1 0 2001-01-01 1
3 1 2003-01-01 3
5 0 2005-01-01 5
6 2 2006-01-01 6
You can order by rn if you need the results in this specific order.