I have an issue where I have to calculate a column using a formula that uses the value from the calculation done in the previous row.
I have tried the LAG function but cannot get past the second row; after that all my values are NULL, since that column originally starts as NULL. I feel like I am missing something.
I need to calculate a new column, using the formula:
MovingRate = MonthlyRate + (0.7 * MovingRatePrevious)
... where the MovingRatePrevious is the MovingRate of the prior row. For month 1, I have the value, so I do not need to re-calculate it, but I need that value to be able to calculate the subsequent rows. I need to partition by Type.
This is my original dataset:
Month Type MonthlyRate MovingRate
--------------------------------------
1 Blue 0.400 0.330
2 Blue 0.300
3 Blue 0.700
4 Blue 0.900
Desired results in MovingRate column:
Month Type MonthlyRate MovingRate
---------------------------------------
1 Blue 0.400 0.330
2 Blue 0.300 0.531
3 Blue 0.700 1.072
4 Blue 0.900 1.650
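(For example, month 2 is 0.300 + 0.7 * 0.330 = 0.531, month 3 is 0.700 + 0.7 * 0.531 ≈ 1.072, and month 4 is 0.900 + 0.7 * 1.072 ≈ 1.650, rounded to three decimals.)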
You can calculate it using a recursive CTE. Below is a generalized version for your data:
DECLARE @t TABLE (Month INT, Type VARCHAR(100), MonthlyRate DECIMAL(18, 3));
INSERT INTO @t VALUES
(1, 'Blue', 0.400),
(2, 'Blue', 0.300),
(3, 'Blue', 0.700),
(4, 'Blue', 0.900);
WITH cte1 AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY Type ORDER BY Month) AS rn
FROM @t
), rcte AS (
SELECT *, CAST(0.330 AS DECIMAL(18, 3)) AS MovingRate
FROM cte1 AS base
WHERE rn = 1
UNION ALL
SELECT curr.*, CAST(curr.MonthlyRate + 0.7 * prev.MovingRate AS DECIMAL(18, 3))
FROM cte1 AS curr
JOIN rcte AS prev ON curr.Type = prev.type AND curr.rn = prev.rn + 1
)
SELECT *
FROM rcte
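For what it's worth, the recurrence also unrolls into a weighted sum: MovingRate(n) = SUM over k <= n of 0.7^(n-k) * base(k), where base(1) is the given month-1 MovingRate (0.330 here, hard-coded just like in the anchor above) and base(k) = MonthlyRate(k) for later months. If you would rather avoid recursion, that sum can be computed with a windowed SUM; below is a sketch against the same @t variable. Dividing by POWER(0.7, rn) inside the running sum and multiplying by POWER(0.7, rn) outside gives the n-k exponent; note the CAST to FLOAT on 0.7, since POWER returns the type of its first argument.
WITH numbered AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY Type ORDER BY Month) AS rn
    FROM @t
), weighted AS (
    -- month 1 contributes its known MovingRate, later months contribute their MonthlyRate
    SELECT *, CAST(CASE WHEN rn = 1 THEN 0.330 ELSE MonthlyRate END AS FLOAT) AS base
    FROM numbered
)
SELECT Month, Type, MonthlyRate,
       CAST(POWER(CAST(0.7 AS FLOAT), rn)
            * SUM(base / POWER(CAST(0.7 AS FLOAT), rn))
                  OVER (PARTITION BY Type ORDER BY rn) AS DECIMAL(18, 3)) AS MovingRate
FROM weighted;
For a long series the division by POWER(0.7, rn) becomes numerically fragile, so the recursive CTE above is the safer general-purpose approach.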
Let's say I have a temporal table called ProductDetails; the query below returns some of its historical data.
SELECT * FROM ProductDetails
FOR system_time
BETWEEN '1900-01-01 00:00:00' AND '9999-12-31 00:00:00'
WHERE ProductID = 8
ID ProductID(FK) Attribute Value SysStartTime SysEndTime
-- ------------- --------- ----- ------------------- ----------
1 8 Size S 2020-07-06 05:00:00 9999-12-31 23:59:59
2 8 Color Blue 2020-07-06 05:00:01 2020-07-09 11:11:11
2 8 Color Green 2020-07-09 11:11:11 9999-12-31 23:59:59
This means that when the product with ID = 8 was created at 2020-07-06 05:00:00, 2 attributes were added, and later one of the records was edited to change "Blue" to "Green". Notice that the SysStartTime of the second row is 1 second later because of when the rows were saved.
Now I need to write a query to produce the results below. Basically, it is the attribute values in the different snapshots of time when changes occurred. Times are truncated to the minute.
Start Time End Time Attributes Values
---------------- ---------------- -----------------
2020-07-06 05:00 2020-07-09 11:11 Size = S, Color = Blue
2020-07-09 11:11 NULL Size = S, Color = Green
How can I achieve that? Each product might have different attributes, but the query is for one product at a time.
Below is a solution that formats your data in one query. Performance is not an issue with a small data set of 4 rows (I added a row to your example), but my guess is that this will not be fast for millions of records.
The solution provided here generates different data sets in the form of common table expressions (CTE) and uses some techniques from other StackOverflow answers to remove the seconds and concatenate the row values. Plus a cross apply at the end.
The approach can be described in steps that correspond with the consecutive CTE's / joins:
1. Create a set of attributes for each product.
2. Create a set of period start moments for each product (leaving out the seconds).
3. Combine the attributes for each product with each period and look up the appropriate value.
4. Use some XML functions to format the attribute values in a single row.
5. Use cross apply to fetch the period end.
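The "leaving out the seconds" in step 2 uses the well-known DATEADD/DATEDIFF idiom: count the whole minutes since a fixed reference date, then add them back, which drops everything smaller than a minute. For example:
select dateadd(minute, datediff(minute, 0, cast('2020-07-06 05:00:01' as datetime2(0))), 0);
-- returns 2020-07-06 05:00:00.000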
Full solution:
-- sample data
declare @data table
(
ID int,
ProductId int,
Attribute nvarchar(10),
Value nvarchar(10),
SysStartTime datetime2(0),
SysEndTime datetime2(0)
);
insert into @data (ID, ProductId, Attribute, Value, SysStartTime, SysEndTime) values
(1, 8, 'Size', 'S', '2020-07-06 05:00:00', '9999-12-31 23:59:59'),
(2, 8, 'Color', 'Blue', '2020-07-06 05:00:01', '2020-07-09 11:11:11'),
(2, 8, 'Color', 'Green', '2020-07-09 11:11:11', '9999-12-31 23:59:59'),
(2, 8, 'Weight', 'Light', '2020-07-10 10:11:12', '9999-12-31 23:59:59'); -- additional data to have extra attribute not available from start
-- solution
with prodAttrib as -- attributes per product
(
select d.ProductId, d.Attribute
from @data d
group by d.ProductId, d.Attribute
),
prodPeriod as -- periods per product
(
select d.ProductId,
dateadd(minute, datediff(minute, 0, d.SysStartTime), 0) as 'SysStartTimeNS' -- start time No Seconds
from @data d
group by ProductId, dateadd(minute, datediff(minute, 0, d.SysStartTime), 0)
),
prodResult as -- attribute value per period per product
(
select pp.ProductId,
convert(nvarchar(16), pp.SysStartTimeNS, 120) as 'FromDateTime',
convert(nvarchar(16), coalesce(pe.SysEndTime, '9999-12-31 23:59:59'), 120) as 'ToDateTime',
pa.Attribute,
av.Value
from prodPeriod pp
join prodAttrib pa
on pa.ProductId = pp.ProductId
outer apply ( select top 1 d.Value
from @data d
where d.ProductId = pp.ProductId
and d.Attribute = pa.Attribute
and dateadd(minute, datediff(minute, 0, d.SysStartTime), 0) <= pp.SysStartTimeNS
order by d.SysStartTime desc ) av -- attribute values per product
outer apply ( select top 1 dateadd(second, -1, d.SysStartTime) as 'SysEndTime'
from @data d
where d.ProductId = pp.ProductId
and dateadd(minute, datediff(minute, 0, d.SysStartTime), 0) > pp.SysStartTimeNS
order by d.SysStartTime ) pe -- period end
),
prodResultFormat as -- concatenate attribute values per period
(
select pp.ProductId,
convert(nvarchar(16), pp.SysStartTimeNS, 120) as 'FromDateTime',
(
select pr.Attribute + ' = ' + coalesce(pr.Value,'') + ', ' as [text()]
from prodResult pr
where pr.ProductId = pp.ProductId
and pr.FromDateTime = convert(nvarchar(16), pp.SysStartTimeNS, 120)
order by pr.Attribute
for xml path('')
) as 'Attributes'
from prodPeriod pp
)
select prf.ProductId,
prf.FromDateTime,
x.ToDateTime,
left(prf.Attributes, len(prf.Attributes)-1) as 'Attributes'
from prodResultFormat prf
cross apply ( select top 1 pr.ToDateTime
from prodResult pr
where pr.ProductId = prf.ProductId
and pr.FromDateTime = prf.FromDateTime ) x
order by prf.ProductId, prf.FromDateTime;
Result for extended example data:
ProductId FromDateTime ToDateTime Attributes
----------- ---------------- ---------------- ----------------------------------------
8 2020-07-06 05:00 2020-07-09 11:11 Color = Blue, Size = S, Weight =
8 2020-07-09 11:11 2020-07-10 10:11 Color = Green, Size = S, Weight =
8 2020-07-10 10:11 9999-12-31 23:59 Color = Green, Size = S, Weight = Light
P.S. Replace x.ToDateTime with case when x.ToDateTime = '9999-12-31 23:59' then NULL else x.ToDateTime end as 'ToDateTime' if you really need the NULL values.
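Put together, the final SELECT with that substitution would look something like this (same CTEs as above, only the ToDateTime expression changed):
select prf.ProductId,
       prf.FromDateTime,
       case when x.ToDateTime = '9999-12-31 23:59'
            then NULL
            else x.ToDateTime end as 'ToDateTime',
       left(prf.Attributes, len(prf.Attributes)-1) as 'Attributes'
from prodResultFormat prf
cross apply ( select top 1 pr.ToDateTime
              from prodResult pr
              where pr.ProductId = prf.ProductId
              and pr.FromDateTime = prf.FromDateTime ) x
order by prf.ProductId, prf.FromDateTime;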
This is not a homework question.
I'm trying to take the count of t-shirts in an order and see which price range the shirts fall into, depending on how many have been ordered.
My initial thought (I am brand new at this) was to check another table: if the count is greater than the first price range's maximum, keep looking at the next range until it isn't.
printing_range_max printing_price_by_range
15 4
24 3
33 2
So for example here, if the order count is 30 shirts they would be $2 each.
When I look into how to do that, it seems most people use BETWEEN or IF and hard-code the ranges instead of looking them up in another table. I imagine in a business setting it's best to keep the ranges in their own table so they can be changed more easily. Is there a good/built-in way to do this, or should I just write it with a BETWEEN clause or IF statements?
EDIT:
SQL Server 2014
Let's say we have this table:
DECLARE @priceRanges TABLE(printing_range_max tinyint, printing_price_by_range tinyint);
INSERT @priceRanges VALUES (15, 4), (24, 3), (33, 2);
You can create a table with ranges that represent the correct price. Below is how you would do this in pre-2012 and post-2012 systems:
DECLARE @priceRanges TABLE(printing_range_max tinyint, printing_price_by_range tinyint);
INSERT @priceRanges VALUES (15, 4), (24, 3), (33, 2);
-- post-2012 using LAG
WITH pricerange AS
(
SELECT
printing_range_min = LAG(printing_range_max, 1, 0) OVER (ORDER BY printing_range_max),
printing_range_max,
printing_price_by_range
FROM @priceRanges
)
SELECT * FROM pricerange;
-- pre-2012 using ROW_NUMBER and a self-join
WITH prices AS
(
SELECT
rn = ROW_NUMBER() OVER (ORDER BY printing_range_max),
printing_range_max,
printing_price_by_range
FROM @priceRanges
),
pricerange As
(
SELECT
printing_range_min = ISNULL(p2.printing_range_max, 0),
printing_range_max = p1.printing_range_max,
p1.printing_price_by_range
FROM prices p1
LEFT JOIN prices p2 ON p1.rn = p2.rn+1
)
SELECT * FROM pricerange;
Both queries return:
printing_range_min printing_range_max printing_price_by_range
------------------ ------------------ -----------------------
0 15 4
15 24 3
24 33 2
Now that you have that you can use BETWEEN for your join. Here's the full solution:
-- Sample data
DECLARE @priceRanges TABLE
(
printing_range_max tinyint,
printing_price_by_range tinyint
-- if you're on 2014+
,INDEX ix_xxx NONCLUSTERED(printing_range_max, printing_price_by_range)
-- note: second column should be an INCLUDE but not supported in table variables
);
DECLARE @orders TABLE
(
orderid int identity,
ordercount int
-- if you're on 2014+
,INDEX ix_xxy NONCLUSTERED(orderid, ordercount)
-- note: second column should be an INCLUDE but not supported in table variables
);
INSERT @priceRanges VALUES (15, 4), (24, 3), (33, 2);
INSERT @orders(ordercount) VALUES (10), (20), (25), (30);
-- Solution:
WITH pricerange AS
(
SELECT
printing_range_min = LAG(printing_range_max, 1, 0) OVER (ORDER BY printing_range_max),
printing_range_max,
printing_price_by_range
FROM @priceRanges
)
SELECT
o.orderid,
o.ordercount,
--p.printing_range_min,
--p.printing_range_max
p.printing_price_by_range
FROM pricerange p
JOIN @orders o ON o.ordercount BETWEEN printing_range_min AND printing_range_max
Results:
orderid ordercount printing_price_by_range
----------- ----------- -----------------------
1 10 4
2 20 3
3 25 2
4 30 2
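As a side note, if you prefer not to derive the lower bound at all, you can also look up the first range whose maximum covers the order count with TOP (1); here is a sketch against the same @orders and @priceRanges variables. It returns the same prices for this sample data, and keeps orders that exceed the largest range with a NULL price instead of dropping them.
SELECT
    o.orderid,
    o.ordercount,
    p.printing_price_by_range
FROM @orders o
OUTER APPLY
(
    -- cheapest applicable range: the first one whose max covers the order count
    SELECT TOP (1) pr.printing_price_by_range
    FROM @priceRanges pr
    WHERE pr.printing_range_max >= o.ordercount
    ORDER BY pr.printing_range_max
) p;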
I have the rows below, and I want to access the prior row's value and divide it by the current row's value. For every row I need to calculate the Vi value, where Vi equals the previous Vi-1 divided by the current row's value, which means that:
Given the table
Table T
id value out
1 100
2 200
3 10
4 50
I want to generate these values
V1 = 100
V2= 100/200 = 0.5
V3 = 0.5/10 = 0.05
V4 = 0.05/50 = 0.001
So at the end I want the following output:
id value out
1 100 100
2 200 0.5
3 10 0.05
4 50 0.001
I tried using the aggregate function SUM with OVER(), but I do not know how to solve this problem, as I need to divide and not sum the values:
SELECT id, value, SUM(value) OVER(ORDER BY id ROWS BETWEEN
1 PRECEDING AND 1 PRECEDING ) / value as out
FROM T
Sample data:
CREATE TABLE t(
id INT,
value INT
);
INSERT INTO t VALUES
(1, 100), (2, 200), (3, 10), (4, 50);
Unfortunately, SQL Server does not have a PRODUCT aggregate, but it is simple to use a recursive CTE. The performance should not be bad if id is indexed.
DECLARE @T table (id int identity(1,1) primary key, value int)
INSERT @T VALUES (100), (200), (10), (50)
;WITH cte AS
(
SELECT id, value, CAST(value AS decimal(20,4)) AS out FROM @T WHERE id = 1
UNION ALL SELECT T.id, T.value, CAST(cte.out / T.value AS decimal(20,4)) FROM cte INNER JOIN @T T ON cte.id = T.id - 1
)
SELECT * FROM cte
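As an aside, since V(n) = value(1) / (value(2) * ... * value(n)) = value(1)^2 / (value(1) * ... * value(n)), the same output can be produced without recursion by turning EXP(SUM(LOG(...))) into a running product. A sketch against the same @T variable, assuming SQL Server 2012+ for FIRST_VALUE and the windowed SUM, and assuming all values are positive (LOG requires it); this is float math, so small rounding errors can creep in:
;WITH v AS
(
    SELECT id, value,
           CAST(value AS float) AS fvalue
    FROM @T
)
SELECT id, value,
       -- value(1)^2 divided by the running product of value(1..n)
       CAST( SQUARE(FIRST_VALUE(fvalue) OVER (ORDER BY id))
             / EXP(SUM(LOG(fvalue)) OVER (ORDER BY id
                   ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW))
             AS decimal(20,4)) AS [out]
FROM v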
I was answering another question and ran into a strange outcome - the output of a product aggregate (without CLR) was different when used in a SELECT vs UPDATE.
This is simplified from the original question to minimally reproduce the problem:
GroupKey RowIndex A
----------- ----------- -----------
25 1 5
25 2 6
25 3 NULL
26 1 3
26 2 4
26 3 NULL
The goal is for each group key to update the A column of each row with a RowIndex = 3 to the product of the A columns of each row with RowIndex IN (1, 2), so this would produce the following changes:
GroupKey RowIndex A
----------- ----------- -----------
25 3 30
26 3 12
So this is the code I used:
UPDATE T SET
A = Products.Product
FROM @Table T
INNER JOIN (
SELECT
GroupKey,
EXP(SUM(LOG(A))) AS Product
FROM @Table
WHERE RowIndex IN (1, 2)
GROUP BY
GroupKey
) Products
ON Products.GroupKey = T.GroupKey
WHERE T.RowIndex = 3
SELECT * FROM @Table WHERE RowIndex = 3
Which then produced the off-by-one results:
GroupKey RowIndex A
----------- ----------- -----------
25 3 29
26 3 12
If I just run the sub-query, I see the correct values.
GroupKey Product
----------- ----------------------
25 30
26 12
Here's the full script to make it easy to play with. I can't figure out where the off-by-one is coming from.
DECLARE @Table TABLE (GroupKey INT, RowIndex INT, A INT)
INSERT @Table VALUES (25, 1, 5), (25, 2, 6), (25, 3, NULL), (26, 1, 3), (26, 2, 4), (26, 3, NULL)
SELECT * FROM @Table
SELECT
GroupKey,
EXP(SUM(LOG(A))) AS Product
FROM @Table
WHERE RowIndex IN (1, 2)
GROUP BY
GroupKey
UPDATE T SET
A = Products.Product
FROM @Table T
INNER JOIN (
SELECT
GroupKey,
EXP(SUM(LOG(A))) AS Product
FROM @Table
WHERE RowIndex IN (1, 2)
GROUP BY
GroupKey
) Products
ON Products.GroupKey = T.GroupKey
WHERE T.RowIndex = 3
SELECT * FROM @Table WHERE RowIndex = 3
Here are some references I came across:
Non-CLR Aggregate: http://michaeljswart.com/2011/03/the-aggregate-function-product/
Original question: Set one row fields as a multiplication of 2 others
I'd say that this cute "PRODUCT" aggregate is inherently unreliable if you want to work with ints: EXP and LOG are only defined for the float type, so rounding errors creep in.
Why they're not consistently appearing, I couldn't say, except to suggest that different queries may cause changes in evaluation orders.
As a simpler example of how this can go wrong:
select CAST(EXP(LOG(5)) as int)
Can produce 4. EXP and LOG together will produce a value that is just less than 5, but of course when converting to int, SQL Server always truncates rather than applying any rounding.
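If you do want to keep using this trick with ints, a common workaround (not a true fix, and it assumes the exact product is a whole number) is to round the float result before converting it back, so the truncation above can no longer knock the value down by one. A sketch applied to the UPDATE from the question:
UPDATE T SET
    A = Products.Product
FROM @Table T
INNER JOIN (
    SELECT
        GroupKey,
        -- round the float product to the nearest whole number before the int conversion
        CAST(ROUND(EXP(SUM(LOG(A))), 0) AS INT) AS Product
    FROM @Table
    WHERE RowIndex IN (1, 2)
    GROUP BY
        GroupKey
) Products
    ON Products.GroupKey = T.GroupKey
WHERE T.RowIndex = 3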
I have a problem with a query.
This is the data (ordered by Timestamp):
Data
ID Value Timestamp
1 0 2001-1-1
2 0 2002-1-1
3 1 2003-1-1
4 1 2004-1-1
5 0 2005-1-1
6 2 2006-1-1
7 2 2007-1-1
8 2 2008-1-1
I need to extract the distinct values and the first occurrence date of each. The exception here is that I need to group them only if they are not interrupted by a different value in that timeframe.
So the data I need is:
ID Value Timestamp
1 0 2001-1-1
3 1 2003-1-1
5 0 2005-1-1
6 2 2006-1-1
I've made this work with a complicated query, but I am sure there is an easier way to do it; I just can't think of it. Could anyone help?
This is what I started with; I could probably work from it. It is a query that should locate when a value changes.
> SELECT * FROM Data d1 join Data d2 ON d1.Timestamp < d2.Timestamp and
> d1.Value <> d2.Value
It could probably be done with a good use of the ROW_NUMBER clause, but I can't manage it.
Sample data:
declare @T table (ID int, Value int, Timestamp date)
insert into @T(ID, Value, Timestamp) values
(1, 0, '20010101'),
(2, 0, '20020101'),
(3, 1, '20030101'),
(4, 1, '20040101'),
(5, 0, '20050101'),
(6, 2, '20060101'),
(7, 2, '20070101'),
(8, 2, '20080101')
Query:
;With OrderedValues as (
select *,ROW_NUMBER() OVER (ORDER By TimeStamp) as rn --TODO - specific columns better than *
from @T
), Firsts as (
select
ov1.* --TODO - specific columns better than *
from
OrderedValues ov1
left join
OrderedValues ov2
on
ov1.Value = ov2.Value and
ov1.rn = ov2.rn + 1
where
ov2.ID is null
)
select * --TODO - specific columns better than *
from Firsts
I didn't rely on the ID values being sequential and without gaps. If they are, you can omit OrderedValues (using the table and ID in place of OrderedValues and rn). The Firsts CTE simply finds rows where there isn't an immediately preceding row with the same Value.
Result:
ID Value Timestamp rn
----------- ----------- ---------- --------------------
1 0 2001-01-01 1
3 1 2003-01-01 3
5 0 2005-01-01 5
6 2 2006-01-01 6
You can order by rn if you need the results in this specific order.
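For completeness: on SQL Server 2012 or later the same "first row of each run" logic can be written with LAG instead of the ROW_NUMBER self-join. Below is a sketch against the same @T sample data; a row is kept when its Value differs from the previous row's Value, or when there is no previous row.
;With Flagged as (
    select *, LAG(Value) OVER (ORDER BY Timestamp) as prevValue --TODO - specific columns better than *
    from @T
)
select ID, Value, Timestamp
from Flagged
where prevValue is null or prevValue <> Value
order by Timestamp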