Snowflake's LAG function bug when current row's value is NULL

I am using the LAG function to get the "previous" price of an item, and it works fine except when the current price is NULL, in which case it returns the very last price in the partition:
SELECT s.id as sale_id,
COALESCE(s.property_id, c.property_id) as c_property_id,
COALESCE(c.sale_date, s.created_at) as sale_date,
c.price,
lag(c.price, 1, NULL) IGNORE NULLS OVER (partition by c_property_id order by sale_date) as previous_price
FROM "CLOSINGS" c FULL OUTER JOIN "SALES" s on c.sale_id = s.id
WHERE c_property_id = xxx
ORDER BY sale_date
And here is what I am getting as a result - please note the 990000 as previous_price on the fourth row
+---------+---------------+------------+---------+----------------+
| SALE_ID | C_PROPERTY_ID | SALE_DATE  | PRICE   | PREVIOUS_PRICE |
+---------+---------------+------------+---------+----------------+
|         | xxx           | 1997-10-06 | 370000  | NULL           |
|         | xxx           | 2000-02-22 | 550000  | 370000         |
|         | xxx           | 2003-09-05 | 675000  | 550000         |
| mmmmmmm | xxx           | 2019-11-26 | NULL    | 990000         |
|         | xxx           | 2019-12-17 | 1100000 | 675000         |
| nnnnnnn | xxx           | 2020-06-16 | 990000  | 1100000        |
+---------+---------------+------------+---------+----------------+

I tried to reproduce the issue, but as far as I can see, LAG works as expected. Maybe there is something else going on with your query. Can you share some sample data?
with price_data as (
select * from values
('xxx','1997-10-06',370000 ),
('xxx','2000-02-22',550000 ),
('xxx','2003-09-05',675000 ),
('xxx','2019-11-26',null ),
('xxx','2019-12-17',1100000 ),
('xxx','2020-06-16',990000 ) tmp(C_PROPERTY_ID,SALE_DATE,PRICE))
select C_PROPERTY_ID,SALE_DATE,PRICE,
lag( PRICE, 1, NULL) IGNORE NULLS OVER (partition by c_property_id order by sale_date) as previous_price
FROM price_data
order by sale_date;
+---------------+------------+---------+----------------+
| C_PROPERTY_ID | SALE_DATE | PRICE | PREVIOUS_PRICE |
+---------------+------------+---------+----------------+
| xxx | 1997-10-06 | 370000 | NULL |
| xxx | 2000-02-22 | 550000 | 370000 |
| xxx | 2003-09-05 | 675000 | 550000 |
| xxx | 2019-11-26 | NULL | 675000 |
| xxx | 2019-12-17 | 1100000 | 675000 |
| xxx | 2020-06-16 | 990000 | 1100000 |
+---------------+------------+---------+----------------+
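Since the standalone reproduction behaves correctly, one thing worth checking (this is only a guess, since the full data isn't shown) is whether the FULL OUTER JOIN produces duplicate rows or ties on sale_date for the same property; ties make the window's ORDER BY non-deterministic and can produce exactly this kind of surprise. A quick diagnostic sketch, reusing the column names from the question:
-- count joined rows per property/date; more than one means the window order has ties
SELECT COALESCE(s.property_id, c.property_id) AS c_property_id,
       COALESCE(c.sale_date, s.created_at)    AS sale_date,
       COUNT(*)                               AS rows_per_date
FROM "CLOSINGS" c FULL OUTER JOIN "SALES" s ON c.sale_id = s.id
GROUP BY 1, 2
HAVING COUNT(*) > 1
ORDER BY 1, 2;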

Related

How to find the last value depending on an event value

I have data that looks like this:
Key | DATE       | Event_Date | Event | LastEventDate
--: | :--------- | :--------- | :---- | :------------
1   | 2021-12-01 | NULL       |       |
1   | 2021-12-02 | NULL       |       |
1   | 2021-12-03 | NULL       |       |
1   | 2021-12-04 | NULL       |       |
1   | 2021-12-05 | NULL       |       |
1   | 2021-12-06 | 2021-12-06 | Yes   | 2021-12-06
1   | 2021-12-07 | NULL       |       | 2021-12-06
1   | 2021-12-08 | NULL       |       | 2021-12-06
1   | 2021-12-09 | 2021-12-09 | Yes   | 2021-12-09
1   | 2021-12-10 | NULL       |       | 2021-12-09
1   | 2021-12-11 | NULL       |       | 2021-12-09
1   | 2021-12-12 | NULL       |       | 2021-12-09
1   | 2021-12-13 | 2021-12-13 | Yes   | 2021-12-13
The challenge is to create the LastEventDate column (the last column above).
I tried this
SELECT
Key,
Event_Date,
value_partition,
first_value(Event_Date) over (partition by value_partition order by Key)
FROM (
SELECT
Key,
Event_Date,
sum(case when Event_Date is null then 0 else 1 end) over (order by Key) as value_partition
FROM MyTable
ORDER BY Key ASC
) as A
But I got the error:
The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP, OFFSET or FOR XML is also specified.
Does anyone have an idea how to get the LastEventDate column?
An ORDER BY is only allowed in a subquery when there's a TOP or OFFSET FETCH.
The ORDER BY in the window functions should be on the DATE column.
WITH CTE_DATA AS (
SELECT [Key], [DATE], Event_Date, Event
, SUM(IIF(Event_Date IS NOT NULL,1,0)) OVER (PARTITION BY [Key] ORDER BY [DATE]) AS Rnk
FROM MyTable
)
SELECT [Key], [DATE], Event_Date, Event
, FIRST_VALUE(Event_Date) OVER (PARTITION BY [Key], Rnk ORDER BY [DATE]) AS LastEventDate
FROM CTE_DATA
ORDER BY [Key], [DATE];
GO
Key | DATE | Event_Date | Event | LastEventDate
--: | :--------- | :--------- | :---- | :------------
1 | 2021-12-01 | null | null | null
1 | 2021-12-02 | null | null | null
1 | 2021-12-03 | null | null | null
1 | 2021-12-04 | null | null | null
1 | 2021-12-05 | null | null | null
1 | 2021-12-06 | 2021-12-06 | Yes | 2021-12-06
1 | 2021-12-07 | null | null | 2021-12-06
1 | 2021-12-08 | null | null | 2021-12-06
1 | 2021-12-09 | 2021-12-09 | Yes | 2021-12-09
1 | 2021-12-10 | null | null | 2021-12-09
1 | 2021-12-11 | null | null | 2021-12-09
1 | 2021-12-12 | null | null | 2021-12-09
1 | 2021-12-13 | 2021-12-13 | Yes | 2021-12-13
Test on db<>fiddle here
You can use a running windowed MAX for this
SELECT
t.[Key],
t.DATE,
t.Event_Date,
t.Event,
LastEventDate = MAX(t.Event_Date) OVER (PARTITION BY t.[Key] ORDER BY t.Date ROWS UNBOUNDED PRECEDING)
FROM MyTable t
db<>fiddle
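If you want to sanity-check this without the fiddle, here is a minimal self-contained sketch using a subset of the sample rows from the question (the table variable name is just for illustration). It works because MAX, like any aggregate, ignores NULLs, so the latest non-NULL Event_Date simply carries forward:
DECLARE @MyTable TABLE ([Key] INT, [DATE] DATE, Event_Date DATE, Event VARCHAR(3));
INSERT INTO @MyTable ([Key], [DATE], Event_Date, Event) VALUES
(1, '2021-12-05', NULL, NULL),
(1, '2021-12-06', '2021-12-06', 'Yes'),
(1, '2021-12-07', NULL, NULL),
(1, '2021-12-09', '2021-12-09', 'Yes'),
(1, '2021-12-10', NULL, NULL);

SELECT t.[Key], t.[DATE], t.Event_Date, t.Event,
       -- the running MAX carries the latest non-NULL Event_Date forward within each Key
       LastEventDate = MAX(t.Event_Date) OVER (PARTITION BY t.[Key] ORDER BY t.[DATE] ROWS UNBOUNDED PRECEDING)
FROM @MyTable t
ORDER BY t.[Key], t.[DATE];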

SQL Server - Cumulate awake time for devices from event summary

I'm working on a SQL server (used by BMC) to grab the uptime of some devices.
I've got a query that displays results like this:
| DeviceName | EventDate | EventType |
| ---------- | ----------------------- | ---------- |
| 1 | 2021-02-15 08:06:12.000 | getting up |
| 1 | 2021-02-12 15:07:26.000 | going down |
| 2 | 2021-02-16 08:12:54.000 | getting up |
| 2 | 2021-02-12 15:43:00.000 | going down |
| 3 | 2021-02-15 07:47:42.000 | getting up |
| 3 | 2021-02-12 15:38:41.000 | going down |
| 4 | 2021-02-15 08:10:07.000 | getting up |
| 5 | 2021-02-18 06:41:40.000 | getting up |
| ... | ... | ... |
I would like to get a result that looks like that:
| DeviceName | TotalUpTime (min) |
| ---------- | ----------------- |
| 1 | 16543 |
| 2 | 13639 |
| 3 | 13524 |
| 4 | 19235 |
| 5 | 12347 |
Here is my current query:
SELECT
DeviceName,
EventDate,
EventType
FROM **irrelevant complex SELECT query**
ORDER BY DeviceName, EventDate DESC
Any help would be great!!
Many thx in advance!
SOLUTION:
Ok, here's what worked for me:
SELECT
DeviceName,
DATEDIFF(s, EventDate, EndTime)/60 AS [TotalUpTime (min)]
FROM (
SELECT *,
LEAD(CASE WHEN EventType = 32 THEN EventDate END, 1, GETDATE())
OVER (PARTITION BY DeviceName ORDER BY EventDate) AS EndTime
FROM (
**Irrelevant SELECT query**
) r
) s
WHERE EventType = 16 AND EndTime IS NOT NULL
Many thanks to #Charlieface, whose response helped me a lot.
Hope this helps someone someday, even if it's very specific.
SELECT
    DeviceName,
    SUM(DATEDIFF(ms, EventDate, EndTime) / 60000.0) AS [TotalUpTime (min)]
FROM (
    SELECT *,
        LEAD(CASE WHEN EventType = 'going down' THEN EventDate END, 1, GETDATE())
            OVER (PARTITION BY DeviceName ORDER BY EventDate) AS EndTime
    FROM MyTable   -- the "irrelevant" source query from the question goes here
) s
WHERE EventType = 'getting up' AND EndTime IS NOT NULL
GROUP BY DeviceName
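For reference, a self-contained sketch of the same idea against a few of the sample rows from the question. The table variable, the string event types, and the switch from milliseconds to seconds are my assumptions to keep it runnable (the real data apparently uses numeric event codes 16/32, and DATEDIFF in milliseconds overflows an int once the GETDATE() fallback is far from the event date):
DECLARE @Events TABLE (DeviceName VARCHAR(10), EventDate DATETIME, EventType VARCHAR(20));
INSERT INTO @Events (DeviceName, EventDate, EventType) VALUES
('1', '2021-02-12 15:07:26', 'going down'),
('1', '2021-02-15 08:06:12', 'getting up'),
('2', '2021-02-12 15:43:00', 'going down'),
('2', '2021-02-16 08:12:54', 'getting up');

SELECT DeviceName,
       SUM(DATEDIFF(second, EventDate, EndTime) / 60) AS [TotalUpTime (min)]
FROM (
    SELECT *,
           -- LEAD looks at the next row: the 'going down' timestamp if that is what follows,
           -- NULL otherwise (filtered out below), or GETDATE() when there is no next row at all
           LEAD(CASE WHEN EventType = 'going down' THEN EventDate END, 1, GETDATE())
               OVER (PARTITION BY DeviceName ORDER BY EventDate) AS EndTime
    FROM @Events
) s
WHERE EventType = 'getting up' AND EndTime IS NOT NULL
GROUP BY DeviceName;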

Extract into multiple columns from JSON with PostgreSQL

I have a column item_id that contains data in a JSON-like structure.
+----------+---------------------------------------------------------------------------------------------------------------------------------------+
| id | item_id |
+----------+---------------------------------------------------------------------------------------------------------------------------------------+
| 56711 | {itemID":["0530#2#1974","0538\/2#2#1974","0538\/3#2#1974","0538\/18#2#1974","0539#2#1974"]}" |
| 56712 | {itemID":["0138528#2#4221","0138529#2#4221","0138530#2#4221","0138539#2#4221","0118623\/2#2#4220"]}" |
| 56721 | {itemID":["2704\/1#1#1356"]}" |
| 56722 | {itemID":["0825\/2#2#3349","0840#2#3349","0844\/10#2#3349","0844\/11#2#3349","0844\/13#2#3349","0844\/14#2#3349","0844\/15#2#3349"]}" |
| 57638 | {itemID":["0161\/1#2#3364","0162\/1#2#3364","0163\/2#2#3364"]}" |
| 57638 | {itemID":["109#1#3364","110\/1#1#3364"]}" |
+----------+---------------------------------------------------------------------------------------------------------------------------------------+
I need the last four digits before every comma (if there is one), with the distinct 4-digit values separated into individual columns.
The distinct should happen across id as well, so only one result row with id: 57638 is permitted.
Here is a fiddle with a code draft that is not giving the right answer.
The desired result should look like this:
+----------+-----------+-----------+
| id | item_id_1 | item_id_2 |
+----------+-----------+-----------+
| 56711 | 1974 | |
| 56712 | 4220 | 4221 |
| 56721 | 1356 | |
| 56722 | 3349 | |
| 57638 | 3364 | 3365 |
+----------+-----------+-----------+
There can be quite a lot of 'item_id_%' columns in the results.
with the_table (id, item_id) as (
values
(56711, '{"itemID":["0530#2#1974","0538\/2#2#1974","0538\/3#2#1974","0538\/18#2#1974","0539#2#1974"]}'),
(56712, '{"itemID":["0138528#2#4221","0138529#2#4221","0138530#2#4221","0138539#2#4221","0118623\/2#2#4220"]}'),
(56721, '{"itemID":["2704\/1#1#1356"]}'),
(56722, '{"itemID":["0825\/2#2#3349","0840#2#3349","0844\/10#2#3349","0844\/11#2#3349","0844\/13#2#3349","0844\/14#2#3349","0844\/15#2#3349"]}'),
(57638, '{"itemID":["0161\/1#2#3364","0162\/1#2#3364","0163\/2#2#3364"]}'),
(57638, '{"itemID":["109#1#3365","110\/1#1#3365"]}')
)
select id
,(array_agg(itemid)) [1] itemid_1
,(array_agg(itemid)) [2] itemid_2
from (
select distinct id
,split_part(replace(json_array_elements(item_id::json -> 'itemID')::text, '"', ''), '#', 3)::int itemid
from the_table
order by 1
,2
) t
group by id
DEMO
You can unnest the json array, get the last 4 characters of each element as a number, then do conditional aggregation:
select
id,
max(val) filter(where rn = 1) item_id_1,
max(val) filter(where rn = 2) item_id_2
from (
select
id,
right(val, 4)::int val,
dense_rank() over(partition by id order by right(val, 4)::int) rn
from mytable t
cross join lateral jsonb_array_elements_text(t.item_id -> 'itemID') as x(val)
) t
group by id
You can add more conditional max()s to the outer query to handle more possible values.
Demo on DB Fiddle:
id | item_id_1 | item_id_2
----: | --------: | --------:
56711 | 1974 | null
56712 | 4220 | 4221
56721 | 1356 | null
56722 | 3349 | null
57638 | 3364 | 3365
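Since the number of item_id_% columns isn't fixed, an alternative sketch (assuming, as in the answer above, that the table is called mytable and item_id is jsonb) is to collapse the distinct 4-digit suffixes into a single array column and only pivot them out into separate columns when you really need to:
select id,
       -- one array of distinct 4-digit suffixes per id, however many there are
       array_agg(distinct right(val, 4)::int order by right(val, 4)::int) as item_ids
from mytable t
cross join lateral jsonb_array_elements_text(t.item_id -> 'itemID') as x(val)
group by id;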

T-SQL Add Counter to Grouped Data

I am attempting to produce a query that displays a column which increments (counts) within each group of data. The overall order of the results does not matter, except that the occurrence must count by date (oldest = 1) and reset for each group. Here is an example table, ProductInteractions.
+---------+------------+----------------+------------+
| User ID | Product ID | Date Purchased | Occurrence |
+---------+------------+----------------+------------+
| user15 | b1290 | 1/1/2012 | 1 |
| user15 | b1290 | 1/15/2013 | 2 |
| user15 | b1290 | 3/15/2019 | 3 |
| user15 | a7983 | 7/22/2017 | 1 |
| user2 | a7983 | 12/3/2015 | 1 |
| user2 | a7983 | 5/6/2016 | 2 |
| user3 | a7983 | 3/24/2017 | 1 |
+---------+------------+----------------+------------+
Original data:
+---------+------------+-----------+
| User ID | Product ID | Date |
+---------+------------+-----------+
| user15 | b1290 | 1/1/2012 |
| user2 | a7983 | 5/6/2016 |
| user15 | b1290 | 3/15/2019 |
| user15 | a7983 | 7/22/2017 |
| user2 | a7983 | 12/3/2015 |
| user15 | b1290 | 1/15/2013 |
| user3 | a7983 | 3/24/2017 |
+---------+------------+-----------+
Note in the example above, user15 and product b1290 have 3 interactions. It is important that the first occurrence is tied to the initial interaction date and that subsequent interactions are counted by increasing date.
I believe that the basic format of the query will be:
SELECT [User ID],
[Product ID],
[Date Purchased]
-- Something here utilizing IDENTITY, maybe?
FROM ProductInteractions
GROUP BY [User ID],
[Product ID];
use ROW_NUMBER()
Here is the code to test/validate the script below; replace ProductInteractions with your own table:
declare @ProductInteractions as table([User ID] varchar(50),[Product ID] varchar(50),[Date] datetime)
insert into @ProductInteractions values
('user15' , 'b1290' , '1/1/2012' ),
('user2' , 'a7983' , '5/6/2016' ),
('user15' , 'b1290' , '3/15/2019' ),
('user15' , 'a7983' , '7/22/2017' ),
('user2' , 'a7983' , '12/3/2015' ),
('user15' , 'b1290' , '1/15/2013' ),
('user3' , 'a7983' , '3/24/2017' )
select [User ID],[Product ID],[Date],
row_number() over(partition by [User ID],[Product ID] order by [date]) [occurrence]
from @ProductInteractions order by [Product ID] desc
A simple ROW_NUMBER is perfect for this.
SELECT [User ID],
       [Product ID],
       [Date Purchased],
       ROW_NUMBER() OVER (PARTITION BY [User ID], [Product ID] ORDER BY [Date Purchased]) AS Occurrence
FROM ProductInteractions;
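If the counter ever needs to be filtered on, remember that window functions can't appear directly in a WHERE clause; a hedged usage sketch is to wrap the same ROW_NUMBER in a CTE first:
WITH Numbered AS (
    SELECT [User ID], [Product ID], [Date Purchased],
           ROW_NUMBER() OVER (PARTITION BY [User ID], [Product ID]
                              ORDER BY [Date Purchased]) AS Occurrence
    FROM ProductInteractions
)
SELECT [User ID], [Product ID], [Date Purchased], Occurrence
FROM Numbered
WHERE Occurrence = 1;  -- e.g. keep only each user/product pair's first interaction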

SQL Server - recursive references (loop, join, insert?)

I would appreciate if you could give me any hints regarding the fastest solution of the following SQL Server challenge:
Let's say I have a table with DATE, CLIENT and his several characteristics in other columns. I need to calculate COLUMN_1 and COLUMN_2 but:
COLUMN_1 uses the client's characteristics as of current DATE and as of previous DATE and COLUMN_1 value from the previous DATE (recursive referencing)
COLUMN_2 additionally uses COLUMN_1 value as of current date (therefore I would like to refer to its final value, not the particular 'case when' that implements the column logic)
How do I replicate this logic most efficiently in SQL Server?
I was thinking about a loop that goes over DATE and, for each DATE, joins the previous DATE and calculates first COLUMN_1, then COLUMN_2 (but how do I make sure that the values in COLUMN_1 are accessible for COLUMN_2?)
Regards,
Bart
Without a specific example we will not be able to tell you which solution would be the most efficient, especially when you are looking for a solution you describe as recursive. You might not need a fully recursive solution if you can use window functions instead.
In SQL Server 2012+ you have access to lead() and lag(), which you can use to get the previous and next values of a column based on a partition and order.
select
client
, date
, nextdate = lead(date) over (partition by client order by date)
, prevdate = lag(date) over (partition by client order by date)
, column1 = 'do stuff with lead/lag'
, column2 = 'do stuff with lead/lag'
from t
rextester example: http://rextester.com/FFHU71709
returns:
+--------+------------+------------+------------+------------------------+------------------------+
| client | date | nextdate | prevdate | column1 | column2 |
+--------+------------+------------+------------+------------------------+------------------------+
| 1 | 2017-01-01 | 2017-01-02 | NULL | do stuff with lead/lag | do stuff with lead/lag |
| 1 | 2017-01-02 | 2017-01-03 | 2017-01-01 | do stuff with lead/lag | do stuff with lead/lag |
| 1 | 2017-01-03 | NULL | 2017-01-02 | do stuff with lead/lag | do stuff with lead/lag |
| 2 | 2017-01-02 | 2017-01-04 | NULL | do stuff with lead/lag | do stuff with lead/lag |
| 2 | 2017-01-04 | 2017-01-06 | 2017-01-02 | do stuff with lead/lag | do stuff with lead/lag |
| 2 | 2017-01-06 | NULL | 2017-01-04 | do stuff with lead/lag | do stuff with lead/lag |
+--------+------------+------------+------------+------------------------+------------------------+
One way to simulate lead/lag prior to sql server 2012 is with outer apply()
select
client
, date
, nextdate
, prevdate
, column1 = 'do stuff with lead/lag'
, column2 = 'do stuff with lead/lag'
from t
outer apply (
select top 1 nextdate = i.date
from t i
where i.client = t.client
and i.date > t.date
order by i.date asc
) n
outer apply (
select top 1 prevdate = i.date
from t i
where i.client = t.client
and i.date < t.date
order by i.date desc
) p
rextester demo: http://rextester.com/GGS1299
returns:
+--------+------------+------------+------------+---------------------------------+---------------------------------+
| client | date | nextdate | prevdate | column1 | column2 |
+--------+------------+------------+------------+---------------------------------+---------------------------------+
| 1 | 2017-01-01 | 2017-01-02 | NULL | do stuff with nextdate/prevdate | do stuff with nextdate/prevdate |
| 1 | 2017-01-02 | 2017-01-03 | 2017-01-01 | do stuff with nextdate/prevdate | do stuff with nextdate/prevdate |
| 1 | 2017-01-03 | NULL | 2017-01-02 | do stuff with nextdate/prevdate | do stuff with nextdate/prevdate |
| 2 | 2017-01-02 | 2017-01-04 | NULL | do stuff with nextdate/prevdate | do stuff with nextdate/prevdate |
| 2 | 2017-01-04 | 2017-01-06 | 2017-01-02 | do stuff with nextdate/prevdate | do stuff with nextdate/prevdate |
| 2 | 2017-01-06 | NULL | 2017-01-04 | do stuff with nextdate/prevdate | do stuff with nextdate/prevdate |
+--------+------------+------------+------------+---------------------------------+---------------------------------+
For solutions that absolutely require recursion, then you probably need to use a recursive cte.
;with cte as (
-- non recursive cte to add `nextdate` for recursive join
select
t.client
, t.date
, nextdate = x.date
from t
outer apply (
select top 1 i.date
from t i
where i.client = t.client
and i.date > t.date
order by i.date asc
) x
)
, r_cte as (
--anchor rows / starting rows
select
client
, date
, nextdate
, prevDate = convert(date, null)
, column1 = convert(varchar(64),null)
, column2 = convert(varchar(64),null)
from cte t
where not exists (
select 1
from cte as i
where i.client = t.client
and i.date < t.date
)
union all
--recursion starts here
select
c.client
, c.date
, c.nextdate
, prevDate = p.date
, column1 = convert(varchar(64),'do recursive stuff with p.column1')
, column2 = convert(varchar(64),'do recursive stuff with p.column2')
from cte c
inner join r_cte p
on c.client = p.client
and c.date = p.nextdate
)
select *
from r_cte
rextester demo: http://rextester.com/LKH38243
returns:
+--------+------------+------------+------------+-----------------------------------+-----------------------------------+
| client | date | nextdate | prevdate | column1 | column2 |
+--------+------------+------------+------------+-----------------------------------+-----------------------------------+
| 1 | 2017-01-01 | 2017-01-02 | NULL | NULL | NULL |
| 2 | 2017-01-02 | 2017-01-04 | NULL | NULL | NULL |
| 2 | 2017-01-04 | 2017-01-06 | 2017-01-02 | do recursive stuff with p.column1 | do recursive stuff with p.column2 |
| 2 | 2017-01-06 | NULL | 2017-01-04 | do recursive stuff with p.column1 | do recursive stuff with p.column2 |
| 1 | 2017-01-02 | 2017-01-03 | 2017-01-01 | do recursive stuff with p.column1 | do recursive stuff with p.column2 |
| 1 | 2017-01-03 | NULL | 2017-01-02 | do recursive stuff with p.column1 | do recursive stuff with p.column2 |
+--------+------------+------------+------------+-----------------------------------+-----------------------------------+
Reference
Recursive Queries Using Common Table Expressions (cte)
If using SQL Server 2012 or later, look at the LAG & LEAD functions.
For example, if you want to use the previous row's value in conjunction with the current row's value, use LAG like this:
DECLARE @T TABLE (DateCol DATETIME, StringCol VARCHAR(10))
INSERT INTO @T (DateCol, StringCol) VALUES ('2017-01-01','A'), ('2017-01-02','B'), ('2017-01-03','C'), ('2017-01-04','D'), ('2017-01-05','E')
SELECT DateCol, StringCol, PreviousRowStringcol = LAG(StringCol,1,NULL) OVER (ORDER BY DateCol) FROM @T
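And, reusing the same @T table, a LEAD sketch for the symmetric case where you want the next row's value instead:
SELECT DateCol, StringCol, NextRowStringcol = LEAD(StringCol,1,NULL) OVER (ORDER BY DateCol) FROM @T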
