Snowflake Query for Latest Snapshot From Bitemporal Data - snowflake-cloud-data-platform

Given a table with bitemporal modeling, where each row carries two dates: (i) the date that the data applies to, and (ii) the datetime at which the fact became known:
City    Temp  Date        As_of_Datetime
------  ----  ----------  -------------------
Boston  32    2022/07/01  2022/06/28 13:23:00
Boston  31    2022/07/01  2022/06/29 13:23:00
Miami   74    2022/07/01  2022/06/28 13:23:00
Miami   75    2022/07/01  2022/06/29 13:23:00
What Snowflake query will give the latest snapshot of the data for each date, based on the most recent As_of_Datetime?
The expected result would be
City Temp Date
Boston 31 2022/07/01
Miami 75 2022/07/01
I tried using the last_value function
select City, Date, last_value(Temp) over (partition by City, Date order by As_of_Datetime) as Temp
from temperature_table
order by City, Date
but that produced duplicate rows where the same last value is repeated:
Boston 31 2022/07/01
Boston 31 2022/07/01
Miami 75 2022/07/01
Miami 75 2022/07/01
Ideally there should only be 1 row returned for each (City, Date) combo.
Thank you in advance for your consideration and response.

This can be achieved with QUALIFY and ROW_NUMBER, partitioned by City and Date and sorted by As_of_DateTime descending, so that only the most recent row in each partition is kept:
SELECT *
FROM tab
QUALIFY ROW_NUMBER() OVER(PARTITION BY City, Date ORDER BY As_of_DateTime DESC) = 1
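For engines without QUALIFY, the same idea can be written with a subquery. The following is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for Snowflake (an assumption made so the example is runnable), and the alias `rn` is a name chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE temperature_table "
    "(City TEXT, Temp INTEGER, Date TEXT, As_of_Datetime TEXT)"
)
conn.executemany(
    "INSERT INTO temperature_table VALUES (?, ?, ?, ?)",
    [
        ("Boston", 32, "2022-07-01", "2022-06-28 13:23:00"),
        ("Boston", 31, "2022-07-01", "2022-06-29 13:23:00"),
        ("Miami", 74, "2022-07-01", "2022-06-28 13:23:00"),
        ("Miami", 75, "2022-07-01", "2022-06-29 13:23:00"),
    ],
)

# Number the rows of each (City, Date) partition from newest to oldest,
# then keep only row 1. This is what QUALIFY ... = 1 does in one step.
query = """
SELECT City, Temp, Date
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (
               PARTITION BY City, Date
               ORDER BY As_of_Datetime DESC
           ) AS rn
    FROM temperature_table AS t
) AS ranked
WHERE rn = 1
ORDER BY City, Date
"""
latest = conn.execute(query).fetchall()
for row in latest:
    print(row)
```

Unlike last_value, which returns a value for every input row, filtering on the row number drops the older versions, so each (City, Date) pair appears once.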

Related

How to convert dynamic column into rows in snowflake

INPUT
product     country  brand    01-01-2022  02-01-2022  03-01-2022
dairy milk  India    Cadbury  10          20          30
OUTPUT
product     country  brand    DATE        VALUE
dairy milk  India    Cadbury  01-01-2022  10
dairy milk  India    Cadbury  02-01-2022  20
dairy milk  India    Cadbury  03-01-2022  30
INPUT
product     country  brand    01-01-2022  02-01-2022  03-01-2022  04-01-2022
dairy milk  India    Cadbury  10          20          30          40
OUTPUT
product     country  brand    DATE        VALUE
dairy milk  India    Cadbury  01-01-2022  10
dairy milk  India    Cadbury  02-01-2022  20
dairy milk  India    Cadbury  03-01-2022  30
dairy milk  India    Cadbury  04-01-2022  40
Here's a dynamic solution using object_construct and lateral flatten.
First create some test data.
create or replace view data as
SELECT
*
FROM
(VALUES (
'dairy milk',
'India',
'Cadbury',
10,
20,
30))
as v (PRODUCT,
COUNTRY,
BRAND,
"01-01-2022",
"02-01-2022",
"03-01-2022")
;
I assume your date columns are quoted, although not shown as such in your question, as otherwise they are invalid column names.
with
-- First create an object containing the contents of each row
ro as (select
PRODUCT, COUNTRY, BRAND,
object_construct(*) row_obj
from data)
-- Lateral flatten the object, and filter out the columns that you don't want to pivot.
Select
PRODUCT, COUNTRY, BRAND,
to_date( -- Note: Removing the " from quoted column names
replace(key,'"')
,'DD-MM-YYYY') as "DATE", value
from ro, lateral flatten (input => row_obj)
where key not in ('PRODUCT','COUNTRY','BRAND');
I've assumed that you want the DATE column in the result to be returned as a date type, hence the need for replace and to_date to convert the column names you are un-pivoting. If you are fine with the DATE column as varchar type, you can replace
to_date(
replace(key,'"')
,'DD-MM-YYYY') as "DATE"
with
key as "DATE"
Note: your column name DATE is a keyword and therefore needs to be quoted. I think it's good practice to avoid using SQL keywords as object names.
One approach uses a UNION ALL:
SELECT product, country, brand, '2022-01-01' AS DATE, "01-01-2022" AS VALUE FROM yourTable
UNION ALL
SELECT product, country, brand, '2022-01-02', "02-01-2022" FROM yourTable
UNION ALL
SELECT product, country, brand, '2022-01-03', "03-01-2022" FROM yourTable;
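The UNION ALL form is static, but its branches can be generated from the table's column list when the date columns are not known in advance. This is a rough sketch under assumptions: sqlite3 stands in for Snowflake, `PRAGMA table_info` is SQLite-specific (in Snowflake the column list would have to come from the information schema instead), and the helper `to_iso` is a name made up for this example.

```python
import sqlite3

FIXED = ("product", "country", "brand")  # columns that are not unpivoted

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourTable (product TEXT, country TEXT, brand TEXT, "
    '"01-01-2022" INTEGER, "02-01-2022" INTEGER, "03-01-2022" INTEGER)'
)
conn.execute(
    "INSERT INTO yourTable VALUES ('dairy milk', 'India', 'Cadbury', 10, 20, 30)"
)

# Discover the date-named columns at run time (row[1] is the column name).
columns = [row[1] for row in conn.execute("PRAGMA table_info(yourTable)")]
date_columns = [c for c in columns if c not in FIXED]


def to_iso(ddmmyyyy):
    """Turn a 'DD-MM-YYYY' column name into an ISO 'YYYY-MM-DD' string."""
    day, month, year = ddmmyyyy.split("-")
    return f"{year}-{month}-{day}"


# One SELECT per date column, glued together with UNION ALL.
branches = [
    "SELECT product, country, brand, "
    f"'{to_iso(c)}' AS \"DATE\", \"{c}\" AS \"VALUE\" FROM yourTable"
    for c in date_columns
]
query = "\nUNION ALL\n".join(branches) + "\nORDER BY 4"

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Because the column names are spliced into the SQL text, this is only appropriate when they come from the database catalog rather than from user input.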

T-SQL Grouping Dynamic Date Ranges

Using MS SQL Server 2019
I have a set of recurring donation records. Each has a First Gift Date and a Last Gift Date associated with it. I need to add a GroupedID to these rows so that I can get the full date range, from the earliest FirstGiftDate to the latest LastGiftDate, as long as there is no break of more than 45 days between the recurring donations.
For example, Bob is a long-time supporter. His card has expired multiple times, and he has always started new gifts within 45 days. All of his gifts need to be given a single GroupedID. On the other hand, June has been donating and her card expires. She doesn't give again for 6 months, but then continues to give after her card expires. The first of June's gifts should get its own GroupedID, and the second and third should be grouped together. The grouping count should restart with each donor.
My initial attempt was to join the donation table back to itself, aliased as D2. This did give me an indicator of which gifts were within the 45-day mark, but I can't wrap my head around how to then link them. My only thought was to use LEAD and LAG to analyze each scenario and work out the combinations of LEAD and LAG values needed to catch each one, but that doesn't seem as reliable or as scalable as I'd like it to be.
I appreciate any help anyone can give.
My code:
SELECT #Donation.*, D2.*
FROM #Donation
LEFT JOIN #Donation D2 ON #Donation.RecurringGiftID <> D2.RecurringGiftID
AND #Donation.Donor = D2.Donor
AND ABS(DATEDIFF(DAY, #Donation.FirstGiftDate, D2.LastGiftDate)) < 45
Table structure and sample data:
CREATE TABLE #Donation
(
RecurringGiftID int,
Donor nvarchar(25),
FirstGiftDate date,
LastGiftDate date
)
INSERT INTO #Donation
VALUES (1, 'Bob', '2017-02-15', '2018-07-01'),
(15, 'Bob', '2018-08-05', '2019-04-01'),
(32, 'Bob', '2019-04-15', '2022-06-15'),
(54, 'June', '2015-05-01', '2016-05-01'),
(96, 'June', '2016-12-15', '2018-02-01'),
(120, 'June', '2018-03-04', '2020-07-01')
Desired output:
RecurringGiftId  Donor  FirstGiftDate  LastGiftDate  GroupedID
1                Bob    2017-02-15     2018-07-01    1
15               Bob    2018-08-05     2019-04-01    1
32               Bob    2019-04-15     2022-06-15    1
54               June   2015-05-01     2016-05-01    1
96               June   2016-12-15     2018-02-01    2
120              June   2018-03-04     2020-07-01    2
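Use LAG() to flag each row whose FirstGiftDate is more than 45 days after the previous row's LastGiftDate, then take a cumulative sum of those flags per donor to form the required GroupedID. The default of '19000101' makes the first row of every donor count as the start of a group, so numbering begins at 1.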
select *,
GroupedID = sum(g) over (partition by Donor order by FirstGiftDate)
from
(
select *,
g = case when datediff(day,
lag(LastGiftDate, 1, '19000101') over (partition by Donor
order by FirstGiftDate),
FirstGiftDate)
> 45
then 1
else 0
end
from #Donation
) d
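The flag-then-cumulative-sum pattern can be checked against the sample data. The sketch below makes a few assumptions: it runs on SQLite through Python's sqlite3 module rather than on SQL Server, so `julianday` subtraction replaces DATEDIFF, and the temp table `#Donation` becomes an ordinary table named `Donation`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Donation "
    "(RecurringGiftID INTEGER, Donor TEXT, FirstGiftDate TEXT, LastGiftDate TEXT)"
)
conn.executemany(
    "INSERT INTO Donation VALUES (?, ?, ?, ?)",
    [
        (1, "Bob", "2017-02-15", "2018-07-01"),
        (15, "Bob", "2018-08-05", "2019-04-01"),
        (32, "Bob", "2019-04-15", "2022-06-15"),
        (54, "June", "2015-05-01", "2016-05-01"),
        (96, "June", "2016-12-15", "2018-02-01"),
        (120, "June", "2018-03-04", "2020-07-01"),
    ],
)

# Inner query: g = 1 when the gap since the previous gift exceeds 45 days.
# Outer query: the running total of g per donor is the group number.
query = """
SELECT RecurringGiftID, Donor,
       SUM(g) OVER (PARTITION BY Donor ORDER BY FirstGiftDate) AS GroupedID
FROM (
    SELECT *,
           CASE WHEN julianday(FirstGiftDate)
                     - julianday(LAG(LastGiftDate, 1, '1900-01-01')
                                 OVER (PARTITION BY Donor ORDER BY FirstGiftDate))
                     > 45
                THEN 1 ELSE 0 END AS g
    FROM Donation
) AS d
ORDER BY Donor, FirstGiftDate
"""
groups = conn.execute(query).fetchall()
for row in groups:
    print(row)
```

Once each row has a GroupedID, the full range per group follows from MIN(FirstGiftDate) and MAX(LastGiftDate) grouped by Donor and GroupedID.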

Records for the last 7 days

Using Snowflake, I want to get the daily stock for the last 7 days.
The columns in this table are product_ID, date, and quantity.
My desired output should look like the following:
product_ID DATE Quantity
82471 2022-07-14 40
82471 2022-07-15 35
82471 2022-07-16 34
82471 2022-07-17 50
82471 2022-07-18 53
82471 2022-07-19 51
82471 2022-07-20 40
Any ideas how to reach this output? :)
I don't know if you need to aggregate your data, but based on your input, something like this may work:
select product_id, date, Quantity
from mytable
where "DATE" > dateadd( 'days', -7, current_date )
order by product_id, date DESC;
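To see which rows a seven-day window keeps, here is a small sketch with a fixed reference date in place of current_date. It assumes SQLite (through Python's sqlite3) instead of Snowflake, so `date(:today, '-7 days')` stands in for dateadd, and the two rows before 2022-07-14 are hypothetical values added only to show that they are filtered out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE mytable (product_id INTEGER, "date" TEXT, quantity INTEGER)')
stock = [
    (82471, "2022-07-12", 44),  # hypothetical older row
    (82471, "2022-07-13", 42),  # hypothetical older row
    (82471, "2022-07-14", 40),
    (82471, "2022-07-15", 35),
    (82471, "2022-07-16", 34),
    (82471, "2022-07-17", 50),
    (82471, "2022-07-18", 53),
    (82471, "2022-07-19", 51),
    (82471, "2022-07-20", 40),
]
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", stock)

today = "2022-07-20"  # fixed reference date so the result is reproducible

# Strictly greater than (today - 7 days) keeps today plus the six days before it.
query = """
SELECT product_id, "date", quantity
FROM mytable
WHERE "date" > date(:today, '-7 days') AND "date" <= :today
ORDER BY product_id, "date"
"""
rows = conn.execute(query, {"today": today}).fetchall()
for row in rows:
    print(row)
```

With the strict comparison the window covers seven calendar days including the reference date; using `>=` instead would return eight.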

SQL query to calculate Throughput based "subtracting" two Select statements using Group By

I'm trying to formulate a SQL query to calculate the difference in the number of people "arriving" and "departing" grouped by City and Date.
TravelerID ArrivalDate DepartureDate City
1 2015-10-01 2015-10-03 New York
2 2015-10-02 2015-10-03 New York
3 2015-10-02 2015-10-04 Chicago
4 2015-10-01 2015-10-02 Chicago
I'm hoping to get a table that looks like
NumOfTravelers Date City
1 2015-10-01 New York
1 2015-10-02 New York
-2 2015-10-03 New York
1 2015-10-01 Chicago
0 2015-10-02 Chicago
-1 2015-10-04 Chicago
A positive number for NumOfTravelers means that more people arrived in that city on that particular date. A negative number for NumOfTravelers means that more people left that city on that particular date.
In trying to break down this SQL query, I've tried
SELECT COUNT(TravelerID) as NumTravelersArriving, ArrivalDate, City FROM TravelTable GROUP BY ArrivalDate, City;
SELECT COUNT(TravelerID) as NumTravelersDeparting, DepartureDate, City FROM TravelTable GROUP BY DepartureDate, City;
I'm trying to get "NumTravelersArriving" - "NumTravelersDeparting" into a column that represents "traveler throughput" grouped by City and Date.
I've been so stumped on this. I'm using SQL Server, and having a frustrating time using Table aliases and Column aliases.
Try this:
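-- COALESCE covers city/date pairs that appear on only one side of the FULL JOIN
SELECT COALESCE(a.NumOfTravelers, 0) + COALESCE(b.NumOfTravelers, 0) AS NumOfTravelers,
COALESCE(a.Date, b.Date) AS Date,
COALESCE(a.City, b.City) AS City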
FROM (
SELECT City, ArrivalDate As Date, COUNT(TravelerID) As NumOfTravelers
FROM TravelTable
GROUP BY City, ArrivalDate
) a
FULL JOIN (
SELECT City, DepartureDate As Date, COUNT(TravelerID) * -1 As NumOfTravelers
FROM TravelTable
GROUP BY City, DepartureDate
) b ON b.City = a.City AND b.Date = a.Date
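An alternative that avoids the join is to treat every arrival as +1 and every departure as -1, stack the two sets with UNION ALL, and sum per city and date. This is a sketch of that swapped-in technique, not the query above; it assumes SQLite through Python's sqlite3 instead of SQL Server, and the names `delta` and `events` are chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TravelTable "
    "(TravelerID INTEGER, ArrivalDate TEXT, DepartureDate TEXT, City TEXT)"
)
conn.executemany(
    "INSERT INTO TravelTable VALUES (?, ?, ?, ?)",
    [
        (1, "2015-10-01", "2015-10-03", "New York"),
        (2, "2015-10-02", "2015-10-03", "New York"),
        (3, "2015-10-02", "2015-10-04", "Chicago"),
        (4, "2015-10-01", "2015-10-02", "Chicago"),
    ],
)

# Each traveler contributes one +1 event (arrival) and one -1 event (departure).
query = """
SELECT SUM(delta) AS NumOfTravelers, Date, City
FROM (
    SELECT City, ArrivalDate AS Date, 1 AS delta FROM TravelTable
    UNION ALL
    SELECT City, DepartureDate AS Date, -1 AS delta FROM TravelTable
) AS events
GROUP BY City, Date
ORDER BY City, Date
"""
throughput = conn.execute(query).fetchall()
for row in throughput:
    print(row)
```

A date with both arrivals and departures, such as Chicago on 2015-10-02, nets out to a single row without any NULL handling.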

how to get record for which given date falls between two dates of same column in PostgreSql

My table holds data such as employee code, designation code, and promotion date. I want to find what an employee's designation was on a given date. For example:
EmpCode DesignationCode PromotionDate
101 50 2010-01-25
101 10 2014-01-01
101 11 2015-01-01
102 10 2009-10-01
103 15 2015-01-01
Now, if I check the designation as of 2014-02-01, it should give the following result:
EmpCode DesignationCode PromotionDate
101 10 2014-01-01
102 10 2009-10-01
Can anyone please tell me what query I should write?
Thanks in Advance.
You can try:
SELECT DISTINCT ON (EmpCode) EmpCode, DesignationCode, PromotionDate
FROM mytable
WHERE PromotionDate <= '2014-02-01'
ORDER BY EmpCode, PromotionDate DESC
The query first filters out any records with a PromotionDate later than the given date, i.e. '2014-02-01'.
Using DISTINCT ON (EmpCode) we get one row per EmpCode. This is the one having the most recent PromotionDate (this is achieved by placing PromotionDate DESC in the ORDER BY clause).
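DISTINCT ON is specific to PostgreSQL. As a rough, portable equivalent, the same "latest row on or before the date" logic can be sketched with ROW_NUMBER. The example assumes SQLite through Python's sqlite3 rather than PostgreSQL, and the alias `rn` is a name chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (EmpCode INTEGER, DesignationCode INTEGER, PromotionDate TEXT)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        (101, 50, "2010-01-25"),
        (101, 10, "2014-01-01"),
        (101, 11, "2015-01-01"),
        (102, 10, "2009-10-01"),
        (103, 15, "2015-01-01"),
    ],
)

# Filter to promotions on or before the date, rank newest first per employee,
# and keep the top-ranked row.
query = """
SELECT EmpCode, DesignationCode, PromotionDate
FROM (
    SELECT m.*,
           ROW_NUMBER() OVER (
               PARTITION BY EmpCode
               ORDER BY PromotionDate DESC
           ) AS rn
    FROM mytable AS m
    WHERE PromotionDate <= :as_of
) AS ranked
WHERE rn = 1
ORDER BY EmpCode
"""
designations = conn.execute(query, {"as_of": "2014-02-01"}).fetchall()
for row in designations:
    print(row)
```

Employee 103 does not appear because the only promotion on record is after the requested date.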
