How to perform SQL join on lag date in Snowflake?


I have a table with charges and user IDs and a separate table with plan changes. I need to join these tables in Snowflake in such a way that I know what the user's plan ID was at the time of each charge.
Charge Table:
Charge_ID,User_ID,Inserted_At
AAAA ,1234 ,2022-01-01 15:00:00
AAAA ,1234 ,2022-02-01 15:00:00
BBBB ,5678 ,2022-01-05 18:00:00
BBBB ,5678 ,2022-02-07 18:00:00
Plan Table:
User_ID,Plan_ID,Inserted_At
1234 ,100 ,2022-01-01 13:00:00
1234 ,099 ,2022-01-01 14:00:00
1234 ,101 ,2022-01-18 13:00:00
5678 ,050 ,2022-01-04 13:00:00
5678 ,051 ,2022-02-08 13:00:00
Result:
Charge_ID,User_ID,Charge_Inserted_At ,Plan_ID
AAAA ,1234 ,2022-01-01 15:00:00 ,099
AAAA ,1234 ,2022-02-01 15:00:00 ,101
BBBB ,5678 ,2022-01-05 18:00:00 ,050
BBBB ,5678 ,2022-02-07 18:00:00 ,050
Do I need to cross join and lag, if so how can I accomplish that? Is there a way to accomplish this that's optimally efficient beyond some type of cross join?

This is an alternate approach to using a window function. It's a subquery to find the closest date with a self-join. This approach can perform better than a window function, especially if there's a known limitation in date ranges. For example, this is looking for the closest change backward in time, but if it's known that the change would never be more than N days in the past, adding that to the subquery can help performance a lot.
create table CHARGE(Charge_ID string, User_ID int, Inserted_At timestamp_ntz);
insert into charge (Charge_ID,User_ID,Inserted_At) values
('AAAA' ,1234 ,'2022-01-01 15:00:00'),
('AAAA' ,1234 ,'2022-02-01 15:00:00'),
('BBBB' ,5678 ,'2022-01-05 18:00:00'),
('BBBB' ,5678 ,'2022-02-07 18:00:00')
;
create table PLAN(User_ID int,Plan_ID int ,Inserted_At timestamp_ntz);
insert into PLAN(User_ID,Plan_ID ,Inserted_At) values
(1234 ,100 ,'2022-01-01 13:00:00'),
(1234 ,099 ,'2022-01-01 14:00:00'),
(1234 ,101 ,'2022-01-18 13:00:00'),
(5678 ,050 ,'2022-01-04 13:00:00'),
(5678 ,051 ,'2022-02-08 13:00:00');
with X as
(
select CHARGE_ID
,USER_ID
,INSERTED_AT CHARGE_INSERTED_AT
,(select max(P.INSERTED_AT) from PLAN P
where C.USER_ID = P.USER_ID
and C.INSERTED_AT >= P.INSERTED_AT) LAST_INSERTED_AT
from CHARGE C
)
select X.CHARGE_ID, X.USER_ID, X.CHARGE_INSERTED_AT, P.PLAN_ID
from X left join PLAN P
on X.USER_ID = P.USER_ID and X.LAST_INSERTED_AT = P.INSERTED_AT
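
For anyone who wants to try the subquery approach without a Snowflake account, here is a minimal sketch using Python's built-in sqlite3 module. It is a stand-in under stated assumptions, not Snowflake SQL: timestamps are stored as ISO text, Plan_ID is stored as text to keep the leading zeros, and the 60-day lookback (LOOKBACK_DAYS) is a hypothetical bound that illustrates the "never more than N days in the past" idea mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE charge (charge_id TEXT, user_id INTEGER, inserted_at TEXT);
INSERT INTO charge VALUES
  ('AAAA', 1234, '2022-01-01 15:00:00'),
  ('AAAA', 1234, '2022-02-01 15:00:00'),
  ('BBBB', 5678, '2022-01-05 18:00:00'),
  ('BBBB', 5678, '2022-02-07 18:00:00');
CREATE TABLE plan (user_id INTEGER, plan_id TEXT, inserted_at TEXT);
INSERT INTO plan VALUES
  (1234, '100', '2022-01-01 13:00:00'),
  (1234, '099', '2022-01-01 14:00:00'),
  (1234, '101', '2022-01-18 13:00:00'),
  (5678, '050', '2022-01-04 13:00:00'),
  (5678, '051', '2022-02-08 13:00:00');
""")

LOOKBACK_DAYS = 60  # hypothetical bound: no plan change is older than this

query = """
WITH x AS (
  SELECT c.charge_id, c.user_id, c.inserted_at AS charge_inserted_at,
         -- latest plan change at or before the charge, within the lookback
         (SELECT MAX(p.inserted_at) FROM plan p
           WHERE p.user_id = c.user_id
             AND p.inserted_at <= c.inserted_at
             AND p.inserted_at >= datetime(c.inserted_at, ?)) AS last_inserted_at
  FROM charge c
)
SELECT x.charge_id, x.user_id, x.charge_inserted_at, p.plan_id
FROM x LEFT JOIN plan p
  ON p.user_id = x.user_id AND p.inserted_at = x.last_inserted_at
ORDER BY x.user_id, x.charge_inserted_at
"""
rows = conn.execute(query, (f"-{LOOKBACK_DAYS} days",)).fetchall()
for row in rows:
    print(row)
```

With the sample data this should reproduce the Result table from the question. If the lookback is too short for some charge, that charge simply comes back with a NULL plan, so the bound has to be chosen from what is actually known about the data.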

One way to solve it is to "transpose" the single Inserted_At column of the plan table into a date_from/date_to pair using LAG/LEAD, and then perform a range join:
Pseudocode:
WITH PlanRanges AS (
    SELECT User_ID, Plan_ID, Inserted_At AS start_date,
           LEAD(Inserted_At, 1, '2999-12-31'::TIMESTAMP_NTZ) OVER(PARTITION BY User_ID
                                                                   ORDER BY Inserted_At) AS end_date
    FROM PLAN
)
SELECT c.Charge_ID, c.User_ID, c.Inserted_At AS Charge_Inserted_At, p.Plan_ID
FROM CHARGE c
LEFT JOIN PlanRanges p
    ON c.User_ID = p.User_ID
    AND c.Inserted_At >= p.start_date AND c.Inserted_At < p.end_date;
-- As long as plan timestamps are unique per user, each charge falls into at most one range,
-- so no QUALIFY ROW_NUMBER() filter is needed.

Related

13 Period Calendar 4-4-5 Calendar T-SQL MSSQL

I am trying to create a 13 period calendar in mssql but I am a bit stuck. I am not sure if my approach is the best way to achieve this. I have my base script which can be seen below:
Set DateFirst 1
Declare @Date1 date = '20180101' --startdate should always be start of financial year
Declare @Date2 date = '20181231' --enddate should always be start of financial year
SELECT * INTO #CalendarTable
FROM dbo.CalendarTable(@Date1,@Date2,0,0,0)c
DECLARE @StartDate datetime,@EndDate datetime
SELECT @StartDate=MIN(CASE WHEN [Day]='Monday' THEN [Date] ELSE NULL END),
@EndDate=MAX([Date])
FROM #CalendarTable
;With Period_CTE(PeriodNo,Start,[End])
AS
(SELECT 1,@StartDate,DATEADD(wk,4,@StartDate) -1
UNION ALL
SELECT PeriodNo+1,DATEADD(wk,4,Start),DATEADD(wk,4,[End])
FROM Period_CTE
WHERE DATEADD(wk,4,[End])<=@EndDate
OR PeriodNo+1 <=13
)
select * from Period_CTE
Which gives me this:
PeriodNo Start End
1 2018-01-01 00:00:00.000 2018-01-28 00:00:00.000
2 2018-01-29 00:00:00.000 2018-02-25 00:00:00.000
3 2018-02-26 00:00:00.000 2018-03-25 00:00:00.000
4 2018-03-26 00:00:00.000 2018-04-22 00:00:00.000
5 2018-04-23 00:00:00.000 2018-05-20 00:00:00.000
6 2018-05-21 00:00:00.000 2018-06-17 00:00:00.000
7 2018-06-18 00:00:00.000 2018-07-15 00:00:00.000
8 2018-07-16 00:00:00.000 2018-08-12 00:00:00.000
9 2018-08-13 00:00:00.000 2018-09-09 00:00:00.000
10 2018-09-10 00:00:00.000 2018-10-07 00:00:00.000
11 2018-10-08 00:00:00.000 2018-11-04 00:00:00.000
12 2018-11-05 00:00:00.000 2018-12-02 00:00:00.000
13 2018-12-03 00:00:00.000 2018-12-30 00:00:00.000
The result I am trying to get is
Even if I have to take a different approach I would not mind, as long as the result is the same as the above.
dbo.CalendarTable() is a function that returns the following results. I can share the code if desired.

I'd create a general numbers table as suggested here and add a column Periode13.
The trick to get the tiling is the integer division:
DECLARE @PeriodeSize INT=28; --13 "moon-months" of 28 days each
SELECT TOP 100 (ROW_NUMBER() OVER(ORDER BY (SELECT NULL))-1)/@PeriodeSize
FROM master..spt_values --just a table with many rows to show the principles
You can add this to an existing numbers table with a simple update statement.
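
To make the integer-division tiling concrete outside the database, here is a small Python sketch. The names year_start, PERIOD_DAYS, period_number and week_of_period are hypothetical, and the start date 2018-01-01 is taken from the question; it only illustrates the arithmetic, it is not a replacement for a stored calendar table.

```python
from datetime import date, timedelta

PERIOD_DAYS = 28                 # 13 periods of 4 weeks each
year_start = date(2018, 1, 1)    # assumed first day of the financial year


def period_number(d):
    # Integer division of the day offset tiles the year into 28-day blocks.
    return (d - year_start).days // PERIOD_DAYS + 1


def week_of_period(d):
    # Position inside the block, again tiled by integer division.
    return ((d - year_start).days % PERIOD_DAYS) // 7 + 1


# Collect the first and last day of each of the 13 periods.
periods = {}
for offset in range(13 * PERIOD_DAYS):
    d = year_start + timedelta(days=offset)
    start, end = periods.get(period_number(d), (d, d))
    periods[period_number(d)] = (min(start, d), max(end, d))

for number, (start, end) in sorted(periods.items()):
    print(number, start, end)
```

Note that 13 x 28 = 364 days, so the last day of a normal year (and the last two of a leap year) falls into a 14th block and needs an explicit rule.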
UPDATE: A fully working example (using the logic linked above)
DECLARE @RunningNumbers TABLE (Number INT NOT NULL
,CalendarDate DATE NOT NULL
,CalendarYear INT NOT NULL
,CalendarMonth INT NOT NULL
,CalendarDay INT NOT NULL
,CalendarWeek INT NOT NULL
,CalendarYearDay INT NOT NULL
,CalendarWeekDay INT NOT NULL);
DECLARE @CountEntries INT = 100000;
DECLARE @StartNumber INT = 0;
WITH E1(N) AS(SELECT 1 FROM(VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))t(N)), --10 ^ 1
E2(N) AS(SELECT 1 FROM E1 a CROSS JOIN E1 b), -- 10 ^ 2 = 100 rows
E4(N) AS(SELECT 1 FROM E2 a CROSS JOIN E2 b), -- 10 ^ 4 = 10,000 rows
E8(N) AS(SELECT 1 FROM E4 a CROSS JOIN E4 b), -- 10 ^ 8 = 100,000,000 rows
CteTally AS
(
SELECT TOP(ISNULL(@CountEntries,1000000)) ROW_NUMBER() OVER(ORDER BY(SELECT NULL)) -1 + ISNULL(@StartNumber,0) As Nmbr
FROM E8
)
INSERT INTO @RunningNumbers
SELECT CteTally.Nmbr,CalendarDate.d,CalendarExt.*
FROM CteTally
CROSS APPLY
(
SELECT DATEADD(DAY,CteTally.Nmbr,{ts'2018-01-01 00:00:00'})
) AS CalendarDate(d)
CROSS APPLY
(
SELECT YEAR(CalendarDate.d) AS CalendarYear
,MONTH(CalendarDate.d) AS CalendarMonth
,DAY(CalendarDate.d) AS CalendarDay
,DATEPART(WEEK,CalendarDate.d) AS CalendarWeek
,DATEPART(DAYOFYEAR,CalendarDate.d) AS CalendarYearDay
,DATEPART(WEEKDAY,CalendarDate.d) AS CalendarWeekDay
) AS CalendarExt;
--The mockup table from above is now filled and can be queried
WITH AddPeriode AS
(
SELECT Number/28 +1 AS PeriodNumber
,CalendarDate
,CalendarWeek
,r.CalendarDay
,r.CalendarMonth
,r.CalendarWeekDay
,r.CalendarYear
,r.CalendarYearDay
FROM @RunningNumbers AS r
)
SELECT TOP 100 p.*
,(SELECT MIN(CalendarDate) FROM AddPeriode AS x WHERE x.PeriodNumber=p.PeriodNumber) AS [Start]
,(SELECT MAX(CalendarDate) FROM AddPeriode AS x WHERE x.PeriodNumber=p.PeriodNumber) AS [End]
,(SELECT MIN(CalendarDate) FROM AddPeriode AS x WHERE x.PeriodNumber=p.PeriodNumber AND x.CalendarWeek=p.CalendarWeek) AS [wkStart]
,(SELECT MAX(CalendarDate) FROM AddPeriode AS x WHERE x.PeriodNumber=p.PeriodNumber AND x.CalendarWeek=p.CalendarWeek) AS [wkEnd]
,(ROW_NUMBER() OVER(PARTITION BY PeriodNumber ORDER BY CalendarDate)-1)/7+1 AS WeekOfPeriode
FROM AddPeriode AS p
ORDER BY CalendarDate
Try it out...
Hint: Do not use a VIEW or iTVF for this.
This is non-changing data and much better placed in a physically stored table with appropriate indexes.

Not abundantly sure external links are accepted here, but I wrote an article that pulls off a 5-4-4 'Crop Year' fiscal year with all the code. Feel free to use all the code in these articles.
SQL Server Calendar Table
SQL Server Calendar Table: Fiscal Years

Finding max date difference on a single column

In the example below (Table A), there are entries for four different IDs (1, 2, 3, 4) with their respective statuses and times. I want to find the ID that took the longest time to change its status from Started to Completed. In the example below it is ID = 4. The table currently holds approximately a million records, so I am looking for an efficient query to retrieve this.
Table A
ID Status Date(YYYY-DD-MM HH:MM:SS)
1. Started 2017-01-01 01:00:00
1. Completed 2017-01-01 02:00:00
2. Started 2017-10-02 03:00:00
2. Completed 2017-10-02 05:00:00
3. Started 2017-15-03 06:00:00
3. Completed 2017-15-03 09:00:00
4. Started 2017-22-04 10:00:00
4. Completed 2017-22-04 15:00:00
Thanks!
Bruce

You can query as below:
Select top 1 with ties y1.Id from #yourDate y1
join #yourDate y2
On y1.Id = y2.Id
and y1.[STatus] = 'Started'
and y2.[STatus] = 'Completed'
order by Row_number() over(order by datediff(mi,y1.[Date], y2.[date]) desc)

SELECT
started.ID, timediff(completed.date, started.date) as elapsed_time
FROM TABLE_A as started
INNER JOIN TABLE_A as completed ON (completed.ID=started.ID AND completed.status='Completed')
WHERE started.status='Started'
ORDER BY elapsed_time desc
Be sure there is an index on TABLE_A for the columns ID, date.

I haven't run this SQL, but it may solve your problem.
select a.id, max(DATEDIFF(SECOND, a.date, b.date)) as elapsed_seconds from TableA as a
join TableA as b on a.id = b.id
where a.status = 'Started' and b.status = 'Completed'
group by a.id

Here's a way with a correlated sub-query. Just uncomment the TOP 1 to get ID 4 in this case. This is based off your comments that there is only 1 "started" record, but could be multiple "completed" records for each ID.
declare @TableA table (ID int, Status varchar(64), Date datetime)
insert into @TableA
values
(1,'Started','2017-01-01 01:00:00'),
(1,'Completed','2017-01-01 02:00:00'),
(2,'Started','2017-02-10 03:00:00'),
(2,'Completed','2017-02-10 05:00:00'),
(3,'Started','2017-03-15 06:00:00'),
(3,'Completed','2017-03-15 09:00:00'),
(4,'Started','2017-04-22 10:00:00'),
(4,'Completed','2017-04-22 15:00:00')
select --top 1
s.ID
,datediff(minute,s.Date,e.EndDate) as TimeDifference
from @TableA s
inner join(
select
ID
,max(Date) as EndDate
from @TableA
where Status = 'Completed'
group by ID) e on e.ID = s.ID
where
s.Status = 'Started'
order by
datediff(minute,s.Date,e.EndDate) desc
RETURNS
+----+----------------+
| ID | TimeDifference |
+----+----------------+
| 4 | 300 |
| 3 | 180 |
| 2 | 120 |
| 1 | 60 |
+----+----------------+

If you know that 'started' will always be the earliest point in time for each ID and the last 'completed' record you are considering will always be the latest point in time for each ID, the following should have good performance for a large number of records:
SELECT TOP 1
id
, DATEDIFF(s, MIN([Date]), MAX([date])) AS Elapsed
FROM #TableA
GROUP BY ID
ORDER BY DATEDIFF(s, MIN([Date]), MAX([date])) DESC
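
The same MIN/MAX-per-ID idea can be checked quickly in plain Python. This is only a sketch of the logic under the assumption stated above (the Started row is the earliest and the last Completed row is the latest timestamp for each ID); the sample rows are the ones from the earlier answer, and the variable names are made up for illustration.

```python
from datetime import datetime

rows = [
    (1, "Started", "2017-01-01 01:00:00"),
    (1, "Completed", "2017-01-01 02:00:00"),
    (2, "Started", "2017-02-10 03:00:00"),
    (2, "Completed", "2017-02-10 05:00:00"),
    (3, "Started", "2017-03-15 06:00:00"),
    (3, "Completed", "2017-03-15 09:00:00"),
    (4, "Started", "2017-04-22 10:00:00"),
    (4, "Completed", "2017-04-22 15:00:00"),
]

# One pass: track the earliest and latest timestamp per ID (the GROUP BY step).
bounds = {}
for id_, _status, ts in rows:
    t = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
    lo, hi = bounds.get(id_, (t, t))
    bounds[id_] = (min(lo, t), max(hi, t))

# Elapsed seconds per ID, then the ID with the largest value (the TOP 1 step).
elapsed = {id_: (hi - lo).total_seconds() for id_, (lo, hi) in bounds.items()}
slowest = max(elapsed, key=elapsed.get)
print(slowest, elapsed[slowest])
```

The status column is never inspected here, which is exactly why the approach is cheap and also why it only holds when the ordering assumption is true.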

SQL Server: split a column value into two separate columns based on another column

I have a table Access:
logId empid empname inout tim
----------------------------------------------------
230361 0100 XYZ 0 2015-08-01 10:00:03
230362 0106 XYZ 0 2015-08-01 10:30:00
230363 0100 XYZ 1 2015-08-01 12:00:00
which records each employee's in time and out time. inout=0 means in and inout=1 means out
I would like to create a table as below from this table
empid empname timIn timOut
-------------------------------------------------------------
0100 XYZ 2015-08-01 10:00:03 2015-08-01 12:00:00
0106 XYZ 2015-08-01 10:30:00
First I tried case as follows:
select
empid, empname, inout,
case when inout = 0 then tim end as 'timIn',
case when inout = 1 then tim end as 'timout'
But NULLs were a problem the result was
0100 xyz 2015-08-01 10:00:03 NULL
0100 xyz NULL 2015-08-01 12:00:00
Second I tried PIVOT, but the problem was I had to use an aggregate function. I need all in-out times and cannot take an aggregate of that.
Is there any alternative way to get the desired result?

You can use APPLY, in conjunction with TOP 1 and the correct ORDER BY to get the next out event after each in event
SELECT i.empID,
i.empname,
TimeIn = i.tim,
TimeOut = o.tim
FROM Access AS i
OUTER APPLY
( SELECT TOP 1 tim
FROM Access AS o
WHERE o.EmpID = i.EmpID
AND o.InOut = 1
AND o.tim > i.tim
ORDER BY o.Tim
) AS o
WHERE i.InOut = 0;
So you are simply selecting all in events (table aliased i) and then, for each in event, finding the next out event. If there is none, the TimeOut column will be NULL.
FULL WORKING EXAMPLE
DECLARE @Access TABLE (LogID INT NOT NULL, EmpID CHAR(4) NOT NULL, empname VARCHAR(50), InOut BIT NOT NULL, tim DATETIME2 NOT NULL);
INSERT @Access (LogID, EmpID, empname, InOut, tim)
VALUES
(230361, '0100', 'XYZ', 0, '2015-08-01 10:00:03'),
(230362, '0106', 'XYZ', 0, '2015-08-01 10:30:00'),
(230363, '0100', 'XYZ', 1, '2015-08-01 12:00:00');
SELECT i.empID,
i.empname,
TimeIn = i.tim,
TimeOut = o.tim
FROM @Access AS i
OUTER APPLY
( SELECT TOP 1 tim
FROM @Access AS o
WHERE o.EmpID = i.EmpID
AND o.InOut = 1
AND o.tim > i.tim
ORDER BY o.Tim
) AS o
WHERE i.InOut = 0;
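
The "next out event after each in event" lookup can also be sketched in plain Python, which may help when checking the expected output by hand. This is an illustration of the logic only, not what SQL Server does internally; the names events, outs and pairs are hypothetical, and timestamps are kept as ISO strings so that string order matches time order.

```python
from bisect import bisect_right
from collections import defaultdict

# (logId, empid, empname, inout, tim) rows from the question
events = [
    (230361, "0100", "XYZ", 0, "2015-08-01 10:00:03"),
    (230362, "0106", "XYZ", 0, "2015-08-01 10:30:00"),
    (230363, "0100", "XYZ", 1, "2015-08-01 12:00:00"),
]

# Sorted out times per employee (the inner ORDER BY o.tim).
outs = defaultdict(list)
for _log_id, emp, _name, inout, tim in events:
    if inout == 1:
        outs[emp].append(tim)
for times in outs.values():
    times.sort()

# For every in event, take the first out time strictly after it (TOP 1),
# or None when there is no such row (the OUTER part of OUTER APPLY).
pairs = []
for _log_id, emp, name, inout, tim in events:
    if inout == 0:
        times = outs.get(emp, [])
        i = bisect_right(times, tim)
        pairs.append((emp, name, tim, times[i] if i < len(times) else None))

for pair in pairs:
    print(pair)
```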

So what I think you want to do is find the first time out after each time in. The following SQL should do that.
Select
    inTimes.empid,
    inTimes.empname,
    inTimes.tim as timein,
    (select top 1 outTimes.tim
     from my_table outTimes
     where outTimes.inout = 1 and
           outTimes.empid = inTimes.empid and
           outTimes.tim > inTimes.tim
     order by outTimes.tim asc
    ) as timeout
from my_table inTimes
where inTimes.inout = 0
The critical bit here is the order by ... asc together with the top 1. This is what gives you the next time in the table.

Update: based on a comment that the query should cover all dates and not just the last date, the updated query simply includes a new date column:
select empid,empname,d,[0] as [timin],[1] as [timOut]
from
(select empid,empname, cast(tim as DATE)as d,inout,tim from tbl) s
pivot
(max(tim) for inout in ([0],[1]))p
updated fiddle link http://sqlfiddle.com/#!6/f1bc7/1
try PIVOT query like this:
select empid,empname,[0] as [timin],[1] as [timOut]
from
(select empid,empname,inout,tim from tbl) s
pivot
(max(tim) for inout in ([0],[1]))p
added SQL fiddle link http://sqlfiddle.com/#!6/6c3bf/1

Find the min and max dates between multiple sets of dates

Given the following set of data, I'm trying to determine how I can select the start and end dates of the combined date ranges, when they intersect with each other.
For instance, for PartNum 115678, I would want my final result set to display the date ranges 2012/01/01 - 2012/01/19 (rows 1, 2 and 4 combined since the date ranges intersect) and 2012/02/01 - 2012/03/28 (row 3 since this ones does not intersect with the range found previously).
For PartNum 213275, I would want to select the only row for that part, 2012/12/01 - 2013/01/01.
Edit:
I'm currently playing around with the following SQL statement, but it's not giving me exactly what I need.
with DistinctRanges as (
select distinct
ha1.PartNum "PartNum",
ha1.StartDt "StartDt",
ha2.EndDt "EndDt"
from dbo.HoldsAll ha1
inner join dbo.HoldsAll ha2
on ha1.PartNum = ha2.PartNum
where
ha1.StartDt <= ha2.EndDt
and ha2.StartDt <= ha1.EndDt
)
select
PartNum,
StartDt,
EndDt
from DistinctRanges
Here are the results of the query shown in the edit:

You're better off having a persisted Calendar table, but if you don't, the CTE below will create it ad hoc. The TOP(36600) part yields roughly 100 years' worth of dates, counted from the pivot date ('20100101') on the same line.
SQL Fiddle
MS SQL Server 2008 Schema Setup:
create table data (
partnum int,
startdt datetime,
enddt datetime,
age int
);
insert data select
12345, '20120101', '20120116', 15 union all select
12345, '20120115', '20120116', 1 union all select
12345, '20120201', '20120328', 56 union all select
12345, '20120113', '20120119', 6 union all select
88872, '20120201', '20130113', 43;
Query 1:
with Calendar(thedate) as (
select TOP(36600) dateadd(d,row_number() over (order by 1/0),'20100101')
from sys.columns a
cross join sys.columns b
cross join sys.columns c
), tmp as (
select partnum, thedate,
grouper = datediff(d, dense_rank() over (partition by partnum order by thedate), thedate)
from Calendar c
join data d on d.startdt <= c.thedate and c.thedate <= d.enddt
)
select partnum, min(thedate) startdt, max(thedate) enddt
from tmp
group by partnum, grouper
order by partnum, startdt
Results:
| PARTNUM | STARTDT | ENDDT |
------------------------------------------------------------------------------
| 12345 | January, 01 2012 00:00:00+0000 | January, 19 2012 00:00:00+0000 |
| 12345 | February, 01 2012 00:00:00+0000 | March, 28 2012 00:00:00+0000 |
| 88872 | February, 01 2012 00:00:00+0000 | January, 13 2013 00:00:00+0000 |
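
To see why grouping on "date minus rank" merges the overlapping ranges, here is a small Python sketch of the same idea. It expands each range into days, de-duplicates them (the role dense_rank plays in the query), and then groups consecutive days whose ordinal minus position is constant. The variable names are made up for illustration, and the data is the sample from the schema setup above.

```python
from datetime import date, timedelta
from itertools import groupby

data = [
    (12345, date(2012, 1, 1), date(2012, 1, 16)),
    (12345, date(2012, 1, 15), date(2012, 1, 16)),
    (12345, date(2012, 2, 1), date(2012, 3, 28)),
    (12345, date(2012, 1, 13), date(2012, 1, 19)),
    (88872, date(2012, 2, 1), date(2013, 1, 13)),
]

# Expand every range into the individual days it covers (the Calendar join).
covered = set()
for part, start, end in data:
    for n in range((end - start).days + 1):
        covered.add((part, start + timedelta(days=n)))

merged = []
for part, rows in groupby(sorted(covered), key=lambda r: r[0]):
    days = [d for _, d in rows]
    # Within a run of consecutive days, (day ordinal - position) stays constant,
    # so it works as the "grouper" key; a gap changes the key.
    for _, run in groupby(enumerate(days), key=lambda p: p[1].toordinal() - p[0]):
        run_days = [d for _, d in run]
        merged.append((part, run_days[0], run_days[-1]))

for row in merged:
    print(row)
```

Expanding to days is simple but costs memory proportional to the total number of covered days, which is the same trade-off the calendar join makes.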

Perform date range query from a single column control table (SQL Server)

Let's say I have the following tables
tableA
seq datea
1 2010-01-01
2 2010-02-01
3 2010-03-01
tableb
dateb sthvalue
2010-01-11 AAA
2010-01-12 AAB
2010-02-03 CCC
2010-02-06 CCD
2010-02-10 CCE
2010-03-05 FFF
I want to join the two tables so that each tableb.dateb is matched to the tablea date range it falls within,
i.e. output should be
seq datea dateb sthvalue
1 2010-01-01 2010-01-11 AAA
1 2010-01-01 2010-01-12 AAB
2 2010-02-01 2010-02-03 CCC
2 2010-02-01 2010-02-06 CCD
2 2010-02-01 2010-02-10 CCE
3 2010-03-01 2010-03-05 FFF
Many thanks for your kind help!

Assuming that table A values are always one month apart and set on the 1st of each month, the existing answers will do.
If your table A can contain more variety:
SELECT
*
FROM
TableB b
inner join
TableA a
on
b.dateb >= a.datea
left join
TableA a_nolater
on
a_nolater.datea > a.datea and
b.dateb >= a_nolater.datea
WHERE
a_nolater.seq is null
This joins the two tables together, then attempts to find a "better" join (a row from tablea that occurs later than the currently matching one, and would still be a match for tableb). It only returns rows where it cannot find this "better" join. As such, it finds the latest dated row in tableA that is on or before the date from tableB.
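
The rule "latest tableA row on or before the tableB date" can be sketched in Python with a binary search, which makes the intended matching easy to verify against the expected output. This is only an illustration of the matching rule, not of how SQL Server evaluates the anti-join; dates are ISO strings (so string order equals date order) and the variable names are hypothetical.

```python
from bisect import bisect_right

table_a = [(1, "2010-01-01"), (2, "2010-02-01"), (3, "2010-03-01")]
table_b = [
    ("2010-01-11", "AAA"),
    ("2010-01-12", "AAB"),
    ("2010-02-03", "CCC"),
    ("2010-02-06", "CCD"),
    ("2010-02-10", "CCE"),
    ("2010-03-05", "FFF"),
]

# Sort tableA by date once so each lookup is a binary search.
a_sorted = sorted(table_a, key=lambda r: r[1])
a_dates = [datea for _, datea in a_sorted]

result = []
for dateb, sthvalue in table_b:
    # Index of the last datea that is <= dateb; -1 means no earlier row exists.
    i = bisect_right(a_dates, dateb) - 1
    if i >= 0:
        seq, datea = a_sorted[i]
        result.append((seq, datea, dateb, sthvalue))

for row in result:
    print(row)
```

Rows of tableB that precede every tableA date are dropped, which matches the inner join in the query above.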

I believe what you are asking for is to join on year and month
select
seq,datea,dateb,sthvalue
from
TableA inner join Tableb
on datepart(year,datea) = datepart(year,dateb) and
datepart(month,datea) = datepart(month,dateb)
order by seq,dateb

You can
select
a.seq,
a.datea,
b.dateb,
b.sthvalue
from
tablea a inner join tableb b on (b.dateb >= a.datea and b.dateb < dateadd(month, 1, a.datea))
order by
a.seq, b.sthvalue
