How to join a table to a changelog table? - sql-server

I have two tables related to tickets, DimTickets and DimTicketChangelog, and I want to calculate the time it takes between the CreatedDateTime and when the status was first changed to Done/Released. The DimTickets table is structured like so:
IssueKey   IssueType  Priority  Project  Status                CreatedDateTime
TEAM1-100  Story      High      Team 1   Approved for Release  2020-04-02 16:09:45
TEAM1-101  Task       Medium    Team 1   Done                  2020-04-03 15:38:25
TEAM1-102  Sub-task   Low       Team 1   Done                  2020-04-08 09:03:43
TEAM1-103  Bug        High      Team 1   In Progress           2020-04-13 12:18:56
TEAM1-104  Task       Medium    Team 1   Done                  2020-04-16 11:40:08
TEAM2-100  Task       Medium    Team 2   Done                  2020-04-17 09:06:17
TEAM2-101  Story      Medium    Team 2   Released              2020-04-17 15:55:45
TEAM2-102  Task       Low       Team 2   Done                  2020-04-20 10:12:41
TEAM1-105  Task       High      Team 1   In Progress           2020-04-20 15:24:56
and a DimTicketChangelog that's structured like this:
ChangeLogID  IssueKey   FromStatus  ToStatus  ChangeLogDateTime
1            TEAM1-100  1           2         2019-06-14 15:56:03
2            TEAM1-100  2           3         2019-06-15 12:58:29
3            TEAM2-102  2           4         2019-06-16 17:58:48
4            TEAM1-100  3           5         2019-06-16 20:01:43
5            TEAM1-104  1           3         2019-06-18 10:02:39
6            TEAM1-105  4           5         2019-06-21 18:03:19
7            TEAM1-104  3           5         2019-06-24 22:05:28
8            TEAM2-102  4           6         2019-07-02 08:06:50
9            TEAM2-103  1           4         2019-07-04 11:06:50
Is there a way for me to join to DimTicketChangelog on the first time a ticket is changed from a status < 5 to status 5/6? That way I could create a field that is essentially ChangeLogDateTime - CreatedDateTime, giving the amount of time between the creation of the ticket and when its status was changed to a resolved one.

Something like this should work
Explanation: You'll use the CTE to find all the instances where a ticket crossed from < 5 to 5+ and "rank" them by change log date. You'll then select all records where this ranking = 1, as that is the first instance, sorted by change log date time.
EDIT: I added a date diff to satisfy your second requirement for the time it took. Your sample data is a bit interesting in terms of these dates, though with your real data you should be fine.
;WITH FindFirstChange AS (
    select
        t.IssueKey /*Add whatever other columns you need here*/
        ,t.CreatedDateTime
        ,tcl.ChangeLogDateTime
        ,DATEDIFF(day, t.CreatedDateTime, tcl.ChangeLogDateTime) Diff
        ,ranking = ROW_NUMBER() OVER (PARTITION BY tcl.IssueKey ORDER BY tcl.ChangeLogDateTime ASC)
    FROM DimTickets t
    INNER JOIN DimTicketChangelog tcl ON t.IssueKey = tcl.IssueKey
    WHERE tcl.FromStatus < 5
      AND tcl.ToStatus >= 5
)
SELECT *
FROM FindFirstChange
WHERE ranking = 1;
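For comparison, here is a hedged sketch of the same lookup written with OUTER APPLY instead of the CTE (this is not from the original answer; it assumes the same table and column names as above):

SELECT t.IssueKey
    ,t.CreatedDateTime
    ,fc.ChangeLogDateTime
    ,DATEDIFF(day, t.CreatedDateTime, fc.ChangeLogDateTime) AS Diff
FROM DimTickets t
OUTER APPLY (
    SELECT TOP (1) tcl.ChangeLogDateTime
    FROM DimTicketChangelog tcl
    WHERE tcl.IssueKey = t.IssueKey
      AND tcl.FromStatus < 5 -- first crossing from an unresolved status...
      AND tcl.ToStatus >= 5  -- ...to a resolved one (5/6)
    ORDER BY tcl.ChangeLogDateTime ASC
) fc;

One difference worth noting: OUTER APPLY keeps tickets that never reached status 5/6 (with a NULL ChangeLogDateTime), whereas the INNER JOIN inside the CTE version drops them.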

You have not provided any expected results, and your test data doesn't really cover the criteria you describe. Also, your log table dates are earlier than the CreatedDateTime of each ticket, which makes no sense?! However, see if the following works for you:
with l as (
    select issuekey, changelogdatetime,
           Row_Number() over (partition by issueKey order by ChangeLogID) rn
    from DimTicketChangeLog
    where tostatus in (5,6) and FromStatus < 5
)
select t.*, DateDiff(day, t.createddatetime, l.ChangeLogDateTime) Duration
from DimTickets t
left join l on l.IssueKey = t.IssueKey
           and l.rn = 1 -- keep this filter in the ON clause; putting it in WHERE would turn the left join into an inner join and drop unresolved tickets
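If you want to try either answer locally, a minimal sketch of the DDL plus a few of the sample rows might look like this (the column types are assumptions, not from the question):

CREATE TABLE DimTickets (
    IssueKey VARCHAR(20) PRIMARY KEY,
    IssueType VARCHAR(20),
    Priority VARCHAR(10),
    Project VARCHAR(20),
    Status VARCHAR(30),
    CreatedDateTime DATETIME
);

CREATE TABLE DimTicketChangelog (
    ChangeLogID INT PRIMARY KEY,
    IssueKey VARCHAR(20),
    FromStatus INT,
    ToStatus INT,
    ChangeLogDateTime DATETIME
);

-- a few of the rows from the question's sample data
INSERT INTO DimTickets VALUES
    ('TEAM1-100', 'Story', 'High', 'Team 1', 'Approved for Release', '2020-04-02 16:09:45'),
    ('TEAM1-104', 'Task', 'Medium', 'Team 1', 'Done', '2020-04-16 11:40:08');

INSERT INTO DimTicketChangelog VALUES
    (4, 'TEAM1-100', 3, 5, '2019-06-16 20:01:43'),
    (5, 'TEAM1-104', 1, 3, '2019-06-18 10:02:39'),
    (7, 'TEAM1-104', 3, 5, '2019-06-24 22:05:28');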

Related

SQL: How to count distinct for all hourly periods in a day

I have a table of hotel data like this:
Room_ID  Check_in_time        Check_out_time
123      2021-10-01 01:02:03  2021-10-01 02:03:04
I would like to do a count of how many rooms were checked in during each hour throughout a day (even if the room was checked in for 1 minute during the hour, it still counts), so an output that looks like this:
Time period  Number of rooms
09:00-10:00  10
10:00-11:00  12
..           ..
There are a couple of other 'where' conditions but this is the crux of the problem. I have so far managed to write a query that can count unique room ID by specifying the hourly window:
select count (distinct room_id)
from data
where check_out_time > 9am and check_in_time < 10am
But how do I do this for each of the 24 hourly windows without repeating the same query 24 times? Hopefully something that can be later adapted into half hour intervals, or even minutes. I'm using Sigma in case that matters. Thanks in advance!
In Snowflake, I'd leverage the DATE_TRUNC function. If your dataset is very large, this will likely perform much better than the BETWEEN-style filtering that the OP and other answers are using.
select date_trunc('hour',check_out_time) as check_out_hour
, count (distinct room_id) as cnt
from data
group by 1;
If you needed to parse it out by day and time, you could add that, as well:
select date_trunc('day',check_out_time) as check_out_day
, date_trunc('hour',check_out_time) as check_out_hour
, count (distinct room_id) as cnt
from data
group by 1,2;
For reference:
https://docs.snowflake.com/en/sql-reference/functions/date_trunc.html
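The question also asks about adapting this to half-hour intervals; in Snowflake, TIME_SLICE should slot in where DATE_TRUNC was (a hedged sketch, with the same caveats as the queries above):

select time_slice(check_out_time, 30, 'MINUTE') as check_out_half_hour
, count (distinct room_id) as cnt
from data
group by 1;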
You may try the following:
A recursive CTE is used to generate the possible hours 0-23 (we could also have selected the distinct hours from your existing dataset, but I did not want to assume that every hour was possibly booked, and this may be a less expensive way to get all possible hours in this case). A left join is then used to determine the hours rooms were booked, before aggregating and counting the number of bookings in each hour.
WITH recursive hours(hr) as (
    select 0 as hr
    union all
    select hr + 1 from hours where hr < 23
)
select
    concat(h.hr, ':00-', (h.hr + 1), ':00') as time_period,
    COUNT(DISTINCT r.room_id) as no_rooms
from hours h
left join room_times r on (
        CAST(r.check_in_time AS DATE) = CAST(r.check_out_time AS DATE) AND
        h.hr BETWEEN DATE_PART(hour, r.Check_in_time) AND DATE_PART(hour, r.Check_out_time)
    ) OR (
        CAST(r.check_in_time AS DATE) < CAST(r.check_out_time AS DATE) AND
        (
            h.hr >= DATE_PART(hour, r.Check_in_time) OR
            h.hr <= DATE_PART(hour, r.Check_out_time)
        )
    )
GROUP BY h.hr
order by h.hr
See working db fiddle (using sql server instead) with the same logic and additional data and outputs to assist verification here.
Sample Data:
INSERT INTO room_times
(Room_ID, Check_in_time, Check_out_time)
VALUES
('123', '2021-10-01 01:02:03', '2021-10-01 03:03:04'),
('124', '2021-10-01 15:02:03', '2021-10-02 01:03:04');
Outputs:
time_period  no_rooms
0:00-1:00    1
1:00-2:00    2
2:00-3:00    1
3:00-4:00    1
4:00-5:00    0
5:00-6:00    0
6:00-7:00    0
7:00-8:00    0
8:00-9:00    0
9:00-10:00   0
10:00-11:00  0
11:00-12:00  0
12:00-13:00  0
13:00-14:00  0
14:00-15:00  0
15:00-16:00  1
16:00-17:00  1
17:00-18:00  1
18:00-19:00  1
19:00-20:00  1
20:00-21:00  1
21:00-22:00  1
22:00-23:00  1
23:00-24:00  1
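Since the fiddle runs on SQL Server, here is roughly how the same logic translates there (an untested sketch): T-SQL has no RECURSIVE keyword, uses DATEPART instead of DATE_PART, and the recursion depth can be capped with a MAXRECURSION hint.

WITH hours(hr) as (
    select 0 as hr
    union all
    select hr + 1 from hours where hr < 23
)
select
    concat(h.hr, ':00-', (h.hr + 1), ':00') as time_period,
    COUNT(DISTINCT r.room_id) as no_rooms
from hours h
left join room_times r on (
        CAST(r.check_in_time AS DATE) = CAST(r.check_out_time AS DATE) AND
        h.hr BETWEEN DATEPART(hour, r.Check_in_time) AND DATEPART(hour, r.Check_out_time)
    ) OR (
        CAST(r.check_in_time AS DATE) < CAST(r.check_out_time AS DATE) AND
        (h.hr >= DATEPART(hour, r.Check_in_time) OR h.hr <= DATEPART(hour, r.Check_out_time))
    )
GROUP BY h.hr
order by h.hr
OPTION (MAXRECURSION 100);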
Let me know if this works for you.

BigQuery getting Correlated subqueries error - to get previous event

I have 2 BigQuery tables:
1- Trips
car start end
---------------------------------------------------
1 2019-03-13T17:07:00 2019-03-13T17:17:00
2 2019-03-13T17:07:00 2019-03-13T17:22:00
3 2019-03-13T17:07:00 2019-03-13T17:34:00
4 2019-03-13T17:07:00 2019-03-13T17:12:00
2- Tracking
car created_at status
--------------------------------------
1 2019-03-13T17:01:00 1
1 2019-03-13T17:02:00 1
1 2019-03-13T17:03:00 1
1 2019-03-13T17:04:00 1
1 2019-03-13T17:05:00 2
1 2019-03-13T17:06:00 2
1 2019-03-13T17:18:00 3
1 2019-03-13T17:19:00 3
1 2019-03-13T17:20:00 3
1 2019-03-13T17:21:00 3
1 2019-03-13T17:22:00 3
The tracking table contains the status of a car up until it went on a trip. My objective is to get the status of a car at the moment just before a trip started.
My approach so far was:
select *,
  (select status from tracking
   where tracking.car = trips.car
     and trips.start > tracking.created_at
   order by created_at desc
   limit 1
  ) as previous_status
from trips
But I'm getting the following error:
Correlated subqueries that reference other tables are not supported unless they can be de-correlated, such as by transforming them into an efficient JOIN.
Any clue on how to rewrite the query for BigQuery?
To get the previous event use LAG() OVER() as in:
SELECT actor.login
, type
, LAG(type, 1) OVER(PARTITION BY actor.login ORDER BY created_at) prev_type
FROM `githubarchive.day.20190313`
WHERE actor.login IN ('00aquir', '0123hoang')
LIMIT 100
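Applied to the trips/tracking tables from the question, one way to de-correlate the original query is to JOIN first and then keep only the latest tracking row before each trip's start with ROW_NUMBER(). A hedged sketch (assuming the table names above; `end` needs backticks as it is reserved in BigQuery):

SELECT car, start, `end`, status AS previous_status
FROM (
  SELECT t.car, t.start, t.`end`, tr.status,
         ROW_NUMBER() OVER (PARTITION BY t.car, t.start
                            ORDER BY tr.created_at DESC) AS rn -- rn = 1 is the last status before the trip
  FROM trips t
  JOIN tracking tr
    ON tr.car = t.car
   AND tr.created_at < t.start
)
WHERE rn = 1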

SQL Server: How to get a rolling sum over 3 days for different customers within same table

This is the input table:
Customer_ID Date Amount
1 4/11/2014 20
1 4/13/2014 10
1 4/14/2014 30
1 4/18/2014 25
2 5/15/2014 15
2 6/21/2014 25
2 6/22/2014 35
2 6/23/2014 10
There is information pertaining to multiple customers and I want to get a rolling sum across a 3 day window for each customer.
The solution should be as below:
Customer_ID Date Amount Rolling_3_Day_Sum
1 4/11/2014 20 20
1 4/13/2014 10 30
1 4/14/2014 30 40
1 4/18/2014 25 25
2 5/15/2014 15 15
2 6/21/2014 25 25
2 6/22/2014 35 60
2 6/23/2014 10 70
The biggest issue is that I don't have transactions for every day, which is why the row-number-based partitioning doesn't work.
The closest example I found on SO was:
SQL Query for 7 Day Rolling Average in SQL Server
but even in that case there were transactions made every day, which accommodated the rownumber() based solutions.
The rownumber query is as follows:
select customer_id, Date, Amount,
       Rolling_3_day_sum = CASE WHEN ROW_NUMBER() OVER (partition by customer_id ORDER BY Date) > 2
                                THEN SUM(Amount) OVER (partition by customer_id ORDER BY Date
                                                       ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
                           END
from #tmp_taml9
order by customer_id
I was wondering if there is a way to replace "BETWEEN 2 PRECEDING AND CURRENT ROW" with "BETWEEN [DATE - 2] AND [DATE]".
One option would be to use a calendar table (or something similar) to get the complete range of dates and left join your table with that and use the row_number based solution.
Another option that might work (not sure about performance) would be to use an apply query like this:
select customer_id, Date, Amount, coalesce(Rolling_3_day_sum, Amount) Rolling_3_day_sum
from #tmp_taml9 t1
cross apply (
    select sum(amount) Rolling_3_day_sum
    from #tmp_taml9
    where Customer_ID = t1.Customer_ID
      and datediff(day, date, t1.date) <= 2 -- a 3-day window is the current day plus the 2 preceding days
      and t1.Date >= date
) o
order by customer_id;
I suspect performance might not be great though.
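To make the first option concrete, here is a rough sketch of the calendar-table variant (it assumes a Calendar table with one row per date in a [Date] column, at most one transaction per customer per day, and SQL Server 2012+ for the windowed SUM; all of these are assumptions):

;with spans as (
    select Customer_ID, min([Date]) as FirstDate, max([Date]) as LastDate
    from #tmp_taml9
    group by Customer_ID
),
dense as (
    -- one row per customer per calendar day, zero-filled where no transaction exists
    select s.Customer_ID, c.[Date],
           coalesce(t.Amount, 0) as Amount,
           case when t.Customer_ID is null then 0 else 1 end as HasTxn
    from spans s
    join Calendar c on c.[Date] between s.FirstDate and s.LastDate
    left join #tmp_taml9 t on t.Customer_ID = s.Customer_ID and t.[Date] = c.[Date]
)
select Customer_ID, [Date], Amount, Rolling_3_Day_Sum
from (
    select *,
           sum(Amount) over (partition by Customer_ID order by [Date]
                             rows between 2 preceding and current row) as Rolling_3_Day_Sum
    from dense
) x
where HasTxn = 1 -- drop the filler days again
order by Customer_ID, [Date];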

Creating a recursive CTE with no root record

My apologies for the appalling title; I was trying to be descriptive but I'm not sure I got to the point. Hopefully the below will explain it.
I begin with a table that has the following information
Party Id  Party Name  Party Code  Parent Id
1         Acme 1      ACME1       1
2         Acme 2      ACME2       1
3         Acme 3      ACME3       3
4         Acme 4      ACME4       4
5         Acme 5      ACME5       4
6         Acme 6      ACME6       6
As you can see, this isn't perfect for a recursive CTE because, rather than having a NULL where there isn't a parent record, a record without a parent is instead parented to itself (see rows 1, 3 and 6). Some, however, are parented normally.
I have therefore tried to amend this table in a CTE, then refer to the output of that CTE as part of my recursive query. This doesn't appear to be running very well (no errors yet), so I wonder if I have managed to create an infinite loop or some other error that slows the query to a crawl rather than killing it.
My Code is below... please pick it apart!
--This is my attempt to 'clean' the data and set records parented to themselves
--as the 'anchor' record
WITH Parties AS (
    Select CASE
               WHEN Cur_Parent_Id = Party_Id THEN NULL
               ELSE Cur_Parent_Id
           END AS Act_Parent_Id
         , Party_Id
         , CUR_PARTY_CODE
         , CUR_PARTY_NAME
    FROM EDW..TBDIMD_PARTIES
    WHERE CUR_FLG = 1
),
--In this CTE I refer to my 'clean' records from above and then traverse through
--them, looking at the actual parent record identified
linkedParties AS (
    Select Act_Parent_Id, Party_Id, CUR_PARTY_CODE, CUR_PARTY_NAME, 0 AS LEVEL
    FROM Parties
    WHERE Act_Parent_Id IS NULL
    UNION ALL
    Select p.Act_Parent_Id, p.Party_Id, p.CUR_PARTY_CODE, p.CUR_PARTY_NAME, Level + 1
    FROM Parties p
    INNER JOIN linkedParties t ON p.Act_Parent_Id = t.Party_Id
)
Select *
FROM linkedParties
Order By Level
From the data I supplied earlier, the results I would expect are:
Party Id  Party Name  Party Code  Parent Id  Level
1         Acme 1      ACME1       1          0
3         Acme 3      ACME3       3          0
4         Acme 4      ACME4       4          0
6         Acme 6      ACME6       6          0
2         Acme 2      ACME2       1          1
5         Acme 5      ACME5       4          1
If everything seems to be OK then I'll assume it's just a processing issue and start investigating that, but I am not entirely comfortable with CTEs, so I wish to make sure the error is not mine before looking elsewhere.
Many Thanks
I think that you made it more complicated than it needs to be :).
drop table #temp
GO
select
*
into #temp
from (
select '1','Acme 1','ACME1','1' union all
select '2','Acme 2','ACME2','1' union all
select '3','Acme 3','ACME3','3' union all
select '4','Acme 4','ACME4','4' union all
select '5','Acme 5','ACME5','4' union all
select '6','Acme 6','ACME6','6'
) x ([Party Id],[Party Name],[Party Code],[Parent Id])
GO
;with cte as (
select
*,
[Level] = 0
from #temp
where 1=1
and [Party Id]=[Parent Id] --assuming these are root records
union all
select
t.*,
[Level] = c.[Level]+1
from #temp t
join cte c
on t.[Parent Id]=c.[Party Id]
where 1=1
and t.[Party Id]<>t.[Parent Id] --prevent matching root records with themselves creating infinite recursion
)
select
*
from cte
(* should of course be replaced with actual column names)
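One optional note: T-SQL caps recursion at 100 levels by default, so if the real hierarchy can run deeper, the final select can carry a hint (standard T-SQL, just not needed for this sample):

select
*
from cte
option (maxrecursion 0) --0 lifts the 100-level cap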

Average calculation through time scale (SQL Server) - preparation for charting

I have the following table in SQL Server Express edition:
Time Device Value
0:00 1 2
0:01 2 3
0:03 3 5
0:03 1 3
0:13 2 5
0:22 1 7
0:34 3 5
0:35 2 6
0:37 1 5
The table is used to log the events of different devices which report their latest values. What I'd like to do is prepare the data so that I can present the average values through the time scale and eventually create a chart using this data. I've manipulated this example data in Excel in the following way:
Time  Average value
0:03  3.666666667
0:13  4.333333333
0:22  5.666666667
0:34  5.666666667
0:35  6
0:37  5.333333333
So, at time 0:03 I need to take the latest data I have in the table and calculate the average. In this case it's (3+3+5)/3=3.67. At time 0:13 the steps would be repeated, and again at 0:22,...
As I'd like to keep everything within the SQL database (I wouldn't like to create any service with C# or similar which would grab the data and store it in some other table),
I'd like to know the following:
is this the right approach or should I use some other concept of calculating the average for charting data preparation?
if yes, what's the best approach to implement it? Table view, function within the database, stored procedure (which would be called from the charting API)?
any suggestions on how to implement this?
Thank you in advance.
Mark
Update 1
In the meantime I got an idea of how to approach this problem. I'd kindly ask you for your comments on it, and I'd still need some help in getting the problem resolved.
So, the idea is to crosstab the table like this:
Time Device1Value Device2Value Device3Value
0:00 2 NULL NULL
0:01 NULL 3 NULL
0:03 3 NULL 5
0:13 NULL 5 NULL
0:22 7 NULL NULL
0:34 NULL NULL 5
0:35 NULL 6 NULL
0:37 5 NULL NULL
The query for this to happen would be:
SELECT Time,
(SELECT Stock FROM dbo.Event WHERE Time = S.Time AND Device = 1) AS Device1Value,
(SELECT Stock FROM dbo.Event WHERE Time = S.Time AND Device = 2) AS Device2Value,
(SELECT Stock FROM dbo.Event WHERE Time = S.Time AND Device = 3) AS Device3Value
FROM dbo.Event S GROUP BY Time
What I'd still need to do is write a user-defined function, called within this query, which would fill in the last available value in case of NULL (and leave NULL where no earlier value exists). With this function I'd get the following results:
Time Device1Value Device2Value Device3Value
0:00 2 NULL NULL
0:01 2 3 NULL
0:03 3 3 5
0:13 3 5 5
0:22 7 5 5
0:34 7 5 5
0:35 7 6 5
0:37 5 6 5
And by having these results I'd be able to calculate the average for each time by only SUMming up the 3 relevant columns and dividing by the count (in this case 3). For NULL I'd use the value 0.
Can anybody suggest how to create a user-defined function for replacing NULL values with the latest value?
Update 2
Thanks Martin.
This query worked, but it took almost 21 minutes to go through the 13,576 rows, which is far too much.
The final query I used was:
SELECT Time,
(SELECT TOP 1 Stock FROM dbo.Event e WHERE e.Time <= S.Time AND Device = 1 ORDER BY e.Time DESC) AS Device1Value,
(SELECT TOP 1 Stock FROM dbo.Event e WHERE e.Time <= S.Time AND Device = 2 ORDER BY e.Time DESC) AS Device2Value,
(SELECT TOP 1 Stock FROM dbo.Event e WHERE e.Time <= S.Time AND Device = 3 ORDER BY e.Time DESC) AS Device3Value
FROM dbo.Event S GROUP BY Time
but I've extended it to 10 devices.
I agree that this is not the best way to do it. Is there any other way to prepare the data for the average calculation? This just takes too much processing.
Here's one way. It uses the "Quirky Update" approach to filling in the gaps. This relies on an undocumented behaviour so you may prefer to use a cursor for this.
DECLARE @SourceData TABLE([Time] TIME, Device INT, value FLOAT)

INSERT INTO @SourceData
SELECT '0:00',1,2 UNION ALL
SELECT '0:01',2,3 UNION ALL
SELECT '0:03',3,5 UNION ALL
SELECT '0:03',1,3 UNION ALL
SELECT '0:13',2,5 UNION ALL
SELECT '0:22',1,7 UNION ALL
SELECT '0:34',3,5 UNION ALL
SELECT '0:35',2,6 UNION ALL
SELECT '0:37',1,5

CREATE TABLE #tmpResults
(
    [Time] TIME PRIMARY KEY,
    [1] FLOAT,
    [2] FLOAT,
    [3] FLOAT
)

INSERT INTO #tmpResults
SELECT [Time],[1],[2],[3]
FROM @SourceData
PIVOT ( MAX(value) FOR Device IN ([1],[2],[3])) AS pvt
ORDER BY [Time];

DECLARE @1 FLOAT, @2 FLOAT, @3 FLOAT

UPDATE #tmpResults
SET @1 = [1] = ISNULL([1],@1),
    @2 = [2] = ISNULL([2],@2),
    @3 = [3] = ISNULL([3],@3)

SELECT [Time],
       (SELECT AVG(device)
        FROM (SELECT [1] AS device
              UNION ALL
              SELECT [2]
              UNION ALL
              SELECT [3]) t) AS [Average value]
FROM #tmpResults

DROP TABLE #tmpResults
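If you are on SQL Server 2012 or later (newer than this question targets), a hedged alternative to the quirky update for the gap-filling step is the usual "carry the last non-NULL forward" window trick over the same #tmpResults table: a running COUNT() of non-NULL values assigns a group number per device column, and MAX() over that group repeats the value.

SELECT [Time],
       MAX([1]) OVER (PARTITION BY g1) AS [1],
       MAX([2]) OVER (PARTITION BY g2) AS [2],
       MAX([3]) OVER (PARTITION BY g3) AS [3]
FROM (SELECT [Time], [1], [2], [3],
             COUNT([1]) OVER (ORDER BY [Time]) AS g1, -- the running count only increments on non-NULL rows,
             COUNT([2]) OVER (ORDER BY [Time]) AS g2, -- so each group holds one value plus its trailing NULLs
             COUNT([3]) OVER (ORDER BY [Time]) AS g3
      FROM #tmpResults) x;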
So one of the possible solutions I found is far more efficient (less than a second for 14,574 rows). I haven't yet had time to review the results in detail, but at first glance it looks promising. This is the code for the 3-device example:
SELECT Time,
       SUM(CASE Device WHEN '1' THEN Stock ELSE 0 END) Device1Value,
       SUM(CASE Device WHEN '2' THEN Stock ELSE 0 END) Device2Value,
       SUM(CASE Device WHEN '3' THEN Stock ELSE 0 END) Device3Value
FROM dbo.Event
GROUP BY Time
ORDER BY Time
In any case I'll test the code provided by Martin to see if it makes any difference to the results.
