SQL: How to count distinct for all hourly periods in a day - snowflake-cloud-data-platform

I have a table of hotel data like this:
Room_ID Check_in_time Check_out_time
123 2021-10-01 01:02:03 2021-10-01 02:03:04
I would like to count how many rooms were checked in during each hour of a day (a room counts even if it was checked in for only 1 minute of that hour), giving output that looks like this:
Time period Number of rooms
09:00-10:00 10
10:00-11:00 12
.. ..
There are a couple of other 'where' conditions but this is the crux of the problem. I have so far managed to write a query that can count unique room ID by specifying the hourly window:
select count (distinct room_id)
from data
where check_out_time > '2021-10-01 09:00:00' and check_in_time < '2021-10-01 10:00:00' -- the 9am-10am window
But how do I do this for each of the 24 hourly windows without repeating the same query 24 times? Hopefully something that can be later adapted into half hour intervals, or even minutes. I'm using Sigma in case that matters. Thanks in advance!

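In Snowflake, I'd leverage the DATE_TRUNC function. If your dataset is very large, this will likely perform much better than the BETWEEN-style filtering that the OP and other answers use. Note that this groups each stay by its check-out hour only, so a room is not counted in every hour its stay spans.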
select date_trunc('hour',check_out_time) as check_out_hour
, count (distinct room_id) as cnt
from data
group by 1;
If you needed to parse it out by day and time, you could add that, as well:
select date_trunc('day',check_out_time) as check_out_day
, date_trunc('hour',check_out_time) as check_out_hour
, count (distinct room_id) as cnt
from data
group by 1,2;
For reference:
https://docs.snowflake.com/en/sql-reference/functions/date_trunc.html

You may try the following:
A recursive CTE generates the possible hours 0-23. (We could also have selected the distinct hours from your existing dataset, but I did not want to assume that every hour has a booking, and generating the hours may be the less expensive operation here.) A left join then determines the hours in which each room was booked, and the result is aggregated to count the distinct rooms booked in each hour.
WITH recursive hours(hr) as (
    select 0 as hr
    union all
    select hr + 1 from hours where hr < 23
)
select
    concat(h.hr,':00-',(h.hr+1),':00') as time_period,
    COUNT(DISTINCT r.room_id) as no_rooms
from hours h
left join room_times r on (
        -- stay starts and ends on the same date
        CAST(r.check_in_time AS DATE) = CAST(r.check_out_time AS DATE) AND
        h.hr BETWEEN DATE_PART(hour,r.Check_in_time) AND DATE_PART(hour,r.Check_out_time)
    ) OR
    (
        -- stay runs past midnight
        CAST(r.check_in_time AS DATE) < CAST(r.check_out_time AS DATE) AND
        (
            h.hr >= DATE_PART(hour, r.Check_in_time) OR
            h.hr <= DATE_PART(hour,r.Check_out_time)
        )
    )
GROUP BY h.hr
order by h.hr
See working db fiddle (using sql server instead) with the same logic and additional data and outputs to assist verification here.
Sample Data:
INSERT INTO room_times
(Room_ID, Check_in_time, Check_out_time)
VALUES
('123', '2021-10-01 01:02:03', '2021-10-01 03:03:04'),
('124', '2021-10-01 15:02:03', '2021-10-02 01:03:04');
Outputs:
time_period no_rooms
0:00-1:00 1
1:00-2:00 2
2:00-3:00 1
3:00-4:00 1
4:00-5:00 0
5:00-6:00 0
6:00-7:00 0
7:00-8:00 0
8:00-9:00 0
9:00-10:00 0
10:00-11:00 0
11:00-12:00 0
12:00-13:00 0
13:00-14:00 0
14:00-15:00 0
15:00-16:00 1
16:00-17:00 1
17:00-18:00 1
18:00-19:00 1
19:00-20:00 1
20:00-21:00 1
21:00-22:00 1
22:00-23:00 1
23:00-24:00 1
Let me know if this works for you.
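The question also asks about half-hour or minute intervals. A minimal sketch of one way to generalize the idea, assuming SQLite (through Python's sqlite3 module) in place of Snowflake and a hypothetical helper named rooms_per_bucket: the recursive CTE generates bucket start offsets in minutes, and the join uses the asker's own overlap test (check-in before the bucket ends, check-out after it starts). Unlike the query above, this sketch reports a single calendar day, so a stay that runs past midnight is not counted in the early hours of the same day.

```python
import sqlite3

# Sample rows copied from the answer's INSERT statement.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE room_times (room_id TEXT, check_in_time TEXT, check_out_time TEXT)"
)
conn.executemany(
    "INSERT INTO room_times VALUES (?, ?, ?)",
    [
        ("123", "2021-10-01 01:02:03", "2021-10-01 03:03:04"),
        ("124", "2021-10-01 15:02:03", "2021-10-02 01:03:04"),
    ],
)

# :day is a 'YYYY-MM-DD' string, :step is the bucket length in minutes.
# Assumption: :step divides 1440 evenly, so buckets do not spill past midnight.
BUCKET_SQL = """
WITH RECURSIVE buckets(start_min) AS (
    SELECT 0
    UNION ALL
    SELECT start_min + :step FROM buckets WHERE start_min + :step < 1440
)
SELECT strftime('%H:%M', datetime(:day, '+' || b.start_min || ' minutes')) AS bucket_start,
       COUNT(DISTINCT r.room_id) AS no_rooms
FROM buckets b
LEFT JOIN room_times r
       ON r.check_in_time  < datetime(:day, '+' || (b.start_min + :step) || ' minutes')
      AND r.check_out_time > datetime(:day, '+' || b.start_min || ' minutes')
GROUP BY b.start_min
ORDER BY b.start_min
"""


def rooms_per_bucket(day, bucket_minutes):
    """Return (bucket_start, distinct_room_count) for every bucket of the day."""
    return conn.execute(BUCKET_SQL, {"day": day, "step": bucket_minutes}).fetchall()


hourly = rooms_per_bucket("2021-10-01", 60)       # 24 rows
half_hourly = rooms_per_bucket("2021-10-01", 30)  # 48 rows
print(hourly[1], half_hourly[7])
```

Changing the second argument is the only thing needed to move from hourly to half-hourly or per-minute buckets. The timestamp comparison works on text here only because the values are stored in the sortable 'YYYY-MM-DD HH:MM:SS' form.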

Related

Find nearest row that matches condition in SQL Server

I have a SQL table with unique IDs, a date of service for a health care encounter, and whether this encounter was an emergency room visit (ed = 1) or a hospital admission (hosp = 1).
For each unique ID, I want to identify ED visits that occurred <= 1 calendar day from a hospital stay.
Thus I think I want to ask SQL to first identify ED visits, then search up and down to find the nearest hospital admission and calculate the absolute difference in dates. I'm familiar with the lag/lead and rownumber() functions, but can't quite seem to figure this out.
Any ideas would be much appreciated! Thank you!
Table looks like this for one illustrative ID:
id date ed hosp
1 2012-01-01 0 1
1 2012-01-05 1 0
1 2012-02-01 0 1
1 2012-02-03 1 0
1 2012-05-01 0 0
And I want to create a new column (ed_hosp_diff) that is the minimum absolute date difference (days) between each ED visit and the closest hospital stay, something like this:
id date ed hosp ed_hosp_diff
1 2012-01-01 0 1 null
1 2012-01-05 1 0 4
1 2012-02-01 0 1 null
1 2012-02-03 1 0 2
1 2012-05-01 0 0 null
So this doesn't get you the output table you show, but it meets the requirement you list:
For each unique ID, I want to identify ED visits that occurred <= 1
calendar day from a hospital stay.
Your output table doesn't really give you that - it includes rows for ED Visits that don't have a matching hospital admit, and has rows for hospital admits, etc. This SQL doesn't give you those, it just gives you the ED Visits that were followed by a hospital admit within one day.
It also doesn't give you matches with negative days - cases where the hospital visit is prior to the ED visit (in terms of healthcare analytics, that's usually a different thing than looking for ED Visits followed by an IP Admit). If you do want those, delete the last bit of logic in the WHERE clause for the main query.
SELECT
ID = e.id,
ED_DATE = e.date,
HOSP_DATE = h.date,
ED_HOSP_DIFF = DATEDIFF(dd, e.date, h.date)
FROM
Table1 AS e
JOIN
(
SELECT
id,
date
FROM
Table1
WHERE
hosp = 1
) AS h
ON
e.id = h.id
WHERE
e.ed = 1
AND
DATEDIFF(dd, e.date, h.date) <= 1
AND
DATEDIFF(dd, e.date, h.date) >= 0
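Use OUTER APPLY to find, for each record with ed = 1, the minimum date difference to a hospital stay (hosp = 1) for the same id:
SELECT t.*, eh.ed_hosp_diff
FROM Table1 t
OUTER APPLY
(
    -- the aggregate always returns one row; it is NULL when t is not an ED visit
    SELECT ed_hosp_diff = MIN ( ABS ( DATEDIFF(DAY, t.date, x.date) ) )
    FROM Table1 x
    WHERE x.id = t.id
    AND x.hosp = 1
    AND t.ed = 1
) eh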
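To check the nearest-stay logic against the question's sample rows, here is a small sketch that assumes SQLite (via Python's sqlite3) rather than SQL Server, so a correlated subquery and julianday() stand in for OUTER APPLY and DATEDIFF. The table name encounters is made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE encounters (id INTEGER, date TEXT, ed INTEGER, hosp INTEGER)")
conn.executemany(
    "INSERT INTO encounters VALUES (?, ?, ?, ?)",
    [
        (1, "2012-01-01", 0, 1),
        (1, "2012-01-05", 1, 0),
        (1, "2012-02-01", 0, 1),
        (1, "2012-02-03", 1, 0),
        (1, "2012-05-01", 0, 0),
    ],
)

# For every ED visit, take the smallest absolute day gap to any hospital
# stay with the same id; rows that are not ED visits get NULL.
rows = conn.execute(
    """
    SELECT t.id, t.date, t.ed, t.hosp,
           CASE WHEN t.ed = 1 THEN (
               SELECT MIN(CAST(ABS(julianday(h.date) - julianday(t.date)) AS INTEGER))
               FROM encounters h
               WHERE h.id = t.id AND h.hosp = 1
           ) END AS ed_hosp_diff
    FROM encounters t
    ORDER BY t.id, t.date
    """
).fetchall()

# The stated requirement: ED visits within one calendar day of a hospital stay.
within_one_day = [r for r in rows if r[4] is not None and r[4] <= 1]

for r in rows:
    print(r)
```

With the sample data the two ED visits are 4 and 2 days from their nearest hospital stay, so within_one_day is empty; the ed_hosp_diff column matches the table the question asks for.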

I want to get data if records are not present in specific month

I wrote a SQL query to count the records logged in specific months:
select month(loggingdate),Count(id) from communicationlogs
where clientid=20154 and month(loggingdate) in (1,2,3,4,5,6,7,8,9)
group by month(loggingdate)
7 65
8 5
Here, records are present only in the 7th and 8th months. I want to get a 0 value for the other month numbers, like this:
1 0
2 0
3 0
4 0
...
This is a standard problem where a calendar table comes in handy. A calendar table, as the name implies, is a table which just stores a sequence of dates. In your particular case, we only need the digits corresponding to the 12 months. Begin the query with the calendar table and then left join to your aggregation query as a subquery.
Note the use of COALESCE below. If a given month appears nowhere in your original query, then its count would show up as NULL in the join, in which case we report zero for that month.
WITH calendar_month AS (
SELECT 1 AS month
UNION ALL
SELECT month +1
FROM
calendar_month
WHERE month +1 <= 12
)
SELECT
t1.month,
COALESCE(t2.cnt, 0) AS cnt
FROM calendar_month t1
LEFT JOIN
(
SELECT
MONTH(loggingdate) as month,
COUNT(id) AS cnt
FROM communicationlogs
WHERE
clientid = 20154 AND
MONTH(loggingdate) IN (1,2,3,4,5,6,7,8,9)
GROUP BY MONTH(loggingdate)
) t2
ON t1.month = t2.month
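The same calendar-table pattern can be tried end to end with a few made-up rows. This sketch assumes SQLite (via Python's sqlite3), so strftime('%m', ...) replaces MONTH() and the CTE needs the RECURSIVE keyword; the log rows are invented purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE communicationlogs (id INTEGER, clientid INTEGER, loggingdate TEXT)")
# Invented sample: three July rows and two August rows for client 20154,
# plus one row for another client that must be ignored.
conn.executemany(
    "INSERT INTO communicationlogs VALUES (?, ?, ?)",
    [
        (1, 20154, "2016-07-03"),
        (2, 20154, "2016-07-10"),
        (3, 20154, "2016-07-21"),
        (4, 20154, "2016-08-02"),
        (5, 20154, "2016-08-15"),
        (6, 99999, "2016-01-09"),
    ],
)

month_counts = conn.execute(
    """
    WITH RECURSIVE calendar_month(month) AS (
        SELECT 1
        UNION ALL
        SELECT month + 1 FROM calendar_month WHERE month < 12
    )
    SELECT c.month, COALESCE(t.cnt, 0) AS cnt   -- NULL from the join becomes 0
    FROM calendar_month c
    LEFT JOIN (
        SELECT CAST(strftime('%m', loggingdate) AS INTEGER) AS month,
               COUNT(id) AS cnt
        FROM communicationlogs
        WHERE clientid = 20154
        GROUP BY CAST(strftime('%m', loggingdate) AS INTEGER)
    ) t ON t.month = c.month
    ORDER BY c.month
    """
).fetchall()

print(month_counts)
```

Every month from 1 to 12 appears exactly once, with a count of 0 wherever the aggregation subquery produced no row.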

Calculate Bounce Rate SQL Server 2008

I'm trying to calculate the Bounce Rate of pages in SQL Server in a table with Audit Data from Sharepoint.
ItemId UserId DocLocation Occurred
1 1 Home.aspx 2016-08-02 13:39:41
1 2 Home.aspx 2016-08-02 13:40:07
2 1 Other.aspx 2016-08-02 13:40:16
3 1 Items.aspx 2016-08-02 13:40:17
2 2 Other.aspx 2016-08-02 13:40:11
ItemId is the id of the page, DocLocation the location of the page and Occurred when the user goes into the page.
To calculate the bounce rate we have to divide the number of bounces by the total number of visits.
A bounce happens when a user leaves the page in less than 5 seconds.
This should be the results for that table:
ItemId Bounces Visits BounceRate(Bounces/Visits)
1 1 2 0.5
2 1 2 0.5
3 0 1 0
I want to detect a bounce by calculating how much time passes between a user's visit to a page and that user's next visit to another page. If that time is less than 5 seconds, it counts as a bounce.
I'm writing a stored procedure that runs the query to show the bounce rate of each page, but it doesn't work.
SELECT
SUM(CASE
WHEN (DATEDIFF(second, @Occurred,
(SELECT TOP 1 a.Occurred
FROM [AuditPages] a
WHERE a.UserId = @userId
AND a.Occurred > @occurred
ORDER BY a.Occurred ASC))) < 30
THEN 1.0
ELSE 0.0
END) / COUNT(@itemId)
Does anyone know how I can calculate this bounce rate?
Thanks for all the answers.
I like using row_number for this type of sequenced problem. The query below gives the desired result. I find performance with CTEs can sometimes be problematic with larger tables and you may need to convert to a temp table. You might consider using milliseconds if there is a chance you would want to use 4.5 seconds or such in the future.
declare @bounce_seconds int = 5;
with audit_cte as (
    -- number each user's visits in chronological order
    select *, ROW_NUMBER() over (partition by UserId order by Occurred) row_num
    from AuditPages
    --order by UserId,row_num
)
select a.ItemId, sum(a.bounce) Bounces, count(1) Visits, sum(a.bounce)/convert(float, count(1)) BounceRate
from (
    select a1.ItemId, datediff(s,a1.Occurred, a2.Occurred) elapsed, case when datediff(s,a1.Occurred, a2.Occurred) < @bounce_seconds then 1 else 0 end bounce
    from audit_cte a1
    -- pair each visit with the same user's next visit
    left join audit_cte a2
        on a2.UserId = a1.UserId
        and a2.row_num = a1.row_num + 1
    --order by a1.UserId, a1.row_num
) a
group by a.ItemId
order by a.ItemId;
Alternatively, with LEAD (available from SQL Server 2012 onward, so not in SQL Server 2008):
SELECT ItemId,COUNT(1) VISITS,SUM(BOUNCE_IND) BOUNCE, cast(SUM(BOUNCE_IND) as decimal(5,2))/cast(COUNT(1) as decimal(5,2)) BOUNCE_RATE
FROM (
    Select
        UserID,
        ItemID,
        DocLocation,
        Occurred as Entry_time,
        Lead(Occurred,1) Over (Partition by Userid order by Occurred) Exit_time,
        -- a bounce is a visit followed by the next one in less than 5 seconds
        CASE WHEN DATEDIFF(ss,Occurred,Lead(Occurred,1) Over (Partition by Userid order by Occurred)) < 5 THEN 1 ELSE 0 END BOUNCE_IND
    FROM Web_Data_Sample
) TBL GROUP BY ItemId
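Both answers pair each visit with the same user's next visit and flag gaps under 5 seconds. A minimal sketch to run that logic on the question's five rows, assuming SQLite 3.25 or later (via Python's sqlite3) instead of SQL Server, with strftime('%s', ...) standing in for DATEDIFF:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE AuditPages (ItemId INTEGER, UserId INTEGER, DocLocation TEXT, Occurred TEXT)"
)
conn.executemany(
    "INSERT INTO AuditPages VALUES (?, ?, ?, ?)",
    [
        (1, 1, "Home.aspx", "2016-08-02 13:39:41"),
        (1, 2, "Home.aspx", "2016-08-02 13:40:07"),
        (2, 1, "Other.aspx", "2016-08-02 13:40:16"),
        (3, 1, "Items.aspx", "2016-08-02 13:40:17"),
        (2, 2, "Other.aspx", "2016-08-02 13:40:11"),
    ],
)

BOUNCE_SQL = """
WITH visits AS (
    SELECT ItemId,
           -- seconds until the same user's next visit; NULL for the last visit
           CAST(strftime('%s', LEAD(Occurred) OVER (PARTITION BY UserId ORDER BY Occurred)) AS INTEGER)
             - CAST(strftime('%s', Occurred) AS INTEGER) AS elapsed
    FROM AuditPages
)
SELECT ItemId,
       SUM(CASE WHEN elapsed < :limit THEN 1 ELSE 0 END) AS bounces,
       COUNT(*) AS visits,
       1.0 * SUM(CASE WHEN elapsed < :limit THEN 1 ELSE 0 END) / COUNT(*) AS bounce_rate
FROM visits
GROUP BY ItemId
ORDER BY ItemId
"""

bounce_rows = conn.execute(BOUNCE_SQL, {"limit": 5}).fetchall()
for row in bounce_rows:
    print(row)
```

A user's last visit has no following row, so its elapsed time is NULL and it is counted as a visit but not as a bounce, which is what the expected table in the question implies for ItemId 3.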

SQL Server: How to get a rolling sum over 3 days for different customers within same table

This is the input table:
Customer_ID Date Amount
1 4/11/2014 20
1 4/13/2014 10
1 4/14/2014 30
1 4/18/2014 25
2 5/15/2014 15
2 6/21/2014 25
2 6/22/2014 35
2 6/23/2014 10
There is information pertaining to multiple customers and I want to get a rolling sum across a 3 day window for each customer.
The solution should be as below:
Customer_ID Date Amount Rolling_3_Day_Sum
1 4/11/2014 20 20
1 4/13/2014 10 30
1 4/14/2014 30 40
1 4/18/2014 25 25
2 5/15/2014 15 15
2 6/21/2014 25 25
2 6/22/2014 35 60
2 6/23/2014 10 70
The biggest issue is that I don't have transactions for every day, so partitioning by row number doesn't work.
The closest example I found on SO was:
SQL Query for 7 Day Rolling Average in SQL Server
but even in that case there were transactions every day, which accommodated the rownumber()-based solutions.
The rownumber query is as follows:
select customer_id, Date, Amount,
Rolling_3_day_sum = CASE WHEN ROW_NUMBER() OVER (partition by customer_id ORDER BY Date) > 2
THEN SUM(Amount) OVER (partition by customer_id ORDER BY Date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
END
from #tmp_taml9
order by customer_id
I was wondering if there is way to replace "BETWEEN 2 PRECEDING AND CURRENT ROW" by "BETWEEN [DATE - 2] and [DATE]"
One option would be to use a calendar table (or something similar) to get the complete range of dates and left join your table with that and use the row_number based solution.
Another option that might work (not sure about performance) would be to use an apply query like this:
select customer_id, Date, Amount, coalesce(Rolling_3_day_sum, Amount) Rolling_3_day_sum
from #tmp_taml9 t1
cross apply (
select sum(amount) Rolling_3_day_sum
from #tmp_taml9
where Customer_ID = t1.Customer_ID
and datediff(day, date, t1.date) <= 2 -- current day plus the 2 days before it
and t1.Date >= date
) o
order by customer_id;
I suspect performance might not be great though.
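The date-range idea behind the apply query can be checked against the question's expected output. This sketch assumes SQLite (via Python's sqlite3) rather than SQL Server, so a correlated subquery replaces CROSS APPLY, and it assumes the dates are stored in ISO 'YYYY-MM-DD' form so that text comparison orders them correctly. The table name tx is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tx (customer_id INTEGER, date TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO tx VALUES (?, ?, ?)",
    [
        (1, "2014-04-11", 20),
        (1, "2014-04-13", 10),
        (1, "2014-04-14", 30),
        (1, "2014-04-18", 25),
        (2, "2014-05-15", 15),
        (2, "2014-06-21", 25),
        (2, "2014-06-22", 35),
        (2, "2014-06-23", 10),
    ],
)

# For each row, sum the same customer's amounts dated from two days
# earlier up to and including the row's own date (a 3-day window).
rolling = conn.execute(
    """
    SELECT t1.customer_id, t1.date, t1.amount,
           (SELECT SUM(t2.amount)
            FROM tx t2
            WHERE t2.customer_id = t1.customer_id
              AND t2.date BETWEEN date(t1.date, '-2 days') AND t1.date
           ) AS rolling_3_day_sum
    FROM tx t1
    ORDER BY t1.customer_id, t1.date
    """
).fetchall()

for row in rolling:
    print(row)
```

Because the window is defined by dates rather than by a row count, missing days need no special handling: a date with no transactions simply contributes nothing to the sum.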

SQL How to show '0' value for a month, if no data exists in the table for that month

First of all my result looks like this:
KONTONR Month SELSKAPSKODE BELOP
459611 1 BAGA 156000
459611 2 BAGA 73000
459611 4 BAGA 217000
459611 5 BAGA 136000
459611 1 CIVO 45896
459611 3 CIVO 32498
459611 4 CIVO 9841
330096 1 BAGA 42347
330096 3 BAGA 3695
I'm trying to show month-to-month bookings on several accounts. Per account (KONTONR) there are several codes (SELSKAPSKODE) on which bookings are recorded (the sum of the bookings is BELOP). I would like to give an overview of the sum of the bookings (BELOP) per account (KONTONR) per month per code (SELSKAPSKODE). My problem is that a code doesn't show up in a month if no bookings were made on that code. Is there a way to fix this? I understand why the codes don't show, since they're simply not in the table I'm querying, and I suspect that the solution is to make a 'fake' table which I then join (left outer join?) with 'another' table.
I just can't get it to work, I'm pretty new to SQL. Can someone please help?
My query looks like this (I only inserted the 'nested' query to make a set-up for a join, if this makes sense?!):
SELECT TOP (100) PERCENT KONTONR, Month, SELSKAPSKODE, BELOP
FROM (
SELECT SELSKAPSKODE, KONTONR, SKIPS_KODE, MONTH(POSTDATO) AS Month, SUM(BELOP) AS BELOP
FROM dbo.T99_DETALJ
WHERE (POSTDATO >= '2012-01-01') AND (BILAGSART = 0 OR BILAGSART = 2)
GROUP BY SELSKAPSKODE, KONTONR, SKIPS_KODE, MONTH(POSTDATO)
) AS T99_summary
GROUP BY KONTONR, SELSKAPSKODE, Month, BELOP
ORDER BY KONTONR, SELSKAPSKODE, Month
So concluding I would like to 'fill up' the missing months (see table at the start), for instance for account (KONTONR) 459611 month 3 is 'missing'. I would like to show month 3, with the sum of the bookings (BELOP) as '0'. Any help is greatly appreciated, thanks in advance!
You can query a table with the values 1-12 and left outer join your result.
Here is a sample using a table variable instead of your query and a CTE to build a table with numbers.
declare @T table
(
  Month int
)
insert into @T values(1)
insert into @T values(1)
insert into @T values(1)
insert into @T values(3)
insert into @T values(3)
;with Months(Month) as
(
  select 1
  union all
  select Month + 1
  from Months
  where Month < 12
)
select M.Month,
       count(T.Month) Count,
       isnull(sum(T.Month), 0) Sum
from Months as M
  left outer join @T as T
    on M.Month = T.Month
group by M.Month
Result:
Month Count Sum
----------- ----------- -----------
1 3 3
2 0 0
3 2 6
4 0 0
5 0 0
6 0 0
7 0 0
8 0 0
9 0 0
10 0 0
11 0 0
12 0 0
If you don't want to do all that, you could also replace SUM(BELOP) with this:
Sum (case when BELOP is not null then 1 else 0 end)
You can also add in the year if you have a creation date for the interactions you are counting which may be helpful if your interactions span the course of many years.
with Months(Month) as
(
select 1
union all
select Month + 1
from Months
where Month < 12
)
select M.Month, year(CreatedOn) as Year,
count(amount) Count,
isnull(sum(amount), 0) Sum
from Months as M
left outer join Charge as C
on M.Month = (month(CreatedOn))
group by M.Month, year(CreatedOn) order by year(CreatedOn)
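The answers above join to a single 1-12 month list, while the question needs every month for each account (KONTONR) and code (SELSKAPSKODE) pair. One way to sketch that, assuming SQLite (via Python's sqlite3) instead of SQL Server and a made-up bookings table that holds the question's already-summarized rows, is to cross join the distinct pairs with the month list before the left join:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bookings (kontonr INTEGER, month INTEGER, selskapskode TEXT, belop INTEGER)"
)
# Rows copied from the result table in the question.
conn.executemany(
    "INSERT INTO bookings VALUES (?, ?, ?, ?)",
    [
        (459611, 1, "BAGA", 156000),
        (459611, 2, "BAGA", 73000),
        (459611, 4, "BAGA", 217000),
        (459611, 5, "BAGA", 136000),
        (459611, 1, "CIVO", 45896),
        (459611, 3, "CIVO", 32498),
        (459611, 4, "CIVO", 9841),
        (330096, 1, "BAGA", 42347),
        (330096, 3, "BAGA", 3695),
    ],
)

filled = conn.execute(
    """
    WITH RECURSIVE months(month) AS (
        SELECT 1
        UNION ALL
        SELECT month + 1 FROM months WHERE month < 12
    ),
    pairs AS (
        -- every account/code combination that has at least one booking
        SELECT DISTINCT kontonr, selskapskode FROM bookings
    )
    SELECT p.kontonr, m.month, p.selskapskode,
           COALESCE(SUM(b.belop), 0) AS belop   -- 0 for months without bookings
    FROM pairs p
    CROSS JOIN months m
    LEFT JOIN bookings b
           ON b.kontonr = p.kontonr
          AND b.selskapskode = p.selskapskode
          AND b.month = m.month
    GROUP BY p.kontonr, p.selskapskode, m.month
    ORDER BY p.kontonr, p.selskapskode, m.month
    """
).fetchall()

# Index by (account, code, month) for easy lookup.
by_key = {(k, c, m): v for k, m, c, v in filled}
print(by_key[(459611, "BAGA", 3)], by_key[(459611, "BAGA", 4)])
```

The cross join produces 12 rows per account/code pair, so month 3 for account 459611 and code BAGA appears with a BELOP of 0 rather than being absent.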
