Mapping accurate end time to start time for a video - sql-server

I have two tables, one with start times and one with end times, each keyed by video ID.
Table with start times (sample data for one user):
Video ID | Start Time
abc | 22-01-2022 02:20:00
abc | 22-01-2022 02:30:00
abc | 22-01-2022 02:42:00
Table with end times (sample data for one user):
Video ID | End Time
abc | 22-01-2022 02:26:00
abc | 22-01-2022 02:45:00
A user can have multiple start times for a video if they started watching it multiple times.
A row appears in the end-time table only if the user finished watching the video.
A user can start the same video again without ending it, and the new start time is captured in the next row.
I want to pair each end time with the most recent start time that precedes it.
Desired output:
Video ID | Start Time | End Time
abc | 22-01-2022 02:20:00 | 22-01-2022 02:26:00
abc | 22-01-2022 02:30:00 | null
abc | 22-01-2022 02:42:00 | 22-01-2022 02:45:00
I tried joining the tables with this condition:
from start_time a left join end_time b on a.start_time < b.end_time
But this fills the 2nd start time with the end time that belongs to the 3rd start time, when it should in fact be null.

Assuming your date formats are appropriate, you can use the query below (Snowflake syntax, with ::timestamp casts and TIMEDIFF):
select s.vid, st, et from
(select vid_1.v_id as vid, start_time::timestamp as st,
min(timediff(second, start_time::timestamp, end_time::timestamp)) as tdiff
from vid_1, vid_2 where vid_1.v_id = vid_2.v_id and st < end_time::timestamp group by vid, st) s left join
(select vid_2.v_id as vid, end_time::timestamp as et,
min(timediff(second, start_time::timestamp, end_time::timestamp)) as tdiff
from vid_1, vid_2 where vid_1.v_id = vid_2.v_id and start_time::timestamp < et group by vid, et) e
on s.vid = e.vid and s.tdiff = e.tdiff order by st;
+-----+-------------------------+-------------------------+
| VID | ST | ET |
|-----+-------------------------+-------------------------|
| abc | 2022-01-22 02:20:00.000 | 2022-01-22 02:26:00.000 |
| abc | 2022-01-22 02:30:00.000 | NULL |
| abc | 2022-01-22 02:42:00.000 | 2022-01-22 02:45:00.000 |
+-----+-------------------------+-------------------------+
3 Row(s) produced. Time Elapsed: 0.426s
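For comparison, here is a minimal sketch of a different technique (not the answer's): use LEAD to find the next start time of the same video, then take the earliest end time that falls after the current start and no later than that next start. It runs against Python's built-in sqlite3 module (window functions need SQLite 3.25 or newer). The table and column names start_times, end_times, video_id, start_time and end_time are hypothetical, and timestamps are stored as ISO strings so that text comparison matches chronological order. SQL Server 2012 and later also provide LEAD, so the same query shape should carry over.

```python
import sqlite3


def pair_start_end(conn):
    """Pair each start with the earliest end before the next start of the same video."""
    query = """
    WITH starts AS (
        SELECT video_id,
               start_time,
               -- next start of the same video; NULL for the last one
               LEAD(start_time) OVER (PARTITION BY video_id ORDER BY start_time) AS next_start
        FROM start_times
    )
    SELECT s.video_id,
           s.start_time,
           (SELECT MIN(e.end_time)
              FROM end_times e
             WHERE e.video_id = s.video_id
               AND e.end_time > s.start_time
               AND (s.next_start IS NULL OR e.end_time <= s.next_start)) AS end_time
    FROM starts s
    ORDER BY s.video_id, s.start_time
    """
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE start_times (video_id TEXT, start_time TEXT)")
conn.execute("CREATE TABLE end_times (video_id TEXT, end_time TEXT)")
conn.executemany("INSERT INTO start_times VALUES (?, ?)", [
    ("abc", "2022-01-22 02:20:00"),
    ("abc", "2022-01-22 02:30:00"),
    ("abc", "2022-01-22 02:42:00"),
])
conn.executemany("INSERT INTO end_times VALUES (?, ?)", [
    ("abc", "2022-01-22 02:26:00"),
    ("abc", "2022-01-22 02:45:00"),
])

rows = pair_start_end(conn)
for row in rows:
    print(row)
```

The 02:30 start gets NULL because no end time lies between it and the following 02:42 start, which is the behaviour the question asks for.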

Related

UNNEST array and assign to new columns with CASE WHEN

I have the following BigQuery table, which has a nested structure; the example below is one record in my table.
Id | Date | Time | Code
AQ5ME | 120520 | 0950 | 123
---------- | 150520 | 1530 | 456
My goal is to unnest the array to achieve the following structure (given that 123 is the Start Date code and 456 is End Date code):
Id | Start Date | Start Time | End Date | End Time
AQ5ME | 120520 | 0950 | 150520 | 1530
I tried basic UNNEST in BigQuery and my results are as follows:
Id | Start Date | Start Time | End Date | End Time
AQ5ME | 120520 | 0950 | NULL | NULL
AQ5ME | NULL | NULL | 150520 | 1530
Could you please help me unnest it correctly, as described above?
You can calculate the MIN and MAX within each row and extract them as new columns.
Since you didn't show the full schema, I assume Date and Time are separate arrays.
In that case, you can use this query:
SELECT Id,
(SELECT MIN(d) FROM UNNEST(Date) AS d) AS StartDate,
(SELECT MIN(t) FROM UNNEST(Time) AS t) AS StartTime,
(SELECT MAX(d) FROM UNNEST(Date) AS d) AS EndDate,
(SELECT MAX(t) FROM UNNEST(Time) AS t) AS EndTime
FROM table
As in Sabri's response, using aggregate functions while unnesting works perfectly. To use these fields later for sorting (in an ORDER BY clause), [SAFE_OFFSET(0)] can be used, for example:
...
ORDER BY StartDate[SAFE_OFFSET(0)] ASC
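The MIN/MAX approach assumes the earliest values are the start and the latest are the end. If Date is DDMMYY text, as it appears to be, that can fail: MIN('150520', '010620') returns '010620' even though 1 June is the later date, and MIN(Date) and MIN(Time) may come from different elements. Selecting by Code (123 for start, 456 for end, per the question) avoids both problems. The plain-Python sketch below shows that logic on the sample record, assuming Date, Time and Code are parallel arrays of equal length; the function name split_by_code is hypothetical. In BigQuery, the equivalent would be to unnest the arrays WITH OFFSET and match elements by offset.

```python
def split_by_code(record, start_code=123, end_code=456):
    """Return one flat row, taking start/end fields from the element whose Code matches."""
    row = {"Id": record["Id"], "StartDate": None, "StartTime": None,
           "EndDate": None, "EndTime": None}
    # Walk the parallel arrays position by position.
    for date, time, code in zip(record["Date"], record["Time"], record["Code"]):
        if code == start_code:
            row["StartDate"], row["StartTime"] = date, time
        elif code == end_code:
            row["EndDate"], row["EndTime"] = date, time
    return row


record = {"Id": "AQ5ME", "Date": ["120520", "150520"],
          "Time": ["0950", "1530"], "Code": [123, 456]}
print(split_by_code(record))
# {'Id': 'AQ5ME', 'StartDate': '120520', 'StartTime': '0950', 'EndDate': '150520', 'EndTime': '1530'}

# Why MIN on DDMMYY text is risky: 1 June 2020 sorts before 15 May 2020.
print(min(["150520", "010620"]))  # 010620
```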

How to efficiently match on dates in SQL Server?

I am trying to return the first registration for a person based on the minimum registration date and then return full information. The data looks something like this:
Warehouse_ID | SourceID | firstName | lastName | firstProgramSource | firstProgramName | firstProgramCreatedDate | totalPaid | totalRegistrations
12345 | 1 | Max | Smith | League | Kid Hockey | 2017-06-06 | $100 | 3
12345 | 6 | Max | Smith | Activity | Figure Skating | 2018-09-26 | $35 | 1
The end goal is to return one row per person that looks like this:
Warehouse_ID | SourceID | firstName | lastName | firstProgramSource | firstProgramName | firstProgramCreatedDate | totalPaid | totalRegistrations
12345 | 1 | Max | Smith | League | Kid Hockey | 2017-06-06 | $135 | 4
So, this would aggregate the totalPaid and totalRegistrations variables based on the Warehouse_ID and would pull the rest of the information based on the min(firstProgramCreatedDate) specific to the Warehouse_ID.
This will end up in Tableau, so what I've recently tried ignores aggregating totalPaid and totalRegistrations for now (I can get those in another query pretty easily). The query I'm using seems to work, but it is extremely slow; it seems to process the >50,000 rows one at a time.
select M.*
from (
select Warehouse_ID, min(FirstProgramCreatedDate) First
from vw_FirstRegistration
group by Warehouse_ID
) B
left join vw_FirstRegistration M on B.Warehouse_ID = M.Warehouse_ID
where B.First in (M.FirstProgramCreatedDate)
order by B.Warehouse_ID
Any advice on how I can achieve my goal without this query taking an hour plus to run?
A combination of the ROW_NUMBER windowing function, plus the OVER clause on a SUM expression should perform pretty well.
Here's the query:
SELECT TOP (1) WITH TIES
v.Warehouse_ID
,v.SourceID
,v.firstName
,v.lastName
,v.firstProgramSource
,v.firstProgramName
,v.firstProgramCreatedDate
,SUM(v.totalPaid) OVER (PARTITION BY v.Warehouse_ID) AS totalPaid
,SUM(v.totalRegistrations) OVER (PARTITION BY v.Warehouse_ID) AS totalRegistrations
FROM
#vw_FirstRegistration AS v
ORDER BY
ROW_NUMBER() OVER (PARTITION BY v.Warehouse_ID
ORDER BY CASE WHEN v.firstProgramCreatedDate IS NULL THEN 1 ELSE 0 END,
v.firstProgramCreatedDate)
And here's a Rextester demo: https://rextester.com/GNOB14793
Results (I added another kid...):
+--------------+----------+-----------+----------+--------------------+------------------+-------------------------+-----------+--------------------+
| Warehouse_ID | SourceID | firstName | lastName | firstProgramSource | firstProgramName | firstProgramCreatedDate | totalPaid | totalRegistrations |
+--------------+----------+-----------+----------+--------------------+------------------+-------------------------+-----------+--------------------+
| 12345 | 1 | Max | Smith | League | Kid Hockey | 2017-06-06 | 135.00 | 4 |
| 12346 | 6 | Joe | Jones | Activity | Other Activity | 2017-09-26 | 125.00 | 4 |
+--------------+----------+-----------+----------+--------------------+------------------+-------------------------+-----------+--------------------+
EDIT: Changed the ORDER BY based on comments.
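SQLite has no TOP (1) WITH TIES, so the sketch below expresses the same idea in the portable form: compute ROW_NUMBER() in a derived table and keep rn = 1 outside it, while the windowed SUMs carry the per-person totals onto that row. It uses Python's sqlite3 module (SQLite 3.25+ for window functions) and a plain table named vw_FirstRegistration holding the two sample rows, with totalPaid stored as a number rather than "$100" text; those storage choices are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE vw_FirstRegistration (
    Warehouse_ID INTEGER, SourceID INTEGER, firstName TEXT, lastName TEXT,
    firstProgramSource TEXT, firstProgramName TEXT, firstProgramCreatedDate TEXT,
    totalPaid NUMERIC, totalRegistrations INTEGER)""")
conn.executemany("INSERT INTO vw_FirstRegistration VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", [
    (12345, 1, "Max", "Smith", "League", "Kid Hockey", "2017-06-06", 100, 3),
    (12345, 6, "Max", "Smith", "Activity", "Figure Skating", "2018-09-26", 35, 1),
])

rows = conn.execute("""
SELECT Warehouse_ID, SourceID, firstName, lastName, firstProgramSource,
       firstProgramName, firstProgramCreatedDate, totalPaid, totalRegistrations
FROM (
    SELECT v.Warehouse_ID, v.SourceID, v.firstName, v.lastName,
           v.firstProgramSource, v.firstProgramName, v.firstProgramCreatedDate,
           -- totals over all of the person's rows, repeated on every row
           SUM(v.totalPaid) OVER (PARTITION BY v.Warehouse_ID) AS totalPaid,
           SUM(v.totalRegistrations) OVER (PARTITION BY v.Warehouse_ID) AS totalRegistrations,
           -- 1 = earliest registration; NULL dates sort last
           ROW_NUMBER() OVER (PARTITION BY v.Warehouse_ID
                              ORDER BY CASE WHEN v.firstProgramCreatedDate IS NULL THEN 1 ELSE 0 END,
                                       v.firstProgramCreatedDate) AS rn
    FROM vw_FirstRegistration AS v
) AS ranked
WHERE rn = 1
ORDER BY Warehouse_ID
""").fetchall()
print(rows)
```

Unlike WITH TIES, filtering on rn = 1 always returns exactly one row per Warehouse_ID, because ROW_NUMBER never assigns the same number twice within a partition.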
Try using ROW_NUMBER() with PARTITION BY.
For more information please refer to:
https://learn.microsoft.com/en-us/sql/t-sql/functions/row-number-transact-sql?view=sql-server-2017

Do dates of service fall in between membership date range

I have two tables: a customer_service table with dates of service, and a membership table in which a member can appear multiple times if there were lapses between their membership effective and expiration dates. Below is a basic example of how these tables might be laid out.
How might I find dates of service that fall outside of, or in between, membership date ranges? A simple join will not work here because a member may have multiple date ranges under the same ID. Would this require some form of iteration? I am unsure of the best way to approach this kind of issue.
Customer_Service Table
id | customers | Dos
-------------------------
1 | Rodney | 01/18/2018
2 | Jim | 02/15/2018
3 | Tom | 01/01/2018
1 | Rodney | 02/15/2018
3 | Tom | 03/01/2018
Membership Table
id | Effective_date | End_date
-------------------------
1 | 01/01/2017 | 12/31/2017
1 | 02/15/2018 | 05/20/2018
2 | 06/20/2016 | 01/25/2018
2 | 02/25/2018 | 12/31/2099
3 | 01/01/2018 | 06/01/2018
A simple approach is below. The query will identify rows in CUSTOMER_SERVICE where DOS does not fall between any periods in the membership table for that customer.
SELECT * FROM CUSTOMER_SERVICE CS
WHERE NOT EXISTS (
SELECT * FROM MEMBERSHIP M
WHERE CS.ID = M.ID
AND DOS BETWEEN EFFECTIVE_DATE AND END_DATE
)
Or alternatively:
SELECT CS.* FROM CUSTOMER_SERVICE CS
LEFT JOIN MEMBERSHIP M ON M.ID = CS.ID
AND DOS BETWEEN EFFECTIVE_DATE AND END_DATE
WHERE M.ID IS NULL
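Both queries can be checked against the sample data. The sketch below loads it into an in-memory SQLite database through Python's sqlite3 module, storing dates as ISO "YYYY-MM-DD" strings (an assumption: text in MM/DD/YYYY form would not compare chronologically). BETWEEN is inclusive, so Tom's service on 01/01/2018 counts as covered by the range starting that day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUSTOMER_SERVICE (ID INTEGER, CUSTOMERS TEXT, DOS TEXT)")
conn.execute("CREATE TABLE MEMBERSHIP (ID INTEGER, EFFECTIVE_DATE TEXT, END_DATE TEXT)")
conn.executemany("INSERT INTO CUSTOMER_SERVICE VALUES (?, ?, ?)", [
    (1, "Rodney", "2018-01-18"), (2, "Jim", "2018-02-15"), (3, "Tom", "2018-01-01"),
    (1, "Rodney", "2018-02-15"), (3, "Tom", "2018-03-01"),
])
conn.executemany("INSERT INTO MEMBERSHIP VALUES (?, ?, ?)", [
    (1, "2017-01-01", "2017-12-31"), (1, "2018-02-15", "2018-05-20"),
    (2, "2016-06-20", "2018-01-25"), (2, "2018-02-25", "2099-12-31"),
    (3, "2018-01-01", "2018-06-01"),
])

# Service dates not covered by any of the member's periods.
not_exists = conn.execute("""
    SELECT * FROM CUSTOMER_SERVICE CS
    WHERE NOT EXISTS (
        SELECT * FROM MEMBERSHIP M
        WHERE CS.ID = M.ID
          AND DOS BETWEEN EFFECTIVE_DATE AND END_DATE)
    ORDER BY CS.ID, CS.DOS
""").fetchall()

# Same result via an anti-join: unmatched rows keep NULLs from MEMBERSHIP.
anti_join = conn.execute("""
    SELECT CS.* FROM CUSTOMER_SERVICE CS
    LEFT JOIN MEMBERSHIP M ON M.ID = CS.ID
        AND DOS BETWEEN EFFECTIVE_DATE AND END_DATE
    WHERE M.ID IS NULL
    ORDER BY CS.ID, CS.DOS
""").fetchall()

print(not_exists)  # [(1, 'Rodney', '2018-01-18'), (2, 'Jim', '2018-02-15')]
print(anti_join == not_exists)  # True
```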

Unable to remove duplicates in SQL Query with JOIN and DISTINCT

I have a sort of abstract question with a real world example. I'm attempting to run a query that has an issue with the tables I am joining.
In my first draft of the query, with a DISTINCT and only the one INNER JOIN I need, the sums are correct.
Those totals then need to be broken into 4 sub-totals depending on certain values. When I add the table that holds those values to my join or WHERE clause, each total is summed once for every matching value.
My Query:
SELECT DISTINCT SUM(CASE WHEN Tax_Records.TaxValue = '0.06' THEN Bill_Summary.NonSalesTax
WHEN Tax_Records.TaxValue = '0.065' THEN Bill_Summary.NonSalesTax
WHEN Tax_Records.TaxValue = '0.07' THEN Bill_Summary.NonSalesTax
WHEN Tax_Records.TaxValue = '0.075' THEN Bill_Summary.NonSalesTax ELSE 0.0 END)
AS 'UnTaxable Sales'
FROM Order_Records INNER JOIN Bill_Summary ON Order_Records.RowNum = Bill_Summary.OrderNumID
LEFT JOIN Tax_Records ON Order_Records.OZipCode = Tax_Records.tZipCode
WHERE Order_Records.Date Between 'DATE' And 'DATE'
AND Order_Records.cState = 'state'
GROUP BY Tax_Records.TaxValue
My query runs, but I get the wrong totals; if I remove the LEFT JOIN and its corresponding items in the SELECT statement, I get the correct totals.
The Tax_Records table has no relation to any other table in the database, so I know putting it in the JOIN will cause issues.
I changed my query to see why I'm getting the incorrect totals, and it's because it sums a value once for each case in my SELECT.
For instance, if a Bill_Summary row has a value of 5, it sums that 5 four times, once for each tax value. So I understand why it does that, but I want to know how I can add the information from the Tax_Records table to my query to derive the 4 values from my original correct totals.
I've tried different JOINS, embedded SELECTs, and CTE's but nothing works correctly.
EDIT: All this data comes from orders placed by customers.
What we want to see is the total value of tax collected for a certain state tax over a one-month period; for example, from March 1st to April 1st:
All the sales charged with a 6% tax rate equal $50.
All the sales charged with a 6.5% tax rate equal $65.
All the sales charged with a 7% tax rate equal $20.
All the sales charged with a 7.5% tax rate equal $15.
If I run a query without joining the Tax_Records table, I get my correct total of $145.
Now I want to show the total broken up into the 4 values above by matching the zip codes in the Order_Records table with the zip codes in the Tax_Records table.
When I do that, take the 7.5% value, where the sales total $15: one sale was $8 and another $7. If I join the Tax_Records table, the query reports $8 of tax collected under each of 6%, 6.5%, 7%, and 7.5%, and the same happens for the $7 order, so my total for 7.5% now shows $60 instead of $15.
You can try something like this:
select * from demo;
+------+-------+
| id | des |
+------+-------+
| 1 | afgg |
| 2 | aaaaa |
+------+-------+
select * from test;
+------+---------+
| id | name |
+------+---------+
| 2 | aaaaa |
| 1 | assdasa |
+------+---------+
select id as id,des as description,'' as id,'' as name from demo UNION select '' as id ,''as description,id as id,name as name from test;
+------+-------------+------+---------+
| id | description | id | name |
+------+-------------+------+---------+
| 1 | afgg | | |
| 2 | aaaaa | | |
| | | 2 | aaaaa |
| | | 1 | assdasa |
+------+-------------+------+---------+
4 rows in set (0.00 sec)
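The double counting described in the question can be reproduced with a small example: a join repeats each order row once per Tax_Records row that shares its zip code, so SUM ... GROUP BY TaxValue gives correct per-rate totals only when each zip code maps to exactly one rate. The sketch below uses Python's sqlite3 module; the table and column names come from the question, but the reduced column set and all rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Order_Records (RowNum INTEGER, OZipCode TEXT)")
conn.execute("CREATE TABLE Bill_Summary (OrderNumID INTEGER, NonSalesTax REAL)")
conn.execute("CREATE TABLE Tax_Records (tZipCode TEXT, TaxValue TEXT)")
conn.executemany("INSERT INTO Order_Records VALUES (?, ?)",
                 [(1, "11111"), (2, "11111"), (3, "22222")])
conn.executemany("INSERT INTO Bill_Summary VALUES (?, ?)",
                 [(1, 8.0), (2, 7.0), (3, 20.0)])
# One rate per zip code: the join does not multiply any order.
conn.executemany("INSERT INTO Tax_Records VALUES (?, ?)",
                 [("11111", "0.075"), ("22222", "0.07")])

PER_RATE = """
SELECT t.TaxValue, SUM(b.NonSalesTax)
FROM Order_Records o
JOIN Bill_Summary b ON o.RowNum = b.OrderNumID
JOIN Tax_Records t ON o.OZipCode = t.tZipCode
GROUP BY t.TaxValue
ORDER BY t.TaxValue
"""
one_rate_per_zip = conn.execute(PER_RATE).fetchall()
print(one_rate_per_zip)  # [('0.07', 20.0), ('0.075', 15.0)]

# A second rate for zip 11111 makes both of its orders appear under both rates.
conn.execute("INSERT INTO Tax_Records VALUES ('11111', '0.06')")
fanned_out = conn.execute(PER_RATE).fetchall()
print(fanned_out)  # [('0.06', 15.0), ('0.07', 20.0), ('0.075', 15.0)]
```

The grand total grows from 35 to 50 even though no order changed, which is the same symptom as each sale being counted under every tax rate.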

TSQL Finding Overlapping Hours

Given two tables:
Employee Table
EmpID Name
1 Jon
2 Smith
3 Dana
4 Nancy
Lab Table
EmpID StartTime EndTime Date LabID
1 10:00 AM 12:15 PM 01/JAN/2000 Lab I
1 11:00 AM 14:15 PM 01/JAN/2000 Lab II
1 16:30 PM 18:30 PM 01/JAN/2000 Lab I
2 10:00 AM 12:10 PM 01/JAN/2000 Lab I
From these details, I have to find the overlapping and non-overlapping hours of each employee on each date. (StartTime and EndTime are of type varchar.)
The expected output is
-------------------------------------------------------------------------------
EmpID| Name| Overlapping | Non-Overlapping | Date
Period Period
-------------------------------------------------------------------------------
1 Jon | 10:00 AM to 12:15 PM |16:30 PM to 18:30 PM | 01/JAN/2000
| AND | |
| 11:00 AM to 14:15 PM | |
| AND ...(If any) | |
--------------------------------------------------------------------------------
2 Smith| NULL | 10:00 AM to 12:10 PM |01/JAN/2000
--------------------------------------------------------------------------------
Please help me to bring such output using TSQL(SQL Server 2005/2008).
First, you should probably consider using a DateTime field to store the StartTime and EndTime, and thus make calculations easier, and remove the need for the Date field.
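SELECT t1.EmpID,
t1.StartTime,
t1.EndTime,
t2.StartTime,
t2.EndTime
FROM lab t1
LEFT OUTER JOIN lab t2
ON t2.StartTime BETWEEN t1.StartTime AND t1.EndTime
AND t2.EmpID = t1.EmpID
AND t2.[Date] = t1.[Date]
-- skip the row matching itself (assumes StartTime + LabID identify a session)
AND NOT (t2.StartTime = t1.StartTime AND t2.LabID = t1.LabID)
ORDER BY t1.EmpID,
t1.StartTime,
t2.StartTime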
That won't get you the EXACT format you have listed, but it's close. You should end up with:
| EmpID| Name| Normal Period | Overlapping Period |
------------------------------------------------------------
| 1 | Jon | 10:00 AM | 12:15 PM | 11:00 AM | 02:15 PM |
------------------------------------------------------------
| 2 | Smith | 10:00 AM | 12:10 PM | NULL | NULL |
------------------------------------------------------------
Each overlapped period within a normal period would show up in a new row, but any period with no overlaps would have only one row. You could easily concatenate the fields if you wanted specifically the "xx:xx xx to xx:xx xx" format. Hope this helps you some.
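The classification the question asks for can also be sketched outside SQL to make the logic explicit. The Python below (an illustration, not the answer's query) converts each time to minutes and marks a session as overlapping when it intersects any other session of the same employee on the same date. It assumes, as the sample data suggests ("14:15 PM", "16:30 PM"), that the hour is already on a 24-hour clock, so the AM/PM suffix is ignored; the function names are hypothetical, and employee names would come from joining the Employee table.

```python
from collections import defaultdict


def to_minutes(text):
    """Convert '16:30 PM' to 990 minutes after midnight.

    Assumption: the hour is already on a 24-hour clock, as in the sample
    data ('14:15 PM'), so the AM/PM suffix is ignored.
    """
    hours, minutes = text.split()[0].split(":")
    return int(hours) * 60 + int(minutes)


def classify_sessions(labs):
    """Group sessions by (EmpID, Date) and split them into overlapping / non-overlapping."""
    groups = defaultdict(list)
    for emp_id, start, end, date, _lab_id in labs:
        groups[(emp_id, date)].append((start, end))

    result = {}
    for key, sessions in groups.items():
        overlapping, non_overlapping = [], []
        for i, (start, end) in enumerate(sessions):
            s, e = to_minutes(start), to_minutes(end)
            # Two intervals overlap when each one starts before the other ends.
            overlaps = any(
                to_minutes(s2) < e and s < to_minutes(e2)
                for j, (s2, e2) in enumerate(sessions)
                if j != i
            )
            target = overlapping if overlaps else non_overlapping
            target.append(f"{start} to {end}")
        result[key] = (overlapping, non_overlapping)
    return result


labs = [
    (1, "10:00 AM", "12:15 PM", "01/JAN/2000", "Lab I"),
    (1, "11:00 AM", "14:15 PM", "01/JAN/2000", "Lab II"),
    (1, "16:30 PM", "18:30 PM", "01/JAN/2000", "Lab I"),
    (2, "10:00 AM", "12:10 PM", "01/JAN/2000", "Lab I"),
]
result = classify_sessions(labs)
for (emp_id, date), (ov, non_ov) in result.items():
    print(emp_id, date, " AND ".join(ov) or None, "|", " AND ".join(non_ov) or None)
```

Using strict inequalities means a session that ends exactly when another starts is not counted as overlapping; switch to <= if touching sessions should count.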
