KDB: Union of time intervals

Any nifty ways of converting a series of (possibly overlapping) time intervals into a set of disjoint time intervals covering the same times?
Example:
interval1:(07:00:00;08:00:00)
interval2:(07:30:00;08:30:00)
interval3:(10:00:00;11:00:00)
Desired output:
((07:00:00;08:30:00) ; (10:00:00;11:00:00))

In a table context you can do something like:
q)d:([]st:07:00:00 07:30:00 10:00:00; et:08:00:00 08:30:00 11:00:00)
q)d
st et
-----------------
07:00:00 08:00:00
07:30:00 08:30:00
10:00:00 11:00:00
q)distinct update st:?[st<prev et;prev st;st], et:et^?[et>next st;next et;et] from d
st et
-----------------
07:00:00 08:30:00
10:00:00 11:00:00
Not sure if I'd call it nifty, but it's decent!
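For comparison, here is a minimal sort-and-sweep sketch of the same interval union in Python rather than q. It is not part of the answer above; the function name `union_intervals` is made up for illustration. Because it keeps extending the last merged interval, it should also collapse chains of three or more overlapping intervals, which a single prev/next comparison may not fully handle.

```python
from datetime import time

def union_intervals(intervals):
    """Merge possibly overlapping (start, end) pairs into disjoint, sorted pairs."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the last merged interval: extend its end.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            # Starts after everything merged so far: open a new interval.
            merged.append((start, end))
    return merged

intervals = [
    (time(7, 0), time(8, 0)),
    (time(7, 30), time(8, 30)),
    (time(10, 0), time(11, 0)),
]
result = union_intervals(intervals)
for start, end in result:
    print(start, end)
```

Intervals that merely touch (one ends exactly where the next starts) are merged here; use `<` instead of `<=` if they should stay separate.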

Related

Netezza Convert UTC/GMT to Central with Daylight Savings Time

I am working in a Netezza database that stores time as GMT (or so I am told by our data engineers). I need to be able to convert this to Central Standard Time (CST) but accounting for daylight savings time. I found that I could use something like:
SELECT CURRENT_TIMESTAMP, CURRENT_TIMESTAMP AT TIME ZONE 'CST' AT TIME ZONE 'GMT'
However, when I run this SELECT (keep in mind, today is March 30, 2021, so CST should only be 5 hours different from GMT), I get a 6 hour difference. I looked up a reference to see what time zones are available in Netezza and I see a "CDT" which is 5 hours, and that works for the 5 hour difference, but this means in my query I would need to either change this each time DST switches over or do some sort of elaborate case statement to know which one to use depending on the date/time of year.
Is there an easy automated way to convert a GMT time to Central Standard Time accounting for daylight savings time? Thanks so much!!!
The question can be interpreted in one of two ways. In both cases, the solution is to determine the timezone to convert to, based on whether the timestamp falls between 2 AM on the 2nd Sunday of March and 2 AM on the 1st Sunday of November (for the US Central timezone).
The timestamps in your table need to be converted to CST or CDT based on the current time (when the query is being run):
this means that if the same query were run in February, the results would be different than if it's run now
the results would also differ depending on what the timezone of the Netezza system is set to
For example:
select
t as original,
-- extract year from current date and 2nd Sunday of March
-- use last_day to make sure we account for March 1 being a Sunday
(next_day(next_day(
last_day((date_part('years', current_date) || '-02-01'):: date),
'sun'),
'sun')|| ' 02:00:00'):: timestamp as dstart,
-- extract year from current date and 1st Sunday of Nov
-- use last_day to make sure we account for Nov 1 being a Sunday
(next_day(last_day(
(date_part('years', current_date) || '-10-01')::date),
'sun')|| ' 02:00:00'):: timestamp as dend,
case when current_timestamp between dstart
and dend then 'CDT' else 'CST' end as tz,
t at time zone tz as converted
from
tdata;
will produce
ORIGINAL | DSTART | DEND | TZ | CONVERTED
---------------------+---------------------+---------------------+-----+------------------------
2021-01-01 17:00:00 | 2021-03-14 02:00:00 | 2021-11-07 02:00:00 | CDT | 2021-01-01 12:00:00-05
2021-04-01 17:00:00 | 2021-03-14 02:00:00 | 2021-11-07 02:00:00 | CDT | 2021-04-01 12:00:00-05
2020-04-01 17:00:00 | 2021-03-14 02:00:00 | 2021-11-07 02:00:00 | CDT | 2020-04-01 12:00:00-05
2020-12-01 17:00:00 | 2021-03-14 02:00:00 | 2021-11-07 02:00:00 | CDT | 2020-12-01 12:00:00-05
(4 rows)
OR
The timestamps in your table need to be converted to CST or CDT depending on when daylight saving time started and ended in the year of each timestamp.
This is more deterministic:
select
t as original,
-- extract year from this timestamp and 2nd Sunday of March
-- use last_day to make sure we account for March 1 being a Sunday
(next_day(next_day(
last_day((date_part('years', t) || '-02-01'):: date), 'sun'),
'sun')|| ' 02:00:00'):: timestamp as dstart,
-- extract year from this timestamp and 1st Sunday of Nov
-- use last_day to make sure we account for Nov 1 being a Sunday
(next_day(last_day((date_part('years', t) || '-10-01')::date),
'sun')|| ' 02:00:00'):: timestamp as dend,
case when t between dstart
and dend then 'CDT' else 'CST' end as tz,
t at time zone tz as converted
from
tdata;
This will produce (tdata is a sample table w/ 4 timestamps)
ORIGINAL | DSTART | DEND | TZ | CONVERTED
---------------------+---------------------+---------------------+-----+------------------------
2021-01-01 17:00:00 | 2021-03-14 02:00:00 | 2021-11-07 02:00:00 | CST | 2021-01-01 11:00:00-06
2021-04-01 17:00:00 | 2021-03-14 02:00:00 | 2021-11-07 02:00:00 | CDT | 2021-04-01 12:00:00-05
2020-04-01 17:00:00 | 2020-03-08 02:00:00 | 2020-11-01 02:00:00 | CDT | 2020-04-01 12:00:00-05
2020-12-01 17:00:00 | 2020-03-08 02:00:00 | 2020-11-01 02:00:00 | CST | 2020-12-01 11:00:00-06
(4 rows)
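The boundary arithmetic in these queries (2nd Sunday of March, 1st Sunday of November, both at 02:00) can be sanity-checked outside the database. The following is a rough Python sketch, not Netezza code; the helper names `nth_sunday` and `central_abbreviation` are made up for illustration. Like the SQL, it compares the stored timestamp directly against the wall-clock boundaries, so values within a few hours of a switch-over may be classified imprecisely.

```python
from datetime import date, datetime, time, timedelta

def nth_sunday(year, month, n):
    """Return the date of the n-th Sunday (n >= 1) of the given month."""
    first = date(year, month, 1)
    # date.weekday(): Monday is 0, Sunday is 6.
    days_to_sunday = (6 - first.weekday()) % 7
    return first + timedelta(days=days_to_sunday + 7 * (n - 1))

def central_abbreviation(ts):
    """Pick 'CDT' or 'CST' from the DST window of the timestamp's own year."""
    dstart = datetime.combine(nth_sunday(ts.year, 3, 2), time(2))
    dend = datetime.combine(nth_sunday(ts.year, 11, 1), time(2))
    return "CDT" if dstart <= ts <= dend else "CST"

for ts in (datetime(2021, 1, 1, 17), datetime(2021, 4, 1, 17),
           datetime(2020, 4, 1, 17), datetime(2020, 12, 1, 17)):
    print(ts, central_abbreviation(ts))
```

For the four sample timestamps this gives the same CST/CDT labels as the TZ column in the second result set.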
system.admin(admin)=> select '2021-04-07 11:00:00' as gmt, timezone('2021-04-07 11:00:00' , 'GMT', 'America/New_York') as eastern, timezone('2021-04-07 11:00:00', 'GMT', 'America/Chicago') as central, timezone('2021-04-07 11:00:00', 'GMT', 'America/Los_Angeles') as pacific;
gmt | eastern | central | pacific
---------------------+---------------------+---------------------+---------------------
2021-04-07 11:00:00 | 2021-04-07 07:00:00 | 2021-04-07 06:00:00 | 2021-04-07 04:00:00
(1 row)
system.admin(admin)=> select '2021-03-07 11:00:00' as gmt, timezone('2021-03-07 11:00:00' , 'GMT', 'America/New_York') as eastern, timezone('2021-03-07 11:00:00', 'GMT', 'America/Chicago') as central, timezone('2021-03-07 11:00:00', 'GMT', 'America/Los_Angeles') as pacific;
gmt | eastern | central | pacific
---------------------+---------------------+---------------------+---------------------
2021-03-07 11:00:00 | 2021-03-07 06:00:00 | 2021-03-07 05:00:00 | 2021-03-07 03:00:00
(1 row)
If we use the region name 'America/Chicago' instead of the fixed abbreviations CDT and CST, as shown above, the conversion takes care of daylight savings automatically.

Postgresql Average and Group By

So I have a table with the columns id (INTEGER), temperature (REAL) and time (TIMESTAMP), and I want to compute the average temperature per hour, but I can't work it out.
For a sample with
temperature | time
-------------+---------------------
21.88 | 2018-06-01 07:30:00
23.21 | 2018-06-01 07:45:00
23.57 | 2018-06-01 08:15:00
24.91 | 2018-06-01 08:30:00
25.5 | 2018-06-01 08:45:00
25.98 | 2018-06-01 09:00:00
| 2018-06-01 09:30:00
24.45 | 2018-06-01 09:45:00
| 2018-06-01 10:00:00
And the request:
SELECT DISTINCT ON(DATE_PART('hour',time)) time, avg(temperature)
FROM Measure GROUP BY time , DATE_PART('hour',time);
I get:
time | avg
---------------------+------------------
2018-06-01 07:30:00 | 21.8799991607666
2018-06-01 08:15:00 | 23.5699996948242
2018-06-01 09:00:00 | 25.9799995422363
Something is happening here, but it's not an average...
Resolved:
Thanks to the comment, I get something correct with the query:
SELECT DISTINCT ON(DATE_TRUNC('hour',time)) DATE_TRUNC('hour',time), avg(temperature)
FROM Measure GROUP BY DATE_TRUNC('hour',time), DATE_TRUNC('day',time);
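To check what the hourly averages should be for the sample above, here is a small Python sketch of the same idea: truncate each timestamp to the hour, then average per bucket. It is an illustration only, not PostgreSQL; the name `hourly_average` is hypothetical, and missing temperatures are skipped the way `avg` skips NULLs.

```python
from datetime import datetime

rows = [
    (21.88, "2018-06-01 07:30:00"),
    (23.21, "2018-06-01 07:45:00"),
    (23.57, "2018-06-01 08:15:00"),
    (24.91, "2018-06-01 08:30:00"),
    (25.5, "2018-06-01 08:45:00"),
    (25.98, "2018-06-01 09:00:00"),
    (None, "2018-06-01 09:30:00"),
    (24.45, "2018-06-01 09:45:00"),
    (None, "2018-06-01 10:00:00"),
]

def hourly_average(rows):
    """Group readings by the hour of their timestamp and average each group."""
    buckets = {}
    for temperature, stamp in rows:
        parsed = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        hour = parsed.replace(minute=0, second=0)  # same idea as DATE_TRUNC('hour', ...)
        values = buckets.setdefault(hour, [])
        if temperature is not None:  # ignore missing readings
            values.append(temperature)
    # An hour with no usable readings gets None instead of an average.
    return {hour: (sum(v) / len(v) if v else None) for hour, v in sorted(buckets.items())}

averages = hourly_average(rows)
for hour, avg in averages.items():
    print(hour, avg)
```

The key difference from the first query is that the grouping key is the truncated hour, not the full timestamp, so several rows land in the same bucket.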

First TimeIn Last TimeOut for each employee for respective days

I have following table from which I want to extract the time calculated. I am looking to get the Hours Spent by each employee for each day.
CREATE TABLE Attendance
(
EmpID INT
, TimeIn datetime
, TimeOut datetime
)
The sample record against this table I have is listed below.
EmpID | AttendanceTimeIN | AttendanceTimeOut
1 2017-04-01 9:00:00 2017-04-01 10:20:00
2 2017-04-01 9:00:00 2017-04-01 12:30:00
1 2017-04-01 10:25:00 2017-04-01 17:30:00
2 2017-04-01 13:26:00 2017-04-01 14:50:00
2 2017-04-01 15:00:00 2017-04-01 18:00:00
1 2017-04-02 9:00:00 2017-04-02 11:00:00
1 2017-04-02 11:10:00 2017-04-02 12:00:00
2 2017-04-02 9:00:00 2017-04-02 12:00:00
1 2017-04-02 12:50:00 2017-04-02 18:00:00
2 2017-04-02 12:51:00 2017-04-02 18:00:00
I want to get the First TimeIn and Last TimeOut of an employee for each day to calculate how many hours a specific employee has spent in the office each day.
I'm a bit confused about how to use the Min/Max functions so I can get both employees' hours for each day.
The result set I am looking for should look like this.
EmpID | AttendanceTimeIN | AttendanceTimeOut
1 2017-04-01 9:00:00 2017-04-01 17:30:00
2 2017-04-01 9:00:00 2017-04-01 18:00:00
1 2017-04-02 9:00:00 2017-04-02 18:00:00
2 2017-04-02 9:00:00 2017-04-02 18:00:00
Any help would be highly appreciated.
Thank you
If your TimeIn and TimeOut are datetime type (which they should be!), this solution works with the tests I did:
SELECT
EmpID
, MIN(TimeIn)
, MAX(TimeOut)
FROM Attendance
GROUP BY EmpID, CAST(TimeIn AS DATE)
The GROUP BY clause means that there's one row for each employee and each day, since CASTing to DATE gets rid of the time part. MIN and MAX then just work within each group.
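The same grouping logic can be sketched in Python to make the "one row per employee per day" step explicit and to derive the hours spent. This is an illustration with the sample data from the question, not SQL Server code; `first_in_last_out` is a made-up name.

```python
from datetime import date, datetime

FMT = "%Y-%m-%d %H:%M:%S"
records = [
    (1, "2017-04-01 09:00:00", "2017-04-01 10:20:00"),
    (2, "2017-04-01 09:00:00", "2017-04-01 12:30:00"),
    (1, "2017-04-01 10:25:00", "2017-04-01 17:30:00"),
    (2, "2017-04-01 13:26:00", "2017-04-01 14:50:00"),
    (2, "2017-04-01 15:00:00", "2017-04-01 18:00:00"),
    (1, "2017-04-02 09:00:00", "2017-04-02 11:00:00"),
    (1, "2017-04-02 11:10:00", "2017-04-02 12:00:00"),
    (2, "2017-04-02 09:00:00", "2017-04-02 12:00:00"),
    (1, "2017-04-02 12:50:00", "2017-04-02 18:00:00"),
    (2, "2017-04-02 12:51:00", "2017-04-02 18:00:00"),
]

def first_in_last_out(records):
    """Return {(emp_id, day): (first_in, last_out)}, grouping on the date of TimeIn."""
    result = {}
    for emp_id, time_in, time_out in records:
        t_in = datetime.strptime(time_in, FMT)
        t_out = datetime.strptime(time_out, FMT)
        key = (emp_id, t_in.date())  # same role as CAST(TimeIn AS DATE)
        if key in result:
            first, last = result[key]
            result[key] = (min(first, t_in), max(last, t_out))
        else:
            result[key] = (t_in, t_out)
    return result

summary = first_in_last_out(records)
for (emp_id, day), (first, last) in sorted(summary.items(), key=lambda kv: (kv[0][1], kv[0][0])):
    hours = (last - first).total_seconds() / 3600
    print(emp_id, day, first.time(), last.time(), hours)
```

Note that this measures the span from first entry to last exit, so breaks in between are counted as time in the office.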

SQL Calculate hours over night with static date

I'm writing queries on a system someone else installed, so the tables cannot be changed here.
Problem:
I have a table where I've got date, timeIN and timeOUT.
Take the following records:
date | timeIN | timeOUT
-------------------------------------------------
2016-01-01 00:00:00.00 | 2000-01-01 07:00 | 2000-01-01 15:00 DATEDIFF = 8H
2016-01-02 00:00:00.00 | 2000-01-01 07:00 | 2000-01-01 15:00 DATEDIFF = 8H
2016-01-05 00:00:00.00 | 2000-01-01 23:00 | 2000-01-01 07:00 DATEDIFF = -16H
How can I get DATEDIFF = 8H from record number 3?
The problem here is that all timeIN and timeOUT stamps have the same dummy date.
You can use a CASE expression inside the DATEDIFF function:
SELECT
Diff =
DATEDIFF(
HOUR,
timeIn,
CASE
WHEN timeOut < timeIn THEN DATEADD(DAY, 1, timeOut)
ELSE timeOut
END
)
FROM tbl
This will add one day on timeOut if it's less than the timeIn.
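The same "roll the end over to the next day" rule can be written as a short Python sketch. It is only an illustration of the logic in the CASE expression; `shift_hours` is a hypothetical helper, and it assumes a shift never lasts 24 hours or more.

```python
from datetime import datetime, timedelta

def shift_hours(time_in, time_out):
    """Hours between time_in and time_out, treating an earlier time_out as next-day."""
    if time_out < time_in:
        # Both stamps share the same dummy date, so an earlier end means past midnight.
        time_out += timedelta(days=1)
    return (time_out - time_in).total_seconds() / 3600

day_shift = shift_hours(datetime(2000, 1, 1, 7, 0), datetime(2000, 1, 1, 15, 0))
night_shift = shift_hours(datetime(2000, 1, 1, 23, 0), datetime(2000, 1, 1, 7, 0))
print(day_shift, night_shift)
```

Both calls give 8 hours, which is the result wanted for record number 3.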

TSQL Finding Overlapping Hours

When two tables are given
Employee Table
EmpID Name
1 Jon
2 Smith
3 Dana
4 Nancy
Lab Table
EmpID StartTime EndTime Date LabID
1 10:00 AM 12:15 PM 01/JAN/2000 Lab I
1 11:00 AM 14:15 PM 01/JAN/2000 Lab II
1 16:30 PM 18:30 PM 01/JAN/2000 Lab I
2 10:00 AM 12:10 PM 01/JAN/2000 Lab I
From the given details, I have to find out the overlapping hours and non-overlapping hours of each employee on each date. (StartTime and EndTime are of type varchar.)
The expected output is
-------------------------------------------------------------------------------
EmpID| Name| Overlapping | Non-Overlapping | Date
Period Period
-------------------------------------------------------------------------------
1 Jon | 10:00 AM to 12:15 PM |16:30 PM to 18:30 PM | 01/JAN/2000
| AND | |
| 11:00 AM to 14:15 PM | |
| AND ...(If any) | |
--------------------------------------------------------------------------------
2 Smith| NULL | 10:00 AM to 12:10 PM |01/JAN/2000
--------------------------------------------------------------------------------
Please help me to bring such output using TSQL(SQL Server 2005/2008).
First, you should probably consider using a DateTime field to store the StartTime and EndTime, and thus make calculations easier, and remove the need for the Date field.
SELECT t1.EmpID,
t1.StartTime,
t1.EndTime,
t2.StartTime,
t2.EndTime
FROM lab t1
LEFT OUTER JOIN lab t2
ON t2.StartTime BETWEEN t1.StartTime AND t1.EndTime
AND t2.EmpID = t1.EmpID
ORDER BY t1.EmpID,
t1.StartTime,
t2.StartTime
That won't get you the EXACT format you have listed, but it's close. You should end up with:
| EmpID| Name| Normal Period | Overlapping Period |
------------------------------------------------------------
| 1 | Jon | 10:00 AM | 12:15 PM | 11:00 AM | 02:15 PM |
------------------------------------------------------------
| 2 | Smith | 10:00 AM | 12:10 PM | NULL | NULL |
------------------------------------------------------------
Each overlapped period within a normal period would show up in a new row, but any period with no overlaps would have only one row. You could easily concatenate the fields if you wanted specifically the "xx:xx xx to xx:xx xx" format. Hope this helps you some.
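As a cross-check on the overlap rule itself, here is a rough Python sketch that splits each employee's periods into overlapping and non-overlapping ones. It is not T-SQL and not the query above; the name `split_by_overlap` is made up, the times are rewritten as 24-hour strings (an assumption, since the sample mixes formats), and all rows are assumed to share one date. Two periods are treated as overlapping when each starts before the other ends.

```python
from datetime import datetime, time
from itertools import combinations

bookings = [
    (1, "10:00", "12:15", "Lab I"),
    (1, "11:00", "14:15", "Lab II"),
    (1, "16:30", "18:30", "Lab I"),
    (2, "10:00", "12:10", "Lab I"),
]

def parse(hhmm):
    """Turn an 'HH:MM' string into a comparable time value."""
    return datetime.strptime(hhmm, "%H:%M").time()

def split_by_overlap(bookings):
    """Return {emp_id: {'overlapping': [...], 'non_overlapping': [...]}}."""
    by_emp = {}
    for emp_id, start, end, lab in bookings:
        by_emp.setdefault(emp_id, []).append((parse(start), parse(end), lab))
    result = {}
    for emp_id, periods in by_emp.items():
        overlapping = set()
        # Compare every pair once; a pair overlaps if each starts before the other ends.
        for a, b in combinations(range(len(periods)), 2):
            if periods[a][0] < periods[b][1] and periods[b][0] < periods[a][1]:
                overlapping.update((a, b))
        result[emp_id] = {
            "overlapping": [periods[i] for i in sorted(overlapping)],
            "non_overlapping": [p for i, p in enumerate(periods) if i not in overlapping],
        }
    return result

report = split_by_overlap(bookings)
for emp_id, groups in report.items():
    print(emp_id, groups)
```

Comparing every pair is quadratic per employee, which is fine for a handful of bookings per day.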

Resources