Sum variable amount of intervals together

We just changed our telephony system, and every agent's activity is now logged in 15-minute intervals. We need one line per event.
table event:
empid | code | timestamp | duration
5111 | 5 | 09:45:00 | 45
5222 | 2 | 09:58:00 | 120
5111 | 5 | 10:00:00 | 900
5111 | 5 | 10:15:00 | 900
5111 | 5 | 10:15:30 | 30
5222 | 5 | 11:00:00 | 8
5222 | 5 | 11:00:05 | 5
The timestamp is written after the fact: a row stamped 09:45:00 with a duration of 45 started at 09:44:15. Because the interval ended at 09:45, the row was written at that time, but I need 09:44:15 as the saved timestamp.
The result should be:
empid | code | timestamp | duration
5111 | 5 | 09:44:15 | 1875
5222 | 2 | 09:56:00 | 120
5222 | 5 | 10:59:52 | 13
The problem is that the phones are locked with a maximum delay of 2 hours, so one event can span many rows, and as you can see with employee 5222, even 13 seconds can be split across two lines. I could join the same table 10 times, plus one extra join to exclude rows that continue an earlier row (same code, where the end time of the previous line equals the start time of the new line).
This is on MSSQL 2008.
Select e.empid
,e.code
,convert(time(0),DATEADD(ss,- e.Duration, e.timestamp))
,e.duration + isnull(e1.duration,0) + isnull(e2.duration,0)
from [event] e
left join [event] e0 on
convert(TIME(0),DATEADD(ss,- e.Duration, e.timestamp)) = e0.timestamp
and
e.empid = e0.empid
and
e.code = e0.code
left join [event] e1 on
convert(TIME(0),DATEADD(ss,- e1.Duration, e1.timestamp)) = e.timestamp
and
e.empid = e1.empid
and
e.code = e1.code
left join [event] e2 on
convert(TIME(0),DATEADD(ss,- e2.Duration, e2.timestamp)) = e1.timestamp
and
e2.empid = e1.empid
and
e2.code = e1.code
--etc......
where isnull(e0.duration,'-10') = '-10'
This works, but it is far from optimal.
I would rather use an aggregate function, but I don't know how to write it, as this table has no common key other than the previous row's timestamp matching the new row's timestamp minus its duration.
It is important to know that agent 5111 could go on code 5 again on the same day, and I would need 2 lines in that case. If not, it would have been too easy!
Thank you in advance!

Try this. I have commented the code, but the basic algorithm is:
1. Find rows which are continuations, i.e. there exists a row whose timestamp matches once you subtract the duration.
2. Find the "originals", i.e. the start of each call, by removing the continuations.
3. For each original, find the next original so we can determine a range of times in which to look for continuations.
4. Join it all together and add the total duration of the continuations that belong to each original.
Hope this helps, it was an interesting challenge!
declare @data table
(
empid int,
code int,
[timestamp] time,
duration int
);
insert into @data values(5111,5,'09:45',45),
(5222,2,'09:58',120),
(5111,5,'10:00',900),
(5111,5,'10:15',900),
(5111,5,'10:15:30',30),
(5222,5,'11:00',8),
(5222,5,'11:00:05',5),
-- added these rows to include the situation you describe where 5111 goes again on code 5:
(5111,5,'13:00',45),
(5111,5,'13:15',900),
(5111,5,'13:15:25',25);
-- find where a row is a continuation
with continuations as (
select a.empid, a.code, a.[timestamp] , a.duration
from @data a
inner join @data b on a.empid = b.empid
and a.code = b.code
where dateadd(ss, -a.duration, a.[timestamp]) = b.[timestamp]
),
-- find the "original" rows as the complement of continuations
originals as
(
select d.empid, d.code, d.[timestamp], d.duration
from @data d
left outer join continuations c on d.empid = c.empid and d.code = c.code and d.timestamp = c.timestamp
where c.empid is null
),
-- to handle the situation where we have more than one call for the same agent and code,
-- find the next original timestamp for each empid/code
nextcall as (
select a.*, a2.[timestamp] nex
from originals a
outer apply (
select top 1 [timestamp]
from originals a2
where a2.[timestamp] > a.[timestamp]
and a.empid = a2.empid
and a.code = a2.code
-- ascending, so top 1 is the closest later original
order by a2.[timestamp] asc
) a2
)
select o.empid,
o.code,
dateadd(ss, -o.duration, o.timestamp) as [timestamp],
o.duration + isnull(sum(c.duration),0) as duration
from originals o
left outer join nextcall n on o.empid = n.empid and o.code = n.code and o.[timestamp] = n.[timestamp]
left outer join continuations c on o.empid = c.empid
and o.code = c.code
-- filter the continuations on the range of times based on finding the next one
and c.[timestamp] > o.[timestamp]
and (n.nex is null or c.[timestamp] < n.nex)
group by o.empid,
o.code,
o.duration,
o.[timestamp]
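
The chaining rule itself (a row continues an event when its start, i.e. timestamp minus duration, equals the end of the previous row for the same empid and code) can be checked outside the database. The following is a minimal single-pass sketch in Python, not the query above: the function name `merge_events` and the tuple layout are made up for illustration, and it assumes the rows can be sorted by their end time.

```python
from datetime import datetime, timedelta

# Sample rows from the question: (empid, code, end time, duration in seconds).
ROWS = [
    (5111, 5, "09:45:00", 45),
    (5222, 2, "09:58:00", 120),
    (5111, 5, "10:00:00", 900),
    (5111, 5, "10:15:00", 900),
    (5111, 5, "10:15:30", 30),
    (5222, 5, "11:00:00", 8),
    (5222, 5, "11:00:05", 5),
]


def merge_events(rows):
    """Collapse chained interval rows into one (empid, code, start, total) per event."""
    merged = []
    open_events = {}  # (empid, code) -> [start, end, total duration]
    # Sort by end time so each continuation is seen after the row it extends.
    for empid, code, stamp, duration in sorted(rows, key=lambda r: r[2]):
        end = datetime.strptime(stamp, "%H:%M:%S")
        start = end - timedelta(seconds=duration)
        current = open_events.get((empid, code))
        if current is not None and current[1] == start:
            # Continuation: this row starts exactly where the previous one ended.
            current[1] = end
            current[2] += duration
        else:
            # New event: there is no predecessor, or there is a gap before this row.
            current = [start, end, duration]
            open_events[(empid, code)] = current
            merged.append((empid, code, current))
    return [
        (empid, code, ev[0].strftime("%H:%M:%S"), ev[2])
        for empid, code, ev in merged
    ]


result = merge_events(ROWS)
```

On the question's sample data, `result` holds the three rows the question asks for, and an agent who returns to the same code later in the day starts a new event because the later row's start does not match the earlier event's end.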

Related

How to repeat values in case of null values on left join

I have a table with a calendar and a table with rates. The rates table has no values for weekend days. I'm trying to join the two so that there is a rate for every day, with weekend days taking the latest available rate. Instead of showing NULL, as a left join does when the record doesn't exist, it should repeat the previous value.
I have the below code, which works, but it takes 2 min to do on 7,397 rows, which is way too long.
Does anyone know a faster way to get the same results?
SELECT
c.CalendarID,
MAX(r.RateID)
FROM Dim_Calendar c
LEFT JOIN Dim_Rates r ON r.RateDate <= c.CalendarID
GROUP BY c.CalendarID
What I get without <= and just an = is the following
CalendarID | RateID
20131001 | 2
20131002 | 3
20131003 | 4
20131004 | 5
20131005 | NULL
20131006 | NULL
20131007 | 6
And this is the desired table:
CalendarID | RateID
20131001 | 2
20131002 | 3
20131003 | 4
20131004 | 5
20131005 | 5
20131006 | 5
20131007 | 6

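You can use the LAG() window function. Looking back one and two rows covers gaps of up to two consecutive days, which is enough for weekends: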
SELECT c.CalendarID,
COALESCE(
r.RateID,
LAG(r.RateID, 1) OVER (ORDER BY c.CalendarID),
LAG(r.RateID, 2) OVER (ORDER BY c.CalendarID)
) RateID
FROM Dim_Calendar c LEFT JOIN Dim_Rates r
ON r.RateDate = c.CalendarID
ORDER BY c.CalendarID
Results:
CalendarID | RateID
20131001 | 2
20131002 | 3
20131003 | 4
20131004 | 5
20131005 | 5
20131006 | 5
20131007 | 6

You could use a correlated subquery to fill the gaps:
SELECT
c.CalendarID,
(SELECT TOP 1 r.RateID FROM Dim_Rates r
WHERE r.RateDate <= c.CalendarID AND r.RateID IS NOT NULL
ORDER BY r.RateDate DESC) AS RateID
FROM Dim_Calendar c
ORDER BY c.CalendarID;
This query can be improved by using the following index:
CREATE INDEX idx ON Dim_Rates (RateDate, RateID);

As pointed out, you need to check for proper, covering indexes. It appears you are running against a DW DB; if that is the case, you can replace the CTE with indexed temp tables if the estimated row count in the query plan is way off.
;WITH NormalizedData AS
(
SELECT
r.RateID, c.CalendarID,
-- COUNT ignores NULLs, so this running count only advances on days that have a rate;
-- each rate row and the gap rows that follow it share one group
VirtualGroupID = COUNT(r.RateID) OVER (ORDER BY c.CalendarID ROWS UNBOUNDED PRECEDING)
FROM
Dim_Calendar c
LEFT JOIN Dim_Rates r ON r.RateDate = c.CalendarID
)
SELECT
-- every group holds exactly one non-NULL RateID, so MAX returns it for the gap rows too
RateID = MAX(RateID) OVER(PARTITION BY VirtualGroupID),
CalendarID
FROM
NormalizedData
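
One common way to express "carry the last known rate forward" is to put each rate row in the same group as the gap rows that follow it, using a running count of non-NULL rates. The sketch below walks through that grouping in plain Python on the question's sample data; the variable names are illustrative only, and `None` stands in for SQL NULL.

```python
from itertools import accumulate

# Calendar rows after the LEFT JOIN: (CalendarID, RateID or None for missing days).
joined = [
    (20131001, 2), (20131002, 3), (20131003, 4), (20131004, 5),
    (20131005, None), (20131006, None), (20131007, 6),
]

# Running count of non-NULL rates: it only advances on days that have a rate,
# so a rate row and the gap rows after it end up with the same group number.
group_ids = list(accumulate(0 if rate is None else 1 for _, rate in joined))

# Each group holds exactly one real rate; remember it per group.
rate_by_group = {
    g: rate for g, (_, rate) in zip(group_ids, joined) if rate is not None
}

# Gap rows pick up the rate of their group; rows before the first rate stay None.
filled = [(day, rate_by_group.get(g)) for g, (day, _) in zip(group_ids, joined)]
```

Here 20131005 and 20131006 land in the same group as 20131004 and therefore receive rate 5, regardless of how long the gap is.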

using all values from one column in another query

I am trying to find a solution for the following issue that I have in sql-server:
I have one table t1 of which I want to use each date for each agency and loop it through the query to find out the avg_rate. Here is my table t1:
Table T1:
+--------+-------------+
| agency | end_date |
+--------+-------------+
| 1 | 2017-10-01 |
| 2 | 2018-01-01 |
| 3 | 2018-05-01 |
| 4 | 2012-01-01 |
| 5 | 2018-04-01 |
| 6 | 2017-12-01 |
+--------+-------------+
I literally want to use all values in the column end_date and plug it into the query here (I marked it with ** **):
with averages as (
select a.id as agency
,c.rate
, avg(c.rate) over (partition by a.id order by a.id ) as avg_cost
from table_a as a
join rates c on a.rate_id = c.id
and c.end_date = **here I use all values from t1.end_date**
and c.Start_date = **here I use all values from above minus half a year** = dateadd(month,-6,end_date)
group by a.id
,c.rate
)
select distinct agency, avg_cost from averages
order by 1
The reason why I need two dynamic dates is that the avg_rates vary if you change the timeframe between these dates.
My problem and my question is now:
How can I take each end_date from table t1, plug it into the query where c.end_date is, and loop through all values in t1.end_date?
I appreciate your help!

Do you really need a windowed average? Try this out.
;with timeRanges AS
(
SELECT
T.end_date,
start_date = dateadd(month,-6, T.end_date)
FROM
T1 AS T
)
select
a.id as agency,
c.rate,
T.end_date,
T.start_date,
avg_cost = avg(c.rate)
from
table_a as a
join rates c on a.rate_id = c.id
join timeRanges AS T ON A.DateColumn BETWEEN T.start_date AND T.end_date
group by
a.id ,
c.rate,
T.end_date,
T.start_date
You need a date column to join your data against T1 (I called it DateColumn in this example), otherwise all time ranges would return the same averages.
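
To see why the date column matters, here is a small Python sketch of the same idea: build a six-month window per T1 row and average only the rates that fall inside it. It assumes each T1 row applies to its own agency, and the `observations` list with its `rate_date` field is invented for illustration (it plays the role of the hypothetical DateColumn). The helper only handles first-of-month dates like those in T1.

```python
from datetime import date
from statistics import mean


def six_months_before(d):
    """Rough stand-in for DATEADD(month, -6, d); fine for first-of-month dates."""
    month_index = d.year * 12 + (d.month - 1) - 6
    return date(month_index // 12, month_index % 12 + 1, d.day)


# Two rows from T1: (agency, end_date).
t1 = [(1, date(2017, 10, 1)), (2, date(2018, 1, 1))]

# Hypothetical rate observations: (agency, rate_date, rate).
observations = [
    (1, date(2017, 5, 15), 10.0),
    (1, date(2017, 9, 1), 20.0),
    (1, date(2017, 2, 1), 99.0),  # more than six months before end_date, ignored
    (2, date(2017, 8, 1), 30.0),
    (2, date(2018, 2, 1), 99.0),  # after end_date, ignored
]

avg_cost = {}
for agency, end_date in t1:
    start_date = six_months_before(end_date)
    in_window = [
        rate for a, d, rate in observations
        if a == agency and start_date <= d <= end_date
    ]
    avg_cost[agency] = mean(in_window) if in_window else None
```

Without a date on each observation, every window would see the same set of rates, which is the point made above about all ranges returning the same average.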

I can think of several ways to do this - Cursor, StoredProcedure, Joins ...
Given the simplicity of your query, a cartesian product (Cross Join) of Table T1 against the averages CTE should do the magic.

SQL Min value not being selected

I have a select statement with a joining table, and I am attempting to select the first row of the joined table.
For example, dbo.Projects has many dbo.Buffers.
My query is:
SELECT PM.PROJECTID, PM.PROJECTNAME, BU.PERCENTPENWREALIGNED
FROM dbo.PROJECTMGRVIEW AS PM
JOIN dbo.S2M_BUFFER AS BU ON BU.PROJECTID = ( SELECT DISTINCT MIN(TASKUNIQUEID) FROM dbo.S2M_BUFFER WHERE PROJECTID = PM.PROJECTID )
WHERE PM.PROJECT_TYPE = 8 AND PM.CATEGORY = 'Engineering' ANd PM.PROJECTID = 244;
My result set is many rows:
PROJECTID | PROJECTNAME | PERCENTPENWREALIGNED
244 | PROJECT A | 100
244 | PROJECT A | 0
244 | PROJECT A | 0
244 | PROJECT A | 0
244 | PROJECT A | 0
244 | PROJECT A | 0
244 | PROJECT A | 0
Obviously in this case, I simply need the first row.

Your join condition doesn't make sense as written: it compares BU.PROJECTID with a TASKUNIQUEID value returned by the subquery. My guess is you want something like this:
SELECT PM.PROJECTID, PM.PROJECTNAME, BU.PERCENTPENWREALIGNED
FROM dbo.PROJECTMGRVIEW AS PM
CROSS APPLY (
select top 1 PERCENTPENWREALIGNED
from dbo.S2M_BUFFER BU
where BU.PROJECTID = PM.PROJECTID
order by TASKUNIQUEID ASC
) BU
WHERE PM.PROJECT_TYPE = 8 AND PM.CATEGORY = 'Engineering' AND PM.PROJECTID = 244;
This joins each PROJECTMGRVIEW row with the S2M_BUFFER row that has the smallest TASKUNIQUEID for that project.

I agree your join statement seems flawed due to a circular reference between the PROJECTID and the TASKUNIQUEID.
I think this might be more what you were trying to do:
SELECT PM.PROJECTID, PM.PROJECTNAME, BU.PERCENTPENWREALIGNED
FROM dbo.PROJECTMGRVIEW AS PM
JOIN dbo.S2M_BUFFER AS BU ON BU.TASKUNIQUEID =
( SELECT DISTINCT MIN(TASKUNIQUEID) FROM dbo.S2M_BUFFER WHERE PROJECTID = PM.PROJECTID )
WHERE PM.PROJECT_TYPE = 8 AND PM.CATEGORY = 'Engineering' ANd PM.PROJECTID = 244;
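
A quick way to check that joining on the minimum TASKUNIQUEID returns one row per project is to run the same shape of query against a throwaway in-memory SQLite database from Python. This is only a sketch: SQLite stands in for SQL Server, the table names `projects` and `buffers` and all sample rows are made up, and the extra PROJECTID condition in the join is an added guard in case TASKUNIQUEID is not unique across projects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE projects (PROJECTID INTEGER, PROJECTNAME TEXT);
    CREATE TABLE buffers (PROJECTID INTEGER, TASKUNIQUEID INTEGER,
                          PERCENTPENWREALIGNED INTEGER);
    INSERT INTO projects VALUES (244, 'PROJECT A'), (300, 'PROJECT B');
    INSERT INTO buffers VALUES (244, 12, 0), (244, 7, 100), (244, 9, 0),
                               (300, 4, 55), (300, 2, 80);
""")

# Join each project to the single buffer row with its smallest TASKUNIQUEID.
first_rows = conn.execute("""
    SELECT pm.PROJECTID, pm.PROJECTNAME, bu.PERCENTPENWREALIGNED
    FROM projects AS pm
    JOIN buffers AS bu
      ON bu.PROJECTID = pm.PROJECTID
     AND bu.TASKUNIQUEID = (SELECT MIN(TASKUNIQUEID)
                            FROM buffers
                            WHERE PROJECTID = pm.PROJECTID)
    ORDER BY pm.PROJECTID
""").fetchall()
```

Each project comes back exactly once, with the PERCENTPENWREALIGNED value of its lowest-numbered buffer row.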

Netezza: Show dates even if 0 data for that day

I have this query through an odbc connection in excel for a refreshable report with data for every 4 weeks. I need to show the dates in each of the 4 weeks even if there is no data for that day because this data is then linked to a Graph. Is there a way to do this?
thanks.
Select b.INV_DT, sum( a.ORD_QTY) as Ordered, sum( a.SHIPPED_QTY) as Shipped
from fct_dly_invoice_detail a, fct_dly_invoice_header b, dim_invoice_customer c
where a.INV_HDR_SK = b.INV_HDR_SK
and b.DIM_INV_CUST_SK = c.DIM_INV_CUST_SK
and a.SRC_SYS_CD = 'ABC'
and a.NDC_NBR is not null
and b.inv_dt between CURRENT_DATE - 16 and CURRENT_DATE
and b.store_nbr in (2851, 2963, 3249, 3385, 3447, 3591, 3727, 4065, 4102, 4289, 4376, 4793, 5209, 5266, 5312, 5453, 5569, 5575, 5892, 6534, 6571, 7110, 9057, 9262, 9652, 9742, 10373, 12392, 12739, 13870
)
group by 1

The general purpose solution to this is to create a date dimension table, and then perform an outer join to that date dimension table on the INV_DT column.
There are tons of good resources you can search for on creating a good date dimension table, so I'll just create a quick and dirty (and trivial) example here. I highly recommend some research in that area if you'll be doing a lot of BI/reporting.
If our table we want to report from looks like this:
Table "TABLEZ"
Attribute | Type | Modifier | Default Value
-----------+--------+----------+---------------
AMOUNT | BIGINT | |
INV_DT | DATE | |
Distributed on random: (round-robin)
select * from tablez order by inv_dt
AMOUNT | INV_DT
--------+------------
1 | 2015-04-04
1 | 2015-04-04
1 | 2015-04-06
1 | 2015-04-06
(4 rows)
and our report looks like this:
SELECT inv_dt,
SUM(amount)
FROM tablez
WHERE inv_dt BETWEEN CURRENT_DATE - 5 AND CURRENT_DATE
GROUP BY inv_dt;
INV_DT | SUM
------------+-----
2015-04-04 | 2
2015-04-06 | 2
(2 rows)
We can create a date dimension table that contains a row for every date (or at least 1024 days in the past and 1024 days in the future, using the _v_vector_idx view in this example).
create table date_dim (date_dt date);
insert into date_dim select current_date - idx from _v_vector_idx;
insert into date_dim select current_date + idx +1 from _v_vector_idx;
Then our query would look like this:
SELECT d.date_dt,
SUM(amount)
FROM tablez a
RIGHT OUTER JOIN date_dim d
ON a.inv_dt = d.date_dt
WHERE d.date_dt BETWEEN CURRENT_DATE -5 AND CURRENT_DATE
GROUP BY d.date_dt;
DATE_DT | SUM
------------+-----
2015-04-01 |
2015-04-02 |
2015-04-03 |
2015-04-04 | 2
2015-04-05 |
2015-04-06 | 2
(6 rows)
If you actually needed a zero value instead of a NULL for the days where you had no data, you could use a COALESCE or NVL like this:
SELECT d.date_dt,
COALESCE(SUM(amount),0)
FROM tablez a
RIGHT OUTER JOIN date_dim d
ON a.inv_dt = d.date_dt
WHERE d.date_dt BETWEEN CURRENT_DATE -5 AND CURRENT_DATE
GROUP BY d.date_dt;
DATE_DT | COALESCE
------------+----------
2015-04-01 | 0
2015-04-02 | 0
2015-04-03 | 0
2015-04-04 | 2
2015-04-05 | 0
2015-04-06 | 2
(6 rows)
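
The same date-dimension pattern can be tried without a Netezza instance. The sketch below rebuilds the TABLEZ example in an in-memory SQLite database from Python; it is an illustration under assumptions, not the Netezza setup itself: dates are stored as ISO text, the date list is generated in Python instead of from _v_vector_idx, and "today" is pinned to 2015-04-06 so the output is reproducible.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablez (amount INTEGER, inv_dt TEXT)")
conn.execute("CREATE TABLE date_dim (date_dt TEXT)")
conn.executemany(
    "INSERT INTO tablez VALUES (?, ?)",
    [(1, "2015-04-04"), (1, "2015-04-04"), (1, "2015-04-06"), (1, "2015-04-06")],
)

# Fixed "today" for a reproducible result; the answer uses CURRENT_DATE instead.
today = date(2015, 4, 6)
conn.executemany(
    "INSERT INTO date_dim VALUES (?)",
    [((today - timedelta(days=n)).isoformat(),) for n in range(6)],
)

# Start from the date dimension so every day appears, then zero-fill the gaps.
report = conn.execute("""
    SELECT d.date_dt, COALESCE(SUM(a.amount), 0)
    FROM date_dim AS d
    LEFT JOIN tablez AS a ON a.inv_dt = d.date_dt
    GROUP BY d.date_dt
    ORDER BY d.date_dt
""").fetchall()
```

Writing the join as date_dim LEFT JOIN tablez is equivalent to the RIGHT OUTER JOIN above; it just puts the table that must keep all its rows first.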

I agree with @ScottMcG that you need to get the list of dates. However, if you are in a situation where you aren't allowed to create a table, you can simplify things. All you need is a table that has at least 28 rows. Using your example, this should work.
select date_list.dt_nm, nvl(results.Ordered,0) as Ordered, nvl(results.Shipped,0) as Shipped
from
(select row_number() over(order by sub.arb_nbr)+ (current_date -28) as dt_nm
from (select rowid as arb_nbr
from fct_dly_invoice_detail b
limit 28) sub ) date_list left outer join
( Select b.INV_DT, sum( a.ORD_QTY) as Ordered, sum( a.SHIPPED_QTY) as Shipped
from fct_dly_invoice_detail a inner join
fct_dly_invoice_header b
on a.INV_HDR_SK = b.INV_HDR_SK
and a.SRC_SYS_CD = 'ABC'
and a.NDC_NBR is not null
and b.inv_dt between CURRENT_DATE - 16 and CURRENT_DATE
and b.store_nbr in (2851, 2963, 3249, 3385, 3447, 3591, 3727, 4065, 4102, 4289, 4376, 4793, 5209, 5266, 5312, 5453, 5569, 5575, 5892, 6534, 6571, 7110, 9057, 9262, 9652, 9742, 10373, 12392, 12739, 13870)
inner join
dim_invoice_customer c
on b.DIM_INV_CUST_SK = c.DIM_INV_CUST_SK
group by 1 ) results
on date_list.dt_nm = results.inv_dt

Get value with MAX(date) from two table

I have two tables.
MainTable:
MainID | LastValue | LastReadingDate
1 | 234 | 01.01.2012
2 | 534 | 03.02.2012
Readings:
MainID | ValueRead | ReadingDate
1 | 123 | 03.02.2012
1 | 488 | 04.03.2012
2 | 324 | 03.02.2012
2 | 683 | 05.04.2012
I want to get
SELECT MainTable.MainID, MainTable.LastValue, MainTable.LastReadingDate, (SELECT ValueRead, MAX(ReadingDate)
FROM Readings
WHERE Readings.MainID=MainTable.MainID ORDER BY ValueRead)
In other words, I want to get the current LastValue and LastReadingDate from MainTable along side the ValueRead with the most recent ReadingDate from Readings.

Here is a query you could use. It shows all MainTable entries, including those that don't have a "Reading" entry yet. Change the LEFT JOIN to an INNER JOIN if you don't want that.
WITH LastReads AS (
SELECT ROW_NUMBER() OVER (PARTITION BY MainID ORDER BY ReadingDate DESC) AS ReadingNumber,
MainID,
ValueRead,
ReadingDate
FROM Readings
)
SELECT M.MainID, M.LastValue, M.LastReadingDate, R.ValueRead, R.ReadingDate
FROM MainTable M
LEFT OUTER JOIN LastReads R
ON M.MainID = R.MainID
AND R.ReadingNumber = 1 -- Last reading, use 2 or 3 to get the 2nd newest, 3rd newest, etc.
SQLFiddle-link: http://sqlfiddle.com/#!3/16c68/3
Another link with N number of readings per mainid: http://sqlfiddle.com/#!3/16c68/4

Not tried this myself, but here goes. Please try
select max(r.readingdate), max(t.lastvalue), max(t.lastreadingdate)
from readings r inner join
( select MainID, LastValue, LastReadingDate
from MainTable m
where LastReadingDate =
(select max(minner.LastReadingDate)
from MainTable minner
where minner.MainID = m.MainID
)
) t
on (r.mainid = t.mainid)

try this:
select M.LastValue, M.LastReadingDate,
(select top 1 ValueRead from Readings where MainID=M.MainID order by ReadingDate desc)
from MainTable M
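
The correlated-subquery form can be tried against an in-memory SQLite database from Python, where LIMIT 1 plays the role of TOP 1. This is a sketch under assumptions: SQLite stands in for SQL Server, and the question's dates are read as dd.mm.yyyy and stored as ISO text so that ordering by ReadingDate is chronological.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MainTable (MainID INTEGER, LastValue INTEGER, LastReadingDate TEXT);
    CREATE TABLE Readings (MainID INTEGER, ValueRead INTEGER, ReadingDate TEXT);
    INSERT INTO MainTable VALUES (1, 234, '2012-01-01'), (2, 534, '2012-02-03');
    INSERT INTO Readings VALUES (1, 123, '2012-02-03'), (1, 488, '2012-03-04'),
                                (2, 324, '2012-02-03'), (2, 683, '2012-04-05');
""")

# For each MainTable row, pick the ValueRead of the most recent reading.
latest = conn.execute("""
    SELECT M.MainID, M.LastValue, M.LastReadingDate,
           (SELECT R.ValueRead
            FROM Readings AS R
            WHERE R.MainID = M.MainID
            ORDER BY R.ReadingDate DESC
            LIMIT 1) AS ValueRead
    FROM MainTable AS M
    ORDER BY M.MainID
""").fetchall()
```

A MainTable row with no readings would still appear, with NULL (None in Python) in the ValueRead column.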
