SQL Server: use function after certain value in table - sql-server

I am trying to find the time difference between two certain points in stored conversations. These points can differ in each conversation which makes it difficult for me. I need the time difference between the Agent's message and the first EndUser response after it.
In the example in CaseNr 1234 below I need the time difference between MessageNrs 3&4, 5&6 and 7&8.
In CaseNr 2345 I need the time difference between MessageNrs 3&4, 5&6, 7&8 and 10&11.
In CaseNr 4567 I need the time difference between 2&3 and 4&5.
As shown, the order of Agent and EndUser messages can differ in each conversation, as can the positions these types appear in.
Is there a way to calculate the time difference the way I have described it in SQL Server?

I think this code should help you.
with t(MessageNr,CaseNr,Type, AgentTime, EndUserTime) as
(
select
t1.MessageNr,
t1.CaseNr,
t1.Type,
t1.EntryTime,
(select top 1 t2.EntryTime
from [Your_Table] as t2
where t1.CaseNr = t2.CaseNr
and t2.[Type] = 'EndUser'
and t1.EntryTime < t2.EntryTime
order by t2.EntryTime) as userTime
from [Your_Table] as t1
where t1.[Type] = 'Agent'
)
select t.*, DATEDIFF(second, AgentTime, EndUserTime)
from t;

It appears the logic you require is the time difference between an Agent row and the immediately following EndUser row.
You can do this with LEAD, which usually performs better than self-joins.
SELECT *,
DATEDIFF(second, t.EntryTime, t.NextTime) TimeDifference
FROM (
SELECT *,
-- NextTime is the next row's EntryTime if that row is an EndUser message, otherwise NULL
LEAD(CASE WHEN t.[Type] = 'EndUser' THEN t.EntryTime END)
OVER (PARTITION BY t.CaseNr ORDER BY t.MessageNr) NextTime
FROM myTable t
) t
WHERE t.[Type] = 'Agent'
AND t.NextTime IS NOT NULL
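To sanity-check the "Agent row immediately followed by an EndUser row" logic on a few rows, here is a minimal sketch that runs the same LEAD pattern in SQLite through Python. The sample rows are hypothetical (the question's table is not reproduced here), and SQLite has no DATEDIFF, so strftime('%s', ...) stands in for it.

```python
import sqlite3

# Hypothetical sample rows: (MessageNr, CaseNr, Type, EntryTime)
rows = [
    (1, 1234, "EndUser", "2020-01-01 10:00:00"),
    (2, 1234, "Agent",   "2020-01-01 10:01:00"),
    (3, 1234, "Agent",   "2020-01-01 10:02:00"),
    (4, 1234, "EndUser", "2020-01-01 10:05:30"),
    (5, 1234, "Agent",   "2020-01-01 10:06:00"),
    (6, 1234, "EndUser", "2020-01-01 10:06:45"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE myTable (MessageNr INTEGER, CaseNr INTEGER, Type TEXT, EntryTime TEXT)"
)
conn.executemany("INSERT INTO myTable VALUES (?, ?, ?, ?)", rows)

query = """
SELECT CaseNr, MessageNr,
       CAST(strftime('%s', NextTime) AS INTEGER)
     - CAST(strftime('%s', EntryTime) AS INTEGER) AS TimeDifference
FROM (
    SELECT *,
           -- next row's EntryTime, but only if that row is an EndUser message
           LEAD(CASE WHEN Type = 'EndUser' THEN EntryTime END)
               OVER (PARTITION BY CaseNr ORDER BY MessageNr) AS NextTime
    FROM myTable
) AS t
WHERE Type = 'Agent' AND NextTime IS NOT NULL
ORDER BY CaseNr, MessageNr
"""
result = conn.execute(query).fetchall()
print(result)
```

With these rows only MessageNrs 3 and 5 are returned: message 2 is an Agent row followed by another Agent row, so it is filtered out.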

Related

Calculating Days Between Dates in Separate Rows For Same UnitID

I am trying to calculate the time a commercial real estate space sits vacant. I have move-in & move-out dates for each tenant that has occupied that unit. It is easy to calculate the occupied time of each tenant as that data is within the same row. However, I want to calculate the vacant time: the time between move-out of the previous tenant and move-in of the next tenant. These dates appear in separate rows.
Here is a sample of what I have currently:
SELECT
uni_vch_UnitNo AS UnitNumber,
uty_vch_Code AS UnitCode,
uty_int_Id AS UnitID, tul_int_FacilityId AS FacilityID,
tul_dtm_MoveInDate AS Move_In_Date,
tul_dtm_MoveOutDate AS Move_Out_Date,
DATEDIFF(day, tul_dtm_MoveInDate, tul_dtm_MoveOutDate) AS Occupancy_Days
FROM TenantUnitLeases
JOIN units
ON tul_int_UnitId = uni_int_UnitId
JOIN UnitTypes
ON uni_int_UnitTypeId = uty_int_Id
WHERE
tul_int_UnitId = '26490'
ORDER BY tul_dtm_MoveInDate ASC
Is there a way to assign an id to each row in chronological, sequential order and find the difference between row 2 move-in date less row 1 move-out date and so on?
Thank you in advance for the help.
I can't really tell which tables provide which columns for your query. Please alias and dot-qualify them in the future.
If you're using SQL Server 2012 or later, you've got the LEAD and LAG functions, which do exactly what you want: bring a "leading" or "lagging" row into the current row. See if this works (hopefully it should at least get you started):
SELECT
uni_vch_UnitNo AS UnitNumber,
uty_vch_Code AS UnitCode,
uty_int_Id AS UnitID, tul_int_FacilityId AS FacilityID,
tul_dtm_MoveInDate AS Move_In_Date,
tul_dtm_MoveOutDate AS Move_Out_Date,
DATEDIFF(day, tul_dtm_MoveInDate, tul_dtm_MoveOutDate) AS Occupancy_Days
, LAG(tul_dtm_MoveOutDate) over (partition by uni_vch_UnitNo order by tul_dtm_MoveOutDate) as Previous_Move_Out_Date
, DATEDIFF(day,LAG(tul_dtm_MoveOutDate) over (partition by uni_vch_UnitNo order by tul_dtm_MoveOutDate),tul_dtm_MoveInDate) as Days_Vacant
FROM TenantUnitLeases
JOIN units
ON tul_int_UnitId = uni_int_UnitId
JOIN UnitTypes
ON uni_int_UnitTypeId = uty_int_Id
WHERE
tul_int_UnitId = '26490'
ORDER BY tul_dtm_MoveInDate ASC
Just comparing a value from the current row with a value in the previous row is functionality provided by the lag() function.
Try this in your query:
select...
tul_dtm_MoveInDate AS Move_In_Date,
tul_dtm_MoveOutDate AS Move_Out_Date,
DateDiff(day, Lag(tul_dtm_MoveOutDate,1) over(partition by uty_vch_Code, tul_int_FacilityId order by tul_dtm_MoveInDate), tul_dtm_MoveInDate) DaysVacant,
...
This needs a window function or correlated sub query. The goal is to provide the previous move out date for each row, which is in turn a function of that row. The term 'window' in this context means to apply an aggregate function over a smaller range than the whole set.
If you had a function called GetPreviousMoveOutDate, the parameters would be the key to filter on, and the ranges to search within the filter. So we would pass the UnitID as the key and the MoveInDate for this row, and the function should return the most recent MoveOutDate for the same unit that is before the passed in date. By getting the max date before this one, we will ensure we get only the previous occupancy if it exists.
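As an illustration only, the behaviour described for GetPreviousMoveOutDate can be sketched in Python. The lease rows, the tuple layout and the function signature are all hypothetical; the point is just "latest move-out for the same unit that is before this move-in".

```python
from datetime import date

# Hypothetical lease rows for one unit: (unit_id, move_in, move_out)
leases = [
    (26490, date(2015, 1, 1), date(2015, 6, 30)),
    (26490, date(2015, 8, 1), date(2016, 1, 31)),
    (26490, date(2016, 2, 15), None),  # current tenant, no move-out yet
]

def get_previous_move_out_date(leases, unit_id, move_in):
    """Return the latest move-out for unit_id that falls before move_in, or None."""
    candidates = [
        out for uid, _, out in leases
        if uid == unit_id and out is not None and out < move_in
    ]
    return max(candidates) if candidates else None

# Vacancy = this row's move-in minus the previous move-out (None for the first lease).
vacancy_days = []
for unit_id, move_in, _ in leases:
    prev_out = get_previous_move_out_date(leases, unit_id, move_in)
    vacancy_days.append(None if prev_out is None else (move_in - prev_out).days)

print(vacancy_days)
```

The first lease has no earlier move-out, which is the case a default "first available" date would cover.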
To use a sub-query in ANSI-SQL you just add the select as a column. This should work on MS-SQL as well as other DB platforms; however, it requires using aliases for the table names so they can be referenced in the query more than once. I've updated your sample SQL with aliases using the AS syntax, although it looks redundant to your table naming convention. I added a uni_dtm_UnitFirstAvailableDate to your units table to handle the first vacancy, but this can be a default:
SELECT
uni.uni_vch_UnitNo AS UnitNumber,
uty.uty_vch_Code AS UnitCode,
uty.uty_int_Id AS UnitID, tul_int_FacilityId AS FacilityID,
tul.tul_dtm_MoveInDate AS Move_In_Date,
tul.tul_dtm_MoveOutDate AS Move_Out_Date,
DATEDIFF(day, tul.tul_dtm_MoveInDate, tul.tul_dtm_MoveOutDate) AS Occupancy_Days,
-- select the date (latest move-out before this row's move-in):
(SELECT MAX (prev_tul.tul_dtm_MoveOutDate )
FROM TenantUnitLeases AS prev_tul
WHERE prev_tul.tul_int_UnitId = tul.tul_int_UnitId
AND prev_tul.tul_dtm_MoveOutDate < tul.tul_dtm_MoveInDate
AND prev_tul.tul_dtm_MoveOutDate is not null
) AS previous_moveout,
-- use the date in a function (previous move-out first, so the result is positive):
DATEDIFF(day,
ISNULL(
(SELECT MAX (prev_tul.tul_dtm_MoveOutDate )
FROM TenantUnitLeases AS prev_tul
WHERE prev_tul.tul_int_UnitId = tul.tul_int_UnitId
AND prev_tul.tul_dtm_MoveOutDate < tul.tul_dtm_MoveInDate
AND prev_tul.tul_dtm_MoveOutDate is not null
) , uni.uni_dtm_UnitFirstAvailableDate), -- handle first occupancy
tul.tul_dtm_MoveInDate
) AS Vacancy_Days
FROM TenantUnitLeases AS tul
JOIN units AS uni
ON tul.tul_int_UnitId = uni.uni_int_UnitId
JOIN UnitTypes AS uty
ON uni.uni_int_UnitTypeId = uty.uty_int_Id
WHERE
tul.tul_int_UnitId = '26490'
ORDER BY tul.tul_dtm_MoveInDate ASC

While loop for Teradata

I'm trying to find a way to compile multiple events together. The data looks like this:
Basically, it is a series of rows of event-log data.
I want to generate an aggregation of these row events such that if a new event occurred within 30 seconds of the other ending, it combines the time together. However, if the event log does not have an abutting event, then it is not captured. And these events are 'person' specific.
I envision the output to look something like this:
My intuition suggests using some kind of while loop, but I'm not sure where to start.
There's no need for recursion (which is very hard to write) or a loop over a cursor.
SELECT
Person,
Min(Start_timestamp),
Max(Stop_timestamp),
-- get a concatenated string
Trim(Trailing ',' FROM (XmlAgg(Reason || ',' ORDER BY Reason ) (VARCHAR(1000))))
FROM
(
SELECT Person, Start_timestamp, Stop_timestamp, Reason,
-- assign the same number to all rows within 30 seconds
Sum(flag)
Over (PARTITION BY Person
ORDER BY Start_timestamp
ROWS Unbounded Preceding) AS grp
FROM
(
SELECT Person, Start_timestamp, Stop_timestamp, Reason,
-- flag = 0 if the previous end is within 30 seconds of the current start,
-- flag = 1 if a new group starts (gap > 30 seconds, or no previous row)
CASE WHEN Lag(Stop_timestamp)
Over (PARTITION BY Person
ORDER BY Start_timestamp) + INTERVAL '30' SECOND >= Start_timestamp
THEN 0
ELSE 1
END AS flag
FROM tab
) AS dt
) AS dt
-- aggregate per person and group
GROUP BY Person, grp
If your Teradata version supports SESSIONIZE you can simplify the group calculation, but I couldn't write this syntax ad hoc :-)
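The flag-and-running-sum idea (mark each row that starts a new group, then number the groups by accumulating the marks) can be checked outside the database with a small Python sketch. The sample rows are hypothetical, and a single pass over rows sorted by person and start time plays the role of the two window functions.

```python
from datetime import datetime, timedelta

def group_events(events, gap=timedelta(seconds=30)):
    """Merge each person's events whose start is within `gap` of the previous stop."""
    groups = []
    prev_person = prev_stop = None
    for person, start, stop, reason in sorted(events, key=lambda e: (e[0], e[1])):
        # Equivalent of flag = 1: first row of a person, or a gap longer than `gap`.
        if person != prev_person or prev_stop + gap < start:
            groups.append({"person": person, "start": start, "stop": stop,
                           "reasons": [reason]})
        else:
            # Equivalent of flag = 0: the row stays in the current group.
            groups[-1]["stop"] = max(groups[-1]["stop"], stop)
            groups[-1]["reasons"].append(reason)
        prev_person, prev_stop = person, stop
    return groups

# Hypothetical sample rows: (Person, Start_timestamp, Stop_timestamp, Reason)
t = datetime(2020, 1, 1, 9, 0, 0)
events = [
    ("A", t, t + timedelta(minutes=5), "x"),
    ("A", t + timedelta(minutes=5, seconds=20), t + timedelta(minutes=10), "y"),
    ("A", t + timedelta(hours=1), t + timedelta(hours=1, minutes=1), "z"),
    ("B", t, t + timedelta(minutes=2), "x"),
]

groups = group_events(events)
for g in groups:
    print(g["person"], g["start"], g["stop"], ",".join(sorted(g["reasons"])))
```

Groups that contain a single event are still returned here; if events without an abutting neighbour should be dropped, filter on the number of reasons per group.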
You can achieve this using a recursive CTE:
WITH RECURSIVE MYREC(Person,Start_timestamp,Stop_timestamp ,Reason,LVL)
AS(
SELECT Person,MIN(Start_timestamp),MAX(Stop_timestamp),MIN(Reason)(varchar(100)) AS Reason,1
FROM MYTABLE
GROUP BY 1
UNION ALL
SELECT b.Person,b.Start_timestamp,b.Stop_timestamp ,trim(a.Reason) || ',' || trim( b.Reason), LVL+1
FROM MYTABLE a INNER JOIN MYREC b
ON a.Person = b.Person
AND a.Reason > b.Reason
)
SELECT Person,Start_timestamp,Stop_timestamp,Reason
FROM MYREC
QUALIFY RANK() OVER(PARTITION BY Person ORDER BY Reason DESC) = 1
Change MYTABLE to your tablename

Transaction data aggregate

As a disclaimer, I am not entirely sure the title of the question is best, if not I apologize.
I am trying to calculate cycle times for individuals, but files are occasionally transferred out of their work queues and eventually back. There are no unique transaction IDs recorded, just a date and time stamp.
I looked for an aggregate group-by function that would do this and was told that SQL Server does not have such a feature.
I started by trying to identify the first and last transaction and was going to build out the query from there, but it wasn't too helpful. Any insight would be very helpful.
changeDate is when the transfer from one person to another is recorded (year, month, day, time).
select a.claimId,
a.claimincidentID,
cast(a.changeDate as date) changedate,
a.claimNum,
a.Coverage,
a.AssignedAdjID,
a.AssignedAdj,
a.AssignedUnit,
a.TransferedAdjID,
a.TransferedAdj,
a.TransferedUnit,
a.usertypeid,
a.ChangedBy,
b.Feature_Create_Date,
DATEDIFF(day, b.Feature_Create_Date, a.changedate) transfer1,
cast(FIRST_VALUE(changeDate) OVER (ORDER BY changedate ASC)as date) AS firstchangedate,
cast(LAST_VALUE(changeDate) OVER (ORDER BY a.changedate ASC)as date) AS lastchangedate
from DB1.dbo.Assign_Transfer a
left join DB2.claimslist b on a.claimid=b.claimId
group by a.claimId, a.claimincidentID, a.changeDate, a.claimNum, a.Coverage, a.AssignedAdjID, a.AssignedAdj, a.AssignedUnit, a.TransferedAdjID, a.TransferedAdj, a.TransferedUnit, a.usertypeid, a.ChangedBy, b.Feature_Create_Date
Think of each of these rows as a Start (because the most recent one hasn't ended)
We would need to generate the complement End for this person in the chain.
Then with pairs of Start/End one could create GrossDuration.
Even after we get an assignment's start and end date/time,
we will have workday (8-4, or 9-5, or noon-8, ...) considerations,
also Sat/Sun/Hol and Vacation/out-of-office.
All of these affect Duration, and differently for each person,
so they would need to be factored by workday etc. into AdjDuration.
Let's say we can sequence these:
Row_Number() Over (Partition by claimID Order by changeDate) as tfrNum
Assigned is the prior, and Transfered is the next
1, 2, 3, ... thru N
V
a.changeDate -- NOW()
V V
a.AssignedAdjID, | a.TransferedAdjID,
a.AssignedAdj, | a.TransferedAdj,
a.AssignedUnit, | a.TransferedUnit,
|
a.usertypeid,
a.ChangedBy,
So, is tfrNum=1 or tfrNum=N the oddball?
Let's look at pairs: each pair goes StartFrom->EndTo
1-2, 2-3, 3-4, 4-5, 5-6, 6-Now
----
From row1 we get TransferredID Start(changeDate) and
from row2 we get AssignedAdjID End (changeDate)
-- 2-3, 3-4, 4-5, etc repeating
--except for
From row6 we get TransferredID Start(changeDate) and
from default (still them) End (Now)
-- -- except again when TransferredUnit is "Closed"
After getting these pairs and their Start and End, we can do the Duration calc.
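A minimal Python sketch of that pairing, under simplifying assumptions: the rows are hypothetical, they belong to one claim and are already ordered by changeDate, and workday, weekend and "Closed" handling are ignored. Row i supplies the start for the person the file was transferred to, row i+1 supplies the end, and the last row ends at "now".

```python
from datetime import datetime

# Hypothetical transfer rows for one claim, ordered by changeDate:
# (changeDate, AssignedAdj, TransferedAdj)
transfers = [
    (datetime(2020, 1, 1, 9), "Ann", "Bob"),
    (datetime(2020, 1, 4, 9), "Bob", "Cid"),
    (datetime(2020, 1, 10, 9), "Cid", "Bob"),
]

def gross_durations(transfers, now):
    """Pair row i (start) with row i+1 (end); the last row ends at `now`."""
    ends = [row[0] for row in transfers[1:]] + [now]
    return [
        (to_adj, start, end, (end - start).days)
        for (start, _from_adj, to_adj), end in zip(transfers, ends)
    ]

pairs = gross_durations(transfers, now=datetime(2020, 1, 15, 9))
for holder, start, end, days in pairs:
    print(holder, start, end, days)
```

Each tuple is (holder, start, end, gross days); an adjusted duration would replace the plain day difference with a per-person working-time calculation.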
I need to visualize this problem before I try to run some SQL. Real data would help.
Let's start with this; I can expand on it later, after you get it working and we look at some data:
With cte_tfrNum (claimID, changeDate, tfrNum, tfrMax) AS
(
SELECT
a.claimId
,a.changeDate
,ROW_NUMBER() Over ( Partition By a.claimId Order By a.changeDate) as tfrNum
,b.tfrMax
FROM DB1.dbo.Assign_Transfer a
-- just for giggles, lets also get the max# of transfers for this claim
Left Join
(SELECT claimId, COUNT(*) as tfrMax
FROM DB1.dbo.Assign_Transfer
Group By claimId
) as b
On b.claimId = a.claimId
)
-- Statement using the CTE
Select
tfrTo.*
From cte_tfrNum as tfrTo
Thank you! I was able to take what you gave me and add a few things to be able to look at what I needed.
select
case when abc.tfrMax > abc.tfrnum then datediff(day,lag(abc.changedate) over(partition by abc.claimID order by abc.changeDate),abc.changeDate)
when abc.tfrMax = abc.tfrnum then datediff(day,lag(abc.changedate) over(partition by abc.claimID order by abc.changeDate),abc.changeDate)
end as test
, abc.*
from
(
SELECT
a.claimId
,a.changeDate
,a.AssignedAdj
,a.TransferedAdj
,a.Coverage
,ROW_NUMBER() Over ( Partition By a.claimId Order By a.changeDate) as tfrNum
,b.tfrMax
FROM db1.dbo.Assign_Transfer a
Left Join
(SELECT claimId, COUNT(*) as tfrMax
FROM db1.dbo.Assign_Transfer
Group By claimId
) as b
On b.claimId = a.claimId
) abc
group by
abc.claimId
,abc.changeDate
,abc.AssignedAdj
,abc.TransferedAdj
,abc.Coverage
,abc.tfrMax
,abc.tfrNum

Flatten/merge overlapping time intervals

I have a 'Service' table with millions of rows. Each row corresponds to a service provided by a staff in a given date and time interval (Each row has a unique ID). There are cases where a staff might provide services in overlapping time frames. I need to write a query that merges overlapping time intervals and returns the data in the format shown below.
I tried grouping by StaffID and Date fields and getting the Min of BeginTime and Max of EndTime but that does not account for the non-overlapping time frames. How can I accomplish this? Again, the table contains several million records so a recursive CTE approach might have performance issues. Thanks in advance.
Service Table
ID  StaffID  Date        BeginTime  EndTime
1   101      2014-01-01  08:00      09:00
2   101      2014-01-01  08:30      09:30
3   101      2014-01-01  18:00      20:30
4   101      2014-01-01  19:00      21:00
Output
StaffID  Date        BeginTime  EndTime
101      2014-01-01  08:00      09:30
101      2014-01-01  18:00      21:00
Here is another sample data set with a query proposed by a contributor.
http://sqlfiddle.com/#!6/bfbdc/3
The first two rows in the results set should be merged into one row (06:00-08:45) but it generates two rows (06:00-08:30 & 06:00-08:45)
I only came up with a CTE query as the problem is there may be a chain of overlapping times, e.g. record 1 overlaps with record 2, record 2 with record 3 and so on. This is hard to resolve without CTE or some other kind of loops, etc. Please give it a go anyway.
The first part of the CTE query gets the services that start a new group and do not have the same starting time as some other service (I need to have just one record that starts a group). The second part gets those that start a group where there is more than one with the same start time; again, I need just one of them. The last part recursively builds on the starting group, taking all overlapping services.
Here is SQLFiddle with more records added to demonstrate different kinds of overlapping and duplicate times.
I couldn't use ServiceID as it would have to be ordered in the same way as BeginTime.
;with flat as
(
select StaffID, ServiceDate, BeginTime, EndTime, BeginTime as groupid
from services S1
where not exists (select * from services S2
where S1.StaffID = S2.StaffID
and S1.ServiceDate = S2.ServiceDate
and S2.BeginTime <= S1.BeginTime and S2.EndTime <> S1.EndTime
and S2.EndTime > S1.BeginTime)
union all
select StaffID, ServiceDate, BeginTime, EndTime, BeginTime as groupid
from services S1
where exists (select * from services S2
where S1.StaffID = S2.StaffID
and S1.ServiceDate = S2.ServiceDate
and S2.BeginTime = S1.BeginTime and S2.EndTime > S1.EndTime)
and not exists (select * from services S2
where S1.StaffID = S2.StaffID
and S1.ServiceDate = S2.ServiceDate
and S2.BeginTime < S1.BeginTime
and S2.EndTime > S1.BeginTime)
union all
select S.StaffID, S.ServiceDate, S.BeginTime, S.EndTime, flat.groupid
from flat
inner join services S
on flat.StaffID = S.StaffID
and flat.ServiceDate = S.ServiceDate
and flat.EndTime > S.BeginTime
and flat.BeginTime < S.BeginTime and flat.EndTime < S.EndTime
)
select StaffID, ServiceDate, MIN(BeginTime) as begintime, MAX(EndTime) as endtime
from flat
group by StaffID, ServiceDate, groupid
order by StaffID, ServiceDate, begintime, endtime
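For comparison, here is a non-recursive sketch of the same merge using window functions: a row starts a new group when its BeginTime is later than the running maximum EndTime of all earlier rows, and a running sum of those start markers numbers the groups. This is an assumption-laden illustration rather than a drop-in answer: it is run in SQLite through Python on the four rows from the question, times are stored as 'HH:MM' text, and the windowed frame syntax it relies on is not available in SQL Server 2008 (it needs 2012 or later).

```python
import sqlite3

# The four sample rows from the question: (ID, StaffID, ServiceDate, BeginTime, EndTime)
rows = [
    (1, 101, "2014-01-01", "08:00", "09:00"),
    (2, 101, "2014-01-01", "08:30", "09:30"),
    (3, 101, "2014-01-01", "18:00", "20:30"),
    (4, 101, "2014-01-01", "19:00", "21:00"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE services (ID INTEGER, StaffID INTEGER, ServiceDate TEXT, "
    "BeginTime TEXT, EndTime TEXT)"
)
conn.executemany("INSERT INTO services VALUES (?, ?, ?, ?, ?)", rows)

query = """
WITH marked AS (
    SELECT StaffID, ServiceDate, BeginTime, EndTime,
           -- 0 if this row overlaps (or touches) anything seen so far, else 1
           CASE WHEN BeginTime <= MAX(EndTime) OVER (
                         PARTITION BY StaffID, ServiceDate
                         ORDER BY BeginTime, EndTime
                         ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
                THEN 0 ELSE 1 END AS is_start
    FROM services
),
grouped AS (
    SELECT *,
           -- running sum of the start markers = group number
           SUM(is_start) OVER (
               PARTITION BY StaffID, ServiceDate
               ORDER BY BeginTime, EndTime
               ROWS UNBOUNDED PRECEDING) AS grp
    FROM marked
)
SELECT StaffID, ServiceDate, MIN(BeginTime), MAX(EndTime)
FROM grouped
GROUP BY StaffID, ServiceDate, grp
ORDER BY StaffID, ServiceDate, MIN(BeginTime)
"""
merged = conn.execute(query).fetchall()
print(merged)
```

Because the comparison uses the running maximum rather than only the previous row, chains of overlaps and intervals nested inside a longer one fall into the same group. Using <= also merges intervals that merely touch; use < if abutting intervals should stay separate.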
Elsewhere I've answered a similar date-packing question with
a geometric strategy. Namely, I interpret the date ranges
as lines, and use geometry::UnionAggregate to merge
the ranges.
Your question has two peculiarities though. First, it calls
for sql-server-2008, where geometry::UnionAggregate is not
available. However, if you download the Microsoft library at
https://github.com/microsoft/SQLServerSpatialTools and load
it as a CLR assembly into your instance, you have it
available as dbo.GeometryUnionAggregate.
But the real peculiarity that has my interest is the concern
that you have several million rows to work with. So I thought
I'd repeat the strategy here but with an added technique to
improve its performance. This technique will work well if
many of your StaffID/date subsets are the same.
First, let's build a numbers table. Swap this out with your favorite
way to do it.
select i = row_number() over (order by (select null))
into #numbers
from #services; -- where i put your data
Then convert the dates to floats and use those floats to create
geometrical points.
These points can then be turned into lines via STUnion and STEnvelope.
With your ranges now represented as geometric lines, merge them via
UnionAggregate. The resulting geometry object 'lines' might contain
multiple lines. But any overlapping lines turn into one line.
select s.StaffID,
s.Date,
linesWKT = geometry::UnionAggregate(line).ToString()
-- If you have SQLSpatialTools installed then:
-- linesWKT = dbo.GeometryUnionAggregate(line).ToString()
into #aggregateRangesToGeo
from #services s
cross apply (select
beginTimeF = convert(float, convert(datetime,beginTime)),
endTimeF = convert(float, convert(datetime,endTime))
) prepare
cross apply (select
beginPt = geometry::Point(beginTimeF, 0, 0),
endPt = geometry::Point(endTimeF, 0, 0)
) pointify
cross apply (select
line = beginPt.STUnion(endPt).STEnvelope()
) lineify
group by s.StaffID,
s.Date;
You have one 'lines' object for each staffId/date combo. But depending
on your dataset, there may be many 'lines' objects that are the same
between these combos. This may very well be true if staff are expected
to follow a routine and data is recorded to the nearest whatever.
So get a distinct listing of 'lines' objects. This should improve
performance.
From this, extract the individual lines inside 'lines'. Envelope the lines,
which ensures that the lines are stored only as their endpoints. Read the
endpoint x values and convert them back to their time representations.
Keep the WKT representation to join it back to the combos later on.
select lns.linesWKT,
beginTime = convert(time, convert(datetime, ap.beginTime)),
endTime = convert(time, convert(datetime, ap.endTime))
into #parsedLines
from (select distinct linesWKT from #aggregateRangesToGeo) lns
cross apply (select
lines = geometry::STGeomFromText(linesWKT, 0)
) geo
join #numbers n on n.i between 1 and geo.lines.STNumGeometries()
cross apply (select
line = geo.lines.STGeometryN(n.i).STEnvelope()
) ln
cross apply (select
beginTime = ln.line.STPointN(1).STX,
endTime = ln.line.STPointN(3).STX
) ap;
Now just join your parsed data back to the StaffId/Date combos.
select ar.StaffID,
ar.Date,
pl.beginTime,
pl.endTime
from #aggregateRangesToGeo ar
join #parsedLines pl on ar.linesWKT = pl.linesWKT
order by ar.StaffID,
ar.Date,
pl.beginTime;

RowNumber() and Partition By performance help wanted

I've got a table of stock market moving average values, and I'm trying to compare two values within a day, and then compare that result to the same calculation for the prior day. My SQL as it stands is below. When I comment out the last select statement that defines the result set and run the last CTE shown as the result set, I get my data back in about 15 minutes. That is long, but manageable since it'll run as an insert sproc overnight. When I run it as shown, 40 minutes pass before any results even start to come in. Any ideas? It goes from somewhat slow to blowing up, probably with the addition of ROW_NUMBER() OVER (PARTITION BY). BTW, I'm still working through the logic, which is currently impossible with this performance issue. Thanks in advance.
Edit: I fixed my partition as suggested below.
with initialSmas as
(
select TradeDate, Symbol, Period, Value
from tblDailySMA
),
smaComparisonsByPer as
(
select i.TradeDate, i.Symbol, i.Period FastPer, i.Value FastVal,
i2.Period SlowPer, i2.Value SlowVal, (i.Value-i2.Value) FastMinusSlow
from initialSmas i join initialSmas as i2 on i.Symbol = i2.Symbol
and i.TradeDate = i2.TradeDate and i2.Period > i.Period
),
smaComparisonsByPerPartitioned as
(
select ROW_NUMBER() OVER (PARTITION BY sma.Symbol, sma.FastPer, sma.SlowPer
ORDER BY sma.TradeDate) as RowNum, sma.TradeDate, sma.Symbol, sma.FastPer,
sma.FastVal, sma.SlowPer, sma.SlowVal, sma.FastMinusSlow
from smaComparisonsByPer sma
)
select scp.TradeDate as LatestDate, scp.FastPer, scp.FastVal, scp.SlowPer, scp.SlowVal,
scp.FastMinusSlow, scp2.TradeDate as LatestDate, scp2.FastPer, scp2.FastVal, scp2.SlowPer,
scp2.SlowVal, scp2.FastMinusSlow, (scp.FastMinusSlow * scp2.FastMinusSlow) as Comparison
from smaComparisonsByPerPartitioned scp join smaComparisonsByPerPartitioned scp2
on scp.Symbol = scp2.Symbol and scp.RowNum = (scp2.RowNum - 1)
1) You have some fields both in the Partition By and the Order By clauses. That doesn't make sense since you will have one and only one value for each (sma.FastPer, sma.SlowPer). You can safely remove these fields from the Order By part of the window function.
2) Assuming that you already have indexes that give adequate performance for "initialSmas i join initialSmas", and that you already have an index on (initialSmas.Symbol, initialSmas.Period, initialSmas.TradeDate), the best you can do is to copy smaComparisonsByPer into a temporary table where you can create an index on (sma.Symbol, sma.FastPer, sma.SlowPer, sma.TradeDate).

Resources