SQL Server, How to group rows that are near in time


I have a table with a time value and a user ID, and I want to group the rows by user ID, putting rows in the same group when they are near in time (less than 2 minutes between consecutive rows).
Here is an example:
CreatedAt | User ID
'16:01:01' | '01'
'16:02:20' | '01'
'16:03:20' | '01'
'16:04:20' | '01'
'16:05:20' | '02'
'16:06:20' | '02'
'16:07:20' | '02'
'16:08:20' | '02'
'16:14:02' | '02'
'16:15:01' | '02'
'16:20:02' | '03'
The result should be:
User ID = 01
'16:01:01'
'16:02:20'
'16:03:20'
'16:04:20'
User ID = 02
'16:05:20'
'16:06:20'
'16:07:20'
'16:08:20'
'16:14:02'
'16:15:01'
User ID = 03
'16:20:02'
I'm not even sure whether this can be done in SQL or whether I have to code it myself (I have a few million rows in my database, so that wouldn't be the most efficient approach).
Thanks for your help.

This assigns a "group number" to each set of rows (the first group for each user is numbered 0). I'm not sure exactly what you want to do with it, but it might help you achieve what you want in your presentation layer:
WITH VTE AS(
SELECT CONVERT(time(0), V.CreatedAt) AS CreatedAt, UserID
FROM (VALUES ('16:01:01','01'),
('16:02:20','01'),
('16:03:20','01'),
('16:04:20','01'),
('16:05:20','02'),
('16:06:20','02'),
('16:07:20','02'),
('16:08:20','02'),
('16:14:02','02'),
('16:15:01','02'),
('16:20:02','03')) V(CreatedAt, UserID)),
TimeDiff AS(
SELECT *,
CASE WHEN DATEDIFF(SECOND,LAG(CreatedAt,1,CreatedAt) OVER (PARTITION BY UserID ORDER BY CreatedAt ASC),CreatedAt) <= 120 THEN 1 ELSE 0 END AS Succession
FROM VTE)
SELECT TD.CreatedAt,
TD.UserID,
COUNT(CASE WHEN TD.Succession = 0 THEN 1 END) OVER (PARTITION BY UserID ORDER BY TD.CreatedAt
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS GroupNumber
FROM TimeDiff TD;
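If the goal is to collapse each run of nearby rows into one row, the same idea can be pushed one step further: flag the rows that start a new group, turn the flags into a running total, then GROUP BY user and group number. This is only a sketch; `@Events` and `@Summary` are hypothetical table variables standing in for the real table, and `time(0)` is used only because the sample has no dates (with a real datetime column the comparison would also work across midnight).

```sql
-- Hypothetical table variable standing in for the real table
DECLARE @Events TABLE (CreatedAt time(0) NOT NULL, UserID char(2) NOT NULL);

INSERT INTO @Events (CreatedAt, UserID)
VALUES ('16:01:01', '01'), ('16:02:20', '01'), ('16:03:20', '01'), ('16:04:20', '01'),
       ('16:05:20', '02'), ('16:06:20', '02'), ('16:07:20', '02'), ('16:08:20', '02'),
       ('16:14:02', '02'), ('16:15:01', '02'), ('16:20:02', '03');

DECLARE @Summary TABLE (UserID char(2), GroupNumber int, GroupStart time(0),
                        GroupEnd time(0), RowsInGroup int);

WITH Flagged AS (
    SELECT CreatedAt, UserID,
           -- 1 = this row starts a new group: it has no predecessor (LAG is NULL)
           -- or the gap to the previous row of the same user exceeds 120 seconds
           CASE WHEN DATEDIFF(SECOND,
                              LAG(CreatedAt) OVER (PARTITION BY UserID ORDER BY CreatedAt),
                              CreatedAt) <= 120
                THEN 0 ELSE 1 END AS IsNewGroup
    FROM @Events
),
Numbered AS (
    -- Running total of the "new group" flags numbers the groups 1, 2, ... per user
    SELECT CreatedAt, UserID,
           SUM(IsNewGroup) OVER (PARTITION BY UserID ORDER BY CreatedAt
                                 ROWS UNBOUNDED PRECEDING) AS GroupNumber
    FROM Flagged
)
INSERT INTO @Summary (UserID, GroupNumber, GroupStart, GroupEnd, RowsInGroup)
SELECT UserID, GroupNumber, MIN(CreatedAt), MAX(CreatedAt), COUNT(*)
FROM Numbered
GROUP BY UserID, GroupNumber;

SELECT * FROM @Summary ORDER BY UserID, GroupNumber;
```

With the sample data, user 02 ends up with two groups, because the gap between 16:08:20 and 16:14:02 is more than 2 minutes.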

Related

SQL Server - assign value to a field based on a running total

For a customer, I send the sales orders to another system through an XML file, and I sum the quantities of each item across all sales order lines (e.g. if "ItemA" appears in 10 sales orders with different quantities, I sum the quantities and send the total).
In return, I get a response saying whether the requested quantities can be delivered to the customers. If not, I still get the total quantity that can be delivered. For example, I might request 100 pieces of "ItemA" but only 98 can be delivered. In cases like this, I need to distribute those 98 pieces FIFO (by UPDATEing a custom field), according to the quantity requested in each sales order and the registration date of each sales order.
I tried to use a WHILE loop, but I couldn't achieve the desired result. Here's my code:
DECLARE @PickedQty int
DECLARE @PickedERPQty int
DECLARE @OrderedERPQty int=2
SET @PickedQty =
WHILE (@PickedQty>0)
BEGIN
    SET @PickedERPQty=(SELECT CASE WHEN @PickedQty>@OrderedERPQty THEN @OrderedERPQty ELSE @PickedQty END)
    SET @PickedQty=@PickedQty-@PickedERPQty
    PRINT @PickedQty
    IF @PickedQty>=0
    BEGIN
        UPDATE OrderLines
        SET UDFValue2=@PickedERPQty
        WHERE fDocID='82DADC71-6706-44C7-9B78-7FCB55D94A69'
    END
    IF @PickedQty <= 0
        BREAK;
END
GO
Example of a response:
I requested 35 pieces, but only 30 pieces are available for delivery. I need to distribute those 30 pieces across the sales orders, based on the requested quantity and FIFO by order date. So in this example, I update the RealQty column with the requested quantity for the first three orders (because there is enough stock) and assign the remaining 5 pieces to the last one.
ord_Code CustOrderCode Date ItemCode ReqQty AvailQty RealQty
----------------------------------------------------------------------------
141389 CV/2539 2018-11-25 PX085 10 30 10
141389 CV/2550 2018-11-26 PX085 5 30 5
141389 CV/2563 2018-11-27 PX085 10 30 10
141389 CV/2564 2018-11-28 PX085 10 30 5
Could anyone give me a hint? Thanks

This might be more verbose than it needs to be, but I'll leave it to you to trim it down if that's possible.
Set up the data:
DECLARE @OrderLines TABLE(
ord_Code INTEGER NOT NULL
,CustOrderCode VARCHAR(7) NOT NULL
,[Date] DATE NOT NULL
,ItemCode VARCHAR(5) NOT NULL
,ReqQty INTEGER NOT NULL
,AvailQty INTEGER NOT NULL
,RealQty INTEGER NOT NULL
);
INSERT INTO @OrderLines(ord_Code,CustOrderCode,[Date],ItemCode,ReqQty,AvailQty,RealQty) VALUES (141389,'CV/2539','2018-11-25','PX085',10,0,0);
INSERT INTO @OrderLines(ord_Code,CustOrderCode,[Date],ItemCode,ReqQty,AvailQty,RealQty) VALUES (141389,'CV/2550','2018-11-26','PX085', 5,0,0);
INSERT INTO @OrderLines(ord_Code,CustOrderCode,[Date],ItemCode,ReqQty,AvailQty,RealQty) VALUES (141389,'CV/2563','2018-11-27','PX085',10,0,0);
INSERT INTO @OrderLines(ord_Code,CustOrderCode,[Date],ItemCode,ReqQty,AvailQty,RealQty) VALUES (141389,'CV/2564','2018-11-28','PX085',10,0,0);
DECLARE @AvailQty INTEGER = 30;
For running totals, for SQL Server 2012 and up anyway, SUM() OVER is the preferred technique, so I started off with some variants on that. This query brought in some useful numbers:
SELECT
ol.ord_Code,
ol.CustOrderCode,
ol.Date,
ol.ItemCode,
ol.ReqQty,
@AvailQty AS AvailQty,
SUM(ReqQty) OVER (PARTITION BY ord_Code ORDER BY [Date]) AS TotalOrderedQty,
@AvailQty-SUM(ReqQty) OVER (PARTITION BY ord_Code ORDER BY [Date]) AS RemainingQty
FROM
@OrderLines AS ol;
Then I used the RemainingQty to do a little math. The CASE expression is hairy, but the first step checks whether the RemainingQty after processing this row will be positive, and if it is, we fulfill the order. If not, we fulfill what we can. The nested CASE is there to stop negative numbers from coming into the result set.
SELECT
ol.ord_Code,
ol.CustOrderCode,
ol.Date,
ol.ItemCode,
ol.ReqQty,
@AvailQty AS AvailQty,
SUM(ReqQty) OVER (PARTITION BY ord_Code ORDER BY [Date]) AS TotalOrderedQty,
@AvailQty-SUM(ReqQty) OVER (PARTITION BY ord_Code ORDER BY [Date]) AS RemainingQty,
CASE
WHEN (@AvailQty-SUM(ReqQty) OVER (PARTITION BY ord_Code ORDER BY [Date])) > 0
THEN ol.ReqQty
ELSE
CASE
WHEN ol.ReqQty + (@AvailQty-SUM(ReqQty) OVER (PARTITION BY ord_Code ORDER BY [Date])) > 0
THEN ol.ReqQty + (@AvailQty-SUM(ReqQty) OVER (PARTITION BY ord_Code ORDER BY [Date]))
ELSE 0
END
END AS RealQty
FROM
@OrderLines AS ol
Windowing functions (like SUM() OVER) can only appear in the SELECT and ORDER BY clauses, so I had to use a derived table with a JOIN. A CTE would work here, too, if you prefer. I used that derived table to UPDATE the base table.
UPDATE Lines
SET
Lines.AvailQty = d.AvailQty
,Lines.RealQty = d.RealQty
FROM
@OrderLines AS Lines
JOIN
(
SELECT
ol.ord_Code,
ol.CustOrderCode,
ol.Date,
ol.ItemCode,
@AvailQty AS AvailQty,
CASE
WHEN (@AvailQty-SUM(ReqQty) OVER (PARTITION BY ord_Code ORDER BY [Date])) > 0
THEN ol.ReqQty
ELSE
CASE
WHEN ol.ReqQty + (@AvailQty-SUM(ReqQty) OVER (PARTITION BY ord_Code ORDER BY [Date])) > 0
THEN ol.ReqQty + (@AvailQty-SUM(ReqQty) OVER (PARTITION BY ord_Code ORDER BY [Date]))
ELSE 0
END
END AS RealQty
FROM
@OrderLines AS ol
) AS d
ON d.CustOrderCode = Lines.CustOrderCode
AND d.ord_Code = Lines.ord_Code
AND d.ItemCode = Lines.ItemCode
AND d.Date = Lines.Date;
SELECT * FROM @OrderLines;
Results:
+----------+---------------+---------------------+----------+--------+----------+---------+
| ord_Code | CustOrderCode | Date | ItemCode | ReqQty | AvailQty | RealQty |
+----------+---------------+---------------------+----------+--------+----------+---------+
| 141389 | CV/2539 | 25.11.2018 00:00:00 | PX085 | 10 | 30 | 10 |
| 141389 | CV/2550 | 26.11.2018 00:00:00 | PX085 | 5 | 30 | 5 |
| 141389 | CV/2563 | 27.11.2018 00:00:00 | PX085 | 10 | 30 | 10 |
| 141389 | CV/2564 | 28.11.2018 00:00:00 | PX085 | 10 | 30 | 5 |
+----------+---------------+---------------------+----------+--------+----------+---------+
Play with different available qty values here: https://rextester.com/MMFAR17436
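For comparison, here is a sketch of the CTE variant mentioned above, updating through the CTE instead of joining a derived table. It computes the quantity already claimed by earlier orders (a running total that stops one row before the current one) and clamps what is left between 0 and ReqQty. Assumptions to check: partitioning by ord_Code and ItemCode (the answer above partitions by ord_Code only), and CustOrderCode as a tie-breaker when two orders share a date.

```sql
DECLARE @OrderLines TABLE (
    ord_Code      int        NOT NULL,
    CustOrderCode varchar(7) NOT NULL,
    [Date]        date       NOT NULL,
    ItemCode      varchar(5) NOT NULL,
    ReqQty        int        NOT NULL,
    AvailQty      int        NOT NULL,
    RealQty       int        NOT NULL
);

INSERT INTO @OrderLines (ord_Code, CustOrderCode, [Date], ItemCode, ReqQty, AvailQty, RealQty)
VALUES (141389, 'CV/2539', '2018-11-25', 'PX085', 10, 0, 0),
       (141389, 'CV/2550', '2018-11-26', 'PX085',  5, 0, 0),
       (141389, 'CV/2563', '2018-11-27', 'PX085', 10, 0, 0),
       (141389, 'CV/2564', '2018-11-28', 'PX085', 10, 0, 0);

DECLARE @AvailQty int = 30;

WITH Allocation AS (
    SELECT AvailQty, RealQty, ReqQty,
           -- Quantity already claimed by earlier orders (FIFO by date);
           -- NULL for the first order, hence ISNULL below
           SUM(ReqQty) OVER (PARTITION BY ord_Code, ItemCode
                             ORDER BY [Date], CustOrderCode
                             ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS PriorQty
    FROM @OrderLines
)
UPDATE Allocation
SET AvailQty = @AvailQty,
    -- What is left after earlier orders, capped at ReqQty and never negative
    RealQty = CASE
                  WHEN @AvailQty - ISNULL(PriorQty, 0) >= ReqQty THEN ReqQty
                  WHEN @AvailQty - ISNULL(PriorQty, 0) > 0 THEN @AvailQty - ISNULL(PriorQty, 0)
                  ELSE 0
              END;

SELECT * FROM @OrderLines ORDER BY [Date];
```

With @AvailQty = 30 this should give 10, 5, 10 and 5, matching the expected RealQty column in the question.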

TSQL - Return duplicate rows with highest value and longest date

I have a list of staff who are contractors, and it includes duplicates because some of them work on multiple contracts at the same time. For each person I need to find the row with the most hours and, if the hours are equal, the one with the end date furthest away. I guess this is the current main contract. I also need to make sure the current date falls between Date From and Date To. How can this be done?
+------------+----------+------+-------+------------+------------+
| ContractID | PersonID | Name | Hours | Date From | Date To |
+------------+----------+------+-------+------------+------------+
| 8 | 1 | John | 30 | 20/02/2018 | 26/02/2018 |
| 8 | 2 | Paul | 5 | 20/02/2018 | 26/02/2018 |
| 7 | 3 | John | 7 | 20/02/2018 | 26/02/2018 |
+------------+----------+------+-------+------------+------------+
In the above example, I would need to bring back the John row with 30 hours and the Paul row with 5 hours. P.S. The PersonID is different for each row, but the Name is the same for a person who is on multiple contracts.
Thanks

One approach is simply to use a correlated subquery with TOP 1 and appropriate ordering logic:
select c.*
from contracts c
where c.contractid = (select top 1 c2.contractid
from contracts c2
where c2.name = c.name and
getdate() >= c2.datefrom and
getdate() < c2.dateto
order by c2.hours desc, c2.dateto desc
);
You can put similar logic into a window function:
select c.*
from (select c.*,
row_number() over (partition by c.name order by c.hours desc, c.dateto desc) as seqnum
from contracts c
where getdate() >= c.datefrom and getdate() < c.dateto
) c
where seqnum = 1;

If you need the full row, I'd do something like this:
with
rankedByHours as (
select
ContractID,
PersonID,
Name,
Hours,
[Date From],
[Date To],
row_number() over (partition by Name order by Hours desc) as RowID
from
Contracts
)
select
ContractID,
PersonID,
Name,
Hours,
[Date From],
[Date To],
case
when getdate() between [Date From] and [Date To] then 'Current'
when getdate() < [Date From] then 'Not Started'
else 'Expired'
end as ContractStatus
from
RankedByHours
where
RowID = 1;
Use the CTE to inject a row_number() sorting all rows by your sort criteria, then select out the top one in the main body. It can be easily extended to also capture your farthest-out end date.
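A minimal sketch of that extension, under a few assumptions: rows are partitioned by Name (the question says PersonID differs on every row), ties on Hours are broken by the later [Date To], and only contracts running on a given day are ranked. `@AsOf` is a hypothetical fixed date chosen so the sample rows qualify; in production it would be `CAST(GETDATE() AS date)`.

```sql
DECLARE @Contracts TABLE (
    ContractID  int         NOT NULL,
    PersonID    int         NOT NULL,
    Name        varchar(50) NOT NULL,
    Hours       int         NOT NULL,
    [Date From] date        NOT NULL,
    [Date To]   date        NOT NULL
);

INSERT INTO @Contracts (ContractID, PersonID, Name, Hours, [Date From], [Date To])
VALUES (8, 1, 'John', 30, '2018-02-20', '2018-02-26'),
       (8, 2, 'Paul',  5, '2018-02-20', '2018-02-26'),
       (7, 3, 'John',  7, '2018-02-20', '2018-02-26');

-- Hypothetical fixed "as of" date; use CAST(GETDATE() AS date) for real data
DECLARE @AsOf date = '2018-02-22';

DECLARE @MainContracts TABLE (ContractID int, PersonID int, Name varchar(50), Hours int);

WITH Ranked AS (
    SELECT ContractID, PersonID, Name, Hours,
           -- Most hours first; on equal hours, the furthest-out end date wins
           ROW_NUMBER() OVER (PARTITION BY Name
                              ORDER BY Hours DESC, [Date To] DESC) AS RowID
    FROM @Contracts
    -- Only contracts that are running on @AsOf
    WHERE @AsOf BETWEEN [Date From] AND [Date To]
)
INSERT INTO @MainContracts (ContractID, PersonID, Name, Hours)
SELECT ContractID, PersonID, Name, Hours
FROM Ranked
WHERE RowID = 1;

SELECT * FROM @MainContracts ORDER BY Name;
```

Filtering before ranking matters: if the date check were applied after ROW_NUMBER(), a person whose top-ranked contract has expired would drop out entirely instead of falling back to their next current contract.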

Splitting data from one record is a specific column T-SQL

I'm working on an old legacy database that was imported into SQL Server 2012 from Oracle. I have the following table called INSOrders, which includes a column called OrderID of type varchar(8).
An example of the data inserted is:
A04-05 | B81-02 | C02-01
A01-01 | B95-01 | C99-05
A02-02 | B06-07 | C03-02
A98-06 | B10-01 | C17-01
A78-07 | B02-03 | C15-03
A79-01 | B02-01 | C78-06
The first letter is the order type, the next 2 digits are the year, and the last 2 digits are the order number within that year.
So I split the data into 3 columns (not stored, just presented):
select
orderid,
substring(orderid, 0, patindex('%[0-9]%', orderid)) as ordtype,
right(max(datepart(yyyy, '01/01/' + substring(orderid, patindex('%[0-9]-%', orderid) - 1, 2))),2) as year,
max(substring(orderid, patindex('%-[0-9]%', orderid) + 1, 2)) as ordnum
from
ins.insorders
where
orderid is not null
group by
substring(orderid, 0, patindex('%[0-9]%', orderid)), orderid
order by
ordtype
The output looks like this:
OrderID | OrderType | OrderYear | OrderNum
---------+-------------+-------------+----------
A04-05 | A | 04 | 05
A01-01 | A | 01 | 01
B10-03 | B | 10 | 03
B95-01 | B | 95 | 01
etc....
But now I want to select only the max for each OrderType: show only the max for letter A, the max for letter B, etc. By max, I mean that for letter A I need to show the latest year and the latest order number within it, so if I have A04-01 and A04-02, just show A04-02.
I need to modify my query so that I see the following:
OrderID | OrderType | OrderYear | OrderNum
---------+-------------+-------------+----------
A04-05 | A | 04 | 05
B10-03 | B | 10 | 03
C17-01 | C | 17 | 01
Thank you, I will truly appreciate the help.

You can try the query below. It uses your original query as a CTE and assigns row numbers within each order type, ordered by order year and order number. Then it keeps only the rows numbered 1, which should be the max for each order type.
The expression DATEPART(yyyy,('01/01/' + OrderYear)) makes sure we get the correct century, so that 95 becomes 1995 and 10 becomes 2010 (with SQL Server's default two-digit year cutoff of 2049).
;WITH cte
AS (
select orderid,
substring(orderid, 0, patindex('%[0-9]%', orderid)) as OrderType,
right(max(datepart(yyyy,'01/01/' + substring(orderid, patindex('%[0-9]-%', orderid) - 1, 2))),2) as OrderYear,
max(substring(orderid, patindex('%-[0-9]%', orderid) + 1, 2)) as OrderNum
from ins.insorders
where orderid is not null
group by substring(orderid, 0, patindex('%[0-9]%', orderid)), orderid
)
SELECT *
FROM
(SELECT
*
, ROW_NUMBER() OVER (PARTITION BY OrderType ORDER BY DATEPART(yyyy,('01/01/' + OrderYear)) DESC, OrderNum DESC) AS RowNum
FROM cte) t
WHERE t.RowNum = 1
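A self-contained sketch of the same ROW_NUMBER() approach, run against the sample values from the question. Two assumptions differ from the query above: every OrderID is assumed to have the fixed shape letter + 2-digit year + '-' + 2-digit number, so plain SUBSTRING replaces PATINDEX; and the century is resolved with an explicit CASE (50-99 means 19xx, 00-49 means 20xx, mirroring the default 2049 cutoff) instead of a string-to-date conversion.

```sql
DECLARE @INSOrders TABLE (OrderID varchar(8) NOT NULL);

INSERT INTO @INSOrders (OrderID)
VALUES ('A04-05'), ('B81-02'), ('C02-01'),
       ('A01-01'), ('B95-01'), ('C99-05'),
       ('A02-02'), ('B06-07'), ('C03-02'),
       ('A98-06'), ('B10-01'), ('C17-01'),
       ('A78-07'), ('B02-03'), ('C15-03'),
       ('A79-01'), ('B02-01'), ('C78-06');

DECLARE @Latest TABLE (OrderID varchar(8), OrderType varchar(1), OrderYear int, OrderNum int);

WITH Parsed AS (
    SELECT OrderID,
           LEFT(OrderID, 1) AS OrderType,
           -- Assumption: two-digit years 50-99 are 19xx, 00-49 are 20xx
           CASE WHEN CAST(SUBSTRING(OrderID, 2, 2) AS int) >= 50
                THEN 1900 ELSE 2000 END + CAST(SUBSTRING(OrderID, 2, 2) AS int) AS OrderYear,
           CAST(SUBSTRING(OrderID, 5, 2) AS int) AS OrderNum
    FROM @INSOrders
),
Ranked AS (
    -- Latest year first, then highest order number within that year
    SELECT *, ROW_NUMBER() OVER (PARTITION BY OrderType
                                 ORDER BY OrderYear DESC, OrderNum DESC) AS rn
    FROM Parsed
)
INSERT INTO @Latest (OrderID, OrderType, OrderYear, OrderNum)
SELECT OrderID, OrderType, OrderYear, OrderNum
FROM Ranked
WHERE rn = 1;

SELECT * FROM @Latest ORDER BY OrderType;
```

Converting the year and number to int also avoids relying on string ordering, which only works while both parts are always exactly two characters.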

The data is represented poorly, so I only have a way to "cheese" it, and we'll need to make a lot of assumptions:
with cte_example
as
( your query )
select OrderID
,OrderType
,OrderYear
,OrderNum
from
(select *, row_number() over(partition by OrderType order by OrderYear DESC, OrderNum DESC) rn
from cte_example
where OrderYear <= right(year(getdate()),2)) t1
where t1.rn = 1
Since you already have a query extracting the information, I won't bother changing it. We wrap your query in a CTE, query from it, and apply the row_number function to find, for each OrderType, the row with the most recent OrderYear, along with its OrderNum and OrderID.
Now the tricky part is that the years are poorly represented (assuming my comment on your original post is true), so any sort of aggregation for OrderType B will return 95, since it is numerically the greatest.
We assume that no order year is greater than the current year, and that anything greater belongs to the previous century (e.g. the 90s), using this condition: where OrderYear <= right(year(getdate()),2). In other words, take the current year and keep its two rightmost characters: first get 2017 from getdate(), then 17 with the RIGHT function. I'm sure you can see why this is dangerous: what if your latest date is 1999?
So by filtering those rows out, we can see the latest year for each OrderType... hope this helps.
Here is the rextester test I built to play with your query, in case you want to try it.

I think your original query was almost exactly what you needed except you need to use MAX(OrderID) and not group by it.
declare @Something table
(
orderid varchar(6)
)
insert @Something
(
orderid
) values
('A04-05'), ('B81-02'), ('C02-01'),
('A01-01'), ('B95-01'), ('C99-05'),
('A02-02'), ('B06-07'), ('C03-02'),
('A98-06'), ('B10-01'), ('C17-01'),
('A78-07'), ('B02-03'), ('C15-03'),
('A79-01'), ('B02-01'), ('C78-06')
select max(orderid) as orderid,
substring(orderid, 0, patindex('%[0-9]%', orderid)) as ordtype,
right(max(datepart(yyyy,'01/01/' + substring(orderid, patindex('%[0-9]-%', orderid) - 1, 2))),2) as year,
max(substring(orderid, patindex('%-[0-9]%', orderid) + 1, 2)) as ordnum
from @Something
where orderid is not null
group by substring(orderid, 0, patindex('%[0-9]%', orderid))
order by ordtype

Finding max date difference on a single column

In the table below (Table A), we have entries for four different IDs (1, 2, 3, 4) with their status and its time. I want to find the ID that took the longest time to change its Status from Started to Completed; in the example below it is ID = 4. The table currently has approximately a million records, so it would be really great if someone could provide an efficient way to retrieve this data.
Table A
ID Status Date(YYYY-DD-MM HH:MM:SS)
1. Started 2017-01-01 01:00:00
1. Completed 2017-01-01 02:00:00
2. Started 2017-10-02 03:00:00
2. Completed 2017-10-02 05:00:00
3. Started 2017-15-03 06:00:00
3. Completed 2017-15-03 09:00:00
4. Started 2017-22-04 10:00:00
4. Completed 2017-22-04 15:00:00
Thanks!
Bruce

You can query as below:
Select top 1 with ties y1.Id from #yourDate y1
join #yourDate y2
On y1.Id = y2.Id
and y1.[STatus] = 'Started'
and y2.[STatus] = 'Completed'
order by Row_number() over(order by datediff(mi,y1.[Date], y2.[date]) desc)

SELECT
started.ID, DATEDIFF(SECOND, started.date, completed.date) as elapsed_time
FROM TABLE_A as started
INNER JOIN TABLE_A as completed ON (completed.ID=started.ID AND completed.status='Completed')
WHERE started.status='Started'
ORDER BY elapsed_time desc
Be sure there's an index on TABLE_A for the columns ID, date.

I haven't run this sql but it may solve your problem.
select a.id, max(DATEDIFF(SECOND, a.date, b.date)) as ElapsedSeconds
from TableA as a
join TableA as b on a.id = b.id
where a.status = 'Started' and b.status = 'Completed'
group by a.id
order by ElapsedSeconds desc

Here's a way with a derived table that finds the latest Completed date for each ID and joins it back to the Started rows. Just uncomment the TOP 1 to get ID 4 in this case. This is based on your comments that there is only one "Started" record, but there could be multiple "Completed" records for each ID.
declare @TableA table (ID int, Status varchar(64), Date datetime)
insert into @TableA
values
(1,'Started','2017-01-01 01:00:00'),
(1,'Completed','2017-01-01 02:00:00'),
(2,'Started','2017-02-10 03:00:00'),
(2,'Completed','2017-02-10 05:00:00'),
(3,'Started','2017-03-15 06:00:00'),
(3,'Completed','2017-03-15 09:00:00'),
(4,'Started','2017-04-22 10:00:00'),
(4,'Completed','2017-04-22 15:00:00')
select --top 1
s.ID
,datediff(minute,s.Date,e.EndDate) as TimeDifference
from @TableA s
inner join(
select
ID
,max(Date) as EndDate
from @TableA
where Status = 'Completed'
group by ID) e on e.ID = s.ID
where
s.Status = 'Started'
order by
datediff(minute,s.Date,e.EndDate) desc
RETURNS
+----+----------------+
| ID | TimeDifference |
+----+----------------+
| 4 | 300 |
| 3 | 180 |
| 2 | 120 |
| 1 | 60 |
+----+----------------+

If you know that 'started' will always be the earliest point in time for each ID and the last 'completed' record you are considering will always be the latest point in time for each ID, the following should have good performance for a large number of records:
SELECT TOP 1
id
, DATEDIFF(s, MIN([Date]), MAX([date])) AS Elapsed
FROM #TableA
GROUP BY ID
ORDER BY DATEDIFF(s, MIN([Date]), MAX([date])) DESC
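If you can't rely on Started always being the earliest row, a variant of the same single-pass GROUP BY can pick the timestamps by status with conditional aggregation, so no self-join is needed. This is a sketch: `@TableA` and `@Slowest` are hypothetical table variables, and IDs missing either status are simply skipped.

```sql
DECLARE @TableA TABLE (ID int NOT NULL, Status varchar(64) NOT NULL, [Date] datetime NOT NULL);

INSERT INTO @TableA (ID, Status, [Date])
VALUES (1, 'Started', '2017-01-01T01:00:00'), (1, 'Completed', '2017-01-01T02:00:00'),
       (2, 'Started', '2017-02-10T03:00:00'), (2, 'Completed', '2017-02-10T05:00:00'),
       (3, 'Started', '2017-03-15T06:00:00'), (3, 'Completed', '2017-03-15T09:00:00'),
       (4, 'Started', '2017-04-22T10:00:00'), (4, 'Completed', '2017-04-22T15:00:00');

DECLARE @Slowest TABLE (ID int, ElapsedMinutes int);

WITH PerId AS (
    -- One pass over the table: the Started time and the last Completed time per ID
    SELECT ID,
           MIN(CASE WHEN Status = 'Started'   THEN [Date] END) AS StartedAt,
           MAX(CASE WHEN Status = 'Completed' THEN [Date] END) AS CompletedAt
    FROM @TableA
    GROUP BY ID
)
INSERT INTO @Slowest (ID, ElapsedMinutes)
SELECT TOP (1) WITH TIES ID, DATEDIFF(MINUTE, StartedAt, CompletedAt)
FROM PerId
WHERE StartedAt IS NOT NULL AND CompletedAt IS NOT NULL
ORDER BY DATEDIFF(MINUTE, StartedAt, CompletedAt) DESC;

SELECT * FROM @Slowest;
```

WITH TIES returns every ID sharing the longest duration rather than an arbitrary one of them.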

Count number of days in a year with a record

I have a SQL Server table named AgentLog in which I store each agent's daily number of sales.
+-----------+------------+-------------+
| AgentName | Date | SalesNumber |
+-----------+------------+-------------+
| John | 01.01.2014 | 45 |
| Terry | 01.01.2014 | 30 |
| John | 02.01.2014 | 20 |
| Terry | 02.01.2014 | 15 |
| Terry | 03.01.2014 | 52 |
| Terry | 04.01.2014 | 24 |
| Terry | 05.01.2014 | 12 |
| Terry | 06.01.2014 | 10 |
| Terry | 07.01.2014 | 23 |
| John | 08.01.2014 | 48 |
| Terry | 08.01.2014 | 35 |
| John | 09.01.2014 | 37 |
| Terry | 10.01.2014 | 35 |
+-----------+------------+-------------+
If an agent doesn't work on one particular day, there is no record of his sales on that date.
I want to generate a report (query) for a given date interval (e.g. 01.01.2014 - 10.01.2014) that counts on how many days an agent wasn't present at work (e.g. John - 6 days) and on how many days they were at work (John - 4 days), and that also returns the date intervals in which they weren't present (e.g. John 03.01.2014 - 07.01.2014, 10.01.2014); there can be multiple intervals.

You need to create a custom table and populate it with a record for each date you want in your range (feel free to go as far back into the past and forward into the future as you think you may need). You could do this in Excel very easily and import it.
Select *
from Custom.DateListTable dlt
left outer join agentlog ag
on dlt.Date = ag.Date
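As a sketch of how the date list turns into the counts the question asks for: the version below swaps the imported calendar table for a date list generated inline from a cross join of digits (up to 1,000 days), builds every agent/date pair, and left-joins the log. `@AgentLog`, `@From`, `@To` and `@Report` are hypothetical names, and it assumes at most one log row per agent per day.

```sql
DECLARE @AgentLog TABLE (AgentName varchar(50) NOT NULL, [Date] date NOT NULL, SalesNumber int NOT NULL);

INSERT INTO @AgentLog (AgentName, [Date], SalesNumber)
VALUES ('John', '2014-01-01', 45), ('Terry', '2014-01-01', 30),
       ('John', '2014-01-02', 20), ('Terry', '2014-01-02', 15),
       ('Terry', '2014-01-03', 52), ('Terry', '2014-01-04', 24),
       ('Terry', '2014-01-05', 12), ('Terry', '2014-01-06', 10),
       ('Terry', '2014-01-07', 23), ('John', '2014-01-08', 48),
       ('Terry', '2014-01-08', 35), ('John', '2014-01-09', 37),
       ('Terry', '2014-01-10', 35);

DECLARE @From date = '2014-01-01', @To date = '2014-01-10';

DECLARE @Report TABLE (AgentName varchar(50), DaysWorked int, DaysMissed int);

WITH Digits AS (
    SELECT n FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) AS d(n)
),
DateList AS (
    -- Stand-in for a permanent calendar table: every day from @From to @To
    SELECT DATEADD(DAY, h.n * 100 + t.n * 10 + u.n, @From) AS [Date]
    FROM Digits AS h CROSS JOIN Digits AS t CROSS JOIN Digits AS u
    WHERE h.n * 100 + t.n * 10 + u.n <= DATEDIFF(DAY, @From, @To)
),
Agents AS (
    SELECT DISTINCT AgentName FROM @AgentLog
)
INSERT INTO @Report (AgentName, DaysWorked, DaysMissed)
SELECT a.AgentName,
       COUNT(l.[Date])            AS DaysWorked,  -- matched log rows
       COUNT(*) - COUNT(l.[Date]) AS DaysMissed   -- calendar days with no log row
FROM Agents AS a
CROSS JOIN DateList AS dl
LEFT JOIN @AgentLog AS l
       ON l.AgentName = a.AgentName AND l.[Date] = dl.[Date]
GROUP BY a.AgentName;

SELECT * FROM @Report ORDER BY AgentName;
```

For the sample interval this should give John 4 worked / 6 missed and Terry 9 worked / 1 missed.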

I would approach this by getting the number of dates in the interval, as well as the number of dates the agent was at work, and you then have everything you need.
To get the number of days in the interval, you can use DATEDIFF (adding 1, because both the first and the last day are included):
SELECT DATEDIFF(day, '2014-01-01', '2014-01-10') + 1 AS totalDays;
To get the number of days an agent worked, you can use the COUNT(*) aggregate function, restricted to the same interval:
SELECT agentName, COUNT(*) AS daysWorked
FROM myTable
WHERE [Date] BETWEEN '2014-01-01' AND '2014-01-10'
GROUP BY agentName;
Then you can extend that query to get the days not worked by subtracting daysWorked from totalDays:
SELECT agentName, COUNT(*) AS daysWorked, (DATEDIFF(day, '2014-01-01', '2014-01-10') + 1 - COUNT(*)) AS daysMissed
FROM myTable
WHERE [Date] BETWEEN '2014-01-01' AND '2014-01-10'
GROUP BY agentName;
Here is an SQL Fiddle example.

The only way I can think of to solve this is to create a temporary table with only one column (datetime) and store in it all the dates from the selected range. You can create a stored procedure that fills that temporary table with all the dates in the interval using a cursor. Then do a LEFT JOIN from the temporary table to your table and look for NULL values on your table's side (the days when that person didn't come to work).

Try this...
SET DATEFIRST 1; --Monday
DECLARE @StartDate DATETIME = '20140101',
@EndDate DATETIME = '20140110';
WITH data as (
select 0 as i, DATEADD(DAY, 0, @StartDate) as TheDate
union all
select i + 1, DATEADD(DAY, i + 1, @StartDate) as TheDate
from data
where i < DATEDIFF(DAY, @StartDate, @EndDate)
)
SELECT a.AgentName,
SUM(CASE WHEN c.Date IS NULL THEN 1 ELSE 0 END) AS Missing,
SUM(CASE WHEN c.Date IS NOT NULL THEN 1 ELSE 0 END) AS Working
FROM Agent a
JOIN data b ON NOT EXISTS(SELECT NULL FROM SpecialDate s WHERE s.date = b.TheDate)
LEFT JOIN AgentLog c ON
c.AgentName = a.AgentName
AND c.Date = b.TheDate
WHERE DATEPART(weekday, b.TheDate) <= 5
GROUP BY a.AgentName
OPTION (MAXRECURSION 10000);
It includes a check for weekends, as well as a reference to a "SpecialDate" table in which a list of non-working days can be maintained and excluded from the check. It also assumes an "Agent" table listing all agents.
Reading your question again, I realise that this will only solve half your problem.

NOTE: The following answer mainly addresses the trickiest part of the question, which is how to obtain "absence from work" intervals.
Given these interval start and end dates:
DECLARE @IntervalStart DATE = '2013-12-30'
DECLARE @IntervalEnd DATE = '2014-01-10'
the following query gives you the "absence from work" intervals:
SELECT AgentName,
DATEADD(d, 1, t.[Date]) As OffWorkStart,
DATEADD(d, -1, t.NextDate) As OffWorkEnd
FROM (
SELECT AgentName, [Date], LEAD([Date]) OVER (PARTITION BY AgentName ORDER BY [Date] ASC) As NextDate,
DATEDIFF(DAY, [Date], LEAD([Date]) OVER (PARTITION BY AgentName ORDER BY [Date] ASC)) As NextMinusCurrent
FROM #AgentLog) t
WHERE t.NextMinusCurrent > 1
-- Get marginal beginning interval (in case such an interval exists)
UNION ALL
SELECT AgentName, @IntervalStart AS OffWorkStart, DATEADD(DAY, -1, MIN([Date])) AS OffWorkEnd
FROM #AgentLog
GROUP BY AgentName
HAVING MIN([Date]) > @IntervalStart
-- Get marginal ending interval (in case such an interval exists)
UNION ALL
SELECT AgentName, DATEADD(DAY, 1, MAX([Date])) AS OffWorkStart, @IntervalEnd
FROM #AgentLog
GROUP BY AgentName
HAVING MAX([Date]) < @IntervalEnd
ORDER By AgentName, OffWorkStart
With the input data you supplied, the above query gives you the following output:
AgentName OffWorkStart OffWorkEnd
---------------------------------------
John 2013-12-30 2013-12-31
John 2014-01-03 2014-01-07
John 2014-01-10 2014-01-10
Terry 2013-12-30 2013-12-31
Terry 2014-01-09 2014-01-09
The idea behind the basic part of the query is to employ the following nested query:
SELECT AgentName,
[Date],
LEAD([Date]) OVER (PARTITION BY AgentName ORDER BY [Date] ASC) As NextDate,
DATEDIFF(DAY, [Date], LEAD([Date]) OVER (PARTITION BY AgentName ORDER BY [Date] ASC)) As NextMinusCurrent
FROM #AgentLog
in order to get any existing gaps between the days a certain agent is present for work. A value of NextMinusCurrent > 1 indicates such a gap.
Counting days is trivial once you have the above query in place. E.g. placing the above query in a CTE, you can count the total number of absence days with something like:
;WITH cte AS (
... query goes here
)
SELECT AgentName, SUM(DATEDIFF(DAY, OffWorkStart, OffWorkEnd) + 1) AS AbsenceDays
FROM cte
GROUP By AgentName
P.S. The above query uses the SQL Server LEAD function, which is available from SQL Server 2012 onwards.
SQL Fiddle here
EDIT:
CTEs together with ROW_NUMBER() can be used to simulate the LEAD function. The first part of the query becomes:
;WITH cte1 AS (
SELECT AgentName,
[Date],
ROW_NUMBER() OVER (PARTITION BY AgentName ORDER BY [Date] ASC) As rn
FROM #AgentLog
),
cte2 AS (
SELECT cte1.AgentName, cte1.[Date],
cteLead.[Date] AS NextDate,
DATEDIFF(DAY, cte1.[Date], cteLead.[Date]) As NextMinusCurrent
FROM cte1
LEFT OUTER JOIN cte1 AS cteLead
ON (cte1.rn = cteLead.rn - 1) AND (cte1.AgentName = cteLead.AgentName)
)
SELECT AgentName,
DATEADD(d, 1, cte2.[Date]) As OffWorkStart,
DATEADD(d, -1, cte2.NextDate) As OffWorkEnd
FROM cte2
WHERE NextMinusCurrent > 1
SQL Fiddle for SQL Server 2008 here. I hope it executes in SQL Server 2005 also!
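Putting the two halves together, here is a sketch of the LEAD-based gap query wrapped in a CTE and summed per agent, using the question's interval (2014-01-01 to 2014-01-10) instead of the wider one above. It assumes every log row falls inside the interval and that there is at most one row per agent per day; SalesNumber is left out because it plays no part in the calculation.

```sql
DECLARE @AgentLog TABLE (AgentName varchar(50) NOT NULL, [Date] date NOT NULL);

INSERT INTO @AgentLog (AgentName, [Date])
VALUES ('John', '2014-01-01'), ('Terry', '2014-01-01'), ('John', '2014-01-02'),
       ('Terry', '2014-01-02'), ('Terry', '2014-01-03'), ('Terry', '2014-01-04'),
       ('Terry', '2014-01-05'), ('Terry', '2014-01-06'), ('Terry', '2014-01-07'),
       ('John', '2014-01-08'), ('Terry', '2014-01-08'), ('John', '2014-01-09'),
       ('Terry', '2014-01-10');

DECLARE @IntervalStart date = '2014-01-01', @IntervalEnd date = '2014-01-10';

DECLARE @Absence TABLE (AgentName varchar(50), AbsenceDays int);

WITH Gaps AS (
    -- Gaps between consecutive working days
    SELECT AgentName,
           DATEADD(DAY, 1, [Date]) AS OffWorkStart,
           DATEADD(DAY, -1, NextDate) AS OffWorkEnd
    FROM (SELECT AgentName, [Date],
                 LEAD([Date]) OVER (PARTITION BY AgentName ORDER BY [Date]) AS NextDate
          FROM @AgentLog) AS t
    WHERE DATEDIFF(DAY, [Date], NextDate) > 1
    UNION ALL
    -- Absence before the first working day in the interval
    SELECT AgentName, @IntervalStart, DATEADD(DAY, -1, MIN([Date]))
    FROM @AgentLog GROUP BY AgentName HAVING MIN([Date]) > @IntervalStart
    UNION ALL
    -- Absence after the last working day in the interval
    SELECT AgentName, DATEADD(DAY, 1, MAX([Date])), @IntervalEnd
    FROM @AgentLog GROUP BY AgentName HAVING MAX([Date]) < @IntervalEnd
)
INSERT INTO @Absence (AgentName, AbsenceDays)
SELECT AgentName, SUM(DATEDIFF(DAY, OffWorkStart, OffWorkEnd) + 1)
FROM Gaps
GROUP BY AgentName;

SELECT * FROM @Absence ORDER BY AgentName;
```

An agent with no gaps at all would simply not appear in the result, so a LEFT JOIN from the list of agents would be needed to report them with 0 absence days.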
