I have a table with, for example, this data:
ID |start_date |end_date |amount
---|------------|-----------|--------
1 |2019-03-21 |2019-05-09 |10000.00
2 |2019-04-02 |2019-04-10 |30000.00
3 |2018-11-01 |2019-01-08 |20000.00
I would like to get the split records back, with the amount correctly calculated for each year/month.
I expect the outcome to be like this:
ID |month |year |amount
---|------|-------|--------
1 |3 | 2019 | 2200.00
1 |4 | 2019 | 6000.00
1 |5 | 2019 | 1800.00
2 |4 | 2019 |30000.00
3 |11 | 2018 | 8695.65
3 |12 | 2018 | 8985.51
3 |1 | 2019 | 2318.84
What would be the best way to achieve this? I think you would have to use DATEDIFF to get the number of days between the start_date and end_date to calculate the amount per day, but I'm not sure how to return it as records per month/year.
Tnx in advance!
This is one idea. I use a Tally to create a row for every day the amount is relevant for that ID. Then I aggregate the Amount divided by the number of days, grouped by month and year:
CREATE TABLE dbo.YourTable(ID int,
StartDate date,
EndDate date,
Amount decimal(12,2));
GO
INSERT INTO dbo.YourTable (ID,
StartDate,
EndDate,
Amount)
VALUES(1,'2019-03-21','2019-05-09',10000.00),
(2,'2019-04-02','2019-04-10',30000.00),
(3,'2018-11-01','2019-01-08',20000.00);
GO
--Create a tally
WITH N AS(
SELECT N
FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL))N(N)),
Tally AS(
SELECT TOP (SELECT MAX(DATEDIFF(DAY, t.StartDate, t.EndDate)+1) FROM dbo.YourTable t) --Limits the rows, might be needed in a large dataset, might not be, remove as required
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) -1 AS I
FROM N N1, N N2, N N3), --1000 days, is that enough?
--Create the dates
Dates AS(
SELECT YT.ID,
DATEADD(DAY, T.I, YT.StartDate) AS [Date],
YT.Amount,
COUNT(T.I) OVER (PARTITION BY YT.ID) AS [Days]
FROM Tally T
JOIN dbo.YourTable YT ON T.I <= DATEDIFF(DAY, YT.StartDate, YT.EndDate))
--And now aggregate
SELECT D.ID,
DATEPART(MONTH,D.[Date]) AS [Month],
DATEPART(YEAR,D.[Date]) AS [Year],
CONVERT(decimal(12,2),SUM(D.Amount / D.[Days])) AS Amount
FROM Dates D
GROUP BY D.ID,
DATEPART(MONTH,D.[Date]),
DATEPART(YEAR,D.[Date])
ORDER BY D.ID,
[Year],
[Month];
GO
DROP TABLE dbo.YourTable;
GO
DB<>Fiddle
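To sanity-check the proration outside SQL Server, here is a rough Python sketch of the same idea (Python is used only for illustration, and `split_by_month` is a hypothetical name). It expands the range into one step per day, counts the days that fall in each (year, month), and gives each month amount × its days / total days, rounded to 2 decimals. Because it multiplies before dividing, its rounding is not guaranteed to match the SQL's per-day SUM to the last digit in every case.

```python
from collections import defaultdict
from datetime import date, timedelta
from decimal import Decimal, ROUND_HALF_UP


def split_by_month(start: date, end: date, amount: Decimal) -> dict:
    """Spread `amount` evenly over start..end (both inclusive) and total it per (year, month)."""
    total_days = (end - start).days + 1  # same as DATEDIFF(DAY, start, end) + 1
    days_per_month = defaultdict(int)
    day = start
    while day <= end:  # one step per day, like one tally row per day
        days_per_month[(day.year, day.month)] += 1
        day += timedelta(days=1)
    return {
        key: (amount * n / total_days).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
        for key, n in days_per_month.items()
    }


# Row 3 from the question: 69 days from 2018-11-01 to 2019-01-08.
print(split_by_month(date(2018, 11, 1), date(2019, 1, 8), Decimal("20000.00")))
# {(2018, 11): Decimal('8695.65'), (2018, 12): Decimal('8985.51'), (2019, 1): Decimal('2318.84')}
```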
I'm working on an old legacy database that was imported into SQL Server 2012 from Oracle. I have the following table called INSOrders, which includes a column called OrderID of type varchar(8).
An example of the data inserted is:
A04-05 | B81-02 | C02-01
A01-01 | B95-01 | C99-05
A02-02 | B06-07 | C03-02
A98-06 | B10-01 | C17-01
A78-07 | B02-03 | C15-03
A79-01 | B02-01 | C78-06
The first letter is the order type, the next 2 digits are the year, and the last 2 digits are the order number within that year.
So I split the data into 3 columns (not stored, just presented):
select
orderid,
substring(orderid, 0, patindex('%[0-9]%', orderid)) as ordtype,
right(max(datepart(yyyy, '01/01/' + substring(orderid, patindex('%[0-9]-%', orderid) - 1, 2))),2) as year,
max(substring(orderid, patindex('%-[0-9]%', orderid) + 1, 2)) as ordnum
from
ins.insorders
where
orderid is not null
group by
substring(orderid, 0, patindex('%[0-9]%', orderid)), orderid
order by
ordtype
It looks like this:
OrderID | OrderType | OrderYear | OrderNum
---------+-------------+-------------+----------
A04-05 | A | 04 | 05
A01-01 | A | 01 | 01
B10-03 | B | 10 | 03
B95-01 | B | 95 | 01
etc....
But now I just want to select the max for each OrderType: show only the max for letter A, the max for letter B, etc. By max I mean that for letter A I need to show the latest year and the latest order number, so if I have A04-01 and A04-02, just show A04-02.
I need to modify my query so that I see the following:
OrderID | OrderType | OrderYear | OrderNum
---------+-------------+-------------+----------
A04-05 | A | 04 | 05
B10-03 | B | 10 | 03
C17-01 | C | 17 | 01
Thank you, I will truly appreciate the help.
You can try the query below. It uses your original query as a CTE and assigns row numbers within each order type, ordered by order year and order number. Rows with row number 1 are then the max for each order type.
This little bit, DATEPART(yyyy,('01/01/' + OrderYear)), makes sure we get the correct year, so that 95 is 1995 and 10 is 2010, etc. (it relies on SQL Server's two-digit year cutoff, which is 2049 by default).
;WITH cte
AS (
select orderid,
substring(orderid, 0, patindex('%[0-9]%', orderid)) as OrderType,
right(max(datepart(yyyy,'01/01/' + substring(orderid, patindex('%[0-9]-%', orderid) - 1, 2))),2) as OrderYear,
max(substring(orderid, patindex('%-[0-9]%', orderid) + 1, 2)) as OrderNum
from ins.insorders
where orderid is not null
group by substring(orderid, 0, patindex('%[0-9]%', orderid)), orderid
)
SELECT *
FROM
(SELECT
*
, ROW_NUMBER() OVER (PARTITION BY OrderType ORDER BY DATEPART(yyyy,('01/01/' + OrderYear)) DESC, OrderNum DESC) AS RowNum
FROM cte) t
WHERE t.RowNum = 1
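The ranking logic can be mirrored outside SQL to check which row should win per type. This is a hedged Python sketch, not part of the answer: `full_year` and `latest_per_type` are hypothetical names, it assumes the letters + YY-NN format from the question, and it models SQL Server's default two-digit year cutoff of 2049 (a server configuration option that may be set differently on your instance).

```python
import re


def full_year(yy: str, cutoff: int = 2049) -> int:
    """Expand a two-digit year like SQL Server's 'two digit year cutoff' option:
    with the default 2049, 00-49 -> 2000-2049 and 50-99 -> 1950-1999."""
    window_start = cutoff - 99  # 1950 for the default cutoff
    year = (window_start // 100) * 100 + int(yy)
    return year if year >= window_start else year + 100


def latest_per_type(order_ids):
    """Return {order type: order id}, keeping the id with the latest (year, order number)."""
    best = {}
    for oid in order_ids:
        m = re.fullmatch(r"([A-Za-z]+)(\d{2})-(\d{2})", oid)  # e.g. 'A', '04', '05'
        if m is None:
            continue  # skip ids that don't follow the letters + YY-NN layout
        otype, yy, num = m.groups()
        key = (full_year(yy), int(num))  # like ORDER BY year DESC, OrderNum DESC
        if otype not in best or key > best[otype][0]:
            best[otype] = (key, oid)
    return {otype: oid for otype, (_, oid) in best.items()}
```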
The data is represented poorly, so I only have a way to "cheese" it, and we'll need to make a lot of assumptions:
with cte_example
as
( your query )
select OrderID
,OrderType
,OrderYear
,OrderNum
from
(select *, row_number() over(partition by OrderType order by OrderYear DESC, OrderNum DESC) rn
from cte_example
where OrderYear <= right(year(getdate()),2)) t1
where t1.rn = 1
Since you already have a query extracting the information, I won't bother changing it. We wrap your query in a CTE, query from it and apply the row_number function to decide which OrderType has the most recent OrderYear, along with its OrderNum and OrderID.
Now the tricky part is that the years are poorly represented (assuming my comment on your original post is true), so any sort of aggregation for OrderType B will return 95, since it is numerically greatest.
We assume that no order year is later than the current year, and that anything greater is from the 1900s, using this condition: where OrderYear <= right(year(getdate()),2). In other words, take the two rightmost characters of the current year: first retrieve 2017 from getdate(), then 17 with the RIGHT function. I'm sure you can see why this is dangerous: what if your latest date is 1999?
So by filtering them out, we can then see the latest year for each OrderType... hope this helps.
Here is the rextester test I built to play with your query, in case you want to try it.
I think your original query was almost exactly what you needed except you need to use MAX(OrderID) and not group by it.
declare @Something table
(
orderid varchar(6)
)
insert @Something
(
orderid
) values
('A04-05'), ('B81-02'), ('C02-01'),
('A01-01'), ('B95-01'), ('C99-05'),
('A02-02'), ('B06-07'), ('C03-02'),
('A98-06'), ('B10-01'), ('C17-01'),
('A78-07'), ('B02-03'), ('C15-03'),
('A79-01'), ('B02-01'), ('C78-06')
select max(orderid) as orderid,
substring(orderid, 0, patindex('%[0-9]%', orderid)) as ordtype,
right(max(datepart(yyyy,'01/01/' + substring(orderid, patindex('%[0-9]-%', orderid) - 1, 2))),2) as year,
max(substring(orderid, patindex('%-[0-9]%', orderid) + 1, 2)) as ordnum
from @Something
where orderid is not null
group by substring(orderid, 0, patindex('%[0-9]%', orderid))
order by ordtype
I have a forecast of change that I need to add on to actuals.
Example:
Date Group Count ActForc
Nov-15 GrpA 10 A
Dec-15 GrpA 12 A
Jan-16 GrpA -1 F
Feb-16 GrpA 2 F
What I would like to see is:
Date Group Count
Nov-15 GrpA 10
Dec-15 GrpA 12
Jan-16 GrpA 11
Feb-16 GrpA 13
but all of the counting/running-sum queries I have seen assume that I want the sections to be separate, and give me ways to create sums for each section. Essentially, I want to seed the sum for the second section with the final value from the first section and continue from that point, without disturbing the values from the second section.
If your forecasts are always at the end of the date range, you can also do this by nesting a few window functions. The inner query builds a field, Count2, that takes Count when the next row is 'F' (the last row looks at its own flag) and 0 otherwise. The outer query takes a running total of Count2 and uses it instead of Count on the rows whose next row is 'F', which gives the figure you want.
select
[date],
[group],
case when isnull(lead(ActForc) over (order by Date asc),ActForc) = 'F' then
sum(Count2) over (order by Date asc) else [Count] end,
[count],
ActForc
from (
select
[date],
[group],
case when isnull(lead(ActForc) over (order by Date asc),ActForc) = 'F' then [Count] else 0 end as Count2,
[count],
ActForc
from
table1
) X
This should perform better than any recursive CTE or correlated subquery, because the data isn't read several times. If you have more groups, partitioning the window functions by the group should fix that.
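The nested-window trick is easy to misread, so here is a small Python mirror of it (illustration only; `seeded_running_total` is a hypothetical name). It assumes a single group whose rows are already sorted by date, with every 'F' row after the last 'A' row, as the answer requires.

```python
from itertools import accumulate


def seeded_running_total(counts, flags):
    """counts/flags: one group's rows sorted by date; flags are 'A' (actual) or 'F' (forecast)."""
    n = len(counts)
    # LEAD(ActForc) with ISNULL(..., ActForc): the last row falls back to its own flag.
    next_flag = [flags[i + 1] if i + 1 < n else flags[i] for i in range(n)]
    # Count2: keep Count where the next row is a forecast, otherwise 0.
    count2 = [c if nf == "F" else 0 for c, nf in zip(counts, next_flag)]
    # SUM(Count2) OVER (ORDER BY Date): a plain running total.
    running = list(accumulate(count2))
    # Outer CASE: use the running total where the next row is 'F', else the original Count.
    return [r if nf == "F" else c for c, r, nf in zip(counts, running, next_flag)]


print(seeded_running_total([10, 12, -1, 2], ["A", "A", "F", "F"]))  # [10, 12, 11, 13]
```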
Example in SQL Fiddle with a few more months.
Try a recursive CTE.
First, create a subquery that adds a row number (rn).
Then create the base case with rn = 1.
Finally, the recursive part calculates each next level.
SQL Fiddle Demo
WITH addID as (
SELECT [Date], [Group], [Count], [ActForc],
ROW_NUMBER() OVER ( ORDER BY [DATE]) as rn
FROM myTable
), cte_name ( [Date], [Group], [Count], [level] ) AS
(
SELECT [Date], [Group], [Count], 1 as [level]
FROM addID
WHERE rn = 1
UNION ALL
SELECT A.[Date],
A.[Group],
CASE WHEN [ActForc] = 'F' THEN C.[Count] + A.[Count]
ELSE A.[Count]
END AS [Count],
C.[level] + 1
FROM addID A
INNER JOIN cte_name C
ON A.rn = C.[level] + 1
)
SELECT *
FROM cte_name
OUTPUT
| Date | Group | Count | level |
|----------------------------|-------|-------|-------|
| November, 01 2015 00:00:00 | GrpA | 10 | 1 |
| December, 01 2015 00:00:00 | GrpA | 12 | 2 |
| January, 01 2016 00:00:00 | GrpA | 11 | 3 |
| February, 01 2016 00:00:00 | GrpA | 13 | 4 |
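For comparison, the recursive CTE boils down to a row-by-row carry. A minimal Python sketch of that logic (`carry_forward` is a hypothetical name; rows are assumed to belong to one group and be sorted by date, as rn is):

```python
def carry_forward(rows):
    """rows: (count, flag) pairs for one group, sorted by date (the rn order)."""
    result = []
    for i, (count, flag) in enumerate(rows):
        if i > 0 and flag == "F":
            # recursive member: C.[Count] + A.[Count]
            result.append(result[-1] + count)
        else:
            # anchor row (rn = 1) or an actual: keep A.[Count] unchanged
            result.append(count)
    return result


print(carry_forward([(10, "A"), (12, "A"), (-1, "F"), (2, "F")]))  # [10, 12, 11, 13]
```

Keep in mind that SQL Server stops recursive CTEs at 100 levels by default, so a longer series would need an OPTION (MAXRECURSION ...) hint.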
So I have two tables: one tracking a person's location, and one that has the shifts of staff members.
Staff members have a staffId, location, start and end times, and cost of that shift.
People have an eventId, stayId, personId, location, start and end time. A person will have an event with multiple stays.
What I am attempting to do is mesh these two tables together, so that I can accurately report the cost of each location stay, based on the duration of that stay multiplied by the associated cost of staff covering that location at that time.
The issues I have are:
Location stays do not align with staff shifts, e.g. a person might be in location A between 1pm and 2pm, while four staff are on shifts from 12:30 to 1:30 and two are on from 1:30 till 5.
There are a lot of records.
Not all staff are paid the same
My current method is to expand both tables to have a record for every single minute. So a stay that is between 1pm and 2pm will have 60 records, and a staff shift that goes for 5 hours will have 300 records. I can then take all staff that are working on that location at that minute to get a minute value based on the cost of each staff member divided by the duration of their shift, and apply that value to the corresponding record in the other table.
Techniques used:
I create a table with 50,000 numbers, since some stays can be quite long.
I take the staff table and join onto the numbers table to split each shift, then group it by location and minute, with a staff count and minute cost.
The final step, and the one causing issues, is where I take the location table, join onto numbers, and also onto the modified staff table to produce a cost for that minute. I also count the number of people in that location to account for staff covering multiple people.
I'm finding this process extremely slow as you can imagine, since my person table has about 500 million records when expanded to the minute level, and the staff table has about 35 million when the same thing is done.
Can people suggest a better method for me to use?
Sample data:
Locations
| EventId | ID | Person | Loc | Start | End
| 1 | 987 | 123 | 1 | May, 20 2015 07:00:00 | May, 20 2015 08:00:00
| 1 | 374 | 123 | 4 | May, 20 2015 08:00:00 | May, 20 2015 10:00:00
| 1 | 184 | 123 | 3 | May, 20 2015 10:00:00 | May, 20 2015 11:00:00
| 1 | 798 | 123 | 8 | May, 20 2015 11:00:00 | May, 20 2015 12:00:00
Staff
| Loc | StaffID | Cost | Start | End
| 1 | 99 | 40 | May, 20 2015 04:00:00 | May, 20 2015 12:00:00
| 1 | 15 | 85 | May, 20 2015 03:00:00 | May, 20 2015 5:00:00
| 3 | 85 | 74 | May, 20 2015 18:00:00 | May, 20 2015 20:00:00
| 4 | 10 | 36 | May, 20 2015 06:00:00 | May, 20 2015 14:00:00
Result
| EventId | ID | Person | Loc | Start | End | Cost
| 1 | 987 | 123 | 1 | May, 20 2015 07:00:00 | May, 20 2015 08:00:00 | 45.50
| 1 | 374 | 123 | 4 | May, 20 2015 08:00:00 | May, 20 2015 10:00:00 | 81.20
| 1 | 184 | 123 | 3 | May, 20 2015 10:00:00 | May, 20 2015 11:00:00 | 95.00
| 1 | 798 | 123 | 8 | May, 20 2015 11:00:00 | May, 20 2015 12:00:00 | 14.75
SQL:
Numbers table
;WITH x AS
(
SELECT TOP (224) object_id FROM sys.all_objects
)
SELECT TOP (50000) n = ROW_NUMBER() OVER (ORDER BY x.object_id)
INTO #numbers
FROM x CROSS JOIN x AS y
ORDER BY n
Staff Table
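SELECT
Location,
ISNULL(SUM(ROUND(Cost/ CASE WHEN (DateDiff(MINUTE, StartDateTime, EndDateTime)) = 0 THEN 1 ELSE (DateDiff(MINUTE, StartDateTime, EndDateTime)) END, 5)),0) AS MinuteCost,
Count(Name) AS StaffCount,
RosterMinute = DATEADD(MI, DATEDIFF(MI, 0, StartDateTime) + n.n -1, 0)
INTO #temp_StaffRoster
FROM dbo.StaffRoster
-- one row per minute of each shift, then grouped by location and minute
INNER JOIN #numbers n ON n.n BETWEEN 1 AND DATEDIFF(MINUTE, StartDateTime, EndDateTime)
GROUP BY Location, DATEADD(MI, DATEDIFF(MI, 0, StartDateTime) + n.n -1, 0)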
Grouping together (this is where I think help is needed):
INSERT INTO dbo.FinalTable
SELECT [EventId]
,[Id]
,[Start]
,[End]
,event.[Location]
,SUM(ISNULL(MinuteCost,1)/ISNULL(PeopleCount, 1)) AS Cost
,AVG(ISNULL(StaffCount,1)) AS AvgStaff
FROM dbo.Events event WITH (NOLOCK)
INNER JOIN #numbers n ON n.n BETWEEN 0 AND DATEDIFF(MINUTE, Start, [End])
LEFT OUTER JOIN #temp_StaffRoster staff WITH (NOLOCK) ON staff.Location= event.Location AND staff.RosterMinute = DATEADD(MI, DATEDIFF(MI, 0, Start) + n.n -1 , 0)
LEFT OUTER JOIN (SELECT [Location], DATEADD(MI, DATEDIFF(MI, 0, Start) + n.n -1 , 0) AS Mins, COUNT(Id) as PeopleCount
FROM dbo.Events WITH (NOLOCK)
INNER JOIN #numbers n ON n.n BETWEEN 0 AND DATEDIFF(MINUTE, Start, [End])
GROUP BY [Location], DATEADD(MI, DATEDIFF(MI, 0, Start) + n.n -1 , 0)
) cap ON cap.Location= event.Location AND cap.Mins = DATEADD(MI, DATEDIFF(MI, 0, Start) + n.n -1 , 0)
GROUP BY [EventId]
,[Id]
,[Start]
,[End]
,event.[Location]
UPDATE
So I have two tables: one tracking a person's location, and one that has the shifts of staff members with their cost. I am attempting to consolidate the two tables to calculate the cost of each location stay.
Here is my method:
;WITH stay AS
(
SELECT TOP 650000
StayId,
Location,
Start,
[End]
FROM stg_Stay
WHERE Location IS NOT NULL -- Some locations don't currently have a matching shift location
ORDER BY Location, ADTM
),
shift AS
(
SELECT TOP 36000000
Location,
ShiftMinute,
MinuteCost,
StaffCount
FROM stg_Shifts
ORDER BY Location, ShiftMinute
)
SELECT
[StayId],
SUM(MinuteCost) AS Cost,
AVG(StaffCount) AS StaffCount
INTO newTable
FROM stay S
CROSS APPLY (SELECT MinuteCost, StaffCount
FROM shift R
WHERE R.Location = S.Location
AND R.ShiftMinute BETWEEN S.Start AND S.[End]
) AS Shifts
GROUP BY [StayId]
This is where I'm at.
I've split the Shifts table into a minute by minute level since there is no clear alignment of shifts to stays.
stg_Stay contains more columns than needed for this operation. stg_Shifts is as shown.
Indexes used on stg_Shifts:
CREATE NONCLUSTERED INDEX IX_Shifts_Loc_Min
ON dbo.stg_Shifts (Location, ShiftMinute)
INCLUDE (MinuteCost, StaffCount);
Indexes used on stg_Stay:
CREATE INDEX IX_Stay_StayId ON dbo.stg_Stay (StayId);
CREATE CLUSTERED INDEX IX_Stay_Start_End_Loc ON dbo.stg_Stay (Location, Start, [End]);
Due to the fact that Shifts has ~36 million records and Stays has ~650k, what can I do to make this perform better?
Don't break down the rows by minutes.
A staging table may help if you can create a fast relationship between them, i.e. the overlapping interval.
SELECT *
FROM Locations l
OUTER APPLY -- Assume a staff won't appear in different location in the same period of time, of course.
(
SELECT
CONVERT(decimal(14,2), SUM(CostPerMinute * OverlappedMinutes)) AS ActualCost,
COUNT(DISTINCT StaffId) AS StaffCount,
SUM(OverlappedMinutes) AS StaffMinutes
FROM
(
SELECT
*,
-- Calculate overlapped time in minutes
DATEDIFF(MINUTE,
CASE WHEN StartTime > l.StartTime THEN StartTime ELSE l.StartTime END, -- Get greatest start time
CASE WHEN EndTime > l.EndTime THEN l.EndTime ELSE EndTime END -- Get least end time
) AS OverlappedMinutes,
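CONVERT(decimal(14,6), Cost) / NULLIF(DATEDIFF(MINUTE, StartTime, EndTime), 0) AS CostPerMinute -- decimal avoids integer division; NULLIF avoids divide-by-zero on zero-length shifts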
FROM Staff
WHERE LocationId = l.LocationId
AND StartTime <= l.EndTime AND l.StartTime <= EndTime -- Match with overlapped time
) data
) StaffInLoc
SQL Fiddle
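To make the overlap arithmetic in this query concrete, here is a small Python sketch (illustration only; `overlap_minutes` and `stay_cost` are hypothetical names). It assumes timestamps fall on whole minutes, so floor-dividing seconds by 60 matches DATEDIFF(MINUTE, ...), and it returns 0.0 where the OUTER APPLY would return NULL for a stay with no overlapping shift.

```python
from datetime import datetime


def overlap_minutes(a_start, a_end, b_start, b_end):
    """Minutes shared by [a_start, a_end] and [b_start, b_end]; 0 if they don't overlap."""
    start = max(a_start, b_start)  # latest start
    end = min(a_end, b_end)        # earliest end
    return max(0, int((end - start).total_seconds() // 60))


def stay_cost(stay, shifts):
    """stay: (location, start, end); shifts: iterable of (location, cost, start, end).
    Each shift's cost is spread evenly over its own minutes, and the stay is
    charged for the minutes it shares with that shift."""
    loc, s_start, s_end = stay
    total = 0.0
    for sh_loc, cost, sh_start, sh_end in shifts:
        if sh_loc != loc:
            continue
        shift_minutes = int((sh_end - sh_start).total_seconds() // 60)
        if shift_minutes == 0:
            continue  # zero-length shift: skip instead of dividing by zero
        total += cost * overlap_minutes(s_start, s_end, sh_start, sh_end) / shift_minutes
    return total


def at(hour):
    return datetime(2015, 5, 20, hour)


staff = [(1, 40, at(4), at(12)), (1, 85, at(3), at(5)), (3, 74, at(18), at(20)), (4, 36, at(6), at(14))]
print(stay_cost((1, at(7), at(8)), staff))  # 5.0 (staff 99: 40 * 60 / 480)
```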
Take the below with a grain of salt, since your naming is horrible.
Location should really be called Stay, as I guess a location is another table defining a single physical location.
Your Staff table is also badly named. Why not name it Shift? I would expect a staff table to contain things like Name, Phone, etc., whereas a Shift table can contain multiple shifts for the same staff member.
Second, I think you're missing a relation between the two tables.
If you join Location and Staff only on location and overlapping date times, I don't think it would make a whole lot of sense for what you're trying to do. How do you know which staff member is at any location at a given time? The only thing you can do with location and overlapping dates is assume that an entry in the Location table relates to every staff member who has a shift at that location within the timeframe. So look at the below more as inspiration for solving your problem and for finding overlapping datetime intervals, and less as an actual solution, since I think your data and model are in bad shape.
If I got it all wrong, please provide the primary keys and foreign keys on your tables and a better explanation.
Some dummy data
IF OBJECT_ID('dbo.Location', 'U') IS NOT NULL DROP TABLE dbo.Location
CREATE TABLE dbo.Location
(
StayId INT,
EventId INT,
PersonId INT,
LocationId INT,
StartTime DATETIME2(0),
EndTime DATETIME2(0)
)
INSERT INTO dbo.Location ( StayId ,EventId ,PersonId ,LocationId ,StartTime ,EndTime)
VALUES ( 987 ,1 ,123 ,1 ,'2015-05-20T07:00:00','2015-05-20T08:00:00')
INSERT INTO dbo.Location ( StayId ,EventId ,PersonId ,LocationId ,StartTime ,EndTime)
VALUES ( 374 ,1 ,123 ,4 ,'2015-05-20T08:00:00','2015-05-20T10:00:00')
INSERT INTO dbo.Location ( StayId ,EventId ,PersonId ,LocationId ,StartTime ,EndTime)
VALUES ( 184 ,1 ,123 ,3 ,'2015-05-20T10:00:00','2015-05-20T11:00:00')
INSERT INTO dbo.Location ( StayId ,EventId ,PersonId ,LocationId ,StartTime ,EndTime)
VALUES ( 798 ,1 ,123 ,8 ,'2015-05-20T11:00:00','2015-05-20T12:00:00')
IF OBJECT_ID('dbo.Staff', 'U') IS NOT NULL DROP TABLE dbo.Staff
CREATE TABLE dbo.Staff
(
StaffId INT,
Cost INT,
LocationId INT,
StartTime DATETIME2(0),
EndTime DATETIME2(0)
)
INSERT INTO dbo.Staff ( StaffId ,Cost ,LocationId,StartTime ,EndTime)
VALUES ( 99 ,40 ,1 ,'2015-05-20T04:00:00','2015-05-20T12:00:00')
INSERT INTO dbo.Staff ( StaffId ,Cost ,LocationId,StartTime ,EndTime)
VALUES ( 15 ,85 ,1 ,'2015-05-20T03:00:00','2015-05-20T05:00:00')
INSERT INTO dbo.Staff ( StaffId ,Cost ,LocationId,StartTime ,EndTime)
VALUES ( 85 ,74 ,3 ,'2015-05-20T18:00:00','2015-05-20T20:00:00')
INSERT INTO dbo.Staff ( StaffId ,Cost ,LocationId,StartTime ,EndTime)
VALUES ( 10 ,36 ,4 ,'2015-05-20T06:00:00','2015-05-20T14:00:00')
Actual query
WITH OnLocation AS
(
SELECT
L.StayId, L.EventId, L.LocationId, L.PersonId, S.Cost
, IIF(L.StartTime > S.StartTime, L.StartTime, S.StartTime) AS OnLocationStartTime
, IIF(L.EndTime < S.EndTime, L.EndTime, S.EndTime) AS OnLocationEndTime
FROM dbo.Location L
LEFT JOIN dbo.Staff S
ON S.LocationId = L.LocationId -- TODO are you not missing a join condition on staffid
-- Detects any overlaps between stays and shifts
AND L.StartTime <= S.EndTime AND L.EndTime >= S.StartTime
)
SELECT
*
, DATEDIFF(MINUTE, D.OnLocationStartTime, D.OnLocationEndTime) AS DurationMinutes
, DATEDIFF(MINUTE, D.OnLocationStartTime, D.OnLocationEndTime) / 60.0 * Cost AS DurationCost
FROM OnLocation D
To get a summary, you can take the query and add a GROUP BY for whatever you want to summarize.
I have a SQL Server table named AgentLog in which I store for each agent his daily number of sales.
+-----------+------------+-------------+
| AgentName | Date | SalesNumber |
+-----------+------------+-------------+
| John | 01.01.2014 | 45 |
| Terry | 01.01.2014 | 30 |
| John | 02.01.2014 | 20 |
| Terry | 02.01.2014 | 15 |
| Terry | 03.01.2014 | 52 |
| Terry | 04.01.2014 | 24 |
| Terry | 05.01.2014 | 12 |
| Terry | 06.01.2014 | 10 |
| Terry | 07.01.2014 | 23 |
| John | 08.01.2014 | 48 |
| Terry | 08.01.2014 | 35 |
| John | 09.01.2014 | 37 |
| Terry | 10.01.2014 | 35 |
+-----------+------------+-------------+
If an agent doesn't work on one particular day, there is no record of his sales on that date.
I want to generate a report (query) for a given date interval (e.g. 01.01.2014 - 10.01.2014) that counts on how many days an agent wasn't present for work (e.g. John - 6 days) and was at work (John - 4 days), and also returns the date intervals when he wasn't present (e.g. John 03.01.2014 - 07.01.2014, 10.01.2014). There can be multiple intervals.
You need to create a custom table and populate it with a record for each date you want in your range (Feel free to go as far back in the past and forward into the future as you feel you may need.). You could do this in Excel very easily and import it.
Select *
from Custom.DateListTable dlt
left outer join agentlog ag
on dlt.Date = ag.Date
I would approach this by getting the number of dates in the interval, as well as the number of dates the agent was at work, and you then have everything you need.
To get the number of days in the interval (inclusive, hence the + 1), you can use DATEDIFF:
SELECT DATEDIFF(day, '20140101', '20140110') + 1 AS totalDays;
To get the number of days an agent worked in that interval, you can use the COUNT(*) aggregate function:
SELECT agentName, COUNT(*) AS daysWorked
FROM myTable
WHERE [Date] BETWEEN '20140101' AND '20140110'
GROUP BY agentName;
Then, you can just add to that query to get the days not worked by subtracting totalDays - daysWorked:
SELECT agentName, COUNT(*) AS daysWorked, (DATEDIFF(day, '20140101', '20140110') + 1 - COUNT(*)) AS daysMissed
FROM myTable
WHERE [Date] BETWEEN '20140101' AND '20140110'
GROUP BY agentName;
Here is an SQL Fiddle example.
The only way I can think of to resolve this is to create a temporary table with only one column (datetime) and save all the dates from the selected range in it. You can create a stored procedure that fills that temporary table with all the dates from the interval using a cursor. Then LEFT JOIN from the temporary table to your table and look for NULL values on your table's side (the days when that person didn't come to work).
Try this...
SET DATEFIRST 1; --Monday
DECLARE @StartDate DATETIME = '20140101',
@EndDate DATETIME = '20140110';
WITH data as (
select 0 as i, DATEADD(DAY, 0, @StartDate) as TheDate
union all
select i + 1, DATEADD(DAY, i + 1, @StartDate) as TheDate
from data
where i < DATEDIFF(DAY, @StartDate, @EndDate)
)
SELECT a.AgentName,
SUM(CASE WHEN c.Date IS NULL THEN 1 ELSE 0 END) AS Missing,
SUM(CASE WHEN c.Date IS NOT NULL THEN 1 ELSE 0 END) AS Working
FROM Agent a
JOIN data b ON NOT EXISTS(SELECT NULL FROM SpecialDate s WHERE s.date = b.TheDate)
LEFT JOIN AgentLog c ON
c.AgentName = a.AgentName
AND c.Date = b.TheDate
WHERE DATEPART(weekday, b.TheDate) <= 5
GROUP BY a.AgentName
OPTION (MAXRECURSION 10000);
It includes a check for weekends, as well as a reference to "SpecialDate" where a list of non working days can be maintained, and excluded from the check.
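The calendar-plus-filter idea can be sketched in Python to check the counts (illustration only; `attendance` is a hypothetical name). Monday to Friday mirrors SET DATEFIRST 1 with DATEPART(weekday, ...) <= 5, and the optional `special_days` set stands in for the SpecialDate table.

```python
from datetime import date, timedelta


def attendance(worked_days, start, end, special_days=frozenset()):
    """Return (working, missing) for one agent over start..end inclusive,
    counting only Monday-Friday dates that are not in special_days."""
    working = missing = 0
    d = start
    while d <= end:  # one step per calendar day, like the recursive `data` CTE
        if d.weekday() < 5 and d not in special_days:  # Mon=0 .. Fri=4
            if d in worked_days:
                working += 1
            else:
                missing += 1
        d += timedelta(days=1)
    return working, missing


john = {date(2014, 1, d) for d in (1, 2, 8, 9)}
print(attendance(john, date(2014, 1, 1), date(2014, 1, 10)))  # (4, 4)
```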
Reading your question again, I realise that this will only solve half your problem.
NOTE: The following answer mainly addresses the trickiest part of the question, which is how to obtain "absence from work" intervals.
Given these values as Interval Start - End dates:
DECLARE @IntervalStart DATE = '2013-12-30'
DECLARE @IntervalEnd DATE = '2014-01-10'
the following query gives you the "absence from work" intervals:
SELECT AgentName,
DATEADD(d, 1, t.[Date]) As OffWorkStart,
DATEADD(d, -1, t.NextDate) As OffWorkEnd
FROM (
SELECT AgentName, [Date], LEAD([Date]) OVER (PARTITION BY AgentName ORDER BY [Date] ASC) As NextDate,
DATEDIFF(DAY, [Date], LEAD([Date]) OVER (PARTITION BY AgentName ORDER BY [Date] ASC)) As NextMinusCurrent
FROM #AgentLog) t
WHERE t.NextMinusCurrent > 1
-- Get marginal beginning interval (in case such an interval exists)
UNION ALL
SELECT AgentName, @IntervalStart AS OffWorkStart, DATEADD(DAY, -1, MIN([Date])) AS OffWorkEnd
FROM #AgentLog
GROUP BY AgentName
HAVING MIN([Date]) > @IntervalStart
-- Get marginal ending interval (in case such an interval exists)
UNION ALL
SELECT AgentName, DATEADD(DAY, 1, MAX([Date])) AS OffWorkStart, @IntervalEnd
FROM #AgentLog
GROUP BY AgentName
HAVING MAX([Date]) < @IntervalEnd
ORDER By AgentName, OffWorkStart
With the input data you supplied, the above query gives you the following output:
AgentName OffWorkStart OffWorkEnd
---------------------------------------
John 2013-12-30 2013-12-31
John 2014-01-03 2014-01-07
John 2014-01-10 2014-01-10
Terry 2013-12-30 2013-12-31
Terry 2014-01-09 2014-01-09
The idea behind the basic part of the query is to employ the following nested query:
SELECT AgentName,
[Date],
LEAD([Date]) OVER (PARTITION BY AgentName ORDER BY [Date] ASC) As NextDate,
DATEDIFF(DAY, [Date], LEAD([Date]) OVER (PARTITION BY AgentName ORDER BY [Date] ASC)) As NextMinusCurrent
FROM #AgentLog
in order to get any existing gaps between the days a certain agent is present for work. A value of NextMinusCurrent > 1 indicates such a gap.
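The same gap logic, including the two marginal intervals, can be mirrored in Python as a hedged sketch (`absence_intervals` is a hypothetical name). As a design choice that differs from the SQL, it also ignores log rows outside the interval and treats an agent with no rows as absent for the whole interval.

```python
from datetime import date, timedelta


def absence_intervals(worked, start, end):
    """Return (off_start, off_end) pairs for one agent inside [start, end].
    worked: the dates the agent has a log row for."""
    days = sorted(d for d in worked if start <= d <= end)
    if not days:
        return [(start, end)]
    gaps = []
    if days[0] > start:  # marginal beginning interval
        gaps.append((start, days[0] - timedelta(days=1)))
    for cur, nxt in zip(days, days[1:]):  # pairs each date with its LEAD
        if (nxt - cur).days > 1:  # NextMinusCurrent > 1 means a gap
            gaps.append((cur + timedelta(days=1), nxt - timedelta(days=1)))
    if days[-1] < end:  # marginal ending interval
        gaps.append((days[-1] + timedelta(days=1), end))
    return gaps
```

Summing (off_end - off_start).days + 1 over the result gives the absence-day count, the same arithmetic as SUM(DATEDIFF(DAY, OffWorkStart, OffWorkEnd) + 1).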
Counting days is trivial once you have the above query in place. E.g., placing the above query in a CTE (without its final ORDER BY, which isn't allowed inside a CTE), you can count the total number of absence days with something like:
;WITH cte AS (
... query goes here
)
SELECT AgentName, SUM(DATEDIFF(DAY, OffWorkStart, OffWorkEnd) + 1) AS AbsenceDays
FROM cte
GROUP By AgentName
P.S. The above query makes use of the SQL Server LEAD function, which is available from SQL Server 2012 onwards.
SQL Fiddle here
EDIT:
CTEs together with ROW_NUMBER() can be used to simulate LEAD function. The first part of the query becomes:
;WITH cte1 AS (
SELECT AgentName,
[Date],
ROW_NUMBER() OVER (PARTITION BY AgentName ORDER BY [Date] ASC) As rn
FROM #AgentLog
),
cte2 AS (
SELECT cte1.AgentName, cte1.[Date],
cteLead.[Date] AS NextDate,
DATEDIFF(DAY, cte1.[Date], cteLead.[Date]) As NextMinusCurrent
FROM cte1
LEFT OUTER JOIN cte1 AS cteLead
ON (cte1.rn = cteLead.rn - 1) AND (cte1.AgentName = cteLead.AgentName)
)
SELECT AgentName,
DATEADD(d, 1, cte2.[Date]) As OffWorkStart,
DATEADD(d, -1, cte2.NextDate) As OffWorkEnd
FROM cte2
WHERE NextMinusCurrent > 1
SQL Fiddle for SQL Server 2008 here. I hope it executes in SQL Server 2005 also!