I have been going through Stack Overflow for the last week trying to work this out and I still can't find a viable solution, so I was wondering if anyone could offer some help or advice.
Explanation of the data structures
I have the following tables:
Position table (zz_position), which holds the details of the position (job ID), including the date range it is valid for.
PosNo  Description           Date_From  Date_To
-------------------------------------------------
10001  System Administrator  20170101   20231231
Resource table (zz_resource) which is used to hold the details of a resource (employee) including the date that they joined the company and left it
resID  description  date_from  date_to
------------------------------------------
100    Sam          20160101   20991231
101    Joe          20150101   20991231
Employment table (zz_employment) which is used to link position to resources within a date from and to range
PosNo  resID  Date_From  Date_To   seqNo
------------------------------------------
10001  100    20180101   20180401  1
10001  101    20180601   20191231  2
10001  100    20200101   20991231  3
Problem
Now due to people changing positions, a post might not be filled for a period of time and what I am trying to do is produce a report that I can use to give me a breakdown of the status of a post at any point in time.
I know that I can produce one which fully maps each day using a calendar table however what I want is a report which produces the data in the following aggregated format:
PosNo  resID  Date_From  Date_To   seqNo
------------------------------------------
10001  NULL   20170101   20171231  0
10001  100    20180101   20180401  1
10001  NULL   20180402   20180531  0
10001  101    20180601   20191231  2
10001  100    20200101   20231231  3
Note how the report takes the rows in the employment table and produces a fully specified history of the position: the first NULL row's date_from is taken from the position start date, and the last row's date_to is taken from the position end date.
Ideally I would like this as a view/function; however, due to the complexity, I would be more than happy to have a series of T-SQL statements that I can run each night as part of a data warehouse routine.
Rules
All dates are stored as datetime with the time part truncated, so a date_to refers to the date something ends, not the date and time it ends.
If the post/employment/resource has no end date, it is denoted as 20991231.
If the employment itself is open-ended, the date_to in the employment table is 20991231 even though the position itself might end on 20231231. Ideally I would like the result to respect the position end date.
SQL code:
CREATE TABLE zz_position
(
posNo varchar(25) NOT NULL,
description varchar(25) NOT NULL,
date_from datetime NULL,
date_to datetime NULL
)
insert into zz_position
values ('10001', 'System Administrator', '2017-01-01 00:00:00.000', '2020-12-31 00:00:00.000')
go
CREATE TABLE zz_resource
(
resID varchar(25) NOT NULL,
description varchar(25) NOT NULL,
date_from datetime NULL,
date_to datetime NULL
)
insert into zz_resource
values ('100', 'Sam', '2016-01-01 00:00:00.000', '2099-12-31 00:00:00.000'),
('101', 'Joe', '2015-01-01 00:00:00.000', '2099-12-31 00:00:00.000')
go
CREATE TABLE zz_employment
(
posNo varchar(25) NOT NULL,
resID varchar(25) NOT NULL,
date_from datetime NULL,
date_to datetime NULL,
seqNo int NULL
)
insert into zz_employment
values ('10001', '100', '2018-01-01 00:00:00.000', '2018-04-01 00:00:00.000', 1),
('10001', '101', '2018-06-01 00:00:00.000', '2019-12-31 00:00:00.000', 2),
('10001', '100', '2020-01-01 00:00:00.000', '2099-12-31 00:00:00.000', 3)
This problem needs two things:
A calendar table.
A way to correctly group unemployed periods when there's an employed period in between.
The following solution uses a calendar table (SQL included) and a DATEDIFF() against an anchor date to group correctly for the second point.
Complete DB Fiddle here.
Solution (explanation below):
;WITH AllPositionDates AS
(
SELECT
T.posNo,
C.GeneratedDate
FROM
zz_position AS T
INNER JOIN Calendar AS C ON C.GeneratedDate BETWEEN T.date_from AND T.date_to
),
AllEmployedDates AS
(
SELECT
T.posNo,
T.resID,
T.seqNo,
C.GeneratedDate
FROM
zz_employment AS T
INNER JOIN Calendar AS C ON C.GeneratedDate BETWEEN T.date_from AND T.date_to
),
PositionsByEmployed AS
(
SELECT
P.posNo,
P.GeneratedDate,
E.resID,
E.seqNo,
NullRowNumber = ROW_NUMBER() OVER (
PARTITION BY
P.posNo,
CASE WHEN E.posNo IS NULL THEN 1 ELSE 2 END
ORDER BY
P.GeneratedDate ASC)
FROM
AllPositionDates AS P
LEFT JOIN AllEmployedDates AS E ON
P.posNo = E.posNo AND
P.GeneratedDate = E.GeneratedDate
)
SELECT
P.posNo,
P.resID,
Date_From = MIN(P.GeneratedDate),
Date_To = MAX(P.GeneratedDate),
seqNo = ISNULL(P.seqNo, 0)
FROM
PositionsByEmployed AS P
GROUP BY
P.posNo,
P.resID,
P.seqNo,
CASE WHEN P.resId IS NULL THEN P.NullRowNumber - DATEDIFF(DAY, '2000-01-01', P.GeneratedDate) END -- GroupingValue
ORDER BY
P.posNo,
Date_From,
Date_To
The result:
posNo  resID  Date_From   Date_To     seqNo
10001  NULL   2017-01-01  2017-12-31  0
10001  100    2018-01-01  2018-04-01  1
10001  NULL   2018-04-02  2018-05-31  0
10001  101    2018-06-01  2019-12-31  2
10001  100    2020-01-01  2020-12-31  3
Explanation
First, create a calendar table. It holds one row per day and, in this example, is limited to the range between the first and last possible day of the job positions:
DECLARE @DateStart DATE = (SELECT MIN(P.date_from) FROM zz_position AS P)
DECLARE @DateEnd DATE = (SELECT MAX(P.date_to) FROM zz_position AS P)
;WITH GeneratedDates AS
(
SELECT
GeneratedDate = @DateStart
UNION ALL
SELECT
GeneratedDate = DATEADD(DAY, 1, G.GeneratedDate)
FROM
GeneratedDates AS G
WHERE
DATEADD(DAY, 1, G.GeneratedDate) <= @DateEnd
)
SELECT
DateID = IDENTITY(INT, 1, 1),
G.GeneratedDate
INTO
Calendar
FROM
GeneratedDates AS G
OPTION
(MAXRECURSION 0)
This generates the following (up to 2020-12-31, which is max date from sample data):
DateID GeneratedDate
1 2017-01-01
2 2017-01-02
3 2017-01-03
4 2017-01-04
5 2017-01-05
6 2017-01-06
7 2017-01-07
Now we join with BETWEEN to "spread" the periods of both the positions and the employments (in separate CTEs), so we get one row per day for each position and each employment.
-- AllPositionDates
SELECT
T.posNo,
C.GeneratedDate
FROM
zz_position AS T
INNER JOIN Calendar AS C ON C.GeneratedDate BETWEEN T.date_from AND T.date_to
-- AllEmployedDates
SELECT
T.posNo,
T.resID,
T.seqNo,
C.GeneratedDate
FROM
zz_employment AS T
INNER JOIN Calendar AS C ON C.GeneratedDate BETWEEN T.date_from AND T.date_to
With these, we join them together by position and date using a LEFT JOIN, so we get every day of each position and the matching employee (if one exists). We also calculate a row number over the NULL (unemployed) rows of each position, which we are going to use later. Note that this row number increases by 1 with each following date.
;WITH AllPositionDates AS
(
SELECT
T.posNo,
C.GeneratedDate
FROM
zz_position AS T
INNER JOIN Calendar AS C ON C.GeneratedDate BETWEEN T.date_from AND T.date_to
),
AllEmployedDates AS
(
SELECT
T.posNo,
T.resID,
T.seqNo,
C.GeneratedDate
FROM
zz_employment AS T
INNER JOIN Calendar AS C ON C.GeneratedDate BETWEEN T.date_from AND T.date_to
)
-- PositionsByEmployed
SELECT
P.posNo,
P.GeneratedDate,
E.resID,
E.seqNo,
NullRowNumber = ROW_NUMBER() OVER (
PARTITION BY
P.posNo,
CASE WHEN E.posNo IS NULL THEN 1 ELSE 2 END
ORDER BY
P.GeneratedDate ASC)
FROM
AllPositionDates AS P
LEFT JOIN AllEmployedDates AS E ON
P.posNo = E.posNo AND
P.GeneratedDate = E.GeneratedDate
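The spread-and-join step can be tried on a small scale. The sketch below uses Python with the standard sqlite3 module rather than SQL Server, so the SQL is SQLite dialect; the table name employment, the 7-day calendar, and the sample rows are made up for illustration and are not the data from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employment (pos_no TEXT, res_id TEXT, date_from TEXT, date_to TEXT);
-- Hypothetical employments with a two-day vacancy in between.
INSERT INTO employment VALUES
    ('10001', '100', '2018-01-01', '2018-01-03'),
    ('10001', '101', '2018-01-06', '2018-01-07');
""")

# A recursive CTE stands in for the Calendar table (one row per day).
# The LEFT JOIN keeps every calendar day; days nobody is employed get NULL.
query = """
WITH RECURSIVE calendar(d) AS (
    SELECT '2018-01-01'
    UNION ALL
    SELECT date(d, '+1 day') FROM calendar WHERE d < '2018-01-07'
)
SELECT c.d, e.res_id
FROM calendar AS c
LEFT JOIN employment AS e ON c.d BETWEEN e.date_from AND e.date_to
ORDER BY c.d
"""
daily = conn.execute(query).fetchall()
for day, res_id in daily:
    print(day, res_id)
```

The two days without a matching employment come back with None for res_id, which is what the later grouping step keys on.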
Now for the tricky part. If we calculate the number of days between a hard-coded date and each day, we get a similar "row number" that increases consistently with each date.
SELECT
P.posNo,
P.GeneratedDate,
DateDiff = DATEDIFF(DAY, '2000-01-01', P.GeneratedDate),
P.NullRowNumber
FROM
PositionsByEmployed AS P -- This is declared in the WITH clause (see the full solution above)
ORDER BY
P.posNo,
P.GeneratedDate
We get the following:
posNo GeneratedDate DateDiff NullRowNumber
10001 2017-01-01 6210 1
10001 2017-01-02 6211 2
10001 2017-01-03 6212 3
10001 2017-01-04 6213 4
10001 2017-01-05 6214 5
10001 2017-01-06 6215 6
10001 2017-01-07 6216 7
10001 2017-01-08 6217 8
10001 2017-01-09 6218 9
If we add another column with the difference between these two, you will see that the value stays the same:
SELECT
P.posNo,
P.GeneratedDate,
DateDiff = DATEDIFF(DAY, '2000-01-01', P.GeneratedDate),
P.NullRowNumber,
GroupingValue = P.NullRowNumber - DATEDIFF(DAY, '2000-01-01', P.GeneratedDate)
FROM
PositionsByEmployed AS P
ORDER BY
P.posNo,
P.GeneratedDate
We get:
posNo GeneratedDate DateDiff NullRowNumber GroupingValue
10001 2017-01-01 6210 1 -6209
10001 2017-01-02 6211 2 -6209
10001 2017-01-03 6212 3 -6209
10001 2017-01-04 6213 4 -6209
10001 2017-01-05 6214 5 -6209
10001 2017-01-06 6215 6 -6209
10001 2017-01-07 6216 7 -6209
10001 2017-01-08 6217 8 -6209
10001 2017-01-09 6218 9 -6209
10001 2017-01-10 6219 10 -6209
But if we scroll down to the next rows that are NULL for the employee (identified by the E.posNo expression in the ROW_NUMBER() PARTITION BY), we see that the difference changes, since the ROW_NUMBER() kept increasing by 1 while the DATEDIFF jumped because of the employed days in between:
posNo GeneratedDate DateDiff NullRowNumber GroupingValue
10001 2017-12-28 6571 362 -6209
10001 2017-12-29 6572 363 -6209
10001 2017-12-30 6573 364 -6209
10001 2017-12-31 6574 365 -6209
...
10001 2018-04-02 6666 366 -6300
10001 2018-04-03 6667 367 -6300
10001 2018-04-04 6668 368 -6300
10001 2018-04-05 6669 369 -6300
10001 2018-04-06 6670 370 -6300
10001 2018-04-07 6671 371 -6300
We use this "GroupingValue" as an additional GROUP BY expression to correctly separate vacant intervals that are split by employed intervals.
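To see the grouping value at work in isolation, here is a minimal sketch in Python with the standard sqlite3 module (SQLite dialect, not T-SQL). The vacant table and its five dates are assumptions picked to mirror the two vacancies above, and julianday() stands in for DATEDIFF(DAY, anchor, date).

```python
import sqlite3

# Hypothetical sample: days on which the position is vacant.
vacant_days = [
    "2017-12-30", "2017-12-31",                # tail of the first vacancy
    "2018-04-02", "2018-04-03", "2018-04-04",  # head of the second vacancy
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vacant (d TEXT NOT NULL)")
conn.executemany("INSERT INTO vacant (d) VALUES (?)", [(d,) for d in vacant_days])

# Row number minus day number stays constant inside a run of consecutive
# dates and changes as soon as dates are skipped.
query = """
WITH numbered AS (
    SELECT d,
           ROW_NUMBER() OVER (ORDER BY d) - CAST(julianday(d) AS INTEGER) AS grp
    FROM vacant
)
SELECT MIN(d) AS date_from, MAX(d) AS date_to
FROM numbered
GROUP BY grp
ORDER BY date_from
"""
islands = conn.execute(query).fetchall()
print(islands)
```

Any fixed anchor works, because only the difference between neighbouring rows matters, not the absolute day number.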
I have a table like the following:
bill_id        sonvinid  tid  date                     brandname
1000109201701  13413     1    2015-10-03 00:00:00.000  QED - TM
1000109201701  13741     1    2015-10-13 00:00:00.000  QED - TM
1000109201702  14258     1    2015-11-05 00:00:00.000  QED - TM
Now I want to run a query in which bill_id does not repeat: on rows that repeat the same bill_id, the bill_id column should be shown as NULL.
bill_id        sonvinid  tid  date                     brandname
1000109201701  13413     1    2015-10-03 00:00:00.000  QED - TM
NULL           13741     1    2015-10-13 00:00:00.000  QED - TM
1000109201702  14258     1    2015-11-05 00:00:00.000  QED - TM
I know I can't use DISTINCT here.
So what is the best query for this kind of SELECT?
SELECT CASE WHEN row_num = 1 THEN bill_id ELSE NULL END AS bill_id
, sonvinid
, tid
, date
, brandname
FROM
( SELECT bill_id
, ROW_NUMBER() OVER (PARTITION BY bill_id ORDER BY date ASC) row_num
, sonvinid
, tid
, date
, brandname
FROM table1
) a;
Anyway, I agree with Sean's comment that this is supposed to be done on the UI side.
I've spent way too much time on this issue, and I'm not getting to the finish line. Please read this through before you run to a conclusion that this is a duplicate of all the other pivot with multiple columns on SO.
We have properties and units, with a table which keeps track of when something changed in the unit. We cannot change the structure of the table, as this is a vendor application.
Objective: Pull out the begin and end date for when a unit had an unavailable code of "model".
Issue: I need to filter out the dates where it was available in the middle, though that seems to omit one row of data each time (for unit 105).
What I've tried: PIVOT, CROSS APPLY in conjunction with LEAD/LAG
Here's a link to a SQLFiddle: http://sqlfiddle.com/#!6/29592/2/0
The rest of the question has the tsql from the SQLfiddle including the results which I got. The desired result is at the end.
Create table and insert sample data
DROP TABLE IF EXISTS testModelUnit;
CREATE TABLE testModelUnit(
propertykey INT NOT NULL
,unitNumber VARCHAR(10) NOT NULL
,rowStartDate DATETIME NOT NULL
,rowEndDate DATETIME NOT NULL
,unavailableCode varchar(10) NULL
,CONSTRAINT pk_testModelUnit PRIMARY KEY (propertykey, unitNumber, rowStartDate )
)
GO
INSERT INTO testModelUnit VALUES
(33,'105', '2010-11-11 00:00:00.000','2016-11-11 00:00:00.000','MODEL')
,(33,'105', '2016-11-11 00:00:00.000','2016-12-14 07:51:03.307','MODEL')
,(33,'105', '2016-12-14 07:51:03.307','2017-01-01 00:00:00.000',NULL)
,(33,'105', '2017-01-01 00:00:00.00','2017-03-21 12:21:13.703','MODEL')
,(33,'105', '2017-03-21 12:21:13.703','2017-04-21 12:21:13.703','MODEL')
,(33,'105', '2017-04-21 12:21:13.703','9999-12-31 00:00:00.000','MODEL')
,(33,'2606','2017-04-21 12:21:23.207','9999-12-31 00:00:00.000','MODEL')
,(33,'2606','2017-04-19 10:30:09.227','2017-04-21 12:21:23.207','MODEL')
,(33,'2703','2016-12-14 07:51:03.307','2017-04-19 10:29:47.970','MODEL')
,(33,'2703','2011-11-11 00:00:00.000','2016-12-14 07:51:03.307','MODEL')
GO
That gives you all the data which you need in order to test it, as unit 105 was available for a short period of time at the end of 2016.
Attempt 1 - use LEAD/LAG to determine if a date is the first in a series - then use multiple PIVOT statements
SELECT
propertykey
,unitNumber
,firstDate
,lastDate
FROM (
SELECT
propertykey
,unitNumber
,rowStartDate
,rowEndDate
,CASE
WHEN propertykey = LAG(propertykey,1,NULL) OVER (PARTITION BY propertykey,unitNumber ORDER BY rowStartDate)
AND unitNumber = LAG(unitNumber,1,NULL) OVER (PARTITION BY propertykey,unitNumber ORDER BY rowStartDate)
AND LAG(rowEndDate,1,NULL) OVER (PARTITION BY propertykey,unitNumber ORDER BY rowStartDate) = rowStartDate THEN NULL
ELSE 'firstDate'
END ISFIRST
,CASE
WHEN propertykey = LEAD(propertykey,1,NULL) OVER (PARTITION BY propertykey,unitNumber ORDER BY rowStartDate)
AND unitNumber = LEAD(unitNumber,1,NULL) OVER (PARTITION BY propertykey,unitNumber ORDER BY rowStartDate)
AND LEAD(rowStartDate,1,NULL) OVER (PARTITION BY propertykey,unitNumber ORDER BY rowStartDate) = rowEndDate THEN NULL
ELSE 'lastDate'
END ISLAST
FROM testModelUnit
WHERE UnavailableCode = 'model'
) SRC
PIVOT (
MAX(rowStartDate)
FOR isfirst in ([firstDate])
) as pivotFirst
PIVOT (
MAX(rowEndDate)
FOR islast in ([lastDate])
) as pivotLast
Results were:
propertykey unitNumber firstDate lastDate
33 105 NULL 9999-12-31 00:00:00.000
33 105 2010-11-11 00:00:00.000 NULL
33 105 2017-01-01 00:00:00.000 NULL
33 2606 NULL 9999-12-31 00:00:00.000
33 2606 2017-04-19 10:30:09.227 NULL
33 2703 NULL 2017-04-19 10:29:47.970
33 2703 2011-11-11 00:00:00.000 NULL
The issue is twofold: firstly, I have the NULLs in different rows, and secondly, I am missing an end date for unit 105 (by reversing the order of the two PIVOT statements I reversed the issue, and was then missing one start date).
Second attempt: use the LAG/LEAD as before, though this time use CROSS APPLY to get the first/last values into one column and then pivot the result
SELECT
propertykey
,unitNumber
,firstDate
,lastDate
FROM(
SELECT
propertykey
,unitNumber
,ca.col
,ca.value
FROM
(
SELECT
propertykey
,unitNumber
,rowStartDate
,rowEndDate
,CASE
WHEN propertykey = LAG(propertykey,1,NULL) OVER (PARTITION BY propertykey,unitNumber ORDER BY rowStartDate)
AND unitNumber = LAG(unitNumber,1,NULL) OVER (PARTITION BY propertykey,unitNumber ORDER BY rowStartDate)
AND LAG(rowEndDate,1,NULL) OVER (PARTITION BY propertykey,unitNumber ORDER BY rowStartDate) = rowStartDate THEN NULL
ELSE 'firstDate'
END ISFIRST
,CASE
WHEN propertykey = LEAD(propertykey,1,NULL) OVER (PARTITION BY propertykey,unitNumber ORDER BY rowStartDate)
AND unitNumber = LEAD(unitNumber,1,NULL) OVER (PARTITION BY propertykey,unitNumber ORDER BY rowStartDate)
AND LEAD(rowStartDate,1,NULL) OVER (PARTITION BY propertykey,unitNumber ORDER BY rowStartDate) = rowEndDate THEN NULL
ELSE 'lastDate'
END ISLAST
FROM testModelUnit
WHERE UnavailableCode = 'model'
) sub
OUTER APPLY (
SELECT ISFIRST, rowStartDate
UNION ALL
SELECT ISLAST, rowEndDate
) CA (col, value)
WHERE col IS NOT NULL
)src
PIVOT
(
max(value)
for col in ([firstDate],[lastDate])
) AS pivoted
Result:
propertykey unitNumber firstDate lastDate
33 105 2017-01-01 00:00:00.000 9999-12-31 00:00:00.000
33 2606 2017-04-19 10:30:09.227 9999-12-31 00:00:00.000
33 2703 2011-11-11 00:00:00.000 2017-04-19 10:29:47.970
Issue: I got rid of the NULL rows, though I am still missing one record of data for 105
Desired result:
propertykey unitNumber firstDate lastDate
33 105 2010-11-11 00:00:00.000 2016-12-14 07:51:03.307
33 105 2017-01-01 00:00:00.000 9999-12-31 00:00:00.000
33 2606 2017-04-19 10:30:09.227 9999-12-31 00:00:00.000
33 2703 2011-11-11 00:00:00.000 2017-04-19 10:29:47.970
Are you looking for a query like the one below?
Select PropertyKey, UnitNumber, Min(RowStartDate) as FirstDate, Max(rowEndDate) as LastDate from (
Select *, Bucket = Row_number() over(partition by propertykey, unitnumber order by rowStartDate) -
Row_number() over(partition by propertykey, unitnumber, unavailablecode order by rowStartDate)
from testModelUnit
) a
Where a.unavailableCode is not null
group by propertykey, unitNumber, Bucket
Output as below:
+-------------+------------+-------------------------+-------------------------+
| PropertyKey | UnitNumber | FirstDate | LastDate |
+-------------+------------+-------------------------+-------------------------+
| 33 | 105 | 2010-11-11 00:00:00.000 | 2016-12-14 07:51:03.307 |
| 33 | 105 | 2017-01-01 00:00:00.000 | 9999-12-31 00:00:00.000 |
| 33 | 2606 | 2017-04-19 10:30:09.227 | 9999-12-31 00:00:00.000 |
| 33 | 2703 | 2011-11-11 00:00:00.000 | 2017-04-19 10:29:47.970 |
+-------------+------------+-------------------------+-------------------------+
Demo
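The Bucket column is easier to follow when it is printed per row. The following is a sketch only, in Python with the standard sqlite3 module (SQLite dialect rather than T-SQL); the unit_history table and its shortened date values are hypothetical stand-ins for testModelUnit.

```python
import sqlite3

# Hypothetical rows: (unit, row_start, code); None means the unit was available.
rows = [
    ("105", "2010-11-11", "MODEL"),
    ("105", "2016-11-11", "MODEL"),
    ("105", "2016-12-14", None),
    ("105", "2017-01-01", "MODEL"),
    ("105", "2017-03-21", "MODEL"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE unit_history (unit TEXT, row_start TEXT, code TEXT)")
conn.executemany("INSERT INTO unit_history VALUES (?, ?, ?)", rows)

# First counter: position of the row within the unit.
# Second counter: position of the row within the unit AND code.
# Their difference only changes when a row with another code sits in between.
query = """
SELECT unit, row_start, code,
       ROW_NUMBER() OVER (PARTITION BY unit ORDER BY row_start)
     - ROW_NUMBER() OVER (PARTITION BY unit, code ORDER BY row_start) AS bucket
FROM unit_history
ORDER BY row_start
"""
buckets = conn.execute(query).fetchall()
for row in buckets:
    print(row)
```

The two MODEL rows before the available period share one bucket value and the two after it share another, so grouping by unit and bucket yields one row per uninterrupted MODEL period.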
I have been struggling with a problem that should actually be pretty simple, but after a full week of reading, googling, experimenting and so on, my colleague and I cannot find a proper solution. :(
The problem: We have a table with two values:
an employeenumber (P_ID, int) <--- identification of employee
a date (starttime, datetime) <--- time employee checked in
We need to know what periods each employee has been working.
When two dates are less than @gap days apart, they belong to the same period.
For each employee there can be multiple records for any given day, but I just need to know which dates he worked; I am not interested in the time part.
As soon as there is a gap > @gap days, the next date is considered the start of a new range.
A range is at least 1 day (example: 21-09-2011 | 21-09-2011) but has no maximum length. (An employee checking in every @gap - 1 days should result in a period from the first day he checked in until today.)
What we think we need are the islands in this table where the gap in days is greater than a variable (@gap = 30 means 30 days).
So an example:
SOURCETABLE:
P_ID | starttime
------|------------------
12121 | 24-03-2009 7:30
12121 | 24-03-2009 14:25
12345 | 27-06-2011 10:00
99999 | 01-05-2012 4:50
12345 | 27-06-2011 10:30
12345 | 28-06-2011 11:00
98765 | 13-04-2012 10:00
12345 | 21-07-2011 9:00
99999 | 03-05-2012 23:15
12345 | 21-09-2011 12:00
45454 | 12-07-2010 8:00
12345 | 21-09-2011 17:00
99999 | 06-05-2012 11:05
99999 | 20-05-2012 12:45
98765 | 26-04-2012 16:00
12345 | 07-07-2012 14:00
99999 | 01-06-2012 13:55
12345 | 13-08-2012 13:00
Now what I need as a result is:
PERIODS:
P_ID | Start | End
-------------------------------
12121 | 24-03-2009 | 24-03-2009
12345 | 27-06-2012 | 21-07-2012
12345 | 21-09-2012 | 21-09-2012
12345 | 07-07-2012 | (today) OR 13-08-2012 <-- (less than @gap days ago) OR (last date in table)
45454 | 12-07-2010 | 12-07-2010
45454 | 17-06-2012 | 17-06-2012
98765 | 13-04-2012 | 26-04-2012
99999 | 01-05-2012 | 01-06-2012
I hope this is clear this way, I already thank you for reading this far, it would be great if you could contribute!
I've done a rough script that should get you started. Haven't bothered refining the datetimes and the endpoint comparisons might need tweaking.
select
P_ID,
src.starttime,
endtime = case when src.starttime <> lst.starttime or lst.starttime < DATEADD(dd,-1 * @gap,GETDATE()) then lst.starttime else GETDATE() end,
frst.starttime,
lst.starttime
from @SOURCETABLE src
outer apply (select starttime = MIN(starttime) from @SOURCETABLE sub where src.p_id = sub.p_id and sub.starttime > DATEADD(dd,-1 * @gap,src.starttime)) frst
outer apply (select starttime = MAX(starttime) from @SOURCETABLE sub where src.p_id = sub.p_id and src.starttime > DATEADD(dd,-1 * @gap,sub.starttime)) lst
where src.starttime = frst.starttime
order by P_ID, src.starttime
I get the following output, which is a little different to yours, but I think it's OK:
P_ID starttime endtime starttime starttime
----------- ----------------------- ----------------------- ----------------------- -----------------------
12121 2009-03-24 07:30:00.000 2009-03-24 14:25:00.000 2009-03-24 07:30:00.000 2009-03-24 14:25:00.000
12345 2011-06-27 10:00:00.000 2011-07-21 09:00:00.000 2011-06-27 10:00:00.000 2011-07-21 09:00:00.000
12345 2011-09-21 12:00:00.000 2011-09-21 17:00:00.000 2011-09-21 12:00:00.000 2011-09-21 17:00:00.000
12345 2012-07-07 14:00:00.000 2012-07-07 14:00:00.000 2012-07-07 14:00:00.000 2012-07-07 14:00:00.000
12345 2012-08-13 13:00:00.000 2012-08-16 11:23:25.787 2012-08-13 13:00:00.000 2012-08-13 13:00:00.000
45454 2010-07-12 08:00:00.000 2010-07-12 08:00:00.000 2010-07-12 08:00:00.000 2010-07-12 08:00:00.000
98765 2012-04-13 10:00:00.000 2012-04-26 16:00:00.000 2012-04-13 10:00:00.000 2012-04-26 16:00:00.000
The last two output cols are the results of the outer apply sections, and are just there for debugging.
This is based on the following setup:
declare @gap int
set @gap = 30
set dateformat dmy
-----P_ID----|----starttime----
declare @SOURCETABLE table (P_ID int, starttime datetime)
insert @SOURCETABLE values
(12121,'24-03-2009 7:30'),
(12121,'24-03-2009 14:25'),
(12345,'27-06-2011 10:00'),
(12345,'27-06-2011 10:30'),
(12345,'28-06-2011 11:00'),
(98765,'13-04-2012 10:00'),
(12345,'21-07-2011 9:00'),
(12345,'21-09-2011 12:00'),
(45454,'12-07-2010 8:00'),
(12345,'21-09-2011 17:00'),
(98765,'26-04-2012 16:00'),
(12345,'07-07-2012 14:00'),
(12345,'13-08-2012 13:00')
UPDATE: Slight rethink. Now uses a CTE to work out the gaps forwards and backwards from each item, then aggregates those:
--Get the gap between each starttime and the next and prev (use 999 to indicate non-closed intervals)
;WITH CTE_Gaps As (
select
p_id,
src.starttime,
nextgap = coalesce(DATEDIFF(dd,src.starttime,nxt.starttime),999), --Gap to the next entry
prevgap = coalesce(DATEDIFF(dd,prv.starttime,src.starttime),999), --Gap to the previous entry
isold = case when DATEDIFF(dd,src.starttime,getdate()) > @gap then 1 else 0 end --Is starttime more than gap days ago?
from
@SOURCETABLE src
cross apply (select starttime = MIN(starttime) from @SOURCETABLE sub where src.p_id = sub.p_id and sub.starttime > src.starttime) nxt
cross apply (select starttime = max(starttime) from @SOURCETABLE sub where src.p_id = sub.p_id and sub.starttime < src.starttime) prv
)
--select * from CTE_Gaps
select
p_id,
starttime = min(gap.starttime),
endtime = nxt.starttime
from
CTE_Gaps gap
--Find the next starttime where its gap to the next > @gap
cross apply (select starttime = MIN(sub.starttime) from CTE_Gaps sub where gap.p_id = sub.p_id and sub.starttime >= gap.starttime and sub.nextgap > @gap) nxt
group by P_ID, nxt.starttime
order by P_ID, nxt.starttime
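For comparison, the same islands can be found with window functions instead of correlated subqueries. This is a rough sketch in Python with the standard sqlite3 module (SQLite dialect, not T-SQL). The checkins table name is hypothetical, the dates are employee 12345's distinct check-in days from the sample, and treating a gap of exactly 30 days as "same period" is an assumption, since the question leaves that boundary open.

```python
import sqlite3

GAP_DAYS = 30  # plays the role of @gap

# Distinct check-in days of employee 12345 from the sample data.
checkins = [
    (12345, "2011-06-27"), (12345, "2011-06-28"), (12345, "2011-07-21"),
    (12345, "2011-09-21"), (12345, "2012-07-07"), (12345, "2012-08-13"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE checkins (p_id INTEGER, day TEXT)")
conn.executemany("INSERT INTO checkins VALUES (?, ?)", checkins)

# Step 1: flag a row as the start of a new period when the previous check-in
#         is missing or more than GAP_DAYS earlier.
# Step 2: a running sum of the flags numbers the periods.
# Step 3: MIN/MAX per period number gives the range.
query = """
WITH flagged AS (
    SELECT p_id, day,
           CASE WHEN julianday(day)
                     - julianday(LAG(day) OVER (PARTITION BY p_id ORDER BY day)) <= :gap
                THEN 0 ELSE 1 END AS new_period
    FROM checkins
),
numbered AS (
    SELECT p_id, day,
           SUM(new_period) OVER (PARTITION BY p_id ORDER BY day) AS period_no
    FROM flagged
)
SELECT p_id, MIN(day) AS period_start, MAX(day) AS period_end
FROM numbered
GROUP BY p_id, period_no
ORDER BY p_id, period_start
"""
periods = conn.execute(query, {"gap": GAP_DAYS}).fetchall()
for row in periods:
    print(row)
```

Each table is read once here, which may matter on large tables. The sketch does not extend the last period to today's date.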
Jon most definitely showed us the right direction. Performance was horrible though (4 million+ records in the database), and it looked like we were missing some information. With all that we learned from you, we came up with the solution below. It uses elements of all the proposed answers and cycles through 3 temp tables before finally producing results, but performance is good enough, and so is the data it generates.
declare @gap int
declare @Employee_id int
set @gap = 30
set dateformat dmy
--------------------------------------------------------------- #temp1 --------------------------------------------------
CREATE TABLE #temp1 ( EmployeeID int, starttime date)
INSERT INTO #temp1 ( EmployeeID, starttime)
select distinct ck.Employee_id,
cast(ck.starttime as date)
from SERVER1.DB1.dbo.checkins ck
inner join SERVER1.DB1.dbo.Team t on ck.team_id = t.id
where t.productive = 1
--------------------------------------------------------------- #temp2 --------------------------------------------------
create table #temp2 (ROWNR int, Employeeid int, ENDOFCHECKIN datetime, FIRSTCHECKIN datetime)
INSERT INTO #temp2
select Row_number() OVER (partition by EmployeeID ORDER BY t.prev) + 1 as ROWNR,
EmployeeID,
DATEADD(DAY, 1, t.Prev) AS start_gap,
DATEADD(DAY, 0, t.next) AS end_gap
from
(
select a.EmployeeID,
a.starttime as Prev,
(
select min(b.starttime)
from #temp1 as b
where starttime > a.starttime and b.EmployeeID = a.EmployeeID
) as Next
from #temp1 as a) as t
where datediff(day, prev, next ) > 30
group by EmployeeID,
t.Prev,
t.next
union -- add first known date for Employee
select 1 as ROWNR,
EmployeeID,
NULL,
min(starttime)
from #temp1 ct
group by ct.EmployeeID
--------------------------------------------------------------- #temp3 --------------------------------------------------
create table #temp3 (ROWNR int, Employeeid int, ENDOFCHECKIN datetime, STARTOFCHECKIN datetime)
INSERT INTO #temp3
select ROWNR,
Employeeid,
ENDOFCHECKIN,
FIRSTCHECKIN
from #temp2
union -- add last known date for Employee
select (select count(*) from #temp2 b where Employeeid = ct.Employeeid)+1 as ROWNR,
ct.Employeeid,
(select dateadd(d,1,max(starttime)) from #temp1 c where Employeeid = ct.Employeeid),
NULL
from #temp2 ct
group by ct.EmployeeID
---------------------------------------finally check our data-------------------------------------------------
select a1.Employeeid,
a1.STARTOFCHECKIN as STARTOFCHECKIN,
ENDOFCHECKIN = CASE WHEN b1.ENDOFCHECKIN <= a1.STARTOFCHECKIN THEN a1.ENDOFCHECKIN ELSE b1.ENDOFCHECKIN END,
year(a1.STARTOFCHECKIN) as JaarSTARTOFCHECKIN,
JaarENDOFCHECKIN = CASE WHEN b1.ENDOFCHECKIN <= a1.STARTOFCHECKIN THEN year(a1.ENDOFCHECKIN) ELSE year(b1.ENDOFCHECKIN) END,
Month(a1.STARTOFCHECKIN) as MaandSTARTOFCHECKIN,
MaandENDOFCHECKIN = CASE WHEN b1.ENDOFCHECKIN <= a1.STARTOFCHECKIN THEN month(a1.ENDOFCHECKIN) ELSE month(b1.ENDOFCHECKIN) END,
(year(a1.STARTOFCHECKIN)*100)+month(a1.STARTOFCHECKIN) as JaarMaandSTARTOFCHECKIN,
JaarMaandENDOFCHECKIN = CASE WHEN b1.ENDOFCHECKIN <= a1.STARTOFCHECKIN THEN (year(a1.ENDOFCHECKIN)*100)+month(a1.ENDOFCHECKIN) ELSE (year(b1.ENDOFCHECKIN)*100)+month(b1.ENDOFCHECKIN) END,
datediff(M,a1.STARTOFCHECKIN,b1.ENDOFCHECKIN) as MONTHSCHECKEDIN
from #temp3 a1
full outer join #temp3 b1 on a1.ROWNR = b1.ROWNR -1 and a1.Employeeid = b1.Employeeid
where not (a1.STARTOFCHECKIN is null AND b1.ENDOFCHECKIN is null)
order by a1.Employeeid, a1.STARTOFCHECKIN
I have a SQL statement.
SELECT
ID, LOCATION, CODE,MAX(DATE),FLAG
FROM
TABLE1
WHERE
DATE <= CONVERT(DATETIME,'11-11-2012')
AND EXISTS (SELECT * FROM #TEMP_CODE WHERE TABLE1.CODE = #TEMP_CODE.CODE)
AND ID IN (14, 279)
GROUP BY
ID, LOCATION, CODE
I need the rows with the date nearest to 11-11-2012, but the query returns all the values. What am I doing wrong? Thanks.
ID LOCATION CODE DATE FLAG
-------------------------------------------------------------------
14 CAR STREET,UDUPI 234 2012-08-08 00:00:00.000 0
14 CAR STREET,UDUPI 234 2012-08-10 00:00:00.000 1
14 CAR STREET,UDUPI 234 2012-08-14 00:00:00.000 0
279 MADHUGIRI 234 2012-08-08 00:00:00.000 1
279 MADHUGIRI 234 2012-08-11 00:00:00.000 0
I want to show only the rows with dates less than or equal to the given date. The required result is
ID LOCATION CODE DATE FLAG
-------------------------------------------------------------------
14 CAR STREET,UDUPI 234 2012-08-10 00:00:00.000 1
279 MADHUGIRI 234 2012-08-11 00:00:00.000 0
;WITH x AS
(
SELECT ID, Location, Code, Date, Flag,
rn = ROW_NUMBER() OVER
(PARTITION BY ID, Location, Code ORDER BY [Date] DESC)
FROM dbo.TABLE1 AS t1
WHERE [Date] <= '20121111'
AND ID IN (14, 279) -- sorry, missed this
AND EXISTS (SELECT 1 FROM #TEMP_CODE WHERE CODE = t1.CODE)
)
SELECT ID, Location, Code, Date, Flag
FROM x WHERE rn = 1;
This yields:
ID LOCATION CODE [Date] FLAG
--- ---------------- ---- ---------- ----
14 CAR STREET,UDUPI 234 2012-08-14 0
279 MADHUGIRI 234 2012-08-11 0
This disagrees with your required results, but I think those are wrong and I think you should check them.
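The discrepancy can be checked against the five sample rows from the question. A small sketch in Python with the standard sqlite3 module (SQLite dialect, not T-SQL); the column name row_date is a stand-in for Date, and the #TEMP_CODE filter is left out because its contents are not shown.

```python
import sqlite3

# The five sample rows from the question.
rows = [
    (14, "CAR STREET,UDUPI", 234, "2012-08-08", 0),
    (14, "CAR STREET,UDUPI", 234, "2012-08-10", 1),
    (14, "CAR STREET,UDUPI", 234, "2012-08-14", 0),
    (279, "MADHUGIRI", 234, "2012-08-08", 1),
    (279, "MADHUGIRI", 234, "2012-08-11", 0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (id INTEGER, location TEXT, code INTEGER, row_date TEXT, flag INTEGER)"
)
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?, ?)", rows)

# Number the rows of each (id, location, code) group from newest to oldest,
# then keep only the newest row on or before the cut-off date.
query = """
WITH ranked AS (
    SELECT id, location, code, row_date, flag,
           ROW_NUMBER() OVER (PARTITION BY id, location, code
                              ORDER BY row_date DESC) AS rn
    FROM table1
    WHERE row_date <= '2012-11-11'
      AND id IN (14, 279)
)
SELECT id, location, code, row_date, flag
FROM ranked
WHERE rn = 1
ORDER BY id
"""
latest = conn.execute(query).fetchall()
for row in latest:
    print(row)
```

For ID 14 the newest row on or before the cut-off is 2012-08-14, not the 2012-08-10 row listed in the required result.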
Use a subquery to get the max date for each ID, and then join that to your table:
SELECT
ID, LOCATION, CODE, DATE, FLAG
FROM
TABLE1
JOIN (
SELECT ID AS SubID, MAX(DATE) AS SubDATE
FROM TABLE1
WHERE DATE <= '11/11/2012'
AND EXISTS (SELECT * FROM #TEMP_CODE WHERE TABLE1.CODE = #TEMP_CODE.CODE)
AND ID IN (14, 279)
GROUP BY ID
) AS SUB ON ID = SubID AND DATE = SubDATE
Add an ORDER BY on the date and limit the result to two rows. In MySQL that is ORDER BY DATE LIMIT 0,2; SQL Server has no LIMIT, so SET ROWCOUNT 2 is used below.
With the ORDER BY the rows are sorted by date, and the row limit returns only the top 2 values.
SET ROWCOUNT 2
SELECT
ID, LOCATION, CODE,MAX(DATE),FLAG
FROM
TABLE1
WHERE
DATE <= CONVERT(DATETIME,'11-11-2012')
AND EXISTS (SELECT * FROM #TEMP_CODE WHERE TABLE1.CODE = #TEMP_CODE.CODE)
AND ID IN (14, 279)
GROUP BY
ID, LOCATION, CODE
ORDER BY DATE