My goal is to check whether an email is answered within 24 hours during workdays. A day counts as a workday if there is time registered for it in another table; this lets us include the occasional Saturday or Sunday we work and exclude holidays. I made a view on that table that returns 1 if the date has work time registered and 0 if it does not.
DateWorked               HasWorked
2021-04-01 00:00:00.000  1
2021-04-02 00:00:00.000  1
2021-04-03 00:00:00.000  1
2021-04-04 00:00:00.000  0
2021-04-05 00:00:00.000  1
So for example a few situations:
1. MailIncoming: 2021-04-01 16:30:00, MailAnswering: 2021-04-02 14:00:00
This one is easy, I don't have to subtract anything and the mail is answered within 24 hours.
2. MailIncoming: 2021-04-01 09:30:00, MailAnswering: 2021-04-03 14:00:00
This one is also easy, I don't have to subtract anything and the mail is not answered within 24 hours.
3. MailIncoming: 2021-04-03 12:30:00, MailAnswering: 2021-04-05 10:00:00
There is 1 day where no one has worked, so I need to subtract 1 whole day from the total time, and in that case the email is answered within 24 hours during workdays.
4. MailIncoming: 2021-04-04 11:00:00, MailAnswering: 2021-04-05 18:00:00
The remaining 13 hours of 04-04 do not count toward the '24 hours during workdays', so the email is answered within 24 hours during workdays.
Also, there can be several consecutive dates with a 0.
So the outcome I'm looking for is:
MailIncoming             MailAnswering            TotalTime  TotalTimeWithoutDaysNotWorked
2021-04-04 11:00:00.000  2021-04-05 18:00:00.000  31         18
How can I calculate this last column? Or am I approaching this in the wrong way?
The query needs a way to generate the calendar dates between MailIncoming and MailAnswering so that they can be LEFT JOINed (or INNER JOINed) to the WorkingDay table. Here the query uses dbo.fnTally, which is known to be a fast and efficient way to generate rows. Note that the sample table below stores the inverted flag, HasNotWorked (1 = no work time registered).
tables
drop table if exists #WorkingDay;
go
create table #WorkingDay(
DateWorked Date,
HasNotWorked int);
drop table if exists #MailIncoming;
go
create table #MailIncoming(
MailIncoming DateTime,
MailAnswering DateTime);
insert into #WorkingDay values
('2021-04-01', 0),
('2021-04-02', 0),
('2021-04-03', 0),
('2021-04-04', 1),
('2021-04-05', 0),
('2021-04-06', 0);
insert into #MailIncoming values
('2021-04-01 16:30:00', '2021-04-02 14:00:00'),
('2021-04-01 09:30:00', '2021-04-03 14:00:00'),
('2021-04-03 12:30:00', '2021-04-05 10:00:00'),
('2021-04-04 11:00:00', '2021-04-05 18:00:00');
dbo.fnTally
CREATE FUNCTION [dbo].[fnTally]
/**********************************************************************************************************************
Jeff Moden Script on SSC: https://www.sqlservercentral.com/scripts/create-a-tally-function-fntally
**********************************************************************************************************************/
(@ZeroOrOne BIT, @MaxN BIGINT)
RETURNS TABLE WITH SCHEMABINDING AS
RETURN WITH
H2(N) AS ( SELECT 1
FROM (VALUES
(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
)V(N)) --16^2 or 256 rows
, H4(N) AS (SELECT 1 FROM H2 a, H2 b) --16^4 or 65,536 rows
, H8(N) AS (SELECT 1 FROM H4 a, H4 b) --16^8 or 4,294,967,296 rows
SELECT N = 0 WHERE @ZeroOrOne = 0 UNION ALL
SELECT TOP(@MaxN)
N = ROW_NUMBER() OVER (ORDER BY N)
FROM H8
;
query
select mi.MailIncoming, mi.MailAnswering,
avg(datediff(hour, MailIncoming, MailAnswering)) hrs_to_ans,
sum(case when w.HasNotWorked=1 and
v.calc_dt > mi_dt.inc_dt and
v.calc_dt < mi_dt.ans_dt
then -24
when w.HasNotWorked=1
then datediff(hour, dateadd(day, 1, mi_dt.inc_dt), mi.MailIncoming)
else 0 end) hrs_to_sub
from #MailIncoming mi
cross apply (values (cast(MailIncoming as date),
cast(MailAnswering as date))) mi_dt(inc_dt, ans_dt)
cross apply dbo.fnTally(0, datediff(day, mi.MailIncoming, mi.MailAnswering)) fn
cross apply (values (dateadd(day, fn.n, mi_dt.inc_dt))) v(calc_dt)
left join #WorkingDay w on v.calc_dt=w.DateWorked
group by mi.MailIncoming, mi.MailAnswering
order by mi.MailIncoming;
MailIncoming MailAnswering hrs_to_ans hrs_to_sub
2021-04-01 09:30:00.000 2021-04-03 14:00:00.000 53 0
2021-04-01 16:30:00.000 2021-04-02 14:00:00.000 22 0
2021-04-03 12:30:00.000 2021-04-05 10:00:00.000 46 -24
2021-04-04 11:00:00.000 2021-04-05 18:00:00.000 31 -13
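To get the TotalTimeWithoutDaysNotWorked column the question asks for, the two computed columns can simply be added together, since hrs_to_sub is zero or negative. A minimal sketch, assuming the result rows above are inlined as a VALUES list so that it runs on its own; the AnsweredWithin24h flag and the choice of <= 24 as the cutoff are assumptions, not part of the original answer:

```sql
-- Result rows of the query above, inlined for illustration only
with r as (
    select *
    from (values
        ('2021-04-01 09:30', '2021-04-03 14:00', 53,   0),
        ('2021-04-01 16:30', '2021-04-02 14:00', 22,   0),
        ('2021-04-03 12:30', '2021-04-05 10:00', 46, -24),
        ('2021-04-04 11:00', '2021-04-05 18:00', 31, -13)
    ) v(MailIncoming, MailAnswering, hrs_to_ans, hrs_to_sub)
)
select MailIncoming, MailAnswering,
       hrs_to_ans as TotalTime,
       -- hrs_to_sub is <= 0, so adding it removes the non-worked hours
       hrs_to_ans + hrs_to_sub as TotalTimeWithoutDaysNotWorked,
       -- assumption: "within 24 hours" means at most 24 counted hours
       case when hrs_to_ans + hrs_to_sub <= 24 then 1 else 0 end as AnsweredWithin24h
from r
order by MailIncoming;
```

For the four sample mails this yields 53, 22, 22 and 18 hours, so only the first one misses the 24-hour target, which agrees with the four situations described in the question. In practice the original query would be wrapped in the CTE instead of the inlined rows.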
I suggest using a HasNotWorked column, so the tables are
create table WorkingDay(DateWorked Date, HasNotWorked int);
create table MailIncoming(MailIncoming DateTime, MailAnswering DateTime);
and the rows
insert into WorkingDay values('2021-04-01', 0);
insert into WorkingDay values('2021-04-02', 0);
insert into WorkingDay values('2021-04-03', 0);
insert into WorkingDay values('2021-04-04', 1);
insert into WorkingDay values('2021-04-05', 0);
insert into WorkingDay values('2021-04-06', 0);
insert into MailIncoming values('2021-04-04 11:00:00.000', '2021-04-06 18:00:00.000');
First, calculate the start date. If the mail arrives on a working day, use the arrival time of the mail; otherwise use the first following working day:
case when
(select HasNotWorked from WorkingDay where DateWorked = convert(date, MailIncoming)) = 1 then
(select min(DateWorked) from WorkingDay where DateWorked > MailIncoming and HasNotWorked = 0)
else MailIncoming end as startDate
Then discard the days that are not working days:
((select sum(HasNotWorked) from WorkingDay where DateWorked between convert(date, startDate)
and convert(date, MailAnswering)
) * 24) as numNotWorkingDay
so the query could be
select startDate, MailAnswering, MailIncoming, hour, numNotWorkingDay, hour - numNotWorkingDay hourWithoutNotWorkingDay
from (
select
MailAnswering, startDate, MailIncoming,
DateDiff("hh", startDate, MailAnswering) hour,
((select sum(HasNotWorked) from WorkingDay where DateWorked between convert(date, startDate)
and convert(date, MailAnswering)
) * 24) as numNotWorkingDay
from (
select *,
case when
(select HasNotWorked from WorkingDay where DateWorked = convert(date, MailIncoming)) = 1 then
(select min(DateWorked) from WorkingDay where DateWorked > MailIncoming and HasNotWorked = 0)
else MailIncoming end as startDate
from MailIncoming) as startCalc
) as calcTable;
sqlfiddle
I have a table of transactions that includes txn_date and cust_id.
For each customer that had a transaction in December, I want to know how many transactions that customer had in the 90 days previous to the given transaction.
This seems to be a query that I could run with a window function and a RANGE sliding window, but Snowflake doesn't support the RANGE sliding window frame.
How can I run this query in Snowflake?
How about something like this:
WITH T1 AS (
SELECT CUSTOMER_ID, TX_DATE
FROM TRANSACTIONS
WHERE TX_DATE BETWEEN '2020-12-01' AND '2020-12-31')
SELECT T2.CUSTOMER_ID, T2.TX_DATE
FROM TRANSACTIONS T2
INNER JOIN T1 ON T1.CUSTOMER_ID = T2.CUSTOMER_ID
WHERE T2.TX_DATE BETWEEN (T1.TX_DATE - 90) AND T1.TX_DATE
This starts out much the same as NickW's answer.
WITH data AS (
SELECT txn_date::timestamp_ntz as txn_date, cust_id, txn_id
FROM VALUES
('2020-12-04',0, 0),
('2020-12-03',1, 1),
('2020-11-04',1, 2),
('2020-10-04',1, 3),
('2020-09-04',1, 4), -- just on 90 days
('2020-09-02',1, 5), -- too far
('2021-01-05',1, 6) -- in the future
v(txn_date , cust_id, txn_id)
), dec_txn AS (
SELECT txn_id,
cust_id,
DATEADD('day',-90, txn_date) AS win_start,
txn_date AS win_end
FROM data
WHERE date_trunc('month', txn_date) = '2020-12-01'
)
SELECT dt.*
,t.*
,datediff('days', dt.win_end, t.txn_date) as win_time
FROM dec_txn AS dt
LEFT JOIN data AS t
ON t.cust_id = dt.cust_id
AND t.txn_date between dt.win_start and win_end AND t.txn_id != dt.txn_id
;
which gives:
TXN_ID CUST_ID WIN_START WIN_END TXN_DATE CUST_ID TXN_ID WIN_TIME
1 1 2020-09-04 00:00:00.000 2020-12-03 00:00:00.000 2020-11-04 00:00:00.000 1 2 -29
1 1 2020-09-04 00:00:00.000 2020-12-03 00:00:00.000 2020-10-04 00:00:00.000 1 3 -60
1 1 2020-09-04 00:00:00.000 2020-12-03 00:00:00.000 2020-09-04 00:00:00.000 1 4 -90
0 0 2020-09-05 00:00:00.000 2020-12-04 00:00:00.000 NULL NULL NULL NULL
So, to get the counts, we aggregate:
WITH data AS (
SELECT txn_date::timestamp_ntz as txn_date, cust_id, txn_id
FROM VALUES
('2020-12-04',0, 0),
('2020-12-03',1, 1),
('2020-11-04',1, 2),
('2020-10-04',1, 3),
('2020-09-04',1, 4), -- just on 90 days
('2020-09-02',1, 5), -- too far
('2021-01-05',1, 6) -- in the future
v(txn_date , cust_id, txn_id)
), dec_txn AS (
SELECT txn_id,
cust_id,
txn_date,
DATEADD('day',-90, txn_date) AS win_start,
txn_date AS win_end
FROM data
WHERE date_trunc('month', txn_date) = '2020-12-01'
)
SELECT dt.cust_id
,dt.txn_id
,dt.txn_date
,count(t.txn_id) as c__prior_90_days_transaction
FROM dec_txn AS dt
LEFT JOIN data AS t
ON t.cust_id = dt.cust_id
AND t.txn_date >= dt.win_start and t.txn_date < dt.win_end AND t.txn_id != dt.txn_id
GROUP BY 1,2,3
ORDER BY 1,2
;
giving:
CUST_ID TXN_ID TXN_DATE C__PRIOR_90_DAYS_TRANSACTION
0 0 2020-12-04 00:00:00.000 0
1 1 2020-12-03 00:00:00.000 3
The question does not define what should happen if one customer has many transactions in December, or several transactions on the same December day.
The query above returns a row for each December transaction per customer, and it includes transactions that happen on the same day. But if your date/timestamp has a time component, it will only count transactions earlier in the same day.
But if you want prior days only and txn_date is just a date, then
AND t.txn_date >= dt.win_start and t.txn_date < dt.win_end AND t.txn_id != dt.txn_id
should be used.
If txn_date is a timestamp, then dec_txn should be altered to:
dec_txn AS (
SELECT txn_id,
cust_id,
DATEADD('day',-90, txn_date::date) AS win_start,
txn_date::date AS win_end
FROM data
WHERE date_trunc('month', txn_date) = '2020-12-01'
)
Now that the window timestamps are truncated to days, you will have to work out whether you want a midnight transaction to count on that day, or whether you have no midnight timestamps at all...
I am trying to create a 13 period calendar in mssql but I am a bit stuck. I am not sure if my approach is the best way to achieve this. I have my base script which can be seen below:
Set DateFirst 1
Declare @Date1 date = '20180101' --startdate should always be start of financial year
Declare @Date2 date = '20181231' --enddate should always be end of financial year
SELECT * INTO #CalendarTable
FROM dbo.CalendarTable(@Date1,@Date2,0,0,0)c
DECLARE @StartDate datetime,@EndDate datetime
SELECT @StartDate=MIN(CASE WHEN [Day]='Monday' THEN [Date] ELSE NULL END),
@EndDate=MAX([Date])
FROM #CalendarTable
;With Period_CTE(PeriodNo,Start,[End])
AS
(SELECT 1,@StartDate,DATEADD(wk,4,@StartDate) -1
UNION ALL
SELECT PeriodNo+1,DATEADD(wk,4,Start),DATEADD(wk,4,[End])
FROM Period_CTE
WHERE DATEADD(wk,4,[End])<=@EndDate
OR PeriodNo+1 <=13
)
select * from Period_CTE
Which gives me this:
PeriodNo Start End
1 2018-01-01 00:00:00.000 2018-01-28 00:00:00.000
2 2018-01-29 00:00:00.000 2018-02-25 00:00:00.000
3 2018-02-26 00:00:00.000 2018-03-25 00:00:00.000
4 2018-03-26 00:00:00.000 2018-04-22 00:00:00.000
5 2018-04-23 00:00:00.000 2018-05-20 00:00:00.000
6 2018-05-21 00:00:00.000 2018-06-17 00:00:00.000
7 2018-06-18 00:00:00.000 2018-07-15 00:00:00.000
8 2018-07-16 00:00:00.000 2018-08-12 00:00:00.000
9 2018-08-13 00:00:00.000 2018-09-09 00:00:00.000
10 2018-09-10 00:00:00.000 2018-10-07 00:00:00.000
11 2018-10-08 00:00:00.000 2018-11-04 00:00:00.000
12 2018-11-05 00:00:00.000 2018-12-02 00:00:00.000
13 2018-12-03 00:00:00.000 2018-12-30 00:00:00.000
The result i am trying to get is
Even if I have to take a different approach I would not mind, as long as the result is the same as the above.
dbo.CalendarTable() is a function that returns the following results. I can share the code if desired.
I'd create a general numbers table, as suggested here, and add a column Periode13.
The trick to get the tiling is the integer division:
DECLARE @PeriodeSize INT=28; --13 "moon-months" of 28 days each
SELECT TOP 100 (ROW_NUMBER() OVER(ORDER BY (SELECT NULL))-1)/@PeriodeSize
FROM master..spt_values --just a table with many rows to show the principle
You can add this to an existing numbers table with a simple update statement.
UPDATE: A fully working example (using the logic linked above)
DECLARE @RunningNumbers TABLE (Number INT NOT NULL
,CalendarDate DATE NOT NULL
,CalendarYear INT NOT NULL
,CalendarMonth INT NOT NULL
,CalendarDay INT NOT NULL
,CalendarWeek INT NOT NULL
,CalendarYearDay INT NOT NULL
,CalendarWeekDay INT NOT NULL);
DECLARE @CountEntries INT = 100000;
DECLARE @StartNumber INT = 0;
WITH E1(N) AS(SELECT 1 FROM(VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))t(N)), --10 ^ 1
E2(N) AS(SELECT 1 FROM E1 a CROSS JOIN E1 b), -- 10 ^ 2 = 100 rows
E4(N) AS(SELECT 1 FROM E2 a CROSS JOIN E2 b), -- 10 ^ 4 = 10,000 rows
E8(N) AS(SELECT 1 FROM E4 a CROSS JOIN E4 b), -- 10 ^ 8 = 100,000,000 rows
CteTally AS
(
SELECT TOP(ISNULL(@CountEntries,1000000)) ROW_NUMBER() OVER(ORDER BY(SELECT NULL)) -1 + ISNULL(@StartNumber,0) As Nmbr
FROM E8
)
INSERT INTO @RunningNumbers
SELECT CteTally.Nmbr,CalendarDate.d,CalendarExt.*
FROM CteTally
CROSS APPLY
(
SELECT DATEADD(DAY,CteTally.Nmbr,{ts'2018-01-01 00:00:00'})
) AS CalendarDate(d)
CROSS APPLY
(
SELECT YEAR(CalendarDate.d) AS CalendarYear
,MONTH(CalendarDate.d) AS CalendarMonth
,DAY(CalendarDate.d) AS CalendarDay
,DATEPART(WEEK,CalendarDate.d) AS CalendarWeek
,DATEPART(DAYOFYEAR,CalendarDate.d) AS CalendarYearDay
,DATEPART(WEEKDAY,CalendarDate.d) AS CalendarWeekDay
) AS CalendarExt;
--The mockup table from above is now filled and can be queried
WITH AddPeriode AS
(
SELECT Number/28 +1 AS PeriodNumber
,CalendarDate
,CalendarWeek
,r.CalendarDay
,r.CalendarMonth
,r.CalendarWeekDay
,r.CalendarYear
,r.CalendarYearDay
FROM @RunningNumbers AS r
)
SELECT TOP 100 p.*
,(SELECT MIN(CalendarDate) FROM AddPeriode AS x WHERE x.PeriodNumber=p.PeriodNumber) AS [Start]
,(SELECT MAX(CalendarDate) FROM AddPeriode AS x WHERE x.PeriodNumber=p.PeriodNumber) AS [End]
,(SELECT MIN(CalendarDate) FROM AddPeriode AS x WHERE x.PeriodNumber=p.PeriodNumber AND x.CalendarWeek=p.CalendarWeek) AS [wkStart]
,(SELECT MAX(CalendarDate) FROM AddPeriode AS x WHERE x.PeriodNumber=p.PeriodNumber AND x.CalendarWeek=p.CalendarWeek) AS [wkEnd]
,(ROW_NUMBER() OVER(PARTITION BY PeriodNumber ORDER BY CalendarDate)-1)/7+1 AS WeekOfPeriode
FROM AddPeriode AS p
ORDER BY CalendarDate
Try it out...
Hint: Do not use a VIEW or iTVF for this.
This is non-changing data and much better placed in a physically stored table with appropriate indexes.
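Following that hint, a 13-period calendar can be written once into a permanent table. A minimal sketch, assuming a hypothetical table name dbo.PeriodCalendar, a financial year that starts on Monday 2018-01-01, and exactly 13 periods of 4 weeks each (364 days); none of these names come from the original answer:

```sql
-- Hypothetical table; name and columns are assumptions for illustration
CREATE TABLE dbo.PeriodCalendar
(
    CalendarDate date NOT NULL PRIMARY KEY,
    PeriodNo     int  NOT NULL,  -- 1..13
    WeekOfPeriod int  NOT NULL   -- 1..4
);

DECLARE @YearStart date = '20180101';  -- assumed first Monday of the financial year

WITH d(n) AS (SELECT n FROM (VALUES (0),(1),(2),(3),(4),(5),(6)) v(n)),  -- day within week
     w(n) AS (SELECT n FROM (VALUES (0),(1),(2),(3)) v(n)),              -- week within period
     p(n) AS (SELECT n FROM (VALUES (0),(1),(2),(3),(4),(5),(6),
                                    (7),(8),(9),(10),(11),(12)) v(n))    -- period
INSERT INTO dbo.PeriodCalendar (CalendarDate, PeriodNo, WeekOfPeriod)
SELECT DATEADD(DAY, p.n * 28 + w.n * 7 + d.n, @YearStart),  -- day offset 0..363
       p.n + 1,
       w.n + 1
FROM p CROSS JOIN w CROSS JOIN d;

-- Period boundaries, one row per period
SELECT PeriodNo, MIN(CalendarDate) AS [Start], MAX(CalendarDate) AS [End]
FROM dbo.PeriodCalendar
GROUP BY PeriodNo
ORDER BY PeriodNo;
```

With these assumptions period 1 runs from 2018-01-01 to 2018-01-28 and period 13 from 2018-12-03 to 2018-12-30, the same boundaries as the recursive CTE in the question. The primary key on CalendarDate gives lookups by date an index to use.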
I'm not entirely sure external links are accepted here, but I wrote an article that pulls off a 5-4-4 'Crop Year' fiscal year, with all the code. Feel free to use the code from these articles.
SQL Server Calendar Table
SQL Server Calendar Table: Fiscal Years
Can you help out with a problem?
I have a price table which has daily prices starting 31st Dec 2009 till today's date. The table contains daily prices:
2009-12-31 00:00:00.000 1.0020945351
2010-01-01 00:00:00.000 1.0021009300
2010-01-04 00:00:00.000 1.0021910181
2010-01-05 00:00:00.000 1.0022005986
2010-01-06 00:00:00.000 1.0022428696
2010-01-07 00:00:00.000 1.0022647147
2010-01-08 00:00:00.000 1.0022842726
2010-01-11 00:00:00.000 1.0023374302
2010-01-12 00:00:00.000 1.0023465374
2010-01-13 00:00:00.000 1.0023638081
2010-01-14 00:00:00.000 1.0023856533
2010-01-00 00:00:00.000 1.0024083955
2010-01-18 00:00:00.000 1.0024779677
2010-01-19 00:00:00.000 1.0025020553
2010-01-20 00:00:00.000 1.002521135
2010-01-21 00:00:00.000 1.0025420688
2010-01-22 00:00:00.000 1.0025593397
2010-01-25 00:00:00.000 1.0026180146
2010-01-26 00:00:00.000 1.002637573
2010-01-27 00:00:00.000 1.0026648447
2010-01-28 00:00:00.000 1.0026957934
2010-01-29 00:00:00.000 1.0027267421
2010-02-01 00:00:00.000 1.0028195885
2010-02-02 00:00:00.000 1.0028573523
2010-02-03 00:00:00.000 1.0028964611
2010-02-04 00:00:00.000 1.00293557
2010-02-05 00:00:00.000 1.002973334
2010-02-08 00:00:00.000 1.0030879717
2010-02-09 00:00:00.000 1.0031279777
2010-02-10 00:00:00.000 1.003171166
2010-02-11 00:00:00.000 1.0032007452
2010-02-12 00:00:00.000 1.0032575895
2010-02-00 00:00:00.000 1.0033749191
2010-02-1 00:00:00.000 1.0034140292
2010-02-17 00:00:00.000 1.003452691
2010-02-18 00:00:00.000 1.0034918013
2010-02-19 00:00:00.000 1.0035395633
2010-02-22 00:00:00.000 1.0036664439
2010-02-23 00:00:00.000 1.0037042097
2010-02-24 00:00:00.000 1.0037510759
2010-02-25 00:00:00.000 1.0038001834
2010-02-26 00:00:00.000 1.003850077
I need to write a query to get an index based on
((price on the last day of the current month / price on the last day of the previous month) - 1) * 100, so that the output comes out something like this:
31-Jan-10 0.01%
28-Feb-10 0.02%
31-Mar-10 0.00%
The following is one solution I thought of, but please share better ideas for implementing this:
Extract the last day of every month with values into a temp table, order the rows by date so that consecutive rows can be compared, and put the results into another temp table.
Looking forward to your help.
Try this....
DECLARE @StartDate DATETIME = '2010-01-01',
@EndDate DATETIME = GETDATE();
WITH data AS (
SELECT 1 AS i, CONVERT(DATETIME, NULL) AS StartDate, DATEADD(MONTH, 0, @StartDate) - 1 AS EndDate
UNION ALL
SELECT i + 1, data.EndDate, DATEADD(MONTH, i, @StartDate) - 1 AS EndDate
FROM data
WHERE DATEADD(MONTH, i, @StartDate) - 1 < @EndDate
)
SELECT (
((SELECT TOP 1 Rate FROM RateTable WHERE Date <= data.EndDate ORDER BY Date DESC) /
(SELECT TOP 1 Rate FROM RateTable WHERE Date <= data.StartDate ORDER BY Date DESC)- 1) * 100)
FROM DATA -- parenthesis were causing issues
WHERE data.StartDate IS NOT NULL
OPTION (MAXRECURSION 10000);
You'll need to replace the
(SELECT Rate FROM RateTable WHERE Date = data.StartDate)
and
(SELECT Rate FROM RateTable WHERE Date = data.EndDate)
with the equivalents for your own rate table, as you didn't mention column and table names in your question.
rwking indicated that there might be gaps in the rates table that would cause issues.
I've modified the subquery to bring back the first rate on or nearest the start and end dates.
Hope that helps
You can use the LAG function introduced in SQL Server 2012 to make it a bit easier:
WITH DataWithOrder AS
(
SELECT DateField, PriceField,
ROW_NUMBER() OVER(PARTITION BY YEAR(DateField), Month(DateField) ORDER BY DateField DESC) AS Pos
FROM PriceTable
)
SELECT
DateField,
PriceField,
LAG(PriceField) OVER(ORDER BY DateField) AS PriceLastMonth,
((PriceField / LAG(PriceField) OVER(ORDER BY DateField)) - 1) * 100 AS PCIncrease
FROM DataWithOrder
WHERE Pos = 1
ORDER BY DateField
I took a very different approach than the other guy. His is more elegant and would work better if the daily data does represent every single day of every month. If there are gaps in days, however, as your sample data represents, you can try the following code.
with cte as (select mydate
, price
, ROW_NUMBER() over(partition by YEAR(mydate), MONTH(mydate)
order by day(mydate) desc) row_n
from #temp)
select mydate, price, ROW_NUMBER() over(order by mydate desc) row_num
into #temp2
from cte
where row_n = 1
alter table #temp2
add idx float
declare @counter int = 1
while @counter < (select MAX(row_num)+1 from #temp2)
begin
update t2
set t2.idx = ((t2.price/t3.price)-1)*100
from #temp2 t2 left join
#temp2 t3 on 1 = 1
where t2.row_num = @counter and t3.row_num = @counter + 1
set @counter = @counter + 1
end
select mydate, idx
from #temp2
As the other poster mentioned, you didn't provide column or table names. My process was to insert your data into a table called [#temp] with column names [mydate] and [price].
Also, the data sample you provided contains two invalid dates that I changed to arbitrary dates just for the purposes of getting code to run. (2010-01-00 and 2010-02-00)
I have been struggling with a problem that should actually be pretty simple, but after a full week of reading, googling, experimenting and so on, my colleague and I cannot find the proper solution. :(
The problem: We have a table with two values:
an employeenumber (P_ID, int) <--- identification of employee
a date (starttime, datetime) <--- time employee checked in
We need to know what periods each employee has been working.
When two dates are less than @gap days apart, they belong to the same period
For each employee there can be multiple records for any given day but I just need to know which dates he worked, I am not interested in the time part
As soon as there is a gap > @gap days, the next date is considered the start of a new range
A range is at least 1 day (example: 21-9-2011 | 21-09-2011) but has no maximum length. (An employee checking in every @gap - 1 days should result in a period from the first day he checked in until today)
What we think we need are the islands in this table where the gap in days is greater than @variable (@gap = 30 means 30 days)
So an example:
SOURCETABLE:
P_ID | starttime
------|------------------
12121 | 24-03-2009 7:30
12121 | 24-03-2009 14:25
12345 | 27-06-2011 10:00
99999 | 01-05-2012 4:50
12345 | 27-06-2011 10:30
12345 | 28-06-2011 11:00
98765 | 13-04-2012 10:00
12345 | 21-07-2011 9:00
99999 | 03-05-2012 23:15
12345 | 21-09-2011 12:00
45454 | 12-07-2010 8:00
12345 | 21-09-2011 17:00
99999 | 06-05-2012 11:05
99999 | 20-05-2012 12:45
98765 | 26-04-2012 16:00
12345 | 07-07-2012 14:00
99999 | 01-06-2012 13:55
12345 | 13-08-2012 13:00
Now what I need as a result is:
PERIODS:
P_ID | Start | End
-------------------------------
12121 | 24-03-2009 | 24-03-2009
12345 | 27-06-2012 | 21-07-2012
12345 | 21-09-2012 | 21-09-2012
12345 | 07-07-2012 | (today) OR 13-08-2012 <-- (less than @gap days ago) OR (last date in table)
45454 | 12-07-2010 | 12-07-2010
45454 | 17-06-2012 | 17-06-2012
98765 | 13-04-2012 | 26-04-2012
99999 | 01-05-2012 | 01-06-2012
I hope this is clear this way, I already thank you for reading this far, it would be great if you could contribute!
I've done a rough script that should get you started. Haven't bothered refining the datetimes and the endpoint comparisons might need tweaking.
select
P_ID,
src.starttime,
endtime = case when src.starttime <> lst.starttime or lst.starttime < DATEADD(dd,-1 * @gap,GETDATE()) then lst.starttime else GETDATE() end,
frst.starttime,
lst.starttime
from @SOURCETABLE src
outer apply (select starttime = MIN(starttime) from @SOURCETABLE sub where src.p_id = sub.p_id and sub.starttime > DATEADD(dd,-1 * @gap,src.starttime)) frst
outer apply (select starttime = MAX(starttime) from @SOURCETABLE sub where src.p_id = sub.p_id and src.starttime > DATEADD(dd,-1 * @gap,sub.starttime)) lst
where src.starttime = frst.starttime
order by P_ID, src.starttime
I get the following output, which is a little different from yours, but I think it's OK:
P_ID starttime endtime starttime starttime
----------- ----------------------- ----------------------- ----------------------- -----------------------
12121 2009-03-24 07:30:00.000 2009-03-24 14:25:00.000 2009-03-24 07:30:00.000 2009-03-24 14:25:00.000
12345 2011-06-27 10:00:00.000 2011-07-21 09:00:00.000 2011-06-27 10:00:00.000 2011-07-21 09:00:00.000
12345 2011-09-21 12:00:00.000 2011-09-21 17:00:00.000 2011-09-21 12:00:00.000 2011-09-21 17:00:00.000
12345 2012-07-07 14:00:00.000 2012-07-07 14:00:00.000 2012-07-07 14:00:00.000 2012-07-07 14:00:00.000
12345 2012-08-13 13:00:00.000 2012-08-16 11:23:25.787 2012-08-13 13:00:00.000 2012-08-13 13:00:00.000
45454 2010-07-12 08:00:00.000 2010-07-12 08:00:00.000 2010-07-12 08:00:00.000 2010-07-12 08:00:00.000
98765 2012-04-13 10:00:00.000 2012-04-26 16:00:00.000 2012-04-13 10:00:00.000 2012-04-26 16:00:00.000
The last two output cols are the results of the outer apply sections, and are just there for debugging.
This is based on the following setup:
declare @gap int
set @gap = 30
set dateformat dmy
-----P_ID----|----starttime----
declare @SOURCETABLE table (P_ID int, starttime datetime)
insert @SOURCETABLE values
(12121,'24-03-2009 7:30'),
(12121,'24-03-2009 14:25'),
(12345,'27-06-2011 10:00'),
(12345,'27-06-2011 10:30'),
(12345,'28-06-2011 11:00'),
(98765,'13-04-2012 10:00'),
(12345,'21-07-2011 9:00'),
(12345,'21-09-2011 12:00'),
(45454,'12-07-2010 8:00'),
(12345,'21-09-2011 17:00'),
(98765,'26-04-2012 16:00'),
(12345,'07-07-2012 14:00'),
(12345,'13-08-2012 13:00')
UPDATE: Slight rethink. Now uses a CTE to work out the gaps forwards and backwards from each item, then aggregates those:
--Get the gap between each starttime and the next and prev (use 999 to indicate non-closed intervals)
;WITH CTE_Gaps As (
select
p_id,
src.starttime,
nextgap = coalesce(DATEDIFF(dd,src.starttime,nxt.starttime),999), --Gap to the next entry
prevgap = coalesce(DATEDIFF(dd,prv.starttime,src.starttime),999), --Gap to the previous entry
isold = case when DATEDIFF(dd,src.starttime,getdate()) > @gap then 1 else 0 end --Is starttime more than gap days ago?
from
@SOURCETABLE src
cross apply (select starttime = MIN(starttime) from @SOURCETABLE sub where src.p_id = sub.p_id and sub.starttime > src.starttime) nxt
cross apply (select starttime = max(starttime) from @SOURCETABLE sub where src.p_id = sub.p_id and sub.starttime < src.starttime) prv
)
--select * from CTE_Gaps
select
p_id,
starttime = min(gap.starttime),
endtime = nxt.starttime
from
CTE_Gaps gap
--Find the next starttime where its gap to the next > @gap
cross apply (select starttime = MIN(sub.starttime) from CTE_Gaps sub where gap.p_id = sub.p_id and sub.starttime >= gap.starttime and sub.nextgap > @gap) nxt
group by P_ID, nxt.starttime
order by P_ID, nxt.starttime
Jon has most definitely shown us the right direction. Performance was horrible though (4 million+ records in the database), and it looked like we were missing some information. With everything we learned from you, we came up with the solution below. It uses elements of all the proposed answers and cycles through three temp tables before finally producing results, but the performance is good enough, and so is the data it generates.
declare @gap int
declare @Employee_id int
set @gap = 30
set dateformat dmy
--------------------------------------------------------------- #temp1 --------------------------------------------------
CREATE TABLE #temp1 ( EmployeeID int, starttime date)
INSERT INTO #temp1 ( EmployeeID, starttime)
select distinct ck.Employee_id,
cast(ck.starttime as date)
from SERVER1.DB1.dbo.checkins ck
inner join SERVER1.DB1.dbo.Team t on ck.team_id = t.id
where t.productive = 1
--------------------------------------------------------------- #temp2 --------------------------------------------------
create table #temp2 (ROWNR int, Employeeid int, ENDOFCHECKIN datetime, FIRSTCHECKIN datetime)
INSERT INTO #temp2
select Row_number() OVER (partition by EmployeeID ORDER BY t.prev) + 1 as ROWNR,
EmployeeID,
DATEADD(DAY, 1, t.Prev) AS start_gap,
DATEADD(DAY, 0, t.next) AS end_gap
from
(
select a.EmployeeID,
a.starttime as Prev,
(
select min(b.starttime)
from #temp1 as b
where starttime > a.starttime and b.EmployeeID = a.EmployeeID
) as Next
from #temp1 as a) as t
where datediff(day, prev, next ) > 30
group by EmployeeID,
t.Prev,
t.next
union -- add first known date for Employee
select 1 as ROWNR,
EmployeeID,
NULL,
min(starttime)
from #temp1 ct
group by ct.EmployeeID
--------------------------------------------------------------- #temp3 --------------------------------------------------
create table #temp3 (ROWNR int, Employeeid int, ENDOFCHECKIN datetime, STARTOFCHECKIN datetime)
INSERT INTO #temp3
select ROWNR,
Employeeid,
ENDOFCHECKIN,
FIRSTCHECKIN
from #temp2
union -- add last known date for Employee
select (select count(*) from #temp2 b where Employeeid = ct.Employeeid)+1 as ROWNR,
ct.Employeeid,
(select dateadd(d,1,max(starttime)) from #temp1 c where Employeeid = ct.Employeeid),
NULL
from #temp2 ct
group by ct.EmployeeID
---------------------------------------finally check our data-------------------------------------------------
select a1.Employeeid,
a1.STARTOFCHECKIN as STARTOFCHECKIN,
ENDOFCHECKIN = CASE WHEN b1.ENDOFCHECKIN <= a1.STARTOFCHECKIN THEN a1.ENDOFCHECKIN ELSE b1.ENDOFCHECKIN END,
year(a1.STARTOFCHECKIN) as JaarSTARTOFCHECKIN,
JaarENDOFCHECKIN = CASE WHEN b1.ENDOFCHECKIN <= a1.STARTOFCHECKIN THEN year(a1.ENDOFCHECKIN) ELSE year(b1.ENDOFCHECKIN) END,
Month(a1.STARTOFCHECKIN) as MaandSTARTOFCHECKIN,
MaandENDOFCHECKIN = CASE WHEN b1.ENDOFCHECKIN <= a1.STARTOFCHECKIN THEN month(a1.ENDOFCHECKIN) ELSE month(b1.ENDOFCHECKIN) END,
(year(a1.STARTOFCHECKIN)*100)+month(a1.STARTOFCHECKIN) as JaarMaandSTARTOFCHECKIN,
JaarMaandENDOFCHECKIN = CASE WHEN b1.ENDOFCHECKIN <= a1.STARTOFCHECKIN THEN (year(a1.ENDOFCHECKIN)*100)+month(a1.ENDOFCHECKIN) ELSE (year(b1.ENDOFCHECKIN)*100)+month(b1.ENDOFCHECKIN) END,
datediff(M,a1.STARTOFCHECKIN,b1.ENDOFCHECKIN) as MONTHSCHECKEDIN
from #temp3 a1
full outer join #temp3 b1 on a1.ROWNR = b1.ROWNR -1 and a1.Employeeid = b1.Employeeid
where not (a1.STARTOFCHECKIN is null AND b1.ENDOFCHECKIN is null)
order by a1.Employeeid, a1.STARTOFCHECKIN
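For comparison, on SQL Server 2012 or later the same periods can be sketched in a single pass with LAG and a running SUM. This is a common gaps-and-islands pattern, not the method used in the answers above. The table variable @checkins and its sample rows are assumptions for illustration, a gap of exactly @gap days is treated here as still belonging to the same period, and the "extend the last period to today" rule from the question is left out:

```sql
DECLARE @gap int = 30;

-- Hypothetical sample data, a subset of the rows in the question
DECLARE @checkins TABLE (P_ID int, starttime datetime);
INSERT @checkins VALUES
    (12121, '2009-03-24T07:30:00'),
    (12121, '2009-03-24T14:25:00'),
    (12345, '2011-06-27T10:00:00'),
    (12345, '2011-06-28T11:00:00'),
    (12345, '2011-07-21T09:00:00'),
    (12345, '2011-09-21T12:00:00');

WITH days AS (
    -- one row per employee per calendar day, time part dropped
    SELECT DISTINCT P_ID, CAST(starttime AS date) AS d
    FROM @checkins
), flagged AS (
    -- is_start = 1 for the first day and for any day more than @gap days after the previous one
    SELECT P_ID, d,
           CASE WHEN DATEDIFF(DAY, LAG(d) OVER (PARTITION BY P_ID ORDER BY d), d) <= @gap
                THEN 0 ELSE 1 END AS is_start
    FROM days
), grouped AS (
    -- the running total of start flags numbers the periods per employee
    SELECT P_ID, d,
           SUM(is_start) OVER (PARTITION BY P_ID ORDER BY d
                               ROWS UNBOUNDED PRECEDING) AS grp
    FROM flagged
)
SELECT P_ID, MIN(d) AS PeriodStart, MAX(d) AS PeriodEnd
FROM grouped
GROUP BY P_ID, grp
ORDER BY P_ID, PeriodStart;
```

With these sample rows, employee 12121 gets one single-day period (2009-03-24), and employee 12345 gets 2011-06-27 to 2011-07-21 followed by a single-day period on 2011-09-21, because the 62-day gap before it exceeds @gap. Each CTE reads the data once, which may help with the multi-million-row table mentioned above, though that is untested here.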