I have people who do many multi-day assignments (date X to date Y). I would like to find the date on which they completed a milestone, e.g. 50 days of work completed.
Data is stored as a single row per Assignment
AssignmentId
StartDate
EndDate
I can sum up the total days they have completed up to a given date, but I am struggling to see how I would find the date on which a milestone was hit. E.g. how many people completed 50 days in October 2020, showing the date within the month that this occurred?
Thanks in advance
PS. Our database is SQL Server.
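For context, this is roughly how I currently sum the days completed up to a cut-off date (table and column names simplified; assume the assignment rows also carry a PersonId):

DECLARE @AsOf date = '2020-10-31';

-- days completed per person up to a cut-off date (illustrative only)
SELECT a.PersonId,
       SUM(DATEDIFF(day, a.StartDate,
                    CASE WHEN a.EndDate > @AsOf THEN @AsOf ELSE a.EndDate END) + 1) AS DaysCompleted
FROM Assignments a
WHERE a.StartDate <= @AsOf
GROUP BY a.PersonId;

What I can't see is how to turn that into the date on which each person crossed the 50-day mark.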
As mentioned in previous comments, it would be much easier to help you if you could provide example data and your table structure.
However, assuming a simple DB structure with a table for your people, your tasks, and the work each person completed, you can get the required running sum of days by using a date table (or CTE) that contains an entry for each day, combined with the window function SUM with UNBOUNDED PRECEDING. Here is an example:
DECLARE @people TABLE(
 id int
 ,name nvarchar(50)
)
DECLARE @tasks TABLE(
 id int
 ,name nvarchar(50)
)
DECLARE @work TABLE(
 people_id int
 ,task_id int
 ,task_StartDate date
 ,task_EndDate date
)

INSERT INTO @people VALUES (1, 'Peter'), (2, 'Paul'), (3, 'Mary');
INSERT INTO @tasks VALUES (1, 'Development'), (2, 'QA'), (3, 'Sales');
INSERT INTO @work VALUES
 (1, 1, '2019-04-05', '2019-04-08')
,(1, 1, '2019-05-05', '2019-06-08')
,(1, 1, '2019-07-05', '2019-09-08')
,(2, 2, '2019-04-08', '2019-06-08')
,(2, 2, '2019-09-08', '2019-10-03')
,(3, 1, '2019-11-01', '2019-12-01')
;WITH cte AS(
SELECT CAST('2019-01-01' AS DATE) AS dateday
UNION ALL
SELECT DATEADD(d, 1, dateday)
FROM cte
WHERE DATEADD(d, 1, dateday) < '2020-01-01'
),
cteWorkDays AS(
SELECT people_id, task_id, dateday, 1 AS cnt
FROM @work w
INNER JOIN cte c ON c.dateday BETWEEN w.task_StartDate AND w.task_EndDate
),
ctePeopleWorkdays AS(
SELECT *, SUM(cnt) OVER (PARTITION BY people_id ORDER BY dateday ROWS UNBOUNDED PRECEDING) dayCnt
FROM cteWorkDays
)
SELECT *
FROM ctePeopleWorkdays
WHERE dayCnt = 50
OPTION (MAXRECURSION 0)
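If you also need to restrict the result to milestones reached in a particular month (October 2020 in your example), you could additionally filter the final SELECT on the milestone date; assuming the date CTE is extended to cover the relevant year, something like:

SELECT *
FROM ctePeopleWorkdays
WHERE dayCnt = 50
  AND dateday >= '2020-10-01' AND dateday < '2020-11-01'
OPTION (MAXRECURSION 0)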
How to do this depends on how you store your data. The solution below assumes that each worked day exists as a single row in your data model.
The approach below uses a common table expression (cte) to generate a running total (Total) for each person (PersonId) and then filters on the milestone target (I set it to 5 to reduce the sample data size) and target month.
Sample data
create table WorkedDays
(
PersonId int,
TaskDate date
);
insert into WorkedDays (PersonId, TaskDate) values
(100, '2020-09-01'),
(100, '2020-09-02'),
(100, '2020-09-03'),
(100, '2020-09-04'),
(100, '2020-09-05'), -- person 100 worked 5 days by 2020-09-05 = milestone (in september)
(200, '2020-09-29'),
(200, '2020-09-30'),
(200, '2020-10-01'),
(200, '2020-10-02'),
(200, '2020-10-03'), -- person 200 worked 5 days by 2020-10-03 = milestone (in october)
(200, '2020-10-04'),
(200, '2020-10-05'),
(200, '2020-10-06'),
(300, '2020-10-10'),
(300, '2020-10-11'),
(300, '2020-10-12'),
(300, '2020-10-13'),
(300, '2020-10-14'), -- person 300 worked 5 days by 2020-10-14 = milestone (in october)
(300, '2020-10-15'),
(400, '2020-10-20'),
(400, '2020-10-21'); -- person 400 did not reach the milestone yet
Solution
with cte as
(
select wd.PersonId,
wd.TaskDate,
count(1) over(partition by wd.PersonId
order by wd.TaskDate
rows between unbounded preceding and current row) as Total
from WorkedDays wd
)
select cte.PersonId,
cte.TaskDate as MilestoneDate
from cte
where cte.Total = 5 -- milestone reached
and year(cte.TaskDate) = 2020
and month(cte.TaskDate) = 10; -- in october
Result
PersonId MilestoneDate
-------- -------------
200 2020-10-03
300 2020-10-14
Fiddle (also shows the common table expression output).
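If, as in the question, your data is stored as one row per assignment with a start and end date rather than one row per worked day, you could first expand each assignment into individual days (for example with a calendar or tally table) and feed the result into the cte above. A rough sketch, assuming hypothetical Assignments(PersonId, StartDate, EndDate) and Calendar(TheDate) tables:

-- expand assignment ranges into one row per worked day (sketch only)
select a.PersonId,
       c.TheDate as TaskDate
from Assignments a
join Calendar c on c.TheDate between a.StartDate and a.EndDate;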
Here is my problem: I have a tickets table which stores tickets read, and users work 8-hour shifts. I need to group the tickets read into 8 hourly groups.
Basically I need something like
if HourStart is 15:20
Group Hour Quantity
1 15:20:00 20
2 16:20:00 20
3 17:20:00 40
4 18:20:00 0
5 19:20:00 0
6 20:20:00 0
7 21:20:00 0
8 22:20:00 0
Because I need 8 rows all the time, I thought creating a temporary table would be best, so I could make a join and the rows would still show (with null data) even if no records were entered in those hours.
The problem is that this is a bit slow in terms of performance and a bit dirty, and I'm looking for a better way to group the data by some generated rows without having to create a temporary table.
CREATE TABLE Production.hProductionRecords
(
ID INT IDENTITY PRIMARY KEY,
HourStart TIME
)
CREATE TABLE Production.hTickets
(
    ID INT IDENTITY PRIMARY KEY,
    DateRead DATETIME,
    ProductionRecordId INT,
    Quantity INT
)
CREATE TABLE #TickersPerHour
(
    [Group] INT,
    [Hour] TIME
)

DECLARE @HourStart TIME = (SELECT HourStart
                           FROM Production.hProductionRecords
                           WHERE Id = 1)

INSERT INTO #TickersPerHour ([Group], [Hour])
VALUES (1, @HourStart),
       (2, DATEADD(hh, 1, @HourStart)),
       (3, DATEADD(hh, 2, @HourStart)),
       (4, DATEADD(hh, 3, @HourStart)),
       (5, DATEADD(hh, 4, @HourStart)),
       (6, DATEADD(hh, 5, @HourStart)),
       (7, DATEADD(hh, 6, @HourStart)),
       (8, DATEADD(hh, 7, @HourStart))

SELECT
    TEMP.[Group],
    TEMP.[Hour],
    ISNULL(SUM(E.Quantity), 0) Quantity
FROM
Production.hProductionRecords P
LEFT JOIN
Production.hTickets E ON E.ProductionRecordId = P.Id
RIGHT JOIN
#TickersPerHour TEMP
ON TEMP.Hour = CASE
WHEN CAST(E.DateRead AS TIME) >= P.HourStart
AND CAST(E.DateRead AS TIME) < DATEADD(hour, 1, P.HourStart)
THEN DATEADD(hour, 1, P.HourStart)
WHEN CAST(E.DateRead AS TIME) >= P.HourStart
AND CAST(E.DateRead AS TIME) < DATEADD(hour, 2, P.HourStart)
THEN DATEADD(hour, 2, P.HourStart)
WHEN CAST(E.DateRead AS TIME) >= P.HourStart
AND CAST(E.DateRead AS TIME) < DATEADD(hour, 3, P.HourStart)
THEN DATEADD(hour, 3, P.HourStart)
WHEN CAST(E.DateRead AS TIME) >= P.HourStart
AND CAST(E.DateRead AS TIME) < DATEADD(hour, 4, P.HourStart)
THEN DATEADD(hour, 4, P.HourStart)
WHEN CAST(E.DateRead AS TIME) >= P.HourStart
AND CAST(E.DateRead AS TIME) < DATEADD(hour, 5, P.HourStart)
THEN DATEADD(hour,5, P.HourStart)
WHEN CAST(E.DateRead AS TIME) >= P.HourStart
AND CAST(E.DateRead AS TIME) < DATEADD(hour, 6, P.HourStart)
THEN DATEADD(hour, 6, P.HourStart)
WHEN CAST(E.DateRead AS TIME) >= P.HourStart
AND CAST(E.DateRead AS TIME) < DATEADD(hour, 7, P.HourStart)
THEN DATEADD(hour,7, P.HourStart)
WHEN CAST(E.DateRead AS TIME) >= P.HourStart
AND CAST(E.DateRead AS TIME) < DATEADD(hour, 8, P.HourStart)
THEN DATEADD(hour, 8, P.HourStart)
END
GROUP BY
    TEMP.[Group], TEMP.[Hour]
ORDER BY
    TEMP.[Group]
DROP TABLE #TickersPerHour
You could try aggregating the tickets without joining them with the temp table ranges, because the aggregation is quite static: Tickets whose minute part is before the minute of HourStart "belong" to the previous hour. The aggregation will return 8 groups and those 8 groups can be joined with the ranges (either temp table, or derived).
/*
--drop table Production.hTickets
--drop table Production.hProductionRecords
--drop schema Production
go
create schema Production
go
CREATE TABLE Production.hProductionRecords
(
ID INT IDENTITY PRIMARY KEY,
HourStart TIME
)
CREATE TABLE Production.hTickets
(
ID INT IDENTITY PRIMARY KEY,
DateRead DATETIME,
ProductionRecordId INT,
Quantity INT
)
go
insert into Production.hProductionRecords values('15:20')
insert into Production.hTickets(DateRead, ProductionRecordId, Quantity)
select dateadd(minute, abs(checksum(newid()))% 600, '15:20'), 1, abs(checksum(newid()))% 75
from sys.columns as a
cross join sys.columns as b
*/
DECLARE @HourStart TIME = (SELECT HourStart
                           FROM Production.hProductionRecords
                           WHERE Id = 1)
declare @MinuteBoundary int = datepart(minute, @HourStart);

select *
from
(
    values (1, @HourStart),
           (2, DATEADD(hh, 1, @HourStart)),
           (3, DATEADD(hh, 2, @HourStart)),
           (4, DATEADD(hh, 3, @HourStart)),
           (5, DATEADD(hh, 4, @HourStart)),
           (6, DATEADD(hh, 5, @HourStart)),
           (7, DATEADD(hh, 6, @HourStart)),
           (8, DATEADD(hh, 7, @HourStart))
) AS h(id, HourStart)
full outer join
(
select AdjustedHour, sum(Quantity) AS SumQuantity
from
(
select
--tickets read before the minute boundary, belong to the previous hour
case
when datepart(minute, DateRead) < @MinuteBoundary then datepart(hour, dateadd(hour, -1, DateRead))
else datepart(hour, DateRead)
end AS AdjustedHour,
Quantity
from Production.hTickets
where 1=1
--and --filter, for dates, hours outside the 8 hours period and whatnot...
) as src
group by AdjustedHour
) AS grp ON datepart(hour, h.HourStart) = grp.AdjustedHour;
I'm looking for a better way to group the data by some generated rows without having to create a temporary table
Instead of a temporary table you can build a lazy sequence; for that you need rangeAB (code at the end of this post). Lazy sequences (AKA a tally table or virtual auxiliary table of numbers) are nasty fast. Note this example:
DECLARE @HourStart TIME = '15:20:00';

SELECT
    r.RN,
    r.OP,
    TimeAsc  = DATEADD(HOUR, r.RN, @HourStart),
    TimeDesc = DATEADD(HOUR, r.OP, @HourStart)
FROM dbo.rangeAB(0,7,1,0) AS r
ORDER BY r.RN;
Results:
RN OP TimeAsc TimeDesc
---- ---- ---------------- ----------------
0 7 15:20:00.0000000 22:20:00.0000000
1 6 16:20:00.0000000 21:20:00.0000000
2 5 17:20:00.0000000 20:20:00.0000000
3 4 18:20:00.0000000 19:20:00.0000000
4 3 19:20:00.0000000 18:20:00.0000000
5 2 20:20:00.0000000 17:20:00.0000000
6 1 21:20:00.0000000 16:20:00.0000000
7 0 22:20:00.0000000 15:20:00.0000000
Note that I am able to generate these dates in ASCending and/or DESCending order without a sort in the execution plan. This is because rangeAB leverages what I call a Virtual Index. You can order by, group by, etc, even join on the RN column without sorting. Note the execution plan - No Sort, that's huge!
Now to use rangeAB to solve your problem:
-- Create Some Sample Data
DECLARE @things TABLE (Something CHAR(1), SomeTime TIME);

WITH Something(x) AS (SELECT 1)
INSERT @things
SELECT x, xx
FROM Something
CROSS APPLY (VALUES('15:30:00'),('20:19:00'),('16:30:00'),('16:33:00'),
                   ('17:10:00'),('18:13:00'),('19:01:00'),('21:35:00'),
                   ('15:13:00'),('21:55:00'),('19:22:00'),('16:39:00')) AS f(xx);

-- Solution:
DECLARE @HourStart TIME = '15:20:00'; -- you get this via a subquery

SELECT
    GroupNumber = r.RN+1,
    HourStart   = MAX(f.HStart),
    Quantity    = COUNT(t.SomeThing)
FROM dbo.rangeAB(0,7,1,0) AS r
CROSS APPLY (VALUES(DATEADD(HOUR,r.RN,@HourStart)))  AS f(HStart)
CROSS APPLY (VALUES(DATEADD(SECOND,3599,f.HStart)))  AS f2(HEnd)
LEFT JOIN @things AS t
  ON t.SomeTime BETWEEN f.HStart AND f2.HEnd
GROUP BY r.RN;
Results:
GroupNumber HourStart Quantity
------------- ----------------- -----------
1 15:20:00.0000000 1
2 16:20:00.0000000 4
3 17:20:00.0000000 1
4 18:20:00.0000000 1
5 19:20:00.0000000 2
6 20:20:00.0000000 0
7 21:20:00.0000000 2
8 22:20:00.0000000 0
Execution Plan:
Let me know if you have questions. dbo.rangeAB below.
CREATE FUNCTION dbo.rangeAB
(
    @low  bigint,
    @high bigint,
    @gap  bigint,
    @row1 bit
)
/****************************************************************************************
[Purpose]:
 Creates up to 531,441,000,000 sequential integers beginning with @low and ending
 with @high. Used to replace iterative methods such as loops, cursors and recursive CTEs
 to solve SQL problems. Based on Itzik Ben-Gan's getnums function with some tweaks,
 enhancements and added functionality. The logic for getting rn to begin at 0 or 1
 comes from Jeff Moden's fnTally function.
 The name was chosen because it's similar to Clojure's range function; "rangeAB" was
 used because "range" is a reserved SQL keyword.
[Author]: Alan Burstein
[Compatibility]:
SQL Server 2008+ and Azure SQL Database
[Syntax]:
 SELECT r.RN, r.OP, r.N1, r.N2
 FROM dbo.rangeAB(@low,@high,@gap,@row1) AS r;
[Parameters]:
 @low  = a bigint that represents the lowest value for n1.
 @high = a bigint that represents the highest value for n1.
 @gap  = a bigint that represents how much n1 and n2 will increase each row; @gap also
         represents the difference between n1 and n2.
 @row1 = a bit that represents the first value of rn. When @row1 = 0 then rn begins
         at 0, when @row1 = 1 then rn will begin at 1.
[Returns]:
 Inline Table Valued Function returns:
 rn = bigint; a row number that works just like T-SQL ROW_NUMBER() except that it can
      start at 0 or 1, which is dictated by @row1.
 op = bigint; returns the "opposite number" that relates to rn. When rn begins with 0 and
      ends with 10 then 10 is the opposite of 0, 9 the opposite of 1, etc. When rn begins
      with 1 and ends with 5 then 1 is the opposite of 5, 2 the opposite of 4, etc...
 n1 = bigint; a sequential number starting at the value of @low and incrementing by the
      value of @gap until it is less than or equal to the value of @high.
 n2 = bigint; a sequential number starting at the value of @low+@gap and incrementing
      by the value of @gap.
[Dependencies]:
N/A
[Developer Notes]:
 1. The lowest and highest possible numbers returned are whatever is allowable by a
    bigint. The function, however, returns no more than 531,441,000,000 rows (8100^3).
 2. @gap does not affect rn; rn will begin at @row1 and increase by 1 until the last row
    unless it's used in a query where a filter is applied to rn.
 3. @gap must be greater than 0 or the function will not return any rows.
 4. Keep in mind that when @row1 is 0 then the highest row-number will be the number of
    rows returned minus 1.
 5. If all you need is a sequential set beginning at 0 or 1 then, for best performance,
    use the RN column. Use N1 and/or N2 when you need to begin your sequence at any
    number other than 0 or 1, or if you need a gap between your sequence of numbers.
 6. Although @gap is a bigint it must be a positive integer or the function will
    not return any rows.
 7. The function will not return any rows when any of the following conditions is true:
    * any of the input parameters are NULL
    * @high is less than @low
    * @gap is not greater than 0
    To force the function to return all NULLs instead of not returning anything you can
    add the following code to the end of the query:
    UNION ALL
    SELECT NULL, NULL, NULL, NULL
    WHERE NOT (@high&@low&@gap&@row1 IS NOT NULL AND @high >= @low AND @gap > 0)
    This code was excluded as it adds a ~5% performance penalty.
 8. There is no performance penalty for sorting by rn ASC; there is, however, a large
    performance penalty for sorting rn in descending order. If you need a descending
    sort, use op in place of rn and then sort by rn ASC.
Best Practices:
--===== 1. Using RN (rownumber)
 -- (1.1) The best way to get the numbers 1,2,3...@high (e.g. 1 to 5):
 SELECT RN FROM dbo.rangeAB(1,5,1,1);
 -- (1.2) The best way to get the numbers 0,1,2...@high (e.g. 0 to 5):
 SELECT RN FROM dbo.rangeAB(0,5,1,0);
--===== 2. Using OP for descending sorts without a performance penalty
 -- (2.1) The best way to get the numbers @high down to 1 (e.g. 5 to 1):
 SELECT op FROM dbo.rangeAB(1,5,1,1) ORDER BY rn ASC;
 -- (2.2) The best way to get the numbers @high-1 down to 0 (e.g. 5 to 0):
 SELECT op FROM dbo.rangeAB(1,6,1,0) ORDER BY rn ASC;
--===== 3. Using N1
 -- (3.1) To begin with numbers other than 0 or 1 use N1 (e.g. -3 to 3):
 SELECT N1 FROM dbo.rangeAB(-3,3,1,1);
 -- (3.2) ROW_NUMBER() is built in. If you want a ROW_NUMBER() include RN:
 SELECT RN, N1 FROM dbo.rangeAB(-3,3,1,1);
 -- (3.3) If you wanted a ROW_NUMBER() that started at 0 you would do this:
 SELECT RN, N1 FROM dbo.rangeAB(-3,3,1,0);
--===== 4. Using N2 and @gap
 -- (4.1) To get 0,10,20,30...100, set @low to 0, @high to 100 and @gap to 10:
 SELECT N1 FROM dbo.rangeAB(0,100,10,1);
 -- (4.2) Note that N2=N1+@gap; this allows you to create a sequence of ranges.
 --       For example, to get (0,10),(10,20),(20,30)...(90,100):
 SELECT N1, N2 FROM dbo.rangeAB(0,90,10,1);
 -- (4.3) Remember that a rownumber is included and it can begin at 0 or 1:
 SELECT RN, N1, N2 FROM dbo.rangeAB(0,90,10,1);
[Examples]:
--===== 1. Generating Sample data (using rangeAB to create "dummy rows")
 -- The query below will generate 10,000 ids and random numbers between 50,000 and 500,000
 SELECT
   someId     = r.rn,
   someNumber = ABS(CHECKSUM(NEWID())%450000)+50001
 FROM dbo.rangeAB(1,10000,1,1) r;
--===== 2. Create a series of dates; rn is 0 to include the first date in the series
 DECLARE @startdate DATE = '20180101', @enddate DATE = '20180131';
 SELECT r.rn, calDate = DATEADD(dd, r.rn, @startdate)
 FROM dbo.rangeAB(1, DATEDIFF(dd,@startdate,@enddate),1,0) r;
GO
--===== 3. Splitting (tokenizing) a string with fixed-size items
 -- given a delimited string of identifiers that are always 7 characters long
 DECLARE @string VARCHAR(1000) = 'A601225,B435223,G008081,R678567';
 SELECT
   itemNumber = r.rn,                       -- item's ordinal position
   itemIndex  = r.n1,                       -- item's position in the string (its CHARINDEX value)
   item       = SUBSTRING(@string, r.n1, 7) -- item (token)
 FROM dbo.rangeAB(1, LEN(@string), 8,1) r;
GO
--===== 4. Splitting (tokenizing) a string with random delimiters
DECLARE @string VARCHAR(1000) = 'ABC123,999F,XX,9994443335';
 SELECT
   itemNumber = ROW_NUMBER() OVER (ORDER BY r.rn), -- item's ordinal position
   itemIndex  = r.n1+1,                            -- item's position in the string (its CHARINDEX value)
   item       = SUBSTRING
                (
                  @string,
                  r.n1+1,
                  ISNULL(NULLIF(CHARINDEX(',',@string,r.n1+1),0)-r.n1-1, 8000)
                )                                  -- item (token)
 FROM dbo.rangeAB(0,DATALENGTH(@string),1,1) r
 WHERE SUBSTRING(@string,r.n1,1) = ',' OR r.n1 = 0;
-- logic borrowed from: http://www.sqlservercentral.com/articles/Tally+Table/72993/
--===== 5. Grouping by weekly intervals
 -- 5.1. how to create a series of start/end dates between @startDate & @endDate
 DECLARE @startDate DATE = '1/1/2015', @endDate DATE = '2/1/2015';
 SELECT
   WeekNbr   = r.RN,
   WeekStart = DATEADD(DAY,r.N1,@startDate),
   WeekEnd   = DATEADD(DAY,r.N2-1,@startDate)
 FROM dbo.rangeAB(0,datediff(DAY,@startDate,@endDate),7,1) r;
GO
-- 5.2. LEFT JOIN to the weekly interval table
BEGIN
 DECLARE @startDate datetime = '1/1/2015', @endDate datetime = '2/1/2015';
 -- sample data
 DECLARE @loans TABLE (loID INT, lockDate DATE);
 INSERT @loans SELECT r.rn, DATEADD(dd, ABS(CHECKSUM(NEWID())%32), @startDate)
 FROM dbo.rangeAB(1,50,1,1) r;
 -- solution
 SELECT
   WeekNbr   = r.RN,
   WeekStart = dt.WeekStart,
   WeekEnd   = dt.WeekEnd,
   total     = COUNT(l.lockDate)
 FROM dbo.rangeAB(0,datediff(DAY,@startDate,@endDate),7,1) r
 CROSS APPLY (VALUES (
   CAST(DATEADD(DAY,r.N1,@startDate) AS DATE),
   CAST(DATEADD(DAY,r.N2-1,@startDate) AS DATE))) dt(WeekStart,WeekEnd)
 LEFT JOIN @loans l ON l.lockDate BETWEEN dt.WeekStart AND dt.WeekEnd
 GROUP BY r.RN, dt.WeekStart, dt.WeekEnd;
END;
--===== 6. Identify the first vowel and last vowel in a string along with their positions
 DECLARE @string VARCHAR(200) = 'This string has vowels';
 -- first vowel
 SELECT TOP(1) position = r.rn, letter = SUBSTRING(@string,r.rn,1)
 FROM dbo.rangeAB(1,LEN(@string),1,1) r
 WHERE SUBSTRING(@string,r.rn,1) LIKE '%[aeiou]%'
 ORDER BY r.rn;
 -- last vowel: to avoid a sort in the execution plan we'll use op instead of rn
 SELECT TOP(1) position = r.op, letter = SUBSTRING(@string,r.op,1)
 FROM dbo.rangeAB(1,LEN(@string),1,1) r
 WHERE SUBSTRING(@string,r.op,1) LIKE '%[aeiou]%'
 ORDER BY r.rn;
---------------------------------------------------------------------------------------
[Revision History]:
Rev 00 - 20140518 - Initial Development - Alan Burstein
Rev 01 - 20151029 - Added 65 rows to make L1=465; 465^3=100.5M. Updated comment section
- Alan Burstein
Rev 02 - 20180613 - Complete re-design including opposite number column (op)
Rev 03 - 20180920 - Added additional CROSS JOIN to L2 for 530B rows max - Alan Burstein
****************************************************************************************/
RETURNS TABLE WITH SCHEMABINDING AS RETURN
WITH L1(N) AS
(
SELECT 1
FROM (VALUES
(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),
(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),
(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),
(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),
(0),(0)) T(N) -- 90 values
),
L2(N) AS (SELECT 1 FROM L1 a CROSS JOIN L1 b CROSS JOIN L1 c),
iTally AS (SELECT rn = ROW_NUMBER() OVER (ORDER BY (SELECT 1)) FROM L2 a CROSS JOIN L2 b)
SELECT
r.RN,
r.OP,
r.N1,
r.N2
FROM
(
  SELECT
    RN = 0,
    OP = (@high-@low)/@gap,
    N1 = @low,
    N2 = @gap+@low
  WHERE @row1 = 0
  UNION ALL -- ISNULL required in the TOP statement below for error handling purposes
  SELECT TOP (ABS((ISNULL(@high,0)-ISNULL(@low,0))/ISNULL(@gap,0)+ISNULL(@row1,1)))
    RN = i.rn,
    OP = (@high-@low)/@gap+(2*@row1)-i.rn,
    N1 = (i.rn-@row1)*@gap+@low,
    N2 = (i.rn-(@row1-1))*@gap+@low
  FROM iTally AS i
  ORDER BY i.rn
) AS r
WHERE @high&@low&@gap&@row1 IS NOT NULL AND @high >= @low AND @gap > 0;
GO
I have a SQL Server 2014 table with millions of GPS coordinates, each recorded at a particular time. However, the interval between the registrations is not fixed and varies from 1 second to a couple of hours. I only want to keep one measurement every 4 minutes, so the other records have to be deleted.
I tried a WHILE loop in T-SQL that traverses every record, with a SELECT statement inside the loop using a double CROSS APPLY to only return a record if it sits in between 2 other records which are not more than 4 minutes apart. However, this strategy turns out to be too slow.
Can this be done with a set-based solution? Or is there a way to speed up this query? (The test query below only prints; it doesn't delete yet.)
SELECT * INTO #myTemp FROM gps ORDER BY TimePoint asc
declare @Id Uniqueidentifier
declare @d1 varchar(19)
declare @d2 varchar(19)
declare @d3 varchar(19)
While EXISTS (select * from #myTemp )
BEGIN
select top 1 @Id = ID FROM #myTemp order by TimePoint asc
SELECT
@d1 = convert(varchar(19), a.justbefore, 121),
@d2 = convert(varchar(19), b.TimePoint, 121),
@d3 = convert(varchar(19), c.justafter, 121)
FROM Gps B CROSS APPLY
(
SELECT top 1 TimePoint as justbefore
FROM Gps
WHERE (B.TimePoint > TimePoint ) AND (B.Id = @Id )
ORDER by TimePoint desc
) A
CROSS APPLY (
SELECT top 1 TimePoint as justafter
FROM Gps
WHERE (Datediff(n,A.justbefore,TimePoint ) between -4 AND 0)
AND (B.TimePoint < TimePoint )
ORDER by TimePoint asc
) C
print 'ID=' + Cast(@id as varchar(50))
+ ' / d1=' + @d1 + ' / d2=' + @d2 + ' / d3=' + @d3
DELETE #myTemp where Id = @id
END
--
Sample data:
Id TimePoint Lat Lon
1 20170725 13:05:27 12,256 24,123
2 20170725 13:10:27 12,254 24,120
3 20170725 13:10:29 12,253 24,125
4 20170725 13:11:55 12,259 24,127
5 20170725 13:11:59 12,255 24,123
6 20170725 13:14:28 12,254 24,126
7 20170725 13:16:52 12,259 24,121
8 20170725 13:20:53 12,257 24,125
In this case records 3,4,5 should be deleted.
Record 7 should stay as the gap between 7 and 8 is longer than 4 minutes.
Looking at the numbers... It looks like 1 & 2 stay (5 mins apart)...3, 4, & 5 should go... 6 stays (4 mins from 2)... 7 should go (only 2 mins from 6) and 8 stays (6 mins from 6)...
If this is correct, the following will do what you're looking for...
IF OBJECT_ID('tempdb..#TestData', 'U') IS NOT NULL
DROP TABLE #TestData;
CREATE TABLE #TestData (
Id INT NOT NULL PRIMARY KEY CLUSTERED,
TimePoint DATETIME2(0) NOT NULL,
Lat DECIMAL(9,3),
Lon DECIMAL(9,3)
);
INSERT #TestData (Id, TimePoint, Lat, Lon) VALUES
(1, '20170725 13:05:27', 12.256, 24.123),
(2, '20170725 13:10:27', 12.254, 24.120),
(3, '20170725 13:10:29', 12.253, 24.125),
(4, '20170725 13:11:55', 12.259, 24.127),
(5, '20170725 13:11:59', 12.255, 24.123),
(6, '20170725 13:14:28', 12.254, 24.126),
(7, '20170725 13:16:52', 12.259, 24.121),
(8, '20170725 13:20:53', 12.257, 24.125);
-- SELECT * FROM #TestData td;
--================================================================================
WITH
cte_AddLag AS (
SELECT
td.Id, td.TimePoint, td.Lat, td.Lon,
MinFromPrev = DATEDIFF(mi, LAG(td.TimePoint, 1) OVER (ORDER BY td.TimePoint), td.TimePoint)
FROM
#TestData td
),
cte_TimeGroup AS (
SELECT
*,
TimeGroup = ISNULL(SUM(al.MinFromPrev) OVER (ORDER BY al.TimePoint ROWS UNBOUNDED PRECEDING) / 4, 0)
FROM
cte_AddLag al
)
SELECT TOP 1 WITH TIES
tg.Id,
tg.TimePoint,
tg.Lat,
tg.Lon
FROM
cte_TimeGroup tg
ORDER BY
ROW_NUMBER() OVER (PARTITION BY tg.TimeGroup ORDER BY tg.TimePoint);
Results...
Id TimePoint Lat Lon
----------- --------------------------- --------------------------------------- ---------------------------------------
1 2017-07-25 13:05:27 12.256 24.123
2 2017-07-25 13:10:27 12.254 24.120
6 2017-07-25 13:14:28 12.254 24.126
8 2017-07-25 13:20:53 12.257 24.125
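If you actually need to delete the discarded rows rather than just select the keepers, one option (assuming Id uniquely identifies a row) is to rank the rows within each TimeGroup and delete everything that isn't the first row of its group, along these lines:

WITH
cte_AddLag AS (
    SELECT
        td.Id, td.TimePoint,
        MinFromPrev = DATEDIFF(mi, LAG(td.TimePoint, 1) OVER (ORDER BY td.TimePoint), td.TimePoint)
    FROM
        #TestData td
),
cte_TimeGroup AS (
    SELECT
        *,
        TimeGroup = ISNULL(SUM(al.MinFromPrev) OVER (ORDER BY al.TimePoint ROWS UNBOUNDED PRECEDING) / 4, 0)
    FROM
        cte_AddLag al
),
cte_Ranked AS (
    SELECT
        tg.Id,
        rn = ROW_NUMBER() OVER (PARTITION BY tg.TimeGroup ORDER BY tg.TimePoint)
    FROM
        cte_TimeGroup tg
)
DELETE td
FROM #TestData td
JOIN cte_Ranked r
    ON r.Id = td.Id
WHERE r.rn > 1;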
HTH, Jason
I'm going to preface this question with the disclaimer that creating what I call "complex" queries isn't my forte. Most of the time, there is a much simpler way to accomplish what I'm trying to do, so if the query below isn't up to par, I apologize.
With that said, I have a table that keeps track of Vendor Invoices and Vendor Invoice Items (along with a Vendor Invoice Type and Vendor Invoice Item Type). Our bookkeeper wants a report that simply shows: Vendor | Location | Inv Number | Inv Type | Item Type | Inv Date | Rental Fee | Restock Fee | Shipping Fee | Line Item Cost | Total (Line Item + Fees)
Most of the time, one vendor invoice is one line. However, there are exceptions where a vendor invoice can have many item types, thus creating two or more rows. Not a big deal EXCEPT that the fees (Rental, Restock, Shipping) are attached to the Vendor Invoice table. So, I first created a query that checks the temp table for invoices that have multiple rows, takes the last row, and zeroes out the fees so that only one line item carries the fee. However, our bookkeeper doesn't like that. Instead, she'd like the fees to be "distributed" among the line items.
So, if a vendor invoice has a $25 shipping charge and two line items, then each line item would get $12.50.
After working with the query, I got it to update the last row to the adjusted amount, but the other rows still had the original amount.
I'm going to post my entire query here (again - I'm sorry that this may not be the best looking query; however, suggestions are always welcome)
DROP TABLE #tVendorInvoiceReport
DROP TABLE #tSummary
SELECT v.Name AS Vendor ,
vii.Location ,
vi.VendorInvNumber ,
vit.Descr AS InvoiceType ,
vii.VendorInvoiceItemType ,
CONVERT(VARCHAR(10), vi.VendorInvDate, 120) VendorInvDate ,
vi.RentalFee ,
vi.RestockFee ,
vi.ShippingFee ,
SUM(vii.TotalUnitCost) TotalItemCost ,
CONVERT(MONEY, 0) TotalInvoice ,
RowID = IDENTITY( INT,1,1)
INTO #tVendorInvoiceReport
FROM dbo.vVendorInvoiceItems AS vii
JOIN dbo.VendorInvoices AS vi ON vii.VendorInvID = vi.VendorInvID
JOIN dbo.Vendors AS v ON vi.VendorID = v.VendorID
JOIN dbo.VendorInvoiceTypes AS vit ON vi.VendorInvTypeID = vit.VendorInvTypeID
WHERE vi.VendorInvDate >= '2012-01-01'
AND vi.VendorInvDate <= '2012-01-31'
GROUP BY v.Name ,
vii.Location ,
vi.VendorInvNumber ,
vit.Descr ,
vii.VendorInvoiceItemType ,
CONVERT(VARCHAR(10), vi.VendorInvDate, 120) ,
vi.RentalFee ,
vi.RestockFee ,
vi.ShippingFee
ORDER BY v.Name ,
vii.Location ,
vi.VendorInvNumber ,
vit.Descr ,
vii.VendorInvoiceItemType ,
CONVERT(VARCHAR(10), vi.VendorInvDate, 120)
SELECT VendorInvNumber ,
COUNT(RowID) TotalLines ,
MAX(RowID) LastLine
INTO #tSummary
FROM #tVendorInvoiceReport
GROUP BY VendorInvNumber
WHILE ( SELECT COUNT(LastLine)
FROM #tSummary AS ts
WHERE TotalLines > 1
) > 0
BEGIN
DECLARE @LastLine INT
DECLARE @NumItems INT

SET @LastLine = ( SELECT MAX(LastLine)
                  FROM #tSummary AS ts
                  WHERE TotalLines > 1
                )
SET @NumItems = ( SELECT COUNT(VendorInvNumber)
                  FROM #tVendorInvoiceReport
                  WHERE VendorInvNumber IN (
                        SELECT VendorInvNumber
                        FROM #tSummary
                        WHERE LastLine = @LastLine )
                )

UPDATE #tVendorInvoiceReport
SET RentalFee = ( RentalFee / @NumItems ) ,
    RestockFee = ( RestockFee / @NumItems ) ,
    ShippingFee = ( ShippingFee / @NumItems )
WHERE RowID = @LastLine

DELETE FROM #tSummary
WHERE LastLine = @LastLine
--PRINT @NumItems
END
UPDATE #tVendorInvoiceReport
SET TotalInvoice = ( TotalItemCost + RentalFee + RestockFee + ShippingFee )
SELECT Vendor ,
Location ,
VendorInvNumber ,
InvoiceType ,
VendorInvoiceItemType ,
VendorInvDate ,
RentalFee ,
RestockFee ,
ShippingFee ,
TotalItemCost ,
TotalInvoice
FROM #tVendorInvoiceReport AS tvir
I sincerely appreciate anyone who took the time to read this and attempt to point me in the right direction.
Thank you,
Andrew
PS - I did try removing "WHERE RowID = @LastLine" from the first UPDATE, but that changed the shipping fee for the first line with two items to "0.0868" instead of $12.50 ($25/2).
If I understand correctly, you're looking for a way to split something like an invoice shipping fee over one or more invoice items.
I created the sample invoice and invoice item tables shown below and used the OVER(PARTITION BY) clause to split out the shipping per item.
-- sample tables
declare @Invoice table (InvoiceID int, customerID int, Date datetime, ShippingFee float)
declare @InvoiceItem table (InvoiceItemID int identity, InvoiceID int, ItemDesc varchar(50), Quantity float, ItemPrice float)

-- Example 1
insert @Invoice values(1, 800, getdate(), 20);
insert @InvoiceItem values(1, 'Widget', 1, 10.00)
insert @InvoiceItem values(1, 'Wing Nut', 5, 2.00)
insert @InvoiceItem values(1, 'Doodad', 8, 0.50)
insert @InvoiceItem values(1, 'Thingy', 3, 1.00)

-- Example 2
insert @Invoice values(2, 815, getdate(), 15);
insert @InvoiceItem values(2, 'Green Stuff', 10, 1.00)
insert @InvoiceItem values(2, 'Blue Stuff', 10, 1.60)

-- Example 3
insert @Invoice values(3, 789, getdate(), 15);
insert @InvoiceItem values(3, 'Widget', 10, 1.60)
-- query
select
n.InvoiceID,
n.InvoiceItemID,
n.ItemDesc,
n.Quantity,
n.ItemPrice,
ExtendedPrice = n.Quantity * n.ItemPrice,
Shipping = i.ShippingFee / count(n.InvoiceItemID) over(partition by n.InvoiceID)
from @InvoiceItem n
join @Invoice i on i.InvoiceID = n.InvoiceID
Output:
InvoiceID InvoiceItemID ItemDesc Quantity ItemPrice ExtendedPrice Shipping
1 1 Widget 1 10 10 5
1 2 Wing Nut 5 2 10 5
1 3 Doodad 8 0.5 4 5
1 4 Thingy 3 1 3 5
2 5 Green Stuff 10 1 10 7.5
2 6 Blue Stuff 10 1.6 16 7.5
3 7 Widget 10 1.6 16 15