Group Non-Contiguous Dates By Criteria In Column

Group Non-Contiguous Dates By Criteria In Column - sql-server

I have a table with start and end dates for team consultations with customers.
I need to merge certain consultations based on a number of days specified in another column (sometimes the consultations may overlap, sometimes they are contiguous, sometimes they arent), Team and Type.
Some example data is as follows:
DECLARE #TempTable TABLE([CUSTOMER_ID] INT
,[TEAM] VARCHAR(1)
,[TYPE] VARCHAR(1)
,[START_DATE] DATETIME
,[END_DATE] DATETIME
,[GROUP_DAYS_CRITERIA] INT)
INSERT INTO #TempTable VALUES (1,'A','A','2013-08-07','2013-12-31',28)
,(2,'B','A','2015-05-15','2015-05-28',28)
,(2,'B','A','2015-05-15','2016-05-12',28)
,(2,'B','A','2015-05-28','2015-05-28',28)
,(3,'C','A','2013-05-27','2014-07-23',28)
,(3,'C','A','2015-01-12','2015-05-28',28)
,(3,'B','A','2015-01-12','2015-05-28',28)
,(3,'C','A','2015-05-28','2015-05-28',28)
,(3,'C','A','2015-05-28','2015-12-17',28)
,(4,'A','B','2013-07-09','2014-04-21',7)
,(4,'A','B','2014-04-29','2014-08-01',7)
Which looks like this:
+-------------+------+------+------------+------------+---------------------+
| CUSTOMER_ID | TEAM | TYPE | START_DATE | END_DATE | GROUP_DAYS_CRITERIA |
+-------------+------+------+------------+------------+---------------------+
| 1 | A | A | 07/08/2013 | 31/12/2013 | 28 |
| 2 | B | A | 15/05/2015 | 28/05/2015 | 28 |
| 2 | B | A | 15/05/2015 | 12/05/2016 | 28 |
| 2 | B | A | 28/05/2015 | 28/05/2015 | 28 |
| 3 | C | A | 27/05/2013 | 23/07/2014 | 28 |
| 3 | C | A | 12/01/2015 | 28/05/2015 | 28 |
| 3 | B | A | 12/01/2015 | 28/05/2015 | 28 |
| 3 | C | A | 28/05/2015 | 28/05/2015 | 28 |
| 3 | C | A | 28/05/2015 | 17/12/2015 | 28 |
| 4 | A | B | 09/07/2013 | 21/04/2014 | 7 |
| 4 | A | B | 29/04/2014 | 01/08/2014 | 7 |
+-------------+------+------+------------+------------+---------------------+
My desired output is as follows:
+-------------+------+------+------------+------------+---------------------+
| CUSTOMER_ID | TEAM | TYPE | START_DATE | END_DATE | GROUP_DAYS_CRITERIA |
+-------------+------+------+------------+------------+---------------------+
| 1 | A | A | 07/08/2013 | 31/12/2013 | 28 |
| 2 | B | A | 15/05/2015 | 12/05/2016 | 28 |
| 3 | C | A | 27/05/2013 | 23/07/2014 | 28 |
| 3 | C | A | 12/01/2015 | 17/12/2015 | 28 |
| 3 | B | A | 12/01/2015 | 28/05/2015 | 28 |
| 4 | A | B | 09/07/2013 | 21/04/2014 | 7 |
| 4 | A | B | 29/04/2014 | 01/08/2014 | 7 |
+-------------+------+------+------------+------------+---------------------+
I am struggling to do this at all, let alone with any efficiency! Any ideas / code will be greatly received.
Server version is MS SQL Server 2014
Thanks,
Dan

If I am understanding your question correctly, we want to return rows only when a second, third, etc consultation has not occurred within group_days_criteria number of days after the previous consultation end date.
We can get the previous consultation end date and eliminate rows (since we are not concerned with the number of consultations) where a consultation occurred for the same customer by the same team and of the same consultation type within our date range.
DECLARE #TempTable TABLE([CUSTOMER_ID] INT
,[TEAM] VARCHAR(1)
,[TYPE] VARCHAR(1)
,[START_DATE] DATETIME
,[END_DATE] DATETIME
,[GROUP_DAYS_CRITERIA] INT)
INSERT INTO #TempTable VALUES (1,'A','A','2013-08-07','2013-12-31',28)
,(2,'B','A','2015-05-15','2015-05-28',28)
,(2,'B','A','2015-05-15','2016-05-12',28)
,(2,'B','A','2015-05-28','2015-05-28',28)
,(3,'C','A','2013-05-27','2014-07-23',28)
,(3,'C','A','2015-01-12','2015-05-28',28)
,(3,'B','A','2015-01-12','2015-05-28',28)
,(3,'C','A','2015-05-28','2015-05-28',28)
,(3,'C','A','2015-05-28','2015-12-17',28)
,(4,'A','B','2013-07-09','2014-04-21',7)
,(4,'A','B','2014-04-29','2014-08-01',7)
;with prep as (
select Customer_ID,
Team,
[Type],
[Start_Date],
[End_Date],
Group_Days_Criteria,
ROW_NUMBER() over (partition by customer_id, team, [type] order by [start_date] asc, [end_date] desc) as rn, -- earliest start date with latest end date
lag([End_Date] + Group_Days_Criteria, 1, 0) over (partition by customer_id, team, [type] order by [start_date] asc, [end_date] desc) as PreviousEndDate -- previous end date +
from #TempTable
)
select p.Customer_Id,
p.[Team],
p.[Type],
p.[Start_Date],
p.[End_Date],
p.Group_Days_Criteria
from prep p
where p.rn = 1
or (p.rn != 1 and p.[Start_date] > p.PreviousEndDate)
order by p.Customer_Id, p.[Team], p.[Start_Date], p.[Type]
This returned the desired result set.

Related

How to generate IDs based on column values

I will provide examples and code where I can. Assume everything except [CycleStart] and [CycleEnd] datatypes are Varchar, I'm not too fussed about this at this stage.
Table A consists of the following RAW sample data:
+-------+---------+----------------+------------+------------+
| JobID | JobName | CycleDesc | CycleStart | CycleEnd |
+-------+---------+----------------+------------+------------+
| 10003 | Run1 | January 2019 | 31/12/2018 | 31/12/2018 |
| 10005 | Run2 | December 2018 | 31/12/2017 | 31/11/2018 |
| 10006 | Run3 | March 2019 | 31/12/2018 | 31/02/2019 |
| 10007 | Run4 | September 2019 | 31/12/2018 | 31/09/2019 |
| 10008 | Run5 | November 2019 | 31/12/2018 | 31/10/2019 |
+-------+---------+----------------+------------+------------+
Table B consists of the following sample data and the code used to generate this data is below:
+-------+---------+---------+
| JobID | PeriodID | Entity |
+-------+---------+---------+
| 10003 | 202101 | XYZ1 |
| 10003 | 202112 | XYZ2 |
| 10007 | 202008 | XYZ3 |
| 10007 | 202003 | XYZ4 |
| 10008 | 201904 | XYZ5 |
+-------+----------+--------+
Declare #Counter3 INT
SELECT #Counter3=1
WHILE #Counter3 <= 1000
BEGIN
INSERT INTO [dbo].[TableB]
SELECT
FLOOR(RAND()*(33979-1+1))+1 [JobID]
,CAST(ROUND(((2021 - 2019 -1) * RAND() + 2020), 0) AS VARCHAR) + RIGHT('0'+CAST(FLOOR(RAND()*(12-1+1))+1 AS VARCHAR),2) [PeriodID]
,FLOOR(RAND()*(23396-1+1))+1 [Entity]
The issue lies within Table B column [PeriodID]. This column represents an ID generated from [CycleStart] in Table A e.g. 31/12/2018 = 201812 (YYYYMM).
What I want to show in Table B is a Period ID for each Job ID but show EACH month + 30 years ahead of the [CycleStart] date. Example table of what I am looking to achieve:
+-------+---------+---------+
| JobID | PeriodID | Entity |
+-------+---------+---------+
| 10006 | 201812 | XYZ1 |
| 10006 | 201901 | XYZ2 |
| 10006 | 201902 | XYZ3 |
| 10006 | 201903 | XYZ4 |
| 10006 | 201904 | XYZ5 |
| 10006 | 201905 | XYZ5 |
| 10006 | 201906 | XYZ5 |
| 10006 | 201907 | XYZ5 |
| ... | +30yrs | ... |
| 10006 | 204812 | XYZ5 |
+-------+----------+--------+
How can I achieve this? Currently I am just randomly generating IDs which is not related to the [CycleStart] date and therefore just skewing my data but this is the only way I can think of doing it.

The best way is to create a calendar table / date dimension. You can use this table to solve this issue, and reuse it for other problems later. (Search online for some examples on how to build one).
If you have this table then you only need to join this table and that's it.
e.g.
INSERT INTO TableB ( JobID , PeriodID)
SELECT DISTINCT A.JobID , D.TheYear * 100 + D.TheMonth
FROM tableA A
JOIN myDateTable D
ON D.TheDate BETWEEN CONVERT(date , A.CycleStart , 103) AND DATEADD(YEAR,30, CONVERT(date , A.CycleStart , 103));

T-SQL: Values are grouped by month, if there is no value for a month the month should also appear and display "NULL"

i have a SQL that displays turnover, stock and other values for stores grouped by month. Logically, if there is no value for a month, the month doesn't appear. The target is that the empty month should appear and display "NULL" for the values. The empty months should range from the #FROM to the #TO parameter (201807 to 201907) in this case.
Before:
+-------+--------+----------+----------+-------+
| Store | Month | Incoming | Turnover | Stock |
+-------+--------+----------+----------+-------+
| 123 | 201810 | 5 | 4 | 1 |
| 123 | 201811 | 0 | 1 | 0 |
| 123 | 201901 | 25 | 5 | 20 |
| 123 | 201902 | 5 | 10 | 15 |
| 123 | 201903 | 8 | 9 | 14 |
| 123 | 201904 | 5 | 4 | 15 |
| 123 | 201905 | 10 | 5 | 20 |
+-------+--------+----------+----------+-------+
After:
+-------+--------+----------+----------+-------+
| Store | Month | Incoming | Turnover | Stock |
+-------+--------+----------+----------+-------+
| 123 | 201807 | NULL | NULL | NULL |
| 123 | 201808 | NULL | NULL | NULL |
| 123 | 201809 | NULL | NULL | NULL |
| 123 | 201810 | 5 | 4 | 1 |
| 123 | 201811 | 0 | 1 | 0 |
| 123 | 201812 | NULL | NULL | NULL |
| 123 | 201901 | 25 | 5 | 20 |
| 123 | 201902 | 5 | 10 | 15 |
| 123 | 201903 | 8 | 9 | 14 |
| 123 | 201904 | 5 | 4 | 15 |
| 123 | 201905 | 10 | 5 | 20 |
| 123 | 201906 | NULL | NULL | NULL |
| 123 | 201907 | NULL | NULL | NULL |
+-------+--------+----------+----------+-------+
Code Example: db<>fiddle
I have absolutely no idea how to solve this and will thank you in advance for your help! :)

You can try to use cte recursive make a calendar table, then do outer-join
;WITH CTE AS (
SELECT CAST(CAST(#FROM AS VARCHAR(10)) + '01' AS DATE) fromDt,
CAST(CAST(#TO AS VARCHAR(10)) + '01' AS DATE) toDt,
Store
FROM (SELECT DISTINCT Store FROM #Test) t1
UNION ALL
SELECT DATEADD(MONTH,1,fromDt),toDt,Store
FROM CTE
WHERE DATEADD(MONTH,1,fromDt) <= toDt
)
SELECT FORMAT(fromDt,'yyyyMM') Month,
c.Store,
t.Incoming,
t.Turnover,
t.Stock
FROM CTE c
LEFT JOIN #Test t on
c.fromDt = CAST(CAST(t.Month AS VARCHAR(10)) + '01' AS DATE)
and
c.Store = t.Store
sqlfiddle

Split Time Frequency To Rows

I am trying to split a time frequency that has a start time, an end time, a frequency and a duration into separate rows. Here is some example data:
+------+------------+----------+-----------------+---------------+
| Name | Start_Time | End_Time | Frequency_Hours | Duration_Mins |
+------+------------+----------+-----------------+---------------+
| A | 08:00:00 | 18:00:00 | 2 | 2 |
| B | 00:00:00 | 23:59:59 | 1 | 5 |
| C | 00:00:00 | 23:59:59 | 4 | 15 |
+------+------------+----------+-----------------+---------------+
Can be created using the following query:
DECLARE #Tmp AS TABLE(Name VARCHAR(128)
,Start_Time VARCHAR(8)
,End_Time VARCHAR(8)
,Frequency_Hours INT
,Duration_Mins INT)
INSERT INTO #Tmp VALUES ('A','08:00:00', '18:00:00', 2,2)
,('B','00:00:00', '23:59:59', 1,5)
,('C','00:00:00', '23:59:59', 4,15)
Here is my desired output (I will then use this to drive a gantt chart visualisation):
+------+------------+----------+
| Name | Start_Time | End_Time |
+------+------------+----------+
| A | 08:00:00 | 08:02:00 |
| A | 10:00:00 | 10:02:00 |
| A | 12:00:00 | 12:02:00 |
| A | 14:00:00 | 14:02:00 |
| A | 16:00:00 | 16:02:00 |
| A | 18:00:00 | 18:02:00 |
| B | 00:00:00 | 00:05:00 |
| B | 01:00:00 | 01:05:00 |
| B | 02:00:00 | 02:05:00 |
| B | 03:00:00 | 03:05:00 |
| B | 04:00:00 | 04:05:00 |
| B | 05:00:00 | 05:05:00 |
| B | 06:00:00 | 06:05:00 |
| B | 07:00:00 | 07:05:00 |
| B | 08:00:00 | 08:05:00 |
| B | 09:00:00 | 09:05:00 |
| B | 10:00:00 | 10:05:00 |
| B | 11:00:00 | 11:05:00 |
| B | 12:00:00 | 12:05:00 |
| B | 13:00:00 | 13:05:00 |
| B | 14:00:00 | 14:05:00 |
| B | 15:00:00 | 15:05:00 |
| B | 16:00:00 | 16:05:00 |
| B | 17:00:00 | 17:05:00 |
| B | 18:00:00 | 18:05:00 |
| B | 19:00:00 | 19:05:00 |
| B | 20:00:00 | 20:05:00 |
| B | 21:00:00 | 21:05:00 |
| B | 22:00:00 | 22:05:00 |
| B | 23:00:00 | 23:05:00 |
| C | 00:00:00 | 00:15:00 |
| C | 04:00:00 | 04:15:00 |
| C | 08:00:00 | 08:15:00 |
| C | 12:00:00 | 12:15:00 |
| C | 16:00:00 | 16:15:00 |
| C | 20:00:00 | 20:15:00 |
+------+------------+----------+
I am hoping to be able to create a view out of this so I am trying to do it without cursors or other cpu intensive methods.
Any ideas?
Thanks,
Dan.

You could use a recursive cte like this
;WITH temp AS
(
SELECT t.Name, CAST(t.Start_Time AS time) AS CurrentStart_Time, dateadd(minute,t.Duration_Mins,CAST(t.Start_Time AS time)) AS CurrentEnd_Time, t.Frequency_Hours, CAST(t.End_Time AS time) AS End_Time
FROM #Tmp t
UNION ALL
SELECT t.Name, dateadd(hour,t.Frequency_Hours,t.CurrentStart_Time), dateadd(hour,t.Frequency_Hours,t.CurrentEnd_Time), t.Frequency_Hours, t.End_Time
FROM temp t
WHERE t.CurrentStart_Time < t.End_Time AND t.CurrentStart_Time < dateadd(hour,t.Frequency_Hours,t.CurrentStart_Time)
)
SELECT t.Name, t.CurrentStart_Time, t.CurrentEnd_Time
FROM temp t
ORDER BY t.Name
OPTION (MAXRECURSION 0)
Demo link: http://rextester.com/XJK25805

It can be done without RECURSIIVE CTE also.
If we create number instead of using
select distinct number master..spt_values then performance will be far better.
Like Number table can be populated from 1 to 100.
try this with various sample data,
declare #t table(Name varchar(20), Start_Time time(0),End_Time time(0)
, Frequency_Hours int,Duration_Mins int)
insert into #t VALUES
('A','08:00:00','18:00:00', 2 , 2 )
,('B','00:00:00','23:59:59', 1 , 5 )
,('C','00:00:00','23:59:59', 4 ,15 )
SELECT NAME
,dateadd(hour, n, Start_Time) Start_Time
,dateadd(minute, Duration_Mins, (dateadd(hour, n, Start_Time))) End_Time
FROM #t t
CROSS APPLY (
SELECT DISTINCT number * Frequency_Hours n
FROM master..spt_values
WHERE number >= 0
AND number <= datediff(HOUR, t.Start_Time, t.End_Time) / Frequency_Hours
) ca

Select a row of first non-null values

I have the following table. How do I select the first non-null value of reviewers and voting column if they have the same product_id? The first here mean the first row sorting by created_at
+------------+-----------+--------+---------------------+
| product_id | reviewers | voting | created_at |
+------------+-----------+--------+---------------------+
| B0021ZFV9M | null | null | 2015-03-20 00:34:09 |
| B0021ZFV9M | 4 | 3 | 2015-03-24 00:34:09 |
| B0021ZFV9M | null | null | 2015-04-13 00:55:51 |
| B0021ZFV9M | 30 | 4 | 2015-04-15 00:44:38 |
| B00JKO4CHO | null | null | 2015-09-17 00:41:40 |
| B00JKO4CHO | null | null | 2015-09-19 00:41:47 |
| B00JKO4CHO | 50 | 1 | 2015-09-21 00:41:31 |
+------------+-----------+--------+---------------------+
Expected
+------------+-----------+--------+---------------------+
| product_id | reviewers | voting | created_at |
+------------+-----------+--------+---------------------+
| B0021ZFV9M | 4 | 3 | 2015-03-20 00:34:09 |
| B0021ZFV9M | 4 | 3 | 2015-03-24 00:34:09 |
| B0021ZFV9M | 30 | 4 | 2015-04-13 00:55:51 |
| B0021ZFV9M | 30 | 4 | 2015-04-15 00:44:38 |
| B00JKO4CHO | 50 | 1 | 2015-09-17 00:41:40 |
| B00JKO4CHO | 50 | 1 | 2015-09-19 00:41:47 |
| B00JKO4CHO | 50 | 1 | 2015-09-21 00:41:31 |
+------------+-----------+--------+---------------------+

Try this:
select
product_id,
case
when reviewers is null then (
select reviewers from test
where product_id = a.product_id
and created_at > a.created_at
and reviewers is not null
limit 1)
else reviewers
end as reviewers,
case
when voting is null then (
select voting from test
where product_id = a.product_id
and created_at > a.created_at
and voting is not null
limit 1)
else voting
end as voting,
created_at
from test a;
Example: http://sqlfiddle.com/#!9/546dff/3
create table test (
product_id varchar(20),
reviewers int,
voting int,
created_at datetime
);
insert into test values
('B0021ZFV9M',null , null ,'2015-03-20 00:34:09')
,('B0021ZFV9M',4 , 3 ,'2015-03-24 00:34:09')
,('B0021ZFV9M',null , null ,'2015-04-13 00:55:51')
,('B0021ZFV9M',30 , 4 ,'2015-04-15 00:44:38')
,('B00JKO4CHO',null , null ,'2015-09-17 00:41:40')
,('B00JKO4CHO',null , null ,'2015-09-19 00:41:47')
,('B00JKO4CHO',50 , 1 ,'2015-09-21 00:41:31');
Result:
| product_id | reviewers | voting | created_at |
|------------|-----------|--------|-----------------------------|
| B0021ZFV9M | 4 | 3 | March, 20 2015 00:34:09 |
| B0021ZFV9M | 4 | 3 | March, 24 2015 00:34:09 |
| B0021ZFV9M | 30 | 4 | April, 13 2015 00:55:51 |
| B0021ZFV9M | 30 | 4 | April, 15 2015 00:44:38 |
| B00JKO4CHO | 50 | 1 | September, 17 2015 00:41:40 |
| B00JKO4CHO | 50 | 1 | September, 19 2015 00:41:47 |
| B00JKO4CHO | 50 | 1 | September, 21 2015 00:41:31 |
EDIT:
To update old data, you could do this:
-- create a duplicate empty table
create table test1 like test;
-- insert good data into this duplicate table
insert into test1
select
product_id,
case
when reviewers is null then (
select reviewers from test
where product_id = a.product_id
and created_at > a.created_at
and reviewers is not null
limit 1)
else reviewers
end as reviewers,
case
when voting is null then (
select voting from test
where product_id = a.product_id
and created_at > a.created_at
and voting is not null
limit 1)
else voting
end as voting,
created_at
from test a;
-- remove data from original table
truncate table test;
-- re-insert good data into original table
insert into test select * from test1;
-- drop the duplicate table
drop table test1;
Make a backup of test (original) table before you try this.

select distinct on (a.product_id, a.created_at)
a.product_id,
coalesce(a.reviewers, b.reviewers) reviewers,
coalesce(a.voting, b.voting) voting,
a.created_at
from a_table a
left join a_table b
on a.product_id = b.product_id
and b.reviewers notnull
and b.created_at > a.created_at
order by 1, 4;
SqlFiddle.
Note: it is assumed that if reviewers is not null then voting is not null too.

pivot and cascade null columns

I have a table that holds values for particular months:
| MFG | DATE | FACTOR |
-----------------------------
| 1 | 2013-01-01 | 1 |
| 2 | 2013-01-01 | 0.8 |
| 2 | 2013-02-01 | 1 |
| 2 | 2013-12-01 | 1.55 |
| 3 | 2013-01-01 | 1 |
| 3 | 2013-04-01 | 1.3 |
| 3 | 2013-05-01 | 1.2 |
| 3 | 2013-06-01 | 1.1 |
| 3 | 2013-07-01 | 1 |
| 4 | 2013-01-01 | 0.9 |
| 4 | 2013-02-01 | 1 |
| 4 | 2013-12-01 | 1.8 |
| 5 | 2013-01-01 | 1.4 |
| 5 | 2013-02-01 | 1 |
| 5 | 2013-10-01 | 1.3 |
| 5 | 2013-11-01 | 1.2 |
| 5 | 2013-12-01 | 1.5 |
What I would like to do is pivot these using a calendar table (already defined):
And finally, cascade the NULL columns to use the previous value.
What I've got so far is a query that will populate the NULLs with the last value for mfg = 3. Each mfg will always have a value for the first of the year. My question is; how do I pivot this and extend to all mfg?
SELECT c.[date],
f.[factor],
Isnull(f.[factor], (SELECT TOP 1 factor
FROM factors
WHERE [date] < c.[date]
AND [factor] IS NOT NULL
AND mfg = 3
ORDER BY [date] DESC)) AS xFactor
FROM (SELECT [date]
FROM calendar
WHERE Datepart(yy, [date]) = 2013
AND Datepart(d, [date]) = 1) c
LEFT JOIN (SELECT [date],
[factor]
FROM factors
WHERE mfg = 3) f
ON f.[date] = c.[date]
Result
| DATE | FACTOR | XFACTOR |
---------------------------------
| 2013-01-01 | 1 | 1 |
| 2013-02-01 | (null) | 1 |
| 2013-03-01 | (null) | 1 |
| 2013-04-01 | 1.3 | 1.3 |
| 2013-05-01 | 1.2 | 1.2 |
| 2013-06-01 | 1.1 | 1.1 |
| 2013-07-01 | 1 | 1 |
| 2013-08-01 | (null) | 1 |
| 2013-09-01 | (null) | 1 |
| 2013-10-01 | (null) | 1 |
| 2013-11-01 | (null) | 1 |
| 2013-12-01 | (null) | 1 |
SQL Fiddle

Don't know if you need the dates to be dynamic from the calender table or if mfg can be more than 5 but this should give you some ideas.
select *
from (
select c.date,
t.mfg,
(
select top 1 f.factor
from factors as f
where f.date <= c.date and
f.mfg = t.mfg and
f.factor is not null
order by f.date desc
) as factor
from calendar as c
cross apply(values(1),(2),(3),(4),(5)) as t(mfg)
) as t
pivot (
max(t.factor) for t.date in ([20130101], [20130201], [20130301],
[20130401], [20130501], [20130601],
[20130701], [20130801], [20130901],
[20131001], [20131101], [20131201])
) as P
SQL Fiddle

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight

Group Non-Contiguous Dates By Criteria In Column - sql-server

Related

How to generate IDs based on column values

T-SQL: Values are grouped by month, if there is no value for a month the month should also appear and display "NULL"

Split Time Frequency To Rows

Select a row of first non-null values

pivot and cascade null columns

Categories

Resources