SQL group by date difference with previous row - sql-server

I'm looking for a way to group daily datetime rows into date range intervals.
My table is something like:
id | A | B | Date
1 | 1 | 2 | 1/10/2010
2 | 1 | 2 | 2/10/2010
3 | 1 | 2 | 3/10/2010
4 | 1 | 3 | 4/10/2010
5 | 1 | 3 | 5/10/2010
6 | 1 | 2 | 6/10/2010
7 | 1 | 2 | 7/10/2010
8 | 1 | 2 | 8/10/2010
My first try was:
SELECT A, B, MIN(DATE), MAX(date)
FROM table
GROUP BY A, B
After grouping by A and B and taking the min and max of the date in my select, I get invalid results because B = 2 appears in two separate runs.
Rows with B = 2:
id | A | B | Date
1 | 1 | 2 | 1/10/2010
2 | 1 | 2 | 2/10/2010
3 | 1 | 2 | 3/10/2010
6 | 1 | 2 | 6/10/2010
7 | 1 | 2 | 7/10/2010
8 | 1 | 2 | 8/10/2010
Result of the GROUP BY (invalid):
A | B | min(Date) | max(Date)
1 | 2 | 1/10/2010 | 8/10/2010
I'm looking for how to calculate the third member of the group by...
So the expected interval results are:
.. | A | B | Start Date | End Date
.. | 1 | 2 | 1/10/2010 | 3/10/2010
.. | 1 | 3 | 4/10/2010 | 5/10/2010
.. | 1 | 2 | 6/10/2010 | 8/10/2010
I need to support SQL Server 2008
Thank you in advance for your help

The following is an easy way to deal with "islands and gaps" where you need to find gaps in consecutive dates:
SELECT A, B, StartDate = MIN([Date]), EndDate = MAX([Date])
FROM
(
SELECT *,
RN = DATEDIFF(DAY, 0, [Date]) - ROW_NUMBER() OVER (PARTITION BY A, B ORDER BY [Date])
FROM myTable
) AS T
GROUP BY A, B, RN;
To break the logic down: each date gets a day number (DATEDIFF(DAY, 0, [Date]) here) and a row number (partitioned by A and B, ordered by date). Within a run of consecutive dates both increase by one per row, so their difference stays constant. Any time there is a gap in the dates, the difference changes, and grouping by it separates the runs.
There are a variety of resources you can use to understand different approaches to "islands and gaps" problems. Here is one that might help you with tackling other varieties of this in the future: https://www.red-gate.com/simple-talk/sql/t-sql-programming/the-sql-of-gaps-and-islands-in-sequences/
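To see why the "day number minus row number" value identifies each run, here is a minimal Python sketch of the same arithmetic. It is only an illustration, not a replacement for the query: the function name islands is made up for this example, the dates from the question are assumed to be day/month/year, and it assumes one row per date within each (A, B) pair.

```python
from datetime import date
from itertools import groupby

# Sample rows from the question: (id, A, B, Date).
# Assumption: the dates are day/month/year, i.e. 1 to 8 October 2010.
rows = [
    (1, 1, 2, date(2010, 10, 1)),
    (2, 1, 2, date(2010, 10, 2)),
    (3, 1, 2, date(2010, 10, 3)),
    (4, 1, 3, date(2010, 10, 4)),
    (5, 1, 3, date(2010, 10, 5)),
    (6, 1, 2, date(2010, 10, 6)),
    (7, 1, 2, date(2010, 10, 7)),
    (8, 1, 2, date(2010, 10, 8)),
]


def islands(rows):
    """Return (A, B, start, end) for each run of consecutive dates per (A, B)."""
    # Mirror PARTITION BY A, B ORDER BY [Date].
    ordered = sorted(rows, key=lambda r: (r[1], r[2], r[3]))
    keyed = []
    for (a, b), part in groupby(ordered, key=lambda r: (r[1], r[2])):
        for row_number, row in enumerate(part, start=1):
            d = row[3]
            # Day number minus row number is constant inside a run of
            # consecutive days and jumps whenever a day is missing.
            rn = d.toordinal() - row_number
            keyed.append((a, b, rn, d))
    result = []
    # Mirror GROUP BY A, B, RN with MIN and MAX of the date.
    for (a, b, _rn), grp in groupby(keyed, key=lambda k: k[:3]):
        dates = [k[3] for k in grp]
        result.append((a, b, min(dates), max(dates)))
    return sorted(result, key=lambda r: r[2])


for a, b, start, end in islands(rows):
    print(a, b, start, end)
```

On the sample data this yields the three intervals from the question, with B = 2 appearing twice.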

Related

Calculating week numbers from custom dates

I have client IDs and their login dates. I want to calculate the week number relative to each client's first login date.
I am fairly new to SQL.
Demo output
ClientID Date of login Week Number
1 2019-12-20 1
1 2019-12-21 1
1 2019-12-21 1
1 2019-12-22 1
1 2019-12-29 2
1 2019-12-29 2
2 2020-01-27 1
2 2020-01-28 1
2 2020-02-05 2
2 2020-02-06 2
2 2020-02-16 3
This is very trivial date arithmetic that just requires the min DateOfLogin for each ClientID, which you can find with a windowed function.
Calculate the datediff in days between this date and the current DateOfLogin, integer divide by 7 (which discards any fractional week) and then add 1 so that the first week is numbered 1 rather than 0:
declare @l table(ClientID int, DateOfLogin date);
insert into @l values(1,'2019-12-20'),(1,'2019-12-21'),(1,'2019-12-21'),(1,'2019-12-22'),(1,'2019-12-29'),(1,'2019-12-29'),(2,'2020-01-27'),(2,'2020-01-28'),(2,'2020-02-05'),(2,'2020-02-06'),(2,'2020-02-16');
select ClientID
,DateOfLogin
,(datediff(day,min(DateOfLogin) over (partition by ClientID),DateOfLogin) / 7) + 1 as WeekNum
from @l;
Output
+----------+-------------+---------+
| ClientID | DateOfLogin | WeekNum |
+----------+-------------+---------+
| 1 | 2019-12-20 | 1 |
| 1 | 2019-12-21 | 1 |
| 1 | 2019-12-21 | 1 |
| 1 | 2019-12-22 | 1 |
| 1 | 2019-12-29 | 2 |
| 1 | 2019-12-29 | 2 |
| 2 | 2020-01-27 | 1 |
| 2 | 2020-01-28 | 1 |
| 2 | 2020-02-05 | 2 |
| 2 | 2020-02-06 | 2 |
| 2 | 2020-02-16 | 3 |
+----------+-------------+---------+
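The arithmetic can be checked outside the database. The following is a small Python sketch of the same calculation, using the sample data above; the function name week_numbers is hypothetical and only used for this illustration.

```python
from datetime import date

# Sample data from the question: (ClientID, DateOfLogin).
logins = [
    (1, date(2019, 12, 20)), (1, date(2019, 12, 21)), (1, date(2019, 12, 21)),
    (1, date(2019, 12, 22)), (1, date(2019, 12, 29)), (1, date(2019, 12, 29)),
    (2, date(2020, 1, 27)), (2, date(2020, 1, 28)), (2, date(2020, 2, 5)),
    (2, date(2020, 2, 6)), (2, date(2020, 2, 16)),
]


def week_numbers(logins):
    """Return (ClientID, DateOfLogin, WeekNum) relative to each client's first login."""
    # Equivalent of min(DateOfLogin) over (partition by ClientID).
    first = {}
    for client_id, d in logins:
        first[client_id] = min(d, first.get(client_id, d))
    # Days since first login, integer divided by 7, plus 1.
    return [(c, d, (d - first[c]).days // 7 + 1) for c, d in logins]


for client_id, d, week in week_numbers(logins):
    print(client_id, d, week)
```

For example, client 2 first logs in on 2020-01-27, so 2020-02-16 is 20 days later, and 20 // 7 + 1 gives week 3.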
This query returns the calendar week number of the year.
select DATENAME(WW, '2019-12-20')
This is for MSSQL.
Here is a possible solution; you may just have to look at how you are going to do the insert and optimize it a bit.
select 1 AS 'ClientID', '2019-12-20' AS 'LogInDate', 1 AS 'Week'
into #test
insert into #test
select top(1) 1, '2020-02-05', case DATEDIFF(week,'2020-02-05',LogInDate) when 0 then week else Week +1 end from #test where ClientID = 1 order by LogInDate desc

SQL Server - identify combinations of values and assign combination identifier

I am trying to assign what amounts to a 'combinationid' to rows of my table, based on the values in the two columns below. Each product has a number of customers linked to it. For every combination of customers, I need to create a combination ID.
For example, the combination of customers for product 'a' is the same combination of customers for product 'c' (they both have customers 1, 2 and 3), so products a and c should have the same combination identifier ('customergroup'). However, products should not share the same customergroup if they only share some of the same customers - e.g. product b only has customers 1 and 2 (not 3), so should have a different customergroup to products 'a' and 'c'.
Input:
| productid | customerid |
|-----------|------------|
| a | 1 |
| a | 2 |
| a | 3 |
| b | 1 |
| b | 2 |
| c | 3 |
| c | 2 |
| c | 1 |
| d | 1 |
| d | 3 |
| e | 1 |
| e | 2 |
| f | 1 |
| g | 2 |
| h | 3 |
Desired output:
| productid | customerid | customergroup |
|-----------|------------|---------------|
| a | 1 | 1 |
| a | 2 | 1 |
| a | 3 | 1 |
| b | 1 | 2 |
| b | 2 | 2 |
| c | 3 | 1 |
| c | 2 | 1 |
| c | 1 | 1 |
| d | 1 | 3 |
| d | 3 | 3 |
| e | 1 | 2 |
| e | 2 | 2 |
| f | 1 | 4 |
| g | 2 | 5 |
| h | 3 | 6 |
or just
| productid | customergroupid |
|-----------|-----------------|
| a | 1 |
| b | 2 |
| c | 1 |
| d | 3 |
| e | 2 |
| f | 4 |
| g | 5 |
| h | 6 |
Edit: first version of this did include a description of my attempts. I currently have nested queries that basically give me a column for customer 1, 2, 3 etc and then uses dense rank to get the grouping. The problem is that is not dynamic for different numbers of customers and I did not know where to start for getting a dynamic result as above. Thanks for the replies.
Considering you haven't shown your efforts, or confirmed the version you're using, I've assumed you have the latest ("and greatest") version of SQL Server, which means you have access to STRING_AGG.
This doesn't give the groupings in the same order, but I'm going to also assume that doesn't matter, and that the grouping number is just arbitrary. This gives you the following:
WITH VTE AS(
SELECT *
FROM (VALUES('a',1),
('a',2),
('a',3),
('b',1),
('b',2),
('c',3),
('c',2),
('c',1),
('d',1),
('d',3),
('e',1),
('e',2),
('f',1),
('g',2),
('h',3)) V(productid,customerid)),
Groups AS(
SELECT productid,
STRING_AGG(customerid,',') WITHIN GROUP (ORDER BY customerid) AS CustomerIDs
FROM VTE
GROUP BY productid),
Rankings AS(
SELECT productid,
CustomerIDs,
DENSE_RANK() OVER (ORDER BY CustomerIDs ASC) AS Grouping
FROM Groups)
SELECT V.productid,
V.customerid,
R.Grouping AS customergroupid
FROM VTE V
JOIN Rankings R ON V.productid = R.productid
ORDER BY V.productid,
V.customerid;
db<>fiddle.
If you aren't using SQL Server 2017, I suggest looking up the FOR XML PATH method for string aggregation.
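The idea behind both variants is the same: build one canonical string per product from its sorted customer IDs, then number the distinct strings. A rough Python sketch of that idea follows; the function name customer_groups is invented for this example, and the group numbers depend on string sort order, so they are arbitrary labels just as in the SQL.

```python
# Sample data from the question: (productid, customerid).
pairs = [
    ("a", 1), ("a", 2), ("a", 3),
    ("b", 1), ("b", 2),
    ("c", 3), ("c", 2), ("c", 1),
    ("d", 1), ("d", 3),
    ("e", 1), ("e", 2),
    ("f", 1), ("g", 2), ("h", 3),
]


def customer_groups(pairs):
    """Map each productid to a group number shared by identical customer sets."""
    by_product = {}
    for product, customer in pairs:
        by_product.setdefault(product, []).append(customer)
    # Canonical key per product: sorted customer ids joined with commas,
    # like STRING_AGG(...) WITHIN GROUP (ORDER BY customerid).
    keys = {
        product: ",".join(str(c) for c in sorted(customers))
        for product, customers in by_product.items()
    }
    # Dense rank: one consecutive number per distinct key, in sorted key order.
    rank = {key: i for i, key in enumerate(sorted(set(keys.values())), start=1)}
    return {product: rank[key] for product, key in keys.items()}


for product, group in sorted(customer_groups(pairs).items()):
    print(product, group)
```

Products a and c end up in the same group, as do b and e, while every other product gets a group of its own.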
Using Larnu's answer this is how I got the result for 2008:
WITH VTE AS(
SELECT *
FROM (VALUES('a','1'),
('a','2'),
('a','3'),
('b','1'),
('b','2'),
('c','3'),
('c','2'),
('c','1'),
('d','1'),
('d','3'),
('e','1'),
('e','2'),
('f','1'),
('g','2'),
('h','3')) V(productid,customerid)),
Groups AS(
SELECT productid, CustomerIDs = STUFF((SELECT N', ' + customerid
FROM VTE AS p2
WHERE p2.productid = p.productid
ORDER BY customerid
FOR XML PATH(N'')), 1, 2, N'')
FROM VTE AS p
GROUP BY productid),
Rankings AS(
SELECT productid,
CustomerIDs,
DENSE_RANK() OVER (ORDER BY CustomerIDs ASC) AS Grouping
FROM Groups)
SELECT V.productid,
V.customerid,
R.Grouping AS customergroupid
FROM VTE V
JOIN Rankings R ON V.productid = R.productid
ORDER BY V.productid,
V.customerid;
Thanks again for your assistance.

partitioning and selecting clusters with multiple records

The title of the question might be confusing, so I'll put my issue into words:
I have a table with master_ids, ids and years. A master_id can contain different ids. Each Id is associated with a year. I already partitioned by master_id and gave each year a rank (year_rank).
+-----------+----+------+-----------+
| master_id | id | year | year_rank |
+-----------+----+------+-----------+
| 100 | 1 | 2017 | 1 |
| 100 | 2 | 2016 | 2 |
| 100 | 3 | 2015 | 3 |
| 200 | 9 | 2001 | 1 |
| 300 | 5 | 2020 | 1 |
| 300 | 4 | 2010 | 2 |
| 400 | 7 | 1999 | 1 |
| 400 | 11 | 1996 | 2 |
| 500 | 20 | 1999 | 1 |
| 600 | 25 | 2005 | 1 |
| 600 | 29 | 2005 | 1 |
+-----------+----+------+-----------+
My goal is to pick only the clusters which have more than 1 record in order to compare them:
+-----------+----+------+-----------+
| master_id | id | year | year_rank |
+-----------+----+------+-----------+
| 100 | 1 | 2017 | 1 |
| 100 | 2 | 2016 | 2 |
| 100 | 3 | 2015 | 3 |
| 300 | 5 | 2020 | 1 |
| 300 | 4 | 2010 | 2 |
| 400 | 7 | 1999 | 1 |
| 400 | 11 | 1996 | 2 |
+-----------+----+------+-----------+
If I put where year_rank > 1 it eliminates the first rows in the clusters with multiple records which I don't want. How can I solve this? I thought about a group by but I don't know how to apply this.
Thank you very much!
Edit: Completely updated for new requirement. This will only show records for master_ids which have multiple years associated with them, however it will show all records associated for that master_id even if they are in the same year (see 600 vs 700).
SQLFiddle here
We compute your year_rank in cte1 so that cte2 can aggregate it with MAX() and keep only the master_ids whose maximum rank is greater than 1 (or whatever threshold you want to put there). We then query cte1 and join to cte2 to show only the records for master_ids that have multiple years associated with them.
WITH cte1 AS (
SELECT
master_id,
id,
year,
RANK() OVER (PARTITION BY master_id ORDER BY year DESC) AS year_rank
FROM tbl
),
cte2 AS (
SELECT
master_id
FROM cte1
GROUP BY master_id
HAVING MAX(year_rank) > 1
)
SELECT
cte1.master_id,
cte1.id,
cte1.year,
cte1.year_rank
FROM cte1
JOIN cte2 ON
cte1.master_id = cte2.master_id
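As a sanity check on the logic, here is a short Python sketch of the same filter using the sample rows from the question. The function name multi_year_clusters is made up for the illustration. Because RANK() gives tied years the same rank, MAX(year_rank) > 1 holds exactly when a master_id has at least two distinct years, which is what the sketch tests.

```python
# Sample data from the question: (master_id, id, year).
rows = [
    (100, 1, 2017), (100, 2, 2016), (100, 3, 2015),
    (200, 9, 2001),
    (300, 5, 2020), (300, 4, 2010),
    (400, 7, 1999), (400, 11, 1996),
    (500, 20, 1999),
    (600, 25, 2005), (600, 29, 2005),
]


def multi_year_clusters(rows):
    """Keep rows whose master_id has more than one distinct year; add year_rank."""
    years = {}
    for master_id, _id, year in rows:
        years.setdefault(master_id, []).append(year)
    out = []
    for master_id, id_, year in rows:
        # RANK() OVER (PARTITION BY master_id ORDER BY year DESC):
        # 1 plus the number of rows in the partition with a later year.
        year_rank = 1 + sum(1 for y in years[master_id] if y > year)
        # Same effect as HAVING MAX(year_rank) > 1 on the partition.
        if len(set(years[master_id])) > 1:
            out.append((master_id, id_, year, year_rank))
    return out


for row in multi_year_clusters(rows):
    print(row)
```

Master IDs 200 and 500 drop out because they have a single row, and 600 drops out because both of its rows share the year 2005.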
I figured out how to flag rows which don't have a discrepancy in years within their master_id:
select *,
case
-- ordering by master_id, year keeps same-year rows of a master_id adjacent
when (master_id = (lead(master_id) over (order by master_id, year))) and
(year = (lead(year) over (order by master_id, year))) then 'no show'
when (master_id = (lag(master_id) over (order by master_id, year))) and
(year = (lag(year) over (order by master_id, year))) then 'no show'
else ''
end as note
from tbl
Now I can put all of that into a temp table and delete the records which have 'no show' in the note column.
What do you think of this? Is there an easier way?

Selecting grouped rows after first two rows SQL Server

This is a bit of a tricky question/situation and my search-fu failed me.
Let's say I have the following data:
| UID | SharedID | Type | Date |
|-----|----------|------|-----------|
| 1 | 1 | foo | 2/4/2016 |
| 2 | 1 | foo | 2/5/2016 |
| 3 | 1 | foo | 2/8/2016 |
| 4 | 1 | foo | 2/11/2016 |
| 5 | 2 | bar | 1/11/2016 |
| 6 | 2 | bar | 2/11/2016 |
| 7 | 3 | baz | 2/1/2016 |
| 8 | 3 | baz | 2/3/2016 |
| 9 | 3 | baz | 2/11/2016 |
I would like to omit a variable number of leading rows per SharedID (the most recent dates in this case); let's say that number is 2 in this example. The resulting table would be something like this:
| UID | SharedID | Type | Date |
|-----|----------|------|-----------|
| 1 | 1 | foo | 2/4/2016 |
| 2 | 1 | foo | 2/5/2016 |
| 7 | 3 | baz | 2/1/2016 |
Is this possible in SQL? Essentially I want to filter on an unknown number of rows which uses the date column as the order by. The goal is to get the oldest types and get a list of UID's in the process.
Sure, it's possible. Use a ROW_NUMBER function to assign a value to each row, partitioning by the SharedID column so that the count restarts every time that ID changes, and select those rows with a value greater than your limit.
WITH cteNumberedRows AS (
SELECT UID, SharedID, Type, Date,
ROW_NUMBER() OVER(PARTITION BY SharedID ORDER BY Date DESC) AS RowNum
FROM YourTable
)
SELECT UID, SharedID, Type, Date
FROM cteNumberedRows
WHERE RowNum > 2;
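The same "number the rows per group, then skip the first N" logic can be sketched in Python to check it against the sample data. The function name skip_latest is hypothetical, and the dates are assumed to be month/day/year, which is the reading that matches the expected result in the question.

```python
from datetime import date

# Sample data from the question: (UID, SharedID, Type, Date).
# Assumption: dates are month/day/year.
rows = [
    (1, 1, "foo", date(2016, 2, 4)),
    (2, 1, "foo", date(2016, 2, 5)),
    (3, 1, "foo", date(2016, 2, 8)),
    (4, 1, "foo", date(2016, 2, 11)),
    (5, 2, "bar", date(2016, 1, 11)),
    (6, 2, "bar", date(2016, 2, 11)),
    (7, 3, "baz", date(2016, 2, 1)),
    (8, 3, "baz", date(2016, 2, 3)),
    (9, 3, "baz", date(2016, 2, 11)),
]


def skip_latest(rows, n):
    """Drop the n most recent rows of each SharedID and return the rest by UID."""
    by_group = {}
    for row in rows:
        by_group.setdefault(row[1], []).append(row)
    kept = []
    for group in by_group.values():
        # ORDER BY Date DESC within the partition; position + 1 is RowNum.
        newest_first = sorted(group, key=lambda r: r[3], reverse=True)
        # Slicing off the first n rows is the same as WHERE RowNum > n.
        kept.extend(newest_first[n:])
    return sorted(kept)


for row in skip_latest(rows, 2):
    print(row)
```

With n = 2 only UIDs 1, 2 and 7 remain, and SharedID 2 disappears entirely because it has just two rows.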
Not sure if I understand what you mean, but something like this?
SELECT t1.*
FROM MyTable t1
WHERE t1.UID NOT IN (
SELECT TOP 2 t2.UID FROM MyTable t2
WHERE t2.SharedID = t1.SharedID
ORDER BY t2.[Date] DESC
)

Sql server join by group?

I have this table :
id | type | date
1 | a | 01/1/2012
2 | b | 01/1/2012
3 | b | 01/2/2012
4 | b | 01/3/2012
5 | a | 01/5/2012
6 | b | 01/5/2012
7 | b | 01/9/2012
8 | a | 01/10/2012
The point of view is per date: if 2 rows have the same date, both should be visible on the same line (left join).
The same date can be shared by 2 rows at most,
so this situation can't occur:
1 | a | 01/1/2012
2 | b | 01/1/2012
3 | a | 01/1/2012
If a date has both an a row and a b row, show both of them on a single line using a left join.
If a date has only an a row, show it on a single line (with null on the right side).
If a date has only a b row, show it on a single line (with null on the left side).
Desired result :
Date |typeA|typeB |a'id|b'id
01/1/2012 | a | b | 1 | 2
01/2/2012 | | b | | 3
01/3/2012 | | b | | 4
01/5/2012 | a | b | 5 | 6
01/9/2012 | | b | | 7
01/10/2012 | a | | 8 |
I know this is supposed to be simple, but the main anchor of the join here is the date.
The problem I've encountered is this: when I read line 1, I search the table for all rows with the same date, which is fine.
But when I read the second line, I do the same, and it yields the first row, which was already counted.
Any help?
Here is the SQL fiddle:
https://data.stackexchange.com/stackoverflow/query/edit/82605
I think you want a pivot
select
[date],
case when [a] IS null then null else 'a' end typea,
case when [b] IS null then null else 'b' end typeb,
a as aid,
b as bid
from yourtable src
pivot (max(id) for type in ([a],[b]))p
If you want to do it with joins..
select ISNULL(a.date, b.date), a.type,b.type, a.id,b.id
from
(select * from yourtable where type='a') a
full outer join
(select * from yourtable where type='b') b
on a.date = b.date
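Both queries boil down to collecting, per date, the id of the a row and the id of the b row, with a missing side left as null. A minimal Python sketch of that pairing is below; the function name pair_by_date is made up for this example and the dates are kept as the strings used in the question.

```python
# Sample data from the question: (id, type, date).
rows = [
    (1, "a", "01/1/2012"),
    (2, "b", "01/1/2012"),
    (3, "b", "01/2/2012"),
    (4, "b", "01/3/2012"),
    (5, "a", "01/5/2012"),
    (6, "b", "01/5/2012"),
    (7, "b", "01/9/2012"),
    (8, "a", "01/10/2012"),
]


def pair_by_date(rows):
    """Return (date, a_id, b_id) per date; a missing side is None."""
    by_date = {}
    for id_, type_, d in rows:
        # At most one row per type and date, as stated in the question.
        by_date.setdefault(d, {})[type_] = id_
    # dict keeps insertion order, so dates come out in first-seen order.
    return [(d, ids.get("a"), ids.get("b")) for d, ids in by_date.items()]


for d, a_id, b_id in pair_by_date(rows):
    print(d, a_id, b_id)
```

Each date appears exactly once in the output, which is the behaviour the full outer join (or the pivot) gives and which a row-by-row lookup does not.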

Resources