SQL Server join by group? - sql-server

I have this table:
id | type | date
1 | a | 01/1/2012
2 | b | 01/1/2012
3 | b | 01/2/2012
4 | b | 01/3/2012
5 | a | 01/5/2012
6 | b | 01/5/2012
7 | b | 01/9/2012
8 | a | 01/10/2012
The point of view is per date: if 2 rows contain the same date, both should appear on the same line (a left join).
The same date can be shared by at most 2 rows.
So this situation can't happen:
1 | a | 01/1/2012
2 | b | 01/1/2012
3 | a | 01/1/2012
If the same date has both group a and group b, show both of them on a single line using a left join.
If a date has only group a, show it as a single line (+ NULL on the right side).
If a date has only group b, show it as a single line (+ NULL on the left side).
Desired result:
Date | typeA | typeB | a's id | b's id
01/1/2012 | a | b | 1 | 2
01/2/2012 | | b | | 3
01/3/2012 | | b | | 4
01/5/2012 | a | b | 5 | 6
01/9/2012 | | b | | 7
01/10/2012 | a | | 8 |
I know this is supposed to be simple, but the main anchor of the join here is the date.
The problem I've encountered: when I read line 1, I search the table for all rows with the same date. That works fine.
But when I read the second line, I do the same, and it yields the first row, which was already counted.
Any help?
Here is the SQL fiddle:
https://data.stackexchange.com/stackoverflow/query/edit/82605

I think you want a pivot:
select
[date],
case when [a] IS null then null else 'a' end typea,
case when [b] IS null then null else 'b' end typeb,
a as aid,
b as bid
from yourtable src
pivot (max(id) for type in ([a],[b]))p
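A PIVOT like the one above groups by the remaining column (date) and takes MAX(id) for each listed type. The sketch below reproduces that with conditional aggregation (MAX(CASE ...)), which also works in databases without a PIVOT operator. It uses Python's built-in sqlite3 module as a runnable stand-in for SQL Server and assumes the dates are stored as ISO-8601 text so that they sort correctly; the function name is illustrative.

```python
import sqlite3

# Sample rows from the question; dates stored as ISO-8601 text (an assumption).
ROWS = [
    (1, "a", "2012-01-01"), (2, "b", "2012-01-01"), (3, "b", "2012-01-02"),
    (4, "b", "2012-01-03"), (5, "a", "2012-01-05"), (6, "b", "2012-01-05"),
    (7, "b", "2012-01-09"), (8, "a", "2012-01-10"),
]


def pivot_by_date(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE yourtable (id INTEGER, type TEXT, date TEXT)")
    conn.executemany("INSERT INTO yourtable VALUES (?, ?, ?)", rows)
    # One output row per date; each MAX(CASE ...) picks the value from the
    # row of that type and yields NULL when the date has no such row.
    query = """
        SELECT date,
               MAX(CASE WHEN type = 'a' THEN 'a' END) AS typea,
               MAX(CASE WHEN type = 'b' THEN 'b' END) AS typeb,
               MAX(CASE WHEN type = 'a' THEN id END) AS aid,
               MAX(CASE WHEN type = 'b' THEN id END) AS bid
        FROM yourtable
        GROUP BY date
        ORDER BY date
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


for row in pivot_by_date(ROWS):
    print(row)
```

In SQL Server the same GROUP BY on the date with MAX(CASE WHEN [type] = 'a' THEN id END) should give the same rows as the PIVOT version.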
If you want to do it with joins:
select ISNULL(a.date, b.date) as [date], a.type as typea, b.type as typeb, a.id as aid, b.id as bid
from
(select * from yourtable where type='a') a
full outer join
(select * from yourtable where type='b') b
on a.date = b.date
order by ISNULL(a.date, b.date)
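The FULL OUTER JOIN keeps dates that exist on only one side. SQLite versions older than 3.39 have no FULL OUTER JOIN, so this sketch (again Python's sqlite3 as a stand-in, with ISO-8601 dates assumed) emulates it as the 'a' rows LEFT JOINed to the 'b' rows, plus the 'b' rows whose date has no 'a' row. Because at most two rows share a date, each date appears exactly once in the result.

```python
import sqlite3

# Sample rows from the question; dates stored as ISO-8601 text (an assumption).
ROWS = [
    (1, "a", "2012-01-01"), (2, "b", "2012-01-01"), (3, "b", "2012-01-02"),
    (4, "b", "2012-01-03"), (5, "a", "2012-01-05"), (6, "b", "2012-01-05"),
    (7, "b", "2012-01-09"), (8, "a", "2012-01-10"),
]


def full_outer_join_by_date(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE yourtable (id INTEGER, type TEXT, date TEXT)")
    conn.executemany("INSERT INTO yourtable VALUES (?, ?, ?)", rows)
    # First half: every 'a' row, with the 'b' row from the same date if any.
    # Second half: every 'b' row whose date has no 'a' row (NULLs on the left).
    query = """
        SELECT a.date, a.type, b.type, a.id, b.id
        FROM (SELECT * FROM yourtable WHERE type = 'a') AS a
        LEFT JOIN (SELECT * FROM yourtable WHERE type = 'b') AS b
            ON a.date = b.date
        UNION ALL
        SELECT b.date, NULL, b.type, NULL, b.id
        FROM yourtable AS b
        WHERE b.type = 'b'
          AND NOT EXISTS (SELECT 1 FROM yourtable AS a2
                          WHERE a2.type = 'a' AND a2.date = b.date)
        ORDER BY 1
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


for row in full_outer_join_by_date(ROWS):
    print(row)
```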

Related

DELETE TOP variable records with variable from grouping of another table

Say I have two tables: A and B
Table A
+----+-------+
| id | value |
+----+-------+
| 1 | 20 |
| 2 | 20 |
| 3 | 10 |
| 4 | 0 |
+----+-------+
Table B
+----+-------+
| id | value |
+----+-------+
| 1 | 20 |
| 2 | 10 |
| 3 | 30 |
| 4 | 20 |
| 5 | 20 |
| 6 | 10 |
+----+-------+
If I do SELECT value, COUNT(*) AS occurrence FROM A GROUP BY value, I'll get:
+-------+------------+
| value | occurrence |
+-------+------------+
| 20 | 2 |
| 10 | 1 |
| 0 | 1 |
+-------+------------+
Based on this grouping of table A, I want to delete occurrence records from table B with the same values. In other words, I want to delete from B 2 records with value 20, 1 record with value 10, and 1 record with value 0. (Other conditions include 'do nothing if no record exists' and 'smallest id first', but I think these conditions are pretty trivial compared to the bulk of this question.)
Table B after deleting should be:
+----+-------+
| id | value |
+----+-------+
| 3 | 30 |
| 5 | 20 |
| 6 | 10 |
+----+-------+
From the official TOP documentation, it doesn't seem like I can use a JOIN to supply the TOP expression.
We could use ROW_NUMBER with CTEs here:
WITH cteA AS (
SELECT value, COUNT(*) cnt
FROM A
GROUP BY value
),
cteB AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY value ORDER BY id) rn
FROM B
)
DELETE b
FROM cteB b
INNER JOIN cteA a
ON b.value = a.value
WHERE
b.rn <= a.cnt;
The logic here is that we use ROW_NUMBER to number the rows for each value in the B table, smallest id first. Then we join to bring in the count of each value in the A table, and we delete only those B records whose row number is less than or equal to the A count.
See the demo link below to verify that the logic is correct. Note that I use a SELECT there, not a DELETE, but the correct rows are being targeted for deletion.
Demo
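SQLite does not accept a join in DELETE, so this sketch (Python's sqlite3 as a stand-in for SQL Server) keeps the same two CTEs but deletes by id with WHERE id IN (...). It assumes id is unique in B; the function name is illustrative. Values that appear in A but not in B, such as 0 here, simply match nothing, which covers the "do nothing if no record exists" condition.

```python
import sqlite3


def delete_matching_counts(a_values, b_rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE A (id INTEGER PRIMARY KEY, value INTEGER)")
    conn.execute("CREATE TABLE B (id INTEGER PRIMARY KEY, value INTEGER)")
    conn.executemany("INSERT INTO A (value) VALUES (?)", [(v,) for v in a_values])
    conn.executemany("INSERT INTO B VALUES (?, ?)", b_rows)
    # cteA: how many times each value occurs in A.
    # cteB: B's rows numbered per value, smallest id first.
    # Delete the B ids whose row number is <= the count for that value in A.
    conn.execute("""
        WITH cteA AS (
            SELECT value, COUNT(*) AS cnt FROM A GROUP BY value
        ),
        cteB AS (
            SELECT id, value,
                   ROW_NUMBER() OVER (PARTITION BY value ORDER BY id) AS rn
            FROM B
        )
        DELETE FROM B
        WHERE id IN (
            SELECT b.id
            FROM cteB AS b
            JOIN cteA AS a ON b.value = a.value
            WHERE b.rn <= a.cnt
        )
    """)
    remaining = conn.execute("SELECT id, value FROM B ORDER BY id").fetchall()
    conn.close()
    return remaining


print(delete_matching_counts(
    [20, 20, 10, 0],
    [(1, 20), (2, 10), (3, 30), (4, 20), (5, 20), (6, 10)],
))
```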

SQL Server - identify combinations of values and assign combination identifier

I am trying to assign what amounts to a 'combinationid' to rows of my table, based on the values in the two columns below. Each product has a number of customers linked to it. For every combination of customers, I need to create a combination ID.
For example, the combination of customers for product 'a' is the same combination of customers for product 'c' (they both have customers 1, 2 and 3), so products a and c should have the same combination identifier ('customergroup'). However, products should not share the same customergroup if they only share some of the same customers - e.g. product b only has customers 1 and 2 (not 3), so should have a different customergroup to products 'a' and 'c'.
Input:
| productid | customerid |
|-----------|------------|
| a | 1 |
| a | 2 |
| a | 3 |
| b | 1 |
| b | 2 |
| c | 3 |
| c | 2 |
| c | 1 |
| d | 1 |
| d | 3 |
| e | 1 |
| e | 2 |
| f | 1 |
| g | 2 |
| h | 3 |
Desired output:
| productid | customerid | customergroup |
|-----------|------------|---------------|
| a | 1 | 1 |
| a | 2 | 1 |
| a | 3 | 1 |
| b | 1 | 2 |
| b | 2 | 2 |
| c | 3 | 1 |
| c | 2 | 1 |
| c | 1 | 1 |
| d | 1 | 3 |
| d | 3 | 3 |
| e | 1 | 2 |
| e | 2 | 2 |
| f | 1 | 4 |
| g | 2 | 5 |
| h | 3 | 6 |
or just
| productid | customergroupid |
|-----------|-----------------|
| a | 1 |
| b | 2 |
| c | 1 |
| d | 3 |
| e | 2 |
| f | 4 |
| g | 5 |
| h | 6 |
Edit: the first version of this didn't include a description of my attempts. I currently have nested queries that basically give me a column for customer 1, 2, 3, etc., and then use DENSE_RANK to get the grouping. The problem is that this is not dynamic for different numbers of customers, and I didn't know where to start for getting a dynamic result as above. Thanks for the replies.
Considering you haven't shown your efforts, or confirmed the version you're using, I've assumed you have the latest ("and greatest") version of SQL Server, which means you have access to STRING_AGG.
This doesn't give the groupings in the same order, but I'm also going to assume that doesn't matter and that the group numbers are arbitrary. This gives you the following:
WITH VTE AS(
SELECT *
FROM (VALUES('a',1),
('a',2),
('a',3),
('b',1),
('b',2),
('c',3),
('c',2),
('c',1),
('d',1),
('d',3),
('e',1),
('e',2),
('f',1),
('g',2),
('h',3)) V(productid,customerid)),
Groups AS(
SELECT productid,
STRING_AGG(customerid,',') WITHIN GROUP (ORDER BY customerid) AS CustomerIDs
FROM VTE
GROUP BY productid),
Rankings AS(
SELECT productid,
CustomerIDs,
DENSE_RANK() OVER (ORDER BY CustomerIDs ASC) AS Grouping
FROM Groups)
SELECT V.productid,
V.customerid,
R.Grouping AS customergroupid
FROM VTE V
JOIN Rankings R ON V.productid = R.productid
ORDER BY V.productid,
V.customerid;
db<>fiddle.
If you aren't using SQL Server 2017 or later, I suggest looking up the FOR XML PATH method for string aggregation.
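Both answers build one canonical string of sorted customer ids per product and dense-rank those strings, so products with identical customer sets get the same number. A minimal in-memory sketch of the same idea in Python, using a sorted tuple instead of a concatenated string (the function and variable names are illustrative):

```python
from collections import defaultdict

# (productid, customerid) pairs from the question.
PAIRS = [
    ("a", 1), ("a", 2), ("a", 3), ("b", 1), ("b", 2), ("c", 3), ("c", 2),
    ("c", 1), ("d", 1), ("d", 3), ("e", 1), ("e", 2), ("f", 1), ("g", 2),
    ("h", 3),
]


def customer_groups(pairs):
    """Give products with exactly the same set of customers the same group id."""
    customers = defaultdict(set)
    for product, customer in pairs:
        customers[product].add(customer)
    # Canonical signature: a sorted tuple, so {3, 2, 1} and {1, 2, 3} compare equal.
    signature = {p: tuple(sorted(cs)) for p, cs in customers.items()}
    # Dense rank: number the distinct signatures 1, 2, 3, ... in sorted order.
    rank = {sig: i for i, sig in enumerate(sorted(set(signature.values())), start=1)}
    return {p: rank[sig] for p, sig in signature.items()}


print(customer_groups(PAIRS))
```

Because the tuples are compared as integers, the group numbers can differ from the SQL versions, which sort strings; only whether two products share a number is meaningful.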
Using Larnu's answer, this is how I got the result on SQL Server 2008:
WITH VTE AS(
SELECT *
FROM (VALUES('a','1'),
('a','2'),
('a','3'),
('b','1'),
('b','2'),
('c','3'),
('c','2'),
('c','1'),
('d','1'),
('d','3'),
('e','1'),
('e','2'),
('f','1'),
('g','2'),
('h','3')) V(productid,customerid)),
Groups AS(
SELECT productid, CustomerIDs = STUFF((SELECT N', ' + customerid
FROM VTE AS p2
WHERE p2.productid = p.productid
ORDER BY customerid
FOR XML PATH(N'')), 1, 2, N'')
FROM VTE AS p
GROUP BY productid),
Rankings AS(
SELECT productid,
CustomerIDs,
DENSE_RANK() OVER (ORDER BY CustomerIDs ASC) AS Grouping
FROM Groups)
SELECT V.productid,
V.customerid,
R.Grouping AS customergroupid
FROM VTE V
JOIN Rankings R ON V.productid = R.productid
ORDER BY V.productid,
V.customerid;
Thanks again for your assistance.

SQL group by date difference with previous row

I'm looking for a way to group daily datetime rows to build date-range intervals.
My table is something like:
id | A | B | Date
1 | 1 | 2 | 1/10/2010
2 | 1 | 2 | 2/10/2010
3 | 1 | 2 | 3/10/2010
4 | 1 | 3 | 4/10/2010
5 | 1 | 3 | 5/10/2010
6 | 1 | 2 | 6/10/2010
7 | 1 | 2 | 7/10/2010
8 | 1 | 2 | 8/10/2010
My first try was:
SELECT A, B, MIN(DATE), MAX(date)
FROM table
GROUP BY A, B
So after grouping by A, B and using MIN and MAX on the date in my SELECT, I get invalid results due to the repetition of B = 2.
Rows with A = 1 and B = 2:
id | A | B | Date
1 | 1 | 2 | 1/10/2010
2 | 1 | 2 | 2/10/2010
3 | 1 | 2 | 3/10/2010
6 | 1 | 2 | 6/10/2010
7 | 1 | 2 | 7/10/2010
8 | 1 | 2 | 8/10/2010
Invalid result:
A | B | min(Date) | max(Date)
1 | 2 | 1/10/2010 | 8/10/2010
I'm looking for a way to calculate a third member for the GROUP BY, one that separates each consecutive run of dates.
So the expected interval results are:
A B Start Date End Date
.. | 1 | 2 | 1/10/2010 | 3/10/2010
.. | 1 | 3 | 4/10/2010 | 5/10/2010
.. | 1 | 2 | 6/10/2010 | 8/10/2010
I need to support SQL Server 2008.
Thank you in advance for your help
The following is an easy way to deal with "islands and gaps" where you need to find gaps in consecutive dates:
SELECT A, B, StartDate = MIN([Date]), EndDate = MAX([Date])
FROM
(
SELECT *,
RN = DATEDIFF(DAY, 0, [Date]) - ROW_NUMBER() OVER (PARTITION BY A, B ORDER BY [Date])
FROM myTable
) AS T
GROUP BY A, B, RN;
To break it down into slightly simpler-to-understand logic: you assign each date a number (DATEDIFF(DAY, 0, [Date]) here) and each row a row number (partitioned by A and B here). Any time there's a gap in the dates, the difference between those two numbers changes, so each run of consecutive dates gets its own RN value to group by.
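The same trick can be tried outside SQL Server. In this sketch, Python's sqlite3 stands in for SQL Server, julianday() takes the place of DATEDIFF(DAY, 0, [Date]) as a running day number, and the sample dates are read as day/month/year (an assumption, since 1/10/2010 is ambiguous) and stored as ISO-8601 text. The date column is named d and the function name is illustrative.

```python
import sqlite3

# (id, A, B, date) rows from the question, dates as ISO-8601 text (assumption).
ROWS = [
    (1, 1, 2, "2010-10-01"), (2, 1, 2, "2010-10-02"), (3, 1, 2, "2010-10-03"),
    (4, 1, 3, "2010-10-04"), (5, 1, 3, "2010-10-05"), (6, 1, 2, "2010-10-06"),
    (7, 1, 2, "2010-10-07"), (8, 1, 2, "2010-10-08"),
]


def date_islands(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE myTable (id INTEGER, A INTEGER, B INTEGER, d TEXT)")
    conn.executemany("INSERT INTO myTable VALUES (?, ?, ?, ?)", rows)
    # Day number minus ROW_NUMBER() stays constant across consecutive days
    # within an (A, B) partition and changes after every gap.
    query = """
        SELECT A, B, MIN(d) AS StartDate, MAX(d) AS EndDate
        FROM (
            SELECT *,
                   CAST(julianday(d) AS INTEGER)
                   - ROW_NUMBER() OVER (PARTITION BY A, B ORDER BY d) AS RN
            FROM myTable
        ) AS T
        GROUP BY A, B, RN
        ORDER BY StartDate
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


for interval in date_islands(ROWS):
    print(interval)
```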
There are a variety of resources you can use to understand different approaches to "islands and gaps" problems. Here is one that might help you with tackling other varieties of this in the future: https://www.red-gate.com/simple-talk/sql/t-sql-programming/the-sql-of-gaps-and-islands-in-sequences/

Merging multiple bytea rows based on id

I'm working with a PostgreSQL database where I need to merge multiple rows into a single row based on ID.
ID | A | B | C |
--------------------
1 | x | | |
1 | x | y | |
2 | x | | z |
3 | | y | |
3 | | | z |
A, B and C are bytea columns.
I need to merge it as follows:
ID | A | B | C |
--------------------
1 | x | y | |
2 | x | | z |
3 | | y | z |
The problem occurs when I do GROUP BY on ID, as I'm not able to find an appropriate aggregate function for bytea columns.
You can always do it with subqueries:
WITH allID AS (
SELECT DISTINCT ID
FROM yourTable
)
SELECT
ai.ID,
-- an ascending ORDER BY puts NULLs last in PostgreSQL, so each subquery returns a non-NULL value when the ID has one
(SELECT A FROM yourTable yt WHERE yt.ID = ai.ID ORDER BY A LIMIT 1) AS A,
(SELECT B FROM yourTable yt WHERE yt.ID = ai.ID ORDER BY B LIMIT 1) AS B,
(SELECT C FROM yourTable yt WHERE yt.ID = ai.ID ORDER BY C LIMIT 1) AS C
FROM allID AS ai
ORDER BY ai.ID;
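A sketch of the same correlated-subquery approach, run with Python's sqlite3 (BLOB standing in for bytea). One difference matters here: PostgreSQL sorts NULLs last in an ascending ORDER BY, which is why ORDER BY A LIMIT 1 above returns a non-NULL value when one exists, whereas SQLite sorts NULLs first, so the sketch filters them out explicitly with IS NOT NULL. The byte values are placeholders for the x, y, z in the question.

```python
import sqlite3

# (ID, A, B, C) rows from the question; b"x" etc. are placeholder byte values.
ROWS = [
    (1, b"x", None, None), (1, b"x", b"y", None), (2, b"x", None, b"z"),
    (3, None, b"y", None), (3, None, None, b"z"),
]


def merge_rows(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE yourTable (ID INTEGER, A BLOB, B BLOB, C BLOB)")
    conn.executemany("INSERT INTO yourTable VALUES (?, ?, ?, ?)", rows)
    # One correlated subquery per column; NULLs are excluded explicitly
    # because SQLite would otherwise return them first.
    query = """
        WITH allID AS (SELECT DISTINCT ID FROM yourTable)
        SELECT ai.ID,
               (SELECT A FROM yourTable yt WHERE yt.ID = ai.ID AND A IS NOT NULL ORDER BY A LIMIT 1) AS A,
               (SELECT B FROM yourTable yt WHERE yt.ID = ai.ID AND B IS NOT NULL ORDER BY B LIMIT 1) AS B,
               (SELECT C FROM yourTable yt WHERE yt.ID = ai.ID AND C IS NOT NULL ORDER BY C LIMIT 1) AS C
        FROM allID AS ai
        ORDER BY ai.ID
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


for row in merge_rows(ROWS):
    print(row)
```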

Getting duplicates with additional information

I've inherited a database and I'm having trouble constructing a working SQL query.
Suppose this is the data:
[Products]
| Id | DisplayId | Version | Company | Description |
|---- |----------- |---------- |-----------| ----------- |
| 1 | 12345 | 0 | 16 | Random |
| 2 | 12345 | 0 | 2 | Random 2 |
| 3 | AB123 | 0 | 1 | Random 3 |
| 4 | 12345 | 1 | 16 | Random 4 |
| 5 | 12345 | 1 | 2 | Random 5 |
| 6 | AB123 | 0 | 5 | Random 6 |
| 7 | 12345 | 2 | 16 | Random 7 |
| 8 | XX45 | 0 | 5 | Random 8 |
| 9 | XX45 | 0 | 7 | Random 9 |
| 10 | XX45 | 1 | 5 | Random 10 |
| 11 | XX45 | 1 | 7 | Random 11 |
[Companies]
| Id | Code |
|---- |-----------|
| 1 | 'ABC' |
| 2 | '456' |
| 5 | 'XYZ' |
| 7 | 'XYZ' |
| 16 | '456' |
The Version column is a version number. Higher numbers indicate more recent versions.
The Company column is a foreign key referencing the Companies table on the Id column.
There's another table called ProductData with a ProductId column referencing Products.Id.
Now I need to find duplicates based on the DisplayId and the corresponding Companies.Code. The ProductData table should be joined to show a title (ProductData.Title), and only the most recent versions should be included in the results. So the expected results are:
| Id | DisplayId | Version | Company | Description | ProductData.Title |
|---- |----------- |---------- |-----------|------------- |------------------ |
| 5 | 12345 | 1 | 2 | Random 5 | Title 5 |
| 7 | 12345 | 2 | 16 | Random 7 | Title 7 |
| 10 | XX45 | 1 | 5 | Random 10 | Title 10 |
| 11 | XX45 | 1 | 7 | Random 11 | Title 11 |
XX45 is included because it has 2 "entries": one with Company 5 and one with Company 7, and both companies share the same code.
12345 is included because it has 2 "entries": one with Company 2 and one with Company 16, and both companies share the same code. Note that the most recent versions differ (version 2 for company 16's entry and version 1 for company 2's entry).
AB123 should not be included, as its 2 entries have different company codes.
I'm eager to learn your insights...
Based on your sample data, you just need to JOIN the tables:
SELECT
p.Id, p.DisplayId, p.Version, p.Company, d.Title
FROM Products AS p
INNER JOIN Companies AS c ON p.Company = c.Id
INNER JOIN ProductData AS d ON d.ProductId = p.Id;
But if you want only the latest version, you can use ROW_NUMBER():
WITH CTE
AS
(
SELECT
p.Id, p.DisplayId, p.Version, p.Company, d.Title,
ROW_NUMBER() OVER(PARTITION BY p.DisplayId, p.Company ORDER BY p.Version DESC) AS RN
FROM Products AS p
INNER JOIN Companies AS c ON p.Company = c.Id
INNER JOIN ProductData AS d ON d.ProductId = p.Id
)
SELECT *
FROM CTE
WHERE RN = 1;
sample fiddle
| Id | DisplayId | Version | Company | Title |
|----|-----------|---------|---------|----------|
| 5 | 12345 | 1 | 2 | Title 5 |
| 7 | 12345 | 2 | 16 | Title 7 |
| 10 | XX45 | 1 | 5 | Title 10 |
| 11 | XX45 | 1 | 7 | Title 11 |
If I understood you correctly, you can use a CTE to find all the duplicated rows in your table; then you can SELECT from the CTE and add more manipulations.
WITH CTE AS(
SELECT p.Id, p.DisplayId, p.Version, p.Company, p.Description, d.Title,
RN = ROW_NUMBER() OVER (PARTITION BY p.DisplayId, p.Company ORDER BY p.Version DESC)
FROM dbo.Products AS p
INNER JOIN dbo.ProductData AS d ON d.ProductId = p.Id
)
SELECT *
FROM CTE
-- add further filtering here, e.g. WHERE RN = 1 to keep only the latest version
Try this:
SELECT b.Id, b.DisplayId, b.Version, b.Company, pd.Title
FROM
(SELECT a.Id, a.DisplayId, a.Version, a.Company, a.rn, a.Code,
COUNT(a.DisplayId) OVER (PARTITION BY a.DisplayId, a.Code) AS cnt
FROM
(SELECT p.Id, p.DisplayId, p.Version, p.Company, c.Code,
ROW_NUMBER() OVER (PARTITION BY p.DisplayId, p.Company ORDER BY p.Version DESC) AS rn
FROM Products AS p
INNER JOIN Companies AS c ON p.Company = c.Id) AS a
WHERE a.rn = 1) AS b
INNER JOIN ProductData AS pd ON b.Id = pd.ProductId
WHERE b.cnt > 1
You first have to get the current version, then count how many times each DisplayID + Code combination shows up. Based on that, you can select only the ones with a count greater than one. You can then INNER JOIN ProductData in the final query to get the Title.
WITH
MaxVersion AS --Get the current versions
(
SELECT
MAX(Version) AS Version,
DisplayID,
Company
FROM
#TmpProducts
GROUP BY
DisplayID,
Company
)
,CTE AS
(
SELECT
p.DisplayID,
c.Code,
COUNT(*) AS RowCounter
FROM
#TmpProducts p
INNER JOIN
#TmpCompanies c
ON
c.ID = p.Company
INNER JOIN
MaxVersion mv
ON
mv.DisplayID = p.DisplayID
AND mv.Version = p.Version
AND mv.Company = p.Company
GROUP BY
p.DisplayID,
c.Code
)
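SELECT
p.*
FROM
#TmpProducts p
INNER JOIN
#TmpCompanies co
ON
co.ID = p.Company
INNER JOIN
CTE c
ON
c.DisplayID = p.DisplayID
AND c.Code = co.Code --match the code too, not just the DisplayID
INNER JOIN
MaxVersion mv
ON
mv.DisplayID = p.DisplayID
AND mv.Company = p.Company
AND mv.Version = p.Version
WHERE
c.RowCounter > 1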
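The two steps above (latest version per DisplayId and Company, then a count per DisplayId and Code) can be checked against the sample data with a sketch like this one. It uses Python's sqlite3 as a stand-in for SQL Server and ROW_NUMBER() in place of the MaxVersion CTE. ProductData titles aren't given in the question, so the ProductData join is left out; the function name is illustrative.

```python
import sqlite3

PRODUCTS = [  # (Id, DisplayId, Version, Company) from the question
    (1, "12345", 0, 16), (2, "12345", 0, 2), (3, "AB123", 0, 1),
    (4, "12345", 1, 16), (5, "12345", 1, 2), (6, "AB123", 0, 5),
    (7, "12345", 2, 16), (8, "XX45", 0, 5), (9, "XX45", 0, 7),
    (10, "XX45", 1, 5), (11, "XX45", 1, 7),
]
COMPANIES = [(1, "ABC"), (2, "456"), (5, "XYZ"), (7, "XYZ"), (16, "456")]


def latest_duplicates(products, companies):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Products (Id INTEGER, DisplayId TEXT, Version INTEGER, Company INTEGER)")
    conn.execute("CREATE TABLE Companies (Id INTEGER, Code TEXT)")
    conn.executemany("INSERT INTO Products VALUES (?, ?, ?, ?)", products)
    conn.executemany("INSERT INTO Companies VALUES (?, ?)", companies)
    # Latest: rank each (DisplayId, Company) by Version, newest first.
    # Counted: among the newest rows, count rows per (DisplayId, Code).
    query = """
        WITH Latest AS (
            SELECT p.Id, p.DisplayId, p.Version, p.Company, c.Code,
                   ROW_NUMBER() OVER (PARTITION BY p.DisplayId, p.Company
                                      ORDER BY p.Version DESC) AS rn
            FROM Products AS p
            JOIN Companies AS c ON c.Id = p.Company
        ),
        Counted AS (
            SELECT *, COUNT(*) OVER (PARTITION BY DisplayId, Code) AS cnt
            FROM Latest
            WHERE rn = 1
        )
        SELECT Id, DisplayId, Version, Company
        FROM Counted
        WHERE cnt > 1
        ORDER BY Id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


for row in latest_duplicates(PRODUCTS, COMPANIES):
    print(row)
```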
