How to use GROUP BY to identify row differences

How to use GROUP BY to identify row differences - sql-server

I'm working in SQL Server 2008. I have the following situation. I have a table that has 2 columns which comprise the primary key. (No uniqueness constraint is defined on the key, though.) I know that I have primary key duplicates. Per primary key, I want to identify the distinct values in another column. So, let's say I have the following table:
INSERT INTO some_table (Col1, Col2, COl3) VALUES
('A', '1', 'a'),
('A', '1', 'b'),
('B', '1', 'a'),
('B', '2', 'b'),
('C', '1', 'a'),
('C', '1', 'a'),
('C', '2', 'b')
I want to group by Col1 and Col2, and I want to find all rows where there are more than 1 distinct Col3 values. For example, with the above table, I expect to see:
(A, 1, a), (A, 1, b).
How do I write this SQL query? My SELECT statement needs to include Col1, Col2, and Col3. But, if I do a GROUP BY Col1, Col2, then I can't have Col3 in the SELECT statement.

You can't really have more than one item selected per group in group by, but maybe you need something like this:
select
col1,
col2,
stuff((select ',' + col3 from some_table t2
where t1.col1 = t2.col1 and t1.col2 = t2.col2
FOR XML PATH ('')), 1, 1, '') as items
from
some_table t1
group by
col1,
col2
having count(distinct col3) > 1
This will return the "duplicate" items in comma separated list in the 3rd column.
SQL Fiddle

Here is one way to solve this:
;WITH CTE AS (
Select col1, col2, min(col3) as minvalue, max(col3) as maxvalue
From myTable
Group by col1, col2
Having min(col3) < max(col3)
)
Select *
From myTable t
Inner join cte
On t.col1 = cte.col1
And t.col2 = cte.col2
Where col3 >= minvalue
And col3 <= maxvalue
Note code written directly here, there might be some mistakes.

Try this:
CREATE TABLE #some_table
(
Col1 char(1),
Col2 char(1),
Col3 char(1)
)
INSERT INTO #some_table (Col1, Col2, COl3) VALUES
('A', '1', 'a'),
('A', '1', 'b'),
('B', '1', 'a'),
('B', '2', 'b'),
('C', '1', 'a'),
('C', '1', 'a'),
('C', '2', 'b')
select
stuff((select ',' + '('+Col1 +', '+ Col2 + ', ' + Col3 + ')' from #some_table T_IN
where T.col1 = T_IN.col1 and T.col2 = T_IN.col2 FOR XML PATH ('')), 1, 1, '') as items
from
#some_table T
group by col1, col2

This is one way to get there:
SELECT *
FROM
T T1
WHERE
EXISTS(SELECT * FROM T T2
WHERE
T1.a = T2.a
AND T1.b = T2.b
AND T1.c <> T2.c);
sql fiddle
or in similar variation, that would allow to give the required minimum distinct number:
WHERE
(SELECT COUNT(DISTINCT T2.c)
FROM T T2
WHERE T1.a = T2.a AND T1.b = T2.b) >= 2;
sql fiddle

Related

Stored procedure in SnowSQL to combine 2 tables and avoid duplicates

DECLARE #tmp1 TABLE (ItemCode INT, Item VARCHAR(30), Qty INT)
INSERT INTO #tmp1 (ItemCode, Item, Qty)
VALUES(1, 'Item1', 300),
(2, 'Item2', 500)
DECLARE #tmp2 TABLE (JobNo INT, ItemCode INT, Item VARCHAR(30), Qty INT)
INSERT INTO #tmp2 (JobNo, ItemCode, Item, Qty)
VALUES(1, 1, 'Item1', 150),
(2, 1, 'Item1', 150),
(3, 2, 'Item1', 50),
(4, 2, 'Item1', 75),
(5, 2, 'Item1', 125),
(6, 2, 'Item1', 100)
;WITH MyCTE AS
(
SELECT t1.ItemCode, t1.Item, t1.Qty, t2.JobNo, t2.ItemCode AS ItemCode2, t2.Item AS Item2, t2.Qty AS Qty2, t2.MySeed
FROM #tmp1 AS t1 INNER JOIN (
SELECT *, ROW_NUMBER() OVER(PARTITION BY ItemCode ORDER BY JobNo) AS [MySeed]
FROM #tmp2
) AS t2 ON t1.ItemCode = t2.ItemCode
WHERE t2.MySeed = 1
UNION ALL
SELECT NULL AS ItemCode, NULL AS Item, NULL AS Qty, t2.JobNo, t2.ItemCode AS ItemCode2, t2.Item AS Item2, t2.Qty AS Qty2, t2.MySeed
FROM #tmp1 AS t1 INNER JOIN (
SELECT *, ROW_NUMBER() OVER(PARTITION BY ItemCode ORDER BY JobNo) AS [MySeed]
FROM #tmp2
) AS t2 ON t1.ItemCode = t2.ItemCode
WHERE t2.MySeed > 1
)
SELECT *
FROM MyCTE
ORDER BY ItemCode2, MySeed
I found a sample query online as above which is straight forward as it combines table 1 and table 2 and it assumes in table 2 if myseed count is 1 then input null values.
But, I need a query in snowflake for the scenario where table 1 can have any number of entries for each id and table 2 can also have multiple entries for each id. I need to join both the tables without duplicates and input null in the rows when the one table has multiple entries and other table don't have multiple entries.
How to write in snowflake(SNOWSQL)?
I need a result like below to combine two tables and avoid duplicates in either one of the table

SQL concat integers and group them with from to

I'm new to stackoverflow, but I'm stuck with my query.
I've got a SQL table whitch looks like this:
+-------+------------+
| col1 | col2 |
+-------+------------+
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 1 | 4 |
| 1 | 6 |
+-------+------------+
I don't know how to get the following resultset:
+-------+------------+
| col1 |SerialNumber|
+-------|------------+
| 1 | 1 to 4, 6 |
+--------------------+
With XML Path i can get this:
+-------+------------+
| col1 |SerialNumber|
+-------|------------+
| 1 | 1,2,3,4,6, |
+--------------------+
This is my query for it:
SELECT DISTINCT O.Col1,
(SELECT CAST(P.Col2 As varchar(5)) + ',' AS [text()]
FROM #Test P
WHERE P.Col1 = O.Col1
ORDER BY P.Col1
FOR XML PATH('')) AS 'SerialNumber'
FROM #Test O
I'm sorry if my question was already asked. I'm also lacking Keywords for this topic.

Test data:
CREATE TABLE t(col1 int,col2 int)
INSERT t(col1,col2)VALUES
(1,1),(1,2),(1,3),(1,4),
(1,6),(1,7),(1,8),(1,9),
(1,11),
(1,13),
(2,3),(2,4),(2,5),
(2,7)
A variant with FOR XML PATH:
SELECT col1,col2,outVal
INTO #temp
FROM
(
SELECT
col1,
col2,
outVal,
ISNULL(LEAD(outVal)OVER(PARTITION BY col1 ORDER BY col2),'') nextOutVal
FROM
(
SELECT
col1,
col2,
CASE
WHEN col2-1=LAG(col2)OVER(PARTITION BY col1 ORDER BY col2) AND col2+1=LEAD(col2)OVER(PARTITION BY col1 ORDER BY col2)
THEN 'to'
ELSE CAST(col2 AS varchar(10))
END outVal
FROM t
) q
) q
WHERE outVal<>nextOutVal
ORDER BY col1,col2
SELECT
t1.col1,
REPLACE(STUFF(
(
SELECT ','+t2.outVal
FROM #temp t2
WHERE t2.col1=t1.col1
ORDER BY t2.col2
FOR XML PATH('')
),1,1,''),',to,',' to ') SerialNumber
FROM (SELECT DISTINCT col1 FROM #temp) t1
DROP TABLE #temp
A variant for SQL Server 2017 (with STRING_AGG):
SELECT
col1,
REPLACE(STRING_AGG(outVal,',')WITHIN GROUP(ORDER BY col2),',to,',' to ')
FROM
(
SELECT
col1,
col2,
outVal,
ISNULL(LEAD(outVal)OVER(PARTITION BY col1 ORDER BY col2),'') nextOutVal
FROM
(
SELECT
col1,
col2,
CASE
WHEN col2-1=LAG(col2)OVER(PARTITION BY col1 ORDER BY col2) AND col2+1=LEAD(col2)OVER(PARTITION BY col1 ORDER BY col2)
THEN 'to'
ELSE CAST(col2 AS varchar(10))
END outVal
FROM t
) q
) q
WHERE outVal<>nextOutVal
GROUP BY col1
Result:
col1 SerialNumber
1 1 to 4,6 to 9,11,13
2 3 to 5,7

Solution:
Another possible approach using CTE for start and end values for each sequence and group concatenation:
T-SQL:
-- Table creation
CREATE TABLE #ValuesTable (
Col1 int,
Col2 int
)
INSERT INTO #ValuesTable VALUES (1, 1)
INSERT INTO #ValuesTable VALUES (1, 2)
INSERT INTO #ValuesTable VALUES (1, 3)
INSERT INTO #ValuesTable VALUES (1, 4)
INSERT INTO #ValuesTable VALUES (1, 6)
INSERT INTO #ValuesTable VALUES (2, 1)
INSERT INTO #ValuesTable VALUES (2, 2)
INSERT INTO #ValuesTable VALUES (2, 3)
INSERT INTO #ValuesTable VALUES (2, 4)
INSERT INTO #ValuesTable VALUES (2, 6)
INSERT INTO #ValuesTable VALUES (2, 7);
INSERT INTO #ValuesTable VALUES (2, 10);
-- Find sequences
WITH
TableStart AS (
SELECT t.Col1, t.Col2, ROW_NUMBER() OVER (ORDER BY t.Col1, t.Col2) AS RN
FROM #ValuesTable t
LEFT JOIN #ValuesTable b ON (t.Col1 = b.Col1) AND (t.Col2 = b.Col2 + 1)
WHERE (b.Col2 IS NULL)
),
TableEnd AS (
SELECT t.Col1, t.Col2, ROW_NUMBER() OVER (ORDER BY t.Col1, t.Col2) AS RN
FROM #ValuesTable t
LEFT JOIN #ValuesTable b ON (t.Col1 = b.Col1) AND (t.Col2 = b.Col2 - 1)
WHERE (b.Col2 IS NULL)
),
TableSequences AS (
SELECT
TableStart.Col1 AS Col1,
CASE
WHEN (TableStart.Col2 <> TableEnd.Col2) THEN CONVERT(nvarchar(max), TableStart.Col2) + N' to ' + CONVERT(nvarchar(max), TableEnd.Col2)
ELSE CONVERT(nvarchar(max), TableStart.Col2)
END AS Sequence
FROM TableStart
LEFT JOIN TableEnd ON (TableStart.RN = TableEnd.RN)
)
-- Select with group concatenation
SELECT
t1.Col1,
(
SELECT t2.Sequence + N', '
FROM TableSequences t2
WHERE t2.Col1 = t1.Col1
ORDER BY t2.Col1
FOR XML PATH('')
) SerialNumber
FROM (SELECT DISTINCT Col1 FROM TableSequences) t1
Output:
Col1 SerialNumber
1 1 to 4, 6,
2 1 to 4, 6 to 7, 10,
Notes:
Tested on SQL Server 2005, 2012, 2016.

how to select and join same table in mssql [duplicate]

I have a simple categories table as with the following columns:
Id
Name
ParentId
So, an infinite amount of Categories can be the child of a category. Take for example the following hierarchy:
I want, in a simple query that returns the category "Business Laptops" to also return a column with all it's parents, comma separator or something:
Or take the following example:

Recursive cte to the rescue....
Create and populate sample table (Please save us this step in your future questions):
DECLARE #T as table
(
id int,
name varchar(100),
parent_id int
)
INSERT INTO #T VALUES
(1, 'A', NULL),
(2, 'A.1', 1),
(3, 'A.2', 1),
(4, 'A.1.1', 2),
(5, 'B', NULL),
(6, 'B.1', 5),
(7, 'B.1.1', 6),
(8, 'B.2', 5),
(9, 'A.1.1.1', 4),
(10, 'A.1.1.2', 4)
The cte:
;WITH CTE AS
(
SELECT id, name, name as path, parent_id
FROM #T
WHERE parent_id IS NULL
UNION ALL
SELECT t.id, t.name, cast(cte.path +','+ t.name as varchar(100)), t.parent_id
FROM #T t
INNER JOIN CTE ON t.parent_id = CTE.id
)
The query:
SELECT id, name, path
FROM CTE
Results:
id name path
1 A A
5 B B
6 B.1 B,B.1
8 B.2 B,B.2
7 B.1.1 B,B.1,B.1.1
2 A.1 A,A.1
3 A.2 A,A.2
4 A.1.1 A,A.1,A.1.1
9 A.1.1.1 A,A.1,A.1.1,A.1.1.1
10 A.1.1.2 A,A.1,A.1.1,A.1.1.2
See online demo on rextester

Delete or Select Only row with same id but have values for certain fields

I would like to select/delete data with different rows but with same id.
For Example.
ID ColumnA
A Honda
A NULL
B Yamaha
B NULL
C NULL
C Merc
D NULL
E NULL
Output:
ID ColumnA
A Honda
B Yamaha
C Merc
D NULL
E NULL
First thing, I already google for the solutions, but no answers. Any help is greatly appreciated

You could use Row_number and TOP 1 WITH TIES
DECLARE #SampleData AS TABLE
(
ID varchar(10),
ColumnA varchar(20)
)
INSERT INTO #SampleData
VALUES
('A', 'Honda'),
('A', NULL),
('B', 'Yamaha'),
('B', NULL),
('C', NULL),
('C', 'Merc'),
('D', NULL),
('E', NULL)
SELECT TOP (1) WITH TIES
sd.ID,
sd.ColumnA
FROM #SampleData sd
ORDER BY Row_number() OVER(PARTITION BY sd.ID ORDER BY sd.ColumnA DESC)
Return
ID ColumnA
------------
A Honda
B Yamaha
C Merc
D NULL
E NULL

;With cte(ID, ColumnA)
AS
(
SELECT 'A','Honda' Union all
SELECT 'A',NULL Union all
SELECT 'B','Yamaha' Union all
SELECT 'B',NULL Union all
SELECT 'C',NULL Union all
SELECT 'C','Merc' Union all
SELECT 'D' , NULL Union all
SELECT 'E', NULL
)
SELECT ID, ColumnA From
(
SELECT *,ROW_NUMBER()Over(Partition by ID order by ColumnA DESc)AS Seq from cte
)Dt
WHERE dt.Seq =1
Output:
ID ColumnA
A Honda
B Yamaha
C Merc
D NULL
E NULL

try this:
declare #tb table(ID varchar(50), ColumnA varchar(50))
insert into #tb
select 'A', 'Honda' union all
select 'A' , null union all
select 'B', 'Yamaha' union all
select 'B' , null union all
select 'C' , null union all
select 'C', 'Merc' union all
select 'D', NULL union all
select 'E', NULL
select a.id,b.ColumnA from
(select count(1) cnt,ID from #tb group by ID having count(1)>1 or count(1)=1) as a
left join
(select * from #tb) as b on a.ID = b.ID
where b.columnA is not null and cnt>1 or cnt =1
result:
A Honda
B Yamaha
C Merc
D NULL
E NULL

drop table if exists dbo.Motorcycle;
create table dbo.Motorcycle (
ID char(1)
, ColumnA varchar(100)
);
insert into dbo.Motorcycle (ID, ColumnA)
values ('A', 'Honda'), ('A', null)
, ('B', 'Yamaha'), ('B', null)
, ('C', null), ('C', 'Merc')
, ('D', null)
, ('E', null);
select
t.ID, t.ColumnA
from (
select
*
, ROW_NUMBER() over (partition by m.ID order by m.ColumnA desc) as RBr
from dbo.Motorcycle m
) t
where t.RBr = 1

Row based Condition in Left Join

Is there a way to write a row based condition in Left Join.
If some row not exists based on column condition, then it should take the next first row.
I have the structure below,
create table Report
(
id int,
name varchar(10)
)
create table ReportData
(
report_id int references report(id),
flag bit,
path varchar(50)
)
insert into Report values (1, 'a');
insert into Report values (2, 'b');
insert into Report values (3, 'c');
insert into ReportData values (1, 0, 'xx');
insert into ReportData values (2, 0, 'yy');
insert into ReportData values (2, 1, 'yy');
insert into ReportData values (3, 1, 'zz');
insert into ReportData values (3, 1, 'mm');
I need some output like
1 a 0 xx
2 b 0 yy
3 c 1 zz

You can use ROW_NUMBER for this:
;WITH ReportDate_Rn AS (
SELECT report_id, flag, path,
ROW_NUMBER() OVER (PARTITION BY report_id ORDER BY path) AS rn
FROM ReportDate
)
SELECT t1.id, t1.name, t2.flag, t2.path
FROM Report AS t1
JOIN ReportDate_Rn AS t2 ON t1.id = t2.report_id AND t2.rn = 1
The above query regards as first record of each report_id slice, the one having the alphabetically smallest path. You may amend the ORDER BY clause of the ROW_NUMBER() window function as you wish.

SELECT id,name,flag,path
FROM
(
SELECT Report.id,Report.name,ReportData.flag,ReportData.path,
row_number() over(partition by ReportData.report_id order by flag) as rownum
FROM Report
JOIN ReportData on Report.id = ReportData.report_id
) tmp
WHERE tmp.rownum=1

A simpler alternative to the left join, using rowid and rownum
SELECT id, name, flag, path
FROM report, reportdata
WHERE reportdata.rowid = (SELECT rowid
FROM reportdata
WHERE id = report_id
AND rownum = 1);

Without using row_numner() you can achieve this.
Have a look at this SQL Fiddle
select r.id, r.name, d.flag, d.path from report r
inner join reportdata d
on r.id = d.report_id group by d.report_id
PS: I wasn't believing the result - I was just building the query - haven't used d.report_id in the select clause and it worked. Will be updating this answer once I get the reason why this query worked :)

Use Partition BY:
declare #Report AS table
(
id int,
name varchar(10)
)
declare #ReportData AS table
(
report_id int ,
flag bit,
path varchar(50)
)
insert into #Report values (1, 'a');
insert into #Report values (2, 'b');
insert into #Report values (3, 'c');
insert into #ReportData values (1, 0, 'xx');
insert into #ReportData values (2, 0, 'yy');
insert into #ReportData values (2, 1, 'yy');
insert into #ReportData values (3, 1, 'zz');
insert into #ReportData values (3, 1, 'mm');
;WITH T AS
(
Select
R.id,
r.name,
RD.flag,
RD.path,
ROW_NUMBER () OVER(PARTITION BY R.id ORDER BY R.id) AS PartNo
FROM #Report R
LEFT JOIN #ReportData RD ON R.id=RD.report_id
)
SELECT
T.id,
T.name,
T.flag,
T.path
FROM T WHERE T.PartNo=1

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight

How to use GROUP BY to identify row differences - sql-server

Related

Stored procedure in SnowSQL to combine 2 tables and avoid duplicates

SQL concat integers and group them with from to

how to select and join same table in mssql [duplicate]

Delete or Select Only row with same id but have values for certain fields

Row based Condition in Left Join

Categories

Resources