How to select from duplicate rows from a table?

How to select from duplicate rows from a table? - sql-server

I have the following table:
CREATE TABLE TEST(ID TINYINT NULL, COL1 CHAR(1))
INSERT INTO TEST(ID,COL1) VALUES (1,'A')
INSERT INTO TEST(ID,COL1) VALUES (2,'B')
INSERT INTO TEST(ID,COL1) VALUES (1,'A')
INSERT INTO TEST(ID,COL1) VALUES (1,'B')
INSERT INTO TEST(ID,COL1) VALUES (1,'B')
INSERT INTO TEST(ID,COL1) VALUES (2,'B')
I would like to select duplicate rows from that table. How can I select them?
I tried the following:
SELECT TEST.ID,TEST.COL1
FROM TEST WHERE TEST.ID IN
(SELECT ID
FROM TEST WHERE TEST.COL1 IN
(SELECT COL1
FROM TEST WHERE TEST.ID IN
(SELECT ID
FROM TEST
GROUP BY ID
HAVING COUNT(*) > 1)
GROUP BY COL1
HAVING COUNT(*) > 1)
GROUP BY ID
HAVING COUNT(*) > 1)
Where's the error? What do I need to modify?
And I would like it to show as:
ID COL1
---- ----
1 A
1 A
1 B
1 B
(4 row(s) affected)

SELECT id, col1
FROM Test
GROUP BY id, col1
HAVING COUNT(*) > 1
when you use
SELECT id, col1, COUNT(*) AS cnt
FROM Test
GROUP BY id, col1
HAVING COUNT(*) > 1
you practically have all duplicate rows and how often they appear. You can't identify them individually either way.
A slower way would be:
SELECT id, col1
FROM Test T
WHERE (SELECT COUNT(*)
FROM Test I
WHERE I.id = T.id AND I.col1 = T.col1) > 1

Using Sql Server 2005+ and CTE you could try
;WITH Dups AS (
SELECT *,
ROW_NUMBER() OVER(PARTITION BY ID, Col1 ORDER BY ID, Col1) Rnum
FROM #TEST t
)
SELECT *
FROM Dups
WHERE Rnum > 1
OR just a standard
SELECT ID,
Col1,
COUNT(1) Cnt
FROM #TEST
GROUP BY ID,
Col1
HAVING COUNT(1) > 1
EDIT:
Display duplicate rows
SELECT t.*
FROM #Test t INNER JOIN
(
SELECT ID,
Col1,
COUNT(1) Cnt
FROM #TEST
GROUP BY ID,
Col1
HAVING COUNT(1) > 1
) dups ON t.ID = dups.ID
AND t.Col1 = dups.Col1

Every row in that set of data is a duplicate
select id, col1, count(*)
from test
group by id, col1
shows this
if you want to exclude the 2,B rows you need to do it explicitly
eg
SELECT id, col1
FROM Test
WHERE NOT (id = 2 and col1 = 'B')

SELECT t.*
FROM TEST t
INNER JOIN (
SELECT ID,COL1
from test
GROUP BY ID,COL1
HAVING COUNT(*) > 1
)
AS t2
ON t2.ID = t.ID AND t2.COL1 =t.COL1
order by t.ID,t.COL1

Related

Replacing a single column with randomly selected values from another table

I found some solutions to replace (below example) #test.col2 with data from #test2.src. But in the result it just selects a single random value and replaces them all with it. How to fix? Thanks!
#test (the target table)
col1 col2
-------------
A 1
B 2
C 3
D 4
E 5
#test2 (the source table)
src1
sample1
sample2
sample3
Query:
UPDATE #test
SET col1 = data1.LastName
FROM #test
CROSS APPLY
(SELECT TOP(1) #test2.LastName
FROM #test2
ORDER BY NEWID()) data1
Example result:
col1 col2
----------------
A sample2
B sample2
C sample2
D sample2
E sample2

Here is one way to tackle this. It is using ROW_NUMBER in a cte to "randomize" the values.
if OBJECT_ID('tempdb..#test') is not null
drop table #test;
create table #test
(
col1 varchar(20)
, col2 int
);
insert #test
select 'A', 1 union all
select 'B', 2 union all
select 'C', 3 union all
select 'D', 4 union all
select 'E', 5;
if OBJECT_ID('tempdb..#test2') is not null
drop table #test2;
create table #test2
(
LastName varchar(20)
);
insert #test2
select 'src1' union all
select 'sample1' union all
select 'sample2' union all
select 'sample3';
--here is the data before any updates
select * from #test;
with t1 as
(
select col1
, col2
, RowNum = ROW_NUMBER() over(order by newid())
from #test
)
, t2 as
(
select LastName
, RowNum = ROW_NUMBER() over(order by newid())
from #test2
)
update t
set col1 = t2.LastName
from t1
join t2 on t1.RowNum = t2.RowNum
join #test t on t.col1 = t1.col1
--we now have updated with a "random" row
select * from #test;

Deactivate duplicate data based on two group by columns with check not null

I have this following query to return duplicate rows by 2 columns but it returns rows with NULL values. I would like to return rows with NOT NULL for both columns. Also how can I deactivate duplicate record with old ID?
With deactivaterows as (
Select t.*, count(*) over (partition by col1, col2) as cnt from TABLE1 t)
Select * from deactivaterows where cnt > 1;

You want row_number not count(*):
with deactivaterows as
(
select t.*, row_number() over(partition by t.col1, t.col2 order by (select null)) as rn
from TABLE1
where t.col1 is not null and t.col2 is not null
)
select * from deactivaterows where rn > 1

I would do it this way:
select col1, col2, count(*)
from TABLE1
where col1 is not null and col2 is not null
group by col1, col2
having count(*) > 1
The having command is great, it filters results after the group by and the select. You can always throw in a where regardless, and this will compare before you start grouping, counting or selecting anything.
Alternatively, you can certainly modify your existing solution by throwing in a where clause without issue:
With deactivaterows as (
Select t.*, count(*) over (partition by col1, col2) as cnt from TABLE1 t
where col1 is not null and col2 is not null)
Select * from deactivaterows where cnt > 1;

Unexpected result using CTE to perform a random join on two tables for all rows one-to-many

I am attempting to randomly join the rows of two tables (TableA and TableB) such that each row in TableA is joined to only one row in TableB and every row in TableB is joined to at least one row in TableA.
For example, a random join of TableA with 5 distinct rows and TableB with 3 distinct rows should result in something like this:
TableA TableB
1 3
2 1
3 1
4 2
5 1
However, sometimes not all the rows from TableB are included in the final result; so in the example above might have row 2 from TableB missing because in its place is either row 1 or 3 joined to row 4 on TableA. You can see this occur by executing the script a number of times and checking the result. It seems that it is necessary for some reason to use an interim table (#Q) to be able to ensure that a correct result is returned which has all rows from both TableA and TableB.
Can someone please explain why this is happening?
Also, can someone please advise on what would be a better way to get the desired result?
I understand that sometimes no result is returned due to a failure of some kind in the cross apply and ordering which i have yet to identify and goes to the point that I am sure there is a better way to perform this operation. I hope that makes sense. Thanks in advance!
declare #TableA table (
ID int
);
declare #TableB table (
ID int
);
declare #Q table (
RN int,
TableAID int,
TableBID int
);
with cte as (
select
1 as ID
union all
select
ID + 1
from cte
where ID < 5
)
insert #TableA (ID)
select ID from cte;
with cte as (
select
1 as ID
union all
select
ID + 1
from cte
where ID < 3
)
insert #TableB (ID)
select ID from cte;
select * from #TableA;
select * from #TableB;
with cte as (
select
row_number() over (partition by TableAID order by newid()) as RN,
TableAID,
TableBID
from (
select
a.ID as TableAID,
b.ID as TableBID
from #TableA as a
cross apply #TableB as b
) as M
)
select --All rows from TableB not always included
TableAID,
TableBID
from cte
where RN in (
select
top 1
iCTE.RN
from cte as iCTE
group by iCTE.RN
having count(distinct iCTE.TableBID) = (
select count(1) from #TableB
)
)
order by TableAID;
with cte as (
select
row_number() over (partition by TableAID order by newid()) as RN,
TableAID,
TableBID
from (
select
a.ID as TableAID,
b.ID as TableBID
from #TableA as a
cross apply #TableB as b
) as M
)
insert #Q
select
RN,
TableAID,
TableBID
from cte;
select * from #Q;
select --All rows from both TableA and TableB included
TableAID,
TableBID
from #Q
where RN in (
select
top 1
iQ.RN
from #Q as iQ
group by iQ.RN
having count(distinct iQ.TableBID) = (
select count(1) from #TableB
)
)
order by TableAID;

See if this gives you what you're looking for...
DECLARE
#CountA INT = (SELECT COUNT(*) FROM #TableA ta),
#CountB INT = (SELECT COUNT(*) FROM #TableB tb),
#MinCount INT;
SELECT #MinCount = CASE WHEN #CountA < #CountB THEN #CountA ELSE #CountB END;
WITH
cte_A1 AS (
SELECT
*,
rn = ROW_NUMBER() OVER (ORDER BY NEWID())
FROM
#TableA ta
),
cte_B1 AS (
SELECT
*,
rn = ROW_NUMBER() OVER (ORDER BY NEWID())
FROM
#TableB tb
),
cte_A2 AS (
SELECT
a1.ID,
rn = CASE WHEN a1.rn > #MinCount THEN a1.rn - #MinCount ELSE a1.rn end
FROM
cte_A1 a1
),
cte_B2 AS (
SELECT
b1.ID,
rn = CASE WHEN b1.rn > #MinCount THEN b1.rn - #MinCount ELSE b1.rn end
FROM
cte_B1 b1
)
SELECT
A = a.ID,
B = b.ID
FROM
cte_A2 a
JOIN cte_B2 b
ON a.rn = b.rn;

Two columns into one using alternate rows

I have a table with two columns like this:
A 1
B 2
C 3
D 4
E 5
etc.
I want to get them into one column, with each column's data in alternate rows of the new column like this:
A
1
B
2
C
3
D
4
E
5
etc.

I would use a UNION ALL but here is the UNPIVOT alternative:
CREATE TABLE #Table1(letter VARCHAR(10),Id VARCHAR(10))
INSERT INTO #Table1(letter ,Id )
SELECT 'A',1 UNION ALL
SELECT 'B',2 UNION ALL
SELECT 'C',3 UNION ALL
SELECT 'D',4 UNION ALL
SELECT 'E',5
SELECT [value]
FROM #Table1
UNPIVOT
(
[value] FOR [Column] IN ([Id], [letter])
) UNPVT
DROP TABLE #Table1;

The tricky part is the data in alternate rows
select col2
from
( select col1, 1 as flag, col1 from tab
union all
select col1, 2, col2 from tab
) as dt
order by col1, flag
But why do you try to do this at all?

Try this :
WITH CTE AS
(
SELECT Col1Name Name,(ROW_NUMBER() OVER(ORDER BY(SELECT NULL))) RN
FROM TableName
UNION
SELECT Col2Name Name,(ROW_NUMBER() OVER(ORDER BY(SELECT NULL)))+1 RN
FROM TableName
)
SELECT Name
FROM CTE
ORDER BY RN

CREATE TABLE #Table1(Value VARCHAR(10),Id VARCHAR(10))
INSERT INTO #Table1(Value ,Id )
SELECT 'A',1 UNION ALL
SELECT 'B',2 UNION ALL
SELECT 'C',3 UNION ALL
SELECT 'D',4 UNION ALL
SELECT 'E',5
;WITH _CTE (Name) AS
(
SELECT Value [Name]
FROM #Table1
UNION ALL
SELECT Id [Name]
FROM #Table1
)
SELECT * FROM _CTE

How to find the maximum value in join without using if in sql stored procedure

I have a two tables like below
A
Id Name
1 a
2 b
B
Id Name
1 t
6 s
My requirement is to find the maximum id from table and display the name and id for that maximum without using case and if.
i findout the maximum by using below query
SELECT MAX(id)
FROM (SELECT id,name FROM A
UNION
SELECT id,name FROM B) as c
I findout the maximum 6 using the above query.but i can't able to find the name.I tried the below query but it's not working
SELECT MAX(id)
FROM (SELECT id,name FROM A
UNION
SELECT id,name FROM B) as c
How to find the name?
Any help will be greatly appreciated!!!

First combine the tables, since you need to search both. Next, determine the id you need. JOIN the id back with the temporarily created table to retreive the name that belongs to that id:
WITH tmpTable AS (
SELECT id,name FROM A
UNION
SELECT id,name FROM B
)
, max AS (
SELECT MAX(id) id
FROM tmpTable
)
SELECT t.id, t.name
FROM max m
JOIN tmpTable t ON m.id = t.id

You could use ROW_NUMBER(). You have to UNION ALL TableA and TableB first.
WITH TableA(Id, Name) AS(
SELECT 1, 'a' UNION ALL
SELECT 2, 'b'
)
,TableB(Id, Name) AS(
SELECT 1, 't' UNION ALL
SELECT 6, 's'
)
,Combined(Id, Name) AS(
SELECT * FROM TableA UNION ALL
SELECT * FROM TableB
)
,CTE AS(
SELECT *, RN = ROW_NUMBER() OVER(ORDER BY ID DESC) FROM Combined
)
SELECT Id, Name
FROM CTE
WHERE RN = 1

Just order by over the union and take first row:
SELECT TOP 1 * FROM (SELECT * FROM A UNION SELECT * FROM B) x
ORDER BY ID DESC
This won't show ties though.

For you stated that you used SQL Server 2008. Therefore,I used FULL JOIN and NESTED SELECT to get what your looking for. See below:
SELECT
(SELECT
1,
ISNULL(A.Id,B.Id)Id
FROM A FULL JOIN B ON A.Id=B.Id) AS Id,
(SELECT
1,
ISNULL(A.Name,B.Name)Name
FROM A FULL JOIN B ON A.Id=B.Id) AS Name

It's possible to use ROW_NUMBER() or DENSE_RANK() functions to get new numiration by Id, and then select value with newly created orderId equal to 1
Use:
ROW_NUMBER() to get only one value (even if there are some repetitions of max id)
DENSE_RANK() to get all equal max id values
Here is an example:
DECLARE #tb1 AS TABLE
(
Id INT
,[Name] NVARCHAR(255)
)
DECLARE #tb2 AS TABLE
(
Id INT
,[Name] NVARCHAR(255)
)
INSERT INTO #tb1 VALUES (1, 'A');
INSERT INTO #tb1 VALUES (7, 'B');
INSERT INTO #tb2 VALUES (4, 'C');
INSERT INTO #tb1 VALUES (7, 'D');
SELECT * FROM
(SELECT Id, Name, ROW_NUMBER() OVER (ORDER BY Id DESC) AS [orderId]
FROM
(SELECT Id, Name FROM #tb1
UNION
SELECT Id, Name FROM #tb2) as tb3) AS TB
WHERE [orderId] = 1
SELECT * FROM
(SELECT Id, Name, DENSE_RANK() OVER (ORDER BY Id DESC) AS [orderId]
FROM
(SELECT Id, Name FROM #tb1
UNION
SELECT Id, Name FROM #tb2) as tb3) AS TB
WHERE [orderId] = 1
Results are:

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight

How to select from duplicate rows from a table? - sql-server

Every row in that set of data is a duplicate select id, col1, count(*) from test group by id, col1 shows this if you want to exclude the 2,B rows you need to do it explicitly eg SELECT id, col1 FROM Test WHERE NOT (id = 2 and col1 = 'B')

SELECT t.* FROM TEST t INNER JOIN ( SELECT ID,COL1 from test GROUP BY ID,COL1 HAVING COUNT(*) > 1 ) AS t2 ON t2.ID = t.ID AND t2.COL1 =t.COL1 order by t.ID,t.COL1

Related

Replacing a single column with randomly selected values from another table

Deactivate duplicate data based on two group by columns with check not null

Unexpected result using CTE to perform a random join on two tables for all rows one-to-many

Two columns into one using alternate rows

How to find the maximum value in join without using if in sql stored procedure

Categories

Resources