Remove duplicate rows in SQL Server

I have a table with the following example format:
ID Name
1 NULL
1 NULL
2 HELLO
3 NULL
3 BYE
My goal is to remove rows that repeat the same ID, with one restriction: when an ID has both a NULL and a non-NULL Name, the NULL row is the one to remove.
In the example, I need to remove one of the two rows with ID 1, and the row with ID 3 whose Name is NULL.
The result should be:
ID Name
1 NULL
2 HELLO
3 BYE
How can I do this in SQL Server? Thank you.

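To just select the data, you can use a simple CTE (common table expression). SQL Server sorts NULL as the lowest value, so ORDER BY name DESC puts a non-NULL name first within each ID, and rn = 1 picks it whenever one exists: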
WITH cte AS (
SELECT id, name,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY name DESC) rn
FROM myTable
)
SELECT id,name FROM cte WHERE rn=1;
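If you want to try the query without an online fiddle, here is a minimal self-contained sketch. It loads the question's sample rows into a temporary table, `#myTable`, which is a hypothetical stand-in for `myTable`:

```sql
-- Hypothetical stand-in for myTable, loaded with the sample rows from the question
CREATE TABLE #myTable (id int NOT NULL, name varchar(20) NULL);
INSERT INTO #myTable (id, name)
VALUES (1, NULL), (1, NULL), (2, 'HELLO'), (3, NULL), (3, 'BYE');

-- NULL sorts lowest in SQL Server, so DESC puts a non-NULL name at rn = 1
WITH cte AS (
    SELECT id, name,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY name DESC) AS rn
    FROM #myTable
)
SELECT id, name
FROM cte
WHERE rn = 1
ORDER BY id;
-- Expected rows: (1, NULL), (2, 'HELLO'), (3, 'BYE')
```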
If you want to delete the duplicates from the table rather than just select the data, you can use the same CTE:
WITH cte AS (
SELECT id, name,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY name DESC) rn
FROM myTable
)
DELETE FROM cte WHERE rn<>1;
Remember to always back up your data before running destructive SQL statements from random people on the Internet.
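Along the same cautious lines, you can run the DELETE inside an explicit transaction and check how many rows it removed before committing. This is a sketch on a hypothetical `#myTable` stand-in. The expected count of 2 is specific to the sample data; in practice it would be the number you got from inspecting the duplicates first.

```sql
-- Hypothetical stand-in for myTable with the question's sample rows
CREATE TABLE #myTable (id int NOT NULL, name varchar(20) NULL);
INSERT INTO #myTable (id, name)
VALUES (1, NULL), (1, NULL), (2, 'HELLO'), (3, NULL), (3, 'BYE');

DECLARE @deleted int;

BEGIN TRANSACTION;

WITH cte AS (
    SELECT id, name,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY name DESC) AS rn
    FROM #myTable
)
DELETE FROM cte WHERE rn <> 1;

SET @deleted = @@ROWCOUNT;  -- capture immediately; the next statement resets @@ROWCOUNT

-- Keep the change only if it removed exactly the number of rows we expected
IF @deleted = 2
BEGIN
    COMMIT TRANSACTION;
END
ELSE
BEGIN
    ROLLBACK TRANSACTION;
END;

SELECT id, name FROM #myTable ORDER BY id;
```

Temporary tables take part in transactions, so a ROLLBACK here restores the deleted rows. Table variables would not.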

Related

Create a temporary table showing the most eventful country for each year in SQL Server

I have an exercise in SQL Server: I have two tables, Country and Events.
The Events table holds the event details, including the city where an event happens. The table Events has a foreign key CountryID (CountryID is the primary key in table Country).
I need to create a temporary table showing the most eventful country for each year.
Any help would be appreciated.
Thanks
You weren't far off with your attempt, but you need to use a CTE to aggregate your data first. I've assumed that the final order of your data is important, so I used a second CTE rather than TOP 1 WITH TIES to get the final result:
WITH CTE AS(
SELECT YEAR(e.EventDate) AS YearOfEvent,
c.CountryName,
COUNT(e.CountryID) AS NumberOfEvents
FROM [dbo].[tblEvent] AS e
INNER JOIN tblCountry AS c ON e.CountryID = c.CountryID
GROUP BY e.CountryId,
c.CountryName,
YEAR(e.EventDate)),
RNs AS(
SELECT YearOfEvent,
CountryName,
NumberOfEvents,
ROW_NUMBER() OVER (PARTITION BY YearOfEvent ORDER BY CTE.NumberOfEvents DESC) AS RN
FROM CTE)
SELECT YearOfEvent,
CountryName,
NumberOfEvents
FROM RNs
WHERE RN = 1
ORDER BY RNs.YearOfEvent ASC;
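Two points are worth noting. First, ROW_NUMBER returns exactly one country per year, so when two countries tie on the count, which one you get is arbitrary. Second, the question asks for a temporary table. The variant below is a sketch that swaps in RANK() so tied countries are all kept, and writes the result into `#MostEventful` with SELECT ... INTO. The `#tblCountry`/`#tblEvent` tables and their rows are hypothetical stand-ins containing only the columns the query uses.

```sql
-- Hypothetical stand-ins for tblCountry and tblEvent (only the columns the query needs)
CREATE TABLE #tblCountry (CountryID int PRIMARY KEY, CountryName varchar(50) NOT NULL);
CREATE TABLE #tblEvent (EventID int PRIMARY KEY, CountryID int NOT NULL, EventDate date NOT NULL);

INSERT INTO #tblCountry VALUES (1, 'France'), (2, 'Japan');
INSERT INTO #tblEvent VALUES
    (1, 1, '2020-03-01'), (2, 1, '2020-07-15'), (3, 2, '2020-05-05'),  -- 2020: France 2, Japan 1
    (4, 1, '2021-01-10'), (5, 2, '2021-02-20');                        -- 2021: tie, 1 each

WITH CTE AS (
    SELECT YEAR(e.EventDate) AS YearOfEvent,
           c.CountryName,
           COUNT(e.CountryID) AS NumberOfEvents
    FROM #tblEvent AS e
    INNER JOIN #tblCountry AS c ON e.CountryID = c.CountryID
    GROUP BY c.CountryID, c.CountryName, YEAR(e.EventDate)
),
Ranked AS (
    SELECT YearOfEvent, CountryName, NumberOfEvents,
           -- RANK gives tied countries the same rank, so all of them survive Rnk = 1
           RANK() OVER (PARTITION BY YearOfEvent ORDER BY NumberOfEvents DESC) AS Rnk
    FROM CTE
)
SELECT YearOfEvent, CountryName, NumberOfEvents
INTO #MostEventful          -- the temporary table the question asks for
FROM Ranked
WHERE Rnk = 1;

SELECT * FROM #MostEventful ORDER BY YearOfEvent, CountryName;
```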

How to find and delete all duplicates from SQL Server database

I'm new to SQL in general and I need to delete all duplicates in a given database.
For the moment, I am using this database to experiment with a few things.
The table currently looks like this:
I know I can find all duplicates using this query:
SELECT COUNT(*) AS NBR_DOUBLES, Name, Owner
FROM dbo.animals
GROUP BY Name, Owner
HAVING COUNT(*) > 1
but I am having a lot of trouble finding an up-to-date solution that not only finds all the duplicates but also deletes them, leaving only one of each.
Thanks a lot for taking the time to help me.
;WITH numbered AS (
SELECT ROW_NUMBER() OVER(PARTITION BY Name, Owner ORDER BY Name, Owner) AS _dupe_num
FROM dbo.Animals
)
DELETE FROM numbered WHERE _dupe_num > 1;
This deletes all but one occurrence of each Name & Owner combination. If you need it to be more specific, extend the PARTITION BY clause; if you want it to take the entire record into account, add all your columns.
Which record is left behind is currently arbitrary, since you don't seem to have any column to order on.
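To illustrate the "add all your columns" advice, here is a sketch on a hypothetical `#animals` table with only Name, Owner and Birth (the real dbo.animals may have more columns). Only rows that match on every partition column count as duplicates. `ORDER BY (SELECT NULL)` is a common T-SQL way of saying "any row of the group may be kept".

```sql
-- Hypothetical stand-in for dbo.animals; the real table's columns may differ
CREATE TABLE #animals (Name varchar(50), Owner varchar(50), Birth date NULL);
INSERT INTO #animals VALUES
    ('Gremlin', 'Max', '2015-04-01'),
    ('Gremlin', 'Max', '2015-04-01'),   -- exact duplicate of the row above
    ('Gremlin', 'Max', '2018-09-12');   -- same Name/Owner, different Birth

-- Partitioning on every column treats only fully identical rows as duplicates
WITH numbered AS (
    SELECT ROW_NUMBER() OVER (PARTITION BY Name, Owner, Birth
                              ORDER BY (SELECT NULL)) AS _dupe_num
    FROM #animals
)
DELETE FROM numbered WHERE _dupe_num > 1;

SELECT Name, Owner, Birth FROM #animals;  -- two Gremlin rows remain
```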
What you want to do is use a projection that numbers each record within a given duplicate set. You can do that with a windowing function, like this:
SELECT Name, Owner
,Row_Number() OVER ( PARTITION BY Name, Owner ORDER BY Name, Owner, Birth) AS RowNum
FROM dbo.animals
ORDER BY Name, Owner
This should give you results like this:
Name Owner RowNum
Ecstasy Sacha 1
Ecstasy Sacha 2
Ecstasy Sacha 3
Gremlin Max 1
Gremlin Max 2
Gremlin Max 3
Outch Max 1
Outch Max 2
Outch Max 3
Now you want to convert this to a DELETE statement that has a WHERE clause targeting rows with RowNum > 1. The way to use a windowing function with a DELETE is to first include the windowing function as part of a common table expression (CTE), like this:
WITH dupes AS
(
SELECT Name, Owner,
Row_Number() OVER ( PARTITION BY Name, Owner ORDER BY Name, Owner, Birth) AS RowNum
FROM dbo.animals
)
DELETE FROM dupes WHERE RowNum > 1;
This will delete the later duplicates but leave row #1 of each group intact. The only trick now is to make sure row #1 is the correct row, since not all of your duplicates have the same values in the Birth or Death columns. That is why I included the Birth column in the windowing function, while other answers (so far) have not. You need to decide whether to keep the oldest animal or the youngest: ordering by Birth ascending (the default) puts the earliest birth date, i.e. the oldest animal, at row #1, while Birth DESC keeps the youngest.
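A small sketch of the "keep the oldest" choice follows. It uses a hypothetical `#animals` table with made-up birth dates. Ordering by Name and Owner inside a partition on those same columns has no effect on the numbering, so the sketch orders by Birth alone.

```sql
-- Hypothetical sample: one Name/Owner pair with three different birth dates
CREATE TABLE #animals (Name varchar(50), Owner varchar(50), Birth date);
INSERT INTO #animals VALUES
    ('Outch', 'Max', '2012-06-01'),
    ('Outch', 'Max', '2016-02-14'),
    ('Outch', 'Max', '2019-11-30');

-- Birth ASC puts the oldest animal at RowNum = 1, so it is the row kept;
-- switch to Birth DESC to keep the youngest instead
WITH dupes AS (
    SELECT Name, Owner,
           ROW_NUMBER() OVER (PARTITION BY Name, Owner ORDER BY Birth ASC) AS RowNum
    FROM #animals
)
DELETE FROM dupes WHERE RowNum > 1;

SELECT Name, Owner, Birth FROM #animals;  -- only the 2012-06-01 row remains
```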
Use a CTE. Here is a sample:
Create table #Table1(Field1 varchar(100));
Insert into #Table1 values
('a'),('b'),('f'),('g'),('a'),('b');
Select * from #Table1;
WITH CTE AS(
SELECT Field1,
RN = ROW_NUMBER()OVER(PARTITION BY Field1 ORDER BY Field1)
FROM #Table1
)
--SELECT * FROM CTE WHERE RN > 1
DELETE FROM CTE WHERE RN > 1
What I am doing is numbering the rows. Rows that are duplicates on the PARTITION BY columns are numbered sequentially (1, 2, ...); a row without duplicates gets 1.
Then I delete the records whose number is greater than 1.
I won't spoon-feed you the solution, so you will have to adjust the PARTITION BY columns to get your output.
Output:
Select * from #Table1;
Field1
---------
a
b
f
g
a
b
/*with cte as (...) SELECT * FROM CTE;*/
Field1 RN
------- -----
a 1
a 2
b 1
b 2
f 1
g 1
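To see the end state, here is the same sample as one script that also shows the table after the DELETE. The temporary table and values are taken from the answer above.

```sql
CREATE TABLE #Table1 (Field1 varchar(100));
INSERT INTO #Table1 VALUES ('a'), ('b'), ('f'), ('g'), ('a'), ('b');

-- Number each value's occurrences; every copy after the first gets RN > 1
WITH CTE AS (
    SELECT Field1,
           RN = ROW_NUMBER() OVER (PARTITION BY Field1 ORDER BY Field1)
    FROM #Table1
)
DELETE FROM CTE WHERE RN > 1;

SELECT Field1 FROM #Table1 ORDER BY Field1;  -- a, b, f, g
```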
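If dbo.animals had an ID column, I believe you could use the query below. (NBR_DOUBLES is only a column alias in the query above, not a table.) It keeps the row with the lowest ID in each Name/Owner group and deletes all the others, however many copies a group has:
DELETE FROM dbo.animals WHERE ID NOT IN
(
SELECT MIN(ID)
FROM dbo.animals
GROUP BY Name, Owner
)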
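Here is a self-contained sketch of the ID-based approach. It assumes a hypothetical version of the table with an IDENTITY column named ID. Keeping MIN(ID) per group and deleting everything else removes all extra copies in a single statement.

```sql
-- Hypothetical version of dbo.animals that does have an ID column
CREATE TABLE #animals (ID int IDENTITY(1,1) PRIMARY KEY, Name varchar(50), Owner varchar(50));
INSERT INTO #animals (Name, Owner) VALUES
    ('Ecstasy', 'Sacha'), ('Ecstasy', 'Sacha'), ('Ecstasy', 'Sacha'),
    ('Gremlin', 'Max'),   ('Gremlin', 'Max');

-- Keep the lowest ID in each Name/Owner group and delete every other row
DELETE FROM #animals
WHERE ID NOT IN (SELECT MIN(ID) FROM #animals GROUP BY Name, Owner);

SELECT ID, Name, Owner FROM #animals;  -- one row per Name/Owner pair remains
```

NOT IN is safe here because ID is a non-nullable key. If the subquery could return NULL, NOT IN would match nothing and delete nothing.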

SSIS - Filter duplicate rows

I have a table (Id, ArticleCode, StoreCode, Adress, Number) that contains duplicate entries based only on the columns [ArticleCode, StoreCode].
Currently I can filter duplicate rows using the Aggregate transformation, but the output rows only contain the two columns [ArticleCode, StoreCode], and I need the other columns as well.
In the OLE DB Source component, set the data access mode to SQL command instead of Table or view, and use the following query as the source:
SELECT [ID]
,[ArticleCode]
,[StoreCode]
,[Address]
,[Number] FROM (
SELECT [ID]
,[ArticleCode]
,[StoreCode]
,[Address]
,[Number]
,ROW_NUMBER() OVER(PARTITION BY [ArticleCode]
,[StoreCode] ORDER BY [ArticleCode]
,[StoreCode]) AS ROWNUM
FROM [dbo].[Table_1]) AS T1
WHERE T1.ROWNUM = 1
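Ordering by the partition columns themselves leaves the choice of surviving row arbitrary. The sketch below is the same source query with ORDER BY ID, so the first-entered row of each pair is returned every time. It runs against a hypothetical `#Table_1` whose column names follow the question's list (including the spelling Adress), with made-up rows.

```sql
-- Hypothetical stand-in for dbo.Table_1 with two rows for the same article/store pair
CREATE TABLE #Table_1 (ID int PRIMARY KEY, ArticleCode varchar(10), StoreCode varchar(10),
                       Adress varchar(50), Number int);
INSERT INTO #Table_1 VALUES
    (1, 'A100', 'S1', '1 Main St', 5),
    (2, 'A100', 'S1', '9 Side St', 7),   -- same pair, different details
    (3, 'B200', 'S1', '1 Main St', 2);

-- ORDER BY ID makes the lowest ID of each pair the row that is returned
SELECT ID, ArticleCode, StoreCode, Adress, Number
FROM (
    SELECT ID, ArticleCode, StoreCode, Adress, Number,
           ROW_NUMBER() OVER (PARTITION BY ArticleCode, StoreCode ORDER BY ID) AS ROWNUM
    FROM #Table_1
) AS T1
WHERE T1.ROWNUM = 1;
```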
To get rid of duplicates and select unique records by [ArticleCode, StoreCode]:
select top 1 with ties
Id ,
ArticleCode ,
StoreCode ,
Adress ,
Number
from
YourTable
order by
row_number() over(partition by ArticleCode, StoreCode order by Id)
But which of the two records should be selected when [ArticleCode, StoreCode] are equal and [Adress, Number] differ?
If Id is an auto-increment (IDENTITY) column, order by Id selects the first record entered, and order by Id desc selects the last.
You have to define somehow which [Adress, Number] pair among the duplicates is the correct one to select.
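Here is a sketch of the "last entered" choice with TOP 1 WITH TIES. It uses a hypothetical `#YourTable` with an IDENTITY Id. The rows are inserted in separate statements so that the later row is certain to get the higher Id. WITH TIES returns every row whose row number is 1, which means one row per [ArticleCode, StoreCode] pair.

```sql
-- Hypothetical stand-in for YourTable with an auto-increment Id
CREATE TABLE #YourTable (Id int IDENTITY(1,1) PRIMARY KEY, ArticleCode varchar(10),
                         StoreCode varchar(10), Adress varchar(50), Number int);
INSERT INTO #YourTable (ArticleCode, StoreCode, Adress, Number) VALUES ('A100', 'S1', '1 Main St', 5);
INSERT INTO #YourTable (ArticleCode, StoreCode, Adress, Number) VALUES ('A100', 'S1', '9 Side St', 7);  -- entered later
INSERT INTO #YourTable (ArticleCode, StoreCode, Adress, Number) VALUES ('B200', 'S1', '1 Main St', 2);

-- Id DESC numbers the most recently entered row of each pair as 1
SELECT TOP 1 WITH TIES Id, ArticleCode, StoreCode, Adress, Number
FROM #YourTable
ORDER BY ROW_NUMBER() OVER (PARTITION BY ArticleCode, StoreCode ORDER BY Id DESC);
```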

SQL Server: select all duplicate rows where col1+col2 exists more than once

I have a table with around 300,000 rows. 225 rows have been added to this table daily from March 16, 2015 to July 9, 2015.
My problem is that for the last week or so, some duplicate rows have been entered into the table (i.e. more than 225 per day).
Now I want to select (and ultimately delete!) all the duplicate rows, i.e. rows whose siteID + reportID combination occurs more than once on the same Date.
An example is attached in the screenshot:
When ROW_NUMBER() is used with a PARTITION BY clause, it lets you number the duplicate rows in a table so that you can select or delete them.
Please check the SQL tutorial on how to delete duplicate rows in an SQL table.
The query below is taken from that article and adapted to your requirement. [Date] has to be part of the PARTITION BY; otherwise every row after the earliest one for each siteID/ReportID pair would be deleted, including the legitimate rows from other days:
;WITH DUPLICATES AS
(
SELECT *,
RN = ROW_NUMBER() OVER (PARTITION BY siteID, ReportID, [Date] ORDER BY (SELECT NULL))
FROM myTable
)
DELETE FROM DUPLICATES WHERE RN > 1
I hope it helps.
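Here is a small check of date-aware partitioning on a hypothetical `#myTable` that holds only the three columns involved, with made-up rows. With [Date] in the PARTITION BY, only the repeated 2015-07-09 row is removed. Partitioning by siteID and ReportID alone would instead delete both 2015-07-09 rows and keep only the 2015-07-08 one.

```sql
-- Hypothetical stand-in for myTable with only the columns that define a duplicate
CREATE TABLE #myTable (siteID int, ReportID int, [Date] date);
INSERT INTO #myTable VALUES
    (1, 10, '2015-07-08'),
    (1, 10, '2015-07-09'),
    (1, 10, '2015-07-09');   -- the only real duplicate: same site, report and day

WITH DUPLICATES AS (
    SELECT *,
           RN = ROW_NUMBER() OVER (PARTITION BY siteID, ReportID, [Date] ORDER BY (SELECT NULL))
    FROM #myTable
)
DELETE FROM DUPLICATES WHERE RN > 1;

SELECT siteID, ReportID, [Date] FROM #myTable ORDER BY [Date];  -- one row per day remains
```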
When you want to filter duplicated rows, I suggest this type of query:
SELECT *
FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY Col1, Col2 ORDER BY Col3) As seq
FROM yourTable) dt
WHERE (seq > 1)
Applied to your table (assuming it has an ID column to order by), it looks like this:
SELECT *
FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY siteID, reportID, [Date] ORDER BY ID) As seq
FROM yourTable) dt
WHERE (seq > 1)

How to delete duplicates from a table, keeping only the one with the highest id, in SQL Server?

I have a table with a unique ID and some other fields. I would like to delete all the duplicate rows and keep only one: the one with the highest ID.
For example, assume a table with 3 fields: RECORD_ID, FIELD_ONE, FIELD_TWO.
Which query deletes all records that have the same values for FIELD_ONE and FIELD_TWO, except the one with the highest RECORD_ID?
Found:
with cte
as
(
select *, row_number() over(partition by FIELD_ONE, FIELD_TWO order by RECORD_ID desc) RowNumber
from TestTable
)
delete cte
where RowNumber > 1
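Here is a self-contained run of that query on a hypothetical `#TestTable` with made-up rows. ORDER BY RECORD_ID DESC gives the highest RECORD_ID of each (FIELD_ONE, FIELD_TWO) group row number 1, so that is the row that survives.

```sql
-- Hypothetical stand-in for TestTable
CREATE TABLE #TestTable (RECORD_ID int PRIMARY KEY, FIELD_ONE varchar(10), FIELD_TWO varchar(10));
INSERT INTO #TestTable VALUES
    (1, 'x', 'y'), (2, 'x', 'y'), (3, 'x', 'y'),
    (4, 'x', 'z');

-- RECORD_ID DESC numbers the highest id as 1 in each group, so it is kept
WITH cte AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY FIELD_ONE, FIELD_TWO ORDER BY RECORD_ID DESC) AS RowNumber
    FROM #TestTable
)
DELETE cte
WHERE RowNumber > 1;

SELECT RECORD_ID, FIELD_ONE, FIELD_TWO FROM #TestTable ORDER BY RECORD_ID;  -- rows 3 and 4 remain
```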

Resources