How to write a SQL script that deletes duplicate posts

How to write a SQL script that deletes duplicate posts - sql-server

I have a table with these columns:
id (pk, int identity), imei (varchar), name (varchar), lastconnected (datetime)
Some of the entries in this table have the same name and imei, but different id and different lastconnected date.
How can I effectively filter out all entries that have duplicates (with a SQL script), and then delete the one with the latest lastconnected date?

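A simple ROW_NUMBER and DELETE should do the trick. The COUNT(*) column restricts the delete to groups that actually contain duplicates; without it, RN = 1 would also match rows that have no duplicate at all:
WITH CTE AS
(
SELECT *,
RN = ROW_NUMBER() OVER(PARTITION BY imei, [name] ORDER BY lastconnected DESC),
Cnt = COUNT(*) OVER(PARTITION BY imei, [name])
FROM dbo.YourTable
)
DELETE FROM CTE
WHERE RN = 1 AND Cnt > 1;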
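Before running a destructive DELETE, it can help to preview the rows it would remove. The following is a minimal sketch, not the answerer's code: the table variable @Devices and its sample rows are hypothetical stand-ins for dbo.YourTable, and the Cnt column is an assumption based on the question's wish to touch only entries that have duplicates.

```sql
-- Hypothetical sample data standing in for dbo.YourTable.
DECLARE @Devices TABLE
(
    id            INT PRIMARY KEY,
    imei          VARCHAR(20),
    name          VARCHAR(50),
    lastconnected DATETIME
);

INSERT INTO @Devices (id, imei, name, lastconnected)
VALUES (1, '111', 'phone-a', '20190101'),
       (2, '111', 'phone-a', '20190301'),  -- latest row of a duplicated pair
       (3, '222', 'phone-b', '20190201');  -- no duplicate, should be left alone

WITH CTE AS
(
    SELECT *,
           RN  = ROW_NUMBER() OVER (PARTITION BY imei, name ORDER BY lastconnected DESC),
           Cnt = COUNT(*)     OVER (PARTITION BY imei, name)
    FROM @Devices
)
-- Lists the latest row of every group that has more than one member.
SELECT id, imei, name, lastconnected
FROM CTE
WHERE RN = 1 AND Cnt > 1;
```

With this sample data only the row with id 2 is listed. Once the list looks right, the same CTE can feed a DELETE instead of the SELECT.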

This loop repeatedly deletes the row with the highest id in each duplicated (name, imei) group until no duplicates remain, so only the lowest id per group survives. The sample table has no lastconnected column, so id stands in for recency here:
DECLARE @table TABLE
(
id int,
name varchar(10),
imei varchar(10)
)
insert into @table select 1, 'a','a'
insert into @table select 2, 'b','a'
insert into @table select 3, 'c','a'
insert into @table select 4, 'a','a'
insert into @table select 5, 'c','a'
insert into @table select 6, 'a','a'
insert into @table select 7, 'c','a'
insert into @table select 8, 'a','a'
WHILE (exists (select '' from @table group by name , imei having count(*) > 1))
BEGIN
delete from @table where id in (
select max(id) from @table group by imei , name having count(*) > 1)
END
select * from @table

My first instinct is to use RANK(). This will delete all duplicates, not just the most recent, in cases where things are duplicated multiple times.
delete a
from (
select id, imei, name, lastconnected, RANK() over(partition by imei, name order by lastconnected) as [rank] from #temp
) as a
where a.rank>1
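One thing to watch with RANK() is ties: rows in the same group with an identical lastconnected value share a rank, so a filter on rank > 1 leaves all of the tied rows in place. The sketch below compares RANK() with ROW_NUMBER() plus an id tie-breaker; the table variable @t and its rows are hypothetical sample data.

```sql
-- Hypothetical sample: ids 1 and 2 share the same lastconnected value.
DECLARE @t TABLE
(
    id            INT PRIMARY KEY,
    imei          VARCHAR(20),
    name          VARCHAR(50),
    lastconnected DATETIME
);

INSERT INTO @t (id, imei, name, lastconnected)
VALUES (1, '111', 'phone-a', '20190101'),
       (2, '111', 'phone-a', '20190101'),
       (3, '111', 'phone-a', '20190301');

SELECT id,
       lastconnected,
       -- Tied rows receive the same rank.
       RANK()       OVER (PARTITION BY imei, name ORDER BY lastconnected)     AS rnk,
       -- Adding id to the ordering makes the numbering unique within the group.
       ROW_NUMBER() OVER (PARTITION BY imei, name ORDER BY lastconnected, id) AS rn
FROM @t
ORDER BY id;
```

Here rnk comes out as 1, 1, 3 and rn as 1, 2, 3, so deleting where rnk > 1 would keep both tied rows, while deleting where rn > 1 would keep only id 1.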

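This selects the latest lastconnected for each combination of name and imei that occurs more than once, and then deletes that row. SQL Server does not accept a multi-column (a, b, c) IN (...) comparison, so the match is written as a join:
DELETE t
FROM yourtable AS t
INNER JOIN (
SELECT name, imei, MAX(lastconnected) AS lastconnected
FROM yourtable
GROUP BY name, imei
HAVING COUNT(*) > 1
) AS d
ON d.name = t.name AND d.imei = t.imei AND d.lastconnected = t.lastconnected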

Related

T-SQL - Customer Linking

Please run the below code, these are all the same Customer because 2 of them have the same TaxNumber while another one matches one based on CompanyName. I need to link them all and set the ParentCompanyID based on who was created first. I am struggling to get them linked.
CREATE TABLE #Temp
(
CustomerID INT,
CustomerName VARCHAR(20),
CustomerTaxNumber INT,
CreatedDate DATE
)
INSERT INTO #Temp
VALUES (8, 'Company PTY',1234, '2019-09-20'),
(2, 'Company PT', 1234, '2019-09-24'),
(3, 'Company PTY',NULL, '2019-09-29')
SELECT * FROM #Temp
Below is the result that I require....
Any help will be appreciated.

Using case expression with first_value can give you the desired results:
SELECT CustomerID, CustomerName, CustomerTaxNumber, CreatedDate,
CASE WHEN CustomerTaxNumber IS NULL THEN
FIRST_VALUE(CustomerID) OVER(PARTITION BY CustomerName ORDER BY CreatedDate)
ELSE
FIRST_VALUE(CustomerID) OVER(PARTITION BY CustomerTaxNumber ORDER BY CreatedDate)
END As ParentCompanyID
FROM #Temp

Try this:
CREATE TABLE #Temp
(
CustomerID INT,
CustomerName VARCHAR(20),
CustomerTaxNumber INT,
CreatedDate DATE
)
INSERT INTO #Temp
VALUES (8, 'Company PTY',1234, '2019-09-20'),
(2, 'Company PT', 1234, '2019-09-24'),
(3, 'Company PTY',NULL, '2019-09-29')
SELECT DS.[CreatedDate] AS [FirstEntry]
,DS.[CustomerID] AS [ParentCompanyID]
,#Temp.*
FROM #Temp
CROSS APPLY
(
SELECT TOP 1 *
FROM #Temp
ORDER BY CreatedDate
) DS
DROP TABLE #Temp
Your condition is pretty simple: get the first record. If you need to group the records in some way, you can add additional filtering in the CROSS APPLY clause.
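As one possible reading of "additional filtering", the inner query can be correlated to the outer row so that only matching customers are considered. This is a sketch under assumptions: the temp table name #Customers is hypothetical, and the matching rule (same tax number or same name) is inferred from the question rather than confirmed by it.

```sql
-- Hypothetical copy of the question's sample data.
CREATE TABLE #Customers
(
    CustomerID        INT,
    CustomerName      VARCHAR(20),
    CustomerTaxNumber INT,
    CreatedDate       DATE
);

INSERT INTO #Customers (CustomerID, CustomerName, CustomerTaxNumber, CreatedDate)
VALUES (8, 'Company PTY', 1234, '20190920'),
       (2, 'Company PT',  1234, '20190924'),
       (3, 'Company PTY', NULL, '20190929');

SELECT c.CustomerID,
       c.CustomerName,
       c.CustomerTaxNumber,
       c.CreatedDate,
       p.CustomerID AS ParentCompanyID
FROM #Customers AS c
CROSS APPLY
(
    -- Earliest customer that shares a tax number or a name with the outer row.
    -- Every row matches itself by name, so the APPLY always returns a row.
    SELECT TOP (1) m.CustomerID
    FROM #Customers AS m
    WHERE m.CustomerTaxNumber = c.CustomerTaxNumber
       OR m.CustomerName = c.CustomerName
    ORDER BY m.CreatedDate
) AS p;

DROP TABLE #Customers;
```

For the three sample rows this gives ParentCompanyID 8 each time. It only follows direct matches; chains where two rows are linked solely through a third row would need a recursive approach.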

What is the optimal way to get only latest ID's from table in SQL

I'm trying to get only a single row per Appointment Number in a table storing a history of appointments. It works fine with a few rows but then gets slower? Is this the best way to do this kind of check and I'm just missing some indexes or is there a better way?
DECLARE @temptable TABLE
(
id INT PRIMARY KEY NOT NULL
, ApptNumber INT NOT NULL
, ApptDate DATE NOT NULL
, Notes VARCHAR(50) NULL
)
INSERT INTO @temptable VALUES (1,1,'01-DEC-2018','First Appointment')
INSERT INTO @temptable VALUES (2,1,'01-DEC-2018','')
INSERT INTO @temptable VALUES (3,1,'01-DEC-2018','Rescheduled')
INSERT INTO @temptable VALUES (4,2,'02-DEC-2018','Second Appointment')
INSERT INTO @temptable VALUES (5,2,'02-DEC-2018','Cancelled')
INSERT INTO @temptable VALUES (6,3,'03-DEC-2018','Third Appointment')
INSERT INTO @temptable VALUES (7,4,'04-DEC-2018','Fourth Appointment')
SELECT * FROM @temptable
SELECT MAX(id) FROM @temptable GROUP BY ApptNumber
SELECT tt.* FROM @temptable tt
INNER JOIN (SELECT MAX(id) [Id] FROM @temptable GROUP BY ApptNumber) appts ON appts.Id = tt.id

Solution 1:
select * from (
SELECT f1.*, row_number() over(partition by ApptNumber order by id desc ) rang FROM @temptable f1
) tmp where rang=1

Solution 2:
with tmp as (
select ApptNumber, max(ID) MaxID
from @temptable
group by ApptNumber
)
select f1.* from @temptable f1 inner join tmp f2 on f1.ID=f2.MaxID

Solution 3:
select distinct f3.* from @temptable f1
cross apply
(
select top 1 * from @temptable f2
where f1.ApptNumber=f2.ApptNumber
order by f2.ID desc
) f3

Window function
SELECT tt.*
FROM (
SELECT *, row_number() over (partition by ApptNumber order by id desc) as rn
FROM @temptable
) tt
where tt.rn = 1
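On the indexing part of the question: an index keyed on ApptNumber and then id may let the engine find the latest id per appointment without scanning and sorting the whole table. The sketch below is an assumption, not a measured result; it uses a temp table with the hypothetical name #ApptHistory, and whether the optimizer actually uses the index depends on data volume and version, so check the execution plan.

```sql
-- Hypothetical temp table mirroring the question's table variable.
CREATE TABLE #ApptHistory
(
    id         INT PRIMARY KEY NOT NULL,
    ApptNumber INT NOT NULL,
    ApptDate   DATE NOT NULL,
    Notes      VARCHAR(50) NULL
);

-- Candidate supporting index: groups rows by appointment, newest id first.
CREATE NONCLUSTERED INDEX IX_ApptHistory_ApptNumber_id
    ON #ApptHistory (ApptNumber, id DESC);

INSERT INTO #ApptHistory (id, ApptNumber, ApptDate, Notes)
VALUES (1, 1, '20181201', 'First Appointment'),
       (2, 1, '20181201', ''),
       (3, 1, '20181201', 'Rescheduled'),
       (4, 2, '20181202', 'Second Appointment'),
       (5, 2, '20181202', 'Cancelled');

-- Same "latest row per appointment" query as in the answers above.
SELECT tt.id, tt.ApptNumber, tt.ApptDate, tt.Notes
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY ApptNumber ORDER BY id DESC) AS rn
    FROM #ApptHistory
) AS tt
WHERE tt.rn = 1;

DROP TABLE #ApptHistory;
```

With this sample data the query returns the rows with id 3 and id 5.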

Filling the ID column of a table NOT using a cursor

Tables have been created and used without an ID column, but an ID column is now needed. (classic)
I heard everything could be done without cursors. I just need every row to contain a different int value, so I was looking for some kind of row number function:
How do I use ROW_NUMBER()?
I can't tell exactly how to use it even with these examples.
UPDATE [TableA]
SET [id] = (select ROW_NUMBER() over (order by id) from [TableA])
Subquery returned more than 1 value.
So... yes, of course it returns more than one value. Then how do I combine the update and the row number to get that column filled?
PS. I don't need a precise order, just unique values. I also wonder if ROW_NUMBER() is appropriate in this situation...

You can use a CTE for the update
Example
Declare @TableA table (ID int,SomeCol varchar(50))
Insert Into @TableA values
(null,'Dog')
,(null,'Cat')
,(null,'Monkey')
;with cte as (
Select *
,RN = Row_Number() over(Order by (Select null))
From @TableA
)
Update cte set ID=RN
Select * from @TableA
Updated Table
ID SomeCol
1 Dog
2 Cat
3 Monkey

You can use a subquery too. Note that the join below relies on SomeCol being unique; if two rows share a value, they match each other's row numbers:
Declare @TableA table (ID int,SomeCol varchar(50))
Insert Into @TableA values
(null,'Dog')
,(null,'Cat')
,(null,'Monkey');
UPDATE T1
SET T1.ID = T2.RN
FROM @TableA T1 JOIN
(
SELECT ROW_NUMBER()OVER(ORDER BY (SELECT 1)) RN,
*
FROM @TableA
) T2
ON T1.SomeCol = T2.SomeCol;
Select * from @TableA

Select Duplicate Records

I want to retrieve only Duplicated records not unique records.
Suppose I have data which consists of as below
Ids Names
1 A
2 B
1 A
I want output like the following:
Sno Id Name
1 1 A
2 1 A

Try this:
DECLARE @DataSource TABLE
(
[ID] INT
,[name] CHAR(1)
,[value] CHAR(2)
);
INSERT INTO @DataSource ([ID], [name], [value])
VALUES (1, 'A', 'x1')
,(2, 'B', 'x2')
,(1, 'A', 'x3');
WITH DataSource AS
(
SELECT *
,COUNT(*) OVER (PARTITION BY [ID], [name]) AS [Count]
FROM @DataSource
)
SELECT *
FROM DataSource
WHERE [Count] > 1;
The grouping is done in the PARTITION BY part of the window function. So, basically, we are counting records for each unique ID - name pair. Of course, you can add more columns there.

SELECT Id, Names
FROM T
GROUP BY Id, Names
HAVING COUNT(*) > 1
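A GROUP BY with HAVING returns one row per duplicated combination rather than every duplicated row. To get each copy, plus the Sno column from the question, the grouped result can be joined back to the table. A minimal sketch, using a hypothetical table variable @T in place of T:

```sql
-- Hypothetical sample data matching the question.
DECLARE @T TABLE (Id INT, Names VARCHAR(10));

INSERT INTO @T (Id, Names)
VALUES (1, 'A'),
       (2, 'B'),
       (1, 'A');

SELECT ROW_NUMBER() OVER (ORDER BY t.Id, t.Names) AS Sno,
       t.Id,
       t.Names
FROM @T AS t
INNER JOIN
(
    -- One row per (Id, Names) combination that occurs more than once.
    SELECT Id, Names
    FROM @T
    GROUP BY Id, Names
    HAVING COUNT(*) > 1
) AS d
    ON d.Id = t.Id AND d.Names = t.Names;
```

For the sample data this returns two rows, (1, 1, A) and (2, 1, A), which matches the output asked for.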

As in your request, you need to create a new column [SNo] with ROW_NUMBER(), partitioned on the original columns (Names, Id). Rows with [SNo] > 1 are the repeated copies. To list every row that belongs to a duplicated group, filter on RCount > 1 instead.
See a mockup below:
DECLARE @Records TABLE (Id int, Names VARCHAR(10))
INSERT INTO @Records
SELECT 1, 'A' UNION ALL
SELECT 2, 'B' UNION ALL
SELECT 1, 'A'
----To Get Duplicates -----
SELECT *
FROM
(
SELECT
SNo=ROW_NUMBER()over(PARTITION BY Names,Id order by Id),
RCount=COUNT(*) OVER (PARTITION BY [ID], Names),
*
FROM
@Records
)M
WHERE
RCount>1

SQL: How to keep inserting record number in the target table in INSERT INTO statement

I have query like this:
declare @guidd nvarchar(10)
set @guidd = '11233'
create table rrr_temp(value nvarchar(10), value2 int)
create table rrr_tempA(valueA nvarchar(10), guidd nvarchar(10), ranks int)
insert into rrr_temp values('AAA', 200)
insert into rrr_temp values ('BBB', 400)
insert into rrr_temp values ('CCC', 300)
INSERT INTO rrr_tempA(valueA , guidd , ranks )
SELECT RT.value, @guidd , row_number() over (order by (select NULL))
FROM rrr_temp(nolock) RT
INNER JOIN
(SELECT value, min(value2) AS lastLeg
FROM rrr_temp(nolock) RTL
GROUP BY value) GrpRoute
ON RT.value = GrpRoute.value
ORDER BY value2
select * from rrr_tempA
With the above INSERT INTO statement, the 'ranks' column of the target table only receives the record number within the source table (rrr_temp), produced by 'row_number() over (order by (select NULL))'. But I want the number to keep incrementing across inserts into the target table. I cannot use IDENTITY. Thanks.

Are you asking about something like this?
declare @max_rank int
select @max_rank = max(ranks)
from rrr_tempA
set @max_rank = IsNull(@max_rank, 0)
INSERT INTO rrr_tempA(valueA , guidd , ranks )
SELECT RT.value, @guidd , @max_rank + row_number() over (order by (select NULL))
-- FROM clause, join and ORDER BY as in the original query
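The idea above can be tried end to end with a small script. This is a sketch under assumptions: the temp tables #src and #dst are hypothetical stand-ins for rrr_temp and rrr_tempA, the question's self-join is left out for brevity, and the window is ordered by value2 so that the numbering follows that column instead of relying on the outer ORDER BY.

```sql
DECLARE @guidd NVARCHAR(10) = '11233';
DECLARE @max_rank INT;

-- Hypothetical stand-ins for rrr_temp and rrr_tempA.
CREATE TABLE #src (value NVARCHAR(10), value2 INT);
CREATE TABLE #dst (valueA NVARCHAR(10), guidd NVARCHAR(10), ranks INT);

INSERT INTO #src (value, value2)
VALUES ('AAA', 200), ('BBB', 400), ('CCC', 300);

-- First batch: the target is empty, so numbering starts at 1.
SELECT @max_rank = ISNULL(MAX(ranks), 0) FROM #dst;
INSERT INTO #dst (valueA, guidd, ranks)
SELECT value, @guidd, @max_rank + ROW_NUMBER() OVER (ORDER BY value2)
FROM #src;

-- Second batch: numbering continues after the current maximum.
SELECT @max_rank = ISNULL(MAX(ranks), 0) FROM #dst;
INSERT INTO #dst (valueA, guidd, ranks)
SELECT value, @guidd, @max_rank + ROW_NUMBER() OVER (ORDER BY value2)
FROM #src;

SELECT valueA, guidd, ranks FROM #dst ORDER BY ranks;

DROP TABLE #src;
DROP TABLE #dst;
```

After both batches the ranks column holds 1 through 6. If several sessions can insert at the same time, reading the maximum and inserting would need to happen under a suitable lock or transaction to avoid duplicate numbers.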
