SQL Server Remove Duplicates - sql-server

I have a table that tracks employees and the days they have spent in a policy. I don't generate this data; it is dumped to our server daily.
The table looks like this:
My goal is to get rid of the duplicates by keeping only the most recent date.
In this example, if I run the query, I would like it to keep rows 11 for Nicholas Morris and 14 for Tiana Sullivan.
Assumption: the first name and last name combination is unique.
So far, this is what I have been doing:
select e.*
from Employees e
join (select FirstName, LastName
from Employees
group by FirstName, LastName
having count(*) > 1) d
on d.FirstName = e.FirstName and d.LastName = e.LastName
This returns the rows that have duplicates, and I have to search them manually and remove the ones I don't want to keep.
I am sure there is a better way of doing this.
Thanks for your help.

You can use a CTE and ROW_NUMBER() function to do it.
The query to get the data is:
SELECT ID, FirstName, LastName, ROW_NUMBER()
OVER (PARTITION BY FirstName, LastName ORDER BY DaysInPolicy DESC) AS Identifier
FROM
Employees
The query to remove duplicates is:
;WITH CTE AS (
SELECT ID, ROW_NUMBER()
OVER (PARTITION BY FirstName, LastName ORDER BY DaysInPolicy DESC) AS Identifier
FROM
Employees
)
DELETE E
FROM
Employees E
INNER JOIN CTE C ON C.ID = E.ID
WHERE
C.Identifier > 1
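For anyone who wants to try the ROW_NUMBER() approach before touching the real table, here is a minimal sketch. It runs against an in-memory SQLite database through Python's sqlite3 module, not against SQL Server, so treat it as an illustration only: it needs SQLite 3.25 or later for window functions, it filters with IN instead of the T-SQL DELETE ... JOIN form, and the sample rows (including the employee Ava Reed and all DaysInPolicy values) are invented.

```python
import sqlite3


def remove_duplicates(conn):
    """Delete every row except the one with the highest DaysInPolicy per name."""
    conn.execute("""
        WITH ranked AS (
            SELECT ID,
                   ROW_NUMBER() OVER (
                       PARTITION BY FirstName, LastName
                       ORDER BY DaysInPolicy DESC
                   ) AS Identifier
            FROM Employees
        )
        DELETE FROM Employees
        WHERE ID IN (SELECT ID FROM ranked WHERE Identifier > 1)
    """)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees (ID INTEGER PRIMARY KEY, FirstName TEXT, "
    "LastName TEXT, DaysInPolicy INTEGER)"
)
# Invented sample data: two duplicated employees and one unique employee.
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?, ?)",
    [
        (10, "Nicholas", "Morris", 3),
        (11, "Nicholas", "Morris", 7),
        (12, "Ava", "Reed", 5),
        (13, "Tiana", "Sullivan", 1),
        (14, "Tiana", "Sullivan", 9),
    ],
)
remove_duplicates(conn)
remaining = [row[0] for row in conn.execute("SELECT ID FROM Employees ORDER BY ID")]
print(remaining)  # [11, 12, 14]
```

Because ROW_NUMBER() always assigns exactly one row the number 1 per partition, this keeps a single row per name even when two rows share the same DaysInPolicy value; which of the tied rows survives is not defined unless you add a tie-breaker to the ORDER BY.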

You could delete using an EXISTS operator, removing any row for which another row has the same first and last name but a newer date. Note that SQL Server requires the alias to be named right after DELETE:
DELETE e1
FROM employees e1
WHERE EXISTS (SELECT *
FROM employees e2
WHERE e1.FirstName = e2.FirstName AND
e1.LastName = e2.LastName AND
e1.DaysInPolicy < e2.DaysInPolicy)
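One thing worth checking with the EXISTS approach is how it treats ties. The sketch below is a rough illustration using an in-memory SQLite database from Python (an assumption for testability; the syntax differs slightly from T-SQL, and the rows are invented). It suggests that when two rows share the newest value, both survive, because neither has a strictly newer sibling.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees (ID INTEGER PRIMARY KEY, FirstName TEXT, "
    "LastName TEXT, DaysInPolicy INTEGER)"
)
# Invented sample data.
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?, ?)",
    [
        (1, "Ann", "Lee", 5),
        (2, "Ann", "Lee", 9),
        (3, "Ann", "Lee", 9),  # ties with ID 2 for the newest value
        (4, "Bob", "Kim", 2),
    ],
)
# Delete a row only if a strictly newer row exists for the same name.
conn.execute("""
    DELETE FROM Employees
    WHERE EXISTS (SELECT 1
                  FROM Employees e2
                  WHERE e2.FirstName = Employees.FirstName
                    AND e2.LastName = Employees.LastName
                    AND e2.DaysInPolicy > Employees.DaysInPolicy)
""")
remaining_after_exists = [
    row[0] for row in conn.execute("SELECT ID FROM Employees ORDER BY ID")
]
print(remaining_after_exists)  # [2, 3, 4]
```

If the feed can deliver two rows with the same value for one person, a second comparison on ID would be needed to break the tie.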

Try this. It lists the duplicate rows, that is, every row except the most recent one per name:
SELECT * FROM
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY LastName, FirstName ORDER BY DaysInPolicy DESC) AS RowNum
FROM Employees
) AS Emp
WHERE Emp.RowNum > 1

Related

Error in SQLServer: Subquery returned more than 1 value

I would like to insert data into the Clients table from two different tables (Surname and Name). I would also like a third column (email) that is a concatenation of the first two. When I try the code below, it gives me the following error: "Subquery returned more than 1 value".
insert into CLIENTS (LastName,Firstname, EMAIL)
select (select top 150 Surname from Surname order by NEWID()),
(select top 150 Name from Name order by Newid()),
(select concat(concat(Firstname, LastName),'@novaims.com') from clients);
Could you please help me understand where is the problem?
The error message says it all: your subqueries can return more than one row. Try this:
;WITH cte
AS (SELECT 1 AS val
UNION ALL
SELECT val + 1
FROM cte
WHERE val < 150)
SELECT FirstName,
LastName,
Concat(FirstName, LastName, '@novaims.com')
FROM cte
OUTER apply (SELECT TOP 1 Surname FROM Surname ORDER BY Newid()) s (LastName)
OUTER apply (SELECT TOP 1 NAME FROM NAME ORDER BY Newid()) n (FirstName)
Option (Maxrecursion 0)
You need to move the table references to the from clause. I think this does what you want:
insert into CLIENTS (LastName, Firstname, EMAIL)
select s.surname, n.name, concat(n.name, s.surname, '@novaims.com')
from (select Surname, row_number() over (order by newid()) as seqnum
from Surname
) s join
(select Name, row_number() over (order by newid()) as seqnum
from Name
) n
on n.seqnum = s.seqnum;
Another method uses apply:
insert into CLIENTS (LastName, Firstname, EMAIL)
select top 150 s.surname, n.name, concat(n.name, s.surname, '@novaims.com')
from surname s cross apply
(select top 1 n.*
from Name n
order by newid()
) n
order by newid();
This is more similar to your original idea. Do note, though, that the same name can appear more than once. And the performance should be better for the first version (because the sort is only happening once on each table).
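To see the row-number pairing idea in isolation, here is a small sketch. It uses an in-memory SQLite database from Python rather than SQL Server, so random() stands in for NEWID() and || for CONCAT; the three surnames and names are invented, and the email suffix is written with an @ on the assumption that this is what was intended.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Surname (Surname TEXT);
    CREATE TABLE Name (Name TEXT);
    CREATE TABLE Clients (LastName TEXT, FirstName TEXT, Email TEXT);
    -- Invented sample data.
    INSERT INTO Surname VALUES ('Silva'), ('Costa'), ('Santos');
    INSERT INTO Name VALUES ('Ana'), ('Rui'), ('Eva');
""")
# Number each table in random order, then pair rows with equal numbers.
conn.execute("""
    INSERT INTO Clients (LastName, FirstName, Email)
    SELECT s.Surname, n.Name, n.Name || s.Surname || '@novaims.com'
    FROM (SELECT Surname,
                 ROW_NUMBER() OVER (ORDER BY random()) AS seqnum
          FROM Surname) s
    JOIN (SELECT Name,
                 ROW_NUMBER() OVER (ORDER BY random()) AS seqnum
          FROM Name) n
      ON n.seqnum = s.seqnum
""")
clients = conn.execute("SELECT LastName, FirstName, Email FROM Clients").fetchall()
for row in clients:
    print(row)
```

Each surname and each name is used exactly once here, which is the main difference from the apply version, where a name can repeat.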

Unique data based on name,email from sql server

I want to get unique rows based on FirstName and EmailID. I tried a few things: adding DISTINCT still returns duplicate rows, and GROUP BY failed with an error. I could use a subquery, but that would be slow. What is the best solution for the query below?
SELECT FirstName,LastName,FamilyName, EmailID,Phone,City,Country,CreatedOn,t.Type , ID
FROM Forms C JOIN Form_Type T
ON c.Form_TypeID = t.Form_TypeID
WHERE c.Form_TypeID = 1 AND DATEDIFF( "d", CreatedOn, GETDATE()) < 31
ORDER BY CreatedOn DESC
See if this works for you:
SELECT *
FROM (
SELECT FirstName,LastName,FamilyName, EmailID,Phone,City,Country,CreatedOn,t.Type , ID,
ROW_NUMBER() OVER (PARTITION BY FirstName ,EmailID ORDER BY CreatedOn DESC ) NewCol
FROM Forms C
JOIN Form_Type T ON c.Form_TypeID = t.Form_TypeID
WHERE c.Form_TypeID = 1
AND DATEDIFF("d", CreatedOn, GETDATE()) < 31
) t
WHERE NewCol = 1
I have added an extra column (NewCol) in the inner query. I am assuming that you want to display the most recent record (by CreatedOn) for each combination of FirstName and EmailID.
DISTINCT will not work in your case, as you want all the fields from the table. So you need to use a sub-query to create a list of distinct names/emails.
You should be able to adapt the following example to your needs:
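SELECT t1.[User], t1.EMail, t1.Address1, t1.Address2
FROM Table1 t1
INNER JOIN (SELECT DISTINCT [User], EMail FROM Table1) tmp ON t1.[User] = tmp.[User] AND t1.EMail = tmp.EMail
Using an INNER JOIN, this returns only rows from Table1 that are in table tmp. Table tmp is defined as the distinct combinations of User and EMail from Table1.
So what happens is: you create a distinct list of User and EMail from Table1, then you select all the entries from Table1 where User and EMail are in that list. Be aware that, as written, every row of Table1 matches its own pair, so the join still returns all rows. To get exactly one row per pair, tmp would also have to identify a single row (for example the MAX of a unique key per User and EMail), and the join would have to match on that key.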

Finding the most recent duplicate records in SQL Server 2012

I want to find the most recent duplicate records in SQL Server 2012. Here is the table structure I have.
I have a table called UserRegistration in which UserID (a GUID) contains duplicates; the same table also has a CreatedDate column (date). I want to find the most recent duplicate records in this table.
Here is the sample data.
id   FirstName   LastName   CreatedDate  UserID
109  FirstNameA  LastNameA  28-04-2015   GUID1
110  FirstNameC  LastNameD  19-05-2015   GUID2
111  FirstNameE  LastNameF  22-05-2015   GUID1
As you can see in the table above, GUID1 is duplicated. I want to find the most recent one; that is, the query should return only rows whose UserID is duplicated, and of those only the latest. For the data above it should return 111, because that record was created more recently than 109.
Let me know if you have any questions; I am happy to answer. Thanks.
Harshal
Try the query below; it should do the job for your input data:
create table #UserRegistration (id int,FirstName varchar(20),LastName varchar(20),CreatedDate date,UserID varchar(20))
insert into #UserRegistration
select 109, 'FirstNameA', 'LastNameA', '2015-04-28', 'GUID1' union
select 110, 'FirstNameC', 'LastNameD', '2015-05-19', 'GUID2' union
select 111, 'FirstNameE', 'LastNameF', '2015-05-22', 'GUID1'
select id, FirstName, LastName, CreatedDate, UserID from
(SELECT ur.*,row_number() over(partition by UserID order by CreatedDate) rn
FROM #UserRegistration ur) A
where rn > 1
You could use a CTE. Partition your records by UserID and number the rows within each partition in order of CreatedDate.
insert into tab(id, FirstName, LastName, CreatedDate, UserID)
values(109, 'FirstNameA', 'LastNameA', '2015-04-28', 'guid1'),
(110, 'FirstNameC', 'LastNameD', '2015-05-19', 'guid2'),
(111, 'FirstNameE', 'LastNameF', '2015-05-22', 'guid1');
with cte as
(
select id, ROW_NUMBER() over (partition by UserID order by CreatedDate asc) as [Rank],
FirstName, LastName, CreatedDate, UserID
from tab
)
select id, FirstName, LastName, CreatedDate, UserID from cte where Rank > 1
Rank > 1 condition is responsible for retrieving duplicated items.
sqlfiddle link:
http://sqlfiddle.com/#!6/4d1f2/6
Solved this by using tmp-tables:
SELECT a.UserID,
MAX(a.CreatedDate) As CreatedDate
INTO #latest
FROM <your table> a
GROUP BY a.UserID
HAVING COUNT(a.UserID) > 1
SELECT b.id
FROM #latest a
INNER JOIN <your table> b ON a.UserID = b.UserID AND a.CreatedDate = b.CreatedDate
Try this:
SELECT * FROM TableName tt WHERE
exists(select MAX(createdDate)
from TableName
where tt.UserID = UserID
group by UserID
having MAX(createdDate)= tt.createdDate
and COUNT(*) > 1)
I think your CreatedDate field is not a date field. In that case, try the following, which orders by id instead:
WITH TempAns (id,UserID,duplicateRecordCount)
AS
(
SELECT id,
UserID,
ROW_NUMBER()OVER(partition by UserID ORDER BY id)
AS duplicateRecordCount
FROM #t
)
select * from #t where id in (
select max(id )
from TempAns
where duplicateRecordCount > 1
group by UserID )
You'd number your records with ROW_NUMBER() so that the latest record per userid gets #1. With COUNT() you make sure to get only the userids that have more than one record.
select
id, firstname, lastname, createddate, userid
from
(
select
id, firstname, lastname, createddate, userid,
row_number() over (partition by userid order by createddate desc) as rn,
count(*) over (partition by userid) as cnt
from userregistration
) ranked
where rn = 1 -- only last one
and cnt > 1; -- but only if there is more than one record for the userid
This gets the latest record for every userid that has duplicates.
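As a quick check of the rn/cnt idea against the sample data from the question, here is a sketch. It runs on an in-memory SQLite database from Python rather than SQL Server 2012 (an assumption made so it can be run anywhere; it needs SQLite 3.25 or later), and it stores the dates as ISO text so that they sort chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE UserRegistration (id INTEGER, FirstName TEXT, "
    "LastName TEXT, CreatedDate TEXT, UserID TEXT)"
)
# Sample rows from the question, with dates rewritten as YYYY-MM-DD.
conn.executemany(
    "INSERT INTO UserRegistration VALUES (?, ?, ?, ?, ?)",
    [
        (109, "FirstNameA", "LastNameA", "2015-04-28", "GUID1"),
        (110, "FirstNameC", "LastNameD", "2015-05-19", "GUID2"),
        (111, "FirstNameE", "LastNameF", "2015-05-22", "GUID1"),
    ],
)
latest_duplicates = conn.execute("""
    SELECT id, UserID
    FROM (SELECT id, UserID,
                 ROW_NUMBER() OVER (PARTITION BY UserID
                                    ORDER BY CreatedDate DESC) AS rn,
                 COUNT(*) OVER (PARTITION BY UserID) AS cnt
          FROM UserRegistration) ranked
    WHERE rn = 1   -- latest row per UserID
      AND cnt > 1  -- only UserIDs that occur more than once
""").fetchall()
print(latest_duplicates)  # [(111, 'GUID1')]
```

Row 110 is excluded because its UserID occurs only once, and row 109 is excluded because 111 is newer.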

Select Maximum, using fields from 2 tables

I have a database, Library, with many tables; three of them are needed for this query:
Table Librarians: ID, Surname;
Table StudentCard: ID, foreign key on table Librarians and other columns which we don't use
Table TeacherCard: ID, foreign key on table Librarians and other columns which we don't use
Query: select the surname of the librarian who issued the most books.
I know how to solve this when the data comes from only one table, e.g. TeacherCard:
SELECT TOP 1 WITH TIES
Librarians.LastName, MAX(Librarians.CountOfBooks) AS Books
FROM
(SELECT
L.LastName, COUNT(*) AS CountOfBooks
FROM Libs L, T_Cards T
WHERE T.Id_Lib IN (SELECT L.Id)
GROUP BY L.LastName) AS Librarians
GROUP BY
Librarians.LastName
ORDER BY
MAX(Librarians.CountOfBooks) DESC
GO
I don't know how to use data from TeacherCard and StudentCard at the same time.
Please help me write this query.
I found a working solution:
SELECT TOP 1 B.Name, B.CountOut
FROM
(SELECT A.Name, SUM(A.Count) AS CountOut
FROM
(SELECT Libs.LastName AS Name, COUNT(S_Cards.DateOut) AS [Count]
FROM Libs JOIN S_Cards ON S_Cards.Id_Lib = Libs.Id
GROUP BY Libs.LastName
UNION ALL
SELECT Libs.LastName AS Name, COUNT(T_Cards.DateOut) AS [Count]
FROM Libs JOIN T_Cards ON T_Cards.Id_Lib = Libs.Id
GROUP BY Libs.LastName) AS A
GROUP BY A.Name ) AS B
ORDER BY B.CountOut DESC
Here is another working answer:
SELECT TOP 1 LastName, COUNT (*) [count] FROM
(SELECT LastName FROM Libs L, S_Cards S
WHERE S.id_lib = L.id
UNION ALL
SELECT LastName FROM Libs L, T_Cards T
WHERE T.id_lib = L.id) As Res
GROUP By LastName
ORDER BY COUNT (*) DESC
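A small sketch of the UNION ALL idea, in case it helps to see it run. It uses an in-memory SQLite database from Python instead of SQL Server, so LIMIT 1 replaces TOP 1; the librarians Adams and Brown and their card counts are invented, and the card tables are cut down to the one column the query needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Libs (Id INTEGER PRIMARY KEY, LastName TEXT);
    CREATE TABLE S_Cards (Id INTEGER PRIMARY KEY, Id_Lib INTEGER);
    CREATE TABLE T_Cards (Id INTEGER PRIMARY KEY, Id_Lib INTEGER);
    -- Invented data: Brown leads on student cards alone (2 vs 1),
    -- but Adams leads once teacher cards are added (4 vs 2).
    INSERT INTO Libs VALUES (1, 'Adams'), (2, 'Brown');
    INSERT INTO S_Cards (Id_Lib) VALUES (1), (2), (2);
    INSERT INTO T_Cards (Id_Lib) VALUES (1), (1), (1);
""")
# Stack the rows from both card tables, then count per librarian.
top_librarian = conn.execute("""
    SELECT LastName, COUNT(*) AS cnt
    FROM (SELECT L.LastName FROM Libs L JOIN S_Cards S ON S.Id_Lib = L.Id
          UNION ALL
          SELECT L.LastName FROM Libs L JOIN T_Cards T ON T.Id_Lib = L.Id) AS Res
    GROUP BY LastName
    ORDER BY cnt DESC
    LIMIT 1
""").fetchone()
print(top_librarian)  # ('Adams', 4)
```

UNION ALL matters here: plain UNION would collapse identical LastName rows and break the count. Grouping by LastName also merges librarians who share a surname, so grouping by Id may be safer on real data.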

Finding Duplicate Data in Oracle

I have a table with 500,000+ records, and fields for ID, first name, last name, and email address. What I'm trying to do is find rows where the first name AND last name are both duplicates (as in the same person has two separate IDs, email addresses, or whatever, they're in the table more than once). I think I know how to find the duplicates using GROUP BY, this is what I have:
SELECT first_name, last_name, COUNT(*)
FROM person_table
GROUP BY first_name, last_name
HAVING COUNT(*) > 1
The problem is that I need to then move the entire row with these duplicated names into a different table. Is there a way to find the duplicates and get the whole row? Or at least to get the IDs as well? I tried using a self-join, but got back more rows than were in the table to begin with. Would that be a better approach? Any help would be greatly appreciated.
One effective way to remove duplicate rows is with a correlated subquery on rowid:
DELETE FROM person_table a
WHERE a.rowid >
ANY (SELECT b.rowid
FROM person_table b
WHERE a.first_name = b.first_name
AND a.last_name = b.last_name);
This will remove all duplicates, even if a row is duplicated more than once.
There is more on removing duplicates and differing methods here: http://www.dba-oracle.com/t_delete_duplicate_table_rows.htm
Hope it helps...
EDIT: As per your comments, if you want to select all but one of the duplicates then
SELECT *
FROM person_table a
WHERE a.rowid >
ANY (SELECT b.rowid
FROM person_table b
WHERE a.first_name = b.first_name
AND a.last_name = b.last_name);
An index on (first_name, last_name) or on (last_name, first_name) would help:
SELECT t.*
FROM
person_table t
JOIN
( SELECT first_name, last_name
FROM person_table
GROUP BY first_name, last_name
HAVING COUNT(*) > 1
) dup
ON dup.last_name = t.last_name
AND dup.first_name = t.first_name
or:
SELECT t.*
FROM person_table t
WHERE EXISTS
( SELECT *
FROM person_table dup
WHERE dup.last_name = t.last_name
AND dup.first_name = t.first_name
AND dup.ID <> t.ID
)
This will give you an ID you want to move/delete/etc. Note that it does not work if count(*) > 2, as you get only 1 ID (you could re-run your query for these cases).
SELECT max(ID), first_name, last_name, COUNT(*)
FROM person_table
GROUP BY first_name, last_name
HAVING COUNT(*) > 1
Edit: You can use COLLECT to get all IDs at once (but be careful, as you only want to move/delete all but one)
To add another option, I usually use this one to remove duplicates:
delete from person_table
where rowid in (select rid
from (select rowid rid, row_number() over
(partition by first_name,last_name order by rowid) rn
from person_table
)
where rn <> 1 )
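Since the question was about moving the duplicated rows into a different table, here is a rough sketch of how the rowid/ROW_NUMBER() selection could drive both the copy and the delete. It runs on an in-memory SQLite database from Python rather than Oracle (SQLite also exposes a rowid, which is why it works as a stand-in); the person_duplicates table name and the sample rows are hypothetical.

```python
import sqlite3

# Rowids of every row except the first one per (first_name, last_name).
DUPLICATE_ROWIDS = """
    SELECT rid
    FROM (SELECT rowid AS rid,
                 ROW_NUMBER() OVER (PARTITION BY first_name, last_name
                                    ORDER BY rowid) AS rn
          FROM person_table) numbered
    WHERE rn <> 1
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person_table (id INTEGER PRIMARY KEY, first_name TEXT,
                               last_name TEXT, email TEXT);
    CREATE TABLE person_duplicates (id INTEGER, first_name TEXT,
                                    last_name TEXT, email TEXT);
    -- Invented sample data.
    INSERT INTO person_table VALUES
        (1, 'Jo', 'Hill', 'jo@example.com'),
        (2, 'Jo', 'Hill', 'jhill@example.com'),
        (3, 'Al', 'Ross', 'al@example.com'),
        (4, 'Jo', 'Hill', 'jo.hill@example.com');
""")
with conn:  # copy and delete in one transaction
    conn.execute(
        "INSERT INTO person_duplicates "
        f"SELECT * FROM person_table WHERE rowid IN ({DUPLICATE_ROWIDS})"
    )
    conn.execute(f"DELETE FROM person_table WHERE rowid IN ({DUPLICATE_ROWIDS})")

kept = [r[0] for r in conn.execute("SELECT id FROM person_table ORDER BY id")]
moved = [r[0] for r in conn.execute("SELECT id FROM person_duplicates ORDER BY id")]
print(kept, moved)  # [1, 3] [2, 4]
```

Doing the copy before the delete, inside one transaction, means the two statements see the same set of duplicates.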
