Unique data based on name,email from sql server - sql-server

I want to get unique rows based on FirstName and EmailID. I tried adding DISTINCT across all the columns, but I still get duplicate rows. I also tried GROUP BY, which failed with an error. I could use a subquery, but I expect that to be slow. What is the best solution for the query below?
SELECT FirstName,LastName,FamilyName, EmailID,Phone,City,Country,CreatedOn,t.Type , ID
FROM Forms C JOIN Form_Type T
ON c.Form_TypeID = t.Form_TypeID
WHERE c.Form_TypeID = 1 AND DATEDIFF( "d", CreatedOn, GETDATE()) < 31
ORDER BY CreatedOn DESC

See if this works for you:
SELECT *
FROM (
SELECT FirstName,LastName,FamilyName, EmailID,Phone,City,Country,CreatedOn,t.Type , ID,
ROW_NUMBER() OVER (PARTITION BY FirstName ,EmailID ORDER BY CreatedOn DESC ) NewCol
FROM Forms C
JOIN Form_Type T ON c.Form_TypeID = t.Form_TypeID
WHERE c.Form_TypeID = 1
AND DATEDIFF("d", CreatedOn, GETDATE()) < 31
) t
WHERE NewCol = 1
I added an extra column (NewCol) in the inner query. I am assuming you want to display the most recent record (by CreatedOn) for each combination of FirstName and EmailID.
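To see the ROW_NUMBER() pattern end to end, here is a minimal sketch that runs it against an in-memory SQLite database from Python instead of SQL Server. The trimmed-down Forms table and its three rows are made up for illustration; only the column names come from the question.

```python
import sqlite3

# Hypothetical sample data; the real Forms table has more columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Forms (ID INTEGER PRIMARY KEY, FirstName TEXT, EmailID TEXT, CreatedOn TEXT);
INSERT INTO Forms VALUES
  (1, 'Ann', 'ann@example.com', '2024-01-01'),
  (2, 'Ann', 'ann@example.com', '2024-01-05'),
  (3, 'Bob', 'bob@example.com', '2024-01-03');
""")

# Number the rows inside each (FirstName, EmailID) group, newest first,
# and keep only the row numbered 1 in every group.
latest_rows = conn.execute("""
SELECT ID, FirstName, EmailID, CreatedOn
FROM (
    SELECT f.*,
           ROW_NUMBER() OVER (PARTITION BY FirstName, EmailID
                              ORDER BY CreatedOn DESC) AS NewCol
    FROM Forms f
) t
WHERE NewCol = 1
ORDER BY ID
""").fetchall()
```

With this sample data, latest_rows holds row 2 for Ann (her newer entry) and row 3 for Bob.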

DISTINCT will not work in your case, as you want all the fields from the table. So you need to use a sub-query to create a list of distinct names/emails.
You should be able to adapt the following example to your needs:
SELECT t1.[User], t1.EMail, t1.Address1, t1.Address2
FROM Table1 t1
INNER JOIN (SELECT DISTINCT [User], EMail FROM Table1) tmp ON t1.[User] = tmp.[User] AND t1.EMail = tmp.EMail
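Because of the INNER JOIN, this returns only rows from Table1 whose User and EMail appear in tmp, where tmp is the set of distinct User and EMail combinations from Table1.
So what happens is: you build a distinct list of User and EMail pairs from Table1, then select every entry from Table1 that matches a pair in that list. Note that every row of Table1 matches its own pair, so to end up with one row per pair the sub-query has to return a single key per pair (for example the MIN of a unique ID column) rather than the pair itself.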
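As a rough check of how the sub-query join behaves, the sketch below runs two variants against SQLite from Python. It assumes a hypothetical unique Id column on Table1 (not part of the example above) and made-up rows. The first query joins on the distinct pairs, the second joins on one MIN(Id) per pair.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (Id INTEGER PRIMARY KEY, User TEXT, EMail TEXT, Address1 TEXT);
INSERT INTO Table1 VALUES
  (1, 'ann', 'ann@example.com', 'A Street'),
  (2, 'ann', 'ann@example.com', 'B Street'),
  (3, 'bob', 'bob@example.com', 'C Street');
""")

# Joining on the distinct (User, EMail) pairs matches every original row again.
joined_on_pairs = conn.execute("""
SELECT t1.Id
FROM Table1 t1
JOIN (SELECT DISTINCT User, EMail FROM Table1) tmp
  ON t1.User = tmp.User AND t1.EMail = tmp.EMail
ORDER BY t1.Id
""").fetchall()

# Joining on one key per pair (hypothetical Id column) keeps a single row per pair.
one_per_pair = conn.execute("""
SELECT t1.Id, t1.User, t1.EMail, t1.Address1
FROM Table1 t1
JOIN (SELECT MIN(Id) AS KeepId FROM Table1 GROUP BY User, EMail) tmp
  ON t1.Id = tmp.KeepId
ORDER BY t1.Id
""").fetchall()
```

Here joined_on_pairs still contains all three ids, while one_per_pair contains only ids 1 and 3.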

Related

SQL Server Remove Duplicates

I have a table that tracks employees and the days they have spent in a policy. I don't generate this data; it is dumped to our server daily.
The table looks like this:
My goal is to get rid of the duplicates by keeping only the most recent date.
In this example, if I run the query, I would like it to keep row 11 for Nicholas Morris and row 14 for Tiana Sullivan.
Assumption: the first name and last name combination is unique.
So far, this is what I have been doing:
select e.*
from Employees e
join (
select FirstName, LastName
from Employees
group by FirstName, LastName
having count(*) > 1
) d on d.FirstName = e.FirstName and d.LastName = e.LastName
This returns the rows that have duplicates, and I then have to search through them manually and remove the ones I don't want to keep.
I am sure there is a better way of doing this.
Thanks for your help.
You can use a CTE and ROW_NUMBER() function to do it.
The query to get the data is:
SELECT ID, FirstName, LastName, ROW_NUMBER()
OVER (PARTITION BY FirstName, LastName ORDER BY DaysInPolicy DESC) AS Identifier
FROM
Employees
The query to remove duplicates is:
;WITH CTE AS (
SELECT ID, ROW_NUMBER()
OVER (PARTITION BY FirstName, LastName ORDER BY DaysInPolicy DESC) AS Identifier
FROM
Employees
)
DELETE E
FROM
Employees E
INNER JOIN CTE C ON C.ID = E.ID
WHERE
C.Identifier > 1
You could delete using an exists operator where you remove any row that has the same first and last name, but with a newer date:
DELETE e1
FROM employees e1
WHERE EXISTS (SELECT *
FROM employees e2
WHERE e1.FirstName = e2.FirstName AND
e1.LastName = e2.LastName AND
e1.DaysInPolicy < e2.DaysInPolicy)
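Here is a small sketch of the EXISTS-based delete, run against SQLite from Python rather than SQL Server. SQLite does not accept an alias on the DELETE target, so the outer table is referenced by name. The rows and DaysInPolicy values are invented so that ids 11 and 14 are the ones to keep, as in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees (ID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT, DaysInPolicy INTEGER);
INSERT INTO Employees VALUES
  (10, 'Nicholas', 'Morris',   3),
  (11, 'Nicholas', 'Morris',   7),
  (13, 'Tiana',    'Sullivan', 2),
  (14, 'Tiana',    'Sullivan', 9),
  (15, 'Omar',     'Reyes',    4);
""")

# Delete every row for which another row with the same name
# has a larger DaysInPolicy value.
conn.execute("""
DELETE FROM Employees
WHERE EXISTS (
    SELECT 1
    FROM Employees e2
    WHERE e2.FirstName = Employees.FirstName
      AND e2.LastName = Employees.LastName
      AND e2.DaysInPolicy > Employees.DaysInPolicy
)
""")

remaining_ids = [row[0] for row in conn.execute("SELECT ID FROM Employees ORDER BY ID")]
```

Rows 10 and 13 are removed. Note that two rows with the same name and the same DaysInPolicy would both survive, since neither is strictly newer than the other.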
Try this to list the duplicate rows (every row except the most recent one per name):
SELECT * FROM
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY LastName, FirstName ORDER BY DaysInPolicy DESC) AS RowNum
FROM Employees
) AS Emp
WHERE Emp.RowNum > 1

SQL - Return a value sum only once when grouped

I want to count the unique records of a string grouped by date, and if the string has already appeared in an earlier group it shouldn't be counted again.
I've tried using DISTINCT, which does give the unique count of the records, but each record is counted again in every month.
Actual and minified SQL query:
select
date,
count(distinct d.name) as count
from ...
group by date
Sample and desired output: (image not available)
Grab unique names and tag them with the earliest date. At that point it's just a matter of regrouping the resulting rows by date. Each name will uniquely correspond to only one date as desired:
with data as (select name, min("date") as dt from T group by name)
select dt, count(name) as cnt from data group by dt;
If you still need to see the original dates even when no names are counted, then flag each row according to whether it should be counted and then count the flags per date:
with data as (
select *,
case when "date" = min("date") over (partition by name)
then 1 end as flag
from T
)
select "date", count(flag) as cnt
from data
group by "date";
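To check the flag-based query, here is a minimal sketch that runs it on SQLite from Python. The table T and its six rows are hypothetical sample data, chosen so that names a and b first appear on 20190101 and name c first appears on 20190301.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE T (name TEXT, "date" TEXT)')
conn.executemany(
    "INSERT INTO T VALUES (?, ?)",
    [
        ("a", "20190101"), ("b", "20190101"),
        ("a", "20190201"), ("b", "20190201"),
        ("c", "20190301"), ("a", "20190301"),
    ],
)

# Flag a row only when its date is the earliest date for that name,
# then count the flags per date. COUNT ignores the NULL flags.
first_seen_counts = conn.execute("""
WITH data AS (
    SELECT name, "date",
           CASE WHEN "date" = MIN("date") OVER (PARTITION BY name)
                THEN 1 END AS flag
    FROM T
)
SELECT "date", COUNT(flag) AS cnt
FROM data
GROUP BY "date"
ORDER BY "date"
""").fetchall()
```

The middle date stays in the result with a count of 0 because its rows are still present, just unflagged.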
So you want each name to be counted only once:
SELECT COUNT(u.name) as name_count, u.[date]
FROM (
SELECT d.name,MIN(d.date) AS [date]
FROM yourTable d
GROUP BY d.name) u
GROUP BY u.[date];
You can add a ROW_NUMBER() that is Partitioned by name and ordered by date and add a WHERE clause that only returns the rows with Row_Number = 1.
You can try the following option:
SELECT A.Date,COUNT(B.[Name]) Count
FROM
(
SELECT DISTINCT Date FROM your_table
)A
LEFT JOIN
(
SELECT * FROM
(
SELECT *,ROW_NUMBER() OVER(PARTITION BY [Name] ORDER BY Date) RN
FROM your_table
)A WHERE RN = 1
)B ON A.Date = B.Date
GROUP BY A.Date
But the best option, if I slightly modify the idea from Shawnt00, is the following:
SELECT A.Date,COUNT(B.[Name]) Count
FROM
(
SELECT DISTINCT Date FROM your_table
)A
LEFT JOIN
(
SELECT [Name],MIN(Date) Date FROM your_table GROUP BY [Name]
)B ON A.Date = B.Date
GROUP BY A.Date
In both cases the output will be:
Date Count
20190101 2
20190201 0
20190301 1

MSSQL Union All two queries with if statement

I have the following query, which works as expected:
If((Select count(*) from table1 where product = 'carrot')< 5)
Begin
Select Top (5 - (Select count(*) from table1 where product = 'carrot'))
id, product From table2
WHere id NOT IN
(Select id from table1) AND product = 'carrot'
Order by newid()
END
What I want to do is UNION or UNION ALL this with another product, say potatoes:
If((Select count(*) from table1 where product = 'potato')< 5)
Begin
Select Top (5 - (Select count(*) from table1 where product = 'potato'))
id, product From table2
WHere id NOT IN
(Select id from table1) AND product = 'potato'
Order by newid()
END
I keep getting a syntax error when I add UNION between the IF blocks or after END. Is this possible, or is another way better?
What I am trying to do is select a random sample of carrots. First I check whether I already have 5 carrots in table1; if I do, I don't run the sample.
If I do not have 5 carrots in total, I run the sampler to bring the total to 5. I filter out rows that already exist in table1 by id, and subtract the existing count from 5 to get the sample size.
It works well; now I want to run it for other products, e.g. lettuce, potatoes, etc.
But I want a UNION or UNION ALL. I hope that makes sense.
I'd be interested to see whether this way works-
-- TOP needs a non-negative value, so this assumes table1 never holds more than 5 rows per product
Select Top (5 - (Select count(*) from table1 where product = 'carrot'))
id
, product
From table2
WHere id NOT IN (Select id from table1)
AND product = 'carrot'
-- return nothing once table1 already holds 5 carrots
AND (Select count(*) from table1 where product = 'carrot') < 5
UNION ALL
Select Top (5 - (Select count(*) from table1 where product = 'potato'))
id
, product
From table2
WHere id NOT IN (Select id from table1)
AND product = 'potato'
AND (Select count(*) from table1 where product = 'potato') < 5
Your style is interesting, feels procedural rather than set-based.
You can try it this way
If ((Select count(*) from table1 where product = 'carrot') < 5
and (Select count(*) from table1 where product = 'potato') < 5)
Begin
-- ORDER BY is not allowed directly inside a UNION branch, so each sample is wrapped in a derived table
Select id, product From (
Select Top (5 - (Select count(*) from table1 where product = 'carrot')) id, product
From table2
WHere id NOT IN (Select id from table1) AND product = 'carrot' Order by newid()
) carrots
Union all
Select id, product From (
Select Top (5 - (Select count(*) from table1 where product = 'potato')) id, product
From table2
WHere id NOT IN (Select id from table1) AND product = 'potato' Order by newid()
) potatoes
END
IF statements in SQL do not behave like sub-queries or row sets, as you've found out. They only branch the flow of control.
Here is a more set based approach you could take:
SELECT ProdSamples.*
FROM
(
SELECT Table2.*, ROW_NUMBER() OVER (PARTITION BY table2.Product ORDER BY NEWID()) RowNum
FROM Table2
LEFT JOIN Table1
ON Table1.id = Table2.id
WHERE Table1.id IS NULL
) ProdSamples
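LEFT JOIN
(
SELECT Product, COUNT(*) ProdCount
FROM Table1
GROUP BY Product
) ProdCounts
ON ProdSamples.Product = ProdCounts.Product
-- LEFT JOIN plus COALESCE so that products with no rows in Table1 still get 5 samples
WHERE ProdSamples.RowNum <= (5 - COALESCE(ProdCounts.ProdCount, 0))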
The first sub-query ProdSamples returns all the products from Table2 that do not have an id in Table1. The RowNum field ranks them in random order partitioned by Product.
The second sub-query ProdCounts is the count of records for each product in Table1. The two sub-queries are then joined, and only the records from ProdSamples whose RowNum is less than or equal to the number of samples you still need are returned.
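The set-based idea can be tried out with a small sketch on SQLite from Python. This is a variation rather than the exact query above: it uses RANDOM() in place of NEWID(), and a LEFT JOIN with COALESCE so that a product with no rows in table1 still gets a full sample of 5. All rows are invented test data.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, product TEXT)")
conn.execute("CREATE TABLE table2 (id INTEGER PRIMARY KEY, product TEXT)")
# table1 already holds 3 carrots, 1 potato and no lettuce.
conn.executemany("INSERT INTO table1 VALUES (?, ?)",
                 [(1, "carrot"), (2, "carrot"), (3, "carrot"), (20, "potato")])
# table2 is the pool to sample from (some ids overlap with table1).
conn.executemany("INSERT INTO table2 VALUES (?, ?)",
                 [(i, "carrot") for i in range(1, 11)]
                 + [(i, "potato") for i in range(20, 24)]
                 + [(i, "lettuce") for i in range(30, 37)])

samples = conn.execute("""
SELECT s.id, s.product
FROM (
    -- candidates not yet in table1, ranked randomly within each product
    SELECT t2.id, t2.product,
           ROW_NUMBER() OVER (PARTITION BY t2.product ORDER BY RANDOM()) AS RowNum
    FROM table2 t2
    LEFT JOIN table1 t1 ON t1.id = t2.id
    WHERE t1.id IS NULL
) s
LEFT JOIN (
    SELECT product, COUNT(*) AS ProdCount
    FROM table1
    GROUP BY product
) c ON c.product = s.product
WHERE s.RowNum <= 5 - COALESCE(c.ProdCount, 0)
""").fetchall()

per_product = Counter(product for _, product in samples)
```

Which ids come back varies from run to run, but the counts do not: 2 carrots (5 minus the 3 already present), 3 potatoes (only 3 candidates exist even though 4 are wanted) and 5 lettuce.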

T- SQL Duplicate Records

I am trying to delete every other record, because they are duplicates: in the results of my SELECT query, every other record is a duplicate. (tblPoints.ptUser_ID is the unique id.)
SELECT *, u.usMembershipID
FROM [ABCRewards].[dbo].[tblPoints]
inner join tblUsers u on u.User_ID = tblPoints.ptUser_ID
where ptUser_ID in (select user_id from tblusers where Client_ID = 8)
and ptCreateDate >= '3/9/2016'
and ptDesc = 'December Anniversary'
Duplicates returned by an INNER JOIN usually suggest an issue with the query, but if you are certain that your join is correct then this would do it:
;WITH CTE
AS (SELECT *
, ROW_NUMBER() OVER(PARTITION BY t.ptUser_ID ORDER BY t.ptUser_ID) AS rn
FROM [ABCRewards].[dbo].[tblPoints] AS t)
/*Uncomment below to Review duplicates*/
--SELECT *
--FROM CTE
--WHERE rn > 1;
/*Uncomment below to Delete duplicates*/
--DELETE
--FROM CTE
--WHERE rn > 1;
When cleaning up duplicated data, I always use the same query pattern to delete all the duplicates and keep the wanted row (the original, the most recent, whatever). The query pattern below deletes all duplicates and keeps the one you wish to keep.
Just replace everything in [] with your table and fields.
[Field(s)ToDetectDuplications]: the field(s) that mark rows as duplicates when they hold the same values.
[Field(s)ToChooseWhichDupplicationIsKept ]: the field(s) that decide which duplicate is kept, for example the one with the biggest value or the most recent one.
DELETE [YourTableName]
FROM [YourTableName]
INNER JOIN (SELECT [YourTablePrimaryKey],
I = ROW_NUMBER() OVER(PARTITION BY [Field(s)ToDetectDuplications] ORDER BY [Field(s)ToChooseWhichDupplicationIsKept ] DESC)
FROM [dbo].[YourTableName]) AS T ON [YourTableName].[YourTablePrimaryKey] = T.[YourTablePrimaryKey]
AND T.I > 1
I recommend having a look at what will be deleted first. To do so, just replace the DELETE statement with a SELECT, as below.
SELECT T.I,
[YourTableName].*
FROM [YourTableName]
INNER JOIN (SELECT [YourTablePrimaryKey],
I = ROW_NUMBER() OVER(PARTITION BY [Field(s)ToDetectDuplications] ORDER BY [Field(s)ToChooseWhichDupplicationIsKept ] DESC)
FROM [dbo].[YourTableName]) AS T ON [YourTableName].[YourTablePrimaryKey] = T.[YourTablePrimaryKey]
AND T.I > 1
Explanation:
Here we use ROW_NUMBER() with PARTITION BY and ORDER BY to detect duplicates. PARTITION BY groups rows together. Choose the partition fields so that each partition has exactly one row when the data is right; bad data then shows up as partitions with more than one row. ROW_NUMBER assigns a number to each row within its partition, so a number greater than 1 means the row is a duplicate within that partition. ORDER BY tells ROW_NUMBER in what order to assign the numbers. Number 1 is kept; all others are deleted.
Example with the OP's schema and specification
Here I attempted to fill in the pattern with guesses I have made about your database schema.
DECLARE @userID INT
SELECT @userID = 8
SELECT T.I,
[ABCRewards].[dbo].[tblPoints].*
FROM [ABCRewards].[dbo].[tblPoints]
INNER JOIN (SELECT [YourTablePrimaryKey],
I = ROW_NUMBER() OVER(PARTITION BY ptDesc, ptUser_ID ORDER BY ptCreateDate DESC)
FROM [ABCRewards].[dbo].[tblPoints]
WHERE ptCreateDate >= '3/9/2016'
AND ptDesc = 'December Anniversary'
AND ptUser_ID = @userID
) AS T ON [ABCRewards].[dbo].[tblPoints].[YourTablePrimaryKey] = T.[YourTablePrimaryKey]
AND T.I > 1

Finding Duplicate Data in Oracle

I have a table with 500,000+ records, and fields for ID, first name, last name, and email address. What I'm trying to do is find rows where the first name AND last name are both duplicates (as in the same person has two separate IDs, email addresses, or whatever, they're in the table more than once). I think I know how to find the duplicates using GROUP BY, this is what I have:
SELECT first_name, last_name, COUNT(*)
FROM person_table
GROUP BY first_name, last_name
HAVING COUNT(*) > 1
The problem is that I need to then move the entire row with these duplicated names into a different table. Is there a way to find the duplicates and get the whole row? Or at least to get the IDs as well? I tried using a self-join, but got back more rows than were in the table to begin with. Would that be a better approach? Any help would be greatly appreciated.
The most effective way to remove duplicate rows is with a self-join:
DELETE FROM person_table a
WHERE a.rowid >
ANY (SELECT b.rowid
FROM person_table b
WHERE a.first_name = b.first_name
AND a.last_name = b.last_name);
This will remove all duplicates, even if a row has more than one duplicate.
There is more on removing duplicates and differing methods here: http://www.dba-oracle.com/t_delete_duplicate_table_rows.htm
Hope it helps...
EDIT: As per your comments, if you want to select all but one of the duplicates then
SELECT *
FROM person_table a
WHERE a.rowid >
ANY (SELECT b.rowid
FROM person_table b
WHERE a.first_name = b.first_name
AND a.last_name = b.last_name);
An index on (first_name, last_name) or on (last_name, first_name) would help:
SELECT t.*
FROM
person_table t
JOIN
( SELECT first_name, last_name
FROM person_table
GROUP BY first_name, last_name
HAVING COUNT(*) > 1
) dup
ON dup.last_name = t.last_name
AND dup.first_name = t.first_name
or:
SELECT t.*
FROM person_table t
WHERE EXISTS
( SELECT *
FROM person_table dup
WHERE dup.last_name = t.last_name
AND dup.first_name = t.first_name
AND dup.ID <> t.ID
)
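For the "move the whole row" part of the question, the EXISTS form can feed an INSERT ... SELECT directly. Below is a minimal sketch on SQLite from Python rather than Oracle. The person_dupes target table and the sample rows are hypothetical, and the sketch copies the rows without deleting them from the source.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person_table (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT, email TEXT);
-- hypothetical target table with the same columns
CREATE TABLE person_dupes (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT, email TEXT);
INSERT INTO person_table VALUES
  (1, 'Ann', 'Lee', 'ann1@example.com'),
  (2, 'Ann', 'Lee', 'ann2@example.com'),
  (3, 'Bob', 'Ray', 'bob@example.com'),
  (4, 'Ann', 'Lee', 'ann3@example.com'),
  (5, 'Cy',  'Ode', 'cy@example.com');
""")

# Copy every full row whose (first_name, last_name) also occurs under another id.
conn.execute("""
INSERT INTO person_dupes
SELECT t.*
FROM person_table t
WHERE EXISTS (
    SELECT 1
    FROM person_table dup
    WHERE dup.last_name = t.last_name
      AND dup.first_name = t.first_name
      AND dup.id <> t.id
)
""")

dup_ids = [row[0] for row in conn.execute("SELECT id FROM person_dupes ORDER BY id")]
```

All three Ann Lee rows (ids 1, 2 and 4) end up in person_dupes, so this form also copes with names that occur more than twice.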
This will give you an ID you want to move/delete/etc. Note that it does not work if count(*) > 2, as you get only 1 ID (you could re-run your query for these cases).
SELECT max(ID), first_name, last_name, COUNT(*)
FROM person_table
GROUP BY first_name, last_name
HAVING COUNT(*) > 1
Edit: You can use COLLECT to get all IDs at once (but be careful, as you only want to move/delete all but one)
To add another option, I usually use this one to remove duplicates:
delete from person_table
where rowid in (select rid
from (select rowid rid, row_number() over
(partition by first_name,last_name order by rowid) rn
from person_table
)
where rn <> 1 )

Resources