Counting records that have count(column) > 2 - sql-server

I have a CustomerData table with the following attributes:
FirstName, LastName, DateofBirth, ID, Location
I am trying to write a query that pulls the first name, last name, date of birth, and IDNumber of users who belong to more than one location.
I tried the following code but I am getting an error about aggregates needing a group by clause.
SELECT *
FROM CustomerData
WHERE Count(Location) > 2
Any assistance would be greatly appreciated

Use HAVING:
SELECT FirstName, LastName, DateofBirth, ID, Count(Location)
FROM CustomerData
GROUP BY FirstName, LastName, DateofBirth, ID
HAVING Count(Location) > 2
HAVING is basically a WHERE that allows you to use aggregate functions.

You can also use a sub-query:
select c1.FirstName, c1.LastName, c1.DateofBirth, c1.ID, c2.CntLocation
from CustomerData c1
left join
(
SELECT Count(Location) CntLocation, location
FROM CustomerData
group by location
) c2
on c1.location = c2.location
WHERE CntLocation > 2

Related

SQL Server Remove Duplicates

I have a table that Tracks Employees and the days they have spent in a policy. I don't generate this Data, it is dumped to our Server Daily.
The table looks like this:
My Goal is to get rid of the duplicates by keeping only the most recent Date.
In this example, if I run the query, I would like it to keep Rows 11 for Nicholas Morris and 14 for Tiana Sullivan.
Assumption: First name and Last Name combo are unique
So far,
This is what I have been doing:
select *
from
Employees IN(
Select ID
from Employees
group by FirstName, lastName
Having count(*) > 1)
This returns to me the rows that have duplicates and I have to manually search them and remove the ones I don't want to keep.
I am sure there is a better way of doing this
Thanks for your help
You can use a CTE and ROW_NUMBER() function to do it.
The query to get the data is:
SELECT ID, FirstName, LastName, ROW_NUMBER()
OVER (PARTITION BY FirstName, LastName ORDER BY DaysInPolicy DESC) AS Identifier
FROM
Employees
The query to remove duplicates is:
;WITH CTE AS (
SELECT ID, ROW_NUMBER()
OVER (PARTITION BY FirstName, LastName ORDER BY DaysInPolicy DESC) AS Identifier
FROM
Employees
)
DELETE E
FROM
Employees E
INNER JOIN CTE C ON C.ID = E.ID
WHERE
C.Identifier > 1
You could delete using an exists operator where you remove any row that has the same first and last name, but with a newer date:
DELETE FROM employees e1
WHERE EXISTS (SELECT *
FROM employees e2
WHERE e1.FirstName = e2.FirstName AND
e1.LastName = e2.LastName AND
e1.DaysInPolicy < e2.DaysInPolicy)
Try this:
SELECT * FROM
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY Last_Name, First_Nmae ORDER BY DaysInPolicy DESC) AS RowNum
FROM Employees
) AS Emp
WHERE Emp.RowNum > 1

TSQL: group by Substring (Name) and retrieve ID in SELECT

We have companies' data stored in a table. In an effort to de-duplicate the rows, we need to identify duplicate data sets of companies by using following criterion: If First five letters of the CompanyName, City and postal code match with other records' same fields then it is a duplicate. We will later remove the duplicates. The problem I am running in to is that I can't retrieve IDs of these records since I am not grouping the records on ID.
I am using following SQL:
Select count(ID) as DupCount
, SUBSTRING(Name,1,5) as Name
, City
, PostalCode
from tblCompany
group by SUBSTRING(Name,1,5)
, City
, PostalCode
Having count(ID) > 1
order by count(ID) desc
How do I retrieve the ID of these records?
You can use window functions:
Select c.*
from (select c.*,
count(*) over (partition by left(Name, 5), City, PostalCode) as cnt
from tblCompany c
) c
where cnt >= 2;
This will return the individual rows with dups. You can then summarize this or do what you want with the result set.
Use group_concat() to get the ids as a comma separated list:
select
SUBSTRING(Name,1,5) as Name,
City,
PostalCode,
count(ID) as counter,
group_concat(id order by id) as ids
from tblCompany
group by SUBSTRING(Name,1,5), City, PostalCode
having count(ID) > 1
order by count(ID) desc

Error in SQLServer: Subquery returned more than 1 value

I would like to insert to Clients table data from two different tables (Surname and name). Moreover I would like to have a third column (email) that is a concatination from the first two. when i try the code hereunder it gives me the following error: "Subquery returned more than 1 value".
insert into CLIENTS (LastName,Firstname, EMAIL)
select (select top 150 Surname from Surname order by NEWID()),
(select top 150 Name from Name order by Newid()),
(select concat(concat(FisrtName, LastName),'#novaims.com') from clients);
Could you please help me understand where is the problem?
The error message is obvious your sub-query can result more than one record. Try this
;WITH cte
AS (SELECT 1 AS val
UNION ALL
SELECT val + 1
FROM cte
WHERE val < 150)
SELECT FisrtName,
LastName,
Concat(FisrtName, LastName, '#novaims.com')
FROM cte
OUTER apply (SELECT TOP 1 Surname FROM Surname ORDER BY Newid()) s (FisrtName)
OUTER apply (SELECT TOP 1 NAME FROM NAME ORDER BY Newid()) n (LastName)
Option (Maxrecursion 0)
You need to move the table references to the from clause. I think this does what you want:
insert into CLIENTS (LastName, Firstname, EMAIL)
select surname, name, concat(name, surname, '#novaims.com')
from (select Surname, row_number() over (order by newid()) as seqnum
from Surname
) s join
(select Name, row_number() over (order by newid()) as seqnum
from Name
)
on n.seqnum = s.seqnum;
Another method uses apply:
insert into CLIENTS (LastName, Firstname, EMAIL)
select top 150 s.surname, n.name, concat(n.name, s.surname, '#novaims.com')
from surname s cross apply
(select top 1 n.*
from names n
order by newid()
) n
order by newid();
This is more similar to your original idea. Do note, though, that the same name can appear more than once. And the performance should be better for the first version (because the sort is only happening once on each table).

T-SQL Grouping Issue

I have 2 tables.
Contacts
ContactID pk
EmailAddress
FirstName
LastName
Address
Orders
OrderID pk
ContactID fk
I want to get the number or orders for each email address in Contacts like below
select
Contacts.EmailAddress,
count(distinct Orders.OrderID) AS NumOrders
from
Contacts inner join Orders on Contacts.ContactID = Orders.ContactID
group by
Contacts.EmailAddress
Problem is, I also want the first name, last name, address. But I can't group by those because each email address in Contacts could have a different first name, lastname or address associated with it.
ie:
myname#email.com, Fred, Jackson, 123 Main St
myname#email.com, Bob, Smith, 456 Spruce St.
How can I change my query so that I can get the first name, last name and address for the most recent entry made in Contacts for that email address?
Thanks in advance!
My first thought would be to use windowed functions.
SELECT EmailAddress,
FirstName,
Lastname,
[Address],
EmailOrderCount
FROM (SELECT c.EmailAddress,
c.FirstName,
c.LastName,
c.[Address],
COUNT(o.OrderID) OVER (PARTITION BY c.EmailAddress) EmailOrderCount,
ROW_NUMBER() OVER (PARTITION BY c.EmailAddress ORDER BY c.ContactID DESC) Rn
FROM Contacts c
JOIN Orders o ON c.ContactID = o.ContactID
) t
WHERE Rn = 1
Demo
another way would be to use CROSS APPLY to append the top 1 contact record to the summary rows.
SELECT c.EmailAddress,
COUNT(o.OrderID) NumOrders,
ca.FirstName,
ca.LastName,
ca.[Address]
FROM Contacts c
INNER JOIN Orders ON c.ContactId = o.ContactID
CROSS APPLY (
SELECT TOP 1
FirstName,
Lastname,
[Address]
FROM Contacts c2
WHERE c2.EmailAddress = c.EmailAddress
ORDER BY c2.ContactID DESC) ca
GROUP BY c.EmailAddress,
ca.FirstName,
ca.LastName,
ca.[Address]
Try this:
select
Contacts.Name,
Contacts.FirstName,
Contacts.LastName
Contacts.EmailAddress,
count(distinct Orders.OrderID) AS NumOrders
from
(
select max(ContactID) as ContactID,
EmailAddress
from Contacts
group by EmailAddress
) MinContactForEachEMailAddress
inner join
Contacts
on MinContactForEachEMailAddress.ContactID = Contacts.ContactID
inner join
Orders
on Contacts.ContactID = Orders.ContactID
group by
Contacts.EmailAddress
Another way to get what you want is using a CTE and taking the "maximum" row by using ROW_NUMBER.
;WITH CTE AS (
SELECT C.ContactId, C.Name, C.FirstName, C.LastName, C.EmailAddress,
ROW_NUMBER() OVER (PARTITION BY EmailAddress ORDER BY ContactId DESC) RowNo
FROM Contact C
)
SELECT CTE.*, COUNT(o.OrderID) OVER (PARTITION BY CTE.EmailAddress) Cnt
FROM CTE
JOIN Orders O on CTE.ContactID = O.ContactID
-- select the "maximum" row
WHERE CTE.RowNo = 1
An easy way to do this is to make your original query a subquery and select from it. I'm making a slight change, because it's a better practice to group by your primary key than your email address. (Is it a safe bet that each contact has just one email address, and that the basic intent is to group by person?) If so, try this:
SELECT DISTINCT c.EmailAddress, c.FirstName, c.LastName, c.Address, sub.NumOrders
FROM
(
select
Contacts.ContactID
count(distinct Orders.OrderID) AS NumOrders
from
Contacts inner join Orders on Contacts.ContactID = Orders.ContactID
group by
Contacts.ContactID
) sub
JOIN Contacts c
ON sub.ContactID = c.ContactID
If you really need to group by email address instead, then change the above subquery to your original query and change c.EmailAddress to sub.EmailAddress. Of course you may order the SELECT fields however best suits you.
Edit follows:
The ContactID must be a sequence number and you can continually put the same person in the table. So if you add the DISTINCT keyword in the outer query I believe that will give you what you need.

Finding a recent most duplicate records from SQL Server 2012

I want to find the recent duplicate records from SQL Server 2012. Here is the table structure I have.
I have table name called UserRegistration which contains the duplicate of UserID(GUID) and in same table, I have CreatedDate Column as well (Date). Now I want to find the recent duplicate records from this table.
Here is the same data.
id FirstName LastName CreatedDate UserID
109 FirstNameA LastNameA 28-04-2015 GUID1
110 FirstNameC LastNameD 19-05-2015 GUID2
111 FirstNameE LastNameF 22-05-2015 GUID1
If you notice on above tables, GUID 1 are having the duplicate, Now I want to find the recent one means it should return me only those rows with duplication but recent data. So in above table structure, it should return me 111 because record has been created recently compared to the 109. I believe you understand.
Do let me know if you have any question. I am happy to answer. Thanks. Awaiting for the reply.
Harshal
Try the below query this should do the work based on your i/p data -
create table #UserRegistration (id int,FirstName varchar(20),LastName varchar(20),CreatedDate date,UserID varchar(20))
insert into #UserRegistration
select 109, 'FirstNameA', 'LastNameA', '2015-04-28', 'GUID1' union
select 110, 'FirstNameC', 'LastNameD', '2015-05-19', 'GUID2' union
select 111, 'FirstNameE', 'LastNameF', '2015-05-22', 'GUID1'
select id, FirstName, LastName, CreatedDate, UserID from
(SELECT ur.*,row_number() over(partition by UserID order by CreatedDate) rn
FROM #UserRegistration ur) A
where rn > 1
You could use CTE. Group your records by UserID and give your particular row a rank ordered by CreatedDate.
insert into tab(id, FirstName, LastName, CreatedDate, UserID)
values(109, 'FirstNameA', 'LastNameA', '2015-04-28', 'guid1'),
(110, 'FirstNameC', 'LastNameD', '2015-05-19', 'guid2'),
(111, 'FirstNameE', 'LastNameF', '2015-05-22', 'guid1');
with cte as
(
select id, ROW_NUMBER() over (partition by UserID order by CreatedDate asc) as [Rank],
FirstName, LastName, CreatedDate, UserID
from tab
)
select id, FirstName, LastName, CreatedDate, UserID from cte where Rank > 1
Rank > 1 condition is responsible for retrieving duplicated items.
sqlfiddle link:
http://sqlfiddle.com/#!6/4d1f2/6
Solved this by using tmp-tables:
SELECT a.UserID,
MAX(a.CreatedDate) As CreatedDate
INTO #latest
FROM <your table> a
GROUP BY a.UserID
HAVING COUNT(a.UserID) > 1
SELECT b.id
FROM #latest a
INNER JOIN <your table> b ON a.UserID = b.UserID AND a.CreatedDate = b.CreatedDate
try this,
SELECT * FROM TableName tt WHERE
exists(select MAX(createdDate)
from TableName
where tt.UserID = UserID
group by UserID
having MAX(createdDate)= tt.createdDate)
I think your createddate field is not a date field, then try Format
WITH TempAns (id,UserID,duplicateRecordCount)
AS
(
SELECT id,
UserID,
ROW_NUMBER()OVER(partition by UserID ORDER BY id)
AS duplicateRecordCount
FROM #t
)
select * from #t where id in (
select max(id )
from TempAns
where duplicateRecordCount > 1
group by name )
You'd rank your records with ROW_NUMBER() to give all last records per userid #1. With COUNT() you make sure only to get the userids having more than one record.
select
id, firstname, lastname, createddate, userid
from
(
select
id, firstname, lastname, createddate, userid,
row_number() over (partition by userid oder by createddate desc) as rn,
count(*) over (partition by userid) as cnt
from userregistration
) ranked
where rn = 1 -- only last one
and cnt > 1; -- but only if there is more than one record for the userid
This gets the latest record for every userid that has duplicates.

Resources