Select all columns from table where one field is duplicated - sql-server

I'm trying to get a list of users (all their data) that have a duplicated email.
I can get all the emails by using
SELECT EMAIL, Count(*)
FROM USER_TABLE
Group By EMAIL having COUNT(*) > 1
and that returns a table of emails and their count (greater than 1).
I could write a query and just do
SELECT *
FROM USER_TABLE
WHERE EMAIL IN ('dup@email.com', 'dup2@email.com' ...);
but that requires me to always run the first query first and then copy paste them all into the IN statement.
What's the best way to combine these? Well not really combine, I don't care how many duplicates there are, I just want all the user info for users that have a duplicate email.

You pretty much wrote the whole solution yourself. You just need to use your first query as the subquery of the IN instead of a hard-coded list.
SELECT *
FROM USER_TABLE
WHERE EMAIL IN
(
SELECT EMAIL
FROM USER_TABLE
GROUP By EMAIL
HAVING COUNT(*) > 1
)

With window function COUNT:
SELECT *
FROM
(
SELECT
u.*,
COUNT(*) OVER (PARTITION BY u.Email) AS Cnt
FROM USER_TABLE u
) AS t
WHERE t.Cnt > 1

You can also join USER_TABLE to your original first query, used as a derived table:
SELECT t1.*
FROM USER_TABLE t1
INNER JOIN
(
SELECT EMAIL
FROM USER_TABLE
GROUP BY EMAIL
HAVING COUNT(*) > 1
) t2
ON t1.EMAIL = t2.EMAIL
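
To check that the three queries above return the same users, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for SQL Server, purely so the example is self-contained (the window-function variant needs SQLite 3.25 or later). The sample rows and the ID and NAME columns are made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE USER_TABLE (ID INTEGER PRIMARY KEY, NAME TEXT, EMAIL TEXT);
    INSERT INTO USER_TABLE (ID, NAME, EMAIL) VALUES
        (1, 'Ann', 'dup@email.com'),
        (2, 'Bob', 'dup@email.com'),
        (3, 'Cid', 'unique@email.com'),
        (4, 'Dee', 'dup2@email.com'),
        (5, 'Eve', 'dup2@email.com');
""")

queries = {
    # Answer 1: the grouped query as the IN subquery.
    "in_subquery": """
        SELECT ID FROM USER_TABLE
        WHERE EMAIL IN (SELECT EMAIL FROM USER_TABLE
                        GROUP BY EMAIL HAVING COUNT(*) > 1)
        ORDER BY ID
    """,
    # Answer 2: COUNT(*) as a window function, filtered in an outer query.
    "window_count": """
        SELECT ID FROM (
            SELECT u.*, COUNT(*) OVER (PARTITION BY u.EMAIL) AS Cnt
            FROM USER_TABLE u
        ) AS t
        WHERE t.Cnt > 1
        ORDER BY ID
    """,
    # Answer 3: join back to the grouped query as a derived table.
    "derived_join": """
        SELECT t1.ID FROM USER_TABLE t1
        INNER JOIN (SELECT EMAIL FROM USER_TABLE
                    GROUP BY EMAIL HAVING COUNT(*) > 1) t2
            ON t1.EMAIL = t2.EMAIL
        ORDER BY t1.ID
    """,
}

results = {name: [row[0] for row in conn.execute(sql)]
           for name, sql in queries.items()}
print(results)
```

All three return the same set of IDs here; which one performs best on a real table depends on indexes and data volume.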

Related

SQL queries combined into one row

I'm having some difficulty combining the following queries, so that the results display in one row rather than in multiple rows:
SELECT value FROM dbo.parameter WHERE name='xxxxx.name'
SELECT dbo.contest.name AS Event_Name
FROM contest
INNER JOIN open_box on open_box.contest_id = contest.id
GROUP BY dbo.contest.name
SELECT COUNT(*) FROM open_option AS total_people
SELECT SUM(scanned) AS TotalScanned,SUM(number) AS Totalnumber
FROM dbo.open_box
GROUP BY contest_id
SELECT COUNT(*) FROM open AS reff
WHERE refer = 'True'
I would like to display data from the fields in each column similar to what is shown in the image below. Any help is appreciated!
Tab's solution is fine, I just wanted to show an alternative way of doing this. The following statement uses subqueries to get the information in one row:
SELECT
[xxxx.name]=(SELECT value FROM dbo.parameter WHERE name='xxxxx.name'),
[Event Name]=(SELECT dbo.contest.name
FROM contest
INNER JOIN open_box on open_box.contest_id = contest.id
GROUP BY dbo.contest.name),
[Total People]=(SELECT COUNT(*) FROM open_option),
[Total Scanned]=(SELECT SUM(scanned)
FROM dbo.open_box
GROUP BY contest_id),
[Total Number]=(SELECT SUM(number)
FROM dbo.open_box
GROUP BY contest_id),
Ref=(SELECT COUNT(*) FROM open WHERE refer = 'True');
This requires the Total Scanned and Total Number to be queried separately.
Update: if you then want to INSERT that into another table there are essentially two ways to do that.
Create the table directly from the SELECT statement:
SELECT
-- the fields from the first query
INTO
[database_name].[schema_name].[new_table_name]; -- creates table new_table_name
Insert into a table that already exists using INSERT ... SELECT:
INSERT INTO [database_name].[schema_name].[existing_table_name](
-- the fields in the existing_table_name
)
SELECT
-- the fields from the first query
Just CROSS JOIN the five queries as derived tables:
SELECT * FROM (
Query1
) AS q1
CROSS JOIN (
Query2
) AS q2
CROSS JOIN (...
Assuming that each of your individual queries only returns one row, then this CROSS JOIN should result in only one row.
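
A minimal sketch of that idea, using Python's built-in sqlite3 module in place of SQL Server so it runs on its own. Only two of the five queries are shown, the GROUP BY is dropped so that each derived table is guaranteed to return one row, and the sample data is invented.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE open_box (contest_id INTEGER, scanned INTEGER, number INTEGER);
    INSERT INTO open_box VALUES (1, 3, 10), (1, 4, 20);
    CREATE TABLE open_option (id INTEGER);
    INSERT INTO open_option VALUES (1), (2), (3);
""")

# Each derived table yields exactly one row, so the cross product is one row.
row_sql = """
    SELECT *
    FROM (SELECT COUNT(*) AS total_people FROM open_option) AS q1
    CROSS JOIN (
        SELECT SUM(scanned) AS TotalScanned, SUM(number) AS Totalnumber
        FROM open_box
    ) AS q2
"""
rows = conn.execute(row_sql).fetchall()
print(rows)
```

If any derived table returned two rows, the result would double, which is why the single-row assumption matters.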

TSQL Group By Issues

I have a TSQL query that I am trying to group data on. The table contains records of users and the access keys they hold such as site admin, moderator etc. The PK is on User and access key because a user can exist multiple times with different keys.
I am now trying to display a table of all users and in one column, all of the keys that user holds.
If Bob had three separate records for his three separate access keys, the result should have only one record for Bob with all three of his access levels.
SELECT A.[FirstName],
A.[LastName],
A.[ntid],
A.[qid],
C.FirstName AS addedFirstName,
C.LastName AS addedLastName,
C.NTID AS addedNTID,
CONVERT(VARCHAR(100), p.TIMESTAMP, 101) AS timestamp,
(
SELECT k.accessKey,
k.keyDescription
FROM TFS_AdhocKeys AS k
WHERE p.accessKey = k.accessKey
FOR XML PATH ('key'), TYPE, ELEMENTS, ROOT ('keys')
)
FROM TFS_AdhocPermissions AS p
LEFT OUTER JOIN dbo.EmployeeTable as A
ON p.QID = A.QID
LEFT OUTER JOIN dbo.EmployeeTable AS C
ON p.addedBy = C.QID
GROUP BY a.qid
FOR XML PATH ('data'), TYPE, ELEMENTS, ROOT ('root');
END
I am trying to group the data by a.qid, but it's forcing me to group on every column in the select, which will then not be unique, so the result will contain the duplicates.
What's another approach to handle this?
Currently:
UserID | accessKey
123 | admin
123 | moderator
Desired:
UserID | accessKey
123 | admin
moderator
Recently, I was working on something and had a similar problem. Like your query, I had an inner 'for xml' with joins in the outer 'for xml'. It turned out it worked better if the joins were in the inner 'for xml'. The code is pasted below. I hope this helps.
Select
(Select Institution.Name, Institution.Id
, (Select Course.Courses_Id, Course.Expires, Course.Name
From
(Select Course.Courses_Id, Course.Expires, Courses.Name
From Institutions Course Course Join Courses On Course.Courses_Id = Courses.Id
Where Course.Institutions_Id = 31) As Course
For Xml Auto, Type, Elements) As Courses
From Institutions Institution
For Xml Auto, Elements, Root('Institutions') )
Since I don't have the definitions of your other tables, I made some sample test data; you can adapt this to your own tables.
Create statement
CREATE TABLE #test(UserId INT, AccessLevel VARCHAR(20))
Insert sample data
INSERT INTO #test VALUES(123, 'admin')
,(123, 'moderator')
,(123, 'registered')
,(124, 'moderator')
,(124, 'registered')
,(125, 'admin')
By using ROW_NUMBER() you can achieve what you need
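;WITH C AS(
-- Number each user's rows; ordering by AccessLevel makes the numbering deterministic
SELECT ROW_NUMBER() OVER(PARTITION BY UserId ORDER BY AccessLevel) As Rn
,UserId
,AccessLevel
FROM #test
)
-- Show the UserId only on the first row of each user
SELECT CASE Rn
WHEN 1 THEN UserId
ELSE NULL
END AS UserId
,AccessLevel
FROM C
-- Without an ORDER BY the row order of the result is not guaranteed
ORDER BY C.UserId, C.Rn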
Output
UserId AccessLevel
------ -----------
123 admin
NULL moderator
NULL registered
124 moderator
NULL registered
125 admin
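
If the goal is literally one row per user with all keys in a single column, string aggregation is another option. This is a different technique from the ROW_NUMBER() answer above, sketched here with SQLite's group_concat through Python's built-in sqlite3 module so it runs on its own. SQL Server 2017 and later provide STRING_AGG for the same purpose. The table name test mirrors the #test sample data above.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test (UserId INTEGER, AccessLevel TEXT);
    INSERT INTO test VALUES
        (123, 'admin'), (123, 'moderator'), (123, 'registered'),
        (124, 'moderator'), (124, 'registered'),
        (125, 'admin');
""")

# One output row per user, with that user's access levels joined by ', '.
rows = conn.execute("""
    SELECT UserId, group_concat(AccessLevel, ', ') AS AccessLevels
    FROM test
    GROUP BY UserId
    ORDER BY UserId
""").fetchall()

# The order of items inside each concatenated string is not guaranteed,
# so split and sort them before comparing or displaying.
keys_by_user = {user_id: sorted(levels.split(", ")) for user_id, levels in rows}
print(keys_by_user)
```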

SQL - Find more than one occurrence of a record

I have a table of customers:
Firstname Lastname Mobile Email
I would like to know what query in SQL Server I could run to find all the instances of there being a mobile number allocated to more than one email address, for example
Bob Smith 07789665544 bob@test.com
Bill Car 07789665544 bill@hello.com
I want to find all the records where a mobile number has multiple email addresses.
Thanks.
Use EXISTS
SELECT c.*
FROM dbo.Customers c
WHERE EXISTS
(
SELECT 1 FROM dbo.Customers c2
WHERE c.Mobile = c2.Mobile
AND COALESCE(c.Email, '') <> COALESCE(c2.Email, '')
)
I've used COALESCE in case Email can be NULL.
A CTE with a nested query can do this, and rather quickly too:
with DupeNumber as(
select se.Mobile from (select distinct Mobile, Email from Customers) se
group by se.Mobile
having count(*) >1
)
select c.* from Customers c
inner join DupeNumber dn on c.Mobile=dn.Mobile
order by c.Mobile
This makes a list of the unique mobile and email combinations, then finds the mobile numbers that appear with more than one email, then joins back to the original table to get the full rows.
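
A small runnable check of the CTE approach, using Python's built-in sqlite3 module as a stand-in for SQL Server. The two Jane rows are invented to show why the DISTINCT step matters: a mobile number repeated with the same email should not be reported.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (Firstname TEXT, Lastname TEXT, Mobile TEXT, Email TEXT);
    INSERT INTO Customers VALUES
        ('Bob',  'Smith', '07789665544', 'bob@test.com'),
        ('Bill', 'Car',   '07789665544', 'bill@hello.com'),
        ('Jane', 'Doe',   '07700000001', 'jane@test.com'),
        ('Jane', 'Doe',   '07700000001', 'jane@test.com');
""")

# Mobiles that occur with more than one distinct email, joined back to full rows.
shared_mobiles = conn.execute("""
    WITH DupeNumber AS (
        SELECT se.Mobile
        FROM (SELECT DISTINCT Mobile, Email FROM Customers) se
        GROUP BY se.Mobile
        HAVING COUNT(*) > 1
    )
    SELECT c.Firstname, c.Mobile, c.Email
    FROM Customers c
    INNER JOIN DupeNumber dn ON c.Mobile = dn.Mobile
    ORDER BY c.Email
""").fetchall()
print(shared_mobiles)
```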

Multiple Select against one CTE

I have a CTE query filtering a table Student
Student
(
StudentId PK,
FirstName ,
LastName,
GenderId,
ExperienceId,
NationalityId,
CityId
)
Based on a lot of filters (multiple cities, gender, multiple experiences (1, 2, 3), multiple nationalities), I create a CTE using dynamic SQL, joining the Student table with user-defined tables (CityTable, NationalityTable, ...).
After that I have to retrieve the count of student by each filter like
CityId City Count
NationalityId Nationality Count
The same goes for the other filters.
Can I do something like
;With CTE(
Select
FROM Student
Inner JOIN ...
INNER JOIN ....)
SELECT CityId,City,Count(studentId)
FROm CTE
GROUP BY CityId,City
SELECT GenderId,Gender,Count
FROM CTE
GROUP BY GenderId,Gender
I want to do something like what LinkedIn is doing with search (people search, job search):
http://www.linkedin.com/search/fpsearch?type=people&keywords=sales+manager&pplSearchOrigin=GLHD&pageKey=member-home
It's very fast and does the same thing.
You cannot run multiple SELECT statements against one CTE, but you can define more than one CTE, like this:
WITH CTEA
AS
(
SELECT 'Column1' A,'Column2' B
),
CTEB
AS
(
SELECT 'ColumnX' X,'ColumnY' Y
)
SELECT * FROM CTEA, CTEB
To get the count, use ROW_NUMBER() and COUNT() OVER() in the CTE, something like this:
ROW_NUMBER() OVER ( ORDER BY COLUMN NAME )AS RowNumber,
Count(1) OVER() AS TotalRecordsFound
Please let me know if you need more information on this.
Sample for your reference.
With CTE AS (
Select StudentId, S.CityId, S.GenderId
FROM Student S
Inner JOIN CITY C
ON S.CityId = C.CityId
INNER JOIN GENDER G
ON S.GenderId = G.GenderId)
,
GENDER
AS
(
SELECT GenderId
FROM CTE
GROUP BY GenderId
)
SELECT * FROM GENDER, CTE
It is not possible to get multiple result sets from a single CTE.
You can however use a table variable to cache some of the information and use it later instead of issuing the same complex query multiple times:
declare @relevantStudent table (StudentID int);
insert into @relevantStudent
select s.StudentID from Student s
join ...
where ...
-- now issue the multiple queries
select s.GenderID, count(*)
from Student s
join @relevantStudent r on r.StudentID = s.StudentID
group by s.GenderID
select s.CityID, count(*)
from Student s
join @relevantStudent r on r.StudentID = s.StudentID
group by s.CityID
The trick is to store only the minimum required information in the table variable.
As with any query whether this will actually improve performance vs. issuing the queries independently depends on many things (how big the table variable data set is, how complex is the query used to populate it and how complex are the subsequent joins/subselects against the table variable, etc.).
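
The caching idea above can be sketched as follows. SQLite has no table variables, so this uses a temporary table through Python's built-in sqlite3 module as a stand-in; the filter (CityId IN (10, 20)), the sample rows, and the helper count_by are all hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Student (StudentId INTEGER PRIMARY KEY, GenderId INTEGER, CityId INTEGER);
    INSERT INTO Student VALUES (1, 1, 10), (2, 2, 10), (3, 1, 20), (4, 2, 30);

    -- Run the (potentially expensive) filter once and keep only the matching keys.
    CREATE TEMP TABLE relevantStudent (StudentId INTEGER PRIMARY KEY);
    INSERT INTO relevantStudent
    SELECT StudentId FROM Student WHERE CityId IN (10, 20);
""")

def count_by(column):
    """Count cached students per value of `column`.

    `column` is only ever one of the fixed names used below, never user input,
    so interpolating it into the SQL text is safe in this sketch.
    """
    sql = f"""
        SELECT s.{column}, COUNT(*)
        FROM Student s
        JOIN relevantStudent r ON r.StudentId = s.StudentId
        GROUP BY s.{column}
        ORDER BY s.{column}
    """
    return conn.execute(sql).fetchall()

# Several result sets, each reusing the cached keys instead of re-filtering.
by_gender = count_by("GenderId")
by_city = count_by("CityId")
print(by_gender, by_city)
```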
Use UNION ALL to run multiple SELECTs against the CTE and concatenate the results into one result set.
;WITH CTE AS(
SELECT
FROM Student
INNER JOIN ...
INNER JOIN ....)
SELECT CityId,City,Count(studentId),NULL,NULL,NULL
FROM CTE
GROUP BY CityId,City
UNION ALL
SELECT NULL,NULL,NULL,GenderId,Gender,Count(studentId)
FROM CTE
GROUP BY GenderId,Gender
Note: The NULL values above just allow the two results to have matching columns, so the results can be concatenated.
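
A runnable sketch of the NULL-padding idea, using Python's built-in sqlite3 module as a stand-in for SQL Server. The CTE body and sample rows are invented, and the name columns (City, Gender) are left out to keep it short; both branches of the UNION ALL must list the same number of columns.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Student (StudentId INTEGER PRIMARY KEY, GenderId INTEGER, CityId INTEGER);
    INSERT INTO Student VALUES (1, 1, 10), (2, 2, 10), (3, 1, 20);
""")

# Columns 1-2 carry the per-city counts, columns 3-4 the per-gender counts.
rows = conn.execute("""
    WITH CTE AS (SELECT StudentId, GenderId, CityId FROM Student)
    SELECT CityId, COUNT(StudentId) AS CityCount, NULL AS GenderId, NULL AS GenderCount
    FROM CTE
    GROUP BY CityId
    UNION ALL
    SELECT NULL, NULL, GenderId, COUNT(StudentId)
    FROM CTE
    GROUP BY GenderId
""").fetchall()

# Split the combined result back into its two logical result sets.
city_rows = sorted(r[:2] for r in rows if r[0] is not None)
gender_rows = sorted(r[2:] for r in rows if r[2] is not None)
print(city_rows, gender_rows)
```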
I know this is a very old question, but here's a solution I just used. I have a stored procedure that returns a PAGE of search results, and I also need it to return the total count matching the query parameters.
WITH results AS (...complicated foo here...)
SELECT results.*,
CASE
WHEN @page=0 THEN (SELECT COUNT(*) FROM results)
ELSE -1
END AS totalCount
FROM results
ORDER BY bar
OFFSET @page * @pageSize ROWS FETCH NEXT @pageSize ROWS ONLY;
With this approach, there's a small "hit" on the first results page to get the count, and for the remaining pages, I pass back "-1" to avoid the hit (I assume the number of results won't change during the user session). Even though totalCount is returned for every row of the first page of results, it's only computed once.
My CTE is doing a bunch of filtering based on stored procedure arguments, so I couldn't just move it to a view and query it twice. This approach lets me avoid duplicating the CTE's logic just to get a count.

Can I make this T-SQL code more efficient

I am running a SQL query on a table containing 3 million records comparing email addresses.
We have two email address fields, primary and secondary.
I am comparing a subset of primary emails against all other primary and secondary Emails to get a count of both duplicates and unique Emails in the data.
I believe this code works, but it's still running 10 minutes in, and I have to do this for another 9 subsets, which are a lot larger than this one. The code is as follows:
SELECT COUNT(*) AS UniqueRecords
FROM AllVRContacts
WHERE LEN(EMAIL) > 1 AND ACCOUNTID = '00120000003bNmMAAU'
AND EMAIL NOT IN
(SELECT EMAIL FROM AllVRContacts WHERE ACCOUNTID != '00120000003bNmMAAU')
AND EMAIL NOT IN
(SELECT SECONDARY_EMAIL_ADDRESS__C FROM AllVRContacts WHERE ACCOUNTID != '00120000003bNmMAAU')
I want to learn something from this rather than just have someone scratch my back for me, the more explanation the better!
Thanks guys,
Create the following indexes:
AllVrContacts (AccountID) INCLUDE (Email)
AllVrContacts (Email) INCLUDE (AccountID)
AllVrContacts (SECONDARY_EMAIL_ADDRESS__C) INCLUDE (AccountID)
The first index, on AccountID with Email included, will be used for the WHERE filter in the main query:
WHERE ACCOUNTID = '00120000003bNmMAAU'
AND LEN(Email) > 1
The other two indexes will be used for antijoins (NOT IN) against this table.
You should also use:
SELECT COUNT(DISTINCT email) AS UniqueRecords
if you want the duplicates across the same account to be counted only once.
SELECT COUNT(*)
FROM (SELECT EMAIL AS UniqueRecords
FROM AllVRContacts a
WHERE ACCOUNTID = '00120000003bNmMAAU'
AND NOT EXISTS (SELECT EMAIL FROM AllVRContacts b
WHERE ACCOUNTID != '00120000003bNmMAAU'
AND (
a.EMAIL = b.EMAIL
OR a.EMAIL = b.SECONDARY_EMAIL_ADDRESS__C
)
)
AND LEN(EMAIL) > 1
GROUP BY EMAIL
) c
So how is this query better?
1. You typically want to use NOT EXISTS instead of NOT IN.
   IN returns true if a specified value matches any value in a subquery or a list.
   EXISTS returns true if a subquery contains any rows.
   More Info: SQL Server: JOIN vs IN vs EXISTS - the logical difference
2. = performs much better than !=
3. Reduce the scans (seeks if you have indexes on AllVRContacts) by not searching through AllVRContacts a second time for the secondary e-mail comparison.
4. GROUP BY resolves potential duplicate e-mails within the ACCOUNTID.
To further improve performance, add indexes as Quassnoi suggested and whatever is populating the table should validate e-mails to remove the need for the LEN check.
[EDIT] Added explanation to (3)
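
Apart from performance, NOT IN and NOT EXISTS also differ logically when NULLs are involved: if the subquery behind NOT IN returns even one NULL, the predicate can never be true and no rows come back. The sketch below shows this with Python's built-in sqlite3 module as a stand-in for SQL Server. Whether SECONDARY_EMAIL_ADDRESS__C is nullable in the real table is an assumption, and the account IDs and emails are invented.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE AllVRContacts (
        ACCOUNTID TEXT, EMAIL TEXT, SECONDARY_EMAIL_ADDRESS__C TEXT
    );
    INSERT INTO AllVRContacts VALUES
        ('A', 'only-a@example.com', NULL),
        ('A', 'shared@example.com', NULL),
        ('B', 'shared@example.com', NULL),
        ('B', 'b@example.com',      NULL);
""")

# NOT IN against a subquery that yields NULLs: the comparison is never true.
not_in = conn.execute("""
    SELECT EMAIL FROM AllVRContacts
    WHERE ACCOUNTID = 'A'
      AND EMAIL NOT IN (SELECT SECONDARY_EMAIL_ADDRESS__C
                        FROM AllVRContacts WHERE ACCOUNTID != 'A')
""").fetchall()

# NOT EXISTS only asks whether a matching row exists, so NULLs do no harm.
not_exists = conn.execute("""
    SELECT a.EMAIL FROM AllVRContacts a
    WHERE a.ACCOUNTID = 'A'
      AND NOT EXISTS (SELECT 1 FROM AllVRContacts b
                      WHERE b.ACCOUNTID != 'A'
                        AND (a.EMAIL = b.EMAIL
                             OR a.EMAIL = b.SECONDARY_EMAIL_ADDRESS__C))
""").fetchall()
print(not_in, not_exists)
```

If the secondary email column can hold NULLs, the original NOT IN query may silently undercount, which is worth checking before comparing timings.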
Can this be applicable?
SELECT ACCOUNTID, COUNT(*) AS UniqueRecords
FROM (
SELECT ACCOUNTID, EMAIL
FROM AllVRContacts
WHERE ACCOUNTID = '00120000003bNmMAAU' AND LEN(EMAIL) > 1
UNION
SELECT ACCOUNTID, SECONDARY_EMAIL_ADDRESS__C
FROM AllVRContacts
WHERE ACCOUNTID = '00120000003bNmMAAU' AND LEN(SECONDARY_EMAIL_ADDRESS__C) > 1
) s
GROUP BY ACCOUNTID
I understood that basically you wanted to count distinct email addresses for each ACCOUNTID.
UNION in the inner query eliminates duplicates, so the output of the inner query only has distinct pairs of account IDs and emails, whether primary or secondary. In particular, this means that if an email address is stored as both primary and secondary, it will count only once. The same applies to the same primary or the same secondary address stored in different rows.
Now you only need to count the rows, which is done by the outer query.
If the other 9 subsets you mentioned are simply other ACCOUNTIDs, you could add GROUP BY ACCOUNTID to the outer query and drop the ACCOUNTID = '...' condition from both WHERE clauses, counting emails for all accounts with one query. That is, like this:
SELECT ACCOUNTID, COUNT(*) AS UniqueRecords
FROM (
SELECT ACCOUNTID, EMAIL
FROM AllVRContacts
WHERE LEN(EMAIL) > 1
UNION
SELECT ACCOUNTID, SECONDARY_EMAIL_ADDRESS__C
FROM AllVRContacts
WHERE LEN(SECONDARY_EMAIL_ADDRESS__C) > 1
) s
GROUP BY ACCOUNTID
Try this and let me know
SELECT ACCOUNTID,COUNT(*) AS UniqueRecords
FROM AllVRContacts
WHERE LEN(EMAIL) > 1 AND ACCOUNTID = '00120000003bNmMAAU'
Group by ACCOUNTID
Having COUNT(EMAIL) >1
