Making a query that only shows unique records - database

I have a table where duplicate entries are possible in one of the columns (emailAddress - some couples share one) and I would like to send email newsletters to them. Is there a way to write a select query that shows only one copy of an email address when there are multiple?

If you need only emailAddress it is quite simple:
select distinct emailAddress from <YourTableNameHere>
Edited according to request in comments.
If you want to choose both a distinct emailAddress and ANY customerName related to it, then you must somehow tell SQL how to choose the customerName. The easiest way is to select e.g. MIN(customerName); all the other names (usually those that are later in the alphabet, but it actually depends on the collation) are then discarded. The query would be:
select emailAddress, min(customerName) as pickedCustomerName
from <YourTableNameHere>
group by emailAddress
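A minimal sketch of both queries above, run against an in-memory SQLite database from Python. The table name `customers` and the sample rows are assumptions made for illustration; only the `emailAddress` and `customerName` columns come from the answer.

```python
import sqlite3

# Hypothetical table and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customerName TEXT, emailAddress TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [
        ("Alice Jones", "jones@example.com"),
        ("Bob Jones", "jones@example.com"),  # a couple sharing one address
        ("Carol White", "carol@example.com"),
    ],
)

# One row per distinct email address.
distinct_emails = conn.execute(
    "SELECT DISTINCT emailAddress FROM customers ORDER BY emailAddress"
).fetchall()

# One row per email address, with a single name picked by MIN().
picked = conn.execute(
    """
    SELECT emailAddress, MIN(customerName) AS pickedCustomerName
    FROM customers
    GROUP BY emailAddress
    ORDER BY emailAddress
    """
).fetchall()

print(distinct_emails)
print(picked)
```

With this sample data the shared address appears once in both results, and MIN() keeps the name that sorts first.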

You can use the DISTINCT keyword, or you can GROUP BY.
SELECT DISTINCT email
FROM table
Or
SELECT email, Count(ID)
FROM table
GROUP By email

Related

detecting duplicates and removing them

I've been trying to solve a problem in my database which is quite common but I couldn't find a solution so far and I hope you could help me with this.
I have a database with people and their associated addresses. My primary goal is to find out how many unique households are in there. For example, I want to count a family as one. So far I ran a query to display the last_name and address combinations that occur more than once:
select Last_Name ,add_line1, count(*) from ##all_people
group by Last_Name,ADD_LINE1
having count(*) > 1
This shows me people with the same last_name and address but I need their IDs in order to remove them from my temptable.
Furthermore, I'd like to ask how it is possible to display only one record for each household.
This is the structure of my temptable:
ID First_name Last_Name add_line1
Thank you so much for your help!!!
to find duplicates, you can use Count() Over() and partition by the grouping you want.
select * from (
select Id, Last_Name ,add_line1, count(*) over (partition by Last_Name, add_line1) dupe_count from ##all_people
) t
where t.dupe_count > 1
to find the ones you want to delete, you can use Row_Number()
select * from (
select Id, Last_Name ,add_line1, row_number() over (partition by Last_Name, add_line1 order by ID) extras from ##all_people
) t
where t.extras > 1
use t.extras = 1 to see one row per grouping
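The Row_Number() approach above can be sketched end to end with Python's built-in `sqlite3` module. This is only an illustration under assumptions: the table is named `all_people` (the `##` prefix is specific to SQL Server temp tables), the rows are made up, and window functions need SQLite 3.25 or later.

```python
import sqlite3

# Hypothetical stand-in for ##all_people with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE all_people (ID INTEGER, First_name TEXT, Last_Name TEXT, add_line1 TEXT)"
)
conn.executemany(
    "INSERT INTO all_people VALUES (?, ?, ?, ?)",
    [
        (1, "Ann", "Smith", "1 Main St"),
        (2, "Ben", "Smith", "1 Main St"),  # same household as ID 1
        (3, "Cat", "Lee", "2 Oak Ave"),
        (4, "Dan", "Smith", "9 Elm Rd"),   # same name, different address
    ],
)

# Number the rows inside each (Last_Name, add_line1) group; rows numbered
# above 1 are the extras within a household.
extra_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT ID FROM (
            SELECT ID,
                   ROW_NUMBER() OVER (PARTITION BY Last_Name, add_line1 ORDER BY ID) AS extras
            FROM all_people
        ) t
        WHERE t.extras > 1
        ORDER BY ID
        """
    )
]

# Remove the extras by ID, leaving one row per household.
conn.executemany("DELETE FROM all_people WHERE ID = ?", [(i,) for i in extra_ids])
remaining = [row[0] for row in conn.execute("SELECT ID FROM all_people ORDER BY ID")]

print(extra_ids, remaining)
```

Ordering by ID inside the window keeps the lowest ID of each household; a different ORDER BY would keep a different row.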
You seem to have a lot of questions here...
My primary goal is to find out how many unique households are in there.
You can do this with a distinct count:
SELECT COUNT(*)
FROM (SELECT DISTINCT Last_Name, add_line1 FROM ##all_people) AS households
...but I need their IDs in order to remove them from my temptable
I think this is solved by the new count query.
Furthermore, I'd like to ask how it is possible to display only one record for each household.
Just use distinct last name and address:
SELECT DISTINCT last_name, add_line1
FROM ##all_people
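A sketch comparing two ways to count distinct (Last_Name, add_line1) pairs: concatenating the columns versus counting the rows of a DISTINCT subquery. The table name `all_people` and the rows are assumptions, and the last row is deliberately contrived so that two different pairs concatenate to the same string. SQLite concatenates with `||` where SQL Server uses `+`.

```python
import sqlite3

# Hypothetical stand-in for ##all_people with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE all_people (ID INTEGER, Last_Name TEXT, add_line1 TEXT)")
conn.executemany(
    "INSERT INTO all_people VALUES (?, ?, ?)",
    [
        (1, "Smith", "1 Main St"),
        (2, "Smith", "1 Main St"),  # same household as ID 1
        (3, "Lee", "2 Oak Ave"),
        (4, "Le", "e2 Oak Ave"),    # contrived: concatenates to the same string as ID 3
    ],
)

# Concatenating the columns merges IDs 3 and 4 into one value.
by_concat = conn.execute(
    "SELECT COUNT(DISTINCT Last_Name || add_line1) FROM all_people"
).fetchone()[0]

# Counting the rows of a DISTINCT subquery compares the columns separately.
by_pairs = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT Last_Name, add_line1 FROM all_people)"
).fetchone()[0]

print(by_concat, by_pairs)
```

Collisions like this are unlikely in real address data, but concatenation also yields NULL when either column is NULL, so comparing the columns separately is the safer default.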

How do I get a list of managers with their IDs?

I have a Users table with following schema,
UserId, Name, ManagerId
ManagerId is simply a UserId, since a manager is also a user.
I can get the list of managers using the query below, but how do I get the ManagerId along with the name?
Select DISTINCT(ManagerId) from Users
group by ManagerId
I want following output
ManagerId, Name
Try this:
SELECT UserId, Name
FROM Permission
WHERE UserId IN (Select DISTINCT ManagerId FROM Permission)
Also possible with a JOIN:
SELECT UserId, Name
FROM Permission AS p1
JOIN (Select DISTINCT ManagerId FROM Permission) AS p2
ON p1.UserId = p2.ManagerId
I believe you want to retrieve users who are managers. This could be done using EXISTS clause:
SELECT DISTINCT UserId, Name
FROM Permission p1
WHERE EXISTS (
SELECT 1
FROM Permission p2
WHERE p1.UserId = p2.ManagerId
)
The DISTINCT clause is optional and may be unnecessary depending on your data. If that is the case, remove it for a performance boost (DISTINCT needs an extra step, typically a sort or a hash, to remove duplicates).
I'm not sure if it's a good idea to use IN together with DISTINCT like Giorgos suggested. SQL Server has a good optimization engine, but I've had bad experience with such statements.
Better use EXISTS
SELECT P1.UserId, P1.Name
FROM Permission AS P1
WHERE EXISTS (
SELECT *
FROM Permission AS P2
WHERE P2.ManagerId = P1.UserId
);
Keep in mind that it is a good idea to have two indexes on the Permission table: one whose first column is UserId and another whose first column is ManagerId. That way EXISTS will perform very efficiently.
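As a rough check that the EXISTS and IN variants above return the same managers, here is a small sketch using Python's `sqlite3`. The sample rows and index names are hypothetical; the `Permission` table and its columns follow the answers.

```python
import sqlite3

# Hypothetical rows: Ann manages Bob and Cy, and Bob manages Di.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Permission (UserId INTEGER, Name TEXT, ManagerId INTEGER)")
conn.executemany(
    "INSERT INTO Permission VALUES (?, ?, ?)",
    [(1, "Ann", None), (2, "Bob", 1), (3, "Cy", 1), (4, "Di", 2)],
)

# One index per column used by the correlated lookup (index names are made up).
conn.execute("CREATE INDEX ix_permission_userid ON Permission (UserId)")
conn.execute("CREATE INDEX ix_permission_managerid ON Permission (ManagerId)")

# Users that appear as someone's manager, via EXISTS.
managers_exists = conn.execute(
    """
    SELECT P1.UserId, P1.Name
    FROM Permission AS P1
    WHERE EXISTS (SELECT 1 FROM Permission AS P2 WHERE P2.ManagerId = P1.UserId)
    ORDER BY P1.UserId
    """
).fetchall()

# The same question asked with IN.
managers_in = conn.execute(
    """
    SELECT UserId, Name
    FROM Permission
    WHERE UserId IN (SELECT DISTINCT ManagerId FROM Permission)
    ORDER BY UserId
    """
).fetchall()

print(managers_exists, managers_in)
```

Neither query needs an outer DISTINCT here, because each user occupies exactly one row in the sample table.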

SQL - Find more than one occurrence of a record

I have a table of customers:
Firstname Lastname Mobile Email
I would like to know what query in SQL Server I could run to find all the instances of there being a mobile number allocated to more than one email address, for example
Bob Smith 07789665544 bob@test.com
Bill Car 07789665544 bill@hello.com
I want to find all the records where a mobile number has multiple email addresses.
Thanks.
Use EXISTS
SELECT c.*
FROM dbo.Customers c
WHERE EXISTS
(
SELECT 1 FROM dbo.Customers c2
WHERE c.Mobile = c2.Mobile
AND COALESCE(c.Email, '') <> COALESCE(c2.Email, '')
)
I've used COALESCE in case Email can be NULL.
A CTE with a nested query can do this, and rather quickly too:
with DupeNumber as(
select se.Mobile from (select distinct Mobile, Email from Customers) se
group by se.Mobile
having count(*) >1
)
select c.* from Customers c
inner join DupeNumber dn on c.Mobile = dn.Mobile
order by c.Mobile
This makes a list of the unique mobile and email combinations, then finds the mobile numbers that appear with more than one email, then joins back to the original table to get the full rows.
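The CTE approach can be tried out on a small data set with Python's `sqlite3`, which also supports WITH. This is a sketch under assumptions: the rows below are made up, and the outer query joins through a table alias so that every column reference resolves.

```python
import sqlite3

# Hypothetical customer rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (Firstname TEXT, Lastname TEXT, Mobile TEXT, Email TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?, ?)",
    [
        ("Bob", "Smith", "07789665544", "bob@example.com"),
        ("Bill", "Car", "07789665544", "bill@example.com"),  # same mobile, different email
        ("Jo", "Day", "07000000001", "jo@example.com"),
        ("Jo", "Day", "07000000001", "jo@example.com"),      # exact duplicate, same email
    ],
)

# Mobile numbers that occur with more than one distinct email, joined back
# to the full rows.
rows = conn.execute(
    """
    WITH DupeNumber AS (
        SELECT se.Mobile
        FROM (SELECT DISTINCT Mobile, Email FROM Customers) se
        GROUP BY se.Mobile
        HAVING COUNT(*) > 1
    )
    SELECT c.Firstname, c.Email
    FROM Customers c
    INNER JOIN DupeNumber dn ON c.Mobile = dn.Mobile
    ORDER BY c.Firstname
    """
).fetchall()

print(rows)
```

The inner SELECT DISTINCT is what keeps the exact duplicate rows for Jo from being reported: they share a mobile number but only one email.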

How would I write this query with DBIx::Class?

I've seen a few other questions on Stack Overflow that discuss sub-selects, but they usually relate to the use of multiple tables. In most cases, a proper join could serve the same purpose.
However, my query below refers to a single table. How would I write this using DBIx::Class?
select ID, username, email, role
from Employees
where (ID in
(select max(ID)
from Employees
where username = 'jsmith'
))
order by ID DESC
Thanks!
--
Edit 1: SQL code fix
The Cookbook has almost the exact same query as an example.
Your SQL query doesn't make sense to me because the subquery returns a single id, so WHERE id = () would make more sense.
What are you trying to accomplish with it?

Using UNION select inside a view

I have a requirement to check whether a specific user is already referenced in one of our transaction tables (we have around 10 transaction tables). I suggested using a VIEW that contains all the users that are already referenced; the DEV team could then just SELECT from that view to find out whether the data they're looking for is there or not.
So here's my query for the view:
SELECT DISTINCT user_ID
FROM transaction_table_1
UNION
SELECT DISTINCT user_ID
FROM transaction_table_2
UNION
SELECT DISTINCT user_ID
FROM transaction_table_3
UNION
SELECT DISTINCT user_ID
FROM transaction_table_4
[...]
Right now it works, but my question is: is this a good idea? The requirement asks that I provide only a script (or a view) and not a stored procedure. I think this would be better with an SP, since I could just do a quick IF EXISTS() check against each of the tables to see whether the parameter user exists in any of them, but they really want it to be only a script they can check (with no use of variables).
Can you guys give me advice on a better way of meeting this requirement, one that would have less impact on performance, since this may not be the optimal solution?
TIA,
Rommel
Well, you can remove the DISTINCT because UNION already removes duplicates :)
SELECT user_ID
FROM transaction_table_1
UNION
SELECT user_ID
FROM transaction_table_2
UNION
SELECT user_ID
FROM transaction_table_3
UNION
SELECT user_ID
FROM transaction_table_4
But since you have to use a view, I don't see how to do it differently.
From a performance point of view I would structure the query slightly differently:
SELECT DISTINCT user_ID
FROM (
SELECT user_ID
FROM transaction_table_1
UNION ALL
SELECT user_ID
FROM transaction_table_2
UNION ALL
SELECT user_ID
FROM transaction_table_3
...
) x
This should reduce the number of duplicate-removal passes to one, rather than having one each time a UNION is performed.
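To confirm that the two formulations return the same set of users, and to show how the view could be queried without variables or a stored procedure, here is a sketch with Python's `sqlite3`. The view name `referenced_users`, the three-table setup, and the rows are assumptions; it says nothing about relative performance on SQL Server, which would need to be measured there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Three hypothetical transaction tables with overlapping user IDs.
sample = {
    "transaction_table_1": [1, 2, 2],
    "transaction_table_2": [2, 3],
    "transaction_table_3": [3, 4],
}
for table, ids in sample.items():
    conn.execute(f"CREATE TABLE {table} (user_ID INTEGER)")
    conn.executemany(f"INSERT INTO {table} VALUES (?)", [(i,) for i in ids])

# The view built with plain UNION, which removes duplicates by itself.
conn.execute(
    """
    CREATE VIEW referenced_users AS
    SELECT user_ID FROM transaction_table_1
    UNION
    SELECT user_ID FROM transaction_table_2
    UNION
    SELECT user_ID FROM transaction_table_3
    """
)
via_union = conn.execute("SELECT user_ID FROM referenced_users ORDER BY user_ID").fetchall()

# The alternative: UNION ALL inside, a single DISTINCT outside.
via_union_all = conn.execute(
    """
    SELECT DISTINCT user_ID FROM (
        SELECT user_ID FROM transaction_table_1
        UNION ALL
        SELECT user_ID FROM transaction_table_2
        UNION ALL
        SELECT user_ID FROM transaction_table_3
    ) x
    ORDER BY user_ID
    """
).fetchall()

# Existence check against the view for one user ID.
def is_referenced(user_id):
    query = "SELECT EXISTS (SELECT 1 FROM referenced_users WHERE user_ID = ?)"
    return bool(conn.execute(query, (user_id,)).fetchone()[0])

print(via_union, via_union_all, is_referenced(3), is_referenced(5))
```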

Resources