SQL query like GROUP BY with OR condition - sql-server

I'll try to describe the real situation. Our company has a reservation system with a table, let's call it Customers, where e-mail and phone contacts are saved with each incoming order. That is a part of the system I can't change. My problem is how to get the count of unique customers. By a unique customer I mean a group of people who share either the same e-mail or the same phone number.
Example 1: As a real-life case, imagine Tom and Sandra, who are married. Tom, who ordered 4 products, entered 3 different e-mail addresses and 2 different phone numbers in our reservation system. He shares one of those numbers with Sandra (a home phone), so I can presume they are connected somehow. Besides this shared number, Sandra also entered her private one, and she used a single e-mail address for both of her orders. For me this means all of the following rows count as one unique customer. So in practice a unique customer may grow into a whole family.
ID   E-mail              Phone          Comment
---- ------------------- -------------- ------------------------------
0    tom@email.com       +44 111 111    First row
1    tommy@email.com     +44 111 111    Same phone, different e-mail
2    thomas@email.com    +44 111 111    Same phone, different e-mail
3    thomas@email.com    +44 222 222    Same e-mail, different phone
4    sandra@email.com    +44 222 222    Same phone, different e-mail
5    sandra@email.com    +44 333 333    Same e-mail, different phone
As ypercube said, I will probably need recursion to count all of these unique customers.
Example 2: Here is an example of what I want to do. Is it possible to get the count of unique customers without recursion, for instance by using a cursor, or is recursion necessary?
ID   E-mail              Phone          Comment
---- ------------------- -------------- ------------------------------
0    linsey@email.com    +44 111 111    ─┐
1    louise@email.com    +44 111 111     ├─ 1. unique customer
2    louise@email.com    +44 222 222    ─┘
---- ------------------- -------------- ------------------------------
3    steven@email.com    +44 333 333    ─┐
4    steven@email.com    +44 444 444     ├─ 2. unique customer
5    sandra@email.com    +44 444 444    ─┘
---- ------------------- -------------- ------------------------------
6    george@email.com    +44 555 555    ─── 3. unique customer
---- ------------------- -------------- ------------------------------
7    xavier@email.com    +44 666 666    ─┐
8    xavier@email.com    +44 777 777     ├─ 4. unique customer
9    xavier@email.com    +44 888 888    ─┘
---- ------------------- -------------- ------------------------------
10   robert@email.com    +44 999 999    ─┐
11   miriam@email.com    +44 999 999     ├─ 5. unique customer
12   sherry@email.com    +44 999 999    ─┘
---- ------------------- -------------- ------------------------------
----------------------------------------------------------------------
Result ∑ = 5 unique customers
----------------------------------------------------------------------
I've tried a query with GROUP BY, but I don't know how to group the result by either the first or the second column. I'm looking for something like
SELECT COUNT(*) FROM Customers
GROUP BY Email OR Phone
Thanks again for any suggestions
P.S.
I really appreciate the answers given before I completely rephrased this question. They may not correspond to the update, so if you are going to downvote, please don't downvote the answers (downvote the question instead, of course :). I completely rewrote this post. Thanks, and sorry for my wrong start.

Here is a full solution using a recursive CTE.
;WITH Nodes AS
(
    SELECT DENSE_RANK() OVER (ORDER BY Part, PartRank) SetId
         , [ID]
    FROM
    (
        SELECT [ID], 1 Part, DENSE_RANK() OVER (ORDER BY [E-mail]) PartRank
        FROM dbo.Customer
        UNION ALL
        SELECT [ID], 2, DENSE_RANK() OVER (ORDER BY Phone) PartRank
        FROM dbo.Customer
    ) A
),
Links AS
(
    SELECT DISTINCT A.Id, B.Id LinkedId
    FROM Nodes A
    JOIN Nodes B ON B.SetId = A.SetId AND B.Id < A.Id
),
Routes AS
(
    SELECT DISTINCT Id, Id LinkedId
    FROM dbo.Customer
    UNION ALL
    SELECT DISTINCT Id, LinkedId
    FROM Links
    UNION ALL
    SELECT A.Id, B.LinkedId
    FROM Links A
    JOIN Routes B ON B.Id = A.LinkedId AND B.LinkedId < A.Id
),
TransitiveClosure AS
(
    SELECT Id, Id LinkedId
    FROM Links
    UNION
    SELECT LinkedId Id, LinkedId
    FROM Links
    UNION
    SELECT Id, LinkedId
    FROM Routes
),
UniqueCustomers AS
(
    SELECT Id, MIN(LinkedId) UniqueCustomerId
    FROM TransitiveClosure
    GROUP BY Id
)
SELECT A.Id, A.[E-mail], A.Phone, B.UniqueCustomerId
FROM dbo.Customer A
JOIN UniqueCustomers B ON B.Id = A.Id
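If the rows can be pulled out of the database, the same grouping can also be done without recursion. The following is only a sketch in Python, not part of the answer above: it uses a union-find structure over the Example 2 rows, and the names rows and group_customers are made up for illustration. Like MIN(LinkedId) in the query, it labels each group with its smallest row ID.

```python
# Rows from Example 2 of the question: (id, email, phone).
rows = [
    (0, "linsey@email.com", "+44 111 111"),
    (1, "louise@email.com", "+44 111 111"),
    (2, "louise@email.com", "+44 222 222"),
    (3, "steven@email.com", "+44 333 333"),
    (4, "steven@email.com", "+44 444 444"),
    (5, "sandra@email.com", "+44 444 444"),
    (6, "george@email.com", "+44 555 555"),
    (7, "xavier@email.com", "+44 666 666"),
    (8, "xavier@email.com", "+44 777 777"),
    (9, "xavier@email.com", "+44 888 888"),
    (10, "robert@email.com", "+44 999 999"),
    (11, "miriam@email.com", "+44 999 999"),
    (12, "sherry@email.com", "+44 999 999"),
]


def group_customers(rows):
    """Map each row id to the smallest row id in its connected group."""
    parent = {row_id: row_id for row_id, _, _ in rows}

    def find(x):
        # Iterative find with path halving, so no recursion is involved.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller id as the root, so the root is the group's minimum id.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    first_seen = {}  # ("email", value) or ("phone", value) -> first row id with it
    for row_id, email, phone in rows:
        for key in (("email", email), ("phone", phone)):
            if key in first_seen:
                union(row_id, first_seen[key])
            else:
                first_seen[key] = row_id

    return {row_id: find(row_id) for row_id, _, _ in rows}


groups = group_customers(rows)
unique_customer_count = len(set(groups.values()))
print(unique_customer_count)  # 5 for the Example 2 data
```

Each row is merged with the first earlier row that has the same e-mail or the same phone, so chains of any length collapse into one group in a single pass.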

Finding groups that share the same Phone:
SELECT
    ID
  , Name
  , Phone
  , DENSE_RANK() OVER (ORDER BY Phone) AS GroupPhone
FROM
    MyTable
ORDER BY
    GroupPhone
  , ID
Finding groups that share the same Name:
SELECT
    ID
  , Name
  , Phone
  , DENSE_RANK() OVER (ORDER BY Name) AS GroupName
FROM
    MyTable
ORDER BY
    GroupName
  , ID
Now, for the (complex) query you describe, let's say we have a table like this instead:
ID Name Phone
---- ------------- -------------
0 Kate +44 333 333
1 Sandra +44 000 000
2 Thomas +44 222 222
3 Robert +44 000 000
4 Thomas +44 444 444
5 George +44 222 222
6 Kate +44 000 000
7 Robert +44 444 444
--------------------------------
Should all these be in one group? They all share a name or a phone with someone else, forming a "chain" of related persons:
0-6 same name
6-1-3 same phone
3-7 same name
7-4 same phone
4-2 same name
2-5 same phone
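The chain listed above can be checked mechanically. As a rough sketch only (Python rather than SQL, with the made-up names people and reachable_from), a breadth-first walk that follows "same name" and "same phone" links from row 0 reaches every row of the sample table:

```python
from collections import defaultdict, deque

# Sample table from the text above: (id, name, phone).
people = [
    (0, "Kate", "+44 333 333"),
    (1, "Sandra", "+44 000 000"),
    (2, "Thomas", "+44 222 222"),
    (3, "Robert", "+44 000 000"),
    (4, "Thomas", "+44 444 444"),
    (5, "George", "+44 222 222"),
    (6, "Kate", "+44 000 000"),
    (7, "Robert", "+44 444 444"),
]


def reachable_from(start_id, people):
    """Return the ids linked to start_id through shared names or phones."""
    ids_by_value = defaultdict(list)
    for person_id, name, phone in people:
        ids_by_value[("name", name)].append(person_id)
        ids_by_value[("phone", phone)].append(person_id)
    details = {person_id: (name, phone) for person_id, name, phone in people}

    seen = {start_id}
    queue = deque([start_id])
    while queue:
        name, phone = details[queue.popleft()]
        for neighbour in ids_by_value[("name", name)] + ids_by_value[("phone", phone)]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


group = reachable_from(0, people)
print(sorted(group))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

So under the "same name OR same phone" rule, all eight rows do end up in a single group.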

For the dataset in the example you could write something like this:
;WITH Temp AS (
SELECT Name, Phone,
DENSE_RANK() OVER (ORDER BY Name) AS NameGroup,
DENSE_RANK() OVER (ORDER BY Phone) AS PhoneGroup
FROM MyTable)
SELECT MAX(Phone), MAX(Name), COUNT(*)
FROM Temp
GROUP BY NameGroup, PhoneGroup
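Note that grouping by NameGroup and PhoneGroup together only merges rows whose two columns both match, so it behaves like AND rather than OR. A small Python sketch (an illustration only, using the e-mail and phone pairs of Example 2 from the question in place of Name and Phone) shows the effect:

```python
from collections import Counter

# (email, phone) pairs from Example 2 of the question.
pairs = [
    ("linsey@email.com", "+44 111 111"),
    ("louise@email.com", "+44 111 111"),
    ("louise@email.com", "+44 222 222"),
    ("steven@email.com", "+44 333 333"),
    ("steven@email.com", "+44 444 444"),
    ("sandra@email.com", "+44 444 444"),
    ("george@email.com", "+44 555 555"),
    ("xavier@email.com", "+44 666 666"),
    ("xavier@email.com", "+44 777 777"),
    ("xavier@email.com", "+44 888 888"),
    ("robert@email.com", "+44 999 999"),
    ("miriam@email.com", "+44 999 999"),
    ("sherry@email.com", "+44 999 999"),
]

# Grouping on both columns at once is the same as counting distinct pairs.
pair_counts = Counter(pairs)
group_count = len(pair_counts)
print(group_count)  # 13 groups, not the 5 unique customers the question asks for
```

Every pair in that data is distinct, so this style of grouping yields 13 groups of one row each.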

I don't know if this is the best solution, but here it is:
SELECT
MyTable.ID, MyTable.Name, MyTable.Phone,
CASE WHEN N.No = 1 AND P.No = 1 THEN 1
WHEN N.No = 1 AND P.No > 1 THEN 2
WHEN N.No > 1 OR P.No > 1 THEN 3
END as GroupRes
FROM
MyTable
JOIN (SELECT Name, count(Name) No FROM MyTable GROUP BY Name) N on MyTable.Name = N.Name
JOIN (SELECT Phone, count(Phone) No FROM MyTable GROUP BY Phone) P on MyTable.Phone = P.Phone
The drawback is that the joins are made on varchar columns, which could increase execution time.

Here is my solution:
SELECT p.LastName, P.FirstName, P.HomePhone,
CASE
WHEN ph.PhoneCount=1 THEN
CASE
WHEN n.NameCount=1 THEN 'unique name and phone'
ELSE 'common name'
END
ELSE
CASE
WHEN n.NameCount=1 THEN 'common phone'
ELSE 'common phone and name'
END
END
FROM Contacts p
INNER JOIN
(SELECT HomePhone, count(LastName) as PhoneCount
FROM Contacts
GROUP BY HomePhone) ph ON ph.HomePhone = p.HomePhone
INNER JOIN
(SELECT FirstName, count(LastName) as NameCount
FROM Contacts
GROUP BY FirstName) n ON n.FirstName = p.FirstName
LastN       FirstN   Phone       Comment
Hoover      Brenda   8138282334  unique name and phone
Washington  Brian    9044563211  common name
Roosevelt   Brian    7737653279  common name
Reagan      Charles  7734567869  unique name and phone

Related

SQL Server select join detect if common column between two tables are different

I am trying to write a function to check between two tables which have a common column with the same name and ID values.
Table 1: CompanyRecords
CompanyRecordsID  CompanyId  CompanyName  CompanyProcessID
-----------------------------------------------------------
1                 222        Sears        123
2                 333        JCPenny      456
Table 2: JointCompanies
JointCompaniesID  CompanyId  CompanyName  CompanyProcessID
-----------------------------------------------------------
3                 222        KMart        123
4                 444        Walmart      001
They both use the same foreign key CompanyProcessID with value 123.
How do I write a select statement that, given a CompanyProcessID, tells me whether the CompanyId has changed for that CompanyProcessID?
I assume it is a join between the two tables with a WHERE on CompanyProcessID.
Thanks for any help.
Is this what you want?
select max(case when cr.CompanyName = jc.CompanyName then 0 else 1 end) as name_not_same
from CompanyRecords cr join
     JointCompanies jc
     on cr.CompanyProcessID = jc.CompanyProcessID
where cr.CompanyProcessID = ?

SQL query to get all the data from different tables with same id

Sorry if this is too elementary, but I cannot work it out. I don't know how to search for information on it either:
I have three tables:
Provider
id_provider  name
-----------  -----------
100          John
101          Sam
102          Peter
Contact
id_contact  RowNo        Email
----------  -----------  ----------------
100         1            john@work.com
100         2            john@gmail.com
101         1            sam@work.com
101         2            sam@yahoo.com
Product
Id_product  RowNo        Product
----------  -----------  ------------------------
100         1            John’s 1st product
100         2            John’s 2nd product
101         1            Sam’s 1st product
101         2            Sam’s 2nd product
101         3            Sam’s 3rd product
I need a query to show all the data from the three tables like this:
Id   name   id_contact  RowNo  Email           Id_Product  RowNo  Product
100  John   100         1      john@work.com   100         1      John’s 1st product
100  John   100         2      john@gmail.com  100         2      John’s 2nd product
101  Sam    101         1      sam@work.com    101         1      Sam's 1st product
101  Sam    101         2      sam@yahoo.com   101         2      Sam's 2nd product
101  Sam    null        null   null            101         3      Sam's 3rd product
102  Peter  null        null   null            null        null   null
I am trying all the joins I know but I cannot make it work.
Thanks a lot
You can use the following query:
SELECT t1.id_provider AS Id, t1.name,
       t2.id_contact, t2.cRowNo, t2.Email,
       t2.Id_product, t2.pRowNo, t2.Product
FROM Provider AS t1
LEFT JOIN (
    SELECT COALESCE(id_contact, id_product) AS id,
           c.id_contact, c.RowNo AS cRowNo, c.Email,
           p.Id_product, p.Product, p.RowNo AS pRowNo
    FROM Contact AS c
    FULL JOIN Product AS p ON p.id_product = c.id_contact AND p.RowNo = c.RowNo
) AS t2 ON t1.id_provider = t2.id
The query does a FULL JOIN between Contact and Product tables and joins the table derived from the FULL JOIN to Provider table.
A FULL JOIN is required because we cannot know beforehand which of the two tables, Contact or Product, contains the most rows for each id.
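To make the two-step join concrete, here is a rough Python sketch of the same idea on the sample data. It is an illustration only: the function names are invented, and it assumes each (id, RowNo) pair appears at most once per table.

```python
providers = [(100, "John"), (101, "Sam"), (102, "Peter")]
contacts = [
    (100, 1, "john@work.com"),
    (100, 2, "john@gmail.com"),
    (101, 1, "sam@work.com"),
    (101, 2, "sam@yahoo.com"),
]
products = [
    (100, 1, "John's 1st product"),
    (100, 2, "John's 2nd product"),
    (101, 1, "Sam's 1st product"),
    (101, 2, "Sam's 2nd product"),
    (101, 3, "Sam's 3rd product"),
]


def full_join_by_row_no(contacts, products):
    """Pair contacts and products on (id, RowNo), keeping unmatched rows of both."""
    email_by_key = {(cid, row_no): email for cid, row_no, email in contacts}
    product_by_key = {(pid, row_no): name for pid, row_no, name in products}
    keys = sorted(set(email_by_key) | set(product_by_key))
    return [
        (key[0], key[1], email_by_key.get(key), product_by_key.get(key))
        for key in keys
    ]


def left_join_providers(providers, joined):
    """Attach the joined rows to each provider; providers without rows get None."""
    result = []
    for provider_id, name in providers:
        matches = [row for row in joined if row[0] == provider_id]
        if not matches:
            result.append((provider_id, name, None, None, None))
        for _, row_no, email, product in matches:
            result.append((provider_id, name, row_no, email, product))
    return result


report = left_join_providers(providers, full_join_by_row_no(contacts, products))
for line in report:
    print(line)
```

Sam's third product has no contact with RowNo 3, so it survives only because the inner step keeps unmatched rows from both sides; Peter survives only because of the outer left join.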
select *
from Provider P1
left join Contact C2
on C2.id_contact = P1.id_provider
left join Product P2
on P2.id_product = P1.id_provider
SELECT prov.*,
c.*,
prod.*
FROM PROVIDER prov
LEFT JOIN Product prod ON prod.id_product = prov.id_provider
LEFT JOIN Contact c ON prov.id_provider = c.id_contact
AND prod.RowNo = c.RowNo
Use left joins, but join Provider to Product first and then to Contact.

Removing Duplicates of two columns in a query

I have a select * query that returns many rows and many columns. My issue is with rows that repeat the same value in one column A for a given value of another column B; I would like to include only one of them.
Basically, one column tells me the "name" of an object and another tells me its "number". Sometimes an object "name" has more than one entry for a given "number". I only want distinct "numbers" within a "name", but I want the query to return all columns of the matching rows, not just these two.
Name Number ColumnC ColumnD
Bob 1 93 12
Bob 2 432 546
Bob 3 443 76
This example above is fine
Name Number ColumnC ColumnD
Bob 1 93 12
Bob 2 432 546
Bill 1 443 76
Bill 2 54 1856
This example above is fine
Name Number ColumnC ColumnD
Bob 1 93 12
Bob 2 432 546
Bob 2 209 17
This example above is not fine, I only want one of the Bob 2's.
Try this if you are using SQL Server 2005 or above:
With ranked_records AS
(
select *,
ROW_NUMBER() OVER(Partition By name, number Order By name) [ranked]
from MyTable
)
select * from ranked_records
where ranked = 1
If you just want the Name and number, then
SELECT DISTINCT Name, Number FROM Table1
If you want to know how many of each there are, then
SELECT Name, Number, COUNT(*) FROM Table1 GROUP BY Name, Number
By using a Common Table Expression (CTE) and the ROW_NUMBER() OVER (PARTITION BY ...) syntax as follows:
WITH
CTE AS
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY Name, Number ORDER BY Name, Number) AS R
FROM
dbo.ATable
)
SELECT
*
FROM
CTE
WHERE
R = 1
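The "number the rows within each (Name, Number) partition and keep row 1" idea can be sketched outside SQL as well. The Python below is only an illustration with an invented helper name. One difference is worth noting: it keeps the first row in input order, whereas ordering by the partition columns themselves, as the queries above do, leaves the choice among duplicates up to the database.

```python
# Third example from the question: (Name, Number, ColumnC, ColumnD).
table = [
    ("Bob", 1, 93, 12),
    ("Bob", 2, 432, 546),
    ("Bob", 2, 209, 17),
]


def keep_first_per_key(rows):
    """Keep only the first row seen for each (Name, Number) pair."""
    seen = set()
    kept = []
    for row in rows:
        key = (row[0], row[1])
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


deduplicated = keep_first_per_key(table)
print(deduplicated)  # [('Bob', 1, 93, 12), ('Bob', 2, 432, 546)]
```

If a specific duplicate should win, the SQL version needs a deterministic ORDER BY inside the OVER clause, for example on a date or ID column.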
WITH
CTE AS
(
    SELECT
        *,
        ROW_NUMBER() OVER (PARTITION BY Plant, BatchNumber ORDER BY Plant, BatchNumber) AS R
    FROM dbo.StatisticalReports
    WHERE FermBatchStartTime >= DATEADD(d, -90, GETDATE())
)
SELECT
    *
FROM
    CTE
WHERE
    R = 1
ORDER BY Plant, FermBatchStartTime

MS SQL Query to display id of the maximum repeated columns

I have a table
tblDiseaseTrack
TrackID DiseaseID PostalCode
1 3 111
2 3 111
3 2 111
4 1 222
5 2 222
6 4 111
7 1 222
8 5 333
9 5 333
10 5 333
I want to write a query to display the disease id and the postal code of the maximum repeated DiseaseId for each postalcode as follows,
DiseaseID PostalCode
3 111
1 222
5 333
Please, any help would be much appreciated. I tried everything and couldn't find any help. Thank you again :)
select diseaseid, postalcode
from
(
select
postalcode,
diseaseid,
row_number() over (partition by postalcode order by count desc) as row
from
(
select postalcode, count(postalcode) as count, diseaseid
from tblDiseaseTrack
group by postalcode, diseaseid
) as T1
) as T2
where row = 1
order by postalcode
This returns exactly what you wanted:
DiseaseID PostalCode
3 111
1 222
5 333
EDIT:
Same query as above, but with a JOIN to get the city name from a second table:
select diseaseid, T2.postalcode, city
from
(
select
postalcode,
diseaseid,
row_number() over (partition by postalcode order by count desc) as row
from
(
select postalcode, count(postalcode) as count, diseaseid
from tblDiseaseTrack
group by postalcode, diseaseid
) as T1
) as T2
inner join tblcity on T2.postalcode = tblcity.postalcode
where row = 1
order by postalcode
Note that I used the same column name PostalCode in the city table as well, so I have to prefix PostalCode in the outermost query: T2.postalcode instead of postalcode.
Otherwise the query would crash with the message:
Ambiguous column name 'postalcode'
...because SQL Server wouldn't know which of the two PostalCodes I want.
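The logic of the query (count per postal code and disease, then keep the top count in each postal code) can be sketched in Python as a cross-check. This is an illustration with an invented function name, not a replacement for the SQL. On ties it keeps the disease that appears first in the data, whereas row_number() gives no guaranteed winner for equal counts.

```python
from collections import Counter

# tblDiseaseTrack sample data: (TrackID, DiseaseID, PostalCode).
tracks = [
    (1, 3, 111), (2, 3, 111), (3, 2, 111), (4, 1, 222), (5, 2, 222),
    (6, 4, 111), (7, 1, 222), (8, 5, 333), (9, 5, 333), (10, 5, 333),
]


def most_common_disease_per_postal_code(tracks):
    """Return (DiseaseID, PostalCode) for the most frequent disease in each code."""
    counts = Counter((postal, disease) for _, disease, postal in tracks)
    best = {}  # postal code -> (disease id, occurrences)
    for (postal, disease), occurrences in counts.items():
        if postal not in best or occurrences > best[postal][1]:
            best[postal] = (disease, occurrences)
    return [(best[postal][0], postal) for postal in sorted(best)]


result = most_common_disease_per_postal_code(tracks)
print(result)  # [(3, 111), (1, 222), (5, 333)]
```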

sql select from multiple records only the most recent

I have a table named customer_age that looks like this:
ID 1 2 3 4 5 6 7 8 9
NAME JIM JIM JIM NICK NICK NICK Paul Paul Paul
VALUE 20 13 12 10 20 8 4 24 14
and I want to display only the first record for each name. Something like this:
ID 1 4 7
NAME JIM NICK Paul
VALUE 20 10 4
So far I have not been able to work it out.
I use SQL Server 2005.
Any help would be appreciated...
Try using a subselect to find the lowest ID for each name, and use that set of IDs to pull the records from the main table:
SELECT ID, Name, Value
FROM customer_age
WHERE ID IN
(
SELECT MIN(ID) AS ID
FROM customer_age
GROUP BY Name
)
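As a quick way to try the subselect without a SQL Server instance, the sketch below runs the same query shape through Python's built-in sqlite3 module. SQLite stands in for SQL Server here, which is an assumption of this example; the statement uses only MIN, GROUP BY and IN, and an ORDER BY was added so the output order is fixed.

```python
import sqlite3

# In-memory database holding the customer_age rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_age (ID INTEGER PRIMARY KEY, NAME TEXT, VALUE INTEGER)"
)
conn.executemany(
    "INSERT INTO customer_age (ID, NAME, VALUE) VALUES (?, ?, ?)",
    [
        (1, "JIM", 20), (2, "JIM", 13), (3, "JIM", 12),
        (4, "NICK", 10), (5, "NICK", 20), (6, "NICK", 8),
        (7, "Paul", 4), (8, "Paul", 24), (9, "Paul", 14),
    ],
)

# Lowest ID per name, then fetch the full rows for those IDs.
first_rows = conn.execute(
    """
    SELECT ID, NAME, VALUE
    FROM customer_age
    WHERE ID IN (SELECT MIN(ID) FROM customer_age GROUP BY NAME)
    ORDER BY ID
    """
).fetchall()
conn.close()

print(first_rows)  # [(1, 'JIM', 20), (4, 'NICK', 10), (7, 'Paul', 4)]
```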
Just select the first record for each name using cross apply:
SELECT
ca.ID, ca.NAME, ca.VALUE
FROM customer_age c
CROSS APPLY (SELECT TOP 1 ID, NAME, VALUE
FROM customer_age ca
WHERE ca.NAME = c.NAME ORDER BY ID) ca
ORDER BY ca.ID
How about using window functions??
SELECT Id, Name, Value
FROM (
SELECT Id, Name, Value, ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Id ASC) AS rowNum
FROM customer_age
) AS sub
WHERE rowNum = 1
Assuming the first record means the one with the highest ID, you may try your query with ORDER BY ID DESC and TOP n.
