SQL server duplicate records issue

I have two source tables, CustP and CustS (CustS exists only to store the email address; this is how the current design works):
CustP:
CustID Fname Lname Phone
------------------------------------------
100 John Doe 1234567890
200 John Doe 1234567890
300 John Doe NULL
CustS:
CustID Fname Lname Email
--------------------------------------------
100 John Doe NULL
200 John Doe a@a.com
300 John Doe a@a.com
I would like to identify duplicate records across the two tables above using the following criteria:
(Fname, Lname and Phone match) OR (Fname, Lname and Email match)
These are the steps I use to generate the duplicate result set:
drop table if exists #AllCustomer;
drop table if exists #AllMatchedCustomers;
--Combining phone (CustP) and email (CustS) for each customer into a temp table
select
cp.CustID, cp.Fname, cp.Lname, cp.Phone, cs.Email,
ROW_NUMBER() over (order by cp.Fname, cp.CustID) RN
into
#AllCustomer
from
CustP cp
inner join
CustS cs on cp.CustID = cs.CustID;
--Combining the customer and matched customer into a temp table
Select
A.CustID CustID, B.CustID MatchedCustID,
A.Fname FirstName,
A.Lname SurName,
A.Phone Phone,
A.Email Email,
B.Fname MatchedFirstName,
B.Lname MatchedSurName,
B.Phone MatchedPhone,
B.Email MatchedEmail
into
#AllMatchedCustomers
from
#AllCustomer A
inner join
#AllCustomer B on (A.Fname = B.Fname
and A.Lname = B.Lname
and A.RN < B.RN) --each pair once; this also stops a row matching itself
where
A.Phone = B.Phone --a plain comparison never matches NULL, so no ISNULL sentinels are needed
or A.Email = B.Email;
The result is:
CustID MatchedCustID FirstName SurName Phone Email MatchedFirstName MatchedSurName MatchedPhone MatchedEmail
100 200 John Doe 1234567890 NULL John Doe 1234567890 a@a.com
200 300 John Doe 1234567890 a@a.com John Doe NULL a@a.com
I need help with transitive matches: if Cust100 matches Cust200 by phone number and Cust200 matches Cust300 by email, the result should include one more row pairing Cust100 with Cust300 (because a = b and b = c, therefore a = c). Here that would be a third row in the result set, 100 and 300. Alternatively, what other approaches are there? Thanks in advance for your help.
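One possible approach, sketched here under stated assumptions rather than taken from an answer: treat every matched pair as an undirected edge and let a recursive CTE follow chains of edges, so that 100-200 and 200-300 also yield 100-300. The temp tables #Pairs and #TransitivePairs are hypothetical names; #Pairs stands in for the CustID/MatchedCustID columns of #AllMatchedCustomers, and DROP TABLE IF EXISTS assumes SQL Server 2016 or later.

```sql
-- Hypothetical stand-in for the CustID/MatchedCustID pairs in #AllMatchedCustomers
DROP TABLE IF EXISTS #Pairs;
CREATE TABLE #Pairs (CustID int NOT NULL, MatchedCustID int NOT NULL);
INSERT INTO #Pairs (CustID, MatchedCustID) VALUES (100, 200), (200, 300);

DROP TABLE IF EXISTS #TransitivePairs;

WITH Edges AS (
    -- A match works both ways: a = b implies b = a
    SELECT CustID AS FromID, MatchedCustID AS ToID FROM #Pairs
    UNION
    SELECT MatchedCustID, CustID FROM #Pairs
),
Reach AS (
    -- Anchor: every direct match, with a ",id,id," path string used to stop cycles
    SELECT FromID AS StartID, ToID AS EndID,
           CAST(CONCAT(',', FromID, ',', ToID, ',') AS varchar(max)) AS PathIDs
    FROM Edges
    UNION ALL
    -- Recursive step: follow one more edge unless it revisits a customer on this path
    SELECT r.StartID, e.ToID,
           CAST(CONCAT(r.PathIDs, e.ToID, ',') AS varchar(max))
    FROM Reach AS r
    INNER JOIN Edges AS e ON e.FromID = r.EndID
    WHERE CHARINDEX(CONCAT(',', e.ToID, ','), r.PathIDs) = 0
)
SELECT DISTINCT StartID AS CustID, EndID AS MatchedCustID
INTO #TransitivePairs
FROM Reach
WHERE StartID < EndID          -- report each pair only once
OPTION (MAXRECURSION 0);       -- lift the default limit of 100 levels

SELECT * FROM #TransitivePairs ORDER BY CustID, MatchedCustID;
```

The path string only prevents cycles. Because the CTE enumerates every simple path, it can grow quickly for large groups of mutually matching customers; there, an iterative approach that repeatedly gives each row the smallest ID in its group is likely to scale better.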

Related

SQL Pivot / Case Query based on Row Value

Problem
Using SQL Server, I'm trying to pivot data based on the values in a column. When Salary appears in the Metric column, I want to move Bob's and John's salary value into its own column.
Sample data:
Person table
Person ID
-------------
Bob 1
Bob 1
John 2
John 2
Value table
Metric Value ID
---------------------
Age 52 1
Salary 60000 1
Age 45 2
Salary 55000 2
Expected output
My goal is to pivot the table if salary is present in the Metric column.
Person Metric Value Salary ID
---------------------------------------
Bob Age 52 60000 1
John Age 45 55000 2
Current code:
SELECT *
FROM person_table pt, value_table vb
WHERE pt.id = vb.id
AND vb.metric IN ('Age', 'Salary')

Use the following pivot query:
SELECT
pt.Person,
'Age' AS Metric,
MAX(CASE WHEN vb.Metric = 'Age' THEN vb.Value END) AS Value,
MAX(CASE WHEN vb.Metric = 'Salary' THEN vb.Value END) AS Salary,
pt.ID
FROM person_table pt
INNER JOIN value_table vb
ON pt.id = vb.id
GROUP BY
pt.Person,
pt.ID
ORDER BY
pt.ID;
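For comparison, here is a sketch of the same layout using SQL Server's PIVOT operator instead of MAX(CASE ...). The temp tables mirror person_table and value_table from the question; the int type for Value and the #pivoted result table are assumptions made so the example runs on its own.

```sql
-- Hypothetical copies of the question's tables; Value is assumed to be int
DROP TABLE IF EXISTS #person_table;
DROP TABLE IF EXISTS #value_table;
DROP TABLE IF EXISTS #pivoted;

CREATE TABLE #person_table (Person varchar(20), ID int);
INSERT INTO #person_table VALUES ('Bob', 1), ('Bob', 1), ('John', 2), ('John', 2);

CREATE TABLE #value_table (Metric varchar(20), Value int, ID int);
INSERT INTO #value_table VALUES ('Age', 52, 1), ('Salary', 60000, 1),
                                ('Age', 45, 2), ('Salary', 55000, 2);

SELECT p.Person, 'Age' AS Metric, p.[Age] AS Value, p.[Salary] AS Salary, p.ID
INTO #pivoted
FROM (
    -- The duplicate person rows are harmless because MAX ignores repeated values
    SELECT pt.Person, pt.ID, vb.Metric, vb.Value
    FROM #person_table AS pt
    INNER JOIN #value_table AS vb ON vb.ID = pt.ID
) AS src
PIVOT (MAX(Value) FOR Metric IN ([Age], [Salary])) AS p;

SELECT * FROM #pivoted ORDER BY ID;
```

PIVOT groups by every source column not named in the aggregate or the FOR clause (here Person and ID), which is why the derived table selects only the four columns it needs.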

How to use joins and make Normalised data to denormalised data

I have a database with the following tables:
UserTable:
  UserID (int) PK
  UserName (varchar) NULL
PhoneNumber:
  PhoID (int) PK
  UserID (int) FK to UserTable
  PhoneNumber (varchar) NULL
Address:
  AddressID (int) PK
  UserID (int) FK to UserTable
  AddressName (varchar) NULL
Each user can have many phone numbers and many addresses; both are one-to-many relationships.
I have the following data in these tables.
UserID UserName
1 Bridgerton
2 Merlin
3 Victoria
PhoID UserID PhoneNumber
1 1 phone1
2 1 phone2
3 1 phone3
4 2 Phone21
5 2 9909909900
AddressID UserID AddressName
1 1 Chennai
2 1 Gurgaon
3 2 Hyderabad
4 2 Mumbai
5 2 Gurgaon
Now I want to see each user's details, phone numbers and list of addresses in one go.
I wrote the following query:
select UserName, PhoneNumber, AddressName
from dbo.UserTable a
left outer join dbo.PhoneNumber b
on a.UserID=b.UserID
left outer join dbo.Address c
on a.UserID = c.UserID
Results
UserName PhoneNumber AddressName
Bridgerton phone1 Chennai
Bridgerton phone1 Gurgaon
Bridgerton phone2 Chennai
Bridgerton phone2 Gurgaon
Bridgerton phone3 Chennai
Bridgerton phone3 Gurgaon
Merlin Phone21 Hyderabad
Merlin Phone21 Mumbai
Merlin Phone21 Gurgaon
Merlin 9909909900 Hyderabad
Merlin 9909909900 Mumbai
Merlin 9909909900 Gurgaon
Victoria NULL NULL
I know it's messy, but I need to see the user name, phone numbers and addresses for each UserID in just one row, like this:
userid username Phonenumber Address
1 Bridgerton phone1, phone2, phone3 Chennai, Gurgaon

One option is to use UNION ALL to normalize your data, then perform a simple conditional aggregation with MAX() and STRING_AGG() (STRING_AGG requires SQL Server 2017 or later).
Example
;with cte as (
Select UserID
,Seq = UserID
,Col = 'UserName'
,Val = UserName
From UserTable
Union All
Select UserID
,Seq = PhoID
,Col = 'Phone'
,Val = PhoneNumber
From PhoneNumber
Union All
Select UserID
,Seq = AddressID
,Col = 'Address'
,Val = [AddressName]
From Address
)
Select UserID
,UserName = max(case when col='UserName' then Val end)
,PhoneNumber = string_agg( case when col='Phone' then Val end,',') within group ( order by seq)
,Address = string_agg( case when col='Address' then Val end,',') within group ( order by seq)
From cte
Group By UserID
Results
UserID UserName PhoneNumber Address
1 Bridgerton phone1,phone2,phone3 Chennai,Gurgaon
2 Merlin Phone21,9909909900 Hyderabad,Mumbai,Gurgaon
3 Victoria NULL NULL
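A variant that may be easier to reason about, sketched under the same assumption (STRING_AGG, so SQL Server 2017 or later): aggregate each child table in its own correlated subquery. Because phones and addresses are never joined to each other, the 3 × 2 row multiplication seen in the question's LEFT JOIN query cannot happen. The temp tables and #UserSummary are hypothetical stand-ins for the dbo tables.

```sql
-- Hypothetical copies of dbo.UserTable, dbo.PhoneNumber and dbo.Address
DROP TABLE IF EXISTS #UserTable;
DROP TABLE IF EXISTS #PhoneNumber;
DROP TABLE IF EXISTS #Address;
DROP TABLE IF EXISTS #UserSummary;

CREATE TABLE #UserTable (UserID int PRIMARY KEY, UserName varchar(50) NULL);
INSERT INTO #UserTable VALUES (1, 'Bridgerton'), (2, 'Merlin'), (3, 'Victoria');

CREATE TABLE #PhoneNumber (PhoID int PRIMARY KEY, UserID int, PhoneNumber varchar(20) NULL);
INSERT INTO #PhoneNumber VALUES (1, 1, 'phone1'), (2, 1, 'phone2'), (3, 1, 'phone3'),
                                (4, 2, 'Phone21'), (5, 2, '9909909900');

CREATE TABLE #Address (AddressID int PRIMARY KEY, UserID int, AddressName varchar(50) NULL);
INSERT INTO #Address VALUES (1, 1, 'Chennai'), (2, 1, 'Gurgaon'), (3, 2, 'Hyderabad'),
                            (4, 2, 'Mumbai'), (5, 2, 'Gurgaon');

SELECT u.UserID,
       u.UserName,
       -- One subquery per child table; an aggregate over zero rows returns NULL
       (SELECT STRING_AGG(p.PhoneNumber, ',') WITHIN GROUP (ORDER BY p.PhoID)
        FROM #PhoneNumber AS p
        WHERE p.UserID = u.UserID) AS PhoneNumber,
       (SELECT STRING_AGG(a.AddressName, ',') WITHIN GROUP (ORDER BY a.AddressID)
        FROM #Address AS a
        WHERE a.UserID = u.UserID) AS Address
INTO #UserSummary
FROM #UserTable AS u;

SELECT * FROM #UserSummary ORDER BY UserID;
```

On this data the output matches the UNION ALL version, including NULLs for Victoria, who has no phones or addresses.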

SQL query to get all the data from different tables with same id

Sorry if this is too elementary, but I cannot work it out, and I don't know how to search for information on it either.
I have three tables:
Provider
id_provider name
---------- -----------
100 John
101 Sam
102 Peter
Contact
id_contact RowNo Email
---------- ----------- ----------------
100 1 john@work.com
100 2 john@gmail.com
101 1 sam@work.com
101 2 sam@yahoo.com
Product
Id_product RowNo Product
---------- ----------- ------------------------
100 1 John’s 1st product
100 2 John’s 2nd product
101 1 Sam’s 1st product
101 2 Sam’s 2nd product
101 3 Sam’s 3rd product
I need a query to show all the data from the three tables like this:
Id name id_contact RowNo Email Id_Product RowNo Product
100 John 100 1 john@work.com 100 1 John’s 1st product
100 John 100 2 john@gmail.com 100 2 John’s 2nd product
101 Sam 101 1 sam@work.com 101 1 Sam's 1st product
101 Sam 101 2 sam@yahoo.com 101 2 Sam's 2nd product
101 Sam null null null 101 3 Sam's 3rd product
102 Peter null null null null null null
I have tried all the joins I know, but I cannot make it work.
Thanks a lot

You can use the following query:
SELECT t1.id_provider AS Id, t1.name,
t2.id_contact, t2.cRowNo, t2.Email,
t2.Id_product, t2.pRowNo, t2.Product
FROM Provider AS t1
LEFT JOIN (
SELECT COALESCE(id_contact, id_product) AS id,
c.id_contact, c.RowNo AS cRowNo, c.Email,
p.Id_product, p.Product, p.RowNo AS pRowNo
FROM Contact AS c
FULL JOIN Product AS p ON p.id_product = c.id_contact AND p.RowNo = c.RowNo
) AS t2 ON t1.id_provider = t2.id
The query does a FULL JOIN between the Contact and Product tables and then LEFT JOINs the resulting derived table to the Provider table.
A FULL JOIN is required because we cannot know beforehand which of the two tables, Contact or Product, contains more rows for each id.
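To check the FULL JOIN behaviour on the sample data, here is a self-contained sketch. The temp tables are hypothetical copies of Provider, Contact and Product, and the select list includes pRowNo so that both RowNo columns appear. On this data it returns 6 rows, including Sam's third product with NULL contact columns and Peter with NULLs in every column except Id and name.

```sql
-- Hypothetical copies of the question's tables
DROP TABLE IF EXISTS #Provider;
DROP TABLE IF EXISTS #Contact;
DROP TABLE IF EXISTS #Product;
DROP TABLE IF EXISTS #ProviderDetails;

CREATE TABLE #Provider (id_provider int, name varchar(20));
INSERT INTO #Provider VALUES (100, 'John'), (101, 'Sam'), (102, 'Peter');

CREATE TABLE #Contact (id_contact int, RowNo int, Email varchar(50));
INSERT INTO #Contact VALUES (100, 1, 'john@work.com'), (100, 2, 'john@gmail.com'),
                            (101, 1, 'sam@work.com'), (101, 2, 'sam@yahoo.com');

CREATE TABLE #Product (Id_product int, RowNo int, Product varchar(50));
INSERT INTO #Product VALUES (100, 1, 'John''s 1st product'), (100, 2, 'John''s 2nd product'),
                            (101, 1, 'Sam''s 1st product'), (101, 2, 'Sam''s 2nd product'),
                            (101, 3, 'Sam''s 3rd product');

SELECT t1.id_provider AS Id, t1.name,
       t2.id_contact, t2.cRowNo, t2.Email,
       t2.Id_product, t2.pRowNo, t2.Product
INTO #ProviderDetails
FROM #Provider AS t1
LEFT JOIN (
    -- Pair contacts and products by RowNo, keeping unmatched rows from either side
    SELECT COALESCE(c.id_contact, p.Id_product) AS id,
           c.id_contact, c.RowNo AS cRowNo, c.Email,
           p.Id_product, p.RowNo AS pRowNo, p.Product
    FROM #Contact AS c
    FULL JOIN #Product AS p ON p.Id_product = c.id_contact AND p.RowNo = c.RowNo
) AS t2 ON t1.id_provider = t2.id;

SELECT * FROM #ProviderDetails ORDER BY Id, COALESCE(cRowNo, pRowNo);
```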

select *
from Provider P1
left join Contact C2
on C2.id_contact = P1.id_provider
left join Product P2
on P2.id_product = P1.id_provider

SELECT prov.*,
c.*,
prod.*
FROM PROVIDER prov
LEFT JOIN Product prod ON prod.id_product = prov.id_provider
LEFT JOIN Contact c ON prov.id_provider = c.id_contact
AND prod.RowNo = c.RowNo
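Use LEFT JOINs, but join Provider to Product first and then to Contact. This relies on each provider having at least as many Product rows as Contact rows; any extra Contact rows would be dropped.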

T-SQL Dynamic Query and pivot

I have data where people changed role mid-month, and I want to count their activity after their new start date. Can I use the results of a query to build a dynamic query? I have a query that returns the following result set:
Firstname Surname StartDate
----------------------------------
Jon Smith 2015-01-01
Paul Jones 2014-07-23
...
So the query would look something like:
SELECT Firstname + ' ' + Surname, month, count(1) FROM dataTable
WHERE (Firstname = 'Jon' AND Surname = 'Smith' AND date >= '2015-01-01')
OR (Firstname = 'Paul' AND Surname = 'Jones' AND date >= '2014-07-23')
OR ...
GROUP BY Firstname, Surname, month
but the number of ORs would depend on the number of rows in the first result set. The output would look like this:
Name Month Count
----------------------------------
Jon Smith 1 15
Paul Jones 1 16
Jon Smith 2 30
Paul Jones 2 25
Charlie Gu 1 52
I can then pivot this to get:
Name 1 2
--------------------------
Jon Smith 15 30
Paul Jones 16 25
Charlie Gu 52 NULL
Thanks in advance

It seems to me that Ako is right and a simple join should do the trick rather than a dynamic query.
declare @NewStartDates table
(
Firstname nvarchar(100),
Surname nvarchar(100),
StartDate date
);
insert into @NewStartDates
(Firstname, Surname, StartDate)
values (N'Jon', N'Smith', '20150101'),
(N'Paul', N'Jones', '20140723');
select d.Firstname,
d.Surname,
year(d.Date) * 100 + month(d.Date) as Period,
count(*) as ActivityCount
from dataTable as d
inner join @NewStartDates as n
on d.Firstname = n.Firstname
and d.Surname = n.Surname
and d.Date >= n.StartDate
group by d.Firstname,
d.Surname,
year(d.Date) * 100 + month(d.Date);
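The join above returns one row per person and period. To reach the Name / 1 / 2 layout from the question, one option is conditional aggregation by month. This is a sketch: #StartDates, #dataTable, its [Date] column and the sample rows are hypothetical, and MONTH() ignores the year, so months from different years share a column, as they do in the question's expected output.

```sql
-- Hypothetical inputs standing in for the start-date query and dataTable
DROP TABLE IF EXISTS #StartDates;
DROP TABLE IF EXISTS #dataTable;
DROP TABLE IF EXISTS #MonthlyActivity;

CREATE TABLE #StartDates (Firstname nvarchar(100), Surname nvarchar(100), StartDate date);
INSERT INTO #StartDates VALUES (N'Jon', N'Smith', '20150101'), (N'Paul', N'Jones', '20140723');

CREATE TABLE #dataTable (Firstname nvarchar(100), Surname nvarchar(100), [Date] date);
INSERT INTO #dataTable VALUES
    (N'Jon',  N'Smith', '20141215'),  -- before Jon's new start date, so not counted
    (N'Jon',  N'Smith', '20150105'),
    (N'Jon',  N'Smith', '20150110'),
    (N'Jon',  N'Smith', '20150203'),
    (N'Paul', N'Jones', '20150120');

SELECT d.Firstname + N' ' + d.Surname AS Name,
       -- A CASE with no ELSE makes SUM return NULL (not 0) for months with no activity
       SUM(CASE WHEN MONTH(d.[Date]) = 1 THEN 1 END) AS [1],
       SUM(CASE WHEN MONTH(d.[Date]) = 2 THEN 1 END) AS [2]
INTO #MonthlyActivity
FROM #dataTable AS d
INNER JOIN #StartDates AS n
    ON d.Firstname = n.Firstname
   AND d.Surname = n.Surname
   AND d.[Date] >= n.StartDate
GROUP BY d.Firstname, d.Surname;

SELECT * FROM #MonthlyActivity ORDER BY Name;
```

Each extra month needs another SUM(CASE ...) line, which is where a dynamically built column list becomes useful.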

Please refer to this; it gives a complete idea of how to build a query with dynamic columns: Dynamic Column Generation - Pivot [SQL]
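If the set of months is not known in advance, a common pattern is to build the column list at run time and execute a PIVOT through sp_executesql. A minimal sketch, assuming SQL Server 2017 or later for STRING_AGG; #Activity is a hypothetical table holding the Name/Month/Count rows from the question.

```sql
-- Hypothetical table holding the Name / Month / Count rows from the question
DROP TABLE IF EXISTS #Activity;
CREATE TABLE #Activity (Name nvarchar(200), MonthNo int, ActivityCount int);
INSERT INTO #Activity VALUES
    (N'Jon Smith', 1, 15), (N'Paul Jones', 1, 16),
    (N'Jon Smith', 2, 30), (N'Paul Jones', 2, 25),
    (N'Charlie Gu', 1, 52);

DECLARE @cols nvarchar(max), @sql nvarchar(max);

-- One bracketed column per distinct month, e.g. [1],[2]; QUOTENAME escapes each name
SELECT @cols = STRING_AGG(QUOTENAME(CAST(MonthNo AS nvarchar(10))), N',')
               WITHIN GROUP (ORDER BY MonthNo)
FROM (SELECT DISTINCT MonthNo FROM #Activity) AS m;

SET @sql = N'SELECT Name, ' + @cols + N'
FROM (SELECT Name, MonthNo, ActivityCount FROM #Activity) AS src
PIVOT (SUM(ActivityCount) FOR MonthNo IN (' + @cols + N')) AS p
ORDER BY Name;';

-- The dynamic batch runs in a child scope, so it can still see #Activity
EXEC sp_executesql @sql;
```

Charlie Gu has no month-2 row, so the pivot leaves NULL in that cell, matching the expected output in the question.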

SQL query like GROUP BY with OR condition

I'll try to describe the real situation. In our company we have a reservation system with a table, let's call it Customers, where e-mail and phone contacts are saved with each incoming order; that part of the system I can't change. My problem is how to count unique customers. By a unique customer I mean a group of people who share either the same e-mail or the same phone number.
Example 1: In real life, imagine Tom and Sandra, who are married. Tom, who ordered 4 products, entered 3 different e-mail addresses and 2 different phone numbers in our reservation system, and one of those numbers he shares with Sandra (as a home phone), so I can presume they are connected. Besides this shared phone number, Sandra also entered her private one, and she used the same e-mail address for both of her orders. For me this means all of the following rows count as one unique customer, so in practice a unique customer may grow into a whole family.
ID E-mail Phone Comment
---- ------------------- -------------- ------------------------------
0 tom@email.com +44 111 111 First row
1 tommy@email.com +44 111 111 Same phone, different e-mail
2 thomas@email.com +44 111 111 Same phone, different e-mail
3 thomas@email.com +44 222 222 Same e-mail, different phone
4 sandra@email.com +44 222 222 Same phone, different e-mail
5 sandra@email.com +44 333 333 Same e-mail, different phone
As ypercube said, I will probably need recursion to count these unique customers.
Example 2: Here is an example of what I want to do. Is it possible to count unique customers without recursion, for instance by using a cursor or something similar, or is recursion necessary?
ID E-mail Phone Comment
---- ------------------- -------------- ------------------------------
0 linsey@email.com +44 111 111 ─┐
1 louise@email.com +44 111 111 ├─ 1. unique customer
2 louise@email.com +44 222 222 ─┘
---- ------------------- -------------- ------------------------------
3 steven@email.com +44 333 333 ─┐
4 steven@email.com +44 444 444 ├─ 2. unique customer
5 sandra@email.com +44 444 444 ─┘
---- ------------------- -------------- ------------------------------
6 george@email.com +44 555 555 ─── 3. unique customer
---- ------------------- -------------- ------------------------------
7 xavier@email.com +44 666 666 ─┐
8 xavier@email.com +44 777 777 ├─ 4. unique customer
9 xavier@email.com +44 888 888 ─┘
---- ------------------- -------------- ------------------------------
10 robert@email.com +44 999 999 ─┐
11 miriam@email.com +44 999 999 ├─ 5. unique customer
12 sherry@email.com +44 999 999 ─┘
---- ------------------- -------------- ------------------------------
----------------------------------------------------------------------
Result ∑ = 5 unique customers
----------------------------------------------------------------------
I've tried a query with GROUP BY, but I don't know how to group the result by either the first or the second column. I'm looking for something like:
SELECT COUNT(*) FROM Customers
GROUP BY Email OR Phone
Thanks again for any suggestions
P.S.
I really appreciate the answers to this question that were posted before I completely rephrased it. Those answers may not correspond to the update, so please don't downvote them (the question itself is fair game :). I completely rewrote this post. Thanks, and sorry for my false start.

Here is a full solution using a recursive CTE.
;WITH Nodes AS
(
SELECT DENSE_RANK() OVER (ORDER BY Part, PartRank) SetId
, [ID]
FROM
(
SELECT [ID], 1 Part, DENSE_RANK() OVER (ORDER BY [E-mail]) PartRank
FROM dbo.Customer
UNION ALL
SELECT [ID], 2, DENSE_RANK() OVER (ORDER BY Phone) PartRank
FROM dbo.Customer
) A
),
Links AS
(
SELECT DISTINCT A.Id, B.Id LinkedId
FROM Nodes A
JOIN Nodes B ON B.SetId = A.SetId AND B.Id < A.Id
),
Routes AS
(
SELECT DISTINCT Id, Id LinkedId
FROM dbo.Customer
UNION ALL
SELECT DISTINCT Id, LinkedId
FROM Links
UNION ALL
SELECT A.Id, B.LinkedId
FROM Links A
JOIN Routes B ON B.Id = A.LinkedId AND B.LinkedId < A.Id
),
TransitiveClosure AS
(
SELECT Id, Id LinkedId
FROM Links
UNION
SELECT LinkedId Id, LinkedId
FROM Links
UNION
SELECT Id, LinkedId
FROM Routes
),
UniqueCustomers AS
(
SELECT Id, MIN(LinkedId) UniqueCustomerId
FROM TransitiveClosure
GROUP BY Id
)
SELECT A.Id, A.[E-mail], A.Phone, B.UniqueCustomerId
FROM dbo.Customer A
JOIN UniqueCustomers B ON B.Id = A.Id
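On the question of whether recursion is necessary: one non-recursive alternative, not taken from the answers here, is iterative label propagation. Every row starts with its own ID as a group label, and a WHILE loop repeatedly gives each row the smallest label among rows that share its e-mail or phone, until nothing changes. The sketch loads the Example 2 rows into a hypothetical #Customers table and counts distinct labels; the column names Email and Phone are assumptions.

```sql
-- Hypothetical copy of the Example 2 rows
DROP TABLE IF EXISTS #Customers;
CREATE TABLE #Customers (ID int PRIMARY KEY, Email varchar(100), Phone varchar(20));
INSERT INTO #Customers (ID, Email, Phone) VALUES
    (0, 'linsey@email.com', '+44 111 111'), (1, 'louise@email.com', '+44 111 111'),
    (2, 'louise@email.com', '+44 222 222'), (3, 'steven@email.com', '+44 333 333'),
    (4, 'steven@email.com', '+44 444 444'), (5, 'sandra@email.com', '+44 444 444'),
    (6, 'george@email.com', '+44 555 555'), (7, 'xavier@email.com', '+44 666 666'),
    (8, 'xavier@email.com', '+44 777 777'), (9, 'xavier@email.com', '+44 888 888'),
    (10, 'robert@email.com', '+44 999 999'), (11, 'miriam@email.com', '+44 999 999'),
    (12, 'sherry@email.com', '+44 999 999');

-- Start with every row in its own group, labelled by its own ID
DROP TABLE IF EXISTS #Label;
SELECT ID, Email, Phone, ID AS GroupID
INTO #Label
FROM #Customers;

DECLARE @changed int = 1;
WHILE @changed > 0
BEGIN
    -- Each row takes the smallest label among rows sharing its e-mail or phone
    UPDATE l
    SET GroupID = m.MinGroupID
    FROM #Label AS l
    CROSS APPLY (
        SELECT MIN(x.GroupID) AS MinGroupID
        FROM #Label AS x
        WHERE x.Email = l.Email OR x.Phone = l.Phone
    ) AS m
    WHERE m.MinGroupID < l.GroupID;

    SET @changed = @@ROWCOUNT;  -- stop once a pass changes nothing
END;

SELECT COUNT(DISTINCT GroupID) AS UniqueCustomers FROM #Label;
```

Labels only ever decrease, so the loop terminates; the number of passes grows with the length of the longest chain of linked rows, and indexes on Email and Phone would likely matter on a real table.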

Finding groups based on Phone only:
SELECT
ID
, Name
, Phone
, DENSE_RANK() OVER (ORDER BY Phone) AS GroupPhone
FROM
MyTable
ORDER BY
GroupPhone
, ID
Finding groups based on Name only:
SELECT
ID
, Name
, Phone
, DENSE_RANK() OVER (ORDER BY Name) AS GroupName
FROM
MyTable
ORDER BY
GroupName
, ID
Now, for the (complex) query you describe, let's say we have a table like this instead:
ID Name Phone
---- ------------- -------------
0 Kate +44 333 333
1 Sandra +44 000 000
2 Thomas +44 222 222
3 Robert +44 000 000
4 Thomas +44 444 444
5 George +44 222 222
6 Kate +44 000 000
7 Robert +44 444 444
--------------------------------
Should all of these be in one group? They all share a name or a phone with someone else, forming a "chain" of related persons:
0-6 same name
6-1-3 same phone
3-7 same name
7-4 same phone
4-2 same name
2-5 same phone

For the dataset in the example you could write something like this:
;WITH Temp AS (
SELECT Name, Phone,
DENSE_RANK() OVER (ORDER BY Name) AS NameGroup,
DENSE_RANK() OVER (ORDER BY Phone) AS PhoneGroup
FROM MyTable)
SELECT MAX(Phone), MAX(Name), COUNT(*)
FROM Temp
GROUP BY NameGroup, PhoneGroup

I don't know if this is the best solution, but here it is:
SELECT
MyTable.ID, MyTable.Name, MyTable.Phone,
CASE WHEN N.No = 1 AND P.No = 1 THEN 1
WHEN N.No = 1 AND P.No > 1 THEN 2
WHEN N.No > 1 OR P.No > 1 THEN 3
END as GroupRes
FROM
MyTable
JOIN (SELECT Name, count(Name) No FROM MyTable GROUP BY Name) N on MyTable.Name = N.Name
JOIN (SELECT Phone, count(Phone) No FROM MyTable GROUP BY Phone) P on MyTable.Phone = P.Phone
The drawback is that some of these joins are on varchar columns, which could increase execution time.

Here is my solution:
SELECT p.LastName, p.FirstName, p.HomePhone,
CASE
WHEN ph.PhoneCount=1 THEN
CASE
WHEN n.NameCount=1 THEN 'unique name and phone'
ELSE 'common name'
END
ELSE
CASE
WHEN n.NameCount=1 THEN 'common phone'
ELSE 'common phone and name'
END
END AS Comment
FROM Contacts p
INNER JOIN
(SELECT HomePhone, count(LastName) as PhoneCount
FROM Contacts
GROUP BY HomePhone) ph ON ph.HomePhone = p.HomePhone
INNER JOIN
(SELECT FirstName, count(LastName) as NameCount
FROM Contacts
GROUP BY FirstName) n ON n.FirstName = p.FirstName
LastN FirstN Phone Comment
Hoover Brenda 8138282334 unique name and phone
Washington Brian 9044563211 common name
Roosevelt Brian 7737653279 common name
Reagan Charles 7734567869 unique name and phone
