How to filter groups? - sql-server

I have a table like this:
Address AccountName AccountId
-------------------------------------------------------
10007 Cougar Country Smith 107
90026 Hunters Pond Scott 106
10008 Indigo Run Mary 108
70023 Kopplin Road John 102
70023 Kopplin Road John 103
70023 Kopplin Road Peter 104
70023 Kopplin Road Steve 105
70018 Oaks Drive Joe 100
70018 Oaks Drive Lisa 101
This is actually the result of joining two tables, ordered by Address.
Address and/or AccountName can have identical values across multiple rows,
while AccountId is always different.
How do I get the groups of records where:
a) Address is the same and AccountName is different
b) Address is the same and AccountName is the same
In both cases the group must contain more than one record.
I need all fields from the table.
Here is the output I need to have:
a) Address is same and AccountName is different:
Address AccountName AccountId
-------------------------------------------------------
70023 Kopplin Road Peter 104
70023 Kopplin Road Steve 105
70018 Oaks Drive Joe 100
70018 Oaks Drive Lisa 101
b) Address is same and AccountName is same:
Address AccountName AccountId
-------------------------------------------------------
70023 Kopplin Road John 102
70023 Kopplin Road John 103
Thanks a lot

Perhaps something like this? It's certainly possible this could be improved if we knew more about your base query.
with data as (
<yourQuery>
)
select *,
case when count(*) over (partition by Address, AccountName) > 1
then 'Same Address and AccountName' else 'Same Address only'
end as Tag
from data
where Address in (
select Address
from data
group by Address
having count(*) > 1
)
order by Tag, Address, AccountName
According to a comment on another answer it appears you may also want to handle repeated account names with different addresses. The question as posted describes something different. If you need this requirement to be incorporated into an answer you'll need to update the question accordingly.
Edit per your comment
with data as (
<yourQuery>
), dups as (
select *,
case when count(*) over (partition by Address, AccountName) > 1
then 'Same Address and AccountName' else 'Same Address only'
end as Tag
from data
where Address in (
select Address
from data
group by Address
having count(*) > 1
)
)
select * from dups
where Tag = 'Same Address and AccountName' -- or 'Same Address only'
order by Tag, Address, AccountName
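To check the tagging logic against the sample rows from the question, here is a minimal sketch that runs the same window-function idea in SQLite through Python's sqlite3 module. The table name accounts and the use of SQLite (window functions need SQLite 3.25 or later) are assumptions made for illustration only; the answer itself targets SQL Server.

```python
import sqlite3

# Sample rows copied from the question.
ROWS = [
    ("10007 Cougar Country", "Smith", 107),
    ("90026 Hunters Pond", "Scott", 106),
    ("10008 Indigo Run", "Mary", 108),
    ("70023 Kopplin Road", "John", 102),
    ("70023 Kopplin Road", "John", 103),
    ("70023 Kopplin Road", "Peter", 104),
    ("70023 Kopplin Road", "Steve", 105),
    ("70018 Oaks Drive", "Joe", 100),
    ("70018 Oaks Drive", "Lisa", 101),
]

# "accounts" is a hypothetical stand-in for <yourQuery>.
QUERY = """
with dups as (
    select Address, AccountName, AccountId,
           case when count(*) over (partition by Address, AccountName) > 1
                then 'Same Address and AccountName'
                else 'Same Address only'
           end as Tag
    from accounts
    where Address in (
        select Address from accounts group by Address having count(*) > 1
    )
)
select Tag, Address, AccountName, AccountId
from dups
order by Tag, Address, AccountName, AccountId
"""


def tag_groups(rows):
    """Load the rows into an in-memory table and return the tagged duplicates."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table accounts (Address text, AccountName text, AccountId integer)"
    )
    conn.executemany("insert into accounts values (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in tag_groups(ROWS):
        print(row)
```

Addresses that occur only once are dropped by the where clause, so only the Kopplin Road and Oaks Drive rows are tagged.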

It still isn't totally clear what you want but this should get you close.
with SortedResults as
(
select Address
, AccountName
, AccountId
, ROW_NUMBER() over (partition by Address, AccountName order by AccountId) as RowNum
from SomeTable
)
select *
from SortedResults
where RowNum > 1

Declare @YourTable table (Address varchar(150),AccountName varchar(50),AccountID varchar(50))
Insert Into @YourTable values
('10007 Cougar Country','Smith','107'),
('90026 Hunters Pond' ,'Scott','106'),
('10008 Indigo Run' ,'Mary' ,'108'),
('70023 Kopplin Road' ,'John' ,'102'),
('70023 Kopplin Road' ,'John' ,'103'),
('70023 Kopplin Road' ,'Peter','104'),
('70023 Kopplin Road' ,'Steve','105'),
('70018 Oaks Drive' ,'Joe' ,'100'),
('70018 Oaks Drive' ,'Lisa' ,'101')
;with cteBase as (Select *,RowNr=Row_Number() over (Partition By Address,AccountName Order by AccountID) from @YourTable)
,cteLvl1 as (Select Address,GrpLvl1=IIF(count(*)>1,1,0) From cteBase Group by Address)
,cteLvl2 as (Select Address,AccountName,GrpLvl2=IIF(max(RowNr)>1,1,0) From cteBase Group by Address,AccountName)
Select Class=IIF(GrpLvl1+GrpLvl2=0,'None',IIF(GrpLvl1+GrpLvl2=1,'Group','Sub-Group'))
,A.Address
,A.AccountName
,A.AccountID
From cteBase A
Join cteLvl1 B on (A.Address=B.Address)
Join cteLvl2 C on (A.Address=C.Address and A.AccountName=C.AccountName)
Where GrpLvl1+GrpLvl2>=1
Order By GrpLvl1+GrpLvl2,2,3,4
Returns
Class Address AccountName AccountID
Group 70018 Oaks Drive Joe 100
Group 70018 Oaks Drive Lisa 101
Group 70023 Kopplin Road Peter 104
Group 70023 Kopplin Road Steve 105
Sub-Group 70023 Kopplin Road John 102
Sub-Group 70023 Kopplin Road John 103

Related

Snowflake Unable to get the hierarchy columns values

I have the table below, from which I am trying to get the corresponding manager's title for each employee:
TITLE EMPLOYEE_ID MANAGER_ID
President 1
Vice President Engineering 10 1
Programmer 100 10
QA Engineer 101 10
Vice President HR 20 1
Health Insurance Analyst 200 20
I used the hierarchical query below to get the result:
select employee_id, manager_id, title, prior report_title
from employees
start with title = 'President'
connect by
manager_id = prior employee_id
order by employee_id;
But the result is not what I expected.
Expected:
EMPLOYEE_ID MANAGER_ID title report_title
10 1 Vice President Engineering President
Can anyone help with this?
If you use a recursive CTE, you can get the output you want.
with emptree as
(select employee_id, title, manager_id, null as report_title
from employees
where manager_id is null
union all
select employees.employee_id, employees.title, employees.manager_id, emptree.title
from employees
join emptree
on employees.manager_id = emptree.employee_id
)
select employee_id, manager_id, title, report_title
from emptree
order by employee_id;
EMPLOYEE_ID MANAGER_ID TITLE                      REPORT_TITLE
-------------------------------------------------------------------------
1           NULL       President
10          1          Vice President Engineering President
20          1          Vice President HR          President
100         10         Programmer                 Vice President Engineering
101         10         QA Engineer                Vice President Engineering
200         20         Health Insurance Analyst   Vice President HR
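Since the expected output only needs the title of each employee's direct manager, a plain self-join should give the same rows without any recursion. The sketch below checks that against the sample data using SQLite through Python's sqlite3 module; running it in SQLite rather than Snowflake is an assumption made purely so the example is self-contained.

```python
import sqlite3

# Sample rows copied from the question: (title, employee_id, manager_id).
EMPLOYEES = [
    ("President", 1, None),
    ("Vice President Engineering", 10, 1),
    ("Programmer", 100, 10),
    ("QA Engineer", 101, 10),
    ("Vice President HR", 20, 1),
    ("Health Insurance Analyst", 200, 20),
]

# Left join so the President (no manager) is kept with a NULL report_title.
QUERY = """
select e.employee_id, e.manager_id, e.title, m.title as report_title
from employees e
left join employees m on m.employee_id = e.manager_id
order by e.employee_id
"""


def manager_titles(rows):
    """Return each employee together with the title of the direct manager."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table employees (title text, employee_id integer, manager_id integer)"
    )
    conn.executemany("insert into employees values (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in manager_titles(EMPLOYEES):
        print(row)
```

The recursive form becomes necessary only when values from further up the chain are needed, such as the depth or the full path to the root.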

SQL server duplicate records issue

I have two source tables, CustP and CustS (CustS exists only to store the email address; that is how the current design is).
CustP:
CustID Fname Lname Phone
------------------------------------------
100 John Doe 1234567890
200 John Doe 1234567890
300 John Doe NULL
CustS:
CustID Fname Lname Email
--------------------------------------------
100 John Doe NULL
200 John Doe a@a.com
300 John Doe a@a.com
I would like to identify duplicate records across these two tables based on the following criteria:
(Fname, Lname and Phone match) OR (Fname, Lname and Email match)
Below are the steps I am using to generate the duplicate result set:
drop table #AllCustomer
select
cp.CustID, cp.Fname, cp.Lname, cp.Phone, cs.Email,
ROW_NUMBER() over (order by cp.Fname) RN
into
#AllCustomer
from
CustP cp
inner join
CustS cs on cp.CustID = cs.CustID
--Combining the customer and matched customer into a temp table
Select
A.CustID CustID, B.CustID MatchedCustID,
A.Fname FirstName,
A.Lname SurName,
A.Phone Phone,
A.Email Email,
B.Fname MatchedFirstName,
B.Lname MatchedSurName,
B.Phone MatchedPhone,
B.Email MatchedEmail
into
#AllMatchedCustomers
from
#AllCustomer A
inner join
#AllCustomer B on (A.Fname = B.Fname
and A.Lname = B.Lname
and A.CustID <> B.CustID
and A.RN < B.RN)
where
A.CustID <> B.CustID
and (((1 = case
when isnull(A.Phone, 1) in (isnull(B.Phone, 2))
then 1
else 0
end))
or
(A.Fname = B.Fname and A.Lname = B.Lname and
isnull(A.Email, 'A') = isnull(B.Email, 'B'))
)
And the result is as shown below:
CustID MatchedCustID FirstName SurName Phone Email MatchedFirstName MatchedSurName MatchedPhone MatchedEmail
100 200 John Doe 1234567890 NULL John Doe 1234567890 a@a.com
200 300 John Doe 1234567890 a@a.com John Doe NULL a@a.com
I need help with the transitive case: if Cust100 matches Cust200 by phone number and Cust200 matches Cust300 by email, then the result should include an additional row pairing Cust100 with Cust300 (because a=b and b=c, therefore a=c); here that would be 100 and 300 in the third row of the result set. Or what are the alternative approaches? I appreciate your help; thanks in advance.
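One way to think about the a=b, b=c, therefore a=c requirement is as a connected-components problem: treat each directly matched pair as an edge and report every pair of customers that ends up in the same component. The following is a minimal Python sketch of that idea, not a T-SQL solution; the function name transitive_pairs is hypothetical, and the input is assumed to be the (CustID, MatchedCustID) pairs from the #AllMatchedCustomers table.

```python
from collections import defaultdict
from itertools import combinations


def transitive_pairs(direct_pairs):
    """Return every pair of IDs linked directly or through a chain of matches."""
    # Build an undirected graph from the directly matched pairs.
    graph = defaultdict(set)
    for a, b in direct_pairs:
        graph[a].add(b)
        graph[b].add(a)

    seen = set()
    result = []
    for start in sorted(graph):
        if start in seen:
            continue
        # Depth-first walk that collects one connected component.
        component = []
        stack = [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        # Every pair inside a component counts as a match.
        result.extend(combinations(sorted(component), 2))
    return result


if __name__ == "__main__":
    # The two direct matches from the result set in the question.
    print(transitive_pairs([(100, 200), (200, 300)]))
```

For the two rows in the question this yields the pairs (100, 200), (100, 300) and (200, 300), which includes the missing third row.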

SQL query to calculate Throughput based "subtracting" two Select statements using Group By

I'm trying to formulate a SQL query to calculate the difference in the number of people "arriving" and "departing" grouped by City and Date.
TravelerID ArrivalDate DepartureDate City
1 2015-10-01 2015-10-03 New York
2 2015-10-02 2015-10-03 New York
3 2015-10-02 2015-10-04 Chicago
4 2015-10-01 2015-10-02 Chicago
I'm hoping to get a table that looks like
NumOfTravelers Date City
1 2015-10-01 New York
1 2015-10-02 New York
-2 2015-10-03 New York
1 2015-10-01 Chicago
0 2015-10-02 Chicago
-1 2015-10-04 Chicago
A positive number for NumOfTravelers means that more people arrived in that city on that particular date. A negative number for NumOfTravelers means that more people left that city on that particular date.
In trying to break down this SQL query, I've tried
SELECT COUNT(TravelerID) as NumTravelersArriving, ArrivalDate, City FROM TravelTable GROUP BY ArrivalDate, City;
SELECT COUNT(TravelerID) as NumTravelersDeparting, DepartureDate, City FROM TravelTable GROUP BY DepartureDate, City;
I'm trying to get "NumTravelersArriving" - "NumTravelersDeparting" into a column that represents "traveler throughput" grouped by City and Date.
I've been so stumped on this. I'm using SQL Server, and having a frustrating time using Table aliases and Column aliases.
Try this:
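SELECT COALESCE(a.City, b.City) AS City,
COALESCE(a.Date, b.Date) AS Date,
COALESCE(a.NumOfTravelers, 0) + COALESCE(b.NumOfTravelers, 0) AS NumOfTravelers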
FROM (
SELECT City, ArrivalDate As Date, COUNT(TravelerID) As NumOfTravelers
FROM TravelTable
GROUP BY City, ArrivalDate
) a
FULL JOIN (
SELECT City, DepartureDate As Date, COUNT(TravelerID) * -1 As NumOfTravelers
FROM TravelTable
GROUP BY City, DepartureDate
) b ON b.City = a.City AND b.Date = a.Date
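An alternative that avoids the join altogether is to turn every arrival into a +1 row and every departure into a -1 row, stack them with UNION ALL, and sum per city and date. The sketch below checks that idea against the sample data using SQLite through Python's sqlite3 module. SQLite, the column alias TravelDate, and the helper column Delta are assumptions for illustration; the same query shape should carry over to SQL Server.

```python
import sqlite3

# Sample rows copied from the question.
TRAVELERS = [
    (1, "2015-10-01", "2015-10-03", "New York"),
    (2, "2015-10-02", "2015-10-03", "New York"),
    (3, "2015-10-02", "2015-10-04", "Chicago"),
    (4, "2015-10-01", "2015-10-02", "Chicago"),
]

# Arrivals count +1, departures count -1; the sum is the net throughput.
QUERY = """
select sum(Delta) as NumOfTravelers, TravelDate, City
from (
    select City, ArrivalDate as TravelDate, 1 as Delta from TravelTable
    union all
    select City, DepartureDate as TravelDate, -1 as Delta from TravelTable
) as movements
group by City, TravelDate
order by City, TravelDate
"""


def throughput(rows):
    """Return (net travelers, date, city) for every city and date with activity."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table TravelTable "
        "(TravelerID integer, ArrivalDate text, DepartureDate text, City text)"
    )
    conn.executemany("insert into TravelTable values (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in throughput(TRAVELERS):
        print(row)
```

Dates on which arrivals and departures cancel out, such as Chicago on 2015-10-02, still appear with a value of 0.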

T-SQL Dynamic Query and pivot

I have data where people have changed role mid-month, and I want to count their activity after their new start date. Can I use the results of a table to build a dynamic query? I have a query which returns the following result set:
Firstname Surname StartDate
----------------------------------
Jon Smith 2015-01-01
Paul Jones 2014-07-23
...
So the query would look something like:
SELECT Firstname +' '+ surname, month, count(1) FROM dataTable
WHERE (Firstname='Jon' AND Surname='Smith' AND date >='2015-01-01')
OR (Firstname='Paul' AND Surname='Jones' AND date >='2014-07-23')
OR ...
but the number of ORs would depend on the number of rows in the first table. The result would be:
Name Month Count
----------------------------------
Jon Smith 1 15
Paul Jones 1 16
Jon Smith 2 30
Paul Jones 2 25
Charlie Gu 1 52
Which I can then pivot to get
Name 1 2
--------------------------
Jon Smith 15 30
Paul Jones 16 25
Charlie Gu 52 NULL
Thanks in advance
It seems to me that Ako is right and a simple join should do the trick rather than a dynamic query.
declare @NewStartDates table
(
Firstname nvarchar(100),
Surname nvarchar(100),
StartDate date
);
insert into @NewStartDates
(Firstname, Surname, StartDate)
values (N'Jon', N'Smith', '20150101'),
(N'Paul', N'Jones', '20140723');
select d.Firstname,
d.Surname,
year(d.Date) * 100 + month(d.Date) as Period,
count(*) as ActivityCount
from dataTable as d
inner join @NewStartDates as n
on d.Firstname = n.Firstname
and d.Surname = n.Surname
and d.Date >= n.StartDate
group by d.Firstname,
d.Surname,
year(d.Date) * 100 + month(d.Date);
Please refer to this; it should give you a complete idea of how to build a dynamic-column query: Dynamic Column Generation - Pivot [SQL]
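When the set of months is known in advance, the join and the pivot can be combined with conditional aggregation, with no dynamic SQL at all. The sketch below shows this in SQLite through Python's sqlite3 module. The activity rows, the column name ActivityDate, and the output columns Month1 and Month2 are all invented for illustration; only the two start dates come from the question.

```python
import sqlite3

# Start dates copied from the question.
START_DATES = [("Jon", "Smith", "2015-01-01"), ("Paul", "Jones", "2014-07-23")]

# Hypothetical activity rows; the question does not show dataTable's contents.
ACTIVITY = [
    ("Jon", "Smith", "2014-12-30"),  # before Jon's start date, so not counted
    ("Jon", "Smith", "2015-01-05"),
    ("Jon", "Smith", "2015-02-10"),
    ("Jon", "Smith", "2015-02-11"),
    ("Paul", "Jones", "2015-01-20"),
]

# One sum(case ...) per month plays the role of the pivot.
QUERY = """
select d.Firstname || ' ' || d.Surname as Name,
       sum(case when cast(strftime('%m', d.ActivityDate) as integer) = 1
                then 1 else 0 end) as Month1,
       sum(case when cast(strftime('%m', d.ActivityDate) as integer) = 2
                then 1 else 0 end) as Month2
from dataTable d
join NewStartDates n
  on d.Firstname = n.Firstname
 and d.Surname = n.Surname
 and d.ActivityDate >= n.StartDate
group by d.Firstname, d.Surname
order by Name
"""


def pivot_counts(start_dates, activity):
    """Count activity on or after each person's start date, one column per month."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table NewStartDates (Firstname text, Surname text, StartDate text)"
    )
    conn.execute(
        "create table dataTable (Firstname text, Surname text, ActivityDate text)"
    )
    conn.executemany("insert into NewStartDates values (?, ?, ?)", start_dates)
    conn.executemany("insert into dataTable values (?, ?, ?)", activity)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in pivot_counts(START_DATES, ACTIVITY):
        print(row)
```

Unlike PIVOT, this form reports 0 rather than NULL for a month with no activity, and it ignores the year, as the Month column in the question does.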

SQL query like GROUP BY with OR condition

I'll try to describe the real situation. In our company we have a reservation system with a table, let's call it Customers, where e-mail and phone contacts are saved with each incoming order; that's the part of the system I can't change. My problem is how to get the count of unique customers. By a unique customer I mean a group of people who share either the same e-mail or the same phone number.
Example 1: In real life, imagine Tom and Sandra, who are married. Tom ordered 4 products and entered 3 different e-mail addresses and 2 different phone numbers into our reservation system. One of those phone numbers is shared with Sandra (as a home phone), so I can presume they are connected somehow. Besides this shared phone number, Sandra also entered her private one, and she used only one e-mail address for both of her orders. For me this means that all of the following rows count as one unique customer. So in fact a unique customer may grow into a whole family.
ID E-mail Phone Comment
---- ------------------- -------------- ------------------------------
0 tom@email.com +44 111 111 First row
1 tommy@email.com +44 111 111 Same phone, different e-mail
2 thomas@email.com +44 111 111 Same phone, different e-mail
3 thomas@email.com +44 222 222 Same e-mail, different phone
4 sandra@email.com +44 222 222 Same phone, different e-mail
5 sandra@email.com +44 333 333 Same e-mail, different phone
As ypercube said, I will probably need recursion to count all of these unique customers.
Example 2: Here is an example of what I want to do. Is it possible to get the count of unique customers without recursion, for instance by using a cursor, or is recursion necessary?
ID E-mail Phone Comment
---- ------------------- -------------- ------------------------------
0 linsey@email.com +44 111 111 ─┐
1 louise@email.com +44 111 111 ├─ 1. unique customer
2 louise@email.com +44 222 222 ─┘
---- ------------------- -------------- ------------------------------
3 steven@email.com +44 333 333 ─┐
4 steven@email.com +44 444 444 ├─ 2. unique customer
5 sandra@email.com +44 444 444 ─┘
---- ------------------- -------------- ------------------------------
6 george@email.com +44 555 555 ─── 3. unique customer
---- ------------------- -------------- ------------------------------
7 xavier@email.com +44 666 666 ─┐
8 xavier@email.com +44 777 777 ├─ 4. unique customer
9 xavier@email.com +44 888 888 ─┘
---- ------------------- -------------- ------------------------------
10 robert@email.com +44 999 999 ─┐
11 miriam@email.com +44 999 999 ├─ 5. unique customer
12 sherry@email.com +44 999 999 ─┘
---- ------------------- -------------- ------------------------------
----------------------------------------------------------------------
Result ∑ = 5 unique customers
----------------------------------------------------------------------
I've tried a query with GROUP BY, but I don't know how to group the result by either the first or the second column. I'm looking for, let's say, something like
SELECT COUNT(*) FROM Customers
GROUP BY Email OR Phone
Thanks again for any suggestions
P.S.
I really appreciate the answers posted before I completely rephrased this question. Those answers may no longer correspond to the update, so please don't downvote them (the question itself is fair game :). I completely rewrote this post. Thanks, and sorry for the rough start.
Here is a full solution using a recursive CTE.
;WITH Nodes AS
(
SELECT DENSE_RANK() OVER (ORDER BY Part, PartRank) SetId
, [ID]
FROM
(
SELECT [ID], 1 Part, DENSE_RANK() OVER (ORDER BY [E-mail]) PartRank
FROM dbo.Customer
UNION ALL
SELECT [ID], 2, DENSE_RANK() OVER (ORDER BY Phone) PartRank
FROM dbo.Customer
) A
),
Links AS
(
SELECT DISTINCT A.Id, B.Id LinkedId
FROM Nodes A
JOIN Nodes B ON B.SetId = A.SetId AND B.Id < A.Id
),
Routes AS
(
SELECT DISTINCT Id, Id LinkedId
FROM dbo.Customer
UNION ALL
SELECT DISTINCT Id, LinkedId
FROM Links
UNION ALL
SELECT A.Id, B.LinkedId
FROM Links A
JOIN Routes B ON B.Id = A.LinkedId AND B.LinkedId < A.Id
),
TransitiveClosure AS
(
SELECT Id, Id LinkedId
FROM Links
UNION
SELECT LinkedId Id, LinkedId
FROM Links
UNION
SELECT Id, LinkedId
FROM Routes
),
UniqueCustomers AS
(
SELECT Id, MIN(LinkedId) UniqueCustomerId
FROM TransitiveClosure
GROUP BY Id
)
SELECT A.Id, A.[E-mail], A.Phone, B.UniqueCustomerId
FROM dbo.Customer A
JOIN UniqueCustomers B ON B.Id = A.Id
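If the rows can be processed outside the database, the same grouping can be sketched without recursion by using a union-find (disjoint-set) structure: each e-mail and each phone number is a node, and every row joins its e-mail to its phone. This is an alternative technique rather than a translation of the CTE above, and the function name count_unique_customers is hypothetical. The data is Example 2 from the question.

```python
def count_unique_customers(rows):
    """Count groups of rows connected through a shared e-mail or phone number."""
    parent = {}

    def find(x):
        # Register unseen nodes as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Each row links its e-mail node to its phone node.
    for _id, email, phone in rows:
        union(("email", email), ("phone", phone))

    # One distinct root per unique customer.
    return len({find(("email", email)) for _id, email, _phone in rows})


# Example 2 from the question.
EXAMPLE_2 = [
    (0, "linsey@email.com", "+44 111 111"),
    (1, "louise@email.com", "+44 111 111"),
    (2, "louise@email.com", "+44 222 222"),
    (3, "steven@email.com", "+44 333 333"),
    (4, "steven@email.com", "+44 444 444"),
    (5, "sandra@email.com", "+44 444 444"),
    (6, "george@email.com", "+44 555 555"),
    (7, "xavier@email.com", "+44 666 666"),
    (8, "xavier@email.com", "+44 777 777"),
    (9, "xavier@email.com", "+44 888 888"),
    (10, "robert@email.com", "+44 999 999"),
    (11, "miriam@email.com", "+44 999 999"),
    (12, "sherry@email.com", "+44 999 999"),
]

if __name__ == "__main__":
    print(count_unique_customers(EXAMPLE_2))
```

For Example 2 this gives 5, matching the expected result in the question. E-mails and phones are tagged with their kind so that a value appearing in both columns would not be merged by accident.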
Finding groups that share the same Phone:
SELECT
ID
, Name
, Phone
, DENSE_RANK() OVER (ORDER BY Phone) AS GroupPhone
FROM
MyTable
ORDER BY
GroupPhone
, ID
Finding groups that share the same Name:
SELECT
ID
, Name
, Phone
, DENSE_RANK() OVER (ORDER BY Name) AS GroupName
FROM
MyTable
ORDER BY
GroupName
, ID
Now, for the (complex) query you describe, let's say we have a table like this instead:
ID Name Phone
---- ------------- -------------
0 Kate +44 333 333
1 Sandra +44 000 000
2 Thomas +44 222 222
3 Robert +44 000 000
4 Thomas +44 444 444
5 George +44 222 222
6 Kate +44 000 000
7 Robert +44 444 444
--------------------------------
Should all of these be in one group? They all share a name or a phone with someone else, forming a "chain" of related persons:
0-6 same name
6-1-3 same phone
3-7 same name
7-4 same phone
4-2 same name
2-5 same phone
For the dataset in the example you could write something like this:
;WITH Temp AS (
SELECT Name, Phone,
DENSE_RANK() OVER (ORDER BY Name) AS NameGroup,
DENSE_RANK() OVER (ORDER BY Phone) AS PhoneGroup
FROM MyTable)
SELECT MAX(Phone), MAX(Name), COUNT(*)
FROM Temp
GROUP BY NameGroup, PhoneGroup
I don't know if this is the best solution, but here it is:
SELECT
MyTable.ID, MyTable.Name, MyTable.Phone,
CASE WHEN N.No = 1 AND P.No = 1 THEN 1
WHEN N.No = 1 AND P.No > 1 THEN 2
WHEN N.No > 1 OR P.No > 1 THEN 3
END as GroupRes
FROM
MyTable
JOIN (SELECT Name, count(Name) No FROM MyTable GROUP BY Name) N on MyTable.Name = N.Name
JOIN (SELECT Phone, count(Phone) No FROM MyTable GROUP BY Phone) P on MyTable.Phone = P.Phone
The problem is that some of the joins here are made on varchar columns, which could increase execution time.
Here is my solution:
SELECT p.LastName, P.FirstName, P.HomePhone,
CASE
WHEN ph.PhoneCount=1 THEN
CASE
WHEN n.NameCount=1 THEN 'unique name and phone'
ELSE 'common name'
END
ELSE
CASE
WHEN n.NameCount=1 THEN 'common phone'
ELSE 'common phone and name'
END
END
FROM Contacts p
INNER JOIN
(SELECT HomePhone, count(LastName) as PhoneCount
FROM Contacts
GROUP BY HomePhone) ph ON ph.HomePhone = p.HomePhone
INNER JOIN
(SELECT FirstName, count(LastName) as NameCount
FROM Contacts
GROUP BY FirstName) n ON n.FirstName = p.FirstName
LastN FirstN Phone Comment
Hoover Brenda 8138282334 unique name and phone
Washington Brian 9044563211 common name
Roosevelt Brian 7737653279 common name
Reagan Charles 7734567869 unique name and phone
