How can I add the outputs of one CTE to another CTE? - sql-server

I have a database that consists of 8 records about the employees, and it saves both the Ssn which is the id of every employee and a Super_ssn that references their manager. Here is the code for creating the table:
Create Table t_employee
(
Ssn int not null,
Super_ssn int,
FirstName varchar(50),
LastName varchar(50),
NationalCode varchar(50),
_role varchar(50),
Primary key(Ssn),
Foreign Key(Super_ssn) references t_employee(Ssn)
);
I have to write a procedure that takes two employee ids (for example, @empId1 and @empId2) and then performs the delete operation under the following condition:
If the first employee does not have any subalterns, it is simply deleted. But if it has some subalterns, they must first be reassigned to the other employee (@empId2), and then the first employee is deleted.
Here is my code:
USE CompanyHierarchy
GO
Create Procedure deleteEmployees (@empId1 int, @empId2 int)
as
Begin
Declare @subCount int;
With CTE1 (Ssn, FirstName, LastName, Super_ssn, NationalCode, _role, _level) as
(
Select emp.Ssn,
emp.FirstName,
emp.LastName,
emp.Super_ssn,
emp.NationalCode,
emp._role,
0 as _level
From t_employee AS emp
Where emp.Ssn = @empId1
Union ALL
Select _emp.Ssn,
_emp.FirstName,
_emp.LastName,
_emp.Super_ssn,
_emp.NationalCode,
_emp._role,
_emp._level + 1
From t_employee _emp
Join CTE1 C1
on _emp.Super_ssn = C1.Ssn
)
Select @subCount = COUNT (*) From CTE1 c1 Where c1.Ssn <> c1.Super_ssn;
IF @subCount = 0
Begin
Delete From t_employee where Ssn = @empId1;
End
ELSE
Begin
With _CTE1 (Ssn, FirstName, LastName, Super_ssn, NationalCode, _role, _level) as
(
Select emp.Ssn,
emp.FirstName,
emp.LastName,
emp.Super_ssn,
emp.NationalCode,
emp._role,
0 as _level
From t_employee AS emp
Where emp.Ssn = @empId1
Union ALL
Select _emp.Ssn,
_emp.FirstName,
_emp.LastName,
_emp.Super_ssn,
_emp.NationalCode,
_emp._role,
_emp._level + 1
From t_employee _emp
Join _CTE1 _C1
on _emp.Super_ssn = _C1.Ssn
)
Update _CTE1 Set Super_ssn = @empId2 Where Super_ssn <> Ssn;
End
End
GO
EXEC deleteEmployees @empId1 = 6, @empId2 = 5;
But it shows the error that
Msg 4421, Level 16, State 1, Procedure deleteEmployees, Line 37 [Batch
Start Line 3] Derived table '_CTE1' is not updatable because a column
of the derived table is derived or constant.
I have also tried to implement two CTEs to select the subalterns of both employees and then use INSERT, but I do not know how to avoid adding duplicate items.
Create Procedure deleteEmployees (@empId1 int, @empId2 int)
as
Begin
Declare @subCount int;
With CTE1 (Ssn, FirstName, LastName, Super_ssn, NationalCode, _role, _level) as
(
Select emp.Ssn,
emp.FirstName,
emp.LastName,
emp.Super_ssn,
emp.NationalCode,
emp._role,
0 as _level
From t_employee AS emp
Where emp.Ssn = @empId1
Union ALL
Select _emp.Ssn,
_emp.FirstName,
_emp.LastName,
_emp.Super_ssn,
_emp.NationalCode,
_emp._role,
_emp._level + 1
From t_employee _emp
Join CTE1 C1
on _emp.Super_ssn = C1.Ssn
)
Select @subCount = COUNT (*) From CTE1 c1 Where c1.Ssn <> c1.Super_ssn;
IF @subCount = 0
Begin
Delete From t_employee where Ssn = @empId1;
End
ELSE
Begin
With _CTE1 (Ssn, FirstName, LastName, Super_ssn, NationalCode, _role, _level) as
(
Select emp.Ssn,
emp.FirstName,
emp.LastName,
emp.Super_ssn,
emp.NationalCode,
emp._role,
0 as _level
From t_employee AS emp
Where emp.Ssn = @empId1
Union ALL
Select _emp.Ssn,
_emp.FirstName,
_emp.LastName,
_emp.Super_ssn,
_emp.NationalCode,
_emp._role,
_emp._level + 1
From t_employee _emp
Join _CTE1 _C1
on _emp.Super_ssn = _C1.Ssn
)
, _CTE2 (Ssn, FirstName, LastName, Super_ssn, NationalCode, _role, _level) as
(
Select emp.Ssn,
emp.FirstName,
emp.LastName,
emp.Super_ssn,
emp.NationalCode,
emp._role,
0 as _level
From t_employee AS emp
Where emp.Ssn = @empId1
Union ALL
Select _emp.Ssn,
_emp.FirstName,
_emp.LastName,
_emp.Super_ssn,
_emp.NationalCode,
_emp._role,
_emp._level + 1
From t_employee _emp
Join _CTE2 _C2
on _emp.Super_ssn = _C2.Ssn
)
Insert Into _CTE2
Select * From _CTE1;
End
End
I will be grateful for your help.

Have you overcomplicated your code? Why do you traverse the hierarchy at all? Surely if you intend to delete "a" when "b" reports to "a" and "c" reports to "b", you don't intend to do anything to "c". You only need to change those reporting directly to "a" ("b" in this case). If so, you don't need CTEs to traverse the hierarchy. You have also learned bad habits and have chosen strange naming standards.
Just check for the existence of @empId1 (a generic name that does not provide any clues about how it is used - see what I mean by naming standards?) as a supervisor first and "move" (not "add") those rows to the other user parameter @empId2. In short:
update t_employee set Super_ssn = @empId2
where Super_ssn = @empId1;
delete t_employee where Ssn = @empId1;
That is all the code you need at a very basic level. Add whatever error handling you wish, perhaps some sanity checking of the parameter values, and perhaps check for existence before the update. Use of two part names (schema.table) is a best practice to develop.
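To illustrate the "move, then delete" idea end to end, here is a minimal runnable sketch. It uses Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption made only so the example runs anywhere), and the function name `delete_employee` and the sample rows are hypothetical. Both statements run in one transaction, so a failed delete does not leave the reports half-moved.

```python
import sqlite3

def delete_employee(conn, emp_id, new_manager_id):
    """Move emp_id's direct reports to new_manager_id, then delete emp_id."""
    with conn:  # commits on success, rolls back if either statement raises
        conn.execute(
            "UPDATE t_employee SET Super_ssn = ? WHERE Super_ssn = ?",
            (new_manager_id, emp_id),
        )
        conn.execute("DELETE FROM t_employee WHERE Ssn = ?", (emp_id,))

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.execute("""
    CREATE TABLE t_employee (
        Ssn       INTEGER PRIMARY KEY,
        Super_ssn INTEGER REFERENCES t_employee(Ssn),
        FirstName TEXT
    )
""")
# Hypothetical hierarchy: 6 manages 7 and 8, and 7 manages 9.
conn.executemany(
    "INSERT INTO t_employee VALUES (?, ?, ?)",
    [(1, None, "Root"), (5, 1, "Keeper"), (6, 1, "Leaver"),
     (7, 6, "Report A"), (8, 6, "Report B"), (9, 7, "Grandchild")],
)
conn.commit()

delete_employee(conn, 6, 5)
rows = conn.execute("SELECT Ssn, Super_ssn FROM t_employee ORDER BY Ssn").fetchall()
print(rows)  # 7 and 8 now report to 5; 9 still reports to 7
```

Note that only the direct reports are touched: employee 9 keeps reporting to 7, which is why no recursive CTE is needed.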

Related

SQL LEFT JOIN SUM One To Many

I am trying to get the SUM from two different but related tables that have a one to many relationship, but when I add a where condition to the second table the first does not properly sum up the total. Can this be done in a single query? I should also note that it is critical that both consider the same set of LocationId's as they come from an outside filter. I also need the Activityname condition to happen after the join if at all possible. If that isn't possible then that is fine.
IF OBJECT_ID('tempdb..#tmpVisits') is not null
begin
drop TABLE #tmpVisits
end
IF OBJECT_ID('tempdb..#tmpVisitsByActivity') is not null
begin
drop TABLE #tmpVisitsByActivity
end
CREATE TABLE #tmpVisits
(
AccountId int,
LocationId int,
Dt DATE,
TotalVisits int
)
CREATE TABLE #tmpVisitsByActivity
(
AccountId int,
LocationId int,
EventDate DATE,
TotalCompleted INT,
ActivityName varchar(20)
)
insert INTO #tmpVisits
SELECT 1,10,'2018-09-12',12
union ALL
SELECT 1,11,'2018-09-12',20
union ALL
SELECT 1,22,'2018-09-12',10
insert INTO #tmpVisitsByActivity
SELECT 1,10,'2018-09-12',55,'ActivityA'
union ALL
SELECT 1,10,'2018-09-12',1,'ActivityA'
union ALL
SELECT 1,10,'2018-09-12',2,'ActivityB'
union ALL
SELECT 1,22,'2018-09-12',3,'ActivityC'
SELECT SUM(v.TotalVisits) --expecting 42 actual 10
, SUM(a.TotalCompleted) --expecting 3 actual 3
FROM #tmpVisits v
left JOIN #tmpVisitsByActivity a
ON v.AccountId = a.AccountId
AND v.dt = a.EventDate
AND v.LocationId = a.locationid
WHERE v.dt='2018-09-12' AND v.AccountId=1
AND a.ActivityName='ActivityC'
You can move the where condition clauses in the join condition like below, if you want to use a single query.
SELECT SUM(v.TotalVisits) --returns 42 with the sample data
, SUM(a.TotalCompleted) --returns 3 with the sample data
FROM #tmpVisits v
left JOIN #tmpVisitsByActivity a
ON v.AccountId = a.AccountId
AND v.dt = a.EventDate
AND v.LocationId = a.locationid
AND v.dt='2018-09-12' AND v.AccountId=1
AND a.ActivityName='ActivityC'
The last criterion in the WHERE clause excludes the rows in #tmpVisits that have no match.
But that's easy to get around.
Move the criteria for a.ActivityName to the ON clause, and remove it from the WHERE clause.
...
LEFT JOIN #tmpVisitsByActivity a
ON a.AccountId = v.AccountId AND
a.EventDate = v.dt AND
a.LocationId = v.locationId AND
a.ActivityName = 'ActivityA'
WHERE v.dt = '2018-09-12'
AND v.AccountId = 1
But it would be better to put the second table in a sub-query. Otherwise the first SUM could be wrong.
Example snippet:
DECLARE @Visits TABLE
(
AccountId INT,
LocationId INT,
Dt DATE,
TotalVisits INT
);
DECLARE @VisitsByActivity TABLE
(
AccountId INT,
LocationId INT,
EventDate DATE,
TotalCompleted INT,
ActivityName VARCHAR(20)
);
INSERT INTO @Visits (AccountId, LocationId, Dt, TotalVisits) VALUES
(1,10,'2018-09-12',12),
(1,11,'2018-09-12',20),
(1,22,'2018-09-12',10);
INSERT INTO @VisitsByActivity (AccountId, LocationId, EventDate, TotalCompleted, ActivityName) VALUES
(1,10,'2018-09-12',55,'ActivityA'),
(1,10,'2018-09-12',1,'ActivityA'),
(1,10,'2018-09-12',2,'ActivityB'),
(1,22,'2018-09-12',1,'ActivityC'),
(1,22,'2018-09-12',2,'ActivityC');
SELECT
SUM(v.TotalVisits) AS TotalVisits,
SUM(ac.TotalCompleted) AS TotalCompleted
FROM @Visits v
LEFT JOIN
(
SELECT AccountId, EventDate, locationid,
SUM(TotalCompleted) AS TotalCompleted
FROM @VisitsByActivity
WHERE ActivityName = 'ActivityC'
GROUP BY AccountId, EventDate, locationid
) AS ac
ON (ac.AccountId = v.AccountId AND ac.EventDate = v.dt AND ac.LocationId = v.locationid)
WHERE v.dt = '2018-09-12'
AND v.AccountId=1
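To see why the placement of the ActivityName filter matters, the three variants can be compared side by side. This is a rough sketch using Python's built-in sqlite3 module instead of SQL Server (an assumption for portability); the table names `visits` and `visits_by_activity` are simplified stand-ins for the temp tables, and the rows are the sample data from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE visits (AccountId INT, LocationId INT, Dt TEXT, TotalVisits INT);
    CREATE TABLE visits_by_activity (AccountId INT, LocationId INT, EventDate TEXT,
                                     TotalCompleted INT, ActivityName TEXT);
    INSERT INTO visits VALUES
        (1, 10, '2018-09-12', 12), (1, 11, '2018-09-12', 20), (1, 22, '2018-09-12', 10);
    INSERT INTO visits_by_activity VALUES
        (1, 10, '2018-09-12', 55, 'ActivityA'), (1, 10, '2018-09-12', 1, 'ActivityA'),
        (1, 10, '2018-09-12', 2, 'ActivityB'), (1, 22, '2018-09-12', 3, 'ActivityC');
""")

SELECT = "SELECT SUM(v.TotalVisits), SUM(a.TotalCompleted) FROM visits v "
KEYS = "a.AccountId = v.AccountId AND a.EventDate = v.Dt AND a.LocationId = v.LocationId"
BASE = "v.Dt = '2018-09-12' AND v.AccountId = 1"

# 1) Filter in WHERE: unmatched visits rows (NULL ActivityName) are thrown away.
where_filter = (SELECT + f"LEFT JOIN visits_by_activity a ON {KEYS} "
                f"WHERE {BASE} AND a.ActivityName = ?")
# 2) Filter in ON: every visits row survives, but a visits row that matches
#    several activity rows is counted once per match.
on_filter = (SELECT + f"LEFT JOIN visits_by_activity a ON {KEYS} AND a.ActivityName = ? "
             f"WHERE {BASE}")
# 3) Pre-aggregate the many side so each visits row matches at most one row.
pre_aggregated = (SELECT + "LEFT JOIN (SELECT AccountId, EventDate, LocationId, "
                  "SUM(TotalCompleted) AS TotalCompleted FROM visits_by_activity "
                  "WHERE ActivityName = ? GROUP BY AccountId, EventDate, LocationId) a "
                  f"ON {KEYS} WHERE {BASE}")

def totals(sql, activity):
    return conn.execute(sql, (activity,)).fetchone()

where_c = totals(where_filter, "ActivityC")   # (10, 3): visits total is too low
on_c = totals(on_filter, "ActivityC")         # (42, 3)
on_a = totals(on_filter, "ActivityA")         # (54, 56): location 10 counted twice
pre_a = totals(pre_aggregated, "ActivityA")   # (42, 56)
print(where_c, on_c, on_a, pre_a)
```

The ON-clause variant only looks correct for ActivityC because that activity has a single row per location; with ActivityA the visits total is inflated, which is the case the sub-query protects against.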

check if id exists in multiple tables

I am using SQL Server 2012.
I have 5 tables (let's call them A, B, C, D and E). Each table contains a column called m_id, which holds ids of type nvarchar(10).
I currently run the query below 5 times (changing the table name each time) to see whether the table contains the id.
select m_id from A where m_id = 'some_id'
Basically I want to know whether the id is in any of the 5 tables: return 1 if so, and 0 if it does not exist in any of them.
I feel the current way I'm doing this is very inefficient. Is there a better way to do this?
You could use UNION (which removes duplicates first) or UNION ALL:
SELECT CASE WHEN EXISTS
( SELECT 1 FROM ( SELECT m_id FROM A
UNION
SELECT m_id FROM B
UNION
SELECT m_id FROM C
UNION
SELECT m_id FROM D
UNION
SELECT m_id FROM E ) AllIds
WHERE AllIds.m_id = 'some_id')
THEN 1 ELSE 0 END AS ContainsID
You can use this:
SELECT CASE WHEN COUNT(*) > 0 THEN 1 ELSE 0 END as returnCode
FROM (
SELECT m_id, N'A' as tableName FROM A WHERE m_id = 'some_id'
UNION ALL
SELECT m_id, N'B' as tableName FROM B WHERE m_id = 'some_id'
UNION ALL
SELECT m_id, N'C' as tableName FROM C WHERE m_id = 'some_id'
UNION ALL
SELECT m_id, N'D' as tableName FROM D WHERE m_id = 'some_id'
UNION ALL
SELECT m_id, N'E' as tableName FROM E WHERE m_id = 'some_id'
) data
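Another option, not taken from either answer above, is one EXISTS test per table combined with OR, which lets the engine stop at the first hit instead of building a union. Below is a small sketch of that idea using Python's built-in sqlite3 module as a stand-in for SQL Server; the helper name `contains_id` and the sample ids are made up for illustration.

```python
import sqlite3

TABLES = ("A", "B", "C", "D", "E")  # fixed list of table names, never user input

def contains_id(conn, m_id):
    """Return 1 if m_id exists in at least one of the tables, else 0."""
    checks = " OR ".join(
        f"EXISTS (SELECT 1 FROM {t} WHERE m_id = :id)" for t in TABLES
    )
    sql = f"SELECT CASE WHEN {checks} THEN 1 ELSE 0 END AS ContainsID"
    return conn.execute(sql, {"id": m_id}).fetchone()[0]

conn = sqlite3.connect(":memory:")
for t in TABLES:
    conn.execute(f"CREATE TABLE {t} (m_id TEXT)")
conn.execute("INSERT INTO C VALUES ('x1')")  # hypothetical id, present only in C

found = contains_id(conn, "x1")
missing = contains_id(conn, "nope")
print(found, missing)
```

The generated statement is a single SELECT with five EXISTS sub-queries, so the same shape can be written by hand in T-SQL with the id passed as a parameter.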

SQL query for displaying count if same name comes in adjacent row it should show the count else 1

I have a table tb1 with columns id and name.
If the same name appears in adjacent rows, the result should show the count of those rows; otherwise it should show 1.
For eg:
id name
1 sam
2 jose
3 sam
4 sam
5 dev
6 jose
The desired result is:
name counts
sam 1
jose 1
sam 2
dev 1
jose 1
please help.
Check out this one :(SELF JOIN)
create table #sampele(id int,name varchar(50))
insert into #sampele values(1,'sam')
insert into #sampele values(2,'jose')
insert into #sampele values(3,'sam')
insert into #sampele values(4,'sam')
insert into #sampele values(5,'dev')
insert into #sampele values(6,'jose')
select a.id,a.name,case when a.name = b.name then 2 else 1 end as cnt from
#sampele a
left outer join
#sampele b
on a.id = b.id+1
Try a combination with a sub query, "COUNT(*) OVER (PARTITION", and row_number():
--DROP TABLE #Test;
SELECT id = IDENTITY(INT,1,1), name INTO #Test FROM
(
SELECT name = 'sam' UNION ALL
SELECT 'jose' UNION ALL
SELECT 'sam ' UNION ALL
SELECT 'sam ' UNION ALL
SELECT 'sam ' UNION ALL
SELECT 'dev ' UNION ALL
SELECT 'dev ' UNION ALL
SELECT 'jose' UNION ALL
SELECT 'sam ' UNION ALL
SELECT 'sam ' UNION ALL
SELECT 'jose'
) a;
GO
WITH GetEndID AS (
SELECT *
, EndID =(SELECT MIN(id) FROM #Test b WHERE b.name != a.name AND b.id > a.id)
FROM #Test a
), GetCount AS
(
SELECT
*
, NameCount = COUNT(*) OVER (PARTITION BY EndID)
, OrderPrio = ROW_NUMBER() OVER (PARTITION BY EndID ORDER BY id)
FROM GetEndID
)
SELECT id, name, NameCount FROM GetCount WHERE OrderPrio = 1 ORDER BY id;
select distinct a.name,case when a.name = b.name then 2 else 1 end as cnt from
tb1 a
left outer join
tb1 b
on a.id = b.id+1
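For runs of any length, a common alternative is the "gaps and islands" trick: the difference between a global ROW_NUMBER and a per-name ROW_NUMBER stays constant within one run of adjacent equal names. The sketch below is an illustration rather than one of the answers above; it runs the query through Python's built-in sqlite3 module (window functions need SQLite 3.25 or later), and `run_id` is a made-up column name. The query itself uses only ROW_NUMBER and GROUP BY, which SQL Server also supports.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tb1 (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO tb1 VALUES (?, ?)",
    [(1, "sam"), (2, "jose"), (3, "sam"), (4, "sam"), (5, "dev"), (6, "jose")],
)

sql = """
WITH numbered AS (
    SELECT id, name,
           -- constant within each run of adjacent rows that share a name
           ROW_NUMBER() OVER (ORDER BY id)
         - ROW_NUMBER() OVER (PARTITION BY name ORDER BY id) AS run_id
    FROM tb1
)
SELECT name, COUNT(*) AS counts
FROM numbered
GROUP BY name, run_id
ORDER BY MIN(id)
"""
runs = conn.execute(sql).fetchall()
print(runs)
```

With the sample rows from the question this yields sam 1, jose 1, sam 2, dev 1, jose 1, in that order.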

Selecting common items for price comparison across 2 or more members

I'm attempting to enable purchase price comparisons across 2 or more members, based on each member's most recent price paid as determined by the purchase date.
I have four tables: Member, Items, UOM and Fact
Member (membername varchar(50), memberkey int)
Items (itemname varchar(50), itemkey int)
UOM (uomname varchar(50), uomkey int)
Fact (memberkey int, itemkey int, uomkey int, purchaseamount decimal(18,2), quantity int, purchasedate date)
My UI allows selection of two or more members to allow comparison of prices per uom. My result set has to include items where at least two of the selected members have purchased that item and exclude all others.
I set my member list in a temp table by the following:
declare @MemberKeys as varchar(max)
set @MemberKeys = '702,1382,1389,1390,1391,1392,1393,1394,1395,1396,1397,1401,1402,1404,1405,1406,1516,1844';
create table #mk (memberName varchar(253), memberkey smallint)
insert into #mk (memberName, memberkey)
Select Rownbr + '.) ' + membername, memberkey from (
SELECT
cast(ROW_NUMBER() OVER(ORDER BY [MemberFacilityName] ASC) as varchar (10)) AS RowNbr
,k.value as memberkey
,m.memberName
FROM
Member m
INNER JOIN dbo.String_To_SmallInt_Table(@MemberKeys, ',') AS k
ON m.Memberkey = k.value
) X
Then I use the temp table to filter when querying the fact, uom and item tables.
select m.membername
,i.itemname
,u.uomname
,purchaseamount
,quantity
,purchaseamount/quantity as price
from Fact f
join #mk m
on m.memberkey = f.memberkey
join Item i
on i.itemkey = f.itemkey
join UOM u
on u.uomkey= f.uomkey
Now I need to do the following but need some guidance to accomplish it.
1.) filter out items that are not used by at least two of the selected members.
2.) show only the most recent purchase price per member\item\uom based on the purchase date.
3.) order the result set to show member then item for easy comparison (similar to the simplified list below).
Member Item Price
mbr1 A 1.11
mbr2 A 1.12
mbr3 A 1.52
mbr4 A 2.01
mbr1 B 3.01
mbr2 B 3.03
mbr3 B 3.12
mbr4 B 3.41
mbr1 C 6.01
mbr2 C 6.63
mbr3 C 6.92
mbr4 C 6.99
Here's how I implemented this...tell me if my logic is sound:
/****Create Sample Data*****/
-->Member table
IF exists (SELECT 1 from dbo.sysobjects WHERE name = 'Member')
DROP TABLE Member
GO
CREATE TABLE Member (membername VARCHAR(50), memberkey INT)
GO
INSERT INTO Member VALUES
('mbr1',702),
('mbr2',1382),
('mbr3',1389),
('mbr4',1390),
('mbr5',1391),
('mbr6',1392),
('mbr7',1393),
('mbr8',1394),
('mbr9',1395),
('mbr10',1396),
('mbr11',1397),
('mbr12',1401),
('mbr13',1402),
('mbr14',1404),
('mbr15',1405),
('mbr16',1406),
('mbr17',1516),
('mbr18',1111)-->Should NOT show up in query
GO
-->Items table
IF exists (SELECT 1 from dbo.sysobjects WHERE name = 'Items')
DROP TABLE Items
GO
CREATE TABLE Items (itemname VARCHAR(50), itemkey INT)
GO
INSERT INTO Items VALUES
('A',1),
('B',2),
('C',3),
('D',4)
GO
-->UOM table
IF exists (SELECT 1 from dbo.sysobjects WHERE name = 'UOM')
DROP TABLE UOM
GO
CREATE TABLE UOM (uomname VARCHAR(50), uomkey INT)
GO
INSERT INTO UOM VALUES ('QTY', 1)
GO
-->Fact table
IF exists (SELECT 1 from dbo.sysobjects WHERE name = 'Fact')
DROP TABLE Fact
GO
CREATE TABLE Fact (memberkey INT, itemkey INT, uomkey INT, purchaseamount decimal(18,2), quantity INT, purchasedate date)
GO
INSERT INTO Fact VALUES
(702, 1, 1, 1.11, 2, '1/3/2012'),-->Should show up in query
(1382, 1, 1, 1.12, 3, '1/4/2013'),-->Should NOT show up in query
(1382, 1, 1, 1.14, 2, '7/5/2013'),-->Should show up in query
(1404, 1, 1, 1.21, 2, '1/7/2012'),-->Should show up in query
(1401, 2, 1, 3.01, 1, '4/2/2013'),-->Should NOT show up in query
(1111, 3, 1, 6.92, 1, '12/12/2012'),-->Should NOT show up in query
(702, 3, 1, 5.01, 2, '4/1/2011'),-->Should show up in query
(1401, 3, 1, 4.01, 1, '6/5/2012'),-->Should show up in query
(1397, 4, 1, 5.45, 1, '7/4/2013'),-->Should NOT show up in query
(1397, 4, 1, 5.22, 3, '3/15/2011')-->Should NOT show up in query
GO
/*****Code to get results*****/
BEGIN
-->Members to Filter On
DECLARE @MemberKeys AS VARCHAR(max)
SET @MemberKeys = '702,1382,1389,1390,1391,1392,1393,1394,1395,1396,1397,1401,1402,1404,1405,1406,1516,1844';
-->Parse out comma delimited VALUES into a table variable
DECLARE @Member TABLE
(
memberkey INT
)
DECLARE @spot SMALLINT, @str VARCHAR(max), @sql VARCHAR(max)
WHILE @MemberKeys <> ''
BEGIN
SET @spot = CHARINDEX(',', @MemberKeys)
IF @spot>0
BEGIN
SET @str = LEFT(@MemberKeys, @spot-1)
SET @MemberKeys = RIGHT(@MemberKeys, LEN(@MemberKeys)-@spot)
END
ELSE
BEGIN
SET @str = @MemberKeys
SET @MemberKeys = ''
END
INSERT INTO @Member VALUES(CONVERT(VARCHAR(100),@str))
END
END;
-->Display Results
WITH staged(memberkey, membername, itemname ,itemkey, uomname, uomkey, purchaseamount, quantity, price, purchasedate, noitems )
AS
(
SELECT
m.memberkey
,m.membername
,i.itemname
,i.itemkey
,u.uomname
,u.uomkey
,f.purchaseamount
,f.quantity
,f.purchaseamount/f.quantity as price
,f.purchasedate
,COUNT(m.memberkey) OVER(PARTITION BY i.itemkey )-COUNT(m.memberkey) OVER(PARTITION BY convert(VARCHAR,m.memberkey)+convert(VARCHAR,i.itemkey) ) as noitems
FROM
Fact f
JOIN Member m ON m.memberkey = f.memberkey
JOIN Items i ON i.itemkey = f.itemkey
JOIN UOM u ON u.uomkey= f.uomkey
WHERE
EXISTS(SELECT 1 FROM @Member m2 WHERE m.memberkey=m2.memberkey)
)
SELECT
memberkey,
membername,
itemname ,
itemkey,
uomname,
uomkey,
sum(purchaseamount) as purchaseamount ,
sum(quantity) as quantity ,
sum(price) as price,
max(purchasedate) as purchasedate
FROM
staged st
WHERE
noitems>0
and exists(
select memberkey,
itemkey ,
uomkey,
max(purchasedate) as maxdate
from staged st2
where st.memberkey=st2.memberkey
and st.itemkey=st2.itemkey
and st.uomkey=st2.uomkey
group by
memberkey,
itemkey ,
uomkey
having st.purchasedate=max(st2.purchasedate)
)
GROUP BY
memberkey
,membername
,itemname
,uomname
, itemkey
, uomkey
ORDER BY
itemname
,memberkey;
I was able to figure this out on my own but will post my own answer; maybe it could help others with similar tasks.
I was able to complete the three tasks by introducing a second temp table to identify the most recent purchase date per item and member. Joining the #mostrecentpurchase temp table to the base table then enables effective member price comparisons.
To limit the result set to only items where two or more of the selected members had documented prices, I used the OVER clause and partitioned by item and unit of measure to get a count of members per item/uom. I then used this count in the where clause to filter out rows where the count was less than two.
Finally, the sorting was accomplished by simple order by clause. The completed tsql script is below.
declare @MemberKeys as varchar(max)
set @MemberKeys = '702,1382,1389,1390,1391,1392,1393,1394,1395,1396,1397,1401,1402,1404,1405,1406,1516,1844';
create table #mk (memberName varchar(253), memberkey smallint)
insert into #mk (memberName, memberkey)
Select Rownbr + '.) ' + membername, memberkey from (
SELECT
cast(ROW_NUMBER() OVER(ORDER BY [MemberFacilityName] ASC) as varchar (10)) AS RowNbr
,k.value as memberkey
,m.memberName
FROM
Member m
INNER JOIN dbo.String_To_SmallInt_Table(@MemberKeys, ',') AS k
ON m.Memberkey = k.value
) X
create table #mostrecentpurchase(purchasedate date, itemkey int, uomkey int, memberkey int)
Insert into #mostrecentpurchase(purchasedate, itemkey, uomkey, memberkey)
select max(f.PurchaseDate) purchasedate
, f.itemKey
, f.uomkey
, f.memberkey
from Fact f
join #mk m
on m.memberkey = f.memberkey
group by f.itemkey
, f.uomkey
, f.memberkey
select x.* FROM (
select m.memberName
, i.itemname
, i.itemkey
, f.purchasedate
, sum(f.purchaseamount) as purchaseamount
, sum(f.quantity) as quantity
, u.uomname
, sum(f.purchaseamount)/sum(f.quantity) as price
, count(m.memberName) OVER(PARTITION BY i.itemkey, u.uomname) AS mbrCount
from
fact f
join #mk m
on m.memberkey = f.memberkey
join #mostrecentpurchase mrp
on mrp.purchasedate = f.PurchaseDate
and mrp.memberkey = f.memberkey
and mrp.uomkey = f.uomkey
and mrp.itemkey = f.itemkey
join item i
on i.itemkey = f.itemkey
join uom u
on u.uomkey = f.uomkey
group by m.membername,i.itemname,i.itemkey,f.purchasedate,u.uomname
) X
where mbrCount >= @MemberCompCount
order by X.itemname, X.memberName
drop table #mk;
drop table #mostrecentpurchase;
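The same two steps (latest purchase per member and item, then keep only items bought by at least two selected members) can also be expressed with window functions and no temp tables. The following is a simplified sketch, not the script above: it runs on Python's built-in sqlite3 module (SQLite 3.25 or later) as a stand-in for SQL Server, leaves out the unit of measure, hard-codes a short member list, and reuses a few rows from the sample data with the dates rewritten in ISO format.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE fact (memberkey INT, itemkey INT,
                purchaseamount REAL, quantity INT, purchasedate TEXT)""")
conn.executemany("INSERT INTO fact VALUES (?, ?, ?, ?, ?)", [
    (702, 1, 1.11, 2, "2012-01-03"),
    (1382, 1, 1.12, 3, "2013-01-04"),   # superseded by the 2013-07-05 purchase
    (1382, 1, 1.14, 2, "2013-07-05"),
    (1404, 1, 1.21, 2, "2012-01-07"),
    (1401, 2, 3.01, 1, "2013-04-02"),   # item 2 has only one member
    (1111, 3, 6.92, 1, "2012-12-12"),   # member 1111 is not selected
    (702, 3, 5.01, 2, "2011-04-01"),
    (1401, 3, 4.01, 1, "2012-06-05"),
    (1397, 4, 5.45, 1, "2013-07-04"),   # item 4 has only one member
    (1397, 4, 5.22, 3, "2011-03-15"),
])

sql = """
WITH latest AS (
    SELECT memberkey, itemkey, purchasedate,
           purchaseamount / quantity AS price,
           ROW_NUMBER() OVER (PARTITION BY memberkey, itemkey
                              ORDER BY purchasedate DESC) AS rn
    FROM fact
    WHERE memberkey IN (702, 1382, 1397, 1401, 1404)  -- selected members
),
counted AS (
    SELECT *, COUNT(*) OVER (PARTITION BY itemkey) AS member_count
    FROM latest
    WHERE rn = 1            -- one row per member and item: the most recent
)
SELECT itemkey, memberkey, purchasedate
FROM counted
WHERE member_count >= 2     -- at least two members bought the item
ORDER BY itemkey, memberkey
"""
rows = conn.execute(sql).fetchall()
for row in rows:
    print(row)
```

If a member can have two purchases of the same item on the same date, ROW_NUMBER picks one of them arbitrarily, so a tie-breaker column would be needed in the ORDER BY.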

How to find duplicate values in SQL Server

I'm using SQL Server 2008. I have a table
Customers
customer_number int
field1 varchar
field2 varchar
field3 varchar
field4 varchar
... and a lot more columns, that don't matter for my queries.
Column customer_number is pk. I'm trying to find duplicate values and some differences between them.
Please, help me to find all rows that have same
1) field1, field2, field3, field4
2) only 3 columns are equal and one of them isn't (except rows from list 1)
3) only 2 columns equal and two of them aren't (except rows from list 1 and list 2)
In the end, I'll have 3 tables with these results and an additional groupId, which will be the same for each group of similar rows (for example, in the three-columns-equal case, rows that share the same 3 column values form a separate group).
Thank you.
Here's a handy query for finding duplicates in a table. Suppose you want to find all email addresses in a table that exist more than once:
SELECT email, COUNT(email) AS NumOccurrences
FROM users
GROUP BY email
HAVING ( COUNT(email) > 1 )
You could also use this technique to find rows that occur exactly once:
SELECT email
FROM users
GROUP BY email
HAVING ( COUNT(email) = 1 )
The easiest would probably be to write a stored procedure to iterate over each group of customers with duplicates and insert the matching ones per group number respectively.
However, I've thought about it and you can probably do this with a subquery. Hopefully I haven't made it more complicated than it ought to be, but this should get you what you're looking for in the first table of duplicates (all four fields). Note that this is untested, so it might need a little tweaking.
Basically, it gets each group of fields where there are duplicates, a group number for each, then gets all customers with those fields and assigns the same group number.
INSERT INTO FourFieldsDuplicates(group_no, customer_no)
SELECT Groups.group_no, custs.customer_no
FROM (SELECT ROW_NUMBER() OVER(ORDER BY c.field1) AS group_no,
c.field1, c.field2, c.field3, c.field4
FROM Customers c
GROUP BY c.field1, c.field2, c.field3, c.field4
HAVING COUNT(*) > 1) Groups
INNER JOIN Customers custs ON custs.field1 = Groups.field1
AND custs.field2 = Groups.field2
AND custs.field3 = Groups.field3
AND custs.field4 = Groups.field4
The other ones are a bit more complicated, however, as you'll need to expand out the possibilities. The three-field groups would then be:
INSERT INTO ThreeFieldsDuplicates(group_no, customer_no)
SELECT Groups.group_no, custs.customer_no
FROM (SELECT ROW_NUMBER() OVER(ORDER BY GroupsInner.field1) AS group_no,
GroupsInner.field1, GroupsInner.field2,
GroupsInner.field3, GroupsInner.field4
FROM (SELECT c.field1, c.field2, c.field3, NULL AS field4
FROM Customers c
WHERE NOT EXISTS(SELECT d.customer_no
FROM FourFieldsDuplicates d
WHERE d.customer_no = c.customer_no)
GROUP BY c.field1, c.field2, c.field3
UNION ALL
SELECT c.field1, c.field2, NULL AS field3, c.field4
FROM Customers c
WHERE NOT EXISTS(SELECT d.customer_no
FROM FourFieldsDuplicates d
WHERE d.customer_no = c.customer_no)
GROUP BY c.field1, c.field2, c.field4
UNION ALL
SELECT c.field1, NULL AS field2, c.field3, c.field4
FROM Customers c
WHERE NOT EXISTS(SELECT d.customer_no
FROM FourFieldsDuplicates d
WHERE d.customer_no = c.customer_no)
GROUP BY c.field1, c.field3, c.field4
UNION ALL
SELECT NULL AS field1, c.field2, c.field3, c.field4
FROM Customers c
WHERE NOT EXISTS(SELECT d.customer_no
FROM FourFieldsDuplicates d
WHERE d.customer_no = c.customer_no)
GROUP BY c.field2, c.field3, c.field4) GroupsInner
GROUP BY GroupsInner.field1, GroupsInner.field2,
GroupsInner.field3, GroupsInner.field4
HAVING COUNT(*) > 1) Groups
INNER JOIN Customers custs ON (Groups.field1 IS NULL OR custs.field1 = Groups.field1)
AND (Groups.field2 IS NULL OR custs.field2 = Groups.field2)
AND (Groups.field3 IS NULL OR custs.field3 = Groups.field3)
AND (Groups.field4 IS NULL OR custs.field4 = Groups.field4)
Hopefully this produces the right results and I'll leave the last one as an exercise. :-D
I'm not sure if you require an equality check on different fields (like field1=field2).
Otherwise this might be enough.
Edit
Feel free to adjust the testdata to provide us with inputs that give a wrong output according to your specifications.
Test data
DECLARE @Customers TABLE (
customer_number INTEGER IDENTITY(1, 1)
, field1 INTEGER
, field2 INTEGER
, field3 INTEGER
, field4 INTEGER)
INSERT INTO @Customers
SELECT 1, 1, 1, 1
UNION ALL SELECT 1, 1, 1, 1
UNION ALL SELECT 1, 1, 1, NULL
UNION ALL SELECT 1, 1, 1, 2
UNION ALL SELECT 1, 1, 1, 3
UNION ALL SELECT 2, 1, 1, 1
All Equal
SELECT ROW_NUMBER() OVER (ORDER BY c1.customer_number)
, c1.field1
, c1.field2
, c1.field3
, c1.field4
FROM @Customers c1
INNER JOIN @Customers c2 ON c2.customer_number > c1.customer_number
AND ISNULL(c2.field1, 0) = ISNULL(c1.field1, 0)
AND ISNULL(c2.field2, 0) = ISNULL(c1.field2, 0)
AND ISNULL(c2.field3, 0) = ISNULL(c1.field3, 0)
AND ISNULL(c2.field4, 0) = ISNULL(c1.field4, 0)
One field different
SELECT ROW_NUMBER() OVER (ORDER BY field1, field2, field3, field4)
, field1
, field2
, field3
, field4
FROM (
SELECT DISTINCT c1.field1
, c1.field2
, c1.field3
, field4 = NULL
FROM @Customers c1
INNER JOIN @Customers c2 ON c2.customer_number > c1.customer_number
AND c2.field1 = c1.field1
AND c2.field2 = c1.field2
AND c2.field3 = c1.field3
AND ISNULL(c2.field4, 0) <> ISNULL(c1.field4, 0)
UNION ALL
SELECT DISTINCT c1.field1
, c1.field2
, NULL
, c1.field4
FROM @Customers c1
INNER JOIN @Customers c2 ON c2.customer_number > c1.customer_number
AND c2.field1 = c1.field1
AND c2.field2 = c1.field2
AND ISNULL(c2.field3, 0) <> ISNULL(c1.field3, 0)
AND c2.field4 = c1.field4
UNION ALL
SELECT DISTINCT c1.field1
, NULL
, c1.field3
, c1.field4
FROM @Customers c1
INNER JOIN @Customers c2 ON c2.customer_number > c1.customer_number
AND c2.field1 = c1.field1
AND ISNULL(c2.field2, 0) <> ISNULL(c1.field2, 0)
AND c2.field3 = c1.field3
AND c2.field4 = c1.field4
UNION ALL
SELECT DISTINCT NULL
, c1.field2
, c1.field3
, c1.field4
FROM @Customers c1
INNER JOIN @Customers c2 ON c2.customer_number > c1.customer_number
AND ISNULL(c2.field1, 0) <> ISNULL(c1.field1, 0)
AND c2.field2 = c1.field2
AND c2.field3 = c1.field3
AND c2.field4 = c1.field4
) c
You can simply write something like this to count duplicate entries; I think it works:
use *DATABASE_NAME*
go
SELECT *YOUR_FIELD*, COUNT(*) AS dupes
FROM *YOUR_TABLE_NAME*
GROUP BY *YOUR_FIELD*
HAVING (COUNT(*) > 1)
Enjoy
There is a clean way of doing this with CUBE(), which will aggregate by all the possible combinations of columns
SELECT
field1,field2,field3,field4
,duplicate_row_count = COUNT(*)
,grp_id = GROUPING_ID(field1,field2,field3,field4)
INTO #duplicate_rows
FROM table_name
GROUP BY CUBE(field1,field2,field3,field4)
HAVING COUNT(*) > 1
AND GROUPING_ID(field1,field2,field3,field4) IN (0,1,2,4,8,3,5,6,9,10,12)
The numbers (0,1,2,4,8,3,5,6,9,10,12) are just the bitmasks (0000,0001,0010,0100,...,1010,1100) of the grouping sets that we care about-- those with 4, 3, or 2 matches.
Then join this back to the original table using a technique that treats NULLs in #duplicate_rows as wildcards
SELECT a.*
FROM table_name a
INNER JOIN #duplicate_rows b
ON NULLIF(b.field1,a.field1) IS NULL
AND NULLIF(b.field2,a.field2) IS NULL
AND NULLIF(b.field3,a.field3) IS NULL
AND NULLIF(b.field4,a.field4) IS NULL
--WHERE grp_id IN (0) --Use this for 4 matches
--WHERE grp_id IN (1,2,4,8) --Use this for 3 matches
--WHERE grp_id IN (3,5,6,9,10,12) --Use this for 2 matches
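The grp_id lists need not be memorised; they follow from how GROUPING_ID builds its bitmask (the first argument is the most significant bit, and a bit is 1 when that column was aggregated away rather than grouped). Here is a short Python sketch that derives the lists for four columns; the helper names `grouping_id` and `masks_for` are made up for this illustration.

```python
from itertools import combinations

COLUMNS = ("field1", "field2", "field3", "field4")

def grouping_id(aggregated):
    """Bitmask in GROUPING_ID order: first column is the most significant bit,
    and a set bit means the column was aggregated away (shown as NULL)."""
    mask = 0
    for col in COLUMNS:
        mask = (mask << 1) | (1 if col in aggregated else 0)
    return mask

def masks_for(matching):
    """All grouping ids where exactly `matching` columns are still grouped on."""
    dropped = len(COLUMNS) - matching
    return sorted(grouping_id(set(combo)) for combo in combinations(COLUMNS, dropped))

four_matches = masks_for(4)
three_matches = masks_for(3)
two_matches = masks_for(2)
print(four_matches, three_matches, two_matches)
```

The three lists are exactly the values used in the commented-out WHERE clauses: 0 for four matches, 1, 2, 4, 8 for three, and 3, 5, 6, 9, 10, 12 for two.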
