When joining tables adapt the "on" statement depending on the query results - sql-server

I have 2 tables:
Table_1 with columns col_A, col_B , col_C , col_D , col_E
Table_2 with columns col_A, col_B , col_C , col_D , col_F
I would like to join them on columns col_A, col_B , col_C , col_D.
For the rows in Table_1 that do not get joined this way (as they don't have a match in Table_2), I would like to join them only on columns col_A, col_B , col_C.
If there are still rows in Table_1 that did not get joined, I would like to join them only on columns col_A, col_B.
And once that is done and there are still rows in Table_1 that did not get joined, I would like to join them only on column col_A.
I wrote the following script, where I use a new table to get this result.
Is there a more efficient way to do this? Preferably by creating a view, not a table?
create table new_table (col_A nvarchar(50) , col_B nvarchar(50) , col_C nvarchar(50)
, col_D nvarchar(50) , col_E nvarchar(50) , col_F nvarchar(50) )
go
insert into new_table
select Table_1.* , Table_2.col_F
from Table_1
inner join Table_2
on Table_1.col_A=Table_2.col_A
and Table_1.col_B=Table_2.col_B
and Table_1.col_C=Table_2.col_C
and Table_1.col_D=Table_2.col_D
go
insert into new_table
select Table_1.* , Table_2.col_F
from Table_1
inner join Table_2
on Table_1.col_A=Table_2.col_A
and Table_1.col_B=Table_2.col_B
and Table_1.col_C=Table_2.col_C
where concat (Table_1.col_A, Table_1.col_B , Table_1.col_C , Table_1.col_D , Table_1.col_E)
not in (select concat (col_A, col_B , col_C , col_D , col_E) from new_table)
go
insert into new_table
select Table_1.* , Table_2.col_F
from Table_1
inner join Table_2
on Table_1.col_A=Table_2.col_A
and Table_1.col_B=Table_2.col_B
where concat (Table_1.col_A, Table_1.col_B , Table_1.col_C , Table_1.col_D , Table_1.col_E)
not in (select concat (col_A, col_B , col_C , col_D , col_E) from new_table)
go
insert into new_table
select Table_1.* , Table_2.col_F
from Table_1
inner join Table_2
on Table_1.col_A=Table_2.col_A
where concat (Table_1.col_A, Table_1.col_B , Table_1.col_C , Table_1.col_D , Table_1.col_E)
not in (select concat (col_A, col_B , col_C , col_D , col_E) from new_table)
go

You could join them on just col_A and then score each pair of rows by which of the other columns also match:
WITH cte AS(
SELECT
CASE WHEN t1.D = t2.D THEN 100 ELSE 0 END +
CASE WHEN t1.C = t2.C THEN 10 ELSE 0 END +
CASE WHEN t1.B = t2.B THEN 1 ELSE 0 END as whatMatched,
t1.A, t1.B, t1.C, t1.D, t1.E, t2.F -- explicit columns: SELECT * would give the CTE duplicate column names
FROM
t1 JOIN t2 on t1.A = t2.A
)
Now if a row got 111 we know that all of A, B, C and D matched; if it got 0 then only A matched, and so on.
So we can ask for only the combinations we care about (ABCD, ABC, AB, A):
SELECT * FROM cte WHERE whatmatched IN (111,11,1,0)
And lastly, because matching on just A can pair one t1 row with several t2 rows, we can number the candidates for each t1 row from most to least specific and take only the first (this assumes the five t1 columns identify a row; partition by the key instead if t1 has one):
SELECT x.* FROM
(SELECT *, ROW_NUMBER() OVER(PARTITION BY A, B, C, D, E ORDER BY whatmatched DESC) rown FROM cte WHERE whatmatched IN (111,11,1,0)) x
WHERE x.rown = 1
If it suits you better to use letters, we can assess the matches, keep only A, AB, ABC, or ABCD, then pick the most specific one by looking at the LENgth of the match string:
WITH cte AS(
SELECT
'A' +
CASE WHEN t1.B = t2.B THEN 'B' ELSE '' END +
CASE WHEN t1.C = t2.C THEN 'C' ELSE '' END +
CASE WHEN t1.D = t2.D THEN 'D' ELSE '' END as whatMatched,
t1.A, t1.B, t1.C, t1.D, t1.E, t2.F -- explicit columns: SELECT * would give the CTE duplicate column names
FROM
t1 JOIN t2 on t1.A = t2.A
)
SELECT x.* FROM
(SELECT *, ROW_NUMBER() OVER(PARTITION BY A, B, C, D, E ORDER BY LEN(whatmatched) DESC) rown FROM cte WHERE whatmatched IN ('A','AB','ABC','ABCD')) x
WHERE x.rown = 1
If you want ties (i.e. a row from t1 that matches two rows from t2 because their A/B/C is the same and D differs), use DENSE_RANK instead of ROW_NUMBER so they end up tied for first place.
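To see the letter-based scoring in action without a SQL Server instance, here is a minimal sketch that runs the same idea through Python's built-in sqlite3 module. The sample rows are made up for illustration, and SQLite syntax differs slightly from T-SQL (`||` instead of `+` for concatenation, `LENGTH` instead of `LEN`), so treat it as an approximation of the query above rather than a drop-in script.

```python
import sqlite3

# SQLite stands in for SQL Server here; all data below is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (A TEXT, B TEXT, C TEXT, D TEXT, E TEXT);
CREATE TABLE t2 (A TEXT, B TEXT, C TEXT, D TEXT, F TEXT);
INSERT INTO t1 VALUES
  ('a1','b1','c1','d1','e1'),   -- has a full ABCD match in t2
  ('a1','b1','c9','d9','e2'),   -- best match is on A, B and C
  ('a2','b9','c9','d9','e3');   -- only A matches
INSERT INTO t2 VALUES
  ('a1','b1','c1','d1','f1'),
  ('a1','b2','c2','d2','f2'),
  ('a2','b2','c2','d2','f3'),
  ('a1','b1','c9','d0','f4');
""")

rows = conn.execute("""
WITH cte AS (
  SELECT t1.A, t1.B, t1.C, t1.D, t1.E, t2.F,
         'A'
         || CASE WHEN t1.B = t2.B THEN 'B' ELSE '' END
         || CASE WHEN t1.C = t2.C THEN 'C' ELSE '' END
         || CASE WHEN t1.D = t2.D THEN 'D' ELSE '' END AS whatMatched
  FROM t1 JOIN t2 ON t1.A = t2.A
)
SELECT E, F, whatMatched
FROM (
  -- rank the candidate t2 rows per t1 row, most specific match first
  SELECT *, ROW_NUMBER() OVER (
              PARTITION BY A, B, C, D, E
              ORDER BY LENGTH(whatMatched) DESC) AS rown
  FROM cte
  WHERE whatMatched IN ('A', 'AB', 'ABC', 'ABCD')
) AS x
WHERE rown = 1
ORDER BY E
""").fetchall()

for row in rows:
    print(row)
```

Each t1 row comes back once, paired with the t2 row that matched on the longest prefix of columns.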

Related

Unexpected result using CTE to perform a random join on two tables for all rows one-to-many

I am attempting to randomly join the rows of two tables (TableA and TableB) such that each row in TableA is joined to only one row in TableB and every row in TableB is joined to at least one row in TableA.
For example, a random join of TableA with 5 distinct rows and TableB with 3 distinct rows should result in something like this:
TableA TableB
1 3
2 1
3 1
4 2
5 1
However, sometimes not all the rows from TableB are included in the final result; so in the example above might have row 2 from TableB missing because in its place is either row 1 or 3 joined to row 4 on TableA. You can see this occur by executing the script a number of times and checking the result. It seems that it is necessary for some reason to use an interim table (#Q) to be able to ensure that a correct result is returned which has all rows from both TableA and TableB.
Can someone please explain why this is happening?
Also, can someone please advise on what would be a better way to get the desired result?
I understand that sometimes no result is returned at all, due to some failure in the cross apply and ordering that I have yet to identify, which suggests to me that there is a better way to perform this operation. I hope that makes sense. Thanks in advance!
create table #TableA (
ID int
);
create table #TableB (
ID int
);
create table #Q (
RN int,
TableAID int,
TableBID int
);
with cte as (
select
1 as ID
union all
select
ID + 1
from cte
where ID < 5
)
insert #TableA (ID)
select ID from cte;
with cte as (
select
1 as ID
union all
select
ID + 1
from cte
where ID < 3
)
insert #TableB (ID)
select ID from cte;
select * from #TableA;
select * from #TableB;
with cte as (
select
row_number() over (partition by TableAID order by newid()) as RN,
TableAID,
TableBID
from (
select
a.ID as TableAID,
b.ID as TableBID
from #TableA as a
cross apply #TableB as b
) as M
)
select --All rows from TableB not always included
TableAID,
TableBID
from cte
where RN in (
select
top 1
iCTE.RN
from cte as iCTE
group by iCTE.RN
having count(distinct iCTE.TableBID) = (
select count(1) from #TableB
)
)
order by TableAID;
with cte as (
select
row_number() over (partition by TableAID order by newid()) as RN,
TableAID,
TableBID
from (
select
a.ID as TableAID,
b.ID as TableBID
from #TableA as a
cross apply #TableB as b
) as M
)
insert #Q
select
RN,
TableAID,
TableBID
from cte;
select * from #Q;
select --All rows from both TableA and TableB included
TableAID,
TableBID
from #Q
where RN in (
select
top 1
iQ.RN
from #Q as iQ
group by iQ.RN
having count(distinct iQ.TableBID) = (
select count(1) from #TableB
)
)
order by TableAID;
See if this gives you what you're looking for...
DECLARE
@CountA INT = (SELECT COUNT(*) FROM #TableA ta),
@CountB INT = (SELECT COUNT(*) FROM #TableB tb),
@MinCount INT;
SELECT @MinCount = CASE WHEN @CountA < @CountB THEN @CountA ELSE @CountB END;
WITH
cte_A1 AS (
SELECT
*,
rn = ROW_NUMBER() OVER (ORDER BY NEWID())
FROM
#TableA ta
),
cte_B1 AS (
SELECT
*,
rn = ROW_NUMBER() OVER (ORDER BY NEWID())
FROM
#TableB tb
),
cte_A2 AS (
SELECT
a1.ID,
rn = CASE WHEN a1.rn > @MinCount THEN a1.rn - @MinCount ELSE a1.rn END
FROM
cte_A1 a1
),
cte_B2 AS (
SELECT
b1.ID,
rn = CASE WHEN b1.rn > @MinCount THEN b1.rn - @MinCount ELSE b1.rn END
FROM
cte_B1 b1
)
SELECT
A = a.ID,
B = b.ID
FROM
cte_A2 a
JOIN cte_B2 b
ON a.rn = b.rn;
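As a variation on the idea above, the sketch below wraps the random row numbers with a modulo instead of a single subtraction, so it also covers the case where TableA has more than twice as many rows as TableB. This is a swapped-in technique, not the answer's exact query. It runs on SQLite through Python's sqlite3 module (`random()` stands in for `NEWID()`), and it stores the random numbering in temporary tables first so that each numbering is computed exactly once; the table names `a_rn` and `b_rn` are made up for the example.

```python
import sqlite3

# SQLite stands in for SQL Server; 7 rows in TableA and 3 in TableB are arbitrary.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableA (ID INTEGER);
CREATE TABLE TableB (ID INTEGER);
INSERT INTO TableA VALUES (1),(2),(3),(4),(5),(6),(7);
INSERT INTO TableB VALUES (1),(2),(3);

-- Materialise the random numbering once, so later references see the same order.
CREATE TEMP TABLE a_rn AS
  SELECT ID, ROW_NUMBER() OVER (ORDER BY random()) AS rn FROM TableA;
CREATE TEMP TABLE b_rn AS
  SELECT ID, ROW_NUMBER() OVER (ORDER BY random()) AS rn FROM TableB;
""")

pairs = conn.execute("""
SELECT a.ID, b.ID
FROM a_rn AS a
JOIN b_rn AS b
  -- wrap a.rn into the range 1..COUNT(TableB)
  ON b.rn = (a.rn - 1) % (SELECT COUNT(*) FROM b_rn) + 1
ORDER BY a.ID
""").fetchall()

for a_id, b_id in pairs:
    print(a_id, b_id)
```

Every TableA row is paired exactly once, and as long as TableA has at least as many rows as TableB, every TableB row is used at least once. Which rows end up together changes from run to run.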

How to use CTE to get a query repeated for multiple inputs?

I have the following query:
SELECT TOP 1 account, date, result
FROM table_1 as t1
JOIN table_2 as t2 ON t1.accountId = t2.frn_accountId
WHERE accountID = 1
ORDER BY date
This query returns the result that I want; however, I want that result for multiple accountIDs. The query should return the top 1 value for each accountID.
The query that produce the list of the accountID-s is:
SELECT accountID from lskin WHERE refname LIKE '%BHA%' and isactive = 1
How can I write this query so it can produce the desired result? I have been playing around with CTE but haven't been able to make it correct. It doesn't have to be with CTE, I just thought it can be easier using CTE...
Here is a solution that uses ROW_NUMBER() in a derived table (a CTE would work the same way).
SELECT *
FROM (SELECT account
, date
, result
, ROW_NUMBER() OVER (PARTITION BY t1.accountId ORDER BY date DESC) AS Rownum
FROM table_1 AS t1
INNER JOIN table_2 AS t2
ON t1.accountId = t2.frn_accountId
INNER JOIN lskin AS l
ON l.accountID = t1.accountID
WHERE l.refname LIKE '%BHA%'
AND l.isactive = 1
) a
WHERE a.Rownum = 1;
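A small runnable sketch of the same "top 1 per account" pattern, using Python's sqlite3 module in place of SQL Server. The table and column names follow the question, but the rows and dates are invented for illustration, and the `isactive = 1` filter from the question is included.

```python
import sqlite3

# SQLite stands in for SQL Server; all rows below are hypothetical test data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_1 (accountId INTEGER, account TEXT);
CREATE TABLE table_2 (frn_accountId INTEGER, result TEXT, date TEXT);
CREATE TABLE lskin (accountID INTEGER, refname TEXT, isactive INTEGER);
INSERT INTO table_1 VALUES (1,'acc1'),(2,'acc2'),(3,'acc3');
INSERT INTO table_2 VALUES
  (1,'old1','2020-01-01'),(1,'new1','2020-01-03'),
  (2,'old2','2020-01-01'),(2,'new2','2020-01-02'),
  (3,'old3','2020-01-01'),(3,'new3','2020-01-05');
INSERT INTO lskin VALUES (1,'xBHAx',1),(2,'BHA',1),(3,'BHA',0);
""")

latest = conn.execute("""
SELECT account, date, result
FROM (SELECT t1.account, t2.date, t2.result,
             -- number each account's rows, newest first
             ROW_NUMBER() OVER (PARTITION BY t1.accountId
                                ORDER BY t2.date DESC) AS rownum
      FROM table_1 AS t1
      JOIN table_2 AS t2 ON t1.accountId = t2.frn_accountId
      JOIN lskin   AS l  ON l.accountID = t1.accountId
      WHERE l.refname LIKE '%BHA%' AND l.isactive = 1) AS a
WHERE rownum = 1
ORDER BY account
""").fetchall()

for row in latest:
    print(row)
```

Account 3 is skipped because its lskin row is inactive. Switching `DESC` to `ASC` would return the oldest row per account instead, which is what the question's plain `ORDER BY date` would pick.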
Use max on your date and group by the account, or whatever columns are appropriate.
SELECT
account,
DT = max(date),
result
FROM table_1 as t1
JOIN table_2 as t2 ON t1.accountId = t2.frn_accountId
JOIN lskin as l on l.accountID = t1.accountID
WHERE l.refname like '%BHA%'
GROUP BY
account
,result
If the grouping isn't correct, just join to a sub-query to limit it with max date. Just change the table names as necessary.
SELECT
account,
date,
result
FROM table_1 as t1
JOIN table_2 as t2 ON t1.accountId = t2.frn_accountId
JOIN lskin as l on l.accountID = t1.accountID
INNER JOIN (select max(date) dt, accountID from table_1 group by accountID) tt on tt.dt = t1.date and tt.accountId = t1.accountId
WHERE l.refname like '%BHA%'
Ignore the CTE at the top. That's just test data.
/* CTE Test Data */
; WITH table_1 AS (
SELECT 1 AS accountID, 'acc1' AS account UNION ALL
SELECT 2 AS accountID, 'acc2' AS account UNION ALL
SELECT 3 AS accountID, 'acc3' AS account
)
, table_2 AS (
SELECT 1 AS frn_accountID, 'new1' AS result, GETDATE() AS [date] UNION ALL
SELECT 1 AS frn_accountID, 'mid1' AS result, GETDATE()-1 AS [date] UNION ALL
SELECT 1 AS frn_accountID, 'old1' AS result, GETDATE()-2 AS [date] UNION ALL
SELECT 2 AS frn_accountID, 'new2' AS result, GETDATE() AS [date] UNION ALL
SELECT 2 AS frn_accountID, 'mid2' AS result, GETDATE()-1 AS [date] UNION ALL
SELECT 2 AS frn_accountID, 'old2' AS result, GETDATE()-2 AS [date] UNION ALL
SELECT 3 AS frn_accountID, 'new3' AS result, GETDATE() AS [date] UNION ALL
SELECT 3 AS frn_accountID, 'mid3' AS result, GETDATE()-1 AS [date] UNION ALL
SELECT 3 AS frn_accountID, 'old3' AS result, GETDATE()-2 AS [date]
)
, lskin AS (
SELECT 1 AS accountID, 'purple' AS refName, 1 AS isActive UNION ALL
SELECT 2 AS accountID, 'blue' AS refName, 1 AS isActive UNION ALL
SELECT 3 AS accountID, 'orange' AS refName, 0 AS isActive UNION ALL
SELECT 4 AS accountID, 'blue' AS refName, 1 AS isActive
)
,
/* Just use the below and remove comment markers around WITH to build Orders CTE. */
/* ; WITH */
theCTE AS (
SELECT s1.accountID, s1.account, s1.result, s1.[date]
FROM (
SELECT t1.accountid, t1.account, t2.result, t2.[date], ROW_NUMBER() OVER (PARTITION BY t1.account ORDER BY t2.[date]) AS rn
FROM table_1 t1
INNER JOIN table_2 t2 ON t1.accountID = t2.frn_accountID
) s1
WHERE s1.rn = 1
)
SELECT lskin.accountID
FROM lskin
INNER JOIN theCTE ON theCTE.accountid = lskin.accountID
WHERE lskin.refName LIKE '%blue%'
AND lskin.isActive = 1
;
EDITED:
I'm still making a lot of assumptions about your data structure. And again, make sure you're querying what you need. CTEs are awesome, but you don't want to accidentally filter out expected results.

How to join first two child records to parent?

I am trying to select the top 25 parent records and join to it the first two child records ordered by date. The parent record can have 0 to n children.
The end result would be something like:
P1, C1, C2
P2, C1, C2
...
P25, C1, C2
I have found an example using max date, but I am having trouble getting a specific row number
select top 25 *
from parentTable p
left join childTable c
on p.Key = c.Key
and c.dateColumn = (
select Max(c.dateColumn)
from c
where p.Key = c.Key
)
You should be able to get what you are looking for using ROW_NUMBER():
WITH c AS
(
SELECT ROW_NUMBER() OVER (PARTITION BY [key] ORDER BY [date] asc) as rn
,[key]
,[date]
from child
)
SELECT top 25 p.[key], c1.[key], c1.[date], c2.[key], c2.[date]
FROM parent p
LEFT JOIN c c1
ON p.[key] = c1.[key]
AND c1.rn = 1
LEFT JOIN c c2
ON p.[key] = c2.[key]
AND c2.rn = 2
Check SQLFiddle for test data/results
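For a quick local check of the ROW_NUMBER() approach, here is a minimal sketch using Python's sqlite3 module instead of SQL Server. The tables `parent` and `child`, the columns `k` and `d`, and the rows are all made up for the example, and `LIMIT 25` plays the role of `TOP 25`.

```python
import sqlite3

# SQLite stands in for SQL Server; names and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE parent (k INTEGER);
CREATE TABLE child (k INTEGER, d TEXT);
INSERT INTO parent VALUES (1),(2),(3);
INSERT INTO child VALUES
  (1,'2021-01-03'),(1,'2021-01-01'),(1,'2021-01-02'),  -- three children
  (2,'2021-02-01');                                    -- one child; parent 3 has none
""")

rows = conn.execute("""
WITH c AS (
  -- number each parent's children by date, oldest first
  SELECT k, d, ROW_NUMBER() OVER (PARTITION BY k ORDER BY d) AS rn
  FROM child
)
SELECT p.k, c1.d, c2.d
FROM parent AS p
LEFT JOIN c AS c1 ON c1.k = p.k AND c1.rn = 1
LEFT JOIN c AS c2 ON c2.k = p.k AND c2.rn = 2
ORDER BY p.k
LIMIT 25
""").fetchall()

for row in rows:
    print(row)
```

Parents with fewer than two children still appear, with NULL (None in Python) in the missing child columns, because both joins are LEFT JOINs.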
You can use CTE.
;with cte as (
select top (25) *
from parentTable
)
select *
from cte p
left join childTable c
on p.[Key] = c.[Key];
First solution: because the data set seems to be small, one option is to use OUTER APPLY:
select top (25) ... columns ...
from parentTable p
outer apply (
select top (2) ... columns ...
from childTable c
where p.[Key] = c.[Key]
order by c.dateColumn desc -- asc ?
) a
-- Most of the times, when top filter is used order by clause should be also used
order by p.dateColumn desc -- asc ?
-- order by p.idColumn desc -- asc ?
Second solution (could be less efficient):
select top (25) ... columns ...
from parentTable p
left join (
select ... columns ..., ROW_NUMBER() over(partition by c.[Key] order by c.dateColumn desc) as rn -- asc ? (no TOP here: it would cap the derived table at two rows in total)
from childTable c
) a on p.[Key] = a.[Key] and a.rn < 3
-- Most of the times, when top filter is used order by clause should be also used
order by p.dateColumn desc -- asc ?
-- order by p.idColumn desc -- asc ?
Note: at least for the first solution one of following indices could help from the point of view of performance:
create index ix_name on dbo.childTable ([Key], [dateColumn])
--or
create index ix_name on dbo.childTable ([Key], [dateColumn])
include (... columns from select top(2) clause ...)

Combine two tables in SQL Server

I have two tables with the same number of rows.
Example:
table a:
1,A
2,B
3,C
table b:
AA,BB
AAA,BBB
AAAA,BBBB
I want a new table made like this in SQL Server:
1,A,AA,BB
2,B,AAA,BBB
3,C,AAAA,BBBB
How do I do that?
In SQL Server 2005 (or newer), you can use something like this:
-- test data setup
CREATE TABLE #tablea (ID INT, Val CHAR(1))
INSERT INTO #tablea VALUES(1, 'A'), (2, 'B'), (3, 'C')
CREATE TABLE #tableb (Val1 VARCHAR(10), Val2 VARCHAR(10))
INSERT INTO #tableb VALUES('AA', 'BB'),('AAA', 'BBB'), ('AAAA', 'BBBB')
-- define CTE for table A - sort by "ID" (I just assumed this - adapt if needed)
;WITH DataFromTableA AS
(
SELECT ID, Val, ROW_NUMBER() OVER(ORDER BY ID) AS RN
FROM #tablea
),
-- define CTE for table B - sort by "Val1" (I just assumed this - adapt if needed)
DataFromTableB AS
(
SELECT Val1, Val2, ROW_NUMBER() OVER(ORDER BY Val1) AS RN
FROM #tableb
)
-- create an INNER JOIN between the two CTE which just basically selected the data
-- from both tables and added a new column "RN" which gets a consecutive number for each row
SELECT
a.ID, a.Val, b.Val1, b.Val2
FROM
DataFromTableA a
INNER JOIN
DataFromTableB b ON a.RN = b.RN
This gives you the requested output: the rows (1, A, AA, BB), (2, B, AAA, BBB) and (3, C, AAAA, BBBB).
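The row-number pairing can be tried out locally with Python's sqlite3 module. This is a minimal sketch on SQLite rather than SQL Server, using the sample rows from the question, and it keeps the same assumed sort orders (ID for table a, Val1 for table b).

```python
import sqlite3

# SQLite stands in for SQL Server; rows are the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tablea (ID INTEGER, Val TEXT);
CREATE TABLE tableb (Val1 TEXT, Val2 TEXT);
INSERT INTO tablea VALUES (1,'A'),(2,'B'),(3,'C');
INSERT INTO tableb VALUES ('AA','BB'),('AAA','BBB'),('AAAA','BBBB');
""")

combined = conn.execute("""
WITH a AS (SELECT ID, Val, ROW_NUMBER() OVER (ORDER BY ID) AS rn FROM tablea),
     b AS (SELECT Val1, Val2, ROW_NUMBER() OVER (ORDER BY Val1) AS rn FROM tableb)
SELECT a.ID, a.Val, b.Val1, b.Val2
FROM a
JOIN b ON a.rn = b.rn   -- pair the n-th row of a with the n-th row of b
ORDER BY a.ID
""").fetchall()

for row in combined:
    print(row)
```

The pairing is only as meaningful as the two ORDER BY clauses: without a column that defines the intended order in each table, there is no reliable "n-th row" to match on.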
You could do a rank over the primary keys, then join on that rank:
SELECT T1.*, T2.*
FROM
(
SELECT RANK() OVER (ORDER BY table1.primaryKey) [rank], table1.* FROM table1
) AS T1
JOIN
(
SELECT RANK() OVER (ORDER BY table2.primaryKey) [rank], table2.* FROM table2
) AS T2 ON T1.[rank] = T2.[rank]
Your query is strange, but in Oracle you can do this:
select a.*, tb.*
from a
, ( select rownum rn, b.* from b ) tb -- temporary b - added rn column
where a.c1 = tb.rn -- assuming first column in a is called c1
If there is no column with numbers in a, you can do the same trick twice:
select ta.*, tb.*
from ( select rownum rn, a.* from a ) ta
, ( select rownum rn, b.* from b ) tb
where ta.rn = tb.rn
Note: be aware that this can generate random combination, for example
1 A AA BB
2 C A B
3 B AAA BBB
because there is no order by in ta and tb

How to find duplicate values in SQL Server

I'm using SQL Server 2008. I have a table
Customers
customer_number int
field1 varchar
field2 varchar
field3 varchar
field4 varchar
... and a lot more columns, that don't matter for my queries.
Column customer_number is pk. I'm trying to find duplicate values and some differences between them.
Please, help me to find all rows that have same
1) field1, field2, field3, field4
2) only 3 columns are equal and one of them isn't (except rows from list 1)
3) only 2 columns equal and two of them aren't (except rows from list 1 and list 2)
In the end, I'll have 3 tables with this results and additional groupId, which will be same for a group of similar (For example, for 3 column equals, rows that have 3 same columns equal will be a separate group)
Thank you.
Here's a handy query for finding duplicates in a table. Suppose you want to find all email addresses in a table that exist more than once:
SELECT email, COUNT(email) AS NumOccurrences
FROM users
GROUP BY email
HAVING ( COUNT(email) > 1 )
You could also use this technique to find rows that occur exactly once:
SELECT email
FROM users
GROUP BY email
HAVING ( COUNT(email) = 1 )
The easiest would probably be to write a stored procedure to iterate over each group of customers with duplicates and insert the matching ones per group number respectively.
However, I've thought about it and you can probably do this with a subquery. Hopefully I haven't made it more complicated than it ought to, but this should get you what you're looking for for the first table of duplicates (all four fields). Note that this is untested, so it might need a little tweaking.
Basically, it gets each group of fields where there are duplicates, a group number for each, then gets all customers with those fields and assigns the same group number.
INSERT INTO FourFieldsDuplicates(group_no, customer_no)
SELECT Groups.group_no, custs.customer_no
FROM (SELECT ROW_NUMBER() OVER(ORDER BY c.field1) AS group_no,
c.field1, c.field2, c.field3, c.field4
FROM Customers c
GROUP BY c.field1, c.field2, c.field3, c.field4
HAVING COUNT(*) > 1) Groups
INNER JOIN Customers custs ON custs.field1 = Groups.field1
AND custs.field2 = Groups.field2
AND custs.field3 = Groups.field3
AND custs.field4 = Groups.field4
The other ones are a bit more complicated, however, as you'll need to expand out the possibilities. The three-field groups would then be:
INSERT INTO ThreeFieldsDuplicates(group_no, customer_no)
SELECT Groups.group_no, custs.customer_no
FROM (SELECT ROW_NUMBER() OVER(ORDER BY GroupsInner.field1) AS group_no,
GroupsInner.field1, GroupsInner.field2,
GroupsInner.field3, GroupsInner.field4
FROM (SELECT c.field1, c.field2, c.field3, NULL AS field4
FROM Customers c
WHERE NOT EXISTS(SELECT d.customer_no
FROM FourFieldsDuplicates d
WHERE d.customer_no = c.customer_no)
GROUP BY c.field1, c.field2, c.field3
UNION ALL
SELECT c.field1, c.field2, NULL AS field3, c.field4
FROM Customers c
WHERE NOT EXISTS(SELECT d.customer_no
FROM FourFieldsDuplicates d
WHERE d.customer_no = c.customer_no)
GROUP BY c.field1, c.field2, c.field4
UNION ALL
SELECT c.field1, NULL AS field2, c.field3, c.field4
FROM Customers c
WHERE NOT EXISTS(SELECT d.customer_no
FROM FourFieldsDuplicates d
WHERE d.customer_no = c.customer_no)
GROUP BY c.field1, c.field3, c.field4
UNION ALL
SELECT NULL AS field1, c.field2, c.field3, c.field4
FROM Customers c
WHERE NOT EXISTS(SELECT d.customer_no
FROM FourFieldsDuplicates d
WHERE d.customer_no = c.customer_no)
GROUP BY c.field2, c.field3, c.field4) GroupsInner
GROUP BY GroupsInner.field1, GroupsInner.field2,
GroupsInner.field3, GroupsInner.field4
HAVING COUNT(*) > 1) Groups
INNER JOIN Customers custs ON (Groups.field1 IS NULL OR custs.field1 = Groups.field1)
AND (Groups.field2 IS NULL OR custs.field2 = Groups.field2)
AND (Groups.field3 IS NULL OR custs.field3 = Groups.field3)
AND (Groups.field4 IS NULL OR custs.field4 = Groups.field4)
Hopefully this produces the right results and I'll leave the last one as an exercise. :-D
I'm not sure if you require an equality check on different fields (like field1=field2).
Otherwise this might be enough.
Edit
Feel free to adjust the testdata to provide us with inputs that give a wrong output according to your specifications.
Test data
CREATE TABLE #Customers (
customer_number INTEGER IDENTITY(1, 1)
, field1 INTEGER
, field2 INTEGER
, field3 INTEGER
, field4 INTEGER)
INSERT INTO #Customers
SELECT 1, 1, 1, 1
UNION ALL SELECT 1, 1, 1, 1
UNION ALL SELECT 1, 1, 1, NULL
UNION ALL SELECT 1, 1, 1, 2
UNION ALL SELECT 1, 1, 1, 3
UNION ALL SELECT 2, 1, 1, 1
All Equal
SELECT ROW_NUMBER() OVER (ORDER BY c1.customer_number)
, c1.field1
, c1.field2
, c1.field3
, c1.field4
FROM #Customers c1
INNER JOIN #Customers c2 ON c2.customer_number > c1.customer_number
AND ISNULL(c2.field1, 0) = ISNULL(c1.field1, 0)
AND ISNULL(c2.field2, 0) = ISNULL(c1.field2, 0)
AND ISNULL(c2.field3, 0) = ISNULL(c1.field3, 0)
AND ISNULL(c2.field4, 0) = ISNULL(c1.field4, 0)
One field different
SELECT ROW_NUMBER() OVER (ORDER BY field1, field2, field3, field4)
, field1
, field2
, field3
, field4
FROM (
SELECT DISTINCT c1.field1
, c1.field2
, c1.field3
, field4 = NULL
FROM #Customers c1
INNER JOIN #Customers c2 ON c2.customer_number > c1.customer_number
AND c2.field1 = c1.field1
AND c2.field2 = c1.field2
AND c2.field3 = c1.field3
AND ISNULL(c2.field4, 0) <> ISNULL(c1.field4, 0)
UNION ALL
SELECT DISTINCT c1.field1
, c1.field2
, NULL
, c1.field4
FROM #Customers c1
INNER JOIN #Customers c2 ON c2.customer_number > c1.customer_number
AND c2.field1 = c1.field1
AND c2.field2 = c1.field2
AND ISNULL(c2.field3, 0) <> ISNULL(c1.field3, 0)
AND c2.field4 = c1.field4
UNION ALL
SELECT DISTINCT c1.field1
, NULL
, c1.field3
, c1.field4
FROM #Customers c1
INNER JOIN #Customers c2 ON c2.customer_number > c1.customer_number
AND c2.field1 = c1.field1
AND ISNULL(c2.field2, 0) <> ISNULL(c1.field2, 0)
AND c2.field3 = c1.field3
AND c2.field4 = c1.field4
UNION ALL
SELECT DISTINCT NULL
, c1.field2
, c1.field3
, c1.field4
FROM #Customers c1
INNER JOIN #Customers c2 ON c2.customer_number > c1.customer_number
AND ISNULL(c2.field1, 0) <> ISNULL(c1.field1, 0)
AND c2.field2 = c1.field2
AND c2.field3 = c1.field3
AND c2.field4 = c1.field4
) c
You can simply write something like this to count duplicate entries; I think it works:
use *DATABASE_NAME*
go
SELECT *YOUR_FIELD*, COUNT(*) AS dupes
FROM *YOUR_TABLE_NAME*
GROUP BY *YOUR_FIELD*
HAVING (COUNT(*) > 1)
Enjoy
There is a clean way of doing this with CUBE(), which will aggregate by all the possible combinations of columns
SELECT
field1,field2,field3,field4
,duplicate_row_count = COUNT(*)
,grp_id = GROUPING_ID(field1,field2,field3,field4)
INTO #duplicate_rows
FROM table_name
GROUP BY CUBE(field1,field2,field3,field4)
HAVING COUNT(*) > 1
AND GROUPING_ID(field1,field2,field3,field4) IN (0,1,2,4,8,3,5,6,9,10,12)
The numbers (0,1,2,4,8,3,5,6,9,10,12) are just the bitmasks (0000,0001,0010,0100,...,1010,1100) of the grouping sets that we care about-- those with 4, 3, or 2 matches.
Then join this back to the original table using a technique that treats NULLs in #duplicate_rows as wildcards
SELECT a.*
FROM table_name a
INNER JOIN #duplicate_rows b
ON NULLIF(b.field1,a.field1) IS NULL
AND NULLIF(b.field2,a.field2) IS NULL
AND NULLIF(b.field3,a.field3) IS NULL
AND NULLIF(b.field4,a.field4) IS NULL
--WHERE grp_id IN (0) --Use this for 4 matches
--WHERE grp_id IN (1,2,4,8) --Use this for 3 matches
--WHERE grp_id IN (3,5,6,9,10,12) --Use this for 2 matches
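The grp_id values used above can be derived rather than memorised. The sketch below recomputes them in plain Python, assuming the usual GROUPING_ID convention that a bit is 1 when the column is rolled up (not part of the grouping) and that the rightmost argument is the least significant bit. The helper name `grouping_id` is made up for the example.

```python
from itertools import combinations

columns = ["field1", "field2", "field3", "field4"]

def grouping_id(rolled_up):
    """Bitmask for one grouping set: bit is 1 where the column is rolled up."""
    n = len(columns)
    # leftmost column is the most significant bit, rightmost the least
    return sum(1 << (n - 1 - i) for i, col in enumerate(columns) if col in rolled_up)

# Map "number of columns that must match" -> list of grp_id values.
ids_by_matches = {}
for n_rolled in range(0, 3):          # roll up 0, 1 or 2 columns
    ids = sorted(grouping_id(set(combo)) for combo in combinations(columns, n_rolled))
    ids_by_matches[len(columns) - n_rolled] = ids

for matches, ids in ids_by_matches.items():
    print(matches, "matching columns ->", ids)
```

The three lists line up with the commented-out WHERE clauses: one value for four matching columns, four values for three, and six values for two.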
