How to find duplicate values in SQL Server - sql-server

I'm using SQL Server 2008. I have a table
Customers
customer_number int
field1 varchar
field2 varchar
field3 varchar
field4 varchar
... and a lot more columns, that don't matter for my queries.
Column customer_number is the primary key. I'm trying to find duplicate values and the differences between near-duplicates.
Please help me find all rows where:
1) field1, field2, field3 and field4 are all equal
2) exactly 3 of the columns are equal and one isn't (excluding rows from list 1)
3) exactly 2 of the columns are equal and two aren't (excluding rows from lists 1 and 2)
In the end, I'll have 3 tables with these results and an additional groupId, which will be the same for each group of similar rows (for example, in the three-column table, rows that share the same three column values form a separate group).
Thank you.

Here's a handy query for finding duplicates in a table. Suppose you want to find all email addresses in a table that exist more than once:
SELECT email, COUNT(email) AS NumOccurrences
FROM users
GROUP BY email
HAVING ( COUNT(email) > 1 )
You could also use this technique to find rows that occur exactly once:
SELECT email
FROM users
GROUP BY email
HAVING ( COUNT(email) = 1 )
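To try both queries without a SQL Server instance, here is a minimal sketch that runs them against an in-memory SQLite database from Python. The `users` rows are made-up sample data, and SQLite is only a stand-in; the GROUP BY / HAVING syntax is the same.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for SQL Server here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [
        ("a@example.com",),
        ("b@example.com",),
        ("a@example.com",),
        ("c@example.com",),
        ("a@example.com",),
    ],
)

# Emails that occur more than once, with their counts.
duplicates = conn.execute(
    "SELECT email, COUNT(email) AS NumOccurrences "
    "FROM users GROUP BY email HAVING COUNT(email) > 1 ORDER BY email"
).fetchall()

# Emails that occur exactly once.
singles = conn.execute(
    "SELECT email FROM users GROUP BY email HAVING COUNT(email) = 1 ORDER BY email"
).fetchall()

print(duplicates)
print(singles)
```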

The easiest would probably be to write a stored procedure to iterate over each group of customers with duplicates and insert the matching ones per group number respectively.
However, I've thought about it and you can probably do this with a subquery. Hopefully I haven't made it more complicated than it ought to be, but this should get you what you're looking for in the first table of duplicates (all four fields). Note that this is untested, so it might need a little tweaking.
Basically, it gets each group of fields where there are duplicates, a group number for each, then gets all customers with those fields and assigns the same group number.
INSERT INTO FourFieldsDuplicates(group_no, customer_no)
SELECT Groups.group_no, custs.customer_number
FROM (SELECT ROW_NUMBER() OVER(ORDER BY c.field1) AS group_no,
c.field1, c.field2, c.field3, c.field4
FROM Customers c
GROUP BY c.field1, c.field2, c.field3, c.field4
HAVING COUNT(*) > 1) Groups
INNER JOIN Customers custs ON custs.field1 = Groups.field1
AND custs.field2 = Groups.field2
AND custs.field3 = Groups.field3
AND custs.field4 = Groups.field4
The other ones are a bit more complicated, however, as you'll need to expand out the possibilities. Each branch below keeps only the three-field combinations that occur more than once, and customers already in FourFieldsDuplicates are excluded. The three-field groups would then be:
INSERT INTO ThreeFieldsDuplicates(group_no, customer_no)
SELECT Groups.group_no, custs.customer_number
FROM (SELECT ROW_NUMBER() OVER(ORDER BY GroupsInner.field1) AS group_no,
GroupsInner.field1, GroupsInner.field2,
GroupsInner.field3, GroupsInner.field4
FROM (SELECT c.field1, c.field2, c.field3, NULL AS field4
FROM Customers c
WHERE NOT EXISTS(SELECT d.customer_no
FROM FourFieldsDuplicates d
WHERE d.customer_no = c.customer_number)
GROUP BY c.field1, c.field2, c.field3
HAVING COUNT(*) > 1
UNION ALL
SELECT c.field1, c.field2, NULL AS field3, c.field4
FROM Customers c
WHERE NOT EXISTS(SELECT d.customer_no
FROM FourFieldsDuplicates d
WHERE d.customer_no = c.customer_number)
GROUP BY c.field1, c.field2, c.field4
HAVING COUNT(*) > 1
UNION ALL
SELECT c.field1, NULL AS field2, c.field3, c.field4
FROM Customers c
WHERE NOT EXISTS(SELECT d.customer_no
FROM FourFieldsDuplicates d
WHERE d.customer_no = c.customer_number)
GROUP BY c.field1, c.field3, c.field4
HAVING COUNT(*) > 1
UNION ALL
SELECT NULL AS field1, c.field2, c.field3, c.field4
FROM Customers c
WHERE NOT EXISTS(SELECT d.customer_no
FROM FourFieldsDuplicates d
WHERE d.customer_no = c.customer_number)
GROUP BY c.field2, c.field3, c.field4
HAVING COUNT(*) > 1) GroupsInner) Groups
INNER JOIN Customers custs ON (Groups.field1 IS NULL OR custs.field1 = Groups.field1)
AND (Groups.field2 IS NULL OR custs.field2 = Groups.field2)
AND (Groups.field3 IS NULL OR custs.field3 = Groups.field3)
AND (Groups.field4 IS NULL OR custs.field4 = Groups.field4)
WHERE NOT EXISTS(SELECT d.customer_no
FROM FourFieldsDuplicates d
WHERE d.customer_no = custs.customer_number)
Hopefully this produces the right results and I'll leave the last one as an exercise. :-D
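Since the queries in this answer are untested, here is a small runnable sketch of the four-field case only, using Python's sqlite3 as a stand-in for SQL Server (it needs SQLite 3.25 or later for window functions). The sample rows are invented, and the ROW_NUMBER is computed in an extra nesting level purely to keep the sketch portable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customers (customer_number INTEGER PRIMARY KEY, "
    "field1 TEXT, field2 TEXT, field3 TEXT, field4 TEXT)"
)
# Hypothetical rows: customers 1+2 and 4+5 are full duplicates.
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?, ?, ?)",
    [
        (1, "a", "b", "c", "d"),
        (2, "a", "b", "c", "d"),
        (3, "a", "b", "c", "x"),
        (4, "p", "q", "r", "s"),
        (5, "p", "q", "r", "s"),
        (6, "z", "z", "z", "z"),
    ],
)

query = """
SELECT g.group_no, c.customer_number
FROM (
    -- number each duplicated combination of the four fields
    SELECT ROW_NUMBER() OVER (ORDER BY field1, field2, field3, field4) AS group_no,
           field1, field2, field3, field4
    FROM (
        SELECT field1, field2, field3, field4
        FROM Customers
        GROUP BY field1, field2, field3, field4
        HAVING COUNT(*) > 1
    ) AS dup
) AS g
JOIN Customers AS c
  ON c.field1 = g.field1 AND c.field2 = g.field2
 AND c.field3 = g.field3 AND c.field4 = g.field4
ORDER BY g.group_no, c.customer_number
"""
four_field_groups = conn.execute(query).fetchall()
print(four_field_groups)
```

Each tuple is (group_no, customer_number); customers 3 and 6 do not appear because their four-field combination is unique.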

I'm not sure if you require an equality check on different fields (like field1=field2).
Otherwise this might be enough.
Edit
Feel free to adjust the testdata to provide us with inputs that give a wrong output according to your specifications.
Test data
CREATE TABLE #Customers (
customer_number INTEGER IDENTITY(1, 1)
, field1 INTEGER
, field2 INTEGER
, field3 INTEGER
, field4 INTEGER)
INSERT INTO #Customers
SELECT 1, 1, 1, 1
UNION ALL SELECT 1, 1, 1, 1
UNION ALL SELECT 1, 1, 1, NULL
UNION ALL SELECT 1, 1, 1, 2
UNION ALL SELECT 1, 1, 1, 3
UNION ALL SELECT 2, 1, 1, 1
All Equal
SELECT ROW_NUMBER() OVER (ORDER BY c1.customer_number)
, c1.field1
, c1.field2
, c1.field3
, c1.field4
FROM #Customers c1
INNER JOIN #Customers c2 ON c2.customer_number > c1.customer_number
AND ISNULL(c2.field1, 0) = ISNULL(c1.field1, 0)
AND ISNULL(c2.field2, 0) = ISNULL(c1.field2, 0)
AND ISNULL(c2.field3, 0) = ISNULL(c1.field3, 0)
AND ISNULL(c2.field4, 0) = ISNULL(c1.field4, 0)
One field different
SELECT ROW_NUMBER() OVER (ORDER BY field1, field2, field3, field4)
, field1
, field2
, field3
, field4
FROM (
SELECT DISTINCT c1.field1
, c1.field2
, c1.field3
, field4 = NULL
FROM #Customers c1
INNER JOIN #Customers c2 ON c2.customer_number > c1.customer_number
AND c2.field1 = c1.field1
AND c2.field2 = c1.field2
AND c2.field3 = c1.field3
AND ISNULL(c2.field4, 0) <> ISNULL(c1.field4, 0)
UNION ALL
SELECT DISTINCT c1.field1
, c1.field2
, NULL
, c1.field4
FROM #Customers c1
INNER JOIN #Customers c2 ON c2.customer_number > c1.customer_number
AND c2.field1 = c1.field1
AND c2.field2 = c1.field2
AND ISNULL(c2.field3, 0) <> ISNULL(c1.field3, 0)
AND c2.field4 = c1.field4
UNION ALL
SELECT DISTINCT c1.field1
, NULL
, c1.field3
, c1.field4
FROM #Customers c1
INNER JOIN #Customers c2 ON c2.customer_number > c1.customer_number
AND c2.field1 = c1.field1
AND ISNULL(c2.field2, 0) <> ISNULL(c1.field2, 0)
AND c2.field3 = c1.field3
AND c2.field4 = c1.field4
UNION ALL
SELECT DISTINCT NULL
, c1.field2
, c1.field3
, c1.field4
FROM #Customers c1
INNER JOIN #Customers c2 ON c2.customer_number > c1.customer_number
AND ISNULL(c2.field1, 0) <> ISNULL(c1.field1, 0)
AND c2.field2 = c1.field2
AND c2.field3 = c1.field3
AND c2.field4 = c1.field4
) c

You can simply write something like this to count duplicate entries; I think it works:
use *DATABASE_NAME*
go
SELECT *YOUR_FIELD*, COUNT(*) AS dupes
FROM *YOUR_TABLE_NAME*
GROUP BY *YOUR_FIELD*
HAVING (COUNT(*) > 1)
Enjoy

There is a clean way of doing this with CUBE(), which will aggregate by all the possible combinations of columns
SELECT
field1,field2,field3,field4
,duplicate_row_count = COUNT(*)
,grp_id = GROUPING_ID(field1,field2,field3,field4)
INTO #duplicate_rows
FROM table_name
GROUP BY CUBE(field1,field2,field3,field4)
HAVING COUNT(*) > 1
AND GROUPING_ID(field1,field2,field3,field4) IN (0,1,2,4,8,3,5,6,9,10,12)
The numbers (0,1,2,4,8,3,5,6,9,10,12) are just the bitmasks (0000,0001,0010,0100,...,1010,1100) of the grouping sets that we care about: those with 4, 3, or 2 matches. A bit set to 1 means that column was aggregated away, so it is not part of the match.
Then join this back to the original table using a technique that treats NULLs in #duplicate_rows as wildcards
SELECT a.*
FROM table_name a
INNER JOIN #duplicate_rows b
ON NULLIF(b.field1,a.field1) IS NULL
AND NULLIF(b.field2,a.field2) IS NULL
AND NULLIF(b.field3,a.field3) IS NULL
AND NULLIF(b.field4,a.field4) IS NULL
--WHERE grp_id IN (0) --Use this for 4 matches
--WHERE grp_id IN (1,2,4,8) --Use this for 3 matches
--WHERE grp_id IN (3,5,6,9,10,12) --Use this for 2 matches
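The grp_id lists used above can be derived rather than memorised. This is a small Python sketch (the helper names are made up for illustration) that enumerates every GROUPING_ID value for four columns and buckets it by how many columns remain in the grouping set, assuming the usual convention that the first argument of GROUPING_ID is the highest bit.

```python
# Columns in the order they are passed to GROUPING_ID(field1, ..., field4).
FIELDS = ["field1", "field2", "field3", "field4"]


def masks_by_match_count(n_fields):
    """Map 'number of columns still grouped' -> list of GROUPING_ID values."""
    result = {}
    for mask in range(2 ** n_fields):
        # Each 1 bit is a column that was aggregated away.
        matched = n_fields - bin(mask).count("1")
        result.setdefault(matched, []).append(mask)
    return result


def grouped_columns(mask):
    """Columns that are still part of the grouping set for this mask."""
    n = len(FIELDS)
    return [f for i, f in enumerate(FIELDS) if not (mask >> (n - 1 - i)) & 1]


groups = masks_by_match_count(len(FIELDS))
print(groups[4], groups[3], groups[2])
print(grouped_columns(1), grouped_columns(12))
```

For example, grp_id 1 (0001) means field4 was aggregated away, so the rows agree on field1, field2 and field3.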

Related

When joining tables adapt the "on" statement depending on the query results

I have 2 tables:
Table_1 with columns col_A, col_B , col_C , col_D , col_E
Table_2 with columns col_A, col_B , col_C , col_D , col_F
I would like to join them on columns col_A, col_B , col_C , col_D.
For the rows in Table_1 that do not get joined this way (as they don't have a match in Table_2), I would like to join them only on columns col_A, col_B , col_C.
If there are still rows in Table_1 that did not get joined, i would like to join them only on columns col_A, col_B.
And once that is done and there are still rows in Table_1 that did not get joined, i would like to join them only on column col_A.
I wrote the following script where i use a new table to get this result.
Is there is a more efficient way to do this? Preferably by creating a view, not a table?
create table new_table (col_A nvarchar(50) , col_B nvarchar(50) , col_C nvarchar(50)
, col_D nvarchar(50) , col_E nvarchar(50) , col_F nvarchar(50) )
go
insert into new_table
select Table_1.* , Table_2.col_F
from Table_1
inner join Table_2
on Table_1.col_A=Table_2.col_A
and Table_1.col_B=Table_2.col_B
and Table_1.col_C=Table_2.col_C
and Table_1.col_D=Table_2.col_D
go
insert into new_table
select Table_1.* , Table_2.col_F
from Table_1
inner join Table_2
on Table_1.col_A=Table_2.col_A
and Table_1.col_B=Table_2.col_B
and Table_1.col_C=Table_2.col_C
where concat (Table_1.col_A, Table_1.col_B , Table_1.col_C , Table_1.col_D , Table_1.col_E)
not in (select concat (col_A, col_B , col_C , col_D , col_E) from new_table)
go
insert into new_table
select Table_1.* , Table_2.col_F
from Table_1
inner join Table_2
on Table_1.col_A=Table_2.col_A
and Table_1.col_B=Table_2.col_B
where concat (Table_1.col_A, Table_1.col_B , Table_1.col_C , Table_1.col_D , Table_1.col_E)
not in (select concat (col_A, col_B , col_C , col_D , col_E) from new_table)
go
insert into new_table
select Table_1.* , Table_2.col_F
from Table_1
inner join Table_2
on Table_1.col_A=Table_2.col_A
where concat (Table_1.col_A, Table_1.col_B , Table_1.col_C , Table_1.col_D , Table_1.col_E)
not in (select concat (col_A, col_B , col_C , col_D , col_E) from new_table)
go
You could join them on just col_A (abbreviated to A below, with the tables as t1 and t2) and then score each joined row by which of the other columns also match:
WITH cte AS(
SELECT
CASE WHEN t1.D = t2.D THEN 100 ELSE 0 END +
CASE WHEN t1.C = t2.C THEN 10 ELSE 0 END +
CASE WHEN t1.B = t2.B THEN 1 ELSE 0 END as whatMatched,
*
FROM
t1 JOIN t2 on t1.A = t2.A
)
Now if a row scores 111 we know that all of them (ABCD) matched; if it scores 0 then only A matched, and so on.
So we can ask for only some rows:
SELECT * FROM cte WHERE whatmatched IN (111,11,1,0)
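And lastly, if there were multiples (matching on just A might mean there are duplicates), we can assign a row number to them in descending order of match quality and only take the first row. Note that to get one match per t1 row, rather than one row overall, the ROW_NUMBER needs a PARTITION BY on the columns that identify the t1 row: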
SELECT x.* FROM
(SELECT *, ROW_NUMBER() OVER(ORDER BY whatmatched DESC) rown FROM cte WHERE whatmatched IN (111,11,1,0)) x
WHERE x.rown = 1
If it suits you better to use letters
we can assess the matches, choose only A, AB, ABC, or ABCD, then pick the most specific one by looking at the LENgth of the match string:
WITH cte AS(
SELECT
'A' +
CASE WHEN t1.B = t2.B THEN 'B' ELSE '' END +
CASE WHEN t1.C = t2.C THEN 'C' ELSE '' END +
CASE WHEN t1.D = t2.D THEN 'D' ELSE '' END as whatMatched,
*
FROM
t1 JOIN t2 on t1.A = t2.A
)
SELECT x.* FROM
(SELECT *, ROW_NUMBER() OVER(ORDER BY LEN(whatmatched) DESC) rown FROM cte WHERE whatmatched IN ('A','AB','ABC','ABCD')) x
WHERE x.rown = 1
If you want ties (i.e. a row from t1 that matches two rows from t2 because their A/B/C is the same and D differs), use DENSE_RANK instead of ROW_NUMBER so they end up tied for first place.
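As a runnable illustration of the "join on col_A, then keep the most specific match" idea, here is a sketch using Python's sqlite3 as a stand-in for SQL Server (SQLite 3.25+ for window functions). It is a variant, not the exact query above: instead of whitelisting scores, it computes how many leading columns match (1 = col_A only, 4 = col_A to col_D), and it partitions by the Table_1 row so that every Table_1 row gets its own best match. The sample data is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table_1 (col_A TEXT, col_B TEXT, col_C TEXT, col_D TEXT, col_E TEXT);
CREATE TABLE Table_2 (col_A TEXT, col_B TEXT, col_C TEXT, col_D TEXT, col_F TEXT);
INSERT INTO Table_1 VALUES
  ('a1', 'b1', 'c1', 'd1', 'e1'),
  ('a2', 'b2', 'c2', 'd2', 'e2'),
  ('a3', 'b3', 'c3', 'd3', 'e3');
INSERT INTO Table_2 VALUES
  ('a1', 'b1', 'c1', 'd1', 'f_full'),
  ('a1', 'b1', 'c1', 'zz', 'f_abc'),
  ('a2', 'b2', 'zz', 'd2', 'f_ab'),
  ('a2', 'zz', 'zz', 'zz', 'f_a'),
  ('a3', 'zz', 'c3', 'd3', 'f_a_only');
""")

query = """
WITH matched AS (
    SELECT t1.col_A, t1.col_B, t1.col_C, t1.col_D, t1.col_E, t2.col_F,
           -- depth = number of leading columns (A, B, C, D) that match
           CASE WHEN t1.col_B <> t2.col_B THEN 1
                WHEN t1.col_C <> t2.col_C THEN 2
                WHEN t1.col_D <> t2.col_D THEN 3
                ELSE 4 END AS depth
    FROM Table_1 t1
    JOIN Table_2 t2 ON t1.col_A = t2.col_A
),
ranked AS (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY col_A, col_B, col_C, col_D, col_E
               ORDER BY depth DESC
           ) AS rown
    FROM matched
)
SELECT col_A, col_E, col_F, depth
FROM ranked
WHERE rown = 1
ORDER BY col_A
"""
best = conn.execute(query).fetchall()
print(best)
```

Because this is a plain SELECT, it could be wrapped in a view instead of filling a new table. Swapping ROW_NUMBER for DENSE_RANK would keep all equally good matches.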

BigQuery LEFT JOIN a table and filter its array elements based on conditions

I want to join a table to another table containing arrays and in the joined result I want to have only the array elements which pass a condition. In this case a date condition.
The code snippet below illustrates my problem. I want the output to contain only ids with record_dates less than '2019-10-15'
WITH platform AS (
SELECT 'u1' AS id, 'm1' AS platform_id, '2019-10-12' as record_date
UNION ALL
SELECT 'u2' AS id, 'm1' AS platform_id, '2019-10-13' as record_date
UNION ALL
SELECT 'u21' AS id, 'm1' AS platform_id, '2019-10-16' as record_date
),
platform_agg AS (
SELECT platform_id
, ARRAY_AGG(id) as ids
, ARRAY_AGG(record_date) as record_dates
FROM platform
GROUP BY platform_id
),
orders AS(
SELECT 'u2' AS id, 'c1' AS order_id, '2019-10-15' as order_date
),
orders_plus_platform AS (
SELECT order_id
, orders.id
, orders.order_date
, platform.platform_id
, CASE WHEN platform.platform_id IS NOT NULL THEN platform_agg.ids ELSE [orders.id] END AS ids
, CASE WHEN platform.platform_id IS NOT NULL THEN platform_agg.record_dates ELSE NULL END AS record_dates
FROM orders
LEFT JOIN platform
ON orders.id = platform.id and platform.record_date <= orders.order_date
LEFT JOIN platform_agg
ON platform.platform_id = platform_agg.platform_id
)
SELECT * FROM orders_plus_platform
In the current query output the u21 element is still present; in the desired output it should be filtered out, as its record_date is after '2019-10-15'.
Thank you,
The solution below worked for me. Basically you join twice to the platform table to get all the ids associated with a platform, instead of joining to a pre-aggregated version of it. This way you can more easily apply filters.
orders_plus_platform AS (
SELECT order_id
, orders.id
, orders.order_date
, platform.platform_id
, ARRAY_AGG(CASE WHEN platform.platform_id IS NOT NULL THEN platform2.id ELSE orders.id END) AS ids
, ARRAY_AGG(CASE WHEN platform.platform_id IS NOT NULL THEN platform2.record_date ELSE NULL END) AS record_dates
FROM orders
LEFT JOIN platform
ON orders.id = platform.id and platform.record_date <= orders.order_date
LEFT JOIN platform platform2
ON platform.platform_id = platform2.platform_id AND platform2.record_date <= orders.order_date
GROUP BY
order_id
, orders.id
, orders.order_date
, platform.platform_id
)
You can use subqueries in your WHERE clause. Subqueries can run on unnested arrays and return a boolean value - e.g. count of dates < something should be more than zero:
SELECT c_id
, c.id
, c.c_date
, cxd.record_id
, CASE WHEN cxd.record_id IS NOT NULL THEN rd_agg.ids ELSE [c.id] END AS ids
, CASE WHEN cxd.record_id IS NOT NULL THEN rd_agg.record_dates ELSE NULL END AS record_dates
FROM c
LEFT JOIN record_ids cxd
ON c.id = cxd.id and cxd.record_date <= c.c_date
LEFT JOIN record_ids_agg rd_agg
ON cxd.record_id = rd_agg.record_id
WHERE (SELECT COUNT(1)>0 FROM UNNEST(record_dates) AS r WHERE r < '2019-10-15')

Sql find records in join table which match filter exactly

Best to explain with an example :)
Let's say I have the following tables
CREATE TABLE dbo.Customer (
CustomerId INT PRIMARY KEY,
Name NVARCHAR(50)
)
CREATE TABLE dbo.ShoppingBasket (
ShoppingBasketId INT PRIMARY KEY,
CustomerId INT NOT NULL FOREIGN KEY REFERENCES dbo.Customer(CustomerId),
ItemName NVARCHAR(50)
)
Example data
INSERT INTO dbo.Customer
VALUES (1, 'Steve'), (2, 'Bucky')
INSERT INTO dbo.ShoppingBasket
VALUES (1, 1, 'Banana'), (2, 1, 'Orange'), (3, 2, 'Orange')
Now, I want to find all customers that have exactly a Banana and an Orange in their shopping basket. So in the case above, it should return Steve only, since Bucky has only an Orange.
The following query works for this
SELECT *
FROM dbo.Customer AS c
WHERE EXISTS (
SELECT 1
FROM dbo.ShoppingBasket AS b
WHERE b.CustomerId = c.CustomerId
AND b.ItemName IN ('Banana', 'Orange')
GROUP BY CustomerId
HAVING COUNT(CustomerId) = 2
)
That's fine. Now, if I want all customers that only have an Orange, the above query fails since
SELECT *
FROM dbo.Customer AS c
WHERE EXISTS (
SELECT 1
FROM dbo.ShoppingBasket AS b
WHERE b.CustomerId = c.CustomerId
AND b.ItemName = 'Orange'
GROUP BY CustomerId
HAVING COUNT(CustomerId) = 1
)
filters the shopping basket rows first and only then applies the GROUP BY and HAVING clauses. Thus both Steve and Bucky are returned, whereas only Bucky should be returned.
Could someone point me in the right direction to find such a query? I suppose I can always do another NOT EXISTS inside the EXISTS subquery, to make sure no other items are found. E.g.
SELECT *
FROM dbo.Customer AS c
WHERE EXISTS (
SELECT 1
FROM dbo.ShoppingBasket AS b
WHERE b.CustomerId = c.CustomerId
AND b.ItemName = 'Orange'
AND NOT EXISTS (
SELECT 1
FROM dbo.ShoppingBasket AS b2
WHERE b2.CustomerId = b.CustomerId
AND b2.ItemName <> 'Orange'
)
)
But I was wondering if there's a more elegant way to handle it, one that preferably doesn't do an extra, negated join on the same table.
You should count the distinct ItemName values instead of the CustomerId, e.g.:
select c.*
from dbo.Customer c
inner join(
select CustomerId, count(distinct ItemName) count_name
from ShoppingBasket
where ItemName IN ('Banana', 'Orange')
group by CustomerId
having count(distinct ItemName) = 2
) t on t.CustomerId = c.CustomerId
If you also need to check the total number of distinct item names in the basket, you can compose the inner join in two parts:
select distinct c.*
from dbo.Customer c
inner join(
select b.CustomerId
from ShoppingBasket b
INNER JOIN (
Select CustomerId, count(distinct ItemName) count_name
from ShoppingBasket
group by CustomerId
having count(distinct ItemName) = 2
) t2 ON t2.CustomerId = b.CustomerId
where b.ItemName IN ('Banana', 'Orange')
) t on t.CustomerId = c.CustomerId
and for Orange:
select distinct c.*
from dbo.Customer c
inner join(
select b.CustomerId
from ShoppingBasket b
INNER JOIN (
Select CustomerId, count(distinct ItemName) count_name
from ShoppingBasket
group by CustomerId
having count(distinct ItemName) = 1
) t2 ON t2.CustomerId = b.CustomerId
where b.ItemName IN ('Orange')
) t on t.CustomerId = c.CustomerId
The problem is that the IN clause is ambiguous, because it is also true for a customer whose basket contains only one of the items.
So instead of an IN clause (equivalent to OR) you should use AND conditions, restricted to customers whose number of distinct item names equals the number you are looking for:
Select a.CustomerId
from ShoppingBasket a
inner join ShoppingBasket b on b.CustomerId = a.CustomerId
and a.ItemName = 'Orange' and b.ItemName = 'Banana'
and a.CustomerId IN (
Select CustomerId
from ShoppingBasket
group by CustomerId
having count(distinct ItemName) = 2
)
I like to do this with group by and having. If you want both "banana" and "orange":
select sb.customerId
from dbo.ShoppingBasket sb
where sb.itemName in ('banana', 'orange')
group by sb.customerId
having count(distinct itemName) = 2; -- has both
If you want the two items and nothing else, then use this more general form (note there is no where clause, so that other items stay visible to the having clause):
select sb.customerId
from dbo.ShoppingBasket sb
group by sb.customerId
having sum(case when sb.itemName = 'banana' then 1 else 0 end) > 0 and
sum(case when sb.itemName = 'orange' then 1 else 0 end) > 0 and
sum(case when sb.itemName not in ('orange', 'banana') then 1 else 0 end) = 0 ;
You can extend this version easily. Each item gets its own sum(). You can also include multiple items to support, say, "banana and (orange or clementine)".
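Here is a minimal runnable sketch of the "these items and nothing else" check, using Python's sqlite3 with the sample data from the question (SQLite is only a stand-in for SQL Server). The helper `customers_with_exactly` is a made-up name; it counts the distinct wanted items and requires zero unwanted rows, all in one grouped pass.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (CustomerId INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE ShoppingBasket (
    ShoppingBasketId INTEGER PRIMARY KEY,
    CustomerId INTEGER NOT NULL REFERENCES Customer(CustomerId),
    ItemName TEXT
);
INSERT INTO Customer VALUES (1, 'Steve'), (2, 'Bucky');
INSERT INTO ShoppingBasket VALUES (1, 1, 'Banana'), (2, 1, 'Orange'), (3, 2, 'Orange');
""")


def customers_with_exactly(items):
    """Names of customers whose basket holds every item in `items` and no others."""
    placeholders = ", ".join("?" for _ in items)
    query = f"""
    SELECT c.Name
    FROM Customer c
    JOIN ShoppingBasket sb ON sb.CustomerId = c.CustomerId
    GROUP BY c.CustomerId, c.Name
    HAVING COUNT(DISTINCT CASE WHEN sb.ItemName IN ({placeholders})
                               THEN sb.ItemName END) = ?
       AND SUM(CASE WHEN sb.ItemName NOT IN ({placeholders}) THEN 1 ELSE 0 END) = 0
    ORDER BY c.Name
    """
    # Parameters follow the order of the ? markers: wanted items, count, wanted items.
    params = [*items, len(set(items)), *items]
    return [row[0] for row in conn.execute(query, params)]


print(customers_with_exactly(["Banana", "Orange"]))
print(customers_with_exactly(["Orange"]))
```

No WHERE filter on ItemName is used here, because filtering first would hide the extra items that the second HAVING condition needs to see.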

Unexpected result using CTE to perform a random join on two tables for all rows one-to-many

I am attempting to randomly join the rows of two tables (TableA and TableB) such that each row in TableA is joined to only one row in TableB and every row in TableB is joined to at least one row in TableA.
For example, a random join of TableA with 5 distinct rows and TableB with 3 distinct rows should result in something like this:
TableA TableB
1 3
2 1
3 1
4 2
5 1
However, sometimes not all the rows from TableB are included in the final result; so in the example above might have row 2 from TableB missing because in its place is either row 1 or 3 joined to row 4 on TableA. You can see this occur by executing the script a number of times and checking the result. It seems that it is necessary for some reason to use an interim table (#Q) to be able to ensure that a correct result is returned which has all rows from both TableA and TableB.
Can someone please explain why this is happening?
Also, can someone please advise on what would be a better way to get the desired result?
I understand that sometimes no result is returned at all, due to some failure in the cross apply and ordering that I have yet to identify, which makes me fairly sure there is a better way to perform this operation. I hope that makes sense. Thanks in advance!
create table #TableA (
ID int
);
create table #TableB (
ID int
);
create table #Q (
RN int,
TableAID int,
TableBID int
);
with cte as (
select
1 as ID
union all
select
ID + 1
from cte
where ID < 5
)
insert #TableA (ID)
select ID from cte;
with cte as (
select
1 as ID
union all
select
ID + 1
from cte
where ID < 3
)
insert #TableB (ID)
select ID from cte;
select * from #TableA;
select * from #TableB;
with cte as (
select
row_number() over (partition by TableAID order by newid()) as RN,
TableAID,
TableBID
from (
select
a.ID as TableAID,
b.ID as TableBID
from #TableA as a
cross apply #TableB as b
) as M
)
select --All rows from TableB not always included
TableAID,
TableBID
from cte
where RN in (
select
top 1
iCTE.RN
from cte as iCTE
group by iCTE.RN
having count(distinct iCTE.TableBID) = (
select count(1) from #TableB
)
)
order by TableAID;
with cte as (
select
row_number() over (partition by TableAID order by newid()) as RN,
TableAID,
TableBID
from (
select
a.ID as TableAID,
b.ID as TableBID
from #TableA as a
cross apply #TableB as b
) as M
)
insert #Q
select
RN,
TableAID,
TableBID
from cte;
select * from #Q;
select --All rows from both TableA and TableB included
TableAID,
TableBID
from #Q
where RN in (
select
top 1
iQ.RN
from #Q as iQ
group by iQ.RN
having count(distinct iQ.TableBID) = (
select count(1) from #TableB
)
)
order by TableAID;
See if this gives you what you're looking for...
DECLARE
@CountA INT = (SELECT COUNT(*) FROM #TableA ta),
@CountB INT = (SELECT COUNT(*) FROM #TableB tb),
@MinCount INT;
SELECT @MinCount = CASE WHEN @CountA < @CountB THEN @CountA ELSE @CountB END;
WITH
cte_A1 AS (
SELECT
*,
rn = ROW_NUMBER() OVER (ORDER BY NEWID())
FROM
#TableA ta
),
cte_B1 AS (
SELECT
*,
rn = ROW_NUMBER() OVER (ORDER BY NEWID())
FROM
#TableB tb
),
cte_A2 AS (
SELECT
a1.ID,
rn = CASE WHEN a1.rn > @MinCount THEN a1.rn - @MinCount ELSE a1.rn end
FROM
cte_A1 a1
),
cte_B2 AS (
SELECT
b1.ID,
rn = CASE WHEN b1.rn > @MinCount THEN b1.rn - @MinCount ELSE b1.rn end
FROM
cte_B1 b1
)
SELECT
A = a.ID,
B = b.ID
FROM
cte_A2 a
JOIN cte_B2 b
ON a.rn = b.rn;
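A runnable sketch of the same "shuffle both sides, then wrap the row numbers" idea, using Python's sqlite3 as a stand-in for SQL Server (SQLite 3.25+). Two things differ from the query above and are assumptions of this sketch: the wrap uses a modulo, so it also covers the case where TableA has more than twice as many rows as TableB, and the shuffled row numbers are stored in temp tables (`a_rn` and `b_rn`, hypothetical names) so each random order is computed exactly once. That last point is probably also why the interim table in the question behaves differently: a CTE that is referenced twice can be evaluated twice, and NEWID() then produces a different order each time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableA (ID INTEGER)")
conn.execute("CREATE TABLE TableB (ID INTEGER)")
conn.executemany("INSERT INTO TableA VALUES (?)", [(i,) for i in range(1, 8)])  # 7 rows
conn.executemany("INSERT INTO TableB VALUES (?)", [(i,) for i in range(1, 4)])  # 3 rows

# Freeze one random order per table so later reads see the same numbering.
conn.execute(
    "CREATE TEMP TABLE a_rn AS "
    "SELECT ID, ROW_NUMBER() OVER (ORDER BY random()) AS rn FROM TableA"
)
conn.execute(
    "CREATE TEMP TABLE b_rn AS "
    "SELECT ID, ROW_NUMBER() OVER (ORDER BY random()) AS rn FROM TableB"
)

# Wrap TableA's row numbers around TableB's row count with a modulo.
pairs = conn.execute("""
SELECT a.ID AS A, b.ID AS B
FROM a_rn AS a
JOIN b_rn AS b
  ON b.rn = ((a.rn - 1) % (SELECT COUNT(*) FROM b_rn)) + 1
ORDER BY a.ID
""").fetchall()
print(pairs)
```

This assumes TableA has at least as many rows as TableB; otherwise some TableB rows cannot be covered while each TableA row is used only once.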

How to use CTE to get a query repeated for multiple inputs?

I have the following query:
SELECT TOP 1 account, date, result
FROM table_1 AS t1
JOIN table_2 AS t2 ON t1.accountId = t2.frn_accountId
WHERE accountID = 1
ORDER BY date
This query returns the result that I want; however, I want that result for multiple accountIDs. The query should return the top 1 row for each accountID.
The query that produces the list of accountIDs is:
SELECT accountID from lskin WHERE refname LIKE '%BHA%' and isactive = 1
How can I write this query so it can produce the desired result? I have been playing around with CTE but haven't been able to make it correct. It doesn't have to be with CTE, I just thought it can be easier using CTE...
Here is a solution that uses ROW_NUMBER() in a derived table (the same subquery could equally be written as a CTE).
SELECT *
FROM (SELECT account
, date
, result
, ROW_NUMBER() OVER (PARTITION BY t1.accountId ORDER BY date DESC) AS Rownum
FROM table_1 AS t1
INNER JOIN table_2 AS t2
ON t1.accountId = t2.frn_accountId
INNER JOIN lskin AS l
ON l.accountID = t1.accountID
WHERE l.refname LIKE '%BHA%'
AND l.isactive = 1
) a
WHERE a.Rownum = 1;
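To see the top-1-per-account pattern run end to end, here is a sketch using Python's sqlite3 as a stand-in for SQL Server (SQLite 3.25+). The table layouts and rows are assumptions made up for the example (the question does not say which table holds `date` and `result`; they are placed in table_2 here). It orders ascending, like the original TOP 1 ... ORDER BY date query; use DESC to get the latest row instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_1 (accountId INTEGER PRIMARY KEY, account TEXT);
CREATE TABLE table_2 (frn_accountId INTEGER, result TEXT, date TEXT);
CREATE TABLE lskin (accountID INTEGER, refname TEXT, isactive INTEGER);
INSERT INTO table_1 VALUES (1, 'acc1'), (2, 'acc2'), (3, 'acc3');
INSERT INTO table_2 VALUES
  (1, 'old1', '2020-01-01'), (1, 'new1', '2020-01-03'),
  (2, 'old2', '2020-01-02'), (2, 'new2', '2020-01-04'),
  (3, 'old3', '2020-01-01'), (3, 'new3', '2020-01-05');
INSERT INTO lskin VALUES (1, 'xBHAx', 1), (2, 'BHA2', 1), (3, 'BHA3', 0);
""")

query = """
SELECT account, date, result
FROM (
    SELECT t1.account, t2.date, t2.result,
           -- restart the numbering for every account, earliest date first
           ROW_NUMBER() OVER (PARTITION BY t1.accountId ORDER BY t2.date) AS rownum
    FROM table_1 AS t1
    JOIN table_2 AS t2 ON t1.accountId = t2.frn_accountId
    JOIN lskin AS l ON l.accountID = t1.accountId
    WHERE l.refname LIKE '%BHA%' AND l.isactive = 1
) AS ranked
WHERE rownum = 1
ORDER BY account
"""
first_rows = conn.execute(query).fetchall()
print(first_rows)
```

Account 3 is missing from the output because its lskin row is inactive.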
Use max on your date and group by the account, or whatever columns are appropriate.
SELECT
account,
DT = max(date),
result
FROM table_1 as t1
JOIN table_2 as t2 ON t1.accountId = t2.frn_accountId
JOIN lskin as l on l.accountID = t1.accountID
WHERE l.refname like '%BHA%'
GROUP BY
account
,result
If the grouping isn't correct, just join to a sub-query to limit it with max date. Just change the table names as necessary.
SELECT
account,
date,
result
FROM table_1 as t1
JOIN table_2 as t2 ON t1.accountId = t2.frn_accountId
JOIN lskin as l on l.accountID = t1.accountID
INNER JOIN (select max(date) dt, accountID from table_1 group by accountID) tt on tt.dt = t1.date and tt.accountId = t1.accountId
WHERE l.refname like '%BHA%'
Ignore the CTE at the top. That's just test data.
/* CTE Test Data */
; WITH table_1 AS (
SELECT 1 AS accountID, 'acc1' AS account UNION ALL
SELECT 2 AS accountID, 'acc2' AS account UNION ALL
SELECT 3 AS accountID, 'acc3' AS account
)
, table_2 AS (
SELECT 1 AS frn_accountID, 'new1' AS result, GETDATE() AS [date] UNION ALL
SELECT 1 AS frn_accountID, 'mid1' AS result, GETDATE()-1 AS [date] UNION ALL
SELECT 1 AS frn_accountID, 'old1' AS result, GETDATE()-2 AS [date] UNION ALL
SELECT 2 AS frn_accountID, 'new2' AS result, GETDATE() AS [date] UNION ALL
SELECT 2 AS frn_accountID, 'mid2' AS result, GETDATE()-1 AS [date] UNION ALL
SELECT 2 AS frn_accountID, 'old2' AS result, GETDATE()-2 AS [date] UNION ALL
SELECT 3 AS frn_accountID, 'new3' AS result, GETDATE() AS [date] UNION ALL
SELECT 3 AS frn_accountID, 'mid3' AS result, GETDATE()-1 AS [date] UNION ALL
SELECT 3 AS frn_accountID, 'old3' AS result, GETDATE()-2 AS [date]
)
, lskin AS (
SELECT 1 AS accountID, 'purple' AS refName, 1 AS isActive UNION ALL
SELECT 2 AS accountID, 'blue' AS refName, 1 AS isActive UNION ALL
SELECT 3 AS accountID, 'orange' AS refName, 0 AS isActive UNION ALL
SELECT 4 AS accountID, 'blue' AS refName, 1 AS isActive
)
,
/* Just use the below and remove comment markers around WITH to build theCTE. */
/* ; WITH */
theCTE AS (
SELECT s1.accountID, s1.account, s1.result, s1.[date]
FROM (
SELECT t1.accountid, t1.account, t2.result, t2.[date], ROW_NUMBER() OVER (PARTITION BY t1.account ORDER BY t2.[date]) AS rn
FROM table_1 t1
INNER JOIN table_2 t2 ON t1.accountID = t2.frn_accountID
) s1
WHERE s1.rn = 1
)
SELECT lskin.accountID
FROM lskin
INNER JOIN theCTE ON theCTE.accountid = lskin.accountID
WHERE lskin.refName LIKE '%blue%'
AND lskin.isActive = 1
;
EDITED:
I'm still making a lot of assumptions about your data structure. And again, make sure you're querying what you need. CTEs are awesome, but you don't want to accidentally filter out expected results.
