T-SQL to select row count from two tables, excluding duplicates - sql-server

So, I have 2 tables. I need to get the combined row count from both, excluding any duplicates.
For example:
Table 1 has 20000 rows and Table 2 has 500 rows.
There is 1 duplicate id that is in both Table 1 and Table 2, so the total row count should be 20,499.
This is what I have tried so far:
with cterc as
(SELECT COUNT(*) as rn
FROM Table_1 as t1
join Table_2 as t2 on t1.id <> t2.id)
SELECT SUM(rn) as totalrowNo
from cterc

Does the following provide your expected count?
select count(*)
from (
Select Id from Table_1
union /* UNION removes duplicate values; UNION ALL would keep them */
Select Id from Table_2
)t;
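To see the difference on a small scale, here is a minimal sketch using table variables. The variable names and sample values are made up for illustration; only the Id column comes from the question.

```sql
-- Hypothetical sample data: Id 3 appears in both tables.
DECLARE @Table_1 TABLE (Id INT);
DECLARE @Table_2 TABLE (Id INT);

INSERT INTO @Table_1 (Id) VALUES (1), (2), (3);
INSERT INTO @Table_2 (Id) VALUES (3), (4);

SELECT
    -- UNION collapses the shared Id 3 into one row.
    (SELECT COUNT(*)
     FROM (SELECT Id FROM @Table_1
           UNION
           SELECT Id FROM @Table_2) AS u) AS DistinctCount,
    -- UNION ALL keeps both copies of Id 3.
    (SELECT COUNT(*)
     FROM (SELECT Id FROM @Table_1
           UNION ALL
           SELECT Id FROM @Table_2) AS ua) AS TotalCount;
-- DistinctCount = 4, TotalCount = 5
```

Note that UNION also collapses duplicates that occur within a single table, so the result matches the expected 20,499 only if each table's ids are unique on their own.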

Related

Find matched column records in one table that may be in multiple columns in a second table

I have two tables: Table 1 with multiple columns (name, ID number, address, etc.) and Table 2 with columns ID number 1, ID number 2 and a few others.
I am trying to get a T-SQL query returning all rows in Table 1 with an indicator showing whether the ID number in Table 1 matches either ID_1 or ID_2 in Table 2. The result set would be all columns from Table 1 , plus the indicator “Matched” if the ID number in Table 1 matches either ID_1 or ID_2 in Table 2.
Table 1: ID | Name | Address |
Table 2: ID_1 | ID_2
Result
T1.ID, Name, Address, ("Matched"/"Unmatched") ...
Also, would it be the same to do the opposite, meaning instead of the result including all rows from Table 1 that have a matching ID in ID_1 or ID_2 in Table 2, the result set would include only records from Table 1 where t1.ID = (T2.ID_1 or T2.ID_2)?
SELECT DISTINCT
CASE
WHEN (table1.ID = table2.ID_1 )
THEN 'Matched'
ELSE 'Unmatched'
END AS Status ,
table1.*
FROM
table1
LEFT JOIN
table2 ON table1.ID = table2.ID_1
UNION
SELECT DISTINCT
CASE
WHEN (table1.ID = table2.ID_2)
THEN 'Matched'
ELSE 'Unmatched'
END AS Status,
table1.*
FROM
table1
LEFT JOIN
table2 ON table1.ID = table2.ID_2
I think that a correlated subquery with an exists condition would be a reasonable solution:
select
t1.*,
case when exists (select 1 from table2 t2 where t1.id in (t2.id_1, t2.id_2))
then 'Matched'
else 'Unmatched'
end matched
from table1 t1
And the other way around:
select
t2.*,
case when exists (select 1 from table1 t1 where t1.id in (t2.id_1, t2.id_2))
then 'Matched'
else 'Unmatched'
end matched
from table2 t2
If you want to "align" the rows based on the match for the whole dataset at once, then you might want to try a full join:
select t1.*, t2.*
from table1 t1
full join table2 t2 on t1.id in (t2.id_1, t2.id_2)
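For the second part of the question (returning only the Table 1 rows whose ID appears in ID_1 or ID_2), the same EXISTS test could move from the select list into the WHERE clause. A minimal sketch, assuming the table and column names from the question:

```sql
-- Return only table1 rows that have a match in either ID column of table2.
-- EXISTS stops at the first match, so a table1 row is never duplicated
-- even if several table2 rows reference it.
SELECT t1.*
FROM table1 AS t1
WHERE EXISTS (
    SELECT 1
    FROM table2 AS t2
    WHERE t1.ID IN (t2.ID_1, t2.ID_2)
);
```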

MSSQL Union All two queries with if statement

I have the following query, which works as expected:
If((Select count(*) from table1 where product = 'carrot')< 5)
Begin
Select Top (5 - (Select count(*) from table1 where product = 'carrot'))
id, product From table2
WHere id NOT IN
(Select id from table1) AND product = 'carrot'
Order by newid()
END
What I want to do is UNION or UNION ALL it with the same query for another product, say potatoes:
If((Select count(*) from table1 where product = 'potato')< 5)
Begin
Select Top (5 - (Select count(*) from table1 where product = 'potato'))
id, product From table2
WHere id NOT IN
(Select id from table1) AND product = 'potato'
Order by newid()
END
I keep getting a syntax error when I add UNION between the IF blocks or after END. Is this possible, or is another way better?
What I am doing is trying to select a random sample of carrots. First I check whether I already have 5 carrots in table1; if I do, I don't run the sample.
If I do not have 5 total carrots, I run the sampler to bring the total up to 5: rows that already exist in table1 (by id) are filtered out, and the existing count is subtracted from 5 to get the size of the new sample.
It works well; now I want to run it for other products, e.g. lettuce, potatoes, etc.
But I want to combine the results with UNION or UNION ALL. Hope that makes sense.
I'd be interested to see whether this way works-
Select Top (5 - (Select count(*) from table1 where product = 'carrot'))
id
, product
From table2
WHere id NOT IN (Select id from table1)
AND product = 'carrot'
AND (Select count(*) from table1 where product = 'carrot') < 5
UNION ALL
Select Top (5 - (Select count(*) from table1 where product = 'potato'))
id
, product
From table2
WHere id NOT IN (Select id from table1)
AND product = 'potato'
AND (Select count(*) from table1 where product = 'potato') < 5
Your style is interesting, feels procedural rather than set-based.
You can try it this way
If ((Select count(*) from table1 where product = 'carrot') < 5
and (Select count(*) from table1 where product = 'potato') < 5)
Begin
-- Each sample is wrapped in a derived table so it can keep its own Order by
Select id, product From (
Select Top (5 - (Select count(*) from table1 where product = 'carrot')) id, product
From table2
WHere id NOT IN (Select id from table1) AND product = 'carrot' Order by newid()
) as carrots
Union all
Select id, product From (
Select Top (5 - (Select count(*) from table1 where product = 'potato')) id, product From table2
WHere id NOT IN (Select id from table1) AND product = 'potato' Order by newid()
) as potatoes
END
IF statements in SQL do not behave as sub-queries or row-sets in SQL, as you've found out. They are for branching the flow of control only.
Here is a more set based approach you could take:
SELECT ProdSamples.*
FROM
(
SELECT Table2.*, ROW_NUMBER() OVER (PARTITION BY table2.Product ORDER BY NEWID()) RowNum
FROM Table2
LEFT JOIN Table1
ON Table1.id = Table2.id
WHERE Table1.id IS NULL
) ProdSamples
JOIN
(
SELECT Product, COUNT(*) ProdCount
FROM Table1
GROUP BY Product
) ProdCounts
ON ProdSamples.Product = ProdCounts.Product
AND ProdSamples.RowNum <= (5 - ProdCounts.ProdCount)
The first sub-query, ProdSamples, returns all the rows from Table2 whose id does not exist in Table1. The RowNum field numbers them in random order, partitioned by Product.
The second sub-query, ProdCounts, is the count of records for each product in Table1. The query then joins these sub-queries together and only returns the records from ProdSamples where RowNum is less than or equal to the number of samples still needed (5 minus the count already in Table1).
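One thing to watch in the query above: because ProdCounts is attached with an inner join, a product that has no rows at all in Table1 gets no samples. A possible variant, sketched here under the assumption that such products should receive the full 5 samples, uses a LEFT JOIN and treats a missing count as 0:

```sql
SELECT ProdSamples.id, ProdSamples.product
FROM
(
    -- Candidate rows: in table2 but not yet in table1,
    -- numbered in random order within each product.
    SELECT t2.id, t2.product,
           ROW_NUMBER() OVER (PARTITION BY t2.product ORDER BY NEWID()) AS RowNum
    FROM table2 AS t2
    WHERE NOT EXISTS (SELECT 1 FROM table1 AS t1 WHERE t1.id = t2.id)
) AS ProdSamples
LEFT JOIN
(
    -- How many rows each product already has in table1.
    SELECT product, COUNT(*) AS ProdCount
    FROM table1
    GROUP BY product
) AS ProdCounts
    ON ProdSamples.product = ProdCounts.product
-- A product absent from table1 has a NULL count, treated here as 0.
WHERE ProdSamples.RowNum <= 5 - COALESCE(ProdCounts.ProdCount, 0);
```

Products that already have 5 or more rows in table1 return nothing, since RowNum starts at 1 and the right-hand side is then 0 or negative.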

How to test against a list of items in an if statement

I have a large table (130 columns). It is a monthly dataset that is separated by month (Jan, Feb, Mar, ...). Every month I get a small set of duplicate rows. I would like to remove one row from each duplicate pair; it does not matter which one is deleted.
This query seems to work ok when I only select the ID that I want to filter the dups on, but when I select everything "*" from the table I end up with all of the rows, dups included. My goal is to filter out the dups and insert the result set into a new table.
SELECT DISTINCT a.[ID]
FROM MonthlyLoan a
JOIN (SELECT COUNT(*) as Count, b.[ID]
FROM MonthlyLoan b
GROUP BY b.[ID])
AS b ON a.[ID] = b.[ID]
WHERE b.Count > 1
and effectiveDate = '01/31/2017'
Any help will be appreciated.
This will show you all duplicates per ID:
;WITH Duplicates AS
(
SELECT ID,
rn = ROW_NUMBER() OVER (PARTITION BY ID ORDER BY ID)
FROM MonthlyLoan
)
SELECT ID,
rn
FROM Duplicates
WHERE rn > 1
Alternatively, you can set rn = 2 to find the immediate duplicate per ID.
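The same ROW_NUMBER() idea can also serve the stated goal of writing a de-duplicated copy to a new table: keep rn = 1 instead of rn > 1. This is only a sketch; the target name MonthlyLoan_Dedup is hypothetical, and the date filter is copied from the question.

```sql
;WITH Ranked AS
(
    SELECT *,
           -- Number the rows within each ID; the order inside a group is
           -- arbitrary, which is fine because any one copy may be kept.
           rn = ROW_NUMBER() OVER (PARTITION BY ID ORDER BY ID)
    FROM MonthlyLoan
    WHERE effectiveDate = '01/31/2017'
)
SELECT *
INTO MonthlyLoan_Dedup   -- hypothetical new table; it will also contain the rn column
FROM Ranked
WHERE rn = 1;            -- exactly one row per ID
```

If the helper column rn is not wanted in the new table, list the columns explicitly instead of using SELECT *.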
Since your ID is duplicated, all you need is to use the HAVING clause in your aggregate.
See the example below.
declare @tableA as table
(
ID int not null
)
insert into @tableA
values
(1),(2),(2),(3),(3),(3),(4),(5)
select ID, COUNT(*) as [Count]
from @tableA
group by ID
having COUNT(*) > 1
Result:
ID Count
----------- -----------
2 2
3 3
To insert the result into a #Temporary Table:
select ID, COUNT(*) as [Count]
into #temp
from @tableA
group by ID
having COUNT(*) > 1
select * from #temp

Get rows on first table not on left join's result set

I have two tables.
TableA
Id
Column 1
Column 2
TableB (n-1 mapping with TableA)
Column 1
Column 2
fkToTableAonIdentity
and my query is
DECLARE @Offset INT = 0,
@PageSize INT = 10
SELECT
A.Column1, B.Column1
FROM
TableA AS A
LEFT JOIN
TableB AS B ON B.fkToTableAonIdentity = A.Id
ORDER BY
B.Column2 DESC
OFFSET @Offset ROWS
FETCH NEXT @PageSize ROWS ONLY
I was trying to fetch 10 rows from TableA, joined with their data from TableB,
but the query returns exactly 10 rows from the set created by the left join. I need 10 rows of data from TableA, so the number of rows in the joined set may vary depending on how many TableB rows each TableA record has.
How can I get the desired result?
Update:
I am using the above query in my stored procedure, where @PageSize will be a parameter to the stored procedure.
Use the following syntax:
SELECT * FROM
(SELECT TOP 10 * FROM Table1) ST1
JOIN Table2 ON ST1.Id=Table2.FkToT1
I expect your query will look like the following:
SELECT ST1.Col1, T2.Col1 FROM
(
SELECT * FROM Table1
ORDER BY Col1
OFFSET @offset ROWS
FETCH NEXT @page ROWS ONLY
) ST1
JOIN Table2 T2 ON ST1.Id=T2.FkToT1
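Applied to the names from the question, the same idea (page TableA first, then join) might look like the sketch below. It keeps the LEFT JOIN from the question so that TableA rows with no TableB rows still appear. Paging by A.Id is an assumption made here for illustration; any deterministic ordering of TableA would do.

```sql
DECLARE @Offset INT = 0,
        @PageSize INT = 10;

SELECT A.Column1 AS AColumn1,
       B.Column1 AS BColumn1
FROM (
    -- Page TableA on its own, so exactly @PageSize parent rows are chosen.
    SELECT Id, Column1
    FROM TableA
    ORDER BY Id                       -- assumed paging order
    OFFSET @Offset ROWS
    FETCH NEXT @PageSize ROWS ONLY
) AS A
LEFT JOIN TableB AS B
    ON B.fkToTableAonIdentity = A.Id  -- all child rows of the paged parents
ORDER BY A.Id, B.Column2 DESC;
```

The result can contain more than @PageSize rows, because each of the paged TableA rows brings along all of its TableB rows.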

sql select inside count - without join

There is a table called IDs and another table called Entries.
Not all IDs in IDs have entries. I want to count how many entries each ID has, for ALL the IDs; if an ID has no entries I want to print 0.
IDs has PK: ID, and Entries has a column ID.
If I join them I get only the IDs that have entries, but I want to get all of the IDs.
You are using INNER JOIN; you can achieve this by using LEFT JOIN instead.
EXAMPLE
/* Declare table variables for data storage */
DECLARE @MasterTable AS TABLE
(
ID INT
)
DECLARE @EntryTable AS TABLE
(
EntryID INT IDENTITY(1,1)
,MasterId INT
)
--Insert entries to Master Table
INSERT INTO @MasterTable
SELECT 1
UNION
SELECT 2
UNION
SELECT 3
UNION
SELECT 4
--Insert details for IDs 1, 2 and 3 (3 twice); ID 4 gets no entries
INSERT INTO @EntryTable
(
MasterId
)
SELECT 1
UNION ALL
SELECT 2
UNION ALL
SELECT 3
UNION ALL
SELECT 3
SELECT
ID
,COUNT(EntryTable.MasterId) AS EntryCount
FROM
@MasterTable MainTable
LEFT JOIN
@EntryTable EntryTable
ON
MainTable.ID = EntryTable.MasterId
GROUP BY
ID
Use a left join
select ids.id, count(entries.id)
from ids
left join entries on entries.id = ids.id
group by ids.id
Also see this great explanation of joins
SELECT DISTINCT id, EntriesCount.entriesCount
FROM IDs
OUTER APPLY (
SELECT COUNT(id) entriesCount
FROM Entries
WHERE Entries.id = IDs.id
) AS EntriesCount
OUTER APPLY lets you use the id from IDs in the WHERE condition on Entries.
