Compare 2 result sets

I am working in SQL Server 2008. I have 2 queries. The first one is:
SELECT
col1,
col2
FROM tableA
A typical result set of this query is:
col1 col2
facilityA 10
facilityB 20
The second one is:
SELECT
colx,
COUNT(*) AS 'Totals'
FROM tableB
GROUP BY colx
A typical result set of this query is:
colx Totals
facilityA 10
facilityB 50
I want to return all records in the first result set where the values are different between col2 and Totals from the second result set, for corresponding values between col1 and colx. For instance, the given example should return:
col1 col2
facilityB 20
How do I achieve this?

Sounds like you want to use the EXCEPT operator. It compares the two result sets row by row, column by column, and returns the distinct rows from the first query that have no identical row in the second:
SELECT col1, col2
FROM tableA
EXCEPT
SELECT colx, COUNT(*) AS 'Totals'
FROM tableB
GROUP BY colx
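To try the EXCEPT approach without a SQL Server instance, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in (SQLite also supports EXCEPT). The table contents are made up to reproduce the counts from the question, and the function name rows_that_differ is only illustrative.

```python
import sqlite3

def rows_that_differ():
    """Return rows of tableA that have no identical (name, count) row in tableB's totals."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tableA (col1 TEXT, col2 INTEGER);
        CREATE TABLE tableB (colx TEXT);
        INSERT INTO tableA VALUES ('facilityA', 10), ('facilityB', 20);
    """)
    # Assumed sample data: 10 rows for facilityA and 50 rows for facilityB,
    # so the grouped totals match the second result set in the question.
    conn.executemany(
        "INSERT INTO tableB VALUES (?)",
        [("facilityA",)] * 10 + [("facilityB",)] * 50,
    )
    rows = conn.execute("""
        SELECT col1, col2 FROM tableA
        EXCEPT
        SELECT colx, COUNT(*) FROM tableB GROUP BY colx
    """).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    print(rows_that_differ())
```

Note that EXCEPT would also return a row from tableA whose col1 never appears in tableB at all, since no identical row exists for it either.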

Related

How to derive column value based on occurrence of a phrase in Snowflake

I have input table as below
I want to have a derived column with the logic like
For a single value of COL1, if any row has 'ABC' in COL2, then DERIVED_COL should be filled with 'ABC_FIXED'; if, for a single value of COL1, no row has 'ABC' in COL2, then DERIVED_COL should be filled with 'ABC_NONFIXED'.
Is this possible in Snowflake?

Just do a self-join against the subset of the same table where col2='ABC'. If the join finds a match for a col1 value, it is ABC fixed; otherwise it is ABC not fixed.
select orig.col1,orig.col2,
case when abc.col2 is not null then 'ABC_FIXED' else 'ABC_NOTFIXED' end derived_col
from mytable orig
left join (select distinct col1, col2 from mytable where col2='ABC') abc
on abc.col1=orig.col1
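The self-join idea can be checked locally with a small sketch. This one runs the same shape of query in SQLite through Python's sqlite3 module rather than in Snowflake, and the sample rows and the function name derive_flags are assumptions made for illustration.

```python
import sqlite3

def derive_flags(rows):
    """Label every row by whether its col1 group contains col2 = 'ABC'."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (col1 TEXT, col2 TEXT)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?)", rows)
    result = conn.execute("""
        SELECT orig.col1,
               orig.col2,
               CASE WHEN abc.col1 IS NOT NULL
                    THEN 'ABC_FIXED' ELSE 'ABC_NOTFIXED' END AS derived_col
        FROM mytable orig
        -- DISTINCT keeps the join from multiplying rows when 'ABC' repeats
        LEFT JOIN (SELECT DISTINCT col1 FROM mytable WHERE col2 = 'ABC') abc
               ON abc.col1 = orig.col1
        ORDER BY orig.col1, orig.col2
    """).fetchall()
    conn.close()
    return result

if __name__ == "__main__":
    sample = [("k1", "ABC"), ("k1", "XYZ"), ("k2", "DEF")]
    for row in derive_flags(sample):
        print(row)
```

Every row of group k1 is labelled ABC_FIXED because one of its rows holds 'ABC', while k2 gets ABC_NOTFIXED.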

Using a window function:
SELECT *, CASE WHEN COUNT_IF(COL2 = 'ABC') OVER(PARTITION BY COL1) > 0
THEN 'ABC_FIXED' ELSE 'ABC_NOTFIXED'
END AS DERIVED_COL
FROM tab;

How to get data from other table within a group by query?

I have tried to group records from one table that share the same SerialNo. I also want to show a column from another table that is related to the first table through SerialNo.
I have a table 1:
And table 2:
My Query is:
select CIT_SERIALNUMBER, COUNT(CIT_ID)
as Cases from Table_2 where CIT_SOURCEID like '%E_One%'
and (CIT_CREATED BETWEEN '2018-01-15'AND '2019-06-15') and CIT_SERIALNUMBER is not null
group by CIT_SERIALNUMBER
having COUNT(CIT_ID)>1 order by min(CIT_CREATED) desc
Here is the result table:
In the query above I get only the CIT_SERIALNUMBER records from Table_2. But I also want to get the ComputerName column from Table_1. So, the expected result is:
Note: Tables 1 and 2 can be joined on columns T1_Serial and CIT_SERIALNUMBER.
Please help me rewrite the SQL query to achieve the expected result above.

If I understood your column names correctly, try this:
select CIT_SERIALNUMBER, ComputerName, COUNT(CIT_ID)
as Cases from Table_2 join Table_1 on Table_2.CIT_SERIALNUMBER=Table_1.T1_Serial where CIT_SOURCEID like '%E_One%'
and (CIT_CREATED BETWEEN '2018-01-15' AND '2019-06-15') and CIT_SERIALNUMBER is not null
group by CIT_SERIALNUMBER, ComputerName
having COUNT(CIT_ID)>1 order by min(CIT_CREATED) desc

Try this-
SELECT A.ComputerName,
CIT_SERIALNUMBER,
COUNT(CIT_ID) AS Cases
FROM table_1 A
INNER JOIN Table_2 B ON A.T1_Serial = B.CIT_SERIALNUMBER
WHERE B.CIT_SOURCEID LIKE '%E_One%'
AND (B.CIT_CREATED BETWEEN '2018-01-15' AND '2019-06-15')
AND B.CIT_SERIALNUMBER IS NOT NULL
GROUP BY A.ComputerName,B.CIT_SERIALNUMBER
HAVING COUNT(B.CIT_ID) > 1
ORDER BY MIN(B.CIT_CREATED) DESC;
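As a rough check of the join plus GROUP BY pattern, the sketch below runs an equivalent query in SQLite via Python's sqlite3 module. The rows, the serial numbers 'S1' and 'S2', the computer names, and the function name repeated_serials are all invented for illustration; the join column T1_Serial is taken from the note in the question.

```python
import sqlite3

def repeated_serials():
    """Serial numbers with more than one case, together with the computer name."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Table_1 (T1_Serial TEXT, ComputerName TEXT);
        CREATE TABLE Table_2 (CIT_ID INTEGER, CIT_SERIALNUMBER TEXT,
                              CIT_SOURCEID TEXT, CIT_CREATED TEXT);
        INSERT INTO Table_1 VALUES ('S1', 'PC-ONE'), ('S2', 'PC-TWO');
        INSERT INTO Table_2 VALUES
            (1, 'S1', 'E_One', '2018-02-01'),
            (2, 'S1', 'E_One', '2018-03-01'),
            (3, 'S2', 'E_One', '2018-04-01'),
            (4, NULL, 'E_One', '2018-05-01');
    """)
    rows = conn.execute("""
        SELECT B.CIT_SERIALNUMBER, A.ComputerName, COUNT(B.CIT_ID) AS Cases
        FROM Table_1 A
        INNER JOIN Table_2 B ON A.T1_Serial = B.CIT_SERIALNUMBER
        WHERE B.CIT_SOURCEID LIKE '%E_One%'
          AND B.CIT_CREATED BETWEEN '2018-01-15' AND '2019-06-15'
        -- every non-aggregated column in the SELECT list is grouped
        GROUP BY B.CIT_SERIALNUMBER, A.ComputerName
        HAVING COUNT(B.CIT_ID) > 1
        ORDER BY MIN(B.CIT_CREATED) DESC
    """).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    print(repeated_serials())
```

The inner join already drops rows whose CIT_SERIALNUMBER is NULL, so the explicit IS NOT NULL filter is left out of this sketch.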

It looks odd, but I've got a solution for this:
I select all duplicated records from Table_2 first.
Then I join Table_1 with the result set from Table_2 to view columns from both tables.
Then I use another select on that result set and group by all columns.
Here is the query:
select z.CIT_SERIALNUMBER, z.ComputerName, z.Cases from (
SELECT y.CIT_SERIALNUMBER, x.ComputerName, y.Cases
FROM Table_1 x
right JOIN (
select CIT_SERIALNUMBER, COUNT(CIT_ID)
as Cases from Table_2 where CIT_SOURCEID like '%E_One%'
and (CIT_CREATED BETWEEN '2018-01-15'AND '2019-06-15') and CIT_SERIALNUMBER is not null
group by CIT_SERIALNUMBER
having COUNT(CIT_ID)>1
) y ON y.CIT_SERIALNUMBER = x.SerialNo) z group by CIT_SERIALNUMBER, z.ComputerName, z.Cases
Result set:

Performance tuning on join two tables columns with patindex

Sample data:
Note:
The table tbl_test1 is a filtered table and may have fewer records, depending on the earlier filter.
The following is just a data sample for illustration. The actual table tbl_test2 has 70 columns and 100 million records.
The WHERE condition is dynamic and can come in any combination.
The display columns are also dynamic, i.e. one or more columns.
create table tbl_test1
(
col1 varchar(100)
);
insert into tbl_test1 values('John Mak'),('Omont Boy'),('Will Smith'),('Mak John');
create table tbl_test2
(
col1 varchar(100)
);
insert into tbl_test2 values('John Mak'),('Smith Will'),('Jack Don');
query 1: The following query takes more than 10 minutes and is still running against 100 million records.
select t2.col1
from tbl_test2 t2
inner join tbl_test1 t1 on patindex('%'+t1.col1+'%',t2.col1) > 0
query 2: This one also keeps running and returns no result after a 10 minute wait.
select t2.col1
from tbl_test2 t2
where exists
(
select * from tbl_test1 t1 where charindex(t1.col1,t2.col1) > 0
)
expected result:
col1
----------
John Mak
Smith Will

Query to find the record with most matching columns, where the number of columns and names of columns is unknown?

I have two tables, X and Y, with identical schema but different records. Given a record from X, I need a query to find the closest matching record in Y that contains NULL values for non-matching columns. Identity columns should be excluded from the comparison. For example, if my record looked like this:
------------------------
id | col1 | col2 | col3
------------------------
0 |'abc' |'def' | 'ghi'
And table Y looked like this:
------------------------
id | col1 | col2 | col3
------------------------
6 |'abc' |'def' | 'zzz'
8 | NULL |'def' | NULL
Then the closest match would be record 8, since where the columns don't match, there are NULL values. 6 WOULD have been the closest match, but the 'zzz' disqualified it.
What's unique about this problem is that the schema of the tables is unknown besides the id column and the data types. There could be 4 columns, or there could be 7 columns. We just don't know - it's dynamic. All we know is that there is going to be an 'id' column and that the columns will be strings, either varchar or nvarchar.
What is the best query in this case to pick the closest matching record out of Y, given a record from X? I'm actually writing a function. The input is an integer (the id of a record in X) and the output is an integer (the id of a record in Y, or NULL). I'm an SQL novice, so a brief explanation of what's happening in your solution would help me greatly.

There could be 4 columns, or there could be 7 columns.... I'm actually writing a function.
This is an impossible task as a function. SQL Server user-defined functions cannot execute dynamic SQL, so you cannot have a function that works on an arbitrary table structure. A stored procedure, sure, but not a function.
However, the below shows you a way using FOR XML and some decomposing of the XML to unpivot rows into column names and values which can then be compared. The technique used here and the queries can be incorporated into a stored procedure.
MS SQL Server 2008 Schema Setup:
-- this is the data table to match against
create table t1 (
id int,
col1 varchar(10),
col2 varchar(20),
col3 nvarchar(40));
insert t1
select 6, 'abc', 'def', 'zzz' union all
select 8, null , 'def', null;
-- this is the data with the row you want to match
create table t2 (
id int,
col1 varchar(10),
col2 varchar(20),
col3 nvarchar(40));
insert t2
select 0, 'abc', 'def', 'ghi';
GO
Query 1:
;with unpivoted1 as (
select n.n.value('local-name(.)','nvarchar(max)') colname,
n.n.value('.','nvarchar(max)') value
from (select (select * from t2 where id=0 for xml path(''), type)) x(xml)
cross apply x.xml.nodes('//*[local-name()!="id"]') n(n)
), unpivoted2 as (
select x.id,
n.n.value('local-name(.)','nvarchar(max)') colname,
n.n.value('.','nvarchar(max)') value
from (select id,(select * from t1 where id=outr.id for xml path(''), type) from t1 outr) x(id,xml)
cross apply x.xml.nodes('//*[local-name()!="id"]') n(n)
)
select TOP(1) WITH TIES
B.id,
sum(case when A.value=B.value then 1 else 0 end) matches
from unpivoted1 A
join unpivoted2 B on A.colname = B.colname
group by B.id
having max(case when A.value <> B.value then 1 end) is null
ORDER BY matches DESC;
Results:
| ID | MATCHES |
----------------
| 8 | 1 |
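The matching rule behind that query can be spelled out outside SQL. The following is a minimal Python sketch of the rule only, not of the XML technique: a candidate is disqualified as soon as one of its non-NULL values differs from the source record, and among the remaining candidates the one with the most equal values wins. The function name closest_match and the dict-per-row representation are assumptions made for illustration.

```python
def closest_match(record, candidates, id_column="id"):
    """Return the id of the best candidate row, or None if none qualifies.

    record and each candidate are dicts mapping column name to value;
    None stands in for SQL NULL.
    """
    best_id, best_score = None, -1
    for cand in candidates:
        score = 0
        disqualified = False
        for col, value in record.items():
            if col == id_column:
                continue  # identity columns are not compared
            other = cand.get(col)
            if other is None:
                continue  # NULL in the candidate neither helps nor hurts
            if other == value:
                score += 1
            else:
                disqualified = True  # a non-NULL mismatch rules the row out
                break
        if not disqualified and score > best_score:
            best_id, best_score = cand[id_column], score
    return best_id

if __name__ == "__main__":
    x = {"id": 0, "col1": "abc", "col2": "def", "col3": "ghi"}
    y = [
        {"id": 6, "col1": "abc", "col2": "def", "col3": "zzz"},
        {"id": 8, "col1": None, "col2": "def", "col3": None},
    ]
    print(closest_match(x, y))
```

Unlike TOP(1) WITH TIES, this sketch returns only the first of several equally good candidates, and it treats a candidate whose columns are all NULL as a valid match with zero equal values.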

SQL set operation with different number of columns in each set

Let say I have set 1:
1 30 60
2 45 90
3 120 240
4 30 60
5 20 40
and set 2
30 60
20 40
I would like to do some sort of union where I only keep rows 1,4,5 from set 1 because the latter 2 columns of set 1 can be found in set 2.
My problem is that set-based operations insist on the same number of columns.
I've thought of concatenating the columns contents, but it feels dirty to me.
Is there a 'proper' way to accomplish this?
I'm on SQL Server 2008 R2
In the end, I would like to end up with
1 30 60
4 30 60
5 20 40
CLEARLY I need to go sleep as a simple join on 2 columns worked.... Thanks!

You are literally asking for
give me the rows in t1 where the 2 columns match in T2
So if the output is only rows 1, 4 and 5 from table 1, then it is a set-based operation and can be done with EXISTS, INTERSECT, or JOIN. As for the "same number of columns" problem, you simply set two conditions joined with AND. This is evaluated per row.
EXISTS is the most portable and compatible way and allows any column from table1
select id, val1, val2
from table1 t1
WHERE EXISTS (SELECT * FROM table2 t2
WHERE t1.val1 = t2.val1 AND t1.val2 = t2.val2)
INTERSECT requires the same columns in each clause, and not all engines support it (SQL Server has since 2005)
select val1, val2
from table1
INTERSECT
select val1, val2
from table2
With an INNER JOIN, if you have duplicate values for val1, val2 in table2, then you'll get more rows than expected. The internals of this usually make it slower than EXISTS
select t1.id, t1.val1, t1.val2
from table1 t1
JOIN
table2 t2 ON t1.val1 = t2.val1 AND t1.val2 = t2.val2
Some RDBMSs support IN on multiple columns; this isn't portable, and SQL Server doesn't support it
Edit: some background
Relationally, it's a semi-join.
SQL Server does it as a "left semi join"
INTERSECT and EXISTS in SQL Server usually give the same execution plan. The join type is a "left semi join" whereas INNER JOIN is a full "equi-join".
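The difference between the semi-join and the plain join shows up as soon as table2 holds a duplicate pair. Here is a small sketch using Python's sqlite3 module in place of SQL Server; the data comes from the question, except that the pair (30, 60) is duplicated in table2 on purpose, and the function name is hypothetical.

```python
import sqlite3

def compare_semi_join_and_join():
    """Run the EXISTS and INNER JOIN variants and return both result lists."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE table1 (id INTEGER, val1 INTEGER, val2 INTEGER);
        CREATE TABLE table2 (val1 INTEGER, val2 INTEGER);
        INSERT INTO table1 VALUES
            (1, 30, 60), (2, 45, 90), (3, 120, 240), (4, 30, 60), (5, 20, 40);
        -- (30, 60) appears twice here to expose the join's row multiplication
        INSERT INTO table2 VALUES (30, 60), (20, 40), (30, 60);
    """)
    exists_rows = conn.execute("""
        SELECT t1.id, t1.val1, t1.val2
        FROM table1 t1
        WHERE EXISTS (SELECT 1 FROM table2 t2
                      WHERE t1.val1 = t2.val1 AND t1.val2 = t2.val2)
        ORDER BY t1.id
    """).fetchall()
    join_rows = conn.execute("""
        SELECT t1.id, t1.val1, t1.val2
        FROM table1 t1
        JOIN table2 t2 ON t1.val1 = t2.val1 AND t1.val2 = t2.val2
        ORDER BY t1.id
    """).fetchall()
    conn.close()
    return exists_rows, join_rows

if __name__ == "__main__":
    exists_rows, join_rows = compare_semi_join_and_join()
    print(len(exists_rows), "rows from EXISTS")
    print(len(join_rows), "rows from JOIN")
```

EXISTS returns each qualifying table1 row once, while the join repeats rows 1 and 4 once per matching row in table2.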

You could use union which, as opposed to union all, eliminates duplicates:
select val1, val2
from table1
union
select val1, val2
from table2
EDIT: Based on your edited question, you can exclude rows that match the second table using a not exists subquery:
select id, col1, col2
from table1 t1
where not exists
(
select *
from table2 t2
where t1.col1 = t2.col1
and t1.col2 = t2.col2
)
union all
select null, col1, col2
from table2
If you'd like to exclude rows from table2, omit union all and everything below it.