Why does UNION return only one null? - sql-server

I understand null represents missing/unknown value, so a null is not equal to another null because two unknown things cannot be compared. For example
if null = null
select 'nulls are equal'
else
select 'nulls are not equal'
results in 'nulls are not equal'. I used = instead of IS NULL or IS NOT NULL here to emphasize the fact that two nulls cannot be compared.
Coming to UNION, UNION is supposed to eliminate duplicate values. I was expecting the below code to return two rows each with null since two null values are not equal, but I get only one null in the result set.
(select null as Col1)
union
(select null as Col1)
Why does SQL's interpretation of 'null as an unknown value' change between the above two statements?

NULL is not comparable, but SQL generally does have the concept of "IS DISTINCT FROM"
SQL Server has a Connect item for it
1 IS DISTINCT FROM NULL = true
1 = NULL is unknown (NULL), which is not true
For completeness, NULL IS DISTINCT FROM NULL = false
I would guess that DISTINCT and UNION use IS DISTINCT FROM (as Pரதீப் mentioned above)
Now, SQL Server does use IS DISTINCT FROM semantics in INTERSECT and EXCEPT
CREATE TABLE #t1 (t1col INT);
INSERT #t1 VALUES (1), (NULL), (2), (3), (3), (5), (5);
CREATE TABLE #t2 (t2col INT);
INSERT #t2 VALUES (1), (NULL), (3), (4);
SELECT 't1 EXISTS t2', *
FROM #t1 t1 WHERE EXISTS (SELECT * FROM #t2 t2 WHERE t1.t1col = t2.t2col);
t1 EXISTS t2 1
t1 EXISTS t2 3
t1 EXISTS t2 3
SELECT DISTINCT 't1 INTERSECT t2', *
FROM #t1 INTERSECT SELECT 't1 INTERSECT t2', * FROM #t2;
t1 INTERSECT t2 NULL
t1 INTERSECT t2 1
t1 INTERSECT t2 3
INTERSECT and EXCEPT also remove duplicates: they behave like a semi-join and an anti-join followed by DISTINCT
EXISTS is a semi-join and NOT EXISTS is an anti-join BTW
For completeness
SELECT 't1 EXISTS t2', *
FROM #t1 t1 WHERE NOT EXISTS (SELECT * FROM #t2 t2 WHERE t1.t1col = t2.t2col);
t1 EXISTS t2 NULL
t1 EXISTS t2 2
t1 EXISTS t2 5
t1 EXISTS t2 5
SELECT 't1 EXCEPT t2', *
FROM #t1 EXCEPT SELECT 't1 EXCEPT t2', * FROM #t2;
t1 EXCEPT t2 2
t1 EXCEPT t2 5
Example taken from my answer to "Why does EXCEPT exist in T-SQL?", with added NULLs

UNION is basically SELECT DISTINCT, so it eliminates duplicate NULL values, but that duplicate check is not the same as the "=" operation.
Using UNION ALL would give you all records including duplicating NULLs.
As for the first part of your question: NULL does match NULL, but with IS rather than "=". This would give you the result you expect:
if null IS null
select 'nulls are equal'
else
select 'nulls are not equal'
This is also helpful when dealing with nulls.

Try UNION ALL to retain everything in both sets without removing duplicates.
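
The difference between "=" and the duplicate check used by UNION can be reproduced in a few lines. This is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in engine, which is an assumption on my part since the question is about SQL Server. SQLite's IS operator plays the role of a null-safe comparison here, and a SQL NULL arrives in Python as None.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for SQL Server.
conn = sqlite3.connect(":memory:")

# "=" against NULL yields NULL (unknown), which Python sees as None.
equals_result = conn.execute("SELECT NULL = NULL").fetchone()[0]

# "IS" is a null-safe comparison: NULL IS NULL evaluates to 1 (true).
is_result = conn.execute("SELECT NULL IS NULL").fetchone()[0]

# UNION removes duplicates with "not distinct" semantics, so the NULLs collapse.
union_rows = conn.execute(
    "SELECT NULL AS Col1 UNION SELECT NULL"
).fetchall()

# UNION ALL skips the duplicate check and keeps both rows.
union_all_rows = conn.execute(
    "SELECT NULL AS Col1 UNION ALL SELECT NULL"
).fetchall()

print(equals_result, is_result, len(union_rows), len(union_all_rows))
```

The same two rules are at work in both engines: comparison with "=" is three-valued, while duplicate elimination treats two NULLs as the same value.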

Related

Multiple OR operator in Snowflake in WHERE Clause is not working

I am trying to do below
Table 1
Table 2
I am writing a query like below to ensure that if any of the NOT IN satisfies, those records should be filtered out.
SELECT * FROM TABLE1
WHERE TABLE1."DEPTID" NOT IN (SELECT TABLE2."DEPTID" FROM TABLE2)
OR
TABLE1."EMPCOUNTRY" NOT IN (SELECT TABLE2."EMPCOUNTRY" FROM TABLE2)
OR
TABLE1."EMPZONE" NOT IN (SELECT TABLE2."EMPZONE" FROM TABLE2)
But it errors out
What am I doing wrong?
Edited: the exact query works with nvl, but the result set is not as required.
SELECT * FROM TABLE1 AS T1
WHERE
(
UPPER (T1."DEPTID") NOT IN
(SELECT UPPER (nvl (T2."DEPTID", '')) FROM TABLE2 AS T2)
OR
UPPER (T1."EMPZONE") NOT IN
(SELECT UPPER (nvl (T2."EMPZONE",'')) FROM TABLE2 AS T2)
)
The result set is not as required: a row should be filtered out if its DEPTID exists in Table2, or if its EMPZONE exists in Table2, or both.
What should be the best way to achieve this?
I think the issue is about NULL values coming from the subqueries. Could you try to use something like this?
SELECT * FROM TABLE1 AS T1
WHERE
(
NOT EXISTS (SELECT 1 FROM TABLE2 T2 WHERE equal_null( UPPER(T1."DEPTID"), UPPER(T2."DEPTID")) )
OR
NOT EXISTS (SELECT 1 FROM TABLE2 T2 WHERE equal_null( UPPER(T1."EMPZONE"), UPPER(T2."EMPZONE")) )
);
Also here is a sample to demonstrate why you can't use NOT IN with a subquery returning NULL values:
create table NULL_TABLE ( v varchar);
insert into NULL_TABLE values (NULL),('ABC');
create or replace table MAIN_TABLE ( v varchar);
INSERT INTO MAIN_TABLE values
('Jack'),('Joe'),('ABC');
select * from MAIN_TABLE
where v NOT IN (select v FROM NULL_TABLE);
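The last query returns no rows, because the NOT IN predicate in the last WHERE clause is never true: it is false for 'ABC' and NULL (unknown) for 'Jack' and 'Joe', since we can't determine that a value is absent from a set of values where some of them are not known.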
First, when using NOT IN, the subquery should not return NULL values. Second, conditions that must all hold at once should be joined with AND instead of OR.
NOT (cond1 OR cond2 OR cond3)
<=>
(NOT cond1) AND (NOT cond2) AND (NOT cond3)
De Morgan's law: "The negation of a disjunction is the conjunction of the negations"
The final query should rather be:
SELECT *
FROM TABLE1 AS T1
WHERE T1."DEPTID" NOT IN (SELECT T2."DEPTID" FROM TABLE2 T2
WHERE T2."DEPTID" IS NOT NULL)
AND T1."EMPCOUNTRY" NOT IN (SELECT T2."EMPCOUNTRY" FROM TABLE2 T2
WHERE T2."EMPCOUNTRY" IS NOT NULL)
AND T1."EMPZONE" NOT IN (SELECT T2."EMPZONE" FROM TABLE2 T2
WHERE T2."EMPZONE" IS NOT NULL);
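
The NOT IN behaviour described above can be checked with the NULL_TABLE / MAIN_TABLE sample. This is a small sketch that assumes SQLite (through Python's sqlite3 module) in place of Snowflake, and uses TEXT instead of varchar. The three-valued logic involved is standard SQL, so the row counts should carry over, but treat it as an illustration rather than a Snowflake test.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE null_table (v TEXT);
    INSERT INTO null_table VALUES (NULL), ('ABC');
    CREATE TABLE main_table (v TEXT);
    INSERT INTO main_table VALUES ('Jack'), ('Joe'), ('ABC');
""")

# NOT IN against a set containing NULL is never true, so nothing comes back.
unfiltered = conn.execute(
    "SELECT v FROM main_table "
    "WHERE v NOT IN (SELECT v FROM null_table) ORDER BY v"
).fetchall()

# Excluding NULLs inside the subquery restores the intuitive result.
filtered = conn.execute(
    "SELECT v FROM main_table "
    "WHERE v NOT IN (SELECT v FROM null_table WHERE v IS NOT NULL) ORDER BY v"
).fetchall()

# NOT EXISTS checks for a matching row and is not affected by the NULL.
not_exists = conn.execute(
    "SELECT m.v FROM main_table AS m "
    "WHERE NOT EXISTS (SELECT 1 FROM null_table AS n WHERE n.v = m.v) "
    "ORDER BY m.v"
).fetchall()

print(unfiltered, filtered, not_exists)
```

Filtering NULLs out of the subquery and switching to NOT EXISTS both give 'Jack' and 'Joe'. Which one to use is mostly a matter of how NULLs in the outer table should be treated.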

Select query to get counts

I have 2 tables that I want to query against. If a record exists in both tables, count it under Both; if it exists only in Table 1, count it under T1 Only; if it exists only in Table 2, count it under T2 Only. The result set should come out as follows.
Both   T1 Only   T2 Only
2000   3000      4000
Use a FULL JOIN. It will return all records from both tables in the query. See example below for your desired result.
CREATE TABLE #Table1 (Id INT);
CREATE TABLE #Table2 (Id INT);
INSERT INTO #Table1 VALUES (1), (2), (3), (4), (5), (6);
INSERT INTO #Table2 VALUES (3), (4), (5), (6), (7);
SELECT
SUM(CASE WHEN T1.Id IS NOT NULL AND T2.Id IS NOT NULL THEN 1 ELSE 0 END) AS Both
, SUM(CASE WHEN T1.Id IS NOT NULL AND T2.Id IS NULL THEN 1 ELSE 0 END) AS T1Only
, SUM(CASE WHEN T1.Id IS NULL AND T2.Id IS NOT NULL THEN 1 ELSE 0 END) AS T2Only
FROM #Table1 AS T1 FULL JOIN #Table2 AS T2 ON T2.Id = T1.Id;
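
As a cross-check of the counts for this sample data, here is a sketch in Python's sqlite3 module. Two things are assumptions and not part of the answer: SQLite stands in for SQL Server, and the FULL JOIN is emulated with a LEFT JOIN plus the unmatched rows of the second table, because older SQLite builds do not support FULL JOIN. The table names table1 and table2 replace the temp tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER);
    CREATE TABLE table2 (id INTEGER);
    INSERT INTO table1 VALUES (1), (2), (3), (4), (5), (6);
    INSERT INTO table2 VALUES (3), (4), (5), (6), (7);
""")

# Emulated FULL JOIN: every table1 row (matched or not),
# plus the table2 rows that have no partner in table1.
query = """
    WITH full_joined AS (
        SELECT t1.id AS id1, t2.id AS id2
        FROM table1 AS t1
        LEFT JOIN table2 AS t2 ON t2.id = t1.id
        UNION ALL
        SELECT NULL, t2.id
        FROM table2 AS t2
        WHERE NOT EXISTS (SELECT 1 FROM table1 AS t1 WHERE t1.id = t2.id)
    )
    SELECT
        SUM(CASE WHEN id1 IS NOT NULL AND id2 IS NOT NULL THEN 1 ELSE 0 END),
        SUM(CASE WHEN id1 IS NOT NULL AND id2 IS NULL THEN 1 ELSE 0 END),
        SUM(CASE WHEN id1 IS NULL AND id2 IS NOT NULL THEN 1 ELSE 0 END)
    FROM full_joined
"""
both, t1_only, t2_only = conn.execute(query).fetchone()
print(both, t1_only, t2_only)  # 4 2 1
```

Ids 3 to 6 appear in both tables, 1 and 2 only in the first, and 7 only in the second, which gives the 4 / 2 / 1 split.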

Joining tables without a common column in sql server

TABLE1
ID
----
1
2
3
4
5
TABLE2
Name
----
Z
Y
X
W
V
Expected Output:
ID Name
-------------------------
1 NULL
2 NULL
3 NULL
4 NULL
5 NULL
NULL Z
NULL Y
NULL X
NULL W
NULL V
I need a solution for the above scenario by using JOINS in SQL Server.
Using FULL OUTER JOIN, you can get the expected result.
Since there are no common fields, no record from Table1 should match any record from Table2, and vice versa. So ON 0 = 1 as the join condition will work as expected (thanks, Bart Hofland).
So the query below will work:
SELECT T1.Id, T2.[Name]
FROM Table1 T1
FULL OUTER JOIN Table2 T2 ON 0 = 1;
or
SELECT T1.Id, T2.[Name]
FROM Table1 T1
FULL OUTER JOIN Table2 T2 ON T2.[Name] = CAST(T1.Id AS VARCHAR(2));
Demo with the sample data:
DECLARE @Table1 TABLE (Id INT);
INSERT INTO @Table1 (Id) VALUES
(1),
(2),
(3),
(4),
(5);
DECLARE @Table2 TABLE ([Name] VARCHAR(1));
INSERT INTO @Table2 ([Name]) VALUES
('Z'),
('Y'),
('X'),
('W'),
('V');
SELECT T1.Id, T2.[Name]
FROM @Table1 T1
FULL OUTER JOIN @Table2 T2 ON 0 = 1;
Output:
Id Name
-----------------
1 NULL
2 NULL
3 NULL
4 NULL
5 NULL
NULL Z
NULL Y
NULL X
NULL W
NULL V
I don't understand why you'd want this, but to get your expected results you could do this. This is not a join, though.
SELECT ID, NULL as NAME from Table1
UNION ALL
SELECT NULL, NAME from Table2
Edited to add
Since the question specifically requests a solution with a join, Arulkumar's answer of FULL OUTER JOIN is a better fit, and you don't have to worry about what the column data types are.

Combine two tables into one table with separated values

This is a rather simple question, but for whatever reason I just can't get to a solution.
How do I join the two tables so that I get NULL values like this?
Table #T1
A
--
1
2
Table #T2
B
--
3
Desired result:
A B
----
1 NULL
2 NULL
NULL 3
EDIT:
My solution was this
SELECT #T1.A, #T2.B
FROM #t2
RIGHT JOIN #T1 ON 1 = 0
UNION
SELECT #T1.A, #T2.B
FROM #t2
LEFT JOIN #t1 ON 1 = 0
But it seems overly complicated. Anything better?
Use FULL JOIN
select *
from #t1 t1
full outer join #t2 t2 on t1.a = t2.b
or use UNION ALL
select a,Null as b
from #t1
union all
select NULL, b
from #t2
Since there are no common records in the two tables, both queries return the same result. When there is a common record, the results will differ. Use the one that suits your requirement.
This is the better, simpler one:
SELECT #T1.A, #T2.B
FROM #t2
FULL OUTER JOIN #T1 ON 1 = 0

Set field to random value from another table

update table1
set firstname = (select top 1 firstname from table2 order by NEWID())
This just sets table1.firstname to the same value for all records. I know it's possible to do this, but everything I've seen online expects the same row count in both tables (or at least a greater amount in table1). I have 200,000 records in table1, I have 200 in table2. How can I set table1.firstname to a random value from table2.firstname when the row counts are off?
DECLARE @t1 TABLE (a INT)
DECLARE @t2 TABLE (b INT, c INT)
INSERT INTO @t1(a)
VALUES (0), (1), (2), (3), (4), (5)
INSERT INTO @t2(b)
VALUES (0), (1), (2)
UPDATE t2
SET c = t1.a
FROM @t2 t2
CROSS APPLY (
SELECT TOP(1) t1.a
FROM @t1 t1
WHERE t2.b IS NOT NULL -- reference an outer (t2) column so the subquery is evaluated per row
ORDER BY NEWID()
) t1
SELECT * FROM @t2
Output (values vary per run) -
b c
----------- -----------
0 5
1 1
2 0
