How does not equal in snowflake exactly work? - snowflake-cloud-data-platform

I have a snowflake query that has a field called status. The field either contains null or 'deleted'
when I do the following to get only deleted it works:
select * from tbl_1 where status = 'deleted'
when I try excluding all deleted it excludes everything, no records are returned. Here's what I've tried
select * from tbl_1 where status != 'deleted'
select * from tbl_1 where status <> 'deleted'
Neither one works. Can someone tell me why or what's the proper way of doing this in snowflake?
Thanks in advance!

I think you should use EQUAL_NULL here. EQUAL_NULL Compares whether two expressions are equal. The function is NULL-safe, meaning it treats NULLs as known values for comparing equality. Note that this is different from the EQUAL comparison operator (=), which treats NULLs as unknown values.
https://docs.snowflake.com/en/sql-reference/functions/equal_null.html#equal-null
CREATE TABLE NULL_TEST(PRODUCT VARCHAR, STATUS VARCHAR);
INSERT INTO NULL_TEST VALUES('1111','DELETED');
INSERT INTO NULL_TEST VALUES('1112',NULL);
select * from NULL_TEST where NOT EQUAL_NULL(status,'DELETED');

You can try the following which factors for the status being null and unsearchable for you.
select * from tbl_1 where IS_NULL_VALUE(status:no_value) != 'deleted'
https://docs.snowflake.com/en/sql-reference/functions/is_null_value.html

Use NVL as it simply converts the NULL to another value that you can compare against
NVL doc here
I've tested using the below query to return values 2, 3 (that are not 'deleted')
Select *
from
values (1,'deleted'),(2,NULL),(3,null),(4,'deleted')
WHERE NVL($2,'N/A')!='deleted';
So for your case use SQL
select * from tbl_1 where NVL(status,'N/A') != 'deleted';

Related

SSIS SQL Task Unable to assign Parameter Value when using IF EXISTS

I'm using VS 2012 and SQL Server / SSIS.
I originally had a SQL task to check for duplicate values in a table:
SELECT COUNT(*) AS DupNI
FROM dbo.mytable
WHERE XMLFileID = ?
GROUP BY XMLFileID, NINumber
HAVING (COUNT(*) > 1);
The ? is because I am inserting a parameter value, and the result of the query is being assigned to a variable. It works fine if there is a duplicate.
When there are no duplicates, I get this message:
Single Row result set is specified, but no rows were returned
So, to get round this I now use an IF EXISTS, like the below:
IF EXISTS (SELECT COUNT(*) AS DupNI
FROM dbo.mytable
WHERE XMLFileID = ?
GROUP BY XMLFileID, NINumber
HAVING (COUNT(*) > 1))
SELECT COUNT(*) AS DupNI
FROM dbo.mytable
WHERE XMLFileID = ?
GROUP BY XMLFileID, NINumber
HAVING (COUNT(*) > 1)
ELSE
SELECT 0 AS DupNI;
However, now I get the error:
No value given for one or more required parameters.
It appears because I am wrapping the statement in the IF EXISTS, I can no longer inject the parameter values via the ?
Why is this? How do I get around this issue?
Your current query will return multiple rows if there are duplicates, one for each duplicate (XMLFileID, NINumber) pair. If you only want to return a value which indicates whether there are any duplicates in the table, you could use your EXISTS clause as an expression:
SELECT CASE WHEN EXISTS(
SELECT COUNT(*) AS DupNI
FROM dbo.mytable
WHERE XMLFileID = ?
GROUP BY XMLFileID, NINumber
HAVING (COUNT(*) > 1)
) THEN 1 ELSE 0 END AS [Duplicates Exist]
Demo on dbfiddle

Why does NOT IN condition on SQL Server "break" when list includes a NULL value? [duplicate]

This issue came up when I got different records counts for what I thought were identical queries one using a not in where constraint and the other a left join. The table in the not in constraint had one null value (bad data) which caused that query to return a count of 0 records. I sort of understand why but I could use some help fully grasping the concept.
To state it simply, why does query A return a result but B doesn't?
A: select 'true' where 3 in (1, 2, 3, null)
B: select 'true' where 3 not in (1, 2, null)
This was on SQL Server 2005. I also found that calling set ansi_nulls off causes B to return a result.
Query A is the same as:
select 'true' where 3 = 1 or 3 = 2 or 3 = 3 or 3 = null
Since 3 = 3 is true, you get a result.
Query B is the same as:
select 'true' where 3 <> 1 and 3 <> 2 and 3 <> null
When ansi_nulls is on, 3 <> null is UNKNOWN, so the predicate evaluates to UNKNOWN, and you don't get any rows.
When ansi_nulls is off, 3 <> null is true, so the predicate evaluates to true, and you get a row.
NOT IN returns 0 records when compared against an unknown value
Since NULL is an unknown, a NOT IN query containing a NULL or NULLs in the list of possible values will always return 0 records since there is no way to be sure that the NULL value is not the value being tested.
Whenever you use NULL you are really dealing with a Three-Valued logic.
Your first query returns results as the WHERE clause evaluates to:
3 = 1 or 3 = 2 or 3 = 3 or 3 = null
which is:
FALSE or FALSE or TRUE or UNKNOWN
which evaluates to
TRUE
The second one:
3 <> 1 and 3 <> 2 and 3 <> null
which evaluates to:
TRUE and TRUE and UNKNOWN
which evaluates to:
UNKNOWN
The UNKNOWN is not the same as FALSE
you can easily test it by calling:
select 'true' where 3 <> null
select 'true' where not (3 <> null)
Both queries will give you no results
If the UNKNOWN was the same as FALSE then assuming that the first query would give you FALSE the second would have to evaluate to TRUE as it would have been the same as NOT(FALSE).
That is not the case.
There is a very good article on this subject on SqlServerCentral.
The whole issue of NULLs and Three-Valued Logic can be a bit confusing at first but it is essential to understand in order to write correct queries in TSQL
Another article I would recommend is SQL Aggregate Functions and NULL.
Compare to null is undefined, unless you use IS NULL.
So, when comparing 3 to NULL (query A), it returns undefined.
I.e. SELECT 'true' where 3 in (1,2,null)
and
SELECT 'true' where 3 not in (1,2,null)
will produce the same result, as NOT (UNDEFINED) is still undefined, but not TRUE
IF you want to filter with NOT IN for a subquery containg NULLs justcheck for not null
SELECT blah FROM t WHERE blah NOT IN
(SELECT someotherBlah FROM t2 WHERE someotherBlah IS NOT NULL )
The title of this question at the time of writing is
SQL NOT IN constraint and NULL values
From the text of the question it appears that the problem was occurring in a SQL DML SELECT query, rather than a SQL DDL CONSTRAINT.
However, especially given the wording of the title, I want to point out that some statements made here are potentially misleading statements, those along the lines of (paraphrasing)
When the predicate evaluates to UNKNOWN you don't get any rows.
Although this is the case for SQL DML, when considering constraints the effect is different.
Consider this very simple table with two constraints taken directly from the predicates in the question (and addressed in an excellent answer by #Brannon):
DECLARE #T TABLE
(
true CHAR(4) DEFAULT 'true' NOT NULL,
CHECK ( 3 IN (1, 2, 3, NULL )),
CHECK ( 3 NOT IN (1, 2, NULL ))
);
INSERT INTO #T VALUES ('true');
SELECT COUNT(*) AS tally FROM #T;
As per #Brannon's answer, the first constraint (using IN) evaluates to TRUE and the second constraint (using NOT IN) evaluates to UNKNOWN. However, the insert succeeds! Therefore, in this case it is not strictly correct to say, "you don't get any rows" because we have indeed got a row inserted as a result.
The above effect is indeed the correct one as regards the SQL-92 Standard. Compare and contrast the following section from the SQL-92 spec
7.6 where clause
The result of the is a table of those rows of T for
which the result of the search condition is true.
4.10 Integrity constraints
A table check constraint is satisfied if and only if the specified
search condition is not false for any row of a table.
In other words:
In SQL DML, rows are removed from the result when the WHERE evaluates to UNKNOWN because it does not satisfy the condition "is true".
In SQL DDL (i.e. constraints), rows are not removed from the result when they evaluate to UNKNOWN because it does satisfy the condition "is not false".
Although the effects in SQL DML and SQL DDL respectively may seem contradictory, there is practical reason for giving UNKNOWN results the 'benefit of the doubt' by allowing them to satisfy a constraint (more correctly, allowing them to not fail to satisfy a constraint): without this behaviour, every constraints would have to explicitly handle nulls and that would be very unsatisfactory from a language design perspective (not to mention, a right pain for coders!)
p.s. if you are finding it as challenging to follow such logic as "unknown does not fail to satisfy a constraint" as I am to write it, then consider you can dispense with all this simply by avoiding nullable columns in SQL DDL and anything in SQL DML that produces nulls (e.g. outer joins)!
In A, 3 is tested for equality against each member of the set, yielding (FALSE, FALSE, TRUE, UNKNOWN). Since one of the elements is TRUE, the condition is TRUE. (It's also possible that some short-circuiting takes place here, so it actually stops as soon as it hits the first TRUE and never evaluates 3=NULL.)
In B, I think it is evaluating the condition as NOT (3 in (1,2,null)). Testing 3 for equality against the set yields (FALSE, FALSE, UNKNOWN), which is aggregated to UNKNOWN. NOT ( UNKNOWN ) yields UNKNOWN. So overall the truth of the condition is unknown, which at the end is essentially treated as FALSE.
SQL uses three-valued logic for truth values. The IN query produces the expected result:
SELECT * FROM (VALUES (1), (2)) AS tbl(col) WHERE col IN (NULL, 1)
-- returns first row
But adding a NOT does not invert the results:
SELECT * FROM (VALUES (1), (2)) AS tbl(col) WHERE NOT col IN (NULL, 1)
-- returns zero rows
This is because the above query is equivalent of the following:
SELECT * FROM (VALUES (1), (2)) AS tbl(col) WHERE NOT (col = NULL OR col = 1)
Here is how the where clause is evaluated:
| col | col = NULL⁽¹⁾ | col = 1 | col = NULL OR col = 1 | NOT (col = NULL OR col = 1) |
|-----|----------------|---------|-----------------------|-----------------------------|
| 1 | UNKNOWN | TRUE | TRUE | FALSE |
| 2 | UNKNOWN | FALSE | UNKNOWN⁽²⁾ | UNKNOWN⁽³⁾ |
Notice that:
The comparison involving NULL yields UNKNOWN
The OR expression where none of the operands are TRUE and at least one operand is UNKNOWN yields UNKNOWN (ref)
The NOT of UNKNOWN yields UNKNOWN (ref)
You can extend the above example to more than two values (e.g. NULL, 1 and 2) but the result will be same: if one of the values is NULL then no row will match.
Null signifies and absence of data, that is it is unknown, not a data value of nothing. It's very easy for people from a programming background to confuse this because in C type languages when using pointers null is indeed nothing.
Hence in the first case 3 is indeed in the set of (1,2,3,null) so true is returned
In the second however you can reduce it to
select 'true' where 3 not in (null)
So nothing is returned because the parser knows nothing about the set to which you are comparing it - it's not an empty set but an unknown set. Using (1, 2, null) doesn't help because the (1,2) set is obviously false, but then you're and'ing that against unknown, which is unknown.
It may be concluded from answers here that NOT IN (subquery) doesn't handle nulls correctly and should be avoided in favour of NOT EXISTS. However, such a conclusion may be premature. In the following scenario, credited to Chris Date (Database Programming and Design, Vol 2 No 9, September 1989), it is NOT IN that handles nulls correctly and returns the correct result, rather than NOT EXISTS.
Consider a table sp to represent suppliers (sno) who are known to supply parts (pno) in quantity (qty). The table currently holds the following values:
VALUES ('S1', 'P1', NULL),
('S2', 'P1', 200),
('S3', 'P1', 1000)
Note that quantity is nullable i.e. to be able to record the fact a supplier is known to supply parts even if it is not known in what quantity.
The task is to find the suppliers who are known supply part number 'P1' but not in quantities of 1000.
The following uses NOT IN to correctly identify supplier 'S2' only:
WITH sp AS
( SELECT *
FROM ( VALUES ( 'S1', 'P1', NULL ),
( 'S2', 'P1', 200 ),
( 'S3', 'P1', 1000 ) )
AS T ( sno, pno, qty )
)
SELECT DISTINCT spx.sno
FROM sp spx
WHERE spx.pno = 'P1'
AND 1000 NOT IN (
SELECT spy.qty
FROM sp spy
WHERE spy.sno = spx.sno
AND spy.pno = 'P1'
);
However, the below query uses the same general structure but with NOT EXISTS but incorrectly includes supplier 'S1' in the result (i.e. for which the quantity is null):
WITH sp AS
( SELECT *
FROM ( VALUES ( 'S1', 'P1', NULL ),
( 'S2', 'P1', 200 ),
( 'S3', 'P1', 1000 ) )
AS T ( sno, pno, qty )
)
SELECT DISTINCT spx.sno
FROM sp spx
WHERE spx.pno = 'P1'
AND NOT EXISTS (
SELECT *
FROM sp spy
WHERE spy.sno = spx.sno
AND spy.pno = 'P1'
AND spy.qty = 1000
);
So NOT EXISTS is not the silver bullet it may have appeared!
Of course, source of the problem is the presence of nulls, therefore the 'real' solution is to eliminate those nulls.
This can be achieved (among other possible designs) using two tables:
sp suppliers known to supply parts
spq suppliers known to supply parts in known quantities
noting there should probably be a foreign key constraint where spq references sp.
The result can then be obtained using the 'minus' relational operator (being the EXCEPT keyword in Standard SQL) e.g.
WITH sp AS
( SELECT *
FROM ( VALUES ( 'S1', 'P1' ),
( 'S2', 'P1' ),
( 'S3', 'P1' ) )
AS T ( sno, pno )
),
spq AS
( SELECT *
FROM ( VALUES ( 'S2', 'P1', 200 ),
( 'S3', 'P1', 1000 ) )
AS T ( sno, pno, qty )
)
SELECT sno
FROM spq
WHERE pno = 'P1'
EXCEPT
SELECT sno
FROM spq
WHERE pno = 'P1'
AND qty = 1000;
this is for Boy:
select party_code
from abc as a
where party_code not in (select party_code
from xyz
where party_code = a.party_code);
this works regardless of ansi settings
also this might be of use to know the logical difference between join, exists and in
http://weblogs.sqlteam.com/mladenp/archive/2007/05/18/60210.aspx

Execution of ISNULL in SQL Server

Here is how i am using ISNULL condition to check for student address.
It works fine but how ISNULL function treat the null codition i.e the second parameter which is display if first condition is null.
Will it calculate Value for second parameter when first condition is not null?
select
...
...
(CASE
WHEN st.ADDRESS='Y' THEN st.LOCATION
ELSE
ISNULL(
(SELECT TOP 1 STDLOC.LOCATION FROM STDLOC
INNER JOIN COMLOC ON STKLOC.LOCATION=COMLOC.CODE AND COMLOC.ADDRESS='Y'
WHERE STDLOC.ZIBCODE=st.ZIBCODE)
,(SELECT TOP 1 COMLOC.LOCATION FROM COMLOC COMLOC.ZIBCODE=st.ZIBCODE))
END
) AS STDUDENTLOCATION
FROM STUDENT st
Both queries inside the ISNULL will be executed, even if the first query will return a value.
Here is a simple test I've made:
Create and populate sample table:
DECLARE #T AS TABLE
(
Col int
)
INSERT INTO #T Values(1),(2)
SELECT ISNULL(
(SELECT TOP 1 Col FROM #T ORDER BY Col DESC),
(SELECT TOP 1 Col FROM #T ORDER BY Col )
)
Execution plan image:
As you can clearly see, the execution plan includes both queries.
I also was looking for an answer. After some reading I came out with my own way to check it.
Deviding by zero will give an error, so we can try:
SELECT ISNULL( (SELECT TOP 1 object_id FROM sys.columns), 5 / 0)
This will give correct result. BUT
SELECT ISNULL( (SELECT TOP 0 object_id FROM sys.columns), 5 / 0)
It will throw an error, because result of first query gives NULL so it tries the second query which fails
ISNULL is a T-SQL specific function that will use the specified second parameter as the return value if the first parameter is NULL(https://msdn.microsoft.com/en-us/library/ms184325.aspx).
Use COALESCE function if you want to return the first non-null value from multiple arguments, and this is a standard function that is supported by all types of relational databases.
This POST provide a good answer for the question:
Is Sql Server's ISNULL() function lazy/short-circuited?

ISNULL() returning NULL

Why does the following simple query return null when there are no matching rows (<Condition> is not met by any row)?
SELECT ISNULL(MyField, 0) FROM [MyTable] WHERE <Condition>
I have tried COALESCE() as well, with similar results. How can I return zero when there are no matching rows?
This will work, provided you expect condition to reduce the result set to either 0 or 1 row:
SELECT ISNULL((SELECT MyField FROM [MyTable] WHERE <Condition>),0)
That is, create an outer query with no FROM clause (which will therefore always generate exactly one row) and then use a subquery to obtain your 0 or 1 row of actual data.
Use this:
SELECT ISNULL(COUNT(MyField), 0) FROM [MyTable] WHERE <Condition>
It'll return 0 if row is missing.
You cannot convert 0 rows to 1 row with a null value with any sql built-in fuction because that may cause mis-interpretation of data
But you can customize your result using the below logic (same as you do in .net).
If (select COUNT(MyField) FROM [MyTable] WHERE <Condition>)=0
select 0 as [MyField]
else
SELECT ISNULL(MyField, 0) as [MyField] FROM [MyTable] WHERE <Condition>

JOIN ON subselect returns what I want, but surrounding select is missing records when subselect returns NULL

I have a table where I am storing records with a Created_On date and a Last_Updated_On date. Each new record will be written with a Created_On, and each subsequent update writes a new row with the same Created_On, but an updated Last_Updated_On.
I am trying to design a query to return the newest row of each. What I have looks something like this:
SELECT
t1.[id] as id,
t1.[Store_Number] as storeNumber,
t1.[Date_Of_Inventory] as dateOfInventory,
t1.[Created_On] as createdOn,
t1.[Last_Updated_On] as lastUpdatedOn
FROM [UserData].[dbo].[StoreResponses] t1
JOIN (
SELECT
[Store_Number],
[Date_Of_Inventory],
MAX([Created_On]) co,
MAX([Last_Updated_On]) luo
FROM [UserData].[dbo].[StoreResponses]
GROUP BY [Store_Number],[Date_Of_Inventory]) t2
ON
t1.[Store_Number] = t2.[Store_Number]
AND t1.[Created_On] = t2.co
AND t1.[Last_Updated_On] = t2.luo
AND t1.[Date_Of_Inventory] = t2.[Date_Of_Inventory]
WHERE t1.[Store_Number] = 123
ORDER BY t1.[Created_On] ASC
The subselect works fine...I see X number of rows, grouped by Store_Number and Date_Of_Inventory, some of which have luo (Last_Updated_On) values of NULL. However, those rows in the sub-select where luo is null do not appear in the overall results. In other words, where I get 6 results in the sub-select, I only get 2 in the overall results, and its only those rows where the Last_Updated_On is not NULL.
So, as a test, I wrote the following:
SELECT 1 WHERE NULL = NULL
And got no results, but, when I run:
SELECT 1 WHERE 1 = 1
I get back a result of 1. Its as if SQL Server is not relating NULL to NULL.
How can I fix this? Why wouldn't two fields compare when both values are NULL?
You could use Coalesce (example assuming Store_Number is an integer)
ON
Coalesce(t1.[Store_Number],0) = Coalesce(t2.[Store_Number],0)
The ANSI Null comparison is not enabled by default; NULL doesn't equal NULL.
You can enable this (if your business case and your Database design usage of NULL requires this) by the Hint:
SET ansi_nulls off
Another alternative basic turn around using:
ON ((t1.[Store_Number] = t2.[Store_Number]) OR
(t1.[Store_Number] IS NULL AND t2.[Store_Number] IS NULL))
Executing your POC:
SET ansi_nulls off
SELECT 1 WHERE NULL = NULL
Returns:
1
This also works:
AND EXISTS (SELECT t1.Store_Number INTERSECT SELECT t2.Store_Number)

Resources