Truly null-safe comparison - sql-server

Is there a truly safe way of checking two nullable values are not equal in T-SQL that is shorter than this?
where
    A.MyField != B.MyField
    or (
        A.MyField is null
        and B.MyField is not null
    )
    or (
        A.MyField is not null
        and B.MyField is null
    )
Using isnull() isn't truly safe, as it collapses null values into a 'real' value that could potentially exist in the data set, for example:
where
isnull(A.MyField, '') != isnull(B.MyField, '')
would incorrectly think that an empty string '' and null are equal, which is not the desired result. You could come up with a "known" value that never occurs or is exceedingly unlikely to occur, but this seems like a band-aid fix.
Setting ANSI_NULLS off is also undesirable for several reasons (most particularly that the feature is being deprecated).
Is there functionality that will do a "true" null-safe check, or is the code above the best way?

From SQL Server 2005 onward you can use the following, which works because EXCEPT treats two NULL values as equal when comparing rows:
WHERE EXISTS (SELECT A.MyField
EXCEPT
SELECT B.MyField)
From SQL Server 2022 you can use
WHERE A.MyField IS DISTINCT FROM B.MyField
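To see how these predicates differ, here is a minimal sketch that evaluates them with Python's built-in sqlite3 module. SQLite is only a stand-in for SQL Server here: it follows the same ANSI rules for comparisons with NULL (as SQL Server does with ANSI_NULLS ON), and its EXCEPT likewise treats two NULLs as duplicates. SQLite has no ISNULL(), so COALESCE() plays that role, and SQLite's own null-safe IS NOT operator stands in for IS DISTINCT FROM. The names PREDICATES, PAIRS and differs are hypothetical.

```python
import sqlite3

# In-memory SQLite database used as a stand-in engine (not SQL Server).
conn = sqlite3.connect(":memory:")

# Each predicate should be TRUE exactly when :a and :b are "not the same value".
PREDICATES = {
    # Long-hand form from the question.
    "or_form": ":a <> :b OR (:a IS NULL AND :b IS NOT NULL) OR (:a IS NOT NULL AND :b IS NULL)",
    # EXCEPT treats two NULLs as duplicates, so the subquery is empty when a and b match.
    "except_form": "EXISTS (SELECT :a EXCEPT SELECT :b)",
    # Sentinel form: COALESCE stands in for T-SQL ISNULL, and '' is the "magic" value.
    "sentinel_form": "COALESCE(:a, '') <> COALESCE(:b, '')",
    # SQLite's null-safe inequality, analogous to IS DISTINCT FROM.
    "is_not_form": ":a IS NOT :b",
}

PAIRS = [(None, None), (None, ""), ("", None), ("", ""), ("x", "y"), ("x", "x"), ("x", None)]


def differs(predicate: str, a, b) -> bool:
    """Return True only if the predicate is TRUE; UNKNOWN counts as no match, as in WHERE."""
    sql = f"SELECT CASE WHEN {predicate} THEN 1 ELSE 0 END"
    (result,) = conn.execute(sql, {"a": a, "b": b}).fetchone()
    return result == 1


for a, b in PAIRS:
    verdicts = {name: differs(sql, a, b) for name, sql in PREDICATES.items()}
    print(repr(a), repr(b), verdicts)
```

In the printed table, only sentinel_form reports NULL and '' as equal; the other three agree with each other on every pair.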

Related

Why does the SQL language use IS NULL or IS NOT NULL instead of = NULL or <> NULL?

I mostly have an application development background. In programming languages, variable == null or variable != null works.
When it comes to SQL, the queries below don't give any syntax errors, but they don't return correct results either.
select SomeColumn from SomeTable where SomeNullableColumn=null
select SomeColumn from SomeTable where SomeNullableColumn<>null
Why do we need to write the queries with is null or is not null to get correct results?
select SomeColumn from SomeTable where SomeNullableColumn is null
select SomeColumn from SomeTable where SomeNullableColumn is not null
What are the reasons/requirements behind is null or is not null instead of =null or <>null?
Is this ANSI SQL or Microsoft's TSQL standard?
Because null is not equal to null, just as NaN ("not a number", produced for example when 0.0 is divided by 0.0) is not equal to NaN. Think of null as an unknown value: an unknown value is not necessarily equal to some other unknown value.
Because SQL is based on set theory and predicate logic. Classical Boolean algebra has only two values, but SQL extends it to a three-valued logic: TRUE, FALSE and UNKNOWN, and any comparison involving NULL yields UNKNOWN.
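The three truth values can be observed directly. A minimal sketch, using Python's sqlite3 module as a stand-in engine because T-SQL cannot SELECT a bare boolean expression: SQLite follows the same ANSI truth tables, reporting TRUE/FALSE as 1/0 and UNKNOWN as NULL (None in Python). The helper evaluate is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Each expression is evaluated by SQLite; 1/0 mean TRUE/FALSE and None means UNKNOWN.
EXPRESSIONS = [
    "NULL = NULL",     # UNKNOWN: two unknowns are not known to be equal
    "NULL <> NULL",    # UNKNOWN as well
    "NULL AND 0",      # FALSE: FALSE AND anything is FALSE
    "NULL AND 1",      # UNKNOWN
    "NULL OR 1",       # TRUE: TRUE OR anything is TRUE
    "NULL OR 0",       # UNKNOWN
    "NOT (NULL = 1)",  # NOT UNKNOWN is still UNKNOWN
    "NULL IS NULL",    # IS NULL always yields TRUE or FALSE
]


def evaluate(expr: str):
    (value,) = conn.execute(f"SELECT {expr}").fetchone()
    return value


for expr in EXPRESSIONS:
    print(f"{expr:<16} -> {evaluate(expr)}")
```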
IS NULL is used instead of = NULL to keep the three-valued logic separate: = NULL itself evaluates to UNKNOWN, whereas IS NULL always returns TRUE or FALSE. This is part of the ANSI SQL standard, not a T-SQL invention, and it also makes the intent more readable.
In some data warehouses NULL is replaced by '' to keep queries simpler.
Or try the SET ANSI_NULLS OFF option.
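The queries from the question can be reproduced with a minimal sketch, using Python's sqlite3 module as a stand-in for SQL Server with its default ANSI_NULLS ON behaviour (the two-row table contents are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SomeTable (SomeColumn TEXT, SomeNullableColumn INTEGER)")
conn.executemany(
    "INSERT INTO SomeTable VALUES (?, ?)",
    [("has value", 1), ("is null", None)],
)


def rows(where: str) -> list:
    """Return SomeColumn for every row the WHERE clause keeps."""
    sql = f"SELECT SomeColumn FROM SomeTable WHERE {where} ORDER BY SomeColumn"
    return [r[0] for r in conn.execute(sql)]


print(rows("SomeNullableColumn = NULL"))       # [] : the comparison is UNKNOWN for every row
print(rows("SomeNullableColumn <> NULL"))      # [] : likewise
print(rows("SomeNullableColumn IS NULL"))      # ['is null']
print(rows("SomeNullableColumn IS NOT NULL"))  # ['has value']
```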

Is Sql Server's ISNULL() function lazy/short-circuited?

Is ISNULL() a lazy function?
That is, if I code something like the following:
SELECT ISNULL(MYFIELD, getMyFunction()) FROM MYTABLE
will it always evaluate getMyFunction() or will it only evaluate it in the case where MYFIELD is actually null?
This works fine:
declare @X int
set @X = 1
select isnull(@X, 1/0)
But introducing an aggregate makes it fail, proving that the second argument can sometimes be evaluated before the first:
declare @X int
set @X = 1
select isnull(@X, min(1/0))
It's whichever it thinks will work best.
Now it's functionally lazy, which is the important thing. E.g. if col1 is a varchar which will always contain a number when col2 is null, then
isnull(col2, cast(col1 as int))
Will work.
However, it's not specified whether it will try the cast before or simultaneously with the null-check and eat the error if col2 isn't null, or if it will only try the cast at all if col2 is null.
At the very least, we would expect it to obtain col1 in any case because a single scan of a table obtaining 2 values is going to be faster than two scans obtaining one each.
The same SQL commands can be executed in very different ways, because the instructions we give are turned into lower-level operations based on knowledge of the indices and statistics about the tables.
For that reason, in terms of performance, the answer is "when it seems like it would be a good idea it is, otherwise it isn't".
In terms of observed behaviour, it is lazy.
Edit: Mikael Eriksson's answer shows that there are cases that may indeed error due to not being lazy. I'll stick by my answer here in terms of the performance impact, but his is vital in terms of correctness impact across at least some cases.
Judging from the different behavior of
SELECT ISNULL(1, 1/0)
SELECT ISNULL(NULL, 1/0)
the first SELECT returns 1; the second raises Msg 8134, Level 16, State 1, Line 4: Divide by zero error encountered.
This "lazy" feature you are referring to is in fact called "short-circuiting",
and it does NOT always work, especially if you have a UDF in the ISNULL expression.
Check this article where tests were run to prove it:
Short-circuiting (mainly in VB.Net and SQL Server)
T-SQL is a declarative language, hence it cannot control the algorithm used to get the results; it just declares what results it needs. It is up to the query engine/optimizer to figure out a cost-effective plan. And in SQL Server, the optimizer uses "contradiction detection", which will never guarantee left-to-right evaluation as you would assume in procedural languages.
For your example, I did a quick test:
I created a scalar-valued UDF that raises a divide-by-zero error:
CREATE FUNCTION dbo.getMyFunction
( @MyValue INT )
RETURNS INT
AS
BEGIN
RETURN (1/0)
END
GO
Running the query below did not give me a "Divide by zero error encountered" error.
DECLARE @test INT
SET @test = 1
SET @test = ISNULL(@test, (dbo.getMyFunction(1)))
SELECT @test
Changing the SET to the statement below did give me the "Divide by zero error encountered." error (I introduced a SELECT inside ISNULL):
SET @test = ISNULL(@test, (SELECT dbo.getMyFunction(1)))
But with values instead of variables, it never gave me the error.
SELECT ISNULL(1, (dbo.getMyFunction(1)))
SELECT ISNULL(1, (SELECT dbo.getMyFunction(1)))
So unless you really figure out how the optimizer evaluates these expressions for all permutations, it is safer not to rely on the short-circuit capabilities of T-SQL.

NULL vs empty string

What is the difference between the queries below, and how do they work?
SELECT * FROM some_table WHERE col IS NOT NULL
and
SELECT * FROM some_table WHERE col <> ''
Regards,
Mubarak
NULL is a special marker, not a data type; it means the absence of a value.
An empty string on the other hand means a string or value which is empty.
Both are different.
For example, if you have a name field in a table whose default is NULL, it will be NULL when no value is specified for it. If you specify a real name or an empty string, it won't be NULL: it will contain that name or the empty string instead.
NULL is the absence of value, and usually indicates something meaningful, such as unknown or not (yet) determined. For example, if I start a project today, the StartDate is 2012-02-25. If I don't know how long the project is going to take, what should the EndDate be? I might have some idea what the ProjectedEndDate may be, but I would set the EndDate to NULL, and update it when the project is complete.
'' is a zero-length (or "empty") string. It is not technically the absence of data, since it might actually be meaningful. For example, if I don't have a middle name, depending on your data model '' might make more sense than NULL since the latter implies unknown but '' can imply that it is known that I don't have one. NULL can be used the same way of course, but then it is difficult to decipher whether it is not known whether it exists, or known that it does not exist. A lot of standards have dedicated values for things where it might not be known - for example Gender has I believe 9 different character codes so that if M or F are not specified, we always know exactly why (unknown, unspecified, transgender, etc). Also think of the case where HeartRate is NULL - is it because there was no pulse, or because we haven't taken it yet?
They are not the same, though unfortunately many people treat them the same. If your column allows NULL it means that you know in advance that sometimes you may not know this information. If you are not treating them as the same thing, then your queries would differ. For example if col does not allow NULL, your first query will always return all results in the table, since none of them can be NULL. However NOT NULL still allows an empty string to be entered unless you have also set up a check constraint to prevent zero-length strings also.
Allowing both for the same column is usually a bit confusing for someone trying to understand the data model, though I believe in most cases a NOT NULL constraint is not combined with a LEN(col)>0 check constraint. The problem if both are allowed is that it is difficult to know what it means if the column is NULL or the column is "empty" - they could mean the same thing, but they may not - and this will vary from shop to shop.
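A minimal sketch of the NOT NULL plus length check combination, using Python's sqlite3 module as a stand-in; SQLite's length() replaces T-SQL's LEN() (note that LEN() ignores trailing spaces, so the two differ for a string of only spaces). The table person and the helper try_insert are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOT NULL forbids NULL; the CHECK additionally forbids zero-length strings.
# A CHECK on its own would not reject NULL, because a CHECK only fails when it is FALSE,
# and length(NULL) > 0 is UNKNOWN.
conn.execute("CREATE TABLE person (name TEXT NOT NULL CHECK (length(name) > 0))")


def try_insert(value) -> bool:
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        conn.execute("INSERT INTO person (name) VALUES (?)", (value,))
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert("Ada"))  # accepted
print(try_insert(""))     # rejected by the CHECK constraint
print(try_insert(None))   # rejected by NOT NULL
```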
Another key point is that comparing NULL to anything (at least by default in SQL Server*) evaluates to UNKNOWN, and a WHERE clause only keeps rows for which the predicate is TRUE, so UNKNOWN behaves like false there. As an example, these queries all return 0:
DECLARE @x TABLE(i INT);
INSERT @x VALUES(NULL);
SELECT COUNT(*) FROM @x WHERE i = 1;
SELECT COUNT(*) FROM @x WHERE i <> 1;
SELECT COUNT(*) FROM @x WHERE i <= 3;
SELECT COUNT(*) FROM @x WHERE i > 3;
SELECT COUNT(*) FROM @x WHERE i IN (1,2,3);
SELECT COUNT(*) FROM @x WHERE i NOT IN (1,2,3);
Since the comparisons in the WHERE clause always evaluate to UNKNOWN for the NULL row, no row ever meets the criteria and all counts come back as 0.
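The same six queries can be run as a minimal sketch against Python's sqlite3 module, which follows the same ANSI comparison rules as SQL Server's default; a plain table x stands in for the @x table variable:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE x (i INTEGER)")  # stands in for the @x table variable
conn.execute("INSERT INTO x VALUES (NULL)")

PREDICATES = ["i = 1", "i <> 1", "i <= 3", "i > 3", "i IN (1,2,3)", "i NOT IN (1,2,3)"]

# Every predicate is UNKNOWN for the single NULL row, so every count is 0.
counts = {p: conn.execute(f"SELECT COUNT(*) FROM x WHERE {p}").fetchone()[0] for p in PREDICATES}
print(counts)

# IS NULL is the only test here that actually finds the row.
print(conn.execute("SELECT COUNT(*) FROM x WHERE i IS NULL").fetchone()[0])
```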
In addition, the answers to this question on dba.stackexchange might be useful:
https://dba.stackexchange.com/questions/5222/why-shouldnt-we-allow-nulls
* You can change this by using SET ANSI_NULLS OFF - however this is not advised both because it provides non-standard behavior and because this "feature" has been deprecated since SQL Server 2005 and will become a no-op in a future version of SQL Server. But you can play with the query above and see that the NOT IN behaves differently with SET ANSI_NULLS OFF.
NULL means the value is missing, but '' means the value is there and is just an empty string.
So the first query returns all rows where the col value is not missing, and the second one returns the rows where col does not equal the empty string (because NULL <> '' evaluates to UNKNOWN, it also excludes rows where col is NULL).
Update
For further information, I suggest you read this article:
https://sqlserverfast.com/blog/hugo/2007/07/null-the-databases-black-hole/
SELECT * FROM table WHERE col IS NOT NULL would return rows that SELECT * FROM table WHERE col <> '' excludes, because an empty string is also NOT NULL.
https://data.stackexchange.com/stackoverflow/query/62491/http-stackoverflow-com-questions-9444638-null-vs-empty-in-sql-server
SET NOCOUNT ON;
DECLARE @tbl AS TABLE (value varchar(50) NULL, description varchar(50) NOT NULL);
INSERT INTO @tbl VALUES (NULL, 'A Null'), ('', 'Empty String'), ('Some Text', 'A non-empty string');
SELECT * FROM @tbl;
SELECT * FROM @tbl WHERE value IS NOT NULL;
SELECT * FROM @tbl WHERE value <> '';
Note that in the display you cannot distinguish between NULL and '' - this is only an artifact of how the grid and text client display the data, but the data in the set is stored differently for NULL and ''.
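A minimal sketch of the same data in Python's sqlite3 module (standing in for SQL Server; a plain table tbl replaces the @tbl table variable) shows which rows each WHERE clause keeps:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (value TEXT, description TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [(None, "A Null"), ("", "Empty String"), ("Some Text", "A non-empty string")],
)


def descriptions(where: str) -> list:
    """Return the description of every row the WHERE clause keeps."""
    sql = f"SELECT description FROM tbl WHERE {where} ORDER BY description"
    return [r[0] for r in conn.execute(sql)]


print(descriptions("value IS NOT NULL"))  # the empty string and the non-empty string
print(descriptions("value <> ''"))        # only the non-empty string; NULL <> '' is UNKNOWN
```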
As stated in other answers, NULL means 'no value' while empty string '' means just that - empty string. You can think of fields that allow NULLs as optional fields - they can be ignored and value for them may just not be provided.
Imagine an application where a respondent selects their title (Mr, Mrs, Miss, Dr), but you do not require them to select any of those, so they can leave it blank. In this case you would put NULL into the relevant database field.
The distinction between NULL and an empty string may not be obvious, because both can mean 'no value' if you decide so. It is entirely up to you, but using NULL would be better mainly because it is a special case that databases are designed to handle quickly and efficiently (much faster than strings). If you use it instead of an empty string, your queries will be faster and more reliable.

MyNullableCol <> 'MyValue' Doesn't Include Rows where MyNullableCol IS NULL

Today I found a strange problem: I have a table with a nullable column, and I tried the following query
SELECT
Id,
MyNCol
FROM dbo.[MyTable]
WHERE MyNCol <> 'MyValue'
expecting it to return all rows that don't have the value 'MyValue' in the field MyNCol. But it's not returning the rows containing NULL in that column (MyNCol).
So I had to rewrite my query as
SELECT
Id,
MyNCol
FROM dbo.[MyTable]
WHERE MyNCol <> 'MyValue' OR MyNCol IS NULL
Now my question is: why is the first query not sufficient to do what I want? :(
Regards
Mubashar
Look into the behaviour of "ANSI NULLs".
By default, NULL = NULL is not true (it evaluates to UNKNOWN). You can change this behaviour by altering the ANSI NULLS Enabled property on your database, but this has far-ranging effects on column storage and query execution. Alternatively, you can use the SET ANSI_NULLS statement in your SQL script.
MSDN Documentation for SQL Server 2008 is here:
SET ANSI_NULLS (Transact-SQL)
Pay particular attention to the details of how comparisons are performed - even when setting ANSI_NULLS OFF, column to column comparisons may not behave as you intended. It is best practice to always use IS NULL and IS NOT NULL.
Equality comparisons with NULL never evaluate to true. In SQL 2000 you could check null = null; however, in 2005 this changed.
NULL = UNKNOWN. So you can neither say that MyNCol = 'My Value' nor that MyNCol <> 'My Value' - both evaluate to UNKNOWN (just like 'foo' + MyNCol will still become NULL). An alternative to the workaround you posted is:
WHERE COALESCE(MyNCol, 'Not my Value') <> 'My Value';
This will include the NULL rows, because for them the comparison becomes true rather than unknown. I don't like this option any better, because it relies on a magic value in the comparison, even though that magic value is only supplied when the column is NULL.
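A minimal sketch comparing the original query with the two workarounds, using Python's sqlite3 module as a stand-in for SQL Server (three made-up rows; SQLite has no dbo schema, so the table is just MyTable):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (Id INTEGER PRIMARY KEY, MyNCol TEXT)")
conn.executemany(
    "INSERT INTO MyTable (Id, MyNCol) VALUES (?, ?)",
    [(1, "MyValue"), (2, "Other"), (3, None)],
)


def ids(where: str) -> list:
    """Return the Ids of the rows the WHERE clause keeps."""
    return [r[0] for r in conn.execute(f"SELECT Id FROM MyTable WHERE {where} ORDER BY Id")]


print(ids("MyNCol <> 'MyValue'"))                             # NULL row dropped: UNKNOWN is not TRUE
print(ids("MyNCol <> 'MyValue' OR MyNCol IS NULL"))           # NULL row kept explicitly
print(ids("COALESCE(MyNCol, 'Not my Value') <> 'MyValue'"))  # NULL row kept via a sentinel
```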

Is there a setting in SQL Server to have null = null evaluate to true?

Is there a setting in SQL Server to have null = null evaluate to true?
It is not SQL Server's fault, it is due to the ternary logic introduced by the NULL value. null=null will never be true, and null <> null is not true either.
You could use ANSI_NULLS OFF, but:
"In a future version of SQL Server, ANSI_NULLS will always be ON and any applications that explicitly set the option to OFF will generate an error. Avoid using this feature in new development work, and plan to modify applications that currently use this feature."
Wouldn't COALESCE do what you need?
From the MSDN Documentation:
SET ANSI_NULLS OFF
will create the following comparison results:
10 = NULL       False
NULL = NULL     True
10 <> NULL      True
NULL <> NULL    False
With the following setting, which is the default:
SET ANSI_NULLS ON
the same comparisons will give these results:
10 = NULL       NULL (Unknown)
NULL = NULL     NULL (Unknown)
10 <> NULL      NULL (Unknown)
NULL <> NULL    NULL (Unknown)
Edit: Ok, so per comments, I tried some specific SQL to verify the claims that this did not work, and here's what I found:
SET ANSI_NULLS OFF
CREATE TABLE TestTable (USERNAME VARCHAR(20))
INSERT INTO TestTable VALUES (NULL)
SELECT * FROM TestTable WHERE USERNAME = USERNAME
SELECT * FROM TestTable WHERE USERNAME = NULL
Produces this output:
[USERNAME]
(0 row(s) affected)
[USERNAME]
NULL
(1 row(s) affected)
So the setting does not work the way I expected: it only affects comparisons where one side is a NULL literal or a NULL variable, and column-to-column comparisons such as USERNAME = USERNAME are unaffected. I've only seen and used this setting in one particular reporting query, so I wasn't aware of that difference.
Then there is no setting that works.
Even if it did work, as per other answers here, relying on this setting is a bad idea since it will be yanked out from SQL Server in a future version.
Oh well.
Yeah turning that off doesn't seem good in general.
I am just mentioning this in case you are doing a comparison where either operand might be null:
val IS NULL can be used to test whether something is null.
NULL = NULL is not true: comparing one unknown value with another yields UNKNOWN, which a WHERE clause treats like false.
I will avoid the ansi_nulls setting and use isnull function although it could complicate certain queries.
We're possibly dealing with this at the wrong level of abstraction. It appears that you have a query where you've stated it in such a fashion that you aren't getting the result you expect. So you've asked a question about how to change the database so it will give you what you expect, rather than what the database understood.
Wouldn't it be better to ask about how to restate your query so the database understands it as you intended?
Then you finish off with the assertion that it will complicate other queries. I think there was a general reaction of "ouch" from a lot of us who read that, because, given your apparent understanding of NULLs, that's probably true.
We might be able to help you with your understanding of NULLs if you tell us what it is that you think will cause these problems.
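As an illustration of restating the query rather than changing the setting, here is a minimal sketch of a join condition that explicitly treats two NULL keys as matching, using Python's sqlite3 module as a stand-in engine; the tables a and b and their rows are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (k TEXT)")
conn.execute("CREATE TABLE b (k TEXT)")
conn.executemany("INSERT INTO a VALUES (?)", [("x",), (None,)])
conn.executemany("INSERT INTO b VALUES (?)", [("x",), (None,)])


def match_count(condition: str) -> int:
    """Count the row pairs that the join condition accepts."""
    return conn.execute(f"SELECT COUNT(*) FROM a JOIN b ON {condition}").fetchone()[0]


print(match_count("a.k = b.k"))                                   # the NULL keys never match
print(match_count("a.k = b.k OR (a.k IS NULL AND b.k IS NULL)"))  # NULL now matches NULL
```

The query says exactly what is intended, so no session or database setting is needed.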
