What is the difference between != and is not? And Nulls in general - sql-server

I am having a bit of difficulty understanding how t-sql treats null values.
As a C# guy, I tend to want to do
IF(#myVar != null)...
But, this doesn't ever appear to run my code. So I do
IF(#myVar is not null)
Whats the difference?
Second, the way addition works is unclear. Let's say I have
declare #someCount int, #someFinalResult int
--Select that returns null
SELECT #someCount = columnName from tableName where someColumn = someValue
Then if I do
SET #someFinalResult = #someCount + 1--I seem to get NULL if I had null + something
But, if I first
declare #someCount int, #someFinalResult int
--FIRST SET DEFAULT TO 0
SET #someCount = 0
--Select that returns null
SELECT #someCount = columnName from tableName where someColumn = someValue
Now, #someCount defaults to 0, it's not actually set equal to NULL even if the result is null. Why?

When you deal with NULL in SQL Server you basically work with 3-value logic with all the implications.
So in your example
IF(#myVar != null) vs IF(#myVar is not null)
It basically boils down to the question what is the difference between: #myVar = null vs #myVar is null
#myVar = null will always evaluate to null as what you are asking is:
is the value in #myVar equal to UNKNOWN
As you do not know what the UNKNOWN is this question cannot by answered yes, or no so it evaluates to UNKNOWN
e.g.
"is 1 = UNKNOWN" - I do not know
"is 'a' = UNKNOWN" - I do not know
"is UNKNOWN = UNKNOWN" - I do not know
The last one may be a bit tricky but just imagine that you have 2 boxes with apples and you do not know neither how many apples are in box1 one nor in box2 so asking:
is count(box1) = count(box2)
is the same as
is UNKNOWN = UNKNOWN"
so the answer is I do not know
the second one #myVar is null is different as it is like asking
is the value in #myVar UNKNOWN
so the difference is that you specifically ask "is it true that the value stored in the variable is UNKNOWN?", so
"is 1 UNKNOWN" - NO
"is 'a' UNKNOWN" - NO
"is UNKNOWN UNKNOWN" - YES

Generally, it's like this: NULL is unknown, so !=NULL is also unknown, because you don't know if it's equal or not. And you know even less whether two unknowns are equal. The same goes for more or less any operation with unknowns, when you add something to unknown the result is hardly any more known to you.

Related

SQL Server CHOOSE() function behaving unexpectedly with RAND() function

I've encountered an interesting SQL server behaviour while trying to generate random values in T-sql using RAND and CHOOSE functions.
My goal was to try to return one of two given values using RAND() as rng. Pretty easy right?
For those of you who don't know it, CHOOSE function accepts in an index number(int) along with a collection of values and returns a value at specified index. Pretty straightforward.
At first attempt my SQL looked like this:
select choose(ceiling((rand()*2)) ,'a','b')
To my surprise, this expression returned one of three values: null, 'a' or 'b'. Since I didn't expect the null value i started digging. RAND() function returns a float in range from 0(included) to 1 (excluded). Since I'm multiplying it by 2, it should return values anywhere in range from 0(included) to 2 (excluded). Therefore after use of CEILING function final value should be one of: 0,1,2. After realising that i extended the value list by 'c' to check whether that'd be perhaps returned. I also checked the docs page of CEILING and learnt that:
Return values have the same type as numeric_expression.
I assumed the CEILINGfunction returned int, but in this case would mean that the value is implicitly cast to int before being used in CHOOSE, which sure enough is stated on the docs page:
If the provided index value has a numeric data type other than int,
then the value is implicitly converted to an integer.
Just in case I added an explicit cast. My SQL query looks like this now:
select choose(cast(ceiling((rand()*2)) as int) ,'a','b','c')
However, the result set didn't change. To check which values cause the problem I tried generating the value beforehand and selecting it alongside the CHOOSE result. It looked like this:
declare #int int = cast(ceiling((rand()*2)) as int)
select #int,choose( #int,'a','b','c')
Interestingly enough, now the result set changed to (1,a), (2,b) which was my original goal. After delving deeper in the CHOOSE docs page and some testing i learned that 'null' is returned in one of two cases:
Given index is a null
Given index is out of range
In this case that would mean that index value when generated inside the SELECT statement is either 0 or above 2/3 (I'm assuming that negative numbers are not possible here and CHOOSE function indexes from 1). As I've stated before 0 should be one of possibilities of:
ceiling((rand()*2))
,but for some reason it's never 0 (at least when i tried it 1 million+ times like this)
set nocount on
declare #test table(ceiling_rand int)
declare #counter int = 0
while #counter<1000000
begin
insert into #test
select ceiling((rand()*2))
set #counter=#counter+1
end
select distinct ceiling_rand from #test
Therefore I assume that the value generated in SELECT is greater than 2/3 or NULL. Why would it be like this only when generated in SELECT statement? Perhaps order of resolving CAST, CELING or RAND inside SELECT is different than it would seem? It's true I've only tried it a limited number of times, but at this point the chances of it being a statistical fluctuation are extremely small. Is it somehow a floating-point error? I truly am stumbled and looking forward to any explanation.
TL;DR: When generating a random number inside a SELECT statement result set of possible values is different then when it's generated before the SELECT statement.
Cheers,
NFSU
EDIT: Formatting
You can see what's going on if you look at the execution plan.
SET SHOWPLAN_TEXT ON
GO
SELECT (select choose(ceiling((rand()*2)) ,'a','b'))
Returns
|--Constant Scan(VALUES:((CASE WHEN CONVERT_IMPLICIT(int,ceiling(rand()*(2.0000000000000000e+000)),0)=(1) THEN 'a' ELSE CASE WHEN CONVERT_IMPLICIT(int,ceiling(rand()*(2.0000000000000000e+000)),0)=(2) THEN 'b' ELSE NULL END END)))
The CHOOSE is expanded out to
SELECT CASE
WHEN ceiling(( rand() * 2 )) = 1 THEN 'a'
ELSE
CASE
WHEN ceiling(( rand() * 2 )) = 2 THEN 'b'
ELSE NULL
END
END
and rand() is referenced twice. Each evaluation can return a different result.
You will get the same problem with the below rewrite being expanded out too
SELECT CASE ceiling(( rand() * 2 ))
WHEN 1 THEN 'a'
WHEN 2 THEN 'b'
END
Avoid CASE for this and any of its variants.
One method would be
SELECT JSON_VALUE ( '["a", "b"]' , CONCAT('$[', FLOOR(rand()*2) ,']') )

Is <> 0 condition will Consider Null & 0 as same in SQL

I need to compare variable in case condition in stored procedure.
case when #column <> 0 then ...
ELSE 0 END
but whenever am getting #column as NULL, then the above script returning as 0.
will sql consider both 0 & NULL as same.
Am using sql server 2014.
Thanks All
No. SQL considers NULL as "I have no idea". Comparing anything with "I have no idea" results in an answer of "I totally have no idea". Look:
- How high is John?
- I have no idea.
- What is two centimeters higher than John?
- I have no idea.
Even comparison between two NULL values is not true: if I have no idea how tall John is and if I also have no idea how tall Jack is, I can't conclude that John is equally tall as Jack (and I can't conclude that John is not equally tall as Jack). The only sensible answer is... "I have no idea".
The way to test for NULL is with IS operator, which specifically exists for this scenario (e.g. #column IS NULL, or #column IS NOT NULL).
So NULL is not equal to 0, nor is it NOT equal to 0. The result of NULL <> 0 is NULL. However, NULL is falsy where conditionals are concerned, so CASE thinks you should get the ELSE branch any time #column is NULL.
In case if you want to execute the then part of case if the column value is null, then modify your condition to check for nulls also
CASE WHEN (#column <> 0 OR #column IS NULL) then ...
ELSE 0 END

Why does SUM(...) on an empty recordset return NULL instead of 0?

I understand why null + 1 or (1 + null) returns null: null means "unknown value", and if a value is unknown, its successor is unknown as well. The same is true for most other operations involving null.[*]
However, I don't understand why the following happens:
SELECT SUM(someNotNullableIntegerField) FROM someTable WHERE 1=0
This query returns null. Why? There are no unknown values involved here! The WHERE clause returns zero records, and the sum of an empty set of values is 0.[**] Note that the set is not unknown, it is known to be empty.
I know that I can work around this behaviour by using ISNULL or COALESCE, but I'm trying to understand why this behaviour, which appears counter-intuitive to me, was chosen.
Any insights as to why this makes sense?
[*] with some notable exceptions such as null OR true, where obviously true is the right result since the unknown value simply does not matter.
[**] just like the product of an empty set of values is 1. Mathematically speaking, if I were to extend $(Z, +)$ to $(Z union {null}, +)$, the obvious choice for the identity element would still be 0, not null, since x + 0 = x but x + null = null.
The ANSI-SQL-Standard defines the result of the SUM of an empty set as NULL. Why they did this, I cannot tell, but at least the behavior should be consistent across all database engines.
Reference: http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt on page 126:
b) If AVG, MAX, MIN, or SUM is specified, then
Case:
i) If TXA is empty, then the result is the null value.
TXA is the operative resultset from the selected column.
When you mean empty table you mean a table with only NULL values, That's why we will get NULL as output for aggregate functions. You can consider this as by design for SQL Server.
Example 1
CREATE TABLE testSUMNulls
(
ID TINYINT
)
GO
INSERT INTO testSUMNulls (ID) VALUES (NULL),(NULL),(NULL),(NULL)
SELECT SUM(ID) FROM testSUMNulls
Example 2
CREATE TABLE testSumEmptyTable
(
ID TINYINT
)
GO
SELECT SUM(ID) Sums FROM testSumEmptyTable
In both the examples you will NULL as output..

Is SQL Server's double checking needed here?

IF #insertedValue IS NOT NULL AND #insertedValue > 0
This logic is in a trigger.
The value comes from a deleted or inserted row (doesn't matter).
2 questions :
Do I need to check both conditions? (I want all value > 0, value in db can be nullable)
Does SQL Server check the expression in the order I wrote it ?
1) Actually, no, since if the #insertedValue is NULL, the expression #insertedValue > 0 will evaulate to false. (Actually, as Martin Smith points out in his comment, it will evaluate to a special value "unknown", which when forced to a Boolean result on its own collapses to false - examples: unknown AND true = unknown which is forced to false, unknown OR true = true.) But you're relying on comparison behaviour with NULL values. A single step equivalent method, BTW, would be:
IF ISNULL(#insertedValue, 0) > 0
IMHO, you're better sticking with the explicit NULL check for clarity if nothing else.
2) Since the query will be optimised before execution, there is absolutely no guarantee of order of execution or short circuiting of the AND operator.
Combining the two - if the double check is truly unnecessary, then it will probably be optimised out before execution anyway, but your SQL code will be more maintainable in my view if you make this explicit.
You can use COALESCE => Returns the first nonnull expression among its arguments.
Now you can make the query more flexible, by increasing the column limits and again you need to check the Greater Then Zero condition. Important point to note down here is you have the option to check values in multiple columns.
declare #val int
set #val = COALESCE( null, 1, 10 )
if(#val>0)
select 'fine'
else
select 'not fine'

Is there an Alternate for Where is Null using Where = Null?

If not, is there an alternate way to switch through SELECT statements using a CASE or IF/THEN identifier WITHOUT putting the statement in a scalar variable first?
Is there a way to format this without using IS and using an = sign for it to work?
SELECT ID FROM TABLE WHERE ID = Null
No. NULL isn't a value. Think of NULL as a condition, with IS NULL or IS NOT NULL is testing for this condition.
In this example you can test for the actual value, or lack of value represented by a conditon
WHERE
(X IS NULL OR X = #X)
OR
WHERE
(#X IS NULL OR X = #X)
Or test for your definite conditions first:
WHERE
CASE X
WHEN 1 THEN
WHEN 2 THEN
ELSE -- includes NULL
END = ...
Your question is abstract so hard to give a more precise answer.
For example, are you having problems with NOT IN and NULL? If so, use NOT EXISTS.

Resources