Is there an alternative to WHERE IS NULL using WHERE = NULL? - sql-server

Is there a way to write this with an = sign instead of IS, and have it work?
SELECT ID FROM TABLE WHERE ID = Null
If not, is there an alternative way to switch between SELECT statements using CASE or IF/THEN WITHOUT putting the statement in a scalar variable first?

No. NULL isn't a value. Think of NULL as a condition, with IS NULL or IS NOT NULL testing for this condition.
In this example you can test for the actual value, or for the lack of a value represented by the condition:
WHERE
(X IS NULL OR X = @X)
or, if a NULL parameter should match every row:
WHERE
(@X IS NULL OR X = @X)
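The filters above can be checked with a small sketch. It uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for SQL Server (an assumption: SQLite follows the same standard NULL comparison rules here, but it is not T-SQL), a hypothetical table t, and positional ? parameters in place of @X.

```python
import sqlite3

# SQLite (via Python's built-in sqlite3 module) stands in for SQL Server here;
# both follow the SQL standard for comparisons with NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, x INTEGER)")  # hypothetical table
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, 10), (2, 20), (3, None)])


def ids(sql, params=()):
    """Run a query and return the sorted ids it produces."""
    return sorted(row[0] for row in conn.execute(sql, params))


# x = NULL evaluates to UNKNOWN for every row, so nothing is returned.
eq_null = ids("SELECT id FROM t WHERE x = NULL")

# x IS NULL tests the NULL condition and finds row 3.
is_null = ids("SELECT id FROM t WHERE x IS NULL")

# (x IS NULL OR x = @X) with @X = 10: the matching row plus the NULL row.
match_or_null = ids("SELECT id FROM t WHERE (x IS NULL OR x = ?)", (10,))

# (@X IS NULL OR x = @X): a NULL parameter disables the filter.
# The positional ? needs the value twice because @X appears twice.
filtered = ids("SELECT id FROM t WHERE (? IS NULL OR x = ?)", (10, 10))
unfiltered = ids("SELECT id FROM t WHERE (? IS NULL OR x = ?)", (None, None))

print(eq_null, is_null, match_or_null, filtered, unfiltered)
```

Run as is, it prints [] [3] [1, 3] [1] [1, 2, 3]: = NULL matches nothing, while the IS NULL forms behave as described.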
Or test for your definite conditions first:
WHERE
CASE X
WHEN 1 THEN
WHEN 2 THEN
ELSE -- includes NULL
END = ...
Your question is abstract, so it is hard to give a more precise answer.
For example, are you having problems with NOT IN and NULL? If so, use NOT EXISTS.
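To see why NOT IN misbehaves with NULL, here is a minimal sketch, again using Python's sqlite3 module as a stand-in for SQL Server, with hypothetical customers and orders tables in which one order has a NULL customer_id.

```python
import sqlite3

# Hypothetical tables: which customers have no orders?
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER);
    CREATE TABLE orders (customer_id INTEGER);
    INSERT INTO customers VALUES (1), (2), (3);
    INSERT INTO orders VALUES (1), (NULL);  -- one order has no customer
""")

# id NOT IN (1, NULL) means id <> 1 AND id <> NULL; the second test is
# UNKNOWN, so no customer qualifies.
not_in = conn.execute(
    "SELECT id FROM customers "
    "WHERE id NOT IN (SELECT customer_id FROM orders) ORDER BY id"
).fetchall()

# NOT EXISTS only asks whether a matching row exists, so the NULL
# customer_id simply never matches anything.
not_exists = conn.execute(
    "SELECT id FROM customers AS c "
    "WHERE NOT EXISTS (SELECT 1 FROM orders AS o WHERE o.customer_id = c.id) "
    "ORDER BY id"
).fetchall()

print(not_in, not_exists)
```

NOT IN returns no rows at all, while NOT EXISTS returns customers 2 and 3.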

Related

SQL Server CHOOSE() function behaving unexpectedly with RAND() function

I've encountered an interesting SQL Server behaviour while trying to generate random values in T-SQL using the RAND and CHOOSE functions.
My goal was to return one of two given values, using RAND() as the RNG. Pretty easy, right?
For those of you who don't know it, the CHOOSE function accepts an index number (int) along with a list of values and returns the value at the specified index. Pretty straightforward.
On my first attempt, my SQL looked like this:
select choose(ceiling((rand()*2)) ,'a','b')
To my surprise, this expression returned one of three values: NULL, 'a' or 'b'. Since I didn't expect NULL, I started digging. The RAND() function returns a float in the range from 0 (included) to 1 (excluded). Since I'm multiplying it by 2, the result should be anywhere in the range from 0 (included) to 2 (excluded). Therefore, after applying CEILING, the final value should be one of 0, 1 or 2. After realising that, I extended the value list with 'c' to check whether it would ever be returned. I also checked the docs page of CEILING and learnt that:
Return values have the same type as numeric_expression.
I had assumed that CEILING returned int, but since it returns the type of its input (float here), the value must be implicitly converted to int before being used in CHOOSE, which, sure enough, is stated on the docs page:
If the provided index value has a numeric data type other than int, then the value is implicitly converted to an integer.
Just in case I added an explicit cast. My SQL query looks like this now:
select choose(cast(ceiling((rand()*2)) as int) ,'a','b','c')
However, the result set didn't change. To check which values cause the problem I tried generating the value beforehand and selecting it alongside the CHOOSE result. It looked like this:
declare @int int = cast(ceiling((rand()*2)) as int)
select @int, choose(@int,'a','b','c')
Interestingly enough, the result set now changed to (1, a), (2, b), which was my original goal. After delving deeper into the CHOOSE docs page and doing some testing, I learned that NULL is returned in one of two cases:
Given index is a null
Given index is out of range
In this case that would mean that the index value, when generated inside the SELECT statement, is either 0 or greater than 2 (or 3, with 'c' added). I'm assuming that negative numbers are not possible here and that CHOOSE indexes from 1. As I've stated before, 0 should be one of the possible results of:
ceiling((rand()*2))
but for some reason it is never 0 (at least not in the more than 1 million attempts I made like this):
set nocount on
declare @test table(ceiling_rand int)
declare @counter int = 0
while @counter<1000000
begin
insert into @test
select ceiling((rand()*2))
set @counter=@counter+1
end
select distinct ceiling_rand from @test
Therefore I assume that the value generated in the SELECT is greater than 2 (or 3) or NULL. Why would that happen only when it is generated in the SELECT statement? Perhaps the order of resolving CAST, CEILING and RAND inside the SELECT is different than it seems? It's true I've only tried it a limited number of times, but at this point the chances of it being a statistical fluctuation are extremely small. Is it somehow a floating-point error? I am truly stumped and looking forward to any explanation.
TL;DR: When a random number is generated inside a SELECT statement, the set of possible results is different than when it is generated before the SELECT statement.
Cheers,
NFSU
You can see what's going on if you look at the execution plan.
SET SHOWPLAN_TEXT ON
GO
SELECT (select choose(ceiling((rand()*2)) ,'a','b'))
Returns
|--Constant Scan(VALUES:((CASE WHEN CONVERT_IMPLICIT(int,ceiling(rand()*(2.0000000000000000e+000)),0)=(1) THEN 'a' ELSE CASE WHEN CONVERT_IMPLICIT(int,ceiling(rand()*(2.0000000000000000e+000)),0)=(2) THEN 'b' ELSE NULL END END)))
The CHOOSE is expanded out to
SELECT CASE
WHEN ceiling(( rand() * 2 )) = 1 THEN 'a'
ELSE
CASE
WHEN ceiling(( rand() * 2 )) = 2 THEN 'b'
ELSE NULL
END
END
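and rand() is referenced twice. Each evaluation can return a different result, so if the first call rounds up to 2 and the second rounds up to 1, neither WHEN matches and NULL is returned. Adding 'c' does not help, because its branch would test yet another rand() call against 3, which can never match.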
You will get the same problem with the rewrite below, because it is expanded out in the same way:
SELECT CASE ceiling(( rand() * 2 ))
WHEN 1 THEN 'a'
WHEN 2 THEN 'b'
END
Avoid CASE for this, along with anything that is expanded into it, such as CHOOSE and IIF.
One method would be to evaluate rand() only once and use the result to index a JSON array:
SELECT JSON_VALUE ( '["a", "b"]' , CONCAT('$[', FLOOR(rand()*2) ,']') )
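A quick way to check the explanation above is to simulate the expanded CASE. This Python sketch assumes that each rand() reference is an independent uniform draw on [0, 1) (the exact value 0 is ignored as vanishingly unlikely). Under that assumption the first test returns 'a' half the time, the second returns 'b' only if a fresh draw also rounds up to 2 (probability 1/2 * 1/2 = 1/4), and the remaining 1/4 falls through to NULL. Computing the index once, as the @int variable or the JSON_VALUE version does, removes the NULLs.

```python
import math
import random


def ceil_rand2(rng):
    """Model CEILING(RAND() * 2): rng.random() is uniform on [0, 1)."""
    return math.ceil(rng.random() * 2)


def expanded_choose(rng):
    """Model the plan's nested CASE: every WHEN calls RAND() again."""
    if ceil_rand2(rng) == 1:
        return "a"
    if ceil_rand2(rng) == 2:
        return "b"
    return None  # neither draw matched its branch


def single_eval_choose(rng):
    """Model an index computed once (like the @int variable), then used."""
    index = ceil_rand2(rng)
    # CHOOSE returns NULL for an out-of-range index.
    return ("a", "b")[index - 1] if index in (1, 2) else None


def frequencies(pick, trials=100_000, seed=1):
    """Relative frequency of each result over many simulated queries."""
    rng = random.Random(seed)
    counts = {"a": 0, "b": 0, None: 0}
    for _ in range(trials):
        counts[pick(rng)] += 1
    return {key: n / trials for key, n in counts.items()}


expanded = frequencies(expanded_choose)   # roughly a: 0.5, b: 0.25, None: 0.25
single = frequencies(single_eval_choose)  # roughly a: 0.5, b: 0.5, None: 0.0
print(expanded)
print(single)
```

The simulated NULL rate of about a quarter matches what the question observed, and it only appears in the double-evaluation model.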

SQL Server Case Statement when IS NULL - Other SO questions do not solve this issue

There are 2 questions on SO, but they are different and do not solve this problem. Feel free to test it before marking this as duplicate.
There is a SQLFiddle for this question.
In this example, the cell phone number may be NULL:
ID  Name   Cell
1   John   123
2   Sally  NULL
The query works when the cell number is not null:
DECLARE @Cell NVARCHAR(100) = '123'
SELECT *
FROM Temp
WHERE Cell = CASE WHEN @Cell IS NULL THEN NULL ELSE @Cell END
The same query returns no rows when the cell number is NULL:
DECLARE @Cell NVARCHAR(100) = NULL
SELECT *
FROM Temp
WHERE Cell = CASE WHEN @Cell IS NULL THEN NULL ELSE @Cell END
The question is how to get the CASE WHEN working for both NULL and when it is comparing an actual value. Note that this is a simplified example of the real problem (which has a lot more conditions and additional complexity) and the focus is to get the example working by modifying the CASE WHEN in order to solve the real problem.
NULL isn't equal to anything, including NULL, but you can check whether something is NULL:
WHERE (@Cell IS NULL AND Cell IS NULL) OR Cell = @Cell
You could probably also move the comparison inside the CASE, but this form is at least clear in meaning.
You can do it without the CASE expression, with COALESCE():
DECLARE @Cell NVARCHAR(100) = ?
Select * from Temp
WHERE Cell = @Cell OR COALESCE(Cell, @Cell) IS NULL;
Replace ? with the value that you search for or NULL.
See the demo.
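Both answers can be checked side by side with the question's CASE version. This sketch uses Python's sqlite3 module with an in-memory SQLite copy of the Temp table as a stand-in for the fiddle (an assumption, not the original demo), writing @Cell as the named parameter :cell.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Temp (ID INTEGER, Name TEXT, Cell TEXT)")
conn.executemany(
    "INSERT INTO Temp VALUES (?, ?, ?)",
    [(1, "John", "123"), (2, "Sally", None)],
)

# T-SQL's @Cell becomes the named SQLite parameter :cell.
ORIGINAL = "Cell = CASE WHEN :cell IS NULL THEN NULL ELSE :cell END"
EXPLICIT = "(:cell IS NULL AND Cell IS NULL) OR Cell = :cell"
COALESCED = "Cell = :cell OR COALESCE(Cell, :cell) IS NULL"


def names(predicate, cell):
    """Names of the rows that satisfy `predicate` for the given @Cell value."""
    sql = f"SELECT Name FROM Temp WHERE {predicate} ORDER BY ID"
    return [row[0] for row in conn.execute(sql, {"cell": cell})]


for predicate in (ORIGINAL, EXPLICIT, COALESCED):
    print(names(predicate, "123"), names(predicate, None))
```

The CASE version finds John but returns nothing for a NULL @Cell; both rewrites return Sally in that case.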

Can I write a conditional AND to test for "ISNUMERIC" and then CAST a string to an int in T-SQL?

I thought (hoped) this would work, but it doesn't: it evaluates both sides of the AND even if the left side is going to be false:
SELECT PropertyNumber
FROM Properties
WHERE PropertyNumber = '203a'
AND
(
NOT PropertyNumber LIKE '%[^0-9]%' AND CONVERT(INT,PropertyNumber) > 0
)
So I get:
Conversion failed when converting the nvarchar value '203a' to data type int.
Is there any way to do a conditional AND, or any other way to solve this problem?
You can do this in two steps:
SELECT
OtherField, CAST(PropertyNumber AS INT) AS PropertyNumber
FROM
(SELECT OtherField, PropertyNumber FROM Properties WHERE ISNUMERIC(PropertyNumber) = 1) AS X
This will discard records with non-numeric property numbers. If you want to keep them...
SELECT
P.OtherField, P.PropertyNumber AS PropertyNumberAsText, CAST(X.PropertyNumber AS INT) AS PropertyNumberAsInt
FROM
Properties AS P
LEFT JOIN (SELECT DISTINCT PropertyNumber FROM Properties WHERE ISNUMERIC(PropertyNumber) = 1) AS X ON P.PropertyNumber = X.PropertyNumber
Because ISNUMERIC() approves of some values which cannot be cast to INT, I tend to use something like this:
CASE WHEN Field IN ('-', '.') THEN NULL
WHEN Field LIKE '%,%' THEN NULL
WHEN ISNUMERIC(Field) = 1 THEN CAST(Field AS INT) END
Note that this will reject values like "1,234", which is a valid integer in some regions. Code to identify and handle all possible formats for numeric values is a separate question, and has probably been asked on SO before.
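Try this:
SELECT PropertyNumber
FROM #Temp
WHERE PropertyNumber = '203a'
AND
(
case When IsNumeric('-' + PropertyNumber + '.0e0') = 1
Then Convert(Int, PropertyNumber)
Else NULL
End > 0
)
The trick here is to use the knowledge that IsNumeric will return 1 only under certain conditions. By putting a '-' sign in front of your data, you reject values that already carry a sign, so only unsigned (non-negative) numbers pass. By appending '.0e0' to the value, you reject values that already contain a decimal point or an exponent, such as '1.5' or '1e5', which IsNumeric alone would accept but which cannot be converted to int. Because the conversion sits inside the CASE, Convert is only reached for values that pass this check.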
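The effect of the CASE guard can be sketched in SQLite from Python. SQLite has no ISNUMERIC and its CAST is lenient, so two substitutions are assumptions of this sketch: a hypothetical strict_int() function registered from Python plays the role of CONVERT(INT, ...) by raising on bad input, and a digits-only GLOB test plays the role of the IsNumeric check.

```python
import sqlite3


def strict_int(text):
    """Hypothetical stand-in for T-SQL's CONVERT(INT, ...): raises on bad input.

    SQLite's own CAST is lenient (CAST('203a' AS INTEGER) gives 203), so a
    Python function is registered to mimic SQL Server's conversion error.
    """
    return int(text)  # raises ValueError for '203a' or '1.5'


conn = sqlite3.connect(":memory:")
conn.create_function("strict_int", 1, strict_int)
conn.execute("CREATE TABLE Properties (PropertyNumber TEXT)")
conn.executemany(
    "INSERT INTO Properties VALUES (?)",
    [("203",), ("203a",), ("1.5",), ("0",)],
)

# Unguarded: the conversion runs on every row and fails on '203a'.
try:
    conn.execute("SELECT strict_int(PropertyNumber) FROM Properties").fetchall()
    unguarded_failed = False
except sqlite3.OperationalError:  # how sqlite3 reports a failing Python UDF
    unguarded_failed = True

# Guarded: CASE only reaches strict_int() for digit-only strings, so the
# non-numeric rows produce NULL, and NULL > 0 filters them out.
guarded = [
    row[0]
    for row in conn.execute(
        """
        SELECT PropertyNumber FROM Properties
        WHERE CASE
                WHEN PropertyNumber <> ''
                     AND PropertyNumber NOT GLOB '*[^0-9]*'
                THEN strict_int(PropertyNumber)
              END > 0
        """
    )
]

print(unguarded_failed, guarded)
```

The unguarded query fails, while the guarded one returns only '203' ('0' passes the check but is not greater than 0).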

Why does SUM(...) on an empty recordset return NULL instead of 0?

I understand why null + 1 or (1 + null) returns null: null means "unknown value", and if a value is unknown, its successor is unknown as well. The same is true for most other operations involving null.[*]
However, I don't understand why the following happens:
SELECT SUM(someNotNullableIntegerField) FROM someTable WHERE 1=0
This query returns null. Why? There are no unknown values involved here! The WHERE clause returns zero records, and the sum of an empty set of values is 0.[**] Note that the set is not unknown, it is known to be empty.
I know that I can work around this behaviour by using ISNULL or COALESCE, but I'm trying to understand why this behaviour, which appears counter-intuitive to me, was chosen.
Any insights as to why this makes sense?
[*] with some notable exceptions such as null OR true, where obviously true is the right result since the unknown value simply does not matter.
[**] just like the product of an empty set of values is 1. Mathematically speaking, if I were to extend $(\mathbb{Z}, +)$ to $(\mathbb{Z} \cup \{\text{null}\}, +)$, the obvious choice for the identity element would still be $0$, not null, since $x + 0 = x$ but $x + \text{null} = \text{null}$.
The ANSI-SQL-Standard defines the result of the SUM of an empty set as NULL. Why they did this, I cannot tell, but at least the behavior should be consistent across all database engines.
Reference: http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt on page 126:
b) If AVG, MAX, MIN, or SUM is specified, then
Case:
i) If TXA is empty, then the result is the null value.
TXA is the operative resultset from the selected column.
An empty table and a table containing only NULL values behave the same way here: aggregate functions such as SUM ignore NULLs, so in both cases there are no values to aggregate and the result is NULL. You can consider this by design in SQL Server.
Example 1
CREATE TABLE testSUMNulls
(
ID TINYINT
)
GO
INSERT INTO testSUMNulls (ID) VALUES (NULL),(NULL),(NULL),(NULL)
SELECT SUM(ID) FROM testSUMNulls
Example 2
CREATE TABLE testSumEmptyTable
(
ID TINYINT
)
GO
SELECT SUM(ID) Sums FROM testSumEmptyTable
In both examples you will get NULL as output.
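Both examples, plus the usual COALESCE workaround, can be reproduced with a minimal sketch in Python's sqlite3 module. SQLite is used here as a stand-in; like SQL Server, it returns NULL for SUM over an empty input, as the standard quoted above requires.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE testSUMNulls (ID INTEGER);
    INSERT INTO testSUMNulls (ID) VALUES (NULL), (NULL), (NULL), (NULL);
    CREATE TABLE testSumEmptyTable (ID INTEGER);
""")


def scalar(sql):
    """Return the single value produced by a one-row, one-column query."""
    return conn.execute(sql).fetchone()[0]


# SUM skips NULLs, so both tables leave it with nothing to add up: NULL.
sum_all_nulls = scalar("SELECT SUM(ID) FROM testSUMNulls")
sum_empty = scalar("SELECT SUM(ID) FROM testSumEmptyTable")

# The usual workaround maps the NULL result to 0.
sum_empty_or_zero = scalar("SELECT COALESCE(SUM(ID), 0) FROM testSumEmptyTable")

# COUNT, by contrast, is defined to return 0 for an empty input.
count_empty = scalar("SELECT COUNT(ID) FROM testSumEmptyTable")

print(sum_all_nulls, sum_empty, sum_empty_or_zero, count_empty)
```

This prints None None 0 0.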

What is the difference between != and is not? And Nulls in general

I am having a bit of difficulty understanding how t-sql treats null values.
As a C# guy, I tend to want to do
IF(@myVar != null)...
But this never seems to run my code. So I do
IF(@myVar is not null)
What's the difference?
Second, the way addition works is unclear. Let's say I have
declare @someCount int, @someFinalResult int
--Select that returns null
SELECT @someCount = columnName from tableName where someColumn = someValue
Then if I do
SET @someFinalResult = @someCount + 1 --I seem to get NULL if I had null + something
But, if I first
declare @someCount int, @someFinalResult int
--FIRST SET DEFAULT TO 0
SET @someCount = 0
--Select that returns null
SELECT @someCount = columnName from tableName where someColumn = someValue
Now @someCount stays 0; it is not actually set to NULL even though the result is null. Why?
When you deal with NULL in SQL Server, you are basically working with three-valued logic, with all its implications.
So in your example
IF(@myVar != null) vs IF(@myVar is not null)
it basically boils down to the question of what the difference is between @myVar = null and @myVar is null.
@myVar = null will always evaluate to UNKNOWN (null), because what you are asking is:
is the value in @myVar equal to UNKNOWN?
As you do not know what the UNKNOWN is, this question cannot be answered yes or no, so it evaluates to UNKNOWN.
e.g.
"is 1 = UNKNOWN" - I do not know
"is 'a' = UNKNOWN" - I do not know
"is UNKNOWN = UNKNOWN" - I do not know
The last one may be a bit tricky, but just imagine that you have 2 boxes of apples and you know neither how many apples are in box1 nor how many are in box2, so asking:
is count(box1) = count(box2)
is the same as
is UNKNOWN = UNKNOWN
so the answer is: I do not know.
The second one, @myVar is null, is different, as it is like asking:
is the value in @myVar UNKNOWN?
so the difference is that you specifically ask "is it true that the value stored in the variable is UNKNOWN?", so
"is 1 UNKNOWN" - NO
"is 'a' UNKNOWN" - NO
"is UNKNOWN UNKNOWN" - YES
Generally, it's like this: NULL is unknown, so !=NULL is also unknown, because you don't know if it's equal or not. And you know even less whether two unknowns are equal. The same goes for more or less any operation with unknowns, when you add something to unknown the result is hardly any more known to you.
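T-SQL does not let you SELECT a comparison such as 1 = NULL as a value, but SQLite does (true is 1, false is 0, unknown is NULL). This sketch therefore uses Python's sqlite3 module as a stand-in to show the answers to the questions above, including why @someCount + 1 is NULL when @someCount is NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
(eq, both_null_eq, ne, one_is_null, null_is_null, plus_one, defaulted) = conn.execute(
    """
    SELECT 1 = NULL,                -- "is 1 = UNKNOWN?": unknown (NULL)
           NULL = NULL,             -- "is UNKNOWN = UNKNOWN?": unknown
           NULL != NULL,            -- != is no better: unknown
           1 IS NULL,               -- "is 1 UNKNOWN?": no (0)
           NULL IS NULL,            -- "is UNKNOWN UNKNOWN?": yes (1)
           NULL + 1,                -- arithmetic with NULL stays NULL
           COALESCE(NULL, 0) + 1    -- default the NULL first to get a number
    """
).fetchone()
print(eq, both_null_eq, ne, one_is_null, null_is_null, plus_one, defaulted)
```

This prints None None None 0 1 None 1: every = or != involving NULL is unknown, IS NULL gives a definite answer, and defaulting with COALESCE (or ISNULL in T-SQL) before the addition avoids the NULL result.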
