t-sql: incompatibility between built-in boolean functions - sql-server

Why do built-in boolean functions behave differently on NULL input?
For example, this query:
select 'ISDATE(null)' as function_call,
ISDATE(null) as result union all
select 'ISNUMERIC(null)' as function_call,
ISNUMERIC(null) as result union all
select 'IS_MEMBER(null)' as function_call,
IS_MEMBER(null) as result union all
select 'IS_SRVROLEMEMBER(null, null)' as function_call,
IS_SRVROLEMEMBER(null, null) as result
gives us:
function_call                result
---------------------------- -----------
ISDATE(null)                 0
ISNUMERIC(null)              0
IS_MEMBER(null)              NULL
IS_SRVROLEMEMBER(null, null) NULL
So it seems that ISDATE and ISNUMERIC behave according to two-valued Boolean logic, while IS_MEMBER and IS_SRVROLEMEMBER behave according to three-valued logic.
Shouldn't all boolean functions behave the same on NULL input? What does the ANSI SQL standard say about that?
Thanks

Regarding ANSI standards, the latter two functions have nothing to do with ANSI SQL; they're MSSQL-specific security functions. That's not to say they don't have analogues in other DBMSs, just that they're not "typical" UML-style functions or part of the standard.
In fact, a search for the term "Boolean" on this reasonably authoritative O'Reilly page about ANSI standard functions returns no results. One may infer from this that there is no ANSI approach to how such scalar functions handle NULLs.
Three-valued logic is required in those functions so that NULL can signify that an input is not valid; see, for example, the Remarks section of the MSDN page for IS_MEMBER().
(This form of NULL return is not to be confused with functions and operators that return NULL merely because one of their inputs was NULL.)
There's nothing stopping you from "wrapping" those functions to behave like the others, if that's what you really need, e.g. ISNULL(IS_MEMBER(someValueFromATable), 0).
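A minimal sketch of that wrapping, assuming a NULL from the security function should be read as "not a member" (0). The role name `db_owner` is only an example of a valid role; the NULL-argument results mirror the output shown in the question.

```sql
-- Raw calls: the data-type function returns 0, the security function returns NULL.
DECLARE @isdate_raw int = ISDATE(NULL);
DECLARE @member_raw int = IS_MEMBER(NULL);

-- Wrapped: ISNULL maps the "input not valid" NULL to 0,
-- so every call now yields a two-valued (0/1) result.
DECLARE @member_2vl int = ISNULL(IS_MEMBER(NULL), 0);
DECLARE @member_dbo int = ISNULL(IS_MEMBER('db_owner'), 0);

SELECT @isdate_raw AS isdate_raw,
       @member_raw AS member_raw,
       @member_2vl AS member_2vl,
       @member_dbo AS member_db_owner;
```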
The former two functions return a meaningful Boolean value, as you've found.
ISDATE(null), for example, returns false because null is not a "valid date, time, or datetime value" (MSDN, my emphasis on value).
In the case where NULL is interpreted to mean "unknown", it would be semantically meaningful for ISDATE() etc. to return "unknown" when the input is unknown, but not programmatically practical: having to "convert" the result of all these boolean functions from three-state to two-state logic would be completely redundant when we already have a separate, type-independent test for NULL (IS NULL).
Comparing the two types of functions you've identified: the return value of ISDATE() and ISNUMERIC() shouldn't be NULL in this case, because while NULL isn't a date, it is still a valid piece of data that the function can properly examine.
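If you do want the "unknown" behaviour, it can be layered on top of ISDATE() with the separate IS NULL test. A sketch; the variable names are illustrative only.

```sql
DECLARE @val   varchar(30) = NULL;
DECLARE @valid varchar(30) = '20110807';  -- unambiguous yyyymmdd literal

-- Built-in, two-valued: NULL is simply "not a valid date", so ISDATE returns 0.
DECLARE @isdate_2vl int = ISDATE(@val);

-- Three-valued wrapper: propagate NULL ("unknown") when the input itself is NULL.
DECLARE @isdate_3vl int =
    CASE WHEN @val IS NULL THEN NULL ELSE ISDATE(@val) END;
DECLARE @isdate_3vl_valid int =
    CASE WHEN @valid IS NULL THEN NULL ELSE ISDATE(@valid) END;

SELECT @isdate_2vl       AS isdate_2vl,        -- 0
       @isdate_3vl       AS isdate_3vl,        -- NULL
       @isdate_3vl_valid AS isdate_3vl_valid;  -- 1
```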

I don't find it odd that these security functions (IS_SRVROLEMEMBER) would behave differently from system/datatype functions (ISNUMERIC) since these security functions are essentially queries and the results can change depending upon who is querying. The meanings of the return values, including nulls, are spelled out very well for all of these in the MSDN Documentation.
More concretely, for the arguments to ISNUMERIC and ISDATE, you can easily test ahead of time whether the argument is null, so I'm not sure that returning null is necessary or would have much practical use.
For the arguments to the security functions, you may have non-null arguments, but the functions have been built in a helpful manner to return null in cases where the arguments aren't valid, not found, or you don't have the permissions to know the answer.
Much of this can certainly be seen as subjective, thus I have voted to close this question, however interesting it may be.

Related

Why doesn't the equivalence check @x = null work in SQL Server? [duplicate]

I'm not asking if it does. I know that it doesn't.
I'm curious as to the reason. I've read support docs such as this one on Working With Nulls in MySQL, but they don't really give any reason. They only repeat the mantra that you have to use "is null" instead.
This has always bothered me. When doing dynamic SQL (those rare times when it has to be done) it would be so much easier to pass "null" into where clause like this:
SET @where = 'where GroupId = null'
Which would be a simple replacement for a regular variable. Instead we have to use IF/ELSE blocks to do stuff like:
IF @groupId IS NULL
    SET @where = 'where GroupId is null'
ELSE
    SET @where = 'where GroupId = @groupId'
In larger more-complicated queries, this is a huge pain in the neck. Is there a specific reason that SQL and all the major RDBMS vendors don't allow this? Some kind of keyword conflict or value conflict that it would create?
Edit:
The problem with a lot of the answers (in my opinion) is that everyone is setting up an equivalency between null and "I don't know what the value is". There's a huge difference between those two things. If null meant "there's a value but it's unknown" I would 100% agree that nulls couldn't be equal. But SQL null doesn't mean that. It means that there is no value. Any two SQL results that are null both have no value. No value does not equal unknown value. Two different things. That's an important distinction.
Edit 2:
The other problem I have is that other HLLs (high-level languages) allow null == null perfectly fine and resolve it appropriately. In C#, for instance, null == null returns true.
The reason why it's off by default is that null is really not equal to null in a business sense. For example, if you were joining orders and customers:
select * from orders o join customers c on c.name = o.customer_name
It wouldn't make a lot of sense to match orders with an unknown customer with customers with an unknown name.
Most databases allow you to customize this behaviour. For example, in SQL Server:
set ansi_nulls on
if null = null
print 'this will not print'
set ansi_nulls off
if null = null
print 'this should print'
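Rather than switching ANSI_NULLS off (Microsoft documents the OFF setting as deprecated), a NULL-safe equality can be spelled out explicitly. The same predicate then works whether or not @groupId is NULL, which removes the if/else from the question's dynamic SQL. A sketch with a hypothetical @Groups table variable; SQL Server 2022 also offers IS [NOT] DISTINCT FROM for the same purpose.

```sql
SET ANSI_NULLS ON;  -- the default; OFF is deprecated

-- Hypothetical table standing in for the question's data.
DECLARE @Groups TABLE (Name varchar(20) NOT NULL, GroupId int NULL);
INSERT INTO @Groups (Name, GroupId) VALUES ('a', 1), ('b', 2), ('c', NULL);

DECLARE @groupId int = NULL;  -- also try 2

-- Plain "=": UNKNOWN for the NULL row, so nothing matches when @groupId is NULL.
DECLARE @plain_equals int =
    (SELECT COUNT(*) FROM @Groups WHERE GroupId = @groupId);

-- NULL-safe equality: the extra branch treats "both NULL" as a match.
DECLARE @null_safe int =
    (SELECT COUNT(*) FROM @Groups
     WHERE GroupId = @groupId
        OR (GroupId IS NULL AND @groupId IS NULL));

SELECT @plain_equals AS plain_equals, @null_safe AS null_safe;  -- 0, 1
```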
Equality is something that can be absolutely determined. The trouble with null is that it's inherently unknown. If you follow the truth table for three-value logic, null combined with any other value is null - unknown. Asking SQL "Is my value equal to null?" would be unknown every single time, even if the input is null. I think the implementation of IS NULL makes it clear.
It's a language semantic.
Null is the lack of a value.
is null makes sense to me. It says, "is lacking a value" or "is unknown". Personally I've never asked somebody if something is, "equal to lacking a value".
I can't help but feel that you're still not satisfied with the answers that have been given so far, so I thought I'd try another tack. Let's have an example (no, I've no idea why this specific example has come into my head).
We have a table for employees, EMP:
EMP
---
EMPNO  GIVENNAME
E0001  Boris
E0002  Chris
E0003  Dave
E0004  Steve
E0005  Tony
And, for whatever bizarre reason, we're tracking what colour trousers each employee chooses to wear on a particular day (TROUS):
TROUS
-----
EMPNO  DATE      COLOUR
E0001  20110806  Brown
E0002  20110806  Blue
E0003  20110806  Black
E0004  20110806  Brown
E0005  20110806  Black
E0001  20110807  Black
E0003  20110807  Black
E0004  20110807  Grey
I could go on. We write a query, where we want to know the name of every employee, and what colour trousers they had on on the 7th August:
SELECT e.GIVENNAME,t.COLOUR
FROM
EMP e
LEFT JOIN
TROUS t
ON
e.EMPNO = t.EMPNO and
t.DATE = '20110807'
And we get the result set:
GIVENNAME  COLOUR
Chris      NULL
Steve      Grey
Dave       Black
Boris      Black
Tony       NULL
Now, this result set could be in a view, or CTE, or whatever, and we might want to continue asking questions about these results, using SQL. What might some of these questions be?
Were Dave and Boris wearing the same colour trousers on that day? (Yes, Black==Black)
Were Dave and Steve wearing the same colour trousers on that day? (No, Black!=Grey)
Were Boris and Tony wearing the same colour trousers on that day? (Unknown - we're trying to compare with NULL, and we're following the SQL rules)
Were Boris and Tony not wearing the same colour trousers on that day? (Unknown - we're again comparing to NULL, and we're following SQL rules)
Were Chris and Tony wearing the same colour trousers on that day? (Unknown)
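Those questions can be asked in SQL directly. A sketch that loads the result set above into a table variable and uses CASE to expose which of the three truth values each comparison produced; the pairs list and the @Answers table are illustrative only.

```sql
-- The LEFT JOIN result above, loaded into a table variable.
DECLARE @Trousers TABLE (GIVENNAME varchar(10) NOT NULL, COLOUR varchar(10) NULL);
INSERT INTO @Trousers (GIVENNAME, COLOUR) VALUES
    ('Chris', NULL), ('Steve', 'Grey'), ('Dave', 'Black'), ('Boris', 'Black'), ('Tony', NULL);

DECLARE @Answers TABLE (person1 varchar(10), person2 varchar(10), same_colour varchar(7));

-- Neither WHEN fires when the comparison is UNKNOWN, so it falls through to ELSE.
INSERT INTO @Answers (person1, person2, same_colour)
SELECT p.person1, p.person2,
       CASE WHEN a.COLOUR = b.COLOUR  THEN 'yes'
            WHEN a.COLOUR <> b.COLOUR THEN 'no'
            ELSE 'unknown' END
FROM (VALUES ('Dave', 'Boris'), ('Dave', 'Steve'),
             ('Boris', 'Tony'), ('Chris', 'Tony')) AS p (person1, person2)
JOIN @Trousers AS a ON a.GIVENNAME = p.person1
JOIN @Trousers AS b ON b.GIVENNAME = p.person2;

SELECT person1, person2, same_colour FROM @Answers;
-- Dave/Boris: yes, Dave/Steve: no, Boris/Tony: unknown, Chris/Tony: unknown
```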
Note that you're already aware of specific mechanisms (e.g. IS NULL) to force the outcomes you want, if you've designed your database to never use NULL as a marker for missing information.
But in SQL, NULL has been given (at least) two roles: to mark inapplicable information (maybe we have complete information in the database, and Chris and Tony didn't turn up for work that day, or did but weren't wearing trousers), and to mark missing information (Chris did turn up that day; we just don't have the information recorded in the database at this time).
If you're using NULL purely as a marker of inapplicable information, I assume you're avoiding such constructs as outer joins.
I find it interesting that you've brought up NaN in comments to other answers, without seeing that NaN and (SQL) NULL have a lot in common. The biggest difference between them is that NULL is intended for use across the system, no matter what data type is involved.
Your biggest issue seems to be that you've decided that NULL has a single meaning across all programming languages, and you seem to feel that SQL has broken that meaning. In fact, null in different languages frequently has subtly different meanings. In some languages, it's a synonym for 0; in others it isn't, so the comparison 0 == null will succeed in some and fail in others. You mentioned VB, but VB (assuming you're talking about the .NET versions) does not have null. It has Nothing, which again is subtly different (it's the equivalent in most respects of the C# construct default(T)).
The concept is that NULL is not a value that can be compared for equality. It denotes the absence of a value.
Therefore, a variable or a column can only be checked for whether it IS NULL, not whether it IS EQUAL TO NULL.
Once you open up arithmetic comparisons, you may also have to contend with IS GREATER THAN NULL or IS LESS THAN OR EQUAL TO NULL.
NULL is unknown. It is neither true nor false, so when you compare anything to unknown, the only answer is "unknown". There's a much better article on Wikipedia: http://en.wikipedia.org/wiki/Null_(SQL)
Because in ANSI SQL, null means "unknown", which is not a value. As such, it doesn't equal anything; you can just evaluate the value's state (known or unknown).
a. Null is not the "lack of a value"
b. Null is not "empty"
c. Null is not an "unset value"
It's all of the above and none of the above.
Technically, NULL is an "unknown value". However, like uninitialized pointers in C/C++, you don't really know what you're pointing at. Databases allocate the space but do not initialize the value in that space.
So, it is an "empty" space in the sense that it's not initialized. If you set a value to NULL, the original value stays in that storage location. If it was originally an empty string (for example), it will remain that.
It's a "lack of a value" in the fact that it hasn't been set to what the database deems a valid value.
It's an "unset value" in that if the space was just allocated, the value that is there has never been set.
"Unknown" is the closest that we can truly come to knowing what to expect when we examine a NULL.
Because of that, if we try to compare this "unknown" value, we will get a comparison that
a) may or may not be valid
b) may or may not have the result we expect
c) may or may not crash the database.
So, the DBMS systems (long ago) decided that it doesn't even make sense to use equality when it comes to NULL.
Therefore, "= null" makes no sense.
In addition to all that has already been said, I wish to stress that what you write in your first line is wrong. SQL does support the “= NULL” syntax, but it has a different semantic than “IS NULL” – as can be seen in the very piece of documentation you linked to.
I agree with the OP that
where column_name = null
should be syntactic sugar for
where column_name is null
However, I do understand why the creators of SQL wanted to make the distinction. In three-valued logic (IMO this is a misnomer), a predicate can return two values (true or false) OR unknown which is technically not a value but just a way to say "we don't know which of the two values this is". Think about the following predicate in terms of three-valued logic:
A == B
This predicate tests whether A is equal to B. Here's what the truth table looks like:
A \ B | T  U  F
------+---------
  T   | T  U  F
  U   | U  U  U
  F   | F  U  T
If either A or B is unknown, the predicate itself always returns unknown, regardless of whether the other one is true or false or unknown.
In SQL, null is a synonym for unknown. So, the SQL predicate
column_name = null
tests whether the value of column_name is equal to something whose value is unknown, and returns unknown regardless of whether column_name is true or false or unknown or anything else, just like in three-valued logic above. SQL DML operations are restricted to operating on rows for which the predicate in the where clause returns true, ignoring rows for which the predicate returns false or unknown. That's why "where column_name = null" doesn't operate on any rows.
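The filtering rule can be checked directly. A sketch with a throwaway table variable (the name @T is illustrative), run with the default ANSI_NULLS ON.

```sql
SET ANSI_NULLS ON;  -- the default

DECLARE @T TABLE (column_name varchar(10) NULL);
INSERT INTO @T (column_name) VALUES ('x'), (NULL);

-- "= NULL" is UNKNOWN for every row, so WHERE keeps none of them.
DECLARE @eq_null int = (SELECT COUNT(*) FROM @T WHERE column_name = NULL);
-- NOT UNKNOWN is still UNKNOWN, so the negation keeps none either.
DECLARE @not_eq_null int = (SELECT COUNT(*) FROM @T WHERE NOT (column_name = NULL));
-- IS NULL is two-valued: TRUE for the NULL row, FALSE for the other.
DECLARE @is_null int = (SELECT COUNT(*) FROM @T WHERE column_name IS NULL);

SELECT @eq_null AS eq_null, @not_eq_null AS not_eq_null, @is_null AS is_null;  -- 0, 0, 1
```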
NULL doesn't equal NULL. It can't equal NULL. It doesn't make sense for them to be equal.
A few ways to think about it:
Imagine a contacts database, containing fields like FirstName, LastName, DateOfBirth and HairColor. If I looked for records WHERE DateOfBirth = HairColor, should it ever match anything? What if someone's DateOfBirth was NULL, and their HairColor was too? An unknown hair color isn't equal to an unknown anything else.
Let's join the contacts table with purchases and product tables. Let's say I want to find all the instances where a customer bought a wig that was the same color as their own hair. So I query WHERE contacts.HairColor = product.WigColor. Should I get matches between every customer I don't know the hair color of and products that don't have a WigColor? No, they're a different thing.
Let's consider that NULL is another word for unknown. What's the result of ('Smith' = NULL)? The answer is not false, it's unknown. Unknown is not true, therefore it behaves like false. What's the result of (NULL = NULL)? The answer is also unknown, therefore also effectively false. (This is also why concatenating a string with a NULL value makes the whole string become NULL -- the result really is unknown.)
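Why don't you use the ISNULL function? Since concatenating a string with NULL yields NULL (under the default CONCAT_NULL_YIELDS_NULL ON), the operator itself can be chosen in one expression, assuming @groupId is an int (casting it first also stops ISNULL from trying to convert a string to int):
SET @where = 'where GroupId ' + ISNULL('= ' + CAST(@groupId AS varchar(20)), 'is null')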

Convert IS NULL to BIT

This might be a very basic question but I just came over it while writing a query.
Why can't SQL Server convert a check for NULL to BIT? I was thinking about something like this:
DECLARE @someVariable INT = NULL;
-- Do something
SELECT CONVERT(BIT, (@someVariable IS NULL))
The expected outcome would then be either 1 or 0.
Use CASE:
SELECT CONVERT(BIT, (CASE WHEN @someVariable IS NULL THEN 1 ELSE 0 END))
Or use IIF (SQL Server 2012 and later; a little more readable than CASE):
SELECT CONVERT(BIT, IIF(@someVariable IS NULL, 1, 0))
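A runnable sketch of both forms, checking the NULL and the non-NULL case. IIF requires SQL Server 2012 or later; the result variable names are illustrative.

```sql
DECLARE @someVariable int = NULL;

-- IS NULL is a predicate, not a value, so it has to sit inside CASE or IIF,
-- which turn the predicate's outcome into 1 or 0 that CONVERT can handle.
DECLARE @via_case bit = CONVERT(bit, CASE WHEN @someVariable IS NULL THEN 1 ELSE 0 END);
DECLARE @via_iif  bit = CONVERT(bit, IIF(@someVariable IS NULL, 1, 0));

-- Same expression once the variable holds a value.
SET @someVariable = 42;
DECLARE @via_case_42 bit = CONVERT(bit, CASE WHEN @someVariable IS NULL THEN 1 ELSE 0 END);

SELECT @via_case AS via_case, @via_iif AS via_iif, @via_case_42 AS via_case_42;  -- 1, 1, 0
```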
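Not a direct cast, and note that it only works as a NULL test when the non-NULL value is always 0 (any other non-NULL value also yields 1):
select cast(isnull(@null,1) as bit)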
In SQL the language, NULLs are not considered data values. They represent a missing/unknown state. Quoting from Wikipedia's article on SQL NULL:
SQL null is a state (unknown) and not a value. This usage is quite different from most programming languages, where null means not assigned to a particular instance.
This means that any comparison against that UNKNOWN value can only be UNKNOWN itself. Even comparing two NULLs can't return true: if both values are unknown, how can we say that they are equal or not?
IS NULL and IS NOT NULL are predicates that can be used in conditional expressions. That means that they don't return a value themselves. Therefore, they can't be "cast" to a bit or treated as a boolean.
Basic SQL comparison operators always return Unknown when comparing anything with Null, so the SQL standard provides for two special Null-specific comparison predicates. The IS NULL and IS NOT NULL predicates (which use a postfix syntax) test whether data is, or is not, Null.
Any other way of treating nulls is a vendor-specific extension.
Finally, BIT is not a boolean type; it's just a single-bit number. An optional BOOLEAN type was introduced in SQL:1999, but only PostgreSQL implements it correctly, i.e. with TRUE, FALSE and UNKNOWN values.
Without a BOOLEAN type you can't really calculate the result of a conditional expression like A AND B or x IS NULL. You can only use functions like NULLIF or COALESCE to replace the NULL value with something else.

Why do I get a SQL syntax error with this?

Trying to run this query in LINQPad 4:
SELECT item_group_id as AccountID, IIF(ISNULL(t_item_group.description),'[blank]',t_item_group.description) AS Name
FROM t_item_group
WHERE active = TRUE
I get, "the isnull function requires 2 argument(s)."
I've tried moving the parentheses around and changing the quoting around [blank], but none of it helps...
I have two similar queries (both using IIF(ISNULL(...))) that LINQPad won't run for this reason, yet they run fine in my actual Web API app. So LINQPad is perhaps more "picky" than it needs to be, but what is it expecting, SQL-syntax-wise?
ISNULL is already like a 'if' type statement.
You can just replace
IIF(ISNULL(t_item_group.description),'[blank]',t_item_group.description)
with
ISNULL(t_item_group.description, '[blank]')
ISNULL returns the first parameter (the description) unless that value is null, in which case it returns the second parameter.
As an aside, one of the reasons I don't care for ISNULL is that it is poorly named. You'd assume that given its name it will return a bit - true if the parameter is null, false if not null - which you could use in an 'if' statement like you attempted. But that's not how it works.
The alternative is to use COALESCE. It provides much the same functionality, but the naming makes sense.
co·a·lesce ˌkōəˈles verb
1. come together and form one mass or whole.
To COALESCE two parameters is to force them into one non-nullable result. And the function is actually more powerful, as you can provide multiple parameters - COALESCE(i.description, i.name, '[blank]') is perfectly valid.
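A sketch of the rewritten query in T-SQL, using a hypothetical table variable in place of t_item_group. T-SQL has no TRUE literal, so the filter compares against 1, assuming `active` is a bit column. It also shows one practical difference between the two functions: ISNULL returns the type of its first argument, so a short column can truncate the replacement, while COALESCE uses the wider type.

```sql
-- Hypothetical stand-in for the question's t_item_group table.
DECLARE @t_item_group TABLE (item_group_id int NOT NULL, description varchar(50) NULL, active bit NOT NULL);
INSERT INTO @t_item_group (item_group_id, description, active)
VALUES (1, 'Hardware', 1), (2, NULL, 1), (3, NULL, 0);

SELECT item_group_id AS AccountID,
       ISNULL(description, '[blank]')   AS Name_isnull,
       COALESCE(description, '[blank]') AS Name_coalesce
FROM @t_item_group
WHERE active = 1;  -- no TRUE keyword in T-SQL

-- ISNULL keeps varchar(3) from its first argument; COALESCE widens to varchar(7).
DECLARE @short varchar(3) = NULL;
DECLARE @via_isnull   varchar(20) = ISNULL(@short, '[blank]');    -- '[bl'
DECLARE @via_coalesce varchar(20) = COALESCE(@short, '[blank]');  -- '[blank]'
SELECT @via_isnull AS via_isnull, @via_coalesce AS via_coalesce;
```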

Difference between NULL in SQL and null in programming languages

I've just come across an interesting scenario on how NULL is handled in T-SQL (and possibly other forms of SQL). The issue is pretty well described and answered by this question, and I've illustrated the issue below:
-- SET ANSI_NULLS ON -- Toggle this between ON/OFF to see how it changes behaviour
DECLARE @VAR1 DATETIME
DECLARE @VAR2 DATETIME
SET @VAR1 = (SELECT CURRENT_TIMESTAMP)
SET @VAR2 = (SELECT NULL)
-- This will return 1 when ansi_nulls is off and nothing when ansi_nulls is on
SELECT 1 WHERE @VAR1 != @VAR2
DECLARE @TstTable TABLE (
    COL1 DATETIME,
    COL2 DATETIME)
INSERT INTO @TstTable
SELECT @VAR1, @VAR1
UNION
SELECT @VAR1, NULL
-- This won't ever return a value irrespective of the ansi_nulls setting
SELECT * FROM @TstTable WHERE COL1 != COL2
This situation led me to question my understanding of null representations specifically within SQL. I've always understood null to mean that it has no value. This seems to be an incorrect assumption given the first paragraph of this page. It states (my emphasis... I could quite easily just highlight the whole paragraph, though):
A value of NULL indicates the value is unknown. A value of NULL is different from an empty or zero value. No two null values are equal. Comparisons between two null values, or between a NULL and any other value, return unknown because the value of each NULL is unknown.
Does this hold true for T-SQL variable conditions also? It certainly does for my SELECT 1 WHERE #VAR1 != #VAR2 example above, but I don't understand why NULL in this instance is considered "UNKNOWN" and not empty/uninitialised/nothing etc. I know ANSI_NULLS changes how this works, but it is deprecated and will be removed from some future version.
Can someone offer a good explanation as to why NULL in T-SQL refers to an unknown value rather than an uninitialised value? If so, can you extend your answer to show why T-SQL variables with a NULL value are also considered to be unknown?
In SQL, we're interested in storing facts in tables (a.k.a. relations).
What Codd asked for was:
Rule 3: Systematic treatment of null values:
The DBMS must allow each field to remain null (or empty). Specifically, it must support a representation of "missing information and inapplicable information" that is systematic, distinct from all regular values (for example, "distinct from zero or any other number", in the case of numeric values), and independent of data type. It is also implied that such representations must be manipulated by the DBMS in a systematic way.
What we've ended up with is three-valued logic (as @zmbq stated). Why is it this way?
We have two items that we're trying to compare for equality. Are they equal? Well, it turns out that we don't (yet) know what item 1 is, and we don't (yet) know what item 2 is (both are NULL). They might be equal. They might be unequal. It would be equally wrong to answer the equality comparison with either TRUE or FALSE. So we answer UNKNOWN.
In other languages, null is usually used with pointers (or references in languages without pointers, but notably not C++), to indicate that the pointer does not, at this time, point to anything.
Welcome to Three Valued Logic, where everything can be true, false or unknown.
The value of the null==null is not true, and it's not false, it's unknown...
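On the follow-up about variables: under the default ANSI_NULLS ON, a NULL variable takes part in comparisons exactly like a NULL column. A sketch reusing the question's @VAR1/@VAR2 names, with CASE exposing the UNKNOWN outcome that WHERE silently discards.

```sql
SET ANSI_NULLS ON;  -- the default

DECLARE @VAR1 datetime = CURRENT_TIMESTAMP;
DECLARE @VAR2 datetime = NULL;

-- Each CASE reports which of the three truth values the comparison produced;
-- UNKNOWN matches neither WHEN, so it falls through to ELSE.
DECLARE @var1_ne_var2 varchar(7) =
    CASE WHEN @VAR1 <> @VAR2 THEN 'true'
         WHEN NOT (@VAR1 <> @VAR2) THEN 'false'
         ELSE 'unknown' END;
DECLARE @null_eq_null varchar(7) =
    CASE WHEN @VAR2 = @VAR2 THEN 'true'
         WHEN NOT (@VAR2 = @VAR2) THEN 'false'
         ELSE 'unknown' END;

SELECT @var1_ne_var2 AS var1_ne_var2, @null_eq_null AS null_eq_null;  -- unknown, unknown
```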
but I don't understand why NULL in this instance is considered "UNKNOWN" and not empty/uninitialised/nothing
What is there not to understand? It is like that BECAUSE IT WAS DEFINED LIKE THAT. Someone had the idea that it should be like that, and it was put into the standard.
Yes, this is a little recursive, but quite often design decisions run like that.
This has more to do with arithmetic. Adding up 20 values where one of them is NULL gives NULL; how would you treat that other than as unknown? (The SUM() aggregate is the exception: it skips NULL inputs rather than propagating them.) C# etc. react with an exception, but that gets in your way when doing statistical analysis. Unknown values have to turn everything they come in contact with into unknown, and no unknown is ever the same as another.
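The contrast between row-level arithmetic and the SUM() aggregate can be seen directly. A sketch with an illustrative @Amounts table variable.

```sql
DECLARE @Amounts TABLE (amount int NULL);
INSERT INTO @Amounts (amount) VALUES (10), (NULL), (5);

-- Arithmetic: a single NULL operand makes the whole expression NULL (unknown).
DECLARE @plus int = 10 + NULL + 5;

-- Aggregate: SUM() skips NULL inputs instead of propagating them
-- (SQL Server normally also reports a warning that a null value was eliminated).
DECLARE @sum int = (SELECT SUM(amount) FROM @Amounts);

SELECT @plus AS plus_result, @sum AS sum_result;  -- NULL, 15
```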

"Catching" Errors from within a user defined function in SQL Server 2005

I have a function that takes a number as an input and converts it to a date. This number isn't any standard form of date number, so I have to manually subdivide portions of the number to various date parts, cast the date parts to varchar strings and then, concatenate and cast the strings to a new datetime object.
My question is: how can I catch a casting failure and return a null or low-range value from my function? I would prefer my function to "passively" fail, returning a default value, instead of returning a failure code to my stored procedure. TRY/CATCH statements apparently don't work from within functions (unless there is some kind of definition flag that I am unaware of), and the standard '@@ERROR <> 0' method doesn't work either.
Incidentally, this sounds like it could be a scalar UDF. That is a performance disaster, as Alex's blog points out: http://sqlblog.com/blogs/alexander_kuznetsov/archive/2008/05/23/reuse-your-code-with-cross-apply.aspx
SELECT CASE WHEN ISDATE(@yourParameter) = 1
            THEN CAST(@yourParameter AS DATETIME)
            ELSE YourDefaultValue
       END
Since the format is nonstandard it sounds to me like you are stuck with doing all the validation yourself, prior to casting. Making sure that the individual pieces are numeric, checking that the month is between 1 and 12, making sure it's not Feb 30, etc. If anything fails you return nothing.
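The manual validation described above can be sketched as plain T-SQL expressions that never raise a conversion error, which is what would make them usable inside a SQL Server 2005 function where TRY/CATCH isn't allowed. Assumptions: the packed yyyymmdd layout below is hypothetical (the question doesn't give the real format), and the logic is shown as a standalone batch rather than a CREATE FUNCTION so it can be run directly. It uses only SET statements and DATEADD, which are available in SQL Server 2005.

```sql
-- Hypothetical input layout for illustration only: yyyymmdd packed into an int.
DECLARE @n int, @y int, @m int, @d int, @days_in_month int, @result datetime;
SET @n = 20110230;  -- 30 February 2011: should be rejected

-- 1. Split the number arithmetically (no string casts that could fail).
SET @y = @n / 10000;
SET @m = (@n / 100) % 100;
SET @d = @n % 100;

-- 2. Validate the month; an invalid month leaves @days_in_month NULL.
SET @days_in_month =
    CASE
        WHEN @m IN (1, 3, 5, 7, 8, 10, 12) THEN 31
        WHEN @m IN (4, 6, 9, 11) THEN 30
        WHEN @m = 2 AND ((@y % 4 = 0 AND @y % 100 <> 0) OR @y % 400 = 0) THEN 29
        WHEN @m = 2 THEN 28
    END;

-- 3. Build the date only when every part is valid; otherwise fall through to NULL,
--    the "passive" failure asked for. A NULL @days_in_month makes BETWEEN UNKNOWN,
--    which also falls through to NULL.
SET @result =
    CASE
        WHEN @y BETWEEN 1753 AND 9999 AND @d BETWEEN 1 AND @days_in_month
        THEN DATEADD(day, @d - 1,
                 DATEADD(month, @m - 1,
                     DATEADD(year, @y - 1900, CAST('19000101' AS datetime))))
    END;

SELECT @result AS converted;  -- NULL
```

Building the date from a fixed base with DATEADD, rather than concatenating and casting strings, is a design choice: once the parts have passed validation there is no remaining conversion step that can fail.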
