Difference between NULL in SQL and null in programming languages - sql-server

I've just come across an interesting scenario on how NULL is handled in T-SQL (and possibly other forms of SQL). The issue is pretty well described and answered by this question and I've illustrated the issue below;
-- SET ANSI_NULLS ON -- Toggle this between ON/OFF to see how it changes behaviour
DECLARE @VAR1 DATETIME
DECLARE @VAR2 DATETIME
SET @VAR1 = (SELECT CURRENT_TIMESTAMP)
SET @VAR2 = (SELECT NULL)
-- This will return 1 when ansi_nulls is off and nothing when ansi_nulls is on
SELECT 1 WHERE @VAR1 != @VAR2
DECLARE @TstTable TABLE (
COL1 DATETIME,
COL2 DATETIME)
INSERT INTO @TstTable
SELECT @VAR1, @VAR1
UNION
SELECT @VAR1, NULL
-- This won't ever return a value irrespective of the ansi_nulls setting
SELECT * FROM @TstTable WHERE COL1 != COL2
This situation led me to question my understanding of null representations specifically within SQL. I've always understood null to mean that it has no value. This seems to be an incorrect assumption given the first paragraph of this page. It states (my emphasis...I could quite easily just highlight the whole paragraph though);
A value of NULL indicates the value is unknown. A value of NULL is
different from an empty or zero value. No two null values are equal.
Comparisons between two null values, or between a NULL and any other
value, return unknown because the value of each NULL is unknown.
Does this hold true for T-SQL variable conditions also? It certainly does for my SELECT 1 WHERE @VAR1 != @VAR2 example above, but I don't understand why NULL in this instance is considered "UNKNOWN" and not empty/uninitialised/nothing etc. I know ANSI_NULLS changes how this works, but it is deprecated and will be removed from some future version.
Can someone offer a good explanation as to why NULL in T-SQL refers to an unknown value rather than an uninitialised value? If so, can you extend your answer to show why T-SQL variables with a NULL value are also considered to be unknown?

In SQL, we're interested in storing facts in tables (a.k.a relations).
What Codd asked for was:
Rule 3: Systematic treatment of null values:
The DBMS must allow each field to remain null (or empty). Specifically, it must support a representation of "missing information and inapplicable information" that is systematic, distinct from all regular values (for example, "distinct from zero or any other number", in the case of numeric values), and independent of data type. It is also implied that such representations must be manipulated by the DBMS in a systematic way.
What we've ended up with is three-valued logic (as @zmbq stated). Why is it this way?
We have two items that we're trying to compare for equality. Are they equal? Well, it turns out that we don't (yet) know what item 1 is, and we don't (yet) know what item 2 is (both are NULL). They might be equal. They might be unequal. It would be equally wrong to answer the equality comparison with either TRUE or FALSE. So we answer UNKNOWN.
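As a minimal sketch of that reasoning in T-SQL (the variable names @a and @b are just for illustration, and ANSI_NULLS is assumed to be ON, which is the usual default), a CASE expression can show that the comparison lands in neither the TRUE branch nor the FALSE branch:

```sql
SET ANSI_NULLS ON;

DECLARE @a INT = NULL, @b INT = NULL;  -- hypothetical variables, both unknown

-- @a = @b evaluates to UNKNOWN, and NOT (UNKNOWN) is still UNKNOWN,
-- so neither WHEN branch fires and the ELSE branch is taken.
SELECT CASE
           WHEN @a = @b       THEN 'TRUE'
           WHEN NOT (@a = @b) THEN 'FALSE'
           ELSE 'UNKNOWN'
       END AS comparison_result;  -- UNKNOWN
```

The same thing happens if only one of the two variables is NULL.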
In other languages, null is usually used with pointers (or references in languages without pointers, but notably not C++), to indicate that the pointer does not, at this time, point to anything.

Welcome to Three Valued Logic, where everything can be true, false or unknown.
The value of null == null is not true, and it's not false, it's unknown...

but I don't understand why NULL in this instance is considered "UNKNOWN" and not
empty/uninitialised/nothing
?? What is there not to understand. It is like that BECAUSE IT WAS DEFINED LIKE THAT. Someone had the idea it is like that. It was put into the standard.
Yes, this is a little recursive, but quite often design decisions run like that.
This has more to do with arithmetic. Add 20 values together and let one of them be NULL: the result is NULL (an aggregate such as SUM skips NULLs instead). How else would you treat it, if not as unknown? C# etc. react with an exception, but that gets in your way when doing statistical analysis. Unknown values have to turn everything they come into contact with into unknown, and no unknown is ever the same as another.
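A short sketch of how that propagation looks in T-SQL; the inline VALUES list and the alias t(v) are made up for the example. The + operator propagates NULL, while the SUM and COUNT aggregates simply skip NULL inputs:

```sql
-- Arithmetic with an unknown operand gives an unknown result.
SELECT 10 + CAST(NULL AS INT) AS plus_result;  -- NULL

-- Aggregates ignore NULL inputs instead of propagating them.
SELECT SUM(v)   AS sum_result,      -- 30
       COUNT(v) AS non_null_rows,   -- 2
       COUNT(*) AS all_rows         -- 3
FROM (VALUES (10), (20), (NULL)) AS t(v);
```

SQL Server also emits a warning that a null value was eliminated by an aggregate, which is a hint that the total may not mean what you think it means.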

Related

Why doesn't the equivalence check @x = null work in SQL Server? [duplicate]

I'm not asking if it does. I know that it doesn't.
I'm curious as to the reason. I've read support docs such as this one on Working With Nulls in MySQL but they don't really give any reason. They only repeat the mantra that you have to use "is null" instead.
This has always bothered me. When doing dynamic SQL (those rare times when it has to be done) it would be so much easier to pass "null" into where clause like this:
set @where = 'where GroupId = null'
Which would be a simple replacement for a regular variable. Instead we have to use if/else blocks to do stuff like:
if @groupId is null
    set @where = 'where GroupId is null'
else
    set @where = 'where GroupId = @groupId'
In larger more-complicated queries, this is a huge pain in the neck. Is there a specific reason that SQL and all the major RDBMS vendors don't allow this? Some kind of keyword conflict or value conflict that it would create?
Edit:
The problem with a lot of the answers (in my opinion) is that everyone is setting up an equivalency between null and "I don't know what the value is". There's a huge difference between those two things. If null meant "there's a value but it's unknown" I would 100% agree that nulls couldn't be equal. But SQL null doesn't mean that. It means that there is no value. Any two SQL results that are null both have no value. No value does not equal unknown value. Two different things. That's an important distinction.
Edit 2:
The other problem I have is that other HLLs allow null = null perfectly fine and resolve it appropriately. In C# for instance, null == null returns true.
The reason why it's off by default is that null is really not equal to null in a business sense. For example, if you were joining orders and customers:
select * from orders o join customers c on c.name = o.customer_name
It wouldn't make a lot of sense to match orders with an unknown customer with customers with an unknown name.
Most databases allow you to customize this behaviour. For example, in SQL Server:
set ansi_nulls on
if null = null
print 'this will not print'
set ansi_nulls off
if null = null
print 'this should print'
Equality is something that can be absolutely determined. The trouble with null is that it's inherently unknown. If you follow the truth table for three-value logic, null combined with any other value is null - unknown. Asking SQL "Is my value equal to null?" would be unknown every single time, even if the input is null. I think the implementation of IS NULL makes it clear.
It's a language semantic.
Null is the lack of a value.
is null makes sense to me. It says, "is lacking a value" or "is unknown". Personally I've never asked somebody if something is, "equal to lacking a value".
I can't help but feel that you're still not satisfied with the answers that have been given so far, so I thought I'd try another tack. Let's have an example (no, I've no idea why this specific example has come into my head).
We have a table for employees, EMP:
EMP
---
EMPNO GIVENNAME
E0001 Boris
E0002 Chris
E0003 Dave
E0004 Steve
E0005 Tony
And, for whatever bizarre reason, we're tracking what colour trousers each employee chooses to wear on a particular day (TROUS):
TROUS
-----
EMPNO DATE COLOUR
E0001 20110806 Brown
E0002 20110806 Blue
E0003 20110806 Black
E0004 20110806 Brown
E0005 20110806 Black
E0001 20110807 Black
E0003 20110807 Black
E0004 20110807 Grey
I could go on. We write a query, where we want to know the name of every employee, and what colour trousers they had on on the 7th August:
SELECT e.GIVENNAME,t.COLOUR
FROM
EMP e
LEFT JOIN
TROUS t
ON
e.EMPNO = t.EMPNO and
t.DATE = '20110807'
And we get the result set:
GIVENNAME COLOUR
Chris NULL
Steve Grey
Dave Black
Boris Black
Tony NULL
Now, this result set could be in a view, or CTE, or whatever, and we might want to continue asking questions about these results, using SQL. What might some of these questions be?
Were Dave and Boris wearing the same colour trousers on that day? (Yes, Black==Black)
Were Dave and Steve wearing the same colour trousers on that day? (No, Black!=Grey)
Were Boris and Tony wearing the same colour trousers on that day? (Unknown - we're trying to compare with NULL, and we're following the SQL rules)
Were Boris and Tony not wearing the same colour trousers on that day? (Unknown - we're again comparing to NULL, and we're following SQL rules)
Were Chris and Tony wearing the same colour trousers on that day? (Unknown)
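To make those answers concrete, here is a rough sketch that loads the result set above into a table variable (the name @DayResult is invented for this example) and asks the "same colour?" question for every pair of employees:

```sql
SET ANSI_NULLS ON;

-- Hypothetical table variable holding the result set shown above.
DECLARE @DayResult TABLE (GIVENNAME VARCHAR(20), COLOUR VARCHAR(20) NULL);
INSERT INTO @DayResult (GIVENNAME, COLOUR) VALUES
    ('Chris', NULL), ('Steve', 'Grey'), ('Dave', 'Black'),
    ('Boris', 'Black'), ('Tony', NULL);

-- Compare every pair once; the ELSE branch catches the UNKNOWN outcome.
SELECT a.GIVENNAME AS first_person,
       b.GIVENNAME AS second_person,
       CASE
           WHEN a.COLOUR =  b.COLOUR THEN 'Yes'
           WHEN a.COLOUR <> b.COLOUR THEN 'No'
           ELSE 'Unknown'
       END AS same_colour
FROM @DayResult a
JOIN @DayResult b ON a.GIVENNAME < b.GIVENNAME;
```

Boris/Dave comes back as Yes, Dave/Steve as No, and every pair involving Chris or Tony (including Chris/Tony itself) comes back as Unknown.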
Note, that you're already aware of specific mechanisms (e.g. IS NULL) to force the outcomes you want, if you've designed your database to never use NULL as a marker for missing information.
But in SQL, NULL has been given two roles (at least) - to mark inapplicable information (maybe we have complete information in the database, and Chris and Tony didn't turn up for work that day, or did but weren't wearing trousers), and to mark missing information (Chris did turn up that day, we just don't have the information recorded in the database at this time)
If you're using NULL purely as a marker of inapplicable information, I assume you're avoiding such constructs as outer joins.
I find it interesting that you've brought up NaN in comments to other answers, without seeing that NaN and (SQL) NULL have a lot in common. The biggest difference between them is that NULL is intended for use across the system, no matter what data type is involved.
Your biggest issue seems to be that you've decided that NULL has a single meaning across all programming languages, and you seem to feel that SQL has broken that meaning. In fact, null in different languages frequently has subtly different meanings. In some languages, it's a synonym for 0. In others, not, so the comparison 0==null will succeed in some, and fail in others. You mentioned VB, but VB (assuming you're talking .NET versions) does not have null. It has Nothing, which again is subtly different (it's the equivalent in most respects of the C# construct default(T)).
The concept is that NULL is not an equitable value. It denotes the absence of a value.
Therefore, a variable or a column can only be checked if it IS NULL, but not if it IS EQUAL TO NULL.
Once you open up arithmetic comparisons, you may have to contend with IS GREATER THAN NULL, or IS LESS THAN OR EQUAL TO NULL.
NULL is unknown. It is neither true nor false, so when you are comparing anything to unknown, the only answer is "unknown". There is a much better article on Wikipedia: http://en.wikipedia.org/wiki/Null_(SQL)
Because in ANSI SQL, null means "unknown", which is not a value. As such, it doesn't equal anything; you can just evaluate the value's state (known or unknown).
a. Null is not the "lack of a value"
b. Null is not "empty"
c. Null is not an "unset value"
It's all of the above and none of the above.
By technical rights, NULL is an "unknown value". However, as with uninitialized pointers in C/C++, you don't really know what you're pointing at. With databases, they allocate the space but do not initialize the value in that space.
So, it is an "empty" space in the sense that it's not initialized. If you set a value to NULL, the original value stays in that storage location. If it was originally an empty string (for example), it will remain that.
It's a "lack of a value" in the fact that it hasn't been set to what the database deems a valid value.
It's an "unset value" in that if the space was just allocated, the value that is there has never been set.
"Unknown" is the closest that we can truly come to knowing what to expect when we examine a NULL.
Because of that, if we try to compare this "unknown" value, we will get a comparison that
a) may or may not be valid
b) may or may not have the result we expect
c) may or may not crash the database.
So, the DBMS systems (long ago) decided that it doesn't even make sense to use equality when it comes to NULL.
Therefore, "= null" makes no sense.
In addition to all that has already been said, I wish to stress that what you write in your first line is wrong. SQL does support the “= NULL” syntax, but it has a different semantic than “IS NULL” – as can be seen in the very piece of documentation you linked to.
I agree with the OP that
where column_name = null
should be syntactic sugar for
where column_name is null
However, I do understand why the creators of SQL wanted to make the distinction. In three-valued logic (IMO this is a misnomer), a predicate can return two values (true or false) OR unknown which is technically not a value but just a way to say "we don't know which of the two values this is". Think about the following predicate in terms of three-valued logic:
A == B
This predicate tests whether A is equal to B. Here's what the truth table looks like:
A=B | T U F
----+------
T   | T U F
U   | U U U
F   | F U T
If either A or B is unknown, the predicate itself always returns unknown, regardless of whether the other one is true or false or unknown.
In SQL, null is a synonym for unknown. So, the SQL predicate
column_name = null
tests whether the value of column_name is equal to something whose value is unknown, and returns unknown regardless of whether column_name is true or false or unknown or anything else, just like in three-valued logic above. SQL DML operations are restricted to operating on rows for which the predicate in the where clause returns true, ignoring rows for which the predicate returns false or unknown. That's why "where column_name = null" doesn't operate on any rows.
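A small sketch of that filtering rule, using a made-up table variable @t with one NULL row and ANSI_NULLS assumed ON. Note that negating the predicate does not help, because NOT (UNKNOWN) is still UNKNOWN:

```sql
SET ANSI_NULLS ON;

DECLARE @t TABLE (column_name INT NULL);  -- hypothetical sample table
INSERT INTO @t (column_name) VALUES (1), (NULL);

SELECT COUNT(*) AS eq_null      FROM @t WHERE column_name = NULL;        -- 0
SELECT COUNT(*) AS not_eq_null  FROM @t WHERE NOT (column_name = NULL);  -- 0
SELECT COUNT(*) AS is_null      FROM @t WHERE column_name IS NULL;       -- 1
SELECT COUNT(*) AS is_not_null  FROM @t WHERE column_name IS NOT NULL;   -- 1
```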
NULL doesn't equal NULL. It can't equal NULL. It doesn't make sense for them to be equal.
A few ways to think about it:
Imagine a contacts database, containing fields like FirstName, LastName, DateOfBirth and HairColor. If I looked for records WHERE DateOfBirth = HairColor, should it ever match anything? What if someone's DateOfBirth was NULL, and their HairColor was too? An unknown hair color isn't equal to an unknown anything else.
Let's join the contacts table with purchases and product tables. Let's say I want to find all the instances where a customer bought a wig that was the same color as their own hair. So I query WHERE contacts.HairColor = product.WigColor. Should I get matches between every customer I don't know the hair color of and products that don't have a WigColor? No, they're a different thing.
Let's consider that NULL is another word for unknown. What's the result of ('Smith' = NULL)? The answer is not false, it's unknown. Unknown is not true, therefore it behaves like false. What's the result of (NULL = NULL)? The answer is also unknown, therefore also effectively false. (This is also why concatenating a string with a NULL value makes the whole string become NULL -- the result really is unknown.)
Why don't you use the isnull function?
@where = "where GroupId = " + isnull(@groupId, "null")
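For the original if/else problem, one possible way to avoid the branching is to write the predicate so that it handles the NULL case itself. This is only a sketch: the table variable @Groups and its contents are invented for the example, and only the GroupId column and @groupId variable come from the question.

```sql
DECLARE @groupId INT = NULL;  -- try 1 here as well

-- Hypothetical sample data.
DECLARE @Groups TABLE (GroupId INT NULL, Name VARCHAR(20));
INSERT INTO @Groups (GroupId, Name) VALUES (1, 'first'), (NULL, 'ungrouped');

-- Matches on equality when @groupId has a value,
-- and on "both are NULL" when it does not.
SELECT Name
FROM @Groups
WHERE GroupId = @groupId
   OR (GroupId IS NULL AND @groupId IS NULL);
```

With @groupId = NULL this returns 'ungrouped'; with @groupId = 1 it returns 'first'. The same WHERE text works in both cases, so a dynamic SQL builder does not need two variants of the clause.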

Convert IS NULL to BIT

This might be a very basic question but I just came over it while writing a query.
Why can't SQL Server convert a check for NULL to BIT? I was thinking about something like this:
DECLARE @someVariable INT = NULL;
-- Do something
SELECT CONVERT(BIT, (@someVariable IS NULL))
The expected outcome would then be either 1 or 0.
Use CASE:
SELECT CONVERT(BIT, (CASE WHEN @someVariable IS NULL THEN 1 ELSE 0 END))
Or use IIF (a little more readable than CASE):
CONVERT(BIT, IIF(@x IS NULL, 1, 0))
Not a direct cast:
select cast(isnull(@null,1) as bit)
In SQL the language, NULLs are not considered data values. They represent a missing/unknown state. Quoting from Wikipedia's article on SQL NULL:
SQL null is a state (unknown) and not a value. This usage is quite different from most programming languages, where null means not assigned to a particular instance.
This means that any comparison against that UNKNOWN value can only be UNKNOWN itself. Even comparing two NULLs can't return true: if both values are unknown, how can we say that they are equal or not?
IS NULL and IS NOT NULL are predicates that can be used in conditional expressions. That means that they don't return a value themselves. Therefore, they can't be "cast" to a bit , or treated as a boolean.
Basic SQL comparison operators always return Unknown when comparing anything with Null, so the SQL standard provides for two special Null-specific comparison predicates. The IS NULL and IS NOT NULL predicates (which use a postfix syntax) test whether data is, or is not, Null.
Any other way of treating nulls is a vendor-specific extension.
Finally, BIT is not a boolean type, it's just a single-bit number. An optional BOOLEAN type was introduced in SQL 1999 but only PostgreSQL implements it correctly, ie having TRUE, FALSE or UNKNOWN values.
Without a BOOLEAN type you can't really calculate the result of a conditional expression like A AND B or x IS NULL. You can only use functions like NULLIF or COALESCE to replace the NULL value with something else.
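As a sketch of the workaround in practice: the predicate has to be wrapped in an expression that does return a value, such as CASE or IIF (IIF needs SQL Server 2012 or later). The variable name is the one from the question.

```sql
DECLARE @someVariable INT = NULL;

-- IS NULL is only legal inside a conditional context, so CASE / IIF
-- turn the predicate's outcome into a real value that can be cast.
SELECT CAST(CASE WHEN @someVariable IS NULL THEN 1 ELSE 0 END AS BIT) AS is_null_case,  -- 1
       CAST(IIF(@someVariable IS NULL, 1, 0) AS BIT)                  AS is_null_iif;   -- 1
```

Setting @someVariable to any non-NULL value makes both columns 0.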

Issue with join: Error converting data type nvarchar to float?

I've been trying to run the program below and I keep on getting the error
Error converting data type nvarchar to float
SQL:
SELECT
distinct
coalesce(a.File_NBR,b.File_NBR) as ID,
b.Division,
b.Program,
a.Full_Name,
a.SBC_RESULT
FROM
New_EEs.dbo.vw_SBC_RESULTS a
full join
New_EEs.dbo.vw_SBC_Employee_Info b on a.File_NBR = b.File_NBR
where
(a.File_NBR is not null OR b.File_NBR is not null)
and A.Full_Name is not null
order by
a.Full_Name, b.Division, b.Program
When I comment out /*and A.Full_Name is not null */ the program works.
I can't figure out what the error means and why the join works when I comment out /*and A.Full_Name is not null */
Any feedback is appreciated.
Thanks!
The error message clearly says that the issue has to do with conversion of an nvarchar to a float. There's no explicit conversion in your query, therefore it's about implicit one. If the issue indeed stems from this particular query and not from somewhere else, only two places can be responsible for that:
1) the join predicate;
2) the COALESCE call.
Both places involve one and the same pair of columns, a.File_NBR and b.File_NBR. So, one of them must be an nvarchar column and the other a float one. Since the float type has higher precedence than nvarchar, the latter would implicitly be converted to the former, not the other way around. And apparently one of the string values failed to convert. That's the explanation of the immediate issue (the conversion).
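The precedence rule can be seen in isolation with a couple of literals. This is just an illustration of the implicit conversion, not the asker's actual data; TRY_CONVERT is available from SQL Server 2012.

```sql
-- float outranks nvarchar, so the string side is converted to float before comparing.
SELECT CASE WHEN N'12.5' = CAST(12.5 AS FLOAT)
            THEN 'equal' ELSE 'not equal'
       END AS comparison;  -- equal

-- The same comparison with a non-numeric string raises the conversion error:
-- SELECT CASE WHEN N'abc' = CAST(12.5 AS FLOAT) THEN 'equal' ELSE 'not equal' END;

-- TRY_CONVERT returns NULL instead of raising the error.
SELECT TRY_CONVERT(FLOAT, N'abc') AS converted;  -- NULL
```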
I've seen your comment where you are saying that one of the columns is an int and the other float. I have no problem with that as I believe you are talking about columns in physical tables whereas both sources in this query appear to be views, judging by their names. I believe one of the columns enjoys a transformation to nvarchar in a view, and this query ends up seeing it as such. So that would account for where the nvarchar comes from.
As for an explanation to why commenting a seemingly irrelevant condition out appears to make such a big difference, the answer must lie in the internals of the query planner's workings. While there's a documented order of logical evaluation of clauses in a Transact-SQL SELECT query, the real, physical order may differ from that. The actual plan chosen for the query determines that physical order. And the choice of a plan can be affected, in particular, by such a trivial thing as incorporation or elimination of a simple condition.
To apply that to your situation, when the offending condition is commented out, the planner chooses such a plan for the query that both the join predicate and the COALESCE expression evaluate only when all the rows capable of causing the issue in question have been filtered out by predicates in the underlying views. When the condition is put back, however, the query is assigned a different execution plan, and either COALESCE or (more likely) the join predicate ends up being applied to a row containing a string that cannot be converted to a float, which results in an exception raised.
Conversion of both a.File_NBR and b.File_NBR to char, as you did, is one way of solving the issue. You could in fact pick any of these four string types:
char
varchar
nchar
nvarchar
And since one of the columns is already a string (possibly the a.File_NBR one, but you are at a better position to find that out exactly), the conversion could be applied to the other one only.
Alternatively, you could look into the view producing the nvarchar column to try and see if the int to nvarchar conversion could be eliminated in the first place.
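If it helps to find which values are responsible, something along these lines could be run against whichever view exposes the nvarchar column. The table variable @Source and its sample values are stand-ins invented for this sketch; only the column name File_NBR comes from the question.

```sql
-- Hypothetical stand-in for the view that exposes File_NBR as nvarchar.
DECLARE @Source TABLE (File_NBR NVARCHAR(20) NULL);
INSERT INTO @Source (File_NBR) VALUES (N'1001'), (N'1002.0'), (N'N/A'), (NULL);

-- Rows that hold a string which cannot be converted to float.
SELECT File_NBR
FROM @Source
WHERE File_NBR IS NOT NULL
  AND TRY_CONVERT(FLOAT, File_NBR) IS NULL;  -- returns the N'N/A' row
```

Any row this returns is a candidate for the value that makes the implicit conversion in the join fail.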
Please see this example; it may be useful.
CREATE TABLE TEST(
ID FLOAT)
INSERT INTO TEST(ID) VALUES(NULL)
INSERT INTO TEST(ID) VALUES(12.3)
-- SELECT COALESCE(ID,'TEST') FROM TEST; -- does NOT work: Error converting data type nvarchar to float
SELECT COALESCE(CAST(ID AS VARCHAR),'TEST') FROM TEST; --WORKS

SQL Server: Null VS Empty String

How are the NULL and Empty Varchar values stored in SQL Server. And in case I have no user entry for a string field on my UI, should I store a NULL or a '' ?
There's a nice article here which discusses this point. Key things to take away are that there is no difference in table size, however some users prefer to use an empty string as it can make queries easier as there is not a NULL check to do. You just check if the string is empty. Another thing to note is what NULL means in the context of a relational database. It means that the pointer to the character field is set to 0x00 in the row's header, therefore no data to access.
Update
There's a detailed article here which talks about what is actually happening on a row basis
Each row has a null bitmap for columns that allow nulls. If the row in
that column is null then a bit in the bitmap is 1 else it's 0.
For variable size datatypes the actual size is 0 bytes.
For fixed size datatype the actual size is the default datatype size
in bytes set to default value (0 for numbers, '' for chars).
The result of DBCC PAGE shows that NULL and empty strings both take up zero bytes.
Be careful with nulls and checking for inequality in sql server.
For example
select * from foo where bla <> 'something'
will NOT return records where bla is null. Even though logically it should.
So the right way to check would be
select * from foo where isnull(bla,'') <> 'something'
Which of course people often forget and then get weird bugs.
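A quick sketch of that pitfall with three sample rows; the table variable @foo stands in for the foo table above and its contents are made up:

```sql
DECLARE @foo TABLE (bla VARCHAR(20) NULL);  -- stand-in for the foo table
INSERT INTO @foo (bla) VALUES ('something'), ('other'), (NULL);

-- The NULL row is dropped: NULL <> 'something' is UNKNOWN, not TRUE.
SELECT COUNT(*) AS plain_inequality FROM @foo WHERE bla <> 'something';              -- 1

-- Two ways of keeping the NULL row.
SELECT COUNT(*) AS with_isnull  FROM @foo WHERE ISNULL(bla, '') <> 'something';      -- 2
SELECT COUNT(*) AS with_is_null FROM @foo WHERE bla <> 'something' OR bla IS NULL;   -- 2
```

The OR ... IS NULL form leaves the column itself untouched by a function, which some people prefer when the column is indexed.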
The conceptual differences between NULL and "empty-string" are real and very important in database design, but often misunderstood and improperly applied - here's a short description of the two:
NULL - means that we do NOT know what the value is, it may exist, but it may not exist, we just don't know.
Empty-String - means we know what the value is and that it is nothing.
Here's a simple example:
Suppose you have a table with people's names including separate columns for first_name, middle_name, and last_name. In the scenario where first_name = 'John', last_name = 'Doe', and middle_name IS NULL, it means that we do not know what the middle name is, or if it even exists. Change that scenario such that middle_name = '' (i.e. empty-string), and it now means that we know that there is no middle name.
I once heard a SQL Server instructor promote making every character type column in a database required, and then assigning a DEFAULT VALUE to each of either '' (empty-string), or 'unknown'. In stating this, the instructor demonstrated he did not have a clear understanding of the difference between NULLs and empty-strings. Admittedly, the differences can seem confusing, but for me the above example helps to clarify the difference. Also, it is important to understand the difference when writing SQL code, and properly handle for NULLs as well as empty-strings.
An empty string is a string with zero length or no character.
Null is absence of data.
NULL values are stored separately in a special bitmap space for all the columns.
If you do not distinguish between NULL and '' in your application, then I would recommend you to store '' in your tables (unless the string column is a foreign key, in which case it would probably be better to prohibit the column from storing empty strings and allow the NULLs, if that is compatible with the logic of your application).
NULL is a non value, like undefined. '' is a empty string with 0 characters.
The value of a string in the database depends on the value in your UI, but generally it's an empty string '' if you specify the parameter in your query or stored procedure.
If it's not a foreign key field, not using empty strings could save you some trouble. Only allow nulls if you'll take null to mean something different than an empty string. For example, if you have a password field, a null value could indicate that a new user has not created his password yet, while an empty varchar could indicate a blank password. For a field like "address2", allowing nulls can only make life difficult. Things to watch out for include null references and the unexpected results of the = and <> operators mentioned by Vagif Verdi, and watching out for these things is often unnecessary programmer overhead.
edit: if performance is an issue see this related question: Nullable vs. non-null varchar data types - which is faster for queries?
In terms of having something tell you, whether a value in a VARCHAR column has something or nothing, I've written a function which I use to decide for me.
CREATE FUNCTION [dbo].[ISNULLEMPTY](@X VARCHAR(MAX))
RETURNS BIT AS
BEGIN
DECLARE @result AS BIT
-- LEN ignores trailing spaces, so a string of only spaces also counts as empty
IF @X IS NOT NULL AND LEN(@X) > 0
SET @result = 0
ELSE
SET @result = 1
RETURN @result
END
Now there is no doubt.
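For comparison, roughly the same check can be written inline without a scalar function. This is a sketch using a made-up table variable @samples; it relies on SQL Server ignoring trailing spaces when comparing strings, so a spaces-only value is treated as empty here too.

```sql
DECLARE @samples TABLE (val VARCHAR(20) NULL);  -- hypothetical sample values
INSERT INTO @samples (val) VALUES (NULL), (''), ('   '), ('abc');

-- NULLIF turns '' (and spaces-only strings) into NULL,
-- so a single IS NULL test covers both cases.
SELECT val,
       CASE WHEN NULLIF(val, '') IS NULL THEN 1 ELSE 0 END AS is_null_or_empty
FROM @samples;
```

The first three rows come back with 1 and 'abc' comes back with 0.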
How are the "NULL" and "empty varchar" values stored in SQL Server.
Why would you want to know that? Or in other words, if you knew the answer, how would you use that information?
And in case I have no user entry for a string field on my UI, should I store a NULL or a ''?
It depends on the nature of your field. Ask yourself whether the empty string is a valid value for your field.
If it is (for example, house name in an address) then that might be what you want to store (depending on whether or not you know that the address has no house name).
If it's not (for example, a person's name), then you should store a null, because people don't have blank names (in any culture, so far as I know).

Passing default parameter value vs not passing parameter at all?

Here's what I want to do:
Given a table
PeopleOutfit (id int primary key, boots int, hat int)
And a stored procedure
UpdateOutfit @id int, @newBoots int = null, @newHat int = null
Is there a way to tell whether I called this procedure as
exec UpdateOutfit @id=1, @newBoots=null, @newHat=5
effectively telling that the person with id of 1 must now be barefoot and wear fifth hat from
exec UpdateOutfit @id=1, @newHat=5
that instructs this person to wear fifth hat keeping his current boots?
In other words, I want to tell (within the stored procedure) if "the default value was used because it was not specified" from "I explicitly called this procedure passing the value that happens to be the same as default one".
I know there are several ways of accomplishing what I want to do such as passing XML or bitmask of fields being updated, but for the moment I just want to make sure whether this exact technique is possible or not.
Edit: Passing reserved values does not work for fields with small range types such as bit. Overloading procedures is also an option that's not acceptable. Creating user-defined type that extends NULL paradigm with additional "NotAValue" value might be an answer, but I need some more guidance on how to implement it.
My guess is no, you can't tell those two things apart.
My suggestion is to use a default value that you would never pass in as an argument, i.e. if the default is null, then maybe you could pass in 0 as the value for @newBoots.
no, the default null "looks" the same as a passed in null
possibly make your default -1 and use logic to do something different.
Strictly, no, there's no real facility to accomplish this. You could, however, try using some sort of reserved value for the parameter (a very small negative number, for example) to indicate this.
Never done this myself; I introduced a 3-state bit (using an integer) into some code to handle the bit situation. I don't have access to a SQL Server, but I do like lateral thinking sometimes: I think you might be able to figure it out using string manipulation over some management views/functions. You would need to run with a heck of a lot of privilege, but if it's absolutely necessary I don't see why you can't work it out from st.text using something like this:
SELECT
st.text
FROM
sys.dm_exec_requests r
CROSS APPLY
sys.dm_exec_sql_text(r.sql_handle) AS st
WHERE
r.session_id = @@SPID
As stated, TSQL doesn't distinguish between supplying the default value and not supplying a value. I think the engine basically substitutes the default values for any missing parameters (or params called with the DEFAULT keyword.)
Instead, use 0 as "No Hat", and NULL as no parameter specified. This is the preferred use of NULL, where it means value unknown or not specified. By using NULL as "No Hat", you've co-opted it into adding an extra value to the range of your data type.
Think of it in terms of the BIT datatype. The datatype is defined to represent a binary value (1 or 0, or T/F if you prefer to think of it as a boolean.) By treating NULL as a valid value, you have extended the datatype beyond the binary options (now have three options, 1/0/NULL.) My recommendation is always that if you find you've run out of values in the current datatype, you're using too small a type.
Back to the stored procedure calling; if you set your default values to NULL, and treat NULL as unset or not specified, then callers should always specify a non-null value when calling the proc. If you get a NULL, assume they didn't supply a value, supplied a NULL, or used the DEFAULT keyword.
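A rough sketch of how that convention might look for the procedure in the question. It assumes 0 is acceptable as the "no boots / no hat" marker (which may not fit if the columns reference another table), and the sample row is invented. GO is the batch separator used by SSMS and sqlcmd, needed because CREATE PROCEDURE must start its own batch.

```sql
CREATE TABLE PeopleOutfit (id INT PRIMARY KEY, boots INT NULL, hat INT NULL);
INSERT INTO PeopleOutfit (id, boots, hat) VALUES (1, 3, 2);  -- made-up starting outfit
GO

CREATE PROCEDURE UpdateOutfit
    @id       INT,
    @newBoots INT = NULL,   -- NULL means "not specified, keep the current value"
    @newHat   INT = NULL
AS
BEGIN
    -- COALESCE keeps the existing column value whenever the parameter is NULL.
    UPDATE PeopleOutfit
    SET boots = COALESCE(@newBoots, boots),
        hat   = COALESCE(@newHat, hat)
    WHERE id = @id;
END
GO

EXEC UpdateOutfit @id = 1, @newHat = 5;    -- boots stay 3, hat becomes 5
EXEC UpdateOutfit @id = 1, @newBoots = 0;  -- boots become 0 ("barefoot"), hat stays 5
```

An explicit @newBoots = NULL still behaves like an omitted parameter here, which is exactly the limitation described above; the sketch only shows how to live with it.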

Resources