NULL vs Empty when dealing with user input - database

Yes, another NULL vs empty string question.
I agree with the idea that NULL means not set, while empty string means "a value that is empty". Here's my problem: If the default value for a column is NULL, how do I allow the user to enter that NULL.
Let's say a new user is created on a system. There is a first and last name field; last name is required while first name is not. When creating the user, the person will see 2 text inputs, one for first and one for last. The person chooses to only enter the last name. The first name is technically not set. During the insert I check the length of each field, setting all fields that are empty to NULL.
When looking at the database, I see that the first name is not set. The question that immediately comes to mind is that maybe they never saw the first name field (ie, because of an error). But this is not the case; they left if blank.
So, my question is, how do you decide when a field should be set to NULL or an empty string when receiving user input? How do you know that the user wants the field to be not set without detecting focus or if they deleted a value...or...or...?
Related Question: Should I use NULL or an empty string to represent no data in table column?

I'll break the pattern, and say that I would always use NULL for zero-length strings, for the following reasons.
If you start fine-slicing the implications of blanks, then you must ensure somehow that every other developer reads and writes it the same way.
How do you alphabetize it?
Can you unambiguously determine when a user omitted entering a value, compared with intentionally leaving it blank?
How would you unambiguously query for the difference? Can a query screen user indicate NULL vs. blank using standard input form syntax?
In practice, I've never been prohibited from reading and writing data using default, unsurprising behavior using this rule. If I've needed to know the difference, I've used a boolean field (which is easier to map to unambiguous UI devices). In one case, I used a trigger to enforce True => null value, but never saw it invoked because the BR layer filtered out the condition effectively.

If the user provides an empty string, I always treat it as null from the database perspective. In addition, I will typically trim my string inputs to remove leading/trailing spaces and then check for empty. It's a small win in the database with varchar() types and it also reduces the cases for search since I only need to check for name is null instead of name is null or name = '' You could also go the other way, converting null to ''. Either way, choose a way and be consistent.

What you need to do is figure out what behavior you want. There is no one fixed algebra of how name strings are interpreted.
Think about the state machine here: you have fields which have several states: it slounds like you're thinking about a state of "unitialized", another of "purposefully empty" and a third with some set value. ANYTHING you do that makes that assignment and is consistent with the rest of your program will be find; it sounds like the easy mapping is
NULL → uninitialized
"" → purposefully unset
a name → initialized.

I almost never use NULL when referring to actual data. When used for foreign keys, I would say that NULL is valid, but it is almost never valid for user entered data. The one exception that would probably come up quite regularly is for dates that don't exist, such as an employee database with a "termination_date" field. In that case, all current employees should have a value of NULL in that field. As for getting them to actually enter a null value, for the values that truly require a null value, I would put a checkbox next to the input field so that the user can check it on and off to see the corresponding value to null (or in a more user friendly manner, none). When enabling the checkbox to set the field to null, the corresponding text box should be disabled, and if a null value is already associated, it should start out as disabled, and only become enabled once the user unchecks the null checkbox.

I try to keep things simple. In this case, I'd make the first-name column not-nullable and allow blanks. Otherwise, you'll have three cases to deal with anywhere you refer to this field:
Blank first name
Null first name
Non-blank first name
If you go with 'blank is null' or 'null is blank' then you're down to two cases. Two cases are better than three.
To further answer your question: the user entering data probably doesn't (and shouldn't) know anything about what a "null" is and how it compares to an "empty". This issue should be resolved cleanly and consistently in the system--not the UI.

While your example is mostly for strings, I like to say that I use null for numerical and boolean fields.
An accountbalance of 0 is very different to me as one that is null. Same for booleans, if people take a multiple choice test with true and false answers, it is very important to know whether someone answered true or false or didn't answer at all.
Not using null in these cases would require me to have an extra table or a different setup to see whether someone answered a question. You could use for instance -1 for not filled in 0 for false and 1 for true, but then you're using a numerical field for something that's essentially a boolean.

I've never, ever had a use for a NULL value in production code. An empty string is a fine sentinel value for a blank name field, phone number, or annual income for any application. That said, I'm sure you could find some use for it, but I just think it's overused. If I were to use a NULL value, however, I imagine I'd use it anywhere I want to represent an empty value.

I have always used NULL for uninitialized values, empty for purposely empty values and 0 for off indicators.
By doing this all the time, it is there even if I am not using it, but I don't have to do anything different if I need that distinction.
I am usually testing for empty(), but sometimes I check for isset() which evaluates false on NULL. This is useful for reminders to answer certain questions. If it is empty, false or 0 then the question is answered.

Related

scala readCsvFile behaviour

I´m using flink to load a csv file to a dataset of pojos, defined through a scala case class, using readCsvFile method, and I have a problem that i cannot solve.
When in the csv there is a record with some format error in any of its fields it is discarded, and I assume that the only way to keep those records is to type them all as String and do the validations myself.
The problem is that if the last field after the delimiter is empty, the record is discarded by default, I think because it is considered as not having the expected number of fields it should, and it is not possible to handle this record error, while if the empty value if in any of the previous fields there is no problem.
Example
field1|field2|field3
a||c
a|b|
In this example, first record is returned by readCsvFile method but not the second.
Is this behaviour right? and there is any walk around to get the record?
Thanks
Case classes and tuples in Flink do not support null values. Therefore, a||c is invalid if the empty field is not a String. I recommend to use the RowCsvInputFormat in this case. It supports nulls and generic rows can be converted to any other class in a following map operator.
The problem is that, as you say, if the field is a String, the record should be valid even if it is null, and this doesn't happen when the null value is in the last field. The behaviour is different depending on the position.
I will try also with RowCsvInputFormat as you recommend.
Thanks

SQL would using between statement improve this?

I want to find out using a select statement what columns in a table share similar information.
Example: Classes table with ClassID, ClassName, ClassCode, ClassDescription columns.
This was part of my SQL class that I already turned in. The question asked "What classes are part of the English department?"
I used this Select statement:
SELECT *
FROM Classes
WHERE ClassName LIKE "English%" OR ClassCode LIKE "ENG%"
Granted we have only input one actual English course in this database, the end result was it executed fine and displayed everything for just the English class. Which I thought was a success since we did populate other non English courses in the database.
Anyways, I was told I should have used a BETWEEN statement.
I am just sitting here thinking they would both do what I needed them to do right?
I'm using SQL Server 2014
No, BETWEEN would probably be a bad idea here. BETWEEN doesn't allow wildcards and doesn't do any pattern matching in any RDBMS I've used. So you'd have to say BETWEEN 'ENG' AND 'English'. Except that doesn't return things like 'English I' (which would be after 'English' in a sorted list).
It would also potentially include something like 'Engineering' or 'Engaging Artistry', but that's a weakness of your existing query, too, since LIKE 'ENG%' matches those.
If you happen to be using a case-sensitive collation you add a whole new dimension of complexity. Your BETWEEN statement gets even more confusing. Just know that capital letters generally come before lower case letters, so 'ENGRAVING I' would be included but 'Engraving I' would not. Additionally, 'eng' would not be included. Note that case-insensitive collation is the default.
Also whats the difference when searching for null values in one table
and one column
column_name =''
or
column_name IS NULL
You're not understanding the difference between an empty string and null.
An empty string is explicit. It says "This field has a known value and it is a string of zero length."
A null string is imprecise. It means "unknown". It could mean "This value wasn't asked for," or "This value was not available," or "This value has not yet been determined," or "This values does not make sense for this record."
"What is this person's middle name?"
"He doesn't have one. See, his birth certificate has no middle name listed." --> Empty string
"I don't know. He never told me and I don't have any birth or identity record." --> NULL
Note that Oracle, due to backwards compatibility, treats empty strings as NULLs. This is explicitly against ANSI SQL, but since Oracle is that old and that's how it's always worked that's how it will continue to work.
Another way to look at it is the example I tend to use with numbers. The difference between 0 and NULL is the difference between having a bank account with $0 balance and not having a bank account at all.
Nothing can be said unless we see table and its data.Though don't use between.
Secondly first find which of the column is not null by design.Say for example ClassName cannot be null then there is no use using ClassCode LIKE "ENG%",just ClassName LIKE "English%" is enough,similarly vice versa is also true.
Thirdly you should use same parameter in both column.for example
ClassName LIKE "English%" OR ClassCode LIKE "English%"
see the difference.
Select * FROM Classes
Where ClassName LIKE "%English%"

SSRS 2005 parameters not returning any results when null checked

newbie to SSRS, running 2005. I hope someone can help!
I have a report with a parameter on it, ItemNum. I have it as a string, to allow NULL value and allow blank value, nothing in Available values, and null selected for Default Values. I then have in there WHERE part of my SELECT statement where item_num IN (#ItemNum).
when I preview the report, I get NO results if I:
leave the NULL checkbox checked (and obviously the Item # parameter empty, it's greyed out) or
uncheck the NULL checkbox but don't put a value in the parameter.
the ONLY way I can get ANY results back is to enter a valid item number.
I would think that if you Allow Null values and/or Allow Blank values, it should just disregard that parameter. Perhaps it's because of it being in the where clause?
how can I make this function the way I need it to, i.e. if the NULL checkbox is checked (or unchecked and NO value entered) disregard that parameter (or in other words, return all records)?
also -- as a 2nd question -- is it possible to use the LIKE comparison with a parameter or does it always have to be IN?
Try changing your WHERE clause to:
WHERE #ItemNum is null OR item_num IN (#ItemNum)
You need to explicitly check for NULL in this case.
In fact, why are you using IN for a single value? It's usually used to check for a number of values, e.g. WHERE item_num IN (1, 2, 3, 5, 10). A simple = would suffice.

What to use when not using Boolean type?

Ok this might be a brain dead question but I am curious.
I often find I have a situation where I need to store 1 of 3 values in a database. For example if my application asks "Is this house for sale?", the user needs to be able to say "yes", "no" but sometimes the user doesn't know. So there is also "I don't know".
I am always tempted to set my data type as Boolean, but of course that only gives me yes or no and usually this is set to no for default.
I am curious as to what data type is set as common practice in this scenario?
I could use an integer and store "1", "2" or "3". Or I could store a string value, or , or ,or.
This is probably a silly question and there maybe a million ways that this is done. but if anyone knows a best practices method and why that would be helpful. Thanks.
In a database a boolean value can actually expresses three states - true, false, and NULL (provided you allow NULL in the column).
Use NULL for "I don't know".
(And probably set the default for the column to NULL as well)
EDIT: Thinking about this for a moment though, this can be problematic depending on your use case. Some high level languages (Java, for one) would end up converting the NULL to false in the query result set.
::shrug:: Use a varchar(1) ('t','f','u') or the smallest integer value available (for space considerations) ... either is fine.
Using an enum is another option, but be aware that it isn't portable (Oracle being the notable problem child).
You could use NULL for "I don't know". You should still be able to have TRUE and FALSE for your boolean type column.
Also you can use
ENUM('first','second','third')
it's can be helpful not just with integer numbers, and with string values too
I'd advise against using NULL for "I don't know" for the following reasons:
In code, you typically check the value of a bool by asking something like this:
IF myBool THEN...
Or
IF NOT myBool THEN...
...as such, you can't tell the difference between false and 'i don't know'
Additionally, I've found it to be a best-practice for my own situation to always have a default value (false) for bools in my database. One direct effect of nulls in a bool field can be errors in 3rd-party tools (such as MS Access making an ODBC connection to a SQL-Server backend) that can't handle nulls in bool fields gracefully / accurately.
So as for what I do recommend -- I'd say find your smallest number data type (like a tinyint in TSQL) and go with that. Then you can have 0 = false, 1 = true, 2 = IDK.
This depends on what data types the server supports. Every data type that is 1 byte can hold up to 255 different values. In MySQL for example, you can use TINYINT. Define numeric constants in your code and use the smalled integer representation possible.

SQL Server: Null VS Empty String

How are the NULL and Empty Varchar values stored in SQL Server. And in case I have no user entry for a string field on my UI, should I store a NULL or a '' ?
There's a nice article here which discusses this point. Key things to take away are that there is no difference in table size, however some users prefer to use an empty string as it can make queries easier as there is not a NULL check to do. You just check if the string is empty. Another thing to note is what NULL means in the context of a relational database. It means that the pointer to the character field is set to 0x00 in the row's header, therefore no data to access.
Update
There's a detailed article here which talks about what is actually happening on a row basis
Each row has a null bitmap for columns that allow nulls. If the row in
that column is null then a bit in the bitmap is 1 else it's 0.
For variable size datatypes the acctual size is 0 bytes.
For fixed size datatype the acctual size is the default datatype size
in bytes set to default value (0 for numbers, '' for chars).
the result of DBCC PAGE shows that both NULL and empty strings both take up zero bytes.
Be careful with nulls and checking for inequality in sql server.
For example
select * from foo where bla <> 'something'
will NOT return records where bla is null. Even though logically it should.
So the right way to check would be
select * from foo where isnull(bla,'') <> 'something'
Which of course people often forget and then get weird bugs.
The conceptual differences between NULL and "empty-string" are real and very important in database design, but often misunderstood and improperly applied - here's a short description of the two:
NULL - means that we do NOT know what the value is, it may exist, but it may not exist, we just don't know.
Empty-String - means we know what the value is and that it is nothing.
Here's a simple example:
Suppose you have a table with people's names including separate columns for first_name, middle_name, and last_name. In the scenario where first_name = 'John', last_name = 'Doe', and middle_name IS NULL, it means that we do not know what the middle name is, or if it even exists. Change that scenario such that middle_name = '' (i.e. empty-string), and it now means that we know that there is no middle name.
I once heard a SQL Server instructor promote making every character type column in a database required, and then assigning a DEFAULT VALUE to each of either '' (empty-string), or 'unknown'. In stating this, the instructor demonstrated he did not have a clear understanding of the difference between NULLs and empty-strings. Admittedly, the differences can seem confusing, but for me the above example helps to clarify the difference. Also, it is important to understand the difference when writing SQL code, and properly handle for NULLs as well as empty-strings.
An empty string is a string with zero length or no character.
Null is absence of data.
NULL values are stored separately in a special bitmap space for all the columns.
If you do not distinguish between NULL and '' in your application, then I would recommend you to store '' in your tables (unless the string column is a foreign key, in which case it would probably be better to prohibit the column from storing empty strings and allow the NULLs, if that is compatible with the logic of your application).
NULL is a non value, like undefined. '' is a empty string with 0 characters.
The value of a string in database depends of your value in your UI, but generally, it's an empty string '' if you specify the parameter in your query or stored procedure.
if it's not a foreign key field, not using empty strings could save you some trouble. only allow nulls if you'll take null to mean something different than an empty string. for example if you have a password field, a null value could indicate that a new user has not created his password yet while an empty varchar could indicate a blank password. for a field like "address2" allowing nulls can only make life difficult. things to watch out for include null references and unexpected results of = and <> operators mentioned by Vagif Verdi, and watching out for these things is often unnecessary programmer overhead.
edit: if performance is an issue see this related question: Nullable vs. non-null varchar data types - which is faster for queries?
In terms of having something tell you, whether a value in a VARCHAR column has something or nothing, I've written a function which I use to decide for me.
CREATE FUNCTION [dbo].[ISNULLEMPTY](#X VARCHAR(MAX))
RETURNS BIT AS
BEGIN
DECLARE #result AS BIT
IF #X IS NOT NULL AND LEN(#X) > 0
SET #result = 0
ELSE
SET #result = 1
RETURN #result
END
Now there is no doubt.
How are the "NULL" and "empty varchar" values stored in SQL Server.
Why would you want to know that? Or in other words, if you knew the answer, how would you use that information?
And in case I have no user entry for a string field on my UI, should I store a NULL or a ''?
It depends on the nature of your field. Ask yourself whether the empty string is a valid value for your field.
If it is (for example, house name in an address) then that might be what you want to store (depending on whether or not you know that the address has no house name).
If it's not (for example, a person's name), then you should store a null, because people don't have blank names (in any culture, so far as I know).

Resources