NULL vs empty string - sql-server

What is the difference between the below queries & how it works?
SELECT * FROM some_table WHERE col IS NOT NULL
&
SELECT * FROM some_table WHERE col <> ''
Regards,
Mubarak

The NULL is special data type, it means absence of value.
An empty string on the other hand means a string or value which is empty.
Both are different.
For example, if you have name field in table and by default you have set it to NULL. When no value is specified for it, it will be NULL but if you specify a real name or an empty string, it won't be NULL then, it will contain an empty string instead.

NULL is the absence of value, and usually indicates something meaningful, such as unknown or not (yet) determined. For example, if I start a project today, the StartDate is 2012-02-25. If I don't know how long the project is going to take, what should the EndDate be? I might have some idea what the ProjectedEndDate may be, but I would set the EndDate to NULL, and update it when the project is complete.
'' is a zero-length (or "empty") string. It is not technically the absence of data, since it might actually be meaningful. For example, if I don't have a middle name, depending on your data model '' might make more sense than NULL since the latter implies unknown but '' can imply that it is known that I don't have one. NULL can be used the same way of course, but then it is difficult to decipher whether it is not known whether it exists, or known that it does not exist. A lot of standards have dedicated values for things where it might not be known - for example Gender has I believe 9 different character codes so that if M or F are not specified, we always know exactly why (unknown, unspecified, transgender, etc). Also think of the case where HeartRate is NULL - is it because there was no pulse, or because we haven't taken it yet?
They are not the same, though unfortunately many people treat them the same. If your column allows NULL it means that you know in advance that sometimes you may not know this information. If you are not treating them as the same thing, then your queries would differ. For example if col does not allow NULL, your first query will always return all results in the table, since none of them can be NULL. However NOT NULL still allows an empty string to be entered unless you have also set up a check constraint to prevent zero-length strings also.
Allowing both for the same column is usually a bit confusing for someone trying to understand the data model, though I believe in most cases a NOT NULL constraint is not combined with a LEN(col)>0 check constraint. The problem if both are allowed is that it is difficult to know what it means if the column is NULL or the column is "empty" - they could mean the same thing, but they may not - and this will vary from shop to shop.
Another key point is that NULL compared to anything (at least by default in SQL Server*) evaluates to unknown, which in turn evaluates to false. As an example, these queries all return 0:
DECLARE #x TABLE(i INT);
INSERT #x VALUES(NULL);
SELECT COUNT(*) FROM #x WHERE i = 1;
SELECT COUNT(*) FROM #x WHERE i <> 1;
SELECT COUNT(*) FROM #x WHERE i <= 3;
SELECT COUNT(*) FROM #x WHERE i > 3;
SELECT COUNT(*) FROM #x WHERE i IN (1,2,3);
SELECT COUNT(*) FROM #x WHERE i NOT IN (1,2,3);
Since the comparisons in the where clause always evaluate to unknown, they always come back false, so no rows ever meet the criteria and all counts come back as 0.
In addition, the answers to this question on dba.stackexchange might be useful:
https://dba.stackexchange.com/questions/5222/why-shouldnt-we-allow-nulls
* You can change this by using SET ANSI_NULLS OFF - however this is not advised both because it provides non-standard behavior and because this "feature" has been deprecated since SQL Server 2005 and will become a no-op in a future version of SQL Server. But you can play with the query above and see that the NOT IN behaves differently with SET ANSI_NULLS OFF.

NULL means the value is missing but '' means the value is there but just empty string
so first query means query all rows that col value is not missing, second one means select those rows that col not equals empty string
Update
For further information, I suggest you read this article:
https://sqlserverfast.com/blog/hugo/2007/07/null-the-databases-black-hole/

Select * from table where col IS NOT NULLwould return results excluded from Select * from table where col <> ‘’ because an empty string is also NOT NULL.

https://data.stackexchange.com/stackoverflow/query/62491/http-stackoverflow-com-questions-9444638-null-vs-empty-in-sql-server
SET NOCOUNT ON;
DECLARE #tbl AS TABLE (value varchar(50) NULL, description varchar(50) NOT NULL);
INSERT INTO #tbl VALUES (NULL, 'A Null'), ('', 'Empty String'), ('Some Text', 'A non-empty string');
SELECT * FROM #tbl;
SELECT * FROM #tbl WHERE value IS NOT NULL;
SELECT * FROM #tbl WHERE value <> '';
Note that in the display you cannot distinguish between NULL and '' - this is only an artifact of how the grid and text client display the data, but the data in the set is stored differently for NULL and ''.

As stated in other answers, NULL means 'no value' while empty string '' means just that - empty string. You can think of fields that allow NULLs as optional fields - they can be ignored and value for them may just not be provided.
Imagine an application where respondent selects their title (Mr, Mrs, Miss, Dr) but you do not require him/her to select any of those and leave it blank. In this case you would put NULL into relevant database field.
Distinction between NULL and empty string may not be obvious because they both can mean 'no value' if you decide to. It depends entirely up to you but using NULL would be better mainly because it is a special case for databases which are designed to handle NULLs quickly and efficiently (much faster than strings). If you use it instead of an empty string your queries will be faster and more reliable.

Related

SQL SUBSTRING & PATINDEX of varying lengths

SQL Server 2017.
Given the following 3 records with field of type nvarchar(250) called fileString:
_318_CA_DCA_2020_12_11-01_00_01_VM6.log
_319_CA_DCA_2020_12_12-01_VM17.log
_333_KF_DCA01_00_01_VM232.log
I would want to return:
VM6
VM17
VM232
Attempted thus far with:
SELECT
SUBSTRING(fileString, PATINDEX('%VM[0-9]%', fileString), 3)
FROM dbo.Table
But of course that only returns VM and 1 number.
How would I define the parameter for number of characters when it varies?
EDIT: to pre-emptively answer a question that may come up, yes, the VM pattern will always be proceeded immediately by .log and nothing else. But even if I took that approach and worked backwards, I still don't understand how to define the number of characters to take when the number varies.
here is one way :
DECLARE #test TABLE( fileString varchar(500))
INSERT INTO #test VALUES
('_318_CA_DCA_2020_12_11-01_00_01_VM6.log')
,('_319_CA_DCA_2020_12_12-01_00_01_VM17.log')
,('_333_KF_DCA_2020_12_15-01_00_01_VM232.log')
-- 5 is the length of file extension + 1 which is always the same size '.log'
SELECT
REVERSE(SUBSTRING(REVERSE(fileString),5,CHARINDEX('_',REVERSE(fileString))-5))
FROM #test AS t
This will dynamically grab the length and location of the last _ and remove the .log.
It is not the most efficient, if you are able to write a CLR function usnig C# and import it into SQL, that will be much more efficient. Or you can use this as starting point and tweak it as needed.
You can remove the variable and replace it with your table like below
DECLARE #TESTVariable as varchar(500)
Set #TESTVariable = '_318_CA_DCA_2020_12_11-01_00_01_VM6adf.log'
SELECT REPLACE(SUBSTRING(#TESTVariable, PATINDEX('%VM[0-9]%', #TESTVariable), PATINDEX('%[_]%', REVERSE(#TESTVariable))), '.log', '')
select *,
part = REPLACE(SUBSTRING(filestring, PATINDEX('%VM[0-9]%', filestring), PATINDEX('%[_]%', REVERSE(filestring))), '.log', '')
from table
Your lengths are consistent at the beginning. So get away from patindex and use substring to crop out the beginning. Then just replace the '.log' with an empty string at the end.
select *,
part = replace(substring(filestring,33,255),'.log','')
from table;
Edit:
Okay, from your edit you show differing prefix portions. Then patindex is in fact correct. Here's my solution, which is not better or worse than the other answers but differs with respect to the fact that it avoids reverse and delegates the patindex computation to a cross apply section. You may find it a bit more readable.
select filestring,
part = replace(substring(filestring, ap.vmIx, 255),'.log','')
from table
cross apply (select
vmIx = patindex('%_vm%', filestring) + 1
) ap

TSQL Variable With List of Values for IN Clause

I want to use a clause along the lines of "CASE WHEN ... THEN 1 ELSE 0 END" in a select statement. The tricky part is that I need it to work with "value IN #List".
If I hard code the list it works fine - and it performs well:
SELECT
CASE WHEN t.column_a IN ( 'value a', 'value b' ) THEN 1 ELSE 0 END AS priority
, t.column_b
, t.column_c
FROM
table AS t
ORDER BY
priority DESC
What I would like to do is:
-- #AvailableValues would be a list (array) of strings.
DECLARE
#AvailableValues ???
SELECT
#AvailableValues = ???
FROM
lookup_table
SELECT
CASE WHEN t.column_a IN #AvailableValues THEN 1 ELSE 0 END AS priority
, t.column_b
, t.column_c
FROM
table AS t
ORDER BY
priority DESC
Unfortunately, it seems that SQL Server doesn't do this - you can't use a variable with an IN clause. So this leaves me with some other options:
Make '#AvailableValues' a comma-delimited string and use a LIKE statement. This does not perform well.
Use an inline SELECT statement against 'lookup_table' in place of the variable. Again, doesn't perform well (I think) because it has to lookup the table on each row.
Write a function wrapping around the SELECT statement in place of the variable. I haven't tried this yet (will try it now) but it seems that it will have the same problem as a direct SELECT statement.
???
Are there any other options? Performance is very important for the query - it has to be really fast as it feeds a real-time search result page (i.e. no caching) for a web site.
Are there any other options here? Is there a way to improve the performance of one of the above options to get good performance?
Thanks in advance for any help given!
UPDATE: I should have mentioned that the 'lookup_table' in the example above is already a table variable. I've also updated the sample queries to better demonstrate how I'm using the clause.
UPDATE II: It occurred to me that the IN clause is operating off an NVARCHAR/NCHAR field (due to historical table design reasons). If I was to make changes that dealt with integer fields (i.e through PK/FK relationship constraints) could this have much impact on performance?
You can use a variable in an IN clause, but not in the way you're trying to do. For instance, you could do this:
declare #i int
declare #j int
select #i = 10, #j = 20
select * from YourTable where SomeColumn IN (#i, #j)
The key is that the variables cannot represent more than one value.
To answer your question, use the inline select. As long as you don't reference an outer value in the query (which could change the results on a per-row basis), the engine will not repeatedly select the same data from the table.
Based on your update and assuming the lookup table is small, I suggest trying something like the following:
DECLARE #MyLookup table
(SomeValue nvarchar(100) not null)
SELECT
case when ml.SomeValue is not null then 1 else 0 end AS Priority
,t.column_b
,t.column_c
from MyTable t
left outer join #MyLookup ml
on ml.SomeValue = t.column_a
order by case when ml.SomeValue is not null then 1 else 0 end desc
(You can't reference the column alias "Priority" in the ORDER BY clause. Alternatively, you could use the ordinal position like so:
order by 1 desc
but that's generally not recommended.)
As long as the lookup table is small , this really should run fairly quickly -- but your comment implies that it's a pretty big table, and that could slow down performance.
As for n[Var]char vs. int, yes, integers would be faster, if only because the CPU has fewer bytes to juggle around... which shoud only be a problem when processing a lot of rows, so it might be worth trying.
I solved this problem by using a CHARINDEX function. I wanted to pass the string in as a single parameter. I created a string with leading and trailing commas for each value I wanted to test for. Then I concatenated a leading and trailing commas to the string I wanted to see if was "in" the parameter. At the end I checked for CHARINDEX > 0
DECLARE #CTSPST_Profit_Centers VARCHAR (256)
SELECT #CTSPST_Profit_Centers = ',CS5000U37Y,CS5000U48B,CS5000V68A,CS5000V69A,CS500IV69A,CS5000V70S,CS5000V79B,CS500IV79B,'
SELECT
CASE
WHEN CHARINDEX(','+ISMAT.PROFIT_CENTER+',' ,#CTSPST_Profit_Centers) > 0 THEN 'CTSPST'
ELSE ISMAT.DESIGN_ID + ' 1 CPG'
END AS DESIGN_ID
You can also do it in the where clause
WHERE CHARINDEX(','+ISMAT.PROFIT_CENTER+',',#CTSPST_Profit_Centers) > 0
If you were trying to compare numbers you'd need to convert the number to a text string for the CHARINDEX function to work.
This might be along the lines of what you need.
Note that this assumes that you have permissions and the input data has been sanitized.
From Running Dynamic Stored Procedures
CREATE PROCEDURE MyProc (#WHEREClause varchar(255))
AS
-- Create a variable #SQLStatement
DECLARE #SQLStatement varchar(255)
-- Enter the dynamic SQL statement into the
-- variable #SQLStatement
SELECT #SQLStatement = "SELECT * FROM TableName WHERE " + #WHEREClause
-- Execute the SQL statement
EXEC(#SQLStatement)

MyNullableCol <> 'MyValue' Doesn't Includes Rows where MyNullableCol IS NULL

Today I found a strange problem that is I have a Table With a Nullable Column and I tried to use following Query
SELECT *
Id,
MyNCol,
FROM dbo.[MyTable]
WHERE MyNCol <> 'MyValue'
And Expecting it to return all rows not having 'MyValue' value in Field MyNCol. But its not returning all those Rows Containing NULL in above specified Column(MyNCol).
So I have to Rewrite my query to
SELECT *
Id,
MyNCol,
FROM dbo.[MyTable]
WHERE MyNCol <> 'MyValue' OR MyNCol IS NULL
Now My question is why is it why the first query is not sufficient to perform what i desire. :(
Regards
Mubashar
Look into the behaviour of "ANSI NULLs".
Standardly, NULL != NULL. You can change this behaviour by altering the ANSI NULLS enabled property on your database, but this has far ranging effects on column storage and query execution. Alternatively, you can use the SET ANSI_NULLS statement in your SQL script.
MSDN Documentation for SQL Server 2008 is here:
SET ANSI_NULLS (Transact-SQL)
Pay particular attention to the details of how comparisons are performed - even when setting ANSI_NULLS OFF, column to column comparisons may not behave as you intended. It is best practice to always use IS NULL and IS NOT NULL.
Equality operators cannot be performed on null values. In SQL 2000, you could check null = null, however in 2005 this changed.
NULL = UNKNOWN. So you can neither say that MyNCol = 'My Value' nor that MyNCol <> 'My Value' - both evaluate to UNKNOWN (just like 'foo' + MyNCol will still become NULL). An alternative to the workaround you posted is:
WHERE COALESCE(MyNCol, 'Not my Value') <> 'My Value';
This will include the NULL columns because the comparison becomes false rather than unknown. I don't like this option any better because it relies on some magic value in the comparison, even though that magic value will only be supplied when the column is NULL.

SQL Cast Mystery

I have a real mystery with the T-SQL below. As it is, it works with either the DATAP.Private=1, or the cast as int on Right(CRS,1). That is, if I uncomment the DATAP.Private=1, I get the error Conversion failed when converting the varchar value 'M' to data type int, and if I then remove that cast, the query works again. With the cast in place, the query only works without the Private=1.
I cannot for the life of me see how the Private=1 can add anything to the result set that will cause the error, unless Private is ever 'M', but Private is a bit field!
SELECT
cast(Right(CRS,1) as int) AS Company
, cast(PerNr as int) AS PN
, Round(Sum(Cost),2) AS Total_Cost
FROM
DATAP
LEFT JOIN BU_Summary ON DATAP.BU=BU_Summary.BU
WHERE
DATAP.Extension Is Not Null
--And DATAP.Private=1
And Left(CRS,2)='SB'
And DATAP.PerNr Between '1' And '9A'
and Right(CRS,1) <> 'm'
GROUP BY
cast(Right(CRS,1) as int)
, cast(PerNr as int)
ORDER BY
cast(PerNr as int)
I've seen something like this in the past. It's possible the DATAP.Private = 1 clause is generating a query plan that is performing the CRS cast before the Right(CRS,1) <> 'm' filter is applied.
It sure shouldn't do that, but I've had similar problems in T-SQL I've written, particularly when views are involved.
You might be able to reorder the elements to get the query to work, or select uncast data values into a temporary table or table variable and then select and cast from there in a separate statement as a stopgap.
If you check the execution plan it might shed some light about what is being calculated where. This might give you more ideas as to what you might change.
Just a guess, but it may be that when Private = 1, PerNr cannot be anything but a castable number in your data (as it is in the PerNr can equal '9A [or whatever else]', breaking the cast in the group by and order by clauses).
CAST('9A' AS int) fails when I tested it. It looks like you're doing unnecessary CASTS for grouping and sorting. Particularly in the GROUP BY, that will at least kill any chance for optimizing.

Oracle considers empty strings to be NULL while SQL Server does not - how is this best handled?

I have to write a component that re-creates SQL Server tables (structure and data) in an Oracle database. This component also has to take new data entered into the Oracle database and copy it back into SQL Server.
Translating the data types from SQL Server to Oracle is not a problem. However, a critical difference between Oracle and SQL Server is causing a major headache. SQL Server considers a blank string ("") to be different from a NULL value, so a char column can be defined as NOT NULL and yet still include blank strings in the data.
Oracle considers a blank string to be the same as a NULL value, so if a char column is defined as NOT NULL, you cannot insert a blank string. This is causing my component to break whenever a NOT NULL char column contains a blank string in the original SQL Server data.
So far my solution has been to not use NOT NULL in any of my mirror Oracle table definitions, but I need a more robust solution. This has to be a code solution, so the answer can't be "use so-and-so's SQL2Oracle product".
How would you solve this problem?
Edit: here is the only solution I've come up with so far, and it may help to illustrate the problem. Because Oracle doesn't allow "" in a NOT NULL column, my component could intercept any such value coming from SQL Server and replace it with "#" (just for example).
When I add a new record to my Oracle table, my code has to write "#" if I really want to insert a "", and when my code copies the new row back to SQL Server, it has to intercept the "#" and instead write "".
I'm hoping there's a more elegant way.
Edit 2: Is it possible that there's a simpler solution, like some setting in Oracle that gets it to treat blank strings the same as all the other major database? And would this setting also be available in Oracle Lite?
I don't see an easy solution for this.
Maybe you can store your values as one or more blanks -> ' ', which aren't NULLS in Oracle, or keep track of this special case through extra fields/tables, and an adapter layer.
My typical solution would be to add a constraint in SQL Server forcing all string values in the affected columns to have a length greater than 0:
CREATE TABLE Example (StringColumn VARCHAR(10) NOT NULL)
ALTER TABLE Example
ADD CONSTRAINT CK_Example_StringColumn CHECK (LEN(StringColumn) > 0)
However, as you have stated, you have no control over the SQL Database. As such you really have four choices (as I see it):
Treat empty string values as invalid, skip those records, alert an operator and log the records in some manner that makes it easy to manually correct / re-enter.
Convert empty string values to spaces.
Convert empty string values to a code (i.e. "LEGACY" or "EMPTY").
Rollback transfers that encounter empty string values in these columns, then put pressure on the SQL Server database owner to correct their data.
Number four would be my preference, but isn't always possible. The action you take will really depend on what the oracle users need. Ultimately, if nothing can be done about the SQL database, I would explain the issue to the oracle business system owners, explain the options and consequences and make them make the decision :)
NOTE: I believe in this case SQL Server actually exhibits the "correct" behaviour.
Do you have to permit empty strings in the SQL Server system? If you can add a constraint to the SQL Server system that disallows empty strings, that is probably the easiest solution.
Its nasty and could have unexpected side effects.. but you could just insert "chr(0)" rather than ''.
drop table x
drop table x succeeded.
create table x ( id number, my_varchar varchar2(10))
create table succeeded.
insert into x values (1, chr(0))
1 rows inserted
insert into x values (2, null)
1 rows inserted
select id,length(my_varchar) from x
ID LENGTH(MY_VARCHAR)
---------------------- ----------------------
1 1
2
2 rows selected
select * from x where my_varchar is not null
ID MY_VARCHAR
---------------------- ----------
1
NOT NULL is a database constraint used to stop putting invalid data into your database. This is not serving any purpose in your Oracle database and so I would not have it.
I think you should just continue to allow NULLS in any Oracle column that mirrors a SqlServer column that is known to contain empty strings.
If there is a logical difference in the SqlServer database between NULL and empty string, then you would need something extra to model this difference in Oracle.
I'd go with an additional column on the oracle side. Have your column allow nulls and have a second column that identifies whether the SQL-Server side should get a null-value or empty-string for the row.
For those that think a Null and an empty string should be considered the same. A null has a different meaning from an empty string. It captures the difference between 'undefined' and 'known to be blank'. As an example a record may have been automatically created, but never validated by user input, and thus receive a 'null' in the expectation that when a user validates it, it will be set to be empty. Practically we may not want to trigger logic on a null but may want to on an empty string. This is analogous to the case for a 3 state checkbox of Yes/No/Undefined.
Both SQL and Oracle have not got it entirely correct. A blank should not satisfy a 'not null' constraint, and there is a need for an empty string to be treated differently than a null is treated.
If you are migrating data you might have to substitute a space for an empty string. Not very elegant, but workable. This is a nasty "feature" of Oracle.
I've written an explanation on how Oracle handles null values on my blog a while ago. Check it here: http://www.psinke.nl/blog/hello-world/ and let me know if you have any more questions.
If you have data from a source with empty values and you must convert to an Oracle database where columns are NOT NULL, there are 2 things you can do:
remove the not null constraint from the Oracle column
Check for each individual column if it's acceptable to place a ' ' or 0 or dummy date in the column in order to be able to save your data.
Well, main point I'd consider is absence of tasks when some field can be null, the same field can be empty string and business logic requires to distinguish these values. So I'd make this logic:
check MSSQL if column has NOT NULL constraint
check MSSQL if column has CHECK(column <> '') or similar constraint
If both are true, make Oracle column NOT NULL. If any one is true, make Oracle column NULL. If none is true, raise INVALID DESIGN exception (or maybe ignore it, if it's acceptable by this application).
When sending data from MSSQL to Oracle, just do nothing special, all data would be transferred right. When retrieving data to MSSQL, any not-null data should be sent as is. For null strings you should decide whether it should be inserted as null or as empty string. To do this you should check table design again (or remember previous result) and see if it has NOT NULL constraint. If has - use empty string, if has not - use NULL. Simple and clever.
Sometimes, if you work with unknown and unpredictable application, you cannot check for existence of {not empty string} constraint because of various forms of it. If so, you can either use simplified logic (make Oracle columns always nullable) or check whether you can insert empty string into MSSQL table without error.
Although, for the most part, I agree with most of the other responses (not going to get into an argument about any I disagree with - not the place for that :) )
I do notice that OP mentioned the following:
"Oracle considers a blank string to be the same as a NULL value, so if a char column is defined as NOT NULL, you cannot insert a blank string."
Specifically calling out CHAR, and not VARCHAR2.
Hence, talking about an "empty string" of length 0 (ie '' ) is moot.
If he's declared the CHAR as, for example, CHAR(5), then just add a space to the empty string coming in, Oracle's going to pad it anyway. You'll end up with a 5 space string.
Now, if OP meant VARCHAR2, well yeah, that's a whole other beast, and yeah, the difference between empty string and NULL becomes relevant.
SQL> drop table junk;
Table dropped.
SQL>
SQL> create table junk ( c1 char(5) not null );
Table created.
SQL>
SQL> insert into junk values ( 'hi' );
1 row created.
SQL>
SQL> insert into junk values ( ' ' );
1 row created.
SQL>
SQL> insert into junk values ( '' );
insert into junk values ( '' )
*
ERROR at line 1:
ORA-01400: cannot insert NULL into ("GREGS"."JUNK"."C1")
SQL>
SQL> insert into junk values ( rpad('', 5, ' ') );
insert into junk values ( rpad('', 5, ' ') )
*
ERROR at line 1:
ORA-01400: cannot insert NULL into ("GREGS"."JUNK"."C1")
SQL>
SQL> declare
2 lv_in varchar2(5) := '';
3 begin
4 insert into junk values ( rpad(lv_in||' ', 5) );
5 end;
6 /
PL/SQL procedure successfully completed.
SQL>

Resources