One of our programmers tends to use isnull in MS SQL to compare with NULLs.
That is, instead of writing Where ColumnName Is Null, he writes Where IsNull(ColumnName,0)=0.
I think the optimizer will convert the latter into the former anyway, but if it does not, is there a way to prove that the latter is less efficient, since it
1. compares with NULL,
2. converts to integer,
3. compares two integers
instead of just comparing with NULL.
Both ways run too fast on my data for me to compare execution plans (and I also suspect the optimizer plays its part). Is there a way to prove to him that just comparing with NULL, without ISNULL, is more efficient (unless it isn't)?
Another obvious issue is that ISNULL precludes the use of indexes.
Run this setup:
create table T1 (
ID int not null primary key,
Column1 int null
)
go
create index IX_T1 on T1(Column1)
go
declare @i int
set @i = 10000
while @i > 0
begin
insert into T1 (ID,Column1) values (@i,CASE WHEN @i%1000=0 THEN NULL ELSE @i%1000 END)
set @i = @i - 1
end
go
Then turn on execution plans and run the following:
select * from T1 where Column1 is null
select * from T1 where ISNULL(Column1,0)=0
The first uses an index seek (using IX_T1) and is quite efficient. The second uses an index scan on the clustered index - it has to look at every row in the table.
On my machine, the second query took 90% of the time, the first 10%.
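If the timings are too close to call, you can also compare logical reads for the two queries (my addition, not part of the original answer):
set statistics io on
select * from T1 where Column1 is null -- seek on IX_T1: a handful of reads
select * from T1 where ISNULL(Column1,0)=0 -- clustered index scan: reads every page of the table
set statistics io off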
ISNULL is not well used if you are putting it in the WHERE clause and comparing the result to 0; the purpose of ISNULL is to replace a NULL value:
http://msdn.microsoft.com/en-us/library/ms184325.aspx
For example:
SELECT Description, DiscountPct, MinQty, ISNULL(MaxQty, 0.00) AS 'Max Quantity'
FROM Sales.SpecialOffer;
I want to only process rows that have at least one field with a value.
with a table like
create table #Temp
(
Id int,
cnt1 int,
cnt2 int,
cnt3 int,
cnt4 int,
cnt5 int
)
which one of the following two queries is faster?
select
*
from
#Temp
where
(cnt1 is not null or cnt2 is not null or cnt3 is not null or cnt4 is not null or cnt5 is not null)
or
select
*
from
#Temp
where
isnull(cnt1,0) + isnull(cnt2,0) + isnull(cnt3,0) + isnull(cnt4,0) + isnull(cnt5,0) != 0
I don't want results where all fields are equal to zero so the second query isn't a problem for me.
Also, is there a better (easier to read) way to do either query?
I don't think there'll be a truly "good" query for this. No matter what you do, it still amounts to a series of OR conditions, and that rarely runs well. I suggest this:
select
*
from
#Temp
where coalesce(cnt1, cnt2, cnt3, cnt4, cnt5) is not null
But you should try it on your database, with your actual data, resources, indexes, and load, and see what works best.
The first query is sargable, the second one is not.
This means that if there are any indexes SQL Server might be able to use to optimize the query, it can use them with the first query but not with the second, and in that case the first query will be faster.
With no relevant indexes, you should race your horses (as Tim wrote in his comment).
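A minimal harness for that race (my sketch; run it against your real data, since #Temp above is empty until you load it):
set statistics time, io on
select * from #Temp
where (cnt1 is not null or cnt2 is not null or cnt3 is not null or cnt4 is not null or cnt5 is not null)
select * from #Temp
where isnull(cnt1,0) + isnull(cnt2,0) + isnull(cnt3,0) + isnull(cnt4,0) + isnull(cnt5,0) != 0
set statistics time, io off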
I inherited some old stored procedures today, and came across several examples that followed this general pattern, where @Test is some INT value:
IF @Test IS NOT NULL AND @Test > 0
-- do something...
My understanding is that if @Test is NULL, then it has no value, and is not greater than, less than or even equal to zero. Therefore, testing for NULL is redundant in the above code:
IF @Test > 0
-- do something...
This second version seems to work just fine, and is far more readable IMHO.
So, my question: Is my understanding of NULL being unnecessary in this instance correct, or is there some obvious use-case I'm overlooking here where it could all go horribly wrong?
Note: In some cases, it was obvious that the intent was checking for the existence of a value, and I've changed those to IF EXISTS... my question is more concerned with the general case outlined above.
In SQL, any comparison with a NULL value evaluates to UNKNOWN, which a condition treats as false.
So you always have to check explicitly for NULL if you wish to act on it.
In this case, then, the additional test is not necessary.
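A two-line demonstration of that redundancy (my sketch, not from the original answer):
DECLARE @Test INT = NULL;
IF @Test > 0
PRINT 'do something' -- never printed: NULL > 0 evaluates to UNKNOWN, which the IF treats as false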
@FlorianHeer is right on. NULL > 0 will ultimately evaluate to false, but as @Pred points out, that is because NULL > 0 actually evaluates to NULL, and NULL cast to a bit is false....
A NULL is an unknown, and therefore any comparison with it is also unknown. Think of arithmetic operations such as addition, 1 + NULL = NULL, or concatenation, 'A' + NULL = NULL. NULL means the SQL database engine cannot interpret what its value is, so any function or comparison on it is also unknown.
@MikkaRin pointed out that it is the assumption in the ELSE portion of a CASE or IF statement where this can become problematic, but let's also think about it in the context of a join, and how you may or may not want to see the results.
DECLARE @Table1 AS TABLE (Col INT)
DECLARE @Table2 AS TABLE (Col INT)
INSERT INTO @Table1 VALUES (1),(2),(3)
INSERT INTO @Table2 VALUES (1),(NULL),(3),(4)
SELECT *
FROM
@Table1 t1
INNER JOIN @Table2 t2
ON t1.Col <> t2.Col
Naturally you might think that because NULL is not equal to 1, 2, or 3, it should be included in the result set. But NULL is unknown, so SQL is saying: I don't know whether NULL could be 1, 2, or 3, so I cannot return it as a result.
Now let's do the same thing, but add a NULL to the first table:
DECLARE @Table1 AS TABLE (Col INT)
DECLARE @Table2 AS TABLE (Col INT)
INSERT INTO @Table1 VALUES (1),(2),(3),(NULL)
INSERT INTO @Table2 VALUES (1),(NULL),(3),(4)
SELECT *
FROM
@Table1 t1
INNER JOIN @Table2 t2
ON t1.Col = t2.Col
Again you might think that NULL = NULL, but any comparison with NULL is considered unknown, so even though both tables contain a NULL, that row will not be returned in the dataset.
Now consider:
DECLARE @Table1 AS TABLE (Col INT)
INSERT INTO @Table1 VALUES (1),(2),(3),(NULL)
SELECT *, CASE WHEN Col < 2 THEN Col ELSE 1000 END as ColCase
FROM
@Table1 t1
This turns even the NULL into 1000. The question is: should NULL, an unknown, become 1000? If NULL is unknown, how do we know it isn't less than 2?
For a lot of your operations it may simply be enough to compare @Value > 1, but especially when you start dealing with ELSE branches in CASE or IF statements, or joining on the antithesis, you should consider dealing with the NULLs, for example with ISNULL() or COALESCE() as @GuidoG points out.
IMHO, being explicit about your intentions and appropriately accounting for NULL values outweighs the minimal savings in typing.
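For instance, an explicit version of the CASE above might look like this (a sketch of the idea, not from the original answer):
SELECT *,
CASE WHEN Col IS NULL THEN NULL -- decide deliberately what NULL should become
WHEN Col < 2 THEN Col
ELSE 1000
END AS ColCase
FROM @Table1 t1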
Comparing with NULL is necessary if you use ELSE branches:
for example:
declare @t int
set @t=null
if (@t>0) print '1' -- works fine
if (@t<0) print '2' -- works fine
if (@t>0)
print '3' -- works fine
else print '4' -- here we start getting problems: the ELSE branch acts as if @t<=0, which is obviously not established
you could replace it with
if isnull(@test, 0) > 0
This way it is shorter and you have still checked everything.
another interesting example (MySQL syntax; T-SQL does not allow a comparison directly in the select list):
SELECT (null > 0) AS a, NOT (null > 0) AS b
The value of both a and b will be NULL.
From my understanding, NULL checks are in some cases added to short-circuit OR logic. For example, consider the following:
select * from tbl where (@id is null or @id > id)
If you pass in a value for @id, it tests the first condition (@id is null) and sees that it is false, but since it is part of an OR, it then goes ahead and runs the @id > id comparison as well. An OR only needs one true result for the whole expression to resolve to true, so it must keep testing until it comes across a condition that does.
Whereas if you pass in NULL for the @id parameter, the first condition returns true as soon as it is evaluated. Seeing that it is part of an OR, SQL knows the entire expression has already resolved to true, so it does not even run the @id > id comparison. This can save a ton of processing if it is a huge table or complex join.
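Wrapped in a stored procedure, this is the classic optional-parameter pattern (a hypothetical sketch; the table tbl and its id column are assumed from the snippet above):
create procedure GetRows @id int = null
as
select * from tbl where (@id is null or @id > id)
go
-- exec GetRows            -- @id is null is true: every row is returned
-- exec GetRows @id = 5    -- the first condition is false: the @id > id comparison decides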
We have found that SQL Server uses an index scan instead of an index seek if the where clause contains parameterized values instead of string literals.
Following is an example:
SQL Server performs an index scan in the following case (parameters in the where clause):
declare @val1 nvarchar(40), @val2 nvarchar(40);
set @val1 = 'val1';
set @val2 = 'val2';
select
min(id)
from
scor_inv_binaries
where
col1 in (@val1, @val2)
group by
col1
On the other hand, the following query performs an index seek:
select
min(id)
from
scor_inv_binaries
where
col1 in ('val1', 'val2')
group by
col1
Has anyone observed similar behavior, and how have they fixed it to ensure that the query performs an index seek instead of an index scan?
We are not able to use the FORCESEEK table hint, because FORCESEEK is not supported on SQL Server 2005.
I have updated the statistics as well.
Thank you very much for help.
Well, to answer your question of why SQL Server is doing this: the query is not compiled in a logical order, each statement is compiled on its own merit,
so when the query plan for your select statement is being generated, the optimiser does not know that @val1 and @val2 will become 'val1' and 'val2' respectively.
When SQL Server does not know the value, it has to make a best guess about how many times that variable will appear in the table, which can sometimes lead to sub-optimal plans. My main point is that the same query with different values can generate different plans. Imagine this simple example:
IF OBJECT_ID(N'tempdb..#T', 'U') IS NOT NULL
DROP TABLE #T;
CREATE TABLE #T (ID INT IDENTITY PRIMARY KEY, Val INT NOT NULL, Filler CHAR(1000) NULL);
INSERT #T (Val)
SELECT TOP 991 1
FROM sys.all_objects a
UNION ALL
SELECT TOP 9 ROW_NUMBER() OVER(ORDER BY a.object_id) + 1
FROM sys.all_objects a;
CREATE NONCLUSTERED INDEX IX_T__Val ON #T (Val);
All I have done here is create a simple table and add 1000 rows, with values 1-10 for the column Val; however, 1 appears 991 times and the other 9 values appear only once each. The premise is that this query:
SELECT COUNT(Filler)
FROM #T
WHERE Val = 1;
would be more efficient as a scan of the entire table than as an index seek followed by 991 bookmark lookups to get the value of Filler; however, with only 1 matching row, the following query:
SELECT COUNT(Filler)
FROM #T
WHERE Val = 2;
will be more efficient as an index seek and a single bookmark lookup to get the value of Filler (and running the two queries will verify this).
I am pretty certain the cut-off between a seek with bookmark lookups and a scan varies depending on the situation, but it is fairly low. Using the example table, with a bit of trial and error, I found that the Val column needed 38 rows with the value 2 before the optimiser went for a full table scan over an index seek and bookmark lookup:
IF OBJECT_ID(N'tempdb..#T', 'U') IS NOT NULL
DROP TABLE #T;
DECLARE @i INT = 38;
CREATE TABLE #T (ID INT IDENTITY PRIMARY KEY, Val INT NOT NULL, Filler CHAR(1000) NULL);
INSERT #T (Val)
SELECT TOP (991 - @i) 1
FROM sys.all_objects a
UNION ALL
SELECT TOP (@i) 2
FROM sys.all_objects a
UNION ALL
SELECT TOP 8 ROW_NUMBER() OVER(ORDER BY a.object_id) + 2
FROM sys.all_objects a;
CREATE NONCLUSTERED INDEX IX_T__Val ON #T (Val);
SELECT COUNT(Filler), COUNT(*)
FROM #T
WHERE Val = 2;
So for this example the limit is 3.7% of matching rows.
Since the query does not know how many rows will match when you are using a variable, it has to guess; the simplest way is to take the total number of rows and divide it by the number of distinct values in the column, so in this example the estimated number of rows for WHERE Val = @Val is 1000 / 10 = 100. The actual algorithm is more complex than this, but for example's sake this will do. So when we look at the execution plan for:
DECLARE @i INT = 2;
SELECT COUNT(Filler)
FROM #T
WHERE Val = @i;
We can see here (with the original data) that the estimated number of rows is 100, but the actual number is 1. From the previous steps we know that with more than 38 rows the optimiser will opt for a clustered index scan over an index seek, so since the best guess for the number of rows is higher than this, the plan for an unknown variable is a clustered index scan.
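That estimate of 100 comes straight from the density vector of the index's statistics; if you want to see it for yourself (my addition, not part of the original answer):
-- 'All density' for Val is 0.1 (1 / 10 distinct values); 0.1 * 1000 rows = 100 estimated rows
DBCC SHOW_STATISTICS ('tempdb..#T', IX_T__Val) WITH DENSITY_VECTOR;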
Just to further prove the theory, if we create the table with 1000 rows of the numbers 1-27 evenly distributed (so the estimated row count will be approximately 1000 / 27 = 37.037):
IF OBJECT_ID(N'tempdb..#T', 'U') IS NOT NULL
DROP TABLE #T;
CREATE TABLE #T (ID INT IDENTITY PRIMARY KEY, Val INT NOT NULL, Filler CHAR(1000) NULL);
INSERT #T (Val)
SELECT TOP 27 ROW_NUMBER() OVER(ORDER BY a.object_id)
FROM sys.all_objects a;
INSERT #T (val)
SELECT TOP 973 t1.Val
FROM #T AS t1
CROSS JOIN #T AS t2
CROSS JOIN #T AS t3
ORDER BY t2.Val, t3.Val;
CREATE NONCLUSTERED INDEX IX_T__Val ON #T (Val);
Then, running the query again, we get a plan with an index seek:
DECLARE @i INT = 2;
SELECT COUNT(Filler)
FROM #T
WHERE Val = @i;
So hopefully that pretty comprehensively covers why you get that plan. Now I suppose the next question is how to force a different plan, and the answer is to use the query hint OPTION (RECOMPILE) to force the query to compile at execution time, when the value of the parameter is known. Reverting to the original data, where the best plan for Val = 2 is a seek with a key lookup, but using a variable yields a plan with an index scan, we can run:
DECLARE @i INT = 2;
SELECT COUNT(Filler)
FROM #T
WHERE Val = @i;
GO
DECLARE @i INT = 2;
SELECT COUNT(Filler)
FROM #T
WHERE Val = @i
OPTION (RECOMPILE);
We can see that the latter uses the index seek and key lookup, because it has checked the value of the variable at execution time and chosen the most appropriate plan for that specific value. The trouble with OPTION (RECOMPILE) is that it means you can't take advantage of cached query plans, so there is an additional cost of compiling the query each time.
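If recompiling on every execution is too expensive, a possible middle ground (my suggestion, not from this answer) is the OPTIMIZE FOR hint, which builds and caches the plan as if the variable held a representative value:
DECLARE @i INT = 2;
SELECT COUNT(Filler)
FROM #T
WHERE Val = @i
OPTION (OPTIMIZE FOR (@i = 2)); -- plan compiled for @i = 2, then cached and reused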
I had this exact problem, and none of the query option solutions seemed to have any effect.
It turned out I was declaring an nvarchar(8) parameter while the table had a varchar(8) column.
Upon changing the parameter type, the query did an index seek and ran instantaneously. The optimizer must have been tripped up by the implicit conversion.
This may not be the answer in this case, but something that's worth checking.
Try
declare @val1 nvarchar(40), @val2 nvarchar(40);
set @val1 = 'val1';
set @val2 = 'val2';
select
min(id)
from
scor_inv_binaries
where
col1 in (@val1, @val2)
group by
col1
OPTION (RECOMPILE)
What datatype is col1?
Your variables are nvarchar whereas your literals are varchar/char; if col1 is varchar/char it may be doing the index scan to implicitly cast each value in col1 to nvarchar for the comparison.
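If that is the cause, declaring the variables with the column's exact type removes the implicit conversion (a sketch, assuming col1 is varchar(40)):
declare @val1 varchar(40), @val2 varchar(40); -- match col1's type so col1 need not be cast to nvarchar row by row
set @val1 = 'val1';
set @val2 = 'val2';
select min(id)
from scor_inv_binaries
where col1 in (@val1, @val2)
group by col1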
I guess the first query is using a Predicate and the second query is using a Seek Predicate.
A Seek Predicate is the operation that describes the b-tree portion of the seek; a Predicate is the operation that describes the additional filter using non-key columns. Based on these descriptions, it is clear that a Seek Predicate is better than a Predicate, as it searches the index, whereas with a Predicate the search is over non-key columns, which implies that the search reads the data in the pages themselves.
For more details please visit:-
https://social.msdn.microsoft.com/Forums/sqlserver/en-US/36a176c8-005e-4a7d-afc2-68071f33987a/predicate-and-seek-predicate
I have a table [MyTable] with a column [MyColumn] NVARCHAR(50), and a nonclustered index on this column. Running the two queries below:
SELECT 1
FROM [MyTable] M
WHERE M.[MyColumn] = @MyColumn
SELECT 1
FROM [MyTable] M
WHERE M.[MyColumn] = COALESCE(@MyColumn, M.[MyColumn])
I noticed the first query uses an Index Seek (NonClustered) and the second one uses an Index Scan (NonClustered). May I know how I can make use of an index seek with COALESCE or ISNULL?
May I know how I can make use of an index seek with coalesce or isnull?
Perhaps not an answer to your question, but you can have two different queries: one for the case where @MyColumn is null, and one for the case where you want to use @MyColumn in the where clause.
IF @MyColumn IS NULL
BEGIN
SELECT 1
FROM [MyTable] M
END
ELSE
BEGIN
SELECT 1
FROM [MyTable] M
WHERE M.[MyColumn] = @MyColumn
END
This isn't easy, since, as Alex pointed out, using the functions forces a scan: the optimizer knows it needs to check every row.
What you CAN do is create a computed column for the result of your function, and index that column.
There's not really a prettier way to get a seek.
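A minimal sketch of the computed-column idea (the column and index names are mine; it only helps when you compare against a constant default, since ISNULL of the column alone is deterministic and therefore indexable):
ALTER TABLE [MyTable] ADD MyColumnOrEmpty AS ISNULL([MyColumn], N'') PERSISTED;
CREATE NONCLUSTERED INDEX IX_MyTable_MyColumnOrEmpty ON [MyTable] (MyColumnOrEmpty);
-- WHERE MyColumnOrEmpty = @SomeValue can now use an index seek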
EDIT:
In rereading your question, this may not be an option for you unless you rethink your logic. You are integrating a variable into the function, and there is absolutely no way to index that.
EDIT 2:
Instead of your current logic, try something like:
...
WHERE (M.[MyColumn] = @MyColumn
OR @MyColumn IS NULL)
Using functions like COALESCE or ISNULL in the WHERE clause asks the server to search on the results of those functions, which are unknown until they are executed for every row in the resulting set, so there is no way for it to make use of an index.
To take full advantage of the index, don't use functions in the WHERE clause; rewrite the condition with standard predicates, e.g. WHERE MyColumn = @MyColumn OR @MyColumn IS NULL
I guess you will use this query in a more complex one, possibly with EXISTS:
EXISTS
( SELECT 1
FROM [MyTable] M
WHERE M.[MyColumn] = COALESCE(@MyColumn, M.[MyColumn])
)
Try this instead:
EXISTS
( SELECT 1
FROM [MyTable] M
WHERE M.[MyColumn] = @MyColumn
)
OR EXISTS
( SELECT 1
FROM [MyTable] M
WHERE @MyColumn IS NULL
)
Or this one:
CASE WHEN @MyColumn IS NULL
THEN 1
ELSE
( SELECT TOP 1 1 -- TOP 1 keeps the subquery scalar if several rows match
FROM [MyTable] M
WHERE M.[MyColumn] = @MyColumn
)
END
In the query with the COALESCE, the optimizer knows "MyColumn" could match a range of values, so it decides to use a scan of the index. The only way to get a seek when a non-null variable is passed in is to code two stored procs and call the appropriate one by testing the variable.
If you have a situation as simple as your example and you wish to use an index seek when the variable is NOT NULL, then you should code it as:
IF @MyColumn IS NULL
BEGIN
EXEC MyStoredProcColumnEqualsColumn
END
ELSE
BEGIN
EXEC MyStoredProcColumnEqualsVariable @MyColumn
END
after creating two stored procedures: one which returns the data using a WHERE clause that compares the column to the variable, and one whose WHERE clause compares the column to itself.
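The two procedure bodies would look something like this (hypothetical names matching the IF/ELSE above):
CREATE PROCEDURE MyStoredProcColumnEqualsColumn
AS
SELECT 1 FROM [MyTable] M WHERE M.[MyColumn] = M.[MyColumn] -- true for every non-NULL row
GO
CREATE PROCEDURE MyStoredProcColumnEqualsVariable @MyColumn NVARCHAR(50)
AS
SELECT 1 FROM [MyTable] M WHERE M.[MyColumn] = @MyColumn -- sargable: can seek the index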
I have a table that has a column whose values can be rowTypeID = (1, 2, 3, or NULL). I would like to write a query that returns any row that doesn't have a value of 3 in it. In this example I want all NULL rows along with all rows with 1 or 2; I just don't want rows with the value of 3.
SET ANSI_NULLS ON is currently set for the database.
I'm curious as to why I can't write
select * from myTable where myCol <> 3
This query will not return any rows that have NULL in the myCol column
I have to write
select * from myTable where myCol <> 3 or myCol Is NULL
Do I always have to include the IS NULL, or can I set it up so a WHERE clause of myCol <> 3 will return rows that have NULL as the value for myCol?
I think your approach is fine:
SELECT *
FROM MyTable
WHERE myCol <> 3 OR myCol IS NULL
Since you are asking for alternatives, another way to do it is to make your column NOT NULL and store another (otherwise unused) value in the database instead of NULL, for example -1. Then the expression myCol <> 3 will match your fake NULL just as it would any other value.
SELECT *
FROM MyTable
WHERE myCol <> 3
However in general I would recommend not to use this approach. The way you are already doing it is the right way.
Also it might be worth mentioning that several other databases support IS DISTINCT FROM which does exactly what you want:
SELECT *
FROM MyTable
WHERE myCol IS DISTINCT FROM 3
MySQL has the NULL-safe equality operator <=>, which can also be used for this purpose:
SELECT *
FROM MyTable
WHERE NOT myCol <=> 3
Unfortunately SQL Server doesn't yet support either of these syntaxes.
You must handle the NULLs one way or another, since expressions involving NULL evaluate to Unknown. If you want, you could instead do:
select *
from MyTable
where isnull(MyColumn, -1) <> 3
But this involves a magic number (-1), and is arguably less readable than the original test for IS NULL.
Edit: and, as SQLMenace points out, is not SARGable.
Whenever you test for a value all NULLs are omitted – after all, you are testing whether the value in some column passes certain criteria and NULL is not a value.
Do I always have to include the IS NULL or can I set it up so a where clause myCol <>3 will return rows that have Null as value for my Col?
You always, always, always have to include is null.
Because 3 does not equal Not Applicable, and it does not equal Unknown.
Because you can't compare NULL to anything else; NULL is not even equal to NULL:
DECLARE @i INT
DECLARE @i2 INT
SELECT @i = NULL, @i2 = NULL
IF @i = @i2
PRINT 'equal'
ELSE
PRINT 'not equal'