Unexpected output on SELECT data retrieval [duplicate] - sql-server

I'm confused. How would you explain this difference in variable concatenation with ORDER BY?
declare @tbl table (id int);
insert into @tbl values (1), (2), (3);
declare @msg1 varchar(100) = '', @msg2 varchar(100) = '',
@msg3 varchar(100) = '', @msg4 varchar(100) = '';
select @msg1 = @msg1 + cast(id as varchar) from @tbl
order by id;
select @msg2 = @msg2 + cast(id as varchar) from @tbl
order by id+id;
select @msg3 = @msg3 + cast(id as varchar) from @tbl
order by id+id desc;
select TOP(100) @msg4 = @msg4 + cast(id as varchar) from @tbl
order by id+id;
select
@msg1 as msg1,
@msg2 as msg2,
@msg3 as msg3,
@msg4 as msg4;
Results
msg1 msg2 msg3 msg4
---- ---- ---- ----
123 3 1 123

As many have confirmed, this is not the right way to concatenate all the rows in a column into a variable - even though in some cases it does "work". If you want to see some alternatives, please check out this blog.
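For completeness, a minimal sketch of two supported alternatives (assuming SQL Server 2017+ for STRING_AGG; the FOR XML PATH form works on older versions):
declare @tbl table (id int);
insert into @tbl values (1), (2), (3);
-- SQL Server 2017+: ordering is guaranteed by WITHIN GROUP
select string_agg(cast(id as varchar(10)), '') within group (order by id) as msg
from @tbl;
-- Older versions: correlated FOR XML PATH('') concatenation
select (select cast(id as varchar(10))
        from @tbl
        order by id
        for xml path('')) as msg;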
According to MSDN (applies to SQL Server 2008 through 2014 and Azure SQL Database), SELECT should not be used to assign values to local variables. The remarks describe how it behaves when you do use SELECT this way. The interesting points to note:
While typically it should only be used to return a single value to a variable, when the expression is the name of the column, it can return multiple values.
When the expression does return multiple values, the variable is assigned the last value that is returned.
If no value is returned, the variable retains its original value (not directly relevant here, but worth noting).
The first two points here are key - concatenation happens to work because SELECT @msg1 = @msg1 + cast(id as varchar) is essentially SELECT @msg1 += cast(id as varchar), and as the syntax notes, += is an accepted compound assignment operator in this expression. Note that you should not expect this operation to remain supported on VARCHAR for string concatenation - just because it happens to work in some situations doesn't mean it is OK for production code.
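To make that equivalence concrete, here is a minimal sketch (reusing the @tbl table variable from the question; @msgA and @msgB are just illustrative names). On my understanding of the compound operator, both statements leave the variable holding '123':
declare @msgA varchar(100) = '', @msgB varchar(100) = '';
select @msgA = @msgA + cast(id as varchar) from @tbl order by id; -- explicit form
select @msgB += cast(id as varchar) from @tbl order by id;        -- compound form
select @msgA as msgA, @msgB as msgB;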
The bottom line as to the underlying reason is whether the Compute Scalar that runs on the select expression uses the original id column or an expression of the id column. You probably won't find any docs on why the optimizer chooses the specific plan for each query, but each example highlights a different case that allows the msg value to be evaluated from the column (and therefore multiple rows being returned and concatenated) or from an expression (and therefore only the last value being assigned).
@msg1 is '123' because the Compute Scalar (the row-by-row evaluation of the variable assignment) occurs after the Sort. This allows the scalar computation to return multiple values on the id column, concatenating them through the += compound operator. I doubt there is specific documentation why, but it appears the optimizer chose to do the sort before the scalar computation because the ORDER BY was a column and not an expression.
@msg2 is '3' because the Compute Scalar is done before the sort, which leaves @msg2 in each row as just ('' + id) - never concatenated, just the value of id. Again, probably no documentation on why the optimizer chose this, but it appears that since the ORDER BY was an expression, it needed to compute (id+id) as part of the scalar computation before it could sort. At that point your select item no longer references the source column; it has been replaced by an expression. Therefore, as MSDN states, when the select item is an expression rather than a column, the variable is assigned the last value of the result set. Since you sorted ASC, you get '3' here.
@msg3 is '1' for the same reason as example 2, except you ordered DESC. Again, the select item becomes an expression in the evaluation - not the original column - so the assignment gets the last value of the DESC order, and you get '1'.
@msg4 is '123' again because the TOP operation forces an initial scalar evaluation of the ORDER BY so that it can determine your top 100 records. This is different from examples 2 and 3, in which a single scalar computation contained both the ORDER BY and SELECT computations, turning each select item into an expression that no longer refers back to the original column. In example 4, TOP separates the ORDER BY and SELECT computations, so after the Sort (Top N Sort) is applied, a second Compute Scalar evaluates the SELECT items. At that point you are still referencing the original column (not an expression of the column), so multiple rows are returned and the concatenation occurs.
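If you want to verify the operator order yourself, one option is to capture the actual execution plan for each statement and look at where the Compute Scalar sits relative to the Sort. A sketch for the first query (this assumes you run it within the question's script so @tbl and @msg1 exist):
set statistics xml on;
select @msg1 = @msg1 + cast(id as varchar) from @tbl order by id;
set statistics xml off;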
Sources:
MSDN: https://msdn.microsoft.com/en-us/library/ms187330.aspx

SQL Server will calculate the results, then sort them, then return them. When you assign a variable from a multi-row result, the variable ends up with the last value processed, so what you receive depends on the order in which SQL Server scans the records as well as on the position of the assignment within the plan.
TOP will always produce special query plans, as it forces SQL Server to honor the requested ordering of the results before it can decide which rows qualify, instead of producing query plans that would statistically reduce the number of records it must read.
To explain the differences, you'll have to refer to how SQL Server decided to implicitly sort the values to optimize the query.
Query 1
Insert -> Table Insert -> Constant Scan
Query 2
SELECT -> Compute Scalar -> Sort -> Table Scan
Query 3 and 4
SELECT -> Sort -> Compute Scalar -> Table Scan
Query 5 and 6 (using TOP)
SELECT -> Compute Scalar -> Sort (Top N) -> Compute Scalar -> Table Scan
I added Query 6:
select top (100)
@msg5 = @msg5 + cast(id as varchar)
from @tbl
order by id+id desc

All I can see is there is a difference in the execution plans. They all start with SELECT and end with Table Scan. The difference is in between, the Compute Scalar and the Sort.
@msg1 has Compute Scalar then Sort. Results: 123
@msg2 has Sort then Compute Scalar. Results: 3
@msg3 has Sort then Compute Scalar. Results: 1
The fourth one is different because of the TOP. It still starts with SELECT and ends with Table Scan, but the middle differs: it uses a different sort.
@msg4 has Compute Scalar then Sort (Top N Sort) then Compute Scalar

You're not supposed to set variables in a select that returns more than a single row. Consider this code:
select top 1 @msg1 = @msg1 + cast(id as varchar) from @tbl
order by id;
select top 1 @msg2 = @msg2 + cast(id as varchar) from @tbl
order by id+id;
select top 1 @msg3 = @msg3 + cast(id as varchar) from @tbl
order by id+id desc;
select top 1 @msg4 = @msg4 + cast(id as varchar) from @tbl
order by id+id;
Producing 1, 1, 3 and 1, respectively.
I'm pretty surprised it doesn't cause an exception; I was quite sure SQL Server used to forbid this outright.
The underlying point is still the same: the SQL engine isn't just executing some commands procedurally, one by one, as you might expect. It will build an execution plan that is tailored to be as efficient as possible (given many constraints).
On the other hand, assigning a variable is inherently procedural, and requires an explicit execution / evaluation order to work correctly.
You're combining the two approaches - select id from @tbl order by id is a non-procedural query, but select @id = id from @tbl order by id is a mix of the procedural @id = id and the very much non-procedural select.
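If you genuinely need ordered, row-by-row variable assignment, the unambiguous (if verbose) procedural tool is a cursor. A minimal sketch, using illustrative names:
declare @tbl table (id int);
insert into @tbl values (1), (2), (3);
declare @msg varchar(100) = '', @id int;
declare c cursor local fast_forward for
    select id from @tbl order by id;
open c;
fetch next from c into @id;
while @@fetch_status = 0
begin
    set @msg = @msg + cast(@id as varchar(10)); -- ordered concatenation, one row at a time
    fetch next from c into @id;
end
close c;
deallocate c;
select @msg as msg; -- '123'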

Related

Scalar function with WHILE loop to inline function

Hi, I have a view which is used in lots of search queries in my application.
The issue is that the application queries which use this view are running very slow. I am investigating this, and I found a particular portion of the view definition which is making it slow.
create view Demoview AS
Select
p.Id as Id,
----------,
STUFF((SELECT ',' + [dbo].[OnlyAlphaNum](colDesc)
FROM dbo.ContactInfoDetails cd
WHERE pp.FormId = f.Id AND ppc.PageId = pp.Id
FOR XML PATH('')), 1, 1, '') AS PhoneNumber,
p.FirstName as Fname,
From
---
This is one of the column in the view.
The scalar function [OnlyAlphaNum] is making it slow, as it prevents parallel execution of the query.
The function is as below:
CREATE FUNCTION [dbo].[OnlyAlphaNum]
(
@String VARCHAR(MAX)
)
RETURNS VARCHAR(MAX)
WITH SCHEMABINDING
AS
BEGIN
WHILE PATINDEX('%[^A-Z0-9]%', @String) > 0
SET @String = STUFF(@String, PATINDEX('%[^A-Z0-9]%', @String), 1, '')
RETURN @String
END
How can I convert it into an inline function?
I tried with CASE, but was not successful. I have read that a CTE is a good option.
Any idea how to tackle this problem?
I already did this; you can read more about it here.
The function:
CREATE FUNCTION dbo.alphaNumericOnly8K(@pString varchar(8000))
RETURNS TABLE WITH SCHEMABINDING AS RETURN
/****************************************************************************************
Purpose:
Given a varchar(8000) string or smaller, this function strips all but the alphanumeric
characters that exist in @pString.
Compatibility:
SQL Server 2008+, Azure SQL Database, Azure SQL Data Warehouse & Parallel Data Warehouse
Parameters:
@pString = varchar(8000); Input string to be cleaned
Returns:
AlphaNumericOnly - varchar(8000)
Syntax:
--===== Autonomous
SELECT ca.AlphaNumericOnly
FROM dbo.AlphaNumericOnly(@pString) ca;
--===== CROSS APPLY example
SELECT ca.AlphaNumericOnly
FROM dbo.SomeTable st
CROSS APPLY dbo.AlphaNumericOnly(st.SomeVarcharCol) ca;
Programmer's Notes:
1. Based on Jeff Moden/Eirikur Eiriksson's DigitsOnlyEE function. For more details see:
http://www.sqlservercentral.com/Forums/Topic1585850-391-2.aspx#bm1629360
2. This is an iTVF (Inline Table Valued Function) that performs the same task as a
scalar user defined function (UDF) except that it requires the APPLY table operator.
Note the usage examples below and see this article for more details:
http://www.sqlservercentral.com/articles/T-SQL/91724/
The function will be slightly more complicated to use than a scalar UDF but will yield
much better performance. For example - unlike a scalar UDF, this function does not
restrict the query optimizer's ability to generate a parallel query plan. Initial testing
showed that the function generally gets a parallel plan.
3. AlphaNumericOnly runs 2-4 times faster when using make_parallel() (provided that you
have two or more logical CPU's and MAXDOP is not set to 1 on your SQL Instance).
4. This is an iTVF (Inline Table Valued Function) that will be used as an iSF (Inline
Scalar Function) in that it returns a single value in the returned table and should
normally be used in the FROM clause as with any other iTVF.
5. CHECKSUM returns an INT and will return the exact number given if given an INT to
begin with. It's also faster than a CAST or CONVERT and is used as a performance
enhancer by changing the bigint of ROW_NUMBER() to a more appropriately sized INT.
6. Another performance enhancement is using a WHERE clause calculation to prevent
the relatively expensive XML PATH concatenation of empty strings normally
determined by a CASE statement in the XML "loop".
7. Note that AlphaNumericOnly returns an nvarchar(max) value. If you are returning small
numbers consider casting or converting your values to a numeric data type if you are
inserting the return value into a new table or using it for joins or comparison
purposes.
8. AlphaNumericOnly is deterministic; for more about deterministic and nondeterministic
functions see https://msdn.microsoft.com/en-us/library/ms178091.aspx
Usage Examples:
--===== 1. Basic use against a literal
SELECT ao.AlphaNumericOnly
FROM samd.alphaNumericOnly8K('xxx123abc999!!!') ao;
--===== 2. Against a table
DECLARE @sampleTxt TABLE (txtID int identity, txt varchar(100));
INSERT @sampleTxt(txt) VALUES ('!!!A555A!!!'),(NULL),('AAA.999');
SELECT txtID, OldTxt = txt, AlphaNumericOnly
FROM @sampleTxt st
CROSS APPLY samd.alphaNumericOnly8K(st.txt);
---------------------------------------------------------------------------------------
Revision History:
Rev 00 - 20150526 - Initial Creation - Alan Burstein
Rev 00 - 20150526 - 3rd line in WHERE clause to correct something that was missed
- Eirikur Eiriksson
Rev 01 - 20180624 - ADDED ORDER BY N; now performing CHECKSUM conversion to INT inside
the final cte (digitsonly) so that ORDER BY N does not get sorted.
****************************************************************************************/
WITH
E1(N) AS
(
SELECT N
FROM (VALUES (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL))x(N)
),
iTally(N) AS
(
SELECT TOP (LEN(ISNULL(@pString,CHAR(32)))) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM E1 a CROSS JOIN E1 b CROSS JOIN E1 c CROSS JOIN E1 d
)
SELECT AlphaNumericOnly =
(
SELECT SUBSTRING(@pString,CHECKSUM(N),1)
FROM iTally
WHERE
((ASCII(SUBSTRING(@pString,CHECKSUM(N),1)) - 48) & 0x7FFF) < 10
OR ((ASCII(SUBSTRING(@pString,CHECKSUM(N),1)) - 65) & 0x7FFF) < 26
OR ((ASCII(SUBSTRING(@pString,CHECKSUM(N),1)) - 97) & 0x7FFF) < 26
ORDER BY N
FOR XML PATH('')
);
Note the examples in the code comments:
--===== 1. Basic use against a literal
SELECT ao.AlphaNumericOnly
FROM samd.alphaNumericOnly8K('xxx123abc999!!!') ao;
--===== 2. Against a table
DECLARE @sampleTxt TABLE (txtID int identity, txt varchar(100));
INSERT @sampleTxt(txt) VALUES ('!!!A555A!!!'),(NULL),('AAA.999');
SELECT txtID, OldTxt = txt, AlphaNumericOnly
FROM @sampleTxt st
CROSS APPLY samd.alphaNumericOnly8K(st.txt);
Returns:
AlphaNumericOnly
-------------------
xxx123abc999
txtID OldTxt AlphaNumericOnly
----------- ------------- -----------------
1 !!!A555A!!! A555A
2 NULL NULL
3 AAA.999 AAA999
It's the fastest of its kind. It runs extra fast with a parallel execution plan. To force a parallel execution plan, grab a copy of make_parallel by Adam Machanic. Then you would run it like this:
--===== 1. Basic use against a literal
SELECT ao.AlphaNumericOnly
FROM dbo.alphaNumericOnly8K('xxx123abc999!!!') ao
CROSS APPLY dbo.make_parallel();
--===== 2. Against a table
DECLARE @sampleTxt TABLE (txtID int identity, txt varchar(100));
INSERT @sampleTxt(txt) VALUES ('!!!A555A!!!'),(NULL),('AAA.999');
SELECT txtID, OldTxt = txt, AlphaNumericOnly
FROM @sampleTxt st
CROSS APPLY dbo.alphaNumericOnly8K(st.txt)
CROSS APPLY dbo.make_parallel();
Surely there is scope to improve this. Test it out.
;WITH CTE AS (
SELECT (CASE WHEN PATINDEX('%[^A-Z0-9]%', D.Name) > 0
THEN STUFF(D.Name, PATINDEX('%[^A-Z0-9]%', D.Name), 1, '')
ELSE D.Name
END ) NameString
FROM #dept D
UNION ALL
SELECT STUFF(C.NameString, PATINDEX('%[^A-Z0-9]%', C.NameString), 1, '')
FROM CTE C
WHERE PATINDEX('%[^A-Z0-9]%', C.NameString) > 0
)
Select STUFF((SELECT ',' + E.NameString from CTE E
WHERE PATINDEX('%[^A-Z0-9]%', E.NameString) = 0
FOR XML PATH('')), 1, 1, '') AS NAME

MS SQL Server - Use Calculated Field of SELECT statement

I would like to ask if there is a way to reuse calculated fields within the same SELECT statement.
For example:
Table Test:
Machine Amount Value
500 20 20
SELECT Machine,
Amount*Value AS TestFormula,
TestFormula*12 AS TestFormulaYear
FROM Test
What is the correct statement to reuse this calculated field?
Thanks in advance,
Kevin
In SQL Server at least, you can do it with a subquery:
SELECT Machine
, TestFormula
, TestFormula*12 AS TestFormulaYear
FROM (
SELECT Machine
, Amount*Value AS TestFormula
FROM Test
) T
For the simple example you showed us, I would just recommend repeating the expression
SELECT
Machine,
Amount*Value AS TestFormula,
Amount*Value*12 AS TestFormulaYear
FROM Test;
Other answers have already shown how you can use a subquery to truly reuse the column, but that is not very performant compared to what I wrote above.
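A middle ground worth mentioning: CROSS APPLY (VALUES ...) names the expression once without nesting the whole query. A sketch against the sample table:
SELECT Machine,
       f.TestFormula,
       f.TestFormula * 12 AS TestFormulaYear
FROM Test
CROSS APPLY (VALUES (Amount * Value)) AS f(TestFormula);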
You can use a common-table expression (CTE) to reuse the value:
WITH formula AS (
SELECT Machine,
Amount*Value AS TestFormula
FROM Test
)
SELECT Machine,
TestFormula,
TestFormula*12 AS TestFormulaYear
FROM formula;
If the batch with the CTE contains multiple statements, the preceding statement must be terminated with a semicolon.
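For example (the DECLARE here is only illustrative):
DECLARE @scale int = 12;   -- this semicolon is required before WITH
WITH formula AS (
    SELECT Machine, Amount*Value AS TestFormula
    FROM Test
)
SELECT Machine, TestFormula, TestFormula * @scale AS TestFormulaYear
FROM formula;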
Assuming this is T-SQL:
You can't reference the alias of a column in the SELECT statement, no. If you look at SELECT (Transact-SQL) you'll note that the SELECT is the 8th part of the query to be processed. This means only ORDER BY is going to be able to reference a column's alias.
If you need to do further calculations on a calculated value you need to use a CTE, subquery, or redeclare the calculation. For example:
Repeated calculation:
SELECT [Column] * 10 As Expression,
[Column] * 10 * 5 AS Expression2
FROM [Table];
CTE:
WITH Formula AS(
SELECT [Column] * 10 As Expression
FROM [Table])
SELECT Expression,
Expression * 5 AS Expression2
FROM Formula;
Sub Query:
SELECT Expression,
Expression * 5 AS Expression2
FROM (SELECT [Column] * 10 As Expression
FROM [Table]) Formula;
If you are looking to set up a Statement so that when formulas are changed many columns will be updated, I suppose you could declare the formulas and use Dynamic SQL. There can be an advantage to this if you want to be sure that lots of columns are updated correctly:
Declare @TestFormula as nvarchar(100) = '([Amount]*[Value])'
Declare @TestFormulaYear as nvarchar(100) = '(12*' + @TestFormula + ')'
declare @sql as nvarchar(max)
set @sql = 'SELECT [Machine], ' + @TestFormula + ' AS TestFormula, ' + @TestFormulaYear + ' AS TestFormulaYear
FROM (values(500, 20, 20)) a([Machine], [Amount], [Value])'
exec(@sql)
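A variant of the same idea using sp_executesql instead of EXEC (note that the formula still has to be concatenated in, since expressions and identifiers cannot be passed as parameters):
declare @TestFormula nvarchar(100) = N'([Amount]*[Value])';
declare @sql nvarchar(max) =
    N'SELECT [Machine], ' + @TestFormula + N' AS TestFormula
      FROM (values(500, 20, 20)) a([Machine], [Amount], [Value])';
exec sp_executesql @sql;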

Why is SQL Server using index scan instead of index seek when WHERE clause contains parameterized values

We have found that SQL Server uses an index scan instead of an index seek if the WHERE clause contains parameterized values instead of string literals.
Following is an example:
SQL Server performs index scan in following case (parameters in where clause)
declare @val1 nvarchar(40), @val2 nvarchar(40);
set @val1 = 'val1';
set @val2 = 'val2';
select
min(id)
from
scor_inv_binaries
where
col1 in (@val1, @val2)
group by
col1
On the other hand, the following query performs an index seek:
select
min(id)
from
scor_inv_binaries
where
col1 in ('val1', 'val2')
group by
col1
Has any one observed similar behavior, and how they have fixed this to ensure that query performs index seek instead of index scan?
We are not able to use the FORCESEEK table hint, because FORCESEEK is not supported on SQL Server 2005.
I have updated the statistics as well.
Thank you very much for help.
Well, to answer your question of why SQL Server is doing this: the query is not compiled in a logical order; each statement is compiled on its own merits,
so when the query plan for your select statement is being generated, the optimiser does not know that @val1 and @val2 will become 'val1' and 'val2' respectively.
When SQL Server does not know the value, it has to make a best guess about how many times that variable will appear in the table, which can sometimes lead to sub-optimal plans. My main point is that the same query with different values can generate different plans. Imagine this simple example:
IF OBJECT_ID(N'tempdb..#T', 'U') IS NOT NULL
DROP TABLE #T;
CREATE TABLE #T (ID INT IDENTITY PRIMARY KEY, Val INT NOT NULL, Filler CHAR(1000) NULL);
INSERT #T (Val)
SELECT TOP 991 1
FROM sys.all_objects a
UNION ALL
SELECT TOP 9 ROW_NUMBER() OVER(ORDER BY a.object_id) + 1
FROM sys.all_objects a;
CREATE NONCLUSTERED INDEX IX_T__Val ON #T (Val);
All I have done here is create a simple table and add 1,000 rows with values 1-10 for the column Val; however, 1 appears 991 times, and the other 9 values only appear once each. The premise is that this query:
SELECT COUNT(Filler)
FROM #T
WHERE Val = 1;
would be more efficient as a scan of the entire table than as an index seek followed by 991 bookmark lookups to get the value of Filler; with only 1 matching row, however, the following query:
SELECT COUNT(Filler)
FROM #T
WHERE Val = 2;
will be more efficient as an index seek and a single bookmark lookup to get the value of Filler (and running these two queries will bear this out).
I am pretty certain the cut off for a seek and bookmark lookup actually varies depending on the situation, but it is fairly low. Using the example table, with a bit of trial and error, I found that I needed the Val column to have 38 rows with the value 2 before the optimiser went for a full table scan over an index seek and bookmark lookup:
IF OBJECT_ID(N'tempdb..#T', 'U') IS NOT NULL
DROP TABLE #T;
DECLARE @i INT = 38;
CREATE TABLE #T (ID INT IDENTITY PRIMARY KEY, Val INT NOT NULL, Filler CHAR(1000) NULL);
INSERT #T (Val)
SELECT TOP (991 - @i) 1
FROM sys.all_objects a
UNION ALL
SELECT TOP (@i) 2
FROM sys.all_objects a
UNION ALL
SELECT TOP 8 ROW_NUMBER() OVER(ORDER BY a.object_id) + 2
FROM sys.all_objects a;
CREATE NONCLUSTERED INDEX IX_T__Val ON #T (Val);
SELECT COUNT(Filler), COUNT(*)
FROM #T
WHERE Val = 2;
So for this example the limit is 3.7% of matching rows.
Since the query does not know how many rows will match when you are using a variable, it has to guess; the simplest way is to take the total number of rows and divide it by the total number of distinct values in the column, so in this example the estimated number of rows for WHERE Val = @Val is 1000 / 10 = 100. The actual algorithm is more complex than this, but for example's sake this will do. So when we look at the execution plan for:
DECLARE @i INT = 2;
SELECT COUNT(Filler)
FROM #T
WHERE Val = @i;
We can see here (with the original data) that the estimated number of rows is 100, but the actual number of rows is 1. From the previous steps we know that with 38 or more matching rows the optimiser will opt for a clustered index scan over an index seek, so since the best guess for the number of rows is higher than this, the plan for an unknown variable is a clustered index scan.
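If you want to see the density the optimizer used, DBCC SHOW_STATISTICS exposes it (this should work for the temp table from the same session; the "All density" value for Val multiplied by the row count gives the 100-row estimate):
DBCC SHOW_STATISTICS ('tempdb..#T', IX_T__Val);
-- All density for Val = 0.1; 0.1 * 1000 rows = estimate of 100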
Just to further prove the theory, if we create the table with 1000 rows of numbers 1-27 evenly distributed (so the estimated row count will be approximately 1000 / 27 = 37.037)
IF OBJECT_ID(N'tempdb..#T', 'U') IS NOT NULL
DROP TABLE #T;
CREATE TABLE #T (ID INT IDENTITY PRIMARY KEY, Val INT NOT NULL, Filler CHAR(1000) NULL);
INSERT #T (Val)
SELECT TOP 27 ROW_NUMBER() OVER(ORDER BY a.object_id)
FROM sys.all_objects a;
INSERT #T (val)
SELECT TOP 973 t1.Val
FROM #T AS t1
CROSS JOIN #T AS t2
CROSS JOIN #T AS t3
ORDER BY t2.Val, t3.Val;
CREATE NONCLUSTERED INDEX IX_T__Val ON #T (Val);
Then run the query again, we get a plan with an index seek:
DECLARE @i INT = 2;
SELECT COUNT(Filler)
FROM #T
WHERE Val = @i;
So hopefully that pretty comprehensively covers why you get that plan. Now I suppose the next question is how you force a different plan, and the answer is to use the query hint OPTION (RECOMPILE) to force the query to compile at execution time, when the value of the parameter is known. Reverting to the original data, where the best plan for Val = 2 is a seek plus lookup, but using a variable yields a plan with an index scan, we can run:
DECLARE @i INT = 2;
SELECT COUNT(Filler)
FROM #T
WHERE Val = @i;
GO
DECLARE @i INT = 2;
SELECT COUNT(Filler)
FROM #T
WHERE Val = @i
OPTION (RECOMPILE);
We can see that the latter uses the index seek and key lookup because it has checked the value of the variable at execution time, and the most appropriate plan for that specific value is chosen. The trouble with OPTION (RECOMPILE) is that it means you can't take advantage of cached query plans, so there is an additional cost of compiling the query each time.
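If recompiling on every execution is too expensive, OPTION (OPTIMIZE FOR ...) is a possible middle ground: the cached plan is built for a value you name, or for the density-based average with UNKNOWN. A sketch:
DECLARE @i INT = 2;
SELECT COUNT(Filler)
FROM #T
WHERE Val = @i
OPTION (OPTIMIZE FOR (@i = 2));      -- plan built as if @i were 2
-- or: OPTION (OPTIMIZE FOR UNKNOWN) -- deliberately use the average estimate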
I had this exact problem and none of the query-option solutions seemed to have any effect.
Turned out I was declaring an nvarchar(8) as the parameter and the table had a column of varchar(8).
Upon changing the parameter type, the query did an index seek and ran instantaneously. Must be the optimizer was getting messed up by the conversion.
This may not be the answer in this case, but something that's worth checking.
Try
declare @val1 nvarchar(40), @val2 nvarchar(40);
set @val1 = 'val1';
set @val2 = 'val2';
select
min(id)
from
scor_inv_binaries
where
col1 in (@val1, @val2)
group by
col1
OPTION (RECOMPILE)
What datatype is col1?
Your variables are nvarchar whereas your literals are varchar/char; if col1 is varchar/char, it may be doing the index scan so it can implicitly cast each value in col1 to nvarchar for the comparison.
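If that is the cause here, declaring the variables with the column's exact type avoids the implicit conversion on the column side. A sketch, assuming col1 is varchar(40):
declare @val1 varchar(40), @val2 varchar(40);  -- match the column's type, not nvarchar
set @val1 = 'val1';
set @val2 = 'val2';
select min(id)
from scor_inv_binaries
where col1 in (@val1, @val2)
group by col1;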
I guess the first query is using a Predicate and the second query is using a Seek Predicate.
A Seek Predicate is the operation that describes the b-tree portion of the seek. A Predicate is the operation that describes the additional filter using non-key columns. Based on the description, it is clear that a Seek Predicate is better than a Predicate, as it searches the index, whereas with a Predicate the search is on the data pages themselves.
For more details please visit:-
https://social.msdn.microsoft.com/Forums/sqlserver/en-US/36a176c8-005e-4a7d-afc2-68071f33987a/predicate-and-seek-predicate

CASE in sql server -implementation Clarification?

I have this simple table in SQL Server:
DECLARE @tbl table( a int , b NVARCHAR(100), isCalcByA bit)
INSERT INTO @tbl
SELECT 1,'c',1
UNION ALL
SELECT 2,'d',0
Ok.
If I run this :
SELECT CASE
WHEN isCalcByA = 1 THEN a
ELSE b
END FROM @tbl
It yields an error :
Msg 245, Level 16, State 1, Line 9
Conversion failed when converting the nvarchar value 'd' to data type int.
I can understand why it is happening:
the values being accumulated (to be displayed) can't mix both int and string in the same column.
Ok
But what about this :
SELECT 'dummy'
FROM @tbl
WHERE CASE
WHEN isCalcByA = 1 THEN a
ELSE b
END IS NOT NULL
Here -
I always display a string
I don't accumulate display results of different types
I'm checking them against NOT NULL rather than against a string or int value
But still I get the same error.
What am I missing ?
NB
I know I can/should do this :
SELECT 'dummy'
FROM @tbl
WHERE
(isCalcByA = 1 AND a IS NOT NULL)
OR
(isCalcByA <> 1 AND b IS NOT NULL)
(which works fine)
But I'm asking why it is not working in the first CASE situation.
CASE is an expression - it returns a value of a specific type. All possible values it might return must all be convertible to some common type. The system uses the type precedences rules to consider the types of all possible return values and decide what that common type is. int has higher precedence and wins.
CASE
WHEN isCalcByA = 1 THEN CONVERT(nvarchar(100),a)
ELSE b
END
would work because now the common type selected is unambiguously nvarchar(100).
It doesn't matter whether you use CASE in the SELECT or the WHERE clause: a CASE expression must always return a single data type. So, convert both columns to a datatype that can hold both:
CASE
WHEN isCalcByA = 1 THEN CAST(a AS NVARCHAR(100))
ELSE b
END
From the CASE expression documentation:
Returns the highest precedence type from the set of types in result_expressions and the optional else_result_expression.
When the various WHEN and the ELSE part have different datatypes as results, the highest precedence is chosen from this list: Data Type Precedence and all results are converted to that datatype.
Your queries fail because int has higher precedence than nvarchar.
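You can see the chosen type directly with SQL_VARIANT_PROPERTY (using N'5' here so that the conversion to int succeeds and the statement runs):
SELECT SQL_VARIANT_PROPERTY(
           CASE WHEN 1 = 1 THEN 1 ELSE N'5' END,
           'BaseType') AS ResultType;  -- 'int': int outranks nvarchar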
DECLARE @tbl table( a int , b NVARCHAR(100), isCalcByA bit)
INSERT INTO @tbl
SELECT 1,'c',1
UNION ALL
SELECT 2,'d',0
UNION ALL
SELECT null,'d',1
SELECT CASE
WHEN isCalcByA = 1 THEN CAST(a AS VARCHAR(30))
ELSE b
END FROM @tbl
Above, you are selecting two different data types based on the select.
SELECT 'dummy'
FROM @tbl
WHERE CASE
WHEN isCalcByA = 1 THEN CAST(a AS VARCHAR(30))
ELSE b
END IS NOT NULL
Above, you are selecting 'dummy' every time, regardless of condition.
So, in the first statement, you are setting the return type based on the case and the case can return two different types. In the second query, the return type is always the same type.
Don't think about CASE as if it were a built-in IF from a procedural language. It's more like the ternary ... ? ... : ... operator with strong typing - it has to result in a specific single type. If you want to mix columns, you need to cast them, for example to nvarchar.
You can also think about it like the result of SELECT should be possible to be defined by CREATE TABLE.

Comparing with null by converting value to a constant with isnull

One of our programmers tends to use isnull in MS SQL to compare with NULLs.
That is instead of writing Where ColumnName Is Null he writes Where IsNull(ColumnName,0)=0
I think that the optimizer will convert the latter into the former anyway, but if it does not - is there a way to prove that the latter is less effective, since it:
1. compares with null,
2. converts to integer,
3. compares 2 integers,
instead of just comparing with null?
Both ways are too fast for me to compare using execution plans (and I also think the optimizer plays its part). Is there a way to prove to him that just comparing with NULL, without ISNULL, is more effective (unless it isn't)?
Another obvious issue is that ISNULL precludes the use of indexes.
Run this setup:
create table T1 (
ID int not null primary key,
Column1 int null
)
go
create index IX_T1 on T1(Column1)
go
declare @i int
set @i = 10000
while @i > 0
begin
insert into T1 (ID,Column1) values (@i,CASE WHEN @i%1000=0 THEN NULL ELSE @i%1000 END)
set @i = @i - 1
end
go
Then turn on execution plans and run the following:
select * from T1 where Column1 is null
select * from T1 where ISNULL(Column1,0)=0
The first uses an index seek (using IX_T1) and is quite efficient. The second uses an index scan on the clustered index - it has to look at every row in the table.
On my machine, the second query took 90% of the time, the first 10%.
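Worth noting as well: the two predicates are not even equivalent - ISNULL(Column1,0)=0 also matches rows where Column1 = 0, not just NULLs. If that combined meaning is actually what's intended, a sargable rewrite keeps the seek (a sketch against the same table):
select * from T1 where Column1 is null or Column1 = 0;
-- the optimizer can satisfy both conditions with seeks on IX_T1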
ISNULL is not well used if you are using it in the WHERE clause and comparing it to 0; the purpose of ISNULL is to replace NULL values:
http://msdn.microsoft.com/en-us/library/ms184325.aspx
For example:
SELECT Description, DiscountPct, MinQty, ISNULL(MaxQty, 0.00) AS 'Max Quantity'
FROM Sales.SpecialOffer;
