T-SQL Case statement strange behavior with newid() as randomness source - sql-server

I'm using SQL Server 2012.
If I do the following to get a list of random-ish numbers in the range [1,3], it works just fine:
SELECT TOP 100
ABS(CHECKSUM(NEWID()))%3 + 1 [value_of_rand]
FROM sys.objects
and I get nice things like this (all between 1 and 3).
3
2
2
2
1
....etc.
But if I then put the output of that same chained-random-value function into a CASE statement, it apparently does not produce only the values 1,2,3.
SELECT TOP 100
CASE (ABS(CHECKSUM(NEWID()))%3 + 1)
WHEN 1
THEN 'one'
WHEN 2
THEN 'two'
WHEN 3
THEN 'three'
ELSE
'that is strange'
END [value_of_case]
FROM sys.objects
It outputs:
three
that is strange
that is strange
one
two
...etc
What am I doing wrong here?

Your query:
SELECT TOP 100
CASE (ABS(CHECKSUM(NEWID()))%3 + 1)
WHEN 1
THEN 'one'
WHEN 2
THEN 'two'
WHEN 3
THEN 'three'
ELSE
'that is strange'
END [value_of_case]
FROM sys.objects
is actually executed as:
SELECT TOP 100
CASE
WHEN (ABS(CHECKSUM(NEWID()))%3 + 1) = 1 THEN 'one'
WHEN (ABS(CHECKSUM(NEWID()))%3 + 1) = 2 THEN 'two'
WHEN (ABS(CHECKSUM(NEWID()))%3 + 1) = 3 THEN 'three'
ELSE 'that is strange'
END [value_of_case]
FROM sys.objects;
In short, your expression is non-deterministic and is evaluated again in each WHEN comparison, so every comparison can fail and you end up in the ELSE clause. There is no bug or catch here: it is the normal behavior of a simple CASE whose input expression changes between evaluations.
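A minimal sketch of that difference, assuming a single value is enough (the variable name @r is illustrative, and the inline DECLARE initialiser needs SQL Server 2008 or later). Assigning the expression to a variable evaluates NEWID() once, so the CASE compares one fixed value against each WHEN. By contrast, if each WHEN in the expanded form draws an independent, uniform value from 1 to 3, all three comparisons miss with probability (2/3)^3 = 8/27, roughly 30% of rows.

```sql
-- Evaluate the random expression once and store the result.
DECLARE @r int = ABS(CHECKSUM(NEWID())) % 3 + 1;

-- @r is a fixed value here, so exactly one of the three WHEN branches matches.
SELECT @r AS value_of_rand,
       CASE @r
           WHEN 1 THEN 'one'
           WHEN 2 THEN 'two'
           WHEN 3 THEN 'three'
           ELSE 'that is strange'   -- not reachable while @r is 1, 2 or 3
       END AS value_of_case;
```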
This belongs to the same class of syntactic sugar as COALESCE. From the documentation:
The COALESCE expression is a syntactic shortcut for the CASE
expression. That is, the code COALESCE(expression1,...n) is rewritten
by the query optimizer as the following CASE expression:
CASE
WHEN (expression1 IS NOT NULL) THEN expression1
WHEN (expression2 IS NOT NULL) THEN expression2
...
ELSE expressionN
END
This means that the input values (expression1, expression2,
expressionN, etc.) will be evaluated multiple times. Also, in
compliance with the SQL standard, a value expression that contains a
subquery is considered non-deterministic and the subquery is evaluated
twice. In either case, different results can be returned between the
first evaluation and subsequent evaluations.
EDIT:
Solution (evaluate the expression inside a CROSS APPLY and reference its column):
SELECT TOP 100
CASE t.col
WHEN 1 THEN 'one'
WHEN 2 THEN 'two'
WHEN 3 THEN 'three'
ELSE 'that is strange'
END [value_of_case]
FROM sys.objects
CROSS APPLY ( SELECT ABS(CHECKSUM(NEWID()))%3 + 1 ) AS t(col)

I think the problem you are experiencing is that (ABS(CHECKSUM(NEWID()))%3 + 1) isn't a value, it's an expression, and SQL Server has the option of re-evaluating it whenever it wants to. You can try various syntactic changes, like removing the extra parentheses or using a CTE. That might make the problem go away (for now), but it might not, since logically it looks like the same request to the optimizer.
I think that the only sure-fire, supported way to stop this would be to save it first (to a temp table or a real one) and then use a second query to reference the saved values.
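A sketch of that two-step approach, assuming a local temp table is acceptable; the table name #rnd is made up for illustration:

```sql
-- Step 1: materialise one random value per row.
-- #rnd is a hypothetical temp table name used only for this sketch.
SELECT TOP 100
       ABS(CHECKSUM(NEWID())) % 3 + 1 AS value_of_rand
INTO   #rnd
FROM   sys.objects;

-- Step 2: the CASE now reads a stored column, so each row has one fixed value.
SELECT CASE value_of_rand
           WHEN 1 THEN 'one'
           WHEN 2 THEN 'two'
           WHEN 3 THEN 'three'
           ELSE 'that is strange'
       END AS value_of_case
FROM   #rnd;

DROP TABLE #rnd;
```

Because the second query only reads values that were already written, re-evaluating the CASE input cannot produce a different number.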

I can't tell you why, and it is indeed strange, but I can give you a workaround: select the random values into a CTE before trying to use them.
;with rndsrc(value_of_rand) as
(
SELECT TOP 100
ABS(CHECKSUM(NEWID()))%3 + 1
FROM sys.objects
)
SELECT TOP 100
CASE value_of_rand
WHEN 1
THEN 'one'
WHEN 2
THEN 'two'
WHEN 3
THEN 'three'
ELSE
'that is strange'
END [value_of_case]
from rndsrc
No more "that is strange"

Related

Using IN within a CASE statement without repeating input expression?

When using a CASE statement, I realise that as per the documentation I can either use a simple CASE expression, with an input_expression and pairs of when_expression and result_expression values (assume in these examples that values in myField are always positive integers):
CASE [input_expression]
WHEN [when_expression] THEN [result_expression]
...
ELSE [result_expression]
END
CASE myField
WHEN 1 THEN 'One'
WHEN 2 THEN 'Two'
WHEN 3 THEN 'Three'
ELSE 'More than three'
END
or I can use a searched CASE expression, with pairs of Boolean_expression and result_expression values:
CASE
WHEN [Boolean_expression] THEN [result_expression]
...
ELSE [result_expression]
END
CASE
WHEN myField = 1 THEN 'One'
WHEN myField = 2 THEN 'Two'
WHEN myField = 3 THEN 'Three'
ELSE 'More than three'
END
I also know that I can use the IN operator as part of any Boolean_expression when writing a searched CASE expression:
CASE
WHEN myField IN (1,2) THEN 'One or two'
WHEN myField = 3 THEN 'Three'
ELSE 'More than three'
END
What would be really useful would be to be able to use the IN operator when writing a simple CASE expression, like:
CASE myField
WHEN IN (1,2,3) THEN 'Between one and three'
WHEN IN (4,5,6) THEN 'Between four and six'
WHEN IN (7,8,9) THEN 'Between seven and nine'
ELSE 'More than nine'
END
But this will not run and gives the error:
Incorrect syntax near the keyword 'IN'.
Obviously my example above is contrived, but is there any way to write a simple CASE expression and also use the IN operator?
I understand from reading the syntax on MSDN that this appears not to be possible, because of how the simple CASE expression is formed (comparing the input_expression to each when_expression for equality), but I wondered if there is any way to achieve what I'm asking, as it would be handy to avoid writing the same field name over and over again within each WHEN clause of a searched CASE expression.
No, this is not possible. The documentation differentiates between the when_expression and the Boolean_expression. The when_expression always uses the equality operator.
Aaron Bertrand describes this in his article "Dirty Secrets of the CASE Expression": an expression like this
SELECT CASE @variable
WHEN 1 THEN 'foo'
WHEN 2 THEN 'bar'
END
evaluates to
SELECT CASE
WHEN @variable = 1 THEN 'foo'
WHEN @variable = 2 THEN 'bar'
END
So there's no way to replace the comparison operator using the when_expression. This is only possible with the Boolean_expression like you already mentioned:
SELECT CASE
WHEN @variable IN (1,2) THEN 'foo'
WHEN @variable IN (3,4) THEN 'bar'
END
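One way to cut the repetition while staying with a searched CASE is to give the input a short alias first. A sketch, assuming SQL Server 2008 or later for the VALUES constructor; the names src, v and n, the column alias myRange, and the sample values are made up for illustration:

```sql
SELECT src.myField,
       CASE
           WHEN v.n IN (1, 2, 3) THEN 'Between one and three'
           WHEN v.n IN (4, 5, 6) THEN 'Between four and six'
           WHEN v.n IN (7, 8, 9) THEN 'Between seven and nine'
           ELSE 'More than nine'
       END AS myRange
FROM (VALUES (2), (5), (8), (12)) AS src(myField)   -- stand-in for the real table
CROSS APPLY (SELECT src.myField AS n) AS v;         -- short alias for the input expression
```

This pays off mainly when the input is a long expression rather than a bare column name.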

How to replace a value in array

I have a data like below.
id col1[]
--- ------
1 {1,2,3}
2 {3,4,5}
My question is how to use replace function in arrays.
select array_replace(col1, 1, 100) where id = 1;
but it gives an error like:
function array_replace(integer[], integer, integer) does not exist
can anyone suggest how to use it?
Your statement (augmented with the missing FROM clause):
SELECT array_replace(col1, 1, 100) FROM tbl WHERE id = 1;
As commented by @mu, array_replace() was introduced with PostgreSQL 9.3. I see 3 options for older versions:
1. intarray
As long as all of the following hold:
we are dealing with integer arrays;
elements are unique;
the order of elements is irrelevant.
A simple and fast option would be to install the additional module intarray, which (among other things) provides operators to subtract and add elements (or whole arrays) from/to integer arrays:
SELECT CASE WHEN col1 && '{1}'::int[] THEN (col1 - 1) + 100 ELSE col1 END AS col1
FROM tbl WHERE id = 1;
2. Emulate with SQL functions
A (slower) drop-in replacement for array_replace() using polymorphic types, so it works for any base type:
CREATE OR REPLACE FUNCTION f_array_replace(anyarray, anyelement, anyelement)
RETURNS anyarray LANGUAGE SQL IMMUTABLE AS
'SELECT ARRAY (SELECT CASE WHEN x = $2 THEN $3 ELSE x END FROM unnest($1) x)';
Does not replace NULL values. Related:
Replace NULL values in an array in PostgreSQL
If you need to guarantee order of elements:
PostgreSQL unnest() with element number
3. Apply patch to source and recompile
Get the patch "Add array_remove() and array_replace() functions" from the git repo, apply it to the source of your version and recompile. May or may not apply cleanly. The older your version the worse are your chances. I have not tried that, I would rather upgrade to the current version.
You can create your own, based on this source:
CREATE TABLE arr(id int, col1 int[]);
INSERT INTO arr VALUES (1, '{1,2,3}');
INSERT INTO arr VALUES (2, '{3,4,5}');
SELECT array(
SELECT CASE WHEN q = 1 THEN 100 ELSE q END
FROM UNNEST(col1::int[]) q)
FROM arr;
array
-----------
{100,2,3}
{3,4,5}
You can create your own function and put it in your public schema if you still want to call it as a function, though it will differ slightly from the original.
UPDATE tbl SET col1 = array_replace(col1, 1, 100) WHERE id = 1;
Here is the sample query for a test-array:
SELECT test_id,
test_array,
(array (
-- Replace existing value 'int' of an array with given value 'Text'
SELECT CASE WHEN a = '0' THEN 'MyEntry'
WHEN a = '1' THEN 'Apple'
WHEN a = '2' THEN 'Banana'
WHEN a = '3' THEN 'ChErRiEs'
WHEN a = '4' THEN 'Dragon Fruit'
WHEN a = '5' THEN 'Eat a Fruit in a Day'
ELSE 'NONE' END
FROM UNNEST(test_array::TEXT[]) a) ::TEXT
-- UNNEST : Lists out values of my_test_array
) test_result
FROM (
--my_test_array
SELECT 1 test_id, '{0,1,2,3,4,5,6,7,8,9}'::TEXT[][] test_array
) test;

CASE Statement SQL: Priority in cases?

I have a general question for when you are using a CASE statement in SQL (Server 2008), and more than one of your WHEN conditions are true but the resulting flag is to be different.
This is a hypothetical example, but it may be transferable to applying checks across multiple columns to classify data in rows. The output of the code below is dependent on how the cases are ordered, as both are true.
DECLARE @TESTSTRING varchar(5)
SET @TESTSTRING = 'hello'
SELECT CASE
WHEN @TESTSTRING = 'hello' THEN '0'
WHEN @TESTSTRING <> 'hi' THEN '1'
ELSE 'N/A'
END AS [Output]
In general, would it be considered bad practice to create flags in this way? Would a WHERE, OR statement be better?
Case statements are guaranteed to be evaluated in the order they are written. The first matching value is used. So, for your example, the value 0 would be returned.
This is clearly described in the documentation:
Searched CASE expression:
Evaluates, in the order specified, Boolean_expression for each WHEN clause.
Returns result_expression of the first Boolean_expression that evaluates to TRUE.
If no Boolean_expression evaluates to TRUE, the Database Engine returns the else_result_expression if an ELSE clause is specified, or
a NULL value if no ELSE clause is specified.
As for whether this is good or bad practice, I would lean on the side of neutrality. This is ANSI behavior so you can depend on it, and in some cases it is quite useful:
select (case when val < 10 then 'Less than 10'
when val < 100 then 'Between 10 and 100'
when val < 1000 then 'Between 100 and 1000'
else 'More than 1000' -- or NULL
end) as MyGroup
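A runnable version of the fragment above, assuming SQL Server 2008 or later for the VALUES constructor; the alias t and the sample values are made up for illustration. The value 5 satisfies all three conditions but is labelled by the first one, 50 satisfies the second and third and is labelled by the second, and so on:

```sql
SELECT t.val,
       CASE
           WHEN t.val < 10   THEN 'Less than 10'
           WHEN t.val < 100  THEN 'Between 10 and 100'
           WHEN t.val < 1000 THEN 'Between 100 and 1000'
           ELSE 'More than 1000'
       END AS MyGroup
FROM (VALUES (5), (50), (500), (5000)) AS t(val);   -- hypothetical sample data
```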
To add to this: SQL Server stops at the first WHEN clause that is TRUE and returns its result, ignoring the rest of the CASE expression. Example:
SELECT
CASE
WHEN 3 = 3 THEN 3
WHEN 4 = 4 THEN 4
ELSE NULL
END AS test
This statement returns 3, since that is the first WHEN clause to evaluate to TRUE, even though the following WHEN clause is also TRUE.

SQL Server CASE 1 WHEN 1, WHERE 1=1 AND 1=1

I inherited the following query from a previous application. I'm having a hard time understanding the "Case" in the "Select" and "Where" clause also.
SELECT J1.AC_CODE, J1.PERIOD, J1.JRNAL_NO, J1.DESCRIPTN, - J1.AMOUNT ,
J1.ANAL_T3,
CASE 1
WHEN 1 THEN 'A'
ELSE J1.ACCNT_CODE
END ,
J1.JRNAL_LINE
FROM dbo.JSource J1
WHERE 1=1
AND 1=1
AND NOT ('A' LIKE '%Z%'
AND J1.JRNAL_SRCE IN ('B/F',
'CLRDN')
AND J1.JRNAL_NO = 0)
AND CASE 1
WHEN 1 THEN 'A'
ELSE J1.AC_CODE
END ='A'
AND J1.AC_CODE='156320'
AND J1.PERIOD BETWEEN 2014001 AND 2014012
AND J1.ANAL_T3='ANAL001'
ORDER BY 1,2,3,4,5,6,7,8
I'm not sure If I understand the following clauses correctly:
1st Clause:
CASE 1
WHEN 1 THEN 'A'
ELSE J1.AC_CODE
END
I understood it as: if column 1 is true, then choose the literal A, otherwise choose J1.AC_CODE.
2nd clause:
WHERE 1=1
AND 1=1
AND NOT ('A' LIKE '%Z%'
AND J1.JRNAL_SRCE IN ('B/F',
'CLRDN')
AND J1.JRNAL_NO = 0)
AND CASE 1
WHEN 1 THEN 'A'
ELSE J1.AC_CODE
END ='A'
AND J1.AC_CODE='156320'
AND J1.PERIOD BETWEEN 2014001 AND 2014012
AND J1.ANAL_T3='ANAL001'
I'm totally lost with this "Where" clause.
Can you help explain this query and write a better version for this whole query?
I'm running this query on SQL Server 2008 (R2)
I understood it as: if column 1 is true, then choose the literal A, otherwise
choose J1.AC_CODE.
No. It compares the value 1 with the value 1, and when that is true the CASE returns 'A'. That comparison is of course always true, so the CASE expression always returns 'A'.
The first part of your WHERE clause does not do anything at all.
1=1
AND 1=1
will always be true, the CASE comparison will always be true, and 'A' LIKE '%Z%' will always be false, which makes the entire AND NOT ('A' LIKE '%Z%' ...) expression always true.
A simpler version of your query would look like this.
SELECT J1.AC_CODE,
J1.PERIOD,
J1.JRNAL_NO,
J1.DESCRIPTN,
- J1.AMOUNT,
J1.ANAL_T3,
'A',
J1.JRNAL_LINE
FROM dbo.JSource J1
WHERE J1.AC_CODE='156320' AND
J1.PERIOD BETWEEN 2014001 AND 2014012 AND
J1.ANAL_T3='ANAL001'
ORDER BY 1,2,3,4,5,6,7,8
Without knowing the history of this query, I am guessing that it was written with testing/debugging in mind and that some of that code has been left in place. The CASE expression in the select list could (and I repeat could, as this is my guess from looking at the query) have had other WHEN clauses while the query was being written, and these would have been switched between by changing the value after the CASE (example: SELECT ..... CASE 1 WHEN 1 THEN 'A' WHEN 2 THEN 'some value' WHEN 3 THEN 'some other value' ELSE J1.ACCNT_CODE END).
As for WHERE 1=1, I have seen this used during query creation and testing, mainly because it lets each real condition be commented out, uncommented, or cut and pasted freely, since the first WHERE condition is always true. I have not seen AND 1=1 before. I am not sure what that line was intended for, but I suspect it also came from testing or debugging and was never removed from the query.

T-SQL CASE Statement without overlapping criteria test

Since the following code prints 'First' and 'Second' in that order, can I conclude that the result of the first condition satisfied is ALWAYS the one returned?
DECLARE @Constant1 int = 1
DECLARE @Constant2 int = 2
select
case
when @Constant1 = 1
then 'First'
when @Constant1 = 1 and @Constant2 = 2
then 'Second'
end as Result
select
case
when @Constant1 = 1 and @Constant2 = 2
then 'Second'
when @Constant1 = 1
then 'First'
end as Result
I know that parallel processing sometimes affects the outcome, and I am trying to understand whether this type of situation, which I see in production, would always return the same result.
This question is intended to establish whether there is a potential issue in production code. If I were writing the code anew, I think I would try to make the conditions explicitly mutually exclusive:
select
case
when @Constant1 = 1 and @Constant2 != 2
then 'First'
when @Constant1 = 1 and @Constant2 = 2
then 'Second'
end as Result
The documentation for CASE states:
Searched CASE expression:
Evaluates, in the order specified, Boolean_expression for each WHEN clause.
Returns result_expression of the first Boolean_expression that evaluates to TRUE.
If no Boolean_expression evaluates to TRUE, the Database Engine returns the else_result_expression if an ELSE clause is specified, or
a NULL value if no ELSE clause is specified.
So it will return the first true branch.
For a simple query such as in the question I would expect it to not evaluate the other branches either.
A few cases where this short circuiting behaviour does not work as expected/advertised are discussed in this DBA site question.
Does SQL Server read all of a COALESCE function even if the first argument is not NULL?
But just to be clear, these issues do not affect the top-to-bottom precedence order of the result (except when evaluating a later branch causes an error, so that no result is returned at all).
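One such exception, adapted from the example in the SQL Server documentation for CASE: aggregate expressions that appear in a later WHEN can be computed before the CASE itself is evaluated. The query below is documented to fail with a divide-by-zero error even though its first WHEN is true. The CTE name Data follows that documentation example.

```sql
WITH Data (value) AS
(
    SELECT 0
    UNION ALL
    SELECT 1
)
SELECT CASE
           WHEN MIN(value) <= 0       THEN 0   -- true for this data
           WHEN MAX(1 / value) >= 100 THEN 1   -- aggregate is still computed, dividing by 0
       END
FROM Data;
```

This matches the caveat above: the ordering of branches is not violated, but the error prevents any result from being returned.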

Resources