SQL Server CHOOSE() function behaving unexpectedly with RAND() function - sql-server

I've encountered an interesting SQL Server behaviour while trying to generate random values in T-SQL using the RAND and CHOOSE functions.
My goal was to return one of two given values using RAND() as the random number generator. Pretty easy, right?
For those who don't know it, the CHOOSE function accepts an index number (int) along with a list of values and returns the value at the specified index. Pretty straightforward.
At first attempt my SQL looked like this:
select choose(ceiling((rand()*2)) ,'a','b')
To my surprise, this expression returned one of three values: null, 'a' or 'b'. Since I didn't expect the null value, I started digging. The RAND() function returns a float in the range from 0 (included) to 1 (excluded). Since I'm multiplying it by 2, it should return values anywhere in the range from 0 (included) to 2 (excluded). Therefore, after applying the CEILING function, the final value should be one of: 0, 1, 2. After realising that, I extended the value list by 'c' to check whether that would perhaps be returned. I also checked the docs page of CEILING and learnt that:
Return values have the same type as numeric_expression.
I had assumed the CEILING function returned int, but in this case the value would have to be implicitly cast to int before being used in CHOOSE, which sure enough is stated on the docs page:
If the provided index value has a numeric data type other than int,
then the value is implicitly converted to an integer.
Just in case I added an explicit cast. My SQL query looks like this now:
select choose(cast(ceiling((rand()*2)) as int) ,'a','b','c')
However, the result set didn't change. To check which values cause the problem I tried generating the value beforehand and selecting it alongside the CHOOSE result. It looked like this:
declare @int int = cast(ceiling((rand()*2)) as int)
select @int,choose( @int,'a','b','c')
Interestingly enough, now the result set changed to (1,a), (2,b), which was my original goal. After delving deeper into the CHOOSE docs page and some testing, I learned that null is returned in one of two cases:
Given index is a null
Given index is out of range
In this case that would mean that the index value, when generated inside the SELECT statement, is either 0 or above 2 (or above 3 with the extended list). I'm assuming that negative numbers are not possible here and that CHOOSE indexes from 1. As I've stated before, 0 should be one of the possible results of:
ceiling((rand()*2))
but for some reason it's never 0 (at least when I tried it 1 million+ times like this):
set nocount on
declare @test table(ceiling_rand int)
declare @counter int = 0
while @counter<1000000
begin
insert into @test
select ceiling((rand()*2))
set @counter=@counter+1
end
select distinct ceiling_rand from @test
Therefore I assume that the value generated in the SELECT is either greater than 2 (or 3) or NULL. Why would it be like this only when generated inside the SELECT statement? Perhaps the order of resolving CAST, CEILING or RAND inside SELECT is different than it would seem? It's true I've only tried it a limited number of times, but at this point the chances of it being a statistical fluctuation are extremely small. Is it somehow a floating-point error? I am truly stumped and looking forward to any explanation.
TL;DR: When a random number is generated inside a SELECT statement, the set of possible values is different than when it's generated before the SELECT statement.
Cheers,
NFSU

You can see what's going on if you look at the execution plan.
SET SHOWPLAN_TEXT ON
GO
SELECT (select choose(ceiling((rand()*2)) ,'a','b'))
Returns
|--Constant Scan(VALUES:((CASE WHEN CONVERT_IMPLICIT(int,ceiling(rand()*(2.0000000000000000e+000)),0)=(1) THEN 'a' ELSE CASE WHEN CONVERT_IMPLICIT(int,ceiling(rand()*(2.0000000000000000e+000)),0)=(2) THEN 'b' ELSE NULL END END)))
The CHOOSE is expanded out to
SELECT CASE
WHEN ceiling(( rand() * 2 )) = 1 THEN 'a'
ELSE
CASE
WHEN ceiling(( rand() * 2 )) = 2 THEN 'b'
ELSE NULL
END
END
and rand() is referenced twice. Each evaluation can return a different result.
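The double evaluation can be made visible by counting outcomes. The loop below is a minimal sketch (the variable names are my own, not from the question): because the expanded CASE draws a second, independent random number whenever the first branch does not match, roughly a quarter of the calls should come back NULL.

```sql
-- Sketch: count how often CHOOSE(...) yields NULL when RAND() is inlined.
-- @trials, @i and @nulls are illustrative names.
DECLARE @trials int = 100000, @i int = 0, @nulls int = 0;

WHILE @i < @trials
BEGIN
    -- CHOOSE expands to nested CASE branches, each with its own RAND() call
    IF CHOOSE(CEILING(RAND() * 2), 'a', 'b') IS NULL
        SET @nulls += 1;
    SET @i += 1;
END;

-- The first branch matches about half the time; of the remainder, the second
-- RAND() yields 2 about half the time, which leaves about 25% NULL.
SELECT @nulls AS null_count, @trials AS trials;
```

Storing the index in a variable first, as the question already does with its int variable, removes the second draw and with it the NULL results.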
You will get the same problem with the rewrite below, because it is expanded out in the same way:
SELECT CASE ceiling(( rand() * 2 ))
WHEN 1 THEN 'a'
WHEN 2 THEN 'b'
END
Avoid CASE for this and any of its variants.
One method, on versions where JSON_VALUE accepts a non-literal path (SQL Server 2017 and later), would be
SELECT JSON_VALUE ( '["a", "b"]' , CONCAT('$[', FLOOR(rand()*2) ,']') )

Related

Passing value in a function without quote T-SQL / SQL Server 2012?

I need assistance with a function in SQL Server 2012 that I created to check the input value. If the function detects a numeric value, it should return 0; if it detects a character, it should return 1.
But I get two different results for the same number depending on whether I pass it with or without quotes.
select dbo.IS_ALIEN('56789')
returns 0
select dbo.IS_ALIEN(56789)
returns 1 (I need to return 0)
This is my function:
ALTER FUNCTION [dbo].[IS_ALIEN]
(@alienNAIC CHAR(1))
RETURNS NUMERIC(10,0)
AS
BEGIN
DECLARE @nNum NUMERIC(1,0);
BEGIN
SET @nNum = ISNUMERIC(@alienNAIC)
END
BEGIN
IF @nNum = 1
RETURN 0;
END
RETURN 1;
END
Same concept for:
select dbo.IS_ALIEN('AA-11990043')
returns 1
or
select dbo.IS_ALIEN(NULL)
returns 1 (I need it to return 0)
I'm using an Oracle function as a reference (the code below is just a reference from the old database):
create or replace FUNCTION "IS_ALIEN"
( alienNAIC IN char )
RETURN NUMBER IS
nNum number;
BEGIN
SELECT MOD(alienNAIC, 2) into nNum FROM dual;
return 0;
EXCEPTION
WHEN INVALID_NUMBER THEN
return 1;
END;
But a T-SQL function doesn't allow exception handling, so I tried to convert it as closely as I could.
I suggest you use something like this (I've trimmed it down somewhat):
ALTER FUNCTION [dbo].[IS_ALIEN](@alienNAIC NVARCHAR(10))
RETURNS INT -- NOTE: You could also return tinyint or bit
AS
BEGIN
IF ISNUMERIC(@alienNAIC) = 1
RETURN 0;
RETURN 1;
END
The trouble with what you were trying is that there's an implicit cast to CHAR(1), the result of which is not numeric for any number above 9, as @Joel pointed out:
SELECT CAST(0 As CHAR(1)) -- returns character '0', ISNUMERIC('0') = 1
SELECT CAST(9 As CHAR(1)) -- returns character '9', ISNUMERIC('9') = 1
SELECT CAST(12345 As CHAR(1)) -- any number over 9 returns character '*', ISNUMERIC('*') = 0
It's an odd implicit casting case I hadn't seen before. By making the parameter an NVARCHAR (assumes possible future double-byte input), strings will be correctly checked and integers passed in will be implicitly cast as NVARCHAR, and the ISNUMERIC check will succeed.
EDIT
Re-reading the question and comments, it looks like you want to identify a particular string syntax to determine if something is an "alien" or not. If you're comfortable changing business logic to fix what apparently is a poor legacy implementation, you could consider something like this instead:
ALTER FUNCTION [dbo].[temp](@alienNAIC NVARCHAR(10))
RETURNS INT -- NOTE: You could also return tinyint or bit
AS
BEGIN
IF @alienNAIC like 'AA-%' AND ISNUMERIC(RIGHT(@alienNAIC, LEN(@alienNAIC) - 3)) = 1
RETURN 1; -- notice this is now 1 instead of 0, we're doing a specific check for 'AA-nnnnn...'
RETURN 0;
END
Note that this should be thoroughly tested against legacy data if it's ever to interact with it - you never know what rubbish data a poorly written legacy system has left behind. Fixing this could well break other things. If you do make this change, document it well.
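Since the question mentions SQL Server 2012, TRY_CONVERT is one way to approximate the Oracle version's "attempt the conversion and handle the failure" logic without exception handling. This is only a sketch under my own assumptions: numeric(38, 0) as the target type, and NULL treated as not alien, as the question requests.

```sql
-- Sketch: 0 when the value converts to a number (or is NULL), otherwise 1.
-- The target type numeric(38, 0) is an assumption, not taken from the question.
SELECT v.alienNAIC,
       CASE
           WHEN v.alienNAIC IS NULL THEN 0                                     -- the question wants NULL -> 0
           WHEN TRY_CONVERT(numeric(38, 0), v.alienNAIC) IS NOT NULL THEN 0    -- conversion succeeded
           ELSE 1                                                              -- conversion failed
       END AS is_alien
FROM (VALUES ('56789'), ('AA-11990043'), (NULL)) AS v(alienNAIC);
-- is_alien: 0, 1, 0
```

The same CASE expression could serve as the body of the scalar function if a function is still wanted.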
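If you need to check just the first character, you can do it like this:
CREATE FUNCTION [dbo].[IS_ALIEN]
(@alienNAIC VARCHAR(200))
RETURNS TINYINT
AS
BEGIN
-- returns 1 when the first character is a digit; swap the two return values
-- if digits should map to 0, as in the question
IF LEFT(@alienNAIC,1) BETWEEN '0' AND '9' RETURN 1;
RETURN 0
END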
GO
It seems like you are trying to check whether a string starts with a non-numeric character. Such pattern matches can be performed using LIKE, eg
declare @var nvarchar(10)='A56789'
select
case when @var LIKE '[0-9]%'
then 0 else 1
end AS IsAlien
Returns
1
Both declare @var nvarchar(10)=56789 and declare @var int=56789 return 0 because the number is implicitly converted to a string.
The expression is so short that you probably don't need to convert it to a function. If you do, it could be as simple as :
create FUNCTION [dbo].[IS_ALIEN] (@alienNAIC varchar(200))
RETURNS INT
begin
return case when @alienNAIC LIKE '[0-9]%'
then 0 else 1
end
end
If you want to perform the check in a WHERE clause, just use LIKE, not any function, eg:
WHERE alienNAIC NOT LIKE '[0-9]%'
This particular pattern is just a range search that covers all values between 0 and 9. The server can use an index that covers the text column to quickly identify the matches. It can't do so when it has to check the result of a function; in that case it has to calculate the value for every single row before filtering.

Unusual SQL Server query with "select top 1 #arastr = k"

select top 1 @arastr = k
from #m
where datalength(k) = (select max(datalength(k)) from #m)
What does this query do, and what is the point of select top 1 @arastr = k? This query is taken from a stored procedure which has been working for 7-8 years, so there is nothing wrong with the query, but I could not understand what it does.
(#m is a temp table which is created in the early part of the query.)
The query selects one arbitrary value (since TOP is used without an ORDER BY clause) from the column k in the temporary table #m and assigns it to the variable @arastr (which has presumably been declared earlier). The selected string will be any one of those matching the longest string in the table, measured in bytes by the DATALENGTH function.
This is a quite common (but a little old-fashioned) way to get the value of k into the (previously declared!) variable @arastr for later use.
The function DATALENGTH measures the length in bytes of, e.g., a VARCHAR.
With TOP 1 you get only one result row in any case, the one with the "longest" k; its value is in @arastr afterwards.
EDIT: As pointed out by @jpw, the choice will be arbitrary if there is more than one k with the same (longest) length.
Without knowing what #m looks like and what kind of data is in k, I cannot tell you any more.
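If a repeatable result is wanted when several values share the longest length, an ORDER BY can replace the subquery. The following is a sketch with made-up sample data; only #m, k and the variable name come from the question, and the tie-breaker on k is my own choice.

```sql
-- Sketch with illustrative sample rows; the varchar(100) size is an assumption.
CREATE TABLE #m (k varchar(100));
INSERT INTO #m (k) VALUES ('ab'), ('xyz'), ('abc');

DECLARE @arastr varchar(100);

-- Longest value first; ties are broken alphabetically, so the result is repeatable
SELECT TOP (1) @arastr = k
FROM #m
ORDER BY DATALENGTH(k) DESC, k;

SELECT @arastr AS arastr;  -- 'abc'

DROP TABLE #m;
```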
It probably makes more sense if it looks like this:
SET @arastr = (SELECT TOP 1 k
FROM #m
WHERE DATALENGTH(k) = (SELECT MAX(DATALENGTH(k)) FROM #m))

Select rows that can't be casted

The house numbers of the addresses are stored in a column of my table.
Unfortunately my previous colleagues were not a fan of thinking so they made the column of type varchar and did not block input on the software... so now I'm stuck with a bunch of rows where the number of house/apartment is "N.I.", "Not Info", "Unknown", etc. instead of a meaningful number...
I would like to select only the rows that are not numbers... something like select * from table where CAST(column as int) throws exception
Take a look at IsNumeric, IsInt, IsNumber. You can't use ISNUMERIC on its own, because it returns true for plus and minus signs and other input like that.
For example, both of these return 1:
SELECT ISNUMERIC('2d5'),
ISNUMERIC('+')
select * from table where ISNumeric(column)=0
but it may give false positives.
Try this:
SELECT * FROM table WHERE ISNUMERIC(column + 'e0') = 0
Create a function that tries the cast and any other logic that you need, and have it return 0 if the value doesn't meet your requirements and 1 if it succeeds. Then use the function in the WHERE clause.
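On SQL Server 2012 or later, the "try the cast" idea can also be written inline with TRY_CONVERT, which returns NULL instead of raising an error. The table variable and sample values below are hypothetical, chosen only to make the sketch self-contained.

```sql
-- Sketch: select the rows whose house number cannot be converted to int.
-- @addresses and house_number are illustrative names.
DECLARE @addresses table (house_number varchar(20));
INSERT INTO @addresses (house_number)
VALUES ('12'), ('N.I.'), ('Unknown'), ('7'), ('12b');

SELECT house_number
FROM @addresses
WHERE house_number IS NOT NULL
  AND TRY_CONVERT(int, house_number) IS NULL;
-- returns 'N.I.', 'Unknown' and '12b'
```

Note that an empty string converts to 0 rather than failing, so it would need its own check if empty values should count as invalid.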

Why does SUM(...) on an empty recordset return NULL instead of 0?

I understand why null + 1 or (1 + null) returns null: null means "unknown value", and if a value is unknown, its successor is unknown as well. The same is true for most other operations involving null.[*]
However, I don't understand why the following happens:
SELECT SUM(someNotNullableIntegerField) FROM someTable WHERE 1=0
This query returns null. Why? There are no unknown values involved here! The WHERE clause returns zero records, and the sum of an empty set of values is 0.[**] Note that the set is not unknown, it is known to be empty.
I know that I can work around this behaviour by using ISNULL or COALESCE, but I'm trying to understand why this behaviour, which appears counter-intuitive to me, was chosen.
Any insights as to why this makes sense?
[*] with some notable exceptions such as null OR true, where obviously true is the right result since the unknown value simply does not matter.
[**] just like the product of an empty set of values is 1. Mathematically speaking, if I were to extend $(\mathbb{Z}, +)$ to $(\mathbb{Z} \cup \{\text{null}\}, +)$, the obvious choice for the identity element would still be 0, not null, since x + 0 = x but x + null = null.
The ANSI-SQL-Standard defines the result of the SUM of an empty set as NULL. Why they did this, I cannot tell, but at least the behavior should be consistent across all database engines.
Reference: http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt on page 126:
b) If AVG, MAX, MIN, or SUM is specified, then
Case:
i) If TXA is empty, then the result is the null value.
TXA is the operative resultset from the selected column.
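The rule quoted above, together with the COALESCE workaround the question mentions, can be checked with a small sketch. The table variable is a stand-in of my own for someTable.

```sql
-- Sketch: aggregate over zero rows of a NOT NULL column.
DECLARE @someTable table (someNotNullableIntegerField int NOT NULL);

SELECT SUM(someNotNullableIntegerField)              AS raw_sum,      -- NULL
       COALESCE(SUM(someNotNullableIntegerField), 0) AS sum_or_zero,  -- 0
       COUNT(*)                                      AS row_count     -- 0
FROM @someTable;
```

COUNT is the exception among the common aggregates: it returns 0 rather than NULL for an empty set.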
If by an empty table you mean a table with only NULL values, you will get NULL as the output of aggregate functions, just as with a table that has no rows. You can consider this as by design for SQL Server.
Example 1
CREATE TABLE testSUMNulls
(
ID TINYINT
)
GO
INSERT INTO testSUMNulls (ID) VALUES (NULL),(NULL),(NULL),(NULL)
SELECT SUM(ID) FROM testSUMNulls
Example 2
CREATE TABLE testSumEmptyTable
(
ID TINYINT
)
GO
SELECT SUM(ID) Sums FROM testSumEmptyTable
In both examples you will get NULL as output.

T-sql - determine if value is integer

I want to determine whether a value is an integer (like TryParse in .NET). Unfortunately ISNUMERIC does not suit me, because I want to parse only integers and not every kind of number. Is there such a thing as ISINT or something similar?
Here is some code to make things clear. If MY_FIELD is not an int, this code would fail:
SELECT @MY_VAR = CAST(MY_FIELD AS INT)
FROM MY_TABLE
WHERE MY_OTHER_FIELD = 'MY_FILTER'
Thank you
Here's a blog post describing the creation of an IsInteger UDF.
Basically, it recommends adding '.e0' to the value and using IsNumeric. In this way, anything that already had a decimal point now has two decimal points, causing IsNumeric to be false, and anything already expressed in scientific notation is invalidated by the e0.
In his article Can I convert this string to an integer?, Itzik Ben-Gan provides a solution in pure T-SQL and another that uses the CLR.
Which solution should you choose?
Is the T-SQL or CLR Solution Better? The advantage of using the T-SQL
solution is that you don’t need to go outside the domain of T-SQL
programming. However, the CLR solution has two important advantages:
It's simpler and faster. When I tested both solutions against a table
that had 1,000,000 rows, the CLR solution took two seconds, rather
than seven seconds (for the T-SQL solution), to run on my laptop. So
the next time you need to check whether a given string can be
converted to an integer, you can include the T-SQL or CLR solution
that I provided in this article.
If you only want to maintain T-SQL, then use the pure T-SQL solution. If performance is more important than convenience, then use the CLR solution.
The pure T-SQL Solution is tricky. It combines the built-in ISNUMERIC function with pattern-matching and casting to check if the string represents an int.
SELECT keycol, string, ISNUMERIC(string) AS is_numeric,
CASE
WHEN ISNUMERIC(string) = 0 THEN 0
WHEN string LIKE '%[^-+ 0-9]%' THEN 0
WHEN CAST(string AS NUMERIC(38, 0))
NOT BETWEEN -2147483648. AND 2147483647. THEN 0
ELSE 1
END AS is_int
FROM dbo.T1;
The T-SQL part of the CLR solution is simpler. You call the fn_IsInt function just like you would call ISNUMERIC.
SELECT keycol, string, ISNUMERIC(string) AS is_numeric,
dbo.fn_IsInt(string) AS is_int
FROM dbo.T1;
The C# part is simply a wrapper for the .NET's parsing function Int32.TryParse. This works because the SQL Server int and the .NET Int32 are both 32-bit signed integers.
using System;
using System.Data.SqlTypes;
public partial class UserDefinedFunctions
{
[Microsoft.SqlServer.Server.SqlFunction]
public static SqlBoolean fn_IsInt(SqlString s)
{
if (s.IsNull)
return SqlBoolean.False;
else
{
Int32 i = 0;
return Int32.TryParse(s.Value, out i);
}
}
};
Please read Itzik's article for a full explanation of these code samples.
With SQL Server 2005 and later you can use regex-like character classes with the LIKE operator.
To check if a string is a non-negative integer (it is a sequence of decimal digits) you can test that it doesn't contain other characters.
SELECT numstr
FROM table
WHERE numstr NOT LIKE '%[^0-9]%'
Note1: This will return empty strings too.
Note2: Using LIKE '%[0-9]%' will return any string that contains at least a digit.
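To address Note1, an extra predicate can exclude empty strings. This is a small sketch with hypothetical sample data in a table variable; it checks only for digits and does not verify that the value fits into an int.

```sql
-- Sketch: keep only non-empty strings made up entirely of digits.
-- @t and the sample values are illustrative.
DECLARE @t table (numstr varchar(20));
INSERT INTO @t (numstr) VALUES ('123'), (''), ('12a'), ('-5'), ('007');

SELECT numstr
FROM @t
WHERE numstr NOT LIKE '%[^0-9]%'  -- no character outside 0-9
  AND numstr <> '';               -- and not the empty string
-- returns '123' and '007'
```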
WHERE IsNumeric(MY_FIELD) = 1 AND CAST(MY_FIELD as VARCHAR(5)) NOT LIKE '%.%'
That is probably the simplest solution. Unless your MY_FIELD contains .00 or something of that sort. In which case, cast it to a float to remove any trailing .00s
Necromancing.
As of SQL-Server 2012+, you can use TRY_CAST, which returns NULL if the cast wasn't successful.
Example:
DECLARE @foo varchar(200)
SET @foo = '0123'
-- SET @foo = '-0123'
-- SET @foo = '+0123'
-- SET @foo = '+-0123'
-- SET @foo = '+-0123'
-- SET @foo = '.123'
-- SET @foo = '1.23'
-- SET @foo = '.'
-- SET @foo = '..'
-- SET @foo = '0123e10'
SELECT CASE WHEN TRY_CAST(@foo AS integer) IS NULL AND @foo IS NOT NULL THEN 0 ELSE 1 END AS isInteger
This is the only really reliable way.
Should you need support for SQL-Server 2008, then fall back to Sam DeHaan's answer:
SELECT CASE WHEN ISNUMERIC(@foo + '.e0') = 1 THEN 1 ELSE 0 END AS isInteger
SQL-Server < 2012 (aka 2008R2) will reach end of (extended) support by 2019-07-09.
At this time, which is very soon, support for < 2012 can be dropped.
I wouldn't use any of the other hacks at this point in time anymore.
Just tell your frugal customers to update - it's been over 10 years since 2008.
See whether the below query will help
SELECT *
FROM MY_TABLE
WHERE CHARINDEX('.',MY_FIELD) = 0 AND CHARINDEX(',',MY_FIELD) = 0
AND ISNUMERIC(MY_FIELD) = 1 AND CONVERT(FLOAT,MY_FIELD) / 2147483647 <= 1
The following is correct for a WHERE clause; to make a function wrap it in CASE WHEN.
ISNUMERIC(table.field) > 0 AND PATINDEX('%[^0123456789]%', table.field) = 0
This workaround with the ISNUMERIC function will work:
select * from A where ISNUMERIC(x) =1 and X not like '%.%'
or use
select * from A where x not like '%[^0-9]%'
declare @i numeric(28,5) = 12.0001
if (@i/cast(@i as int) > 1)
begin
select 'this is not int'
end
else
begin
select 'this is int'
end
As of SQL Server 2012, the TRY_CONVERT and TRY_CAST functions were implemented. These are vast improvements over the ISNUMERIC solution, which can (and does) give false positives (or negatives). For example, running the query below fails with a conversion error:
SELECT CONVERT(int,V.S)
FROM (VALUES('1'),
('900'),
('hello'),
('12b'),
('1.1'),
('')) V(S)
WHERE ISNUMERIC(V.S) = 1;
Using TRY_CONVERT (or TRY_CAST) avoids that:
SELECT TRY_CONVERT(int,V.S),
V.S,
ISNUMERIC(V.S)
FROM (VALUES('1'),
('900'),
('hello'),
('12b'),
('1.1'),
('')) V(S)
--WHERE TRY_CONVERT(int,V.S) IS NOT NULL; --To filter to only convertable values
Notice that '1.1' returned NULL, which caused the error before (as a string representation of a decimal cannot be converted to an int), but also that '' returned 0, even though ISNUMERIC states the value "can't be converted".
Use TRY_CONVERT, which is the SQL alternative to TryParse in .NET. IsNumeric() isn’t aware that empty strings are counted as (integer)zero, and that some perfectly valid money symbols, by themselves, are not converted to (money)zero.
SELECT @MY_VAR = CASE WHEN TRY_CONVERT(INT,MY_FIELD) IS NOT NULL THEN MY_FIELD
ELSE 0
END
FROM MY_TABLE
WHERE MY_OTHER_FIELD = 'MY_FILTER'
I think that there is something wrong with your database design. I think it is a really bad idea to mix varchar and numbers in one column. What is the reason for that?
Of course you can check whether there are any characters other than [0-9], but imagine you have 1 million rows in the table and you are checking every row. I think it won't work well.
Anyway if you really want to do it I suggest doing it on the client side.
I have a feeling doing it this way is the work of satan, but as an alternative:
How about a TRY - CATCH?
DECLARE @Converted as INT
DECLARE @IsNumeric BIT
BEGIN TRY
SET @Converted = cast(@ValueToCheck as int)
SET @IsNumeric=1
END TRY
BEGIN CATCH
SET @IsNumeric=0
END CATCH
select IIF(@IsNumeric=1,'Integer','Not integer') as IsInteger
This works, though TRY...CATCH needs SQL Server 2005 or later and IIF needs SQL Server 2012 or later.
I tried this script and got the answer:
ISNUMERIC(Replace(Replace([enter_your_number],'+','A'),'-','A') + '.0e0')
For example, applied to the question above:
SELECT @MY_VAR = CAST(MY_FIELD AS INT)
FROM MY_TABLE
WHERE MY_OTHER_FIELD = 'MY_FILTER' and ISNUMERIC(Replace(Replace(MY_FIELD,'+','A'),'-','A') + '.0e0') = 1
Why not just do something like:
CASE
WHEN ROUND(MY_FIELD,0)=MY_FIELD THEN CAST(MY_FIELD AS INT)
ELSE MY_FIELD
END
as MY_FIELD2
Sometimes you don't get to design the database, you just have to work with what you are given. In my case it's a database located on a computer that I only have read access to which has been around since 2008.
I need to select from a column in a poorly designed database which is a varchar with numbers 1-100 but sometimes a random string. I used the following to get around it (although I wish I could have re designed the entire database).
SELECT A from TABLE where isnumeric(A)=1
I am not a pro in SQL, but what about checking whether it is divisible by 1?
For me it does the job.
SELECT *
FROM table
WHERE fieldname % 1 = 0
Use PATINDEX
DECLARE @input VARCHAR(10)='102030.40'
SELECT PATINDEX('%[^0-9]%',RTRIM(LTRIM(@input))) AS IsNumber
reference
http://www.intellectsql.com/post-how-to-check-if-the-input-is-numeric/
Had the same question. I finally used
where ATTRIBUTE != round(ATTRIBUTE)
and it worked for me
WHERE IsNumeric(value + 'e0') = 1 AND CONVERT(FLOAT, value) BETWEEN -2147483648 AND 2147483647
Seeing as this is quite old but my solution isn't here, I thought I'd add another possible way to do this:
--This query only returns values with decimals
SELECT ActualCost
FROM TransactionHistory
where cast(ActualCost as int) != ActualCost
--This query only returns values without decimals
SELECT ActualCost
FROM TransactionHistory
where cast(ActualCost as int) = ActualCost
The easy part here is checking if the selected value is the same when cast as an integer.
We can check whether it's a non-integer with:
SELECT number2
FROM table
WHERE number2 LIKE '%[^0-9]%' and (( right(number2 ,len(number2)-1) LIKE '%[^0-9]%' and lefT(number2 ,1) <> '-') or ( right(number2 ,len(number2)-1) LIKE '%[^0-9]%' and lefT(number2 ,1) in ( '-','+') ) )
DECLARE @zip_code NCHAR(10)
SET @zip_code = '1239'
IF TRY_PARSE( @zip_code AS INT) / TRY_PARSE( @zip_code AS INT) = 1 PRINT 'integer'
ELSE PRINT 'not integer'
This works fine in SQL Server
SELECT (SELECT ISNUMERIC(2) WHERE ISNUMERIC(2)=1 AND 2 NOT LIKE '%.%')
Case
When (LNSEQNBR / 16384)%1 = 0 then 1
else 0
end
