Generate integer number 1,2,3 in SQL Server

I used this query:
SELECT FLOOR(3 * RAND(CONVERT(varbinary, NEWID()))) + 1
Can someone please explain how it works? I know all the functions used, but I'm unable to link them.

The newid() function doesn't actually generate a string; it generates a uniqueidentifier, also known as a "globally unique identifier" or "GUID". This is a pseudo-random value.
The rand(@seed) function generates a float value >= 0 and < 1.
RAND doesn't generate a seed; it optionally accepts the seed as an input parameter. The result depends on the seed: if you pass the same seed value as the input parameter, the result of rand() is always the same. If rand() is called without any seed value, SQL Server itself produces a pseudo-random seed.
Now, this probably seems confusing already. Obvious questions are:
If newid() is already pseudo-random, why do we need rand()?
In the code presented in your question, rand() isn't actually being used to generate random values; that job is really being done by newid(). What rand() is doing is mapping the newid() value to a floating point value between zero and one.
OK then, if rand() with no input seed is already pseudo-random, why do we need newid()?
In the specific sample code in your question, where we are only working with a single scalar value, there's actually no need. The same thing could be accomplished with:
select floor(3 * RAND() + 1)
However, when you are working with multiple rows of data, rand() doesn't get re-seeded by SQL Server for every row; it only gets seeded once. So if you do something like this:
select rand() from sys.objects
Then every row in the result set will have the same value.
The newid() function is different. SQL Server will generate a different uniqueidentifier for every single row (it sort of "has to by law": part of the definition of a GUID is that the same GUID should never be generated twice).
So the newid() function provides a pseudo-random seed value to rand(), and then rand() maps that to some floating point value between 0 and 1 (excluding 1).
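The difference between "seeded once" and "seeded per row" can be sketched with a small Python model. It is only an analogy: Python's random.Random stands in for RAND(), uuid.uuid4() for NEWID(), and the way the seed is derived from the GUID (last 4 bytes) is an assumption rather than SQL Server's actual algorithm.

```python
import random
import uuid


def rand_seeded_once(rows: int, seed: int = 12345) -> list[float]:
    # Like RAND() without a seed in a multi-row SELECT:
    # one value is produced and repeated on every row.
    value = random.Random(seed).random()
    return [value for _ in range(rows)]


def rand_seeded_per_row(rows: int) -> list[float]:
    # Like RAND(CONVERT(varbinary, NEWID())): a fresh GUID per row
    # supplies a fresh seed, so each row gets its own value.
    values = []
    for _ in range(rows):
        guid = uuid.uuid4()
        # Hypothetical seed derivation: read the last 4 bytes as a signed int.
        seed = int.from_bytes(guid.bytes[-4:], "big", signed=True)
        values.append(random.Random(seed).random())
    return values


print(len(set(rand_seeded_once(10))))     # 1
print(len(set(rand_seeded_per_row(10))))  # typically 10, one value per row
```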
What is the convert to varbinary doing?
If an argument is passed to rand(), the argument has to be an integer. A uniqueidentifier cannot be implicitly converted to an integer. But a uniqueidentifier can be converted to a varbinary, and then the varbinary can be implicitly converted to an integer. If we make that conversion explicit, it looks like this:
select convert(int, convert(varbinary, newid()))
In your sample code, the integer conversion is being done implicitly. A uniqueidentifier is 16 bytes long, so it gets converted to a 16 byte varbinary. 12 of those bytes then get silently truncated (thrown away), because an integer is only 4 bytes long. The remaining 4 bytes are implicitly converted to the integer.
Note that this truncation could theoretically weaken the randomness of the result. People often use checksum() to convert to an integer rather than casting through a varbinary, because checksum will make use of all of the bytes in the GUID.
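A rough Python model of that truncation, applied to the sample varbinary output that appears later on this page (0x321A7CBE...). It assumes SQL Server drops the leftmost 12 bytes and reads the remaining 4 as a big-endian signed int; treat that exact byte choice as an assumption, and varbinary_to_int as a hypothetical helper name.

```python
def varbinary_to_int(value: bytes) -> int:
    # Assumption: converting a 16-byte varbinary to a 4-byte int keeps only
    # the rightmost 4 bytes, read as a big-endian two's-complement integer.
    kept = value[-4:]
    return int.from_bytes(kept, "big", signed=True)


guid_bytes = bytes.fromhex("321A7CBE6FBACC41B1EE5BC3C5219B2C")
print(len(guid_bytes))               # 16
print(varbinary_to_int(guid_bytes))  # -987653332 under the assumption above
```

The 12 discarded bytes never influence the seed, which is the weakness the checksum() suggestion avoids.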
What is the multiplication by 3 doing?
Since the rand() function returns a value between 0 and 1, but you want a value "between" 1 and 3, we have to multiply the result of rand() by 3:
Values >= 0 and < 1/3 will map to values >= 0 and < 1.
Values >= 1/3 and < 2/3 will map to values >= 1 and < 2.
Values >= 2/3 and < 1 will map to values >= 2 and < 3.
What is floor() doing?
The value we have right now is some floating point value anywhere between 0 and 3 (excluding 3). But you want only the integer values 1, 2 or 3. So we have to add 1 and shave off the decimal part, which is what the + 1 and floor() are doing. You could also get rid of the + 1 and replace floor() with ceiling(); in practice the only difference is the extremely unlikely case where rand() returns exactly 0, which ceiling() leaves as 0.
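The multiply-then-floor arithmetic can be checked with a short Python model, where r stands in for the float returned by RAND() (always >= 0 and < 1). The function names are hypothetical; the model also shows the edge case of the ceiling() variant at r = 0.

```python
import math


def floor_plus_one(r: float) -> int:
    # FLOOR(3 * r) + 1: [0, 1/3) -> 1, [1/3, 2/3) -> 2, [2/3, 1) -> 3
    return math.floor(3 * r) + 1


def ceiling_only(r: float) -> int:
    # CEILING(3 * r): matches FLOOR(3 * r) + 1 unless 3 * r is a whole number, e.g. r = 0
    return math.ceil(3 * r)


for r in (0.0, 0.2, 0.5, 0.9, 0.9999):
    print(r, floor_plus_one(r), ceiling_only(r))
```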

I used this query:
select floor(3 * RAND(convert(varbinary, newid())))+1
RAND(): returns a pseudo-random float value, e.g. 0.405615055347678
NEWID(): returns a uniqueidentifier, displayed like 29CADAD4-F9F5-4B79-98F0-33DE745954FC
varbinary: CONVERT(varbinary, newid()) converts the uniqueidentifier to its binary representation, which RAND() then implicitly converts to an integer seed.
For example:
select convert(varbinary, newid())
output:
0x321A7CBE6FBACC41B1EE5BC3C5219B2C

We can generate a random number using the NEWID() function of SQL Server. The value generated by NEWID() is a 16-byte uniqueidentifier, displayed as 32 hexadecimal digits, which is globally unique. The uniqueidentifier generated by NEWID() can be converted to VARBINARY using CONVERT(), which RAND() in turn implicitly converts to an integer seed; FLOOR() then truncates the scaled result to a whole number.
select rand() -- result will be a random float between 0 and 1 (excluding 1)
The FLOOR() function returns the largest integer value that is smaller than or equal to a number.
select floor(rand()*N) -- the generated number is a whole number, e.g. 12
The range of this method is 0 to N-1; for example, floor(rand()*100) will generate any integer between 0 and 99. So here our N = 3, and we will be generating an integer between 0 and 2; the + 1 in the query shifts that to 1 through 3.
So, the chain is NEWID() > CONVERT() > RAND() > FLOOR() > SELECT.

Related

SQL Server CHOOSE() function behaving unexpectedly with RAND() function

I've encountered an interesting SQL Server behaviour while trying to generate random values in T-SQL using the RAND and CHOOSE functions.
My goal was to return one of two given values using RAND() as an RNG. Pretty easy, right?
For those of you who don't know it, the CHOOSE function accepts an index number (int) along with a list of values and returns the value at the specified index. Pretty straightforward.
On my first attempt, my SQL looked like this:
select choose(ceiling((rand()*2)) ,'a','b')
To my surprise, this expression returned one of three values: null, 'a' or 'b'. Since I didn't expect the null value, I started digging. The RAND() function returns a float in the range from 0 (included) to 1 (excluded). Since I'm multiplying it by 2, it should return values anywhere in the range from 0 (included) to 2 (excluded). Therefore, after applying the CEILING function, the final value should be one of 0, 1 or 2. After realising that, I added 'c' to the value list to check whether that would perhaps be returned. I also checked the docs page of CEILING and learnt that:
Return values have the same type as numeric_expression.
I had assumed the CEILING function returned int, but here it returns a float, which means the value is implicitly cast to int before being used in CHOOSE; sure enough, that is stated on the docs page:
If the provided index value has a numeric data type other than int, then the value is implicitly converted to an integer.
Just in case I added an explicit cast. My SQL query looks like this now:
select choose(cast(ceiling((rand()*2)) as int) ,'a','b','c')
However, the result set didn't change. To check which values cause the problem I tried generating the value beforehand and selecting it alongside the CHOOSE result. It looked like this:
declare @int int = cast(ceiling((rand()*2)) as int)
select @int, choose(@int,'a','b','c')
Interestingly enough, the result set now changed to (1,a), (2,b), which was my original goal. After delving deeper into the CHOOSE docs page and some testing, I learned that NULL is returned in one of two cases:
Given index is a null
Given index is out of range
In this case that would mean that the index value, when generated inside the SELECT statement, is either 0 or greater than the number of values (2, or 3 after adding 'c'). I'm assuming that negative numbers are not possible here and that the CHOOSE function indexes from 1. As I've stated before, 0 should be one of the possible results of:
ceiling((rand()*2))
but for some reason it's never 0 (at least not when I tried it 1 million+ times like this):
set nocount on
declare @test table(ceiling_rand int)
declare @counter int = 0
while @counter<1000000
begin
insert into @test
select ceiling((rand()*2))
set @counter=@counter+1
end
select distinct ceiling_rand from @test
Therefore I assume that the value generated in the SELECT is greater than 2 (or 3) or NULL. Why would it be like this only when generated in a SELECT statement? Perhaps the order of resolving CAST, CEILING and RAND inside SELECT is different than it would seem? It's true I've only tried it a limited number of times, but at this point the chances of it being a statistical fluctuation are extremely small. Is it somehow a floating-point error? I truly am stumped and look forward to any explanation.
TL;DR: When a random number is generated inside a SELECT statement, the set of possible values is different than when it's generated before the SELECT statement.
Cheers,
NFSU
You can see what's going on if you look at the execution plan.
SET SHOWPLAN_TEXT ON
GO
SELECT (select choose(ceiling((rand()*2)) ,'a','b'))
Returns
|--Constant Scan(VALUES:((CASE WHEN CONVERT_IMPLICIT(int,ceiling(rand()*(2.0000000000000000e+000)),0)=(1) THEN 'a' ELSE CASE WHEN CONVERT_IMPLICIT(int,ceiling(rand()*(2.0000000000000000e+000)),0)=(2) THEN 'b' ELSE NULL END END)))
The CHOOSE is expanded out to:
SELECT CASE
WHEN ceiling(( rand() * 2 )) = 1 THEN 'a'
ELSE
CASE
WHEN ceiling(( rand() * 2 )) = 2 THEN 'b'
ELSE NULL
END
END
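and rand() is referenced twice. Each evaluation can return a different result, so if the first evaluation gives 2 and the second gives 1, neither WHEN branch matches and the ELSE NULL is returned.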
You will get the same problem with the rewrite below, because it is expanded out in the same way:
SELECT CASE ceiling(( rand() * 2 ))
WHEN 1 THEN 'a'
WHEN 2 THEN 'b'
END
Avoid CASE and its variants (such as CHOOSE and IIF) for this.
One method, which references rand() only once, would be:
SELECT JSON_VALUE ( '["a", "b"]' , CONCAT('$[', FLOOR(rand()*2) ,']') )
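To see where the NULL comes from, here is a Python simulation of the expanded CASE from the execution plan above. It is an illustration only: random.Random stands in for RAND(), and the function names are hypothetical. Because the second WHEN draws a fresh random number, a first draw of 2 followed by a second draw of 1 matches neither branch, so about a quarter of the results are NULL (None). Computing the index once, as the variable version in the question and the JSON_VALUE method do, does not produce NULL in practice.

```python
import math
import random
from collections import Counter


def ceil_rand_times_2(rng: random.Random) -> int:
    # Models CEILING(RAND() * 2)
    return math.ceil(rng.random() * 2)


def choose_expanded(rng: random.Random):
    # Models the plan: each WHEN re-evaluates RAND()
    if ceil_rand_times_2(rng) == 1:
        return "a"
    if ceil_rand_times_2(rng) == 2:
        return "b"
    return None  # ELSE NULL


def choose_evaluated_once(rng: random.Random):
    # Models computing the index once (e.g. into a variable) before choosing
    index = ceil_rand_times_2(rng)
    return {1: "a", 2: "b"}.get(index)


rng = random.Random(2024)
trials = 100_000
expanded = Counter(choose_expanded(rng) for _ in range(trials))
once = Counter(choose_evaluated_once(rng) for _ in range(trials))
print(expanded[None] / trials)  # close to 0.25
print(once[None])               # 0 in practice
```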

How to convert VARCHAR columns to DECIMAL without rounding in SQL Server?

In my SQL class, I'm working with a table whose columns are all VARCHAR. I'm trying to convert each column to a more appropriate data type.
For example, I have a column called Item_Cost that has a value like:
1.25000000000000000000
I tried to run this query:
ALTER TABLE <table>
ALTER COLUMN Item_Cost DECIMAL
This query does run successfully, but it turns the value into 1 instead of 1.25.
How do I prevent the rounding?
Check out the documentation for the decimal data type. The type is defined by the optional parameters p (precision) and s (scale). The latter determines the number of digits to the right of the decimal point.
Extract from the documentation:
s (scale)
The number of decimal digits that are stored to the right of the decimal point. This number is subtracted from p to determine the maximum number of digits to the left of the decimal point. Scale must be a value from 0 through p, and can only be specified if precision is specified. The default scale is 0 and so 0 <= s <= p. Maximum storage sizes vary, based on the precision.
Because the default scale is 0, a plain DECIMAL keeps no digits to the right of the decimal point, which is why 1.25 became 1. Defining a suitable precision and scale fixes your issue.
Sample data
create table MyData
(
Item_Cost nvarchar(100)
);
insert into MyData (Item_Cost) values ('1.25000000000000000000');
Solution
ALTER TABLE MyData Alter Column Item_Cost DECIMAL(10, 3);
Result
Item_Cost
---------
1.250
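Python's decimal module can model what the scale does to the sample value. This is only an approximation of SQL Server's conversion: the rounding mode (ROUND_HALF_UP, i.e. halves away from zero) is an assumption, to_scale is a hypothetical helper, and scales 0 and 3 correspond to a plain DECIMAL (default scale 0) and DECIMAL(10, 3).

```python
from decimal import Decimal, ROUND_HALF_UP

raw = "1.25000000000000000000"  # the VARCHAR value from the question


def to_scale(text: str, scale: int) -> Decimal:
    # Keep `scale` digits after the decimal point and round the rest away
    quantum = Decimal(1).scaleb(-scale)  # 1, 0.1, 0.01, 0.001, ...
    return Decimal(text).quantize(quantum, rounding=ROUND_HALF_UP)


print(to_scale(raw, 0))  # 1      (like DECIMAL, default scale 0)
print(to_scale(raw, 3))  # 1.250  (like DECIMAL(10, 3))
```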

Way to show items where more than 5 decimal places occur?

I am trying to filter query results so that only items with 6 decimal places are shown. I don't need it to round up or add 0's to the answer, just to filter out anything with 5 decimal places or fewer (e.g. if an item is 199.54215 I don't want to see it, but if it is 145.253146 I need it returned). My current query looks like this:
select
TRA_CODPLANTA,
TRA_WO,
TRA_IMASTER,
tra_codtipotransaccion,
tra_Correlativo,
TRA_INGRESOFECHA,
abs(tra_cantidadparcial) as QTY
from mw_tra_transaccion
where FLOOR (Tra_cantidadparcial*100000) !=tra_cantidadparcial*100000
and substring(tra_imaster,1,2) not in ('CP','SG','PI','MR')
and TRA_CODPLANTA not in ('4Q' , '5C' , '5V' , '8H' , '7W' , 'BD', 'DP')
AND tra_INGRESOFECHA > @from_date
and abs(tra_cantidadparcial) > 0.00000
Any assistance would be greatly appreciated!
Here is an example with ROUND, which seems to be the ideal function to use, since it remains in the realm of numbers. If a value has at most 5 decimal places, rounding it to 5 decimal places leaves it unchanged.
create table #test (Tra_cantidadparcial decimal(20,10));
INSERT #test (Tra_cantidadparcial) VALUES (1),(99999.999999), (1.000001), (45.000001), (45.00001);
SELECT * FROM #test WHERE ROUND(Tra_cantidadparcial,5) != Tra_cantidadparcial;
drop table #test
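The same ROUND comparison can be mirrored in Python with the decimal module, using the sample values from the temp table above (has_more_than_5_places is a hypothetical name). ROUND_HALF_UP stands in for T-SQL ROUND; since the test is only whether rounding changes the value, the rounding mode does not change which values are returned.

```python
from decimal import Decimal, ROUND_HALF_UP

values = [Decimal(v) for v in ("1", "99999.999999", "1.000001", "45.000001", "45.00001")]


def has_more_than_5_places(v: Decimal) -> bool:
    # Mirrors: WHERE ROUND(v, 5) != v
    return v.quantize(Decimal("0.00001"), rounding=ROUND_HALF_UP) != v


print([str(v) for v in values if has_more_than_5_places(v)])
# ['99999.999999', '1.000001', '45.000001']
```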
If your database values are VARCHAR and exist in the DB like so:
100.123456
100.1
100.100
You can achieve this using a wildcard LIKE pattern, for example:
WHERE YOUR_COLUMN_NAME LIKE '%.[0-9][0-9][0-9][0-9][0-9][0-9]%'
This will bring back anything containing a decimal point followed by AT LEAST 6 digits.
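For readers more used to regular expressions, the LIKE pattern corresponds roughly to the Python regex below. It is a translation for illustration, not something SQL Server runs: a literal dot (the dot is not a wildcard in LIKE) followed by six digits, anywhere in the string.

```python
import re

# '%.[0-9][0-9][0-9][0-9][0-9][0-9]%' in LIKE terms:
# % = any characters, . = a literal dot, [0-9] = exactly one digit
SIX_DECIMALS = re.compile(r"\.[0-9]{6}")

samples = ["100.123456", "100.1", "100.100"]
print([s for s in samples if SIX_DECIMALS.search(s)])  # ['100.123456']
```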
Here is an example that converts the value to varchar and subtracts the CHARINDEX of the decimal point from the LEN. I'm not saying this is the best way, but you did ask for an example in syntax, so here you go:
--Temp decimal value holding up to 10 decimal places and 10 whole number places
DECLARE @temp DECIMAL(20, 10) = 123.4565432135
--LEN returns the integer number of characters in the converted varchar
--CHARINDEX returns the integer position of the decimal point in the varchar
--IF the number of characters left after subtracting the position of the decimal point from the length of the varchar is greater than 5,
--you have 6 or more decimal places
--Note: a DECIMAL(20, 10) always converts with all 10 decimal places (trailing zeros included),
--so strip trailing zeros first if your values can end in 0
IF LEN(CAST(@temp AS varchar(20))) - CHARINDEX('.', CAST(@temp AS varchar(20)), 0) > 5
SELECT 1
ELSE
SELECT 0
Here is a shorthand way:
WHERE (LEN(CONVERT(DOUBLE PRECISION, FieldName % 1)) - 2) >= 6
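One way would be to convert / cast that column to a lower scale (5 decimal places). Doing this causes automatic rounding, and the last digit of the rounded value then hints at whether there were more digits: if the last digit of the converted value is 0, treat it as false, otherwise as true. Be aware that this is a heuristic: a value with exactly 5 decimal places such as 199.54215 keeps its non-zero last digit, and 1.000001 rounds to 1.00000.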
declare @table table (v decimal(11,10))
insert into @table
values
(1.123456789),
(1.123456),
(1.123),
(1.123405678)
select
v
,cast(v as decimal(11,5)) --here, we are changing the value to have a scale of 5. Notice the rounding.
,right(cast(v as decimal(11,5)),1) --this is taking the last digit. If it's 0, we don't want it
from @table
Thus, your where clause would simply be:
where right(cast(tra_cantidadparcial as decimal(11,5)),1) > 0

Weird Behavior of Round function in MSSQL Database for Real column only

I found weird behavior of the ROUND function in MSSQL for the real column type. I have tested this issue in Azure SQL DB and SQL Server 2012.
Why does @number = 201604.125 return 201604.1?
Why does round(@number1, 10), with @number1 = 1.12345, return 1.1234500408?
-- For a float column it works as expected
-- Declare @number as float, @number1 as float;
Declare @number as real, @number1 as real;
set @number=201604.125;
set @number1=1.12345;
select @number as Realcolumn_Original
,round(@number,2) as Realcolumn_ROUND_2
,round(@number,3) as Realcolumn_ROUND_3
, @number1 as Realcolumn1_Original
,round(@number1,6) as Realcolumn1_ROUND_6
,round(@number1,7) as Realcolumn1_ROUND_7
,round(@number1,8) as Realcolumn1_ROUND_8
,round(@number1,9) as Realcolumn1_ROUND_9
,round(@number1,10) as Realcolumn1_ROUND_10
[Screenshot: output for the real column type]
I suspect what you are asking here is why does:
DECLARE @n real = 201604.125;
SELECT @n;
Return 201604.1?
First port of call for things like this should be the documentation, so let's start with float and real (Transact-SQL). Firstly, we note that:
The ISO synonym for real is float(24).
If we then look further down:
float [ (n) ] Where n is the number of bits that are used to store the mantissa of the float number in scientific notation and, therefore, dictates the precision and storage size. If n is specified, it must be a value between 1 and 53. The default value of n is 53.
n value   Precision   Storage size
1-24      7 digits    4 bytes
So now we know that a real (aka a float(24)) has a precision of 7 digits. 201604.125 has 9 digits, which is 2 too many, so the 2 and the 5 come off in the returned value.
Now, ROUND (Transact-SQL). That states:
Returns a numeric value, rounded to the specified length or precision.
When using real/float, those digits aren't actually lost as such, due to the floating point representation. When you use ROUND, you are specifically stating "I want this many decimal places". This is why you can then see the .13 and the .125: you have specifically asked for those. When you just returned the value of @number, it was displayed with a precision of 7 digits, due to being a real, so 201604.1 was the value returned.
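Both effects can be reproduced in Python by round-tripping a value through a 4-byte IEEE 754 single-precision float, which is what real/float(24) stores (to_real is a hypothetical helper name). 201604.125 is stored exactly and only the 7-digit display hides the .125, whereas 1.12345 has no exact 4-byte representation; the nearest single is slightly larger, which is where the 1.1234500408 in the question comes from.

```python
import struct


def to_real(x: float) -> float:
    # Round-trip through a 4-byte IEEE 754 single, like SQL Server's real
    return struct.unpack("<f", struct.pack("<f", x))[0]


n = to_real(201604.125)
print(n == 201604.125)   # True: nothing was lost in storage
print(format(n, ".7g"))  # 201604.1: only 7 significant digits displayed

n1 = to_real(1.12345)
print(round(n1, 10))     # 1.1234500408: the nearest single is slightly above 1.12345
```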

Using CAST and bigint

I am trying to understand what this statement does:
SUM(CAST(FILEPROPERTY(name, 'SpaceUsed') AS bigint) * 8192.)/1024 /1024
Also, why is there a dot after 8192? Can anybody explain this query bit by bit? Thanks!
FILEPROPERTY() returns an int value. Note that the SpaceUsed property is not in bytes but in "pages", and in SQL Server a page is 8KiB, so multiplying by 8192 to get the size in bytes is appropriate.
I've never encountered a trailing dot without fractional digits before - the documentation for constants/literals in T-SQL does not give an example of this usage, but reading it implies it's a decimal:
decimal constants are represented by a string of numbers that are not enclosed in quotation marks and contain a decimal point.
Thus multiplying the bigint value by a decimal yields a decimal value, which is what preserves fractional digits when dividing by 1024 (and then by 1024 again). Although those two numbers are int literals, they are implicitly converted to decimal, because decimal has higher data type precedence than int, so the divisions are decimal divisions rather than truncating integer divisions.
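The effect of the trailing dot can be modelled in Python, with // standing in for T-SQL integer division and decimal.Decimal for the decimal type. This is an analogy, not SQL Server's exact precision rules, and the page count of 100 is made up. Pure integer arithmetic truncates the result to 0, while the decimal version keeps 0.78125 MB.

```python
from decimal import Decimal

PAGE_BYTES = 8192  # a SQL Server page is 8 KiB


def used_mb_integer(pages: int) -> int:
    # Like pages * 8192 / 1024 / 1024 with every operand an int
    return pages * PAGE_BYTES // 1024 // 1024


def used_mb_decimal(pages: int) -> Decimal:
    # Like pages * 8192. / 1024 / 1024: the decimal literal keeps the fraction
    return Decimal(pages) * Decimal("8192.") / 1024 / 1024


print(used_mb_integer(100))  # 0
print(used_mb_decimal(100))  # 0.78125
```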
I haven't tested it, but you could try just this:
SELECT
SUM( FILEPROPERTY( name, 'SpaceUsed' ) ) * ( 8192.0 / 1048576 ) AS TotalMegabytes
FROM
...
If you're reading through code and you need to do research to understand what it's doing, do a favour for the next person who reads the code by adding an explanatory comment to save them from having to do research, e.g. "gets the total number of 8KiB pages used by the files of the current database, then converts it to megabytes".
The dot after an integer literal implicitly makes it a decimal value. This is most likely there to force the output to also be decimal (not an integer). In this case you only need one part of the operation to be decimal to force the output to be that type.
This probably has to do with bytes and pages, given the numbers 8192 and 1024 (most likely used for converting to a larger unit). One could also infer this from the property name, which indicates how much space is being used by a file.
A page is 8 KB, which means that multiplying the page count by 8192 converts it to the number of bytes used. Dividing by 1024 twice then converts bytes to megabytes.
Explanation of the functions used:
FILEPROPERTY returns the specified property value for a file name in the current database. If the file is not present, NULL is returned.
CAST converts the value to type bigint.
SUM is an aggregate function that sums values for a specified group.
