Bitwise AND in Sql Server - sql-server

I have a very typical situation. We have a table called Users which has a column called Branches (varchar 1000).
The organization can have 1000 branches. So if a user has access to branch 1, 5, and 10, the branches string would look like:
1000100001000000000......
(i.e. 1 for a position a User has branch access to based on the branch's number). Please do not advise better data storage options, this is coming to me from a legacy application that is deployed across continents.
Now given this background (and considering that there can be > 10000 users), I want to search for all Users who have access to any one of given set of branches, e.g. Find all users who have access to either branch 10, 65, 90 or 125.
One easy solution is to convert the desired set of branches (i.e. 10, 65, 90, 125) to a branch string (00000010100 etc), then use a scalar UDF to iterate over both the branch strings and return true at first matching occurence where 2 branch strings have 1, and false if there is not a 1 at common position.
Other than that, I also have an option of searching in application in C#. Some of these users are privileged (approx 1000 or more) and their data is cached in application as it is accessed very frequently. But for other users that are not privileged, data is only in db.
I have 2 questions here:
1) For a db search, is there a better way other than the UDF approach I mentioned.
2) For privileged users, what would be better in terms of performance, search in application (which further can be based on a for loop on branch strings like in UDF, or as a Linq Intersect operator on 2 branch arrays, i.e. a Linq Intersect on [1,5,9,50,80,200] and [6,90,256,300] etc.)
Would a db search produce faster results or an application based search?
Consider there might be other parameters for search in both cases, e.g. Last name starts with.
My current approach is to filter rows in db for both situations first on other parameters (like Last name starts with). Then use a scalar UDF to filter this result-set based on branches and then return the results.

Do it in SQL, it will be only 100 times faster than doing it in C# or other front end.
Use the built-in numbers table to break the long string into positions (number series goes up to 2047).
Sample tables
create table users (userid int)
insert users select 1 union all select 2
create table permission (userid int, bigstr varchar(1000))
insert permission
select 1, REPLICATE('0', 56) + '1' -- 57th
+ REPLICATE('0', 32) + '1' -- 90th
+ REPLICATE('0', 64) + '1' -- 155th
+ REPLICATE('0', 845)
insert permission
select 2, REPLICATE('0', 66) + '1' -- 67th
+ REPLICATE('0', 98) + '1' -- 166th
+ REPLICATE('0', 657) + '1' -- 824th
+ REPLICATE('0', 176)
Sample showing all the matching permissions against a list
select *
from users u
inner join permission p on p.userid=u.userid
inner join master..spt_values v on v.type='p'
and SUBSTRING(p.bigstr,v.number,1) = '1'
and v.number between 1 and LEN(p.bigstr) -- or 1000 if it is always 1000
where v.number in (57,90,824)
To find users who have at access to at least one branch in the list:
select distinct u.userid
from users u
inner join permission p on p.userid=u.userid
inner join master..spt_values v on v.type='p'
and SUBSTRING(p.bigstr,v.number,1) = '1'
and v.number between 1 and LEN(p.bigstr) -- or 1000 if it is always 1000
where v.number in (57,90,824)
etc..

You might want to construct strings of the form __1_1____1% for a LIKE query to find all users who have access to branches 3, 5, and 10.
To construct these strings the easiest way is to start with a string of _ characters that's as long as the largest branch number in your set (or larger), and then replace individual _ characters with 1 characters, and then append the % at the end.
As for whether this is faster than doing a loop in the database or a loop in your application, I think your best approach is to just test it.

Bitwise Group Membership:
From the comments, I'm assuming that we are not able to use a link table for group memberships. Here is a bitwise solution that does not use strings. It cannot be an acceptable answer because the number of bits limits the number of groups quite severely. However, by using integers with explicit value comparisons, the database can make use of its indexes efficiently. So I've added it for the case where the number of groups / roles / whatever is limited enough to fit.
PS: Excuse an binary-decimal mess ups, I just plugged stuff in on the fly. Feel free to comment and correct, if I have any errors.
Each group is assigned a bit:
G1: 0001
G2: 0010
G3: 0100
G4: 1000
Users' group memberships are calculated with bitwise &. Here are some examples with the binary and decimal equivalents:
U1: G1: 0001 (01)
U2: G2: 0010 (02)
U3: G3: 0100 (04)
U4: G4: 1000 (08)
U5: G1 & G2: 0011 (03)
U6: G2 & G3: 0110 (06)
U7: G1 & G3: 0101 (05)
U8: G2 & G4: 1010 (10)
U9: G1 & G2 & G4: 1011 (11)
Now, calculate, using iteration from 1-N (N is number of groups) and get a list of all the possible integer values that any particular group can contribute to. For example, G1 will be present in any odd number:
G1' : 0001 (01), 0011 (03), 0101 (05), 0111 (07), 1001 (09), 1011 (11), 1101 (13), 1111 (15)
G2' : 0010 (02), 0011 (03), 0110 (06), 0111 (07), 1010 (10), 1011 (11), 1110 (14), 1111 (15)
G3' : 0100 (04), 0101 (05), 0110 (06), 0111 (07), 1100 (12), 1101 (13), 1110 (14), 1111 (15)
G4' : 1000 (08), 1001 (09), 1010 (10), 1011 (11), 1100 (12), 1101 (13), 1110 (14), 1111 (15)
You can do this with a loop from 1-1000 with a bitwise AND of the group's decimal value 1,2,4,8, etc.
Keep the values in memory, or push them into the table storing your groups' possible memerships e.g. possible_memberships.
Get me users in G1:
Q: select * from users where group_memberships in (1, 3, 5, 7, 9, 11, 13, 15);
A: U1, U5, U7, U9
Get me users in G2:
Q: select * from users where group_memberships in (2, 3, 6, 7, 10, 11, 14, 15);
A: U2, U5, U6, U8, U9
If you have a groups table with column 'possible_memberships', you can put the values in there, saving you
from having to send all the values over wire and allowing the subselect to be cached on the database.
p>
Get me users in G3:
Q: select * from users where group_memberships in (select possible_memberships from groups where name = 'G3');
A: U3, U7, U6

Use a LIKE query. In Sql Server, an _ used in a LIKE expression matches any single character. To get those users that are in branches 1,5, and 10 , you could to it like this:
SELECT columns FROM Users WHERE BRANCHES LIKE '1___1____1%'
This isn't particularly efficient (it's not very sargable), but it should work and it's likely no worse than your udf option.

UPDATE
This is not a feasible solution with a 1000 bit number. I will leave it up incase someone with a smaller number of options comes across this post.
Can you make any changes to the DB schema at all? If you can add a calculated column that holds an integer representation of the binary number you have in your varchar then you can use bitwise logic to select what you are talking about entirely in the DB and pretty quickly.
Here is an example of what I am talking about:
with temp as
(
select 1 as BranchNumber -- 1
union
select 2 -- 01
union
select 5 -- 101
union
select 7 -- 111
union
select 15 as number -- 111
)
--Select users that belong to branch 2
SELECT * from temp
where (BranchNumber & 2) = 2
--returns 2,7,15
--Select users that belong to at least branches 1,2 and 3
SELECT * from temp
where (BranchNumber & 7) = 7
--returns 7,15
To convert binary to a number you would probably have to create a UDF that you could call to populate the new column. I did a little poking around and found this which appears to be a good starting point. I have not tested it yet, so be careful:
SET NOCOUNT ON
CREATE TABLE #nums (pos bigint)
DECLARE #cntr int
SET #cntr = 0
WHILE #cntr < 63 BEGIN
INSERT INTO #nums VALUES (#cntr)
SET #cntr = #cntr + 1
END
DECLARE #binstring varchar(63)
SET #binstring = '10000010000000001000011000000000'
SELECT
IntegerVal =
sum(power(convert(bigint,2),pos)
* substring(reverse(#binstring),pos+1,1)) -- yeah, implicit conversion
FROM #nums
DROP TABLE #nums

The bitwise query would probably outperform string functions though the 1000 bit integer doesn't exist. However as it can be derived from the string you can choose to break it into a particular number of integer sets and query across them. You will simply need to understand the significant bits for each column and adjust the input by a particular constant appropriately or just whack it into a set of bit representing strings and convert to int.

Related

Varbinary search with min and max pattern type

So I'm trying to make a query which goes through varbinary data. The issue is that I can't really finish what I'm trying to achieve. What you should know about the column is varbinary(50) and the patterns that occur have no specific order in writing, meaning every prefix could be anywhere as long it has 3 bytes(0x000000) First one is the prefix second and third are value data that I'm looking to check if its within the range i like. All the data is written like this.
What I've tried:
DECLARE #t TABLE (
val VARBINARY(MAX)
)
INSERT INTO #t SELECT 0x00000100000000000000000000000000000000000000000000000000
INSERT INTO #t SELECT 0x00001000000000000000000000000000000000000000000000000000
INSERT INTO #t SELECT 0x00010000000000000000000000000000000000000000000000000000
INSERT INTO #t SELECT 0x00100000000000000000000000000000000000000000000000000000
INSERT INTO #t SELECT 0x00000f00000000000000000000000000000000000000000000000000
declare #pattern varbinary(max)
declare #pattern2 varbinary(max)
set #pattern = 0x0001
set #pattern2 = #pattern+0xFF
select #pattern,#pattern2
SELECT
*
FROM #t
WHERE val<#pattern
OR val>#pattern2
This was total bust the patterns were accurate up to 2 symbols if I were to use 4 symbols as pattern it would work only if the pattern is in predefined position. I've tried combination of this and everything below.
WHERE CONVERT(varbinary(2), val) = 0xdata
also this:
select *
from table
where CONVERT(varchar(max),val,2) like '%data%'
Which works great for searching exact patterns, but not for ranges, I need some combination of both.
I'm aware I could technically add every possible outcome 1 by 1 and let it cycle through all the listed possibilities, but there has to be a smarter way.
Goals:
Locating the prefix(first binary data pair)
Defining a max value after the prefix, everything above that threshold to be listed in the results. Let's say '26' is the prefix, the highest allowed number after is '9600' or '269600'. Basically any data that exceeds this pattern '269600' should be detected example '269700'.
or query result would post this:
select * from table where CONVERT(varchar(max),attr,2) like
'%269700%'
I need something that would detect this on its own while i just give it start and end to look in between like the highest number variation would be '26ffff', but limiting it to something like 'ff00' is acceptable for what I'm looking for.
My best guess is 2 defined numbers, 1 being the allowed max range
and 2nd for a cap, so it doesn't go through every possible outcome.
But I would be happy to whatever works.
I'm aware this explanation is pretty dire, but bear with me, thanks.
*Update after the last suggestion
SELECT MIN(val), MAX(val) FROM #t where CONVERT(varchar(max),val,2) like '%26%'
This is pretty close, but its not sufficient i need to cycle through alot of data and use it after this would select only min or max even with the prefix filter. I believe i need min and max defined as a start and end range where the query should look for.
**Update2
I'm afraid you would end up disappointed, its nothing that interesting.
The data origin is related to a game server which stores the data like this. There's the predefined prefixes which are the stat type and the rest of the data is the actual numeric value of the stat. The data is represented by 6 characters data intervals. Here is a sample of the data stream. Its always 6-6-6-6-6 as long there's space to record the data on since its capped at 50 characters.
0x0329000414000B14000C14000D0F00177800224600467800473C00550F00000000000000000000000000
**Update3
Yes the groups are always in 3byte fashion, yes my idea was exactly that use the first byte to narrow down the search and use then use the second 2 bytes to filter it. I just don't know how to pull it off in an effective way. I'm not sure if i understood what u meant by predictively aligned assuming you meant if stat/prefix/header would always end up at the same binary location, if that's correct the answer is no. If the 3byte pattern is violated the data becomes unreadable meaning even if you don't need the extra byte you have to count it otherwise the data breaks example of a working data.
0x032900'041400'
example of a broken data:
0x0329'041400'
The only issue i could think is when the prefix and part of the value are both true example:
0x262600
Unless the query is specifically ordered to read the data in 3byte sequence meaning it knows that the first byte is always a prefix and the other 2 bytes are value.
Q:Can that be used as an alignment indicator so that the first non-zero byte after at least 3 zero bytes indicates the start of a group?
A:Yes, but that's unlikely I mean it although possible it would be written in order like:
0x260000'270000'
It wouldn't skip forward an entire 3byte group filled with no data. This type of entry would occur if someone were to manually insert it to the db, the server doesn't make records with gaps like those as far I'm aware:
0x260000'000000'270000'
To address your last comment that's something I don't know how to express it in a working query, except for the boneheaded version which would be me manually adding every possible number within my desired range with +1bit after that number. As you can imagine the query would look terrible. That's why I'm looking for a smarter solution that I cannot figure out how to do so by my self.
select * from #t
where (CONVERT(varchar(max),val,2) like '%262100%' or
CONVERT(varchar(max),attr,2) like '%262200%' or
etc...)
This may be a partial answer from which you can build.
The following will split the input data up into 3-byte (6 hex character) groups. It then extracts the first byte as the key, and several representations of the remaining two bytes as values.
SELECT S.*, P.*
FROM #t T
CROSS APPLY (
SELECT
N.Offset,
SUBSTRING(T.val, N.Offset + 1, 3) AS Segment
FROM (
VALUES
(0), (3), (6), (9), (12), (15), (18), (21), (24), (27),
(30), (33), (36), (39)
) N(Offset)
WHERE N.Offset < LEN(T.val) - 3
) S
CROSS APPLY(
SELECT
CONVERT(TINYINT, SUBSTRING(S.Segment, 1, 1)) AS [Key],
CONVERT(TINYINT, SUBSTRING(S.Segment, 2, 1)) AS [Value1],
CONVERT(TINYINT, SUBSTRING(S.Segment, 3, 1)) AS [Value2],
CONVERT(SMALLINT, SUBSTRING(S.Segment, 2, 2)) AS [Value12],
CONVERT(SMALLINT, SUBSTRING(S.Segment, 3, 1) + SUBSTRING(S.Segment, 2, 1)) AS [Value21]
) P
Given the following input data
0x0329000414000B14000C14000D0F00177800224600467800473C00550F00000000000000000000000000
--^-----^-----^-----^-----^-----^-----^-----^-----^-----^-----^-----^-----^-----^-----
The following results are extracted:
Offset
Segment
Key
Value1
Value2
Value12
Value21
0
0x032900
3
41
0
10496
41
3
0x041400
4
20
0
5120
20
6
0x0B1400
11
20
0
5120
20
9
0x0C1400
12
20
0
5120
20
12
0x0D0F00
13
15
0
3840
15
15
0x177800
23
120
0
30720
120
18
0x224600
34
70
0
17920
70
21
0x467800
70
120
0
30720
120
24
0x473C00
71
60
0
15360
60
27
0x550F00
85
15
0
3840
15
30
0x000000
0
0
0
0
0
33
0x000000
0
0
0
0
0
36
0x000000
0
0
0
0
0
See this db<>fiddle.
DECLARE #YourTable table
(
Id INT PRIMARY KEY,
Val VARBINARY(50)
)
INSERT #YourTable
VALUES (1, 0x0329000414000B14000C14000D0F00177800224600467800473C00550F00000000000000000000000000),
(2, 0x0329002637000B14000C14000D0F00177800224600467800473C00550F00000000000000000000000000);
SELECT Id, Triplet
FROM #YourTable T
CROSS APPLY GENERATE_SERIES(1,DATALENGTH(T.Val),3) s
CROSS APPLY (VALUES (SUBSTRING(T.Val, s.value, 3))) V(Triplet)
WHERE Triplet BETWEEN 0x263700 AND 0x2637FF
This works only with '22 sql server because of 'generate_series'
DECLARE #YourTable table
(
Id INT PRIMARY KEY,
Val VARBINARY(50)
)
INSERT #YourTable
VALUES (1, 0x0329000414000B14000C14000D0F00177800224600467800473C00550F00000000000000000000000000),
(2, 0x0329002637000B14000C14000D0F00177800224600467800473C00550F00000000000000000000000000);
SELECT Id, Triplet
FROM #YourTable T
JOIN (VALUES (1),(4),(7),(10),(13),(16),(19),(22),(25),(28),(31),(34),(37),(40),(43),(46),(49)) Nums(Num) ON Num <= DATALENGTH(T.Val)
CROSS APPLY (VALUES (SUBSTRING(T.Val, Num, 3))) V(Triplet)
WHERE Triplet BETWEEN 0x263700 AND 0x2637FF
This one works on older versions without "generate_series"
The credit is to #Martin Smith from stackexchange
https://dba.stackexchange.com/questions/323235/varbinary-pattern-search

How to see if a sub value is in masked value

I came across some code that stores multiple checkbox values into 1 value in the database.
This works by assigning each value a base 2^n value, and adding all the selected values and storing that result.
I am wondering how to query for records in the database that have a specific value inside the value that is stored in the db.
Example
Checkbox 1 = 2
Checkbox 2 = 4
Checkbox 3 = 8
Checkbox 4 = 16
Lets say checkboxes 1 and 3 are selected. We would store 10 in the database.
How would I query for records the have checkbox 1(value 2) selected in the resulting value that is stored in the database(10)?
This is sql server.
You could use bitwise AND:
SELECT *
FROM tab
WHERE answer & 10 = 10
Similar approach used on ##OPTIONS:
The ##OPTIONS function returns a bitmap of the options, converted to a base 10 (decimal) integer.
To decode the ##OPTIONS value, convert the integer returned by ##OPTIONS to binary, and then look up the values on the table at Configure the user options Server Configuration Option. For example, if SELECT ##OPTIONS; returns the value 5496, use the Windows programmer calculator (calc.exe) to convert decimal 5496 to binary. The result is 1010101111000. The right most characters (binary 1, 2, and 4) are 0, indicating that the first three items in the table are off.
Checking ANSI_NULLS
To view the current setting for this setting, run the following query:
DECLARE #ANSI_NULLS VARCHAR(3) = 'OFF';
IF ( (32 & ##OPTIONS) = 32 ) SET #ANSI_NULLS = 'ON';
SELECT #ANSI_NULLS AS ANSI_NULLS;
you can use bitwise operation which is &
let take 12 as example. The bitwise equivalent of 12 is 1100. if you would like to get whether the checkbox 3 is selected you need to compare with 8 (equivalent 1000)
1100
1000
-----
1000 => 8
So your final query would be
SELECT * from Table1 WHERE answer & 8 = 8

Check last bit in a hex in SQL

I have a SQL entry that is of type hex (varbinary) and I want to do a SELECT COUNT for all the entries that have this hex value ending in 1.
I was thinking about using CONVERT to make my hex into a char and then use WHERE my_string LIKE "%1". The thing is that varchar is capped at 8000 chars, and my hex is longer than that.
What options do I have?
Varbinary actually works with some string manipulation functions, most notably substring. So you can use eg.:
select substring(yourBinary, 1, 1);
To get the first byte of your binary column. To get the last bit then, you can use this:
select substring(yourBinary, len(yourBinary), 1) & 1;
This will give you zero if the bit is off, or one if it is on.
However, if you really only have to check at most the last 4-8 bytes, you can easily use the bitwise operators on the column directly:
select yourBinary & 1;
As a final note, this is going to be rather slow. So if you plan on doing this often, on large amounts of data, it might be better to simply create another bit column just for that, which you can index. If you're talking about at most a thousand rows or so, or if you don't care about speed, fire away :)
Check last four bits = 0001
SELECT SUM(CASE WHEN MyColumn % 16 IN (-15,1) THEN 1 END) FROM MyTable
Check last bit = 1
SELECT SUM(CASE WHEN MyColumn % 2 IN (-1,1) THEN 1 END) FROM MyTable
If you are wondering why you have to check for negative moduli, try SELECT 0x80000001 % 16
Try using this where
WHERE LEFT(my_string,1) = 1
It it's text values ending in 1 then you want the Right as opposed to the Left
WHERE RIGHT(my_string,1) = 1

REAL column holding values outside documented range

According to MSDN, the range for REAL values is - 3.40E + 38 to -1.18E - 38, 0 and 1.18E - 38 to 3.40E + 38. However, I have quite a few values beyond that range in my table.
The following query returns lots of very small values and no very large ones:
SELECT MyColumn ,
*
FROM data.MyTable
WHERE MyColumn <> 0
AND ( MyColumn < CONVERT(REAL, 1.18E-38)
OR MyColumn > CONVERT(REAL, 3.40E+38)
)
AND ( MyColumn < CONVERT(REAL, -3.40E+38)
OR MyColumn > CONVERT(REAL, -1.18E-38)
)
It is easy to show how these values end up in the table. I cannot insert them directly:
CREATE TABLE a(r REAL NULL);
GO
INSERT INTO a(r) VALUES(4.330473E-39);
GO
SELECT r FROM a
GO
DROP TABLE a;
----
0.0
But I can divide two columns and get and outside of range value:
CREATE TABLE a
(
r1 REAL NULL ,
r2 REAL NULL ,
r3 REAL NULL
) ;
GO
INSERT INTO a
( r1, r2 )
VALUES ( 4.330473E-38, 1000 ) ;
GO
UPDATE a
SET r3 = r1 / r2 ;
SELECT r1 ,
r2 ,
r3
FROM a
r1 r2 r3
------------- ------------- -------------
4.330473E-38 1000 4.330433E-41
So I guess MSDN gives wrong ranges of valid data, correct?
Am I missing anything?
Several people suggested that this is a bug.
What part of this behavior exactly is a bug. is it:
Wrong constants documented in MSDN and used in DBCC, as well as wrong threshold for rounding down.
Update being able to save wrong values
Books Online documents only the normal range for single- and double-precision floating point numbers. The IEEE 754 rules also specify floating-point numbers closer to zero than the smallest non-zero normal value, known variously as denormalized, denormal, and subnormal numbers. From that last link:
Denormal numbers provide the guarantee that addition and subtraction of floating-point numbers never underflows; two nearby floating-point
numbers always have a representable non-zero difference. Without
gradual underflow, the subtraction a−b can underflow and produce zero
even though the values are not equal. This can, in turn, lead to
division by zero errors that cannot occur when gradual underflow is
used.
SQL Server is following the rules for single-precision floating point calculations in the examples posted. The bug may be that DBCC checks only for normal values, and throws an incorrect error message when it encounters a stored denormal value.
Example producing a denormal single-precision value:
DECLARE
#v1 real = 14e-39,
#v2 real = 1e+07;
-- 1.4013e-045
SELECT #v1 / #v2;
Example showing a stored float denormal passes DBCC checks:
CREATE TABLE dbo.b (v1 float PRIMARY KEY);
INSERT b VALUES (POWER(2e0, -1075));
SELECT v1 FROM b; -- 4.94065645841247E-324
DBCC CHECKTABLE(b) WITH DATA_PURITY; -- No errors or warnings
DROP TABLE dbo.b;
This is a bug in SQL Server. The last script you post is a nice repro. Add one line to it at the end:
DBCC CHECKDB WITH data_purity
This fails with:
Msg 2570, Level 16, State 3, Line 1 Page (1:313), slot 0 in object ID
357576312, index ID 0, partition ID 1801439851932155904, alloc unit ID
2017612634169999360 (type "In-row data"). Column "r3" value is out of
range for data type "real". Update column to a legal value.
This proves it is a bug. I suggest you file a bug with Microsoft Connect for SQL Server.

How do I generate a random number for each row in a T-SQL select?

I need a different random number for each row in my table. The following seemingly obvious code uses the same random value for each row.
SELECT table_name, RAND() magic_number
FROM information_schema.tables
I'd like to get an INT or a FLOAT out of this. The rest of the story is I'm going to use this random number to create a random date offset from a known date, e.g. 1-14 days offset from a start date.
This is for Microsoft SQL Server 2000.
Take a look at SQL Server - Set based random numbers which has a very detailed explanation.
To summarize, the following code generates a random number between 0 and 13 inclusive with a uniform distribution:
ABS(CHECKSUM(NewId())) % 14
To change your range, just change the number at the end of the expression. Be extra careful if you need a range that includes both positive and negative numbers. If you do it wrong, it's possible to double-count the number 0.
A small warning for the math nuts in the room: there is a very slight bias in this code. CHECKSUM() results in numbers that are uniform across the entire range of the sql Int datatype, or at least as near so as my (the editor) testing can show. However, there will be some bias when CHECKSUM() produces a number at the very top end of that range. Any time you get a number between the maximum possible integer and the last exact multiple of the size of your desired range (14 in this case) before that maximum integer, those results are favored over the remaining portion of your range that cannot be produced from that last multiple of 14.
As an example, imagine the entire range of the Int type is only 19. 19 is the largest possible integer you can hold. When CHECKSUM() results in 14-19, these correspond to results 0-5. Those numbers would be heavily favored over 6-13, because CHECKSUM() is twice as likely to generate them. It's easier to demonstrate this visually. Below is the entire possible set of results for our imaginary integer range:
Checksum Integer: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
Range Result: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 0 1 2 3 4 5
You can see here that there are more chances to produce some numbers than others: bias. Thankfully, the actual range of the Int type is much larger... so much so that in most cases the bias is nearly undetectable. However, it is something to be aware of if you ever find yourself doing this for serious security code.
When called multiple times in a single batch, rand() returns the same number.
I'd suggest using convert(varbinary,newid()) as the seed argument:
SELECT table_name, 1.0 + floor(14 * RAND(convert(varbinary, newid()))) magic_number
FROM information_schema.tables
newid() is guaranteed to return a different value each time it's called, even within the same batch, so using it as a seed will prompt rand() to give a different value each time.
Edited to get a random whole number from 1 to 14.
RAND(CHECKSUM(NEWID()))
The above will generate a (pseudo-) random number between 0 and 1, exclusive. If used in a select, because the seed value changes for each row, it will generate a new random number for each row (it is not guaranteed to generate a unique number per row however).
Example when combined with an upper limit of 10 (produces numbers 1 - 10):
CAST(RAND(CHECKSUM(NEWID())) * 10 as INT) + 1
Transact-SQL Documentation:
CAST(): https://learn.microsoft.com/en-us/sql/t-sql/functions/cast-and-convert-transact-sql
RAND(): http://msdn.microsoft.com/en-us/library/ms177610.aspx
CHECKSUM(): http://msdn.microsoft.com/en-us/library/ms189788.aspx
NEWID(): https://learn.microsoft.com/en-us/sql/t-sql/functions/newid-transact-sql
Random number generation between 1000 and 9999 inclusive:
FLOOR(RAND(CHECKSUM(NEWID()))*(9999-1000+1)+1000)
"+1" - to include upper bound values(9999 for previous example)
Answering the old question, but this answer has not been provided previously, and hopefully this will be useful for someone finding this results through a search engine.
With SQL Server 2008, a new function has been introduced, CRYPT_GEN_RANDOM(8), which uses CryptoAPI to produce a cryptographically strong random number, returned as VARBINARY(8000). Here's the documentation page: https://learn.microsoft.com/en-us/sql/t-sql/functions/crypt-gen-random-transact-sql
So to get a random number, you can simply call the function and cast it to the necessary type:
select CAST(CRYPT_GEN_RANDOM(8) AS bigint)
or to get a float between -1 and +1, you could do something like this:
select CAST(CRYPT_GEN_RANDOM(8) AS bigint) % 1000000000 / 1000000000.0
The Rand() function will generate the same random number, if used in a table SELECT query. Same applies if you use a seed to the Rand function. An alternative way to do it, is using this:
SELECT ABS(CAST(CAST(NEWID() AS VARBINARY) AS INT)) AS [RandomNumber]
Got the information from here, which explains the problem very well.
Do you have an integer value in each row that you could pass as a seed to the RAND function?
To get an integer between 1 and 14 I believe this would work:
FLOOR( RAND(<yourseed>) * 14) + 1
If you need to preserve your seed so that it generates the "same" random data every time, you can do the following:
1. Create a view that returns select rand()
if object_id('cr_sample_randView') is not null
begin
drop view cr_sample_randView
end
go
create view cr_sample_randView
as
select rand() as random_number
go
2. Create a UDF that selects the value from the view.
if object_id('cr_sample_fnPerRowRand') is not null
begin
drop function cr_sample_fnPerRowRand
end
go
create function cr_sample_fnPerRowRand()
returns float
as
begin
declare #returnValue float
select #returnValue = random_number from cr_sample_randView
return #returnValue
end
go
3. Before selecting your data, seed the rand() function, and then use the UDF in your select statement.
select rand(200); -- see the rand() function
with cte(id) as
(select row_number() over(order by object_id) from sys.all_objects)
select
id,
dbo.cr_sample_fnPerRowRand()
from cte
where id <= 1000 -- limit the results to 1000 random numbers
select round(rand(checksum(newid()))*(10)+20,2)
Here the random number will come in between 20 and 30.
round will give two decimal place maximum.
If you want negative numbers you can do it with
select round(rand(checksum(newid()))*(10)-60,2)
Then the min value will be -60 and max will be -50.
try using a seed value in the RAND(seedInt). RAND() will only execute once per statement that is why you see the same number each time.
If you don't need it to be an integer, but any random unique identifier, you can use newid()
SELECT table_name, newid() magic_number
FROM information_schema.tables
You would need to call RAND() for each row. Here is a good example
https://web.archive.org/web/20090216200320/http://dotnet.org.za/calmyourself/archive/2007/04/13/sql-rand-trap-same-value-per-row.aspx
The problem I sometimes have with the selected "Answer" is that the distribution isn't always even. If you need a very even distribution of random 1 - 14 among lots of rows, you can do something like this (my database has 511 tables, so this works. If you have less rows than you do random number span, this does not work well):
SELECT table_name, ntile(14) over(order by newId()) randomNumber
FROM information_schema.tables
This kind of does the opposite of normal random solutions in the sense that it keeps the numbers sequenced and randomizes the other column.
Remember, I have 511 tables in my database (which is pertinent only b/c we're selecting from the information_schema). If I take the previous query and put it into a temp table #X, and then run this query on the resulting data:
select randomNumber, count(*) ct from #X
group by randomNumber
I get this result, showing me that my random number is VERY evenly distributed among the many rows:
It's as easy as:
DECLARE #rv FLOAT;
SELECT #rv = rand();
And this will put a random number between 0-99 into a table:
CREATE TABLE R
(
Number int
)
DECLARE #rv FLOAT;
SELECT #rv = rand();
INSERT INTO dbo.R
(Number)
values((#rv * 100));
SELECT * FROM R
select ABS(CAST(CAST(NEWID() AS VARBINARY) AS INT)) as [Randomizer]
has always worked for me
Use newid()
select newid()
or possibly this
select binary_checksum(newid())
If you want to generate a random number between 1 and 14 inclusive.
SELECT CONVERT(int, RAND() * (14 - 1) + 1)
OR
SELECT ABS(CHECKSUM(NewId())) % (14 -1) + 1
DROP VIEW IF EXISTS vwGetNewNumber;
GO
Create View vwGetNewNumber
as
Select CAST(RAND(CHECKSUM(NEWID())) * 62 as INT) + 1 as NextID,
'abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ'as alpha_num;
---------------CTDE_GENERATE_PUBLIC_KEY -----------------
DROP FUNCTION IF EXISTS CTDE_GENERATE_PUBLIC_KEY;
GO
create function CTDE_GENERATE_PUBLIC_KEY()
RETURNS NVARCHAR(32)
AS
BEGIN
DECLARE #private_key NVARCHAR(32);
set #private_key = dbo.CTDE_GENERATE_32_BIT_KEY();
return #private_key;
END;
go
---------------CTDE_GENERATE_32_BIT_KEY -----------------
DROP FUNCTION IF EXISTS CTDE_GENERATE_32_BIT_KEY;
GO
CREATE function CTDE_GENERATE_32_BIT_KEY()
RETURNS NVARCHAR(32)
AS
BEGIN
DECLARE #public_key NVARCHAR(32);
DECLARE #alpha_num NVARCHAR(62);
DECLARE #start_index INT = 0;
DECLARE #i INT = 0;
select top 1 #alpha_num = alpha_num from vwGetNewNumber;
WHILE #i < 32
BEGIN
select top 1 #start_index = NextID from vwGetNewNumber;
set #public_key = concat (substring(#alpha_num,#start_index,1),#public_key);
set #i = #i + 1;
END;
return #public_key;
END;
select dbo.CTDE_GENERATE_PUBLIC_KEY() public_key;
Update my_table set my_field = CEILING((RAND(CAST(NEWID() AS varbinary)) * 10))
Number between 1 and 10.
Try this:
SELECT RAND(convert(varbinary, newid()))*(b-a)+a magic_number
Where a is the lower number and b is the upper number
If you need a specific number of random number you can use recursive CTE:
;WITH A AS (
SELECT 1 X, RAND() R
UNION ALL
SELECT X + 1, RAND(R*100000) --Change the seed
FROM A
WHERE X < 1000 --How many random numbers you need
)
SELECT
X
, RAND_BETWEEN_1_AND_14 = FLOOR(R * 14 + 1)
FROM A
OPTION (MAXRECURSION 0) --If you need more than 100 numbers

Resources