I was looking into the int rand() function from <stdlib.h> in C11 when I stumbled upon the following cppreference example for rolling a six-sided die.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
int main(void)
{
    srand(time(NULL)); // use current time as seed for random generator
    int random_variable = rand();
    printf("Random value on [0,%d]: %d\n", RAND_MAX, random_variable);
    // roll a 6-sided die 20 times
    for (int n=0; n != 20; ++n) {
        int x = 7;
        while(x > 6)
            x = 1 + rand()/((RAND_MAX + 1u)/6); // Note: 1+rand()%6 is biased
        printf("%d ", x);
    }
}
Specifically this part:
[...]
while(x > 6)
x = 1 + rand()/((RAND_MAX + 1u)/6); // Note: 1+rand()%6 is biased
[...]
Questions:
Why the addition of + 1u? Since rand() is [0,RAND_MAX] I'm guessing
that doing rand()/(RAND_MAX/6) -> [0,RAND_MAX/(RAND_MAX/6)] -> [0,6]? And
since it's integer division (LARGE/(LARGE+small)) < 1 -> 0, adding 1u gives it the required range of [0,5]?
Building on the previous question, assuming [0,5], 1 + (rand()/((RAND_MAX+1u)/6)) should only go through [1,6] and never trigger a second loop?
Been poking around to see if rand() has returned float at some point, but
that seems like a pretty huge breakage towards old code? I guess the check
makes sense if you add 1.0f instead of 1u making it a floating point
division?
Trying to wrap my head around this, have a feeling that I might be missing
something..
(P.s. This is not a basis for anything security critical, I'm just exploring
the standard library. D.s)
The code avoids bias by ensuring each possible result in [1, 6] is the output from exactly the same number of return values from rand.
By definition, rand returns int values from 0 to RAND_MAX. So there are 1+RAND_MAX possible values it can return. If 1+RAND_MAX is not a multiple of 6, then it is impossible to partition it into 6 exactly equal intervals of integers. So the code partitions it into 6 equal intervals that are as big as possible and one odd-size fragment interval. Then the results of rand are mapped into these intervals: The first six intervals correspond to results from 1 to 6, and the last interval is rejected, and the code tries again.
When we divide 1+RAND_MAX by 6, there is some quotient q and some remainder r. Now consider the result of rand() / q:
When rand produces a number in [0, q−1], rand() / q will be 0.
When rand produces a number in [q, 2q−1], rand() / q will be 1.
When rand produces a number in [2q, 3q−1], rand() / q will be 2.
When rand produces a number in [3q, 4q−1], rand() / q will be 3.
When rand produces a number in [4q, 5q−1], rand() / q will be 4.
When rand produces a number in [5q, 6q−1], rand() / q will be 5.
When rand produces a number that is 6q or greater, rand() / q will be 6.
Observe that in each of the first six intervals, there are exactly q numbers. In the seventh interval, the possible return values are in [6q, RAND_MAX]. That interval contains r numbers.
This code works by rejecting that last interval:
int x = 7;
while(x > 6)
    x = 1 + rand()/((RAND_MAX + 1u)/6);
Whenever rand produces a number in that last fragmentary interval, this code rejects it and tries again. When rand produces a number in one of the whole intervals, this code accepts it and exits (after adding 1 so the results in x are 1 to 6 instead of 0 to 5).
Thus, every output from 1 to 6, inclusive, is mapped to from an exactly equal number of rand values.
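To make the partition concrete, here is a minimal sketch that enumerates every possible return value for an assumed RAND_MAX of 32767 (the smallest value the C standard permits) and counts how many land in each bucket. The names SIM_RAND_MAX and count_buckets are made up for this illustration.

```c
/* Assumption for illustration: pretend RAND_MAX is 32767. */
#define SIM_RAND_MAX 32767u

/* counts[k] receives how many values v in [0, SIM_RAND_MAX] give v / q == k. */
static void count_buckets(unsigned counts[7])
{
    unsigned q = (SIM_RAND_MAX + 1u) / 6;   /* size of each whole interval */

    for (int k = 0; k < 7; k++)
        counts[k] = 0;
    for (unsigned v = 0; v <= SIM_RAND_MAX; v++)
        counts[v / q]++;                    /* bucket 6 is the rejected fragment */
}
```

With these numbers q is 5461 and r is 2, so buckets 0 to 5 each receive 5461 values, while bucket 6 receives only 32766 and 32767.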
This is the best way to produce a uniform distribution from rand in the sense that it has the fewest rejections, given we are using a scheme like this (see footnote 1). The range of rand has been split into six intervals that are as big as possible. The remaining fragmentary interval cannot be used because the remainder r is less than six, so the r unused values cannot be split evenly over the six desired values for x.
Footnote
1 This is not necessarily the best way to use rand to generate random numbers in [1, 6] overall. For example, from a single rand call with RAND_MAX equal to 32767, we could view the value as a base-six numeral from 000000 to 411411. If it is under 400000, we can take the last five digits, which are each uniformly distributed in [0, 5], and adding one gets us the desired [1, 6]. If it is in [400000, 410000), we can use the last four digits. If it is in [410000, 411000), we can use the last three, and so on. Additionally, the otherwise discarded information, such as the leading digit, might be pooled over multiple rand calls to increase the average number of outputs we get per call to rand.
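The base-six idea in the footnote could be sketched roughly as follows. This assumes the input is uniform on [0, 32767]; the helper name dice_from_value and the constant SIM_RANGE are hypothetical, and the tier boundaries are computed rather than hard-coded.

```c
/* Assumption for illustration: the input is uniform on [0, 32767]. */
#define SIM_RANGE 32768u   /* number of possible values, RAND_MAX + 1 */

/*
 * Hypothetical helper: turns one value v in [0, SIM_RANGE) into up to
 * five die rolls in [1, 6], stored in out[]. Returns how many rolls
 * were produced (0 to 5).
 */
static int dice_from_value(unsigned v, int out[5])
{
    unsigned base = 0;      /* first value of the current tier */
    unsigned span = 7776;   /* 6^5: values covered by five base-six digits */

    for (int digits = 5; digits > 0; digits--) {
        unsigned whole = (SIM_RANGE - base) / span;  /* complete blocks that fit */
        unsigned limit = base + whole * span;        /* end of this tier */
        if (v < limit) {
            unsigned w = (v - base) % span;          /* uniform on [0, span) */
            for (int i = 0; i < digits; i++) {
                out[i] = 1 + (int)(w % 6);           /* one base-six digit per roll */
                w /= 6;
            }
            return digits;
        }
        base = limit;
        span /= 6;
    }
    return 0;   /* the last few values carry no usable digit */
}
```

Values below 31104 (400000 in base six) yield five rolls, the next tier yields four, and so on down to the final two values, which yield none.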
I'm trying to generate 10,000 random numbers in a row in C and am having trouble getting random or even randomish results using the pseudo RNG. I used modulus in a way that I think should create uniformity, which it does, but the results are equivalent to 0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3 etc. when run in a loop in another function calling RNG(4).
int RNG(int n) {
    int range = RAND_MAX - (RAND_MAX % n);
    srand(time(NULL));
    int x = rand();
    while (x > range) {
        x = rand();
    }
    return x % n;
}
Any way to get it closer to 1,3,2,0,2,3,1,0,0,3,2,0,1 etc. would be appreciated!
Thank you!
EDIT: Thanks for the responses everyone! Moved the seeding to the start of the function calling RNG and everything is dandy now!
Do not call srand every time you want to generate a number. srand initializes the pseudo-random number generator and is intended to be called just once at the start of your program, or when you want to reset the generator. By resetting it every time, you are forcing rand to generate the same numbers every time you call it within each second on the clock.
Do not use x % n to reduce the number to a desired range. Old implementations of rand are notoriously bad and have patterns in the low bits. Instead, use x / ((RAND_MAX+1u) / n).
The code int range = RAND_MAX - (RAND_MAX % n); is flawed. Suppose n is 4 and RAND_MAX is 7, meaning rand returns 0 to 7. This code sets range to 4, and then while (x > range) x = rand(); discards 5, 6, and 7, while it retains 4. There are two bugs here: The code keeps the five values 0, 1, 2, 3, and 4, which is a mismatch to (not a multiple of) the desired range of 4, and it unnecessarily discards values. If we had kept 4, 5, 6, and 7, we would have a match. You could use:
unsigned range = (RAND_MAX + 1u) - ((RAND_MAX + 1u) % n);
and:
while (x >= range) x = rand();
If you are using C++, switch to using std::uniform_int_distribution. If you are using C, check the quality of rand in your implementation or switch to another generator such as the POSIX srandom and random.
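The first three points above can be combined into one function. This is only a sketch: the name rng_uniform is hypothetical, and it assumes 0 < n <= RAND_MAX and that srand has already been called once elsewhere in the program.

```c
#include <stdlib.h>

/*
 * Hypothetical replacement for RNG(n): returns a value in [0, n).
 * Assumes 0 < n <= RAND_MAX and that srand() was called once at startup.
 */
static int rng_uniform(int n)
{
    unsigned un = (unsigned)n;
    /* Largest multiple of n that fits in the RAND_MAX + 1 possible values. */
    unsigned range = (RAND_MAX + 1u) - ((RAND_MAX + 1u) % un);
    unsigned x;

    do {
        x = (unsigned)rand();
    } while (x >= range);            /* reject the leftover fragment */

    return (int)(x / (range / un));  /* reduce by division, not by x % n */
}
```

Because range is an exact multiple of n, every result in [0, n) is reached by the same number of rand values.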
As noted elsewhere, the fix to the repeated numbers is to move the call to srand(time(NULL)) outside this function and call it only once per program at the beginning.
As for why you're getting repeated numbers: The function is being called several times per second. Each time the function executes in a given second, time(NULL) returns the same number, which this code uses to seed the random number generator.
The sequence of random numbers generated from a particular seed will always be the same. This code takes the first number from that sequence, which is always the same for one second, until time(NULL) returns a new value.
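The effect of reseeding can be reproduced without waiting on the clock. In this small sketch the helper first_value_after_seed is a made-up name, and a fixed seed stands in for the value time(NULL) would return within one second.

```c
#include <stdlib.h>

/* Hypothetical helper: reseeds with the given seed and returns the first value. */
static int first_value_after_seed(unsigned seed)
{
    srand(seed);    /* restart the sequence from the beginning */
    return rand();  /* always the first number of that sequence */
}
```

Calling first_value_after_seed(42u) twice returns the same number both times, which is what happens to RNG() whenever it runs more than once in the same second.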
I have the following function that generates a random number between -10 and 2:
#include <stdlib.h>
#include <time.h>
static int
getRandomReturnCode(void)
{
    int N = 2,
        M = -10;
    int r;
    srand(time(NULL));
    r = M + rand() / (RAND_MAX / (N - M + 1) + 1);
    if (r == 0){
        return getRandomReturnCode();
    }
    return r;
}
Currently, if a return code of 0 (success) is returned, it will recursively call the function over until a non-zero return code is met and returned. What can I do to improve my code such that 0 is excluded from the range of randomly chosen numbers?
Whatever you do, don't redraw if a returned value is 0: that will introduce statistical bias into the result.
The best thing to do here is to draw between -10 and 1 inclusive and add 1 to any non-negative output.
Finally, call srand once else you'll ruin the generator's statistical properties.
(Briefly - since this comment is more for the mathematics site - and restricting the argument to a linear congruential generator, what tends to happen if you omit generated values is that you increase the variance of the resulting distribution so it's no longer uniform. A previously drawn small number will be linearly related to the subsequent drawing as the generator's modulus will have no effect. This autocorrelation - necessary for the resulting distribution to be uniform - of adjacent drawings will be curtailed if drawings are discarded and that increases the sample variance.)
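The suggestion of drawing from -10 to 1 and shifting the non-negative results might look like the sketch below. The function name get_nonzero_return_code is hypothetical, the scaling expression is the one from the question, and srand is assumed to have been called once at program start.

```c
#include <stdlib.h>

/*
 * Sketch of the suggestion above: draw from the 12 values -10..1, then
 * shift the non-negative ones up by one so that 0 can never come out.
 * Assumes srand() was called once at program start.
 */
static int get_nonzero_return_code(void)
{
    const int lo = -10, hi = 1;   /* one value fewer than -10..2 */
    int r = lo + rand() / (RAND_MAX / (hi - lo + 1) + 1);

    if (r >= 0)
        r += 1;                   /* 0 becomes 1, 1 becomes 2 */
    return r;
}
```

No value is ever redrawn: each of the 12 raw results maps to exactly one of the 12 permitted return codes.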
My problem this time is not using a line but understanding it.
I received this line from my teacher to randomize a number between the MIN and MAX values, and it works perfectly, but I have tried to understand exactly how and I just couldn't.
I would be happy if anyone could explain it to me step by step (please note I'm not 100% sure how the rand() function works).
Thanks!
int number = (rand() % (DICE_MAX - DICE_MIN +1)) + DICE_MIN; // Randomizing a value between 'DICE_MAX' and 'DICE_MIN' which can be defined on the head of this program.
The function rand() generates a random (well, pseudo-random to be precise) number. The int returned from it has a large range, so you need to scale it to necessary range.
Assuming DICE_MIN to be 1 and DICE_MAX to be 6, you need to generate random integers in the range [1, 6]. There are 6 numbers in the range, and DICE_MAX - DICE_MIN + 1 = 6. So whatever integer you get from rand() the value of rand() % (DICE_MAX - DICE_MIN + 1) will be in the range [0, 5]. Adding the minimum of the required range DICE_MIN to it shifts the range to [1, 6].
This is a very widely practiced technique for generating random numbers in a given range.
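Split into separate steps, the teacher's line does the following. This is just an illustrative sketch: the function name roll is made up, and DICE_MIN and DICE_MAX are given the values assumed in the explanation above.

```c
#include <stdlib.h>

#define DICE_MIN 1   /* assumed values, matching the example above */
#define DICE_MAX 6

/* Hypothetical wrapper around the teacher's line, split into its steps. */
static int roll(void)
{
    int width  = DICE_MAX - DICE_MIN + 1;  /* how many distinct results: 6 */
    int raw    = rand();                   /* somewhere in [0, RAND_MAX] */
    int scaled = raw % width;              /* remainder is always in [0, 5] */
    return scaled + DICE_MIN;              /* shift up to [1, 6] */
}
```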
rand:
Function: Random number generator.
Include: stdlib.h
syntax: int rand(void);
Return Value: The function rand returns the generated pseudo random number.
Description: The rand function generates an integer between 0 and RAND_MAX (a symbolic constant defined in stdlib.h). Standard C states that the value of RAND_MAX must be at least 32767. If rand truly produces integers at random, every number between 0 and RAND_MAX has an equal probability of being chosen each time rand is called.
How does it work?
Take the example of rolling a six-sided die. The remainder operator % is used here in conjunction with rand as:
rand() % 6;
to produce integers in the range 0 to 5. This is called scaling. The number 6 is called the scaling factor. But we need to generate numbers from 1 to 6, so we shift the range of numbers produced by adding 1 to our result (1 + rand() % 6).
In general
n = a + rand() % b;
where a is the shifting value (which is equal to the first number in the desired range of consecutive integers, i.e, to lower bound) and b is equal to the width of the desired range of consecutive integers.
In the snippet you provided
int number = (rand() % (DICE_MAX - DICE_MIN +1)) + DICE_MIN;
DICE_MAX - DICE_MIN +1 is desired width and DICE_MIN is the shifting value.
Further reading: Using rand().
Consider an algorithm to test the probability that a certain number is picked from a set of N unique numbers after a specific number of tries (for example, with N=2, what's the probability in Roulette (without 0) that it takes X tries for Black to win?).
The correct distribution for this is pow(1-1/N,X-1)*(1/N).
However, when I test this using the following code, there is always a deep ditch at X=31, independently from N, and independently from the seed.
Is this an intrinsic flaw that cannot be prevented due to the implementation specifics of the PRNG in use, is this a real bug, or am I overlooking something obvious?
// C
#include <sys/times.h>
#include <stdlib.h>
#include <math.h>
#include <stdio.h>
int array[101];
int main(void){
    int nsamples=10000000;
    double breakVal,diffVal;
    int i,cnt;
    // seed, but doesn't change anything
    struct tms time;
    srandom(times(&time));
    // sample
    for(i=0;i<nsamples;i++){
        cnt=1;
        do{
            if((random()%36)==0) // break if 0 is chosen
                break;
            cnt++;
        }while(cnt<100);
        array[cnt]++;
    }
    // show distribution
    for(i=1;i<100;i++){
        breakVal=array[i]/(double)nsamples; // normalize
        diffVal=breakVal-pow(1-1/36.,i-1)*1/36.; // difference to expected value
        printf("%d %.12g %.12g\n",i,breakVal,diffVal);
    }
    return 0;
}
Tested on an up-to-date Xubuntu 12.10 with libc6 package 2.15-0ubuntu20 and Intel Core i5-2500 SandyBridge, but I discovered this already a few years ago on an older Ubuntu machine.
I also tested this on Windows 7 using Unity3D/Mono (not sure which Mono version, though), and here the ditch happens at X=55 when using System.Random, while Unity's builtin Unity.Random has no visible ditch (at least not for X<100).
[Figure: plot of the measured distribution]
[Figure: plot of the differences from the expected values]
This is due to glibc's random() function not being random enough. According to this page, for the random numbers returned by random(), we have:
o_i = (o_{i-3} + o_{i-31}) % 2^31
or:
o_i = (o_{i-3} + o_{i-31} + 1) % 2^31.
Now take x_i = o_i % 36, and suppose the first equation above is the one used (this happens with a 50% chance for each number). Now if x_{i-31} = 0 and x_{i-3} != 0, then the chance that x_i = 0 is less than 1/36. This is because 50% of the time o_{i-31} + o_{i-3} will be less than 2^31, and when that happens,
x_i = o_i % 36 = (o_{i-3} + o_{i-31}) % 36 = o_{i-3} % 36 = x_{i-3},
which is nonzero. This causes the ditch you see 31 samples after a 0 sample.
What's being measured in this experiment is the interval between successful trials of a Bernoulli experiment, where success is defined as random() mod k == 0 for some k (36 in the OP). Unfortunately, it is marred by the fact that the implementation of random() means that the Bernoulli trials are not statistically independent.
We'll write rnd_i for the i-th output of random() and we note that:
rnd_i = rnd_{i-31} + rnd_{i-3} with probability 0.75
rnd_i = rnd_{i-31} + rnd_{i-3} + 1 with probability 0.25
(See below for a proof outline.)
Let's suppose rnd_{i-31} mod k == 0 and we're currently looking at rnd_i. Then it must be the case that rnd_{i-3} mod k ≠ 0, because otherwise we would have counted the cycle as being of length 28 (that is, 31 - 3).
But (most of the time) (mod k): rnd_i = rnd_{i-31} + rnd_{i-3} = rnd_{i-3} ≠ 0.
So the current trial is not statistically independent of the previous trials, and the 31st trial after a success is much less likely to succeed than it would in an unbiased series of Bernoulli trials.
The usual advice in using linear-congruential generators, which doesn't actually apply to the random() algorithm, is to use the high-order bits instead of the low-order bits, because high-order bits are "more random" (that is, less correlated with successive values). But that won't work in this case either, because the above identities hold equally well for the function high log k bits as for the function mod k == low log k bits.
In fact, we might expect a linear-congruential generator to work better, particularly if we use the high-order bits of the output, because although the LCG is not particularly good at Monte Carlo simulations, it does not suffer from the linear feedback of random().
random algorithm, for the default case:
Let state be a vector of unsigned longs. Initialize state_0 ... state_30 using a seed, some fixed values, and a mixing algorithm. For simplicity, we can consider the state vector to be infinite, although only the last 31 values are used so it's actually implemented as a ring buffer.
To generate rnd_i: (Note: ⊕ is addition mod 2^32.)
state_i = state_{i-31} ⊕ state_{i-3}
rnd_i = (state_i - (state_i mod 2)) / 2
Now, note that:
(i + j) mod 2 = i mod 2 + j mod 2 if i mod 2 == 0 or j mod 2 == 0
(i + j) mod 2 = i mod 2 + j mod 2 - 2 if i mod 2 == 1 and j mod 2 == 1
If i and j are uniformly distributed, the first case will occur 75% of the time, and the second case 25%.
So, by substitution in the generation formula:
rnd_i = (state_{i-31} ⊕ state_{i-3} - ((state_{i-31} + state_{i-3}) mod 2)) / 2
= ((state_{i-31} - (state_{i-31} mod 2)) ⊕ (state_{i-3} - (state_{i-3} mod 2))) / 2 or
= ((state_{i-31} - (state_{i-31} mod 2)) ⊕ (state_{i-3} - (state_{i-3} mod 2)) + 2) / 2
The two cases can be further reduced to:
rnd_i = rnd_{i-31} ⊕ rnd_{i-3}
rnd_i = rnd_{i-31} ⊕ rnd_{i-3} + 1
As above, the first case occurs 75% of the time, assuming that rnd_{i-31} and rnd_{i-3} are independently drawn from a uniform distribution (which they're not, but it's a reasonable first approximation).
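The recurrence can be checked with a small simulation. The sketch below follows the description above (lags of 31 and 3, addition mod 2^32, low bit dropped), but the seeding loop is a made-up placeholder and is not glibc's actual initialization; all names are hypothetical.

```c
#include <stdint.h>

#define SIM_LEN 2000   /* how many outputs to simulate (arbitrary) */

static uint32_t sim_state[SIM_LEN];
static uint32_t sim_rnd[SIM_LEN];

/*
 * Fills sim_state/sim_rnd using the recurrence described above.
 * The seeding loop is a made-up placeholder, NOT glibc's real seeding.
 */
static void simulate(uint32_t seed)
{
    uint32_t x = seed;
    for (int i = 0; i < 31; i++) {
        x = x * 1664525u + 1013904223u;   /* placeholder mixing step */
        sim_state[i] = x;
        sim_rnd[i] = x >> 1;
    }
    for (int i = 31; i < SIM_LEN; i++) {
        sim_state[i] = sim_state[i - 31] + sim_state[i - 3]; /* wraps mod 2^32 */
        sim_rnd[i] = sim_state[i] >> 1;                      /* drop the low bit */
    }
}

/* Counts how often rnd_i equals rnd_{i-31} + rnd_{i-3} (mod 2^31) with no "+ 1". */
static int count_plain_case(void)
{
    int plain = 0;
    for (int i = 31; i < SIM_LEN; i++) {
        uint32_t sum = (sim_rnd[i - 31] + sim_rnd[i - 3]) & 0x7fffffffu;
        if (sim_rnd[i] == sum)
            plain++;
    }
    return plain;
}
```

Every simulated output matches one of the two cases; the fraction returned by count_plain_case can be compared against the 75% estimate, keeping in mind that the placeholder seeding may skew it.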
As others pointed out, random() is not random enough.
Using the higher bits instead of the lower ones does not help in this case. According to the manual (man 3 rand), old implementations of rand() had a problem in the lower bits. That's why random() is recommended instead. Though, the current implementation of rand() uses the same generator as random().
I tried the recommended correct use of the old rand():
if ((int)(rand()/(RAND_MAX+1.0)*36)==0)
...and got the same deep ditch at X=31
Interestingly, if I mix rand()'s numbers with another sequence, I get rid of the ditch:
unsigned x=0;
//...
x = (179*x + 79) % 997;
if(((rand()+x)%36)==0)
I am using an old Linear Congruential Generator. I chose 79, 179 and 997 at random from a primes table. This generates a repeating sequence with a period of at most 996 (with a prime modulus of 997 and a multiplier other than 1, the period divides 996).
That said, this trick probably introduced some non-randomness, some footprint... The resulting mixed sequence will surely fail other statistical tests. x never takes the same value in consecutive iterations. Indeed, no value repeats until a full period has elapsed.
"[...] random numbers should not be generated with a method chosen at random. Some theory should be used." (D.E.Knuth, "The Art of Computer Programming", vol.2)
For simulations, if you want to be sure, use the Mersenne Twister.
What I would love to do is create a function that takes a parameter giving the upper limit for the random number it generates. I have found that some generators just repeat the same number over and over again.
How can I make a generator that doesn't return the same number consecutively? Can someone please help me achieve my goal?
int randomGen(int max)
{
int n;
return n;
}
The simplest way to get uniformly distributed results from rand is something like this:
int limited_rand(int limit)
{
    int r, d = RAND_MAX / limit;
    limit *= d;
    do { r = rand(); } while (r >= limit);
    return r / d;
}
The result will be in the range 0 to limit-1, and each will occur with equal probability as long as the values 0 through RAND_MAX all had equal probability with the original rand function.
Other methods such as modular arithmetic or dividing without the loop I used introduce bias. Methods that go through floating point intermediates do not avoid this problem. Getting good random floating point numbers from rand is at least as difficult. Using my function for integers (or an improvement of it) is a good place to start if you want random floats.
Edit: Here's an explanation of what I mean by bias. Suppose RAND_MAX is 7 and limit is 5. Suppose (if this is a good rand function) that the outputs 0, 1, 2, ..., 7 are all equally likely. Taking rand()%5 would map 0, 1, 2, 3, and 4 to themselves, but map 5, 6, and 7 to 0, 1, and 2. This means the values 0, 1, and 2 are twice as likely to pop up as the values 3 and 4. A similar phenomenon happens if you try to rescale and divide, for instance using rand()*(double)limit/(RAND_MAX+1). Here, 0 and 1 map to 0, 2 and 3 map to 1, 4 maps to 2, 5 and 6 map to 3, and 7 maps to 4.
These effects are somewhat mitigated by the magnitude of RAND_MAX, but they can come back if limit is large. By the way, as others have said, with linear congruential PRNGs (the typical implementation of rand), the low bits tend to behave very badly, so using modular arithmetic when limit is a power of 2 may avoid the bias problem I described (since limit usually divides RAND_MAX+1 evenly in this case), but you run into a different problem in its place.
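The toy case from the edit above is small enough to enumerate. This sketch assumes RAND_MAX is 7 and limit is 5, as in the explanation; the names TOY_RAND_MAX, TOY_LIMIT and tally_bias are made up for the illustration.

```c
/* Assumptions from the example above: RAND_MAX is 7 and limit is 5. */
#define TOY_RAND_MAX 7
#define TOY_LIMIT 5

/* Tallies how many of the 8 equally likely inputs map to each output. */
static void tally_bias(int by_mod[TOY_LIMIT], int by_scale[TOY_LIMIT])
{
    for (int k = 0; k < TOY_LIMIT; k++)
        by_mod[k] = by_scale[k] = 0;

    for (int v = 0; v <= TOY_RAND_MAX; v++) {
        by_mod[v % TOY_LIMIT]++;                                        /* rand() % limit */
        by_scale[(int)(v * (double)TOY_LIMIT / (TOY_RAND_MAX + 1))]++;  /* rescale and truncate */
    }
}
```

The modulo tallies come out as 2, 2, 2, 1, 1 and the rescaling tallies as 2, 2, 1, 2, 1, so neither method gives each output the same number of inputs.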
How about this:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
int randomGen(int limit)
{
    return rand() % limit;
}
/* ... */
int main(void)
{
    srand(time(NULL));
    printf("%d", randomGen(2041));
    return 0;
}
Any pseudo-random generator will repeat its values with some period. C itself only has rand(); if you use that, you should definitely initialize the random seed with srand(). But your platform probably has something better than that.
On POSIX systems there is a whole family of functions that you should find under the man drand48 page. They have a well-defined period and quality. You will probably find what you need there.
Without explicit knowledge of the random generator of your platform, do not do rand() % max. The low-order bits of simple random number generators are usually not random at all.
Use instead (returns a number between min inclusive and max non-inclusive):
int randomIntegerInRange(int min, int max)
{
    double tmp = (double)rand() / (RAND_MAX - 1.);
    return min + (int)floor(tmp * (max - min));
}
Update: The solution above is biased (see comments for explanation), and will likely not produce uniform results. I am not deleting it, since it is a non-obvious example of what not to do. Please use rejection methods as recommended elsewhere in this thread.