rand() behaves differently between macOS and Linux - c

I'm trying to generate a random-number sequence with rand().
I have something like this:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
int Random(int min, int max)
{
    /* returns a random integer in [min, max] */
    double uniform; // random variable from uniform distribution of [0, 1]
    int ret; // return value
    srand((unsigned int)clock());
    uniform = rand() / (double)RAND_MAX;
    ret = (int)(uniform * (double)(max - min)) + min;
    return ret;
}
int main(void)
{
    for(int i=0; i<10; i++)
        printf("%d ", Random(0, 100));
    printf("\n");
    return 0;
}
It produces different results when executed on macOS v10.14 (Mojave) and Ubuntu 18.04 (Bionic Beaver).
It works on Ubuntu:
76 42 13 49 85 7 43 28 15 1
But not on macOS:
1 1 1 1 1 1 1 1 1 1
Why doesn't it work well on macOS? Is there something different in random number generators?

I'm a Mac user. To generate random numbers I initialise the seed like this:
srand(time(NULL));
Plus, try initialising it in your main.

If reproducible "random" numbers are something you care about, you should avoid the rand function. The C standard doesn't specify exactly what the sequence produced by rand is, even if the seed is given via srand. Notably:
rand uses an unspecified random number algorithm, and that algorithm can differ between C implementations, including versions of the same standard library.
rand returns values no greater than RAND_MAX, and RAND_MAX can differ between C implementations.
Instead, you should use an implementation of a pseudorandom number generator with a known algorithm, and you should also rely on your own way to transform pseudorandom numbers from that algorithm into the numbers you desire. (For many ways to do so, see my page on sampling algorithms. Note that there are other things to consider when reproducibility is important.)
See also the following:
Does Python have a function to mimic the sequence of C's rand()?
Why is the use of rand() considered bad?
How predictable is the result of rand() between individual systems?

rand is obsolete in Mac. Use random() instead.

Related

rand function is giving me the same result at each run even when I called srand(time(NULL)) [duplicate]

I have a problem: I want to use rand() to get a random number between 0 and 6, but it always gives me 4 at each run, even when I call srand(time(NULL)).
#include <stdlib.h>
#include <stdio.h>
#include <time.h>
int main(void)
{
    srand(time(NULL));
    int rd = rand() % 7;
    printf("%d\n", rd);
    return (0);
}
The output is 4 at each run.
There are two fundamental problems with your code which, in combination, produce the curious result you're experiencing.
Almost anyone will warn you about the use of the rand() interface. Indeed, the Mac OS manpage itself starts with a warning:
$ man rand
NAME
rand, srand, sranddev, rand_r -- bad random number generator
Yep, it's a bad random number generator. Bad random number generators can be hard to seed, among other problems.
But speaking of seeding, here's another issue, perhaps less discussed but nonetheless important:
Do not use time(NULL) to seed your random number generator.
The linked answer goes into more detail about this, but the basic issue is simple: the value of time(NULL) changes infrequently (if frequently is measured in nanoseconds), and doesn't change much when it changes. So not only are you relying on the program to not be run very often (or at least less than once per second), you're also depending on the random number generator to produce radically different values from slightly different seeds. Perhaps a good random number generator would do that, but we've already established that rand() is a bad random number generator.
OK, that's all very general. The specific problem is somewhat interesting, at least for academic purposes (academic, since the practical solution is always "use a better random number generator and seed it with a good random seed"). The precise problem here is that you're using rand() % 7.
That's a problem because what the Mac OS / FreeBSD implementation of rand() does is to multiply the seed by a multiple of 7. Because that product is reduced modulo 2^32 (which is not a multiple of 7), the value modulo 7 of the first random number produced by slowly incrementing seeds will eventually change, but it will have to wait until the amount of the overflow changes.
Here's a link to the code. The essence is in these three lines:
hi = *ctx / 127773;
lo = *ctx % 127773;
x = 16807 * lo - 2836 * hi;
which, according to a comment, "compute[s] x = (7^5 * x) mod (2^31 - 1) without overflowing 31 bits." x is the value which will eventually be returned (modulo 2^32) and it is also the next seed. *ctx is the current seed.
16807 is, as the comment says, 7^5, which is obviously divisible by 7. And 2836 mod 7 is 1. So by the rules of modular arithmetic:
x mod 7 = (16807 * lo) mod 7 - (2836 * hi) mod 7
= 0 - hi mod 7
That value only depends on hi, which is seed / 127773. So hi changes exactly once every 127773 ticks. Since the result of time(NULL) is in seconds, that's one change in 127773 seconds, which is about a day and a half. So if you ran your program once a day, you'd notice that the first random number is sometimes the same as the previous day and sometimes one less. But you're running it quite a bit more often than that, even if you wait a few seconds between runs, so you just see the same first random number every time. Eventually it will tick down and then you'll see a series of 3s instead of 4s.
As mentioned by @rici, the problem is caused by the poor implementation of rand(). The man page for srand() recommends using arc4random() instead. Alternatively, you could try seeding with a value taken directly from /dev/urandom as follows:
#include <stdlib.h>
#include <stdio.h>
int main(void)
{
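    unsigned int seed;
    FILE *f = fopen("/dev/urandom", "rb");
    /* Stop if the device cannot be opened or the read comes up short. */
    if (f == NULL || fread(&seed, sizeof seed, 1, f) != 1) {
        perror("/dev/urandom");
        return 1;
    }
    fclose(f);
    srand(seed);
    /* Should be a lot more unpredictable: */
    printf("%d\n", rand() % 7);
    return (0);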
}

Why does rand() repeat numbers far more often on Linux than Mac?

I was implementing a hashmap in C as part of a project I'm working on and using random inserts to test it. I noticed that rand() on Linux seems to repeat numbers far more often than on Mac. RAND_MAX is 2147483647/0x7FFFFFFF on both platforms. I've reduced it to this test program that makes a byte array RAND_MAX+1-long, generates RAND_MAX random numbers, notes if each is a duplicate, and checks it off the list as seen.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
int main() {
    size_t size = ((size_t)RAND_MAX) + 1;
    char *randoms = calloc(size, sizeof(char));
    int dups = 0;
    srand(time(0));
    for (int i = 0; i < RAND_MAX; i++) {
        int r = rand();
        if (randoms[r]) {
            // printf("duplicate at %d\n", r);
            dups++;
        }
        randoms[r] = 1;
    }
    printf("duplicates: %d\n", dups);
}
Linux consistently generates around 790 million duplicates. Mac consistently only generates one, so it loops through every random number that it can generate almost without repeating. Can anyone please explain to me how this works? I can't tell anything different from the man pages, can't tell which RNG each is using, and can't find anything online. Thanks!
While at first it may sound like the macOS rand() is somehow better for not repeating any numbers, one should note that with this amount of numbers generated it is expected to see plenty of duplicates (in fact, around 790 million, or (2^31-1)/e). Likewise iterating through the numbers in sequence would also produce no duplicates, but wouldn't be considered very random. So the Linux rand() implementation is in this test indistinguishable from a true random source, whereas the macOS rand() is not.
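The 790 million figure can be checked with a short calculation. It assumes an idealised source in which each of the n = 2^31 - 1 draws is independent and uniform over the M = 2^31 possible values; that is a model, not a property of either implementation.

```latex
\begin{align*}
\mathbb{E}[\text{distinct values}]
  &= M\left(1 - \left(1 - \tfrac{1}{M}\right)^{n}\right)
   \approx M\left(1 - e^{-n/M}\right)
   \approx M\left(1 - \tfrac{1}{e}\right), \\
\mathbb{E}[\text{duplicates}]
  &= n - \mathbb{E}[\text{distinct values}]
   \approx \frac{M}{e}
   \approx \frac{2^{31}}{2.71828}
   \approx 7.9 \times 10^{8}.
\end{align*}
```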
Another thing that appears surprising at first glance is how the macOS rand() can manage to avoid duplicates so well. Looking at its source code, we find the implementation to be as follows:
/*
 * Compute x = (7^5 * x) mod (2^31 - 1)
 * without overflowing 31 bits:
 *      (2^31 - 1) = 127773 * (7^5) + 2836
 * From "Random number generators: good ones are hard to find",
 * Park and Miller, Communications of the ACM, vol. 31, no. 10,
 * October 1988, p. 1195.
 */
long hi, lo, x;
/* Can't be initialized with 0, so use another value. */
if (*ctx == 0)
    *ctx = 123459876;
hi = *ctx / 127773;
lo = *ctx % 127773;
x = 16807 * lo - 2836 * hi;
if (x < 0)
    x += 0x7fffffff;
return ((*ctx = x) % ((unsigned long) RAND_MAX + 1));
This does indeed produce every number between 1 and RAND_MAX - 1, inclusive, exactly once before the sequence repeats. Since the next state is based on multiplication, the state can never be zero (or all future states would also be zero), and since it is reduced modulo 2^31 - 1 it can never equal RAND_MAX either. Thus the repeated number you see is the first one, and zero and RAND_MAX are the values that are never returned.
Apple has been promoting the use of better random number generators in their documentation and examples for at least as long as macOS (or OS X) has existed, so the quality of rand() is probably not deemed important, and they've just stuck with one of the simplest pseudorandom generators available. (As you noted, their rand() is even commented with a recommendation to use arc4random() instead.)
On a related note, the simplest pseudorandom number generator I could find that produces decent results in this (and many other) tests for randomness is xorshift*:
uint64_t x = *ctx;
x ^= x >> 12;
x ^= x << 25;
x ^= x >> 27;
*ctx = x;
return (x * 0x2545F4914F6CDD1DUL) >> 33;
This implementation results in almost exactly 790 million duplicates in your test.
MacOS provides an undocumented rand() function in stdlib. If you leave it unseeded, then the first values it outputs are 16807, 282475249, 1622650073, 984943658 and 1144108930. A quick search will show that this sequence corresponds to a very basic LCG random number generator that iterates the following formula:
x_{n+1} = 7^5 · x_n (mod 2^31 − 1)
Since the state of this RNG is described entirely by the value of a single 32-bit integer, its period is not very long. To be precise, it repeats itself every 2^31 − 2 iterations, outputting every value from 1 to 2^31 − 2.
I don't think there's a standard implementation of rand() for all versions of Linux, but there is a glibc rand() function that is often used. Instead of a single 32-bit state variable, this uses a pool of over 1000 bits, which to all intents and purposes will never produce a fully repeating sequence. Again, you can probably find out what version you have by printing the first few outputs from this RNG without seeding it first. (The glibc rand() function produces the numbers 1804289383, 846930886, 1681692777, 1714636915 and 1957747793.)
So the reason you're getting more collisions in Linux (and hardly any in MacOS) is that the Linux version of rand() is basically more random.
rand() is defined by the C standard, and the C standard does not specify which algorithm to use. Obviously, Apple is using an inferior algorithm to your GNU/Linux implementation: The Linux one is indistinguishable from a true random source in your test, while the Apple implementation just shuffles the numbers around.
If you want random numbers of any quality, either use a better PRNG that gives at least some guarantees on the quality of the numbers it returns, or simply read from /dev/urandom or similar. The latter gives you cryptographic quality numbers, but is slow. Even if it is too slow by itself, /dev/urandom can provide some excellent seeds to some other, faster PRNG.
In general, the rand/srand pair has been considered sort of deprecated for a long time due to low-order bits displaying less randomness than high-order bits in the results. This may or may not have anything to do with your results, but I think this is still a good opportunity to remember that even though some rand/srand implementations are now more up to date, older implementations persist and it's better to use random(3). On my Arch Linux box, the following note is still in the man page for rand(3):
The versions of rand() and srand() in the Linux C Library use the same
random number generator as random(3) and srandom(3), so the lower-order
bits should be as random as the higher-order bits. However, on older
rand() implementations, and on current implementations on different
systems, the lower-order bits are much less random than the higher-order
bits. Do not use this function in applications intended to be portable
when good randomness is needed. (Use random(3) instead.)
Just below that, the man page actually gives very short, very simple example implementations of rand and srand that are about the simplest LC RNGs you've ever seen and that have a small RAND_MAX. I don't think they match what's in the C standard library, if they ever did. Or at least I hope not.
In general, if you're going to use something from the standard library, use random if you can (the man page lists it as POSIX standard back to POSIX.1-2001, but rand is standard way back before C was even standardized). Or better yet, crack open Numerical Recipes (or look for it online) or Knuth and implement one. They're really easy and you only really need to do it once to have a general purpose RNG with the attributes you most often need and which is of known quality.

Generation of Random Binary Numbers in C

I need to randomly generate bits but the number of bits should be in a definite ratio.
Say I want to generate 100 bits.
So if the ratio is 3:2
It has to generate 60 0s and 40 1s.
How will I be able to achieve this in C?
Suppose you want a 1 with probability p, where p is a double in the inclusive range [0.0, 1.0].
Then you can use this logic of rand_bit() below.
#include <stdio.h>
#include <stdlib.h>
int rand_bit(double p){
    if(p==1.0){//Unusual but OK exact comparison of double.
        return 1;
    }
    double r=((double)rand())/((double)RAND_MAX);
    return r<p?1:0;
}
//Demonstration...
int main(void) {
    srand(78721);//Demonstration is reproducible....
    const int test=1000000;
    int count=0;
    double p=0.6;//60% 1s.
    for(int i=1;i<=test;++i){
        count+=rand_bit(p);
    }
    double prop=((double)count)/((double)test);
    printf("%f (error=%f)\n",prop,(prop-p));
    return 0;
}
Typical output:
0.600500 (error=0.000500)
Remember to seed the random number generator with srand(). Pass in a fixed value as above to get a reproducible result or srand((int)time(NULL)); to get different results run-to-run.
Also note that the built in random number generators in C are generally not great.
They're usually fine for games, and OK for generating test cases for business applications but not usually fit for scientific simulations and cryptographically worthless.
The condition if(p==1.0) is there so we can be sure that p==1.0 always returns 1. For p==0.0 the function always returns 0, because r<p can never be true.

Does "n * (rand() / RAND_MAX)" make a skewed random number distribution?

I'd like to find an unskewed way of getting random numbers in C (although at most I'm going to be using it for values of 0-20, and more likely only 0-8). I've seen this formula but after running some tests I'm not sure if it's skewed or not. Any help?
Here is the full function used:
int randNum()
{
    return 1 + (int) (10.0 * (rand() / (RAND_MAX + 1.0)));
}
I seeded it using:
unsigned int iseed = (unsigned int)time(NULL);
srand (iseed);
The one suggested below refuses to work for me. I tried:
int greek;
for (j=0; j<50000; j++)
{
    greek = rand_lim(5);
    printf("%d, ", greek);
    greek = (int) (NUM * (rand() / (RAND_MAX + 1.0)));
    int togo = number[greek];
    number[greek] = togo + 1;
}
and it stops working and gives me the same number 50000 times when I comment out printf.
Yes, it's skewed, unless your RAND_MAX happens to be a multiple of 10.
If you take the numbers from 0 to RAND_MAX, and try to divide them into 10 piles, you really have only three possibilities:
1. RAND_MAX is a multiple of 10, and the piles come out even.
2. RAND_MAX is not a multiple of 10, and the piles come out uneven.
3. You split it into uneven groups to start with, but throw away all the "extras" that would make it uneven.
You rarely have control over RAND_MAX, and it's often a prime number anyway. That really only leaves 2 and 3 as possibilities.
The third option looks roughly like this:
[Edit: After some thought, I've revised this to produce numbers in the range 0...(limit-1), to fit with the way most things in C and C++ work. This also simplifies the code (a tiny bit).]
int rand_lim(int limit) {
    /* return a random number in the range [0..limit)
     */
    int divisor = RAND_MAX/limit;
    int retval;
    do {
        retval = rand() / divisor;
    } while (retval >= limit); /* >= rather than ==: for large limits the quotient can exceed limit */
    return retval;
}
For anybody who questions whether this method might leave some skew, I also wrote a rather different version, purely for testing. This one uses a decidedly non-random generator with a very limited range, so we can simply iterate through every number in the range. It looks like this:
#include <stdlib.h>
#include <stdio.h>
#define MAX 1009
int next_val() {
    // just return consecutive numbers
    static int v=0;
    return v++;
}
int lim(int limit) {
    int divisor = MAX/limit;
    int retval;
    do {
        retval = next_val() / divisor;
    } while (retval == limit);
    return retval;
}
#define LIMIT 10
int main() {
    // we'll allocate extra space at the end of the array:
    int buckets[LIMIT+2] = {0};
    int i;
    for (i=0; i<MAX; i++)
        ++buckets[lim(LIMIT)];
    // and print one beyond what *should* be generated
    for (i=0; i<LIMIT+1; i++)
        printf("%2d: %d\n", i, buckets[i]);
}
So, we're starting with the 1009 numbers from 0 to 1008 (1009 is prime, so it won't be an exact multiple of any range we choose) and splitting them into 10 buckets. That should give 100 in each bucket, and the 9 leftovers (so to speak) get "eaten" by the do/while loop. As it's written right now, it allocates and prints out an extra bucket. When I run it, I get exactly 100 in each of buckets 0..9, and 0 in bucket 10. If I comment out the do/while loop, I see 100 in each of 0..9, and 9 in bucket 10.
Just to be sure, I've re-run the test with various other numbers for both the range produced (mostly used prime numbers), and the number of buckets. So far, I haven't been able to get it to produce skewed results for any range (as long as the do/while loop is enabled, of course).
One other detail: there is a reason I used division instead of remainder in this algorithm. With a good (or even decent) implementation of rand() it's irrelevant, but when you clamp numbers to a range using division, you keep the upper bits of the input. When you do it with remainder, you keep the lower bits of the input. As it happens, with a typical linear congruential pseudo-random number generator, the lower bits tend to be less random than the upper bits. A reasonable implementation will throw out a number of the least significant bits already, rendering this irrelevant. On the other hand, there are some pretty poor implementations of rand around, and with most of them, you end up with better quality of output by using division rather than remainder.
I should also point out that there are generators that do roughly the opposite -- the lower bits are more random than the upper bits. At least in my experience, these are quite uncommon. Generators in which the upper bits are more random are considerably more common.

create a random number less than a max given value

What I would love to do is create a function that takes a parameter giving the upper limit for the random numbers it generates. I have found that some generators just repeat the same number over and over again.
How can I make a generator that doesn't return the same number consecutively? Can someone please help me achieve my goal?
int randomGen(int max)
{
    int n;
    return n;
}
The simplest way to get uniformly distributed results from rand is something like this:
int limited_rand(int limit)
{
    int r, d = RAND_MAX / limit;
    limit *= d;
    do { r = rand(); } while (r >= limit);
    return r / d;
}
The result will be in the range 0 to limit-1, and each will occur with equal probability as long as the values 0 through RAND_MAX all had equal probability with the original rand function.
Other methods such as modular arithmetic or dividing without the loop I used introduce bias. Methods that go through floating point intermediates do not avoid this problem. Getting good random floating point numbers from rand is at least as difficult. Using my function for integers (or an improvement of it) is a good place to start if you want random floats.
Edit: Here's an explanation of what I mean by bias. Suppose RAND_MAX is 7 and limit is 5. Suppose (if this is a good rand function) that the outputs 0, 1, 2, ..., 7 are all equally likely. Taking rand()%5 would map 0, 1, 2, 3, and 4 to themselves, but map 5, 6, and 7 to 0, 1, and 2. This means the values 0, 1, and 2 are twice as likely to pop up as the values 3 and 4. A similar phenomenon happens if you try to rescale and divide, for instance using rand()*(double)limit/(RAND_MAX+1). Here, 0 and 1 map to 0, 2 and 3 map to 1, 4 maps to 2, 5 and 6 map to 3, and 7 maps to 4.
These effects are somewhat mitigated by the magnitude of RAND_MAX, but they can come back if limit is large. By the way, as others have said, with linear congruential PRNGs (the typical implementation of rand), the low bits tend to behave very badly, so using modular arithmetic when limit is a power of 2 may avoid the bias problem I described (since limit usually divides RAND_MAX+1 evenly in this case), but you run into a different problem in its place.
How about this:
int randomGen(int limit)
{
    return rand() % limit;
}
/* ... */
int main()
{
    srand(time(NULL));
    printf("%d", randomGen(2041));
    return 0;
}
Any pseudo-random generator will repeat its values over and over again with some period. Standard C only has rand(); if you use it, you should definitely initialize the random seed with srand(). But your platform probably has something better than that.
On POSIX systems there is a whole family of functions that you should find under the man drand48 page. They have a well-defined period and quality. You will probably find what you need there.
Without explicit knowledge of the random generator of your platform, do not do rand() % max. The low-order bytes of simple random number generators are usually not random at all.
Use instead (returns a number between min inclusive and max non-inclusive):
int randomIntegerInRange(int min, int max)
{
    double tmp = (double)rand() / (RAND_MAX - 1.);
    return min + (int)floor(tmp * (max - min));
}
Update: The solution above is biased (see comments for explanation), and will likely not produce uniform results. I am not deleting it, since it is a non-obvious example of what not to do. Please use rejection methods as recommended elsewhere in this thread.

Resources