For a simple simulation in C, I need to generate exponential random variables. I remember reading somewhere (but I can't find it now, and I don't remember why) that using the rand() function to generate random integers in a fixed range would generate non-uniformly distributed integers. Because of this, I'm wondering if this code might have a similar problem:
//generate u ~ U[0,1]
u = (double)rand() / (double)RAND_MAX;
//inverse of exponential CDF to get exponential random variable
expon = -log(1-u) * mean;
Thank you!
The problem with random numbers in a fixed range is that a lot of people do this for numbers between 100 and 200 for example:
100 + rand() % 100
That is not uniform (unless RAND_MAX + 1 happens to be a multiple of 100). But by doing this it is (or is close enough to uniform, at least):
u = 100 + 100 * ((double)rand() / (double)RAND_MAX);
Since that's what you're doing, you should be safe.
In theory, at least, rand() should give you a discrete uniform distribution from 0 to RAND_MAX... in practice, it has some undesirable properties, such as a small period, so whether it's useful depends on how you're using it.
RAND_MAX is usually 32767, while the LCG that rand() uses internally generates pseudorandom 32-bit numbers. Thus, the lack of uniformity, as well as the low periodicity, will generally go unnoticed.
If you require high-quality pseudorandom numbers, you could try George Marsaglia's CMWC4096 (Complementary Multiply With Carry). This is probably the best pseudorandom number generator around, with an extremely long period and uniform distribution (you just have to pick good seeds for it). Plus, it's blazing fast (not as fast as an LCG, but approximately twice as fast as a Mersenne Twister).
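For reference, here is Marsaglia's posted CMWC4096, lightly adapted to C99 fixed-width types; per the comments in his post, seed it by filling Q[] with 4096 random 32-bit values and choosing an initial carry c < 809430660:

#include <stdint.h>

/* CMWC4096, adapted from George Marsaglia's posted code. */
static uint32_t Q[4096], c = 362436;

uint32_t cmwc4096(void)
{
    static uint32_t i = 4095;
    const uint64_t a = 18782;
    uint64_t t;
    uint32_t x;

    i = (i + 1) & 4095;
    t = a * Q[i] + c;
    c = (uint32_t)(t >> 32);         /* new carry */
    x = (uint32_t)t + c;
    if (x < c) { x++; c++; }         /* propagate the wrap-around */
    return Q[i] = 0xfffffffeU - x;   /* the "complementary" step */
}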
Yes and no. The problem you're thinking of arises when you're clamping the output from rand() into a range that's smaller than RAND_MAX (i.e. there are fewer possible outputs than inputs).
In your case, you're (normally) reversing that: you're taking a fairly small number of bits produced by the random number generator, and spreading them among what will usually be a larger number of bits in the mantissa of your double. That means there are normally some bit patterns in the double (and therefore, specific values of the double) that can never occur. For most people's uses that's not a problem though.
As far as the "normally" goes, it's always possible that you have a 64-bit random number generator, where a double typically has a 53-bit mantissa. In this case, you could have the same kind of problem as with clamping the range with integers.
No, your algorithm will work; it's using the modulus function that does things imperfectly.
The one problem is that, because the output is quantized, once in a while it will generate exactly RAND_MAX, giving u == 1, and you'll be asking for log(1-1) = log(0). I'd recommend at least (rand() + 0.5) / (RAND_MAX + 1.0) (note the floating-point 1.0, which also avoids integer overflow when RAND_MAX == INT_MAX), if not a better source like drand48().
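A sketch of that fix (rand_exponential is a hypothetical helper name, not a standard function):

#include <math.h>
#include <stdlib.h>

/* Exponential variate with the given mean. Shifts rand() into the open
   interval (0,1) so that log() never sees 0. */
double rand_exponential(double mean)
{
    double u = ((double)rand() + 0.5) / ((double)RAND_MAX + 1.0);
    return -log(1.0 - u) * mean;
}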
There are much faster ways to compute the necessary numbers, e.g. the Ziggurat algorithm.
I have read so many posts on this topic:
How does rand() work? Does it have certain tendencies? Is there something better to use?
How does the random number generator work in C
and this is what I got:
1) x_{n+1} depends on x_n, i.e., on the previously generated random number.
2) It is not recommended to initialize the seed more than once in the program.
3) It is a bad practice to use rand()%2 to generate either 0 or 1 randomly.
My questions are:
1) Are there any other libraries I may have missed that generate a completely random number (either 0 or 1) without depending on previous output?
2) Is there any workaround using the built-in rand() function to satisfy this requirement?
3) What is the side effect of initializing the seed more than once in a program?
Code snippet:
srand(time(NULL));
d1=rand()%2;
d2=rand()%2;
Here my intention is to make d1 and d2 completely independent of each other.
My initial thought is to do this:
srand(time(NULL));
d1=rand()%2;
srand(time(NULL));
d2=rand()%2;
But as I mentioned earlier, based on those other posts, this is bad practice, I suppose?
So, can anyone please answer the above questions? I apologize if I completely missed an obvious thing.
Are there any other libraries I may have missed that generate a completely random number between 0 and 1 without depending on previous output?
Not in the standard C library. There are lots of other libraries which generate "better" pseudo-random numbers.
Is there any workaround using the built-in rand() function to satisfy this requirement?
Most standard library implementations of rand produce sequences of random numbers where the low-order bit(s) have a short period and/or are not as independent of each other as one would like. The high-order bits are generally better distributed. So a better way of using the standard library rand function to generate a random single bit (0 or 1) is:
(rand() > RAND_MAX / 2)
or use an interior bit:
((rand() & 0x400U) != 0)
Those will produce reasonably uncorrelated sequences with most standard library rand implementations, and impose no more computational overhead than checking the low-order bit. If that's not good enough for you, you'll probably want to research other pseudo-random number generators.
All of these (including rand() % 2) assume that RAND_MAX is odd, which is almost always the case. (If RAND_MAX were even, there would be an odd number of possible values and any way of dividing an odd number of possible values into two camps must be slightly biased.)
What is the side effect of initializing the seed more than once in a program?
You should think of the random number generator as producing "not very random" numbers after being seeded, with the quality improving as you successively generate new random numbers. And remember that if you seed the random number generator with some seed, you will get exactly the same sequence as you will the next time you seed the generator with the same seed. (Since time() returns a number of seconds, two calls in quick succession will usually produce exactly the same number, or very occasionally two consecutive numbers. But definitely not two uncorrelated random numbers.)
So the side effect of reseeding is that you get less-random numbers, and possibly exactly the same ones as you got the last time you reseeded.
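A toy snippet illustrating the determinism (the printed values depend on your C library, but the two will always match):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    srand(42);
    int a = rand();
    srand(42);               /* reseeding with the same value... */
    int b = rand();
    printf("%d %d\n", a, b); /* ...restarts the sequence, so a == b */
    return 0;
}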
1) Are there any other libraries I may have missed that generate a completely random number between 0 and 1 without depending on previous output?
This sub-question is off-topic for Stack Overflow, but I'll point out that POSIX and BSD systems have an alternative random number generator function named random() that you could consider if you are programming for such a platform (e.g. Linux, OS X).
2) Is there any workaround using the built-in rand() function to satisfy this requirement?
Traditional computers (as opposed to quantum computers) are deterministic machines. They cannot do true randomness. Every completely programmatic "random number generator" is in practice a pseudo-random number generator. They generate completely deterministic sequences, but the values from a given set of calls are distributed across the generator's range in a manner approximately consistent with a target probability distribution (ordinarily the uniform distribution).
Some operating systems provide support for generating numbers that depend on something more chaotic and less predictable than a computed sequence. For instance, they may collect information from mouse movements, CPU temperature variations, or other such sources, to produce more objectively random numbers. Linux, for example, has such a driver that is often exposed as the special file /dev/random. The problem with these sources is that they have a limited store of entropy, and therefore cannot provide numbers at a sustained high rate. If you need only a few random numbers, however, then that might be a suitable source.
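A minimal sketch of reading from that device on Linux (entropy_bit is a hypothetical helper name; error handling kept deliberately simple):

#include <stdio.h>

/* Pull one byte from the kernel entropy pool and reduce it to a bit.
   Reads from /dev/random may block when entropy is scarce; /dev/urandom
   is the non-blocking variant. */
int entropy_bit(void)
{
    unsigned char byte;
    FILE *f = fopen("/dev/random", "rb");
    if (f == NULL)
        return -1;                    /* device unavailable */
    if (fread(&byte, 1, 1, f) != 1) {
        fclose(f);
        return -1;                    /* read failed */
    }
    fclose(f);
    return byte & 1;
}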
3) What is the side effect of initializing the seed more than once in a program?
Code snippet:
srand(time(NULL));
d1=rand()%2;
d2=rand()%2;
Here my intention is to make d1 and d2 completely independent of each other.
My initial thought is to do this:
srand(time(NULL));
d1=rand()%2;
srand(time(NULL));
d2=rand()%2;
But as I mentioned earlier, based on those other posts, this is bad practice, I suppose?
It is indeed bad if you want d1 and d2 to have a 50% probability of being different. time() returns the number of seconds since the epoch, so it is highly likely that it will return the same value when called twice so close together. The sequence of pseudorandom numbers is completely determined by the seed (this is a feature, not a bug), and when you seed the PRNG, you restart the sequence. Even if you used a higher-resolution clock to make the seeds more likely to differ, you don't escape correlation this way; you just change the function generating numbers for you. And the result does not have the same guarantees for output distribution.
Additionally, when you do rand() % 2 you use only one bit of the approximately log2(RAND_MAX) + 1 bits that it produced for you. Over the whole period of the PRNG, you can expect that bit to take each value the same number of times, but over narrow ranges you may sometimes see some correlation.
In the end, your requirement for your two random numbers to be completely independent of one another is probably way overkill. It is generally sufficient for the pseudo-random result of one call to have no apparent correlation with the results of previous calls. You probably achieve that well enough with your first code snippet, even despite the use of only one bit per call. If you prefer to use more of the bits, though, then with some care you could base the numbers you choose on the parity of the count of how many bits are set in the values returned by rand(), as sketched below.
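A sketch of that parity trick (parity_bit is a hypothetical helper name):

#include <stdlib.h>

/* The output bit is the parity of the number of set bits in rand()'s
   result, so every bit of the value contributes. */
int parity_bit(void)
{
    unsigned v = (unsigned)rand();
    int parity = 0;
    while (v) {
        parity ^= 1;
        v &= v - 1;   /* clear the lowest set bit */
    }
    return parity;
}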
Use this
(double)rand() / (double)RAND_MAX
completely random number ..... without depending on previous output?
Well, in reality computers can't generate completely random numbers. There have to be some dependencies. But for almost all practical purposes, you can use rand().
side effect of initializing the seed more than once
No side effect. But that would mean you're completely invalidating the point of using rand(). If you're reinitializing the seed every time, the random number depends more on the time (and the processor).
any other workaround using the built-in rand() function
You can write something like this:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void)
{
    srand(time(NULL));
    printf("%lf\n", (double)rand() / (double)RAND_MAX);
    printf("%lf\n", (double)rand() / (double)RAND_MAX);
    return 0;
}
If you want to generate either a 0 or a 1, I think using rand() % 2 is perfectly fine, as the probability of an even number is the same as the probability of an odd number (all values are equally likely for an unbiased random number generator).
I have to work with some code produced by an employee who is now retired, and I've run into a few strange things concerning random numbers. At some points, he shifted the value returned by a PRNG 10 bits to the right and then used a mask on this value.
I have already seen on the internet that some PRNGs have poor randomness properties in some bits of the numbers they generate (like the last bit simply alternating between 1 and 0), and I've searched for literature on such problems in the Mersenne Twister, but I haven't found any. Does anyone know anything about this?
Normally, any bit should be random; this is a property of the Mersenne Twister.
However (I do not know MT very deeply) you may have long-term dependence between some bits.
It is recommended to use the library functions for setting the integer range, rather than arranging the bits yourself; otherwise you never know what complex properties the result may have.
If you use the C++11 standard library, just use std::mt19937 together with std::uniform_int_distribution.
I am not sure about the Mersenne Twister in particular, but what comes to mind is the typical advice one gets when trying to get a random integer in the range [0, n). If you have a PRNG returning integers with a larger range than n, you should never use modulo to reduce range like
x = rand() % n;
but ought to rescale the number
x = (int) floor(((double) rand() / ((double) RAND_MAX + 1.0)) * n);
instead. The reason is that the most significant bits of the pseudo-random number typically are more random than the lesser bits, so while the modulo operation keeps things nice and floating-point free, it also discards those precious significant bits.
While I do not know what the code you mentioned was attempting to do, the right-shift plus mask may well have been a way to reduce the range of the random numbers while discarding the least significant bits.
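For illustration only (the original intent is unknown; shift_and_mask and the 8-bit output width are assumptions):

#include <stdint.h>

/* Drop the ten least significant bits, then mask the survivors down to
   the width needed (8 bits here, purely as an example). */
uint32_t shift_and_mask(uint32_t raw)
{
    return (raw >> 10) & 0xFFu;   /* keeps bits 10..17 of the original value */
}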
I recently discovered something that bugs me...
I use RANMAR algorithm to generate random number. It is said that it is the best algorithm currently available (if you know a better one, please let me know).
I was really surprised to notice that the smallest nonzero double it can generate is roughly 1e-8.
So I tried with std::rand() with the common
(double)rand() / RAND_MAX;
way of generating a double, and I noticed that the smallest nonzero number is roughly 1e-9. I kind of understand why in this case, because 1.0/RAND_MAX is roughly 1.0/2^31 ~ 1e-9 (on my computer; I know that RAND_MAX can have different values).
I was therefore wondering if it is possible to generate random doubles in [0,1] with the smallest possible nonzero value being near machine precision.
[edit]
I just want to be more precise... when I said that the smallest number generated was of the order of 1e-9, I should also have said that the next value below it is 0. Therefore there is a huge gap (containing infinitely many real numbers) between 1e-9 and 0 that will all be collapsed to 0. What I mean is that if you do the following test:
double x = /* a small value computed somehow in your code, ~1e-12 */;
double u = (double)rand() / (double)RAND_MAX;  /* the generated random number */
if (x > u) { /* true whenever u == 0, i.e. with probability ~1/RAND_MAX rather than ~1e-12 */ }
So the condition will be true for too many numbers...
[/edit]
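For what it's worth, a common way to shrink that gap is to build the double from 53 fresh random bits. A sketch, assuming RAND_MAX == 2^31 - 1 (so each rand() call supplies 31 usable bits), following the classic two-draw Mersenne Twister recipe:

#include <stdlib.h>

/* A double on [0,1) with 2^-53 granularity, built from two draws:
   27 bits from the first, 26 bits from the second. */
double rand_double53(void)
{
    long a = rand() >> 4;   /* 27 bits */
    long b = rand() >> 5;   /* 26 bits */
    return (a * 67108864.0 + b) / 9007199254740992.0;   /* (a*2^26 + b) / 2^53 */
}

The smallest nonzero value is then 2^-53 ~ 1.1e-16, on the order of machine epsilon.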
Like the title says, I'm using a random number generator on a Freescale Coldfire chip and it returns a 32 bit unsigned value. As far as I know, there is no way to configure the generator to limit the range. What would be the best way to manipulate the number to be in the accepted range?
I was thinking of modding the number by the high range value but I would still have to deal with the lower bound.
This C FAQ article, How can I get random integers in a certain range?, explains how to properly generate random numbers in the range [M,N]; basically, the formula you should use is:
M + (random number) / (RAND_MAX / (N - M + 1) + 1)
Stephan T. Lavavej explains why doing this is still not going to be that great:
From Going Native 2013 - rand() Considered Harmful
If you really care about even distribution, stick with a power of 2, or find some routines for dithering.
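As a sketch, the FAQ formula wrapped in a function (rand_range is a hypothetical name; it assumes m <= n and n - m + 1 <= RAND_MAX + 1):

#include <stdlib.h>

/* A value in [m, n]. Some bias remains whenever (n - m + 1) doesn't
   divide RAND_MAX + 1, as discussed above. */
int rand_range(int m, int n)
{
    return m + rand() / (RAND_MAX / (n - m + 1) + 1);
}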
Yes, the traditional way is to MOD by the range, and this will be fine for many ordinary uses (simulating cards, dice, etc.) if the range is very small (a range of 52 compared to the 32-bit range of your generator is quite small). This will still be biased, but the bias will be nearly impossible to detect. If your range is bigger, the bias will be bigger. For example, if you want a random 9-digit number, the bias will be dreadful. If you want to eliminate bias altogether, there is no alternative but to do rejection sampling--that is, you must have a loop in which you generate random numbers and throw some away. If you do it right, you can keep those extra numbers needed to a minimum so you don't slow things down much.
I'm writing a Monte Carlo simulation and am going to need a lot of random bits for generating integers uniformly distributed over {1,2,...,N} where N<40. The problem with using the C rand function is that I'd waste a lot of perfectly good bits using the standard rand % N technique. What's a better way for generating the integers?
I don't need cryptographically secure random numbers, but I don't want them to skew my results. Also, I don't consider downloading a batch of bits from random.org a solution.
rand() % N does not work; it skews your results unless RAND_MAX + 1 is a multiple of N.
A correct approach is to figure out the largest multiple of N that's no greater than RAND_MAX + 1, and then generate random numbers until you get one that is less than that multiple. Only then should you do the modulo operation. This gives you a worst-case rejection ratio of just under 50%.
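A sketch of that rejection loop (rand_mod_unbiased is a hypothetical helper name):

#include <stdlib.h>

/* Keep drawing until the value falls below the largest multiple of n
   that fits in rand()'s range, then reduce. Unbiased for 0 < n <= RAND_MAX. */
int rand_mod_unbiased(int n)
{
    const unsigned long limit = ((unsigned long)RAND_MAX + 1UL) / n * n;
    unsigned long r;
    do {
        r = (unsigned long)rand();
    } while (r >= limit);
    return (int)(r % n);
}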
In addition to Oli's answer:
If you're desperately concerned about bits, you can manage a queue of bits by hand, retrieving only as many as are necessary for the next number (i.e. ceil(log2(n))); see the sketch after this answer.
But you should make sure that your generator is good enough. Simple linear congruential generators are better in the higher bits than the lower (see comments), so your current modular-division approach makes more sense there.
Numerical Recipes has a really good section on all this and is very easy to read (I'm not sure it mentions saving bits, but it works as a general reference).
Update: if you're unsure whether this is needed or not, I would not worry about it for now (unless you have better advice from someone who understands your particular context).
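A sketch of the bit-queue idea (take_bits and rand_below are hypothetical names; it assumes at least 15 usable bits per rand() call and pulls from the low end, so heed the LCG caveat above):

#include <stdlib.h>

static unsigned bit_buffer;
static int bits_left;

/* Pop k bits from the queue, refilling from rand() as needed. */
static unsigned take_bits(int k)
{
    unsigned v = 0;
    while (k-- > 0) {
        if (bits_left == 0) {
            bit_buffer = (unsigned)rand();
            bits_left = 15;
        }
        v = (v << 1) | (bit_buffer & 1u);
        bit_buffer >>= 1;
        bits_left--;
    }
    return v;
}

/* Uniform on 0..n-1, consuming roughly ceil(log2(n)) bits per accepted
   draw; rejection keeps the result unbiased. */
int rand_below(int n)
{
    int k = 0;
    while ((1 << k) < n)
        k++;               /* k = ceil(log2(n)) */
    unsigned v;
    do {
        v = take_bits(k);
    } while (v >= (unsigned)n);
    return (int)v;
}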
Represent rand() in base 40 and take the digits as numbers. Drop any incomplete digits; that is, drop the first digit if it doesn't have the full range [0..39], and drop the whole random number if the first digit takes its highest possible value (e.g. if RAND_MAX in base 40 is 21 23 05 06, drop all numbers having the highest base-40 digit, 21).
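A sketch of that scheme, assuming RAND_MAX == 2147483647. Then 40^5 = 102400000, the leading base-40 digit of rand() ranges only over 0..20 (so it is dropped), and any draw whose leading digit is 20 is rejected, which is equivalent to rejecting r >= 20 * 40^5:

#include <stdlib.h>

/* Fill out[] with five uniform base-40 digits from one accepted draw.
   Each digit is uniform on 0..39; for N < 40 you would still reject
   digits >= N. */
int base40_digits(int out[5])
{
    long r;
    do {
        r = rand();
    } while (r >= 2048000000L);   /* 20 * 40^5 */
    for (int i = 0; i < 5; i++) {
        out[i] = (int)(r % 40);
        r /= 40;
    }
    return 5;
}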