I recently discovered something that bugs me...
I use the RANMAR algorithm to generate random numbers. It is said to be the best algorithm currently available (if you know a better one, please let me know).
I was really surprised to notice that the smallest double it can generate is roughly 1e-8.
So I tried with std::rand() with the common
(double)rand() / RAND_MAX;
way of generating a double, and I noticed that the smallest number is roughly 1e-9. I sort of understand why in this case, because 1.0/RAND_MAX is roughly 1.0/2^31 ~ 1e-9 (on my computer; I know that RAND_MAX can have different values).
I was therefore wondering if it is possible to generate random doubles in [0,1] with the smallest possible nonzero value being near machine precision.
[edit]
I just want to be more precise: when I said that the smallest generated number was of the order of 1e-9, I should also have said that the next one down is 0. So there is a huge gap (infinitely many representable numbers) between 1e-9 and 0 that will all effectively come out as 0. What I mean is that if you do the following test
double x = ...; /* a small value computed somewhere in your code, e.g. ~1e-12 */
double u = (double)rand() / RAND_MAX;
if (x > u) { /* true for every generated u below 1e-9, i.e. whenever u == 0 */ }
So the condition will be true for too many numbers...
[/edit]
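One possible direction (a rough sketch, not a full answer; it assumes RAND_MAX >= 32767, i.e. at least 15 usable bits per rand() call): combine several calls into a 53-bit integer and divide by 2^53, so the spacing between generated values matches the precision of a double near 1 and the smallest nonzero result is about 1.1e-16.

#include <stdlib.h>

/* Sketch: build a 53-bit uniform integer from four rand() calls and scale it
 * into [0, 1). The smallest nonzero value it can return is 2^-53 ~ 1.1e-16. */
double rand_double53(void)
{
    unsigned long long r = 0;
    for (int i = 0; i < 4; i++)            /* gather 4 * 15 = 60 random bits */
        r = (r << 15) | (unsigned)(rand() & 0x7FFF);
    r >>= 7;                               /* keep exactly 53 of them */
    return (double)r / 9007199254740992.0; /* divide by 2^53 */
}

Note that this still produces values on a uniform grid of spacing 2^-53; it does not produce every representable double below 1, only a much finer grid than rand()/RAND_MAX.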
Related
I want to know how to model random variables using "basic operations". The only random function I know, at least for C, is rand(), along with srand for seeding. There probably exist packages somewhere online, but let's say I want to implement it on my own. I don't know if there are other very common random functions, but if not, let's just stick with rand() and the C language.
rand() allows me to pseudo-randomly generate an int from 0 to RAND_MAX. I can then use mod to get an int in some range. I can also take the result mod 2 to choose a sign and get negative numbers. I can also do rand()/RAND_MAX to model values in the interval (0,1) and shift this to model Uniform(a,b).
But what I am not sure about is whether I can extend this to model any probability distribution, and at what point I have to worry about accuracy, especially when dealing with infinities and irrational probabilities. Also, this method is very crude, so I would like to know of more standard ways using basic tools, if there are any.
A simple example:
I have the random variable X such that Pr(X = 1)=1/pi and Pr(X=0)=1-1/pi. Since pi is irrational, I would approximate the probability of getting 1/pi with rand() and choose X=1 if I get an int from 0 to Round(RAND_MAX*1/pi). So this is approximating twice, once for pi and another time for rounding.
Is there a better approach? How would one go about modeling something more complicated such as a continuous random variable on the interval (0,infinity) or a discrete random variable with irrational probabilities on a countably infinite set. Would my approach still work or would I have to worry about rounding errors?
EDIT: Also how does the pseudo-randomness instead of randomness of rand() change things and how would I account for these changes?
"I can then use mod to get an int in some range"
No, you can't. Try it with dice. You want a number between 1 and 5. So you take the roll mod 5 (kind of, it would actually be ((roll-1)%5)+1). This maps 1 to 1, 2 to 2, etc. 5 to 5 and 6 to 1. You now have 1 twice as likely as any other roll.
The correct way of doing this is to find the nearest power of 2 higher than your range, mask out the bits of the random number above that power of 2 then check if you're in range. If you aren't in range, try again (will potentially loop forever, in practice the average number of retries is less than 2). This assumes that your random numbers are a stream of bits and not something else. This is usually a safe assumption for decent generators.
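A rough sketch of that mask-and-retry idea; rand_bits32() is only a placeholder for whatever stream of random bits you have, not a standard function:

#include <stdint.h>

uint32_t rand_bits32(void);   /* placeholder: returns 32 uniformly random bits */

/* Uniform value in [0, n), n >= 1: mask down to the covering power of two,
 * then retry anything that falls outside the range. */
uint32_t uniform_below(uint32_t n)
{
    uint32_t mask = n - 1;    /* smear the top set bit downwards */
    mask |= mask >> 1;
    mask |= mask >> 2;
    mask |= mask >> 4;
    mask |= mask >> 8;
    mask |= mask >> 16;

    uint32_t r;
    do {
        r = rand_bits32() & mask;   /* keep only the bits we need */
    } while (r >= n);               /* fewer than 2 tries on average */
    return r;
}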
"I can also do rand()/RAND_MAX to model values in the interval (0,1)"
No, you can't. That's not how floating point numbers work. This generates a horrible distribution.
Either the number of bits in the integer is smaller than the number of bits in the mantissa, in which case there is a whole set of floating point numbers you can never generate, or the number of bits in the integer is bigger than the number of bits in the mantissa, in which case the integer gets truncated when it is converted to floating point before the division, and certain numbers get generated much more often than others.
"in the interval (0,1) and shift this to model Uniform(a,b)"
This makes things even worse. First you lose bits in one direction, then you lose bits in the other direction.
To actually generate uniformly distributed floating point numbers in an arbitrary range is harder than it looks.
I've done some experiments to figure this out myself a few years ago, learning floating point internals in the process and I've written some code with a lot of comments with reasoning here: https://github.com/art4711/random-double
In short, to generate random floating point numbers in an arbitrary range: find the end of the range with the bigger absolute value. That is the start; the other end of the range is the end. Figure out the next representable number going from start toward end; the (signed) difference between that number and start becomes the step. Calculate how many steps exist between start and end. Generate a uniformly distributed random number between 0 and the number of steps. start + step * (random number) is the answer. Also, because of how floating point works, this might not be exactly what you're looking for. All possible floating point values are most certainly not possible to generate using this method (except in very special cases). But this method guarantees that every value it can generate is equally likely.
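A condensed sketch of that procedure; rand_uint64() is a placeholder for a uniform 64-bit source, the modulo bias is ignored here, and the linked repository has the carefully reasoned version:

#include <math.h>
#include <stdint.h>

uint64_t rand_uint64(void);   /* placeholder: uniform 64-bit random integers */

/* Uniformly distributed double in [a, b] (a != b), stepping from the endpoint
 * with the larger magnitude, where representable doubles are spaced widest. */
double uniform_double(double a, double b)
{
    double start = (fabs(a) >= fabs(b)) ? a : b;
    double end   = (start == a) ? b : a;

    /* Signed distance to the next representable double, pointing toward end. */
    double step  = nextafter(start, end) - start;

    /* How many equally spaced, equally likely values fit between start and end. */
    uint64_t steps = (uint64_t)((end - start) / step);

    uint64_t k = rand_uint64() % (steps + 1);

    return start + (double)k * step;
}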
Notice that your misconceptions are very common. Almost everyone does those things. Random numbers in the industry are anything but random. The word random in computer science pretty much means "predictable, repeatable, easily breakable and exploitable, quite possibly not well distributed". And don't get me started on the quality of the "random" number generators in standard libraries. If you dig around my github stuff, you'll find a package for Go with a long README rant about this.
I'm not going to respond to the rest of your question, those bits require a book or two.
Like the title says, I'm using a random number generator on a Freescale Coldfire chip and it returns a 32 bit unsigned value. As far as I know, there is no way to configure the generator to limit the range. What would be the best way to manipulate the number to be in the accepted range?
I was thinking of modding the number by the high range value but I would still have to deal with the lower bound.
This C FAQ article, How can I get random integers in a certain range?, explains how to properly generate random numbers in the range [M, N]. Basically, the formula you should use is:
M + (random number) / (RAND_MAX / (N - M + 1) + 1)
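Applied to a full 32-bit hardware generator such as the one in the question, the same formula could look roughly like this; hw_rand32() is just a stand-in name for reading the Coldfire generator, not a real API:

#include <stdint.h>

uint32_t hw_rand32(void);   /* placeholder: raw 32-bit value from the hardware RNG */

/* Map a raw 32-bit random value into [m, n] using the C FAQ formula above.
 * Assumes m < n and that the range is not the full 32-bit span. Still slightly
 * biased for large ranges, as discussed below. */
uint32_t rand_in_range(uint32_t m, uint32_t n)
{
    uint32_t span = n - m + 1;
    uint32_t bucket = UINT32_MAX / span + 1;   /* counterpart of RAND_MAX/(N-M+1)+1 */
    return m + hw_rand32() / bucket;
}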
Stephan T. Lavavej explains why doing this is still not going to be that great:
From GoingNative 2013, "rand() Considered Harmful"
If you really care about even distribution, stick with a power of 2, or find some routines for dithering.
Yes, the traditional way is to MOD by the range, and this will be fine for many ordinary uses (simulating cards, dice, etc.) if the range is very small (a range of 52 compared to the 32-bit range of your generator is quite small). This will still be biased, but the bias will be nearly impossible to detect. If your range is bigger, the bias will be bigger. For example, if you want a random 9-digit number, the bias will be dreadful. If you want to eliminate bias altogether, there is no alternative but to do rejection sampling--that is, you must have a loop in which you generate random numbers and throw some away. If you do it right, you can keep those extra numbers needed to a minimum so you don't slow things down much.
I'm writing a Monte Carlo simulation and am going to need a lot of random bits for generating integers uniformly distributed over {1,2,...,N} where N<40. The problem with using the C rand function is that I'd waste a lot of perfectly good bits using the standard rand % N technique. What's a better way for generating the integers?
I don't need cryptographically secure random numbers, but I don't want them to skew my results. Also, I don't consider downloading a batch of bits from random.org a solution.
rand % N does not work; it skews your results unless RAND_MAX + 1 is a multiple of N.
A correct approach is to figure out the largest multiple of N that's smaller than RAND_MAX, and then generate random numbers until you get one less than that value. Only then should you do the modulo operation. This gives you a worst-case rejection ratio of about 50%.
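In code, that approach might look like the sketch below; the cutoff is computed slightly conservatively so that RAND_MAX + 1 never has to be formed:

#include <stdlib.h>

/* Uniform integer in [0, n-1] via rejection: discard the top sliver of
 * rand()'s range so the remaining values split evenly into groups of n. */
int rand_below(int n)
{
    int limit = RAND_MAX - (RAND_MAX % n);   /* multiple of n, <= RAND_MAX */
    int r;
    do {
        r = rand();
    } while (r >= limit);
    return r % n;
}

For the {1, ..., N} case in the question, 1 + rand_below(N) would do.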
In addition to Oli's answer:
If you're desperately concerned about bits, you can manage a queue of bits by hand, only retrieving as many as are necessary for the next number (i.e. ceil(log2(n))).
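A sketch of such a bit queue, assuming RAND_MAX >= 32767 so each rand() call contributes at least 15 usable bits (see the next paragraph for a caveat about which bits to trust):

#include <stdint.h>
#include <stdlib.h>

static uint32_t bitbuf;     /* buffered random bits */
static int      bitcount;   /* number of unused bits in bitbuf */

/* Return k random bits (1 <= k <= 15), refilling from rand() as needed, so
 * each draw only consumes as many bits as it really needs. */
unsigned next_bits(int k)
{
    while (bitcount < k) {
        bitbuf |= (uint32_t)(rand() & 0x7FFF) << bitcount;
        bitcount += 15;
    }
    unsigned out = bitbuf & ((1u << k) - 1);
    bitbuf >>= k;
    bitcount -= k;
    return out;
}

With N < 40, each rejection-sampling attempt then needs only ceil(log2(40)) = 6 bits instead of a whole rand() call.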
But you should make sure that your generator is good enough. Simple linear congruential generators are better in the higher bits than the lower (see comments), so your current modular division approach makes more sense there.
Numerical Recipes has a really good section on all this and is very easy to read (not sure it mentions saving bits, but it's a good general reference).
Update: if you're unsure whether it's needed or not, I would not worry about this for now (unless you have better advice from someone who understands your particular context).
Represent rand() in base 40 and take the digits as numbers. Drop any incomplete digits: that is, drop the leading digit, since it doesn't cover the full range [0..39], and drop the whole random number whenever that leading digit takes its highest possible value (e.g. if RAND_MAX in base 40 is 21 23 05 06, drop all numbers whose leading base-40 digit is 21).
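A sketch of that digit-extraction idea; it rejects a little conservatively (any draw whose leading base-40 digit equals RAND_MAX's leading digit) so the remaining digits are exactly uniform:

#include <stdlib.h>

/* Return one uniform base-40 digit (0..39), buffering every complete digit of
 * each accepted rand() draw so nothing usable is thrown away. */
int next_base40_digit(void)
{
    static int queue[16];
    static int count = 0;

    while (count == 0) {
        long p = 1;                        /* largest power of 40 <= RAND_MAX */
        while (p <= (long)RAND_MAX / 40)
            p *= 40;

        long r = rand();
        if (r / p == (long)RAND_MAX / p)   /* leading digit at its maximum: reject */
            continue;

        for (long q = p; q > 1; q /= 40)   /* unpack the complete lower digits */
            queue[count++] = (int)((r / (q / 40)) % 40);
    }
    return queue[--count];
}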
I have to generate random numbers for 3 different cases:
i. 1 die
ii. a pair of dice
iii. 3 dice
My questions:
1. Please suggest some good logic to generate random numbers for all 3 cases.
2. Does the logic change when I consider the case of 2 dice rather than 1?
3. How much does the range in which we have to generate a random number affect the logic of the random function?
If the range is small enough, you shouldn't have problems using the usual modulo method
int GetRandomInt(int Min, int Max)
{
    return (rand() % (Max - Min + 1)) + Min;
}
(where Min and Max specify a closed interval, [Min, Max])
and calling it once for each dice roll. Don't forget to call srand(time(NULL)) at the start of the application (at the start only, not each time you want to get a random number) to seed the random number generator.
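For instance, a minimal usage sketch of the function above:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void)
{
    srand(time(NULL));              /* seed once, at startup */
    int d1 = GetRandomInt(1, 6);    /* first die */
    int d2 = GetRandomInt(1, 6);    /* second die */
    printf("You rolled %d and %d\n", d1, d2);
    return 0;
}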
If the range starts to be bigger, you may have to face two problems:
First, the range of rand() obviously isn't [0, +∞), instead it's [0, RAND_MAX], where RAND_MAX is a #define guaranteed to be at least 32767. If your range (Max-Min) spans over RAND_MAX, then, with this method, you'll have some numbers that will have a zero probability of being returned.
This is more subtle: suppose that RAND_MAX is bigger than your range, but not that much bigger; let's say that RAND_MAX == 1.5*(Max-Min).
In this case, the distribution of results won't be uniform: rand() returns an integer in the range [0, RAND_MAX] (and each integer in this range should be equiprobable), but you are taking the remainder of the division by (Max-Min). This means that the numbers in the first half of your required range have twice the probability of being returned as the others: they can come from both the first and the third third of the rand() range, while the second half of the required range can come only from the second third of the rand() range.
What does this mean for you?
Probably nothing. If all you want is a dice-roll simulator, you can use the modulo method without problems, since the range involved is small, and the second problem, despite still being present, is almost irrelevant. Suppose your range is 3 and RAND_MAX is 32767: from 0 to 32765, the outcomes 0, 1 and 2 have the same probability, but going up to 32767, 0 and 1 each gain one extra source value, which is almost irrelevant: they go from a perfect 1/3 probability each (10922/32766 = 0.333...) to 10922/32768 (~0.33331) for 2 and 10923/32768 (~0.33334) for 0 and 1 (assuming that rand() provides a perfect distribution).
Anyhow, to overcome such problems, a commonly used method is to "stretch" the rand() range over a wider range (or compress it into a smaller one) using a method like this:
int GetRandomInt(int Min, int Max)
{
    return (int)((double)rand() / RAND_MAX * (Max - Min)) + Min;
}
based on the proportion rand() : RAND_MAX = X : (Max-Min). The conversion to double is necessary; otherwise the integer division between rand() and its maximum value would always yield 0 (or 1 in the rare case of rand() == RAND_MAX). It could be done in integer arithmetic by performing the product first, if RAND_MAX is small and the range is not too wide; otherwise there's a high risk of overflow.
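One way to sidestep that overflow risk is to perform the product first in a wider integer type, for example:

#include <stdlib.h>

/* Same scaling as above, but with a 64-bit intermediate so the product
 * rand() * (Max - Min) cannot overflow even for wide ranges. */
int GetRandomIntScaled(int Min, int Max)
{
    return (int)((long long)rand() * (Max - Min) / RAND_MAX) + Min;
}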
I suspect that, if the output range is bigger than the range of rand(), the "stretching" and the truncation of the floating point value (due to the conversion to int) affect the distribution in some way, but only locally (e.g. in small sub-ranges you may never get a certain number, but globally the distribution will look OK).
Notice that this method also helps to overcome a widespread limitation of the C standard library random number generator, i.e. the poor randomness of the lower bits of the returned value - which are, incidentally, exactly the bits you use when you perform a modulo operation with a small output range.
However, keep in mind that the C standard library RNG is a simple one that strives to comply with "easy" statistical rules, and as such is easily predictable; it shouldn't be used when "serious" random numbers are required (e.g. for cryptography). For such needs there are dedicated RNG libraries (e.g. the RNG part of the GNU Scientific Library), or, if you need really random numbers, there are several true random number services (one of the most famous is this), which do not use mathematical pseudo-RNGs but take their numbers from real random sources (e.g. radioactive decay).
Yeah, like DarkDust said, this sounds like homework, so, to answer your questions with that in mind, I'd say:
--> No, the logic doesn't change, no matter how many dice you include.
--> The easiest way to do this would be to make a function that gives you ONE random number and, depending on how many dice you have, call it that many times.
--> You can instead put a for loop in the function, add the values to an array, and return the array (see the sketch below).
This way, you can generate random numbers for 100 dice too.
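A sketch of that loop-and-array idea:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Fill out[] with `count` rolls of a six-sided die. */
void roll_dice(int count, int out[])
{
    for (int i = 0; i < count; i++)
        out[i] = rand() % 6 + 1;   /* fine here: the range is tiny next to RAND_MAX */
}

int main(void)
{
    srand(time(NULL));             /* seed once */
    int rolls[3];
    roll_dice(3, rolls);           /* works the same way for 100 dice */
    printf("%d %d %d\n", rolls[0], rolls[1], rolls[2]);
    return 0;
}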
Since this sounds like homework I'm just going to give hints which should be "good enough" for you (a pro would do it slightly differently): use the random() function and the % (modulo) operator. Modulo gives the "remainder" after a division.
For a simple simulation in C, I need to generate exponential random variables. I remember reading somewhere (but I can't find it now, and I don't remember why) that using the rand() function to generate random integers in a fixed range would generate non-uniformly distributed integers. Because of this, I'm wondering if this code might have a similar problem:
// generate u ~ U[0,1]
u = (double)rand() / (double)RAND_MAX;
// inverse of exponential CDF to get exponential random variable
expon = -log(1 - u) * mean;
Thank you!
The problem with random numbers in a fixed range is that a lot of people do this for numbers between 100 and 200 for example:
100 + rand() % 100
That is not uniform. But by doing this it is (or is close enough to uniform at least):
u = 100 + 100 * ((double)rand() / (double)RAND_MAX);
Since that's what you're doing, you should be safe.
In theory, at least, rand() should give you a discrete uniform distribution from 0 to RAND_MAX... in practice, it has some undesirable properties, such as a small period, so whether it's useful depends on how you're using it.
RAND_MAX is usually 32k, while the LCG that rand() uses generates pseudorandom 32-bit numbers. Thus, the lack of uniformity, as well as the short period, will generally go unnoticed.
If you require high quality pseudorandom numbers, you could try George Marsaglia's CMWC4096 (Complementary Multiply With Carry). This is probably the best pseudorandom number generator around, with an extremely long period and a uniform distribution (you just have to pick good seeds for it). Plus, it's blazing fast (not as fast as an LCG, but approximately twice as fast as a Mersenne Twister).
Yes and no. The problem you're thinking of arises when you're clamping the output from rand() into a range that's smaller than RAND_MAX (i.e. there are fewer possible outputs than inputs).
In your case, you're (normally) reversing that: you're taking a fairly small number of bits produced by the random number generator, and spreading them among what will usually be a larger number of bits in the mantissa of your double. That means there are normally some bit patterns in the double (and therefore, specific values of the double) that can never occur. For most people's uses that's not a problem though.
As far as the "normally" goes, it's always possible that you have a 64-bit random number generator, where a double typically has a 53-bit mantissa. In this case, you could have the same kind of problem as with clamping the range with integers.
No, your algorithm will work; it's using the modulus function that does things imperfectly.
The one problem is that because it's quantized, once in a while it will generate exactly RAND_MAX and you'll be asking for log(1-1). I'd recommend at least (rand() + 0.5) / (RAND_MAX + 1.0), if not a better source like drand48().
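Putting those two suggestions together, a sketch might look like:

#include <math.h>
#include <stdlib.h>

/* Exponential variate with the given mean, via the inverse CDF. The +0.5 and
 * +1.0 keep u strictly inside (0, 1), so log() never sees 0. */
double rand_exponential(double mean)
{
    double u = (rand() + 0.5) / (RAND_MAX + 1.0);
    return -log(1.0 - u) * mean;
}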
There are much faster ways to compute the necessary numbers, e.g. the Ziggurat algorithm.