I have read so many posts on this topic:
How does rand() work? Does it have certain tendencies? Is there something better to use?
How does the random number generator work in C
and this is what I got:
1) xn+1 depends on xn i.e., previous random number that is generated.
2) It is not recommended to initialize the seed more than once in the program.
3) It is a bad practice to use rand()%2 to generate either 0 or 1 randomly.
My questions are:
1) Are there any other libraries that I missed to take a look to generate a completely random number (either 0 or 1) without depending on previous output?
2) If there is any other work around using the inbuilt rand() function to satisfy the requirement?
3) What is the side effect of initializing the seed more than once in a program?
Code snippet:
srand(time(NULL));
d1=rand()%2;
d2=rand()%2;
Here my intention is to make d1 and d2 completely independent of each other.
My initial thought is to do this:
srand(time(NULL));
d1=rand()%2;
srand(time(NULL));
d2=rand()%2;
But as I mentioned earlier which is based on other posts, this is a bad practice I suppose?
So, can anyone please answer the above questions? I apologize if I completely missed an obvious thing.
Are there any other libraries that I missed to take a look to generate a completely random number between 0 and 1 without depending on previous output?
Not in the standard C library. There are lots of other libraries which generate "better" pseudo-random numbers.
If there is any other work around using the inbuilt rand() function to satisfy the requirement?
Most standard library implementations of rand produce sequences of random numbers where the low-order bit(s) have a short sequence and/or are not as independent of each other as one would like. The high-order bits are generally better distributed. So a better way of using the standard library rand function to generate a random single bit (0 or 1) is:
(rand() > RAND_MAX / 2)
or use an interior bit:
((rand() & 0x400U) != 0)
Those will produce reasonably uncorrelated sequences with most standard library rand implementations, and impose no more computational overhead than checking the low-order bit. If that's not good enough for you, you'll probably want to research other pseudo-random number generators.
All of these (including rand() % 2) assume that RAND_MAX is odd, which is almost always the case. (If RAND_MAX were even, there would be an odd number of possible values and any way of dividing an odd number of possible values into two camps must be slightly biased.)
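As a rough sketch of the high-order approach above, the comparison can be wrapped in a small function. The names high_bit and count_ones are just illustrative, and the code assumes RAND_MAX is odd, as discussed.

```c
#include <stdlib.h>

/* Returns 0 or 1 using the high-order end of rand()'s range.
 * Assumes RAND_MAX is odd, so both halves hold the same number of values. */
int high_bit(void)
{
    return rand() > RAND_MAX / 2;
}

/* Counts how many of n draws came up 1 (illustrative helper). */
int count_ones(int n)
{
    int ones = 0;
    for (int i = 0; i < n; i++)
        ones += high_bit();
    return ones;
}
```

Seed once with srand() at startup, then call high_bit() wherever a single bit is needed; count_ones() is only there to eyeball the balance over many draws.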
What is the side effect of initializing the seed more than once in a program?
You should think of the random number generator as producing "not very random" numbers after being seeded, with the quality improving as you successively generate new random numbers. And remember that if you seed the random number generator using some seed, you will get exactly the same sequence as you will the next time you seed the generator with the same seed. (Since time() returns a number of seconds, two successive calls in quick succession will usually produce exactly the same number, or very occasionally two consecutive numbers. But definitely not two random uncorrelated numbers.)
So the side effect of reseeding is that you get less random numbers, and possibly exactly the same ones as you got the last time you reseeded.
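A small sketch of why reseeding with the same value repeats the output. The helper names are made up for illustration; the only library calls are srand() and rand().

```c
#include <stdlib.h>

/* Seeds the generator and stores the first n values it produces.
 * The helper name and signature are illustrative. */
void draw_after_seed(unsigned seed, int *out, int n)
{
    srand(seed);
    for (int i = 0; i < n; i++)
        out[i] = rand();
}

/* Returns 1 if two runs with the same seed give identical sequences
 * (compares at most 16 values). */
int same_seed_repeats(unsigned seed, int n)
{
    int a[16], b[16];
    if (n > 16)
        n = 16;
    draw_after_seed(seed, a, n);
    draw_after_seed(seed, b, n);
    for (int i = 0; i < n; i++)
        if (a[i] != b[i])
            return 0;
    return 1;
}
```

Calling srand(time(NULL)) twice within the same second is the same situation: both calls pass the same seed, so the values that follow are identical.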
1) Are there any other libraries that I missed to take a look to
generate a completely random number between 0 and 1 without depending
on previous output?
This sub-question is off-topic for Stack Overflow, but I'll point out that POSIX and BSD systems have an alternative random number generator function named random() that you could consider if you are programming for such a platform (e.g. Linux, OS X).
2) If there is any other work around using the inbuilt rand() function
to satisfy the requirement?
Traditional computers (as opposed to quantum computers) are deterministic machines. They cannot do true randomness. Every completely programmatic "random number generator" is in practice a pseudo-random number generator. They generate completely deterministic sequences, but the values from a given set of calls are distributed across the generator's range in a manner approximately consistent with a target probability distribution (ordinarily the uniform distribution).
Some operating systems provide support for generating numbers that depend on something more chaotic and less predictable than a computed sequence. For instance, they may collect information from mouse movements, CPU temperature variations, or other such sources, to produce more objectively random (yet still deterministic) numbers. Linux, for example, has such a driver that is often exposed as the special file /dev/random. The problem with these is that they have a limited store of entropy, and therefore cannot provide numbers at a sustained high rate. If you need only a few random numbers, however, then that might be a suitable source.
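If only a few bits are needed, reading the device file directly might look something like the sketch below. It assumes a Unix-like system; the path /dev/urandom (the non-blocking sibling of /dev/random) and the function name are my assumptions, not something from the question.

```c
#include <stdio.h>

/* Reads one byte from the OS entropy device and returns its low bit.
 * Returns -1 if the device cannot be opened or read.
 * The path is an assumption: /dev/urandom on Linux and similar systems. */
int os_random_bit(void)
{
    unsigned char byte;
    FILE *f = fopen("/dev/urandom", "rb");
    if (f == NULL)
        return -1;
    size_t got = fread(&byte, 1, 1, f);
    fclose(f);
    return got == 1 ? (byte & 1) : -1;
}
```

Opening the file on every call is wasteful; for more than a handful of bits it would make sense to read a buffer once and hand out bits from it.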
3) What is the side effect of initializing the seed more than once in
a program?
Code snippet:
srand(time(NULL));
d1=rand()%2;
d2=rand()%2;
Here my intention is to make d1 and d2 completely independent of each
other.
My initial thought is to do this:
srand(time(NULL));
d1=rand()%2;
srand(time(NULL));
d2=rand()%2;
But as I mentioned earlier which is based on other posts, this is a
bad practice I suppose?
It is indeed bad if you want d1 and d2 to have a 50% probability of being different. time() returns the number of seconds since the epoch, so it is highly likely that it will return the same value when called twice so close together. The sequence of pseudorandom numbers is completely determined by the seed (this is a feature, not a bug), and when you seed the PRNG, you restart the sequence. Even if you used a higher-resolution clock to make the seeds more likely to differ, you don't escape correlation this way; you just change the function generating numbers for you. And the result does not have the same guarantees for output distribution.
Additionally, when you do rand() % 2 you use only one bit of the approximately log2(RAND_MAX) + 1 bits that it produced for you. Over the whole period of the PRNG, you can expect that bit to take each value the same number of times, but over narrow ranges you may sometimes see some correlation.
In the end, your requirement for your two random numbers to be completely independent of one another is probably way overkill. It is generally sufficient for the pseudo-random result of one call to have no apparent correlation with the results of previous calls. You probably achieve that well enough with your first code snippet, even despite the use of only one bit per call. If you prefer to use more of the bits, though, then with some care you could base the numbers you choose on the parity of the count of how many bits are set in the values returned by rand().
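The parity idea could be sketched roughly like this. The function names are illustrative, and the result is only evenly split if RAND_MAX is one less than a power of two, which is an assumption about the implementation.

```c
#include <stdlib.h>

/* Returns 1 if an odd number of bits are set in v, else 0. */
int parity(unsigned v)
{
    int p = 0;
    while (v != 0) {
        p ^= 1;
        v &= v - 1;   /* clear the lowest set bit */
    }
    return p;
}

/* One pseudo-random bit that depends on every bit rand() returned. */
int parity_bit(void)
{
    return parity((unsigned)rand());
}
```

This folds all of the bits of one rand() result into a single bit, so weak low-order bits are mixed with the better-behaved high-order ones.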
Use this
(double)rand() / (double)RAND_MAX
completely random number ..... without depending on previous output?
Well in reality computers can't generate completely random numbers. There has to be some dependencies. But for almost all practical purposes, you can use rand().
side effect of initializing the seed more than once
No side effect as such, but it defeats the point of using rand(). If you re-initialize the seed every time, the random number depends more on the time (and the processor) than on the generator.
any other work around using the inbuilt rand() function
You can write something like this:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(int argc, char *argv[])
{
    srand(time(NULL));  /* seed once, at startup */
    /* each call prints a value in [0, 1] */
    printf("%f\n", (double)rand() / (double)RAND_MAX);
    printf("%f\n", (double)rand() / (double)RAND_MAX);
    return 0;
}
If you want to generate either a 0 or a 1, I think using rand()%2 is perfectly fine, as the probability of an even number is the same as the probability of an odd number (all numbers are equally probable for an unbiased random number generator).
I am playing around with a pair of algorithms I found on other SO posts (listed in the references below), and am trying to find out how to improve the distribution. I am effectively extending the range of a random number by doubling the number of bits, and want to ensure that the distribution is as uniform as possible, while removing (or at least reducing) the effects of modulo bias and other artifacts for a shuffling algorithm that would be using the result of my modified random number generator.
So, it is to my understanding that if I initialize my RNG with a constant seed (ie: srand(1)) I will get the same pattern of deterministic outputs from calling rand() in a for loop. Now, if I were to initialize my seed via srand(time(NULL)), it would be a different pattern, but it still might not help me with the following problem: I am trying to figure out if I were to implement the following algorithm:
Take two random numbers a,b
Calculate a*(RAND_MAX+1)+b
Would I be able to:
Generate every possible coordinate pair (a,b), where a,b ∈ Z+ on [0, RAND_MAX] (a and b are positive integers between zero and RAND_MAX inclusive).
Maximize the uniformity of the entire distribution (ie: an optimally flat histogram).
While the output of rand() is supposed to be uniformly distributed, I don't know if it's guaranteed to give me values for N,N+1 calls to rand each loop and give me every pair listing in point (1) before the random sequence repeats itself again. My new random number generator could theoretically generate random values on [0, RAND_MAX ^ 2], but I don't know if there might be "holes" aka values in this range that can never be generated by my algorithm.
I attempted to get further on this question myself, but I couldn't find information on how long a random sequence generated by rand() in C goes until it repeats itself. Lacking this and other information, I couldn't figure out whether or not it is possible to generate every pair (a,b).
So, using rand(), is it possible to achieve point (1) and if so, are there any solid suggestions on how to optimize its "randomness" according to point (2)?
Thank you for your time and assistance.
Update
I later revisited this problem and simulated it using an 8-bit PRNG. While it could indeed generate every possible coordinate pair, the distribution was actually quite interesting, and definitely not uniform. In the end, I read several articles/papers on PRNGs and used a Mersenne Twister algorithm to generate the additional bits needed (i.e. MT19937-64).
References
Extend rand() max range, Accessed 2014-05-07, <https://stackoverflow.com/questions/9775313/extend-rand-max-range>
Shuffle array in C, Accessed 2014-05-07, <https://stackoverflow.com/questions/6127503/shuffle-array-in-c>
Assumptions
As pointed out in the comments, the behaviour of rand() is implementation dependent. So, let's make a few simplifying assumptions to get to the point of the question:
rand() can generate all values from 0 to RAND_MAX. Justification: If it could not, then it would be even harder to generate all possible pairs (a, b).
rand() generates a statistically random sequence. Justification: The result of composing two random functions (or the same one twice) is only as good as the base random function.
Of course, we shouldn't expect the result to be better than the building blocks, so any deficiencies in the rand() implementation will reflect itself in any functions composed from it.
Holes in the distribution
Seeding rand() generates a deterministic sequence for a given seed, as the seed determines the PRNG's initial state. The sequence's maximum period is 2^N, where N is the number of bits in the state. Note that the state may in fact have more bits than RAND_MAX; we'll assume RAND_MAX = 2^N - 1 for this section. Because it is a sequence, generating two successive "random" N-bit values a and b means that a ≠ b. Therefore, the method a*(RAND_MAX+1)+b will have some holes.
A little explanation on a ≠ b: PRNGs work by maintaining an internal state of N bits. It uses that state uniquely to determine its next state, so once the same state recurs, the sequence starts to repeat itself. The number of states gone through before the sequence starts repeating itself is called the period. So, technically we could have a = b, but that implies a period of 1, and that's a very bad PRNG. For more information, a helpful answer on PRNG periods has been posted on the Software Engineering site.
An algorithm without holes
One way to allow successive "random" calls to be equal is to generate 2 N-bit numbers, but consider only a certain number of bits significant, i.e. discard some. Now, we can have a = b, though with very slightly less probability than another random number c. Note this is similar to how Java's random number generator works. It is seeded with 48 bits, but outputs a 32-bit random number, discarding 16 bits (assuming # of bits in seed = # of bits in state).
However, since you need values larger than RAND_MAX, what you could do is use the above method, and then concatenate the bits, until you get enough bits to reach the desired maximum (though again, the distribution is not quite uniform).
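For reference, the concatenation a*(RAND_MAX+1)+b from the question can be written without overflow along these lines. This is only a sketch: it assumes int is at most 32 bits wide, the function names are made up, and it inherits whatever weaknesses rand() has (including the non-uniformity discussed above).

```c
#include <stdint.h>
#include <stdlib.h>

/* Combines two draws as a * (RAND_MAX + 1) + b in 64-bit arithmetic.
 * Assumes RAND_MAX fits in 31 bits, so the product cannot overflow. */
uint64_t combine(uint64_t a, uint64_t b)
{
    return a * ((uint64_t)RAND_MAX + 1u) + b;
}

/* Draws a and b in separate statements so the call order is well defined. */
uint64_t wide_rand(void)
{
    uint64_t a = (uint64_t)rand();
    uint64_t b = (uint64_t)rand();
    return combine(a, b);
}
```

The largest value this can return is (RAND_MAX+1)^2 - 1, reached when both draws equal RAND_MAX.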
I have a program, in C, which creates an array of 1000 integers using a random number from 0-999 and then performs some sort of algorithm on that array. In order to test the algorithm's running time, I try running the program 10000 times, but every time I run it the array is the same for a few arrays and then it changes. I have used the srand() function to feed the seed with the current time, but it still does not help.
Is there an alternative solution to rand() or a way to fix this?
My function is:
void getarray(int *ptr1, int size, int option){
int n;
srand(time(NULL));
for(n=0; n<size; n++)
*(ptr1+n) = *(ptr2+n)= rand()%1000;
}
Thanks in advance!
You should only call srand once: on program startup.
Right now, if you call your function multiple times before time changes your sequence will be the same.
The lrand48() call tends to have a lot more state internally and a better pseudorandom number distribution.
However, note that you're reseeding with only 1-second granularity, so calls within the same second will generate the same sequence. Put your srand() call in main() or somewhere, once, instead of recalling it in getarray.
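Something along these lines is what I mean by moving the seeding out of the fill routine. It is a simplified sketch: the names fill_random and seed_once are mine, and the second pointer from the question is left out.

```c
#include <stdlib.h>
#include <time.h>

/* Fills a[0..size-1] with values in 0..999. It does not touch the seed. */
void fill_random(int *a, int size)
{
    for (int n = 0; n < size; n++)
        a[n] = rand() % 1000;
}

/* Call once at program startup, for example first thing in main(). */
void seed_once(void)
{
    srand((unsigned)time(NULL));
}
```

With seed_once() called a single time in main(), each later call to fill_random() continues the same sequence, so consecutive arrays differ even when they are generated within the same second.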
You should investigate very carefully if rand() is the better function for the job.
It varies by compiler and platform, but it is often implemented as a "linear congruential generator". These are very convenient in terms of speed and memory usage but have poor statistical properties (i.e. given a long enough sequence, you can tell whether it was generated by a congruential generator or is truly random).
In your use case (testing an algorithm's speed) it may be perfectly fine to use rand(), as long as the execution is not influenced by the statistical properties of the data. If rand() is a linear congruential RNG, number sequences show a pattern, which means that at any given time it is not true that all the numbers are equiprobable. A nice example is in this wikipedia picture:
Your system might also have a RNG (e.g. /dev/random) and its associated functions, but be aware that those are meant to produce a few high-quality random numbers and may be pretty slow to use. You might even run out of numbers and end up waiting for the system to collect more entropy!
A simple, pretty fast RNG with statistical properties good enough for cryptography is ISAAC. Personally I use it whenever I need decent random numbers.
Another alternative is to use true random numbers as those generated by RANDOM.org or HotBits but it may be overkill in your case.
As a side note, RANDOM.ORG has a nice page on RNG with another example of "patterns" created by the PHP rand() function
First, call the seed function once, not in a loop.
Second, I suggest you either :
1) Switch to the random(3) function
2) Pick something from rand48 / lrand48
3) Read the desired number of bytes from /dev/random yourself.
Solution 1) is easy and somewhat portable. 2) needs a bit of thinking. 3) is the most work, and the least portable.
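Options 1) and 2) could look roughly like the sketch below on a POSIX system. The wrapper names are made up, and the _XOPEN_SOURCE line is an assumption about what your compiler needs to expose these declarations; it has to come before any #include.

```c
#define _XOPEN_SOURCE 600   /* ask for the POSIX/XSI declarations */
#include <stdlib.h>
#include <time.h>

/* Option 1: random(), seeded with srandom(). Returns a value in 0..999. */
int roll_random(void)
{
    return (int)(random() % 1000);
}

/* Option 2: lrand48(), seeded with srand48(). Returns a value in 0..999. */
int roll_lrand48(void)
{
    return (int)(lrand48() % 1000);
}

/* Seed both generators once at startup. */
void seed_both(void)
{
    srandom((unsigned)time(NULL));
    srand48((long)time(NULL));
}
```

Either wrapper can stand in for rand()%1000; the point about seeding only once still applies to both.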
I work in C and I am trying (desperately) to make a random generator that not only generates a different number every time I run the generator but also a different sequence every time I run the program. I tested almost everything I found online and ended up with 2 good ways to make a good random generator.
The first one is to use a different seed every time. But this means that I have to use a different, random seed every time, a problem that I didn't solve at first. Here is what I am trying now, but it's not as truly random as I want:
int myrand(int random_seed){
random_seed = random_seed * 1103515245 +12345;
return (unsigned int)(random_seed / 65536) % 32768;
}
Every time i call the function i increase the seed by 1.
The second way is to use time. Time changes, and this is randomness. I also tried many ways to implement this. My latest try is here:
Compiler error-Possible IDE error"undefined reference to gettimeofday error"
but I couldn't use the gettimeofday function because I work in Windows. Also, I didn't get any answers to that question.
Could anyone help me with how I can implement a random generator (probably using time) in C working in Windows? Or should I use Unix?
Seed your RNG with a good source of entropy.
Under unix, use /dev/random.
Under windows, use something like CryptoAPI - Windows equivalent of /dev/random
What you are asking for is not a random number generator, but how to use the random number generator already included in the C standard library.
All you need to do is seed it once at program startup:
srand(time(NULL));
That's all. It's portable and will give you a different sequence every time you run the program, given that at least one second has passed since the last time you ran it.
There is no harm in seeding it again later, but no point in it either.
The C standard library has the header time.h (or ctime if you are using C++)(reference). The functions there will be supported in Windows and Unix.
I would recommend time() or clock() as seed for your random number generator.
Another way to get totally random input is to use the mouse position or other things influenced from outside.
There are many ways to implement a PRNG, but unfortunately none of them is a real random number generator. time(NULL) is a good approach, but I'm using "Blum Blum Shub", which generates one random bit at a time.
Since you're asking explicitly for a Windows solution, I'd suggest avoiding time(NULL) or clock() as a seed for srand(), since their granularity is very limited (a second for time(), about a millisecond for clock()). Instead you could use the result of the performance counter:
LARGE_INTEGER PerformanceCount;
QueryPerformanceCounter(&PerformanceCount);
srand(PerformanceCount.LowPart);
The rate at which QueryPerformanceCounter() increments can be obtained by a call to QueryPerformanceFrequency(). It is typically at least 1 MHz and sometimes even in the GHz range. Therefore it provides a fast-changing source for the seed.
Edit: As I understand from your earlier question, a gettimeofday()-like implementation won't give fine granularity either. It may have tv_usec in its argument, but on Windows it will not provide microsecond granularity as it does on Linux systems.
quote:
to make a random generator that not only generates a different number every time i run the generator
Definitions of random do not include that concept. Rather the idea is that you have an equal chance of selecting any number, regardless of the number previously chosen. Which means it is theoretically possible to pick the same number twice.
If you are dealing a deck of cards then that meets your criterion of no duplicates. Using the deck dealing approach means keeping track of "used" numbers.
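A minimal sketch of the deck-dealing approach, using a Fisher-Yates shuffle (my choice of technique, the function name is illustrative). It uses rand() % (i + 1), which carries a slight modulo bias that is ignored here.

```c
#include <stdlib.h>

/* Fills deck with 0..n-1 and shuffles it in place (Fisher-Yates).
 * rand() % (i + 1) has a slight modulo bias; that is accepted here. */
void deal(int *deck, int n)
{
    for (int i = 0; i < n; i++)
        deck[i] = i;
    for (int i = n - 1; i > 0; i--) {
        int j = rand() % (i + 1);   /* pick a slot in 0..i */
        int tmp = deck[i];
        deck[i] = deck[j];
        deck[j] = tmp;
    }
}
```

Reading the array front to back then gives every number exactly once, in a shuffled order, without having to track "used" numbers separately.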
You should also be aware that PRNGs (pseudorandom number generators) are cyclic (periodic). After you have generated enough numbers, usually a large number, you start all over again and repeat exactly the same sequence of numbers. The UNIX rand() function generates integers in the range [0, {RAND_MAX}] and has a period of at least 2^32.
Really consider reading this short page:
See: http://pubs.opengroup.org/onlinepubs/009695399/functions/rand.html
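To make the periodic behaviour concrete, here is a toy 8-bit linear congruential generator whose cycle is short enough to count. It is not how any real rand() is implemented; the constants 5 and 3 are simply illustrative values that give the full cycle of 256 states.

```c
/* One step of a toy 8-bit linear congruential generator.
 * The constants 5 and 3 are illustrative, not taken from any library. */
unsigned char lcg8_next(unsigned char x)
{
    return (unsigned char)(5u * x + 3u);   /* arithmetic mod 256 */
}

/* Counts steps until the state returns to its starting value. */
int lcg8_period(unsigned char start)
{
    unsigned char x = start;
    int steps = 0;
    do {
        x = lcg8_next(x);
        steps++;
    } while (x != start);
    return steps;
}
```

Whatever the starting value, the generator walks through all 256 states and then repeats exactly the same sequence, which is the small-scale version of what happens with a real PRNG.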
I am reading a C book. In the description of the rand() function, they say:
rand returns a pseudo-random integer in the range 0 to RAND_MAX. RAND_MAX is implementation dependent but at least 32767.
I don't understand; what is a "pseudo-random integer"?
Thanks.
Informally, a pseudorandom number is a number that isn't truly random, but is "random enough" for most purposes.
Computers are inherently deterministic devices. The processor executes specific commands in a specific order, and programs control how the processor does so. Consequently, it's hard for programs to generate random numbers because no deterministic process can create a random number. Thus what many programs do is use a pseudorandom number generator, which is a function that produces numbers according to some deterministic formula that appear to be random but actually are not. Most programming languages provide some sort of pseudorandom number generator for general programming use, and when true randomness isn't needed they work just fine.
However, they have their limitations. In cryptographic settings, in many cases true randomness is required in order to prevent attackers from guessing the workings of a system and compromising it, for example. In this case, it is possible to get truly random numbers by using specialized hardware that can amplify background noise or use quantum effects. This sort of randomness is extremely hard to generate, though, and so it's not commonly used unless absolute unpredictability is required.
From wiki entry : Pseudorandom number generator
A pseudorandom number generator
(PRNG), also known as a deterministic
random bit generator (DRBG), is an
algorithm for generating a sequence of
numbers that approximates the
properties of random numbers.
In other words, it's approximately random (to a known extent), but not truly random in the sense of random noise from a physical source.
It means that the number may seem to be random, but the truth is that a computer can't generate a truly random number; it works out a number based on logic that tends to look random.
Like observing the time difference between keystrokes, and then using it as just one input in calculating a number. Such things combined with several other inputs tend to give random numbers, but in reality it's just an algorithm which tends to give random numbers. If the same conditions are matched, it will give the same number.
To add some pedantry to the other correct answers that you've already received, there really is no such thing as a "pseudorandom integer" (or for that matter, a random integer). This is the point that John von Neumann was making in his famous quote (which is usually misleadingly abbreviated to just the first sentence):
"Any one who considers arithmetical
methods of producing random digits is,
of course, in a state of sin. For, as
has been pointed out several times,
there is no such thing as a random
number — there are only methods to
produce random numbers, and a strict
arithmetic procedure of course is not
such a method"
Consider the number 7. Is it a random integer? If it is, is it a pseudorandom integer or a true random integer? What about the number -10? Clearly these questions don't make sense (obligatory link to xkcd 221).
So, as others have pointed out, what the book actually means is that the number is generated by a pseudorandom (i.e. deterministic) number generator as opposed to being obtained from a truly random sequence. There are several different pseudorandom number generator algorithms and the best ones generate output that is statistically indistinguishable from a true random sequence.
It means that it generates a number sequence that exhibits the properties of a random number in terms of distribution, but which is mathematically generated and deterministic in its output for any particular seed value.
You can see this by creating a program that uses such a function to generate a short sequence, and observing that each time you run it, it generates the same sequence. This behaviour surprises the unsuspecting, but is also quite a useful property in testing code. When a less predictable sequence is required, one normally initialises the PRNG with a seed value that is itself unpredictable, often derived from the current system time, since it is not normally predictable when a program will be started.
Pseudo random number generators are not normally suitable for security applications such as data encryption, or on-line gambling applications, where their ultimate predictability could be a serious weakness.
In most computers, a random number is considered pseudorandom because it is not completely random.
pseudo=fake
Basically, the current time is used to generate a random number, so it can be predicted what number will be generated initially, and for most applications, this is fine.
I don't get it. If it has a fixed length, choosing the lags and the mod over and over again will give the same number, no?
To be precise, the lagged Fibonacci is a pseudo-random number generator. It's not true random, but it's much better than, say, the more commonly used linear congruential generator (the standard generator for C++, Java, etc). I'm not sure why you think it will give the same number all over again, but it's true that, like all pseudo-random number generators, it has a period after which the sequence of numbers will repeat again.
The multiplicative LFG has a period of (2^k - 1)*2^(M-3). For practical parameters, this is actually quite huge (LCG's period is only M).
The only catch with LFG is that the initialization procedure is very complex, and the mathematics behind it is incomplete. It's best to consult the literature for good choice of parameters and recommended procedure for proper seeding.
As an illustration, a multiplicative LFG with parameters (j=31, k=52) and modulus m=2^32 is seeded with an array of 52 32-bit numbers.
Additional references:
http://sprng.fsu.edu/Version4.0/generators.html
More details on this generator and the seeding algorithms can be found in papers by Mascagni, et al.
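To show the mechanics only, here is a sketch of a multiplicative LFG with the parameters from the illustration above (j=31, k=52, m=2^32). The seeding routine is deliberately naive and is my own placeholder, not the recommended procedure from the literature; the LCG constants in it are just illustrative.

```c
#include <stdint.h>

#define LFG_J 31
#define LFG_K 52

typedef struct {
    uint32_t s[LFG_K];  /* the last LFG_K outputs, oldest at index pos */
    int pos;
} lfg_t;

/* Naive seeding for illustration only: fills the state from a small LCG
 * and forces every word odd, since a multiplicative LFG needs odd values.
 * This is NOT the recommended procedure from the literature. */
void lfg_seed(lfg_t *g, uint32_t seed)
{
    for (int i = 0; i < LFG_K; i++) {
        seed = seed * 1664525u + 1013904223u;
        g->s[i] = seed | 1u;
    }
    g->pos = 0;
}

/* x[n] = x[n-j] * x[n-k] mod 2^32; uint32_t arithmetic wraps mod 2^32. */
uint32_t lfg_next(lfg_t *g)
{
    uint32_t oldest = g->s[g->pos];                            /* x[n-k] */
    uint32_t lagged = g->s[(g->pos + LFG_K - LFG_J) % LFG_K];  /* x[n-j] */
    uint32_t x = oldest * lagged;
    g->s[g->pos] = x;                 /* newest value replaces the oldest */
    g->pos = (g->pos + 1) % LFG_K;
    return x;
}
```

The state is a window of the last 52 outputs that moves forward on every draw, so the inputs to the multiplication keep changing even though the lags and the modulus stay fixed.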
It's not random, it's pseudorandom
From this http://en.wikipedia.org/wiki/Lagged_Fibonacci_generator
Lagged Fibonacci generators have a maximum period of (2^k - 1)*2^(M-1) if addition or subtraction is used, and (2^k-1) if exclusive-or operations are used to combine the previous values. If, on the other hand, multiplication is used, the maximum period is (2^k - 1)*2^(M-3), or 1/4 of period of the additive case.
So, given a certain seed value, the sequence of output values is predictable and repeatable, and it has a cycle. It will repeat if you wait long enough - but the cycle is quite large.
For an observer that doesn't know the seed value, the sequence appears to be quite random so it can be useful as a source of "randomness" for simulations and other situations where true randomness isn't required.
It's random in the same way that any pseudorandom number generator is--which is to say, not at all.
However, lagged fibonacci (and all linear feedback shift register PRNGs) improve on a basic linear congruential generator by increasing the state size. That is, the next value depends on several former values, rather than just the immediate previous one. Combined with a decent seed you should be able to get fairly decent results.
Edit:
From your post, it isn't clear that you understand that the underlying state is stored in a shift register, meaning that it isn't static but updated (by shifting each value one place to the left, dropping the leftmost value, and appending the most recent value on the right side) after each draw. In this way, drawing the same number over & over again is avoided (for most seed values, at least).
It all depends on the seed. Most random number generators do give the same sequence of numbers for a fixed seed value.
Random number generators are often one-to-one functions where for every input there is a constant output. To make it "random" you have to feed it a seed (which must be "random"), like the system time or the values of computer memory locations, for example.
If you're wondering why you don't just straight up use the seed (the time, etc.), it's because the time is sequential (1,2,3,4) whereas most pseudorandom number generators spit out numbers that appear random (8, 27, 13, 1). That way if you're generating pseudorandom numbers in a loop (which happens very fast), you're not just getting {1,2,3,4}...