I'm new to C and multithreaded programming, and I was reading a book which describes thread-unsafe functions as follows:
A pseudorandom number generator is a simple example of this class of thread-unsafe functions. Consider the pseudorandom number generator:
unsigned next_seed = 1;
/* rand - return pseudorandom integer in the range 0..32767 */
unsigned rand(void)
{
next_seed = next_seed*1103515245 + 12543;
return (unsigned)(next_seed>>16) % 32768;
}
/* srand - set the initial seed for rand() */
void srand(unsigned new_seed)
{
next_seed = new_seed;
}
The rand function is thread-unsafe because the result of the current invocation depends on an intermediate result from the previous iteration. When we call rand repeatedly from a single thread after seeding it with a call to srand, we can expect a repeatable sequence of numbers. However, this assumption no longer holds if multiple threads are calling rand. The only way to make a function such as rand thread-safe is to rewrite it so that it does not use any static data, relying instead on the caller to pass the state information in arguments.
I'm a little confused. My questions are:
Q1: We can add sem_wait (P operation) and sem_post (V operation) calls to protect the next_seed global variable, and then we get the same repeatable result as with a single thread. Why does the author say we need to rewrite the whole function?
Q2: Since we want to generate random numbers, we don't really expect a repeatable sequence of numbers, so why is the rand function thread-unsafe even though it behaves correctly?
Since we want to generate random numbers, therefore we don't really expect a repeatable sequence of numbers
There may be some cases where "you" really don't care, but there are also a lot of cases where you want to be able to create exactly the same random sequence.
Example 1:
Assume you have written a program that runs fine again and again but then suddenly crashes one day. You'd like to debug the crash, so you rerun the program. Now it doesn't crash... and it doesn't crash the next 10,000 times, but then...
How will you ever debug that if you're not able to generate exactly the same random sequence?
Example 2:
Random sequences are often used (together with e.g. various coverage measurements) for testing. It can be testing of your program but it can also be testing of external things like RTL code for FPGAs/ASICs. When a test case fails, you want to be able to redo that exact same test.
Again: How will you ever debug that if you're not able to generate exactly the same random sequence?
For both of the above examples what you'll do is to print/log the seed at start up so that you can later rerun using that seed and thereby get the exact same sequence.
Regarding your Q1, the answer is more or less the same. Protection by mutexes/semaphores prevents the data race on next_seed, but it will not help you get the same sequence in each thread, because you can't control the order in which the threads get access to the function generating the random numbers.
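What the book means by "relying on the caller to pass the state information in arguments" can be sketched roughly like this. It is only an illustration: the name rand_with_state and the idea of one state variable per thread are assumptions, not the book's code, although the constants are taken from the snippet quoted in the question.

```c
/* Hypothetical reentrant variant of the book's rand(): the generator state
   is owned by the caller instead of living in a global variable. */
unsigned rand_with_state(unsigned *state)
{
    *state = *state * 1103515245u + 12543u;
    return (*state >> 16) % 32768u;
}
```

If every thread keeps its own `unsigned state = seed;` and calls `rand_with_state(&state)`, the numbers a thread sees depend only on its own seed, no matter how the threads are scheduled.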
I quite like being able to generate the same set of pseudo-random data repeatedly, especially with tweaking experimental code. Through observation I would say that rand() seems to give the same sequence of numbers each time*.
Is it guaranteed to do this for repeated executions on the same machine / for different machines / for different architectures?
*For the same seed obviously.
Yes, given the same environment for the program. From the C standard §7.20.2.2/2,
The srand function uses the argument as a seed for a new sequence of pseudo-random numbers to be returned by subsequent calls to rand. If srand is then called with the same seed value, the sequence of pseudo-random numbers shall be repeated. If rand is called before any calls to srand have been made, the same sequence shall be generated as when srand is first called with a seed value of 1.
Of course, this assumes it is using the same implementation detail (i.e. same machine, same library at the same execution period). The C standard does not mandate a standard random number generating algorithm, thus, if you run the program with a different C standard library, one may get a different random number sequence.
See the question Consistent pseudo-random numbers across platforms if you need a portable and guaranteed random number sequence with a given seed.
It is guaranteed to give the same sequence for the same seed passed to srand() - but only for the duration of a single execution of the program. In general, if an implementation has a choice in behaviour, there is no specific requirement for that choice to remain the same across subsequent executions.
It would be conforming for an implementation to pick a "master seed" at each program startup, and use that to perturb the pseudo-random number generator in a way that is different each time the program starts.
If you wish for more determinism, you should implement a PRNG with specific parameters in your program.
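As a rough sketch of what "a PRNG with specific parameters" could look like, here is a 32-bit xorshift generator using Marsaglia's 13/17/5 shift triple. The choice of this particular generator is my assumption, not something the answer prescribes; any fixed algorithm built on fixed-width integers gives the same cross-platform determinism.

```c
#include <stdint.h>

/* 32-bit xorshift generator (Marsaglia's 13/17/5 shift triple).
   The state must be nonzero: a zero state stays zero forever. */
uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}
```

Because the arithmetic is defined entirely on uint32_t, a state of 1 yields 270369 as the first value wherever the code is compiled.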
No.
The C standard says:
If srand is then called with the same
seed value, the sequence of
pseudo-random numbers shall be
repeated.
But nowhere does it say what the sequence of pseudo-random numbers actually is - so it differs across implementations.
The only guarantee made is that rand() will give the same sequence of numbers for a given seed for a given implementation. There's no guarantee that the sequence will be the same across different machines or different architectures - and it almost certainly won't be.
If you need to use the exact same set of pseudo-random numbers for experimental purposes, one thing you could do is to use rand (after seeding it with srand) to generate a long sequence of random numbers and write them to a file/database. Then, write a portable "random number generator" function that returns values sequentially from that file. That way, you can be assured that you are using the same input data regardless of the platform, rand implementation, or seed value.
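A minimal sketch of that record-and-replay idea, assuming one decimal value per line as the storage format. The function names save_sequence and next_stored are made up for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Write n values from rand(), seeded with seed, one per line.
   Returns 0 on success, -1 on a write error. */
int save_sequence(FILE *out, unsigned seed, size_t n)
{
    srand(seed);
    for (size_t i = 0; i < n; i++)
        if (fprintf(out, "%d\n", rand()) < 0)
            return -1;
    return 0;
}

/* Fetch the next stored value. Returns 0 on success, -1 when the data
   runs out or cannot be parsed. */
int next_stored(FILE *in, int *value)
{
    return fscanf(in, "%d", value) == 1 ? 0 : -1;
}
```

The file is generated once on one machine and shipped with the experiment; every other platform only calls next_stored and never touches its own rand().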
When switching to a different machine/runtime/whatever you might be out of luck. Another possible choice is the drand48 family of functions. These are normalized to use the same algorithm on all machines.
If you are in a UNIX/Linux environment, you can look up drand48() and srand48() in your man pages; if you are not, you can consult online manuals for the C language.
The prototypes can be found in /usr/include/stdlib.h.
drand48() uses the linear congruential method, which is frequently used in simulations.
If you provide the same seed to srand48(), e.g. srand48(2), and then call drand48() in a for loop, the sequence will be the same every time.
For example:
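#include <stdio.h>
#include <stdlib.h>   /* declares srand48() and drand48() */

int main(void){
    int i;
    double randNum;
    srand48(2);                     /* fixed seed, so the output repeats */
    for(i=0; i<10; i++){
        randNum = drand48();        /* uniform double in [0.0, 1.0) */
        printf("%.6f\n", randNum);
    }
    return 0;
}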
I know that in order to avoid repeating the same output of the rand() function a pseudo-random number generator must be seeded with the srand function. That means, if I try say srand(1), the output of the rand() will be one value, if I try srand(2), the output will contain another value. But when I try the first argument again like srand(1), the value will be the same as in the first output. This issue made me think that all random values would be predictable in some way. Is it possible to have different output for the same seed (say if I try the same seed tomorrow)? Or are random values predictable indeed?
With the traditional definition of a pseudorandom generator, if you know what the generator has been seeded with, then the sequence of output values is completely determined and not random. This means that if you knew the seed for a random generator, then you could predict every single output that generator would produce from that point forward. (A good random number generator is one where seeing a sequence of outputs of the generator does not let you easily reverse-engineer what the random seed is or predict other values.)
I seem to remember reading that, a while back, some popular poker websites were not doing a good job choosing their random seeds. Some people figured out that you could input the pattern of cards you were seeing, and the system could then reverse-engineer the random seed and let you predict all the future cards. Oops. These days, we have cryptographically secure pseudorandom generators based on encryption routines that, at least when it comes to what's known in the open literature, can't be predicted even if you have gigabytes of random bits of output from the generators.
If you do need to get something that really isn't predictable - that is, you want to get a bunch of truly random bits - you'll need to use something other than a pseudorandom number generator. Most operating systems have some mechanism in place to generate values that do appear to be truly random. They might, for example, look at how long it takes for different capacitors to discharge on the motherboard, or factor in timing information from a clock, or see how the user interacts with the keyboard, etc. These data can be fed into something called an entropy accumulator that slowly builds up more and more random bits. If you need a value that's truly random and can't be predicted in advance, you can check your particular OS for the mechanism used to get data from the entropy accumulator. (You can read from /dev/random on UNIX-style machines, for example.)
Often, pulling data from the entropy accumulator takes time, since the computer has to wait long enough for enough different sources to mix together to give you back high-quality random data. A common strategy, therefore, is to use the entropy accumulator to get a high-quality random seed, then "stretch" the randomness by using it as the seed of a strong pseudorandom generator.
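The "seed from the entropy accumulator, then stretch" strategy might look like this in its simplest form. This is a sketch under assumptions: it reads /dev/urandom, which only exists on UNIX-like systems, and it uses rand() merely as a stand-in for the strong pseudorandom generator the paragraph has in mind. The function names are made up.

```c
#include <stdio.h>
#include <stdlib.h>

/* Try to fill *seed with bytes from the OS entropy source.
   Returns 0 on success, -1 if the device is missing or the read is short.
   The path is an assumption about the platform. */
int seed_from_os(unsigned *seed)
{
    FILE *f = fopen("/dev/urandom", "rb");
    if (f == NULL)
        return -1;
    size_t got = fread(seed, sizeof *seed, 1, f);
    fclose(f);
    return got == 1 ? 0 : -1;
}

/* Seed rand() from the OS if possible. Returns 0 on success, -1 otherwise. */
int seed_rand_from_os(void)
{
    unsigned s;
    if (seed_from_os(&s) != 0)
        return -1;
    srand(s);
    return 0;
}
```

Only the seed costs a read from the operating system; every later number comes from the fast generator.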
Here is the language of the C Standard:
7.22.2 Pseudo-random sequence generation functions
7.22.2.1 The rand function
Synopsis
#include <stdlib.h>
int rand(void);
The rand function computes a sequence of pseudo-random integers in the range 0 to RAND_MAX.
The rand function is not required to avoid data races with other calls to pseudo-random sequence generation functions. The implementation shall behave as if no library function calls the rand function.
Returns
The rand function returns a pseudo-random integer.
Environmental limits
The value of the RAND_MAX macro shall be at least 32767.
7.22.2.2 The srand function
Synopsis
#include <stdlib.h>
void srand(unsigned int seed);
The srand function uses the argument as a seed for a new sequence of pseudo-random numbers to be returned by subsequent calls to rand. If srand is then called with the same seed value, the sequence of pseudo-random numbers shall be repeated. If rand is called before any calls to srand have been made, the same sequence shall be generated as when srand is first called with a seed value of 1.
The srand function is not required to avoid data races with other calls to pseudo-random sequence generation functions. The implementation shall behave as if no library function calls the srand function.
Returns
The srand function returns no value.
In other words, rand() returns a pseudo-random sequence of integers between 0 and RAND_MAX. The sequence is not random, it is predictable for every value passed to srand(), including if srand() is never called.
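The repeatability guarantee is easy to check for yourself. The helper below is a hypothetical test function, not part of any library: it seeds twice with the same value and compares the two runs.

```c
#include <stdlib.h>

/* Returns 1 if two runs of n rand() calls after srand(seed) agree, else 0.
   At most 64 values are compared. */
int same_seed_repeats(unsigned seed, int n)
{
    int first[64];
    if (n > 64)
        n = 64;
    srand(seed);
    for (int i = 0; i < n; i++)
        first[i] = rand();
    srand(seed);                     /* restart the same sequence */
    for (int i = 0; i < n; i++)
        if (rand() != first[i])
            return 0;
    return 1;
}
```

On a conforming implementation this returns 1 for every seed, within a single run of the program.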
In order to try and get different sequences for successive runs of the program, srand() can be called with a rapidly varying value, such as the return value of clock(). Note that calling srand(time(NULL)) will produce the same sequence for multiple runs of the program during the same second.
I can't understand why I have to use rand_r() in generating random numbers in a thread function. And also why I need to use different seed for each thread.
Why I need different seed in each?
rand_r() is a pseudo-random number generator. That is to say, it generates a pseudo-random sequence of numbers: each call returns the next number in the sequence.
"Random" means "unpredictable." If you have a generator for a truly random sequence of numbers, you will be unable to predict the next number in the sequence, no matter how many of the preceding numbers you already know.
A "pseudo-random" sequence is like a random sequence in some ways (it can be used as if it were random in some applications), but it isn't random at all. In fact, it is 100% predictable. All you need to predict the next number in the sequence is the state of the generator and the algorithm that it uses.
The seed for a pseudo-random generator provides a way to put the generator into a known, repeatable state. If you provide the same seed to two different instances of the generator, then both generators will return exactly the same sequence of values.
Do you want each thread to get exactly the same sequence as every other thread? It's up to you. If that's what you want, then seed each one with the same value. If you want them to get different "random" numbers, then seed each generator with a different value.
Also, if you want different runs of the program to get different "random" values, then you have to seed with a different value each time the program is run.
why I need to use rand_r() in threads
From the documentation of rand : The function rand() is not reentrant or thread-safe, ... this can be done using the reentrant function rand_r().
why I need different seed for each threads?
You don't necessarily need one; it is your choice whether or not to use the same seed in all the threads.
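To make both points concrete, here is a sketch of a function a thread might run. worker_sum is a made-up name, and thread creation is left out to keep the example short; rand_r() itself is the POSIX function discussed above.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* asks the headers to declare rand_r() */
#endif
#include <stdlib.h>

/* Hypothetical per-thread work: sum n values drawn with rand_r().
   The state is a local variable, so nothing is shared between threads. */
unsigned long worker_sum(unsigned seed, int n)
{
    unsigned state = seed;
    unsigned long sum = 0;
    for (int i = 0; i < n; i++)
        sum += (unsigned long)rand_r(&state);
    return sum;
}
```

Two threads that call worker_sum with the same seed compute the same result, because they walk the same sequence. Give each thread a different seed if you want the threads to see different numbers.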
Is there a way to generate random numbers in c language independent of time.
The idea is that I want to generate an array of random numbers all at once, but since the rand() function depends on time, all the values in the array come out the same.
rand() doesn't depend on time. People typically seed their pseudo-random number generator using the current time (through the srand() function), but they don't have to. You can just pass whatever number you want to srand().
If your random numbers aren't of a high enough quality for your purposes (libc's rand is notorious for its inadequacy), you should look at other sources of randomness. On most operating systems, you can get high-quality random data just by reading from /dev/random (or /dev/urandom), and the Windows API provides CryptGenRandom. There are also a lot of cross-platform libraries that provide high-quality PRNGs; OpenSSL is one of them.
rand() generates values sequentially (in a time-sequence), but does not depend upon time (as in "time of day"), unless you seed the generator with srand(time(NULL)). If you don't seed it at all, it behaves as if it had been seeded with 1 (one).
There's also rand_r() (POSIX), which keeps its state in an unsigned int that the caller passes by pointer. You could use it to coordinate multiple streams of random numbers, by saving and restoring the appropriate state values.
For a non-deterministic seed without using time(NULL) you'll probably have to resort to a system-specific source (/dev/random on unix).
Whatever you do, don't write something like the following and then use myrand() as a replacement for rand(). It will return the same value for every call made within the same clock second.
unsigned myrand() { // BAD! NO!
srand(time(NULL)); // re-seeding destroys the properties of `rand()`
return rand();
}
If you call srand(), it should be just once at the beginning of the program.
The sequential determinism of rand() is actually a very useful property for testing programs. What you get is an (almost-)random, but repeatable sequence. If you print out the seed value at the start of the program, you can re-use the same value to produce the same results (for example, to reproduce a run that didn't work).
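One way to arrange that is sketched below. choose_seed is a hypothetical helper, and taking the seed from a command-line argument is my assumption about how you would want to replay a run.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Use the seed given as decimal text if there is one, otherwise fall back
   to the clock. The chosen seed is logged so that the run can be repeated. */
unsigned choose_seed(const char *arg)
{
    unsigned seed = arg ? (unsigned)strtoul(arg, NULL, 10)
                        : (unsigned)time(NULL);
    fprintf(stderr, "seed = %u\n", seed);
    return seed;
}
```

In main() you would call `srand(choose_seed(argc > 1 ? argv[1] : NULL));` exactly once. When a run misbehaves, start the program again with the logged seed as its first argument.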
I have a program, in C, which creates an array of 1000 integers using a random number from 0-999 and then performs some sort of algorithm on that array. In order to test the algorithm's running time, I try running the program 10000 times, but every time I run it the array is the same for a few arrays and then it changes. I have used the srand() function to feed the seed with the current time, but it still does not help.
Is there an alternative solution to rand() or a way to fix this?
My function is:
void getarray(int *ptr1, int size, int option){
int n;
srand(time(NULL));
for(n=0; n<size; n++)
*(ptr1+n) = *(ptr2+n)= rand()%1000;
}
Thanks in advance!
You should only call srand once: on program startup.
Right now, if you call your function multiple times before the value returned by time(NULL) changes, your sequence will be the same.
The lrand48() call tends to have a lot more state internally and a better pseudorandom number distribution.
However, note that you're reseeding with only 1-second granularity, so calls within the same second will generate the same sequence. Put your srand() call in main() or somewhere, once, instead of recalling it in getarray.
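A sketch of the fill function with the seeding moved out. The name fill_arrays and the second array parameter are assumptions: the posted code writes through a ptr2 that is not shown in its signature.

```c
#include <stdlib.h>

/* Fill both arrays with the same pseudo-random values in 0..999.
   Deliberately does not call srand(); the caller seeds once at startup. */
void fill_arrays(int *a, int *b, int size)
{
    for (int n = 0; n < size; n++)
        a[n] = b[n] = rand() % 1000;
}
```

With a single `srand((unsigned)time(NULL));` at the top of main(), each later call to fill_arrays continues the sequence where the previous call stopped instead of restarting it.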
You should investigate very carefully whether rand() is the right function for the job.
It varies by compiler and platform, but it is often implemented as a "linear congruential generator". These are very convenient in terms of speed and memory usage but have poor statistical properties (i.e. given a long enough sequence, you can tell whether it was generated by a congruential generator or is truly random).
In your use case (testing an algorithm's speed) it may be perfectly fine to use rand(), as long as the execution is not influenced by the statistical properties of the data. If rand() is a linear congruential RNG, number sequences show a pattern, which means that at any given time it is not true that all the numbers are equiprobable. Wikipedia has a picture that illustrates this nicely.
Your system might also have an RNG (e.g. /dev/random) and its associated functions, but be aware that those are meant to produce a few high-quality random numbers and may be pretty slow to use. You might even run out of numbers and end up waiting for the system to collect more entropy!
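A simpler pattern can be checked directly in code. In a linear congruential generator whose modulus is a power of two, with an odd multiplier and an odd increment, the lowest bit of the state just alternates. The sketch below uses the constants from the sample rand() in the C standard; whether your library's rand() uses them, or exposes its low bits at all, is not guaranteed, and many implementations throw the low bits away for exactly this reason.

```c
#include <stdint.h>

/* One step of a linear congruential generator with modulus 2^32.
   Multiplier and increment come from the C standard's sample rand(). */
uint32_t lcg_step(uint32_t state)
{
    return state * 1103515245u + 12345u;
}

/* Returns 1 if the lowest bit of the state flips on every one of n steps. */
int low_bit_alternates(uint32_t seed, int n)
{
    uint32_t s = seed;
    for (int i = 0; i < n; i++) {
        uint32_t next = lcg_step(s);
        if (((next ^ s) & 1u) == 0)
            return 0;            /* the bit did not flip */
        s = next;
    }
    return 1;
}
```

The check succeeds for any seed, which is why taking the raw state modulo 2 from such a generator gives 0, 1, 0, 1, ... rather than anything resembling coin flips.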
A simple, pretty fast RNG with statistical properties good enough for cryptography is ISAAC. Personally I use it whenever I need decent random numbers.
Another alternative is to use true random numbers as those generated by RANDOM.org or HotBits but it may be overkill in your case.
As a side note, RANDOM.ORG has a nice page on RNGs with another example of "patterns", created by the PHP rand() function.
First, call the seed function once, not in a loop.
Second, I suggest you do one of the following:
1) Switch to the random(3) function.
2) Pick something from rand48 / lrand48.
3) Read the desired number of bytes from /dev/random yourself.
Solution 1 is easy and somewhat portable. Solution 2 needs a bit of thinking, and 3 is the most work and the least portable.