Seeding random number generator for a batch of programs - c

So normally I use something like:
srand(time(0));
To get pseudo-randomness that changes with every program invocation. However, I'm now in a situation where I have a batch of programs that will all be starting at the same time and since time only changes every second, most of the time all of my programs start with the same seed.
What is a better strategy for seeding my RNG when I want a bunch of programs to start at once and all get different seeds?

Use srand(time(0) ^ getpid()) to permute your seed by a process-specific value. This will ensure different seeds for processes started within the same second. Take care to not use this for anything "important", e.g., cryptography or real money is involved.
The permutation is accomplished using the "exclusive-or" or XOR operator '^'. Because we know that two process running at the same time must necessarily have different process-ids, by xoring the response from time(0) with the current PID, we can get some assurance that two different processes won't have exactly the same seed.
Note that this is a very weak assurance as we are only twiddling a few bits. If the time were to increment by exactly one second and the process id were to increment by exactly one, in certain circumstances you would end up with identical seeds.
If you need truly distinct random number seeds, then you want to read 4 bytes from /dev/random and then use that as an integer to seed your RNG.
And again, PLEASE do not use this random number sequence for anything "important". And by important I mean anything more than a simple monte-carlo simulation or a game of rock-paper-scissors.

You can combine time(0) with some other program-specific value. For example, XOR it with a decent hash of the program name (argv[0] would normally suffice), or even just (preferably a hash of) the process id (if they're launched on the same host, otherwise XOR further with a hash of the hostname or IP). You could even go so far as to use a hash of an UUID.
Note: just xoring with the process id is pretty weak - if the second ticks over while the processes are starting, it could easily be that the bits flipped in the time value match the bits that differ between two of your process ids, leaving the generated seed identical. That said, put in an amount of effort commensurate with how much you have reason to care....

Maybe you need to use getpid() as well as some sub-second time (and perhaps whole second time too). Maybe make an MD5 hash of the values, and use that? Using getpid() guarantees that part of the data you're using will be unique to each process, but doesn't of itself give you very much randomness in your seed. Of course, using rand() is hardly guaranteed to be a good random number generator. You certainly can't use it for cryptography.

Related

How to generate either 0 or 1 randomly in C

I have read so many posts on this topic:
How does rand() work? Does it have certain tendencies? Is there something better to use?
How does the random number generator work in C
and this is what I got:
1) xn+1 depends on xn i.e., previous random number that is generated.
2) It is not recommended to initialize the seed more than once in the program.
3) It is a bad practice to use rand()%2 to generate either 0 or 1 randomly.
My questions are:
1) Are there any other libraries that I missed to take a look to generate a completely random number (either 0 or 1) without depending on previous output?
2) If there is any other work around using the inbuilt rand() function to satisfy the requirement?
3) What is the side effect of initializing the seed more than once in a program?
Code snippet:
srand(time(NULL));
d1=rand()%2;
d2=rand()%2;
Here my intention is to make d1 and d2 completely independent of each other.
My initial thought is to do this:
srand(time(NULL));
d1=rand()%2;
srand(time(NULL));
d2=rand()%2;
But as I mentioned earlier which is based on other posts, this is a bad practice I suppose?
So, can anyone please answer the above questions? I apologize if I completely missed an obvious thing.
Are there any other libraries that I missed to take a look to generate a completely random number between 0 and 1 without depending on previous output?
Not in the standard C library. There are lots of other libraries which generate "better" pseudo-random numbers.
If there is any other work around using the inbuilt rand() function to satisfy the requirement?
Most standard library implementations of rand produce sequences of random numbers where the low-order bit(s) have a short sequence and/or are not as independent of each other as one would like. The high-order bits are generally better distributed. So a better way of using the standard library rand function to generate a random single bit (0 or 1) is:
(rand() > RAND_MAX / 2)
or use an interior bit:
(rand() & 0x400U != 0)
Those will produce reasonably uncorrelated sequences with most standard library rand implementations, and impose no more computational overhead than checking the low-order bit. If that's not good enough for you, you'll probably want to research other pseudo-random number generators.
All of these (including rand() % 2) assume that RAND_MAX is odd, which is almost always the case. (If RAND_MAX were even, there would be an odd number of possible values and any way of dividing an odd number of possible values into two camps must be slightly biased.)
What is the side effect of initializing the seed more than once in a program?
You should think of the random number generator as producing "not very random" numbers after being seeded, with the quality improving as you successively generate new random numbers. And remember that if you seed the random number generator using some seed, you will get exactly the same sequence as you will the next time you seed the generator with the same seed. (Since time() returns a number of seconds, two successive calls in quick succession will usually produce exactly the same number, or very occasionally two consecutive numbers. But definitely not two random uncorrelated numbers.)
So the side effect of reseeding is that you get less random numbers, and possibly exactly the same ones as you got the last time you reseeded.
1) Are there any other libraries that I missed to take a look to
generate a completely random number between 0 and 1 without depending
on previous output?
This sub-question is off-topic for Stack Overflow, but I'll point out that POSIX and BSD systems have an alternative random number generator function named random() that you could consider if you are programming for such a platform (e.g. Linux, OS X).
2) If there is any other work around using the inbuilt rand() function
to satisfy the requirement?
Traditional computers (as opposed to quantum computers) are deterministic machines. They cannot do true randomness. Every completely programmatic "random number generator" is in practice a psuedo-random number generator. They generate completely deterministic sequences, but the values from a given set of calls are distributed across the generator's range in a manner approximately consistent with a target probability distribution (ordinarily the uniform distribution).
Some operating systems provide support for generating numbers that depend on something more chaotic and less predictable than a computed sequence. For instance, they may collect information from mouse movements, CPU temperature variations, or other such sources, to produce more objectively random (yet still deterministic) numbers. Linux, for example, has such a driver that is often exposed as the special file /dev/random. The problem with these is that they have a limited store of entropy, and therefore cannot provide numbers at a sustained high rate. If you need only a few random numbers, however, then that might be a suitable source.
3) What is the side effect of initializing the seed more than once in
a program?
Code snippet:
srand(time(NULL));
d1=rand()%2;
d2=rand()%2;
Here my intention is to make d1 and d2 completely independent of each
other.
My initial thought is to do this:
srand(time(NULL));
d1=rand()%2;
srand(time(NULL));
d2=rand()%2;
But as I mentioned earlier which is based on other posts, this is a
bad practice I suppose?
It is indeed bad if you want d1 and d2 to have a 50% probability of being different. time() returns the number of seconds since the epoch, so it is highly likely that it will return the same value when called twice so close together. The sequence of pseudorandom numbers is completely determined by the seed (this is a feature, not a bug), and when you seed the PRNG, you restart the sequence. Even if you used a higher-resolution clock to make the seeds more likely to differ, you don't escape correlation this way; you just change the function generating numbers for you. And the result does not have the same guarantees for output distribution.
Additionally, when you do rand() % 2 you use only one bit of the approximately log2(RAND_MAX) + 1 bits that it produced for you. Over the whole period of the PRNG, you can expect that bit to take each value the same number of times, but over narrow ranges you may sometimes see some correlation.
In the end, your requirement for your two random numbers to be completely independent of one another is probably way overkill. It is generally sufficient for the pseudo-random result of one call to be have no apparent correlation with the results of previous calls. You probably achieve that well enough with your first code snippet, even despite the use of only one bit per call. If you prefer to use more of the bits, though, then with some care you could base the numbers you choose on the parity of the count of how many bits are set in the values returned by rand().
Use this
(double)rand() / (double)RAND_MAX
completely random number ..... without depending on previous output?
Well in reality computers can't generate completely random numbers. There has to be some dependencies. But for almost all practical purposes, you can use rand().
side effect of initializing the seed more than once
No side effect. But that would mean you're completely invalidating the point of using rand(). If you're re initilizeing seed every time, the random number is more dependent on time(and processor).
any other work around using the inbuilt rand() function
You can write something like this:
#include<stdio.h>
#include<stdlib.h>
#include<time.h>
int main(int argc,char *argv[])
{
srand(time(NULL));
printf("%lf\n",(double)rand()/(double)RAND_MAX);
printf("%lf\n",(double)rand()/(double)RAND_MAX);
}
If you want to generate either a 0 or a 1, I think using rand()%2 is perfectly fine as the probability of an even number is same as probability of an odd number(probability of all numbers is equal for an unbiased random number generator).

Seeding rand() by itself

If a program generates a few numbers using rand(), stores the last rand() result, and on repeated runs use srand(stored_seed), would this provide some shorter, but still usable random number sequence?
srand should be run exactly once. If you initialize it more than once the resulted sequence may not be so random.
A good way to initialize the PRNG is srand(time(NULL)*getpid());
alternatively you can try:
timeval t;
gettimeofday(&t, NULL);
srand((t.tv_usec/100) + (t.tv_sec/100));//getpid is optional
Explanation:
A PRNG (Pseudo-Random Number Generator) generates a deterministic
sequence of numbers dependent on the algorithm used. A given algorithm
will always produce the same sequence from a given starting point
(seed). If you don't explicitly see the PRNG then it will usually
start from the same default seed every time an application is run,
resulting in the same sequence of numbers being used.
To fix this you need to seed the PRNG yourself with a different seed
(to give a different sequence) each time the application is run. The
usual approach is to use time(NULL) which sets the seed based on the
current time. As long as you don't start two instances of the
application within a second of each other, you'll be guaranteed a
different random sequence.
There's no need to seed the sequence each time you want a new random
number. And I'm not sure about this, but I have the feeling that
depending on the PRNG algorithm being used re-seeding for every new
number may actually result in lower randomness in the resulting
sequence.
Source: link

Random Number Seed Generator: Using System On Time

I'm trying to understand why this code is a security flaw. From my understanding, it is not safe to generate a random number seed using the system time when the program is turned on, but how is that guessable? I'm not sure I see a possible exploit.
Code example removed due to DMCA Request by D. Wagner
For a start, your code has a bug where it doesn't use the value in time_in_sec at all - it overwrites that in the following line:
seed = time_micro_sec >> 7;
Further, time_micro_sec only has 1000000 possible values, and after right-shifting by 7 that reduces to only 7813 possible values. This space is trivial to search through by brute-force.
Even if you fix these bugs, at the end of the day the srand() / rand() random number generator is just not a cryptographically-strong PRNG. The interface is ultimately limited by the fact that srand() takes an unsigned int argument, which on common platforms limits the entropy in the initial state to just 32 bits. This is insufficient to prevent a brute-force attack on the seed.
Do not use srand() / rand() for security-critical random numbers.
If you could predict what the system time is when you run the program (which is not a stretch if you're the one running the program), then you can predict the value fed into the seed and thus determine what the "random" number generator will generate.
Ignoring the outright bug, which I assume is not intentional...
You don't have to predict the exact time for this to be a problem - just the rough when. For any given seed, the random values generated are a completely predictable sequence. So, if you repeatedly make actions, you can attempt to deduce which seed matches the pattern you're seeing If you can narrow down your start time to within a minute, say, that's only 60,000 seeds to search over (assuming milliseconds, though you mix comments about millis, variables named micros, and values which don't appear to be either), which might be simple. A day is only 86,400,000 - larger, but doable.
If I have the power to, e.g. crash your application and force a restart, it becomes MUCH easier to predict your seed, and then further exploit it. If not, if you have local access, finding the start time is straightforward. If it's a remote attack, it may be harder, but, e.g. on an MMO, server restarts might be predictable, your bank may do service updates Sunday from 10PM to 4AM, etc. - all of which leak start times.
Basically: if you want something secure, use something that's designed to be secure in the first place.

trying to find a fully-random number generator

I work in C and i am trying(desperately) to make a random generator that not only generates
a different number every time i run the generator but also a different sequence every time i run the program.I tested almost everything i found online.I resulted in 2 good ways to make a good random generator.
The first one is to use a different seed every time.But this means that i have to use a different-random seed every time,a matter that i didn't solve at first.Here is what i am trying now but it's not truly random as i want:
int myrand(int random_seed){
random_seed = random_seed * 1103515245 +12345;
return (unsigned int)(random_seed / 65536) % 32768;
}
Every time i call the function i increase the seed by 1.
The second way is to use time.Time changes and this is randomness.I also tried many ways to implement this.My latest try is here:
Compiler error-Possible IDE error"undefined reference to gettimeofday error"
but i couldn't use the gettimeofday function because i work in Windows.Also in that question i didn't get any answers.
Could anyone give help me of how i can implement a random generator(probably using time) in C working in Windows?Or should i use Unix?
Seed your RNG with a good source of entropy.
Under unix, use /dev/random.
Under windows, use something like CryptoAPI - Windows equivalent of /dev/random
What you are asking for is not a random number generator, but how to use the random number generator already included in the C standard library.
All you need to do is seed it once at program startup:
srand(time(NULL));
That's all. It's portable and will give you a different sequence every time you run the program, given that at least one second has passed since the last time you've ran it.
There is no harm in seeding it again later, but no point in it either.
The C standard library has the header time.h (or ctime if you are using C++)(reference). The functions there will be supported in Windows and Unix.
I would recommend time() or clock() as seed for your random number generator.
An other way to get totally random input is the usage of the mouse position or other things influenced from outside.
There are many ways to implement prng but unfortunately none of them is real random number generator. time(NULL) is a good approach but I'm using "blum blum shub". It generates one bit random number
Since you're asking explicitly for a Windows solution I'd suggest to avoid time(NULL) or clock() as a seed for srand()since their granularity is very limited (ms). Instead you could use the result of the performance counter:
LARGE_INTEGER PerformanceCount;
QueryPerformanceCounter(&PerformanceCount);
srand(PerformanceCount.LowPart);
The increment rate of the frequency of QueryPerformanceCounter() can be obtained by a call to QueryPerformanceFrequency(). This typically increases at at least 1 MHz and sometimes even into the GHz range. Therefore it provides a fast changing source for the seed.
Edit: As understood from your earlier question also a gettimeofday() alike implementation won't give fine granularity. It may show the word tv_usec in its argument but on WIndows it will not provide microseconds granularity as it does on Linux systems.
quote:
to make a random generator that not only generates a different number every time i run the generator
Definitions of random do not include that concept. Rather the idea is that you have an equal chance of selecting any number, regardless of the number previously chosen. Which means it is theoretically possible to pick the same number twice.
If you are dealing a deck of cards then that meets your criterion of no duplicates. Using the deck dealing approach means keeping track of "used" numbers.
You should also be aware that PNRGs (pseudorandom number generators) are cyclic (periodic). After you have generated numbers, usually a large number, you then start all over again and repeat exactly the name sequence of numbers. The UNIX rand() function generates integers integers in the range [0, {RAND_MAX}] and has a period of 2^32
Really consider reading this short page:
See: http://pubs.opengroup.org/onlinepubs/009695399/functions/rand.html

Alternative Entropy Sources

Okay, I guess this is entirely subjective and whatnot, but I was thinking about entropy sources for random number generators. It goes that most generators are seeded with the current time, correct? Well, I was curious as to what other sources could be used to generate perfectly valid, random (The loose definition) numbers.
Would using multiple sources (Such as time + current HDD seek time [We're being fantastical here]) together create a "more random" number than a single source? What are the logical limits of the amount of sources? How much is really enough? Is the time chosen simply because it is convenient?
Excuse me if this sort of thing is not allowed, but I'm curious as to the theory behind the sources.
The Wikipedia article on Hardware random number generator's lists a couple of interesting sources for random numbers using physical properties.
My favorites:
A nuclear decay radiation source detected by a Geiger counter attached to a PC.
Photons travelling through a semi-transparent mirror. The mutually exclusive events (reflection — transmission) are detected and associated to "0" or "1" bit values respectively.
Thermal noise from a resistor, amplified to provide a random voltage source.
Avalanche noise generated from an avalanche diode. (How cool is that?)
Atmospheric noise, detected by a radio receiver attached to a PC
The problems section of the Wikipedia article also describes the fragility of a lot of these sources/sensors. Sensors almost always produce decreasingly random numbers as they age/degrade. These physical sources should be constantly checked by statistical tests which can analyze the generated data, ensuring the instruments haven't broken silently.
SGI once used photos of a lava lamp at various "glob phases" as the source for entropy, which eventually evolved into an open source random number generator called LavaRnd.
I use Random.ORG, they provide free random data from Atmospheric noise, that I use to periodically re-seed a Mersene-Twister RNG. Its about as random as you can get with no hardware dependencies.
Don't worry about a "good" seed for a random number generator. The statistical properties of the sequence do not depend on how the generator is seeded. There are other things, however. to worry about. See Pitfalls in Random Number Generation.
As for hardware random number generators, these physical sources have to be measured, and the measurement process has systematic errors. You might find "pseudo" random numbers to have higher quality than "real" random numbers.
Linux kernel uses device interrupt timing (mouse, keyboard, hard drives) to generate entropy. There's a nice article on Wikipedia on entropy.
Modern RNGs are both checked against correlations in nearby seeds and run several hundred iterations after the seeding. So, the unfortunately boring but true answer is that it really doesn't matter very much.
Generally speaking, using random physical processes have to be checked that they conform to a uniform distribution and are otherwise detrended.
In my opinion, it's often better to use a very well understood pseudo-random number generator.
I've used an encryption program that used the users mouse movement to generate random numbers. The only problem was that the program had to pause and ask the user to move the mouse around randomly for a few seconds to work properly which might not always be practical.
I found HotBits several years ago - the numbers are generated from radioactive decay, genuinely random numbers.
There are limits on how many numbers you can download a day, but it has always amused me to use these as really, really random seeds for RNG.
Some TPM (Trusted Platform Module) "chips" have a hardware RNG. Unfortunately, the (Broadcom) TPM in my Dell laptop lacks this feature, but many computers sold today come with a hardware RNG that uses truly unpredictable quantum mechanical processes. Intel has implemented the thermal noise variety.
Also, don't use the current time alone to seed an RNG for cryptographic purposes, or any application where unpredictability is important. Using a few low order bits from the time in conjunction with several other sources is probably okay.
A similar question may be useful to you.
Sorry I'm late to this discussion (what is it 3 1/2 years old now?), but I've a rekindled interest in PRN generation and alternate sources of entropy. Linux kernel developer Rusty Russell recently had a discussion on his blog on alternate sources of entropy (other than /dev/urandom).
But, I'm not all that impressed with his choices; a NIC's MAC address never changes (although it is unique from all others), and PID seems like too small a possible sample size.
I've dabbled with a Mersenne Twister (on my Linux box) which is seeded with the following algorithm. I'm asking for any comments/feedback if anyone's willing and interested:
Create an array buffer of 64 bits + 256 bits * number of /proc files below.
Place the time stamp counter (TSC) value in the first 64 bits of this buffer.
For each of the following /proc files, calculate the SHA256 sum:
/proc/meminfo
/proc/self/maps
/proc/self/smaps
/proc/interrupts
/proc/diskstats
/proc/self/stat
Place each 256-bit hash value into its own area of the array created in (1).
Create a SHA256 hash of this entire buffer. NOTE: I could (and probably should) use a different hash function completely independent of the SHA functions - this technique has been proposed as a "safeguard" against weak hash functions.
Now I have 256 bits of HOPEFULLY random (enough) entropy data to seed my Mersenne Twister. I use the above to populate the beginning of the MT Array (624 32-bit integers), and then initialize the remainder of that array with the MT author's code. Also, I could use a different hash function (e.g. SHA384, SHA512), but I'd need a different size array buffer (obviously).
The original Mersenne Twister code called for one single 32-bit seed, but I feel that's horribly inadequate. Running "merely" 2^32-1 different MTs in search of breaking the crypto is not beyond the realm of practical possibility in this day and age.
I'd love to read anyone's feedback on this. Criticism is more than welcome. I will defend my use of the /proc files as above because they're constantly changing (especially the /proc/self/* files, and the TSC always yields a different value (nanosecond [or better] resolution, IIRC). I've run Diehard tests on this (to the tune of several hundred billion bits), and it seems to be passing with flying colors. But that's probably more testament to the soundness of the Mersenne Twister as a PRNG than to how I'm seeding it.
Of course, these aren't totally impervious to someone hacking them, but I just don't see all of these (and SHA*) being hacked and broken to in my lifetime.
Some use keyboard input (timeouts between keystrokes), I heard of I think in a novel that radio static reception can be used - but of course that requires other hardware and software...
Noise on top of the Cosmic Microwave Background spectrum. Of course you must first remove some anisotropy, foreground objects, correlated detector noise, galaxy and local group velocities, polarizations etc. Many pitfalls remain.
Source of seed isn't that much important. More important is the pseudo numbers generator algorithm. However I've heard some time ago about generating seed for some bank operations. They took many factors together:
time
processor temperature
fan speed
cpu voltage
I don't remember more :)
Even if some of these parameters doesn't change much in time, you can put them into some good hashing function.
How to generate good random number?
Maybe we can take into account inifinite number of universes? If this is true, that all the time new parallel universes are being created, we can do something like this:
int Random() {
return Universe.object_id % MAX_INT;
}
In every moment we should be on another branch of parallel universes, so we should have different id. The only problem is how to get Universe object :)
How about spinning off a thread that will manipulate some variable in a tight loop for a fixed amount of time before it is killed. What you end up with will depend on the processor speed, system load, etc... Very hokey, but better than just srand(time(NULL))...
Don't worry about a "good" seed for a random number generator. The statistical properties of the sequence do not depend on how the generator is seeded.
I disagree with John D. Cook's advice. If you seed the Mersenne Twister with all bits set to zero except one, it will initially generate numbers which are anything but random. It takes a long time for the generator to churn this state into anything that would pass statistical tests. Simply setting the first 32 bits of the generator to a seed will have a similar effect. Also, if the entire state is set to zero the generator will produce endless zeroes.
Properly written RNG code will have a properly written seeding algorithm that accepts say a 64 bit value and seeds the generator so it will produce decent random numbers for each possible input. So if you are using a reliable library then any seed will do. But if you hack together your own implementation then you need to be careful.

Resources