I'm building project for studies on ATmega128 with my friend and we have a problem with random number generator (from 0 to 5), because the function always show the same result. We can't add time.h, because AVR Studio didn't accept this.
Code below:
uint8_t randomNumber(uint8_t r){
r = rand()%5;
return r;
}
other try
uint8_t randomNumber(uint8_t min, uint8_t max){
uint8_t = result;
result = min + rand() % (max+1 - min);
return result;
}
Any ideas?
Thanks,
Sebastian
Wow this question sent me down the rabbit hole.
Pseudorandom numbers are relatively easy to generate.
Truly random numbers are VERY hard to generate.
The quality of your random number (whether biases appear in it) depends completely on your seed value.
Seed values for random number generators must be (wait for it) random, otherwise people can guess which numbers you are using, defeating the randomness of your generator.
Where to get a random seed value from?
Options proposed by the internet:
Natural noise from the environment (read an adc, or... https://www.fourmilab.ch/hotbits/ (i know it's not practical for an arduino project, but interesting none the less)).
Time user inputs (humans are by default not precise).
Timing differences between crystals. [https://en.wikipedia.org/wiki/Clock_drift]
Mild disclaimer:
1/3 have been proven unsafe in commercial environments, and it's easy to see how #2 could be gamed by using a computer instead of a human.
So the quickest way is probably to use a floating ADC. Before you think this is a good idea: https://skemman.is/bitstream/1946/10689/1/ardrand.pdf
Remember: Larger pools of seeds increases the randomness (aka using a 32bit random seed value is better than using a boolean random seed value).
ADC's on the 128 have 1024 values, realistically, a floating point value will trend to far less than that (I've read you should treat it like 32).
To improve your chances of getting random numbers, take the lowest bit from the adc reading multiple times (aka read the adc 16 times to get a 16 bit "random" number).
Assuming you have your adc set up etc.
UNTESTED PSEUDO CODE
/* srand example */
#include <stdio.h> /* printf, NULL */
#include <stdlib.h> /* srand, rand */
#include <avr/io.h>
//pseudo code. you must implement init_adc() and read_adc()
int main ()
{
//Init and seed.
uint16_t u_rand_val = 0;
uint16_t u_seed_rand_val = 0;
init_adc();
//Note we're assuming the channel that you are reading from is FLOATING or hooked up to something very noisy.
//Gather bits from the adc, pushing them into your pseudorandom seed.
for(uint8_t i=0; i<16; i++){
u_seed_rand_val = u_seed_rand_val<<1 | (read_adc()&0b1);
}
srand (u_seed_rand_val);
while(1){
//Do whatever you were going to do.
//Note that calls to rand() use the seed set up by srand above.
u_rand_val = rand()%5;
print("Cur val:%u", u_rand_val);
}
return 0;
}
Related
This question already has answers here:
Rand() % 14 only generates the values 6 or 13
(3 answers)
Closed 1 year ago.
I have a problem, I want to use rand() to get a random number between 0 and 6, but it always gives me 4 at each run, even when I call srand(time(NULL))
#include <stdlib.h>
#include <stdio.h>
#include <time.h>
int main(void)
{
srand(time(NULL));
int rd = rand() % 7;
printf("%d\n", rd);
return (0);
}
output is 4 at each run
There are two fundamental problems with your code which, in combination, produce the curious result you're experiencing.
Almost anyone will warn you about the use of the rand() interface. Indeed, the Mac OS manpage itself starts with a warning:
$ man rand
NAME
rand, srand, sranddev, rand_r -- bad random number generator
Yep, it's a bad random number generator. Bad random number generators can be hard to seed, among other problems.
But speaking of seeding, here's another issue, perhaps less discussed but nonetheless important:
Do not use time(NULL) to seed your random number generator.
The linked answer goes into more detail about this, but the basic issue is simple: the value of time(NULL) changes infrequently (if frequently is measured in nanoseconds), and doesn't change much when it changes. So not only are you relying on the program to not be run very often (or at least less than once per second), you're also depending on the random number generator to produce radically different values from slightly different seeds. Perhaps a good random number generator would do that, but we've already established that rand() is a bad random number generator.
OK, that's all very general. The specific problem is somewhat interesting, at least for academic purposes (academic, since the practicial solution is always "use a better random number generator and seed it with a good random seed"). The precise problem here is that you're using rand() % 7.
That's a problem because what the Mac OS / FreeBSD implementation of rand() does is to multiply the seed by a multiple of 7. Because that product is reduced modulo 232 (which is not a multiple of 7), the value modulo 7 of the first random number produced by slowly incrementing seeds will eventually change, but it will have to wait until the amount of the overflow changes.
Here's a link to the code. The essence is in these three lines:
hi = *ctx / 127773;
lo = *ctx % 127773;
x = 16807 * lo - 2836 * hi;
which, according to a comment, "compute[s] x = (7^5 * x) mod (2^31 - 1) without overflowing 31 bits." x is the value which will eventually be returned (modulo 232) and it is also the next seed. *ctx is the current seed.
16807 is, as the comment says, 75, which is obviously divisible by 7. And 2836 mod 7 is 1. So by the rules of modular arithmetic:
x mod 7 = (16807 * lo) mod 7 - (2836 * hi) mod 7
= 0 - hi mod 7
That value only depends on hi, which is seed / 127773. So hi changes exactly once every 127773 ticks. Since the result of time(NULL) is in seconds, that's one change in 127773 seconds, which is about a day and a half. So if you ran your program once a day, you'd notice that the first random number is sometimes the same as the previous day and sometimes one less. But you're running it quite a bit more often than that, even if you wait a few seconds between runs, so you just see the same first random number every time. Eventually it will tick down and then you'll see a series of 3s instead of 4s.
As mentioned by #rici, the problem is caused by the poor implementation of rand(). The man page for srand() recommends using arc4random() instead. Alternatively, you could try seeding with a value taken directly from /dev/urandom as follows:
#include <stdlib.h>
#include <stdio.h>
int main(void)
{
int seed;
FILE *f = fopen("/dev/urandom", "r");
fread(&seed, sizeof(int), 1, f);
srand(seed);
fclose(f);
/* Should be a lot more unpredictable: */
printf("%d\n", rand() % 7);
return (0);
}
I was implementing a hashmap in C as part of a project I'm working on and using random inserts to test it. I noticed that rand() on Linux seems to repeat numbers far more often than on Mac. RAND_MAX is 2147483647/0x7FFFFFFF on both platforms. I've reduced it to this test program that makes a byte array RAND_MAX+1-long, generates RAND_MAX random numbers, notes if each is a duplicate, and checks it off the list as seen.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
int main() {
size_t size = ((size_t)RAND_MAX) + 1;
char *randoms = calloc(size, sizeof(char));
int dups = 0;
srand(time(0));
for (int i = 0; i < RAND_MAX; i++) {
int r = rand();
if (randoms[r]) {
// printf("duplicate at %d\n", r);
dups++;
}
randoms[r] = 1;
}
printf("duplicates: %d\n", dups);
}
Linux consistently generates around 790 million duplicates. Mac consistently only generates one, so it loops through every random number that it can generate almost without repeating. Can anyone please explain to me how this works? I can't tell anything different from the man pages, can't tell which RNG each is using, and can't find anything online. Thanks!
While at first it may sound like the macOS rand() is somehow better for not repeating any numbers, one should note that with this amount of numbers generated it is expected to see plenty of duplicates (in fact, around 790 million, or (231-1)/e). Likewise iterating through the numbers in sequence would also produce no duplicates, but wouldn't be considered very random. So the Linux rand() implementation is in this test indistinguishable from a true random source, whereas the macOS rand() is not.
Another thing that appears surprising at first glance is how the macOS rand() can manage to avoid duplicates so well. Looking at its source code, we find the implementation to be as follows:
/*
* Compute x = (7^5 * x) mod (2^31 - 1)
* without overflowing 31 bits:
* (2^31 - 1) = 127773 * (7^5) + 2836
* From "Random number generators: good ones are hard to find",
* Park and Miller, Communications of the ACM, vol. 31, no. 10,
* October 1988, p. 1195.
*/
long hi, lo, x;
/* Can't be initialized with 0, so use another value. */
if (*ctx == 0)
*ctx = 123459876;
hi = *ctx / 127773;
lo = *ctx % 127773;
x = 16807 * lo - 2836 * hi;
if (x < 0)
x += 0x7fffffff;
return ((*ctx = x) % ((unsigned long) RAND_MAX + 1));
This does indeed result in all numbers between 1 and RAND_MAX, inclusive, exactly once, before the sequence repeats again. Since the next state is based on multiplication, the state can never be zero (or all future states would also be zero). Thus the repeated number you see is the first one, and zero is the one that is never returned.
Apple has been promoting the use of better random number generators in their documentation and examples for at least as long as macOS (or OS X) has existed, so the quality of rand() is probably not deemed important, and they've just stuck with one of the simplest pseudorandom generators available. (As you noted, their rand() is even commented with a recommendation to use arc4random() instead.)
On a related note, the simplest pseudorandom number generator I could find that produces decent results in this (and many other) tests for randomness is xorshift*:
uint64_t x = *ctx;
x ^= x >> 12;
x ^= x << 25;
x ^= x >> 27;
*ctx = x;
return (x * 0x2545F4914F6CDD1DUL) >> 33;
This implementation results in almost exactly 790 million duplicates in your test.
MacOS provides an undocumented rand() function in stdlib. If you leave it unseeded, then the first values it outputs are 16807, 282475249, 1622650073, 984943658 and 1144108930. A quick search will show that this sequence corresponds to a very basic LCG random number generator that iterates the following formula:
xn+1 = 75 · xn (mod 231 − 1)
Since the state of this RNG is described entirely by the value of a single 32-bit integer, its period is not very long. To be precise, it repeats itself every 231 − 2 iterations, outputting every value from 1 to 231 − 2.
I don't think there's a standard implementation of rand() for all versions of Linux, but there is a glibc rand() function that is often used. Instead of a single 32-bit state variable, this uses a pool of over 1000 bits, which to all intents and purposes will never produce a fully repeating sequence. Again, you can probably find out what version you have by printing the first few outputs from this RNG without seeding it first. (The glibc rand() function produces the numbers 1804289383, 846930886, 1681692777, 1714636915 and 1957747793.)
So the reason you're getting more collisions in Linux (and hardly any in MacOS) is that the Linux version of rand() is basically more random.
rand() is defined by the C standard, and the C standard does not specify which algorithm to use. Obviously, Apple is using an inferior algorithm to your GNU/Linux implementation: The Linux one is indistinguishable from a true random source in your test, while the Apple implementation just shuffles the numbers around.
If you want random numbers of any quality, either use a better PRNG that gives at least some guarantees on the quality of the numbers it returns, or simply read from /dev/urandom or similar. The later gives you cryptographic quality numbers, but is slow. Even if it is too slow by itself, /dev/urandom can provide some excellent seeds to some other, faster PRNG.
In general, the rand/srand pair has been considered sort of deprecated for a long time due to low-order bits displaying less randomness than high-order bits in the results. This may or may not have anything to do with your results, but I think this is still a good opportunity to remember that even though some rand/srand implementations are now more up to date, older implementations persist and it's better to use random(3). On my Arch Linux box, the following note is still in the man page for rand(3):
The versions of rand() and srand() in the Linux C Library use the same
random number generator as random(3) and srandom(3), so the lower-order
bits should be as random as the higher-order bits. However, on older
rand() implementations, and on current implementations on different
systems, the lower-order bits are much less random than the higher-or-
der bits. Do not use this function in applications intended to be por-
table when good randomness is needed. (Use random(3) instead.)
Just below that, the man page actually gives very short, very simple example implementations of rand and srand that are about the simplest LC RNGs you've ever seen and having a small RAND_MAX. I don't think they match what's in the C standard library, if they ever did. Or at least I hope not.
In general, if you're going to use something from the standard library, use random if you can (the man page lists it as POSIX standard back to POSIX.1-2001, but rand is standard way back before C was even standardized). Or better yet, crack open Numerical Recipes (or look for it online) or Knuth and implement one. They're really easy and you only really need to do it once to have a general purpose RNG with the attributes you most often need and which is of known quality.
I have this homework where i need to implement xorshift32(i can t use anything else) so i can generate some numbers but i don t understand how the algorithm works or how to implement it.
I am trying to print the generated number but i don t know how to call the xorshift32 function because of the state[static 1] argument.
uint32_t xorshift32(uint32_t state[static 1])
{
uint32_t x = state[0];
x ^= x << 13;
x ^= x >> 17;
x ^= x << 5;
state[0] = x;
return x;
}
I do not have much information on xorshft32 other that what is on wikipedia(en.wikipedia.org/wiki/Xorshift).
This is an extended comment to the good answer by Jabberwocky.
The Xorshift variants, rand(), and basically all random number generator functions, are actually pseudorandom number generators. They are not "real random", because the sequence of numbers they generate depends on their internal state; but they are "pseudorandom", because if you do not know the generator internal state, the sequence of numbers they generate is random in the statistical sense.
George Marsaglia, the author of the Xorshift family of pseudorandom number generators, also developed a set of statistical tools called Diehard tests that can be used to analyse the "randomness" of the sequences generated. Currently, the TestU01 tests are probably the most widely used and trusted; in particular, the 160-test BigCrush set.
The sequence generated by ordinary pseudorandom number generators often allows one to determine the internal state of the generator. This means that observing a long enough generated sequence, allows one to fairly reliably predict the future sequence. Cryptographically secure pseudorandom number generators avoid that, usually by applying a cryptographically secure hash function to the output; one would need a catalog of the entire sequence to be able to follow it. When the periods are longer than 2256 or so, there is not enough baryonic matter in the entire observable universe to store the sequence.
My own favourite PRNG is Xorshift64*, which has a period of 264-1, and passes all but the MatrixRank test in BigCrush. In C99 and later, you can implement it using
#include <inttypes.h>
typedef struct {
uint64_t state;
} prng_state;
static inline uint64_t prng_u64(prng_state *const p)
{
uint64_t state = p->state;
state ^= state >> 12;
state ^= state << 25;
state ^= state >> 27;
p->state = state;
return state * UINT64_C(2685821657736338717);
}
The state can be initialized to any nonzero uint64_t. (A zero state will lead the generator to generate all zeros till infinity. The period is 264-1, because the generator will have each 64-bit state (excluding zero) exactly once during each period.)
It is good enough for most use cases, and extremely fast. It belongs to the class of linear-feedback shift register pseudorandom number generators.
Note that the variant which returns an uniform distribution between 0 and 1,
static inline double prng_one(prng_state *p)
{
return prng_u64(p) / 18446744073709551616.0;
}
uses the high bits; the high 32 bits of the sequence does pass all BigCrunch tests in TestU01 suite, so this is a surprisingly good (randomness and efficiency) generator for double-precision uniform random numbers -- my typical use case.
The format above allows multiple independent generators in a single process, by specifying the generator state as a parameter. If the basic generator is implemented in a header file (thus the static inline; it is a preprocessor macro-like function), you can switch between generators by switching between header files, and recompiling the binary.
(You are usually better off by using a single generator, unless you use multiple threads in a pseudorandom number heavy simulator, in which case using a separate generator for each thread will help a lot; avoids cacheline ping-pong between threads competing for the generator state, in particular.)
The rand() function in most C standard library implementations is a linear-congruential generator. They often suffer from poor choices of the coefficients, and nowadays, also from the relative slowness of the modulo operator (when the modulus is not a power of two).
The most widely used pseudorandom number generator is the Mersenne Twister, by Makoto Matsumoto (松本 眞) and Takuji Nishimura (西村 拓士). It is a twisted generalized linear feedback shift register, and has quite a large state (about 2500 bytes) and very long period (219937-1).
When we talk of true random number generators, we usually mean a combination of a pseudorandom number generator (usually a cryptographically secure one), and a source of entropy; random bits with at least some degree of true physical randomness.
In Linux, Mac OS, and BSDs at least, the operating system kernel exposes a source of pseudorandom numbers (getentropy() in Linux and OpenBSD, getrandom() in Linux, /dev/urandom, /dev/arandom, /dev/random in many Unixes, and so on). Entropy is gathered from physical electronic sources, like internal processor latencies, physical interrupt line timings, (spinning disk) hard drive timings, possibly even keyboard and mice. Many motherboards and some processors even have hardware random number sources that can be used as sources for entropy (or even directly as "trusted randomness sources").
The exclusive-or operation (^ in C) is used to mix in randomness to the generator state. This works, because exclusive-or between a known bit and a random bit results in a random bit; XOR preserves randomness. When mixing entropy pools (with some degree of randomness in the bit states) using XOR, the result will have at least as much entropy as the sources had.
Note that that does not mean that you get "better" random numbers by mixing the output of two or more generators. The statistics of true randomness is hard for humans to grok (just look at how poor the common early rand() implementations were! HORRIBLE!). It is better to pick a generator (or a set of generators to switch between at compile time, or at run time) that passes the BigCrunch tests, and ensure it has a good random initial state on every run. That way you leverage the work of many mathematicians and others who have worked on these things for decades, and can concentrate on the other stuff, what you yourself are good at.
The C code in the wikipedia article is somewhat misleading:
Here is a working example that uses both the 32 bit and the 64 bit versions:
#include <stdio.h>
#include <stdint.h>
/* The state word must be initialized to non-zero */
uint32_t xorshift32(uint32_t state[])
{
/* Algorithm "xor" from p. 4 of Marsaglia, "Xorshift RNGs" */
uint32_t x = state[0];
x ^= x << 13;
x ^= x >> 17;
x ^= x << 5;
state[0] = x;
return x;
}
uint64_t xorshift64(uint64_t state[])
{
uint64_t x = state[0];
x ^= x << 13;
x ^= x >> 7;
x ^= x << 17;
state[0] = x;
return x;
}
int main()
{
uint32_t state[1] = {1234}; // "seed" (can be anthing but 0)
for (int i = 0; i < 50; i++)
{
printf("%u\n", xorshift32(state));
}
uint64_t state64[1] = { 1234 }; // "seed" (can be anthing but 0)
for (int i = 0; i < 50; i++)
{
printf("%llu\n", xorshift64(state64));
}
}
The mathematical aspects are explained in the wikipedia article and in it's footnotes.
The rest is basic C language knowledge, ^ is the C bitwise XOR operator.
I'm interested in generating fast random booleans (or equivalently a Bernoulli(0.5) random variable) in C. Of course if one has a fast random generator with a decent statistical behaviour the problem "sample a random Bernoulli(0.5)" is easily solved: sample x uniformly in (0,1) and return 1 if x<0.5, 0 otherwise.
Suppose speed is the most important thing, now I have two questions/considerations:
Many random doubles generators first generate an integer m uniformly in a certain range [0,M] and then simply return the division m/M. Wouldn't it be faster just to check whether m < M/2 (here M/2 is fixed, so we are saving one division)
Is there any faster way to do it? At the end, we're asking for way less statistical properties here: we're maybe still interested in a long period but, for example, we don't care about the uniformity of the distribution (as long as roughly 50% of the values are in the first half of the range).
Extracting say the last bit of a random number can wreak havoc as linear congruential generators can alternate between odd and even numbers1. A scheme like clock() & 1 would also have ghastly correlation plains.
Consider a solution based on the quick and dirty generator of Donald Kunth: for uint32_t I, sequence
I = 1664525 * I + 1013904223;
and 2 * I < I is the conditional yielding the Boolean drawing. Here I'm relying on the wrap-around behaviour of I which should occur half the time, and a potentially expensive division is avoided.
Testing I <= 0x7FFFFFFF is less flashy and might be faster still, but the hardcoding of the midpoint is not entirely satisfactory.
1 The generator I present here does.
I'm interested in generating fast random booleans
Using a LCG can be fast, yet since OP's needs only a bool result, consider extracting only 1 bit at a time from a reasonable generator and save the rest for later. #Akshay L Aradhya
Example based on #R.. and #R.. code.
extern uint32_t lcg64_temper(uint64_t *seed); // see R.. code
static uint64_t gseed; // Initialize this in some fashion.
static unsigned gcount = 0;
bool rand_bool(void) {
static uint32_t rbits;
if (gcount == 0) {
gcount = 32; // I'd consider using 31 here, just to cope with some LCG weaknesses.
rbits = lcg64_temper(&gseed);
}
gcount--;
bool b = rbits & 1;
rbits >>= 1;
return b;
}
I need to randomly generate bits but the number of bits should be in a definite ratio.
Say I want to generate a 100 bits.
So if the ratio is 3:2
It has to generate 60 0s and 40 1s.
How will I be able to achieve this in C?
Suppose you have want a 1 with probability p where p is double in the inclusive range [0.0,1.0].
Then you can use this logic of rand_bit() below.
#include <stdio.h>
#include <stdlib.h>
int rand_bit(double p){
if(p==1.0){//Unusual but OK exact comparison of double.
return 1;
}
double r=((double)rand())/((double)RAND_MAX);
return r<p?1:0;
}
//Demonstration...
int main(void) {
srand(78721);//Demonstration is reproducible....
const int test=1000000;
int count=0;
double p=0.6;//60% 1s.
for(int i=1;i<=test;++i){
count+=rand_bit(p);
}
double prop=((double)count)/((double)test);
printf("%f (error=%f)\n",prop,(prop-p));
return 0;
}
Typical output:
0.600500 (error=0.000500)
Remember to seed the random number generator with srand(). Pass in a fixed value as above to get a reproducible result or srand((int)time(NULL)); to get different results run-to-run.
Also note that the built in random number generators in C are generally not great.
They're usually fine for games, and OK for generating test cases for business applications but not usually fit for scientific simulations and cryptographically worthless.
The condition if(p==1.0) is there so we can be sure that p==1.0 returns 1 always. p==0.0 is assured by r<p.