Generating discrete uniform distribution in C

I'm trying to generate a discrete uniform distribution in C between 0 and 1.
Normally you'd expect t = rand() % 2, but it seems there is a problem with this approach (apparently the lower bits of some rand() implementations are less random than the higher bits, although I don't really understand much about that).
I tried a trick that I found somewhere on the Internet (the classic von Neumann unbiasing trick):
Let t1, t2 be two not-so-uniform distributions over {0, 1}, each with probability p for 1 and (1-p) for 0. Then we take two random numbers:
t1 : p for 1, (1-p) for 0
t2 : p for 1, (1-p) for 0
If t1 != t2, the outcomes (t1,t2) = (1,0) and (t1,t2) = (0,1) have the same probability, p(1-p). So we just repeat the sampling until we get t1 != t2 and choose t = t1 (which of the two we keep really doesn't matter). Here is my code:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
int main()
{
    /* Declare variable to hold seconds on clock. */
    int i, t1, t2, t;
    time_t seconds;
    /* Get value from system clock and place in seconds variable. */
    time(&seconds);
    /* Convert seconds to an unsigned integer to seed rand(). */
    srand((unsigned int) seconds);
    /* Output random values. */
    for (i = 0; i < 10; ++i)
    {
        do
        {
            t1 = rand() % 2;
            t2 = rand() % 2;
        }
        while (t1 == t2);
        t = t1;
        printf("%d\n", t);
    }
    /*printf("%d",rand()%2);
    printf("%d",rand()%2);*/
    return 0;
}
Am I right or wrong? Thank you very much!

Never use rand(). Use random() or even better, a generator from the PCG family.
For either one, all of the provided bits are good individually. random() provides 31 random bits. Use all of them instead of just one. There's no point in throwing away the other 30. E.g.
static inline int random_bit(void)
{
    static long val;
    static int bits = 0;
    int bit;
    if (bits == 0) {    /* buffer empty: fetch 31 fresh bits */
        val = random();
        bits = 31;
    }
    bit = val & 1;      /* hand out the lowest buffered bit */
    val >>= 1;
    bits--;
    return bit;
}
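For instance, a minimal usage sketch (assuming POSIX random()/srandom(), which are seeded independently of rand()/srand()):
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
/* random_bit() as defined above */
int main(void)
{
    srandom((unsigned int) time(NULL)); /* seed random(), analogous to srand() */
    for (int i = 0; i < 10; i++)
        printf("%d\n", random_bit());   /* one uniform bit per call */
    return 0;
}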

The built-in random number generator rand() isn't guaranteed to produce the particular distribution you assumed (probability p for 1 and 1-p for 0). Testing rand() > RAND_MAX / 2 instead of looking at the low bit is better, but it still carries no distribution guarantee. It is better to use another method, as described here.
Having said that, if you assume that your generator produces 1 and 0 with probabilities p and 1-p, then what you have done to generate a uniform distribution is mathematically correct: each round outputs 1 or 0 with equal probability p(1-p) and terminates with probability 2p(1-p), so conditioned on t1 != t2 the result is exactly uniform. That said, you indicated in the comments that you wouldn't be willing to use this approach.
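To make the arithmetic concrete, here is a small self-contained demo (my own sketch, not code from the question) that feeds a deliberately biased source with p = 0.75 through the t1 != t2 loop; the printed fraction should come out near 0.5:
#include <stdio.h>
#include <stdlib.h>
/* Deliberately biased source: returns 1 with probability ~0.75. */
static int biased_bit(void)
{
    return (rand() % 4) != 0;
}
/* Von Neumann unbiasing: retry until the two draws differ. */
static int unbiased_bit(void)
{
    int t1, t2;
    do {
        t1 = biased_bit();
        t2 = biased_bit();
    } while (t1 == t2);
    return t1;
}
int main(void)
{
    long ones = 0, trials = 1000000;
    for (long i = 0; i < trials; i++)
        ones += unbiased_bit();
    printf("fraction of ones: %f\n", (double)ones / trials); /* ~0.5 */
    return 0;
}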

Related

Efficient random function when I want only 0 or 1

I know you can use rand() % 2 to get a random choice of 0 and 1 in C, but is there something more efficient?
My question is not so much about C specifically but about how random number generators work. If I understand correctly, they do some complicated math on the seed to get an even distribution between 0 and RAND_MAX, but is there a way to do less math if you just need a binary choice?
Thanks
is there a way to do less math if you just need a binary choice?
Yes, but it depends on how "good" a random distribution and sequence (or apparent lack of one) is required. C does not specify the quality of rand(). Once the required quality of randomness is specified, alternative solutions exist. How fast? That depends on many things not supplied by the OP. If the code is to use rand(), the below will modestly improve performance over a simple rand() % 2u:
Call rand() once in a while to extract n random bits and use 1 of those bits per call.
This function uses RAND_MAX to determine the number of random bits n received per rand() call. A value of RAND_MAX == 32767 (0x7FFF) implies 15 random bits.
#include <assert.h>
#include <stdlib.h>
int rand01(void) {
    // Ensure RAND_MAX is a power-of-2 minus 1
    assert(((RAND_MAX + 1u) & RAND_MAX) == 0);
    static unsigned rmax = 0;
    static int rbits;
    if (rmax == 0) {    // bit buffer exhausted: refill from rand()
        rmax = RAND_MAX;
        rbits = rand();
    }
    rmax /= 2u;         // one fewer buffered bit remains
    int r = rbits % 2u; // hand out the lowest bit
    rbits /= 2u;
    return r;
}
Note that this approach does not play well with srand(): a srand() call reseeds rand(), but it is not aware of the bits this function has already buffered.
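If reseeding matters, one workaround (a hypothetical variant, not the code above) is to hoist the buffer to file scope and expose a reset hook that discards the buffered bits:
#include <stdlib.h>
static unsigned rmax01 = 0; /* shared bit-buffer state */
static int rbits01;
void rand01_reset(void)     /* call right after srand() */
{
    rmax01 = 0;             /* forces a refill from rand() on the next call */
}
int rand01(void)
{
    if (rmax01 == 0) {
        rmax01 = RAND_MAX;
        rbits01 = rand();
    }
    rmax01 /= 2u;
    int r = rbits01 % 2u;
    rbits01 /= 2u;
    return r;
}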
maybe you can try using my method:
int i;
i = time(NULL) % 2;
this only works if you don't draw more than one random number per second, but you can also do (adding #include <sys/time.h>):
struct timeval tv;
gettimeofday(&tv, NULL);
unsigned long random_number = (1000000UL * tv.tv_sec + tv.tv_usec) % 2;
this will update your random number at a microsecond rate.

About criteria for random integer number generation (C)

I am running a bunch of physical simulations in which I need random numbers. I'm using the standard rand() function in C++.
So it works like this: first I precalculate a bunch of probabilities of the form 1/(1+exp(a)) for a set of different values of a. They're of type double, as returned by the exp function in the math library. Then things must happen with those probabilities; there are only two of them, so I generate a random number uniformly distributed between 0 and 1 and compare it with those precalculated probabilities. To do that, I used:
double p = double(rand()%101)/100.0;
so I'm given random values between 0 and 1, both included. This didn't yield correct physical results. I tried this:
double p = double(rand()%1000001)/1000000.0;
And this worked. I don't really understand why, so I would like some criteria about how to do it. My intuition tells me that if I do
double p = double(rand()%(N+1))/double(N);
with N big enough such that the smallest step (1/N) is much smaller than the smallest probability 1/(1+exp(a)), then I will be getting realistic random numbers.
I would like to understand why, though.
rand() returns a random number between 0 and RAND_MAX.
Therefore you need this:
double p = double(rand()) / double(RAND_MAX);
Also run this snippet and you will understand:
#include <cstdio>
#include <cstdlib>
int main()
{
    int i;
    for (i = 1; i < 30; i++)
    {
        int rnd = rand();
        double p0 = double(rnd % 101) / 100.0;
        double p1 = double(rnd % 1000001) / 1000000.0;
        printf("%d\t%f\t%f\n", rnd, p0, p1);
    }
    for (i = 1; i < 30; i++)
    {
        int rnd = rand();
        double p0 = double(rnd) / double(RAND_MAX);
        printf("%d\t%f\n", rnd, p0);
    }
    return 0;
}
You have multiple problems.
rand() isn't very random at all. On almost all operating systems it returns badly distributed, horribly biased numbers. It's actually quite hard to find a good random number generator, but I can guarantee you that rand() will be among the worst you can find.
rand() % N gives a biased distribution. Think about the pigeonhole principle. Let's simplify it: assume that rand() returns numbers in [0,8) and your N is 6. 0 to 5 map to 0 to 5, 6 maps to 0 and 7 maps to 1, meaning that 0 and 1 are twice as likely to come out as each of 2 through 5.
Converting the numbers to double before division does not remove the bias from point 2, it just makes it less visible. The pigeonhole principle applies regardless of the conversions you do.
Converting a well-distributed random number from integer to float/double is harder than it looks. Simple division ignores the problems of how floating point math works.
I can't help you much with 1, you need to do research. Look around the net for random number libraries. If you want something very random and unpredictable you need to look for cryptographic random libraries. If you want a repeatable but good random number Mersenne Twister should probably be good enough. But you need to do the research here.
For 2 and 3 there are standard solutions. You are mapping a set of M elements onto N elements, and rand() % N is only unbiased when N divides M evenly. Since on most systems M will be a power of two, N also has to be a power of two for the plain modulo to be fair. So the general algorithm is: find the nearest power of 2 higher than or equal to N, call it P. Generate randomness_source() % P. If the number is greater than or equal to N, throw it away and try again (as sketched below). This is the only safe way to do this. Cleverer people than you and me have spent years on this problem; there's no better way to remove the bias.
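A sketch of that rejection loop in C (using rand() as the source for brevity, though the caveats about rand() from point 1 still apply):
#include <stdlib.h>
/* Unbiased value in [0, n) for 0 < n <= RAND_MAX + 1u: draw log2(P) bits,
   throw away anything >= n, and try again. */
unsigned rand_below(unsigned n)
{
    unsigned p = 1;
    while (p < n)                       /* nearest power of two >= n */
        p <<= 1;
    unsigned r;
    do {
        r = (unsigned)rand() & (p - 1); /* mask keeps the low log2(P) bits */
    } while (r >= n);                   /* reject out-of-range draws */
    return r;
}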
For 4, you can probably ignore the problem and just divide; in the absolute majority of cases this should be good enough. If you really want to study the problem, I've done some work on it and published the code on github. There I go through some basic principles of how floating point numbers work and how they relate to generating random numbers.
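For a flavor of point 4, one common recipe (a sketch assuming you already have a source of uniform random bits; this is not the code from that repository): take 53 bits, the size of a double's mantissa, and scale by 2^-53 so the result is uniform on [0, 1):
#include <stdint.h>
/* bits53 is assumed to hold 53 uniform random bits. 2^53 = 9007199254740992. */
double uniform01(uint64_t bits53)
{
    bits53 &= (UINT64_C(1) << 53) - 1;                  /* keep exactly 53 bits */
    return (double)bits53 * (1.0 / 9007199254740992.0); /* multiply by 2^-53 */
}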
// Produces pseudorandom bits. These are NOT crypto-quality bits; they have the
// same underlying unpredictability as raw rand() output. The function buffers
// rand() bits to provide a convenient zero-to-the-argument range (including
// negative arguments), corrects for the toward-zero bias of the plain modular
// construction, removes the RAND_MAX range limitation (use INT64_MAX instead),
// and effectively obscures biases and sequence telltales of annoyingly bad
// rand libraries. It does not *correct* those biases: anyone tracking the
// arguments and outputs has enough information to reconstruct the rand()
// output and detect them. But it makes the relationships drastically more
// complicated. Needs stdint.h and stdlib.h.
int64_t privaterandom(int64_t range, int reset) {
    static uint64_t state = 0;
    int64_t retval;
    if (reset != 0) {
        srand((unsigned int)range);
        state = (uint64_t)range;
    }
    if (range == 0) return 0;
    if (range < 0) return -privaterandom(-range, 0);
    if (range > UINT64_MAX / 0xFFFFFFFF) {
        retval = privaterandom(range / 0xFFFFFFFF, 0) * 0xFFFFFFFF; // order of operations matters
        return retval + privaterandom(0xFFFFFFFF, 0);
    }
    while (state < UINT64_MAX / 0xFF) { // top up the buffered state
        state *= RAND_MAX;
        state += rand();
    }
    retval = state % range;
    // makes "pigeonhole" bias alternate unpredictably between toward-even and toward-odd
    if ((state / range > (state - retval) / range) && state % 2 == 0) retval++;
    state /= range;
    return retval;
}
int64_t Random(int64_t range) { return privaterandom(range, 0); }
int64_t Random_Init(int64_t seed) { return privaterandom(seed, 1); }
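A minimal usage sketch for the pair above (hypothetical driver, not part of the answer):
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
int main(void)
{
    Random_Init(12345);                           /* seeds srand() and the 64-bit state */
    for (int i = 0; i < 5; i++)
        printf("%lld\n", (long long)Random(100)); /* values nominally in [0, 100) */
    return 0;
}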

Generate random integer numbers in C, not within a range

I want to generate 100 nodes with random x and y coordinates. But I do not want to specify any range: something like rand() % 100 will generate numbers only between 0 and 99, but I want the numbers distributed over a large region, and I want them to be random. How can I implement it using C?
i have tried:
int gen_rand_position(void)
{
    int i, x, y, a[100], b[100];
    for (i = 0; i < 100; i++)
    {
        x = rand();
        y = rand();
        a[i] = x;
        b[i] = y;
    }
    return 0;
}
This is not choosing randomly. Can I have a more efficient random function?
You need to have a range, otherwise what are you going to do with an infinite number?
With no arguments, rand() will return an integer between 0 and RAND_MAX (commonly 32767).
If you need a number larger than this you could combine two rand() numbers. There are complicated statistical arguments about the best way to combine random numbers so you don't change the randomness, but I don't think you need to worry about that.
Edit: since RAND_MAX is (in this case) a 15-bit number, to get a 30-bit range combine two rand() calls (shift one result left by 15 bits and OR in the other); repeat with part of a third call to fill a 32-bit range.
To obtain a random number distributed over the entire int range, combine the random bits from multiple calls to rand():
#include <stdlib.h>
int large_rand(void)
{
    const int RAND_BITS = 15; /* covers the C-minimum guarantee for RAND_MAX */
    const int INT_BITS = 8 * sizeof(int);
    const int ITERS = (INT_BITS + RAND_BITS - 1) / RAND_BITS;
    unsigned result = 0;      /* unsigned avoids signed-shift overflow */
    int i;
    for (i = 0; i < ITERS; i++) {
        result <<= RAND_BITS;
        result |= rand() & ~(~0U << RAND_BITS); /* keep the low 15 bits */
    }
    return (int)result;       /* any int value, including negatives, is possible */
}
To get a random number in the desired range, use large_rand() % (MAX + 1), where MAX is the largest number you want to get. (Since large_rand() can return negative values, take the absolute value or mask off the sign bit first; and as discussed elsewhere on this page, the modulo still carries a slight bias unless MAX + 1 is a power of two.)

Mixing 16 bit linear PCM streams and avoiding clipping/overflow

I've been trying to mix together two 16-bit linear PCM audio streams, and I can't seem to overcome the noise issues. I think they are coming from overflow when mixing samples together.
I have following function ...
short int mix_sample(short int sample1, short int sample2)
{
    return #mixing_algorithm#;
}
... and here's what I have tried as #mixing_algorithm#
sample1/2 + sample2/2
2*(sample1 + sample2) - 2*(sample1*sample2) - 65535
(sample1 + sample2) - sample1*sample2
(sample1 + sample2) - sample1*sample2 - 65535
(sample1 + sample2) - ((sample1*sample2) >> 0x10) // same as dividing by 65536
Some of them have produced better results than others but even the best result contained quite a lot of noise.
Any ideas how to solve it?
The best solution I have found is given by Viktor Toth. He provides a solution for 8-bit unsigned PCM; changing that for 16-bit signed PCM produces this:
int a = 111; // first sample (-32768..32767)
int b = 222; // second sample
int m;       // mixed result will go here
// Make both samples unsigned (0..65535)
a += 32768;
b += 32768;
// Pick the equation
if ((a < 32768) || (b < 32768)) {
    // Viktor's first equation when both sources are "quiet"
    // (i.e. less than middle of the dynamic range)
    m = a * b / 32768;
} else {
    // Viktor's second equation when one or both sources are loud
    // (widen the product: a * b can exceed INT_MAX here)
    m = 2 * (a + b) - (int)((long long)a * b / 32768) - 65536;
}
// Output is unsigned (0..65536) so convert back to signed (-32768..32767)
if (m == 65536) m = 65535;
m -= 32768;
Using this algorithm means there is almost no need to clip the output as it is only one value short of being within range. Unlike straight averaging, the volume of one source is not reduced even when the other source is silent.
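Folded into the question's mix_sample() signature, the same algorithm might look like this (my paraphrase, with the loud-case product widened to 64 bits for the same overflow reason as above):
#include <stdint.h>
short int mix_sample(short int sample1, short int sample2)
{
    int32_t a = (int32_t)sample1 + 32768; /* map to 0..65535 */
    int32_t b = (int32_t)sample2 + 32768;
    int32_t m;
    if (a < 32768 || b < 32768)
        m = (int32_t)((int64_t)a * b / 32768);                       /* both quiet */
    else
        m = 2 * (a + b) - (int32_t)((int64_t)a * b / 32768) - 65536; /* loud */
    if (m == 65536) m = 65535;            /* clamp the single out-of-range value */
    return (short int)(m - 32768);        /* back to signed */
}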
here's a descriptive implementation:
#include <cstdint>
#include <limits>
short int mix_sample(short int sample1, short int sample2) {
    const int32_t result(static_cast<int32_t>(sample1) + static_cast<int32_t>(sample2));
    typedef std::numeric_limits<short int> Range;
    if (Range::max() < result)
        return Range::max();
    else if (Range::min() > result)
        return Range::min();
    else
        return static_cast<short int>(result);
}
to mix, it's just add and clip!
to avoid clipping artifacts, you will want to use saturation or a limiter. ideally, you will have a small int32_t buffer with a small amount of lookahead. this will introduce latency.
more common than limiting everywhere is to leave a few bits' worth of 'headroom' in your signal, as sketched below.
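For example, a sketch of fixed headroom (the 0.708 factor, roughly -3 dB per source, is an arbitrary choice of mine):
#include <stdint.h>
/* Attenuate each source ~3 dB before summing; clip as a safety net. */
short int mix_with_headroom(short int s1, short int s2)
{
    int32_t sum = (int32_t)((float)s1 * 0.708f) + (int32_t)((float)s2 * 0.708f);
    if (sum > 32767)  sum = 32767;
    if (sum < -32768) sum = -32768;
    return (short int)sum;
}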
Here is what I did on my recent synthesizer project.
#include <limits.h>
#include <stdlib.h>
int *unfiltered = malloc(lengthOfLongPcmInShorts * sizeof(int));
int i;
for (i = 0; i < lengthOfShortPcmInShorts; i++) {
    unfiltered[i] = shortPcm[i] + longPcm[i]; /* overlap: sum both streams */
}
for (; i < lengthOfLongPcmInShorts; i++) {
    unfiltered[i] = longPcm[i];               /* tail: only the long stream */
}
int max = 0;
for (i = 0; i < lengthOfLongPcmInShorts; i++) {
    int val = abs(unfiltered[i]);
    if (val > max)
        max = val;
}
short int *newPcm = malloc(lengthOfLongPcmInShorts * sizeof(short int));
for (i = 0; i < lengthOfLongPcmInShorts; i++) {
    newPcm[i] = (short int)(((double)unfiltered[i] / max) * SHRT_MAX);
}
I added all the PCM data into an integer array, so that I get all the data unfiltered.
After doing that I looked for the absolute max value in the integer array.
Finally, I took the integer array and put it into a short int array by taking each element dividing by that max value and then multiplying by the max short int value.
This way you get the minimum amount of 'headroom' needed to fit the data.
You might be able to do some statistics on the integer array and integrate some clipping, but for what I needed the minimum amount of headroom was good enough for me.
There's a discussion here: https://dsp.stackexchange.com/questions/3581/algorithms-to-mix-audio-signals-without-clipping about why the A+B - A*B solution is not ideal. Hidden down in one of the comments on that discussion is the suggestion to sum the values and divide by the square root of the number of signals, and an additional check for clipping couldn't hurt. This seems like a reasonable (simple and fast) middle ground, sketched below.
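A sketch of that middle ground for N buffers (sum, scale by 1/sqrt(N), clip whatever still escapes; the buffer layout is an assumption, adapt as needed):
#include <math.h>
void mix_sqrt(const short *const *src, int n_sources, int n_samples, short *dst)
{
    const float scale = 1.0f / sqrtf((float)n_sources);
    for (int i = 0; i < n_samples; i++) {
        float sum = 0.0f;
        for (int s = 0; s < n_sources; s++)
            sum += src[s][i];            /* sum all the sources */
        float v = sum * scale;           /* divide by sqrt(N) */
        if (v > 32767.0f)  v = 32767.0f; /* clipping check */
        if (v < -32768.0f) v = -32768.0f;
        dst[i] = (short)v;
    }
}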
I think these should be functions mapping [SHRT_MIN, SHRT_MAX] -> [SHRT_MIN, SHRT_MAX], and they clearly are not (besides the first one), so overflows occur.
If unwind's proposal doesn't work, you can also try:
((long int)(sample1) + sample2) / 2
Since you are in the time domain, the frequency info is in the difference between successive samples; when you divide by two you damage that information. That's why adding and clipping works better. Clipping will of course add very high-frequency noise, which is probably filtered out.

A way to find the nearest prime number to an unsigned long integer (32 bits wide) in C?

I'm looking for a way to find the closest prime number. Greater or less than, it doesn't matter, simply the closest (without overflowing, preferably). As for speed, if it can compute it in approximately 50 milliseconds on a 1GHz machine (in software, running inside Linux), I'd be ecstatic.
The largest prime gap in the range up to (2^32 - 1) is (335). There are (6542) primes less than (2^16) that can be tabulated and used to sieve successive odd values after a one-time setup. Obviously, only primes <= floor(sqrt(candidate)) need be tested for a particular candidate value.
Alternatively: The deterministic variant of the Miller-Rabin test, with SPRP bases: {2, 7, 61} is sufficient to prove primality for a 32-bit value. Due to the test's complexity (requires exponentiation, etc), I doubt it would be as fast for such small candidates.
Edit: Actually, if multiply/reduce can be kept to 32-bits in exponentiation (might need 64-bit support), the M-R test might be better. The prime gaps will typically be much smaller, making the sieve setup costs excessive. Without large lookup tables, etc., you might also get a boost from better cache locality.
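For reference, a compact version of that deterministic test (a sketch; the modular multiply is kept in 64 bits, which is the "64-bit support" caveat above):
#include <stdint.h>
static uint64_t mulmod(uint64_t a, uint64_t b, uint64_t m)
{
    return a * b % m; /* safe: both operands are < 2^32, product fits in 64 bits */
}
static uint64_t powmod(uint64_t b, uint64_t e, uint64_t m)
{
    uint64_t r = 1;
    b %= m;
    while (e) {
        if (e & 1) r = mulmod(r, b, m);
        b = mulmod(b, b, m);
        e >>= 1;
    }
    return r;
}
/* Deterministic Miller-Rabin for n < 2^32 with SPRP bases {2, 7, 61}. */
int is_prime_u32(uint32_t n)
{
    static const uint32_t bases[] = {2, 7, 61};
    if (n < 2 || n % 2 == 0) return n == 2;
    uint32_t d = n - 1;
    int s = 0;
    while (d % 2 == 0) { d /= 2; s++; } /* n - 1 = d * 2^s, d odd */
    for (int i = 0; i < 3; i++) {
        if (bases[i] % n == 0) continue; /* base is a multiple of n: skip */
        uint64_t x = powmod(bases[i], d, n);
        if (x == 1 || x == n - 1) continue;
        int composite = 1;
        for (int r = 1; r < s; r++) {
            x = mulmod(x, x, n);
            if (x == n - 1) { composite = 0; break; }
        }
        if (composite) return 0;
    }
    return 1;
}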
Furthermore: The product of primes {2, 3, 5, 7, 11, 13, 17, 19, 23} = (223092870). Explicitly test any candidate in [2, 23]. Calculate greatest common divisor: g = gcd(u, 223092870UL). If (g != 1), the candidate is composite. If (g == 1 && u < (29 * 29)), the candidate (u > 23) is definitely prime. Otherwise, move on to the more expensive tests. A single gcd test using 32-bit arithmetic is very cheap, and according to Mertens' (?) theorem, this will detect ~ 68.4% of all odd composite numbers.
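A sketch of that screen (Euclid's gcd; 223092870 = 2*3*5*7*11*13*17*19*23 as stated):
#include <stdint.h>
static uint32_t gcd_u32(uint32_t a, uint32_t b)
{
    while (b) { uint32_t t = a % b; a = b; b = t; }
    return a;
}
/* Returns 1 = prime, 0 = composite, -1 = unknown (needs a full test). */
int prime_screen(uint32_t u)
{
    const uint32_t primorial = 223092870UL;
    if (u < 2) return 0;
    if (u <= 23) /* explicitly test the small candidates */
        return u == 2 || u == 3 || u == 5 || u == 7 || u == 11 ||
               u == 13 || u == 17 || u == 19 || u == 23;
    if (gcd_u32(u, primorial) != 1) return 0; /* shares a prime factor <= 23 */
    if (u < 29 * 29) return 1; /* no factor <= 23 and below 29^2: prime */
    return -1;                 /* move on to the more expensive tests */
}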
UPDATE 2: Fixed (in a heavy-handed way) some bugs that caused wrong answers for small n. Thanks to Brett Hale for noticing! Also added some asserts to document some assumptions.
UPDATE: I coded this up and it seems plenty fast enough for your requirements (solved 1000 random instances from [2^29, 2^32-1] in <100ms, on a 2.2GHz machine -- not a rigorous test but convincing nonetheless).
It is written in C++ since that's what my sieve code (which I adapted from) was in, but the conversion to C should be straightforward. The memory usage is also (relatively) small which you can see by inspection.
You can see that because of the way the function is called, the number returned is the nearest prime that fits in 32 bits, but in fact this is the same thing since the primes around 2^32 are 4294967291 and 4294967311.
I tried to make sure there wouldn't be any bugs due to integer overflow (since we're dealing with numbers right up to UINT_MAX); hopefully I didn't make a mistake there. The code could be simplified if you wanted to use 64-bit types (or you knew your numbers would be smaller than 2^32-256) since you wouldn't have to worry about wrapping around in the loop conditions. Also this idea scales for bigger numbers as long as you're willing to compute/store the small primes up to the needed limit.
I should also note that the small-prime sieve runs quite quickly for these numbers (4-5 ms from a rough measurement), so if you are especially memory-starved, running it every time instead of storing the small primes is doable (you'd probably want to make the mark[] arrays more space-efficient in this case).
#include <iostream>
#include <cmath>
#include <climits>
#include <cassert>
using namespace std;

typedef unsigned int UI;
const UI MAX_SM_PRIME = 1 << 16;
const UI MAX_N_SM_PRIMES = 7000;
const UI WINDOW = 256;

// Sieve of Eratosthenes over the odd numbers below 2^16.
void getSMPrimes(UI primes[]) {
    UI pos = 0;
    primes[pos++] = 2;
    bool mark[MAX_SM_PRIME / 2] = {false}; // mark[i] covers the odd number 2*i + 3
    UI V_SM_LIM = UI(sqrt(MAX_SM_PRIME / 2));
    for (UI i = 0, p = 3; i < MAX_SM_PRIME / 2; ++i, p += 2)
        if (!mark[i]) {
            primes[pos++] = p;
            if (i < V_SM_LIM)
                for (UI j = p*i + p + i; j < MAX_SM_PRIME/2; j += p) // j starts at p*p
                    mark[j] = true;
        }
}

// Sieve the window [min, max] with the small primes, then scan outward from n.
UI primeNear(UI n, UI min, UI max, const UI primes[]) {
    bool mark[2*WINDOW + 1] = {false};
    if (min == 0) mark[0] = true;
    if (min <= 1) mark[1-min] = true;
    assert(min <= n);
    assert(n <= max);
    assert(max-min <= 2*WINDOW);
    UI maxP = UI(sqrt(max));
    // The 0 sentinel guards against running past the ~6542 stored primes
    // (maxP can exceed 65521, the largest 16-bit prime).
    for (int i = 0; primes[i] != 0 && primes[i] <= maxP; ++i) {
        UI p = primes[i], k = min / p;
        if (k < p) k = p; // start at p*p or at the first multiple >= min
        UI mult = p*k;
        if (min <= mult)
            mark[mult-min] = true;
        while (mult <= max-p) {
            mult += p;
            mark[mult-min] = true;
        }
    }
    for (UI s = 0; (s <= n-min) || (s <= max-n); ++s)
        if ((s <= n-min) && !mark[n-s-min])
            return n-s;
        else if ((s <= max-n) && !mark[n+s-min])
            return n+s;
    return 0;
}

int main() {
    UI primes[MAX_N_SM_PRIMES] = {0}; // zero-filled so the unused tail acts as a sentinel
    getSMPrimes(primes);
    UI n;
    while (cin >> n) {
        UI win_min = (n >= WINDOW) ? (n-WINDOW) : 0;
        UI win_max = (n <= UINT_MAX-WINDOW) ? (n+WINDOW) : UINT_MAX;
        if (!win_min)
            win_max = 2*WINDOW;
        else if (win_max == UINT_MAX)
            win_min = win_max-2*WINDOW;
        UI p = primeNear(n, win_min, win_max, primes);
        cout << "found nearby prime " << p << " from window " << win_min << ' ' << win_max << '\n';
    }
}
You can sieve intervals in that range if you know the primes up to 2^16 (there are only 6542 primes below 2^16; you should go a bit higher if the prime itself could be greater than 2^32 - 1). Not necessarily the fastest way, but very simple, and fancier prime-testing techniques are really suited to much larger ranges.
Basically, do a regular Sieve of Eratosthenes to get the "small" primes (say the first 7000). Obviously you only need to do this once at the start of the program, but it should be very fast.
Then, supposing your "target" number is 'a', consider the interval [a-n/2, a+n/2) for some value of n. Probably n = 128 is a reasonable place to start; you may need to try adjacent intervals if the numbers in the first one are all composite.
For every "small" prime p, cross out its multiples in the range, using division to find where to start. One optimization is that you only need to start crossing off multiples starting at p*p (which means that you can stop considering primes once p*p is above the interval).
Most of the primes except the first few will have either one or zero multiples inside the interval; to take advantage of this you can pre-ignore multiples of the first few primes. The simplest thing is to ignore all even numbers, but it's not uncommon to ignore multiples of 2, 3, and 5; this leaves integers congruent to 1, 7, 11, 13, 17, 19, 23, and 29 mod 30 (there are eight, which map nicely to the bits of a byte when sieving a large range).
...Sort of went off on a tangent there; anyway, once you've processed all the small primes (up till p*p > a + n/2), just look in the interval for numbers you didn't cross out. Since you want the closest to a, start looking there and search outward in both directions.
