Consider an algorithm to test the probability that a certain number is picked from a set of N unique numbers after a specific number of tries (for example, with N=2, what's the probability in Roulette (without 0) that it takes X tries for Black to win?).
The correct distribution for this is pow(1-1/N,X-1)*(1/N).
However, when I test this using the following code, there is always a deep ditch at X=31, independent of N and independent of the seed.
Is this an intrinsic flaw that cannot be prevented due to the implementation specifics of the PRNG in use, is this a real bug, or am I overlooking something obvious?
// C
#include <sys/times.h>
#include <stdlib.h>
#include <math.h>
#include <stdio.h>

int array[101];

int main(void){
    int nsamples=10000000;
    double breakVal,diffVal;
    int i,cnt;

    // seed, but doesn't change anything
    struct tms time;
    srandom(times(&time));

    // sample
    for(i=0;i<nsamples;i++){
        cnt=1;
        do{
            if((random()%36)==0) // break if 0 is chosen
                break;
            cnt++;
        }while(cnt<100);
        array[cnt]++;
    }

    // show distribution
    for(i=1;i<100;i++){
        breakVal=array[i]/(double)nsamples; // normalize
        diffVal=breakVal-pow(1-1/36.,i-1)*1/36.; // difference to expected value
        printf("%d %.12g %.12g\n",i,breakVal,diffVal);
    }
    return 0;
}
Tested on an up-to-date Xubuntu 12.10 with libc6 package 2.15-0ubuntu20 and Intel Core i5-2500 SandyBridge, but I discovered this already a few years ago on an older Ubuntu machine.
I also tested this on Windows 7 using Unity3D/Mono (not sure which Mono version, though), and here the ditch happens at X=55 when using System.Random, while Unity's builtin Unity.Random has no visible ditch (at least not for X<100).
(Plots of the distribution and of the differences from the expected value are omitted here; both show the ditch at X=31.)
This is due to glibc's random() function not being random enough. According to this page, for the random numbers returned by random(), we have:
o[i] = (o[i-3] + o[i-31]) % 2^31
or:
o[i] = (o[i-3] + o[i-31] + 1) % 2^31.
Now take x[i] = o[i] % 36, and suppose the first equation above is the one used (this happens with a 50% chance for each number). Now if x[i-31] = 0 and x[i-3] != 0, then the chance that x[i] = 0 is less than 1/36. This is because 50% of the time o[i-3] + o[i-31] will be less than 2^31, and when that happens
x[i] = o[i] % 36 = (o[i-3] + o[i-31]) % 36 = o[i-3] % 36 = x[i-3],
which is nonzero. This causes the ditch you see 31 samples after a 0 sample.
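As a rough empirical check of this explanation (my addition, not part of the original answer), one can compare the unconditional frequency of random() % 36 == 0 with its frequency exactly 31 draws after a zero; if the explanation is right, the conditional frequency should come out noticeably below 1/36:

#include <stdio.h>
#include <stdlib.h>

#define LAG 31

int main(void){
    const long N = 10000000;
    int ring[LAG] = {0};                 /* the last LAG values of random() % 36 */
    long zeros = 0;                      /* total zeros drawn */
    long lagged = 0, lagged_zero = 0;    /* draws made exactly LAG steps after a zero */
    for (long i = 0; i < N; i++) {
        int x = (int)(random() % 36);
        if (i >= LAG && ring[i % LAG] == 0) {   /* ring[i % LAG] is the value from step i-LAG */
            lagged++;
            if (x == 0) lagged_zero++;
        }
        if (x == 0) zeros++;
        ring[i % LAG] = x;
    }
    printf("P(zero)                         = %g\n", (double)zeros / N);
    printf("P(zero | zero 31 draws earlier) = %g\n", (double)lagged_zero / lagged);
    return 0;
}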
What's being measured in this experiment is the interval between successful trials of a Bernoulli experiment, where success is defined as random() mod k == 0 for some k (36 in the OP). Unfortunately, it is marred by the fact that the implementation of random() means that the Bernoulli trials are not statistically independent.
We'll write rnd[i] for the i-th output of random(), and we note that:
rnd[i] = rnd[i-31] + rnd[i-3] with probability 0.75
rnd[i] = rnd[i-31] + rnd[i-3] + 1 with probability 0.25
(See below for a proof outline.)
Let's suppose rnd[i-31] mod k == 0 and we're currently looking at rnd[i]. Then it must be the case that rnd[i-3] mod k != 0, because otherwise the success at i-3 would already have ended the interval we are counting (28 draws after the success at i-31).
But (most of the time), working mod k: rnd[i] = rnd[i-31] + rnd[i-3] = rnd[i-3] ≠ 0.
So the current trial is not statistically independent of the previous trials, and the 31st trial after a success is much less likely to succeed than it would in an unbiased series of Bernoulli trials.
The usual advice when using linear-congruential generators, which doesn't actually apply to the random() algorithm, is to use the high-order bits instead of the low-order bits, because high-order bits are "more random" (that is, less correlated with successive values). But that won't work in this case either, because the above identities hold equally well for the function "high log2(k) bits" as for the function "mod k" (the low log2(k) bits).
In fact, we might expect a linear-congruential generator to work better, particularly if we use the high-order bits of the output, because although the LCG is not particularly good at Monte Carlo simulations, it does not suffer from the linear feedback of random().
The random() algorithm, for the default case:
Let state be a vector of unsigned longs. Initialize state[0]...state[30] using a seed, some fixed values, and a mixing algorithm. For simplicity, we can consider the state vector to be infinite, although only the last 31 values are used, so it's actually implemented as a ring buffer.
To generate rnd[i]: (Note: ⊕ is addition mod 2^32.)
state[i] = state[i-31] ⊕ state[i-3]
rnd[i] = (state[i] - (state[i] mod 2)) / 2
Now, note that:
(i + j) mod 2 = i mod 2 + j mod 2 if i mod 2 == 0 or j mod 2 == 0
(i + j) mod 2 = i mod 2 + j mod 2 - 2 if i mod 2 == 1 and j mod 2 == 1
If i and j are uniformly distributed, the first case will occur 75% of the time, and the second case 25%.
So, by substitution in the generation formula:
rnd[i] = (state[i-31] ⊕ state[i-3] - ((state[i-31] + state[i-3]) mod 2)) / 2
       = ((state[i-31] - (state[i-31] mod 2)) ⊕ (state[i-3] - (state[i-3] mod 2))) / 2, or
       = ((state[i-31] - (state[i-31] mod 2)) ⊕ (state[i-3] - (state[i-3] mod 2)) + 2) / 2
The two cases can be further reduced to:
rnd[i] = rnd[i-31] ⊕ rnd[i-3]
rnd[i] = rnd[i-31] ⊕ rnd[i-3] + 1
(where ⊕ is now addition mod 2^31)
As above, the first case occurs 75% of the time, assuming that rndi-31 and rndi-3 are independently drawn from a uniform distribution (which they're not, but it's a reasonable first approximation).
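A quick way to check these two identities empirically (my sketch, not part of the proof outline) is to keep the last 31 outputs of random() and count how often the current output equals the lagged sum, or the lagged sum plus one, modulo 2^31; if the outline above is right, the counts should split roughly 75%/25%:

#include <stdio.h>
#include <stdlib.h>

#define LAG 31

int main(void){
    const long N = 10000000;
    unsigned long buf[LAG] = {0};    /* the last LAG outputs of random() */
    long plain = 0, plus1 = 0, other = 0;
    for (long i = 0; i < N; i++) {
        unsigned long r = (unsigned long)random();
        if (i >= LAG) {
            /* buf[i % LAG] still holds output i-31; buf[(i-3) % LAG] holds output i-3 */
            unsigned long pred = (buf[i % LAG] + buf[(i + LAG - 3) % LAG]) & 0x7fffffffUL;
            if (r == pred)                             plain++;
            else if (r == ((pred + 1) & 0x7fffffffUL)) plus1++;
            else                                       other++;
        }
        buf[i % LAG] = r;
    }
    printf("rnd[i] == rnd[i-31] + rnd[i-3]     : %.4f\n", (double)plain / (N - LAG));
    printf("rnd[i] == rnd[i-31] + rnd[i-3] + 1 : %.4f\n", (double)plus1 / (N - LAG));
    printf("neither                            : %.4f\n", (double)other / (N - LAG));
    return 0;
}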
As others pointed out, random() is not random enough.
Using the higher bits instead of the lower ones does not help in this case. According to the manual (man 3 rand), old implementations of rand() had a problem in the lower bits. That's why random() is recommended instead. Though, the current implementation of rand() uses the same generator as random().
I tried the recommended correct use of the old rand():
if ((int)(rand()/(RAND_MAX+1.0)*36)==0)
...and got the same deep ditch at X=31
Interestingly, if I mix rand()'s numbers with another sequence, I get rid of the ditch:
unsigned x=0;
//...
x = (179*x + 79) % 997;
if(((rand()+x)%36)==0)
I am using an old linear congruential generator. I chose 79, 179 and 997 at random from a primes table. This should generate a repeating sequence (period at most 997).
That said, this trick probably introduces some non-randomness, some footprint of its own... The resulting mixed sequence will surely fail other statistical tests. x never takes the same value in consecutive iterations, and the sequence of x values only repeats after many iterations (at most 997).
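For completeness, here is roughly how that mixing drops into the sampling loop from the question (a sketch I assembled from the snippets above; variable names follow the question's code, and I have not re-run the original plots with it):

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <math.h>

int array[101];

int main(void){
    int nsamples = 10000000;
    unsigned x = 0;                         /* state of the small auxiliary LCG */
    srand((unsigned)time(NULL));
    for (int i = 0; i < nsamples; i++) {
        int cnt = 1;
        do {
            x = (179u*x + 79u) % 997u;      /* auxiliary LCG */
            if (((rand() + x) % 36) == 0)   /* mix the two sequences, then reduce */
                break;
            cnt++;
        } while (cnt < 100);
        array[cnt]++;
    }
    for (int i = 1; i < 100; i++)
        printf("%d %.12g %.12g\n", i, array[i]/(double)nsamples,
               array[i]/(double)nsamples - pow(1 - 1/36., i - 1)/36.);
    return 0;
}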
"[...] random numbers should not be generated with a method chosen at random. Some theory should be used." (D. E. Knuth, The Art of Computer Programming, Vol. 2)
For simulations, if you want to be sure, use the Mersenne Twister
EDIT:
My question is: rand()%N is considered very bad, whereas the use of integer arithmetic is considered superior, but I cannot see the difference between the two.
People always mention:
low bits are not random in rand()%N,
rand()%N is very predictable,
you can use it for games but not for cryptography
Can someone explain if any of these points are the case here and how to see that?
The idea of the non-randomness of the lower bits is something that should make the permutation entropy (PE) of the two cases I show differ, but it doesn't.
I guess many like me would always avoid using rand(), or rand()%N, because we've always been taught that it is pretty bad. I was curious to see how "wrong" random integers generated with C's rand()%N actually are. This is also a follow-up to Ryan Reich's answer in How to generate a random integer number from within a range.
The explanation there sounds very convincing, to be honest; nevertheless, I thought I'd give it a try. So I compare the two distributions in a VERY naive way: I run both random generators for different numbers of samples and domains. I didn't see the point of computing a density instead of histograms, so I just computed histograms, and, just by looking, I would say they both look equally uniform. Regarding the other point that was raised, about the actual randomness (despite being uniformly distributed): I, again naively, compute the permutation entropy for these runs, and it is the same for both sample sets, which tells us that there's no difference between the two regarding the ordering of the occurrences.
So, for many purposes it seems to me that rand()%N would be just fine; how can we see its flaws?
Here I show a very simple, inefficient, and not very elegant (but, I think, correct) way of computing these samples and getting the histograms together with the permutation entropies.
I show plots for domains (0,i) with i in {5,10,25,50,100}, for different numbers of samples.
There's not much to see in the code, I guess, so I will leave both the C and the MATLAB code for replication purposes.
#include <stdlib.h>
#include <stdio.h>
#include <time.h>

int main(int argc, char *argv[]){
    unsigned long max = (unsigned long)atoi(argv[2]);
    int samples = atoi(argv[3]);
    srand(time(NULL));
    if(atoi(argv[1])==1){
        // method 1: plain modulo reduction
        for(int i=0;i<samples;++i)
            printf("%lu\n", rand()%(max+1));
    }else{
        // method 2: integer arithmetic with rejection of the "defect" range
        for(int i=0;i<samples;++i){
            unsigned long
                num_bins = (unsigned long) max + 1,
                num_rand = (unsigned long) RAND_MAX + 1,
                bin_size = num_rand / num_bins,
                defect   = num_rand % num_bins;
            long x;
            do {
                x = rand();
            } while (num_rand - defect <= (unsigned long)x);
            printf("%lu\n", (unsigned long)x/bin_size);
        }
    }
    return 0;
}
And here is the MATLAB code to plot this and compute the PEs (I took the recursion for the permutations from: https://www.mathworks.com/matlabcentral/answers/308255-how-to-generate-all-possible-permutations-without-using-the-function-perms-randperm):
system('gcc randomTest.c -o randomTest.exe;');
max = 100;
samples = max*10000;
trials = 200;
system(['./randomTest.exe 1 ' num2str(max) ' ' num2str(samples) ' > file1'])
system(['./randomTest.exe 2 ' num2str(max) ' ' num2str(samples) ' > file2'])
a1=load('file1');
a2=load('file2');
uni = figure(1);
title(['Samples: ' num2str(samples)])
subplot(1,3,1)
h1 = histogram(a1,max+1);
title('rand%(max+1)')
subplot(1,3,2)
h2 = histogram(a2,max+1);
title('Integer arithmetic')
as=[a1,a2];
ns=3:8;
H = nan(numel(ns),size(as,2));
for op=1:size(as,2)
    x = as(:,op);
    for n=ns
        sequenceOcurrence = zeros(1,factorial(n));
        sequences = myperms(1:n);
        sequencesArrayIdx = sum(sequences.*10.^(size(sequences,2)-1:-1:0),2);
        for i=1:numel(x)-n
            [~,sequenceOrder] = sort(x(i:i+n-1));
            out = sequenceOrder'*10.^(numel(sequenceOrder)-1:-1:0).';
            sequenceOcurrence(sequencesArrayIdx == out) = sequenceOcurrence(sequencesArrayIdx == out) + 1;
        end
        chunks = length(x) - n + 1;
        ps = sequenceOcurrence/chunks;
        hh = sum(ps(logical(ps)).*log2(ps(logical(ps))));
        H(n,op) = hh/log2(factorial(n));
    end
end
subplot(1,3,3)
plot(ns,H(ns,:),'--*','linewidth',2)
ylabel('PE')
xlabel('Sequence length')
filename = ['all_' num2str(max) '_' num2str(samples) ];
export_fig(filename)
Due to the way modulo arithmetic works, if N is significant compared to RAND_MAX, doing %N will make you considerably more likely to get some values than others. Imagine RAND_MAX were 11, so that rand() produced the 12 values 0..11 with equal probability, and N were 9. Then the chance of getting one of 0, 1, or 2 is 0.5, and the chance of getting one of 3, 4, 5, 6, 7, 8 is also 0.5; the result is that you're twice as likely to get a 0 as a 4. If N exactly divides the number of values rand() can return, this distribution problem doesn't happen, and if N is very small compared to RAND_MAX the issue becomes less noticeable. RAND_MAX may not be a particularly large value (possibly 2^15 - 1), making this problem worse than you might expect. The alternative of doing (rand() * n) / (RAND_MAX + 1) also doesn't give an even distribution; however, it will be every m-th value (for some m) that is more likely to occur, rather than the more likely values all being at the low end of the distribution.
If N is 75% of the number of values rand() can return, then the values in the bottom third of your distribution are twice as likely as the values in the top two thirds (as this is where the extra values map to).
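To make the pigeonholing concrete, here is a tiny enumeration (my illustration, using a hypothetical generator with only 16 possible outputs rather than rand() itself):

#include <stdio.h>

/* Hypothetical generator producing the 16 values 0..15 uniformly, reduced
 * with % 9: the results 0..6 each receive two source values, while 7 and 8
 * receive only one, so the low results are twice as likely. */
int main(void){
    int counts[9] = {0};
    for (int v = 0; v <= 15; v++)     /* enumerate every possible output */
        counts[v % 9]++;
    for (int k = 0; k < 9; k++)
        printf("%d <- %d source values\n", k, counts[k]);
    return 0;
}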
The quality of rand() will depend on the implementation of the system that you're on. I believe that some systems have had very poor implementations; OS X's man pages declare rand() obsolete. The Debian man page says the following:
The versions of rand() and srand() in the Linux C Library use the same
random number generator as random(3) and srandom(3), so the lower-order
bits should be as random as the higher-order bits. However, on older
rand() implementations, and on current implementations on different
systems, the lower-order bits are much less random than the higher-
order bits. Do not use this function in applications intended to be
portable when good randomness is needed. (Use random(3) instead.)
Both approaches have their pitfalls, and your graphs are little more than a pretty verification of the central limit theorem! For a sensible implementation of rand():
% N suffers from a "pigeon-holing" effect if 1u + RAND_MAX is not a multiple of N
/((RAND_MAX + 1u)/N) does not, in general, evenly distribute the return of rand across your range, due to integer truncation effects.
On balance, if N is small cf. RAND_MAX, I'd plump for % for its tractability. In any case, test your generator to see if it has the appropriate statistical properties for your application.
rand() % N is considered extremely poor not because the distribution is bad, but because the randomness is poor-to-nonexistent. (If anything the distribution will be too good.)
If N is not small with respect to RAND_MAX, both
rand() % N
and
rand() / (RAND_MAX / N + 1)
will have more or less the same, poor distribution -- certain values will occur with significantly higher probability than others.
Looking at distribution histograms won't show you that for some implementations, rand() % N has a much, much worse problem -- to show that you'd have to perform some correlations with previous values. (For example, try taking rand() % 2, then subtracting from the previous value you got, and plotting a histogram of the differences. If the difference is never 0, you've got a problem.)
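A minimal sketch of that correlation test (my addition, not from the answer above):

#include <stdio.h>
#include <stdlib.h>

/* Histogram the differences between consecutive rand() % 2 values. On a
 * broken implementation whose low bit simply alternates, the difference is
 * never 0; on a sane one, roughly half of the differences are 0. */
int main(void){
    long hist[3] = {0};               /* differences -1, 0, +1 */
    int prev = rand() % 2;
    for (long i = 0; i < 1000000; i++) {
        int cur = rand() % 2;
        hist[cur - prev + 1]++;
        prev = cur;
    }
    printf("diff -1: %ld\ndiff  0: %ld\ndiff +1: %ld\n", hist[0], hist[1], hist[2]);
    return 0;
}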
I would like to say that the implementations for which rand()'s low-order bits aren't random are simply buggy. I'd like to think that all those buggy implementations would have disappeared by now. I'd like to think that programmers shouldn't have to worry about calling rand()%N any more. But, unfortunately, my wishes don't change the fact that this seems to be one of those bugs that never get fixed, meaning that programmers do still have to worry.
See also the C FAQ list, question 13.16.
I was writing a very simple program to examine if a number could divide another number evenly:
// use the divider squared to reduce iterations
for(divider = 2; (divider * divider) <= number; divider++)
    if(number % divider == 0)
        printf("%d can be divided by %d\n", number, divider);
Now I was curious whether the task could be done by computing the square root of number once and comparing divider to it. However, it seems that sqrt() isn't really able to boost the efficiency. How is sqrt() implemented in C, and how can I improve the efficiency of sqrt()? Also, is there any other way to approach the answer with even greater efficiency?
Also, the
number % divider == 0
is used to test whether divider evenly divides number; is there also a more efficient way to do this test besides using %?
I'm not going to address what the best algorithm to find all factors of an integer is. Instead I would like to comment on your current method.
There are three conditional test cases to consider:
(divider * divider) <= number
divider <= number/divider
divider <= sqrt(number)
See Conditional tests in primality by trial division for more details.
The case to use depends on your goals and hardware.
The advantage of case 1 is that it does not require a division. However, it can overflow when divider*divider is larger than the largest integer. Case 2 does not have the overflow problem, but it requires a division. For case 3 the sqrt only needs to be calculated once, but it requires a sqrt function that gets perfect squares right.
But there is something else to consider: many instruction sets, including the x86 instruction set, return the remainder as well when doing a division. Since you're already doing number % divider, this means that you get number / divider for free.
Therefore, case 1 is only useful on systems where the division and remainder are not calculated in one instruction, and where you're not worried about overflow.
Between case 2 and case 3, I think the main issue is again the instruction set. Choose case 2 if sqrt is too slow compared to the division in case 2, or if your sqrt function does not compute perfect squares correctly. Choose case 3 if the instruction set does not calculate the quotient and remainder in one instruction.
For the x86 instruction set, case 1, case 2 and case 3 should give essentially equal performance. So there should be no reason to use case 1 (however, see a subtle point below). The C standard library guarantees that the sqrt of a perfect square is computed correctly, so there is no disadvantage to case 3 either.
But there is one subtle point about case 2. I have found that some compilers don't recognize that the division and remainder are calculated together. For example in the following code
for(divider = 2; divider <= number/divider; divider++)
    if(number % divider == 0)
GCC generates two division instructions even though only one is necessary. One way to fix this is to keep the division and remainder close together, like this:
int divider = 2, q = number/divider, r = number%divider;
for(; divider <= q; divider++, q = number/divider, r = number%divider)
    if(r == 0)
In this case GCC produces only one division instruction, and case 1, case 2 and case 3 have the same performance. But this code is a bit less readable than
int cut = sqrt(number);
for(divider = 2; divider <= cut; divider++)
    if(number % divider == 0)
so I think overall case 3 is the best choice at least with the x86 instruction set.
However, it seems that sqrt() isn't really able to boost the efficiency
That is to be expected, as the saved multiplication per iteration is largely dominated by the much slower division operation inside the loop.
Also, the number % divider == 0 is used to test if divider evenly divides number; is there also a more efficient way to do the test besides using %?
Not that I know of. Checking whether a % b == 0 is at least as hard as checking a % b == c for some c, because we can use the former to compute the latter (with one extra addition). And at least on Intel architectures, computing the latter is just as computationally expensive as a division, which is amongst the slowest operations on typical modern processors.
If you want significantly better performance, you need a better factorization algorithm, of which there are plenty. One particularly simple one with runtime O(n^(1/4)) is Pollard's ρ algorithm. You can find a straightforward C++ implementation in my algorithms library. Adaptation to C is left as an exercise for the reader:
int rho(int n) { // will find a factor < n, but not necessarily prime
if (~n & 1) return 2;
int c = rand() % n, x = rand() % n, y = x, d = 1;
while (d == 1) {
x = (1ll*x*x % n + c) % n;
y = (1ll*y*y % n + c) % n;
y = (1ll*y*y % n + c) % n;
d = __gcd(abs(x - y), n);
}
return d == n ? rho(n) : d;
}
void factor(int n, map<int, int>& facts) {
if (n == 1) return;
if (rabin(n)) { // simple randomized prime test (e.g. Miller–Rabin)
// we found a prime factor
facts[n]++;
return;
}
int f = rho(n);
factor(n/f, facts);
factor(f, facts);
}
Constructing the factors of n from its prime factors is then an easy task. Just use all possible exponents for the found prime factors and combine them in each possible way.
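A small sketch of that last step (my addition; it uses plain C arrays for the prime/exponent pairs instead of the map above):

#include <stdio.h>

/* Given the prime factorization p[0]^e[0] * ... * p[m-1]^e[m-1] of a number,
 * recursively pick an exponent for each prime and print every divisor. */
static void list_divisors(const long *p, const int *e, int m, long cur){
    if (m == 0) { printf("%ld\n", cur); return; }
    long pw = 1;
    for (int k = 0; k <= e[0]; k++) {     /* exponent 0 .. e[0] for prime p[0] */
        list_divisors(p + 1, e + 1, m - 1, cur * pw);
        pw *= p[0];
    }
}

int main(void){
    long p[] = {2, 3, 5};                 /* example: 360 = 2^3 * 3^2 * 5 */
    int  e[] = {3, 2, 1};
    list_divisors(p, e, 3, 1);            /* prints the 24 divisors of 360 */
    return 0;
}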
In C, you can take square roots of floating point numbers with the sqrt() family of functions in the header <math.h>.
Taking square roots is usually slower than dividing because the algorithm to take square roots is more complicated than the division algorithm. This is not a property of the C language but of the hardware that executes your program. On modern processors, taking square roots can be just as fast as dividing. This holds, for example, on the Haswell microarchitecture.
However, if the algorithmic improvements are good, the slightly slower speed of a sqrt() call usually doesn't matter.
To only compare up to the square root of number, employ code like this:
#include <math.h>
/* ... */
int root = (int)sqrt((double)number);
for(divider = 2; divider <= root; divider++)
    if(number % divider == 0)
        printf("%d can be divided by %d\n", number, divider);
This is just a random thought of mine, so please comment on and criticize it if it's wrong.
The idea is to precompute all the prime numbers below a certain range and use it as a table.
Loop through the table and check whether the current prime is a factor; if it is, increment the counter for that prime and keep dividing, otherwise move on to the next index. Terminate when the index reaches the end of the table or the prime to check exceeds the remaining input.
In the end, the result is a table of all the prime factors of the input and their counts. Generating all natural factors from that should be trivial, shouldn't it?
In the worst case, the loop needs to go to the end of the table, which takes 6542 iterations (the number of primes below 65536).
Considering the input range is [0, 4294967296], this is similar to O(n^(3/8)).
Here's MATLAB code that implements this method:
If p is generated by p = primes(65536), this method should work for all inputs in [0, 4294967296] (not tested, though).
function [ output_non_zero ] = fact2(input, p)
    output_table=zeros(size(p));
    i=1;
    while(i<length(p))
        if(input<1.5)
            break;
            % break condition: input has been divided down to 1,
            % all prime factors are found.
        end
        if(rem(input,p(i))<1)
            % if divisible, increment the counter and don't increment the index;
            % keep dividing until no longer divisible
            output_table(i)=output_table(i)+1;
            input = input/p(i);
        else
            % not divisible, try the next prime
            i=i+1;
        end
    end
    % remove all zeros, should be handled more efficiently
    output_non_zero = [p(output_table~=0);...
                       output_table(output_table~=0)];
    if(input > 1.5)
        % the last and largest prime factor could be larger than 65536
        % and hence missing from the table; add it to the end of the output
        % if it exists
        output_non_zero = [output_non_zero,[input;1]];
    end
end
test
p=primes(65536);
t = floor(rand()*4294967296);
b = fact2(t, p);
% check if all prime factors adds up and they are all primes
assert((prod(b(1,:).^b(2,:))==t)&&all(isprime(b(1,:))), 'test failed');
I am running a bunch of physical simulations in which I need random numbers. I'm using the standard rand() function in C++.
So it works like this: first I precalculate a bunch of probabilities of the form 1/(1+exp(a)), for a set of different a. They're of type double, as returned by the exp function in the math library. Then things must happen with those probabilities: there are only two of them, so I generate a random number uniformly distributed between 0 and 1 and compare it with those precalculated probabilities. To do that, I used:
double p = double(rand()%101)/100.0;
so I'm given random values between 0 and 1, both included. This didn't yield correct physical results. I tried this instead:
double p = double(rand()%1000001)/1000000.0;
And this worked. I don't really understand why, so I would like some criteria for how to do it. My intuition tells me that if I do
double p = double(rand()%(N+1))/double(N);
with N big enough that the smallest step (1/N) is much smaller than the smallest probability 1/(1+exp(a)), then I will be getting realistic random numbers.
I would like to understand why, though.
rand() returns a random number between 0 and RAND_MAX.
Therefore you need this:
double p = double(rand()) / double(RAND_MAX);
Also run this snippet and you will understand:
int i;
for (i = 1; i < 30; i++)
{
    int rnd = rand();
    double p0 = double(rnd % 101) / 100.0;
    double p1 = double(rnd % 1000001) / 1000000.0;
    printf ("%d\t%f\t%f\n", rnd, p0, p1);
}
for (i = 1; i < 30; i++)
{
    int rnd = rand();
    double p0 = double(rnd) / double(RAND_MAX);
    printf ("%d\t%f\n", rnd, p0);
}
You have multiple problems.
rand() isn't very random at all. On almost all operating systems it returns badly distributed, horribly biased numbers. It's actually quite hard to find a good random number generator, but I can guarantee you that rand() will be among the worst you can find.
rand() % N gives a biased distribution. Think about the pigeonhole principle. Let's simplify it: assume that rand() returns the numbers [0,8) and your N is 6. Then 0 to 5 map to 0 to 5, 6 maps to 0, and 7 maps to 1, meaning that 0 and 1 are twice as likely to come out.
Converting the numbers to double before the division does not remove the bias from point 2, it just makes it less visible. The pigeonhole principle applies regardless of the conversions you do.
Converting a well-distributed random number from integer to float/double is harder than it looks. Simple division ignores the problems of how floating point math works.
I can't help you much with point 1; you need to do the research. Look around the net for random number libraries. If you want something very random and unpredictable, you need to look for cryptographic random libraries. If you want a repeatable but good random number, the Mersenne Twister should probably be good enough. But you need to do the research here.
For points 2 and 3 there are standard solutions. You are mapping a set of M elements onto N elements, and rand() % N will only be unbiased if N divides M. Since on most systems M will be a power of two, it means that N also has to be a power of two. So, assuming that M is a power of two, the algorithm is: find the nearest power of 2 greater than or equal to N, call it P. Generate randomness_source() % P. If the number is greater than or equal to N, throw it away and try again. This is the only safe way to do this. Cleverer people than you and me have spent years on this problem; there's no better way to remove the bias.
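A minimal sketch of that procedure (my code, not from the answer; it assumes, as the answer does, that the source produces M = RAND_MAX + 1 equally likely values with M a power of two, and that n is no larger than RAND_MAX):

#include <stdlib.h>

/* Returns a uniform value in [0, n) by rejection, as described above. */
int unbiased_rand(int n){
    int p = 1;
    while (p < n)
        p <<= 1;                   /* smallest power of two >= n */
    int r;
    do {
        r = rand() % p;            /* unbiased only because p divides RAND_MAX + 1 */
    } while (r >= n);              /* reject values outside [0, n) and retry */
    return r;
}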
For 4, you can probably ignore the problem and just divide, in an absolute majority of cases this should be good enough. If you really want to study the problem, I've done some work on it and published the code on github. There I go through some basic principles of how floating point numbers work and how it relates to generating random numbers.
// produces pseudorandom bits. These are NOT crypto quality bits. Has the same underlying unpredictability as uncooked
// rand() output. It buffers rand() bits to produce a more convenient zero-to-the-argument range including negative
// arguments, corrects for the toward-zero bias of the modular construction I'd be using otherwise, eliminates the
// RAND_MAX range limitation, (use INT64_MAX instead) and effectively obscures biases and sequence telltales due to
// annoyingly bad rand libraries. It does not correct these biases; anyone tracking the arguments and outputs has
// enough information to reconstruct the rand() output and detect them. But it makes the relationships drastically more complicated.
// needs stdint, stdlib.
int64_t privaterandom(int64_t range, int reset){
static uint64_t state = 0;
int64_t retval;
if (reset != 0){
srand((unsigned int)range);
state = (uint64_t)range;
}
if (range == 0) return (0);
if (range < 0) return -privaterandom(-range, 0);
if (range > UINT64_MAX/0xFFFFFFFF){
retval = (privaterandom(range/0xFFFFFFFF, 0) * 0xFFFFFFFF); // order of operations matters
return (retval + privaterandom(0xFFFFFFFF, 0));
}
while (state < UINT64_MAX / 0xFF){
state *= RAND_MAX;
state += rand();
}
retval = (state % range);
// makes "pigeonhole" bias alternate unpredictably between toward-even and toward-odd
if ((state/range > (state - (retval) )/ range) && state % 2 == 0) retval++;
state /= range;
return retval;
}
int64_t Random(int64_t range){ return (privaterandom(range, 0));}
int64_t Random_Init(int64_t seed){return (privaterandom(seed, 1));}
I am stuck in a program while finding modulus of division.
Say for example I have:
((a*b*c)/(d*e)) % n
Now, I cannot simply calculate the expression and then modulo it to n as the multiplication and division are going in a loop and the value is large enough to not fit even in long long.
As clarified in comments, n can be considered prime.
I found that, for multiplication, I can easily calculate it as:
((a%n*b%n)%n*c%n)%n
but couldn't understand how to calculate the division part then.
The problem I am facing is say for a simple example:
((7*3*5)/(5*3)) % 11
The value of the above expression would be 7,
but if I calculate using multiplication and modulo step by step, it goes like this:
((7%11)*(3%11))%11 = 10
((10%11)*(5%11))%11 = 6
Now I am left with 6/15 and I have no way to generate the correct answer.
Could someone help me? Please help me understand the logic using the above example.
Since 11 is prime, Z_11 is a field. Since 15 % 11 is 4, 1/15 equals 3 (since 3 * 4 % 11 is 1). Therefore, 6/15 is 6 * 3, which is 7 mod 11.
In your comments below the question, you clarify that the modulus will always be a prime.
To efficiently generate a table of multiplicative inverses, you can raise 2 to successive powers to see which values it generates. Note that in the field Z_p, where p is an odd prime, 2^(p-1) = 1. So, for Z_11:
2^1 = 2
2^2 = 4
2^3 = 8
2^4 = 5
2^5 = 10
2^6 = 9
2^7 = 7
2^8 = 3
2^9 = 6
So the multiplicative inverse of 5 (which is 2^4) is 2^6 (which is 9).
So, you can generate the above table like this:
power_of_2[0] = 1;
for (int i = 1; i < n; ++i) {
    power_of_2[i] = (2*power_of_2[i-1]) % n;
}
And the multiplicative inverse table can be computed like this:
mult_inverse[1] = 1;
for (int i = 1; i < n; ++i) {
    mult_inverse[power_of_2[i]] = power_of_2[n-1-i];
}
(Note that this fills in an inverse for every nonzero element only when 2 is a primitive root modulo n, as it is for n = 11.)
In your example, since 15 = 4 mod 11, you actually end up having to evaluate (6/4) mod 11.
In order to find an exact solution to this, rearrange it as 6 = ((x * 4) mod 11), which makes it clearer how the modulo division works.
If nothing else, if the modulus is always small, you can iterate from 0 to modulus-1 to get the solution.
Note that when the modulus is not prime, there may be multiple solutions to the reduced problem. For instance, there are two solutions to 4 = ( ( x * 2) mod 8): 2 and 6. This will happen for a reduced problem of form:
a = ( (x * b) mod c)
whenever b and c are NOT relatively prime (ie whenever they DO share a common divisor).
Similarly, when b and c are NOT relatively prime, there may be no solution to the reduced problem. For instance, 3 = ( (x * 2) mod 8) has no solution. This happens whenever the largest common divisor of b and c does not also divide a.
These latter two circumstances are consequences of the integers from 0 to n-1 not forming a group under multiplication (or equivalently, a field under + and *) when n is not prime, but rather forming simply the less useful structure of a ring.
I think the way the question is asked, it should be assumed that the numerator is divisible by the denominator. In that case the finite-field solution for prime n, and the speculation about possible extensions and caveats for non-prime n, is basically overkill.
If you have all the numerator terms and denominator terms stored in arrays, you can iteratively test pairs of (numerator term, denominator term), quickly find their greatest common divisor (gcd), and divide both terms by it. (Finding the gcd is a classical problem and you can easily find a simple solution online.)
In the worst case you will have to iterate over all possible pairs, but at some point, if the denominator really does divide the numerator, you'll be left with reduced numerator terms and all denominator terms equal to 1. Then you're ready to apply the multiplication (avoiding overflow) the way you described.
As n is prime, dividing by an integer b is simply multiplying by b's inverse. That is:
(a / b) mod n = (a * inv(b)) mod n
where
inv(b) = (b ^ (n - 2)) mod n
Calculating inv(b) can be done in O(log(n)) time using the Exponentiation by squaring algorithm. Here is the code:
int inv(int b, int n)
{
int r = 1, m = n - 2;
while (m)
{
if (m & 1) r = (long long)r * b % n;
b = (long long)b * b % n;
m >>= 1;
}
return r;
}
Why does it work? According to Fermat's little theorem, if n is prime, then b^(n-1) mod n = 1 for any positive integer b not divisible by n. Therefore we have inv(b) * b mod n = 1.
Another solution for finding inv(b) is the Extended Euclidean algorithm, which needs a bit more code to implement.
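For reference, a sketch of that alternative (my code, written in the same style as inv() above): it returns the inverse of b modulo n assuming gcd(b, n) == 1, and unlike the Fermat version it does not require n to be prime.

int inv_euclid(int b, int n)
{
    int r0 = n, r1 = b % n;       /* remainder sequence of the Euclidean algorithm */
    int t0 = 0, t1 = 1;           /* Bezout coefficients of b */
    while (r1 != 0)
    {
        int q = r0 / r1, tmp;
        tmp = r0 - q * r1; r0 = r1; r1 = tmp;
        tmp = t0 - q * t1; t0 = t1; t1 = tmp;
    }
    /* here r0 == gcd(b, n) == 1 and t0 * b == 1 (mod n) */
    return (t0 % n + n) % n;
}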
I think you can distribute the division like
z = d*e/3
(a/z)*(b/z)*(c/z) % n
That leaves only the integer division problem.
I think the problem you had was that you picked an example that was too simple. In that case the answer was 7, but what if a*b*c were not evenly divisible by d*e? You should probably look up how to do division with modulo first; then it should be clear to you :)
Instead of dividing, think in terms of multiplicative inverses. For each number in a mod-n system, there ought to be an inverse, if certain conditions are met. For d and e, find those inverses, and then it's all just multiplying. Finding the inverses is not done by dividing! There's plenty of info out there...
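To tie these answers back to the numbers in the question, here is a worked example (my sketch, reusing the Fermat-based inverse from the answer above) that evaluates ((7*3*5)/(5*3)) % 11 via a multiplicative inverse and prints 7:

#include <stdio.h>

/* inv(b) = b^(n-2) mod n for prime n, as in the Fermat-based answer above */
int inv(int b, int n)
{
    long long r = 1, base = b, m = n - 2;
    while (m)
    {
        if (m & 1) r = r * base % n;
        base = base * base % n;
        m >>= 1;
    }
    return (int)r;
}

int main(void)
{
    int n = 11;
    int num = (7 % n) * (3 % n) % n * (5 % n) % n;   /* 7*3*5 = 105, 105 mod 11 = 6 */
    int den = (5 % n) * (3 % n) % n;                 /* 5*3  = 15,  15 mod 11 = 4  */
    printf("%d\n", num * inv(den, n) % n);           /* 6 * inv(4) = 6 * 3 = 18 -> 7 */
    return 0;
}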
I have seen many questions on SO about this particular subject but none of them has any answer for me, so I thought of asking this question.
I want to generate a random number in the range [-1, 1]. How can I do this?
Use -1+2*((float)rand())/RAND_MAX
rand() generates integers in the range [0, RAND_MAX] inclusive; therefore, ((float)rand())/RAND_MAX returns a floating-point number in [0,1]. We get random numbers in [-1,1] by scaling that to [0,2] and adding -1.
EDIT: (adding relevant portions of the comment section)
On the limitations of this method:
((float)rand())/RAND_MAX returns a percentage (a fraction from 0 to 1). Since the range from -1 to 1 spans 2 units, I multiply that fraction by 2 and then add it to the minimum value you want, -1. This also tells you about the quality of your random numbers, since you will only have RAND_MAX + 1 distinct values.
If all you have is the Standard C library, then other people's answers are sensible. If you have POSIX functionality available to you, consider using the drand48() family of functions. In particular:
#define _XOPEN_SOURCE 600 /* Request non-standard functions */
#include <stdlib.h>
double f = +1.0 - 2.0 * drand48();
double g = -1.0 + 2.0 * drand48();
Note that the manual says:
The drand48() and erand48() functions shall return non-negative, double-precision, floating-point values, uniformly distributed over the interval [0.0,1.0).
If you strictly need [-1.0,+1.0] (as opposed to [-1.0,+1.0)), then you face a very delicate problem with how to extend the range.
The drand48() functions give you considerably more randomness than the typical implementation of rand(). However, if you need cryptographic randomness, none of these are appropriate; you need to look for 'cryptographically strong PRNG' (PRNG = pseudo-random number generator).
I had a similar question a while back and thought that it might be more efficient to just generate the fractional part directly. I did some searching and came across an interesting fast floating-point rand that doesn't use floating-point division or multiplication or an int->float cast; it can be done with some intimate knowledge of the internal representation of a float:
float sfrand( void )
{
unsigned int a=(rand()<<16)|rand(); //we use the bottom 23 bits of the int, so one
//16 bit rand() won't cut it.
a=(a&0x007fffff) | 0x40000000;
return( *((float*)&a) - 3.0f );
}
The first part generates a random float from [2^1,2^2), subtract 3 and you have [-1, 1). This of course may be too intimate for some applications/developers but it was just what I was looking for. This mechanism works well for any range that is a power of 2 wide.
For starters, you'll need the C library function rand(). This is in the stdlib.h header file, so you should put:
#include <stdlib.h>
near the beginning of your code. rand() will generate a random integer between zero and RAND_MAX so dividing it by RAND_MAX / 2 will give you a number between zero and 2 inclusive. Subtract one, and you're onto your target range of -1 to 1.
However, if you simply do int n = rand() / (RAND_MAX / 2) you will find you don't get the answer which you expect. This is because both rand() and RAND_MAX / 2 are integers, so integer arithmetic is used. To stop this from happening, some people use a float cast, but I would recommend avoiding casts by multiplying by 1.0.
You should also seed your random number generator using the srand() function. In order to get a different result each time, people often seed the generator based on the clock time, by doing srand(time(0)).
So, overall we have:
#include <stdlib.h>
#include <time.h>

srand(time(0));
double r = 1.0 * rand() / (RAND_MAX / 2) - 1;
While the accepted answer is fine in many cases, it will leave out "every other number", because it is expanding a range of already discrete values by 2 to cover the [-1, 1] interval. In a similar way if you had a random number generator which could generate an integer from [0, 10] and you wanted to generate [0, 20], simply multiplying by 2 will span the range, but not be able to cover the range (it would leave out all the odd numbers).
It probably has sufficiently fine grain for your needs, but does have this drawback, which could be statistically significant (and detrimental) in many applications - particularly monte carlo simulations and systems which have sensitive dependence on initial conditions.
A method which is able to generate any representable floating-point number from -1 to 1 inclusive should rely on generating a digit sequence a1.a2a3a4a5... up to the limit of your floating-point precision, which is the only way to be able to generate any possible float in the range (i.e. following the definition of the real numbers).
From the "The C Standard Library"
int rand(void) - Returns pseudo-random number in range 0 to RAND_MAX
RAND_MAX - Maximum value returned by rand().
So:
rand() will return a pseudo-random number in range 0 to RAND_MAX
rand() / (double) RAND_MAX will return a pseudo-random number in range 0 to 1
2 * (rand() / (double) RAND_MAX) will return a pseudo-random number in range 0 to 2
2 * (rand() / (double) RAND_MAX) - 1 will return a pseudo-random number in range -1 to 1
As others already noted, any attempts to simply transform the range of 'rand()' function from [0, RAND_MAX] into the desired [-1, +1] will produce a random number generator that can only generate a discrete set of floating-point values. For a floating-point generator the density of these values might be insufficient in some applications (if the implementation-defined value of RAND_MAX is not sufficiently large). If this is a problem, one can increase the aforementioned density exponentially by using two or more 'rand()' calls instead of one.
For example, by combining the results of two consecutive calls to 'rand()' one can obtain a pseudo-random number in [0, (RAND_MAX + 1)^2 - 1] range
#define RAND_MAX2 ((RAND_MAX + 1ul) * (RAND_MAX + 1) - 1)
unsigned long r2 = (unsigned long) rand() * (RAND_MAX + 1ul) + rand();
and later use the same method to transform it into a floating-point number in [-1, +1] range
double dr2 = r2 * 2.0 / RAND_MAX2 - 1;
By using this method one can build-up as many 'rand()' calls as necessary, keeping an eye on integer overflow, of course.
As a side note, this method of combining consecutive 'rand()' calls doesn't produce very high quality pseudo-random number generators, but it might work perfectly well for many purposes.