Fast C random boolean generator


I'm interested in generating fast random booleans (or equivalently a Bernoulli(0.5) random variable) in C. Of course if one has a fast random generator with a decent statistical behaviour the problem "sample a random Bernoulli(0.5)" is easily solved: sample x uniformly in (0,1) and return 1 if x<0.5, 0 otherwise.
Suppose speed is the most important thing; now I have two questions/considerations:
Many random double generators first generate an integer m uniformly in a certain range [0,M] and then simply return the division m/M. Wouldn't it be faster just to check whether m < M/2? (Here M/2 is fixed, so we are saving one division.)
Is there any faster way to do it? In the end, we're asking for far fewer statistical properties here: we may still be interested in a long period but, for example, we don't care about the uniformity of the distribution (as long as roughly 50% of the values are in the first half of the range).

Extracting, say, the last bit of a random number can wreak havoc, as linear congruential generators can alternate between odd and even numbers [1]. A scheme like clock() & 1 would also have ghastly correlation planes.
Consider a solution based on the quick and dirty generator of Donald Knuth: for a uint32_t I, iterate the sequence
I = 1664525 * I + 1013904223;
and use 2 * I < I as the conditional yielding the Boolean draw. Here I'm relying on the wrap-around of 2 * I, which happens exactly when the top bit of I is set and so should occur half the time, and a potentially expensive division is avoided.
Testing I <= 0x7FFFFFFF is less flashy and might be faster still, but the hardcoding of the midpoint is not entirely satisfactory.
[1] The generator I present here does.
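A minimal sketch of the idea above, assuming a single global state and the constants quoted in the answer (the names lcg_state, lcg_next and lcg_bool are hypothetical). It tests the top bit directly, which is the same condition as 2 * I < I for a uint32_t.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical global state; any starting value works for this full-period LCG. */
static uint32_t lcg_state = 12345u;

/* Advance the LCG; unsigned arithmetic wraps modulo 2^32. */
static uint32_t lcg_next(void)
{
    lcg_state = 1664525u * lcg_state + 1013904223u;
    return lcg_state;
}

/* True when the top bit is set, i.e. when 2 * I would wrap (2 * I < I). */
static bool lcg_bool(void)
{
    return (lcg_next() >> 31) != 0;
}
```

Only the high bit is used because, as noted above, the low bit of this LCG simply alternates.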

I'm interested in generating fast random booleans
Using an LCG can be fast, yet since the OP needs only a bool result, consider extracting only 1 bit at a time from a reasonable generator and saving the rest for later. @Akshay L Aradhya
Example based on @R..'s code.
#include <stdbool.h>
#include <stdint.h>
extern uint32_t lcg64_temper(uint64_t *seed); // see R..'s code
static uint64_t gseed; // Initialize this in some fashion.
static unsigned gcount = 0;
bool rand_bool(void) {
static uint32_t rbits;
if (gcount == 0) {
gcount = 32; // I'd consider using 31 here, just to cope with some LCG weaknesses.
rbits = lcg64_temper(&gseed);
}
gcount--;
bool b = rbits & 1;
rbits >>= 1;
return b;
}
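For a self-contained version that compiles without R..'s code, here is a sketch that swaps in a hypothetical stand-in, lcg64_hi32(): a 64-bit LCG with Knuth's MMIX constants that returns its high 32 bits. It is not lcg64_temper() and does no tempering; the seed value is arbitrary.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for lcg64_temper(): 64-bit LCG, high 32 bits returned. */
static uint32_t lcg64_hi32(uint64_t *seed)
{
    *seed = *seed * 6364136223846793005ULL + 1442695040888963407ULL;
    return (uint32_t)(*seed >> 32);
}

static uint64_t gseed = 0x853C49E6748FEA9BULL; /* arbitrary example seed */
static unsigned gcount = 0;                    /* bits left in rbits */
static uint32_t rbits;                         /* buffered random bits */

bool rand_bool(void)
{
    if (gcount == 0) {          /* one generator call per 32 booleans */
        rbits = lcg64_hi32(&gseed);
        gcount = 32;
    }
    gcount--;
    bool b = rbits & 1u;        /* hand out the lowest buffered bit */
    rbits >>= 1;
    return b;
}
```

The cost of the generator call is amortized over 32 calls; the rest is a shift and a mask.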

Related

Why does rand() repeat numbers far more often on Linux than Mac?

I was implementing a hashmap in C as part of a project I'm working on and using random inserts to test it. I noticed that rand() on Linux seems to repeat numbers far more often than on Mac. RAND_MAX is 2147483647/0x7FFFFFFF on both platforms. I've reduced it to this test program that allocates a byte array RAND_MAX+1 bytes long, generates RAND_MAX random numbers, notes if each is a duplicate, and checks it off the list as seen.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
int main(void) {
size_t size = ((size_t)RAND_MAX) + 1;
char *randoms = calloc(size, sizeof(char));
if (randoms == NULL) {
perror("calloc");
return 1;
}
int dups = 0;
srand(time(0));
for (int i = 0; i < RAND_MAX; i++) {
int r = rand();
if (randoms[r]) {
// printf("duplicate at %d\n", r);
dups++;
}
randoms[r] = 1;
}
printf("duplicates: %d\n", dups);
free(randoms);
return 0;
}
Linux consistently generates around 790 million duplicates. Mac consistently only generates one, so it loops through every random number that it can generate almost without repeating. Can anyone please explain to me how this works? I can't tell anything different from the man pages, can't tell which RNG each is using, and can't find anything online. Thanks!

While at first it may sound like the macOS rand() is somehow better for not repeating any numbers, one should note that with this amount of numbers generated it is expected to see plenty of duplicates (in fact, around 790 million, or (2^31 - 1)/e). Likewise, iterating through the numbers in sequence would also produce no duplicates, but wouldn't be considered very random. So the Linux rand() implementation is, in this test, indistinguishable from a true random source, whereas the macOS rand() is not.
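The 790 million figure can be checked with the usual "balls into bins" count: each value is missed with probability (1 - 1/n)^draws, so the expected number of repeats is draws minus the expected number of distinct values. A sketch (function names hypothetical; the power is computed by repeated squaring so no math library is needed):

```c
#include <stdint.h>

/* base^e by repeated squaring. */
static double pow_uint(double base, uint64_t e)
{
    double result = 1.0;
    while (e != 0) {
        if (e & 1u)
            result *= base;
        base *= base;
        e >>= 1;
    }
    return result;
}

/* Expected repeats when drawing `draws` values uniformly from `n` outcomes. */
static double expected_duplicates(uint64_t n, uint64_t draws)
{
    double p_never = pow_uint(1.0 - 1.0 / (double)n, draws); /* about 1/e when draws ~ n */
    return (double)draws - (double)n * (1.0 - p_never);
}
```

With n = 2^31 possible values and 2^31 - 1 draws, as in the question, this comes out at roughly 790 million.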
Another thing that appears surprising at first glance is how the macOS rand() can manage to avoid duplicates so well. Looking at its source code, we find the implementation to be as follows:
/*
* Compute x = (7^5 * x) mod (2^31 - 1)
* without overflowing 31 bits:
* (2^31 - 1) = 127773 * (7^5) + 2836
* From "Random number generators: good ones are hard to find",
* Park and Miller, Communications of the ACM, vol. 31, no. 10,
* October 1988, p. 1195.
*/
long hi, lo, x;
/* Can't be initialized with 0, so use another value. */
if (*ctx == 0)
*ctx = 123459876;
hi = *ctx / 127773;
lo = *ctx % 127773;
x = 16807 * lo - 2836 * hi;
if (x < 0)
x += 0x7fffffff;
return ((*ctx = x) % ((unsigned long) RAND_MAX + 1));
This does indeed result in every number between 1 and RAND_MAX - 1, inclusive, being returned exactly once before the sequence repeats. Since the next state is based on multiplication modulo a prime, the state can never be zero (or all future states would also be zero). Thus the repeated number you see is the first one, and zero (along with RAND_MAX itself) is never returned.
Apple has been promoting the use of better random number generators in their documentation and examples for at least as long as macOS (or OS X) has existed, so the quality of rand() is probably not deemed important, and they've just stuck with one of the simplest pseudorandom generators available. (As you noted, their rand() is even commented with a recommendation to use arc4random() instead.)
On a related note, the simplest pseudorandom number generator I could find that produces decent results in this (and many other) tests for randomness is xorshift*:
uint64_t x = *ctx;
x ^= x >> 12;
x ^= x << 25;
x ^= x >> 27;
*ctx = x;
return (x * 0x2545F4914F6CDD1DUL) >> 33;
This implementation results in almost exactly 790 million duplicates in your test.
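The xorshift* fragment above can be packaged as a complete function returning values in 0..RAND_MAX (0x7FFFFFFF). This is a sketch; the name xorshift_star31 is hypothetical, and *ctx must start nonzero.

```c
#include <stdint.h>

/* xorshift* step with the constants from the fragment above; *ctx must be nonzero. */
static int xorshift_star31(uint64_t *ctx)
{
    uint64_t x = *ctx;
    x ^= x >> 12;
    x ^= x << 25;
    x ^= x >> 27;
    *ctx = x;
    /* Keep the top 31 bits of the scrambled product: 0 <= result <= 0x7FFFFFFF. */
    return (int)((x * UINT64_C(0x2545F4914F6CDD1D)) >> 33);
}
```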

MacOS provides an undocumented rand() function in stdlib. If you leave it unseeded, then the first values it outputs are 16807, 282475249, 1622650073, 984943658 and 1144108930. A quick search will show that this sequence corresponds to a very basic LCG random number generator that iterates the following formula:
$x_{n+1} = 7^5 \cdot x_n \pmod{2^{31} - 1}$
Since the state of this RNG is described entirely by the value of a single 32-bit integer, its period is not very long. To be precise, it repeats itself every 2^31 - 2 iterations, outputting every value from 1 to 2^31 - 2.
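A minimal sketch of the Park and Miller recurrence, computed directly in 64-bit arithmetic rather than with the Schrage decomposition used in the macOS source (the function name is hypothetical). Starting from state 1, it reproduces the unseeded macOS outputs listed above.

```c
#include <stdint.h>

/* x = 16807 * x mod (2^31 - 1); the 64-bit product cannot overflow. */
static uint32_t park_miller_next(uint32_t *state)
{
    *state = (uint32_t)(((uint64_t)*state * 16807u) % 2147483647u);
    return *state;
}
```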
I don't think there's a standard implementation of rand() for all versions of Linux, but there is a glibc rand() function that is often used. Instead of a single 32-bit state variable, this uses a pool of over 1000 bits, which to all intents and purposes will never produce a fully repeating sequence. Again, you can probably find out what version you have by printing the first few outputs from this RNG without seeding it first. (The glibc rand() function produces the numbers 1804289383, 846930886, 1681692777, 1714636915 and 1957747793.)
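To illustrate what "a pool of bits" means here, the following is a hypothetical reimplementation of the additive feedback scheme commonly described for glibc's default random()/rand() state (31 words): seed the pool with Park and Miller steps, then sum lagged words and discard the first 310 results. It is a sketch for illustration, not glibc's actual code, and assumes 1 <= seed < 2^31.

```c
#include <stdint.h>

#define GLIBC_LIKE_WARMUP 344  /* 34 seeded words + 310 discarded results */
#define GLIBC_LIKE_MAX_OUT 8   /* capacity of this sketch */

/* Writes the first `count` outputs (count <= 8) for `seed` to out[]. */
static void glibc_like_outputs(uint32_t seed, uint32_t out[], int count)
{
    uint32_t r[GLIBC_LIKE_WARMUP + GLIBC_LIKE_MAX_OUT];
    r[0] = seed;
    for (int i = 1; i < 31; i++)        /* seed the pool with Park-Miller steps */
        r[i] = (uint32_t)(((uint64_t)r[i - 1] * 16807u) % 2147483647u);
    for (int i = 31; i < 34; i++)       /* repeat the first three words */
        r[i] = r[i - 31];
    for (int i = 34; i < GLIBC_LIKE_WARMUP + count; i++)
        r[i] = r[i - 31] + r[i - 3];    /* lagged sum, wraps modulo 2^32 */
    for (int k = 0; k < count; k++)
        out[k] = r[k + GLIBC_LIKE_WARMUP] >> 1;   /* top 31 bits */
}
```

With seed 1 (the unseeded default) this should give the first glibc outputs quoted above.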
So the reason you're getting more collisions in Linux (and hardly any in MacOS) is that the Linux version of rand() is basically more random.

rand() is defined by the C standard, and the C standard does not specify which algorithm to use. Obviously, Apple is using an inferior algorithm to your GNU/Linux implementation: The Linux one is indistinguishable from a true random source in your test, while the Apple implementation just shuffles the numbers around.
If you want random numbers of any quality, either use a better PRNG that gives at least some guarantees on the quality of the numbers it returns, or simply read from /dev/urandom or similar. The latter gives you cryptographic-quality numbers, but is slow. Even if it is too slow by itself, /dev/urandom can provide some excellent seeds to some other, faster PRNG.
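A minimal sketch of the seeding approach, assuming a Unix-like system where /dev/urandom exists (the function name seed_from_urandom is hypothetical). It uses only standard stdio calls.

```c
#include <stdint.h>
#include <stdio.h>

/* Read a 64-bit seed from /dev/urandom. Returns 0 on success, -1 on failure. */
static int seed_from_urandom(uint64_t *seed)
{
    FILE *f = fopen("/dev/urandom", "rb");
    if (f == NULL)
        return -1;
    size_t got = fread(seed, sizeof *seed, 1, f);
    fclose(f);
    return got == 1 ? 0 : -1;
}
```

For generators such as xorshift that must not start at zero, the caller should still replace a zero seed with some fixed nonzero constant.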

In general, the rand/srand pair has been considered sort of deprecated for a long time due to low-order bits displaying less randomness than high-order bits in the results. This may or may not have anything to do with your results, but I think this is still a good opportunity to remember that even though some rand/srand implementations are now more up to date, older implementations persist and it's better to use random(3). On my Arch Linux box, the following note is still in the man page for rand(3):
The versions of rand() and srand() in the Linux C Library use the same
random number generator as random(3) and srandom(3), so the lower-order
bits should be as random as the higher-order bits. However, on older
rand() implementations, and on current implementations on different
systems, the lower-order bits are much less random than the higher-order
bits. Do not use this function in applications intended to be portable
when good randomness is needed. (Use random(3) instead.)
Just below that, the man page actually gives very short, very simple example implementations of rand and srand that are about the simplest LC RNGs you've ever seen, with a small RAND_MAX. I don't think they match what's in the C standard library, if they ever did. Or at least I hope not.
In general, if you're going to use something from the standard library, use random if you can (the man page lists it as standard since POSIX.1-2001, whereas rand dates back to before C was even standardized). Or better yet, crack open Numerical Recipes (or look for it online) or Knuth and implement one. They're really easy, and you only really need to do it once to have a general-purpose RNG with the attributes you most often need and of known quality.
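For reference, a sketch in the style of that example, assuming RAND_MAX is 32767 (the names myrand and mysrand follow the man page example). It returns bits 16 to 30 of a 32-bit LCG state, which is why its low bits behave better than returning the raw state would.

```c
/* Example-style LCG; only the low 31 bits of `next` affect the output. */
static unsigned long next = 1;

int myrand(void)
{
    next = next * 1103515245 + 12345;
    return (unsigned)(next / 65536) % 32768;
}

void mysrand(unsigned int seed)
{
    next = seed;
}
```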

How does XorShift32 work?

I have this homework where I need to implement xorshift32 (I can't use anything else) so I can generate some numbers, but I don't understand how the algorithm works or how to implement it.
I am trying to print the generated number, but I don't know how to call the xorshift32 function because of the state[static 1] argument.
uint32_t xorshift32(uint32_t state[static 1])
{
uint32_t x = state[0];
x ^= x << 13;
x ^= x >> 17;
x ^= x << 5;
state[0] = x;
return x;
}
I do not have much information on xorshift32 other than what is on Wikipedia (en.wikipedia.org/wiki/Xorshift).
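In C99, a parameter declared as uint32_t state[static 1] is simply a pointer that must point to at least one uint32_t. So you can pass the address of a single variable or a one-element array; the function updates it in place. A minimal sketch using the question's function unchanged:

```c
#include <stdint.h>

/* The question's xorshift32; `state` points to at least one uint32_t. */
uint32_t xorshift32(uint32_t state[static 1])
{
    uint32_t x = state[0];
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    state[0] = x;   /* store the new state for the next call */
    return x;
}
```

Usage: declare uint32_t seed = 1; and call xorshift32(&seed) repeatedly, printing each result with printf("%" PRIu32 "\n", ...) from <inttypes.h>. The seed must not be 0.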

This is an extended comment to the good answer by Jabberwocky.
The Xorshift variants, rand(), and basically all random number generator functions are actually pseudorandom number generators. They are not "really random", because the sequence of numbers they generate is completely determined by their internal state; but they are "pseudorandom", because if you do not know the generator's internal state, the sequence of numbers they generate looks random in the statistical sense.
George Marsaglia, the author of the Xorshift family of pseudorandom number generators, also developed a set of statistical tools called the Diehard tests that can be used to analyse the "randomness" of the generated sequences. Currently, the TestU01 tests are probably the most widely used and trusted; in particular, the BigCrush battery (106 tests yielding 160 test statistics).
The sequence generated by ordinary pseudorandom number generators often allows one to determine the internal state of the generator. This means that observing a long enough generated sequence allows one to fairly reliably predict the future sequence. Cryptographically secure pseudorandom number generators avoid that, usually by applying a cryptographically secure hash function to the output; one would need a catalog of the entire sequence to be able to follow it. When the periods are longer than 2^256 or so, there is not enough baryonic matter in the entire observable universe to store the sequence.
My own favourite PRNG is Xorshift64*, which has a period of 2^64 - 1 and passes all but the MatrixRank test in BigCrush. In C99 and later, you can implement it using
#include <inttypes.h>
typedef struct {
uint64_t state;
} prng_state;
static inline uint64_t prng_u64(prng_state *const p)
{
uint64_t state = p->state;
state ^= state >> 12;
state ^= state << 25;
state ^= state >> 27;
p->state = state;
return state * UINT64_C(2685821657736338717);
}
The state can be initialized to any nonzero uint64_t. (A zero state will lead the generator to generate all zeros till infinity. The period is 2^64 - 1, because the generator visits each 64-bit state (excluding zero) exactly once during each period.)
It is good enough for most use cases, and extremely fast. It belongs to the class of linear-feedback shift register pseudorandom number generators.
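A sketch that demonstrates the zero-state caveat and guards against it. The helper prng_seed() and its fallback constant are hypothetical additions, not part of the answer; prng_u64() is the answer's function unchanged.

```c
#include <inttypes.h>

typedef struct {
    uint64_t state;
} prng_state;

static inline uint64_t prng_u64(prng_state *const p)
{
    uint64_t state = p->state;
    state ^= state >> 12;
    state ^= state << 25;
    state ^= state >> 27;
    p->state = state;
    return state * UINT64_C(2685821657736338717);
}

/* Hypothetical helper: accept any seed but never leave the state at zero,
   because zero is a fixed point of the xorshift step. */
static void prng_seed(prng_state *p, uint64_t seed)
{
    p->state = seed ? seed : UINT64_C(0x9E3779B97F4A7C15);
}
```

Because both the xorshift step and the multiplication by an odd constant are invertible, a nonzero state always produces a nonzero output.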
Note that the variant which returns a uniform distribution on [0, 1),
static inline double prng_one(prng_state *p)
{
/* Keep the top 53 bits so the result is exact and strictly below 1.0;
   dividing the full 64-bit value by 2^64 can round up to exactly 1.0. */
return (prng_u64(p) >> 11) * (1.0 / 9007199254740992.0);
}
uses the high bits; the high 32 bits of the sequence do pass all BigCrush tests in the TestU01 suite, so this is a surprisingly good (in randomness and efficiency) generator for double-precision uniform random numbers, my typical use case.
The format above allows multiple independent generators in a single process, by specifying the generator state as a parameter. If the basic generator is implemented in a header file (thus the static inline; it is a preprocessor macro-like function), you can switch between generators by switching between header files, and recompiling the binary.
(You are usually better off using a single generator, unless you use multiple threads in a pseudorandom-number-heavy simulator, in which case using a separate generator for each thread will help a lot; in particular, it avoids cache-line ping-pong between threads competing for the generator state.)
The rand() function in most C standard library implementations is a linear-congruential generator. They often suffer from poor choices of the coefficients, and nowadays, also from the relative slowness of the modulo operator (when the modulus is not a power of two).
The most widely used pseudorandom number generator is the Mersenne Twister, by Makoto Matsumoto (松本 眞) and Takuji Nishimura (西村 拓士). It is a twisted generalized linear feedback shift register, and has quite a large state (about 2500 bytes) and a very long period (2^19937 - 1).
When we talk of true random number generators, we usually mean a combination of a pseudorandom number generator (usually a cryptographically secure one), and a source of entropy; random bits with at least some degree of true physical randomness.
In Linux, Mac OS, and BSDs at least, the operating system kernel exposes a source of pseudorandom numbers (getentropy() in Linux and OpenBSD, getrandom() in Linux, /dev/urandom, /dev/arandom, /dev/random in many Unixes, and so on). Entropy is gathered from physical electronic sources, like internal processor latencies, physical interrupt line timings, (spinning disk) hard drive timings, possibly even keyboards and mice. Many motherboards and some processors even have hardware random number sources that can be used as sources for entropy (or even directly as "trusted randomness sources").
The exclusive-or operation (^ in C) is used to mix randomness into the generator state. This works because exclusive-or between a known bit and a random bit results in a random bit; XOR preserves randomness. When mixing independent entropy pools (with some degree of randomness in the bit states) using XOR, the result will have at least as much entropy as the most random of the sources.
Note that that does not mean that you get "better" random numbers by mixing the output of two or more generators. The statistics of true randomness are hard for humans to grok (just look at how poor the common early rand() implementations were! HORRIBLE!). It is better to pick a generator (or a set of generators to switch between at compile time, or at run time) that passes the BigCrush tests, and ensure it has a good random initial state on every run. That way you leverage the work of many mathematicians and others who have worked on these things for decades, and can concentrate on the other stuff, what you yourself are good at.

The C code in the Wikipedia article is somewhat misleading.
Here is a working example that uses both the 32 bit and the 64 bit versions:
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>
/* The state word must be initialized to non-zero */
uint32_t xorshift32(uint32_t state[])
{
/* Algorithm "xor" from p. 4 of Marsaglia, "Xorshift RNGs" */
uint32_t x = state[0];
x ^= x << 13;
x ^= x >> 17;
x ^= x << 5;
state[0] = x;
return x;
}
uint64_t xorshift64(uint64_t state[])
{
uint64_t x = state[0];
x ^= x << 13;
x ^= x >> 7;
x ^= x << 17;
state[0] = x;
return x;
}
int main(void)
{
uint32_t state[1] = {1234}; // "seed" (can be anything but 0)
for (int i = 0; i < 50; i++)
{
printf("%" PRIu32 "\n", xorshift32(state)); // portable format for uint32_t
}
uint64_t state64[1] = { 1234 }; // "seed" (can be anything but 0)
for (int i = 0; i < 50; i++)
{
printf("%" PRIu64 "\n", xorshift64(state64)); // portable format for uint64_t
}
return 0;
}
The mathematical aspects are explained in the Wikipedia article and in its footnotes.
The rest is basic C language knowledge, ^ is the C bitwise XOR operator.

Random integers in C, how bad is rand()%N compared to integer arithmetic? What are its flaws?

EDIT:
My question is: rand()%N is considered very bad, whereas the use of integer arithmetic is considered superior, but I cannot see the difference between the two.
People always mention:
low bits are not random in rand()%N,
rand()%N is very predictable,
you can use it for games but not for cryptography
Can someone explain if any of these points are the case here and how to see that?
The idea of the non-randomness of the lower bits is something that should make the permutation entropy (PE) of the two cases I show differ, but it doesn't.
I guess many like me would always avoid using rand(), or rand()%N, because we've always been taught that it is pretty bad. I was curious to see how "wrong" random integers generated with C's rand()%N effectively are. This is also a follow-up to Ryan Reich's answer in How to generate a random integer number from within a range.
The explanation there sounds very convincing, to be honest; nevertheless, I thought I'd give it a try. So I compare the distributions in a VERY naive way: I run both random generators for different numbers of samples and domains. I didn't see the point of computing a density instead of histograms, so I just computed histograms, and just by looking, I would say they both look equally uniform. Regarding the other point that was raised, about the actual randomness (despite being uniformly distributed): I, again naively, compute the permutation entropy for these runs, which is the same for both sample sets; this tells us that there's no difference between the two regarding the ordering of the occurrences.
So, for many purposes, it seems to me that rand()%N would be just fine. How can we see its flaws?
Here I show you a very simple, inefficient and not very elegant (but I think correct) way of computing these samples and getting the histograms together with the permutation entropies.
I show plots for domains (0,i) with i in {5,10,25,50,100} for different numbers of samples.
There's not much to see in the code I guess, so I will leave both the C and the matlab code for replication purposes.
#include <stdlib.h>
#include <stdio.h>
#include <time.h>
int main(int argc, char *argv[]){
if (argc < 4) {
fprintf(stderr, "usage: %s method max samples\n", argv[0]);
return 1;
}
unsigned long max = strtoul(argv[2], NULL, 10);
int samples=atoi(argv[3]);
srand(time(NULL));
if(atoi(argv[1])==1){
for(int i=0;i<samples;++i)
printf("%lu\n",rand()%(max+1)); /* int % unsigned long is unsigned long */
}else{
for(int i=0;i<samples;++i){
unsigned long
num_bins = (unsigned long) max + 1,
num_rand = (unsigned long) RAND_MAX + 1,
bin_size = num_rand / num_bins,
defect = num_rand % num_bins;
long x;
do {
x = rand();
}
while (num_rand - defect <= (unsigned long)x); /* reject the uneven tail */
printf("%lu\n",(unsigned long)x/bin_size);
}
}
return 0;
}
And here is the Matlab code to plot this and compute the PEs (I took the recursion for the permutations from: https://www.mathworks.com/matlabcentral/answers/308255-how-to-generate-all-possible-permutations-without-using-the-function-perms-randperm):
system('gcc randomTest.c -o randomTest.exe;');
max = 100;
samples = max*10000;
trials = 200;
system(['./randomTest.exe 1 ' num2str(max) ' ' num2str(samples) ' > file1'])
system(['./randomTest.exe 2 ' num2str(max) ' ' num2str(samples) ' > file2'])
a1=load('file1');
a2=load('file2');
uni = figure(1);
title(['Samples: ' num2str(samples)])
subplot(1,3,1)
h1 = histogram(a1,max+1);
title('rand%(max+1)')
subplot(1,3,2)
h2 = histogram(a2,max+1);
title('Integer arithmetic')
as=[a1,a2];
ns=3:8;
H = nan(numel(ns),size(as,2));
for op=1:size(as,2)
x = as(:,op);
for n=ns
sequenceOcurrence = zeros(1,factorial(n));
sequences = myperms(1:n);
sequencesArrayIdx = sum(sequences.*10.^(size(sequences,2)-1:-1:0),2);
for i=1:numel(x)-n+1
[~,sequenceOrder] = sort(x(i:i+n-1));
out = sequenceOrder'*10.^(numel(sequenceOrder)-1:-1:0).';
sequenceOcurrence(sequencesArrayIdx == out) = sequenceOcurrence(sequencesArrayIdx == out) + 1;
end
chunks = length(x) - n + 1;
ps = sequenceOcurrence/chunks;
hh = -sum(ps(logical(ps)).*log2(ps(logical(ps))));
H(n,op) = hh/log2(factorial(n));
end
end
subplot(1,3,3)
plot(ns,H(ns,:),'--*','linewidth',2)
ylabel('PE')
xlabel('Sequence length')
filename = ['all_' num2str(max) '_' num2str(samples) ];
export_fig(filename)

Due to the way modulo arithmetic works, if N is significant compared to RAND_MAX, doing %N will make you considerably more likely to get some values than others. Imagine rand() returns 12 equally likely values (RAND_MAX is 11) and N is 9. Values 0 to 8 map to themselves, while 9, 10 and 11 wrap around to 0, 1 and 2, so the chance of getting one of 0, 1, or 2 is 0.5, and the chance of getting one of 3, 4, 5, 6, 7, 8 is also 0.5. The result is that you're twice as likely to get a 0 as a 4. If N exactly divides RAND_MAX + 1, this distribution problem doesn't happen, and if N is very small compared to RAND_MAX, the issue becomes less noticeable. RAND_MAX may not be a particularly large value (the C standard only guarantees 2^15 - 1), making this problem worse than you may expect. The alternative of doing (rand() * n) / (RAND_MAX + 1) also doesn't give an even distribution (and rand() * n can overflow an int); however, it will be every mth value (for some m) that is more likely to occur, rather than the more likely values all being at the low end of the distribution.
If N is 75% of RAND_MAX, then the values in the bottom third of your distribution are twice as likely as the values in the top two thirds (as this is where the extra values map to).
The quality of rand() will depend on the implementation of the system that you're on. I believe that some systems have had very poor implementations; OS X's man pages declare rand obsolete. The Debian man page says the following:
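The 12-value example can be checked exhaustively. This sketch assumes a hypothetical generator with exactly 12 equally likely outputs (RAND_MAX = 11) and N = 9, and counts how many outputs land in each bucket under both mappings.

```c
#define TOY_OUTPUTS 12  /* hypothetical RAND_MAX + 1 */
#define TOY_N 9

static void count_mappings(int mod_counts[TOY_N], int scale_counts[TOY_N])
{
    for (int b = 0; b < TOY_N; b++)
        mod_counts[b] = scale_counts[b] = 0;
    for (int x = 0; x < TOY_OUTPUTS; x++) {
        mod_counts[x % TOY_N]++;                   /* rand() % N */
        scale_counts[(x * TOY_N) / TOY_OUTPUTS]++; /* (rand() * N) / (RAND_MAX + 1) */
    }
}
```

The modulo counts come out as 2, 2, 2, 1, 1, 1, 1, 1, 1 (the low values are favoured), while the scaling counts are 2, 1, 1, 2, 1, 1, 2, 1, 1 (every third value is favoured).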
The versions of rand() and srand() in the Linux C Library use the same
random number generator as random(3) and srandom(3), so the lower-order
bits should be as random as the higher-order bits. However, on older
rand() implementations, and on current implementations on different
systems, the lower-order bits are much less random than the higher-
order bits. Do not use this function in applications intended to be
portable when good randomness is needed. (Use random(3) instead.)

Both approaches have their pitfalls, and your graphs are little more than a pretty verification of the central limit theorem! For a sensible implementation of rand():
% N suffers from a "pigeon-holing" effect if 1u + RAND_MAX is not a multiple of N
/((RAND_MAX + 1u)/N) does not, in general, evenly distribute the return of rand across your range, due to integer truncation effects.
On balance, if N is small compared to RAND_MAX, I'd plump for % for its tractability. In any case, test your generator to see if it has the appropriate statistical properties for your application.

rand() % N is considered extremely poor not because the distribution is bad, but because the randomness is poor-to-nonexistent. (If anything the distribution will be too good.)
If N is not small with respect to RAND_MAX, both
rand() % N
and
rand() / (RAND_MAX / N + 1)
will have more or less the same, poor distribution -- certain values will occur with significantly higher probability than others.
Looking at distribution histograms won't show you that for some implementations, rand() % N has a much, much worse problem -- to show that you'd have to perform some correlations with previous values. (For example, try taking rand() % 2, then subtracting from the previous value you got, and plotting a histogram of the differences. If the difference is never 0, you've got a problem.)
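A sketch of that check against a deliberately weak, hypothetical rand() that returns the raw state of a 32-bit LCG. Because both the multiplier and the increment are odd, the low bit flips on every call, so rand() % 2 never repeats and the difference is never 0.

```c
#include <stdint.h>

/* Hypothetical weak generator: returns the full LCG state. */
static uint32_t weak_state = 1;
static uint32_t weak_rand(void)
{
    weak_state = weak_state * 1103515245u + 12345u;
    return weak_state;
}

/* Count how often weak_rand() % 2 equals the previous value. */
static unsigned count_zero_differences(unsigned draws)
{
    unsigned zeros = 0;
    uint32_t prev = weak_rand() % 2;
    for (unsigned i = 1; i < draws; i++) {
        uint32_t cur = weak_rand() % 2;
        if (cur == prev)   /* difference cur - prev is 0 */
            zeros++;
        prev = cur;
    }
    return zeros;
}
```

A histogram of rand() % 2 from this generator would look perfectly balanced; only the correlation with the previous value exposes it.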
I would like to say that the implementations for which rand()'s low-order bits aren't random are simply buggy. I'd like to think that all those buggy implementations would have disappeared by now. I'd like to think that programmers shouldn't have to worry about calling rand()%N any more. But, unfortunately, my wishes don't change the fact that this seems to be one of those bugs that never get fixed, meaning that programmers do still have to worry.
See also the C FAQ list, question 13.16.

Fast hashing of 32 bit values to between 0 and 254 inclusive

I'm looking for a fast way in C to hash 32-bit numbers more or less uniformly between 0 and 254. 255 is reserved for a special purpose.
As an added constraint, I'm looking for a method that would map well to being used with ISA-specific vector intrinsics or to a language like OpenCL or CUDA without introducing control flow divergence between the vector lanes/threads.
Ordinarily, I would just use the following code to hash the number between 0 and 255, as this is just a fast way of doing x mod 256.
static inline uint8_t hash(uint32_t x){ return x & 255; }
I could just give in and use the following:
static inline uint8_t hash(uint32_t x){ return x % 255; }
However, this solution seems unimaginative and unlikely to be the highest performing solution. I found code at this site (http://homepage.cs.uiowa.edu/~jones/bcd/mod.shtml#exmod15) that appears to provide a reasonable solution for scalar code and have inserted it here for your convenience.
uint32_t mod255( uint32_t a ) {
a = (a >> 16) + (a & 0xFFFF); /* sum base 2**16 digits */
a = (a >> 8) + (a & 0xFF); /* sum base 2**8 digits */
if (a < 255) return a;
if (a < (2 * 255)) return a - 255;
return a - (2 * 255);
}
I see two potential performance issues with this code:
The large number of if statements makes me question how easy it will be for a compiler or human :) to effectively vectorize the code without leading to control flow divergence within a warp/wavefront on a SIMT architecture or vectorized execution on a multicore CPU. If such divergence does occur, it will reduce parallel efficiency, as the divergent paths will have to be run in series.
It looks like it could be troublesome for a branch predictor (not applicable on common GPU architectures) as the code path that executes depends on the value of the input. Therefore, if there is a mix of small and large values interspersed with one another, this code will likely sacrifice some performance due to a moderate number of branch mispredictions.
Any recommendations on alternatives that I could use are most welcome. Alternatively, let me know if what I am asking for is unreasonable.
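One caveat about the quoted mod255(): after the two folds the value can still be as large as 765, and the final branch then returns 255 (for example for a = 0xFFFFFF00, whose remainder mod 255 is 0). The sketch below (hypothetical name mod255_folded) adds a third fold and replaces the branches with a mask, so every lane does the same work. Each fold preserves the value mod 255 because 2^16 and 2^8 are both congruent to 1 modulo 255.

```c
#include <stdint.h>

static inline uint32_t mod255_folded(uint32_t a)
{
    a = (a >> 16) + (a & 0xFFFFu);            /* a <= 0x1FFFE */
    a = (a >> 8) + (a & 0xFFu);               /* a <= 765 */
    a = (a >> 8) + (a & 0xFFu);               /* a <= 257 */
    a -= 255u & (0u - (uint32_t)(a >= 255u)); /* subtract 255 once if needed */
    return a;                                 /* 0 <= a <= 254 */
}
```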

The "if statements on GPU kill performance" is a popular misconception which desperately wants to live on, it seems.
The large number of if statements makes me question how easy it will be for a compiler or human :) to vectorize the code.
First of all, I wouldn't consider 2 if statements a "large number of if statements", and those are so short and trivial that I'm willing to bet the compiler will turn them into branchless conditional moves or predicated instructions. There will be no performance penalty at all. (Do check the generated assembly, however.)
It looks like it could be troublesome for a branch predictor as the code path that executes depends on the value of the input. Therefore, if there is a mix of small and large values interspersed with one another, this code will likely sacrifice some performance due to a moderate number of branch mispredictions.
Current GPUs do not have branch predictors. Note, however, that depending on the underlying hardware, operations on integers (notably shifts) may be quite costly.

I would just do this:
uchar fast_mod255( uint a32 ) {
uint a = (a32 >> 16) + (a32 & 0xFFFF); /* sum base 2**16 digits; needs 17 bits, so keep a uint */
a = (a >> 8) + (a & 0xFF); /* sum base 2**8 digits; up to 765, so still a uint */
return (uchar)(a % 255);
}
Another option is to just do:
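uchar fast_mod255( uchar4 a ) {
/* Sum of the four bytes, i.e. a dot product with (1,1,1,1); it is congruent to the packed 32-bit value mod 255. OpenCL's dot() is defined for floating-point vectors, hence the explicit integer sum. */
uint s = (uint)a.x + a.y + a.z + a.w;
return (uchar)(s % 255);
}
GPUs are very efficient at computing distances and dot products, even in 4 dimensions, and it is a valid way of hashing as well, discarding the overflowed values.
No branching, and a clever compiler can even optimize it out. Or do you really need the values that fall into the 255 zone to be scattered, rather than all landing on a single value?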

I wanted to answer my own question because over the last 2 years I have seen ways to get around a slow integer divide instruction. The easiest way is to make the integer a compile-time constant. Any decent modern compiler should replace the integer divide with an equivalent set of other instructions with typically higher throughput (how many such instructions can be retired per cycle) and reduced latency (how many cycles it takes the instruction to execute). If you're curious, check out Hacker's Delight (an excellent book on low-level computer arithmetic).
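To illustrate what such a replacement can look like, here is a hand-written version of division by the constant 255: multiply by ceil(2^39 / 255) = 0x80808081 and shift right by 39, which is exact for every 32-bit input. The constants are derived here, not taken from the answer, and a given compiler's instruction sequence may differ; the function names are hypothetical.

```c
#include <stdint.h>

/* x / 255 without a divide instruction (exact for all 32-bit x). */
static inline uint32_t div255(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0x80808081u) >> 39);
}

/* x % 255 from the quotient: one multiply, one shift, one multiply-subtract. */
static inline uint32_t mod255_by_mul(uint32_t x)
{
    return x - div255(x) * 255u;
}
```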
I wanted to share another finding, which I found on Daniel Lemire's blog (located here). The code that follows doesn't compute mod 255 but does something similar, which is equally useful in a number of applications and much faster.
Suppose that you have a set of numbers S that are uniformly randomly picked from the range 0 to 2^k - 1 inclusive, where k >= 0. In this case, if you care only about mapping numbers roughly uniformly from 0 to 254 inclusive, you may do the following:
For each number n in a set S, you may map n to one of the 255 candidate values by multiplying n by 255 and then shifting the result to the right by k bits.
Here is the function that you call on each n for a fixed value of k:
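#include <stdint.h>
int map_to_0_to_254(uint32_t n, unsigned k){
return (int)(((uint64_t)n * 255) >> k); /* 64-bit product, since n * 255 can overflow 32 bits */
}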
As an example, if the values for the argument n range uniformly randomly from 0 to 4095 (2^12 - 1), then map_to_0_to_254(n, 12) will return a value in the range 0 to 254 inclusive.
Here is a more general templated version in C++ for mapping to range from 0 to range_size - 1 inclusive:
template<typename T>
T map_to_0_to_range_size_minus_1(T n, T range_size, T k){
return (n * range_size) >> k;
}
REMEMBER that this code assumes that the inputs for n are roughly uniformly randomly distributed between 0 and 2^k - 1 inclusive. If that property holds, then the outputs will be roughly uniformly distributed between 0 and range_size - 1 inclusive. The larger 2^k is relative to range_size, the more uniform the mapping will be for a fixed set of inputs.
Why This is Useful
This approach has applications to computing hash functions for hash tables where the number of bins is not a power of 2. Those operations would ordinarily require a long-latency integer divide instruction, which is often an order of magnitude slower to execute than an integer multiply, because you often do not know the number of bins in the hash table at compile time.
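For a hash table, the common case is k = 32: the hash is a full 32-bit value and the bin count is only known at run time. A sketch of that case (hypothetical name fast_range32), with the product computed in 64 bits so it cannot overflow:

```c
#include <stdint.h>

/* Map a full-range 32-bit x onto [0, range) with one multiply and a shift. */
static inline uint32_t fast_range32(uint32_t x, uint32_t range)
{
    return (uint32_t)(((uint64_t)x * range) >> 32);
}
```

Unlike x % range, this uses the high bits of the hash, so it relies on those bits being well mixed.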

Deterministic bit scrambling to filter coordinates

I am trying to write a function that, given an (x,y) coordinate pair and the random seed of the program, will pseudo-randomly return true for some preset percentage of all such pairs. There are no limits on x or y beyond the restrictions of the data type, which is a 32-bit signed int.
My current approach is to scramble the bits of x, y, and the seed together and then compare the resulting number to the percentage:
float percentage = 0.005;
...
unsigned int n = (x ^ y) ^ seed;
return (((float) n / UINT_MAX) < percentage);
However, it seems that this approach would be biased for certain values of x and y. For example, if it returns true for (0,a), it will also return true for (a,0).
I know this implementation that just XORs them together is naive. Is there a better bit-scrambling algorithm to use here that will not be biased?
Edit: To clarify, I am not starting with a set of (x,y) coordinates, nor am I trying to get a fixed-size set of coordinates that evaluate to true. The function should be able to evaluate a truth value for arbitrary x, y, and seed, with the percentage controlling the average frequency of "true" coordinates.

The easy solution is to use a good hashing algorithm. You can do the range check on the value of hash(seed || x || y), where || denotes concatenation.
Of course, selecting points individually with probability p does not guarantee that you will end up with a sample whose size will be exactly p * N. (That's the expected size of the sample, but any given sample will deviate a bit.) If you want to get a sample of size precisely k from a universe of N objects, you can use the following simple algorithm:
Examine the elements of the universe one at a time until k reaches 0.
When examining element i (counting from 0), add it to the sample if its hash value mapped onto the range [0, N-i) is less than k. If you add the element to the sample, decrement k.
There's no way to get the arithmetic absolutely perfect (since there is no way to perfectly partition 2^i different hash values into n buckets unless n is a power of 2), so there will always be a tiny bias. (Floating point arithmetic does not help; the number of possible floating point values is also fixed, and suffers from the same bias.)
If you do 64-bit arithmetic, the bias will be truly tiny, but the arithmetic is more complicated unless your environment provides a 128-bit multiply. So you might feel satisfied with 32-bit computations, where the bias of one in a couple of thousand million [Note 1] doesn't matter. Here, you can use the fact that any 32 bits in your hash should be as unbiased as any other 32 bits, assuming your hash algorithm is any good (see below). So the following check should work fine:
#include <cstdint>
#include <vector>
// I need k elements from a remaining universe of n, and I have a 64-bit hash.
// Return true if I should select this element.
// The low 32 bits of the hash are scaled onto [0, n) and compared with k.
bool select(uint32_t n, uint32_t k, uint64_t hash) {
    return ((hash & (uint32_t)(-1)) * (uint64_t)n) >> 32 < k;
}
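// Untested example sampler
// Select exactly k elements from U, using a seed value.
// E and hash_function(seed, elt) stand for your element type and hash.
template <typename E>
std::vector<E> sample(const std::vector<E>& U, uint64_t seed, uint32_t k) {
    std::vector<E> retval;
    // n is the number of elements not yet examined, including the current one,
    // so the current element is chosen with probability of about k/n.
    for (uint32_t n = U.size(); k && n; --n) {
        const E& elt = U[n - 1];
        if (select(n, k, hash_function(seed, elt))) {
            retval.push_back(elt);
            --k;
        }
    }
    return retval;
}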
Assuming you need to do this a lot, you'll want to use a fast hash algorithm; since you're not actually working in a secure environment, you don't need to worry about whether the algorithm is cryptographically secure.
Many high-speed hashing algorithms work on 64-bit units, so you could maximize the speed by constructing a 128-bit input consisting of a 64-bit seed and the two 32-bit co-ordinates. You can then unroll the hash loop to do exactly two blocks.
I won't venture a guess at the best hash function for your purpose. You might want to check out one or more of these open-source hashing functions:
Farmhash https://code.google.com/p/farmhash/
Murmurhash https://code.google.com/p/smhasher/
xxhash https://code.google.com/p/xxhash/
siphash https://github.com/majek/csiphash/
... and many more.
Notes
[Note 1] A couple of billion, if you're on that side of the Atlantic.

I would prefer feeding seed, x, and y through a Combined Linear Congruential Generator.
This is generally much faster than hashing, and it is designed specifically for the purpose: to output a pseudo-random number uniformly in a certain range.
Using coefficients recommended by Wichmann-Hill (which are also used in some versions of Microsoft Excel) we can do:
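// Reduce first (in 64 bits) so negative or large 32-bit inputs cannot overflow or go negative.
si = 171 * (((int64_t)s % 30269 + 30269) % 30269) % 30269;
xi = 172 * (((int64_t)x % 30307 + 30307) % 30307) % 30307;
yi = 170 * (((int64_t)y % 30323 + 30323) % 30323) % 30323;
r_combined = fmod(si/30269. + xi/30307. + yi/30323., 1.);
return r_combined < percentage;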
Where s is the seed on the first call, and the previous si on each subsequent call. (Thanks to rici's comment for this point.)
