Generating uniformly distributed random numbers in distributed environment

I have to generate a "unique Random Number" in a Wireless sensor network which works on the principle of Gossiping.
The requirements are:
Each node has to generate a unique random number, without having any shared knowledge of what other nodes have generated.
The distribution of the generated random numbers should be uniform with respect to each other.
It would be preferable if the range of the generated random number is around 10-16 bits, or maybe fewer.
The limitations are:
One node has no idea what number the other nodes in the network are generating.
Implementation in C, C++.
I also have the provision of using a unique seed for random number generation. The seed could be any number in the range 0-2^15.
If there is no way of generating such numbers, then it would be helpful to know of any method that meets some of the above requirements.
Any suggestion on how to achieve this would be really helpful.

For this solution to work, you must know the total number of nodes in the network. Let this number be n.
The basic idea is to generate uniformly distributed random numbers on each participating node inside a given interval. The n intervals of the participating nodes must not overlap.
A shared seed does not complicate matters if the total number of nodes does not change and each node can be statically assigned some integer i <= n such that each number is issued exactly once. Instead of generating a single random number on each turn, n numbers are generated, and node i takes the i-th number from this series.
However, the overall distribution of the generated random numbers will not be uniform unless:
You synchronize random number generation.
All intervals have the same size.
For information on random number generation on individual nodes see here.
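The non-overlapping-interval idea can be sketched as follows. This is a minimal illustration, not the answerer's code: the helper name `draw_in_own_interval`, the 0-based node index, the 16-bit overall range, and the use of `std::mt19937` are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical helper: node `i` (0-based here, i < n) draws uniformly from its
// own slice of the 16-bit range. Slices have equal width and never overlap,
// so two different nodes can never produce the same value.
std::uint16_t draw_in_own_interval(unsigned i, unsigned n, std::mt19937& rng) {
    assert(n > 0 && i < n && n <= 65536u);
    const unsigned width = 65536u / n;   // same size for every node
    const unsigned lo = i * width;       // start of node i's slice
    std::uniform_int_distribution<unsigned> dist(lo, lo + width - 1);
    return static_cast<std::uint16_t>(dist(rng));
}
```

Each node would seed its own generator (for instance with its unique seed in 0-2^15) and call the helper with its own index. Values are unique across nodes, but within one node repeated draws can still collide.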

Related

Simplest way to make a histogram of an unknown, finite list of discrete floating point numbers

I have a code that generates a sequence of configurations of some system of interest (Markov Chain Monte Carlo). For each configuration, I make a measurement of a particular value for that configuration, which is bounded between zero and some maximum which I can presumably predict beforehand; let's call it Rmax. It can only take a finite number of discrete values between 0 and Rmax, but the values could be irrational and are not evenly spaced, and I don't know them a priori, or necessarily how many there are (though I could probably estimate an upper bound). I want to generate a very large number of configurations (on the order of 1e8) and make a histogram of the distribution of these values, but the issue that I am facing is how to effectively keep track of them.
For example, if the values were integers in the range [0,N-1], I would just create an integer array of N elements, initially set to zero, and increment the appropriate array element for each configuration, e.g. in pseudocode
do i = 1, 1e8
    call generateConfig()
    R = measureR() ! R is an integer
    Rhist(R)++
end do
How can I do something similar to count or tally the number of times each of these irrational, non-uniformly distributed numbers occurs?
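One possible way to tally values that are not known in advance is an associative container keyed by the value. The sketch below is only an assumption about what might fit here: it uses `std::map`, and it treats two measurements as equal when they round to the same multiple of a tolerance `tol`, which is a choice introduced for the example, not something stated in the question. The class name `SparseHistogram` is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>

// Tally discrete floating-point values without knowing them beforehand.
// Assumption: measurements that round to the same multiple of `tol`
// are the same underlying value.
class SparseHistogram {
public:
    explicit SparseHistogram(double tol) : tol_(tol) { assert(tol > 0.0); }

    // Increment the count for the bucket that `r` falls into.
    void add(double r) { ++counts_[std::llround(r / tol_)]; }

    // Number of times a value in the same bucket as `r` was added.
    long long count(double r) const {
        auto it = counts_.find(std::llround(r / tol_));
        return it == counts_.end() ? 0 : it->second;
    }

    // Number of distinct values seen so far.
    std::size_t distinct() const { return counts_.size(); }

private:
    double tol_;
    std::map<long long, long long> counts_;
};
```

The tolerance has to be smaller than the gap between any two genuine values and larger than the rounding noise of the measurement; a value that lands right on a bucket boundary could still be split across two buckets.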

Named range of consistent random numbers

Background
Following on from a question I asked a while ago about getting an array of different (but not necessarily unique) random numbers to which the answer was this:
=RANDBETWEEN(ROW(A1:A10)^0,10)
To get an array of 10 random numbers between 1 and 10
The Problem
If I create a named range (called "randArray") with the formula above I hoped I would be able to reference randArray a number of times and get the same set of random numbers. Granted, they would change each time I press F9 or update the worksheet -- but change together.
This is what I get instead, two completely different sets of random numbers
I'm not surprised by this behavior but how can I achieve this without using VBA and without putting the random numbers onto the worksheet?
If you're interested
This example is intended to be MCVE. In my actual case, I am using random numbers to estimate Pi. The user stipulates how many random points to apply and gets an accordingly accurate estimation. The problem arises because I also graph the points and when there are a small number of points it's very clear to see that the estimation and the graph don't represent the same dataset
Update
I have awarded the initial bounty to @Michael for providing an interesting and different solution. I am still looking for a complete solution which allows the user to stipulate how many random points to use, and although there might not be a perfect answer I'm still interested in any other possible solutions and more than happy to put up further bounties.
Thank you to everyone who has contributed so far.

This solution generates 10 seemingly random numbers between 1 and 10 that persist for nearly 9 seconds at a time. This allows repeated calls of the same formula to return the same set of values in a single refresh.
You can modify the time frame if required. Shorter time periods allow for more frequent updates, but also slightly increase the extremely unlikely chance that some calls to the formula occur after the cutover point resulting in a 2nd set of 10 random numbers for subsequent calls.
Firstly, define an array "Primes" with 10 different prime numbers:
={157;163;167;173;179;181;191;193;197;199}
Then, define this formula that will return an array of 10 random numbers:
=MOD(ROUND(MOD(ROUND(NOW(),4)*70000,Primes),0),10)+1
Explanation:
We need to build our own random number generator that we can seed with the same value for an amount of time; long enough for the called formula to keep returning the same value.
Firstly, we create a seed: ROUND(NOW(),4) creates a new seed number every 0.0001 days = 8.64 seconds.
We can generate rough random numbers using the following formula:
Random = Seed * 7 mod Prime
https://cdsmith.wordpress.com/2011/10/10/build-your-own-simple-random-numbers/
Ideally, a sequence of random numbers is generated by feeding each output back in as the next input, but we can't do that in a single function. So instead, this uses 10 different prime numbers, essentially starting 10 different random number generators. This is less reliable at generating random numbers, but the testing results further below show it actually seems to do a pretty good job.
ROUND(NOW(),4)*70000 gets our seed up to an integer and multiplies by 7 at the same time
MOD(ROUND(NOW(),4)*70000,Primes) generates a sequence of 10 random numbers, each from 0 up to one less than the respective prime number
ROUND(MOD(ROUND(NOW(),4)*70000,Primes),0) is required to get us back to an integer because Excel seems to struggle with applying MOD to floating point numbers.
=MOD(ROUND(MOD(ROUND(NOW(),4)*70000,Primes),0),10)+1 takes just the value from the ones place (random number from 0 to 9) and shifts it to give us a random number from 1 to 10
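To make the arithmetic of the formula easier to follow, here is a rough C++ rendering of the same steps. It is a sketch only: `pseudo_random_digits` and `now_days` are names made up for the example, `now_days` stands in for the serial date returned by NOW(), and integer arithmetic is used throughout, so the inner ROUND that works around floating-point MOD is not needed.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Mirrors =MOD(ROUND(MOD(ROUND(NOW(),4)*70000,Primes),0),10)+1 for one timestamp.
std::array<int, 10> pseudo_random_digits(double now_days) {
    const std::array<long long, 10> primes = {157, 163, 167, 173, 179,
                                              181, 191, 193, 197, 199};
    // ROUND(NOW(),4)*70000: round to 0.0001 days, scale to an integer,
    // and multiply by 7 in the same step (10000 * 7 = 70000).
    const long long seed = std::llround(now_days * 10000.0) * 7;

    std::array<int, 10> out{};
    for (std::size_t i = 0; i < primes.size(); ++i) {
        // Seed * 7 mod Prime, then keep the ones digit and shift to 1..10.
        out[i] = static_cast<int>((seed % primes[i]) % 10) + 1;
    }
    return out;
}
```

Two calls whose timestamps round to the same 0.0001-day step return the same ten values, which is the property the named range relies on.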
Testing results:
I generated 500 lots of 10 random numbers (in columns instead of rows) for seed values incrementing by 0.0001 and counted the number of times each digit occurred for each prime number. You can see that each digit occurred nearly 500 times in total and that the distribution of each digit is nearly equal between each prime number. So, this may be adequate for your purposes.
Looking at the numbers generated in immediate succession you can see similarities between adjacent prime numbers, they're not exactly the same but they're pretty close in places, even if they're offset by a few rows. However, if the refresh is occurring at random intervals, you'll still get seemingly random numbers and this should be sufficient for your purposes. Otherwise, you can still apply this approach to a more complex random number generator or try a different mix of prime numbers that are further apart.
Update 1: Trying to find a way of being able to specify the number of random numbers generated without storing a list of primes.
Attempt 1: Using a single prime with an array of seeds:
=MOD(ROUND(MOD(ROUND(NOW()+ROW(OFFSET(INDIRECT("A1"),0,0,SampleSize))/10000,4)*70000,1013),0),10)+1
This does give you an even distribution, but it really is just repeating the exact same sequence of 10 numbers over and over. Any analysis of the sample would be identical to analysing =MOD(ROW(1:SampleSize),10)+1. I think you want more variation than that!
Attempt 2: Working on a 2-dimensional array that still uses 10 primes....
Update 2: Didn't work. It had terrible performance. A new answer has been submitted that takes a similar but different approach.

OK, here's a solution where users can specify the number of values in the defined name SampleSize:
=MOD(ROUND(MOD(ROUND(NOW()+ROW(OFFSET(INDIRECT("A1"),0,0,SampleSize)),4)*10000*163,1013),0)+ROUND(MOD(ROUND(NOW()+ROW(OFFSET(INDIRECT("A1"),0,0,SampleSize))*2,4)*10000*211,1013),0)+ROUND(MOD(ROUND(NOW()+ROW(OFFSET(INDIRECT("A1"),0,0,SampleSize))*3,4)*10000*17,1013),0)+ROUND(MOD(ROUND(NOW()+ROW(OFFSET(INDIRECT("A1"),0,0,SampleSize))*5,4)*10000*179,53),0)+ROUND(MOD(ROUND(NOW()+ROW(OFFSET(INDIRECT("A1"),0,0,SampleSize))*7,4)*10000*6101,1013),0),10)+1
It's a long formula, but has good efficiency and can be used in other functions. Attempts at a shorter formula resulted in unusably poor performance and arrays that for some reason couldn't be used in other functions.
This solution combines 5 different prime number generators to increase variety in the generated random numbers. Some arbitrary constants were introduced to try to reduce repeating patterns.
This has correct distribution and fairly good randomness. Repeated testing with a SampleSize of 10,000 resulted in frequencies of individual numbers varying between 960 and 1040 with no overall favoritism. However it seems to have the strange property of never generating the same number twice in a row!

You can achieve this using just standard spreadsheet formulas.
One way is to use the so-called Lehmer random number method. It generates a sequence of random numbers in your spreadsheet that stays the same until you change the "seed number", a number you choose yourself. Each seed you choose produces a different random sequence.
The short version:
In cell B1, enter your "seed" number, it can be any number from 1 to 2,147,483,646 (a seed equal to the modulus 2,147,483,647 would produce only zeros)
In cell B2 enter the formula =MOD(48271*B1,2^31-1) , this will generate the first random number of your sequence.
Now copy this cell down as far as the random sequence you want to generate.
That's it. For your named range, go ahead and name the range from B2 down as far as your sequence goes. If you want a different set of numbers, just change the seed in B1. If you ever want to recreate the same set of numbers just use the same seed and the same random sequence will appear.
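The recurrence behind the spreadsheet formula, next = 48271 * previous mod (2^31 - 1), can be sketched in C++ as below. The function names `lehmer_next` and `lehmer_sequence` are made up for the example; the multiplier and modulus are the ones from the formula above (they are also the parameters of `std::minstd_rand`).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One step of the recurrence: 48271 * x mod (2^31 - 1).
// The product is computed in 64 bits so it cannot overflow.
std::uint32_t lehmer_next(std::uint32_t x) {
    return static_cast<std::uint32_t>((48271ULL * x) % 2147483647ULL);
}

// Equivalent of copying the formula down `count` cells below the seed cell.
std::vector<std::uint32_t> lehmer_sequence(std::uint32_t seed, std::size_t count) {
    // A seed of 0 or of the modulus itself would yield only zeros.
    assert(seed >= 1 && seed <= 2147483646u);
    std::vector<std::uint32_t> out;
    out.reserve(count);
    std::uint32_t x = seed;
    for (std::size_t i = 0; i < count; ++i) {
        x = lehmer_next(x);
        out.push_back(x);
    }
    return out;
}
```

Calling `lehmer_sequence` twice with the same seed returns the same values, which is what makes the sequence reproducible.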
More details in this tutorial:
How to generate random numbers that don't change in Excel and Google Sheets

It's not a great answer, but given the limitation of a volatile function, one possible approach is to combine the IF formula with the volatile function and a volatile switch placed somewhere in the worksheet.
I used the below formula to achieve the desired result
=IF(rngIsVolatile,randArray,A1:A10)
I set cell B12 as rngIsVolatile. The screenshots below show it working.
When rngIsVolatile is set to True, it picks up new values from randArray:
When rngIsVolatile is set to False, it picks up old values from A1:A10:

Certainty of data distribution profile when performing a sort operation

Let's assume we have some data structure like an array of n entries, and for argument's sake let's assume that the data has bounded numerical values.
Is there a way to determine the profile of the data, say monotonically ascending, descending, etc., to a reasonable degree, perhaps with a certainty value of z having checked k entries within the data structure?

Assuming we have an array of size N, this means that we have N-1 comparisons between each adjacent elements in the array. Let M=N-1. M represents the number of relations. The probability of the array not being in the correct order is
1/M
If you select a subset of K relations to determine whether the array is monotonically ascending or descending, the theoretical probability of certainty is
K / M
Since these are two linear equations, it is easy to see that if you want to be 0.9 sure, then you will need to check about 90% of the entries.
This only takes into account the assumptions in your question. If you are aware of the probability distribution, then using statistics, you could randomly check a small percentage of the array.
If you only care about the array being in relative order(for example, on an interval from [0,10], most 1s would be close to the beginning.), this is another question altogether. An algorithm that does this as opposed to just sorting, would have to have a high cost for swapping elements and a cheap cost for comparisons. Otherwise, there would be no performance pay offs from writing a complex algorithm to handle the check.
It is important to note that this is theoretically speaking. I am assuming no distribution in the array.

The easier problem is to check the probability of encountering such orderly behavior from random data.
E.g. if numbers are arranged randomly, there is p=0.5 that the first number is lower than the second number (we will come to the case of repetitions later). Now, if you sample k pairs and in every case the first number is lower than the second number, the probability of observing that is 2^(-k).
Coming back to repetitions, keep track of observed repetitions and factor them in. E.g. if the probability of repetition is q, the probability of not observing a repetition is (1-q), and the probability of observing increasing or equal is q + (1-q)/2, so raise this to the power k to get the probability.
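As a small worked version of the formula above, the function below evaluates (q + (1-q)/2)^k. It is a sketch under the assumption that the k sampled pairs are independent of each other (for example, disjoint pairs); the name `chance_all_non_decreasing` is made up for the example.

```cpp
#include <cassert>
#include <cmath>

// Probability that k independently sampled pairs of random data all look
// non-decreasing, where q is the probability that a pair is a repetition.
double chance_all_non_decreasing(int k, double q) {
    assert(k >= 0 && q >= 0.0 && q <= 1.0);
    const double per_pair = q + (1.0 - q) / 2.0;  // equal, or strictly increasing
    return std::pow(per_pair, k);
}
```

With no repetitions (q = 0) this reduces to 2^(-k), so ten ordered pairs would show up by chance in random data about once in 1024 trials.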

Generate pseudo sample of population given probabilities

I would like to generate pseudo data that conforms to the distribution of actual sampled data. Looking for an efficient and accurate method in C/Obj-C for iPhone development. Currently the occurrence of 60 different categories in 1000 sampled events has been assigned a probability (0-1). I want to generate 1000 new events which conform to the same probabilities.
Clarification {
I have a categorical distribution of set {1,2,...,60}. I understand that samples from this distribution will conform to the probabilities of each category. Therefore I need to take 1000 samples from this distribution. I have determined (thanks to answers so far) that I need to:
Normalize this distribution by summing the values and dividing each by the sum.
Order them.
Create a CDF by replacing each value with the sum of all previous values.
Then I can generate a uniform random number between 0 and 1, and find the greatest number in the CDF whose value is less than or equal to the number just chosen, and return the category corresponding to this CDF value.
}
Q1. Is this the correct way to solve the problem?
Q2. The caveat still holds that I'm using NSDecimals to store the category probabilities. Are there any libraries available or functions in Cocoa or Math.h, etc. that I can use to do this simply? I'm open to trying new libraries, currently only have Core-Plot and the standard Cocoa libraries in this project. Thanks.

Your problem description is unclear. But it sounds like you're looking for inverse transform sampling.
Basically, you first need to generate a cumulative distribution function (CDF) corresponding to your original data; call it F(x). You then generate uniform random data in the range 0->1, and then transform it using the inverse CDF, i.e. F^{-1}(x).
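For a categorical distribution, inverting the CDF amounts to a lookup in a table of cumulative sums. The following is a minimal sketch, not a definitive implementation: `build_cdf` and `category_for` are hypothetical names, plain `double` is used instead of NSDecimal, and the weights need not be normalized because the uniform number is scaled by the total.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Running totals of the category weights: cdf[i] = w[0] + ... + w[i].
std::vector<double> build_cdf(const std::vector<double>& weights) {
    std::vector<double> cdf(weights.size());
    std::partial_sum(weights.begin(), weights.end(), cdf.begin());
    return cdf;
}

// Map a uniform number u in [0, 1) to a category index by inverting the CDF.
std::size_t category_for(const std::vector<double>& cdf, double u) {
    assert(!cdf.empty() && u >= 0.0 && u < 1.0);
    const double target = u * cdf.back();  // scale by the total weight
    // First cumulative sum strictly greater than the target (binary search).
    auto it = std::upper_bound(cdf.begin(), cdf.end(), target);
    if (it == cdf.end()) --it;             // guard against rounding at the top end
    return static_cast<std::size_t>(it - cdf.begin());
}
```

With 60 categories the binary search needs about six comparisons per sample, although a linear scan would also be fast enough at this size.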

Here's my suggestion. This assumes that when you say "normalized probability" you mean the sum of the probability of all types is 1. (If not, you'll need to rescale so that's the case.)
Make up some order for your 60 types. (Say, alphabetic.)
Generate a random number between 0 and 1. (Call it your "target".)
Create an accumulator, initially at 0.
Loop through your 60 types. For each type:
Add the probability of that type of event to your accumulator.
If your accumulator is >= your target, generate an event of that type and stop.
If you do that 1000 times, I believe you'll get the distribution you're looking for.
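The accumulator loop described above might look like this in C++. It is a sketch under a few assumptions: the probabilities already sum to 1, they are held as plain `double` values, and `std::mt19937` is used as the random source. The names `pick_category` and `simulate_counts` are made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Walk through the categories, accumulating probability until the
// accumulator reaches the target; return the index of that category.
std::size_t pick_category(const std::vector<double>& probs, double target) {
    assert(!probs.empty());
    double acc = 0.0;
    for (std::size_t i = 0; i < probs.size(); ++i) {
        acc += probs[i];
        if (acc >= target) return i;
    }
    return probs.size() - 1;  // rounding left the total slightly below target
}

// Generate `events` samples and count how often each category occurs.
std::vector<int> simulate_counts(const std::vector<double>& probs,
                                 int events, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> unit(0.0, 1.0);
    std::vector<int> counts(probs.size(), 0);
    for (int e = 0; e < events; ++e) {
        ++counts[pick_category(probs, unit(rng))];
    }
    return counts;
}
```

The counts will not match the probabilities exactly; they fluctuate around `events * probs[i]`, as they would for freshly sampled real data.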

Shuffling biased random numbers

While thinking about this question and conversing with the participants, the idea came up that shuffling a finite set of clearly biased random numbers makes them random because you don't know the order in which they were chosen. Is this true and if so can someone point to some resources?
EDIT: I think I might have been a little unclear. Suppose a bad random numbers generator. Take n values. These are biased(the rng is bad). Is there a way through shuffling to make the output of the rng over multiple trials statistically match the output of a known good rng?

False.
There is an easy test: Assume the bias in the original set creation algorithm is "creates sets whose arithmetic average is significantly lower than expected average". Obviously, shuffling the result of the algorithm will not change the averages and thus not remove the bias.
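The averaging argument is easy to check in code. The sketch below shuffles a copy of the values with a good generator and returns the sum afterwards; `sum_after_shuffle` is a name made up for the example, and `std::mt19937` is used only to drive the shuffle.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <random>
#include <vector>

// Shuffle a copy of the values and return their sum afterwards.
// The sum (and therefore the average) cannot depend on the order.
long long sum_after_shuffle(std::vector<int> values, unsigned seed) {
    std::mt19937 rng(seed);
    std::shuffle(values.begin(), values.end(), rng);
    return std::accumulate(values.begin(), values.end(), 0LL);
}
```

Whatever the seed, the result equals the sum of the original values, so a set biased towards low numbers stays biased after shuffling.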
Also, regarding your clarification: How would you shuffle the set? Using the same bad output from the bad RNG that created the set in the first place? Or using a better RNG? Which raises the question why you don't use that directly.

It's not true. In the other question the problem is to select 30 random numbers in [1..9] with a sum of 200. After choosing, on average, about 20 of them randomly, you reach a point where you can't select nines anymore because this would make the total sum go over 200. Of the remaining 10 numbers, most will be ones and twos. So in the end, ones and twos are very overrepresented in the selected numbers. Shuffling doesn't change that. But it's not clear what the random distribution really should look like, so one could say this is as good a solution as any.
In general, if your "random" numbers will be biased to, say, low numbers, they will be biased that way no matter the ordering.

Just shuffling a set of already random numbers won't do anything to the probability distribution, of course. That would mean false. Perhaps I misunderstand your question though?

I would say false, with a caveat:
I think there is random, and then there is 'random-enough'. For most applications that I have needed to work on, 'random-enough' was more than enough, i.e. picking a 'random' ad to display on a page from a list of 300 or so that have paid to be placed on that site.
I am sure a mathematician could prove my very basic 'random' selection criteria is not truly random at all, but in fact is predictable - for my clients, and for the users, nobody cares.
On the other hand if I was writing a video game to be used in Las Vegas where large amounts of money was at hand I'd define random differently (and may have a hard time coming up with truly random).

False
The set is finite; suppose it consists of n numbers. What happens if you choose n+1 numbers? Let's also consider a basic random function as implemented in many languages which gives you a random number in [0,1). However, this number is limited to three digits after the decimal point, giving you a set of 1000 possible numbers (0.000 - 0.999). In most cases you will not need to use all these 1000 numbers, so this amount of randomness is more than enough.
However for some uses, you will need a better random generator than this. So it all comes down to exactly how many random numbers you are going to need, and how random you need them to be.
Addition after reading original question: in the case that you have some sort of limitation (such as in the original question in which each set of selected numbers must sum up to a certain N) you are not really selecting random numbers per se, but rather choosing numbers in a random order from a given set (specifically, a permutation of numbers summing up to N).
Addition to edit: Suppose your bad number generator generated the sequence (1,1,1,2,2,2). Does the permutation (1,2,2,1,1,2) satisfy your definition of random?

Completely and utterly untrue: Shuffling doesn't remove a bias, it just conceals it from the casual observer. It's like removing your dog's fondly-laid present from your carpet by just pushing it under the sofa - you really haven't solved the problem, you've just made it less conspicuous. Anyone with a nose knows that there is still a problem that needs removing.
The randomness must be applied evenly over the whole range, so here's one way (off the top of my head, lots of assumptions, yadda yadda. The point is the approach, not the code - start with everything even, then introduce your randomness in a consistent fashion until you're done. The only bias now is dependent on the values chosen for 'target' and 'numberofnumbers', which is part of the question.)
target = 200
numberofnumbers = 30
numbers = array();
for (i=0; i<numberofnumbers; i++)
    numbers[i] = 9
while (sum(numbers)>target)
    numbers[random(numberofnumbers)]--
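A runnable C++ rendering of the pseudocode above could look like the following. It is a sketch, not the answerer's code: the function name `numbers_with_sum` and the use of `std::mt19937` are assumptions, and one guard has been added that the pseudocode does not have, namely that an entry already at 1 is never decremented, so every number stays in [1..9].

```cpp
#include <cassert>
#include <random>
#include <vector>

// Start with every number at 9, then knock random entries down by one
// until the total reaches the target.
std::vector<int> numbers_with_sum(int count, int target, unsigned seed) {
    // The target must be reachable with `count` numbers in [1..9].
    assert(count > 0 && target >= count && target <= 9 * count);
    std::vector<int> numbers(count, 9);
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> pick(0, count - 1);

    int sum = 9 * count;  // running total, avoids re-summing every iteration
    while (sum > target) {
        const int i = pick(rng);
        if (numbers[i] > 1) {  // added guard (assumption): stay within [1..9]
            --numbers[i];
            --sum;
        }
    }
    return numbers;
}
```

Calling `numbers_with_sum(30, 200, seed)` gives 30 numbers in [1..9] that add up to exactly 200.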

False. Consider a bad random number generator producing only zeros (I said it was BAD :-) No amount of shuffling the zeros would change any property of that sequence.
