How to calculate the number of possible combinations for a stratified random sample

I have divided a population with 100 members into 10 strata, and from each stratum I select one member to make up a stratified random sample. Is there a formula to compute the number of possible combinations? I expect this will be less than for simple random sampling.


How to calculate the product of frequencies of different elements of an array efficiently?

We are given an array, and I have to calculate the product of the frequencies of the numbers in a particular range of the array, i.e. [L,R].
How to do it?
My approach: say the array is [1,2,2,2,45,45,4], L=2 and R=6. Answer = 3 (frequency of 2) * 2 (frequency of 45) = 6.
Just traverse the array from L to R, put the frequency of each number in a map, and finally multiply all those values. Is there any better method to do this for multiple range queries online?
Do we require persistence?
If the size of the array is N and the number of queries is Q, I want a much better time complexity than O(N*Q).
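A minimal sketch of the per-query traversal described above (plain Python; `freq_product` is a made-up name, and L and R are 1-based and inclusive as in the example):

```python
from collections import Counter

def freq_product(arr, L, R):
    """Product of the frequencies of the distinct values in arr[L..R] (1-based, inclusive)."""
    counts = Counter(arr[L - 1:R])  # frequencies within the query range
    product = 1
    for c in counts.values():
        product *= c
    return product

# Example from the question: the range holds 2,2,2,45,45 -> 3 * 2 = 6
print(freq_product([1, 2, 2, 2, 45, 45, 4], 2, 6))  # -> 6
```

Each query costs O(R - L + 1), so Q queries cost O(N*Q) in the worst case; this is only the baseline the question wants to beat.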

Adding random numbers to each number in a column of a table

I want to add random numbers to each element within the column of a table. This is what I've been doing, but my approach adds the same random number to all elements in that specific column.
NewEdge(:,2) = NewEdge(:,2)+ randi(3);
How can I add a different random number to each element?
NewEdge(:,2) = NewEdge(:,2)+ randi(3,size(NewEdge(:,2))); % Looks pretty
NewEdge(:,2) = NewEdge(:,2)+ randi(3,size(NewEdge,1),1); % Probably faster
randi(3) is a single scalar: random, but still one number. You want to add a vector of random numbers, so call randi(imax,sz1,sz2), where imax is your maximum allowable integer (3 in your case) and sz1,sz2 are the sizes of your desired matrix; here you want as many rows as NewEdge has, and a single column.

randomize sequence of array elements [duplicate]

Possible duplicate of: Is this C implementation of Fisher-Yates shuffle correct? (Closed 10 years ago.)
In order to simulate different orders of the input sequence into my application, I would like to generate a randomized ordering of the array indices. For example, given arr[10], the default sequence is 0,1,...,8,9. However, I'd like to shuffle that into a random order, e.g. 2,4,5,1,9,0,3,7,8,6.
I think rand() can generate a random value between 0 and 9, but it doesn't guarantee that each index is produced exactly once. I am thinking about the pseudocode below, but is there a better way to generate a random input sequence in which every number in the given range appears exactly once?
round #1:
generate a random number in 0-9;
say 2 is selected
round #2:
generate a random number in 0-1
generate a random number in 3-9
say 0 and 4 are selected
round #3:
generate a random number in {1}
generate a random number in {3}
generate a random number in 5-9
say 1, 3, and 7 are selected
round #4:
generate a random number in 5-6
generate a random number in 8-9
continue until all 10 numbers are selected
Look at the Fisher-Yates shuffle algorithm for your requirement.
Here is another approach:
Create an array of 10 elements and fill it with the values you want to randomize (0, 1, ..., 9).
Iterate k from 9 down to 1 and pick a random index uniformly in [0, k], e.g. index = rand() % (k + 1).
Swap the element at position k with the element at position index.
At the end your array will contain a random permutation of the original values, which is exactly what you wanted.
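The swap-based procedure above can be sketched in Python (the linked duplicate shows the same Fisher-Yates idea in C):

```python
import random

def fisher_yates(n):
    """Return a uniformly random permutation of 0..n-1."""
    seq = list(range(n))
    for k in range(n - 1, 0, -1):
        index = random.randint(0, k)          # random position in [0, k], inclusive
        seq[k], seq[index] = seq[index], seq[k]  # swap current slot with the chosen one
    return seq

order = fisher_yates(10)
print(order)  # e.g. [2, 4, 5, 1, 9, 0, 3, 7, 8, 6]
```

Each index appears exactly once because the array starts as the identity sequence and swaps only permute it; the loop runs once per element, so the shuffle is O(n).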

Efficient way of calculating average difference of array elements from array average value

Is there a way to calculate the average distance of array elements from the array's average value, by only "visiting" each array element once? (I am looking for an algorithm.)
Example:
Array : [ 1 , 5 , 4 , 9 , 6 ]
Average : ( 1 + 5 + 4 + 9 + 6 ) / 5 = 5
Distance Array : [|1-5|, |5-5|, |4-5|, |9-5|, |6-5|] = [4 , 0 , 1 , 4 , 1 ]
Average Distance : ( 4 + 0 + 1 + 4 + 1 ) / 5 = 2
The simple algorithm needs 2 passes.
1st pass) Reads and accumulates values, then divides the result by array length to calculate average value of array elements.
2nd pass) Reads values, accumulates each one's distance from the previously calculated average value, and then divides the result by array length to find the average distance of the elements from the average value of the array.
The two passes are identical. It is the classic algorithm of calculating the average of a set of values. The first one takes as input the elements of the array, the second one the distances of each element from the array's average value.
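The two-pass computation above, written out as a minimal sketch (plain Python, matching the worked example):

```python
def average_distance(arr):
    n = len(arr)
    avg = sum(arr) / n                          # pass 1: average value
    return sum(abs(x - avg) for x in arr) / n   # pass 2: average distance from it

print(average_distance([1, 5, 4, 9, 6]))  # -> 2.0
```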
Calculating the average can be modified so that it does not accumulate the values, but instead computes the average "on the fly" as we sequentially read the elements from the array.
The formula is:
Compute Running Average of Array's elements
-------------------------------------------
RA[1] = A[1]
RA[i] = RA[i-1] - RA[i-1]/i + A[i]/i { for i > 1 }
Where A[x] is the array's element at position x, and RA[x] is the average of the array's elements between positions 1 and x (the running average).
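As a quick check, the recurrence can be run over the example array from above:

```python
def running_average(arr):
    ra = arr[0]                      # RA[1] = A[1]
    for i, x in enumerate(arr[1:], start=2):
        ra = ra - ra / i + x / i     # RA[i] = RA[i-1] - RA[i-1]/i + A[i]/i
    return ra

print(running_average([1, 5, 4, 9, 6]))  # ≈ 5.0, the same mean as summing first
```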
My question is:
Is there a similar algorithm, to calculate "on the fly" (as we read the array's elements), the average distance of the elements from the array's mean value?
The problem is that, as we read the array's elements, the final average value of the array is not known; only the running average is known. So calculating differences from the running average will not yield the correct result. I suppose, if such an algorithm exists, it should have the "ability" to compensate, with each new element read, for the error accumulated so far.
I don't think you can do better than O(n log n).
Suppose the array were sorted. Then we could divide it into the elements less than the average and the elements greater than the average. (If some elements are equal to the average, that doesn't matter.) Suppose the first k elements are less than the average. Then the average distance is
D = ((xave-x1) + (xave-x2) + (xave-x3) + ... + (xave-xk) + (xk+1-xave) + (xk+2-xave) + ... + (xn-xave))/n
= ((-x1) + (-x2) + (-x3) + ... + (-xk) + (xk+1) + (xk+2) + ... + (xn) + (2k-n)xave)/n
= ( [sum of elements above average] - [sum of elements below average] + (2k-n)xave)/n
You could calculate this in one pass by working in from both ends, adjusting the limits on the (as-yet-unknown) average as you go. This would be O(n), and the sorting is O(n log n) (and they could perhaps be done in the same operation), so the whole thing is O(n log n).
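One way to sanity-check the partition-sum idea in code (note the xave coefficient works out to 2k - n: +k from the below-average terms, minus n - k from the above-average ones):

```python
def avg_distance_partitioned(arr):
    n = len(arr)
    xave = sum(arr) / n
    k = sum(1 for x in arr if x < xave)            # count of elements below the average
    below = sum(x for x in arr if x < xave)        # sum of elements below the average
    above = sum(x for x in arr if x >= xave)       # sum of the rest (equal terms contribute 0)
    return (above - below + (2 * k - n) * xave) / n

print(avg_distance_partitioned([1, 5, 4, 9, 6]))  # -> 2.0, matching the two-pass result
```

The partition itself needs xave, so this is still two passes; sorting matters for the incremental "work in from both ends" variant described above.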
The only problem with a two pass approach is that you need to reread or store the entire sequence for the second pass. The obvious improvement would be to maintain a data structure so that you could adjust the sum of absolute differences when the average value changed.
Suppose that you change the average value to a very large value, by observing a huge number. Now compare the change made by this to that caused by observing a not quite so huge value. You will be able to work out the difference between the two sums of absolute differences, because both average values are above all the other numbers, so all of the absolute values decrease by the difference between the two huge averages. This predictable change carries on until the average meets the highest value observed in the standard numbers, and this change allows you to find out what the highest number observed was.
By running experiments like this you can recover the set of numbers observed before the numbers you shove in to run the experiments. Therefore any clever data structure you use to keep track of sums of absolute differences is capable of storing the set of numbers observed, which (except for order, and cases where multiple copies of the same number are observed) is pretty much what you do by storing all the numbers seen for a second pass. So I don't think there is a trick for the case of sums of absolute differences as there is for squares of differences, where most of the information you care about is described by just the pair of numbers (sum, sum of squares).
If the L2 norm (the root of the average squared distance) is acceptable, then it's:
sqrt(sum(x^2)/n - (sum(x)/n)^2)
That's the square root of (the average of x^2 minus the square of the average of x).
The quantity under the root is called the variance; its square root is the standard deviation, a typical "measure of spread".
Note that this is more sensitive to outliers than the measure you originally asked for.
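The (sum, sum of squares) pair mentioned earlier gives a true one-pass computation of this quantity, sketched here:

```python
import math

def stddev_one_pass(arr):
    """Population standard deviation in a single pass over the data."""
    n = len(arr)
    s = sq = 0.0
    for x in arr:        # single pass: accumulate sum and sum of squares
        s += x
        sq += x * x
    return math.sqrt(sq / n - (s / n) ** 2)

print(stddev_one_pass([1, 5, 4, 9, 6]))  # ≈ 2.608 (vs. average distance 2.0)
```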
Your followup described your context as HLSL reading from a texture. If your filter footprint is a power of two and is aligned with the same power-of-two boundaries in the original image, you can use MIP maps to find the average value of the filter region.
For example, for an 8x8 filter, precompute a MIP map three levels down the MIP chain, whose elements will be the averages of each 8x8 region. Then a single texture read from that MIP level texture will give you the average for the 8x8 region. Unfortunately this doesn't work for sliding the filter around to arbitrary positions (not multiples of 8 in this example).
You could make use of intermediate MIP levels to decrease the number of texture reads by utilizing the MIP averages of 4x4 or 2x2 areas whenever possible, but that would complicate the algorithm quite a bit.

Django: Creating model instances with a random integer field value that average up to a specified number

I have a model like so:
class RunnerStat(models.Model):
    id_card = models.CharField(max_length=32)
    miles = models.PositiveSmallIntegerField()
    last_modified = models.DateField(auto_now=True)
I want to create several RunnerStat instances with random miles values that average up to a specific number.
I know this might involve some statistics (distribution, etc.). Does anyone have any pointers? Or done something similar and could share some code?
Example: Create 100 RunnerStat objects with random miles values that average out to 10.
If the average only has to be around the given number, not exactly equal to it, you can use random.gauss(mu, sigma) from the random module. This will create a more natural random set of values whose mean (average) is around mu, with a standard deviation of sigma. The more runners you create, the closer the mean will get to the desired value.
import random

avg = 10
stddev = 5
for i in range(100):
    # draw a fresh Gaussian value for each runner; round and clamp so it
    # fits a PositiveSmallIntegerField
    miles = max(0, round(random.gauss(avg, stddev)))
    r = RunnerStat(miles=miles)
    r.save()
If you need the average to be the exact number then you could always create a runner (or more reasonably a few runners) that counter balance what ever your current difference from the mean is.
Well for something to average to a specific number, they need to add up to a specific number. For instance, if you want 100 items to average to 10, the 100 items need to add up to 1000 since 1000/100 = 10.
One way to do this, which isn't completely random is to generate a random number, then both subtract and add that to your average, generating two RunnerStat items.
So you do something like this (note this is from my head and untested):
import random
avg = 10
n = random.randint(0,5)
r1 = RunnerStat(miles=avg-n)
r2 = RunnerStat(miles=avg+n)
r1.save()
r2.save()
Of course, fill in the other fields too; I've only set miles on the RunnerStats. The downside is that your number of RunnerStats must be even. You could handle an odd count by making the last one exactly the number you want the average to be.
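A sketch combining both suggestions: pair up offsets for an even count, and if the count is odd, make the last value exactly the target average. `miles_with_exact_average` is a hypothetical helper, not part of Django; it only builds the list of miles values to feed into RunnerStat.

```python
import random

def miles_with_exact_average(count, avg, spread=5):
    """Return `count` non-negative integers whose mean is exactly `avg`."""
    values = []
    for _ in range(count // 2):
        n = random.randint(0, min(spread, avg))  # keep avg - n non-negative
        values += [avg - n, avg + n]             # each pair averages to avg
    if count % 2:
        values.append(avg)                       # odd count: one exact value
    return values

miles = miles_with_exact_average(101, 10)
print(sum(miles) / len(miles))  # -> 10.0, exactly
```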
