I want to sum over tuples of length n, i.e. I have a vector (m_1,...,m_n) where each m_i is an integer greater than or equal to zero, with the constraint that the sum of all vector elements equals k.
What is the most efficient way to implement this?
My naive approach would be to iterate through all combinations with each m_i between 0 and k and check whether they satisfy the criterion, but this seems inefficient.
For instance, if k=2 and n=2, then (2,0), (1,1), (0,2) would be the possible values of (m_1, m_2) that I would like to have. Is there a way to generate these tuples efficiently? (I don't necessarily have to store them all in an array, but I do want to iterate over all possible combinations.)
If you look at the FXT book/library by J. Arndt, section 16.3 "Partition into m parts" on page 342 gives an algorithm, and a reference to code, for generating exactly the m-vectors of a partition of n.
You'll probably need to modify it, since he doesn't allow bins with zeros; his parts start at one.
And some thoughts on the matter: n is the sum, and you have k bins. Start with the combination |n|0|...|0|. Define an operation "distribute 1" (D1), which takes one from the leftmost bin and distributes it over each of the other bins in turn.
E.g. D1(|n|0|...|0|) = tuple(|n-1|1|...|0|, ..., |n-1|0|...|1|)
Then you apply D1() to each element of the tuple and get a tuple of tuples, and so on and so forth, until the first bin is exhausted.
You could think of this as a tree:
root: |n|0|...|0|
D1 applied once: k-1 children |n-1|1|...|0| ... |n-1|0|...|1|
Next tree level: D1 applied to each node of the previous level, each node getting k-1 children.
The only thing left is how to traverse it - DFS, BFS, or anything else from https://en.wikipedia.org/wiki/Tree_traversal
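For what it's worth, here is a minimal recursive sketch in Python (not the FXT code) that iterates over all tuples of n non-negative integers summing to k without storing them all; the function name gen_compositions is just an illustrative choice.

def gen_compositions(k, n):
    """Yield every tuple of n non-negative integers that sums to k."""
    if n == 1:
        yield (k,)          # only one slot left: it must take the whole remainder
        return
    for first in range(k, -1, -1):                  # value placed in the first bin
        for rest in gen_compositions(k - first, n - 1):
            yield (first,) + rest

# Example: k=2, n=2 -> (2, 0), (1, 1), (0, 2)
for t in gen_compositions(2, 2):
    print(t)

There are C(k+n-1, n-1) such tuples, so any generator has to do at least that much work; the point is only that you never have to filter the full (k+1)^n grid.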
There is an array where all but one of the cells are 0, and we want to find the index of that single non-zero cell. The catch is that every time you check a cell in this array, the non-zero element will do one of the following:
move forward by 1
move backward by 1
stay where it is.
For example, if that element is currently at position 10 and I check what is in arr[5], then the element may be at position 9, 10 or 11 after I have checked arr[5].
We only need to find the position the element is currently at, not where it started (which would be impossible anyway).
The hard part is that if we write a for loop, there really is no way to know whether the element is currently in front of you or behind you.
Some more context if it helps:
The interviewer did give a hint: maybe I should move my pointer back after checking some number of cells. The problem is, when should I move back, and by how many slots?
While "thinking out loud", I started naming a bunch of common approaches hoping that something would hit. When I said recursion, the interviewer did say "recursion is a good start". I don't know whether recursion really is the right approach, because I don't see how I can do recursion and #1 at the same time.
The interviewer said this problem can't be solved in O(n^2). So we are looking at at least O(n^3), or maybe even exponential.
Tl;dr: Your best bet is to keep checking each even index in the array in turn, wrapping around as many times as necessary until you find your target. On average you will stumble upon your target in the middle of your second pass.
First off, as many have already said, it is indeed impossible to ensure you will find your target element in any given amount of time. If the element knows where your next sample will be, it can always place itself somewhere else just in time. The best you can do is to sample the array in a way that minimizes the expected number of accesses - and because after each sample you learn nothing except whether you were successful or not, and a success means you stop sampling, an optimal strategy can be described simply as a sequence of indices that should be checked, dependent only on the size of the array you're looking through. We can test each strategy in turn via automated means to see how well it performs. The results will depend on the specifics of the problem, so let's make some assumptions:
The question doesn't specify the starting position of our target. Let us assume that the starting position is chosen uniformly across the entire array.
The question doesn't specify the probability with which our target moves. For simplicity let's say it's independent of parameters such as the current position in the array, the time passed and the history of samples. Using probability 1/3 for each option gives us the least information, so let's use that.
Let us test our algorithms on an array of 101 elements (indices 0 through 100). Also, let us run each algorithm one million times, just to be reasonably sure about its average-case behavior.
The algorithms I've tested are:
Random sampling: after each attempt we forget where we were looking and choose an entirely new index at random. Each sample has an independent 1/n chance of succeeding, so we expect to take n samples on average. This is our control.
Sweep: try each position in sequence until our target is found. If our target wasn't moving, this would take n/2 samples on average. Our target is moving, however, so we may miss it on our first sweep.
Slow sweep: the same, except we test each position several times before moving on. Proposed by Patrick Trentin with a slowdown factor of 30x, tested with a slowdown factor of 2x.
Fast sweep: the opposite of slow sweep. After the first sample we skip (k-1) cells before testing the next one. The first pass starts at ary[0], the next at ary[1] and so on. Tested with each speed up factor (k) from 2 to 5.
Left-right sweep: First we check each index in turn from left to right, then each index from right to left. This algorithm would be guaranteed to find our target if it was always moving (which it isn't).
Smart greedy: Proposed by Aziuth. The idea behind this algorithm is that we track each cell's probability of holding our target, then always sample the cell with the highest probability. On the one hand this algorithm is relatively complex, on the other hand it sounds like it should give us optimal results.
Results:
The results are shown as [average] ± [standard deviation].
Random sampling: 100.889145 ± 100.318212
At this point I have realised a fencepost error in my code. Good thing we have a control sample. This also establishes that we have in the ballpark of two or three digits of useful precision (sqrt #samples), which is in line with other tests of this type.
Sweep: 100.327030 ± 91.210692
The chance of our target slipping through the net counteracts the fact that a sweep would otherwise need only n/2 samples on average to reach it. The algorithm doesn't really fare any better than random sampling on average, but it's more consistent in its performance, and it isn't hard to implement either.
slow sweep (x0.5): 128.272588 ± 99.003681
While the slow movement of our net means our target will probably get caught in the net during the first sweep and won't need a second sweep, it also means the first sweep takes twice as long. All in all, relying on the target moving onto us seems a little inefficient.
fast sweep x2: 75.981733 ± 72.620600
fast sweep x3: 84.576265 ± 83.117648
fast sweep x4: 88.811068 ± 87.676049
fast sweep x5: 91.264716 ± 90.337139
That's... a little surprising at first. While skipping every other cell means we complete each lap in half as many turns, each lap also has a reduced chance of actually encountering the target. A nicer view is to compare Sweep and FastSweep in broom-space: rotate each sample so that the index being sampled is always at 0; the target then drifts towards the left, a bit faster in FastSweep. In Sweep, the target moves at speed 0, 1 or 2 each step. A quick parallel with the Fibonacci base tells us that the target should hit the broom/net around 62% of the time. If it misses, it takes another 100 turns to come back. In FastSweep, the target moves at speed 1, 2 or 3 each step, meaning it misses more often, but it also takes half as much time to retry. Since the retry time drops more than the hit rate, it is advantageous to use FastSweep over Sweep.
Left-right sweep: 100.572156 ± 91.503060
Mostly acts like an ordinary sweep, and its score and standard deviation reflect that. Not too surprising a result.
Aziuth's smart greedy: 87.982552 ± 85.649941
At this point I have to admit a fault in my code: this algorithm is heavily dependent on its initial behaviour (which is unspecified by Aziuth and was chosen to be randomised in my tests). Performance concerns, however, meant that the algorithm always chooses the same randomised order in every run. The results are therefore characteristic of that particular randomisation rather than of the algorithm as a whole.
Always picking the most likely spot should find our target as fast as possible, right? Unfortunately, this complex algorithm barely competes with fast sweep x3. Why? I realise this is just speculation, but let us peek at the sequence Smart Greedy actually generates: during the first pass, each cell has an equal probability of containing the target, so the algorithm has to choose somehow. If it chooses randomly, it could pick up in the ballpark of 20% of cells before the dips in probability reach all of them. Afterwards the landscape is mostly smooth wherever the array hasn't been sampled recently, so the algorithm eventually stops sweeping and starts jumping around randomly. The real problem is that the algorithm is too greedy and doesn't really care about herding the target so that it could pick at the target more easily.
Nevertheless, this complex algorithm does fare better than both the simple Sweep and the random sampler. It still can't, however, compete with the simplicity and surprising efficiency of FastSweep. Repeated tests have shown that the initial randomisation can swing the efficiency anywhere between 80% run time (20% speedup) and 90% run time (10% speedup).
Finally, here's the code that was used to generate the results:
class WalkSim
  attr_reader :limit, :current, :time, :p_stay

  def initialize limit, p_stay
    @p_stay  = p_stay
    @limit   = limit
    @current = rand(limit + 1)   # random starting position of the target
    @time    = 0
  end

  # Check index n; the target then stays put or moves one step, clamped to bounds.
  def poke n
    r = n == @current
    @current += (rand(2) == 1 ? 1 : -1) if rand > @p_stay
    @current = [0, @current, @limit].sort[1]
    @time += 1
    r
  end

  def WalkSim.bench limit, p_stay, runs
    histogram = Hash.new{0}
    runs.times do
      sim = WalkSim.new limit, p_stay
      gen = yield
      nil until sim.poke gen.next
      histogram[sim.time] += 1
    end
    histogram.to_a.sort
  end
end
class Array; def sum; reduce 0, :+; end; end

def stats histogram
  count = histogram.map{|k,v| v}.sum.to_f
  avg = histogram.map{|k,v| k*v}.sum / count
  variance = histogram.map{|k,v| (k-avg)**2*v}.sum / (count - 1)
  {avg: avg, stddev: variance ** 0.5}
end
RUNS = 1_000_000
PSTAY = 1.0/3
LIMIT = 100
puts "random sampling"
p stats WalkSim.bench(LIMIT, PSTAY, RUNS) {
Enumerator.new {|y|loop{y.yield rand (LIMIT + 1)}}
}
puts "sweep"
p stats WalkSim.bench(LIMIT, PSTAY, RUNS) {
Enumerator.new {|y|loop{0.upto(LIMIT){|i|y.yield i}}}
}
puts "x0.5 speed sweep"
p stats WalkSim.bench(LIMIT, PSTAY, RUNS) {
Enumerator.new {|y|loop{0.upto(LIMIT){|i|2.times{y.yield i}}}}
}
(2..5).each do |speed|
puts "x#{speed} speed sweep"
p stats WalkSim.bench(LIMIT, PSTAY, RUNS) {
Enumerator.new {|y|loop{speed.times{|off|off.step(LIMIT, speed){|i|y.yield i}}}}
}
end
puts "sweep LR"
p stats WalkSim.bench(LIMIT, PSTAY, RUNS) {
Enumerator.new {|y|loop{
0.upto(LIMIT){|i|y.yield i}
LIMIT.downto(0){|i|y.yield i}
}}
}
$sg_gen = Enumerator.new do |y|
  probs = Array.new(LIMIT + 1) {1.0 / (LIMIT + 1)}
  loop do
    ix = probs.each_with_index.map {|v, i| [v, rand, i]}.max.last
    probs[ix] = 0
    probs = [probs[0] * (1 + PSTAY)/2 + probs[1] * (1 - PSTAY)/2,
             *probs.each_cons(3).map {|a, b, c| (a + c) / 2 * (1 - PSTAY) + b * PSTAY},
             probs[-1] * (1 + PSTAY)/2 + probs[-2] * (1 - PSTAY)/2]
    y.yield ix
  end
end

$sg_cache = []
def sg_enum
  Enumerator.new {|y|
    $sg_cache.each {|n| y.yield n}
    $sg_gen.each {|n| $sg_cache.push n; y.yield n}
  }
end
puts "smart greedy"
p stats WalkSim.bench(LIMIT, PSTAY, RUNS) {sg_enum}
No, forget everything about loops.
Copy this array to another array and then check which cells are now non-zero. For example, if your main array is mainArray[], you can use:
int temp[sizeOfMainArray];
int counter = 0;
while (counter < sizeOfMainArray)
{
    temp[counter] = mainArray[counter];   // copy (the original had '==', a comparison)
    counter++;
}
// then check what is non-zero in the copied array
counter = 0;
while (counter < sizeOfMainArray)
{
    if (temp[counter] != 0)
    {
        std::cout << "I Found It!!!";
        break;
    }
    counter++;
} // end of while
One approach, perhaps:
i - Have four index variables f, f1, l, l1. f points at 0, f1 at 1, l at n-1 (the end of the array) and l1 at n-2 (the second-to-last element).
ii - Check the elements at f1 and l1 - are any of them non-zero? If so, stop. If not, check the elements at f and l (to see if the element has jumped back by 1).
iii - If f and l are still zero, advance the indexes (f and f1 forward, l and l1 backward) and repeat step ii. Stop when f1 > l1.
Iff an equality check against an array index is what makes the non-zero element jump:
Why not think of a way where we don't really require an equality check with an array index?
int check = 0;
for (int i = 0; i < arr.length; i++) {
    check |= arr[i];
    if (check != 0)
        break;
}
Or maybe you can keep reading arr[mid]; the non-zero element will end up there, some day. Reasoning: Patrick Trentin seems to have put it in his answer (somewhat - it's not really that, but you'll get the idea).
If you have some information about the array, maybe we can come up with a niftier approach.
Ignoring the trivial case where the 1 is in the first cell of the array: if you iterate through the array testing each element in turn, you must eventually get to a position i where the 1 is in cell i+2. So when you read cell i+1, one of three things is going to happen.
The 1 stays where it is; you're going to find it next time you look.
The 1 moves away from you; you're back to the starting situation, with the 1 two cells ahead of you next time.
The 1 moves to the cell you've just checked; it has dodged your scan.
Re-reading cell i+1 will find the 1 in case 3, but it just gives the 1 another chance to move in cases 1 and 2, so a strategy based on re-reading won't work.
My choice would therefore be to adopt a brute-force approach: if I keep scanning the array, then I'm going to hit case 1 at some point and find the elusive 1.
Assumptions:
The array is not a true array. This is obvious given the problem; we got some class that behaves somewhat like an array.
The array is mostly hidden. The only public operations are [] and size().
The array is obfuscated. We cannot get any information by retrieving its address and then analyzing the memory at that position. Even if we iterate through the whole memory of our system, we can't do tricks, due to some advanced cryptographic means.
Every field of the array has the same probability of being the field that initially hosts the one.
We know the probabilities of how the one changes its position when triggered.
Probability-controlled algorithm:
Introduce another array of the same size, the probability array (of doubles).
This array is initialized with all fields set to 1/size.
Every time we use [] on the base array, the probability array changes in this way:
The accessed position is set to zero (it did not contain the one).
Every entry becomes the sum of its neighbors times the probability of that neighbor jumping to the entry's position. (prob_array_next_it[i] = prob_array_last_it[i-1]*prob_jump_to_right + prob_array_last_it[i+1]*prob_jump_to_left + prob_array_last_it[i]*prob_dont_jump, different for i=0 and i=size-1 of course.)
The probability array is normalized (setting one entry to zero sets the sum of the probabilities to below one).
The algorithm accesses the field with the highest probability (choosing among those that share the highest value, if there are several).
It might be possible to optimize this by controlling the flow of probabilities, but that would need to be based on the wandering event and might require some research.
No algorithm that tries to solve this problem is guaranteed to terminate after some bounded time. For a complexity, we would analyze the average case.
Example:
Jump probabilities are 1/3, nothing happens if trying to jump out of bounds
Initialize:
Hidden array:             0 0 1 0 0 0 0 0
Probability array:        1/8 1/8 1/8 1/8 1/8 1/8 1/8 1/8
First iteration: try [0] -> failure
Hidden array:             0 0 1 0 0 0 0 0 (no jump)
Probability array step 1: 0 1/8 1/8 1/8 1/8 1/8 1/8 1/8
Probability array step 2: 1/24 2/24 1/8 1/8 1/8 1/8 1/8 1/8
Probability array step 3: same, normalized (whole array * 8/7):
                          1/21 2/21 1/7 1/7 1/7 1/7 1/7 1/7
Second iteration: try [2], since 1/7 is the maximum and this is the first field with probability 1/7 -> success. (The example should be clear by now; of course it might not work this fast on another example. I had no interest in doing this for many iterations, since the probabilities get cumbersome to compute by hand - one would need to implement it. Note that if the one had jumped to the left, we wouldn't have checked it so quickly, even if it remained there for some time.)
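For illustration, here is a rough Python sketch of the probability-controlled algorithm above, assuming equal 1/3 probabilities and the boundary rule from the example; the probe callback that stands in for the hidden array access is my own addition.

import random

def smart_greedy_search(size, probe, p_left=1/3, p_right=1/3, p_stay=1/3):
    """Repeatedly probe the index with the highest probability of holding the target.

    probe(i) must return True if index i currently holds the non-zero element
    (and, as in the puzzle, the element may move after each probe).
    """
    probs = [1.0 / size] * size
    attempts = 0
    while True:
        best = max(range(size), key=lambda i: probs[i])   # ties: first index wins
        attempts += 1
        if probe(best):
            return best, attempts
        probs[best] = 0.0                                  # the target was not there
        # Propagate the probabilities one step; out-of-bounds jumps mean "stay".
        new = [0.0] * size
        for i, p in enumerate(probs):
            if p == 0.0:
                continue
            new[i] += p * p_stay
            new[i - 1 if i > 0 else i] += p * p_left
            new[i + 1 if i < size - 1 else i] += p * p_right
        total = sum(new)
        probs = [p / total for p in new]                   # renormalize

# Tiny demo against a simulated wandering target:
if __name__ == "__main__":
    size = 101
    pos = random.randrange(size)
    def probe(i):
        global pos
        hit = (i == pos)
        pos = min(size - 1, max(0, pos + random.choice((-1, 0, 1))))
        return hit
    print(smart_greedy_search(size, probe))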
I have a bit of a technical issue, but I feel like it should be possible with MATLAB's powerful toolset.
What I have is a random n by n matrix of 0's and w's, say generated with
A=w*(rand(n,n)<p);
A typical value of w would be 3000, but that should not matter too much.
Now, this matrix has two important quantities, the vectors
c = sum(A,1);
r = sum(A,2)';
These are two row vectors, the first denotes the sum of each column and the second the sum of each row.
What I want to do next is randomize each value of w, for example between 0.5 and 2. This I would do as
rand_M = (2-0.5).*rand(n,n) + 0.5;
A_rand = rand_M.*A;
However, I don't want to just pick these random numbers: I want them to be such that for every column and row, the sums are still equal to the elements of c and r. So to clean up the notation a bit, say we define
A_rand_c = sum(A_rand,1);
A_rand_r = sum(A_rand,2)';
I want that for all j = 1:n, A_rand_c(j) = c(j) and A_rand_r(j) = r(j).
What I'm looking for is a way to redraw the elements of rand_M in a sort of algorithmic fashion I suppose, so that these demands are finally satisfied.
Now of course, unless I have infinite amounts of time this might not really happen. I therefore accept these quantities to fall into a specific range: A_rand_c(j) has to be an element of [(1-e)*c(j),(1+e)*c(j)] and A_rand_r(j) of [(1-e)*r(j),(1+e)*r(j)]. This e I define beforehand, say like 0.001 or something.
Would anyone be able to help me in the process of finding a way to do this? I've tried an approach where I just randomly repick the numbers, but this really isn't getting me anywhere. It does not have to be crazy efficient either, I just need it to work in finite time for networks of size, say, n = 50.
To be clear, the final output is the matrix A_rand that satisfies these constraints.
Edit:
Alright, so after thinking a bit I suppose it might be doable with some while statement that goes through every element of the matrix. The difficult part is that there are four possibilities: if you are at a specific element A_rand(i,j), it could be that A_rand_c(j) and A_rand_r(i) are both too small, both too large, or one too small and one too large. The first two cases are good, because then you can just redraw the random number until it is larger (respectively smaller) than the current value and thereby improve the situation. But the other two cases are problematic, as you will improve one sum but worsen the other. I guess it would have to look at which criterion is less satisfied, so that it tries to fix the one that is worse. But this is not trivial, I would say.
You can take advantage of the fact that rows/columns with a single non-zero entry in A automatically give you results for that same entry in A_rand. If A(2,5) = w and it is the only non-zero entry in its column, then A_rand(2,5) = w as well. What else could it be?
You can alternate between finding these single-entry rows/cols, and assigning random numbers to entries where the value doesn't matter.
Here's a skeleton for the process:
A_rand=zeros(size(A)) is the matrix you are going to fill
entries_left = A>0 is a binary matrix showing which entries in A_rand you still need to fill
col_totals=sum(A,1) is the amount you still need to add in every column of A_rand
row_totals=sum(A,2) is the amount you still need to add in every row of A_rand
while sum( entries_left(:) ) > 0
% STEP 1:
% function to fill entries in A_rand if entries_left has rows/cols with one nonzero entry
% you will need to keep looping over this function until nothing changes
% update() A_rand, entries_left, row_totals, col_totals every time you loop
% STEP 2:
% let (i,j) be the indices of the next non-zero entry in entries_left
% assign a random number to A_rand(i,j) <= col_totals(j) and <= row_totals(i)
% update() A_rand, entries_left, row_totals, col_totals
end
update()
A_rand(i,j) = random_value;
entries_left(i,j) = 0;
col_totals(j) = col_totals(j) - random_value;
row_totals(i) = row_totals(i) - random_value;
end
Picking the range for random_value might be a little tricky. The best I can think of is to draw it from a relatively narrow distribution centered around N*w*p where p is the probability of an entry in A being nonzero (this would be the average value of row/column totals).
This doesn't scale well to large matrices as it will grow with n^2 complexity. I tested it for a 200 by 200 matrix and it worked in about 20 seconds.
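For illustration, here is a rough NumPy sketch of the procedure sketched above (the question is MATLAB, but the logic carries over); the uniform draw for random_value and the handling of forced entries are my own simplifications, not a polished algorithm.

import numpy as np

def fill_preserving_sums(A, rng=None):
    """Fill A_rand so that the row/column sums of A are (approximately) preserved."""
    rng = np.random.default_rng() if rng is None else rng
    A = np.asarray(A, dtype=float)
    A_rand = np.zeros_like(A)
    entries_left = A > 0
    col_totals = A.sum(axis=0)
    row_totals = A.sum(axis=1)

    def assign(i, j, value):
        A_rand[i, j] = value
        entries_left[i, j] = False
        col_totals[j] -= value
        row_totals[i] -= value

    while entries_left.any():
        # STEP 1: force entries that are the last one left in their row or column.
        changed = True
        while changed:
            changed = False
            for i, j in zip(*np.nonzero(entries_left)):
                if not entries_left[i, j]:
                    continue
                last_in_row = entries_left[i, :].sum() == 1
                last_in_col = entries_left[:, j].sum() == 1
                if last_in_row or last_in_col:
                    # the remaining row/column total is the only consistent value
                    assign(i, j, row_totals[i] if last_in_row else col_totals[j])
                    changed = True
        if not entries_left.any():
            break
        # STEP 2: pick the next free entry, give it a random admissible value.
        i, j = next(zip(*np.nonzero(entries_left)))
        upper = max(min(col_totals[j], row_totals[i]), 0.0)
        assign(i, j, rng.uniform(0.0, upper))
    return A_rand

How well the result lands inside the (1 ± e) tolerance depends entirely on the distribution used in STEP 2 (as discussed above), so in practice you would tune that draw or wrap the whole thing in a retry loop.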
Are there any efficient techniques to do the following summation?
Given a finite set A containing n integers A = {X1, X2, ..., Xn}, where Xi is an integer. Now there are n subsets of A, denoted by A1, A2, ..., An. We want to calculate the sum of each subset. Are there some efficient techniques for this?
(Note that n is typically larger than the average size of all the subsets of A.)
For example, if A={1,2,3,4,5,6,7,9}, A1={1,3,4,5}, A2={2,3,4}, A3 = ... . A naive way of computing the sums for A1 and A2 needs 5 flops (additions):
Sum(A1)=1+3+4+5=13
Sum(A2)=2+3+4=9
...
Now, if we compute 3+4 first and record its result 7, we only need 3 flops for the additions shown:
Sum(A1)=1+7+5=13
Sum(A2)=2+7=9
...
What about the general case? Are there any efficient methods to speed up the calculation? Thanks!
For some choices of subsets there are ways to speed up the computation, if you don't mind doing some (potentially expensive) precomputation, but not for all. For instance, suppose your subsets are {1,2}, {2,3}, {3,4}, {4,5}, ..., {n-1,n}, {n,1}; then the naive approach uses one arithmetic operation per subset, and you obviously can't do better than that. On the other hand, if your subsets are {1}, {1,2}, {1,2,3}, {1,2,3,4}, ..., {1,2,...,n} then you can get by with n-1 arithmetic ops, whereas the naive approach is much worse.
Here's one way to do the precomputation. It will not always find optimal results. For each ordered pair of subsets (X, Y), define the transition cost of the edge X -> Y to be min(size of the symmetric difference of X and Y, size of Y - 1). (The symmetric difference of X and Y is the set of things that are in X or Y but not both.) So the transition cost is the number of arithmetic operations you need to do to compute the sum of Y's elements, given the sum of X's. Add the empty set to your list of subsets, and compute a minimum-cost directed spanning tree using Edmonds' algorithm (http://en.wikipedia.org/wiki/Edmonds%27_algorithm) or one of the faster but more complicated variations on that theme. Now make sure that when your spanning tree has an edge X -> Y, you compute X before Y. (This is a "topological sort" and can be done efficiently.)
This will give distinctly suboptimal results when, e.g., you have {1,2}, {3,4}, {1,2,3,4}, {5,6}, {7,8}, {5,6,7,8}. After deciding your order of operations using the procedure above you could then do an optimization pass where you find cheaper ways to evaluate each set's sum given the sums already computed, and this will probably give fairly decent results in practice.
I suspect, but have made no attempt to prove, that finding an optimal procedure for a given set of subsets is NP-hard or worse. (It is certainly computable; the set of possible computations you might do is finite. But, on the face of it, it may be awfully expensive; potentially you might be keeping track of about 2^n partial sums, be adding any one of them to any other at each step, and have up to about n^2 steps, for a super-naive cost of (2^(2n))^(n^2) = 2^(2n^3) operations to try every possibility.)
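A rough Python sketch of that precomputation, using networkx's implementation of Edmonds' algorithm for the minimum-cost arborescence; the subsets are plain Python sets and node 0 plays the role of the empty set.

import networkx as nx

def plan_order(subsets):
    """Return (evaluation order, arborescence) for computing all subset sums cheaply-ish.

    Node 0 is the empty set; node i (i >= 1) is subsets[i-1]. The weight of the
    edge X -> Y is min(|X symmetric-difference Y|, |Y| - 1), i.e. the number of
    add/subtract operations needed to get sum(Y) from sum(X).
    """
    nodes = [frozenset()] + [frozenset(s) for s in subsets]
    G = nx.DiGraph()
    for i, X in enumerate(nodes):
        for j, Y in enumerate(nodes):
            if i == j or j == 0:           # never recompute the empty set
                continue
            G.add_edge(i, j, weight=min(len(X ^ Y), max(len(Y) - 1, 0)))
    tree = nx.minimum_spanning_arborescence(G)     # Edmonds' algorithm
    return list(nx.topological_sort(tree)), tree   # compute parents before children

order, tree = plan_order([{1, 3, 4, 5}, {2, 3, 4}])
print(order)                                   # e.g. [0, 1, 2]
print(sorted(tree.edges(data="weight")))       # which sum is derived from which

The optimization pass mentioned above (re-deriving each sum from anything already computed, not just its tree parent) would sit on top of this.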
Assuming that 'addition' isn't simply an ADD operation but instead some very intensive function involving two integer operands, then an obvious approach would be to cache the results.
You could achieve that via a suitable data structure, for example a key-value dictionary containing keys formed by the two operands and the answers as the value.
But as you specified C in the question, the simplest approach would be an n by n array of integers, where the solution to x + y is stored at array[x][y].
You can then repeatedly iterate over the subsets, and for each pair of operands you check the appropriate position in the array. If no value is present then it must be calculated and placed in the array. The value then replaces the two operands in the subset and you iterate.
If the operation is commutative then the operands should be sorted prior to looking up the array (i.e. so that the first index is always the smallest of the two operands) as this will maximise "cache" hits.
A common optimization technique is to pre-compute intermediate results. In your case, you might pre-compute all sums of 2 summands from A and store them in a lookup table. This will result in |A|*(|A|+1)/2 table entries, where |A| is the cardinality of A.
In order to compute the element sum of Ai, you:
look up the sum of the first two elements of Ai and save it in tmp
while there is an element x left in Ai:
look up the sum of tmp and x and save it back in tmp
In order to compute the element sum of A1 = {1,3,4,5} from your example, you do the following:
lookup(1,3) = 4
lookup(4,4) = 8
lookup(8,5) = 13
Note that computing the sum of any given Ai doesn't require summation, since all the work has already been conducted while pre-computing the lookup table.
If you store the lookup table in a hash table, then lookup() is in O(1).
Possible optimizations to this approach:
construct the lookup table while computing the summation results; hence, you only compute those summations that you actually need. Your lookup table is now a cache.
if your addition operation is commutative, you can save half of your cache size by storing only those summations where the smaller summand comes first. Then modify lookup() such that lookup(a,b) = lookup(b,a) if a > b.
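A small Python sketch of this lookup-table-as-cache idea (combining both optimizations: the table is built lazily and exploits commutativity); expensive_add is a stand-in for whatever the costly addition really is.

def make_cached_adder(expensive_add):
    """Wrap an expensive, commutative addition with a lazily built lookup table."""
    cache = {}
    def lookup(a, b):
        key = (a, b) if a <= b else (b, a)     # commutativity: store (smaller, larger)
        if key not in cache:
            cache[key] = expensive_add(a, b)
        return cache[key]
    return lookup, cache

def subset_sum(subset, lookup):
    """Fold a subset through the cached lookup, as in the three-step example above."""
    it = iter(subset)
    tmp = next(it)
    for x in it:
        tmp = lookup(tmp, x)
    return tmp

lookup, cache = make_cached_adder(lambda a, b: a + b)
print(subset_sum([1, 3, 4, 5], lookup))   # 13
print(subset_sum([2, 3, 4], lookup))      # 9
print(len(cache))                         # distinct additions actually performed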
Assuming summation is a time-consuming operation, you can find the LCS of every pair of subsets (assuming they are sorted, as mentioned in the comments, or sorting them first if they are not), then calculate the sum of the LCS of maximum length (over all LCSs of pairs), then replace its value in the related arrays with that sum, update their LCSs, and continue this way until there is no LCS with more than one number. Sure, this is not optimal, but it's better than the naive algorithm (a smaller number of summations). However, you can do backtracking to find the best solution.
e.g For your sample input:
A1={1,3,4,5} , A2={2,3,4}
LCS(A_1, A_2) = {3,4} ==> 7 ==> replace it:
A1={1,5,7}, A2={2,7} ==> LCS = {7}; the maximum LCS length is 1, so calculate the sums.
You can still improve it by calculating the sum of two random numbers, then again taking the LCS, and so on.
NO. There is no efficient technique.
It is an NP-complete problem, and there are no efficient solutions for such problems.
Why is it NP-complete?
We could use an algorithm for this problem to solve the set cover problem, just by putting an extra set into the collection, containing all the elements.
Example:
We have sets of elements
A1={1,2}, A2={2,3}, A3 = {3,4}
We want to solve the set cover problem.
We add to this collection a set containing all the elements:
A4 = {1,2,3,4}
We use the algorithm that John Smith is asking for and check which subsets the sum of A4 is represented with.
We have solved an NP-complete problem.
Is there a way to calculate the average distance of array elements from the array's average value, by only "visiting" each array element once? (I am looking for an algorithm.)
Example:
Array : [ 1 , 5 , 4 , 9 , 6 ]
Average : ( 1 + 5 + 4 + 9 + 6 ) / 5 = 5
Distance Array : [|1-5|, |5-5|, |4-5|, |9-5|, |6-5|] = [4 , 0 , 1 , 4 , 1 ]
Average Distance : ( 4 + 0 + 1 + 4 + 1 ) / 5 = 2
The simple algorithm needs 2 passes.
1st pass) Reads and accumulates values, then divides the result by array length to calculate average value of array elements.
2nd pass) Reads values, accumulates each one's distance from the previously calculated average value, and then divides the result by array length to find the average distance of the elements from the average value of the array.
The two passes are identical. It is the classic algorithm of calculating the average of a set of values. The first one takes as input the elements of the array, the second one the distances of each element from the array's average value.
The calculation of the average can be modified not to accumulate the values, but to calculate the average "on the fly" as we sequentially read the elements from the array.
The formula is:
Compute Running Average of Array's elements
-------------------------------------------
RA[i] = A[i] {for i == 1}
RA[i] = RA[i-1] - RA[i-1]/i + A[i]/i { for i > 1 }
Where A[x] is the array's element at position x, RA[x] is the average of the array's elements between position 1 and x (running average).
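As a quick illustration of that running-average formula, a few lines of Python (the generator is my own wrapping of the formula above):

def running_average(arr):
    """Yield RA[i] = RA[i-1] - RA[i-1]/i + A[i]/i as each element is read."""
    ra = 0.0
    for i, x in enumerate(arr, start=1):
        ra = ra - ra / i + x / i      # equivalent to ra + (x - ra) / i
        yield ra

print(list(running_average([1, 5, 4, 9, 6])))   # final value is the mean, 5.0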
My question is:
Is there a similar algorithm, to calculate "on the fly" (as we read the array's elements), the average distance of the elements from the array's mean value?
The problem is that, as we read the array's elements, the final average value of the array is not known; only the running average is known. So calculating differences from the running average will not yield the correct result. I suppose that if such an algorithm exists, it probably has to be able to compensate, in a way, on each new element read, for the error accumulated so far.
I don't think you can do better than O(n log n).
Suppose the array were sorted. Then we could divide it into the elements less than the average and the elements greater than the average. (If some elements are equal to the average, that doesn't matter.) Suppose the first k elements are less than the average. Then the average distance is
D = ((xave-x1) + (xave-x2) + (xave-x3) + ... + (xave-xk) + (xk+1-xave) + (xk+2-xave) + ... + (xn-xave)) / n
  = ((xk+1) + (xk+2) + ... + (xn) - (x1) - (x2) - ... - (xk) + (2k-n)*xave) / n
  = ([sum of elements above the average] - [sum of elements below the average] + (2k-n)*xave) / n
You could calculate this in one pass by working in from both ends, adjusting the limits on the (as-yet-unknown) average as you go. This would be O(n), and the sorting is O(n log n) (and they could perhaps be done in the same operation), so the whole thing is O(n log n).
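For concreteness, a short Python sketch of that sorted-array computation (the sort dominates, so it is O(n log n) overall):

import bisect

def average_distance(arr):
    """Average absolute distance from the mean, via the sorted-prefix formula above."""
    xs = sorted(arr)                       # O(n log n)
    n = len(xs)
    xave = sum(xs) / n
    k = bisect.bisect_left(xs, xave)       # first k elements are strictly below the average
    below = sum(xs[:k])
    above = sum(xs[k:])                    # elements equal to the average don't matter
    return (above - below + (2 * k - n) * xave) / n

print(average_distance([1, 5, 4, 9, 6]))   # 2.0, matching the example in the question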
The only problem with a two pass approach is that you need to reread or store the entire sequence for the second pass. The obvious improvement would be to maintain a data structure so that you could adjust the sum of absolute differences when the average value changed.
Suppose that you change the average value to a very large value, by observing a huge number. Now compare the change made by this to that caused by observing a not quite so huge value. You will be able to work out the difference between the two sums of absolute differences, because both average values are above all the other numbers, so all of the absolute values decrease by the difference between the two huge averages. This predictable change carries on until the average meets the highest value observed in the standard numbers, and this change allows you to find out what the highest number observed was.
By running experiments like this you can recover the set of numbers observed before the numbers you shove in to run the experiments. Therefore any clever data structure you use to keep track of sums of absolute differences is capable of storing the set of numbers observed, which (except for order, and cases where multiple copies of the same number are observed) is pretty much what you do by storing all the numbers seen for a second pass. So I don't think there is a trick for the case of sums of absolute differences as there is for squares of differences, where most of the information you care about is described by just the pair of numbers (sum, sum of squares).
If the L2 norm (the root-mean-square distance from the mean) is OK, then it's:
sqrt(sum(x^2)/n - (sum(x)/n)^2)
That's the square root of (the average of x^2 minus the square of the average of x).
The quantity under the root is called the variance; its square root is the standard deviation, a typical "measure of spread".
Note that this is more sensitive to outliers than the measure you originally asked for.
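That quantity can be computed in a single pass by accumulating the sum and the sum of squares; a minimal Python sketch:

import math

def stddev_one_pass(arr):
    """Single-pass (population) standard deviation: sqrt(mean of x^2 - square of mean of x)."""
    n, s, s2 = 0, 0.0, 0.0
    for x in arr:
        n += 1
        s += x          # running sum of x
        s2 += x * x     # running sum of x^2
    return math.sqrt(s2 / n - (s / n) ** 2)

print(stddev_one_pass([1, 5, 4, 9, 6]))   # about 2.608 for the example array

(For very large or badly scaled data, Welford's online algorithm is numerically safer, but the above matches the formula as given.)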
Your followup described your context as HLSL reading from a texture. If your filter footprint is a power of two and is aligned with the same power-of-two boundaries in the original image, you can use MIP maps to find the average value of the filter region.
For example, for an 8x8 filter, precompute a MIP map three levels down the MIP chain, whose elements will be the averages of each 8x8 region. Then a single texture read from that MIP level texture will give you the average for the 8x8 region. Unfortunately this doesn't work for sliding the filter around to arbitrary positions (not multiples of 8 in this example).
You could make use of intermediate MIP levels to decrease the number of texture reads by utilizing the MIP averages of 4x4 or 2x2 areas whenever possible, but that would complicate the algorithm quite a bit.
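As a CPU-side illustration of the idea (NumPy rather than HLSL), here is a small sketch that precomputes the 8x8 block averages - effectively the MIP level three steps down - so the average of any aligned 8x8 region becomes a single lookup:

import numpy as np

def block_averages(img, block=8):
    """Average of each aligned block x block tile (log2(block) MIP levels down)."""
    h, w = img.shape
    assert h % block == 0 and w % block == 0, "image must be a multiple of the block size"
    return img.reshape(h // block, block, w // block, block).mean(axis=(1, 3))

img = np.random.rand(64, 64)
mip3 = block_averages(img, 8)              # shape (8, 8)
# The average of the aligned 8x8 region starting at (16, 40) is one lookup:
print(np.isclose(mip3[2, 5], img[16:24, 40:48].mean()))   # True

As noted above, this only works when the filter footprint stays aligned to those power-of-two boundaries.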