Correct way to get the weighted average of concrete array values along a continuous interval

I've been searching the web for a while, but I am probably missing the right terminology.
I have arbitrarily sized arrays of scalars ...
array = [n_0, n_1, n_2, ..., n_m]
I also have a function f : x -> y, with 0 <= x <= 1, and y an interpolated value from array. Examples:
array = [1,2,9]
f(0) = 1
f(0.5) = 2
f(1) = 9
f(0.75) = 5.5
My problem is that I want to compute the average value over some interval r = [a, b], where a ∈ [0, 1] and b ∈ [0, 1]; i.e., I want to generalize my interpolation function f : x -> y to compute the average along r.
My mind boggles slightly w.r.t. finding the right weighting. Imagine I want to compute f([0.2, 0.8]):
array --> 1 | 2 | 9
[0..1] --> 0.00 0.25 0.50 0.75 1.00
[0.2,0.8] --> ^___________________^
The latter being the range of values I want to compute the average of.
Would it be mathematically correct to compute the average like this?

        1 * 0.2    <- 0.2 'translated' to [0..0.25]
      + 2 * 1
      + 9 * 0.2    <- 0.8 'translated' to [0.75..1]
avg = -------------
        1.4        <-- the sum of weights

This looks correct.
In your example, your interval's length is 0.6. In that interval, your number 2 is taking up (0.75-0.25)/0.6 = 0.5/0.6 = 10/12 of the space. Your number 1 takes up (0.25-0.2)/0.6 = 0.05/0.6 = 1/12 of the space, and likewise your number 9.
This sums up to 10/12 + 1/12 + 1/12 = 1.
For better intuition, think about it like this: The problem is to determine how much space each array-element covers along an interval. The rest is just filling the machinery described in http://en.wikipedia.org/wiki/Weighted_average#Mathematical_definition .
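That machinery can be sketched in Python. This is a hypothetical helper (not from the original posts); it assumes element i sits at position i/(m-1) and "owns" the cell reaching halfway to its neighbours, as in the picture above, so each weight is the length of the overlap between that cell and [a, b]:

```python
def weighted_average(values, a, b):
    """Coverage-weighted average of `values` over the interval [a, b] of [0, 1].

    values[i] sits at position i/(m-1); its cell spans the midpoints to its
    neighbours, and its weight is the overlap of that cell with [a, b].
    """
    m = len(values)
    positions = [i / (m - 1) for i in range(m)]
    # cell boundaries: 0, midpoints between neighbouring positions, 1
    bounds = [0.0] + [(positions[i] + positions[i + 1]) / 2
                      for i in range(m - 1)] + [1.0]
    total = 0.0
    weight_sum = 0.0
    for i, v in enumerate(values):
        overlap = max(0.0, min(bounds[i + 1], b) - max(bounds[i], a))
        total += v * overlap
        weight_sum += overlap
    return total / weight_sum
```

For the example above, `weighted_average([1, 2, 9], 0.2, 0.8)` applies the weights 1/12, 10/12, 1/12 from the answer.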


Check subset sum for special array equation

I was trying to solve the following problem.
We are given N and A[0], with
N <= 5000
A[0] <= 10^6 and even
If i is odd, then A[i] >= 3 * A[i-1].
If i is even, then A[i] = 2 * A[i-1] + 3 * A[i-2].
An element at an odd index must be odd, and at an even index it must be even.
We need to minimize the sum of the array.
We are also given Q numbers, with
Q <= 1000
X <= 10^18
For each number X we need to determine whether it is possible to get subset-sum = X from our array.
What I have tried:
Creating a minimum sum array is easy. Just follow the equations and constraints.
The approach that I know for subset-sum is dynamic programming, which has time complexity O(sum * size of array), but since the sum can be as large as 10^18 that approach won't work.
Is there any equation relation that I am missing?
We can solve it with a bit of math (apologies for the plain-text notation; I am not sure LaTeX is possible here).
Let X_n be the sequence (the same as your A), and assume X_0 is positive.
The sequence is then strictly increasing, and the sum is minimized when X_{2n+1} = 3 X_{2n}.
We can compute the general term of X_{2n} and X_{2n+1}. Let

v_0 = ( X_0 )      v_1 = ( X_1 )
      ( X_1 ),           ( X_2 )

The relation between v_0 and v_1 is

M_a = ( 0  1 )
      ( 3  2 )

(because X_2 = 2 X_1 + 3 X_0). The relation between v_1 and v_2 is

M_b = ( 0  1 )
      ( 0  3 )

(because X_3 = 3 X_2 at the minimum). Hence the relation between v_0 and v_2 is

M = M_b M_a = ( 3  2 )
              ( 9  6 )

and we deduce, writing v_{2n} = ( X_{2n}, X_{2n+1} )^T,

v_{2n} = M^n v_0
Following the classical diagonalization (M has eigenvalues 9 and 0, so M^n = 9^{n-1} M for n >= 1), we get (unless mistaken)

X_{2n}   = 9^n/3 X_0 + 2*9^{n-1} X_1
X_{2n+1} = 9^n X_0 + 6*9^{n-1} X_1

Recall that X_1 = 3 X_0, thus

X_{2n}   = 9^n X_0
X_{2n+1} = 3*9^n X_0
Now represent the sum we want to check, divided by X_0 (the sum must be a multiple of X_0 to be reachable), in base 9. The digit in the 9^n place corresponds to the pair X_{2n} = 9^n X_0 and X_{2n+1} = 3*9^n X_0: a 1 means we take the 2n-th element from A, a 3 means we take the (2n+1)-th element, a 4 means we take both, and a 0 means neither.
So we just have to decompose the number in base 9 and check that every digit is 0, 1, 3 or 4 (and also that its leading nonzero digit does not refer to an index beyond the end of our array).
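A sketch of the whole check in Python (a hypothetical helper, not from the original answer; it assumes the minimized sequence X_{2n} = 9^n X_0, X_{2n+1} = 3·9^n X_0 derived above, and treats a digit 4 as picking both elements of a pair, a case the digit list 0, 1, 3 omits):

```python
def can_make(x, x0, length):
    """Check whether x is a subset sum of the minimal array of size `length`:
    A = [x0, 3*x0, 9*x0, 27*x0, ...], i.e. A[2n] = 9^n * x0, A[2n+1] = 3*9^n * x0.

    In base 9, each digit of x / x0 must be 0, 1, 3 or 4:
    1 -> take A[2n], 3 -> take A[2n+1], 4 -> take both.
    """
    if x % x0:
        return False          # every element is a multiple of x0
    s = x // x0
    pos = 0                   # digit position n covers elements A[2n], A[2n+1]
    while s:
        digit = s % 9
        if digit not in (0, 1, 3, 4):
            return False
        # highest array index this digit needs must exist
        need = 2 * pos + (1 if digit >= 3 else 0)
        if digit and need >= length:
            return False
        s //= 9
        pos += 1
    return True
```

For example, with x0 = 2 and length 3 the array is [2, 6, 18], so 20 = 2 + 18 is reachable while 10 is not.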

Julia way to write k-step look ahead function?

Suppose I have two arrays representing a probabilistic graph:
     2
   /   \
 1       4 -> 5 -> 6 -> 7
   \   /
     3
Where the probability of going to state 2 is 0.81 and the probability of going to state 3 is (1-0.81) = 0.19. My arrays represent the estimated values of the states as well as the rewards. (Note: Each index of the array represents its respective state)
V = [0, 3, 8, 2, 1, 2, 0]
R = [0, 0, 0, 4, 1, 1, 1]
The context doesn't matter so much, it's just to give an idea of where I'm coming from. I need to write a k-step look ahead function where I sum the discounted value of rewards and add it to the estimated value of the kth-state.
I have been able to do this so far by creating separate functions for each step look ahead. My goal of asking this question is to figure out how to refactor this code so that I don't repeat myself and use idiomatic Julia.
Here is an example of what I am talking about:
function E₁(R::Array{Float64,1}, V::Array{Float64,1}, P::Float64)
    V[1] + 0.81*(R[1] + V[2]) + 0.19*(R[2] + V[3])
end

function E₂(R::Array{Float64,1}, V::Array{Float64,1}, P::Float64)
    V[1] + 0.81*(R[1] + R[3]) + 0.19*(R[2] + R[4]) + V[4]
end

function E₃(R::Array{Float64,1}, V::Array{Float64,1}, P::Float64)
    V[1] + 0.81*(R[1] + R[3]) + 0.19*(R[2] + R[4]) + R[5] + V[5]
end
.
.
.
So on and so forth. It seems that if I were to ignore E₁() this would be exceptionally easy to refactor. But because I have to discount the value estimate at two different states, I'm having trouble thinking of a way to generalize this for k steps.
I think obviously I could write a single function that took an integer as a value and then use a bunch of if-statements but that doesn't seem in the spirit of Julia. Any ideas on how I could refactor this? A closure of some sort? A different data type to store R and V?
It seems like you essentially have a discrete Markov chain. So the standard way would be to store the graph as its transition matrix:
T = zeros(7,7)
T[1,2] = 0.81
T[1,3] = 0.19
T[2,4] = 1
T[3,4] = 1
T[4,5] = 1
T[5,6] = 1
T[6,7] = 1
Then you can calculate the probabilities of ending up at each state, given an initial distribution, by multiplying by T' from the left (the transition matrix is usually defined transposed to this):
julia> T' * [1,0,0,0,0,0,0] # starting from (1)
7-element Array{Float64,1}:
0.0
0.81
0.19
0.0
0.0
0.0
0.0
Likewise, the probability of ending up at each state after k steps can be calculated by using powers of T':
julia> T' * T' * [1,0,0,0,0,0,0]
7-element Array{Float64,1}:
0.0
0.0
0.0
1.0
0.0
0.0
0.0
Now that you have all the probabilities after k steps, you can easily calculate expectations as well. It may also pay off to define T as a sparse matrix.
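The k-step lookahead itself can then be sketched with powers of T'. Here is a hypothetical Python/NumPy port (an assumption, not the answer's code): it takes R[j] as the reward received on entering state j, which differs slightly from the indexing inside the question's E functions, and it mirrors their habit of adding the current state's value up front.

```python
import numpy as np

def k_step_expectation(T, R, V, start, k, gamma=1.0):
    """Expected rewards over k steps plus the expected value of the k-th state.

    T[i, j]: probability of moving from state i to state j
    R[j]:    reward assumed to be received on entering state j
    V[j]:    estimated value of state j
    gamma:   discount factor (the question effectively uses gamma = 1)
    """
    p = np.zeros(len(V))
    p[start] = 1.0               # start deterministically in `start`
    total = V[start]             # the E functions add the current value up front
    for step in range(k):
        p = T.T @ p              # state distribution after one more step
        total += gamma**step * (p @ R)
    return total + p @ V         # expected value of the k-th state

# the graph from the question, 0-based
T = np.zeros((7, 7))
T[0, 1], T[0, 2] = 0.81, 0.19
T[1, 3] = T[2, 3] = T[3, 4] = T[4, 5] = T[5, 6] = 1.0
R = np.array([0, 0, 0, 4, 1, 1, 1], dtype=float)
V = np.array([0, 3, 8, 2, 1, 2, 0], dtype=float)
```

A single loop over matrix-vector products replaces the family of E₁, E₂, E₃, ... functions.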

Using a for loop to generate elements of a vector

I am trying to compute with the equation, and I would like to store each value in a row vector. Here is my attempt:
multiA = [1];
multiB = [];
NA = 6;
NB = 4;
q = [0,1,2,3,4,5,6];
for i = 2:7
    multiA = [multiA(i-1), (factorial(q(i) + NA - 1))/(factorial(q(i))*factorial(NA-1))];
    %multiA = [multiA, multiA(i)];
end
multiA
But this does not work. I get the error message

Attempted to access multiA(3); index out of bounds because numel(multiA)=2.

multiA = [multiA(i-1), (factorial(q(i) + NA - 1))/(factorial(q(i))*factorial(NA-1))];
Is my code even remotely close to what I want to achieve? What can I do to fix it?
You don't need any loop, just use the vector directly.
NA = 6;
q = [0,1,2,3,4,5,6];
multiA = factorial(q + NA - 1)./(factorial(q).*factorial(NA-1))
gives
multiA =
1 6 21 56 126 252 462
For multiple N a loop isn't necessary either:
N = [6,8,10];
q = [0,1,2,3,4,5,6];
[N,q] = meshgrid(N,q)
multiA = factorial(q + N - 1)./(factorial(q).*factorial(N-1))
Also consider the following remarks from the documentation of f = factorial(n) regarding overflow for n > 21:
Limitations
The result is only accurate for double-precision values of n that are less than or equal to 21. A larger value of n produces a result that
has the correct order of magnitude and is accurate for the first 15
digits. This is because double-precision numbers are only accurate up
to 15 digits.
For single-precision input, the result is only accurate for values of n that are less than or equal to 13. A larger value of n produces a
result that has the correct order of magnitude and is accurate for the
first 8 digits. This is because single-precision numbers are only
accurate up to 8 digits.
Factorials of moderately large numbers can cause overflow. Two possible approaches to prevent that:
Avoid computing terms that will cancel. This approach is especially suited to the case when q is of the form 1,2,... as in your example. It also has the advantage that, for each value of q, the result for the previous value is reused, thus minimizing the number of operations:
>> q = 1:6;
>> multiA = cumprod((q+NA-1)./q)
multiA =
6 21 56 126 252 462
Note that 0 is not allowed in q. But the result for 0 is just 1, so the final result would be just [1 multiA].
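The same cumulative-product trick can be sketched in Python, where `itertools.accumulate` plays the role of cumprod (a hypothetical port, not part of the original answer):

```python
from itertools import accumulate
from operator import mul

NA = 6
q = range(1, 7)  # 0 excluded, as noted above; its result is just 1
# running product of (q + NA - 1) / q, mirroring cumprod((q+NA-1)./q)
multiA = list(accumulate(((k + NA - 1) / k for k in q), mul))
```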
For q arbitrary (not necessarily of the form 1,2,...), you can use the gammaln function, which gives the logarithms of the factorials:
>> q = [0 1 2 6 3];
>> multiA = exp(gammaln(q+NA)-gammaln(q+1)-gammaln(NA));
>>multiA =
1.0000 6.0000 21.0000 462.0000 56.0000
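Python's `math.lgamma` is the counterpart of gammaln, so the same log-factorial trick can be sketched as (again a hypothetical port):

```python
from math import lgamma, exp

NA = 6
q = [0, 1, 2, 6, 3]
# exp(lgamma(q+NA) - lgamma(q+1) - lgamma(NA)) == (q+NA-1)! / (q! * (NA-1)!)
multiA = [exp(lgamma(k + NA) - lgamma(k + 1) - lgamma(NA)) for k in q]
```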
You want to append a new element to the end of multiA:

for i = 2:7
    multiA = [multiA, (factorial(q(i) + NA - 1))/(factorial(q(i))*factorial(NA-1))];
end
A function handle makes it much simpler:

% define:
omega = @(q,N) (factorial(q + N - 1))./(factorial(q).*factorial(N-1))
% use:
omega(0:6,4) % q = 0..6, N = 4
It might be better to use nchoosek as opposed to factorial. The latter can overflow quite easily, I'd imagine.

multiA = nan(1,7);
for i = 1:7
    multiA(i) = nchoosek(q(i)+NA-1, q(i));
end
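In Python terms the nchoosek approach corresponds to `math.comb`, which uses exact integer arithmetic and sidesteps the factorial overflow entirely; a minimal sketch:

```python
from math import comb

NA = 6
q = range(7)
# multiset coefficient C(q + NA - 1, q), the same value nchoosek computes
multiA = [comb(k + NA - 1, k) for k in q]
```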

Finding the row with max separation between elements of an array in matlab

I have an array of size m x n. Each row has n elements which shows some probability (between 0 and 1). I want to find the row which has the max difference between its elements while it would be better if its nonzero elements are greater as well.
For example in array Arr:
Arr = [0.10 0    0.33 0    0.55 0;
       0.01 0    0.10 0    0.20 0;
       1    0.10 0    0    0    0;
       0.55 0    0.33 0    0.15 0;
       0.17 0.17 0.17 0.17 0.17 0.17]
the best row would be the 3rd row, because it has more widely separated values, and larger ones. How can I compute this using MATLAB?
It seems that you're looking for the row with the greatest standard deviation, which is basically a measure of how much the values vary from the average.
If you want to ignore zero elements, use Shai's useful suggestion to replace zero elements with NaN. Indeed, some of MATLAB's built-in functions allow ignoring them:
Arr2 = Arr;
Arr2(~Arr) = NaN;
To find the standard deviation we'll employ nanstd (rather than std, which does not ignore NaN values) along the rows, i.e. along the 2nd dimension:
nanstd(Arr2, 0, 2)
To find the greatest standard deviation and its corresponding row index, we'll apply nanmax and obtain both output variables:

[stdmax, idx] = nanmax(nanstd(Arr2, 0, 2));

Now idx holds the index of the desired row.
Example
Let's run this code on the input that you provided in your question:
Arr = [0.10 0    0.33 0    0.55 0;
       0.01 0    0.10 0    0.20 0;
       1    0.10 0    0    0    0;
       0.55 0    0.33 0    0.15 0;
       0.17 0.17 0.17 0.17 0.17 0.17];
Arr2 = Arr;
Arr2(~Arr) = NaN;
[stdmax, idx] = nanmax(nanstd(Arr2, 0, 2))
idx =
3
Note that the values in row #3 differ one from another much more than those in row #1, and therefore the standard deviation of row #3 is greater. This also corresponds to your comment:
... ergo a row with 3 zero and 3 non-zero but close values is worse than a row with 4 zeros and 2 very different values.
For this reason I believe that in this case 3 is indeed the correct answer.
It seems like you wish to ignore 0s in your matrix. You may achieve this by setting them to NaN and proceeding with special built-in functions that ignore NaNs (e.g., nanmin, nanmax, etc.)
Here is a sample code for finding the row (ri) with the largest difference between minimal (nonzero) response and the maximal response:
nArr = Arr;
nArr( Arr == 0 ) = NaN;       % replace zeros with NaNs
mn = nanmin(nArr, [], 2);     % minimal (nonzero) response at each row
mx = nanmax(nArr, [], 2);     % maximal response
[~, ri] = nanmax( mx - mn );  % find the row with the maximal difference
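The standard-deviation approach translates directly to Python/NumPy, whose nan-aware functions mirror nanstd and nanmax (a hypothetical port, with `ddof=1` matching MATLAB's default sample standard deviation):

```python
import numpy as np

def best_row(arr):
    """Index of the row whose nonzero entries have the largest std deviation."""
    a = np.array(arr, dtype=float)
    a[a == 0] = np.nan                    # ignore zeros, as in the answer
    stds = np.nanstd(a, axis=1, ddof=1)   # like nanstd(Arr2, 0, 2)
    return int(np.nanargmax(stds))

Arr = [[0.10, 0,    0.33, 0,    0.55, 0],
       [0.01, 0,    0.10, 0,    0.20, 0],
       [1,    0.10, 0,    0,    0,    0],
       [0.55, 0,    0.33, 0,    0.15, 0],
       [0.17, 0.17, 0.17, 0.17, 0.17, 0.17]]
```

Note the result is 0-based, so row index 2 here corresponds to MATLAB's row 3.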

Array Representation of Polynomials

I was reading about linked list implementation of polynomials. It stated,
Compare this representation with storing the same
polynomial using an array structure.
In the array we have to keep a slot for each exponent
of x, thus if we have a polynomial of order 50 but containing
just 6 terms, then a large number of entries will be zero in
the array.
I was wondering how do we represent a polynomial in an array? Please guide me.
Thanks
A complete Java implementation of array-based polynomials is here: https://cs.lmu.edu/~ray/classes/dsa/assignment/2/answers/
The basic idea is that if you have a polynomial like
4x^6-2x+5
then your array would look like
  0    1    2    3    4    5    6
+----+----+----+----+----+----+----+
|  5 | -2 |  0 |  0 |  0 |  0 |  4 |
+----+----+----+----+----+----+----+
That is
the coefficient 5 is in slot 0 of the array (representing 5x^0)
the coefficient -2 is in slot 1 of the array (representing -2x^1)
the coefficient 4 is in slot 6 of the array (representing 4x^6)
You can probably see how this representation would be wasteful for polynomials like
3x^5000 + 2
In cases like this you want instead to use a sparse representation. The simplest approach would be to use a map (dictionary) whose keys are the exponents and whose values are the coefficients.
Suppose your polynomial is
6x^50 + 4x^2 + 2x + 1
The paragraph you have posted is describing storing it in an array like this:
polynomial = new array(51) // indices 0..50; I'm assuming all elements initialize to zero
polynomial[50] = 6
polynomial[2] = 4
polynomial[1] = 2
polynomial[0] = 1
Basically it's wasting a lot of space this way. Here the array index is the 'power' of x for the polynomial in x.
Usually you hold one element for each exponent, so the polynomial is actually:
poly[0]*x^0 + poly[1]*x^1 + ... + poly[n-1]*x^(n-1)
For example, if you have p(x) = 3x^5+5x^2+1, your array will be:
poly[0] = 1
poly[1] = 0
poly[2] = 5
poly[3] = 0
poly[4] = 0
poly[5] = 3
If you have a polynomial like
3x^4 + x^2 + 2x + 1
You can represent it in array as
[1 2 1 0 3]
So, the element 0 is the coefficient of x^0, element 1 is the coefficient of x^1 and so on...
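Both representations can be sketched in Python (hypothetical helper names); the dense array and the sparse map describe the same polynomial and evaluate identically:

```python
def eval_dense(coeffs, x):
    """coeffs[i] is the coefficient of x^i (the dense array representation)."""
    return sum(c * x**i for i, c in enumerate(coeffs))

def eval_sparse(terms, x):
    """terms maps exponent -> coefficient (the sparse map representation)."""
    return sum(c * x**e for e, c in terms.items())

# 4x^6 - 2x + 5 both ways
dense = [5, -2, 0, 0, 0, 0, 4]   # one slot per exponent 0..6
sparse = {6: 4, 1: -2, 0: 5}     # only the nonzero terms
```

The sparse form stores 3 entries instead of 5001 slots for a polynomial like 3x^5000 + 2.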
