How to calculate the weighted average over a cell-array of arrays?

In generalisation of my previous question, how can a weighted average over cell elements (that are and shall remain arrays themselves) be performed?
I'd start by modifying gnovice's answer like this:
dim = ndims(c{1}); %# Get the number of dimensions for your arrays
M = cat(dim+1,c{:}); %# Convert to a (dim+1)-dimensional matrix
meanArray = sum(M.*weight,dim+1)./sum(weight,dim+1); %# Get the weighted mean across arrays
And before that make sure weight has the correct shape. The three cases that I think need to be taken care of are
weight = 1 (or any constant) => return the usual mean value
numel(weight) == length(c) => weight is per cell-element c{n} (but equal for each array element for fixed n)
numel(weight) == numel(cell2mat(c)) => each array-element has its own weight...
Case 1 is easy, and case 3 is unlikely to happen, so at the moment I'm interested in case 2: how can I transform weight into an array such that M.*weight has the correct dimensions in the sum above? Of course, an answer that shows another way to obtain a weighted average is appreciated as well.
Edit: In fact, case 3 is even more trivial (what a tautology, apologies) than case 1 if weight has the same structure as c.
Here's an example of what I mean for case 2:
c = { [1 2 3; 1 2 3], [4 8 3; 4 2 6] };
weight = [ 2, 1 ];
should return
meanArray = [ 2 4 3; 2 2 4 ]
(e.g. for the first element (2*1 + 1*4)/(2+1) = 2)
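The expected result for case 2 can be checked independently of MATLAB. Below is a minimal sketch in Python, using plain nested lists; the function name weighted_cell_mean is made up for this illustration and is not part of the question or of any library.

```python
from typing import List

Matrix = List[List[float]]


def weighted_cell_mean(cells: List[Matrix], weights: List[float]) -> Matrix:
    """Element-wise weighted mean of equally sized matrices, one weight per matrix."""
    if len(cells) != len(weights):
        raise ValueError("need exactly one weight per cell element")
    total_weight = sum(weights)
    rows, cols = len(cells[0]), len(cells[0][0])
    return [
        [
            # Weighted sum of the (r, k) entries of all matrices, then normalise.
            sum(w * m[r][k] for m, w in zip(cells, weights)) / total_weight
            for k in range(cols)
        ]
        for r in range(rows)
    ]


c = [[[1, 2, 3], [1, 2, 3]], [[4, 8, 3], [4, 2, 6]]]
weight = [2, 1]
print(weighted_cell_mean(c, weight))  # [[2.0, 4.0, 3.0], [2.0, 2.0, 4.0]]
```

This mirrors sum(M.*weight,dim+1)./sum(weight,dim+1) for the case where each weight applies to a whole array.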

After familiarizing myself with REPMAT, now here's my solution:
function meanArray = cellMean(c, weight)
% meanArray = cellMean(c, [weight=1])
% mean over the elements of a cell c, keeping matrix structures of cell
% elements etc. Use weight if given.
% based on http://stackoverflow.com/q/5197692/321973, courtesy of gnovice
% (http://stackoverflow.com/users/52738/gnovice)
% extended to weighted averaging by Tobias Kienzler
% (see also http://stackoverflow.com/q/5231406/321973)
dim = ndims(c{1}); %# Get the number of dimensions for your arrays
if ~exist('weight', 'var') || isempty(weight); weight = 1; end;
eins = ones(size(c{1})); % that is German for "one", creative, I know...
if ~iscell(weight)
    % ignore length if all elements are equal, this is case 1
    if isequal(weight./max(weight(:)), ones(size(weight)))
        weight = repmat(eins, [size(eins)>0 length(c)]);
    elseif isequal(numel(weight), length(c)) % case 2: per cell-array weight
        % put the weights along dimension dim+1, then tile them over the array size
        weight = repmat(reshape(weight, [ones(1,dim) numel(weight)]), [size(eins) 1]);
    else
        error(['Weird weight dimensions: ' num2str(size(weight))]);
    end
else % case 3, insert some dimension check here if you want
    weight = cat(dim+1,weight{:});
end;
M = cat(dim+1,c{:}); %# Convert to a (dim+1)-dimensional matrix
sumc = sum(M.*weight,dim+1);
sumw = sum(weight,dim+1);
meanArray = sumc./sumw; %# Get the weighted mean across arrays

Related

Find Minimum Score Possible

Problem statement:
We are given three arrays A1, A2, A3 of lengths n1, n2, n3. Each array contains some (or no) natural numbers (i.e., > 0). These numbers denote program execution times.
At each step we choose the first element of any non-empty array, execute that program, and remove it from its array.
For example:
if A1=[3,2] (n1=2),
A2=[7] (n2=1),
A3=[1] (n3=1)
then we can execute programs in various orders like [1,7,3,2] or [7,1,3,2] or [3,7,1,2] or [3,1,7,2] or [3,2,1,7] etc.
Now if we take S=[1,3,2,7] as the order of execution the waiting time of various programs would be
for S[0] waiting time = 0, since executed immediately,
for S[1] waiting time = 0+1 = 1, taking previous time into account, similarly,
for S[2] waiting time = 0+1+3 = 4
for S[3] waiting time = 0+1+3+2 = 6
The score of an execution order is defined as the sum of all waiting times: 0 + 1 + 4 + 6 = 11. This is the minimum score we can get from any order of execution.
Our task is to find this minimum score.
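The score definition above can be written down directly. This is only a small sketch; the name schedule_score is chosen here for illustration.

```python
from typing import List


def schedule_score(order: List[int]) -> int:
    """Sum of waiting times: each job waits for the run time of all jobs before it."""
    elapsed = 0  # total run time of the jobs executed so far
    score = 0
    for run_time in order:
        score += elapsed  # this job starts after everything before it has finished
        elapsed += run_time
    return score


print(schedule_score([1, 3, 2, 7]))  # 11
print(schedule_score([3, 2, 1, 7]))  # 14
```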
How can we solve this problem? I tried picking the minimum of the three first elements each time, but that is not correct: it gets stuck when two or three of the first elements are equal.
One more example:
if A1=[23,10,18,43], A2=[7], A3=[13,42] minimum score would be 307.
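The simplest way to solve this is with dynamic programming, which runs in cubic time for three arrays: there are at most (n1+1)(n2+1)(n3+1) states, one per combination of remaining suffix lengths.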
For each array A: Suppose you take the first element from array A, i.e. A[0], as the next process. Your total cost is the wait-time contribution of A[0] (i.e., A[0] * (total_remaining_elements - 1)), plus the minimal wait time sum from A[1:] and the rest of the arrays.
Take the minimum cost over each possible first array A, and you'll get the minimum score.
Here's a Python implementation of that idea. It works with any number of arrays, not just three.
import functools
from typing import List, Tuple


def dp_solve(arrays: List[List[int]]) -> int:
    """Given list of arrays representing dependent processing times,
    return the smallest sum of wait_time_before_start for all job orders"""
    arrays = [x for x in arrays if len(x) > 0]  # Remove empty

    @functools.lru_cache(100000)
    def dp(remaining_elements: Tuple[int, ...],
           total_remaining: int) -> int:
        """Returns minimum wait time sum when suffixes of each array
        have lengths in 'remaining_elements' """
        if total_remaining == 0:
            return 0
        rem_elements_copy = list(remaining_elements)
        best = 10 ** 20
        for i, x in enumerate(remaining_elements):
            if x == 0:
                continue
            # Every job still remaining after this one waits for it
            cost_here = arrays[i][-x] * (total_remaining - 1)
            if cost_here >= best:
                continue
            rem_elements_copy[i] -= 1
            best = min(best,
                       dp(tuple(rem_elements_copy), total_remaining - 1)
                       + cost_here)
            rem_elements_copy[i] += 1
        return best

    return dp(tuple(map(len, arrays)), sum(map(len, arrays)))
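To cross-check the two examples from the question, a brute-force search over every valid interleaving is enough, since the inputs are tiny. The sketch below is independent of the dynamic programming code above; the name brute_force_min_score is made up here, and the approach is only feasible for a handful of jobs.

```python
from typing import List, Tuple


def brute_force_min_score(arrays: List[List[int]]) -> int:
    """Try every valid interleaving and return the smallest sum of waiting times."""

    def explore(positions: Tuple[int, ...], elapsed: int) -> int:
        best = None
        for i, pos in enumerate(positions):
            if pos == len(arrays[i]):
                continue  # this array is exhausted
            advanced = positions[:i] + (pos + 1,) + positions[i + 1:]
            # The chosen job starts after `elapsed` time units, so that is its wait.
            candidate = elapsed + explore(advanced, elapsed + arrays[i][pos])
            if best is None or candidate < best:
                best = candidate
        return 0 if best is None else best

    return explore(tuple(0 for _ in arrays), 0)


print(brute_force_min_score([[3, 2], [7], [1]]))  # 11
print(brute_force_min_score([[23, 10, 18, 43], [7], [13, 42]]))  # 307
```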
Better solutions
The naive greedy strategy of 'smallest first element' doesn't work, because it can be worth it to do a longer job to get a much shorter job in the same list done, as the example of
A1 = [100, 1, 2, 3], A2 = [38], A3 = [34],
best solution = [100, 1, 2, 3, 34, 38]
by user3386109 in the comments demonstrates.
A more refined greedy strategy does work. Instead of the smallest first element, consider each possible prefix of the array. We want to pick the array with the smallest prefix, where prefixes are compared by average process time, and perform all the processes in that prefix in order.
A1 = [ 100, 1, 2, 3]
Prefix averages = [(100)/1, (100+1)/2, (100+1+2)/3, (100+1+2+3)/4]
= [ 100.0, 50.5, 34.333, 26.5]
A2=[38]
A3=[34]
Smallest prefix average in any array is 26.5, so pick
the prefix [100, 1, 2, 3] to complete first.
Then [34] is the next prefix, and [38] is the final prefix.
And here's a rough Python implementation of the greedy algorithm. This code computes subarray averages in a completely naive/brute-force way, so the algorithm is still quadratic (but an improvement over the dynamic programming method). Also, it computes 'maximum suffixes' instead of 'minimum prefixes' for ease of coding, but the two strategies are equivalent.
import math
from typing import List, Tuple


def greedy_solve(arrays: List[List[int]]) -> int:
    """Given list of arrays representing dependent processing times,
    return the smallest sum of wait_time_before_start for all job orders"""

    def max_suffix_avg(arr: List[int]) -> Tuple[float, int]:
        """Given arr, return value and length of max-average suffix"""
        if len(arr) == 0:
            return (-math.inf, 0)
        best_len = 1
        best = -math.inf
        curr_sum = 0.0
        for i, x in enumerate(reversed(arr), 1):
            curr_sum += x
            new_avg = curr_sum / i
            if new_avg >= best:
                best = new_avg
                best_len = i
        return (best, best_len)

    # Remove empty arrays; copy the rest so pop() leaves the caller's lists intact
    arrays = [list(x) for x in arrays if len(x) > 0]
    if not arrays:
        return 0  # Nothing to schedule
    total_time_sum = sum(sum(x) for x in arrays)
    my_averages = [max_suffix_avg(arr) for arr in arrays]
    total_cost = 0
    while True:
        largest_avg_idx = max(range(len(arrays)),
                              key=lambda y: my_averages[y][0])
        _, n_to_remove = my_averages[largest_avg_idx]
        if n_to_remove == 0:
            break
        for _ in range(n_to_remove):
            # The job scheduled last waits for everything that remains
            total_time_sum -= arrays[largest_avg_idx].pop()
            total_cost += total_time_sum
        # Recompute the changed array's avg
        my_averages[largest_avg_idx] = max_suffix_avg(arrays[largest_avg_idx])
    return total_cost
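For comparison, here is a sketch of the 'minimum prefix' formulation described above, which builds the execution order front to back instead of computing the cost from the back. The names min_avg_prefix and greedy_order are chosen for this illustration; like the code above, it recomputes prefix averages naively.

```python
from typing import List, Tuple


def min_avg_prefix(arr: List[int]) -> Tuple[float, int]:
    """Return (average, length) of the non-empty prefix with the smallest average."""
    best_avg, best_len, running = float("inf"), 0, 0
    for length, x in enumerate(arr, 1):
        running += x
        if running / length < best_avg:
            best_avg, best_len = running / length, length
    return best_avg, best_len


def greedy_order(arrays: List[List[int]]) -> List[int]:
    """Repeatedly execute the prefix with the smallest average run time."""
    queues = [list(a) for a in arrays if a]  # copy so the inputs are left untouched
    order: List[int] = []
    while queues:
        # Pick the array whose best prefix has the lowest average.
        idx = min(range(len(queues)), key=lambda i: min_avg_prefix(queues[i])[0])
        _, n = min_avg_prefix(queues[idx])
        order.extend(queues[idx][:n])  # run the whole prefix in order
        del queues[idx][:n]
        if not queues[idx]:
            queues.pop(idx)
    return order


print(greedy_order([[100, 1, 2, 3], [38], [34]]))  # [100, 1, 2, 3, 34, 38]
```

For the second example from the question this yields the order [7, 13, 23, 10, 18, 42, 43], whose waiting times sum to 307.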

Argmax of a multidimensional array along a subset of dimensions in Matlab

Say, Y is a 7-dimensional array, and I need an efficient way to maximize it along the last 3 dimensions, that will work on GPU.
As a result I need a 4-dimensional array with maximal values of Y and three 4-dimensional arrays with the indices of these values in the last three dimensions.
I can do
[Y7, X7] = max(Y , [], 7);
[Y6, X6] = max(Y7, [], 6);
[Y5, X5] = max(Y6, [], 5);
Then I have already found the values (Y5) and the indices along the 5th dimension (X5). But I still need indices along the 6th and 7th dimensions.
Here's a way to do it. Let N denote the number of dimensions along which to maximize.
Reshape Y to collapse the last N dimensions into one.
Maximize along the collapsed dimensions. This gives argmax as a linear index over those dimensions.
Unroll the linear index into N subindices, one for each dimension.
The following code works for any number of dimensions (not necessarily 7 and 3 as in your example). To achieve that, it handles the size of Y generically and uses a comma-separated list obtained from a cell array to get N outputs from ind2sub.
Y = rand(2,3,2,3,2,3,2); % example 7-dimensional array
N = 3; % last dimensions along which to maximize
D = ndims(Y);
sz = size(Y);
[~, ind] = max(reshape(Y, [sz(1:D-N) prod(sz(D-N+1:end))]), [], D-N+1);
sub = cell(1,N);
[sub{:}] = ind2sub(sz(D-N+1:D), ind);
As a check, after running the above code, observe for example Y(2,3,1,2,:) (shown as a row vector for convenience):
>> reshape(Y(2,3,1,2,:), 1, [])
ans =
0.5621 0.4352 0.3672 0.9011 0.0332 0.5044 0.3416 0.6996 0.0610 0.2638 0.5586 0.3766
The maximum is seen to be 0.9011, which occurs at the 4th position (where "position" is defined along the N=3 collapsed dimensions). In fact,
>> ind(2,3,1,2)
ans =
4
>> Y(2,3,1,2,ind(2,3,1,2))
ans =
0.9011
or, in terms of the N=3 subindices,
>> Y(2,3,1,2,sub{1}(2,3,1,2),sub{2}(2,3,1,2),sub{3}(2,3,1,2))
ans =
0.9011
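The unrolling step that ind2sub performs is plain column-major index arithmetic. As a rough illustration outside MATLAB, the Python sketch below reproduces it for a single linear index, using MATLAB's 1-based convention; it is a hand-written stand-in, not MATLAB's implementation.

```python
from typing import List, Sequence


def ind2sub(shape: Sequence[int], index: int) -> List[int]:
    """Convert a 1-based column-major linear index into 1-based subscripts."""
    total = 1
    for n in shape:
        total *= n
    if not 1 <= index <= total:
        raise IndexError("linear index out of range")
    remainder = index - 1  # switch to 0-based arithmetic
    subs = []
    for n in shape:
        subs.append(remainder % n + 1)  # position along this dimension
        remainder //= n  # carry on to the next, slower-varying dimension
    return subs


# Position 4 within the collapsed trailing dimensions of size 2 x 3 x 2
print(ind2sub([2, 3, 2], 4))  # [2, 2, 1]
```

So for the check above, position 4 corresponds to subindices (2, 2, 1) along dimensions 5, 6 and 7.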

matlab maximum of array with unknown dimension

I would like to compute the maximum and, more importantly, its coordinates of an N-by-N...by-N array, without specifying its dimensions.
For example, let's take:
A = [2 3];
B = [2 3; 3 4];
The function (let's call it MAXI) should return the following values for matrix A:
[fmax, coor] = MAXI(A)
fmax =
3
coor =
2
and for matrix B:
[fmax, coor] = MAXI(B)
fmax =
4
coor=
2 2
The main problem is not to write code that works for one class of input in particular, but to write code that works, as quickly as possible, for any input (including higher-dimensional arrays).
To find the absolute maximum, you'll have to convert your input matrix into a column vector first and find the linear index of the greatest element, and then convert it to the coordinates with ind2sub. This can be a little bit tricky though, because ind2sub requires specifying a known number of output variables. For that purpose we can employ cell arrays and comma-separated lists, like so:
[fmax, coor] = max(A(:));
if ~isvector(A) % for a vector, the linear index is already the coordinate
    C = cell(1, ndims(A)); % one output cell per dimension
    [C{:}] = ind2sub(size(A), coor);
    coor = cell2mat(C);
end
EDIT: I've added an additional if statement that checks if the input is a matrix or a vector, and in case of the latter it returns the linear index itself as is.
In a function, it looks like so:
function [fmax, coor] = maxi(A)
[fmax, coor] = max(A(:));
if ~isvector(A) % for a vector, the linear index is already the coordinate
    C = cell(1, ndims(A)); % one output cell per dimension
    [C{:}] = ind2sub(size(A), coor);
    coor = cell2mat(C);
end
Example
A = [2 3; 3 4];
[fmax, coor] = maxi(A)
fmax =
4
coor =
2 2

How to sum matrix (vector) elements in a structure

I have an M x N structure array with fields var and val, which are vectors.
What I would like to do is get an M x N matrix A where each element A(i, j) contains the sum of the vector var (or val) from the structure array.
For example:
myStructure(1,5)
ans =
var: 1
val: [0.0100 0.1800 0.8100]
sum(myStructure(1,5).val)
ans =
1
myStructure(7,8)
ans =
var: [1 3]
val: [1x9 double]
myStructure(7,8).val
ans =
Columns 1 through 6
0.1111 0.1111 0.1111 0.1111 0.1111 0.1111
Columns 7 through 9
0.1111 0.1111 0.1111
Therefore A(1,5) should be 1 and the same way all elements A(i,j) should be equal to sum(myStructure(i,j).val).
Does anyone know how this could be done in Matlab without using for loops?
I've tried to use sum function in a number of ways (sum(messages.val) and sum(messages(:,:).val) ...) but couldn't get the desired result.
You can get the field elements into one matrix using:
svals = [myStructure.val];
If val always has the same length (let's call it P) and is a row vector, as in the example, this will be a 1 x (numel(myStructure)*P) row vector containing the values of all fields in sequence, in column-major order of the structure array. You can then reshape it:
[M,N] = size(myStructure);
P = numel(myStructure(1,1).val);
svals = reshape(svals,[P M N]);
and now just sum the first dimension, which leaves you the MxN sized A matrix:
A = squeeze(sum(svals,1));
squeeze is applied in this last step to remove the resulting singleton dimension (otherwise A would be of size 1xMxN).
If the length of val can vary, I see no other way than looping over it, or using arrayfun, which is essentially the same as looping:
A = arrayfun(@(x) sum(x.val),myStructure);
Here is a slightly different solution. First let's create a structure array for testing:
s = struct();
for i=1:5
for j=1:3
s(i,j).var = i+j;
s(i,j).val = rand(1,randi(10)); %# different lengths vectors
end
end
Now we do the sum:
A = cellfun(@sum, reshape({s.val}, size(s)))
A =
1.9278 3.0719 5.8731
3.2377 0.43874 2.2374
3.0661 2.8892 4.1455
1.9093 1.4758 1.441
4.8731 0.5308 3.4076

indexing into an octave array using another array

I have a three-dimensional Octave array A of size [x y z].
I also have another array B of dimensions n * 3.
Say B(0) gives [3 3 1].
I need to access that location in A, i.e. A(3, 3, 1), which holds, say, 15.
Something like A(B(0)).
How do I go about it?
See the help for sub2ind (and ind2sub).
However, nowadays people recommend to use loops.
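In Octave itself this amounts to something like A(sub2ind(size(A), B(:,1), B(:,2), B(:,3))). What sub2ind computes is the column-major linear index of each subscript triple; the Python sketch below spells out that arithmetic with 1-based indices. It is a hand-written stand-in for illustration, and the 3 x 3 x 3 size and the rows of B are assumed example values.

```python
from typing import Sequence


def sub2ind(shape: Sequence[int], subs: Sequence[int]) -> int:
    """Convert 1-based subscripts into a 1-based column-major linear index."""
    if len(shape) != len(subs):
        raise ValueError("need one subscript per dimension")
    index, stride = 0, 1
    for n, s in zip(shape, subs):
        if not 1 <= s <= n:
            raise IndexError("subscript out of range")
        index += (s - 1) * stride  # offset contributed by this dimension
        stride *= n  # elements spanned by one step in the next dimension
    return index + 1


B = [[3, 3, 1], [1, 2, 3]]  # each row is one (row, column, page) location
print([sub2ind([3, 3, 3], row) for row in B])  # [9, 22]
```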
Well, first, B(0) is an invalid index, as indexing in MATLAB and Octave starts at 1. The other issue is that you want B(0) to contain the vector [3 3 1]. Matrices in MATLAB cannot contain other matrices, only scalars. So you need to use a 3x3 cell array, a 3x3 struct or a 4-dimensional array. I'll choose the cell array option here, because I find it the easiest and most convenient.
% Set random seed (used only for example data generation).
rng(123456789);
% Let's generate some pseudo-random example data.
A = rand(3,3,3);
A(:,:,1) =
0.5328 0.7136 0.8839
0.5341 0.2570 0.1549
0.5096 0.7527 0.6705
A(:,:,2) =
0.6434 0.8185 0.2308
0.7236 0.0979 0.0123
0.7487 0.0036 0.3535
A(:,:,3) =
0.1853 0.8994 0.9803
0.7928 0.3154 0.5421
0.6122 0.4067 0.2423
% Generate an example 3x3x3 cell array of indices, filled with pseudo-random 1x3 index vectors.
CellArrayOfIndicesB = cellfun(@(x) randi(3,1,3), num2cell(zeros(3,3,3)), 'UniformOutput', false);
% Example #1. Coordinates (1,2,3).
Dim1 = 1;
Dim2 = 2;
Dim3 = 3;
% The code to get the corresponding value of A directly.
ValueOfA = A(CellArrayOfIndicesB{Dim1,Dim2,Dim3}(1), CellArrayOfIndicesB{Dim1,Dim2,Dim3}(2), CellArrayOfIndicesB{Dim1,Dim2,Dim3}(3));
ValueOfA =
0.8839
% Let's confirm that by first checking where CellArrayOfIndicesB{1,2,3} points to.
CellArrayOfIndicesB{1,2,3}
ans =
[ 1 3 1 ]
% CellArrayOfIndicesB{1,2,3} points to A(1,3,1).
% So let's see what is the value of A(1,3,1).
A(1,3,1)
ans =
0.8839
% Example #2. Coordinates (3,1,2).
Dim1 = 3;
Dim2 = 1;
Dim3 = 2;
ValueOfA = A(CellArrayOfIndicesB{Dim1,Dim2,Dim3}(1), CellArrayOfIndicesB{Dim1,Dim2,Dim3}(2), CellArrayOfIndicesB{Dim1,Dim2,Dim3}(3));
ValueOfA =
0.4067
CellArrayOfIndicesB{3,1,2}
ans =
[ 3 2 3 ]
A(3,2,3)
ans =
0.4067

Resources