Vectorizing addition of subarray - arrays

Let's say I have two (large) vectors a=[0 0 0 0 0] and b=[1 2 3 4 5] of the same size and one index vector ind=[1 5 2 1] with values in {1,...,length(a)}. I would like to compute
for k = 1:length(ind)
a(ind(k)) = a(ind(k)) + b(ind(k));
end
% a = [2 2 0 0 5]
That is, I want to add the entries of b selected by ind to a, counting each index as many times as it appears in ind.
a(ind)=a(ind)+b(ind);
% a = [1 2 0 0 5]
is much faster, of course, but ignores indices which appear multiple times.
How can I speed up the above code?

We can use unique to identify the unique index values and use the third output to determine which elements of ind share the same index. We can then use accumarray to sum all the elements of b which share the same index. We then add these to the original value of a at these locations.
[uniqueinds, ~, inds] = unique(ind);
a(uniqueinds) = a(uniqueinds) + accumarray(inds, b(ind)).';
If max(ind) == numel(a), this can be simplified to the following, since accumarray will simply return 0 for any index that does not appear in ind.
a(:) = a(:) + accumarray(ind(:), b(ind));

Another approach based on accumarray:
a(:) = a(:) + accumarray(ind(:), b(ind(:)), [numel(a) 1]);
How it works
accumarray with two column vectors as inputs aggregates the values of the second input corresponding to the same index in the first. The third input is used here to force the result to be the same size as a, padding with zeros if needed.
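As a quick check, the following sketch runs the loop from the question and the accumarray version side by side on the example data. The names a_loop, a_vec, vals and increments are introduced here for illustration only.

```matlab
a = [0 0 0 0 0];
b = [1 2 3 4 5];
ind = [1 5 2 1];

% Reference result from the explicit loop in the question
a_loop = a;
for k = 1:numel(ind)
    a_loop(ind(k)) = a_loop(ind(k)) + b(ind(k));
end

% Vectorized version: sum the selected entries of b per target index,
% padded with zeros to the length of a
vals = reshape(b(ind), [], 1);
increments = accumarray(ind(:), vals, [numel(a) 1]);
a_vec = a;
a_vec(:) = a_vec(:) + increments;

disp(isequal(a_vec, a_loop))   % prints 1
disp(a_vec)                    % 2 2 0 0 5
```

Index 1 appears twice in ind, so b(1) is added twice, which is exactly what the plain a(ind) = a(ind) + b(ind) assignment misses.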

Related

How to find the sum of every 3 elements in an array?

I have an array A=[a1,a2,a3, ..., aN]. I would like to take the sum of every 3 elements:
s1=a1+a2+a3
s2=a4+a5+a6
...
sM=a(N-2)+a(N-1)+aN
My solution:
k=size(A);
s=0;
for n=1:k
s(n)=s(n-2)+s(n-1)+s(n);
end
Error: Attempted to access s(2); index out of bounds because numel(s)=1.
How to fix it?
If you want to sum in blocks, for the general case when the number of elements of A is not necessarily a multiple of the block size, you can use accumarray:
A = [3 8 5 8 2 3 4 7 9 6 4]; % 11 elements
s = 3; % block size
result = accumarray(ceil((1:numel(A))/s).', A(:));
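To make the grouping step visible, here is the same computation with the group labels stored in a separate variable (groups is a name chosen for this illustration):

```matlab
A = [3 8 5 8 2 3 4 7 9 6 4]; % 11 elements
s = 3;                       % block size

% Label each element with the number of the block it belongs to
groups = ceil((1:numel(A)) / s);   % 1 1 1 2 2 2 3 3 3 4 4

% Sum the elements that share a label; the last block has only 2 elements
result = accumarray(groups(:), A(:));
disp(result.')                     % 16 13 20 10
```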
If you want a sliding sum with a given block size, you can use conv:
A = [3 8 5 8 2 3 4 7 9 6 4]; % 11 elements
s = 3; % block size
result = conv(A(:).', ones(1,s), 'valid');
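As a sanity check, the sliding sum from conv can be compared against a difference of cumulative sums. This is only a sketch; c and result_cumsum are illustrative names.

```matlab
A = [3 8 5 8 2 3 4 7 9 6 4]; % 11 elements
s = 3;                       % block size

% Sliding sum via convolution with a window of ones
result_conv = conv(A(:).', ones(1, s), 'valid');

% Same sliding sum via cumulative sums: the sum over A(k:k+s-1)
% equals c(k+s) - c(k), where c = cumsum([0 A])
c = cumsum([0, A(:).']);
result_cumsum = c(s+1:end) - c(1:end-s);

disp(result_conv)                          % 16 21 15 13 9 14 20 22 19
disp(isequal(result_conv, result_cumsum))  % prints 1
```

Both results have numel(A) - s + 1 elements, one per window position.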
You try to calculate s by using values from s. Don't you mean s(n)=A(n-2)+A(n-1)+A(n);? Also, size on its own returns more than one dimension.
That being said, getting the 2 previous values n-2 and n-1 doesn't work for n=1 and n=2 (because indices must be positive). You have to explain how the first two values should be handled. I assume either 0 for elements that do not exist yet
k=size(A,2); %only the second dimension when A is 1xn; length(A) also works
s=zeros(1,k); %preallocate instead of appending each value, for better performance
s(1)=A(1);
s(2)=A(2)+A(1);
for n=3:k %start at 3
s(n)=A(n-2)+A(n-1)+A(n);
end
or s should be 2 values shorter than A.
k=size(A,2);
s=zeros(1,k-2);
for n=1:k-2
s(n)=A(n)+A(n+1)+A(n+2);
end
You initialise s as a scalar with s = 0. Then you try and index it like an array, but it only has a single element.
Your current logic (if fixed) will calculate this:
s(1) = a(1)+a(2)+a(3)
s(2) = a(2)+a(3)+a(4)
...
% 's' will be 2 elements shorter than 'a'
So we need to be a bit wiser with the indexing to get what you describe, which is
s(1) = a(1)+a(2)+a(3)
s(2) = a(4)+a(5)+a(6)
...
% 's' will be a third as big as 'a'
You should pre-allocate s to the right size, like so:
k = numel(A); % Number of elements in 'A'
s = zeros( 1, k/3 ); % Output array, assuming 'k' is divisible by 3
for n = 0:3:k-3
s(n/3+1) = A(n+1) + A(n+2) + A(n+3);
end
You could do this in one line by reshaping the array to have 3 rows and then summing down each column. This assumes that the number of elements in A is divisible by 3, and that A is a row vector:
s = sum( reshape( A, 3, [] ) );
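If the number of elements is not divisible by 3, one possible workaround (an assumption on my part, not something the answer above proposes) is to pad with zeros before reshaping. The names pad and padded are illustrative.

```matlab
A = [3 8 5 8 2 3 4 7 9 6 4];   % 11 elements, not a multiple of 3
blk = 3;                       % block size

% Number of zeros needed to reach the next multiple of blk
pad = mod(-numel(A), blk);
padded = [A(:).', zeros(1, pad)];

% Each column of the reshaped matrix is one block; sum down the columns
s = sum(reshape(padded, blk, []), 1);
disp(s)                        % 16 13 20 10
```

Zeros do not change a sum, so the last entry is simply the sum of the remaining elements.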

Creating a Binary Matrix from an Array of Indices

Definition of Problem
I have two arrays called weights and indices respectively:
weights = [1, 3, 2];
indices = [1, 1, 2, 3, 2, 4];
m = 4; % Number of Rows in Matrix
n = 3; % Number of Columns in Matrix
M = zeros(m, n);
The array called indices is storing the indices where I need to store a 1 in every column.
For instance for the first column at row 1 which is indicated in indices(1) I need to store a 1 and this is indicated by weights(1) which is equal to 1.
M(indices(1), 1) = 1;
For column 2, at rows 1 to 3 (indices(2:4)) I need to store a 1. The range of indices for column 2 are again indicated by weights(2).
M(indices(2:4),2) = 1;
Similarly, for column 3, at rows 2 and 4 (indices(5:6)) I need to store a 1. The range of indices for column 3 are again indicated by weights(3).
M(indices(5:6),3) = 1;
Expected Binary Matrix
The expected and resulting binary matrix is:
1 1 0
0 1 1
0 1 0
0 0 1
Solution
Is there any way I can do this in a generalized manner, using both the weights and indices arrays rather than hard-coding it, to create the binary matrix M?
You just have a weird way of describing your indices, so you just need to convert them to something standard.
column_idx=repelem(1:n,weights); % repeat each column number weights(k) times
row_idx=indices ; % just for clarity
M(sub2ind([m,n],row_idx,column_idx))=1;% Use linear indices
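For the example data, the intermediate column indices and the final matrix can be checked as follows. This is a small sketch; col and expected are names introduced for the illustration.

```matlab
weights = [1, 3, 2];
indices = [1, 1, 2, 3, 2, 4];
m = 4; % Number of rows
n = 3; % Number of columns

% Column k is repeated weights(k) times, one entry per row index
col = repelem(1:n, weights);      % 1 2 2 2 3 3

% Convert the (row, column) pairs to linear indices and set those entries
M = zeros(m, n);
M(sub2ind([m, n], indices, col)) = 1;

expected = [1 1 0; 0 1 1; 0 1 0; 0 0 1];
disp(isequal(M, expected))        % prints 1
```

This relies on sum(weights) being equal to numel(indices), which holds for the example.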

Counting appropriate number of subarrays in an array excluding some specific pairs?

Let's say, I have an array like this:
1 2 3 4 5
And the given pair is (2,3). Then the possible subarrays that don't have (2,3) present in them are:
1. 1
2. 2
3. 3
4. 4
5. 5
6. 1 2
7. 3 4
8. 4 5
9. 3 4 5
So, the answer will be 9.
Obviously, there can be more of such pairs.
Now, one method that I thought of is O(n^2): enumerate all such subarrays, up to the maximum length n. Can I do better? Thanks!
Let's see, this ad hoc pseudocode should be O(n):
array = [1 2 3 4 5]
pair = [2 3]
length = array.length
n = 0
start = 0
while (start < length)
{
    # Find the next pair; pair_pos stops at the pair's first element,
    # or at the last index if no further pair exists
    pair_pos = start
    while (pair_pos < length-1) and (array[pair_pos,pair_pos+1] != pair) # (**1)
    {
        pair_pos++
    }
    # Count subarrays of the segment array[start..pair_pos]
    n += calc_number_of_subarrays(pair_pos-start+1) # (**2)
    # Continue at the pair's second element
    start = pair_pos+1
}
print n
Note **1: This seems to involve a loop inside the outer loop. Since every element of the array is visited exactly once, both loops together are O(n). In fact, it is probably easy to refactor this to use only one while loop.
Note **2: Given an array of length l, there are l+(l-1)+(l-2)+...+1 = l(l+1)/2 subarrays (including the array itself). This is easy to calculate in O(1), with no loop involved (cf. Gauss). :)
You don't need to find which subarrays are in an array to know how many of them there are. Finding where the pair is in the array takes at most 2(n-1) array operations. Then you only need to do a simple calculation with the two lengths you extract from that. The number of subarrays in an array of length n is n(n+1)/2; for example, for n = 3 it is 3 + 2 + 1 = 6.
The solution uses that in a given array [a, ..., p1, p2, ..., b], the amount of subarrays without the pair is the amount of subarrays for [a, ..., p1] + the amount of subarrays for [p2, ..., b]. If multiple of such pairs exist, we repeat the same trick on [p2, ..., b] as if it was the whole array.
function amount_of_subarrays ::
    index := 1
    amount := 0
    lastmatch := 0
    while length( array ) > index do
        if array[index] == pair[1] then
            if array[index+1] == pair[2] then
                length2 := index - lastmatch
                amount := amount + ((length2 * (length2 + 1)) / 2)
                lastmatch := index
            fi
        fi
        index := index + 1
    od
    //index is now equal to the length
    length2 := index - lastmatch
    amount := amount + ((length2 * (length2 + 1)) / 2)
    return amount
For an array [1, 2, 3, 4, 5] with pair [2, 3], index will be 2 when the two if-statements are true. amount will be updated to 3 and lastmatch will be updated to 2. No more matches will be found, so lastmatch is 2 and index is 5. amount will be 3 + 6 = 9.
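The same idea can be sketched in MATLAB without an explicit loop, assuming (as both answers do) that the pair must appear as two adjacent elements. The names hits, bounds and seg_len are illustrative.

```matlab
array = [1 2 3 4 5];
pair = [2 3];

% Positions where the pair starts, i.e. pair(1) directly followed by pair(2)
hits = find(array(1:end-1) == pair(1) & array(2:end) == pair(2));

% Each segment ends at the first element of a pair; the last one ends
% at the end of the array
bounds = [0, hits, numel(array)];
seg_len = diff(bounds);

% A segment of length L contains L*(L+1)/2 subarrays
total = sum(seg_len .* (seg_len + 1) / 2);
disp(total)   % 9
```

For the example, the segments are [1 2] and [3 4 5], giving 3 + 6 = 9 subarrays.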

How to do a matrix manipulation on Matlab?

I have a matrix A of size m x n and another matrix b of size 1 x n (in Matlab).
The matrix b is such that it consists of sequences of 1s, then sequences of 2s, then sequences of 3s, etc. up to some value k.
(For example b = [1 1 1 2 2 2 3 4 4], n = 9)
I want to take A, and for each row in A, choose the max in each segment, zeroing everything else in that subsequence.
So, for example, for a row A = [0 -1 2 3 4 1 3 4 5] I would get
[0 0 2 0 4 0 3 0 5]
If there are multiple rows in A (m > 1), this should happen for each row.
I can do it easily using for loops, but it works very slowly, because I loop both over m and n.
Is there a "oneliner" to do it in Matlab, or something simple that works fast?
If A is a single row, accumarray can do the job using an ad hoc function:
result = accumarray(b(:), A(:), [], @(x) {x==max(x)});
result = vertcat(result{:}).' .* A;
Not sure how fast this will be, since it uses cells.
If A has several rows, you can use a loop over the rows.
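A minimal sketch of such a loop over rows is shown below. It uses a slightly different formulation than the cell-based one above: accumarray with @max computes one maximum per segment, and each element is then compared against the maximum of its own segment. The second row of A and the names segmax and mask are made up for the illustration; ties within a segment are all kept.

```matlab
A = [0 -1 2 3 4 1  3  4  5;
     5  4 3 2 1 0 -1 -2 -3];
b = [1 1 1 2 2 2 3 4 4];

result = zeros(size(A));
for r = 1:size(A, 1)
    row = A(r, :);
    % Maximum of each segment, one entry per label in b
    segmax = accumarray(b(:), row(:), [], @max);
    % Keep an element only if it equals the maximum of its segment
    mask = row == reshape(segmax(b), 1, []);
    result(r, :) = row .* mask;
end
disp(result)
```

For the first row this gives [0 0 2 0 4 0 3 0 5], as in the question.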

matlab: eliminate elements from array

I have quite a big array. To make things simple, let's simplify it to:
A = [1 1 1 1 2 2 3 3 3 3 4 4 5 5 5 5 5 5 5 5];
So, there is a group of 1's (4 elements), 2's (2 elements), 3's (4 elements), 4's (2 elements) and 5's (8 elements). Now, I want to keep only the elements that belong to a group of 3 or more elements. So it will be like:
B = [1 1 1 1 3 3 3 3 5 5 5 5 5 5 5 5];
I was doing it using a for loop, scanning the 1's, 2's, 3's and so on separately, but it's extremely slow with big arrays...
Thanks for any suggestions how to do it in more efficient way :)
Art.
A general approach
If your vector is not necessarily sorted, then you need to count the number of occurrences of each element in the vector. You have histc just for that:
elem = unique(A);
counts = histc(A, elem);
B = A;
B(ismember(A, elem(counts < 3))) = []
The last line picks the elements that have less than 3 occurrences and deletes them.
An approach for a grouped vector
If your vector is "semi-sorted", that is if similar elements in the vector are grouped together (as in your example), you can speed things up a little by doing the following:
start_idx = find([true, diff(A) ~= 0]); % first index of each group (also works if A(1) is 0)
counts = diff([start_idx, numel(A) + 1]);
B = A;
B(ismember(A, A(start_idx(counts < 3)))) = []
Again, note that the vector need not be entirely sorted, only that similar elements are adjacent to each other.
Here is my two-liner
counts = accumarray(A', 1);
B = A(ismember(A, find(counts>=3)));
accumarray is used to count the individual members of A. find extracts the ones that meet your '3 or more elements' criterion. Finally, ismember tells you where they are in A. Note that A need not be sorted. Of course, accumarray only works here if the values in A are positive integers.
What you are describing is called run-length encoding.
There is software for this in Matlab on the FileExchange. Or you can do it directly as follows:
len = diff([ 0 find(A(1:end-1) ~= A(2:end)) length(A) ]);
val = A(logical([ A(1:end-1) ~= A(2:end) 1 ]));
Once you have your run-length encoding you can remove elements based on the length. i.e.
idx = (len>=3)
len = len(idx);
val = val(idx);
And then decode to get the array you want:
i = cumsum(len);
j = zeros(1, i(end));
j(i(1:end-1)+1) = 1;
j(1) = 1;
B = val(cumsum(j));
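Note that filtering by run length is not the same as filtering by total count when equal values are not all adjacent. The sketch below contrasts the two on a small made-up vector; run_id, run_len, B_count and B_run are illustrative names, and the run lengths are computed with cumsum and accumarray rather than with the encoding above.

```matlab
A = [1 1 2 1 1 1];

% Count-based filter: keep values that occur at least 3 times anywhere in A
counts = accumarray(A(:), 1);
B_count = A(ismember(A, find(counts >= 3)));

% Run-based filter: keep only runs of at least 3 equal adjacent elements
run_id = cumsum([true, diff(A) ~= 0]);   % 1 1 2 3 3 3
run_len = accumarray(run_id(:), 1);      % lengths 2, 1, 3
B_run = A(reshape(run_len(run_id) >= 3, 1, []));

disp(B_count)   % 1 1 1 1 1
disp(B_run)     % 1 1 1
```

For a grouped vector such as the one in the question, both filters give the same result.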
Here's another way to do it using matlab built-ins.
% Set up
A=[1 1 1 1 2 2 3 3 3 3 4 4 5 5 5 5 5];
threshold=2;
% Get the unique elements of the array
uniqueElements=unique(A);
% Count how many times each unique element occurs
counts=histc(A,uniqueElements);
% Write which elements should be kept
toKeep=uniqueElements(counts>threshold);
% Make a logical index
indexer=false(size(A));
for i=1:length(toKeep)
% For every unique element we want to keep select the indices in A that
% are equal
indexer=indexer|(toKeep(i)==A);
end
% Apply index
B=A(indexer);

Resources