MATLAB: eliminate elements from array

I have quite a big array. To keep things simple, let's reduce it to:
A = [1 1 1 1 2 2 3 3 3 3 4 4 5 5 5 5 5 5 5 5];
So there is a group of 1's (4 elements), 2's (2 elements), 3's (4 elements), 4's (2 elements) and 5's (8 elements). Now I want to keep only the columns that belong to a group of 3 or more elements. The result should look like:
B = [1 1 1 1 3 3 3 3 5 5 5 5 5 5 5 5];
I was doing it with a for loop, scanning the 1's, 2's, 3's and so on separately, but it's extremely slow with big arrays...
Thanks for any suggestions on how to do it more efficiently :)
Art.

A general approach
If your vector is not necessarily sorted, then you need to count the number of occurrences of each element in the vector. histc does just that:
elem = unique(A);
counts = histc(A, elem);
B = A;
B(ismember(A, elem(counts < 3))) = []
The last line picks the elements that have fewer than 3 occurrences and deletes them.
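As a variation on the same idea, here is a minimal sketch that counts occurrences through the third output of unique and accumarray, so it does not rely on histc. The name minCount is hypothetical, and the reshape is only there so the mask has the same orientation as A regardless of MATLAB version.

```matlab
A = [1 1 1 1 2 2 3 3 3 3 4 4 5 5 5 5 5 5 5 5];
minCount = 3;                                      % hypothetical name for the threshold
[~, ~, grp] = unique(A);                           % grp(k): index of A(k) among the distinct values
counts = accumarray(grp(:), 1);                    % occurrences of each distinct value
keep = reshape(counts(grp) >= minCount, size(A));  % true where the value occurs often enough
B = A(keep);                                       % keeps the 1's, 3's and 5's
```

Because the counting goes through group indices rather than the values themselves, this should also work when A holds non-integer values.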
An approach for a grouped vector
If your vector is "semi-sorted", that is if similar elements in the vector are grouped together (as in your example), you can speed things up a little by doing the following:
start_idx = find([true, diff(A) ~= 0]); % first index of each run (also correct when A(1) is 0)
counts = diff([start_idx, numel(A) + 1]);
B = A;
B(ismember(A, A(start_idx(counts < 3)))) = []
Again, note that the vector need not be entirely sorted; similar elements only have to be adjacent to each other.

Here is my two-liner
counts = accumarray(A', 1);
B = A(ismember(A, find(counts>=3)));
accumarray is used to count the individual members of A. find extracts the ones that meet your '3 or more elements' criterion. Finally, ismember tells you where they are in A. Note that A need not be sorted. Of course, accumarray used this way only works for positive integer values in A.

What you are describing is called run-length encoding.
There is software for this in Matlab on the FileExchange. Or you can do it directly as follows:
len = diff([ 0 find(A(1:end-1) ~= A(2:end)) length(A) ]);
val = A(logical([ A(1:end-1) ~= A(2:end) 1 ]));
Once you have the run-length encoding, you can remove runs based on their length, e.g.:
idx = (len>=3)
len = len(idx);
val = val(idx);
And then decode to get the array you want:
i = cumsum(len);
j = zeros(1, i(end));
j(i(1:end-1)+1) = 1;
j(1) = 1;
B = val(cumsum(j));
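The encode, filter and decode steps above can also be collapsed into a per-element mask. This is only a sketch, and it assumes repelem is available; the variable names starts and keep are illustrative.

```matlab
A = [1 1 1 1 2 2 3 3 3 3 4 4 5 5 5 5 5 5 5 5];
starts = find([true, diff(A) ~= 0]);      % first index of each run
len = diff([starts, numel(A) + 1]);       % length of each run
keep = logical(repelem(len >= 3, len));   % expand the per-run decision to one flag per element
B = A(keep);                              % runs shorter than 3 are dropped
```

Since the decision is made per run, a value that appears in one short run and one long run keeps only its long run.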

Here's another way to do it using matlab built-ins.
% Set up
A=[1 1 1 1 2 2 3 3 3 3 4 4 5 5 5 5 5];
threshold=2;
% Get the unique elements of the array
uniqueElements=unique(A);
% Count how many times each unique element occurs
counts=histc(A,uniqueElements);
% Write which elements should be kept
toKeep=uniqueElements(counts>threshold);
% Make a logical index
indexer=false(size(A));
for i=1:length(toKeep)
% For every unique element we want to keep select the indices in A that
% are equal
indexer=indexer|(toKeep(i)==A);
end
% Apply index
B=A(indexer);

Related

How to find the sum of every 3 elements in an array?

I have an array A=[a1,a2,a3, ..., aN] and I would like to take the sum of every 3 elements:
s1=a1+a2+a3
s2=a4+a5+a6
...
sM=a(N-2)+a(N-1)+aN
My solution:
k=size(A);
s=0;
for n=1:k
s(n)=s(n-2)+s(n-1)+s(n);
end
Error: Attempted to access s(2); index out of bounds because numel(s)=1.
How to fix it?
If you want to sum in blocks, for the general case when the number of elements of A is not necessarily a multiple of the block size, you can use accumarray:
A = [3 8 5 8 2 3 4 7 9 6 4]; % 11 elements
s = 3; % block size
result = accumarray(ceil((1:numel(A))/s).', A(:));
If you want a sliding sum with a given block size, you can use conv:
A = [3 8 5 8 2 3 4 7 9 6 4]; % 11 elements
s = 3; % block size
result = conv(A(:).', ones(1,s), 'valid');
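For comparison, the same sliding sum can be sketched with a cumulative sum: each window sum is the difference of two prefix sums. The names c and slidingSum are illustrative.

```matlab
A = [3 8 5 8 2 3 4 7 9 6 4]; % 11 elements
s = 3;                       % block size
c = cumsum([0, A]);          % c(k+1) holds the sum of A(1:k)
slidingSum = c(s+1:end) - c(1:end-s);  % sum of A(k:k+s-1) for every valid k
% slidingSum is [16 21 15 13 9 14 20 22 19], the same as the conv result
```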
You try to calculate s by using values from s. Don't you mean s(n)=A(n-2)+A(n-1)+A(n);? Also, size on its own returns one value per dimension, not just the length.
That being said, taking the two previous values n-2 and n-1 doesn't work for n = 1 and n = 2 (because indices must be positive). You have to decide how the first two values should be handled. Either treat elements that do not exist yet as 0:
k=size(A,2); % only the second dimension when A is 1xn, or use length(A)
s=zeros(1,k); % preallocate instead of appending each value, for better performance
s(1)=A(1);
s(2)=A(2)+A(1);
for n=3:k %start at 3
s(n)=A(n-2)+A(n-1)+A(n);
end
or make s two values shorter than A:
k=size(A,2);
s=zeros(1,k-2);
for n=1:k-2
s(n)=A(n)+A(n+1)+A(n+2);
end
You initialise s as a scalar with s = 0. Then you try and index it like an array, but it only has a single element.
Your current logic (if fixed) will calculate this:
s(1) = a(1)+a(2)+a(3)
s(2) = a(2)+a(3)+a(4)
...
% 's' will be 2 elements shorter than 'a'
So we need to be a bit wiser with the indexing to get what you describe, which is
s(1) = a(1)+a(2)+a(3)
s(2) = a(4)+a(5)+a(6)
...
% 's' will be a third as big as 'a'
You should pre-allocate s to the right size, like so:
k = numel(A); % Number of elements in 'A'
s = zeros( 1, k/3 ); % Output array, assuming 'k' is divisible by 3
for n = 0:3:k-3
s(n/3+1) = A(n+1) + A(n+2) + A(n+3);
end
You could do this in one line by reshaping the array to have 3 rows, then summing down each column. This assumes that the number of elements in A is divisible by 3:
s = sum( reshape( A, 3, [] ) );
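If the number of elements is not a multiple of the block size, one possible workaround is to pad with zeros before reshaping. This is a sketch with illustrative names (blockSize, pad, blocks), not part of the original answer.

```matlab
A = [3 8 5 8 2 3 4 7 9 6 4];          % 11 elements, not divisible by 3
blockSize = 3;
pad = mod(-numel(A), blockSize);      % number of zeros needed to complete the last block
blocks = reshape([A(:).', zeros(1, pad)], blockSize, []);  % one block per column
s = sum(blocks, 1);                   % [16 13 20 10]
```

Zero padding leaves the sums unchanged, so the last entry is simply the sum of the incomplete block.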

How to repeat every 3rd element of a vector?

I have a vector like this:
h = [1,2,3,4,5,6,7,8,9,10,11,12]
And I want to repeat every third element like so:
h_rep = [1,2,3,3,4,5,6,6,7,8,9,9,10,11,12,12]
How do I accomplish this elegantly in MATLAB? The actual arrays are huge, so ideally I don't want to write a for loop. Is there a vectorized way to do this?
One way to do this is to use the repelem function, introduced in R2015a, which repeats each element of a vector a given number of times. In this case, build a vector of repeat counts in which every third entry is 2 and all others are 1, then pass it to the function:
N = numel(h);
rep = ones(1, N);
rep(3:3:end) = 2;
h_rep = repelem(h, rep);
Using your example: h = 1 : 12, we thus get:
>> h_rep
h_rep =
1 2 3 3 4 5 6 6 7 8 9 9 10 11 12 12
If repelem is not available to you, then a clever use of cumsum may help. Note that after every three elements, the next one is a copy of the previous element. Take an indicator pattern [1 1 1 0], where 1 means "advance to the next element" and 0 means "copy the last value again". Applying cumsum to repeated copies of this pattern gives exactly the indices we need into h. Therefore, create a vector of ones of length N + floor(N / 3), where N = numel(h), to make room for the duplicated elements, then set every fourth element to 0 before applying cumsum:
N = numel(h);
rep = ones(1, N + floor(N / 3));
rep(4:4:end) = 0;
rep = cumsum(rep);
h_rep = h(rep);
Thus:
>> h_rep
h_rep =
1 2 3 3 4 5 6 6 7 8 9 9 10 11 12 12
One last suggestion (thanks to user @bremen_matt) would be to reshape your vector into a matrix so that it has 3 rows, duplicate the last row, then reshape the resulting duplicated matrix back to a single vector:
h_rep = reshape(h, 3, []);
h_rep = reshape([h_rep; h_rep(end,:)], 1, []);
We again get:
>> h_rep
h_rep =
1 2 3 3 4 5 6 6 7 8 9 9 10 11 12 12
Of course, the obvious caveat with the above code is that the length of vector h must be evenly divisible by 3.
(Modified according to rayryeng's correct observations)...
Another solution is to play around with the reshape function. If you reshape the matrix to a 3xn matrix first...
B = reshape(h,3,[])
And then copy the last row
B = [B;B(end,:)]
And finally vectorize the solution...
B(:).'
You can use just indexing:
h = [1,2,3,4,5,6,7,8,9,10,11,12]; % initial data
n = 3; % step for repetition
h_rep = h(ceil(n/(n+1):n/(n+1):end));
An index-based approach (using sort):
h_rep = h(sort([1:numel(h) 3:3:numel(h)]));
Or a slightly shorter syntax...
h_rep = h(sort([1:end 3:3:end]));
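A quick check of the index-based approach on a vector whose length is not a multiple of 3 (a sketch; the step n and the 11-element input are chosen only for illustration):

```matlab
h = 1:11;                                % length is not a multiple of 3
n = 3;                                   % repeat every n-th element
idx = sort([1:numel(h), n:n:numel(h)]);  % every index once, every n-th index twice
h_rep = h(idx);                          % 1 2 3 3 4 5 6 6 7 8 9 9 10 11
```

No reshaping is involved, so the trailing elements 10 and 11 are simply carried over.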
I think this will do it:
h = [1,2,3,4,5,6,7,8,9,10,11,12];
h0=kron(h,[1 1])
h_rep=h0(mod(1:length(h0),2)==0 | mod(1:length(h0),3)==2)
Answer:
1 2 3 3 4 5 6 6 7 8 9 9 10 11 12 12
Explanation:
After duplicating every element, you select only those that you want. You can extend this idea to duplicate every second and third element, and so on.

Finding indexes of maximum values of an array

How do I find the index of the 2 maximum values of a 1D array in MATLAB? Mine is an array with a list of different scores, and I want to print the 2 highest scores.
You can use sort, as @LuisMendo suggested:
[B,I] = sort(array,'descend');
This gives you the sorted version of your array in the variable B and the indexes of the original position in I sorted from highest to lowest. Thus, B(1:2) gives you the highest two values and I(1:2) gives you their indices in your array.
I'll go for an O(k*n) solution, where k is the number of maximum values you're looking for, rather than O(n log n):
x = [3 2 5 4 7 3 2 6 4];
y = x; %// make a copy of x because we're going to modify it
[~, m(1)] = max(y);
y(m(1)) = -Inf;
[~, m(2)] = max(y);
m =
5 8
This is only practical if k is less than log n. In fact, if k>=3 I would put it in a loop, which may offend the sensibilities of some. ;)
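The loop version for a general k might be sketched as follows (k = 3 is chosen only for illustration):

```matlab
x = [3 2 5 4 7 3 2 6 4];
k = 3;                      % how many maxima to find
y = x;                      % work on a copy because entries get overwritten
m = zeros(1, k);            % indices of the k largest values, largest first
for j = 1:k
    [~, m(j)] = max(y);     % position of the current maximum
    y(m(j)) = -Inf;         % exclude it from the next pass
end
% m is [5 8 3], i.e. the values 7, 6 and 5
```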
To get the indices of the two largest elements: use the second output of sort to get the sorted indices, and then pick the last two:
x = [3 2 5 4 7 3 2 6 4];
[~, ind] = sort(x);
result = ind(end-1:end);
In this case,
result =
8 5
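On newer releases there is also a built-in for this. The sketch below assumes maxk is available (R2017b or later); it returns the k largest values in descending order together with their indices.

```matlab
x = [3 2 5 4 7 3 2 6 4];
[vals, idx] = maxk(x, 2);   % two largest values and their positions
% vals is [7 6] and idx is [5 8]
```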

Vectorization - MATLAB

Given a vector
X = [1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3]
I would like to generate a vector such that
Y = [1 2 3 4 5 1 2 3 4 5 6 1 2 3 4 5]
So far what I have got is
idx = find(diff(X))
Y = [1:idx(1) 1:idx(2)-idx(1) 1:length(X)-idx(2)]
But I was wondering if there is a more elegant(robust) solution?
One approach with diff, find & cumsum for a generic case -
%// Initialize array of 1s with the same size as input array and an
%// intention of using cumsum on it after placing "appropriate" values
%// at "strategic" places for getting the final output.
out = ones(size(X))
%// Find starting indices of each "group", except the first group, and
%// by group here we mean run of identical numbers.
idx = find(diff(X))+1
%// Place differentiated and subtracted values of indices at starting locations
out(idx) = 1-diff([1 idx])
%// Perform cumulative summation for the final output
Y = cumsum(out)
Sample run -
X =
1 1 1 1 2 2 3 3 3 3 3 4 4 5
Y =
1 2 3 4 1 2 1 2 3 4 5 1 2 1
Just for fun, here is the customary bsxfun-based alternative solution -
%// Logical mask with each column of ones for presence of each group elements
mask = bsxfun(@eq,X(:),unique(X(:).')) %//'
%// Cumulative summation along columns and use masked values for final output
vals = cumsum(mask,1)
Y = vals(mask)
Here's another approach:
Y = sum(triu(bsxfun(@eq, X, X.')), 1);
This works as follows:
Compare each element with all others (bsxfun(...)).
Keep only comparisons with current or previous elements (triu(...)).
Count, for each element, how many comparisons are true (sum(..., 1)); that is, how many elements, up to and including the current one, are equal to the current one.
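The three steps above can be written without bsxfun on releases that support implicit expansion (R2016b or later, which is an assumption about your setup). A small sketch using the vector from the question:

```matlab
X = [1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3];
eqMat = (X == X.');     % eqMat(i,j) is true when X(i) equals X(j)
upper = triu(eqMat);    % keep comparisons with the current or earlier elements (i <= j)
Y = sum(upper, 1);      % per column: equal elements up to and including position j
% Y is [1 2 3 4 5 1 2 3 4 5 6 1 2 3 4 5]
```

Note that the comparison matrix has numel(X)^2 entries, so this is best suited to vectors of moderate length.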
Another method is to use the function unique, like this:
[unqX, ind] = unique(X, 'first') % ind holds the first index of each distinct value
Y = [1:ind(2)-ind(1), 1:ind(3)-ind(2), 1:length(X)-ind(3)+1]
Whether this is more elegant is up to you.
A more robust method will be:
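[unqX, ind] = unique(X, 'first'); % first index of each distinct value
ind = [ind(:).', numel(X) + 1]; % append a sentinel marking the end of the last group
Y = zeros(size(X));
for ii = 1:numel(unqX)
Y(ind(ii):ind(ii+1)-1) = 1:(ind(ii+1)-ind(ii));
end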

Find 10 most repeated elements in a vector in MATLAB

I'm supposed to find the 10 most repeated elements in a vector with n elements
(the elements are from 1-100).
Does anyone know how to do that?
I know how to find the single most repeated element in a vector, but I don't know how to find the 10 most repeated elements when n is unknown.
a = randi(10,1,100);
y = hist(a,1:max(a));
[~,ind] = sort(y,'descend');
out = ind(1:10);
For the number of occurrences, use y(ind(1:10)).
I had some doubts so I tested it many times, it seems to work.
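A similar sketch with accumarray, assuming (as in the question) that the elements are integers from 1 to 100. It also guards against vectors with fewer than 10 distinct values. The names nTop, topValues and topCounts are illustrative.

```matlab
a = [5 3 5 7 3 5 9 7 5 3];               % small example input
counts = accumarray(a(:), 1, [100 1]);   % counts(v) = number of times v occurs, for v = 1..100
[sortedCounts, values] = sort(counts, 'descend');
nTop = min(10, nnz(counts));             % at most 10, fewer if there are fewer distinct values
topValues = values(1:nTop).';            % [5 3 7 9]
topCounts = sortedCounts(1:nTop).';      % [4 3 2 1]
```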
You can use unique for that case. In my example, I have 4 distinct numbers and I want to pick the 2 with the most occurrences.
A = [1 1 3 3 1 1 2 2 1 1 1 2 3 3 3 4 4 4 4];
B = sort(A); % Required for the usage of unique below
[~,i1] = unique(B,'first');
[val,i2] = unique(B,'last');
[~,pos] = sort(i2-i1,'descend');
val(pos(1:2))
1 3
Replace val(pos(1:2)) with val(pos(1:10)) in your case to get the 10 most frequent values. To get the number of occurrences, you can use i1 and i2:
num = i2-i1+1;
num(pos(1:2))
ans =
7 5
Since you already know how to find the most repeated element, you could use the following algorithm:
Find the most repeated element of the vector
Remove the most repeated element from the vector
Repeat the process on the new vector to find the 2nd most repeated element
Continue until you have the 10 most repeated elements
The code would look something like:
count = 0;
values = [];
while count < 10
r = mode(Vector); % most frequent value (MATLAB function names are case-sensitive)
values = [values r]; % store most repeated values
Vector = Vector(find(Vector~=r));
count = count + 1;
end
Not efficient, but it'll get the job done

Resources