I have two arrays of unequal length, say A (the longer) and B (the shorter). I want to remove all elements from both A and B that meet a criterion: if a value in A is within ±0.1 of a value in B, remove that element from both A and B. Remove only as many values from A as from B, i.e. there can be non-unique elements. If there are multiple elements that could equivalently be removed from A and B, remove the smaller element of B first and the larger element of A first.
Example:
A = [ 1 2 3 3 4 ]
B = [ 3.1, 2.9, 5]
Then 3 and 3 are removed from A, and 3.1 and 2.9 are removed from B.
How do I do this in MATLAB?
You can use ismembertol:
A = [ 1 2 3 3 4 ];
B = [ 3.1, 2.9, 5];
Aind = ismembertol(A,B,0.1);
Bind = ismembertol(B,A,0.1);
A(Aind) = [];
B(Bind) = [];
ismembertol performs the comparison using a tolerance (0.1 in this case).
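Note that, by default, ismembertol scales the tolerance by the largest magnitude in the data, so 0.1 here effectively allows a wider match than ±0.1. If you want an absolute tolerance, a sketch using the 'DataScale' option (with a small slack for floating-point error) could be:
tol = 0.1 + 1e-10;                              % absolute tolerance with floating-point slack
Aind = ismembertol(A, B, tol, 'DataScale', 1);  % 'DataScale',1 disables the default scaling
Bind = ismembertol(B, A, tol, 'DataScale', 1);
A(Aind) = []   % expected: A = [1 2 4]
B(Bind) = []   % expected: B = 5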
A similar result can also be achieved with:
lim = 0.1+1e-10   % the 1e-10 slack avoids floating-point precision errors
Aind = any(abs(A-B.')<=lim,1)
Bind = any(abs(A-B.')<=lim,2)
A(Aind) = []
B(Bind) = []
Note that this second solution is not memory efficient: it is only suited to small arrays, since it creates a length(A)-by-length(B) matrix.
I want to find all multiples of a value in an array in MATLAB.
I found the functions mod and find, but these return the indices of elements,
not the elements themselves. Moreover, I wrote the following code:
x=[1 2 3 4];
if (mod(x,2)==0)
a=x;
end
but this does not work. How can I solve this problem?
It looks like you want to find all multiples of 2 (or of any number); you can achieve this using:
a = x( mod(x,2) == 0 ) ;
When you write a = x, a simply gets the whole of x = [1 2 3 4], regardless of whether (mod(x,2)==0) is true or false;
instead, you can store the result of (mod(x,2)==0) in a variable, e.g. val = (mod(x,2)==0), and then use it to pick the matching elements into a new array, as sketched below.
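A minimal sketch of that idea using logical indexing:
x = [1 2 3 4];
val = (mod(x,2) == 0)   % logical mask: [0 1 0 1]
a = x(val)              % keeps only the even elements: [2 4]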
Given a vector numberList = [ 1, 2, 3, 4, 5, 6]; and a number number = 2; you can find indices (position in a vector) of the numbers in the numberList that are a multiple of number using indices = find(mod(numberList, number) ==0);.
If necessary, you can display the list of these multiples by calling: multiples = numberList(indices).
multiples =
2 4 6
What I'm trying to do
I have an array of numbers:
>> A = [2 2 2 2 1 3 4 4];
And I want to find the array indices where each number can be found:
>> B = arrayfun(@(x) {find(A==x)}, 1:4);
In other words, this B should tell me:
>> for ii=1:4, fprintf('Item %d in location %s\n',ii,num2str(B{ii})); end
Item 1 in location 5
Item 2 in location 1 2 3 4
Item 3 in location 6
Item 4 in location 7 8
It's like the 2nd output argument of unique, but instead of the first (or last) occurrence, I want all the occurrences. I think this is called a reverse lookup (where the original key is the array index), but please correct me if I'm wrong.
How can I do it faster?
What I have above gives the correct answer, but it scales terribly with the number of unique values. For a real problem (where A has 10M elements with 100k unique values), even this stupid for loop is 100x faster:
>> B = cell(max(A),1);
>> for ii=1:numel(A), B{A(ii)}(end+1)=ii; end
But I feel like this can't possibly be the best way to do it.
We can assume that A contains only integers from 1 to the max (because if it doesn't, I can always pass it through unique to make it so).
That's a simple task for accumarray:
out = accumarray(A(:),(1:numel(A)).',[],@(x) {x})
out{1} = 5
out{2} = 3 4 2 1
out{3} = 6
out{4} = 8 7
However, accumarray suffers from not being stable (in the sense of unique's 'stable' option), so you might want to look into a stable version of accumarray if that's a problem.
The above solution also assumes A to be filled with integers, preferably with no gaps in between. If that is not the case, there is no way around a call to unique in advance:
A = [2.1 2.1 2.1 2.1 1.1 3.1 4.1 4.1];
[~,~,subs] = unique(A)
out = accumarray(subs(:),(1:numel(A)).',[],@(x) {x})
To sum up, the most generic solution, working with floats and returning a sorted output could be:
[~,~,subs] = unique(A)
[subs(:,end:-1:1), I] = sortrows(subs(:,end:-1:1)); % optional
vals = 1:numel(A);
vals = vals(I); % optional
out = accumarray(subs, vals, [], @(x) {x});
out{1} = 5
out{2} = 1 2 3 4
out{3} = 6
out{4} = 7 8
Benchmark
function [t] = bench()
% data
a = rand(100);
b = repmat(a,100);
A = b(randperm(10000));
% functions to compare
fcns = {
@() thewaywewalk(A(:).');
@() cst(A(:).');
};
% timeit
t = zeros(2,1);
for ii = 1:100
t = t + cellfun(@timeit, fcns);
end
format long
end
function out = thewaywewalk(A)
[~,~,subs] = unique(A);
[subs(:,end:-1:1), I] = sortrows(subs(:,end:-1:1));
idx = 1:numel(A);
out = accumarray(subs, idx(I), [], @(x) {x});
end
function out = cst(A)
[B, IX] = sort(A);
out = mat2cell(IX, 1, diff(find(diff([-Inf,B,Inf])~=0)));
end
0.444075509687511 % thewaywewalk
0.221888202987325 % CST-Link
Surprisingly, the version with the stable accumarray is faster than the unstable one, due to the fact that MATLAB prefers to work on sorted arrays.
This solution should work in O(N*log(N)) due to the sorting, but it is quite memory intensive (it requires 3x the amount of input memory):
[U, X] = sort(A);
B = mat2cell(X, 1, diff(find(diff([Inf,U,-Inf])~=0)));
I am curious about the performance though.
I am sure this question must be answered somewhere else but I can't seem to find the answer.
Given a matrix M, what is the most efficient/succinct way to return two matrices respectively containing the row and column indices of the elements of M.
E.g.
M = [1 5 ; NaN 2]
and I want
MRow = [1 1; 2 2]
MCol = [1 2; 1 2]
One way would be to do
[MRow, MCol] = find(ones(size(M)))
MRow = reshape(MRow, size(M))
MCol = reshape(MCol, size(M))
But this does not seem particular succinct nor efficient.
This essentially amounts to building a regular grid over the possible values of row and column indices. It can be achieved using meshgrid, which is more efficient than using find as it avoids building the matrix of ones and trying to "find" a result that is essentially already known.
M = [1 5 ; NaN 2];
[nRows, nCols] = size(M);
[MCol, MRow] = meshgrid(1:nCols, 1:nRows);
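For the example above, this should reproduce the requested matrices:
MRow =
     1     1
     2     2
MCol =
     1     2
     1     2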
Use meshgrid:
[mcol, mrow] = meshgrid(1:size(M,2),1:size(M,1))
I have an array in MATLAB. I numbered every entry in the array with a natural number, so I formed an equivalence relation on the array.
For example,
array = [1 2 3 5 6 7]
classes = [1 2 1 1 3 3].
I want to get a cell array whose i-th cell corresponds to the i-th entry of the initial array and contains the elements that are in the same class as that entry. For the example above, I would get:
{[1 3 5], [2], [1 3 5], [1 3 5], [6 7], [6 7]}
It can be done easily with a for-loop (a sketch of such a loop is shown below), but is there another solution? It would be good if it ran faster than O(n^2), where n is the size of the initial array.
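For reference, a straightforward (but slow) loop might look like this sketch:
classElements = cell(size(array));
for k = 1:numel(array)
    classElements{k} = array(classes == classes(k));   % all elements sharing entry k's class
end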
Edit.
The problem would be solved if I knew an approach to split a sorted array into cells containing the indices of equal elements in O(n).
array = [1 1 1 2 3 3]
groups = {[1 2 3], [4], [5 6]}
Not sure about complexity, but accumarray with cell output is useful for splitting up the array based on unique values of the classes:
data = sortrows([classes; array].',1)   % stable w.r.t. array
arrayPieces = accumarray(data(:,1),data(:,2)',[],@(x){x.'})
classElements = arrayPieces(classes).'
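For the example data from the question, this should reproduce the requested cell array; a self-contained sketch to verify:
array   = [1 2 3 5 6 7];
classes = [1 2 1 1 3 3];
data = sortrows([classes; array].', 1);            % sort by class, stable
arrayPieces   = accumarray(data(:,1), data(:,2), [], @(x){x.'});
classElements = arrayPieces(classes).'
% expected: {[1 3 5], [2], [1 3 5], [1 3 5], [6 7], [6 7]}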
Regarding splitting a sorted array into cells of indices:
>> array = [1 1 1 2 3 3]
>> arrayinds = accumarray(array',1:numel(array),[],@(x){x'})'   % transpose for rows
arrayinds =
[1x3 double] [4] [1x2 double]
>> arrayinds{:}
ans =
1 2 3
ans =
4
ans =
5 6
I don't know how to do this without for-loops entirely, but you can use a combination of sort, diff, and find to organize and partition the equivalence class identifiers. That'll give you a mostly vectorized solution, where the M-code level for-loop is O(n) with n being the number of classes, not the length of the whole input array. This should be pretty fast in practice.
Here's a rough example using some index munging. Be careful; there's probably an off-by-one edge case bug in there somewhere since I just banged this out.
function [eqVals,eqIx] = equivsets(a,x)
%EQUIVSETS Find indexes of equivalent values
[b,ix] = sort(x);
ixEdges = find(diff(b)); % identifies partitions between equiv classes
ix2 = [0 ixEdges numel(ix)];
eqVals = cell([1 numel(ix2)-1]);
eqIx = cell([1 numel(ix2)-1]);
% Map back to original input indexes and values
for i = 1:numel(ix2)-1
eqIx{i} = ix((ix2(i)+1):ix2(i+1));
eqVals{i} = a(eqIx{i});
end
I included the indexes in the output because they're often more useful than the values themselves. You'd call it like this.
% Get indexes of occurrences of each class
equivs = equivsets(array, classes)
% You can expand that to get equivalences for each input element
equivsByValue = equivs(classes)
It's a lot more efficient to build the lists for each class first and then expand them out to match the input indexes. Not only do you do the work just once, but when you use b = a(ix) indexing to expand a small cell array into a larger one, Matlab's copy-on-write optimization will end up reusing the memory for the underlying numeric mxArrays, so you get a more compact representation in memory.
This transformation pops up a lot when working with unique() or databases. For decision support systems and data warehouse style things I've worked with, it happens all over the place. I wish it were built in to Matlab. (And maybe it's been added to one of the db or timeseries toolboxes in recent years; I'm a few versions behind.)
Realistically, if performance of this is critical for your code, you might also look at dropping down to Java or C MEX functions and implementing it there. But if your data sets are low cardinality - that is, have a small number of classes/distinct values, like numel(unique(classes)) / numel(array) tends to be less than 0.1 or so - the M-code implementation will probably be just fine.
For the second question:
array = [1 1 1 2 3 3]; %// example data
Use diff to find the end of each run of equal values, and from that build the groups:
ind = [0 find(diff([array NaN])~=0)];
groups = arrayfun(@(n) ind(n)+1:ind(n+1), 1:numel(ind)-1, 'uni', 0);
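For the example array, ind becomes [0 3 4 6], so this should give groups = {[1 2 3], [4], [5 6]}, matching the question.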
Same approach using unique:
[~, ind] = unique(array, 'last');   % indices of the last occurrence of each value
ind = [0 ind(:).'];
groups = arrayfun(@(n) ind(n)+1:ind(n+1), 1:numel(ind)-1, 'uni', 0);
I haven't tested if the complexity is O(n), though.
I would like to compute the maximum of an N-by-N-by-...-by-N array and, more importantly, its coordinates, without specifying the number of dimensions.
For example, let's take:
A = [2 3];
B = [2 3; 3 4];
The function (let's call it MAXI) should return the following values for matrix A:
[fmax, coor] = MAXI(A)
fmax =
3
coor =
2
and for matrix B:
[fmax, coor] = MAXI(B)
fmax =
4
coor=
2 2
The main problem is not to develop code that works for one class of input in particular, but to develop code that works as quickly as possible for any input (including higher dimensions).
To find the absolute maximum, you'll have to convert your input matrix into a column vector first and find the linear index of the greatest element, and then convert it to the coordinates with ind2sub. This can be a little bit tricky though, because ind2sub requires specifying a known number of output variables. For that purpose we can employ cell arrays and comma-separated lists, like so:
[fmax, coor] = max(A(:));
if ~isvector(A)
    C = cell(1, ndims(A));
    [C{:}] = ind2sub(size(A), coor);
    coor = cell2mat(C);
end
EDIT: I've added an additional if statement that checks if the input is a matrix or a vector, and in case of the latter it returns the linear index itself as is.
In a function, it looks like so:
function [fmax, coor] = maxi(A)
[fmax, coor] = max(A(:));
if ~isvector(A)
    C = cell(1, ndims(A));
    [C{:}] = ind2sub(size(A), coor);
    coor = cell2mat(C);
end
Example
A = [2 3; 3 4];
[fmax, coor] = maxi(A)
fmax =
4
coor =
2 2