Reverse lookup with non-unique values - arrays

What I'm trying to do
I have an array of numbers:
>> A = [2 2 2 2 1 3 4 4];
And I want to find the array indices where each number can be found:
>> B = arrayfun(#(x) {find(A==x)}, 1:4);
In other words, this B should tell me:
>> for ii=1:4, fprintf('Item %d in location %s\n',ii,num2str(B{ii})); end
Item 1 in location 5
Item 2 in location 1 2 3 4
Item 3 in location 6
Item 4 in location 7 8
It's like the 2nd output argument of unique, but instead of the first (or last) occurrence, I want all the occurrences. I think this is called a reverse lookup (where the original key is the array index), but please correct me if I'm wrong.
How can I do it faster?
What I have above gives the correct answer, but it scales terribly with the number of unique values. For a real problem (where A has 10M elements with 100k unique values), even this stupid for loop is 100x faster:
>> B = cell(max(A),1);
>> for ii=1:numel(A), B{A(ii)}(end+1)=ii; end
But I feel like this can't possibly be the best way to do it.
We can assume that A contains only integers from 1 to the max (because if it doesn't, I can always pass it through unique to make it so).

That's a simple task for accumarray:
out = accumarray(A(:),(1:numel(A)).',[],#(x) {x}) %'
out{1} = 5
out{2} = 3 4 2 1
out{3} = 6
out{4} = 8 7
However accumarray suffers from not being stable (in the sense of unique's feature), so you might want to have a look here for a stable version of accumarray, if that's a problem.
Above solution also assumes A to be filled with integers, preferably with no gaps in between. If that is not the case, there is no way around a call of unique in advance:
A = [2.1 2.1 2.1 2.1 1.1 3.1 4.1 4.1];
[~,~,subs] = unique(A)
out = accumarray(subs(:),(1:numel(A)).',[],#(x) {x})
To sum up, the most generic solution, working with floats and returning a sorted output could be:
[~,~,subs] = unique(A)
[subs(:,end:-1:1), I] = sortrows(subs(:,end:-1:1)); %// optional
vals = 1:numel(A);
vals = vals(I); %// optional
out = accumarray(subs, vals , [],#(x) {x});
out{1} = 5
out{2} = 1 2 3 4
out{3} = 6
out{4} = 7 8
Benchmark
function [t] = bench()
%// data
a = rand(100);
b = repmat(a,100);
A = b(randperm(10000));
%// functions to compare
fcns = {
#() thewaywewalk(A(:).');
#() cst(A(:).');
};
% timeit
t = zeros(2,1);
for ii = 1:100;
t = t + cellfun(#timeit, fcns);
end
format long
end
function out = thewaywewalk(A)
[~,~,subs] = unique(A);
[subs(:,end:-1:1), I] = sortrows(subs(:,end:-1:1));
idx = 1:numel(A);
out = accumarray(subs, idx(I), [],#(x) {x});
end
function out = cst(A)
[B, IX] = sort(A);
out = mat2cell(IX, 1, diff(find(diff([-Inf,B,Inf])~=0)));
end
0.444075509687511 %// thewaywewalk
0.221888202987325 %// CST-Link
Surprisingly the version with stable accumarray is faster than the unstable one, due to the fact that Matlab prefers sorted arrays to work on.

This solution should work in O(N*log(N)) due sorting, but is quite memory intensive (requires 3x the amount of input memory):
[U, X] = sort(A);
B = mat2cell(X, 1, diff(find(diff([Inf,U,-Inf])~=0)));
I am curious about the performance though.

Related

Generate a matrix of combinations (permutation) without repetition (array exceeds maximum array size preference)

I am trying to generate a matrix, that has all unique combinations of [0 0 1 1], I wrote this code for this:
v1 = [0 0 1 1];
M1 = unique(perms([0 0 1 1]),'rows');
• This isn't ideal, because perms() is seeing each vector element as unique and doing:
4! = 4 * 3 * 2 * 1 = 24 combinations.
• With unique() I tried to delete all the repetitive entries so I end up with the combination matrix M1 →
only [4!/ 2! * (4-2)!] = 6 combinations!
Now, when I try to do something very simple like:
n = 15;
i = 1;
v1 = [zeros(1,n-i) ones(1,i)];
M = unique(perms(vec_1),'rows');
• Instead of getting [15!/ 1! * (15-1)!] = 15 combinations, the perms() function is trying to do
15! = 1.3077e+12 combinations and it's interrupted.
• How would you go about doing in a much better way? Thanks in advance!
You can use nchoosek to return the indicies which should be 1, I think in your heart you knew this must be possible because you were using the definition of nchoosek to determine the expected final number of permutations! So we can use:
idx = nchoosek( 1:N, k );
Where N is the number of elements in your array v1, and k is the number of elements which have the value 1. Then it's simply a case of creating the zeros array and populating the ones.
v1 = [0, 0, 1, 1];
N = numel(v1); % number of elements in array
k = nnz(v1); % number of non-zero elements in array
colidx = nchoosek( 1:N, k ); % column index for ones
rowidx = repmat( 1:size(colidx,1), k, 1 ).'; % row index for ones
M = zeros( size(colidx,1), N ); % create output
M( rowidx(:) + size(M,1) * (colidx(:)-1) ) = 1;
This works for both of your examples without the need for a huge intermediate matrix.
Aside: since you'd have the indicies using this approach, you could instead create a sparse matrix, but whether that's a good idea or not would depend what you're doing after this point.

Change diagonals of an array of matrices

I have an application with an array of matrices. I have to manipulate the diagonals several times. The other elements are unchanged. I want to do things like:
for j=1:nj
for i=1:n
g(i,i,j) = gd(i,j)
end
end
I have seen how to do this with a single matrix using logical(eye(n)) as a single index, but this does not work with an array of matrices. Surely there is a way around this problem. Thanks
Use a linear index as follows:
g = rand(3,3,2); % example data
gd = [1 4; 2 5; 3 6]; % example data. Each column will go to a diagonal
s = size(g); % size of g
ind = bsxfun(#plus, 1:s(1)+1:s(1)*s(2), (0:s(3)-1).'*s(1)*s(2)); % linear index
g(ind) = gd.'; % write values
Result:
>> g
g(:,:,1) =
1.000000000000000 0.483437118939645 0.814179952862505
0.154841697368116 2.000000000000000 0.989922194103104
0.195709075365218 0.356349047562417 3.000000000000000
g(:,:,2) =
4.000000000000000 0.585604389346560 0.279862618046844
0.802492555607293 5.000000000000000 0.610960767605581
0.272602365429990 0.551583664885735 6.000000000000000
Based on Luis Mendo's answer, a version that may perhaps be more easy to modify depending on one's specific purposes. No doubt his version will be more computationally efficient though.
g = rand(3,3,2); % example data
gd = [1 4; 2 5; 3 6]; % example data. Each column will go to a diagonal
sz = size(g); % Get size of data
sub = find(eye(sz(1))); % Find indices for 2d matrix
% Add number depending on location in third dimension.
sub = repmat(sub,sz(3),1); %
dim3 = repmat(0:sz(1)^2:prod(sz)-1, sz(1),1);
idx = sub + dim3(:);
% Replace elements.
g(idx) = gd;
Are we already playing code golf yet? Another slightly smaller and more readable solution
g = rand(3,3,2);
gd = [1 4; 2 5; 3 6];
s = size(g);
g(find(repmat(eye(s(1)),1,1,s(3))))=gd(:)
g =
ans(:,:,1) =
1.00000 0.35565 0.69742
0.85690 2.00000 0.71275
0.87536 0.13130 3.00000
ans(:,:,2) =
4.00000 0.63031 0.32666
0.33063 5.00000 0.28597
0.80829 0.52401 6.00000

Given two arrays A and B, how to get B values which are the closest to A

Suppose I have two arrays ordered in an ascending order, i.e.:
A = [1 5 7], B = [1 2 3 6 9 10]
I would like to create from B a new vector B', which contains only the closest values to A values (one for each).
I also need the indexes. So, in my example I would like to get:
B' = [1 6 9], Idx = [1 4 5]
Note that the third value is 9. Indeed 6 is closer to 7 but it is already 'taken' since it is close to 4.
Any idea for a suitable code?
Note: my true arrays are much larger and contain real (not int) values
Also, it is given that B is longer then A
Thanks!
Assuming you want to minimize the overall discrepancies between elements of A and matched elements in B, the problem can be written as an assignment problem of assigning to every row (element of A) a column (element of B) given a cost matrix C. The Hungarian (or Munkres') algorithm solves the assignment problem.
I assume that you want to minimize cumulative squared distance between A and matched elements in B, and use the function [assignment,cost] = munkres(costMat) by Yi Cao from https://www.mathworks.com/matlabcentral/fileexchange/20652-hungarian-algorithm-for-linear-assignment-problems--v2-3-:
A = [1 5 7];
B = [1 2 3 6 9 10];
[Bprime,matches] = matching(A,B)
function [Bprime,matches] = matching(A,B)
C = (repmat(A',1,length(B)) - repmat(B,length(A),1)).^2;
[matches,~] = munkres(C);
Bprime = B(matches);
end
Assuming instead you want to find matches recursively, as suggested by your question, you could either walk through A, for each element in A find the closest remaining element in B and discard it (sortedmatching below); or you could iteratively form and discard the distance-minimizing match between remaining elements in A and B until all elements in A are matched (greedymatching):
A = [1 5 7];
B = [1 2 3 6 9 10];
[~,~,Bprime,matches] = sortedmatching(A,B,[],[])
[~,~,Bprime,matches] = greedymatching(A,B,[],[])
function [A,B,Bprime,matches] = sortedmatching(A,B,Bprime,matches)
[~,ix] = min((A(1) - B).^2);
matches = [matches ix];
Bprime = [Bprime B(ix)];
A = A(2:end);
B(ix) = Inf;
if(not(isempty(A)))
[A,B,Bprime,matches] = sortedmatching(A,B,Bprime,matches);
end
end
function [A,B,Bprime,matches] = greedymatching(A,B,Bprime,matches)
C = (repmat(A',1,length(B)) - repmat(B,length(A),1)).^2;
[minrows,ixrows] = min(C);
[~,ixcol] = min(minrows);
ixrow = ixrows(ixcol);
matches(ixrow) = ixcol;
Bprime(ixrow) = B(ixcol);
A(ixrow) = -Inf;
B(ixcol) = Inf;
if(max(A) > -Inf)
[A,B,Bprime,matches] = greedymatching(A,B,Bprime,matches);
end
end
While producing the same results in your example, all three methods potentially give different answers on the same data.
Normally I would run screaming from for and while loops in Matlab, but in this case I cannot see how the solution could be vectorized. At least it is O(N) (or near enough, depending on how many equally-close matches to each A(i) there are in B). It would be pretty simple to code the following in C and compile it into a mex file, to make it run at optimal speed, but here's a pure-Matlab solution:
function [out, ind] = greedy_nearest(A, B)
if nargin < 1, A = [1 5 7]; end
if nargin < 2, B = [1 2 3 6 9 10]; end
ind = A * 0;
walk = 1;
for i = 1:numel(A)
match = 0;
lastDelta = inf;
while walk < numel(B)
delta = abs(B(walk) - A(i));
if delta < lastDelta, match = walk; end
if delta > lastDelta, break, end
lastDelta = delta;
walk = walk + 1;
end
ind(i) = match;
walk = match + 1;
end
out = B(ind);
You could first get the absolute distance from each value in A to each value in B, sort them and then get the first unique value to a sequence when looking down in each column.
% Get distance from each value in A to each value in B
[~, minIdx] = sort(abs(bsxfun(#minus, A,B.')));
% Get first unique sequence looking down each column
idx = zeros(size(A));
for iCol = 1:numel(A)
for iRow = 1:iCol
if ~ismember(idx, minIdx(iRow,iCol))
idx(iCol) = minIdx(iRow,iCol);
break
end
end
end
The result when applying idx to B
>> idx
1 4 5
>> B(idx)
1 6 9

Return matrices of row and column indices

I am sure this question must be answered somewhere else but I can't seem to find the answer.
Given a matrix M, what is the most efficient/succinct way to return two matrices respectively containing the row and column indices of the elements of M.
E.g.
M = [1 5 ; NaN 2]
and I want
MRow = [1 1; 2 2]
MCol = [1 2; 1 2]
One way would be to do
[MRow, MCol] = find(ones(size(M)))
MRow = reshape(MRow, size(M))
MCol = reshape(MCol, size(M))
But this does not seem particular succinct nor efficient.
This essentially amounts to building a regular grid over possible values of row and column indices. It can be achieved using meshgrid, which is more effective than using find as it avoids building the matrix of ones and trying to "find" a result that is essentially already known.
M = [1 5 ; NaN 2];
[nRows, nCols] = size(M);
[MCol, MRow] = meshgrid(1:nCols, 1:nRows);
Use meshgrid:
[mcol, mrow] = meshgrid(1:size(M,2),1:size(M,1))

matlab maximum of array with unknown dimension

I would like to compute the maximum and, more importantly, its coordinates of an N-by-N...by-N array, without specifying its dimensions.
For example, let's take:
A = [2 3];
B = [2 3; 3 4];
The function (lets call it MAXI) should return the following values for matrix A:
[fmax, coor] = MAXI(A)
fmax =
3
coor =
2
and for matrix B:
[fmax, coor] = MAXI(B)
fmax =
4
coor=
2 2
The main problem is not to develop a code that works for one class in particular, but to develop a code that as quickly as possible works for any input (with higher dimensions).
To find the absolute maximum, you'll have to convert your input matrix into a column vector first and find the linear index of the greatest element, and then convert it to the coordinates with ind2sub. This can be a little bit tricky though, because ind2sub requires specifying a known number of output variables. For that purpose we can employ cell arrays and comma-separated lists, like so:
[fmax, coor] = max(A(:));
if ismatrix(A)
C = cell(1:ndims(A));
[C{:}] = ind2sub(size(A), coor);
coor = cell2mat(C);
end
EDIT: I've added an additional if statement that checks if the input is a matrix or a vector, and in case of the latter it returns the linear index itself as is.
In a function, it looks like so:
function [fmax, coor] = maxi(x)
[fmax, coor] = max(A(:));
if ismatrix(A)
C = cell(1:ndims(A));
[C{:}] = ind2sub(size(A), coor);
coor = cell2mat(C);
end
Example
A = [2 3; 3 4];
[fmax, coor] = maxi(A)
fmax =
4
coor =
2 2

Resources