I have two matrices, A and B. (B is continuous like 1:n)
I need to find all the occurrences of each individual row of B in A, and store those row indices accordingly in cell array C. See below for an example.
A = [3,4,5;1,3,5;1,4,3;4,2,1]
B = [1;2;3;4;5]
Thus,
C = {[2,3,4];[4];[1,2,3];[1,3,4];[1,2]}
Note C does not need to be in a cell array for my application. I only suggest it because the row vectors of C are of unequal length. If you can suggest a work-around, this is fine too.
I've tried using a loop running ismember for each row of B, but this is too slow when the matrices A and B are huge, with around a million entries. Vectorized code is appreciated.
(To give you context, the purpose of this is to identify, in a mesh, those faces that are attached to a single vertex. Note I cannot use the function edgeattachments because my data are not of the form "TR" in triangulation representation. All I have is a list of faces and list of vertices.)
Well, the best answer for this would require knowledge of how A is filled. If A is sparse, that is, if it has few columns values and B is quite large, then I think the best way for memory saving may be using a sparse matrix instead of a cell.
% No fancy stuff, just fast and furious
bMax = numel(B);
nRows = size(A,1);
cLogical = sparse(nRows,bMax);
for curRow = 1:nRows
curIdx = A(curRow,:);
cLogical(curRow,curIdx) = 1;
end
Answer:
cLogical =
(2,1) 1
(3,1) 1
(4,1) 1
(4,2) 1
(1,3) 1
(2,3) 1
(3,3) 1
(1,4) 1
(3,4) 1
(4,4) 1
(1,5) 1
(2,5) 1
How to read the answer. For each column the rows show the indexes that the column index appears in A. That is 1 appears in rows [2 3 4], 2 appear in row [4], 3 rows [1 2 3], 4 row [1 3 4], 5 in row [1 2].
Then you can use cLogical instead of a cell as an indexing matrix in the future for your needs.
Another way would be to allocate C with the expected value for how many times an index should appear in C.
% Fancier solution using some assumed knowledge of A
bMax = numel(B);
nRows = size(A,1);
nColumns = size(A,2);
% Pre-allocating with the expected value, an attempt to reduce re-allocations.
% tic; for rep=1:10000; C = mat2cell(zeros(bMax,nColumns),ones(1,bMax),nColumns); end; toc
% Elapsed time is 1.364558 seconds.
% tic; for rep=1:10000; C = repmat({zeros(1,nColumns)},bMax,1); end; toc
% Elapsed time is 0.606266 seconds.
% So we keep the not fancy repmat solution
C = repmat({zeros(1,nColumns)},bMax,1);
for curRow = 1:nRows
curIdxMsk = A(curRow,:);
for curCol = 1:nColumns
curIdx = curIdxMsk(curCol);
fillIdx = ~C{curIdx};
if any(fillIdx)
fillIdx = find(fillIdx,1);
else
fillIdx = numel(fillIdx)+1;
end
C{curIdx}(fillIdx) = curRow;
end
end
% Squeeze empty indexes:
for curRow = 1:bMax
C{curRow}(~C{curRow}) = [];
end
Answer:
>> C{:}
ans =
2 3 4
ans =
4
ans =
1 2 3
ans =
1 3 4
ans =
1 2
Which solution will performs best? You do a performance test in your code because it depends on how big is A, bMax, the memory size of your computer and so on. Yet, I'm still curious with solutions other people can do for this x). I liked chappjc's solution although it has the cons that he has pointed out.
For the given example (10k times):
Solution 1: Elapsed time is 0.516647 seconds.
Solution 2: Elapsed time is 4.201409 seconds (seems that solution 2 is a bad idea hahaha, but since it was created to the specific issue of A having many rows it has to be tested in those conditions).
chappjc' solution: Elapsed time is 2.405341 seconds.
We can do it without making any assumptions about B. Try this use of bsxfun and mat2cell:
M = squeeze(any(bsxfun(#eq,A,permute(B,[3 2 1])),2)); % 4x3x1 #eq 1x1x5 => 4x3x5
R = sum(M); % 4x5 -> 1x5
[ii,jj] = find(M);
C = mat2cell(ii,R)
The cells in C above will be column vectors rather than rows as in your example. To make the cells contain row vectors, use C = mat2cell(ii',1,R)' instead.
My only concern is that mat2cell could be slow for millions of values of R, but if you want your output in a cell, I'm not sure how much better you can do. EDIT: If you can deal with a sparse matrix like in Werner's first solution with the loop, replace the last line of the above with the following:
>> Cs = sparse(ii,jj,1)
Cs =
(2,1) 1
(3,1) 1
(4,1) 1
(4,2) 1
(1,3) 1
(2,3) 1
(3,3) 1
(1,4) 1
(3,4) 1
(4,4) 1
(1,5) 1
(2,5) 1
Unfortunately, bsxfun will probably run out of memory if both size(A,1) and numel(B) are large! You may have to loop over the elements of A or B if memory becomes an issue. Here's one way to do it by looping over your vertexes in B:
for i=1:numel(B), C{i} = find(any(A==B(i),2)); end
Yup, that easy. Cell array growing is extremely fast in MATLAB as it similar to a sequence container that stores contiguous references to the data, rather than keeping the data itself contiguous. Perhaps ismember was the bottleneck in your test.
Related
I have a data set that consists of two 1800 x 900 x 3 x 3 arrays, which should each be interpreted as a 1800x900 array of 3x3 matrices. As part of the analysis, at one point I need to create another such array by, at each point of the 1800x900 array, multiplying the corresponding 3x3 matrices together.
There seem to be two ways to do this that I can think of. The obvious way is
C = zeros(size(A))
for i = 1:900
for j = 1:1800
C(i,j,:,:) = A(i,j,:,:)*B(i,j,:,:)
end
end
But that's quite a long loop and doesn't really take advantage of MATLAB's vectorization. The other way is
C = zeros(size(A))
for i = 1:3
for j = 1:3
for k = 1:3
C(:,:,i,j) = C(:,:,i,j) + A(:,:,i,k).*B(:,:,k,j)
end
end
end
where the large dimensions are getting vectorized and I'm basically using the for loops to implement the Einstein summation convention. This seems really inelegant, though, which makes me think there should be a better way. Is there?
Perfect job for bsxfun with permute:
C = permute(sum(bsxfun(#times, A, permute(B, [1 2 5 3 4])), 4), [1 2 3 5 4]);
In R2016b onwards you can avoid bsxfun thanks to implicit singleton expansion:
C = permute(sum(A .* permute(B, [1 2 5 3 4]), 4), [1 2 3 5 4]);
I have three matrices:
Values = [200x7] doubles
numOfStrings = [82 78 75 73 72 71 70] %(for example)
numOfColumns = [1 4]
The numOfColumns may contain any set of distinct values from 1 to 7. For example [1 2 3] or [4 7] or [1 5 6 7]. Obviously, biggest numOfColumns that can be is [1 2 3 4 5 6 7].
numOfColumns show the columns I want to get. numOfStrings shows the rows of that columns I need. I.e. in my example I want to get columns 1 and 4. So from the 1 column I want to get the 82 first rows and from the 4th get the 73 first rows.
i.e. if numOfColumns = [1 4] then
myCell{1} = Values( 1:82,1); % Values(numOfStrings(numOfColumn(1)), numOfColumn(1))
myCell{2} = Values( 1:73,4); % Values(numOfStrings(numOfColumn(2)), numOfColumn(2))
P.S. That's not necessary to save it into cell array. If you can offer any another solution Ill be grateful to you.
I'm looking for the fastest way to do this, which is likely to be by avoiding for loops and using vectorization.
I think a lot about sub2ind function. But I can't figure out how to return arrays of the different size! Because myCell{1} - [82x1] and myCell{2} - [73x1]. I suppose I can't use bsxfun or arrayfun.
RESULTS:
Using for loops alike #Will 's answer:
for jj = 1:numel(numOfColumns)
myCell{rowNumber(numOfColumns(jj)),numOfColumns(jj)} = Values( 1:numOfStrings(numOfColumns(jj)),numOfColumns(jj));
end
Elapsed time is 157 seconds
Using arrayfun alike #Yishai E 's answer:
myCell(sub2ind(size(myCell),rowNumber(numOfColumns),numOfColumns)) = arrayfun( #(nOC) Values( 1:numOfStrings(nOC),nOC), numOfColumns, 'UniformOutput', false);
Elapsed time is 179 seconds
Using bsxfun alike #rahnema1 's answer:
idx = bsxfun(#ge,numOfStrings , (1:200).');
extracted_values = Values (idx);
tempCell = mat2cell(extracted_values,numOfStrings);
myCell(sub2ind(size(myCell),rowNumber(numOfColumns),numOfColumns)) = myCell(numOfColumns)';
Elapsed time is 204 seconds
SO, I got a lot of working answers, and some of them are vectorized as I asked, but for loops still fastest!
This should solve your problem, using arrayfun, which "vectorizes" the application of the indexing function. Not really, but it doesn't call the interpreter for each entry from numOfColumns. Interestingly enough, this is slower than the non-vectorized code in the other answer! (for 1e5 entries, 0.95 seconds vs. 0.23 seconds...)
arrayfun(#(nOC)Values(1:numOfStrings(nOC), nOC), numOfColumns, 'UniformOutput', false)
% Get number of elements in NUMOFCOLUMNS
n = numel(numOfColumns);
% Set up output
myCell = cell(1,n);
% Loop through all NUMOFCOLUMNS values, storing to cell
for i = 1:n
myCell{i} = Values(1:numOfStrings(numOfColumns(i)), numOfColumns(i));
end
Which for your example gives output
myCell =
[82x1 double] [73x1 double]
You can create logical indices for extraction of the desired elements :
idx = bsxfun(#ge,numOfStrings , (1:200).');
that in MATLAB R2016b or Octave (thanks to broadcasting/expansion) can be written as:
idx = numOfStrings >= (1:200).';
extract values:
extracted_values = Values (idx);
then using mat2cell convert data to cell :
myCell = mat2cell(extracted_values,numOfStrings);
all in one line :
myCell = mat2cell(Values (numOfStrings >= (1:200).'), numOfStrings);
If you want to use different numOfColumns with different sizes to extract elements of the cell you can each time do this:
result = myCell(numOfColumns);
If both numOfStrings and numOfColumns change and you need to compute the result once do this:
%convert numOfColumns to logical index:
numcols_logical = false(1,7);
numcols_logical(numOfColumns) = true;
extracted_values = Values ((numOfStrings .* numcols_logical) >= (1:200).');
if you need cell array
result= mat2cell(extracted_values,numOfStrings(numcols_logical));
This question already has answers here:
Create a zero-filled 2D array with ones at positions indexed by a vector
(4 answers)
Closed 6 years ago.
I have a vector v of size (m,1) whose elements are integers picked from 1:n. I want to create a matrix M of size (m,n) whose elements M(i,j) are 1 if v(i) = j, and are 0 otherwise. I do not want to use loops, and would like to implement this as a simple vector-matrix manipulation only.
So I thought first, to create a matrix with repeated elements
M = v * ones(1,n) % this is a (m,n) matrix of repeated v
For example v=[1,1,3,2]'
m = 4 and n = 3
M =
1 1 1
1 1 1
3 3 3
2 2 2
then I need to create a comparison vector c of size (1,n)
c = 1:n
1 2 3
Then I need to perform a series of logical comparisons
M(1,:)==c % this results in [1,0,0]
.
M(4,:)==c % this results in [0,1,0]
However, I thought it should be possible to perform the last steps of going through each single row in compact matrix notation, but I'm stumped and not knowledgeable enough about indexing.
The end result should be
M =
1 0 0
1 0 0
0 0 1
0 1 0
A very simple call to bsxfun will do the trick:
>> n = 3;
>> v = [1,1,3,2].';
>> M = bsxfun(#eq, v, 1:n)
M =
1 0 0
1 0 0
0 0 1
0 1 0
How the code works is actually quite simple. bsxfun is what is known as the Binary Singleton EXpansion function. What this does is that you provide two arrays / matrices of any size, as long as they are broadcastable. This means that they need to be able to expand in size so that both of them equal in size. In this case, v is your vector of interest and is the first parameter - note that it's transposed. The second parameter is a vector from 1 up to n. What will happen now is the column vector v gets replicated / expands for as many values as there are n and the second vector gets replicated for as many rows as there are in v. We then do an eq / equals operator between these two arrays. This expanded matrix in effect has all 1s in the first column, all 2s in the second column, up until n. By doing an eq between these two matrices, you are in effect determining which values in v are equal to the respective column index.
Here is a detailed time test and breakdown of each function. I placed each implementation into a separate function and I also let n=max(v) so that Luis's first code will work. I used timeit to time each function:
function timing_binary
n = 10000;
v = randi(1000,n,1);
m = numel(v);
function luis_func()
M1 = full(sparse(1:m,v,1));
end
function luis_func2()
%m = numel(v);
%n = 3; %// or compute n automatically as n = max(v);
M2 = zeros(m, n);
M2((1:m).' + (v-1)*m) = 1;
end
function ray_func()
M3 = bsxfun(#eq, v, 1:n);
end
function op_func()
M4= ones(1,m)'*[1:n] == v * ones(1,n);
end
t1 = timeit(#luis_func);
t2 = timeit(#luis_func2);
t3 = timeit(#ray_func);
t4 = timeit(#op_func);
fprintf('Luis Mendo - Sparse: %f\n', t1);
fprintf('Luis Mendo - Indexing: %f\n', t2);
fprintf('rayryeng - bsxfun: %f\n', t3);
fprintf('OP: %f\n', t4);
end
This test assumes n = 10000 and the vector v is a 10000 x 1 vector of randomly distributed integers from 1 up to 1000. BTW, I had to modify Luis's second function so that the indexing will work as the addition requires vectors of compatible dimensions.
Running this code, we get:
>> timing_binary
Luis Mendo - Sparse: 0.015086
Luis Mendo - Indexing: 0.327993
rayryeng - bsxfun: 0.040672
OP: 0.841827
Luis Mendo's sparse code wins (as I expected), followed by bsxfun, followed by indexing and followed by your proposed approach using matrix operations. The timings are in seconds.
Assuming n equals max(v), you can use sparse:
v = [1,1,3,2];
M = full(sparse(1:numel(v),v,1));
What sparse does is build a sparse matrix using the first argument as row indices, the second as column indices, and the third as matrix values. This is then converted into a full matrix with full.
Another approach is to define the matrix containing initially zeros and then use linear indexing to fill in the ones:
v = [1,1,3,2];
m = numel(v);
n = 3; %// or compute n automatically as n = max(v);
M = zeros(m, n);
M((1:m) + (v-1)*m) = 1;
I think I've also found a way to do it, and it would be nice if somebody could tell me which of the methods shown is faster for very large vectors and matrices. The additional method I thought of is the following
M= ones(1,m)'*[1:n] == v * ones(1,n)
I have a 5 by 3 matrix, e.g the following:
A=[1 1 1; 2 2 2; 3 3 3; 4 4 4; 5 5 5]
I run a for loop:
for i = 1:5
AA = A(i)'*A(i);
end
My question is how to store each of the 5 (3 by 3) AA matrices?
Thanks.
You could pre-allocate enough memory to the AA matrix to hold all the results:
[r,c] = size(A); % get the rows and columns of A (r and c respectively)
AA = zeros(c,c,r); % pre-allocate memory to AA for all 5 products
% (so we have 5 3x3 arrays)
Now do almost the same loop as above BUT realize that A(i) in the above code only returns one element whereas you want the full row. So you want the data from row i but all columns which can be represented as 1:3 or just the colon :
for i=1:r
AA(:,:,i) = A(i,:)' * A(i,:);
end
In the above, A(i,:) is the ith row of A and we are setting all rows and columns in the third dimension (i) of AA to the result of the product.
Assuming, as in Geoff's answer, that you mean A(i,:)'*A(i,:) (to get 5 matrices of size 3x3 in your example), you can do it in one line with bsxfun and permute:
AA = bsxfun(#times, permute(A, [3 2 1]), permute(A, [2 3 1]));
(I'm also assuming that your matrices only contain real numbers, as in your example. If by ' you really mean conjugate transpose, you need to add a conj in the above).
I have 2 matrices A (nxm) and B (nxd) and want to multiply element-wise each column of A with a row of B. There are m columns in A and n 1xd vectors in B so the results are m nxd matrices. Then I want to sum(result_i, 1) to get m 1xd vectors, which I want to apply vertcat to get a mxd matrix. I'm doing this operations using for loop and it is slow because n and d are big. How can I vectorize this in matlab to make it faster? Thank you.
EDIT:
You're all right: I was confused by my own question. What I meant by "multiply element-wise each column of A with a row of B" is to multiply n elements of a column in A with the corresponding n rows of B. What I want to do with one column of A is as followed (and I repeat this for m columns of A, then vertcat the C's vector together to get an mxd matrix):
column_of_A =
3
3
1
B =
3 1 3 3
2 2 1 2
1 3 3 3
C = sum(diag(column_of_A)*B, 1)
16 12 15 18
You can vectorize your operation the following way. Note, however, that vectorizing comes at the cost of higher memory usage, so the solution may end up not working for you.
%# multiply nxm A with nx1xd B to create a nxmxd array
tmp = bsxfun(#times,A,permute(B,[1 3 2]));
%# sum and turn into mxd
out = squeeze(sum(tmp,1));
You may want to do everything in one line, which may help the Matlab JIT compiler to save on memory.
EDIT
Here's a way to replace the first line if you don't have bsxfun
[n,m] = size(A);
[n,d] = size(B);
tmp = repmat(A,[1 1 d]) .* repmat(permute(B,[1 3 2]),[1,m,1]);
It's ugly, but as far as I can see, it works. I'm not sure it will be faster than your loop though, plus, it has a large memory overhead. Anyway, here goes:
A_3D = repmat(reshape(A, size(A, 1), 1, size(A, 2)), 1, size(B, 2));
B_3D = repmat(B, [ 1 1 size(A, 2)]);
result_3D = sum(A_3D .* B_3D, 1);
result = reshape(result_3D, size(A, 2), size(B, 2))
What it does is: make A into a 3D matrix of size n x 1 x m, so one column in each index of the 3rd dimension. Then we repeat the matrix so we get an n x d x m matrix. We repeat B in the 3rd dimension as well. We then do a piecewise multiplication of all the elements and sum them. The resulting matrix is a 1 x d x m matrix. We reshape this into a m x d matrix.
I'm pretty sure I switched around the size of the dimensions a few times in my explanation, but I hope you get the general gist.
Multiplying with a diagonal matrix seems at least twice as fast, but I couldn't find a way to use diag, since it wants a vector or 2D matrix as input. I might try again later tonight, I feel there must be a faster way :).
[Edit] Split up the command in parts to at least make it a little bit readable.
This is the way I would do this:
sum(repmat(A,1,4).*B)
If you don't know the number of columns of B:
sum(repmat(A,1,size(B,2)).*B)