Short version
If I have a matrix like this:
1 2
3 4
In memory, is it stored as [1 2 3 4] or as [1 3 2 4]? In other words, are matrices more optimized for row or for column access?
Long version
I'm translating some code from Matlab to NumPy. I'm used to the C convention for multidimensional arrays (i.e. the last index varies most rapidly, matrices are stored by rows), which is the default for NumPy arrays. However, in Matlab code I see snippets like this all the time (for arranging several color images in a single multidimensional array):
images(:, :, :, i) = im
which looks suboptimal for the C convention and better suited to the FORTRAN convention (the first index varies most rapidly, matrices are stored by columns). So, is it correct that Matlab uses this second style and is better optimized for column operations?
Short answer: It is stored column-wise.
A = [1 2; 3 4];
A(:)   % returns [1; 3; 2; 4]
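As a minimal sketch of what column-wise storage means in practice (the variable names are illustrative, not from the original answer), linear indexing and reshape both walk down the columns first:

```matlab
A = [1 2; 3 4];

% Linear indexing follows memory order: down the first column, then the second.
second = A(2);             % element at row 2, column 1
third  = A(3);             % element at row 1, column 2

% reshape also reads and fills column by column.
flat = reshape(A, 1, []);  % 1x4 row vector in memory order
back = reshape(flat, 2, 2);
```

Here second is 3 and third is 2, and reshaping flat back to 2x2 reproduces A.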
In many cases, the performance can be much better if you do the calculations in the "correct order", and operate on full columns, and not rows.
A quick example:
%% Columns
a = rand(n);
b = zeros(n,1);
tic
for ii = 1:n
b = b + a(:,ii);
end
toc
Elapsed time is 0.252358 seconds.
%% Rows:
a = rand(n);
b = zeros(1,n);
tic
for ii = 1:n
b = b + a(ii,:);
end
toc
Elapsed time is 2.593381 seconds.
More than 10 times as fast when working on columns!
%% Columns
n = 4000;
a = rand(n);
b = zeros(n,1);
tic
for j = 1 : 10
for ii = 1:n
b = b + a(:,ii);
end
end
toc
%% Rows new:
a = rand(n);
b = zeros(1,n);
tic
for j = 1 : 10
for ii = 1:n
b = b + a(ii);
end
end
toc
%% Rows old:
a = rand(n);
b = zeros(1,n);
tic
for j = 1 : 10
for ii = 1:n
b = b + a(ii,:);
end
end
toc
Results:
Elapsed time is 1.53509 seconds.
Elapsed time is 1.03306 seconds.
Elapsed time is 3.4732 seconds.
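So the "Rows new" loop comes out slightly faster than the column loop, and the row slice a(ii,:) is clearly the slowest. Note, however, that a(ii) is a linear index that reads a single element rather than a row, so the first two timings are not a like-for-like comparison of row and column access.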
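A small sketch (a 3x3 matrix chosen purely for illustration) of what each of the three indexing expressions used in these loops actually selects:

```matlab
a = magic(3);            % any 3x3 matrix works here
ii = 2;

single_elem = a(ii);     % one subscript: linear index, a single scalar (same as a(2,1))
whole_row   = a(ii, :);  % two subscripts: the full second row
whole_col   = a(:, ii);  % two subscripts: the full second column

size(single_elem)        % 1 1
size(whole_row)          % 1 3
size(whole_col)          % 3 1
```

So b = b + a(ii) adds one scalar to every entry of b, which is a different computation from summing rows.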
Related
I have a matrix A, lists of indices is and js, and a list of values to add to A, ws. Originally I was simply iterating through A with a for loop:
for idx = 1:N
i = is(idx);
j = js(idx);
w = ws(idx);
A(i,j) = A(i,j) + w;
end
However, I would like to vectorize this to increase efficiency. I thought something simple like
A(is,js) = A(is,js) + ws
would work, and it does as long as the is and js don't repeat. Said differently, if I generate idx = sub2ind(size(A),is,js);, so long as idx has no repeat values, all is well. If it does, then only the last value is added, all previous values are left out. A concrete example:
A = zeros(3,3);
indices = [1,2,3,1];
additions = [5,5,5,5];
A(indices) = A(indices) + additions;
This results in the first column having values of 5,5,5, not 10,5,5.
This is a small example, but in my actual application the lists of indices are really long and filled with redundant values. I'm hoping to vectorize this to save time, so going through and eliminating redundancies isn't really an option. So my main question is, how do I add to a matrix from a given set of redundant indices? Alternatively, is there another way of working through this without any sort of iteration?
To emphasize a nice property of accumarray: it works directly with two subscripts (row and column), so no conversion to linear indices is needed.
With the example from Luis Mendo:
is = [2 3 3 1 1 2].';
js = [1 3 3 2 2 4].';
ws = [10 20 30 40 50 60].';
A3 = accumarray([is js],ws);
%% A3 =
%% 0 90 0 0
%% 10 0 0 60
%% 0 0 50 0
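If A already holds values, one possible way to fold the accumulated sums into it is to pass the target size as the third argument of accumarray, so the result has the same shape as A. A sketch, where the starting matrix A is made up for illustration:

```matlab
is = [2 3 3 1 1 2].';
js = [1 3 3 2 2 4].';
ws = [10 20 30 40 50 60].';

A = ones(3, 4);                            % hypothetical existing matrix
A = A + accumarray([is js], ws, size(A));  % repeated (i,j) pairs are summed first
```

The size argument matters when the largest subscripts in is and js do not reach the last row or column of A.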
If I understand correctly, you only need full(sparse(is, js, ws)). This works because sparse accumulates values at matching indices.
% Example data
is = [2 3 3 1 1 2];
js = [1 3 3 2 2 4];
ws = [10 20 30 40 50 60];
% With loop
N = numel(is);
A = zeros(max(is), max(js));
for idx = 1:N
i = is(idx);
j = js(idx);
w = ws(idx);
A(i,j) = A(i,j) + w;
end
% With `sparse`
A2 = full(sparse(is, js, ws));
% Check
isequal(A, A2)
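sparse sizes its result from the largest indices it sees, so when A already exists and may be larger, the dimensions can be given explicitly. A sketch under that assumption (the 4x5 size is made up for illustration):

```matlab
is = [2 3 3 1 1 2];
js = [1 3 3 2 2 4];
ws = [10 20 30 40 50 60];

A = zeros(4, 5);                         % hypothetical existing matrix
[m, n] = size(A);
A = A + full(sparse(is, js, ws, m, n));  % duplicates at the same (i,j) are added up
```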
I have a 3D matrix A that holds my data. At multiple locations, defined by the row and column indices in the matrix row_col_idx, I want to extract all data along the third dimension, as shown below:
A = cat(3,[1:3;4:6], [7:9;10:12],[13:15;16:18],[19:21;22:24]) %matrix(2,3,4)
row_col_idx=[1 1;1 2; 2 3];
idx = sub2ind(size(A(:,:,1)), row_col_idx(:,1),row_col_idx(:,2));
out=nan(size(A,3),size(row_col_idx,1));
for k=1:size(A,3)
temp=A(:,:,k);
out(k,:)=temp(idx);
end
out
The output of this code is as follows:
A(:,:,1) =
1 2 3
4 5 6
A(:,:,2) =
7 8 9
10 11 12
A(:,:,3) =
13 14 15
16 17 18
A(:,:,4) =
19 20 21
22 23 24
out =
1 2 6
7 8 12
13 14 18
19 20 24
The output is as expected. However, the actual A and row_col_idx are huge, so this code is computationally expensive. Is there a way to vectorize this code to avoid the loop and the temp matrix?
This can be vectorized using linear indexing and implicit expansion:
out = A( row_col_idx(:,1) + ...
(row_col_idx(:,2)-1)*size(A,1) + ...
(0:size(A,1)*size(A,2):numel(A)-1) ).';
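To see how that index is assembled, the three terms can be computed separately for the A from the question. The intermediate names below are introduced only for illustration, and the column-plus-row addition relies on implicit expansion (R2016b or later):

```matlab
A = cat(3, [1:3;4:6], [7:9;10:12], [13:15;16:18], [19:21;22:24]);
row_col_idx = [1 1; 1 2; 2 3];

% Linear index of each (row, col) pair within one 2-D page (column vector).
in_page = row_col_idx(:,1) + (row_col_idx(:,2)-1)*size(A,1);

% Offset of each page along the third dimension (row vector).
page_offset = 0:size(A,1)*size(A,2):numel(A)-1;

% Column + row expands to one index per (pair, page) combination.
idx = in_page + page_offset;
out = A(idx).';
```

For this A, in_page is [1; 3; 6] and page_offset is [0 6 12 18], so idx is a 3x4 matrix and out is the 4x3 result shown in the question.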
The above builds an indexing matrix as large as the output. If this is unacceptable due to memory limitations, it can be avoided by reshaping A:
sz = size(A); % store size A
A = reshape(A, [], sz(3)); % collapse first two dimensions
out = A(row_col_idx(:,1) + (row_col_idx(:,2)-1)*sz(1),:).'; % linear indexing along
% first two dims of A
A = reshape(A, sz); % reshape back A, if needed
A more efficient method is to loop over the rows of row_col_idx and index A directly with each (row, column) pair, taking everything along the third dimension at once. I compared the two methods on a large matrix, and as you can see the calculation is much faster.
For the A given in the question, it gives the same output.
A = rand([2,3,10000000]);
row_col_idx=[1 1;1 2; 2 3];
idx = sub2ind(size(A(:,:,1)), row_col_idx(:,1),row_col_idx(:,2));
out=nan(size(A,3),size(row_col_idx,1));
tic;
for k=1:size(A,3)
temp=A(:,:,k);
out(k,:)=temp(idx);
end
time1 = toc;
%% More efficient method:
out2 = nan(size(A,3),size(row_col_idx,1));
tic;
for jj = 1:size(row_col_idx,1)
out2(:,jj) = [A(row_col_idx(jj,1),row_col_idx(jj,2),:)];
end
time2 = toc;
fprintf('Time calculation 1: %e\n',time1);
fprintf('Time calculation 2: %e\n',time2);
Gives as output:
Time calculation 1: 1.954714e+01
Time calculation 2: 2.998120e-01
What is the fastest way of taking an array A and outputting both unique(A) [i.e. the set of unique array elements of A] and the multiplicity array, whose i-th entry is the multiplicity in A of the i-th entry of unique(A)?
That's a mouthful, so here's an example. Given A=[1 1 3 1 4 5 3], I want:
unique(A)=[1 3 4 5]
mult = [3 2 1 1]
This can be done with a tedious for loop, but I would like to know if there is a way to exploit the array nature of MATLAB.
uA = unique(A);
mult = histc(A,uA);
Alternatively:
uA = unique(A);
mult = sum(bsxfun(@eq, uA(:).', A(:)));
Benchmarking
N = 100;
A = randi(N,1,2*N); %// size 1 x 2*N
%// Luis Mendo, first approach
tic
for iter = 1:1e3;
uA = unique(A);
mult = histc(A,uA);
end
toc
%// Luis Mendo, second approach
tic
for iter = 1:1e3;
uA = unique(A);
mult = sum(bsxfun(@eq, uA(:).', A(:)));
end
toc
%'// chappjc
tic
for iter = 1:1e3;
[uA,~,ic] = unique(A); % uA(ic) == A
mult = accumarray(ic(:),1); % ic(:) is a column whether unique returns a row or a column
end
toc
Results with N = 100:
Elapsed time is 0.096206 seconds.
Elapsed time is 0.235686 seconds.
Elapsed time is 0.154150 seconds.
Results with N = 1000:
Elapsed time is 0.481456 seconds.
Elapsed time is 4.534572 seconds.
Elapsed time is 0.550606 seconds.
[uA,~,ic] = unique(A); % uA(ic) == A
mult = accumarray(ic(:),1); % ic(:) is a column whether unique returns a row or a column
accumarray is very fast. Unfortunately, unique gets slow with 3 outputs.
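Applied to the example from the question, the approach looks like this. The sketch uses ic(:) so that it does not depend on whether unique returns the index vector as a row or as a column:

```matlab
A = [1 1 3 1 4 5 3];

[uA, ~, ic] = unique(A);        % ic maps each element of A to its position in uA
mult = accumarray(ic(:), 1).';  % count how often each position occurs
```

This gives uA = [1 3 4 5] and mult = [3 2 1 1], as requested in the question.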
Late addition:
uA = unique(A);
mult = nonzeros(accumarray(A(:),1,[],@sum,0,true))
S = sparse(A,1,1);
[uA,~,mult] = find(S);
I've found this elegant solution in an old Newsgroup thread.
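On the example from the question this gives the expected counts. Note that the trick uses the values of A as row indices, so as a sketch it assumes A contains only positive integers:

```matlab
A = [1 1 3 1 4 5 3];

S = sparse(A, 1, 1);      % column vector; S(k) counts how often k occurs in A
[uA, ~, mult] = find(S);  % nonzero rows are the unique values, entries are the counts
```

Both outputs come back as column vectors: uA holds 1, 3, 4, 5 and mult holds 3, 2, 1, 1.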
Testing with the benchmark of Luis Mendo for N = 1000 :
Elapsed time is 0.228704 seconds. % histc
Elapsed time is 1.838388 seconds. % bsxfun
Elapsed time is 0.128791 seconds. % sparse
(On my machine, accumarray results in Error: Maximum variable size allowed by the program is exceeded.)
I have two matrices, A and B. (B is consecutive, like 1:n.)
I need to find all the occurrences of each individual row of B in A, and store those row indices accordingly in cell array C. See below for an example.
A = [3,4,5;1,3,5;1,4,3;4,2,1]
B = [1;2;3;4;5]
Thus,
C = {[2,3,4];[4];[1,2,3];[1,3,4];[1,2]}
Note C does not need to be in a cell array for my application. I only suggest it because the row vectors of C are of unequal length. If you can suggest a work-around, this is fine too.
I've tried using a loop running ismember for each row of B, but this is too slow when the matrices A and B are huge, with around a million entries. Vectorized code is appreciated.
(To give you context, the purpose of this is to identify, in a mesh, those faces that are attached to a single vertex. Note I cannot use the function edgeattachments because my data are not of the form "TR" in triangulation representation. All I have is a list of faces and list of vertices.)
Well, the best answer for this would require knowledge of how A is filled. If A is sparse, that is, if it has few column values and B is quite large, then I think the best way to save memory may be to use a sparse matrix instead of a cell.
% No fancy stuff, just fast and furious
bMax = numel(B);
nRows = size(A,1);
cLogical = sparse(nRows,bMax);
for curRow = 1:nRows
curIdx = A(curRow,:);
cLogical(curRow,curIdx) = 1;
end
Answer:
cLogical =
(2,1) 1
(3,1) 1
(4,1) 1
(4,2) 1
(1,3) 1
(2,3) 1
(3,3) 1
(1,4) 1
(3,4) 1
(4,4) 1
(1,5) 1
(2,5) 1
How to read the answer: for each column, the rows listed are the rows of A in which that column index appears. That is, 1 appears in rows [2 3 4], 2 appears in row [4], 3 in rows [1 2 3], 4 in rows [1 3 4], and 5 in rows [1 2].
Then you can use cLogical instead of a cell as an indexing matrix in the future for your needs.
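The loop can also be avoided when building the matrix, and individual columns can be turned back into row lists on demand. A sketch, assuming as in the question that the entries of A are valid indices into 1:numel(B); the names rowIdx and facesOfVertex3 are illustrative:

```matlab
A = [3,4,5; 1,3,5; 1,4,3; 4,2,1];
B = (1:5).';

nRows = size(A,1);
rowIdx = repmat((1:nRows).', 1, size(A,2));  % row number of every entry of A

% One nonzero per (row, vertex) pair. A vertex repeated within a row would
% sum above 1, so compare with zero to get a logical matrix.
cLogical = sparse(rowIdx(:), A(:), 1, nRows, numel(B)) > 0;

facesOfVertex3 = find(cLogical(:,3)).';      % rows of A that contain the value 3
```

For the example data, facesOfVertex3 is [1 2 3], matching the third entry of C in the question.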
Another way would be to allocate C with the expected value for how many times an index should appear in C.
% Fancier solution using some assumed knowledge of A
bMax = numel(B);
nRows = size(A,1);
nColumns = size(A,2);
% Pre-allocating with the expected value, an attempt to reduce re-allocations.
% tic; for rep=1:10000; C = mat2cell(zeros(bMax,nColumns),ones(1,bMax),nColumns); end; toc
% Elapsed time is 1.364558 seconds.
% tic; for rep=1:10000; C = repmat({zeros(1,nColumns)},bMax,1); end; toc
% Elapsed time is 0.606266 seconds.
% So we keep the not fancy repmat solution
C = repmat({zeros(1,nColumns)},bMax,1);
for curRow = 1:nRows
curIdxMsk = A(curRow,:);
for curCol = 1:nColumns
curIdx = curIdxMsk(curCol);
fillIdx = ~C{curIdx};
if any(fillIdx)
fillIdx = find(fillIdx,1);
else
fillIdx = numel(fillIdx)+1;
end
C{curIdx}(fillIdx) = curRow;
end
end
% Squeeze empty indexes:
for curRow = 1:bMax
C{curRow}(~C{curRow}) = [];
end
Answer:
>> C{:}
ans =
2 3 4
ans =
4
ans =
1 2 3
ans =
1 3 4
ans =
1 2
Which solution will perform best? You should run a performance test in your own code, because it depends on how big A and bMax are, the memory size of your computer, and so on. Still, I'm curious about what other solutions people can come up with. I liked chappjc's solution, although it has the cons that he has pointed out.
For the given example (10k times):
Solution 1: Elapsed time is 0.516647 seconds.
Solution 2: Elapsed time is 4.201409 seconds (so solution 2 looks like a bad idea here, but since it was created for the specific case of A having many rows, it has to be tested under those conditions).
chappjc's solution: Elapsed time is 2.405341 seconds.
We can do it without making any assumptions about B. Try this use of bsxfun and mat2cell:
M = squeeze(any(bsxfun(@eq,A,permute(B,[3 2 1])),2)); % 4x3x1 @eq 1x1x5 => 4x3x5
R = sum(M); % 4x5 -> 1x5
[ii,jj] = find(M);
C = mat2cell(ii,R)
The cells in C above will be column vectors rather than rows as in your example. To make the cells contain row vectors, use C = mat2cell(ii',1,R)' instead.
My only concern is that mat2cell could be slow for millions of values of R, but if you want your output in a cell, I'm not sure how much better you can do. EDIT: If you can deal with a sparse matrix like in Werner's first solution with the loop, replace the last line of the above with the following:
>> Cs = sparse(ii,jj,1)
Cs =
(2,1) 1
(3,1) 1
(4,1) 1
(4,2) 1
(1,3) 1
(2,3) 1
(3,3) 1
(1,4) 1
(3,4) 1
(4,4) 1
(1,5) 1
(2,5) 1
Unfortunately, bsxfun will probably run out of memory if both size(A,1) and numel(B) are large! You may have to loop over the elements of A or B if memory becomes an issue. Here's one way to do it by looping over your vertexes in B:
for i=1:numel(B), C{i} = find(any(A==B(i),2)); end
Yup, that easy. Growing a cell array is extremely fast in MATLAB, as it is similar to a sequence container that stores contiguous references to the data, rather than keeping the data itself contiguous. Perhaps ismember was the bottleneck in your test.
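If B is exactly 1:n, as stated in the question, accumarray can group the row numbers by vertex directly. This is an alternative sketch rather than part of the answer above; rowIdx is an illustrative name, and unique is used because accumarray does not guarantee the order in which values reach the function (it also drops duplicates if a row lists a vertex twice):

```matlab
A = [3,4,5; 1,3,5; 1,4,3; 4,2,1];
B = (1:5).';

rowIdx = repmat((1:size(A,1)).', 1, size(A,2));  % row number of every entry of A

% Group row numbers by the vertex value they belong to; each group becomes a cell.
C = accumarray(A(:), rowIdx(:), [numel(B) 1], @(r) {unique(r).'});
```

For the example data this reproduces the C from the question. Vertices that never occur in A end up as empty cells.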
I know that in MATLAB, in the 1D case, you can select elements with indexing such as a([1 5 3]), to return the 1st, 5th, and 3rd elements of a. I have a 2D array, and would like to select out individual elements according to a set of tuples I have. So I may want to get a(1,3), a(1,4), a(2,5) and so on. Currently the best I have is diag(a(tuples(:,1), tuples(:,2))), but this requires a prohibitive amount of memory for larger a and/or tuples. Do I have to convert these tuples into linear indices, or is there a cleaner way of accomplishing what I want without taking so much memory?
Converting to linear indices seems like a legitimate way to go:
indices = tuples(:, 1) + size(a,1)*(tuples(:,2)-1);
selection = a(indices);
Note that this is also implemented in the Matlab built-in function sub2ind, as in nate's answer:
a(sub2ind(size(a), tuples(:,1),tuples(:,2)))
however,
a = rand(50);
tuples = [1,1; 1,4; 2,5];
start = tic;
for ii = 1:1e4
indices = tuples(:,1) + size(a,1)*(tuples(:,2)-1);
end
time1 = toc(start);
start = tic;
for ii = 1:1e4
sub2ind(size(a),tuples(:,1),tuples(:,2));
end
time2 = toc(start);
round(time2/time1)
which gives
ans =
38
so although sub2ind is easier on the eyes, it's also ~40 times slower. If you have to do this operation often, choose the method above. Otherwise, use sub2ind to improve readability.
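One thing to weigh against the speed difference: sub2ind validates its subscripts, while the hand-written formula does not. A small sketch with a deliberately out-of-range row (values chosen for illustration):

```matlab
a = magic(3);
row = 4;  col = 1;   % row 4 does not exist in a 3x3 matrix

% The manual formula yields linear index 4, which silently refers to a(1,2).
manual = row + size(a,1)*(col-1);
silently_wrong = a(manual);

% sub2ind rejects the same subscripts with an error.
try
    sub2ind(size(a), row, col);
    raised = false;
catch
    raised = true;
end
```

If the tuples are known to be valid, the manual formula is safe; otherwise the extra checking may be worth the cost.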
If x and y are vectors holding the row and column subscripts of the elements you want from matrix a, then sub2ind should solve your problem:
a(sub2ind(size(a),x,y))
For example
a=magic(3)
a =
8 1 6
3 5 7
4 9 2
x = [3 1];
y = [1 2];
a(sub2ind(size(a),x,y))
ans =
4 1
You can reference a position in a 2D Matlab matrix with a single (linear) index, as in:
a = [3 4 5;
6 7 8;
9 10 11;];
a(1)   % 3
a(2)   % 6
a(6)   % 10
So you can compute the positions in the matrix like this:
a([(col1-1)*(rowMax)+row1, (col2-1)*(rowMax)+row2, (col3-1)*(rowMax)+row3])
(note: rowMax is 3 in this case)
which will give you a list of the elements at row1/col1, row2/col2 and row3/col3.
so if
row1 = col1 = 1
row2 = col2 = 2
row3 = col3 = 3
you will get:
[3, 7, 11]
back.
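Written out as runnable code, with the subscripts collected into vectors (rows, cols and picked are illustrative names):

```matlab
a = [3 4 5;
     6 7 8;
     9 10 11];

rows = [1 2 3];
cols = [1 2 3];
rowMax = size(a, 1);                 % 3 in this case

picked = a((cols-1)*rowMax + rows);  % one element per (row, col) pair
```

Here picked is [3 7 11], the diagonal of a.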