How to quickly get the array of multiplicities

What is the fastest way to take an array A and output both unique(A) [i.e. the set of unique elements of A] and the multiplicity array, whose i-th entry is the number of times the i-th entry of unique(A) occurs in A?
That's a mouthful, so here's an example. Given A=[1 1 3 1 4 5 3], I want:
unique(A)=[1 3 4 5]
mult = [3 2 1 1]
This can be done with a tedious for loop, but I would like to know if there is a way to exploit the array nature of MATLAB.

uA = unique(A);
mult = histc(A,uA);
Alternatively:
uA = unique(A);
mult = sum(bsxfun(@eq, uA(:).', A(:)));
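To see what the bsxfun variant computes, here is a small sketch on the example array from the question. The intermediate name eqMat is only illustrative. Each column of eqMat marks where one unique value occurs in A, so the column sums are the multiplicities.

```matlab
A = [1 1 3 1 4 5 3];
uA = unique(A);                        % [1 3 4 5]
% eqMat(r,c) is true when A(r) equals uA(c); its size is numel(A) x numel(uA)
eqMat = bsxfun(@eq, uA(:).', A(:));
mult = sum(eqMat, 1);                  % column sums: [3 2 1 1]
```

The comparison matrix has numel(A)*numel(uA) entries, so its memory use and run time grow with the product of the two lengths.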
Benchmarking
N = 100;
A = randi(N,1,2*N); %// size 1 x 2*N
%// Luis Mendo, first approach
tic
for iter = 1:1e3;
uA = unique(A);
mult = histc(A,uA);
end
toc
%// Luis Mendo, second approach
tic
for iter = 1:1e3;
uA = unique(A);
mult = sum(bsxfun(@eq, uA(:).', A(:)));
end
toc
%// chappjc
tic
for iter = 1:1e3;
[uA,~,ic] = unique(A); % uA(ic) == A
mult = accumarray(ic(:),1); %// ic(:) is a column whichever orientation unique returns
end
toc
Results with N = 100:
Elapsed time is 0.096206 seconds.
Elapsed time is 0.235686 seconds.
Elapsed time is 0.154150 seconds.
Results with N = 1000:
Elapsed time is 0.481456 seconds.
Elapsed time is 4.534572 seconds.
Elapsed time is 0.550606 seconds.

[uA,~,ic] = unique(A); % uA(ic) == A
mult = accumarray(ic(:),1); %// ic(:) is a column whichever orientation unique returns
accumarray is very fast. Unfortunately, unique gets slow with 3 outputs.
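As a worked illustration of the idea, here is a minimal sketch on the array from the question. The third output of unique maps every element of A to its position in uA, and accumarray then counts how often each position occurs.

```matlab
A = [1 1 3 1 4 5 3];
[uA, ~, ic] = unique(A);       % ic maps each element of A to its position in uA
% ic(:) is [1 1 2 1 3 4 2].'; accumarray adds a 1 into bin ic(k) for every k
mult = accumarray(ic(:), 1);   % column vector [3; 2; 1; 1]
```

The result is a column vector; transpose it if a row matching uA is preferred.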
Late addition:
uA = unique(A);
mult = nonzeros(accumarray(A(:),1,[],@sum,0,true))

S = sparse(A,1,1);
[uA,~,mult] = find(S);
I've found this elegant solution in an old Newsgroup thread.
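The trick relies on sparse(i,j,v) adding together values that share the same subscripts, so each value of A acts as a row index whose count accumulates. It assumes A contains only positive integers. A small sketch on the question's example:

```matlab
A = [1 1 3 1 4 5 3];
S = sparse(A, 1, 1);        % 5x1 sparse column; duplicate indices are summed
[uA, ~, mult] = find(S);    % nonzero row indices and their accumulated counts
% uA is [1; 3; 4; 5] and mult is [3; 2; 1; 1]
```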
Testing with the benchmark of Luis Mendo for N = 1000 :
Elapsed time is 0.228704 seconds. % histc
Elapsed time is 1.838388 seconds. % bsxfun
Elapsed time is 0.128791 seconds. % sparse
(On my machine, accumarray results in Error: Maximum variable size allowed by the program is exceeded.)

Related

Inserting One Row Each Time in a Sequence from Matrix into Another Matrix After Every nth Row in Matlab

I have matrix A and matrix B. Matrix A is 100*3. Matrix B is 10*3. I need to insert one row from matrix B each time in a sequence into matrix A after every 10th row. The result would be Matrix A with 110*3. How can I do this in Matlab?
Here's another indexing-based approach:
n = 10;
C = [A; B];
[~, ind] = sort([1:size(A,1) n*(1:size(B,1))+.5]);
C = C(ind,:);
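To make the sort keys concrete, here is the same idea on a deliberately tiny case. The inputs are hypothetical: six rows in A, three in B, and one row of B inserted after every second row of A.

```matlab
n = 2;                         % insert one row of B after every 2 rows of A
A = (1:6).';                   % hypothetical small inputs
B = [100; 200; 300];
C = [A; B];
% Rows of A get keys 1..6; row k of B gets key n*k + 0.5, i.e. 2.5, 4.5, 6.5
[~, ind] = sort([1:size(A,1), n*(1:size(B,1)) + 0.5]);
C = C(ind, :);                 % [1 2 100 3 4 200 5 6 300].'
```

Because every key of B falls strictly between two consecutive integer keys of A, the sort places each row of B directly after its block.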
For canonical purposes, here's how you'd do it via loops. This is a bit inefficient since you're mutating the array at each iteration, but it's really simple to read. Given that your two matrices are stored in A (100 x 3) and B (10 x 3), you would do:
out = [];
for idx = 1 : 10
out = [out; A((idx-1)*10 + 1 : 10*idx,:); B(idx,:)];
end
At each iteration, we pick out 10 rows of A and 1 row of B and we concatenate these 11 rows onto out. This happens 10 times, resulting in 110 rows with 3 columns.
Here's an index-based approach:
%//pre-allocate output matrix
matrixC = zeros(110, 3);
%//create index array for the locations in matrixC that would be populated by matrixB
idxArr = (1:10) * 11;
%//place matrixB into matrixC
matrixC(idxArr,:) = matrixB;
%//place matrixA into matrixC
%//setdiff is used to exclude indexes already populated by values from matrixB
matrixC(setdiff(1:110, idxArr),:) = matrixA;
And just for fun here's the same approach sans magic numbers:
%//define how many rows to take from matrixA at once
numRows = 10;
%//get dimensions of input matrices
lengthA = size(matrixA, 1);
lengthB = size(matrixB, 1);
matrixC = zeros(lengthA + lengthB, 3);
idxArr = (1:lengthB) * (numRows + 1);
matrixC(idxArr,:) = matrixB;
matrixC(setdiff(1:size(matrixC, 1), idxArr),:) = matrixA;
Just for fun... Now with more robust test matrices!
A = ones(3, 100);
A(:) = 1:300;
A = A.'
B = ones(3, 10);
B(:) = 1:30;
B = B.' + 1000
C = reshape(A.', 3, 10, []);
C(:,end+1,:) = permute(B, [2 3 1]);
D = permute(C, [2 3 1]);
E = reshape(D, 110, 3)
Input:
A =
1 2 3
4 5 6
7 8 9
10 11 12
13 14 15
16 17 18
19 20 21
22 23 24
25 26 27
28 29 30
31 32 33
34 35 36
...
B =
1001 1002 1003
1004 1005 1006
...
Output:
E =
1 2 3
4 5 6
7 8 9
10 11 12
13 14 15
16 17 18
19 20 21
22 23 24
25 26 27
28 29 30
1001 1002 1003
31 32 33
34 35 36
...
Thanks to @Divakar for pointing out my previous error.
Solution Code
Here's an implementation based on logical indexing (also known as masking), which should be quite efficient when working with large arrays -
%// Number of rows of A between consecutive inserted rows of B
A_cutrow = 10;
%// Get sizes of A and B
[M,d] = size(A);
N = size(B,1);
%// Mask of row indices where rows from A would be placed
mask_idx = reshape([true(A_cutrow,M/A_cutrow) ; false(1,N)],[],1);
%// Pre-allocate with zeros:
%// http://undocumentedmatlab.com/blog/preallocation-performance
out(M+N,d) = 0;
%// Insert A and B using mask and ~mask
out(mask_idx,:) = A;
out(~mask_idx,:) = B;
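The shape of the mask is easier to see on a tiny case. The sketch below uses hypothetical inputs (six rows in A, three in B, a block length of 2) and assumes, as the code above does, that size(B,1) equals M/A_cutrow.

```matlab
A = (1:6).';  B = [100; 200; 300];   % hypothetical small inputs
A_cutrow = 2;                        % rows of A between inserted rows of B
[M, d] = size(A);
N = size(B, 1);
% Each column of the 3x3 block is [true; true; false]: two rows of A, one of B
mask_idx = reshape([true(A_cutrow, M/A_cutrow); false(1, N)], [], 1);
out = zeros(M + N, d);
out(mask_idx, :) = A;
out(~mask_idx, :) = B;               % out is [1 2 100 3 4 200 5 6 300].'
```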
Benchmarking
%// Setup inputs
A = rand(100000,3);
B = rand(10000,3);
A_cutrow = 10;
num_iter = 200; %// Number of iterations to be run for each approach
%// Warm up tic/toc.
for k = 1:50000
tic(); elapsed = toc();
end
disp(' ------------------------------- With MASKING')
tic
for iter = 1:num_iter
[M,d] = size(A);
N = size(B,1);
mask_idx = reshape([true(A_cutrow,M/A_cutrow) ; false(1,N)],[],1);
out(M+N,d) = 0;
out(mask_idx,:) = A;
out(~mask_idx,:) = B;
clear out
end
toc, clear mask_idx N M d iter
disp(' ------------------------------- With SORT')
tic
for iter = 1:num_iter
C = [A; B];
[~, ind] = sort([1:size(A,1) A_cutrow*(1:size(B,1))+.5]);
C = C(ind,:);
end
toc, clear C ind iter
disp(' ------------------------------- With RESHAPE+PERMUTE')
tic
for iter = 1:num_iter
[M,d] = size(A);
N = size(B,1);
C = reshape(A.', d, A_cutrow , []);
C(:,end+1,:) = permute(B, [2 3 1]);
D = permute(C, [2 1 3]);
out = reshape(permute(D,[1 3 2]),M+N,[]);
end
toc, clear out D C N M d iter
disp(' ------------------------------- With SETDIFF')
tic
for iter = 1:num_iter
lengthA = size(A, 1);
lengthB = size(B, 1);
matrixC = zeros(lengthA + lengthB, 3);
idxArr = (1:lengthB) * (A_cutrow + 1);
matrixC(idxArr,:) = B;
matrixC(setdiff(1:size(matrixC, 1), idxArr),:) = A;
end
toc, clear matrixC idxArr lengthA lengthB
disp(' ------------------------------- With FOR-LOOP')
tic
for iter = 1:num_iter
[M,d] = size(A);
N = size(B,1);
Mc = M/A_cutrow;
out(M+N,d) = 0;
for idx = 1 : Mc
out( 1+(idx-1)*(A_cutrow +1): idx*(A_cutrow+1), :) = ...
[A( 1+(idx-1)*A_cutrow : idx*A_cutrow , : ) ; B(idx,:)];
end
clear out
end
toc
Runtimes
Case #1: A as 100 x 3 and B as 10 x 3
------------------------------- With MASKING
Elapsed time is 4.987088 seconds.
------------------------------- With SORT
Elapsed time is 5.056301 seconds.
------------------------------- With RESHAPE+PERMUTE
Elapsed time is 5.170416 seconds.
------------------------------- With SETDIFF
Elapsed time is 35.063020 seconds.
------------------------------- With FOR-LOOP
Elapsed time is 12.118992 seconds.
Case #2: A as 100000 x 3 and B as 10000 x 3
------------------------------- With MASKING
Elapsed time is 1.167707 seconds.
------------------------------- With SORT
Elapsed time is 2.667149 seconds.
------------------------------- With RESHAPE+PERMUTE
Elapsed time is 2.603110 seconds.
------------------------------- With SETDIFF
Elapsed time is 3.153900 seconds.
------------------------------- With FOR-LOOP
Elapsed time is 19.822912 seconds.
Please note that num_iter was different for these two cases; the idea was to keep each runtime above the 1-second mark to offset tic-toc overhead.

How to delete zeros from matrix in MATLAB?

Here is my problem:
I have an n×n matrix in MATLAB. I want to delete all the zeros of this matrix and store its rows as separate vectors. For n=4, let's say I have the following matrix:
A = [ 1 1 0 0
1 2 0 0
1 0 0 0
1 2 1 0 ];
How to get the following:
v1 = [ 1 1 ];
v2 = [ 1 2 ];
v3 = [ 1 ];
v4 = [ 1 2 1 ];
I did the following:
for i = 1:size(A, 1)
tmp = A(i, :);
tmp(A(i, :)==0)=[];
v{i} = tmp;
end
Slightly faster than Divakar's answer:
nzv = arrayfun(@(n) nonzeros(A(n,:)), 1:size(A,1), 'uniformoutput', false);
Benchmarking
Small matrix
A = randi([0 3],100,200);
repetitions = 1000;
tic
for count = 1:repetitions
nzv = cellfun(@(x) nonzeros(x),mat2cell(A,ones(1,size(A,1)),size(A,2)),'uni',0);
end
toc
tic
for count = 1:repetitions
nzv = arrayfun(@(n) nonzeros(A(n,:)), 1:size(A,1), 'uniformoutput', false);
end
toc
Elapsed time is 3.017757 seconds.
Elapsed time is 2.025967 seconds.
Large matrix
A = randi([0 3],1000,2000);
repetitions = 100;
Elapsed time is 11.483947 seconds.
Elapsed time is 5.563153 seconds.
Convert the matrix to a cell array with one cell per row, apply nonzeros to each cell to drop the zeros, and finally store the results in separate variables.
Code
nzv = cellfun(@(x) nonzeros(x),mat2cell(A,ones(1,size(A,1)),size(A,2)),'uni',0)
[v1,v2,v3,v4] = nzv{:}
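Note that nonzeros returns a column vector, whereas the question asks for row vectors. A minimal sketch of a variant that keeps each row's orientation by using logical indexing instead (the name rowsNZ is only illustrative):

```matlab
A = [1 1 0 0
     1 2 0 0
     1 0 0 0
     1 2 1 0];
% For each row index n, keep only the nonzero entries of that row
rowsNZ = arrayfun(@(n) A(n, A(n,:) ~= 0), 1:size(A,1), 'UniformOutput', false);
[v1, v2, v3, v4] = rowsNZ{:};   % v1 = [1 1], v2 = [1 2], v3 = 1, v4 = [1 2 1]
```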

Getting value and index of array matlab

I have an array in MATLAB, let's say of size (256, 256). Now I need to build a new array of size (256*256, 3) containing in each row a value followed by the row and column index of that value in the original array, i.e.:
test = [1,2,3;4,5,6;7,8,9]
test =
1 2 3
4 5 6
7 8 9
I need as result:
[1, 1, 1; 2, 1, 2; 3, 1, 3; 4, 2, 1; 5, 2, 2; and so on]
Any ideas?
Thanks in advance!
What you want is the output of meshgrid:
[C,B] = meshgrid(1:size(test,2),1:size(test,1)) % C holds column indices, B holds row indices
M=test;
M(:,:,2)=B;
M(:,:,3)=C;
Here's what I came up with:
test = [1,2,3;4,5,6;7,8,9]; % orig matrix
[m, n] = size(test); % example 1, breaks with value zero elems
o = find(test);
test1 = [o, reshape(test, m*n, 1), o]
Elapsed time is 0.004104 seconds.
% one liner from above
% (depending on data size might want to avoid dual find calls)
test2=[ find(test) reshape(test, size(test,1)*size(test,2), 1 ) find(test)]
Elapsed time is 0.008121 seconds.
[r, c, v] = find(test); % just another way to write above, still breaks on zeros
test3 = [r, v, c]
Elapsed time is 0.009516 seconds.
[i, j] = ind2sub([m n],[1:m*n]); % use ind2sub to build tables of indices
% and reshape to build col vector
test4 = [i', reshape(test, m*n, 1), j']
Elapsed time is 0.011579 seconds.
test0 = [1,2,3;0,5,6;0,8,9]; % testing find with zeros.....breaks
% test5=[ find(test0) reshape(test0, size(test0,1)*size(test0,2), 1 ) find(test0)] % error in horzcat
[i, j] =ind2sub([m n],[1:m*n]); % testing ind2sub with zeros.... winner
test6 = [i', reshape(test0, m*n, 1), j']
Elapsed time is 0.014166 seconds.
Using meshgrid from above:
Elapsed time is 0.048007 seconds.
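For completeness, here is a sketch that produces rows in exactly the order shown in the question (value, row, column, walking the matrix row by row). It uses ndgrid rather than find, so zero-valued elements are kept. The names row, col and result are illustrative.

```matlab
test = [1,2,3;4,5,6;7,8,9];
[m, n] = size(test);
% ndgrid makes the first output vary fastest, so col cycles 1..n within each row
[col, row] = ndgrid(1:n, 1:m);
% test.' is read column by column, which walks the original matrix row by row
result = [reshape(test.', [], 1), row(:), col(:)];
% first rows: [1 1 1; 2 1 2; 3 1 3; 4 2 1; ...]
```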

What's the fastest way to remove or change large number of entries in arrays in MATLAB?

I want to change a number of values in a 4D array M_ijkl to NaN using MATLAB.
I use find to get the indices i and j that meet a certain condition for k = 2 and l = 4 (in my case it's the y component of a position at time t_4). I now want to set all the entries for these i and j combinations and for all k and l to NaN.
I used this method to do it (example by nkjt):
% initialise
M = zeros(10,10,2,4);
% set two points in (:,:,2,4) to be above threshold.
M(2,4,2,4)=5;
M(6,8,2,4)=5;
% find and set to NaN
[i,j] = find(M(:,:,2,4) > 4);
M(i,j,:,:)= NaN;
% count NaNs
sum(isnan(M(:))) % returns 32
This method is very slow, as this example illustrates:
M = rand(360,360,2,4);
threshold = 0.5;
% slow and wrong result
[i,j] = find(M(:,:,2,4) > threshold);
tic;
M(i,j,:,:) = NaN;
toc;
Elapsed time is 54.698449 seconds.
Note that the tic and toc don't time the find so that is not the problem.
With Rody's and nkjt's help I also realized that my method doesn't actually do what I want. I only want to change entries with the combinations of i and j I found with find (for all k and l), i.e. [2,4,:,:] and [6,8,:,:], but not [2,8,:,:] and [6,4,:,:]. In the first example sum(isnan(M(:))) should return 16.
Have you checked your results? Because I think they are wrong. For example, if you have
A = [...
1 2 3
4 5 6
7 8 9];
and you want to set elements A(1,1) and A(2,3) to NaN. What you are doing is
A([1 2], [1 3]) = NaN
but that gives
A =
NaN 2 NaN
NaN 5 NaN
7 8 9
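If subscript pairs are needed, one option is to convert them to linear indices with sub2ind, so that each (row, column) pair is addressed individually. A sketch on the same matrix, with rows and cols as illustrative names:

```matlab
A = [1 2 3
     4 5 6
     7 8 9];
rows = [1 2];
cols = [1 3];
% sub2ind pairs rows(k) with cols(k) instead of forming every combination
A(sub2ind(size(A), rows, cols)) = NaN;
% A is now [NaN 2 3; 4 5 NaN; 7 8 9]
```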
The easiest and fastest way around this is to not use find, but logical indexing:
M = rand(360,360,2,4);
maximum = 0.05;
tic;
M(M(:,:,2,4) > maximum) = NaN;
toc
Which gives on my PC:
Elapsed time is 0.003547 seconds.
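One thing to check with this approach: a 2-D logical mask applied to a 4-D array is matched against the leading elements in linear order, so on its own it touches only the slab M(:,:,1,1). To cover every k and l, the mask can be replicated along the last two dimensions. A minimal sketch on the small example from the question (the variable name mask is illustrative):

```matlab
M = zeros(10,10,2,4);
M(2,4,2,4) = 5;
M(6,8,2,4) = 5;
mask = M(:,:,2,4) > 4;               % 10x10 logical, true at (2,4) and (6,8)
% Replicate the mask over k and l so it has the same size as M
M(repmat(mask, [1 1 2 4])) = NaN;
numNaN = nnz(isnan(M));              % 16: two (i,j) pairs times 2*4 slabs
```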
Much faster for me by reshaping M:
M = rand(360,360,2,4);
M = reshape(M,[360*360,2,4]);
maximum = 0.05;
n = find(M(:,2,4) > maximum);
tic;
M(n,:,:) = NaN;
M = reshape(M,[360, 360, 2, 4]);
toc;
ETA:
M(i,j,:,:)= NaN; sets all combinations of i, j to NaN for all k,l (as explained in Rody's answer).
So for example:
% initialise
M = zeros(10,10,2,4);
% set two points in (:,:,2,4) to be above threshold.
M(2,4,2,4)=5;
M(6,8,2,4)=5;
% find and set to NaN
[i,j] = find(M(:,:,2,4) > 4);
M(i,j,:,:)= NaN;
% count NaNs
sum(isnan(M(:))) % returns 32
e.g. M(2,4,k,l) = NaN, but also M(2,8,k,l) = NaN.
If this is what you want, reduce the size of i,j with unique after find.
In terms of logical indexing, basically, it's often better to do something like A(A>2)=NaN; instead of n = find(A>2); A(n)=NaN;. In the reshaped case you could do M(M(:,2,4)>maximum,:,:) = NaN;. I didn't tic/toc it so I don't know if it would be faster in this case.

How are matrices stored in Matlab/Octave?

Short version
If I have matrix like this:
1 2
3 4
In memory, is it stored as [1 2 3 4] or as [1 3 2 4]? In other words, are matrices more optimized for row or for column access?
Long version
I'm translating some code from Matlab to NumPy. I'm used to the C convention for multidimensional arrays (i.e. the last index varies most rapidly, matrices are stored by rows), which is the default for NumPy arrays. However, in Matlab code I see snippets like this all the time (for arranging several color images in a single multidimensional array):
images(:, :, :, i) = im
which looks suboptimal for the C convention and better suited to the FORTRAN convention (the first index varies most rapidly, matrices are stored by columns). So, is it correct that Matlab uses this second style and is better optimized for column operations?
Short answer: It is stored column-wise.
A = [1 2; 3 4];
A(:) % returns [1; 3; 2; 4]
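A few quick checks make the column-major layout visible. This is only a sketch using linear indexing and reshape; the variable names are illustrative.

```matlab
A = [1 2; 3 4];
second = A(2);                % linear index 2 is the second element of column 1: 3
order  = A(:).';              % elements in storage order: [1 3 2 4]
filled = reshape(1:4, 2, 2);  % reshape fills column by column: [1 3; 2 4]
```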
In many cases, the performance can be much better if you do the calculations in the "correct order", and operate on full columns, and not rows.
A quick example:
%% Columns
a = rand(n);
b = zeros(n,1);
tic
for ii = 1:n
b = b + a(:,ii);
end
toc
Elapsed time is 0.252358 seconds.
%% Rows:
a = rand(n);
b = zeros(1,n);
tic
for ii = 1:n
b = b + a(ii,:);
end
toc
Elapsed time is 2.593381 seconds.
More than 10 times as fast when working on columns!
%% Columns
n = 4000;
a = rand(n);
b = zeros(n,1);
tic
for j = 1 : 10
for ii = 1:n
b = b + a(:,ii);
end
end
toc
%% Rows new:
a = rand(n);
b = zeros(1,n);
tic
for j = 1 : 10
for ii = 1:n
b = b + a(ii);
end
end
toc
%% Rows old:
a = rand(n);
b = zeros(1,n);
tic
for j = 1 : 10
for ii = 1:n
b = b + a(ii,:);
end
end
toc
Results:
Elapsed time is 1.53509 seconds.
Elapsed time is 1.03306 seconds.
Elapsed time is 3.4732 seconds.
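So the column loop is roughly twice as fast as the original row loop. Note that the "rows new" variant is not a fair comparison: a(ii) uses linear indexing and returns a single element rather than a row, so its timing measures adding a scalar to b, not row access.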

Resources