I have a matrix such as
M = [ 1 3 2 4;
3 3 2 1;
2 4 1 3]
which has a base A = [ 1 2 3 4];
I also have another base B = [103 104 105 106];
I need to replace the values of A with values of B inside M. So my new M should be:
M1 = [ 103 105 104 106;
105 105 104 103;
104 106 103 105];
The elements are random numbers, so I need a one-to-one index correspondence between A and B.
Should I mention it? of course NO FOR LOOPS :D
Thanks
Here's a one-liner for you:
sum(bsxfun(@times, bsxfun(@eq, M, reshape(A,1,1,[])), reshape(B,1,1,[])), 3)
It's rather fast.
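To see why the one-liner works, it can help to unroll it into named steps. The following is only an illustrative sketch using the matrices from the question; the intermediate names A3, B3, hits and weighted are made up here. It assumes the values in A are distinct, and any element of M that does not occur in A ends up as 0.

```matlab
M = [1 3 2 4; 3 3 2 1; 2 4 1 3];
A = [1 2 3 4];
B = [103 104 105 106];

% Move both bases into the third dimension (1x1x4 each).
A3 = reshape(A, 1, 1, []);
B3 = reshape(B, 1, 1, []);

% hits(:,:,k) is true where M equals A(k); the result is 3x4x4.
hits = bsxfun(@eq, M, A3);

% Scale each slice by the matching new value B(k).
weighted = bsxfun(@times, hits, B3);

% At most one slice is nonzero per element (A assumed distinct),
% so summing over the third dimension picks the replacement value.
M1 = sum(weighted, 3);
```

The intermediate arrays have numel(M)*numel(A) elements, so memory use grows with the size of the base.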
Benchmark
Here's the benchmarking code:
%// bsxfun party
tic
for k = 1:10000
M1 = sum(bsxfun(@times,bsxfun(@eq,M,reshape(A,1,1,[])),reshape(B,1,1,[])),3);
end
toc
%// Using ismember
tic
for k = 1:10000
[idx,b] = ismember(M,A);
M1 = M; %// work on a copy so that M is unchanged for later iterations
M1(idx) = B(b(idx));
end
toc
%// Using a simple loop
tic
for k = 1:10000
M1 = M;
for t = 1:length(A)
M1(M == A(t)) = B(t);
end
end
toc
The results are:
Elapsed time is 0.030135 seconds.
Elapsed time is 0.094354 seconds.
Elapsed time is 0.007410 seconds.
So this one-liner is faster than the elegant solution with ismember, but the simple (JIT-accelerated) loop beats both. Surprising, no? :)
If you are sure that M contains only elements of the old base A (so that the new M contains only elements of the new base B), you can use the second output of ismember directly:
>> [~,b] = ismember(M,A);
>> M = B(b)
M =
103 105 104 106
105 105 104 103
104 106 103 105
If your new base is a simple function of your old base, it can be trivial:
M1 = M + 102;
Otherwise this is a way:
M1 = M
for t = 1:length(A)
M1(M==A(t)) = B(t)
end
Based on the answer of @Rody, another solution:
[idx,b] = ismember(M,A);
M(idx) = B(b(idx))
The difference is that this will not break if A does not contain all elements of M. (Probably should not occur if it is a proper basis).
Related
I have a 3D matrix A that holds my data. At multiple locations, defined by the row and column indices in the matrix row_col_idx, I want to extract all data along the third dimension, as shown below:
A = cat(3,[1:3;4:6], [7:9;10:12],[13:15;16:18],[19:21;22:24]) %matrix(2,3,4)
row_col_idx=[1 1;1 2; 2 3];
idx = sub2ind(size(A(:,:,1)), row_col_idx(:,1),row_col_idx(:,2));
out=nan(size(A,3),size(row_col_idx,1));
for k=1:size(A,3)
temp=A(:,:,k);
out(k,:)=temp(idx);
end
out
The output of this code is as follows:
A(:,:,1) =
1 2 3
4 5 6
A(:,:,2) =
7 8 9
10 11 12
A(:,:,3) =
13 14 15
16 17 18
A(:,:,4) =
19 20 21
22 23 24
out =
1 2 6
7 8 12
13 14 18
19 20 24
The output is as expected. However, the actual A and row_col_idx are huge, so this code is computationally expensive. Is there a way to vectorize this code to avoid the loop and the temp matrix?
This can be vectorized using linear indexing and implicit expansion:
out = A( row_col_idx(:,1) + ...
(row_col_idx(:,2)-1)*size(A,1) + ...
(0:size(A,1)*size(A,2):numel(A)-1) ).';
The above builds an indexing matrix as large as the output. If this is unacceptable due to memory limitations, it can be avoided by reshaping A:
sz = size(A); % store size A
A = reshape(A, [], sz(3)); % collapse first two dimensions
out = A(row_col_idx(:,1) + (row_col_idx(:,2)-1)*sz(1),:).'; % linear indexing along
% first two dims of A
A = reshape(A, sz); % reshape back A, if needed
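As a quick sanity check, the two vectorized variants above can be compared against the original loop on the small A from the question. This is only a sketch; the names lin, out_loop, out_a and out_b are introduced here for illustration, and the first variant relies on implicit expansion (R2016b or later).

```matlab
A = cat(3, [1:3; 4:6], [7:9; 10:12], [13:15; 16:18], [19:21; 22:24]);
row_col_idx = [1 1; 1 2; 2 3];

% Linear index of each (row, col) pair within one 2-D slice.
lin = row_col_idx(:,1) + (row_col_idx(:,2) - 1) * size(A, 1);

% Reference: the loop from the question.
out_loop = nan(size(A, 3), size(row_col_idx, 1));
for k = 1:size(A, 3)
    temp = A(:, :, k);
    out_loop(k, :) = temp(lin);
end

% Variant 1: add one slice-sized offset per page (implicit expansion).
out_a = A(lin + (0:size(A,1)*size(A,2):numel(A)-1)).';

% Variant 2: collapse the first two dimensions, then index rows.
A2 = reshape(A, [], size(A, 3));
out_b = A2(lin, :).';
```

All three results should agree with the 4-by-3 matrix shown in the question.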
A more efficient method than the original loop is to index A directly with the entries of row_col_idx, looping over its rows rather than over the third dimension of A. I compared the two methods on a large matrix, and the second one is much faster.
For the A given in the question, it gives the same output as the original code.
A = rand([2,3,10000000]);
row_col_idx=[1 1;1 2; 2 3];
idx = sub2ind(size(A(:,:,1)), row_col_idx(:,1),row_col_idx(:,2));
out=nan(size(A,3),size(row_col_idx,1));
tic;
for k=1:size(A,3)
temp=A(:,:,k);
out(k,:)=temp(idx);
end
time1 = toc;
%% More efficient method:
out2 = nan(size(A,3),size(row_col_idx,1));
tic;
for jj = 1:size(row_col_idx,1)
out2(:,jj) = [A(row_col_idx(jj,1),row_col_idx(jj,2),:)];
end
time2 = toc;
fprintf('Time calculation 1: %d\n',time1);
fprintf('Time calculation 2: %d\n',time2);
Gives as output:
Time calculation 1: 1.954714e+01
Time calculation 2: 2.998120e-01
I have the following time-series:
b = [2 5 110 113 55 115 80 90 120 35 123];
Each number in b is one data point at a time instant. I computed the duration values from b. Duration is represented by all numbers within b larger or equal to 100 and arranged consecutively (all other numbers are discarded). A maximum gap of one number smaller than 100 is allowed. This is how the code for duration looks like:
N = 2; % maximum allowed gap
duration = cellfun(@numel, regexp(char((b>=100)+'0'), [repmat('0',1,N) '+'], 'split'));
giving the following duration values for b:
duration = [4 3];
I want to find the positions (time-lines) within b for each value in duration. Next, I want to replace the other positions located outside duration with zeros. The result would look like this:
result = [0 0 3 4 5 6 0 0 9 10 11];
If anyone could help, it would be great.
Answer to original question: pattern with at most one value below 100
Here's an approach using a regular expression to detect the desired pattern. I'm assuming that one value <100 is allowed only between (not after) values >=100. So the pattern is: one or more values >=100, with a possible single value <100 in between.
b = [2 5 110 113 55 115 80 90 120 35 123]; %// data
B = char((b>=100)+'0'); %// convert to string of '0' and '1'
[s, e] = regexp(B, '1+(.1+|)', 'start', 'end'); %// find pattern
y = 1:numel(B);
c = any(bsxfun(@ge, y, s(:)) & bsxfun(@le, y, e(:))); %// filter by locations of pattern
y = y.*c; %// result
This gives
y =
0 0 3 4 5 6 0 0 9 10 11
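The bsxfun filtering step builds a numel(s)-by-numel(B) logical matrix. If that is too large for long series, one possible alternative (a sketch, not part of the answer above) is to mark where each match starts and ends and take a cumulative sum. The helper vector d is an assumption introduced here.

```matlab
b = [2 5 110 113 55 115 80 90 120 35 123];
B = char((b >= 100) + '0');                  % string of '0' and '1'
[s, e] = regexp(B, '1+(.1+|)', 'start', 'end');

% +1 at every match start, -1 just past every match end.
d = zeros(1, numel(B) + 1);
d(s) = 1;
d(e + 1) = d(e + 1) - 1;

% The running sum is positive exactly inside the matched intervals.
c = cumsum(d(1:end-1)) > 0;
y = (1:numel(B)) .* c;
```

This needs only vectors of length numel(B) + 1, regardless of how many runs are found.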
Answer to edited question: pattern with at most n values in a row below 100
The regexp needs to be modified, and it has to be dynamically built as a function of n:
b = [2 5 110 113 55 115 80 90 120 35 123]; %// data
n = 2;
B = char((b>=100)+'0'); %// convert to string of '0' and '1'
r = sprintf('1+(.{1,%i}1+)*', n); %// build the regular expression from n
[s, e] = regexp(B, r, 'start', 'end'); %// find pattern
y = 1:numel(B);
c = any(bsxfun(@ge, y, s(:)) & bsxfun(@le, y, e(:))); %// filter by locations of pattern
y = y.*c; %// result
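To get a feel for how the generated pattern depends on n, the short sketch below runs it on the same data for n = 1 and n = 2 and stores the match boundaries. The cell arrays S and E are illustrative names only.

```matlab
b = [2 5 110 113 55 115 80 90 120 35 123];
B = char((b >= 100) + '0');      % '00110100101'

S = cell(1, 2);
E = cell(1, 2);
for n = 1:2
    r = sprintf('1+(.{1,%i}1+)*', n);        % e.g. '1+(.{1,2}1+)*' for n = 2
    [S{n}, E{n}] = regexp(B, r, 'start', 'end');
end
```

With n = 1 the two-sample gap at positions 7 and 8 splits the data into two runs; with n = 2 that gap is bridged and a single run remains.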
Here is another solution, not using regexp. It naturally generalizes to arbitrary gap sizes and thresholds. Not sure whether there is a better way to fill the gaps. Explanation in comments:
% maximum step size and threshold
N = 2;
threshold = 100;
% data
b = [2 5 110 113 55 115 80 90 120 35 123];
% find valid data
B = b >= threshold;
B_ind = find(B);
% find lengths of gaps
step_size = diff(B_ind);
% find acceptable steps (and ignore step size 1)
permissible_steps = 1 < step_size & step_size <= N;
% find beginning and end of runs
good_begin = B_ind([permissible_steps, false]);
good_end = good_begin + step_size(permissible_steps);
% fill gaps in B
for ii = 1:numel(good_begin)
B(good_begin(ii):good_end(ii)) = true;
end
% find durations of runs in B. This finds points where we switch from 0 to
% 1 and vice versa. Due to padding the first match is always a start of a
% run, the last one always an end. There will be an even number of matches,
% so we can reshape and diff and thus find the durations
durations = diff(reshape(find(diff([false, B, false])), 2, []));
% get positions of 'good' data
outpos = zeros(size(b));
outpos(B) = find(B);
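The padded-diff step is the least obvious part, so here it is in isolation as a small sketch. It starts from the already gap-filled mask for the example data; the names edges, bounds, starts and stops are made up for this illustration.

```matlab
% Gap-filled mask for the example data (runs at 3..6 and 9..11).
B = logical([0 0 1 1 1 1 0 0 1 1 1]);

% Padding with false guarantees every run has both a rising and a falling edge.
edges = find(diff([false, B, false]));   % alternating start / one-past-end

% One column per run: row 1 = start, row 2 = one past the end.
bounds = reshape(edges, 2, []);
starts = bounds(1, :);
stops  = bounds(2, :) - 1;
durations = stops - starts + 1;
```

This also yields the start and end positions of each run, not only the durations.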
I have matrix A and matrix B. Matrix A is 100*3. Matrix B is 10*3. I need to insert one row from matrix B each time in a sequence into matrix A after every 10th row. The result would be Matrix A with 110*3. How can I do this in Matlab?
Here's another indexing-based approach:
n = 10;
C = [A; B];
[~, ind] = sort([1:size(A,1) n*(1:size(B,1))+.5]);
C = C(ind,:);
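The sort trick is easier to follow on a tiny case. The sketch below uses made-up 4-by-2 and 2-by-2 matrices and n = 2 instead of the sizes from the question; keys is a name introduced here.

```matlab
A = [1 1; 2 2; 3 3; 4 4];
B = [10 10; 20 20];
n = 2;                                    % insert a row of B after every n rows of A

% Rows of A get keys 1, 2, 3, 4; rows of B get keys n*k + 0.5,
% which fall between row n*k and row n*k + 1 of A.
keys = [1:size(A,1), n*(1:size(B,1)) + 0.5];

[~, ind] = sort(keys);                    % order in which to read rows of [A; B]
C = [A; B];
C = C(ind, :);
```

Sorting the keys is what interleaves the rows, so the cost is that of one sort over size(A,1) + size(B,1) values.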
For canonical purposes, here's how you'd do it via loops. This is a bit inefficient since you're mutating the array at each iteration, but it's really simple to read. Given that your two matrices are stored in A (100 x 3) and B (10 x 3), you would do:
out = [];
for idx = 1 : 10
out = [out; A((idx-1)*10 + 1 : 10*idx,:); B(idx,:)];
end
At each iteration, we pick out 10 rows of A and 1 row of B and we concatenate these 11 rows onto out. This happens 10 times, resulting in 110 rows with 3 columns.
Here's an index-based approach:
%//pre-allocate output matrix
matrixC = zeros(110, 3);
%//create index array for the locations in matrixC that would be populated by matrixB
idxArr = (1:10) * 11;
%//place matrixB into matrixC
matrixC(idxArr,:) = matrixB;
%//place matrixA into matrixC
%//setdiff is used to exclude indexes already populated by values from matrixB
matrixC(setdiff(1:110, idxArr),:) = matrixA;
And just for fun here's the same approach sans magic numbers:
%//define how many rows to take from matrixA at once
numRows = 10;
%//get dimensions of input matrices
lengthA = size(matrixA, 1);
lengthB = size(matrixB, 1);
matrixC = zeros(lengthA + lengthB, 3);
idxArr = (1:lengthB) * (numRows + 1);
matrixC(idxArr,:) = matrixB;
matrixC(setdiff(1:size(matrixC, 1), idxArr),:) = matrixA;
Just for fun... Now with more robust test matrices!
A = ones(3, 100);
A(:) = 1:300;
A = A.'
B = ones(3, 10);
B(:) = 1:30;
B = B.' + 1000
C = reshape(A.', 3, 10, []);
C(:,end+1,:) = permute(B, [2 3 1]);
D = permute(C, [2 3 1]);
E = reshape(D, 110, 3)
Input:
A =
1 2 3
4 5 6
7 8 9
10 11 12
13 14 15
16 17 18
19 20 21
22 23 24
25 26 27
28 29 30
31 32 33
34 35 36
...
B =
1001 1002 1003
1004 1005 1006
...
Output:
E =
1 2 3
4 5 6
7 8 9
10 11 12
13 14 15
16 17 18
19 20 21
22 23 24
25 26 27
28 29 30
1001 1002 1003
31 32 33
34 35 36
...
Thanks to @Divakar for pointing out my previous error.
Solution Code
Here's an implementation based on logical indexing (also known as masking), which should be fairly efficient when working with large arrays:
%// Number of rows of A between inserted rows of B (10 in the question)
A_cutrow = 10;
%// Get sizes of A and B
[M,d] = size(A);
N = size(B,1);
%// Mask of row indices where rows from A would be placed
mask_idx = reshape([true(A_cutrow,M/A_cutrow) ; false(1,N)],[],1);
%// Pre-allocate with zeros:
%// http://undocumentedmatlab.com/blog/preallocation-performance
out(M+N,d) = 0;
%// Insert A and B using mask and ~mask
out(mask_idx,:) = A;
out(~mask_idx,:) = B;
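A small worked case may make the mask construction clearer. This is a sketch with made-up 6-by-2 and 3-by-2 inputs and A_cutrow = 2; it assumes, as the code above does, that size(A,1)/A_cutrow equals size(B,1).

```matlab
A = [1 1; 2 2; 3 3; 4 4; 5 5; 6 6];
B = [-1 -1; -2 -2; -3 -3];
A_cutrow = 2;                             % rows of A between inserted rows of B

[M, d] = size(A);
N = size(B, 1);

% Each column of the stacked block is one group: A_cutrow trues, then one false.
% Reading it column by column gives the row pattern 1 1 0 1 1 0 1 1 0.
mask_idx = reshape([true(A_cutrow, M/A_cutrow); false(1, N)], [], 1);

out = zeros(M + N, d);
out(mask_idx, :)  = A;                    % rows of A fill the true positions
out(~mask_idx, :) = B;                    % rows of B fill the false positions
```

Here zeros is used for preallocation instead of the out(M+N,d) = 0 idiom, so the snippet also works when out already exists in the workspace.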
Benchmarking
%// Setup inputs
A = rand(100000,3);
B = rand(10000,3);
A_cutrow = 10;
num_iter = 200; %// Number of iterations to be run for each approach
%// Warm up tic/toc.
for k = 1:50000
tic(); elapsed = toc();
end
disp(' ------------------------------- With MASKING')
tic
for iter = 1:num_iter
[M,d] = size(A);
N = size(B,1);
mask_idx = reshape([true(A_cutrow,M/A_cutrow) ; false(1,N)],[],1);
out(M+N,d) = 0;
out(mask_idx,:) = A;
out(~mask_idx,:) = B;
clear out
end
toc, clear mask_idx N M d iter
disp(' ------------------------------- With SORT')
tic
for iter = 1:num_iter
C = [A; B];
[~, ind] = sort([1:size(A,1) A_cutrow*(1:size(B,1))+.5]);
C = C(ind,:);
end
toc, clear C ind iter
disp(' ------------------------------- With RESHAPE+PERMUTE')
tic
for iter = 1:num_iter
[M,d] = size(A);
N = size(B,1);
C = reshape(A.', d, A_cutrow , []);
C(:,end+1,:) = permute(B, [2 3 1]);
D = permute(C, [2 1 3]);
out = reshape(permute(D,[1 3 2]),M+N,[]);
end
toc, clear out D C N M d iter
disp(' ------------------------------- With SETDIFF')
tic
for iter = 1:num_iter
lengthA = size(A, 1);
lengthB = size(B, 1);
matrixC = zeros(lengthA + lengthB, 3);
idxArr = (1:lengthB) * (A_cutrow + 1);
matrixC(idxArr,:) = B;
matrixC(setdiff(1:size(matrixC, 1), idxArr),:) = A;
end
toc, clear matrixC idxArr lengthA lengthB
disp(' ------------------------------- With FOR-LOOP')
tic
for iter = 1:num_iter
[M,d] = size(A);
N = size(B,1);
Mc = M/A_cutrow;
out(M+N,d) = 0;
for idx = 1 : Mc
out( 1+(idx-1)*(A_cutrow +1): idx*(A_cutrow+1), :) = ...
[A( 1+(idx-1)*A_cutrow : idx*A_cutrow , : ) ; B(idx,:)];
end
clear out
end
toc
Runtimes
Case #1: A as 100 x 3 and B as 10 x 3
------------------------------- With MASKING
Elapsed time is 4.987088 seconds.
------------------------------- With SORT
Elapsed time is 5.056301 seconds.
------------------------------- With RESHAPE+PERMUTE
Elapsed time is 5.170416 seconds.
------------------------------- With SETDIFF
Elapsed time is 35.063020 seconds.
------------------------------- With FOR-LOOP
Elapsed time is 12.118992 seconds.
Case #2: A as 100000 x 3 and B as 10000 x 3
------------------------------- With MASKING
Elapsed time is 1.167707 seconds.
------------------------------- With SORT
Elapsed time is 2.667149 seconds.
------------------------------- With RESHAPE+PERMUTE
Elapsed time is 2.603110 seconds.
------------------------------- With SETDIFF
Elapsed time is 3.153900 seconds.
------------------------------- With FOR-LOOP
Elapsed time is 19.822912 seconds.
Please note that num_iter was different for these two cases, as the idea was to keep the runtimes > 1 sec mark to compensate for tic-toc overheads.
Short version
If I have matrix like this:
1 2
3 4
In memory, is it stored as [1 2 3 4] or as [1 3 2 4]. In other words, are matrices more optimized for row or for column access?
Long version
I'm translating some code from Matlab to NumPy. I'm used to the C convention for multidimensional arrays (i.e. the last index varies most rapidly, matrices are stored by rows), which is the default for NumPy arrays. However, in Matlab code I see snippets like this all the time (for arranging several colored images in a single multidimensional array):
images(:, :, :, i) = im
which looks suboptimal for the C convention and more optimized for the FORTRAN convention (the first index varies most rapidly, matrices are stored by columns). So, is it correct that Matlab uses this second style and is better optimized for column operations?
Short answer: It is stored column-wise.
A = [1 2; 3 4];
A(:) %// returns [1; 3; 2; 4]
In many cases, the performance can be much better if you do the calculations in the "correct order", and operate on full columns, and not rows.
A quick example:
%% Columns
a = rand(n);
b = zeros(n,1);
tic
for ii = 1:n
b = b + a(:,ii);
end
toc
Elapsed time is 0.252358 seconds.
%% Rows:
a = rand(n);
b = zeros(1,n);
tic
for ii = 1:n
b = b + a(ii,:);
end
toc
Elapsed time is 2.593381 seconds.
More than 10 times as fast when working on columns!
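The storage order can also be checked directly with linear indexing, which walks an array in memory order. The sketch below is illustrative only; the 2-by-2-by-3-by-5 array standing in for a stack of images and the names v, k, idx and slice are assumptions made here.

```matlab
A = [1 2; 3 4];
v = A(:).';                     % elements in storage order: down the first column first

% Linear index of row 2, column 1: the second stored element.
k = sub2ind(size(A), 2, 1);

% Stand-in for a stack of 5 small "images" of size 2x2x3.
images = zeros(2, 2, 3, 5);
idx = reshape(1:numel(images), size(images));   % each entry holds its own linear index

% All elements of images(:,:,:,2) occupy one contiguous block of linear indices.
slice = idx(:, :, :, 2);
isContiguous = isequal(slice(:).', 13:24);
```

This is why indexing by the last dimension, as in images(:, :, :, i), suits column-major storage: each image is one contiguous block.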
%% Columns
n = 4000;
a = rand(n);
b = zeros(n,1);
tic
for j = 1 : 10
for ii = 1:n
b = b + a(:,ii);
end
end
toc
%% Rows new:
a = rand(n);
b = zeros(1,n);
tic
for j = 1 : 10
for ii = 1:n
b = b + a(ii);
end
end
toc
%% Rows old:
a = rand(n);
b = zeros(1,n);
tic
for j = 1 : 10
for ii = 1:n
b = b + a(ii,:);
end
end
toc
Results:
Elapsed time is 1.53509 seconds.
Elapsed time is 1.03306 seconds.
Elapsed time is 3.4732 seconds.
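So the "rows new" variant looks SLIGHTLY faster than working on columns, and using : on rows causes the slowdown. Note, however, that a(ii) is linear indexing and reads a single element rather than a whole row, so that variant does far less work per iteration than the other two.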
I need some help in this problem
I have this matrix in MATLAB:
A = [ 25 1.2 1
28 1.2 2
17 2.6 1
18 2.6 2
23 1.2 1
29 1.2 2
19 15 1
22 15 2
24 2.6 1
26 2.6 2];
The 1st column holds measured temperature values.
The 2nd column is an index code representing the color (1.2: red, etc.).
The 3rd column is the hour at which the sample was taken; only hours 1 and 2 occur.
I want the matrix to be controlled by 2nd column as follows:
if it is 1.2, the program will find the average of all temperatures at hour 1 that
corresponds to 1.2
So, here ( 25 + 23 )/2 = 24
and also finds the average of all temperatures at hour 2 and that corresponds
to 1.2, ( 28 + 29 ) /2 = 28.5
and this average values:
[24
28.5]
will replace all temperature values at hours 1 and 2
that corresponds to 1.2 .
Then, it does the same thing for indices 2.6 and 15
So, the desired output will be:
B = [ 24
28.5
15.5
22
24
28.5
19
22
15.5
22]
My problem is in using the loop. I could do it for only one index at one run.
for example,
T=[];
index=1.2;
for i=1:length(A)
if A(i,2)==index
T=[T A(i,1)];
else
T=[T 0];
end
end
So, T is the extracted T that corresponds to 1.2 and other entries are zeros
Then, I wrote long code to find the average and at the end I could find the matrix
that corresponds to ONLY the index 1.2 :
B = [24
28.5
0
0
24
28.5
0
0
0
0]
But this is only for one index and it assigns zeros for the other indices. I can do this for all
indices in separate runs and then add the B's but this will take very long time since my real
matrix is 8760 by 5 .
I am sure that there is a shorter way to do that.
Thanks
Regards
Try this:
B = zeros(size(A, 1), 1);
C = unique(A(:, 2))';
T = [1 2];
for c = C,
for t = T,
I1 = find((A(:, 2) == c) & (A(:, 3) == t));
B(I1) = mean(A(I1, 1));
end
end
Edit
I think your expected answer is wrong for c = 2.6 and t = 1... Shouldn't it be (17 + 24)/2 = 20.5?
This can be done, perhaps more neatly, with accumarray:
[~, ~, ii] = unique(A(:,2)); %// indices corresponding to second col values
ind = [ii A(:,3)]; %// build 2D-indices for accumarray
averages = accumarray(ind, A(:,1), [], @mean); %// desired averages of first col
result = averages(sub2ind(max(ind), ind(:,1), ind(:,2))); %// repeat averages
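The accumarray version above uses the hour column directly as a subscript, which works because the hours are the positive integers 1 and 2. If that cannot be assumed, a possible variant (a sketch, not the answer's code) is to let unique label each (code, hour) pair and group on that single label. The names g and groupMeans are introduced here.

```matlab
A = [25 1.2 1; 28 1.2 2; 17 2.6 1; 18 2.6 2; 23 1.2 1; ...
     29 1.2 2; 19 15 1; 22 15 2; 24 2.6 1; 26 2.6 2];

% g(i) is the group number of row i, one group per distinct (code, hour) pair.
[~, ~, g] = unique(A(:, [2 3]), 'rows');

% Mean temperature of every group, then map each row back to its group's mean.
groupMeans = accumarray(g, A(:, 1), [], @mean);
result = groupMeans(g);
```

For the example data this gives 20.5 for code 2.6 at hour 1, the mean of 17 and 24.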