Output arrayfun into third dimension of the matrix in MATLAB - arrays

Assume that I have a matrix A = rand(n,m). I want to compute matrix B with size n x n x m, where B(:,:,i) = A(:,i)*A(:,i)';
The code that can produce this is quite simple:
A = rand(n,m); B = zeros(n,n,m);
for i=1:m
B(:,:,i) = A(:,i)*A(:,i)';
end
However, I am concerned about speed and would like to know how to implement this without loops. I probably need bsxfun, arrayfun, or rowfun, but I am not sure.
All answers are appreciated.

I don't have MATLAB at hand right now, but I think this code should produce the same result as your loop:
A1 = reshape(A,n,1,m);
A2 = reshape(A,1,n,m);
B = bsxfun(@times,A1,A2);
If you have a newer version of MATLAB (R2016b or later, which supports implicit expansion), you don't need bsxfun any more; you can just write
B = A1 .* A2;
On older versions this last line will give an error message.
Whether any of this is faster than your loop also depends on the MATLAB version; newer versions are no longer slow with loops. I think the loop is more readable, and readable code is worth using. At the very least, keep the loop in a comment to clarify what the vectorized code does.
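One way to confirm that the reshape approach matches the loop is a small equivalence check. This is only a sketch: the sizes n = 4 and m = 3 and the names Bloop, Bvec and maxDiff are assumptions chosen for illustration. It uses bsxfun so that it should also run on releases without implicit expansion.

```matlab
% Small sizes chosen only for illustration.
n = 4; m = 3;
A = rand(n, m);

% Reference result from the original loop.
Bloop = zeros(n, n, m);
for i = 1:m
    Bloop(:, :, i) = A(:, i) * A(:, i)';
end

% Vectorized version: column i becomes an n-by-1 page and a 1-by-n page,
% and the singleton dimensions are expanded by bsxfun.
A1 = reshape(A, n, 1, m);
A2 = reshape(A, 1, n, m);
Bvec = bsxfun(@times, A1, A2);

% Largest absolute difference between the two results.
maxDiff = max(abs(Bloop(:) - Bvec(:)));
```

If maxDiff is at rounding level, the two versions can be swapped freely and the choice comes down to readability and timing.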

arrayfun and bsxfun do not speed up the calculation in my attempt below:
clc;close all;
clear all;
m=300;n=400;
A = rand(n,m); B = zeros(n,n,m);
tic
for i=1:m
B(:,:,i) = A(:,i)*A(:,i)';
end
t1=toc
C = reshape(cell2mat(arrayfun(@(k) bsxfun(@times, A(:,k), A(:,k)'), ...
1:m, 'UniformOutput',false)),n,n,m);
%C=reshape(C,n,n,m);
t2=toc-t1
% t1 =0.3079
% t2 =0.5112
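For comparison, the reshape-based approach from the other answer can be timed the same way. The sketch below assumes smaller sizes (n = 200, m = 100) to keep memory use modest, and the variable names are illustrative. No timings are quoted here because they depend on the machine and the MATLAB release.

```matlab
% Smaller sizes than in the test above, chosen to limit memory use.
m = 100; n = 200;
A = rand(n, m);

% Time the plain loop.
tic
B = zeros(n, n, m);
for i = 1:m
    B(:, :, i) = A(:, i) * A(:, i)';
end
tLoop = toc;

% Time the single bsxfun call on reshaped inputs.
tic
C = bsxfun(@times, reshape(A, n, 1, m), reshape(A, 1, n, m));
tVec = toc;

fprintf('loop: %.4f s, reshape+bsxfun: %.4f s\n', tLoop, tVec);
```

A single tic/toc pair is noisy; repeating each measurement several times gives a steadier picture.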

Related

Replace specific entries of a multidimensional arrays avoiding loops

In a 3D matrix of size IxJxJ, I would like to set certain entries to zero. For each column j, the slice A(:,j,:) is a matrix of size IxJ, and I would like to replace the jth column of that slice with zeros.
You can find below an example of what I would like using a simple 3D matrix A. This example uses a loop, which is what I am trying to avoid.
A(:,:,1) = randi([1,2],5,3);
A(:,:,2) = randi([3,4],5,3);
A(:,:,3) = randi([5,6],5,3);
for i = 1:3
B = A(:,i,:);
B = squeeze(B);
B(:,i) = 0;
A(:,i,:) = B;
end
Firstly, you can replace the 4 lines of code in your for loop with just A(:,i,i) = 0;. I don't see any real need to avoid the for loop.
Using linear indexing, you can do
A((1:size(A,1)).'+size(A,1).*(size(A,2)+1).*(0:size(A,2)-1)) = 0
or, for older versions of MATLAB without implicit expansion (pre-R2016b),
A(bsxfun(@plus,(1:size(A,1)).',size(A,1).*(size(A,2)+1).*(0:size(A,2)-1))) = 0
After some very quick testing, it actually looks like the bsxfun solution is fastest, but the differences aren't huge, and your results may differ.
Use eye to create a logical mask and multiply it by A.
A = A .* reshape(~eye(3), 1, 3, 3) ;
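The three suggestions above (the simplified loop, linear indexing, and the eye mask) can be checked against each other on the example array. This is a sketch under the question's assumption that the second and third dimensions have the same size; the names A0, Aloop, Alin and Amask are illustrative.

```matlab
% Example array as in the question: 5-by-3-by-3.
A0 = cat(3, randi([1,2],5,3), randi([3,4],5,3), randi([5,6],5,3));
[I, J, ~] = size(A0);

% (a) Simplified loop: zero column i of page i.
Aloop = A0;
for i = 1:J
    Aloop(:, i, i) = 0;
end

% (b) Linear indexing: element (r,i,i) has index r + I*(J+1)*(i-1).
Alin = A0;
idx = bsxfun(@plus, (1:I).', I*(J+1)*(0:J-1));
Alin(idx) = 0;

% (c) Mask: ~eye(J) is 0 on the diagonal, reshaped to 1-by-J-by-J.
mask = reshape(double(~eye(J)), 1, J, J);
Amask = bsxfun(@times, A0, mask);
```

All three results should be identical, so the choice is mostly a matter of readability.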

What is the difference between the two for loops?

I had a question which I hope someone is able to clarify for me. What is the difference between the following two for loops?
c = zeros(16,10);
for k = 1:10
c(1,k) = log(k+1) - log(k);
for n = 1:15
c(n+1,k) = 1./n - k*(c(n,k));  % result is written back into c
end
end
%%%%%
c = zeros(16,10);
for k = 1:10
c(1,k) = log(k+1) - log(k);
for n = 1:15
A(n+1,k) = 1./n - k*(c(n,k));  % result is written into A, not c
end
end
I often find myself trying different things when the issue is simply how the matrix is specified. The second loop also creates a new matrix, but what is the difference in terms of the calculations?
Thanks
As noted, the capital C in the second program is very hard to notice, so I'm going to call it A.
After running the first program, say you have c1 matrix (c with all the calculations done).
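The second program will produce two matrices, A and c2. c2 has only the first row of c1; the rest of c2 is zeros. A has zeros in its first row and matches c1 in its second row, but from the third row on it differs from c1: the recursion reads c2(n,k), which is zero for n >= 2, so A(n+1,k) is just 1/n there.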
Hope this helps.
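Running both programs side by side makes the difference easy to inspect. The sketch below renames the matrices to c1, c2 and A as in the answer above, and preallocates A (an assumption; the question lets A grow inside the loop). The three check variables at the end are illustrative names.

```matlab
% First program: the recursion writes back into c1, so each row
% depends on the previously computed row.
c1 = zeros(16, 10);
for k = 1:10
    c1(1, k) = log(k+1) - log(k);
    for n = 1:15
        c1(n+1, k) = 1./n - k*c1(n, k);
    end
end

% Second program: results go into A, so c2 stays zero below row 1.
c2 = zeros(16, 10);
A = zeros(16, 10);   % preallocated here; the question lets A grow
for k = 1:10
    c2(1, k) = log(k+1) - log(k);
    for n = 1:15
        A(n+1, k) = 1./n - k*c2(n, k);
    end
end

% c2 is never updated below its first row.
c2BelowFirstRowIsZero = all(all(c2(2:end, :) == 0));
% Row 2 of A uses c2(1,k), which equals c1(1,k).
secondRowsMatch = isequal(A(2, :), c1(2, :));
% For n >= 2, c2(n,k) is 0, so A(n+1,k) reduces to 1/n.
laterRowsAreOneOverN = isequal(A(3:end, :), repmat(1 ./ (2:15).', 1, 10));
```

In short, the first program is a true recursion down each column, while the second only ever sees the first row of c.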

a faster way to compute the error of a vector

For a given vector $(x_1,x_2,\ldots, x_n)$ I am trying to compute $\sum_{i=1}^{n}\sum_{j=1}^{n} \|x_i - x_j\|$.
I wrote the following code
error = 0;
for i = 1:n
for j = 1:n
error = error + norm(x(i)-x(j));
end
end
This code is not fast, especially when $n$ is large. I am aware that I am double counting actually... But how may I avoid it? How can I speed up my code?
Thank you!
You can do it with bsxfun, which is fast:
d = abs(bsxfun(@minus, x, x.'));
result = sum(d(:));
Or alternatively use pdist with 'cityblock' distance (which for one-dimensional observations reduces to absolute difference). This computes each distance once, so you need to multiply the sum by 2:
result = 2*sum(pdist(x(:),'cityblock'));
How about a simple speed up?
for a=1:n
for b=a+1:n
error = error + 2*norm(x(a)-x(b));
end
end
For a scalar, norm just gives abs.
So,
error = sum(sum(abs( bsxfun(@minus, x, x.') )))
will do the same thing.
Also check out pdist, which will do this for vectors, using vector norms, in an even faster way.
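The loop-based and vectorized versions above can be compared on a small random vector. This is a minimal sketch: the length 50 and the names errFull, errHalf and errVec are assumptions for illustration, and abs stands in for norm because the entries are scalars.

```matlab
x = rand(50, 1);
n = numel(x);

% Full double loop: every unordered pair is counted twice and the
% diagonal terms contribute zero.
errFull = 0;
for i = 1:n
    for j = 1:n
        errFull = errFull + abs(x(i) - x(j));
    end
end

% Half loop: visit each pair once and double its contribution.
errHalf = 0;
for a = 1:n
    for b = a+1:n
        errHalf = errHalf + 2*abs(x(a) - x(b));
    end
end

% Vectorized: n-by-n matrix of absolute differences, then sum all entries.
d = abs(bsxfun(@minus, x, x.'));
errVec = sum(d(:));
```

The vectorized form builds an n-by-n matrix, so for very large n the half loop or pdist may be the safer choice for memory.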

How to improve the execution time of this function?

Suppose that f(x,y) is a bivariate function as follows:
function [ f ] = f(x,y)
UN=@(g)1.6*(1-acos(g)/pi)-0.8;
f= 1+UN(cos(0.5*pi*x+y));
end
How to improve execution time for function F(N) with the following code:
function [VAL] = F(N)
x=0:4/N:4;
y=0:2*pi/1000:2*pi;
VAL=zeros(N+1,3);
for i = 1:N+1
val = zeros(1,N+1);
for j = 1:N+1
val(j) = trapz(y,f(0,y).*f(x(i),y).*f(x(j),y))/2/pi;
end
val = fftshift(fft(val))/N;
l = (length(val)+1)/2;
VAL(i,:)= val(l-1:l+1);
end
VAL = fftshift(fft(VAL,[],1),1)/N;
L = (size(VAL,1)+1)/2;
VAL = VAL(L-1:L+1,:);
end
Note that N=2^p where p>10, so please consider the memory limitations while optimizing the code using ndgrid, arrayfun, etc.
FYI: The code intends to find the central 3-by-3 submatrix of the fftn of
fun=@(a,b) trapz(y,f(0,y).*f(a,y).*f(b,y))/2/pi;
where a,b are in [0,4]. The key idea is that the code above saves memory, especially when N is very large. But the execution time is still an issue because of the nested loops. See the figure below for N=2^2:
This is not a full answer, but some possibly helpful hints:
0) The trivial: Are you sure you need numerics? Can't you do the computation analytically?
1) Do not use function handles:
function [ f ] = f(x,y)
f= 1+1.6*(1-acos(cos(0.5*pi*x+y))/pi)-0.8;
end
2) Simplify analytically: acos(cos(x)) is the same as abs(mod(x + pi, 2 * pi) - pi), which should compute slightly faster. Or, instead of sampling and then numerically integrating, first integrate analytically and sample the result.
3) The FFT is a very efficient algorithm to compute the full DFT, but you don't need the full DFT. Since you only want the central 3 x 3 coefficients, it might be more efficient to directly apply the DFT definition and evaluate the formula only for those coefficients that you want. That should be both fast and memory-efficient.
4) If you repeatedly do this computation, it might be helpful to precompute DFT coefficients. Here, dftmtx from the Signal Processing toolbox can assist.
5) To get rid of the loops, think about the problem not in the form of computation instructions, but a single matrix operation. If you consider your input N x N matrix as a vector with N² elements, and your output 3 x 3 matrix as a 9-element vector, then the whole operation you apply (numerical integration via trapz and DFT via fft) appears to be a simple linear transform, which it should be possible to express as an N² x 9 matrix.
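Hint 3 can be sketched for a single row as follows. The vector v is a random stand-in for one row of integrals, M = 17 is an arbitrary odd length (like N+1 with N = 2^p), and the names W, ref and direct are illustrative. The /N scaling from the question is left out.

```matlab
M = 17;                 % odd length, like N+1 with N = 2^p
v = rand(1, M);         % stand-in for one row of integral values

% Reference: full FFT, shift, then take the three central entries.
full = fftshift(fft(v));
l = (M + 1) / 2;        % position of the zero frequency for odd M
ref = full(l-1:l+1);

% Direct DFT evaluated only at frequencies k = -1, 0, 1.
idx = (0:M-1).';                 % sample indices as a column
k = -1:1;                        % wanted frequencies as a row
W = exp(-2i*pi*idx*k/M);         % M-by-3 matrix of DFT weights
direct = v * W;                  % 1-by-3 result
```

Because W depends only on M, it can be computed once and reused for every row, which is also the idea behind hint 4.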

How to do transpose for tptrs in blas?

I want to solve:
XA = B
But it seems that tptrs only lets me solve:
AX = B
Or, using the 'transpose' flag, in tptrs:
A'X = B
which, rearranging is:
(A'X)' = B'
X'A = B'
So, I can use it to solve XA = B, but I have to first transpose B manually myself, and then, again, transpose the answer. Am I missing some trick to avoid having to do the transpose?
TPTRS isn't a BLAS routine; it's an LAPACK routine.
If A is relatively small compared to B and X, then a good option is to unpack it into a "normal" triangular matrix and use the BLAS routine TRSM, which takes a "side" argument allowing you to specify XA = B. If A is mxm and B is nxm, the unpacking adds m^2 operations, which is a small amount of overhead compared to the O(nm^2) operations needed for the solve.
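The algebra behind the two routes (a right-side solve versus transposing, solving and transposing back) can be illustrated in MATLAB. This sketch does not call TPTRS or TRSM directly; it uses MATLAB's / and \ operators on a full, unpacked triangular matrix, and the sizes and names are assumptions for illustration.

```matlab
m = 4; n = 6;
A = triu(rand(m)) + m*eye(m);   % well-conditioned upper-triangular matrix
Xtrue = rand(n, m);
B = Xtrue * A;                  % right-hand side, so that X*A = B

% Right-side solve, the analogue of a "side = right" triangular solve.
Xright = B / A;

% Transpose route from the question: solve A.' * X.' = B.', then
% transpose the answer back.
Xtrans = (A.' \ B.').';
```

Both routes recover the same X up to rounding; the right-side solve simply avoids the two explicit transposes.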

Resources