I need to multiply parts of a column vector with a fixed row vector. I solved this problem using a for-loop. However, I am wondering if the performance can be improved as I have to perform this kind of computation around 50 million times. Here's my code so far:
multMat = 1:5;
mat = randi(5,10,1);
windowSize = 5;
vout = nan(10,1);
for r = windowSize : 10
vout(r) = multMat * mat( (r - windowSize + 1) : r);
end
I was thinking about uisng arrayfun. However, first I don't know how to adress the cell range (i.e. the previous five cells including the current cell), and second, I am not sure if arrayfun will be any faster than using the loop?
This sliding vector multiplication you're describing is an example of what is known as convolution. The following produces the same result as the loop in your example:
vout = [nan(windowSize-1,1);
conv(mat,flip(multMat),'valid')];
If your output doesn't really need the leading NaN values which aren't overwritten in your loop then the conv expression is sufficient without concatenating the NaN elements to it.
For sufficiently large vectors this is of course not guaranteed to be as fast as you'd like it to be, but MATLAB's built-in convolution implementation is likely to be pretty close to an optimal tool for the job.
Related
I imported code from Matlab to Octave and the speed of certain functions seems to have dropped.
I looked into vectorization and could not come up with a solution with my limited knowledge.
What i want to ask, is there a way to speed this up?
n = 181;
N = 250;
for i=1:n
for j=1:n
par=0;
for k=1:N;
par=par+log2(1+(10.^(matrix1(j,i,matrix2(j,i))./10)./(matrix3(j,i).*double1+double2)));
end
resultingMatrix(j,i)=2.^((1/N).*par)-1;
end
end
Where dimensions are:
matrix1 = 181x181x2,
matrix2 = 181x181 --> containing values either 1 or 2 only,
matrix3 = 181x181,
double1, double2 = just doubles
Here's my testing code, I've completed your code by making some random matrices:
n = 181;
N = 250;
matrix1 = rand(n,n,2);
matrix2 = randi(2,n,n);
matrix3 = rand(n,n);
double1 = 1;
double2 = 1;
tic
for i=1:n
for j=1:n
par=0;
for k=1:N
par=par+log2(1+(10.^(matrix1(j,i,matrix2(j,i))./10)./(matrix3(j,i).*double1+double2)));
end
resultingMatrix(j,i)=2.^((1/N).*par)-1;
end
end
toc
Note that the code inside the loop over k doesn't use k. This makes the loop superfluous. We can easily remove it. The loop does the same computation 250 times, adds up the results, and divides by 250, yielding the value of one of the repeated computations.
Another important thing to do is preallocate resultingMatrix, to avoid it growing with every loop iteration.
This is the resulting code:
tic
resultingMatrix2 = zeros(n,n);
for i=1:n
for j=1:n
par=log2(1+(10.^(matrix1(j,i,matrix2(j,i))./10)./(matrix3(j,i).*double1+double2)));
resultingMatrix2(j,i)=2.^par-1;
end
end
toc
max(abs((resultingMatrix(:)-resultingMatrix2(:))./resultingMatrix(:)))
The last line computes the maximum relative difference. It is 9.9424e-15 in my version of Octave. It will differ depending on the version, the system, and more. This error is the floating-point rounding error. Note that the original code, adding the same value 250 times, and then dividing it by 250, will produce a larger rounding error than the modified code. For example,
x = pi;
t = 0;
for i = 1:N
t = t + x;
end;
t = t / N;
t-x
gives -8.4377e-15, a similar rounding error to what we saw above.
The original code took 81.5 s, the modified code takes only 0.4 s. This is not a gain of vectorization, it is just a gain of preallocation and not needlessly repeating the same computation over and over again.
Next, we can remove the other two loops by vectorizing the operations. The difficult bit here is matrix1(j,i,matrix2(j,i)). We can produce each of the n*n linear indices with (1:n*n).' + (matrix2(:)-1)*(n*n). This is not trivial, I suggest you think about how this computation works. You need to know that linear indices count, starting at 1 for the top-left array element, first down, then right, then along the 3rd dimension. So 1:n*n is simply the linear indices for each of the elements of a 2D array, in order. To each of these we add n*n if we need to access the 2nd element along the 3rd dimension.
We now have the code
tic
index = reshape((1:n*n).' + (matrix2(:)-1)*(n*n), n, n);
par = log2(1+(10.^(matrix1(index)./10)./(matrix3.*double1+double2)));
resultingMatrix3 = 2.^par-1;
toc
max(abs((resultingMatrix(:)-resultingMatrix3(:))./resultingMatrix(:)))
This code produces the exact same result as my previous version, and runs in only 0.013 s, 30 times faster than the non-vectorized code, and 6000 times faster than the original code.
I have an issue with a code performing some array operations. It is too slow, because I use loops and input data are quite big. It was the easiest way for me, but now I am looking for something faster than for loops. I was trying to optimize or rewrite code, but unsuccessful. I really aprecciate Your help.
In my code I have three arrays x1, y1 (coordinates of points in grid), g1 (values in the points) and for example their size is 300 x 300. I treat each matrix as composition of 9 and I make calculation for points in the middle one. For example I start with g1(101,101), but I am using data from g1(1:201,1:201)=g2. I need to calculate distance from each point of g1(1:201,1:201) to g1(101,101) (ll matrix), then I calculate nn as it is in the code, next I find value for g1(101,101) from nn and put it in N array. Then I go to g1(101,102) and so on until g1(200,200), where in this last case g2=g1(99:300,99:300).
As i said, this code is not very efficient, even I have to use larger arrays than I gave in the example, it takes too much time. I hope I explain enough clearly what I expect from the code. I was thinking of using arrayfun, but I have never worked with this function, so I don't know how should use it, however it seems to me it won't handle. Maybe there are other solutions, however I couldn't find anything apropriate.
tic
x1=randn(300,300);
y1=randn(300,300);
g1=randn(300,300);
m=size(g1,1);
n=size(g1,2);
w=1/3*m;
k=1/3*n;
N=zeros(w,k);
for i=w+1:2*w
for j=k+1:2*k
x=x1(i,j);
y=y1(i,j);
x2=y1(i-k:i+k,j-w:j+w);
y2=y1(i-k:i+k,j-w:j+w);
g2=g1(i-k:i+k,j-w:j+w);
ll=1./sqrt((x2-x).^2+(y2-y).^2);
ll(isinf(ll))=0;
nn=ifft2(fft2(g2).*fft2(ll));
N(i-w,j-k)=nn(w+1,k+1);
end
end
czas=toc;
For what it's worth, arrayfun() is just a wrapper for a for loop, so it wouldn't lead to any performance improvements. Also, you probably have a typo in the definition of x2, I'll assume that it depends on x1. Otherwise it would be a superfluous variable. Also, your i<->w/k, j<->k/w pairing seems inconsistent, you should check that as well. Also also, just timing with tic/toc is rarely accurate. When profiling your code, put it in a function and run the timing multiple times, and exclude the variable generation from the timing. Even better: use the built-in profiler.
Disclaimer: this solution will likely not help for your actual problem due to its huge memory need. For your input of 300x300 matrices this works with arrays of size 300x300x100x100, which is usually a no-go. Still, it's here for reference with a smaller input size. I wanted to add a solution based on nlfilter(), but your problem seems to be too convoluted to be able to use that.
As always with vectorization, you can do it faster if you can spare the memory for it. You are trying to work with matrices of size [2*k+1,2*w+1] for each [i,j] index. This calls for 4d arrays, of shape [2*k+1,2*w+1,w,k]. For each element [i,j] you have a matrix with indices [:,:,i,j] to treat together with the corresponding elements of x1 and y1. It also helps that fft2 accepts multidimensional arrays.
Here's what I mean:
tic
x1 = randn(30,30); %// smaller input for tractability
y1 = randn(30,30);
g1 = randn(30,30);
m = size(g1,1);
n = size(g1,2);
w = 1/3*m;
k = 1/3*n;
%// these will be indexed on the fly:
%//x = x1(w+1:2*w,k+1:2*k); %// size [w,k]
%//y = x1(w+1:2*w,k+1:2*k); %// size [w,k]
x2 = zeros(2*k+1,2*w+1,w,k); %// size [2*k+1,2*w+1,w,k]
y2 = zeros(2*k+1,2*w+1,w,k); %// size [2*k+1,2*w+1,w,k]
g2 = zeros(2*k+1,2*w+1,w,k); %// size [2*k+1,2*w+1,w,k]
%// manual definition for now, maybe could be done smarter:
for ii=w+1:2*w %// don't use i and j as variables
for jj=k+1:2*k %// don't use i and j as variables
x2(:,:,ii-w,jj-k) = x1(ii-k:ii+k,jj-w:jj+w); %// check w vs k here
y2(:,:,ii-w,jj-k) = y1(ii-k:ii+k,jj-w:jj+w); %// check w vs k here
g2(:,:,ii-w,jj-k) = g1(ii-k:ii+k,jj-w:jj+w); %// check w vs k here
end
end
%// use bsxfun to operate on [2*k+1,2*w+1,w,k] vs [w,k]-sized arrays
%// need to introduce leading singletons with permute() in the latter
%// in order to have shape [1,1,w,k] compatible with the first array
ll = 1./sqrt(bsxfun(#minus,x2,permute(x1(w+1:2*w,k+1:2*k),[3,4,1,2])).^2 ...
+ bsxfun(#minus,y2,permute(y1(w+1:2*w,k+1:2*k),[3,4,1,2])).^2);
ll(isinf(ll)) = 0;
%// compute fft2, operating on [2*k+1,2*w+1,w,k]
%// will return fft2 for each index in the [w,k] subspace
nn = ifft2(fft2(g2).*fft2(ll));
%// we need nn(w+1,k+1,:,:) which is exactly of size [w,k] as needed
N = reshape(nn(w+1,k+1,:,:),[w,k]); %// quicker than squeeze()
N = real(N); %// this solution leaves an imaginary part of around 1e-12
czas=toc;
This is another step of my battle with multi-dimensional arrays in R, previous question is here :)
I have a big R array with the following dimensions:
> data = array(..., dim = c(x, y, N, value))
I'd like to perform a sort of bootstrap comparing the mean (see here for a discussion about it) obtained with:
> vmean = apply(data, c(1,2,3), mean)
With the mean obtained sampling the N values randomly with replacement, to explain better if data[1,1,,1] is equals to [v1 v2 v3 ... vN] I'd like to replace it with something like [v_k1 v_k2 v_k3 ... v_kN] with k values sampled with sample(N, N, replace = T).
Of course I want to AVOID a for loop. I've read this but I don't know how to perform an efficient indexing of this array avoiding a loop through x and y.
Any ideas?
UPDATE: the important thing here is that I want a different sample for each sample in the fourth (value) dimension, otherwise it would be simple to do something like:
> dataSample = data[,,sample(N, N, replace = T), ]
Also there's the compiler package which speeds up for loops by using a Just In Time compiler.
Adding thes lines at the top of your code enables the compiler for all code.
require("compiler")
compilePKGS(enable=T)
enableJIT(3)
setCompilerOptions(suppressAll=T)
I have the above loop running on the above variables:
A is a 2d array of size mxn.
mask is a 1d logical array of size 1xn
results is a 1d array of size 1xn
B is a vector of the form mx1
C is a mxm matrix, m is the same as the above.
Edit: expanded foo(x) into the function.
here is the code:
temp = (B.'*C*B);
for k = 1:n
x = A(:,k);
if(mask(k) == 1)
result(k) = (B.'*C*x)^2 / (temp*(x.'*C*x)); %returns scalar
end
end
take note, I am already successfully using the above code as a parfor loop instead of for. I was hoping you would be able to suggest some way to use meshgrid or the sort to yield better performance improvement. I don't think I have RAM problems so a solution can also be expensive memory wise.
Many thanks.
try this:
result=(B.'*C*A).^2./diag(temp*(A.'*C*A))'.*mask;
This vectorization via matrix multiplication will also make sure that result is a 1xn vector. In the code you provided there can be a case where the last elements in mask are zeros, in this case your code will truncate result to a smaller length, whereas, in the answer it'll keep these elements zero.
If your foo admits matrix input, you could do:
result = zeros(1,n); % preallocate result with zeros
mask = logical(mask); % make mask logical type
result(mask) = foo(A(mask),:); % compute foo for all selected columns
Here's the code I am using now, where decimal1 is an array of decimal values, and B is the number of bits in binary for each value:
for (i = 0:1:length(decimal1)-1)
out = dec2binvec(decimal1(i+1),B);
for (j = 0:B-1)
bit_stream(B*i+j+1) = out(B-j);
end
end
The code works, but it takes a long time if the length of the decimal array is large. Is there a more efficient way to do this?
bitstream = zeros(nelem * B,1);
for i = 1:nelem
bitstream((i-1)*B+1:i*B) = fliplr(dec2binvec(decimal1(i),B));
end
I think that should be correct and a lot faster (hope so :) ).
edit:
I think your main problem is that you probably don't preallocate the bit_stream matrix.
I tested both codes for speed and I see that yours is faster than mine (not very much tho), if we both preallocate bitstream, even though I (kinda) vectorized my code.
If we DONT preallocate the bitstream my code is A LOT faster. That happens because your code reallocates the matrix more often than mine.
So, if you know the B upfront, use your code, else use mine (of course both have to be modified a little bit to determine the length at runtime, which is no problem since dec2binvec can be called without the B parameter).
The function DEC2BINVEC from the Data Acquisition Toolbox is very similar to the built-in function DEC2BIN, so some of the alternatives discussed in this question may be of use to you. Here's one option to try, using the function BITGET:
decimal1 = ...; %# Your array of decimal values
B = ...; %# The number of bits to get for each value
nValues = numel(decimal1); %# Number of values in decimal1
bit_stream = zeros(1,nValues*B); %# Initialize bit stream
for iBit = 1:B %# Loop over the bits
bit_stream(iBit:B:end) = bitget(decimal1,B-iBit+1); %# Get the bit values
end
This should give the same results as your sample code, but should be significantly faster.