I have an issue with a code performing some array operations. It is too slow, because I use loops and input data are quite big. It was the easiest way for me, but now I am looking for something faster than for loops. I was trying to optimize or rewrite code, but unsuccessful. I really aprecciate Your help.
In my code I have three arrays x1, y1 (coordinates of points in grid), g1 (values in the points) and for example their size is 300 x 300. I treat each matrix as composition of 9 and I make calculation for points in the middle one. For example I start with g1(101,101), but I am using data from g1(1:201,1:201)=g2. I need to calculate distance from each point of g1(1:201,1:201) to g1(101,101) (ll matrix), then I calculate nn as it is in the code, next I find value for g1(101,101) from nn and put it in N array. Then I go to g1(101,102) and so on until g1(200,200), where in this last case g2=g1(99:300,99:300).
As i said, this code is not very efficient, even I have to use larger arrays than I gave in the example, it takes too much time. I hope I explain enough clearly what I expect from the code. I was thinking of using arrayfun, but I have never worked with this function, so I don't know how should use it, however it seems to me it won't handle. Maybe there are other solutions, however I couldn't find anything apropriate.
tic
x1=randn(300,300);
y1=randn(300,300);
g1=randn(300,300);
m=size(g1,1);
n=size(g1,2);
w=1/3*m;
k=1/3*n;
N=zeros(w,k);
for i=w+1:2*w
for j=k+1:2*k
x=x1(i,j);
y=y1(i,j);
x2=y1(i-k:i+k,j-w:j+w);
y2=y1(i-k:i+k,j-w:j+w);
g2=g1(i-k:i+k,j-w:j+w);
ll=1./sqrt((x2-x).^2+(y2-y).^2);
ll(isinf(ll))=0;
nn=ifft2(fft2(g2).*fft2(ll));
N(i-w,j-k)=nn(w+1,k+1);
end
end
czas=toc;
For what it's worth, arrayfun() is just a wrapper for a for loop, so it wouldn't lead to any performance improvements. Also, you probably have a typo in the definition of x2, I'll assume that it depends on x1. Otherwise it would be a superfluous variable. Also, your i<->w/k, j<->k/w pairing seems inconsistent, you should check that as well. Also also, just timing with tic/toc is rarely accurate. When profiling your code, put it in a function and run the timing multiple times, and exclude the variable generation from the timing. Even better: use the built-in profiler.
Disclaimer: this solution will likely not help for your actual problem due to its huge memory need. For your input of 300x300 matrices this works with arrays of size 300x300x100x100, which is usually a no-go. Still, it's here for reference with a smaller input size. I wanted to add a solution based on nlfilter(), but your problem seems to be too convoluted to be able to use that.
As always with vectorization, you can do it faster if you can spare the memory for it. You are trying to work with matrices of size [2*k+1,2*w+1] for each [i,j] index. This calls for 4d arrays, of shape [2*k+1,2*w+1,w,k]. For each element [i,j] you have a matrix with indices [:,:,i,j] to treat together with the corresponding elements of x1 and y1. It also helps that fft2 accepts multidimensional arrays.
Here's what I mean:
tic
x1 = randn(30,30); %// smaller input for tractability
y1 = randn(30,30);
g1 = randn(30,30);
m = size(g1,1);
n = size(g1,2);
w = 1/3*m;
k = 1/3*n;
%// these will be indexed on the fly:
%//x = x1(w+1:2*w,k+1:2*k); %// size [w,k]
%//y = x1(w+1:2*w,k+1:2*k); %// size [w,k]
x2 = zeros(2*k+1,2*w+1,w,k); %// size [2*k+1,2*w+1,w,k]
y2 = zeros(2*k+1,2*w+1,w,k); %// size [2*k+1,2*w+1,w,k]
g2 = zeros(2*k+1,2*w+1,w,k); %// size [2*k+1,2*w+1,w,k]
%// manual definition for now, maybe could be done smarter:
for ii=w+1:2*w %// don't use i and j as variables
for jj=k+1:2*k %// don't use i and j as variables
x2(:,:,ii-w,jj-k) = x1(ii-k:ii+k,jj-w:jj+w); %// check w vs k here
y2(:,:,ii-w,jj-k) = y1(ii-k:ii+k,jj-w:jj+w); %// check w vs k here
g2(:,:,ii-w,jj-k) = g1(ii-k:ii+k,jj-w:jj+w); %// check w vs k here
end
end
%// use bsxfun to operate on [2*k+1,2*w+1,w,k] vs [w,k]-sized arrays
%// need to introduce leading singletons with permute() in the latter
%// in order to have shape [1,1,w,k] compatible with the first array
ll = 1./sqrt(bsxfun(#minus,x2,permute(x1(w+1:2*w,k+1:2*k),[3,4,1,2])).^2 ...
+ bsxfun(#minus,y2,permute(y1(w+1:2*w,k+1:2*k),[3,4,1,2])).^2);
ll(isinf(ll)) = 0;
%// compute fft2, operating on [2*k+1,2*w+1,w,k]
%// will return fft2 for each index in the [w,k] subspace
nn = ifft2(fft2(g2).*fft2(ll));
%// we need nn(w+1,k+1,:,:) which is exactly of size [w,k] as needed
N = reshape(nn(w+1,k+1,:,:),[w,k]); %// quicker than squeeze()
N = real(N); %// this solution leaves an imaginary part of around 1e-12
czas=toc;
Related
I need to multiply parts of a column vector with a fixed row vector. I solved this problem using a for-loop. However, I am wondering if the performance can be improved as I have to perform this kind of computation around 50 million times. Here's my code so far:
multMat = 1:5;
mat = randi(5,10,1);
windowSize = 5;
vout = nan(10,1);
for r = windowSize : 10
vout(r) = multMat * mat( (r - windowSize + 1) : r);
end
I was thinking about uisng arrayfun. However, first I don't know how to adress the cell range (i.e. the previous five cells including the current cell), and second, I am not sure if arrayfun will be any faster than using the loop?
This sliding vector multiplication you're describing is an example of what is known as convolution. The following produces the same result as the loop in your example:
vout = [nan(windowSize-1,1);
conv(mat,flip(multMat),'valid')];
If your output doesn't really need the leading NaN values which aren't overwritten in your loop then the conv expression is sufficient without concatenating the NaN elements to it.
For sufficiently large vectors this is of course not guaranteed to be as fast as you'd like it to be, but MATLAB's built-in convolution implementation is likely to be pretty close to an optimal tool for the job.
I have a vector A of size 7812x1 and would like to calculate the sum of fixed windows of length 21 (so 372 blocks). This should be reiterated, so that the output should return a vector of size 372x1.
I have t=7812, p=372, w=21;
for t=1:p
out = sum(A((t*w-w+1):(t*w)));
end
This code, however, does not work. My idea is that the part ((t*w-w+1):(t*w)) allows for something like a rolling window. The window is of length 21, so there is not really a need to express is with variables, yet I think it keeps some flexibility.
I've seen potentially related questions (such a partial sum of a vector), yet I'm not sure whether this would result the output desired.
Reshape into a matrix so that each block of A is a column, and compute the sum of each colum:
result = sum(reshape(A, w, []), 1);
Following your idea of using a rolling/moving window (requires Matlab 2016a or later):
t = 7812; w = 21; % your parameters
A = rand(t,1); % generate some test data
B = movsum(A,w); % the sum of a moving window with width w
out = B(ceil(w/2):w:end); % get every w'th element
I am getting an error using repmat. My Matlab version is 2017a. "Requested 3711450x2726 (75.4GB) array exceeds maximum array size..." First, some context.
I have an adjacency matrix of social network data call it D. D is 2725x2725 with 1s denoting a link between agents i and j and 0s otherwise. I have been provided a function and sub-functions for a network formation model. There are K regressors (x variables). The model requires forming a dyad-specific regressor matrix W that is W = 0.5N(N-1) x K. In my data, this is 3711450 x K. For a start, I select only one x variable so K=1.
In the main function, there are two steps. The first step calculates the joint MLE from a logit. I have a problem in the second step computation of the variance covariance matrix with array size. Inside this step, there is a calculation that creates a 3711450 x n (2725) matrix using repmat.
INFO = ((repmat((exp_Xbeta ./ (1+exp_Xbeta).^2),1,K) .* X)'*X);
exp_Xbeta is 3711450 x K and X is a sparse 3711450 x 2725 matrix with Bytes = 178171416 of class double. The error occurs at INFO.
I've tried converting X to a tall matrix but thus far no joy. I've tried adding sparse to the INFO line but again no joy. Anyone have any ideas short of going to a cluster or getting more ram? Could I somehow convert X from a sparse matrix to a full matrix inside a datastore and then call the datastore using tall? I have not been able to figure out how to do that if it is possible.
Once INFO is constructed as an array it will be used later in one of the sub-functions. So, it needs to be callable. In case you're curious, INFO is the second derivative matrix.
I have found that producing the INFO matrix all at once was too much for my memory constraints. I split up the steps, but still, repmat and subsequent steps were a problem. Now, I've turned to building up the INFO matrix one step at a time, while never holding more than exp_Xbeta, X, and two vectors in memory. Replacing the construction of INFO with
for i = 1:d
s1_i = step1(:,1).*X(:,i);
s1_i = s1_i';
for j = 1:d;
INFO(i,j) = s1_i*X(:,j);
end
clear s1_i;
end
has dropped the memory requirement, though its slow, and things seem to be working. For anyone interested, below is a little example illustrating the point.
clear all
N = 20
n = 0.5*N*(N-1)
exp_Xbeta = rand(n,1);
X = rand(n,N);
step1 = (exp_Xbeta ./ (1+exp_Xbeta).^2);
[c,d] = size(X);
INFO = zeros(d,d);
for i = 1:d
s1_i = step1(:,1).*X(:,i)
s1_i = s1_i'
for j = 1:d
INFO(i,j) = s1_i*X(:,j)
end
clear s1_i
end
K = 1
INFO2 = ((repmat((exp_Xbeta ./ (1+exp_Xbeta).^2),1,K) .* X)'*X);
% Methods produce equivalent matrices
INFO
INFO2
I have a three-dimensional matrix of these sizes, approximately
A = rand(20, 1000, 20);
where the first and third dimensions are always the same length. I want to zero the elements in a main diagonal slice. This does what I mean
for ii = 1:size(A, 1)
A(ii, :, ii) = 0;
end
Is there a vectorized or otherwise faster way to do this? This code runs about 100,000 times, with these approximate sizes, but not the exact same sizes each time.
You can use logical indexing for multible tailing dimensions while using subscript indexing for all previous dimensions individually. This way you can easily do the operation on an 1000 20 20 matrix. To apply this to your matrix, permute is required which might be slow:
n=size(A,3)
A=permute(A,[2,1,3]);
A(:,diag(true(n,1)))=0;
A=permute(A,[2,1,3]);
If it would be possible to permanently swap the dimensions of A in your code and avoid the permute, this would lead to the fastest solution.
Alternatively you can use repmat to expand the index to the dimensions of A
ix=repmat(reshape(diag(true(n,1)),n,1,n),[1,size(A,2),1])
A(ix)=0
For matrices of the same size you could keep ix. Not having access to MATLAB right now, I don't know which solution is faster.
You can use bsxfun to build a linear index of the elements to be zeroed:
ind = bsxfun(#plus, (0:size(A,2)-1).'*size(A,1), 1:size(A,1)*size(A,2)+1:numel(A) );
A(ind) = 0;
I have the above loop running on the above variables:
A is a 2d array of size mxn.
mask is a 1d logical array of size 1xn
results is a 1d array of size 1xn
B is a vector of the form mx1
C is a mxm matrix, m is the same as the above.
Edit: expanded foo(x) into the function.
here is the code:
temp = (B.'*C*B);
for k = 1:n
x = A(:,k);
if(mask(k) == 1)
result(k) = (B.'*C*x)^2 / (temp*(x.'*C*x)); %returns scalar
end
end
take note, I am already successfully using the above code as a parfor loop instead of for. I was hoping you would be able to suggest some way to use meshgrid or the sort to yield better performance improvement. I don't think I have RAM problems so a solution can also be expensive memory wise.
Many thanks.
try this:
result=(B.'*C*A).^2./diag(temp*(A.'*C*A))'.*mask;
This vectorization via matrix multiplication will also make sure that result is a 1xn vector. In the code you provided there can be a case where the last elements in mask are zeros, in this case your code will truncate result to a smaller length, whereas, in the answer it'll keep these elements zero.
If your foo admits matrix input, you could do:
result = zeros(1,n); % preallocate result with zeros
mask = logical(mask); % make mask logical type
result(mask) = foo(A(mask),:); % compute foo for all selected columns