Currently, I am using a nested for loop to calculate a finite difference scheme. The array that is calculated depends both on the array values from the previous iteration of the outer loop and on the neighboring values inside the array.
for l=2:timesteps %start time loop
    %now calculate new concentration values for the current timestep
    for k=2:nodes-1 %start spatial loop over inner nodes
        conc(k,l)=conc(k,l-1)+(r*(conc(k-1,l-1)-(2*conc(k,l-1))+conc(k+1,l-1)));
    end %end spatial loop over inner nodes
    %calculate boundary nodes
    conc(1,l) = c_surface;
    conc(nodes,l)=conc(nodes,l-1)+(r*((2*conc(nodes-1,l-1))-(2*conc(nodes,l-1))));
end %end time loop
I would like a way to optimize the code for parallel computing.
I tried using the parfor feature, but I have realized that it does not work for loops with such dependencies: the workload is divided and sent to the workers, which do not communicate, and this is why parfor fails here. I wonder if there is an alternative.
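One alternative worth sketching (not from the original post; it assumes conc, r, c_surface, timesteps and nodes are defined as above): the time loop has to stay serial, because every timestep depends on the previous one, but the spatial loop can be replaced by a single vectorized update, which is usually faster than trying to parallelize such a small stencil:
for l = 2:timesteps
    prev = conc(:, l-1);                          % concentration at the previous timestep
    % apply the stencil to all inner nodes at once
    conc(2:nodes-1, l) = prev(2:nodes-1) ...
        + r*(prev(1:nodes-2) - 2*prev(2:nodes-1) + prev(3:nodes));
    conc(1, l)     = c_surface;                   % surface boundary node
    conc(nodes, l) = prev(nodes) + r*(2*prev(nodes-1) - 2*prev(nodes));
end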
I am trying to think of an approach to multiply a matrix and a vector using collective communication in MPI. The result of multiplying an M x N matrix by an N x 1 vector is an M x 1 vector. To parallelize this, we could run M+1 processes, with each of the M worker processes calculating one element of the resulting vector. I am trying to come up with an approach that involves a collective communication API like MPI_Reduce. I am not sure what would be an ideal approach in this case.
My Thoughts:
Consider one row of the matrix, x1, x2, ..., xn, and the vector [v1, v2, ..., vn]. The first element of the resulting vector will be the single value x1v1 + x2v2 + ... + xnvn. The only way I see to use the reduce function would be to have n processes each calculate one product xivi and then use MPI_Reduce to compute the sum of all of these, so the parent process would use reduce to calculate each row's value. This does sound like it will work, but I am not sure if there is a better way.
I was wondering whether the across structure loop uses the structure's own cursor or a separate one. Does it ensure that the cursor hasn't moved, and if so, how can this be expressed for other examples?
ITERABLE uses so-called external cursors, not internal ones merged with the underlying structure. As a result, iteration affects neither the structure nor any other cursor created the same way. This is important for supporting nested or recursive iterations. For example, to find out whether there are duplicates, one can do the following:
across structure as i loop
    across structure as j loop
        if i.item = j.item then print ("Duplicates found.") end
    end
end
Doing the same with internal cursors, as in the following (note: the code is incorrect),
from structure.start until structure.after loop
    x := structure.item
    from structure.start until structure.after loop
        if x = structure.item then print ("Duplicates found.") end
        structure.forth
    end
    structure.forth
end
does not work, because the inner loop also changes the cursor of the outer loop.
The limitation of the cursors associated with ITERABLE is that the associated structure should not be changed during the whole course of iteration. This is not a theoretical limitation, but a practical one, to simplify implementation and to make it a bit more efficient.
I am a beginner in Python, trying to implement computer vision algorithms. I have to iterate over an image read as a 2-dimensional array several times, and I want to avoid using for loops.
For example, I want to multiply the camera matrix P (3x4) with each row of a coordinate matrix, where each row has dimension 1x4. I will of course take the transpose of the row vector for the matrix multiplication. Here is how I have implemented it using for loops: I initialize an empty array; cameras is an object instance, so I loop over it to find the total number of cameras (counter gives me that total); then I read through each row of the matrix v_h and perform the multiplication. I would like to accomplish the task below without using a for loop in Python. I believe it is possible, but I don't know how to do it. With the number of points in the thousands, the for loop becomes very inefficient. I know my code is very inefficient and would appreciate any help.
for c in cameras:
    counter = counter + 1                # counter ends up as the total number of cameras
for c in cameras:
    v_to_s = np.zeros((v_h.shape[0], c.P.shape[0], counter), dtype=float)
    for i in range(0, v_h.shape[0]):
        v_to_s[i, :, cam_count] = np.dot(c.P, v_h[i, :].T)
numpy has matmul(), which can perform this multiplication for all rows at once: since c.P is 3x4 and v_h.T is 4xN, np.matmul(c.P, v_h.T) returns a 3xN result whose column i is the product with row i of v_h, with no Python loop over the points.
I want to have a time series of 2x2 complex matrices, Ot, and I then want one-line commands to multiply an array of complex vectors, Vt, by the array Ot, where the position in the array is understood as the time instant. I want Vtprime(i) = Ot(i)*Vt(i). Can anyone suggest a simple way to implement this?
Suppose I have a matrix, M(t), where the elements m(j,k) are functions of t and t is an element of some series (t = 0:0.1:3). Can I create an array of matrices very easily?
I understand how to have an array in Matlab, and even a two-dimensional array, where each "i" index holds two complex numbers (j=0,1). That would be a way to have a "time series of complex 2-d vectors". A way to have a time series of complex matrices would be a three-dimensional array, where (i,j,k) denotes the i-th matrix and j=0,1 and k=0,1 give the elements of that matrix.
If I go ahead and treat Matlab like a programming language with no special packages, then I end up having to write the matrix multiplications in terms of loops, etc. The same then applies to all the other matrix operations. I would prefer to use commands that will make all this very easy, if I can.
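As a sketch of the "array of matrices" part: a three-dimensional array can be filled from element functions of t like this (the particular m(j,k) functions used here are just placeholders for illustration):
t = 0:0.1:3;
M = complex(zeros(2, 2, numel(t)));            % one 2x2 complex matrix per time value
for n = 1:numel(t)
    % example m(j,k) functions of t; replace with the real ones
    M(:,:,n) = [exp(1i*t(n)), 0; 0, exp(-1i*t(n))];
end
Then M(:,:,n) is the matrix at time t(n).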
This could be solved with Matlab array iterations like
vtprime(:) = Ot(:)*Vt(:)
if I understand your problem correctly.
Since Ot and Vt are both changing with time index, I think the best way to do this is in a loop. (If only one of Ot or Vt was changing with time, you could set it up in one big matrix multiplication.)
Here's how I would set it up: Ot is a complex 2x2xI 3D matrix, so that
Ot(:,:,i)
references the matrix at time instant i.
Vt is a complex 2xI matrix, so that
Vt(:,i)
references the vector at time instant i.
To do the multiplication:
for i = 1:I
    Vtprime(:,i) = Ot(:,:,i) * Vt(:,i);
end
The resulting Vtprime is a 2xI matrix set up so that Vtprime(:,i) is the output at time instant i.
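For completeness, here is a self-contained version of that setup with preallocation (the length I = 100 and the random complex test data are just assumptions for illustration):
I = 100;                                    % number of time instants
Ot = rand(2,2,I) + 1i*rand(2,2,I);          % time series of complex 2x2 matrices
Vt = rand(2,I) + 1i*rand(2,I);              % time series of complex 2-vectors
Vtprime = complex(zeros(2, I));             % preallocate the result
for i = 1:I
    Vtprime(:,i) = Ot(:,:,i) * Vt(:,i);     % apply the matrix at instant i
end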
Currently I have the following portion of code:
for i = 2:N-1
    res(i) = k(i)/m(i)*x(i-1) -(c(i)+c(i+1))/m(i)*x(N+i) +e(i+1)/m(i)*x(i+1);
end
where the variables k, m, c and e are vectors of size N and x is a vector of size 2*N. Is there any way to do this a lot faster using something like arrayfun? I couldn't figure this out :( I especially want to make it faster by running it on the GPU later, so arrayfun would also be helpful, since MATLAB doesn't support parallelizing for loops and I don't want to buy the Jacket package...
Thanks a lot!
You don't have to use arrayfun. It works if you use some smart indexing:
clear all
N=20;
k=rand(N,1);
m=rand(N,1);
c=rand(N,1);
e=rand(N,1);
x=rand(2*N,1);
% for-based implementation
%Watch out, you are not filling the first element of forres!
forres=zeros(N-1,1); %Initialize array first to gain some speed.
for i = 2:N-1
    forres(i) = k(i)/m(i)*x(i-1) -(c(i)+c(i+1))/m(i)*x(N+i) +e(i+1)/m(i)*x(i+1);
end
%vectorized implementation
parres=k(2:N-1)./m(2:N-1).*x(1:N-2) -(c(2:N-1)+c(3:N))./m(2:N-1).*x(N+2:2*N-1) +e(3:N)./m(2:N-1).*x(3:N);
%compare results; strip the first element from forres
difference=forres(2:end)-parres %#ok<NOPTS>
Firstly, MATLAB does support parallel for loops via PARFOR. However, that doesn't have much chance of speeding up this sort of computation since the amount of computation is small compared to the amount of data you're reading and writing.
To restructure things for GPUArray "arrayfun", you need to make all the array references in the loop body refer to the loop iterate, and have the loop run across the full range. You should be able to do this by offsetting some of the arrays and padding with dummy values. For example, you could prepend your arrays with NaN and replace x(i-1) with a shifted copy x_1 = [NaN; x(1:N-1)], so that x_1(i) holds x(i-1).
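A minimal sketch of that restructuring, assuming the same N, k, m, c, e and x as in the example above and that Parallel Computing Toolbox is available (here the arrays are pre-sliced to the inner range, which gives the same per-element alignment as padding):
% move the data to the GPU
kg = gpuArray(k); mg = gpuArray(m); cg = gpuArray(c);
eg = gpuArray(e); xg = gpuArray(x);
res = zeros(N, 1, 'gpuArray');
% every input below is indexed by the same element position,
% so arrayfun can evaluate all inner nodes element-wise on the GPU
res(2:N-1) = arrayfun(@(ki, mi, ci, cip1, eip1, xim1, xNpi, xip1) ...
    ki/mi*xim1 - (ci + cip1)/mi*xNpi + eip1/mi*xip1, ...
    kg(2:N-1), mg(2:N-1), cg(2:N-1), cg(3:N), eg(3:N), ...
    xg(1:N-2), xg(N+2:2*N-1), xg(3:N));
res = gather(res);                          % bring the result back to the CPU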