I have the antenna array factor expression here:
I have coded the array factor expression as given below:
lambda = 1;
M = 100;N = 200; %an M x N array
dx = 0.3*lambda; %inter-element spacing in x direction
m = 1:M;
xm = (m - 0.5*(M+1))*dx; %element positions in x direction
dy = 0.4*lambda;
n = 1:N;
yn = (n - 0.5*(N+1))*dy;
thetaCount = 360; % no of theta values
thetaRes = 2*pi/thetaCount; % theta resolution
thetas = 0:thetaRes:2*pi-thetaRes; % theta values
phiCount = 180;
phiRes = pi/phiCount;
phis = -pi/2:phiRes:pi/2-phiRes;
cmpWeights = rand(N,M); %complex Weights
AF = zeros(phiCount,thetaCount); %Array factor
tic
for i = 1:phiCount
for j = 1:thetaCount
for p = 1:M
for q = 1:N
AF(i,j) = AF(i,j) + cmpWeights(q,p)*exp((2*pi*1j/lambda)*(xm(p)*sin(thetas(j))*cos(phis(i)) + yn(q)*sin(thetas(j))*sin(phis(i))));
end
end
end
end
How can I vectorize the code for calculating the Array Factor (AF).
I want the line:
AF(i,j) = AF(i,j) + cmpWeights(q,p)*exp((2*pi*1j/lambda)*(xm(p)*sin(thetas(j))*cos(phis(i)) + yn(q)*sin(thetas(j))*sin(phis(i))));
to be written in vectorized form (by modifying the for loop).
Approach #1: Full-throttle
The innermost nested loop generates this every iteration - cmpWeights(q,p)*exp((2*pi*1j/lambda)*(xm(p)*sin(thetas(j))*cos(phis(i)) + yn(q)*sin(thetas(j))*sin(phis(i)))), which are to summed up iteratively to give us the final output in AF.
Let's call the exp(.... part as B. Now, B basically has two parts, one is the scalar (2*pi*1j/lambda) and the other part
(xm(p)*sin(thetas(j))*cos(phis(i)) + yn(q)*sin(thetas(j))*sin(phis(i))) that is formed from the variables that are dependent on
the four iterators used in the original loopy versions - i,j,p,q. Let's call this other part as C for easy reference later on.
Let's put all that into perspective:
Loopy version had AF(i,j) = AF(i,j) + cmpWeights(q,p)*exp((2*pi*1j/lambda)*(xm(p)*sin(thetas(j))*cos(phis(i)) + yn(q)*sin(thetas(j))*sin(phis(i)))), which is now equivalent to AF(i,j) = AF(i,j) + cmpWeights(q,p)*B, where B = exp((2*pi*1j/lambda)*(xm(p)*sin(thetas(j))*cos(phis(i)) + yn(q)*sin(thetas(j))*sin(phis(i)))).
B could be simplified to B = exp((2*pi*1j/lambda)* C), where C = (xm(p)*sin(thetas(j))*cos(phis(i)) + yn(q)*sin(thetas(j))*sin(phis(i))).
C would depend on the iterators - i,j,p,q.
So, after porting onto a vectorized way, it would end up as this -
%// 1) Define vectors corresponding to iterators used in the loopy version
I = 1:phiCount;
J = 1:thetaCount;
P = 1:M;
Q = 1:N;
%// 2) Create vectorized version of C using all four vector iterators
mult1 = bsxfun(#times,sin(thetas(J)),cos(phis(I)).'); %//'
mult2 = bsxfun(#times,sin(thetas(J)),sin(phis(I)).'); %//'
mult1_xm = bsxfun(#times,mult1(:),permute(xm,[1 3 2]));
mult2_yn = bsxfun(#times,mult2(:),yn);
C_vect = bsxfun(#plus,mult1_xm,mult2_yn);
%// 3) Create vectorized version of B using vectorized C
B_vect = reshape(exp((2*pi*1j/lambda)*C_vect),phiCount*thetaCount,[]);
%// 4) Final output as matrix multiplication between vectorized versions of B and C
AF_vect = reshape(B_vect*cmpWeights(:),phiCount,thetaCount);
Approach #2: Less-memory intensive
This second approach would reduce the memory traffic and it uses the distributive property of exponential - exp(A+B) = exp(A)*exp(B).
Now, the original loopy version was this -
AF(i,j) = AF(i,j) + cmpWeights(q,p)*exp((2*pi*1j/lambda)*...
(xm(p)*sin(thetas(j))*cos(phis(i)) + yn(q)*sin(thetas(j))*sin(phis(i))))
So, after using the distributive property, we would endup with something like this -
K = (2*pi*1j/lambda)
part1 = K*xm(p)*sin(thetas(j))*cos(phis(i));
part2 = K*yn(q)*sin(thetas(j))*sin(phis(i));
AF(i,j) = AF(i,j) + cmpWeights(q,p)*exp(part1)*exp(part2);
Thus, the relevant vectorized approach would become something like this -
%// 1) Define vectors corresponding to iterators used in the loopy version
I = 1:phiCount;
J = 1:thetaCount;
P = 1:M;
Q = 1:N;
%// 2) Define the constant used at the start of EXP() call
K = (2*pi*1j/lambda);
%// 3) Perform the sine-cosine operations part1 & part2 in vectorized manners
mult1 = K*bsxfun(#times,sin(thetas(J)),cos(phis(I)).'); %//'
mult2 = K*bsxfun(#times,sin(thetas(J)),sin(phis(I)).'); %//'
%// Perform exp(part1) & exp(part2) in vectorized manners
part1_vect = exp(bsxfun(#times,mult1(:),xm));
part2_vect = exp(bsxfun(#times,mult2(:),yn));
%// Perform multiplications with cmpWeights for final output
AF = reshape(sum((part1_vect*cmpWeights.').*part2_vect,2),phiCount,[])
Quick Benchmarking
Here are the runtimes with the input data listed in the question for the original loopy approach and proposed approach #2 -
---------------------------- With Original Approach
Elapsed time is 358.081507 seconds.
---------------------------- With Proposed Approach #2
Elapsed time is 0.405038 seconds.
The runtimes suggests a crazy performance improvement with Approach #2!
The basic trick is to figure out what things are constant, and what things depend on the subscript term - and therefore are matrix terms.
Within the sum:
C(n,m) is a matrix
2π/λ is a constant
sin(θ)cos(φ) is a constant
x(m) and y(n) are vectors
So the two things I would do are:
Expand the xm and ym into matrices using meshgrid()
Take all the constant term stuff outside the loop.
Like this:
...
piFactor = 2 * pi * 1j / lambda;
[xgrid, ygrid] = meshgrid(xm, ym); % xgrid and ygrid will be size (N, M)
for i = 1:phiCount
for j = 1:thetaCount
xFactor = sin(thetas(j)) * cos(phis(i));
yFactor = sin(thetas(j)) * sin(phis(i));
expFactor = exp(piFactor * (xgrid * xFactor + ygrid * yFactor)); % expFactor is size (N, M)
elements = cmpWeights .* expFactor; % elements of sum, size (N, M)
AF(i, j) = AF(i, j) + sum(elements(:)); % sum and then integrate.
end
end
You could probably figure out how to vectorise the outer loop too, but hopefully that gives you a starting point.
Related
I have a code which is doing some processing over every point in a 3D matrix. The array input_vec_1D is accessed by an unusual index ind_prime which depends on the loop variable (for context, the index is determined from an algorithm that I am using in Eq. 42e of this paper, and my full code is here). I have managed to get it working correctly by first turning the matrix to a 1D array, calculating the correct indices, doing the processing, and reshaping back to 3D afterwards:
Nx = 8; Ny = 6; Nz = 4; Ntot = Nx*Ny*Nz; % Number of points
xvals = rand(1,Nx); yvals = rand(1,Ny); zvals = rand(1,Nz); % Grid vectors
input_vec_3D = rand(Ny,Nx,Nz); % Dummy 3D array
factor1 = 3.6*xvals; % some constant times xvals
factor2 = 1.2*yvals;
factor3 = 8.5*zvals;
input_vec_1D = reshape( permute(input_vec_3D,[3,1,2]) , [Ntot 1]); % Reshape to 1D for loop
output_vec = zeros(Ntot,1);
for ind = 1:Ntot
j1 = floor( floor( (ind-1)/Nz ) /Ny ) + 1;
j2 = mod( floor( (ind-1)/Nz ) , Ny ) + 1;
j3 = mod( (ind-1) , Nz ) + 1;
n1 = mod( 5*(j1-1) ,Nx);
n2 = mod( 3*(j2-1) ,Ny);
n3 = mod( 2*(j3-1) ,Nz);
ind_prime = mod( ( n3 + Nz*(n2 + Ny*n1) ) , Ntot ) + 1; % a different index for input_vec
output_vec(ind) = output_vec(ind) + input_vec_1D(ind_prime) * factor1(j1)*factor2(j2)*factor3(j3);
end
output_vec = permute( reshape( output_vec, [Nz,Ny,Nx] ) , [2,3,1] ); % Reshape back to 3D
This loop over all elements is the slowest part of my code, so I would like to speed it up - by vectorizing or otherwise.
My arrays are typically 512x512x1024 complex doubles, so it is crucial for my application that I did not store any temporary extra large matrices due to limited RAM (around 6 GB), which precludes the use of meshgrid() to generate the factors (notice that factor1, factor2, factor3 are only 1D vectors, so memory usage for them is small).
I was kindly helped with a very similar loop here, which was solved using Matlab's implicit expansion in that case. However, this is more complicated, because in the processing line different indices are used ind_prime, ind, and j.
You can vectorise this entirely, which will be quicker (about 50% by my tests of 512x512x10 input matrix). But that involves the creation of several arrays, which we can reduce the size of in two ways
Use an integer data type (e.g. uint32) for the indicies. uint32 is 4 bytes per index, compared to 8 bytes for a double so that's a decent saving, especially when indices are always integers anyway. Note to make the most of this you have to use idivide instead of ./ to avoid MATLAB converting to double internally, and you have to convert Nx/Ny/Nz to uint32 for the same reason.
We can't use uint8 or uint16 as their max values are too small to cater to your large arrays.
Also note that (by default) idivide uses fix rounding, i.e. rounding towards 0, so you can skip using floor and maybe make up a small amount of performance there.
Recycle your arrays. Instead of using j1..3 and n1..3, as 6 indexing arrays, we can just reorder the operations slightly and recycle j1..3.
This comes together like so:
IND = uint32(1:Ntot); % Shorthand, could skip defining this and write each tiem if memory was tight
Nx = uint32(Nx); Ny = uint32(Ny); Nz = uint32(Nz); Ntot = Nx*Ny*Nz; % uint8 conversion
J1 = idivide( idivide(IND-1,Nz), Ny ) + 1; % idivide to avoid "double" casting, does "fix" rounding
J2 = mod( idivide( IND-1, Nz ), Ny ) + 1; % idivide to avoid "double" casting, does "fix" rounding
J3 = mod( IND-1, Nz ) + 1;
FAC = (factor1(J1).*factor2(J2).*factor3(J3)).'; % done here so we can recycle J1..3
J1 = mod( 5*(J1-1), Nx );
J2 = mod( 3*(J2-1), Ny );
J3 = mod( 2*(J3-1), Nz );
ind_prime2 = (mod( ( J3 + Nz*(J2 + Ny*J1) ) , Ntot ) + 1).';
output_vec3 = input_vec_1D(ind_prime2) .* FAC;
I would like to solve the transient diffusion equation for two compounds A and B as shown in image. I think the image is a better way to show my problem.
Diffusion equations and boundary conditions.
As you can see, the reaction only occurs at the surface and the flux of A is equal to flux of B. So, this two equations are coupled only at surface. The boundary condition is similar to ROBIN boundary condition, explained in Fipy manual. However, the main difference is the existence of the second variable in boundary condition. Does anybody have any idea how to formulate this boundary condition in Fipy?
I guess I need to add some extra term to ROBIN boundary condition, but I couldn't figure it out.
I really appreciate your help.
This is the code which solves the mentioned equation with ROBIN boundary condition # x=0.
-D(dC_A/dx) = -kC_A
-D(dC_B/dx) = -kC_B
In this condition, I can easily use ROBIN boundary condition to solve equations. The results seem reasonable for this boundary condition.
"""
Question for StackOverflow
"""
#%%
from fipy import Variable, FaceVariable, CellVariable, Grid1D, TransientTerm, DiffusionTerm, Viewer, ImplicitSourceTerm
from fipy.tools import numerix
#%%
##### Model parameters
L= 8.4853e-4 # m boundary layer thickness
dx= 1e-8 # mesh size
nx = int(L/dx)+1 # number of meshes
D = 1e-9 # m^2/s diffusion coefficient
k = 1e-4 # m/s reaction coefficient R = k [c_A],
c_inf = 0. # ROBIN general condition, once can think R = k ([c_A]-[c_inf])
c_init = 1. # Initial concentration of compound A, mol/m^3
#%%
###### Meshing and variable definition
mesh = Grid1D(nx=nx, dx=dx)
c_A = CellVariable(name="c_A", hasOld = True,
mesh=mesh,
value=c_init)
c_B = CellVariable(name="c_B", hasOld = True,
mesh=mesh,
value=0.)
#%%
##### Right boundary condition
valueRight = c_init
c_A.constrain(valueRight, mesh.facesRight)
c_B.constrain(0., mesh.facesRight)
#%%
### ROBIN BC requirements, defining cellDistanceVectors
## This code is for fixing celldistance via this link:
## https://stackoverflow.com/questions/60073399/fipy-problem-with-grid2d-celltofacedistancevectors-gives-error-uniformgrid2d
MA = numerix.MA
tmp = MA.repeat(mesh._faceCenters[..., numerix.NewAxis,:], 2, 1)
cellToFaceDistanceVectors = tmp - numerix.take(mesh._cellCenters, mesh.faceCellIDs, axis=1)
tmp = numerix.take(mesh._cellCenters, mesh.faceCellIDs, axis=1)
tmp = tmp[..., 1,:] - tmp[..., 0,:]
cellDistanceVectors = MA.filled(MA.where(MA.getmaskarray(tmp), cellToFaceDistanceVectors[:, 0], tmp))
#%%
##### Defining mask and Robin BC at left boundary
mask = mesh.facesLeft
Gamma0 = D
Gamma = FaceVariable(mesh=mesh, value=Gamma0)
Gamma.setValue(0., where=mask)
dPf = FaceVariable(mesh=mesh,
value=mesh._faceToCellDistanceRatio * cellDistanceVectors)
n = mesh.faceNormals
a = FaceVariable(mesh=mesh, value=k, rank=1)
b = FaceVariable(mesh=mesh, value=D, rank=0)
g = FaceVariable(mesh=mesh, value= k * c_inf, rank=0)
RobinCoeff = (mask * Gamma0 * n / (-dPf.dot(a)+b))
#%%
#### Making a plot
viewer = Viewer(vars=(c_A, c_B),
datamin=-0.2, datamax=c_init * 1.4)
viewer.plot()
#%% Time step and simulation time definition
time = Variable()
t_simulation = 4 # seconds
timeStepDuration = .05
steps = int(t_simulation/timeStepDuration)
#%% PDE Equations
eqcA = (TransientTerm(var=c_A) == DiffusionTerm(var=c_A, coeff=Gamma) +
(RobinCoeff * g).divergence
- ImplicitSourceTerm(var=c_A, coeff=(RobinCoeff * a.dot(-n)).divergence))
eqcB = (TransientTerm(var=c_B) == DiffusionTerm(var=c_B, coeff=Gamma) -
(RobinCoeff * g).divergence
+ ImplicitSourceTerm(var=c_B, coeff=(RobinCoeff * a.dot(-n)).divergence))
#%% A loop for solving PDE equations
while time() <= (t_simulation):
time.setValue(time() + timeStepDuration)
c_B.updateOld()
c_A.updateOld()
res1=res2 = 1e10
viewer.plot()
while (res1 > 1e-6) & (res2 > 1e-6):
res1 = eqcA.sweep(var=c_A, dt=timeStepDuration)
res2 = eqcB.sweep(var=c_B, dt=timeStepDuration)
It's possible to solve this as a fully implicit system. The code below simplifies the problem to have a unity domain size and diffusion coefficient. k is set to 0.2. It captures the analytical solution quite well with some caveats (see below).
from fipy import (
CellVariable,
TransientTerm,
DiffusionTerm,
ImplicitSourceTerm,
Grid1D,
Viewer,
)
L = 1.0
nx = 1000
dx = L / nx
konstant = 0.2
coeff = 1.0
mesh = Grid1D(nx=nx, dx=dx)
var_a = CellVariable(mesh=mesh, value=1.0, hasOld=True)
var_b = CellVariable(mesh=mesh, value=0.0, hasOld=True)
var_a.constrain(1.0, mesh.facesRight)
var_b.constrain(0.0, mesh.facesRight)
coeff_mask = ~mesh.facesLeft * coeff
boundary_coeff = konstant * (mesh.facesLeft * mesh.faceNormals).divergence
eqn_a = TransientTerm(var=var_a) == DiffusionTerm(
coeff_mask, var=var_a
) - ImplicitSourceTerm(boundary_coeff, var=var_a) + ImplicitSourceTerm(
boundary_coeff, var=var_b
)
eqn_b = TransientTerm(var=var_b) == DiffusionTerm(
coeff_mask, var=var_b
) - ImplicitSourceTerm(boundary_coeff, var=var_b) + ImplicitSourceTerm(
boundary_coeff, var=var_a
)
eqn = eqn_a & eqn_b
for _ in range(5):
var_a.updateOld()
var_b.updateOld()
eqn.sweep(dt=1e10)
Viewer((var_a, var_b)).plot()
print("var_a[0] (expected):", (1 + konstant) / (1 + 2 * konstant))
print("var_b[0] (expected):", konstant / (1 + 2 * konstant))
print("var_a[0] (actual):", var_a[0])
print("var_b[0] (actual):", var_b[0])
input("wait")
Note the following:
As written the boundary condition is only first order accurate, which doesn't really matter for this problem, but might hurt you for in higher dimensions. There might be ways to fix this such as having a small cell near the boundary or adding in an explicit second order correction for the boundary condition.
The equations are coupled here. If uncoupled it would probably require loads of iterations to reach equilibrium.
It did require a few iterations to reach equilibrium, but it shouldn't. That's probably due to the solver not converging adequately without a few tries. It might be that coupled equations have some bad conditioning.
I have an nx4 matrix A representing n spheres, and an mx3 matrix B representing m points. I need to test whether these m points are inside any of the spheres. I can do this using a for loop, but with large n and m this method is very inefficient. How can I vectorize this operation? My current method is
A = [0.8622 1.1594 0.7457 0.6925;
1.4325 0.2559 0.0520 0.4687;
1.8465 0.3979 0.2850 0.4259;
1.4387 0.8713 1.6585 0.4616;
0.2383 1.5208 0.5415 0.9417;
1.6812 0.2045 0.1290 0.1972];
B = [0.5689 0.9696 0.8196;
0.5211 0.4462 0.6254;
0.9000 0.4894 0.2202;
0.4192 0.9229 0.4639];
for i=1:size(B,1)
mask = vecnorm(A(:, 1:3) - B(i,:), 2, 2) < A(:, 4);
if sum(mask) > 0
C(i) = true;
else
C(i) = false;
end %if
end %for
I tested the method suggested by #LuisMendo, and it seems it only speeds up the calculation for quite small m and n, but for large m and n, say, around 10000 for my problem, the improvement is very limited. But #NickyMattsson gave me some hint. Because logical operation in matlab is faster than vecnorm, I first use a rough check to find the spheres near the point, and then do a fine check:
A = [0.8622 1.1594 0.7457 0.6925;
1.4325 0.2559 0.0520 0.4687;
1.8465 0.3979 0.2850 0.4259;
1.4387 0.8713 1.6585 0.4616;
0.2383 1.5208 0.5415 0.9417;
1.6812 0.2045 0.1290 0.1972];
B = [0.5689 0.9696 0.8196;
0.5211 0.4462 0.6254;
0.9000 0.4894 0.2202;
0.4192 0.9229 0.4639];
ids = 1:size(A, 1);
for i=1:size(B,1)
% first a rough check
xbound = abs(A(:, 1) - B(i, 1)) < A(:, 4);
ybound = abs(A(:, 2) - B(i, 2)) < A(:, 4);
zbound = abs(A(:, 3) - B(i, 3)) < A(:, 4);
nears = ids(xbound & ybound & zbound);
if isempty(nears)
C(i) = false;
else
% then a fine check
mask = vecnorm(A(nears, 1:3) - B(i,:), 2, 2) < A(nears, 4);
if sum(mask) > 0
C(i) = true;
else
C(i) = false;
end
end
end
This may reduce the time to 1/2 or 1/3, which is acceptable, and if I divide m and n into batches it may be even faster without too heavy memory burden. #CrisLuengo mentioned the R*-tree method, but it seems that the implementation is quite complicated XD
This uses implicit expansion to compute all distances between points and sphere centers, and then to compare those with the sphere radii:
C = any(vecnorm(permute(B, [1 3 2]) - permute(A(:,1:3), [3 1 2]), 2, 3) < A(:,4).', 2);
This is probably faster than the loop approach, but also more memory-intensive, because an intermediate m×n×3 array is computed.
I have two nested loops which I want to parallelize.
n=100;
x=rand(1,n);
m=5;
xx=rand(1,m);
r = zeros(1,m);
for i=1:n
q = ones(1,m);
for j=1:n
q = q .* (xx-x(j))/(x(i)-x(j));
end
r = r + q;
end
In order to prepare this function for palatalization, I changed local variables to global ones.
n=100;
x=rand(1,n);
m=5;
xx=rand(1,m);
r = ones(n,m);
for i=1:n
for j=1:n
r(i,:) = r(i,:) .* (xx-x(j))/x(i)-x(j))
end
end
r = sum(r,1);
Instead of transforming a whole vector at once, let's try it with only one scalar. Also use the simplest element of x which depends on i and j. I also removed the sum in the end. We can add it back later.
n=100;
x=rand(1,n);
r = ones(n,1);
for i=1:n
for j=1:n
y = x(i)+x(j);
r(i) = r(i) * y;
end
end
The code above is the example function, I want to parallelize.
The inner loop always needs to access the same vector r(i) for one iteration of the outer loop i. This access is a write operation (*=), but the order doesn't matter for this operation.
Since nested parfor loops are not allowed in Matlab, I tried to pack everything in one parfor loop.
n=100;
x=rand(1,n);
r = ones(n,1);
parfor k=1:(n*n)
%i = floor((k-1)/n)+1; % outer loop
%j = mod(k-1,n)+1; % inner loop
[j,i] = ind2sub([n,n],k);
y = x(i)+x(j);
r(i) = r(i) * y; % ERROR here
end
Since indies are calculated, Matlab still doesn't know hot to slice it.
So, I decided to move the multiplication operation outside and use linear indices.
n=100;
x=rand(1,n);
r = ones(n,n);
parfor k=1:(n*n)
[j,i] = ind2sub([n,n],k);
y = x(i)+x(j);
r(k) = y;
end
r = prod(r,1);
r = squeeze(r); % remove singleton dimensions
While this does work for scalar values in the inner loop, it doesn't work for vectors in the inner loop since indices must be again calculated.
n=100;
x=rand(1,n);
m=5;
r = ones(n,n,m);
parfor k=1:(n*n)
[j,i] = ind2sub([n,n],k);
y = x(i)+x(j);
r((k-1)*m+1:k*m) = y.*(1:m); % ERROR here
end
r = prod(r,1);
r = squeeze(r); % remove singleton dimensions
Although it does work, when I reshape the array.
n=100;
x=rand(1,n);
m=5;
r = ones(n*n,m);
parfor k=1:(n*n)
[j,i] = ind2sub([n,n],k);
y = x(i)+x(j);
r(k,:) = y.*(1:m); % ERROR here
end
r = reshape(r,n,n,m);
r = prod(r,2);
r = squeeze(r); % remove singleton dimensions
This way, I can transform a vector xx to another vector r.
n=100;
x=rand(1,n);
m=5;
xx=rand(1,m);
r = ones(n*n,m);
parfor k=1:(n*n)
[j,i] = ind2sub([n,n],k);
y = x(i)+x(j);
r(k,:) = y.*xx; % ERROR here
end
r = reshape(r,n,n,m);
r = prod(r,2);
r = sum(r,1);
r = reshape(r,size(xx)); % reshape output vector to input vector
For my parallel solution, I need an n*n*m array instead of a n*m array which seems quite inefficient.
Is there a better way of doing what I want?
What are the advantages of other ways (prettier code, less CPU, less RAM, ...)?
UPDATE
In the order of trying to simplify the task and reduce it to the minimum working example of the problem, I omitted the check of i~=j to make it easier, although resulting in an all NaN result. Further, the nature of the code results in an all 1 result when adding this check. In order for the code to make sense, the factors are just weights for another vector z.
The more elaborate problem looks as follows:
n=100;
x=rand(1,n);
z=rand(1,n);
m=5;
xx=rand(1,m);
r = zeros(1,m);
for i=1:n
q = ones(1,m);
for j=1:n
if i~=j
q = q .* (xx-x(j))/(x(i)-x(j));
end
end
r = r + z(i) .* q;
end
This problem does not need any parallel for loop to execute. One problem is that x(i)-x(j) is redundandly calculated a lot of times. This is inefficient. The approach suggested calculates every number exactly once and it vectorize the operations for each element in xx. Since xx is the shortest vector by far it is almost completely vectorized. In case you want to vectorize the last loop as well this will probably just be like a hidden for loop as well, it will much more memory and the code would be more complicated (like 3D matrices and so). I took the freedom to switch minus to plus in the denominator just for testing. Minus would generate NaN for all numbers. The last approach is slightly faster. About 10 times for n=10000. I suggest you try a bit more elaborate benchmark.
function test()
% Initiate variables
n=100;
x=rand(1,n);
m=5;
xx=rand(1,m);
tic;
% Alternative 1
r = zeros(1,m);
for i=1:n
q = ones(1,m);
for j=1:n
q = q .* (xx-x(j))/(x(i)+x(j));
end
r = r + q;
end
toc;
tic;
% Alternative 2
xden = bsxfun(#plus, x, x.'); % Calculate denominator
xnom = repmat(x,n,1); % Calculate nominator
xfull = (xnom./xden).'; % calculate right term on rhs.
for (k = 1:m)
tmp= prod(xx(k)./xden - xfull); % Split in 2 calculations
r2(k) = sum(tmp); % "r = r + xx(k)"
end
toc;
disp(r);
disp(r2);
Just a note in the end. Alternative 2 is faster but it is also memory expensive, so in case of memory issues a loop is to prefer. Further, there is no need for global variables in case of parallelization. In case you need this you probably have to look over your design (but in case the code is short there is not some critical, so then you should not need to bother so much).
I have a column vector (V1) of real numbers like:
123.2100
125.1290
...
954.2190
If I add, let's say, a number 1 to each row in this vector, I will get (V2):
124.2100
126.1290
...
955.2190
I need to find out how many elements from V2 are inside some error-window created from V1. For example the error-window = 0.1 (but in my case every element in V1 has it's own error window):
123.1100 123.3100
125.0290 125.2290
...
954.1190 954.3190
I can create some code like this:
% x - my vector
% ppm - a variable responsible for error-window
window = [(1-(ppm/1000000))*x, (1+(ppm/1000000))*x]; % - error-window
mdiff = 1:0.001:20; % the numbers I will iteratively add to x
% (like the number 1 in the example)
cdiff = zeros(length(mdiff),1); % a vector that will contain counts of elements
% corresponding to different mdiff temp = 0;
for i = 1:length(mdiff)
for j = 1:size(window,1)
xx = x + mdiff(i);
indxx = find( xx => window(j,1) & xx <= window(j,2) );
if any(indxx)
temp = temp + length(indxx); %edited
end
end
cdiff(i) = temp;
temp = 0;
end
So, at the end cdiff will contain all the counts corresponding to mdiff. The only thing, I would like to make the code faster. Or is there a way to avoid using the second loop (with j)? I mean to directly use a multidimensional condition.
EDIT
I decided to simpify the code like this (thanking to the feedback I got here):
% x - my vector
% ppm - a variable responsible for error-window
window = [(1-(ppm/1000000))*x, (1+(ppm/1000000))*x]; % - error-window
mdiff = 1:0.001:20; % the numbers I will iteratively add to x
% (like the number 1 in the example)
cdiff = zeros(length(mdiff),1); % a vector that will contain counts of elements
% corresponding to different mdiff temp = 0;
for i = 1:length(mdiff)
xx = x + mdiff(i);
cdiff(i) = sum(sum(bsxfun(#and,bsxfun(#ge,xx,window(:,1)'),bsxfun(#le,xx,window(:,2)'))));
end
In this case the code works faster and seems properly
add = 1; %// how much to add
error = .1; %// maximum allowed error
V2 = V1 + add; %// build V2
ind = sum(abs(bsxfun(#minus, V1(:).', V2(:)))<error)>1; %'// index of elements
%// of V1 satisfying the maximum error condition. ">1" is used to because each
%// element is at least equal to itself
count = nnz(ind);
Think this might work for you -
%%// Input data
V1 = 52+rand(4,1)
V2 = V1+1;
t= 0.1;
low_bd = any(abs(bsxfun(#minus,V2,[V1-t]'))<t,2); %%//'
up_bd = any(abs(bsxfun(#minus,V2,[V1+t]'))<t,2); %%//'
count = nnz( low_bd | up_bd )
One could also write it as -
diff_map = abs(bsxfun(#minus,[V1-t V1+t],permute(V2,[3 2 1])));
count = nnz(any(any(diff_map<t,2),1))
Edit 1:
low_bd = any(abs(bsxfun(#minus,V2,window(:,1)'))<t,2); %%//'
up_bd = any(abs(bsxfun(#minus,V2,window(:,2)'))<t,2); %%//'
count = nnz( low_bd | up_bd )
Edit 2: Vectorized form for the edited code
t1 = bsxfun(#plus,x,mdiff);
d1 = bsxfun(#ge,t1,permute(window(:,1),[3 2 1]));
d2 = bsxfun(#le,t1,permute(window(:,2),[3 2 1]));
t2 = d1.*d2;
cdiff_vect = max(sum(t2,3),[],1)';