What is the logic behind this nvidia 's arctan2? - c

Nvidia has some functions in Cg 3.1 Toolkit Documentation
arctan2 is implemented as follows
float2 atan2(float2 y, float2 x)
{
float2 t0, t1, t2, t3, t4;
t3 = abs(x);
t1 = abs(y);
t0 = max(t3, t1);
t1 = min(t3, t1);
t3 = float(1) / t0;
t3 = t1 * t3;
t4 = t3 * t3;
t0 = - float(0.013480470);
t0 = t0 * t4 + float(0.057477314);
t0 = t0 * t4 - float(0.121239071);
t0 = t0 * t4 + float(0.195635925);
t0 = t0 * t4 - float(0.332994597);
t0 = t0 * t4 + float(0.999995630);
t3 = t0 * t3;
t3 = (abs(y) > abs(x)) ? float(1.570796327) - t3 : t3;
t3 = (x < 0) ? float(3.141592654) - t3 : t3;
t3 = (y < 0) ? -t3 : t3;
return t3;
}
What is the formula or logic behind this ? I couldn't find any references in their libraries.

The code has 3 parts.
First the input, which can be interpreted as a point (x,y), gets mapped to the sector below the diagonal in the first quadrant. The absolute values and max/min operations effectively act as reflections (or identity) first on the coordinate axes and then on the diagonal.
In the middle then an arcus or inverse tangent approximation for r=y/x is computed. Note that the Taylor series is a = 1*r-1/3*r^3+1/5*r^5-1/7*r^7+... However, the Taylor series is overly correct at the origin and rapidly loses accuracy away from it. Using some fitting procedure, a polynomial that is equally good/bad on the whole interval was determined. Its coefficients are close to the Taylor coefficients, especially in the lower degrees. The polynomial evaluation is most efficiently done via the Horner scheme, where the coefficients are used starting from the highest degree.
And finally, the signs and magnitudes of the input are used to undo the original reflections in reverse order, only that now the angle a gets transformed. So if there was a reflection on the diagonal, a gets mapped to pi/2-a. If there was a reflection on the y axis, a gets mapped to pi-a. And finally in case of a reflection on the x axis, a gets changed to -a.
With some recursive function calls, the procedure in question could also be compactly formulated as (here in Python)
def atan2(y,x):
if y<0: return -atan2(-y,x)
if x<0: return pi-atan2(y,-x)
if x<y: return 0.5*pi-atan2(x,y)
return p(y/x)
where the polynomial is evaluated as
def p(r):
r2 = r*r
res = - 0.013480470 # *r^11
res = res*r2 + 0.057477314 # *r^9
res = res*r2 - 0.121239071 # *r^7
res = res*r2 + 0.195635925 # *r^5
res = res*r2 - 0.332994597 # *r^3
res = res*r2 + 0.999995630 # *r^1
return r*res
For comparison the Taylor series can be implemented as
def a11(r):
r2 = r*r
res = 0
for k in range(11,0,-2):
res = 1/k-r2*res
return r*res
To compare the errors of both approximations in one plot, use a logarithmic vertical axis, the Taylor error grows too fast.
r = np.linspace(0,1,500)
plt.semilogy(r,abs(p(r)-np.arctan(r)), r, abs(a11(r)-np.arctan(r)))
plt.legend(["residual of p", "residual of Taylor"])
plt.grid(); plt.show()
This gives the error plot
which shows the described error behavior of the minimaxed polynomial and the Taylor polynomial of equal degree.

Related

Incorrect dimensions for matrix multiplication when multiplying pi*a

I've created this code and it gives me this error message:
Error using *
Incorrect dimensions for matrix multiplication.
Error in poli3 = sin(pi*a) ...
Below I show one function used in the code. I don't know if the problem comes from the value given by derivadan or what.
x = -1:0.01:1; % Intervalo en el que se evaluará el polinomio de Taylor
y = sin(pi*x); % Función
a = 0;
derivada3 = derivadan(0.01, 3, a);
derivada7 = derivadan(0.01, 7, a);
derivada3_vec = repmat(derivada3, size(x - a));
derivada7_vec = repmat(derivada7, size(x - a));
poli3 = sin(pi*a) + derivada3_vec*(x - a) + (derivada3_vec*(x - a).^2)/factorial(2) + (derivada3_vec*(x - a).^3)/factorial(3);
poli7 = sin(pi*a) + derivada7_vec*(x - a) + (derivada7_vec*(x - a).^2)/factorial(2) + (derivada7_vec*(x - a).^3)/factorial(3) + (derivada7_vec*(x - a).^4)/factorial(4) + (derivada7_vec*(x - a).^5)/factorial(5) + (derivada7_vec*(x - a).^6)/factorial(6) + (derivada7_vec*(x - a).^7)/factorial(7);
figure
plot(x, poli3, 'r', x, poli7, 'b')
legend('Taylor grau 3', 'Taylor grau 7')
title('Grafica Taylor 3 grau vs Grafica Taylor 7 grau')
function Yd = derivadan(h, grado, vecX)
Yd = zeros(size(vecX));
for i = 1:grado
Yd = (vecX(2:end) - vecX(1:end-1)) / h;
vecX = Yd;
end
end
In MATLAB one can go 2 ways when developing taylor series; the hard way and the easy way.
As asked, first the HARD way :
close all;clear all;clc
dx=.01
x = -1+dx:dx:1-dx;
y = sin(pi*x);
a =0;
[va,na]=find(x==a)
n1=3
D3y=zeros(n1,numel(x));
for k=1:1:n1
D3y(k,1:end-k)=dn(dx,k,y);
end
T3=y(na)+sum(1./factorial([1:n1])'.*D3y(:,na).*((x(1:end-n1)-a)'.^[1:n1])',1);
n2=7
D7y=zeros(n2,numel(x));
for k=1:1:n2
D7y(k,1:end-k)=dn(dx,k,y);
end
T7=y(na)+sum([1./factorial([1:n2])]'.*D7y(:,na).*((x(1:end-n2)-a)'.^[1:n2])',1);
figure(1);ax=gca
plot(ax,x(1:numel(T7)),T3(1:numel(T7)),'r')
grid on;hold on
xlabel('x')
plot(ax,x(1:numel(T7)),T7(1:numel(T7)),'b--')
plot(ax,x(1:numel(T7)),y(1:numel(T7)),'g')
axis(ax,[-1 1 -1.2 1.2])
legend('T3', 'T7','sin(pi*x)','Location','northeastoutside')
the support function being
function Yd = dn(h, n, vecX)
Yd = zeros(size(vecX));
for i = 1:n
Yd = (vecX(2:end) - vecX(1:end-1))/h;
vecX = Yd;
end
end
Explanation
1.- The custom function derivadan that I call dn shortens one sample for every unit up in grado where grado is the derivative order.
For instance, the 3rd order derivative is going to be 3 samples shorter than the input function.
This causes product mismatch and when later on attempting plot it's going to cause plot error.
2.- To avoid such mismatchs ALL vectors shortened to the size of the shortest one.
x(1:end-a)
is a samples shorter than x and y and can be used as reference vector in plot.
3.- Call function derivadan (that I call dn) correctly
dn expects as 3rd input (3rd from left) a vector, the function values to differentiate, yet you are calling derivadan with a in 3rd input field. a is scalar and you have set it null. Fixed it.
derivada3 = derivadan(0.01, 3, a);
should be called
derivada3 = derivadan(0.01, 3, y);
same for derivada7
4.- So
error using * ...
error in poly3=sin(pi*a) ...
MATLAB doesn't particularly mean that there's an error right on sin(pi*a) , that could be, but it's not the case here >
MATLAB is saying : THERE'S AN ERROR IN LINE STARTING WITH
poly3=sin(pi*a) ..
MATLAB aborts there.
Same error is found in next line starting with
poly7 ..
Since sin(pi*a)=0 because a=0 yet all other terms in sum for poly3 are repmat outcomes with different sizes all sizes being different and >1 hence attempting product of different sizes.
Operator * requires all terms have same size.
5.- Syntax Error
derivada3_vec = repmat(derivada3, size(x - a))
is built is not correct
this line repeats size(x) times the nth order derivative !
it's a really long sequence.
Now the EASY way
6.- MATLAB already has command taylor
syms x;T3=taylor(sin(pi*x),x,0)
T3 = (pi^5*x^5)/120 - (pi^3*x^3)/6 + pi*x
syms x;T3=taylor(sin(pi*x),x,0,'Order',3)
T3 = pi*x
syms x;T3=taylor(sin(pi*x),x,0,'Order',7)
T7 = (pi^5*x^5)/120 - (pi^3*x^3)/6 + pi*x
T9=taylor(sin(pi*x),x,0,'Order',9)
T9 =- (pi^7*x^7)/5040 + (pi^5*x^5)/120 - (pi^3*x^3)/6 + pi*x
It really simplfies taylor series development because it readily generates all that is needed to such endeavour :
syms f(x)
f(x) = sin(pi*x);
a=0
T_default = taylor(f, x,'ExpansionPoint',a);
T8 = taylor(f, x, 'Order', 8,'ExpansionPoint',a);
T10 = taylor(f, x, 'Order', 10,'ExpansionPoint',a);
figure(2)
fplot([T_default T8 T10 f])
axis([-2 3 -1.2 1.2])
hold on
plot(a,f(a),'r*')
grid on;xlabel('x')
title(['Taylor Series Expansion x =' num2str(a)])
a=.5
T_default = taylor(f, x,'ExpansionPoint',a);
T8 = taylor(f, x, 'Order', 8,'ExpansionPoint',a);
T10 = taylor(f, x, 'Order', 10,'ExpansionPoint',a);
figure(3)
fplot([T_default T8 T10 f])
axis([-2 3 -1.2 1.2])
hold on
plot(a,f(a),'r*')
grid on;xlabel('x')
title(['Taylor Series Expansion x =' num2str(a)])
a=1
T_default = taylor(f, x,'ExpansionPoint',a);
T8 = taylor(f, x, 'Order', 8,'ExpansionPoint',a);
T10 = taylor(f, x, 'Order', 10,'ExpansionPoint',a);
figure(4)
fplot([T_default T8 T10 f])
axis([-2 3 -1.2 1.2])
hold on
plot(a,f(a),'r*')
grid on;xlabel('x')
title(['Taylor Series Expansion x =' num2str(a)])
thanks for reading
I checked the line that you were having issues with; it seems that the error is in the derivada3_vec*(x - a) (as well as in the other terms that use derivada3_vec).
Looking at the variable itself: derivada3_vec is an empty vector. Going back further, the derivada3 variable is also an empty vector.
Your issue is in your function derivadan. You're inputting a 1x1 vector (a = [0]), but the function assumes that a is at least 1x2 (or 2x1).
I suspect there are other issues, but this is the cause of your error message.

Can I use a searching algorithm between arrays?

I have produced a plot of two arrays of the same size, the x axis being a time array with varying time steps and the y axis being a pre-calculated array of values. This is the plot below:
Up until this point, I have been searching for the time for when delta = 0 (apart from when time=0) . To do this, I have been manually going into the array to find at what step it reaches this value (i.e. delta = 0 at step 3,000), then going into the time-value array and looking for the step with the same value (i.e. searching for the 3,000th step and retrieving the time).
Is there a way I can automate this and have it as an output rather than manually searching each time?
The base code is below:
import numpy as np
import matplotlib.pyplot as plt
from scipy import integrate
masses = [1, 1, 1]
r1, v1 = [0, 0], [-2*0.513938054919243, -2*0.304736003875733]
r2, v2 = [-1, 0], [0.513938054919243, 0.304736003875733]
r3, v3 = [1, 0], [0.513938054919243, 0.304736003875733]
u0 = np.concatenate([r1, v1, r2, v2, r3, v3])
def odesys(t, u):
def force(a): return a / sum(a ** 2) ** 1.5
r1, v1, r2, v2, r3, v3 = u.reshape([-1, 2])
m1, m2, m3 = masses
f12, f13, f23 = force(r1 - r2), force(r1 - r3), force(r2 - r3)
a1, a2, a3 = -m2 * f12 - m3 * f13, m1 * f12 - m3 * f23, m1 * f13 + m2 * f23
return np.concatenate([v1, a1, v2, a2, v3, a3])
# collect data
t_values = []
u_values = []
par_1_pos = []
d_values = []
# Time start, step, and finish point
t0, tf, t_step = 0, 18, 0.0001
nsteps = int((tf - t0) / t_step)
solution = integrate.RK45(odesys, t0, u0, tf, max_step=t_step)
# The loop for running the Runge-Kutta method over some time period.
u_values.append(solution.y)
t_values.append(t0)
par_1_pos.append(((solution.y[0] - u0[0])**2 + (solution.y[1] - u0[1])**2)**0.5)
d_values.append(((solution.y[0] - u0[0])**2 + (solution.y[1] - u0[1])**2)**0.5 +
((solution.y[4] - u0[4])**2 + (solution.y[5] - u0[5])**2)**0.5 +
((solution.y[8] - u0[8])**2 + (solution.y[9] - u0[9])**2)**0.5 +
((solution.y[2] - u0[2])**2 + (solution.y[3] - u0[3])**2)**0.5 +
((solution.y[6] - u0[6])**2 + (solution.y[7] - u0[7])**2)**0.5 +
((solution.y[10] - u0[10])**2 + (solution.y[11] - u0[11])**2)**0.5)
for step in range(nsteps):
solution.step()
u_values.append(solution.y)
t_values.append(solution.t)
par_1_pos.append(((solution.y[0] - u0[0])**2 + (solution.y[1] - u0[1])**2)**0.5)
d_values.append(((solution.y[0] - u0[0])**2 + (solution.y[1] - u0[1])**2)**0.5 +
((solution.y[4] - u0[4])**2 + (solution.y[5] - u0[5])**2)**0.5 +
((solution.y[8] - u0[8])**2 + (solution.y[9] - u0[9])**2)**0.5 +
((solution.y[2] - u0[2])**2 + (solution.y[3] - u0[3])**2)**0.5 +
((solution.y[6] - u0[6])**2 + (solution.y[7] - u0[7])**2)**0.5 +
((solution.y[10] - u0[10])**2 + (solution.y[11] - u0[11])**2)**0.5)
# break loop after modelling is finished
if solution.status == 'finished':
break
# Plotting of the individual particles
u = np.asarray(u_values).T
# Plot for The trajectory of the three bodies over the time period
plt.plot(u[0], u[1], '-o', lw=1, ms=3, label="body 1")
plt.plot(u[4], u[5], '-x', lw=1, ms=3, label="body 2")
plt.plot(u[8], u[9], '-s', lw=1, ms=3, label="body 3")
plt.title('Trajectories of the three bodies')
plt.xlabel('X Position')
plt.ylabel('Y Position')
plt.legend()
plt.grid()
plt.show()
plt.close()
# Plot for d(delta_t) values
plt.plot(t_values, d_values)
plt.title('Delta number for the three bodies')
plt.xlabel('Time (s)')
plt.ylabel('Delta')
plt.grid()
plt.show()
plt.close()
# Plot of distance between P1 and IC
plt.plot(t_values, par_1_pos)
plt.title('Plot of distance between P1 and IC')
plt.xlabel('Time (s)')
plt.ylabel('Distance from origin')
plt.grid()
plt.show()
plt.close()
Make your life less complicated, use the provided time-loop routine solve_ivp.
You want to compute a point of minimal distance to the initial point u0. This is a point where the derivative of the distance goes from negative to positive. The derivative of the square of the distance is the dot product of tangent vector and difference vector. Close to the minimum it makes not much difference if the tangent vector is of the point or of the initial point.
Thus define an event of the "counting" type
Tu0 = odesys(t0,u0)
def dist_plane(u): return Tu0.dot(u-u0)
event0 = lambda t,u: dist_plane(u)
event0.direction = 1
and call the solver with its standard stepper RK45 and all the parameters
solution = solve_ivp(odesys, [t0,tf], u0, events=event0, atol=1e-10, rtol=1e-11)
u_values = solution.y.T
t_values = solution.t
def norm(u): return sum(u**2)**0.5
def dist_1(u): return norm(u[:2]-u0[:2])
def dist_all(u): return sum(norm(uu) for uu in (u-u0).reshape([-1,2]))
par_1_pos = [ dist_1(uu) for uu in u_values]
d_values = [dist_all(uu) for uu in u_values]
d_plane = [dist_plane(uu) for uu in u_values]
The last lines so you can do all the plots as before.
For the minimal however you just have to evaluate the event fields of the returned structure. Printing the distance measures results in a table
for tt,uu in zip(solution.t_events[0], solution.y_events[0]):
print(rf"|{tt:12.8f} | {dist_1(uu):12.8f} | {dist_all(uu):12.8f} | {dist_plane(uu):12.8f} |")
t
dist pos1
dist all
dist deriv
0.00000000
0.00000000
0.00000000
0.00000000
2.81292221
0.58161236
3.54275380
0.00000000
4.17037860
0.35583855
5.77531098
0.00000000
5.71976151
0.63111430
3.98764796
-0.00000000
8.66440460
0.00000019
3.73331800
0.00000000
11.60904921
0.63111445
3.98764485
-0.00000000
13.15843018
0.35583804
5.77530951
0.00000000
14.51588605
0.58161284
3.54275265
-0.00000000
17.32881023
0.00000078
0.00000328
0.00000000
This shorter list should be easier to search for the global minimum.
If the arrays are the same length, you can keep the index of your search.. Otherwise, you may be able to use the percentage through of the array as your starting point for the search of the second array. (position 1000 in an array of 3000 is 1/3rd, so about position 1166 when mapped into an array of 3500)
If you want any value (or there will only be one), you can bisect the data with a binary search.
If you want the first and there may be more than one value, you'll have to do a linear search.
if your x values are time steps
then convert them to time values first by summing them up (but for that you need the array is sorted by time first which would be insane if not already in this case).
if your x values are sorted
then use binary search. if not use linear search.
if your searched value is not exactly present in your data
but you have points below and above you can simply interpolate it from closest neighbors (linear,cubic or higher 2D interpolation) but for that is best if your searched array is sorted otherwise you will have hard time getting valid closest neighbors.

1D-Coupled Transient Diffusion in FiPY with Reactive Boundary Condition

I would like to solve the transient diffusion equation for two compounds A and B as shown in image. I think the image is a better way to show my problem.
Diffusion equations and boundary conditions.
As you can see, the reaction only occurs at the surface and the flux of A is equal to flux of B. So, this two equations are coupled only at surface. The boundary condition is similar to ROBIN boundary condition, explained in Fipy manual. However, the main difference is the existence of the second variable in boundary condition. Does anybody have any idea how to formulate this boundary condition in Fipy?
I guess I need to add some extra term to ROBIN boundary condition, but I couldn't figure it out.
I really appreciate your help.
This is the code which solves the mentioned equation with ROBIN boundary condition # x=0.
-D(dC_A/dx) = -kC_A
-D(dC_B/dx) = -kC_B
In this condition, I can easily use ROBIN boundary condition to solve equations. The results seem reasonable for this boundary condition.
"""
Question for StackOverflow
"""
#%%
from fipy import Variable, FaceVariable, CellVariable, Grid1D, TransientTerm, DiffusionTerm, Viewer, ImplicitSourceTerm
from fipy.tools import numerix
#%%
##### Model parameters
L= 8.4853e-4 # m boundary layer thickness
dx= 1e-8 # mesh size
nx = int(L/dx)+1 # number of meshes
D = 1e-9 # m^2/s diffusion coefficient
k = 1e-4 # m/s reaction coefficient R = k [c_A],
c_inf = 0. # ROBIN general condition, once can think R = k ([c_A]-[c_inf])
c_init = 1. # Initial concentration of compound A, mol/m^3
#%%
###### Meshing and variable definition
mesh = Grid1D(nx=nx, dx=dx)
c_A = CellVariable(name="c_A", hasOld = True,
mesh=mesh,
value=c_init)
c_B = CellVariable(name="c_B", hasOld = True,
mesh=mesh,
value=0.)
#%%
##### Right boundary condition
valueRight = c_init
c_A.constrain(valueRight, mesh.facesRight)
c_B.constrain(0., mesh.facesRight)
#%%
### ROBIN BC requirements, defining cellDistanceVectors
## This code is for fixing celldistance via this link:
## https://stackoverflow.com/questions/60073399/fipy-problem-with-grid2d-celltofacedistancevectors-gives-error-uniformgrid2d
MA = numerix.MA
tmp = MA.repeat(mesh._faceCenters[..., numerix.NewAxis,:], 2, 1)
cellToFaceDistanceVectors = tmp - numerix.take(mesh._cellCenters, mesh.faceCellIDs, axis=1)
tmp = numerix.take(mesh._cellCenters, mesh.faceCellIDs, axis=1)
tmp = tmp[..., 1,:] - tmp[..., 0,:]
cellDistanceVectors = MA.filled(MA.where(MA.getmaskarray(tmp), cellToFaceDistanceVectors[:, 0], tmp))
#%%
##### Defining mask and Robin BC at left boundary
mask = mesh.facesLeft
Gamma0 = D
Gamma = FaceVariable(mesh=mesh, value=Gamma0)
Gamma.setValue(0., where=mask)
dPf = FaceVariable(mesh=mesh,
value=mesh._faceToCellDistanceRatio * cellDistanceVectors)
n = mesh.faceNormals
a = FaceVariable(mesh=mesh, value=k, rank=1)
b = FaceVariable(mesh=mesh, value=D, rank=0)
g = FaceVariable(mesh=mesh, value= k * c_inf, rank=0)
RobinCoeff = (mask * Gamma0 * n / (-dPf.dot(a)+b))
#%%
#### Making a plot
viewer = Viewer(vars=(c_A, c_B),
datamin=-0.2, datamax=c_init * 1.4)
viewer.plot()
#%% Time step and simulation time definition
time = Variable()
t_simulation = 4 # seconds
timeStepDuration = .05
steps = int(t_simulation/timeStepDuration)
#%% PDE Equations
eqcA = (TransientTerm(var=c_A) == DiffusionTerm(var=c_A, coeff=Gamma) +
(RobinCoeff * g).divergence
- ImplicitSourceTerm(var=c_A, coeff=(RobinCoeff * a.dot(-n)).divergence))
eqcB = (TransientTerm(var=c_B) == DiffusionTerm(var=c_B, coeff=Gamma) -
(RobinCoeff * g).divergence
+ ImplicitSourceTerm(var=c_B, coeff=(RobinCoeff * a.dot(-n)).divergence))
#%% A loop for solving PDE equations
while time() <= (t_simulation):
time.setValue(time() + timeStepDuration)
c_B.updateOld()
c_A.updateOld()
res1=res2 = 1e10
viewer.plot()
while (res1 > 1e-6) & (res2 > 1e-6):
res1 = eqcA.sweep(var=c_A, dt=timeStepDuration)
res2 = eqcB.sweep(var=c_B, dt=timeStepDuration)
It's possible to solve this as a fully implicit system. The code below simplifies the problem to have a unity domain size and diffusion coefficient. k is set to 0.2. It captures the analytical solution quite well with some caveats (see below).
from fipy import (
CellVariable,
TransientTerm,
DiffusionTerm,
ImplicitSourceTerm,
Grid1D,
Viewer,
)
L = 1.0
nx = 1000
dx = L / nx
konstant = 0.2
coeff = 1.0
mesh = Grid1D(nx=nx, dx=dx)
var_a = CellVariable(mesh=mesh, value=1.0, hasOld=True)
var_b = CellVariable(mesh=mesh, value=0.0, hasOld=True)
var_a.constrain(1.0, mesh.facesRight)
var_b.constrain(0.0, mesh.facesRight)
coeff_mask = ~mesh.facesLeft * coeff
boundary_coeff = konstant * (mesh.facesLeft * mesh.faceNormals).divergence
eqn_a = TransientTerm(var=var_a) == DiffusionTerm(
coeff_mask, var=var_a
) - ImplicitSourceTerm(boundary_coeff, var=var_a) + ImplicitSourceTerm(
boundary_coeff, var=var_b
)
eqn_b = TransientTerm(var=var_b) == DiffusionTerm(
coeff_mask, var=var_b
) - ImplicitSourceTerm(boundary_coeff, var=var_b) + ImplicitSourceTerm(
boundary_coeff, var=var_a
)
eqn = eqn_a & eqn_b
for _ in range(5):
var_a.updateOld()
var_b.updateOld()
eqn.sweep(dt=1e10)
Viewer((var_a, var_b)).plot()
print("var_a[0] (expected):", (1 + konstant) / (1 + 2 * konstant))
print("var_b[0] (expected):", konstant / (1 + 2 * konstant))
print("var_a[0] (actual):", var_a[0])
print("var_b[0] (actual):", var_b[0])
input("wait")
Note the following:
As written the boundary condition is only first order accurate, which doesn't really matter for this problem, but might hurt you for in higher dimensions. There might be ways to fix this such as having a small cell near the boundary or adding in an explicit second order correction for the boundary condition.
The equations are coupled here. If uncoupled it would probably require loads of iterations to reach equilibrium.
It did require a few iterations to reach equilibrium, but it shouldn't. That's probably due to the solver not converging adequately without a few tries. It might be that coupled equations have some bad conditioning.

How to vectorize the antenna arrayfactor expression in matlab

I have the antenna array factor expression here:
I have coded the array factor expression as given below:
lambda = 1;
M = 100;N = 200; %an M x N array
dx = 0.3*lambda; %inter-element spacing in x direction
m = 1:M;
xm = (m - 0.5*(M+1))*dx; %element positions in x direction
dy = 0.4*lambda;
n = 1:N;
yn = (n - 0.5*(N+1))*dy;
thetaCount = 360; % no of theta values
thetaRes = 2*pi/thetaCount; % theta resolution
thetas = 0:thetaRes:2*pi-thetaRes; % theta values
phiCount = 180;
phiRes = pi/phiCount;
phis = -pi/2:phiRes:pi/2-phiRes;
cmpWeights = rand(N,M); %complex Weights
AF = zeros(phiCount,thetaCount); %Array factor
tic
for i = 1:phiCount
for j = 1:thetaCount
for p = 1:M
for q = 1:N
AF(i,j) = AF(i,j) + cmpWeights(q,p)*exp((2*pi*1j/lambda)*(xm(p)*sin(thetas(j))*cos(phis(i)) + yn(q)*sin(thetas(j))*sin(phis(i))));
end
end
end
end
How can I vectorize the code for calculating the Array Factor (AF).
I want the line:
AF(i,j) = AF(i,j) + cmpWeights(q,p)*exp((2*pi*1j/lambda)*(xm(p)*sin(thetas(j))*cos(phis(i)) + yn(q)*sin(thetas(j))*sin(phis(i))));
to be written in vectorized form (by modifying the for loop).
Approach #1: Full-throttle
The innermost nested loop generates this every iteration - cmpWeights(q,p)*exp((2*pi*1j/lambda)*(xm(p)*sin(thetas(j))*cos(phis(i)) + yn(q)*sin(thetas(j))*sin(phis(i)))), which are to summed up iteratively to give us the final output in AF.
Let's call the exp(.... part as B. Now, B basically has two parts, one is the scalar (2*pi*1j/lambda) and the other part
(xm(p)*sin(thetas(j))*cos(phis(i)) + yn(q)*sin(thetas(j))*sin(phis(i))) that is formed from the variables that are dependent on
the four iterators used in the original loopy versions - i,j,p,q. Let's call this other part as C for easy reference later on.
Let's put all that into perspective:
Loopy version had AF(i,j) = AF(i,j) + cmpWeights(q,p)*exp((2*pi*1j/lambda)*(xm(p)*sin(thetas(j))*cos(phis(i)) + yn(q)*sin(thetas(j))*sin(phis(i)))), which is now equivalent to AF(i,j) = AF(i,j) + cmpWeights(q,p)*B, where B = exp((2*pi*1j/lambda)*(xm(p)*sin(thetas(j))*cos(phis(i)) + yn(q)*sin(thetas(j))*sin(phis(i)))).
B could be simplified to B = exp((2*pi*1j/lambda)* C), where C = (xm(p)*sin(thetas(j))*cos(phis(i)) + yn(q)*sin(thetas(j))*sin(phis(i))).
C would depend on the iterators - i,j,p,q.
So, after porting onto a vectorized way, it would end up as this -
%// 1) Define vectors corresponding to iterators used in the loopy version
I = 1:phiCount;
J = 1:thetaCount;
P = 1:M;
Q = 1:N;
%// 2) Create vectorized version of C using all four vector iterators
mult1 = bsxfun(#times,sin(thetas(J)),cos(phis(I)).'); %//'
mult2 = bsxfun(#times,sin(thetas(J)),sin(phis(I)).'); %//'
mult1_xm = bsxfun(#times,mult1(:),permute(xm,[1 3 2]));
mult2_yn = bsxfun(#times,mult2(:),yn);
C_vect = bsxfun(#plus,mult1_xm,mult2_yn);
%// 3) Create vectorized version of B using vectorized C
B_vect = reshape(exp((2*pi*1j/lambda)*C_vect),phiCount*thetaCount,[]);
%// 4) Final output as matrix multiplication between vectorized versions of B and C
AF_vect = reshape(B_vect*cmpWeights(:),phiCount,thetaCount);
Approach #2: Less-memory intensive
This second approach would reduce the memory traffic and it uses the distributive property of exponential - exp(A+B) = exp(A)*exp(B).
Now, the original loopy version was this -
AF(i,j) = AF(i,j) + cmpWeights(q,p)*exp((2*pi*1j/lambda)*...
(xm(p)*sin(thetas(j))*cos(phis(i)) + yn(q)*sin(thetas(j))*sin(phis(i))))
So, after using the distributive property, we would endup with something like this -
K = (2*pi*1j/lambda)
part1 = K*xm(p)*sin(thetas(j))*cos(phis(i));
part2 = K*yn(q)*sin(thetas(j))*sin(phis(i));
AF(i,j) = AF(i,j) + cmpWeights(q,p)*exp(part1)*exp(part2);
Thus, the relevant vectorized approach would become something like this -
%// 1) Define vectors corresponding to iterators used in the loopy version
I = 1:phiCount;
J = 1:thetaCount;
P = 1:M;
Q = 1:N;
%// 2) Define the constant used at the start of EXP() call
K = (2*pi*1j/lambda);
%// 3) Perform the sine-cosine operations part1 & part2 in vectorized manners
mult1 = K*bsxfun(#times,sin(thetas(J)),cos(phis(I)).'); %//'
mult2 = K*bsxfun(#times,sin(thetas(J)),sin(phis(I)).'); %//'
%// Perform exp(part1) & exp(part2) in vectorized manners
part1_vect = exp(bsxfun(#times,mult1(:),xm));
part2_vect = exp(bsxfun(#times,mult2(:),yn));
%// Perform multiplications with cmpWeights for final output
AF = reshape(sum((part1_vect*cmpWeights.').*part2_vect,2),phiCount,[])
Quick Benchmarking
Here are the runtimes with the input data listed in the question for the original loopy approach and proposed approach #2 -
---------------------------- With Original Approach
Elapsed time is 358.081507 seconds.
---------------------------- With Proposed Approach #2
Elapsed time is 0.405038 seconds.
The runtimes suggests a crazy performance improvement with Approach #2!
The basic trick is to figure out what things are constant, and what things depend on the subscript term - and therefore are matrix terms.
Within the sum:
C(n,m) is a matrix
2π/λ is a constant
sin(θ)cos(φ) is a constant
x(m) and y(n) are vectors
So the two things I would do are:
Expand the xm and ym into matrices using meshgrid()
Take all the constant term stuff outside the loop.
Like this:
...
piFactor = 2 * pi * 1j / lambda;
[xgrid, ygrid] = meshgrid(xm, ym); % xgrid and ygrid will be size (N, M)
for i = 1:phiCount
for j = 1:thetaCount
xFactor = sin(thetas(j)) * cos(phis(i));
yFactor = sin(thetas(j)) * sin(phis(i));
expFactor = exp(piFactor * (xgrid * xFactor + ygrid * yFactor)); % expFactor is size (N, M)
elements = cmpWeights .* expFactor; % elements of sum, size (N, M)
AF(i, j) = AF(i, j) + sum(elements(:)); % sum and then integrate.
end
end
You could probably figure out how to vectorise the outer loop too, but hopefully that gives you a starting point.

Can the xor-swap be extended to more than two variables?

I've been trying to extend the xor-swap to more than two variables, say n variables. But I've gotten nowhere that's better than 3*(n-1).
For two integer variables x1 and x2 you can swap them like this:
swap(x1,x2) {
x1 = x1 ^ x2;
x2 = x1 ^ x2;
x1 = x1 ^ x2;
}
So, assume you have x1 ... xn with values v1 ... vn. Clearly you can "rotate" the values by successively applying swap:
swap(x1,x2);
swap(x2,x3);
swap(x3,x4);
...
swap(xm,xn); // with m = n-1
You will end up with x1 = v2, x2 = v3, ..., xn = v1.
Which costs n-1 swaps, each costing 3 xors, leaving us with (n-1)*3 xors.
Is a faster algorithm using xor and assignment only and no additional variables known?
As a partial result I tried a brute force search for N=3,4,5 and all of these agree with your formula.
Python code:
from collections import *
D=defaultdict(int) # Map from tuple of bitmasks to number of steps to get there
N=5
Q=deque()
Q.append( (tuple(1<<n for n in range(N)), 0) )
goal = (tuple(1<<( (n+1)%N ) for n in range(N)))
while Q:
masks,ops = Q.popleft()
if len(D)%10000==0:
print len(D),len(Q),ops
ops += 1
# Choose two to swap
for a in range(N):
for b in range(N):
if a==b:
continue
masks2 = list(masks)
masks2[a] = masks2[a]^masks2[b]
masks2 = tuple(masks2)
if masks2 in D:
continue
D[masks2] = ops
if masks2==goal:
print 'found goal in ',ops
raise ValueError
Q.append( (masks2,ops) )

Resources