Can I use a searching algorithm between arrays?

I have produced a plot of two arrays of the same size: the x axis is a time array with varying time steps and the y axis is a pre-calculated array of delta values. (Plot of delta against time omitted.)
Up until this point, I have been searching for the time at which delta = 0 (apart from when time = 0). To do this, I have been manually going into the delta array to find at what step it reaches this value (e.g. delta = 0 at step 3,000), then going into the time array and looking up the same step (i.e. retrieving the time at the 3,000th step).
Is there a way I can automate this and have it as an output rather than manually searching each time?
The base code is below:
import numpy as np
import matplotlib.pyplot as plt
from scipy import integrate
masses = [1, 1, 1]
r1, v1 = [0, 0], [-2*0.513938054919243, -2*0.304736003875733]
r2, v2 = [-1, 0], [0.513938054919243, 0.304736003875733]
r3, v3 = [1, 0], [0.513938054919243, 0.304736003875733]
u0 = np.concatenate([r1, v1, r2, v2, r3, v3])
def odesys(t, u):
    def force(a): return a / sum(a ** 2) ** 1.5
    r1, v1, r2, v2, r3, v3 = u.reshape([-1, 2])
    m1, m2, m3 = masses
    f12, f13, f23 = force(r1 - r2), force(r1 - r3), force(r2 - r3)
    a1, a2, a3 = -m2 * f12 - m3 * f13, m1 * f12 - m3 * f23, m1 * f13 + m2 * f23
    return np.concatenate([v1, a1, v2, a2, v3, a3])
# collect data
t_values = []
u_values = []
par_1_pos = []
d_values = []
# Time start, step, and finish point
t0, tf, t_step = 0, 18, 0.0001
nsteps = int((tf - t0) / t_step)
solution = integrate.RK45(odesys, t0, u0, tf, max_step=t_step)
# The loop for running the Runge-Kutta method over some time period.
u_values.append(solution.y)
t_values.append(t0)
par_1_pos.append(((solution.y[0] - u0[0])**2 + (solution.y[1] - u0[1])**2)**0.5)
d_values.append(((solution.y[0] - u0[0])**2 + (solution.y[1] - u0[1])**2)**0.5 +
                ((solution.y[4] - u0[4])**2 + (solution.y[5] - u0[5])**2)**0.5 +
                ((solution.y[8] - u0[8])**2 + (solution.y[9] - u0[9])**2)**0.5 +
                ((solution.y[2] - u0[2])**2 + (solution.y[3] - u0[3])**2)**0.5 +
                ((solution.y[6] - u0[6])**2 + (solution.y[7] - u0[7])**2)**0.5 +
                ((solution.y[10] - u0[10])**2 + (solution.y[11] - u0[11])**2)**0.5)
for step in range(nsteps):
    solution.step()
    u_values.append(solution.y)
    t_values.append(solution.t)
    par_1_pos.append(((solution.y[0] - u0[0])**2 + (solution.y[1] - u0[1])**2)**0.5)
    d_values.append(((solution.y[0] - u0[0])**2 + (solution.y[1] - u0[1])**2)**0.5 +
                    ((solution.y[4] - u0[4])**2 + (solution.y[5] - u0[5])**2)**0.5 +
                    ((solution.y[8] - u0[8])**2 + (solution.y[9] - u0[9])**2)**0.5 +
                    ((solution.y[2] - u0[2])**2 + (solution.y[3] - u0[3])**2)**0.5 +
                    ((solution.y[6] - u0[6])**2 + (solution.y[7] - u0[7])**2)**0.5 +
                    ((solution.y[10] - u0[10])**2 + (solution.y[11] - u0[11])**2)**0.5)
    # break loop after modelling is finished
    if solution.status == 'finished':
        break
# Plotting of the individual particles
u = np.asarray(u_values).T
# Plot for The trajectory of the three bodies over the time period
plt.plot(u[0], u[1], '-o', lw=1, ms=3, label="body 1")
plt.plot(u[4], u[5], '-x', lw=1, ms=3, label="body 2")
plt.plot(u[8], u[9], '-s', lw=1, ms=3, label="body 3")
plt.title('Trajectories of the three bodies')
plt.xlabel('X Position')
plt.ylabel('Y Position')
plt.legend()
plt.grid()
plt.show()
plt.close()
# Plot for d(delta_t) values
plt.plot(t_values, d_values)
plt.title('Delta number for the three bodies')
plt.xlabel('Time (s)')
plt.ylabel('Delta')
plt.grid()
plt.show()
plt.close()
# Plot of distance between P1 and IC
plt.plot(t_values, par_1_pos)
plt.title('Plot of distance between P1 and IC')
plt.xlabel('Time (s)')
plt.ylabel('Distance from origin')
plt.grid()
plt.show()
plt.close()

Make your life less complicated and use the provided time-loop routine solve_ivp.
You want to compute a point of minimal distance to the initial point u0. This is a point where the derivative of the distance changes from negative to positive. The derivative of the square of the distance is the dot product of the tangent vector and the difference vector. Close to the minimum it makes little difference whether the tangent vector is taken at the current point or at the initial point.
Thus define an event of the "counting" type
Tu0 = odesys(t0,u0)
def dist_plane(u): return Tu0.dot(u-u0)
event0 = lambda t,u: dist_plane(u)
event0.direction = 1
and call the solver with its standard stepper RK45 and all the parameters
from scipy.integrate import solve_ivp
solution = solve_ivp(odesys, [t0,tf], u0, events=event0, atol=1e-10, rtol=1e-11)
u_values = solution.y.T
t_values = solution.t
def norm(u): return sum(u**2)**0.5
def dist_1(u): return norm(u[:2]-u0[:2])
def dist_all(u): return sum(norm(uu) for uu in (u-u0).reshape([-1,2]))
par_1_pos = [ dist_1(uu) for uu in u_values]
d_values = [dist_all(uu) for uu in u_values]
d_plane = [dist_plane(uu) for uu in u_values]
The last lines are there so you can do all the plots as before.
For the minimum, however, you just have to evaluate the event fields of the returned structure. Printing the distance measures gives a table:
for tt,uu in zip(solution.t_events[0], solution.y_events[0]):
    print(rf"|{tt:12.8f} | {dist_1(uu):12.8f} | {dist_all(uu):12.8f} | {dist_plane(uu):12.8f} |")
|            t |    dist pos1 |     dist all |   dist deriv |
|   0.00000000 |   0.00000000 |   0.00000000 |   0.00000000 |
|   2.81292221 |   0.58161236 |   3.54275380 |   0.00000000 |
|   4.17037860 |   0.35583855 |   5.77531098 |   0.00000000 |
|   5.71976151 |   0.63111430 |   3.98764796 |  -0.00000000 |
|   8.66440460 |   0.00000019 |   3.73331800 |   0.00000000 |
|  11.60904921 |   0.63111445 |   3.98764485 |  -0.00000000 |
|  13.15843018 |   0.35583804 |   5.77530951 |   0.00000000 |
|  14.51588605 |   0.58161284 |   3.54275265 |  -0.00000000 |
|  17.32881023 |   0.00000078 |   0.00000328 |   0.00000000 |
This shorter list should be easier to search for the global minimum.
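Going one step further (a small sketch that is not in the original answer; it reuses np, solution, and dist_all from above), the global minimum can be picked out of the event arrays directly, skipping the trivial event at t = 0:
t_ev = solution.t_events[0]
u_ev = solution.y_events[0]
d_ev = np.array([dist_all(uu) for uu in u_ev])
keep = t_ev > 1e-6                                  # drop the event at the initial time
i_min = np.argmin(d_ev[keep])
print("closest return at t =", t_ev[keep][i_min], "with total distance", d_ev[keep][i_min])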

If the arrays are the same length, you can reuse the index from your search in one array directly in the other. Otherwise, you may be able to use the fraction of the way through the first array as the starting point for the search of the second array (position 1000 in an array of 3000 is one third of the way through, so about position 1166 when mapped into an array of 3500).
If you want any value (or there will only be one), you can bisect the data with a binary search.
If you want the first and there may be more than one value, you'll have to do a linear search.
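A minimal sketch of both ideas (the arrays below are hypothetical stand-ins for the delta and time arrays from the question):
import numpy as np

times = np.linspace(0.0, 18.0, 3000)      # stand-in time array
delta = np.abs(np.sin(times))             # stand-in delta curve

step = 1 + int(np.argmin(delta[1:]))      # step where delta is closest to 0, skipping t = 0
t_at_zero = times[step]                   # same-length arrays: the index carries over directly

# Different lengths: map by the fraction of the way through, then search locally from there.
other_times = np.linspace(0.0, 18.0, 3500)
start = int(step / len(times) * len(other_times))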

If your x values are time steps, convert them to time values first by summing them up (a cumulative sum); for that the array needs to be ordered in time, which it almost certainly already is in this case.
If your x values are sorted, use binary search; if not, use linear search.
If the value you are searching for is not exactly present in your data but you have points below and above it, you can simply interpolate it from the closest neighbours (linear, cubic, or higher-order interpolation), but for that it is best if the searched array is sorted, otherwise you will have a hard time getting valid closest neighbours.
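A short NumPy sketch of those three points, with hypothetical data (np.searchsorted does the binary search, np.interp the linear interpolation):
import numpy as np

dt = np.full(2000, 0.009)              # hypothetical array of time steps
t = np.cumsum(dt)                      # summed up into sorted time values
y = np.cos(t)                          # hypothetical y values at those times

target = 1.5
i = np.searchsorted(t, target)         # binary search in the sorted time array
y_at_target = np.interp(target, t, y)  # linear interpolation between the closest neighbours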

Related

Python code has a big bottleneck, but I am not experienced enough to see where it is

My code is supposed to model the average energy for alpha decay; it works, but it is very slow.
import numpy as np
from numpy import sin, cos, arccos, pi, arange, fromiter
import matplotlib.pyplot as plt
from random import choices
r_cell, d, r, R, N = 5.5, 15.8, 7.9, 20, arange(1,10000, 50)
def total_decay(N):
    theta = 2*pi*np.random.rand(2,N)
    phi = arccos(2*np.random.rand(2,N)-1)
    x = fromiter((r*sin(phi[0][i])*cos(theta[0][i]) for i in range(N)), float, count=-1)
    dx = fromiter((x[i] + R*sin(phi[1][i])*cos(theta[1][i]) for i in range(N)), float, count=-1)
    y = fromiter((r*sin(phi[0][i])*sin(theta[0][i]) for i in range(N)), float, count=-1)
    dy = fromiter((y[i] + R*sin(phi[1][i])*sin(theta[1][i]) for i in range(N)), float, count=-1)
    z = fromiter((r*cos(phi[0][i]) for i in range(N)), float, count=-1)
    dz = fromiter((z[i] + R*cos(phi[1][i]) for i in range(N)), float, count=-1)
    return x, y, z, dx, dy, dz
def inter(x,y,z,dx,dy,dz, N):
    intersections = 0
    for i in range(N): # Checks to see if a line between two points intersects with the target cell
        a = (dx[i] - x[i])*(dx[i] - x[i]) + (dy[i] - y[i])*(dy[i] - y[i]) + (dz[i] - z[i])*(dz[i] - z[i])
        b = 2*((dx[i] - x[i])*(x[i]-d) + (dy[i] - y[i])*(y[i])+(dz[i] - z[i])*(z[i]))
        c = d*d + x[i]*x[i] + y[i]*y[i] + z[i]*z[i] - 2*(d*x[i]) - r_cell*r_cell
        if b*b - 4*a*c >= 0:
            intersections += 1
    return intersections
def hits(N):
    I = []
    for i in range(len(N)):
        decay = total_decay(N[i])
        I.append(inter(decay[0],decay[1],decay[2],decay[3],decay[4],decay[5],N[i]))
    return I
def AE(I,N):
    p1, p2 = 52.4 / (52.4 + 18.9), 18.9 / (52.4 + 18.9)
    E = [choices([5829.6, 5793.1], cum_weights=(p1,p2), k=1)[0] for _ in range(I)]
    return sum(E)/N
def list_AE(I,N):
    E = [AE(I[i],N[i]) for i in range(len(N))]
    return E
plt.plot(N, list_AE(hits(N),N))
plt.title('Average energy per dose with respect to number of decays')
plt.xlabel('Number of decays [N]')
plt.ylabel('Average energy [keV]')
plt.show()
Can anyone experienced point out where the bottleneck takes place, explain why it happens and how to optimize it? Thanks in advance.
To find out where most of the time is spent in your code, examine it with a profiler. By wrapping your main code like this:
import cProfile
import pstats
profiler = cProfile.Profile()
profiler.enable()
result = list_AE(hits(N), N)
profiler.disable()
stats = pstats.Stats(profiler).sort_stats('tottime')
stats.print_stats()
You will get the following overview (abbreviated):
6467670 function calls in 19.982 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
200 4.766 0.024 4.766 0.024 ./alphadecay.py:24(inter)
995400 2.980 0.000 2.980 0.000 ./alphadecay.py:17(<genexpr>)
995400 2.925 0.000 2.925 0.000 ./alphadecay.py:15(<genexpr>)
995400 2.690 0.000 2.690 0.000 ./alphadecay.py:16(<genexpr>)
995400 2.683 0.000 2.683 0.000 ./alphadecay.py:14(<genexpr>)
995400 1.674 0.000 1.674 0.000 ./alphadecay.py:19(<genexpr>)
995400 1.404 0.000 1.404 0.000 ./alphadecay.py:18(<genexpr>)
1200 0.550 0.000 14.907 0.012 {built-in method numpy.fromiter}
Most of the time is spent in the inter function, since it runs a huge loop over N. To improve this, you could parallelize its execution across multiple processes using multiprocessing.Pool.
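A minimal sketch of that idea (count_hits and hits_parallel are hypothetical helpers, not part of the original code; on platforms that spawn rather than fork worker processes, the top-level call has to sit under an if __name__ == "__main__": guard):
from multiprocessing import Pool

def count_hits(n):
    # One independent simulation per task: generate the decays, then count the intersections.
    return inter(*total_decay(n), n)

def hits_parallel(N):
    with Pool() as pool:               # one worker process per CPU core by default
        return pool.map(count_hits, list(N))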
Another way to speed up your calculations is to make use of NumPy vectorization. That is, avoid iterating over N inside the total_decay() function:
def total_decay(N):
    theta = 2 * pi * np.random.rand(2, N)
    phi = arccos(2 * np.random.rand(2, N) - 1)
    x = r * sin(phi[0]) * cos(theta[0])
    y = r * sin(phi[0]) * sin(theta[0])
    z = r * cos(phi[0])
    dx = x + R * sin(phi[1]) * cos(theta[1])
    dy = y + R * sin(phi[1]) * sin(theta[1])
    dz = z + R * cos(phi[1])
    return x, y, z, dx, dy, dz
I've arranged the code a bit to make it more readable. On that note, I strongly suggest you follow the Python formatting conventions and use descriptive variable names to make your code more understandable.
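In the same spirit (a sketch that is not part of the original answer; it relies on the module-level d and r_cell from the question), the inter loop itself can be replaced by array operations:
def inter(x, y, z, dx, dy, dz, N):
    # Same discriminant test as before, evaluated for all N decays at once.
    a = (dx - x)**2 + (dy - y)**2 + (dz - z)**2
    b = 2 * ((dx - x) * (x - d) + (dy - y) * y + (dz - z) * z)
    c = d*d + x*x + y*y + z*z - 2*d*x - r_cell*r_cell
    return int(np.count_nonzero(b*b - 4*a*c >= 0))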
I won't tell you where the bottleneck is, but I can tell you how to find bottlenecks in complex programs. The keyword is profiling. A profiler is an application that will run alongside your code and measure the execution times of each statement. Search online for python profiler.
The poor person's version would be stepping through with a debugger and guesstimating the execution times of statements, or using print statements or a library to measure execution times. Using a profiler is an important skill that is not that difficult to learn, though.
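For example, the simplest such measurement only needs the standard library (list_AE, hits, and N as defined in the question):
import time

start = time.perf_counter()
result = list_AE(hits(N), N)                        # the code under test
print(f"elapsed: {time.perf_counter() - start:.2f} s")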

Python 3.7: Modelling a 2D Gaussian equation using a Numpy meshgrid and arrays without iterating through each point

I am currently trying to write my own 2D Gaussian function as a coding exercise, and have been able to create the following script:
import numpy as np
import matplotlib.pyplot as plt
def Gaussian2D_v1(coords=None,  # x and y coordinates for each image.
                  amplitude=1,  # Highest intensity in image.
                  xo=0,         # x-coordinate of peak centre.
                  yo=0,         # y-coordinate of peak centre.
                  sigma_x=1,    # Standard deviation in x.
                  sigma_y=1,    # Standard deviation in y.
                  rho=0,        # Correlation coefficient.
                  offset=0):    # Offset from zero (background radiation).
    x, y = coords
    xo = float(xo)
    yo = float(yo)
    # Create covariance matrix
    mat_cov = [[sigma_x**2, rho * sigma_x * sigma_y],
               [rho * sigma_x * sigma_y, sigma_y**2]]
    mat_cov = np.asarray(mat_cov)
    # Find its inverse
    mat_cov_inv = np.linalg.inv(mat_cov)
    G_array = []
    # Calculate pixel by pixel
    # Iterate through row last
    for i in range(0, np.shape(y)[0]):
        # Iterate through column first
        for j in range(0, np.shape(x)[1]):
            mat_coords = np.asarray([[x[i, j]-xo],
                                     [y[i, j]-yo]])
            G = (amplitude * np.exp(-0.5*np.matmul(np.matmul(mat_coords.T,
                                                             mat_cov_inv),
                                                   mat_coords)) + offset)
            G_array.append(G)
    G_array = np.asarray(G_array)
    G_array = G_array.reshape(64, 64)
    return G_array.ravel()
coords = np.meshgrid(np.arange(0, 64), np.arange(0, 64))
model_1 = Gaussian2D_v1(coords,
                        amplitude=20,
                        xo=32,
                        yo=32,
                        sigma_x=6,
                        sigma_y=3,
                        rho=0.8,
                        offset=20).reshape(64, 64)
plt.figure(figsize=(5, 5)).add_axes([0, 0, 1, 1])
plt.contourf(model_1)
The code as it is works, but as you can see, I am currently iterating through the mesh grid one point at a time, and appending each point to a list, which is then converted to an array and re-shaped to give the 2D Gaussian distribution.
How can I modify the script to forgo using a nested "for" loop and have the program consider the whole meshgrid for matrix calculations? Is such a method possible?
Thanks!
Of course there is a solution; numpy is all about array operations and vectorization of the code! np.matmul can take arguments with more than 2 dimensions and applies the matrix multiplication on the last two axes only (with this calculation done in parallel over the other axes). However, making sure of the right axes order can get tricky.
Here is your edited code:
import numpy as np
import matplotlib.pyplot as plt
def Gaussian2D_v1(coords,       # x and y coordinates for each image.
                  amplitude=1,  # Highest intensity in image.
                  xo=0,         # x-coordinate of peak centre.
                  yo=0,         # y-coordinate of peak centre.
                  sigma_x=1,    # Standard deviation in x.
                  sigma_y=1,    # Standard deviation in y.
                  rho=0,        # Correlation coefficient.
                  offset=0):    # Offset from zero (background radiation).
    x, y = coords
    xo = float(xo)
    yo = float(yo)
    # Create covariance matrix
    mat_cov = [[sigma_x**2, rho * sigma_x * sigma_y],
               [rho * sigma_x * sigma_y, sigma_y**2]]
    mat_cov = np.asarray(mat_cov)
    # Find its inverse
    mat_cov_inv = np.linalg.inv(mat_cov)
    # PB We stack the coordinates along the last axis
    mat_coords = np.stack((x - xo, y - yo), axis=-1)
    G = amplitude * np.exp(-0.5*np.matmul(np.matmul(mat_coords[:, :, np.newaxis, :],
                                                    mat_cov_inv),
                                          mat_coords[..., np.newaxis])) + offset
    return G.squeeze()
coords = np.meshgrid(np.arange(0, 64), np.arange(0, 64))
model_1 = Gaussian2D_v1(coords,
amplitude=20,
xo=32,
yo=32,
sigma_x=6,
sigma_y=3,
rho=0.8,
offset=20)
plt.figure(figsize=(5, 5)).add_axes([0, 0, 1, 1])
plt.contourf(model_1)
So, the equation is exp(-0.5 * (X - µ)' Cinv (X - µ)), where X is our coordinate matrix, µ the mean (x0, y0), and Cinv the inverse covariance matrix (and ' is a transpose). In the code, I stack both meshgrids into a new array so that mat_coords has a shape of (Ny, Nx, 2). In the first np.matmul call, I add a new axis so that the shapes go like (Ny, Nx, 1, 2) * (2, 2) = (Ny, Nx, 1, 2). As you can see, the matrix multiplication is done on the two last axes, in parallel over the others. Then I add a new axis so that (Ny, Nx, 1, 2) * (Ny, Nx, 2, 1) = (Ny, Nx, 1, 1).
The np.squeeze() call returns a version without the two last singleton axes.
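A quick way to confirm those shapes (illustrative only, with zero-filled arrays standing in for the real data):
import numpy as np

Ny, Nx = 64, 64
mat_coords = np.zeros((Ny, Nx, 2))
mat_cov_inv = np.eye(2)
step1 = np.matmul(mat_coords[:, :, np.newaxis, :], mat_cov_inv)  # (Ny, Nx, 1, 2)
step2 = np.matmul(step1, mat_coords[..., np.newaxis])            # (Ny, Nx, 1, 1)
print(step1.shape, step2.shape, step2.squeeze().shape)           # -> (64, 64, 1, 2) (64, 64, 1, 1) (64, 64)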

Reshaping tensors in a 3D numpy matrix

I'm essentially trying to accomplish this and then this but with a 3D matrix, say (128,128,60,6). The 4th dimension is an array vector that represents the diffusion array at that voxel, e.g.:
d[30,30,30,:] = [dxx, dxy, dxz, dyy, dyz, dzz] = D_array
Where dxx etc. are diffusion for a particular direction. D_array can also be seen as a triangular matrix (since dxy == dyx etc.). So I can use those 2 other answers to get from D_array to D_square, e.g.
D_square = [[dxx, dxy, dxz], [dyx, dyy, dyz],[dzx, dzy, dzz]]
I can't seem to figure out the next step however - how to apply that unit transformation of a D_array into D_square to the whole 3D volume.
Here's the code snippet that works on a single tensor:
# this solves a linear eq. that provides us with diffusion arrays at each voxel in a 3D space
D = np.einsum('ijkt,tl->ijkl',X,bi_plus)
#our issue at this point is we have a vector that represents a triangular matrix.
# first make a tri matx from the vector, testing on unit tensor first
D_tri = np.zeros((3,3))
D_array = D[30][30][30]
D_tri[np.triu_indices(3)] = D_array
# then getting the full sqr matrix
D_square = D_tri.T + D_tri
np.fill_diagonal(D_square, np.diag(D_tri))
So what would be the numpy-way of formulating that unit transformation of the Diffusion tensor to the whole 3D volume all at once?
Approach #1
Here's one using row, col indices from triu_indices for indexing along last two axes into an initialized output array -
def squareformnd_rowcol_integer(ar, n=3):
    out_shp = ar.shape[:-1] + (n,n)
    out = np.empty(out_shp, dtype=ar.dtype)
    row,col = np.triu_indices(n)
    # Get a "rolled-axis" view with which the last two axes come to the front
    # so that we could index into them just like for a 2D case
    out_rolledaxes_view = out.transpose(np.roll(range(out.ndim),2,0))
    # Assign permuted version of input array into rolled output version
    arT = np.moveaxis(ar,-1,0)
    out_rolledaxes_view[row,col] = arT
    out_rolledaxes_view[col,row] = arT
    return out
Approach #2
Another one with the last two axes merged into one and then indexing with linear indices -
def squareformnd_linear_integer(ar, n=3):
    out_shp = ar.shape[:-1] + (n,n)
    out = np.empty(out_shp, dtype=ar.dtype)
    row,col = np.triu_indices(n)
    idx0 = row*n+col
    idx1 = col*n+row
    ar2D = ar.reshape(-1,ar.shape[-1])
    out.reshape(-1,n**2)[:,idx0] = ar2D
    out.reshape(-1,n**2)[:,idx1] = ar2D
    return out
Approach #3
Finally altogether a new method using masking and should be better with performance as most masking based ones are when it comes to indexing -
def squareformnd_masking(ar, n=3):
    out = np.empty((n,n)+ar.shape[:-1], dtype=ar.dtype)
    r = np.arange(n)
    m = r[:,None]<=r
    arT = np.moveaxis(ar,-1,0)
    out[m] = arT
    out.swapaxes(0,1)[m] = arT
    new_axes = list(range(out.ndim))[2:] + [0,1]
    return out.transpose(new_axes)
Timings on (128,128,60,6) shaped random array -
In [635]: ar = np.random.rand(128,128,60,6)
In [636]: %timeit squareformnd_linear_integer(ar, n=3)
...: %timeit squareformnd_rowcol_integer(ar, n=3)
...: %timeit squareformnd_masking(ar, n=3)
10 loops, best of 3: 103 ms per loop
10 loops, best of 3: 103 ms per loop
10 loops, best of 3: 53.6 ms per loop
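As a quick sanity check (a hypothetical test, with random data standing in for D), any of the three functions can be compared against the single-voxel recipe from the question:
import numpy as np

d = np.random.rand(128, 128, 60, 6)             # stand-in for the diffusion vectors
full = squareformnd_masking(d, n=3)             # or either of the other two approaches

# Rebuild one voxel by hand, exactly as in the question, and compare
D_tri = np.zeros((3, 3))
D_tri[np.triu_indices(3)] = d[30, 30, 30]
D_square = D_tri.T + D_tri
np.fill_diagonal(D_square, np.diag(D_tri))
print(np.allclose(full[30, 30, 30], D_square))  # expected: True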
A vectorized way to do it:
# Gets the triangle matrix
d_tensor = np.zeros((128, 128, 60, 3, 3))
triu_idx = np.triu_indices(3)
d_tensor[:, :, :, triu_idx[0], triu_idx[1]] = d
# Make it symmetric
diagonal = np.zeros((128, 128, 60, 3, 3))
idx = np.arange(3)
diagonal[:, :, :, idx, idx] = d_tensor[:, :, :, idx, idx]
d_tensor = np.transpose(d_tensor, (0, 1, 2, 4, 3)) + d_tensor - diagonal

Fractal dimension algorithms gives results of >2 for time-series

I'm trying to compute Fractal Dimension of very specific time series array.
I've found implementations of Higuchi FD algorithm:
def hFD(a, k_max): # Higuchi FD
    L = []
    x = []
    N = len(a)
    for k in range(1,k_max):
        Lk = 0
        for m in range(0,k):
            # we pregenerate all idxs
            idxs = np.arange(1,int(np.floor((N-m)/k)),dtype=np.int32)
            Lmk = np.sum(np.abs(a[m+idxs*k] - a[m+k*(idxs-1)]))
            Lmk = (Lmk*(N - 1)/(((N - m)/ k)* k)) / k
            Lk += Lmk
        L.append(np.log(Lk/(m+1)))
        x.append([np.log(1.0/ k), 1])
    (p, r1, r2, s) = np.linalg.lstsq(x, L)
    return p[0]
from https://github.com/gilestrolab/pyrem/blob/master/src/pyrem/univariate.py
and Katz FD algorithm:
def katz(data):
    n = len(data)-1
    L = np.hypot(np.diff(data), 1).sum() # Sum of distances
    d = np.hypot(data - data[0], np.arange(len(data))).max() # furthest distance from first point
    return np.log10(n) / (np.log10(d/L) + np.log10(n))
from https://github.com/ProjectBrain/brainbits/blob/master/katz.py
I expect results of ~1.5 in both cases; however, I get 2.2 and 4 instead...
hFD(x,4) = 2.23965648024 (the k value here is chosen as an example; the result won't change much in the range 4-12. Edit: I was able to get a result of ~1.9 with k=22, however this still does not make any sense);
katz(x) = 4.03911343057
This should not, in theory, be possible for a 1D time-series array.
My questions are: are the Higuchi and Katz algorithms not suitable for time-series analysis in general, or am I doing something wrong on my side? Also, are there any other Python libraries with already-implemented, error-free algorithms I could use to verify my results?
My array of interest (each element represents a point in time t, t+1, t+2, ..., t+N):
x = np.array([373.4413096546802, 418.58026161917803,
395.7387698762124, 416.21163042783206,
407.9812265426947, 430.2355284504048,
389.66095393296763, 442.18969320408166,
383.7448638776275, 452.8931822090381,
413.5696828065546, 434.45932712853585
,429.95212301648996, 436.67612861616215,
431.10235365546964, 418.86935850068545,
410.84902747247423, 444.4188867775925,
397.1576881118471, 451.6129904245434,
440.9181246439599, 438.9857353268666,
437.1800408012741, 460.6251405281339,
404.3208481355302, 500.0432305427639,
380.49579242696177, 467.72953450552893,
333.11328535523967, 444.1171938340972,
303.3024198243042, 453.16332062153276,
356.9697406524534, 520.0720647379901,
402.7949987727925, 536.0721418821788,
448.21609036718445, 521.9137447208354,
470.5822486372967, 534.0572029633416,
480.03741443274765, 549.2104258193126,
460.0853321729541, 561.2705350421926,
444.52689144575794, 560.0835589548401,
462.2154563472787, 559.7166600213686,
453.42374550322353, 559.0591804941763,
421.4899935529862, 540.7970410737004,
454.34364779193913, 531.6018122709779,
437.1545739076901, 522.4262260216169,
444.6017030695873, 533.3991716674865,
458.3492761150962, 513.1735160522104])
The array on which you are trying to estimate the hFD is too short. You need to get a longer sample, or oversample the current one to have at least 128 points for hFD and more than 4000 points for Katz:
import scipy.signal as signal
...
x_res = signal.resample(x, 128)
hFD(x_res, 4)  # will be 1.74383694265

Can the xor-swap be extended to more than two variables?

I've been trying to extend the xor-swap to more than two variables, say n variables, but I've gotten nothing better than 3*(n-1) XOR operations.
For two integer variables x1 and x2 you can swap them like this:
swap(x1,x2) {
    x1 = x1 ^ x2;
    x2 = x1 ^ x2;
    x1 = x1 ^ x2;
}
So, assume you have x1 ... xn with values v1 ... vn. Clearly you can "rotate" the values by successively applying swap:
swap(x1,x2);
swap(x2,x3);
swap(x3,x4);
...
swap(xm,xn); // with m = n-1
You will end up with x1 = v2, x2 = v3, ..., xn = v1.
Which costs n-1 swaps, each costing 3 xors, leaving us with (n-1)*3 xors.
Is a faster algorithm using xor and assignment only and no additional variables known?
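For concreteness, the rotation built from chained XOR swaps can be checked quickly in Python (an illustrative sketch, not part of the question):
def xor_rotate(values):
    # Rotate left by one place using only XOR and assignment: x1 <- v2, ..., xn <- v1.
    x = list(values)
    for i in range(len(x) - 1):          # n-1 swaps, 3 XORs each
        x[i] ^= x[i + 1]
        x[i + 1] ^= x[i]
        x[i] ^= x[i + 1]
    return x

print(xor_rotate([1, 2, 4, 8, 16]))      # [2, 4, 8, 16, 1]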
As a partial result I tried a brute force search for N=3,4,5 and all of these agree with your formula.
Python code:
from collections import *
D = defaultdict(int) # Map from tuple of bitmasks to number of steps to get there
N = 5
Q = deque()
Q.append( (tuple(1<<n for n in range(N)), 0) )
goal = (tuple(1<<( (n+1)%N ) for n in range(N)))
while Q:
    masks,ops = Q.popleft()
    if len(D)%10000==0:
        print(len(D), len(Q), ops)
    ops += 1
    # Choose two to swap
    for a in range(N):
        for b in range(N):
            if a==b:
                continue
            masks2 = list(masks)
            masks2[a] = masks2[a]^masks2[b]
            masks2 = tuple(masks2)
            if masks2 in D:
                continue
            D[masks2] = ops
            if masks2==goal:
                print('found goal in', ops)
                raise ValueError
            Q.append( (masks2,ops) )
