How to remove apparent redundency in numpy vector operations? - arrays

New to python and not sure about efficiency issues here. For vectors x, y, and z that represent the coordinates of n particles I can do the following computation
import numpy as np
X=np.subtract.outer(x,x)
Y=np.subtract.outer(y,y)
Z=np.subtract.outer(z,z)
R=np.sqrt(X**2+Y**2+Z**2)
A=X/R
np.fill_diagonal(A,0)
a=np.sum(A,axis=0)
With this calculation there is about a factor of 2 in redundancy in so far as multiplications and divisions go as the diagonals are not needed and the lower diagonal is just the negative of the upper diagonal. I plan to use this kind of computation inside a function call that is used by odeint - i.e. it would be called a lot and the vectors will be large - as large as my computer will handle. To remove it, naively I would end up doing a for loop which presumably is a stupid thing to do. Can I get rid of this redundancy in a vectorized way or is it even worth the effort?
Update: Based on the suggestions below, the only way I could see to improve was
ut=np.triu_indices(n,1)
X=x[ut[0]]-x[ut[1]]
With similar expressions for Y and Z and using pdist to find R. This construction only calculates the upper triangular part. Looking at the source code for pdist I am not convinced it does anything particularly smart so I think my expression above would be equally good. The use of squareform only produces the symmetric form. For the antisymmetric may as well use
B=np.zeros((n,n),dtype=np.float64)
B(ut[0],ut[1])=A
B=B-B.T
This cannot be slower than square form because this is pretty much exactly what squareform does. Since the function is called often it would seem to me that ut should be made static along with storage for others (X,Y,Z,A,B). However being new to python I'm not sure how that is done.

Related

How to obtain the derivative of Rodrigues vector and perform update in nonlinear least square?

I am now interested in the bundle adjustment in SLAM, where the Rodrigues vectors $R$ of dimension 3 are used as part of variables. Assume, without loss of generality, we use Gauss-Newton method to solve it, then in each step we need to solve the following linear least square problem:
$$J(x_k)\Delta x = -F(x_k),$$
where $J$ is the Jacobi of $F$.
Here I am wondering how to calculate the derivative $\frac{\partial F}{\partial R}$. Is it just like the ordinary Jacobi in mathematic analysis? I have this wondering because when I look for papers, I find many other concepts like exponential map, quaternions, Lie group and Lie algebra. So I suspect if there is any misunderstanding.
This is not an answer, but is too long for a comment.
I think you need to give more information about how the Rodrigues vector appears in your F.
First off, is the vector assumed to be of unit length.? If so that presents some difficulties as now it doesn't have 3 independent components. If you know that the vector will lie in some region (eg that it's z component will always be positive), you can work round this.
If instead the vector is normalised before use, then while you could then compute the derivatives, the resulting Jacobian will be singular.
Another approach is to use the length of the vector as the angle through which you rotate. However this means you need a special case to get a rotation through 0, and the resulting function is not differentiable at 0. Of course if this can never occur, you may be ok.

Should I use Halfcomplex2Real or Complex2Complex

Good morning, I'm trying to perform a 2D FFT as 2 1-Dimensional FFT.
The problem setup is the following:
There's a matrix of complex numbers generated by an inverse FFT on an array of real numbers, lets call it arr[-nx..+nx][-nz..+nz].
Now, since the original array was made up of real numbers, I exploit the symmetry and reduce my array to be arr[0..nx][-nz..+nz].
My problem starts here, with arr[0..nx][-nz..nz] provided.
Now I should come back in the domain of real numbers.
The question is what kind of transformation I should use in the 2 directions?
In x I use the fftw_plan_r2r_1d( .., .., .., FFTW_HC2R, ..), called Half complex to Real transformation because in that direction I've exploited the symmetry, and that's ok I think.
But in z direction I can't figure out if I should use the same transformation or, the Complex to complex (C2C) transformation?
What is the correct once and why?
In case of needing here, at page 11, the HC2R transformation is briefly described
Thank you
"To easily retrieve a result comparable to that of fftw_plan_dft_r2c_2d(), you can chain a call to fftw_plan_dft_r2c_1d() and a call to the complex-to-complex dft fftw_plan_many_dft(). The arguments howmany and istride can easily be tuned to match the pattern of the output of fftw_plan_dft_r2c_1d(). Contrary to fftw_plan_dft_r2c_1d(), the r2r_1d(...FFTW_HR2C...) separates the real and complex component of each frequency. A second FFTW_HR2C can be applied and would be comparable to fftw_plan_dft_r2c_2d() but not exactly similar.
As quoted on the page 11 of the documentation that you judiciously linked,
'Half of these column transforms, however, are of imaginary parts, and should therefore be multiplied by I and combined with the r2hc transforms of the real columns to produce the 2d DFT amplitudes; ... Thus, ... we recommend using the ordinary r2c/c2r interface.'
Since you have an array of complex numbers, you can either use c2r transforms or unfold real/imaginary parts and try to use HC2R transforms. The former option seems the most practical.Which one might solve your issue?"
-#Francis

computing function of neighbors efficiently on lattice

I'm studying the Ising model, and I'm trying to efficiently compute a function H(σ) where σ is the current state of an LxL lattice (that is, σ_ij ∈ {+1, -1} for i,j ∈ {1,2,...,L}). To compute H for a particular σ, I need to perform the following calculation:
where ⟨i j⟩ indicates that sites σ_i and σ_j are nearest neighbors and (suppose) J is a constant.
A couple of questions:
Should I store my state σ as an LxL matrix or as an L2 list? Is one better than the other for memory accessing in RAM (which I guess depends on the way I'm accessing elements...)?
In either case, how can I best compute H?
Really I think this boils down to how can I access (and manipulate) the neighbors of every state most efficiently.
Some thoughts:
I see that if I loop through each element in the list or matrix that I'll be double counting, so is there a "best" way to return the unique neighbors?
Is there a better data structure that I'm not thinking of?
Your question is a bit broad and a bit confusing for me, so excuse me if my answer is not the one you are looking for, but I hope it will help (a bit).
An array is faster than a list when it comes to indexing. A matrix is a 2D array, like this for example (where N and M are both L for you):
That means that you first access a[i] and then a[i][j].
However, you can avoid this double access, by emulating a 2D array with a 1D array. In that case, if you want to access element a[i][j] in your matrix, you would now do, a[i * L + j].
That way you load once, but you multiply and add your variables, but this may still be faster in some cases.
Now as for the Nearest Neighbor question, it seems that you are using a square-lattice Ising model, which means that you are working in 2 dimensions.
A very efficient data structure for Nearest Neighbor Search in low dimensions is the kd-tree. The construction of that tree takes O(nlogn), where n is the size of your dataset.
Now you should think if it's worth it to build such a data structure.
PS: There is a plethora of libraries implementing the kd-tree, such as CGAL.
I encountered this problem during one of my school assignments and I think the solution depends on which programming language you are using.
In terms of efficiency, there is no better way than to write a for loop to sum neighbours(which are actually the set of 4 points{ (i+/-1,j+/-1)} for a given (i,j). However, when simd(sse etc) functions are available, you can re-express this as a convolution with a 2d kernel {0 1 0;1 0 1;0 1 0}. so if you use a numerical library which exploits simd functions you can obtain significant performance increase. You can see the example implementation of this here(https://github.com/zawlin/cs5340/blob/master/a1_code/denoiseIsingGibbs.py) .
Note that in this case, the performance improvement is huge because to evaluate it in python I need to write an expensive for loop.
In terms of work, there is in fact some waste as the unecessary multiplications and sum with zeros at corners and centers. So whether you can experience performance improvement depends quite a bit on your programming environment( if you are already in c/c++, it can be difficult and you need to use mkl etc to obtain good improvement)

Raise matrix to complex power

I'm implementing a library which makes use of the GSL for matrix operations. I am now at a point where I need to raise things to any power, including imaginary ones. I already have the code in place to handle negative powers of a matrix, but now I need to handle imaginary powers, since all numbers in my library are complex.
Does the GSL have facilities for doing this already, or am I about to enter loop hell trying to create an algorithm for this? I need to be able to raise not only to imaginary but also complex numbers, such as 3+2i. Having limited experience with matrices as a whole, I'm not even certain on the process for doing this by hand, much less with a computer.
Hmm I never thought the electrical engineering classes I went through would help me on here, but what do you know. So the process for raising something to a complex power is not that complex and I believe you could write something fairly easily (I am not too familiar with the library your using, but this should still work with any library that has some basic complex number functions).
First your going to need to change the number to polar coordinates (i.e 3 + 3i would become (3^2 + 3^2) ^(1/2) angle 45 degrees. Pardon the awful notation. If you are confused on the process of changing the numbers over just do some quick googling on converting from cartesian to polar.
So now that you have changed it to polar coordinates you will have some radius r at an angle a. Lets raise it to the nth power now. You will then get r^n * e^(jan).
If you need more examples on this, research the "general power rule for complex numbers." Best of luck. I hope this helps!
Just reread the question and I see you need to raise to complex as well as imaginary. Well complex and imaginary are going to be the same just with one extra step using the exponent rule. This link will quickly explain how to raise something to a complex http://boards.straightdope.com/sdmb/showthread.php?t=261399
One approach would be to compute (if possible) the logarithm of your matrix, multiply that by your (complex) exponent, and then exponentiate.
That is you could have
mat_pow( M, z) = mat_exp( z * mat_log( M));
However mat_log and even mat_exp are tricky.
In case it is still relevant to you, I have extended the capabilities of my package so that now you can raise any diagonalizable matrix to any power (including, in particular, complex powers). The name of the function is 'Matpow', and can be found in package 'powerplus'. Also, this package is for the R language, not C, but perhaps you could do your calculations in R if needed.
Edit: Version 3.0 extends capabilities to (some) non-diagonalizable matrices too.
I hope it helps!

Optimising Matrix Updating and Multiplication

Consider as an example the matrix
X(a,b) = [a b
a a]
I would like to perform some relatively intensive matrix algebra computations with X, update the values of a and b and then repeat.
I can see two ways of storing the entries of X:
1) As numbers (i.e. floats). Then after our matrix algebra operation we update all the values in X to the correct values of a and b.
2) As pointers to a and b, so that after updating them, the entries of X are automatically updated.
Now, I initially thought method (2) was the way to go as it skips out the updating step. However I believe that using method (1) allows a better use of the cache when doing for example matrix multiplication in parallel (although I am no expert so please correct me if I'm wrong).
My hypothesis is that for unexpensive matrix computations you should use method (2) and there will be some threshold as the computation becomes more complex that you should switch to (1).
I imagine this is not too uncommon a problem and my question is which is the optimal method to use for general matrices X?
Neither approach sounds particularly hard to implement. The simplest answer is make a test calculation, try both, and benchmark them. Take the faster one. Depending on what sort of operations you're doing (matrix multiplication, inversion, etc?) you can potentially reduce the computation by simplifying the operations given the assumptions you can make about your matrix structure. But I can't speak to that any more deeply since I'm not sure what types of operations you're doing.
But from experience, with a matrix that size, you probably won't see a performance difference. With larger matrices, you will, since the CPU's cache starts to fill. In that case, doing things like separating multiplication and addition operations, pointer indexes, and passing inputs as const enable the compiler to make significant performance enhancements.
See
Optimized matrix multiplication in C and
Cache friendly method to multiply two matrices

Resources