Should I use Halfcomplex2Real or Complex2Complex - c

Good morning, I'm trying to perform a 2D FFT as 2 1-Dimensional FFT.
The problem setup is the following:
There's a matrix of complex numbers generated by an inverse FFT on an array of real numbers, lets call it arr[-nx..+nx][-nz..+nz].
Now, since the original array was made up of real numbers, I exploit the symmetry and reduce my array to be arr[0..nx][-nz..+nz].
My problem starts here, with arr[0..nx][-nz..nz] provided.
Now I should come back in the domain of real numbers.
The question is what kind of transformation I should use in the 2 directions?
In x I use the fftw_plan_r2r_1d( .., .., .., FFTW_HC2R, ..), called Half complex to Real transformation because in that direction I've exploited the symmetry, and that's ok I think.
But in z direction I can't figure out if I should use the same transformation or, the Complex to complex (C2C) transformation?
What is the correct once and why?
In case of needing here, at page 11, the HC2R transformation is briefly described
Thank you

"To easily retrieve a result comparable to that of fftw_plan_dft_r2c_2d(), you can chain a call to fftw_plan_dft_r2c_1d() and a call to the complex-to-complex dft fftw_plan_many_dft(). The arguments howmany and istride can easily be tuned to match the pattern of the output of fftw_plan_dft_r2c_1d(). Contrary to fftw_plan_dft_r2c_1d(), the r2r_1d(...FFTW_HR2C...) separates the real and complex component of each frequency. A second FFTW_HR2C can be applied and would be comparable to fftw_plan_dft_r2c_2d() but not exactly similar.
As quoted on the page 11 of the documentation that you judiciously linked,
'Half of these column transforms, however, are of imaginary parts, and should therefore be multiplied by I and combined with the r2hc transforms of the real columns to produce the 2d DFT amplitudes; ... Thus, ... we recommend using the ordinary r2c/c2r interface.'
Since you have an array of complex numbers, you can either use c2r transforms or unfold real/imaginary parts and try to use HC2R transforms. The former option seems the most practical.Which one might solve your issue?"
-#Francis

Related

How to obtain the derivative of Rodrigues vector and perform update in nonlinear least square?

I am now interested in the bundle adjustment in SLAM, where the Rodrigues vectors $R$ of dimension 3 are used as part of variables. Assume, without loss of generality, we use Gauss-Newton method to solve it, then in each step we need to solve the following linear least square problem:
$$J(x_k)\Delta x = -F(x_k),$$
where $J$ is the Jacobi of $F$.
Here I am wondering how to calculate the derivative $\frac{\partial F}{\partial R}$. Is it just like the ordinary Jacobi in mathematic analysis? I have this wondering because when I look for papers, I find many other concepts like exponential map, quaternions, Lie group and Lie algebra. So I suspect if there is any misunderstanding.
This is not an answer, but is too long for a comment.
I think you need to give more information about how the Rodrigues vector appears in your F.
First off, is the vector assumed to be of unit length.? If so that presents some difficulties as now it doesn't have 3 independent components. If you know that the vector will lie in some region (eg that it's z component will always be positive), you can work round this.
If instead the vector is normalised before use, then while you could then compute the derivatives, the resulting Jacobian will be singular.
Another approach is to use the length of the vector as the angle through which you rotate. However this means you need a special case to get a rotation through 0, and the resulting function is not differentiable at 0. Of course if this can never occur, you may be ok.

How to remove apparent redundency in numpy vector operations?

New to python and not sure about efficiency issues here. For vectors x, y, and z that represent the coordinates of n particles I can do the following computation
import numpy as np
X=np.subtract.outer(x,x)
Y=np.subtract.outer(y,y)
Z=np.subtract.outer(z,z)
R=np.sqrt(X**2+Y**2+Z**2)
A=X/R
np.fill_diagonal(A,0)
a=np.sum(A,axis=0)
With this calculation there is about a factor of 2 in redundancy in so far as multiplications and divisions go as the diagonals are not needed and the lower diagonal is just the negative of the upper diagonal. I plan to use this kind of computation inside a function call that is used by odeint - i.e. it would be called a lot and the vectors will be large - as large as my computer will handle. To remove it, naively I would end up doing a for loop which presumably is a stupid thing to do. Can I get rid of this redundancy in a vectorized way or is it even worth the effort?
Update: Based on the suggestions below, the only way I could see to improve was
ut=np.triu_indices(n,1)
X=x[ut[0]]-x[ut[1]]
With similar expressions for Y and Z and using pdist to find R. This construction only calculates the upper triangular part. Looking at the source code for pdist I am not convinced it does anything particularly smart so I think my expression above would be equally good. The use of squareform only produces the symmetric form. For the antisymmetric may as well use
B=np.zeros((n,n),dtype=np.float64)
B(ut[0],ut[1])=A
B=B-B.T
This cannot be slower than square form because this is pretty much exactly what squareform does. Since the function is called often it would seem to me that ut should be made static along with storage for others (X,Y,Z,A,B). However being new to python I'm not sure how that is done.

Using transpose versus ctranspose in MATLAB

When transposing vectors/matrices in MATLAB, I've seen and used just the ' (apostropohe) operator for a long time.
For example:
>> v = [ 1 2 3 ]'
v =
1
2
3
However this is the conjugate transpose as I've recently found out, or ctranspose.
This seems to only matter when there are complex numbers involved, where if you want to transpose a matrix without getting the conjugate, you need to use the .' opertator.
Is it good practice to also use the .' for real matrices and vectors then? What should we be teaching MATLAB beginners?
Interesting question!
I would definitely say it's good practice to use .' when you just want to transpose, even if the numbers are real and thus ' would have the same effect. The mains reasons for this are:
Conceptual clarity: if you need to transpose, just transpose. Don't throw in an unnecessary conjugation. It's bad practice. You'll get used to writing ' to transpose and will fail to notice the difference. One day you will write ' when .' should be used. As probable illustrations of this, see this question or this one.
Future-proofness. If one day in the future you apply your function to complex inputs the behaviour will suddenly change, and you will have a hard time finding the cause. Believe me, I know what I say1.
Of course, if you are using real inputs but a conjugation would make sense for complex inputs, do use '. For example, if you are defining a dot product for real vectors, it may be appropriate to use ', because should you want to use complex inputs in the future, the conjugate transpose would make more sense.
1 In my early Matlab days, it took me quite a while to trace back a certain problem in my code, which turned out to be caused by using ' when I should have used .'. What really got me upset is, it was my professor who had actually said that ' meant transpose! He forgot to mention the conjugate, and hence my error. Lessons I learned: ' is not .'; and professors can tell you things that are plain wrong :-)
My very biased view: Most cases I use ' are purely "formal", aka not related to mathematical calculations. Most likely I want to rotate a vector like the index sequence 1:10 by 90 degrees.
I seldomly use ' to matrices since it's ambiguous - the first question you've to answer is why you want to make a transpose?
If the matrix is originally defined in a wrong direction, I would rather define the matrix in the correct one it should be, but not turning it afterwards.
To transpose a matrix for a mathematical calculation, I explicitly use transpose and ctranspose. Because by doing so the code is easier to read (don't have to focus on those tiny dots) and to debug (don't have to care about missing dots). Do the following jobs such as dot product as usual.
This is actually a subject of debate among many MATLAB programmers. Some say that if you know what you're doing, then you can go ahead and use ' if you know that your data is purely real, and to use .' if your data is complex. However, some people (such as Luis Mendo) advocate the fact that you should definitely use .' all the time so that you don't get confused.
This allows for the proper handling of input into functions in case the data that are expected for your inputs into these functions do turn out to be complex. There is a time when complex transposition is required, such as compute the magnitude squared of a complex vector. In fact, Loren Shure in one of her MATLAB digests (I can't remember which one...) stated that this is one of the reasons why the complex transpose was introduced.
My suggestion is that you should use .' always if your goal is to transpose data. If you want to do some complex arithmetic, then use ' and .' depending on what operation / computation you're doing. Obviously, Luis Mendo's good practices have rubbed off on me.
There are two cases to distinguish here:
Taking transpose for non-mathematical reasons, like you have a function that is treating the data as arrays, rather than mathematical vectors, and you need your error correcting input to get it in the format that you expect.
Taking transpose as a mathematical operation.
In the latter case, the situation has to dictate which is correct, and probably only one of the two choice is correct in that situation. Most often that will be to take the conjugate transpose, which corresponds to ', but there are cases where you must take the straight transpose and then, of course, you need to use .'.
In the former case, I suggest not using either transpose operator. Instead you should either use reshape or just insist that the input be make correctly and throw an error if it is not. This clearly distinguishes these "computer science" instance from true mathematical instances.

KD-Trees and missing values (vector comparison)

I have a system that stores vectors and allows a user to find the n most similar vectors to the user's query vector. That is, a user submits a vector (I call it a query vector) and my system spits out "here are the n most similar vectors." I generate the similar vectors using a KD-Tree and everything works well, but I want to do more. I want to present a list of the n most similar vectors even if the user doesn't submit a complete vector (a vector with missing values). That is, if a user submits a vector with three dimensions, I still want to find the n nearest vectors (stored vectors are of 11 dimensions) I have stored.
I have a couple of obvious solutions, but I'm not sure either one seem very good:
Create multiple KD-Trees each built using the most popular subset of dimensions a user will search for. That is, if a user submits a query vector of thee dimensions, x, y, z, I match that query to my already built KD-Tree which only contains vectors of three dimensions, x, y, z.
Ignore KD-Trees when a user submits a query vector with missing values and compare the query vector to the vectors (stored in a table in a DB) one by one using something like a dot product.
This has to be a common problem, any suggestions? Thanks for the help.
Your first solution might be fastest for queries (since the tree-building doesn't consider splits in directions that you don't care about), but it would definitely use a lot of memory. And if you have to rebuild the trees repeatedly, it could get slow.
The second option looks very slow unless you only have a few points. And if that's the case, you probably didn't need a kd-tree in the first place :)
I think the best solution involves getting your hands dirty in the code that you're working with. Presumably the nearest-neighbor search computes the distance between the point in the tree leaf and the query vector; you should be able to modify this to handle the case where the point and the query vector are different sizes. E.g. if the points in the tree are given in 3D, but your query vector is only length 2, then the "distance" between the point (p0, p1, p2) and the query vector (x0, x1) would be
sqrt( (p0-x0)^2 + (p1-x1)^2 )
I didn't dig into the java code that you linked to, but I can try to find exactly where the change would need to go if you need help.
-Chris
PS - you might not need the sqrt in the equation above, since distance squared is usually equivalent.
EDIT
Sorry, didn't realize it would be so obvious in the source code. You should use this version of the neighbor function:
nearest(double [] key, int n, Checker<T> checker)
And implement your own Checker class; see their EuclideanDistance.java to see the Euclidean version. You may also need to comment out any KeySizeException that the query code throws, since you know that you can handle differently sized keys.
Your second option looks like a reasonable solution for what you want.
You could also populate the missing dimensions with the most important( or average or whatever you think it should be) values if there are any.
You could try using the existing KD tree -- by taking both branches when the split is for a dimension that is not supplied by the source vector. This should take less time than doing a brute force search, and might be less trouble than trying to maintain a bunch of specialized trees for dimension subsets.
You would need to adapt your N-closest algorithm (without more info I can't advise you on that...), and for distance you would use the sum of the squares of only those elements supplied by the source vector.
Here's what I ended up doing: When a user didn't specify a value (when their query vector lacked a dimension), I I simply adjusted my matching range (in the API) to something huge so that I match any value.

Similarity between line strings

I have a number of tracks recorded by a GPS, which more formally can be described as a number of line strings.
Now, some of the recorded tracks might be recordings of the same route, but because of inaccurasies in the GPS system, the fact that the recordings were made on separate occasions and that they might have been recorded travelling at different speeds, they won't match up perfectly, but still look close enough when viewed on a map by a human to determine that it's actually the same route that has been recorded.
I want to find an algorithm that calculates the similarity between two line strings. I have come up with some home grown methods to do this, but would like to know if this is a problem that's already has good algorithms to solve it.
How would you calculate the similarity, given that similar means represents the same path on a map?
Edit: For those unsure of what I'm talking about, please look at this link for a definition of what a line string is: http://msdn.microsoft.com/en-us/library/bb895372.aspx - I'm not asking about character strings.
Compute the Fréchet distance on each pair of tracks. The distance can be used to gauge the similarity of your tracks.
Math alert: Fréchet was a pioneer in the field of metric space which is relevant to your problem.
I would add a buffer around the first line based on the estimated probable error, and then determine if the second line fits entirely within the buffer.
To determine "same route," create the minimal set of normalized path vectors, calculate the total power differences and compare the total to a quality measure.
Normalize the GPS waypoints on total path length,
walk the vectors of the paths together, creating a new set of path vectors for each path based upon the shortest vector at each waypoint,
calculate the total power differences between endpoints of each vector in the normalized paths weighting for vector length, and
compare against a quality measure.
Tune the power of the differences (start with, say, squared differences) and the quality measure (say as a percent of the total power differences) visually. This algorithm produces a continuous quality measure of the path match as well as a binary result (Are the paths the same?)
Paul Tomblin said: I would add a buffer
around the first line based on the
estimated probable error, and then
determine if the second line fits
entirely within the buffer.
You could modify the algorithm as the normalized vector endpoints are compared. You could determine if any endpoint difference was above a certain size (implementing Paul's buffer idea) or perhaps, if the endpoints were outside the "buffer," use that fact to ignore that endpoint difference, allowing a comparison ignoring side trips.
You could walk along each point (Pa) of LineString A and measure the distance from Pa to the nearest line-segment of LineString B, averaging each of these distances.
This is not a quick or perfect method, but should be able to give use a useful number and is pretty quick to implement.
Do the line strings start and finish at similar points, or are they of very different extents?
If you consider a single line string to be a sequence of [x,y] points (or [x,y,z] points), then you could compute the similarity between each pair of line strings using the Needleman-Wunsch algorithm. As described in the referenced Wikipedia article, the Needleman-Wunsch algorithm requires a "similarity matrix" which defines the distance between a pair of points. However, it would be easy to use a function instead of a matrix. In your case you could simply use the 2D Euclidean distance function (or a 3D Euclidean function if your points have elevation) to provide the distance between each pair of points.
I actually side with the person (Aaron F) who said that you might be interested in the Levenshtein distance problem (and cited this). His answer seems to me to be the best so far.
More specifically, Levenshtein distance (also called edit distance), does not measure strictly the character-by-character distance, but also allows you to perform insertions and deletions. The best algorithm for this distance measure can be computed in quadratic time (pretty slow if your strings are long), but the computational biologists have pretty good heuristics for this, that might be of interest to you on their own. Check out BLAST and FASTA.
In your problem, it seems that you are dealing with differences between strings of numbers, and you care about the numbers. If you give more information, I might be able to direct you to the right variant of BLAST/FASTA/etc for your purposes. In any case, you might consider adapting BLAST and FASTA for your needs. They're quite simple.
1: http://en.wikipedia.org/wiki/Levenshtein_distance, http://www.nist.gov/dads/HTML/Levenshtein.html

Resources