Graphs for Big O notation - c

I wonder if there's any tool/website where I can plot some run times as graphs. Because, the asymptotic notation is often not what I wanted i.e. Don't want to ignore constants.
For example, suppose I have two notations, like:
1) O = (n * log n).
2) O = (n * log n * log n)/5.
It's obvious that 1st one is asymptotically better. But what I want to see how they perform and at which point the second one starts to become better.
A graphical notation where I can enter different equations and plot them them to see how they vary would be greatly useful for this purpose. In my search I found this site where they have some plots. I am looking for something similar but I also want to input my equations to plot to analyse the performance for various 'n' values.

As soon as you stop "ignoring constants", you're no longer graphing "Big O" notation, but just performing a standard XY plot. As such, any graphing program, even online graphing calculators, would let you display this, just replace "n" for "X" and you'll get the proper graph.

Would this or this help?
If you use a 3d grapher, you can use the other dimension (say y) as a constant replacement.
This way you would be able to interpret results as:
when y is greater than 5, n*log(n)*log(n)/y is better than n*log(n) starting from n = (actual value)
Also, you can ignore the 3rd dimension. Or use it if you have a complexity depending on 2 variables.
Just input the difference between the complexities. In this case, ignoring the 3rd dimension and considering log(x) = ln(x), the equation is:
z = x*ln(x) - x*ln(x)*ln(x)/5
An you can interpret that as x*ln(x) is more efficient when z is negative.

If you want to see how they perform then you have to implement the algorithms and execute them on various graph. Modern processors with memory locality and cache misses make it really hard to come up with an equation that gives you a reasonable estimation.
I can guarantee you that oyu won't measure what you would expect.

Related

How to obtain the derivative of Rodrigues vector and perform update in nonlinear least square?

I am now interested in the bundle adjustment in SLAM, where the Rodrigues vectors $R$ of dimension 3 are used as part of variables. Assume, without loss of generality, we use Gauss-Newton method to solve it, then in each step we need to solve the following linear least square problem:
$$J(x_k)\Delta x = -F(x_k),$$
where $J$ is the Jacobi of $F$.
Here I am wondering how to calculate the derivative $\frac{\partial F}{\partial R}$. Is it just like the ordinary Jacobi in mathematic analysis? I have this wondering because when I look for papers, I find many other concepts like exponential map, quaternions, Lie group and Lie algebra. So I suspect if there is any misunderstanding.
This is not an answer, but is too long for a comment.
I think you need to give more information about how the Rodrigues vector appears in your F.
First off, is the vector assumed to be of unit length.? If so that presents some difficulties as now it doesn't have 3 independent components. If you know that the vector will lie in some region (eg that it's z component will always be positive), you can work round this.
If instead the vector is normalised before use, then while you could then compute the derivatives, the resulting Jacobian will be singular.
Another approach is to use the length of the vector as the angle through which you rotate. However this means you need a special case to get a rotation through 0, and the resulting function is not differentiable at 0. Of course if this can never occur, you may be ok.

Should I use Halfcomplex2Real or Complex2Complex

Good morning, I'm trying to perform a 2D FFT as 2 1-Dimensional FFT.
The problem setup is the following:
There's a matrix of complex numbers generated by an inverse FFT on an array of real numbers, lets call it arr[-nx..+nx][-nz..+nz].
Now, since the original array was made up of real numbers, I exploit the symmetry and reduce my array to be arr[0..nx][-nz..+nz].
My problem starts here, with arr[0..nx][-nz..nz] provided.
Now I should come back in the domain of real numbers.
The question is what kind of transformation I should use in the 2 directions?
In x I use the fftw_plan_r2r_1d( .., .., .., FFTW_HC2R, ..), called Half complex to Real transformation because in that direction I've exploited the symmetry, and that's ok I think.
But in z direction I can't figure out if I should use the same transformation or, the Complex to complex (C2C) transformation?
What is the correct once and why?
In case of needing here, at page 11, the HC2R transformation is briefly described
Thank you
"To easily retrieve a result comparable to that of fftw_plan_dft_r2c_2d(), you can chain a call to fftw_plan_dft_r2c_1d() and a call to the complex-to-complex dft fftw_plan_many_dft(). The arguments howmany and istride can easily be tuned to match the pattern of the output of fftw_plan_dft_r2c_1d(). Contrary to fftw_plan_dft_r2c_1d(), the r2r_1d(...FFTW_HR2C...) separates the real and complex component of each frequency. A second FFTW_HR2C can be applied and would be comparable to fftw_plan_dft_r2c_2d() but not exactly similar.
As quoted on the page 11 of the documentation that you judiciously linked,
'Half of these column transforms, however, are of imaginary parts, and should therefore be multiplied by I and combined with the r2hc transforms of the real columns to produce the 2d DFT amplitudes; ... Thus, ... we recommend using the ordinary r2c/c2r interface.'
Since you have an array of complex numbers, you can either use c2r transforms or unfold real/imaginary parts and try to use HC2R transforms. The former option seems the most practical.Which one might solve your issue?"
-#Francis

Sorting and making "genes" in output bitstrings from a genetic algorithm

I was wondering if anybody had suggestions as to how I could analyze an output bitstring that is being permuted by a genetic algorithm. In particular it would be nice if I could try to identify patterns of bits (I'm calling them genes here) that seem to yield a desirable cv score. The difficulty comes in trying to examine these datasets because there are a lot of them (I have probably already something like 30 million bitstrings that are 140 bits long and I'll probably hit over 100 million pretty quickly), so after I sort out the desirable data there is still ALOT of potential datasets and doing similarity comparisons by eye is out of the question. My questions are:
How should I compare for similarity between these bitstrings?
How can I identify "genes" in these bitstrings in an algorithmic (aka programmable) way?
As you want to extract common gene-patterns, what about looking at the intersection of the two strings. So if you have
set1 = 11011101110011...
set2 = 11001100000110...
# apply bitwise '=='
set1 && set2 == 11101110000010...
The result now shows what genes are the same, and could be used in further analysis.
For the similarity part you need to do an exclusive-or (XOR). The result of this bit-wise operation will give you the difference between two bit strings, and is probably the most efficient and easy way of doing it (for pair comparison). As an example:
>>> from bitarray import bitarray
>>> a = bitarray('0001100111')
>>> b = bitarray('0100110110')
>>> a ^ b
bitarray('0101010001')
Then you can either count the differences, inspect quickly where the differences lie, etc.
For the second part, it depends on the representation of course, and on the programming language (PL) chosen for the implementation. Most PL libraries will have a search function, that retrieves all or at least the first of the indexes where some pattern is found in a string (or bitstring, or bitstream...). You just have to refer to the documentation of your chosen PL to know more about the performance if you have more than one option for the task.

Solving the problem of finding parts which work well with each other

I have a database of items. They are for cars and similar parts (eg cam/pistons) work better than others in different combinations (eg one product will work well with another, while another combination of 2 parts may not).
There are so many possible permutations, what solutions apply to this problem?
So far, I feel that these are possible approaches (Where I have question marks, something tells me these are solutions but I am not 100% confident they are).
Neural networks (?)
Collection-based approach (selection of parts in a collection for cam, and likewise for pistons in another collection, all work well with each other)
Business rules engine (?)
What are good ways to tackle this sort of problem?
Thanks
The answer largely depends on how do you calculate 'works better'?
1) Independent values
Assuming that 'works better' function f of x combination of items x=(a,b,c,d,...) and(!) that there are no regularities that can be used to decide if f(x') is bigger or smaller then f(x) knowing only x, f(x) and x' (which could allow to find the xmax faster) you will have to calculate f for all combinations at least once.
Once you calculate it for all combinations you can sort. If you will need to look up data in a partitioned way, using SQL/RDBMS might be a good approach (for example, finding top 5 best solutions but without such and such part).
For extra points after calculating all of the results and storing them you could analyze them statistically and try to establish patterns
2) Dependent values
If you can establish some regularities (and maybe you can) regarding the values the search for the max value can be simplified and speeded up.
For example if you know that function that you try to maximize is linear combination of all the parameters then you could look into linear programming
If it is not...

Similarity between line strings

I have a number of tracks recorded by a GPS, which more formally can be described as a number of line strings.
Now, some of the recorded tracks might be recordings of the same route, but because of inaccurasies in the GPS system, the fact that the recordings were made on separate occasions and that they might have been recorded travelling at different speeds, they won't match up perfectly, but still look close enough when viewed on a map by a human to determine that it's actually the same route that has been recorded.
I want to find an algorithm that calculates the similarity between two line strings. I have come up with some home grown methods to do this, but would like to know if this is a problem that's already has good algorithms to solve it.
How would you calculate the similarity, given that similar means represents the same path on a map?
Edit: For those unsure of what I'm talking about, please look at this link for a definition of what a line string is: http://msdn.microsoft.com/en-us/library/bb895372.aspx - I'm not asking about character strings.
Compute the Fréchet distance on each pair of tracks. The distance can be used to gauge the similarity of your tracks.
Math alert: Fréchet was a pioneer in the field of metric space which is relevant to your problem.
I would add a buffer around the first line based on the estimated probable error, and then determine if the second line fits entirely within the buffer.
To determine "same route," create the minimal set of normalized path vectors, calculate the total power differences and compare the total to a quality measure.
Normalize the GPS waypoints on total path length,
walk the vectors of the paths together, creating a new set of path vectors for each path based upon the shortest vector at each waypoint,
calculate the total power differences between endpoints of each vector in the normalized paths weighting for vector length, and
compare against a quality measure.
Tune the power of the differences (start with, say, squared differences) and the quality measure (say as a percent of the total power differences) visually. This algorithm produces a continuous quality measure of the path match as well as a binary result (Are the paths the same?)
Paul Tomblin said: I would add a buffer
around the first line based on the
estimated probable error, and then
determine if the second line fits
entirely within the buffer.
You could modify the algorithm as the normalized vector endpoints are compared. You could determine if any endpoint difference was above a certain size (implementing Paul's buffer idea) or perhaps, if the endpoints were outside the "buffer," use that fact to ignore that endpoint difference, allowing a comparison ignoring side trips.
You could walk along each point (Pa) of LineString A and measure the distance from Pa to the nearest line-segment of LineString B, averaging each of these distances.
This is not a quick or perfect method, but should be able to give use a useful number and is pretty quick to implement.
Do the line strings start and finish at similar points, or are they of very different extents?
If you consider a single line string to be a sequence of [x,y] points (or [x,y,z] points), then you could compute the similarity between each pair of line strings using the Needleman-Wunsch algorithm. As described in the referenced Wikipedia article, the Needleman-Wunsch algorithm requires a "similarity matrix" which defines the distance between a pair of points. However, it would be easy to use a function instead of a matrix. In your case you could simply use the 2D Euclidean distance function (or a 3D Euclidean function if your points have elevation) to provide the distance between each pair of points.
I actually side with the person (Aaron F) who said that you might be interested in the Levenshtein distance problem (and cited this). His answer seems to me to be the best so far.
More specifically, Levenshtein distance (also called edit distance), does not measure strictly the character-by-character distance, but also allows you to perform insertions and deletions. The best algorithm for this distance measure can be computed in quadratic time (pretty slow if your strings are long), but the computational biologists have pretty good heuristics for this, that might be of interest to you on their own. Check out BLAST and FASTA.
In your problem, it seems that you are dealing with differences between strings of numbers, and you care about the numbers. If you give more information, I might be able to direct you to the right variant of BLAST/FASTA/etc for your purposes. In any case, you might consider adapting BLAST and FASTA for your needs. They're quite simple.
1: http://en.wikipedia.org/wiki/Levenshtein_distance, http://www.nist.gov/dads/HTML/Levenshtein.html

Resources