Implementing Geometric Median

Implementing Geometric Median - c

When I google for Geometric median, I got this link Geometric median
but I have no clue how to implement it in C . I am not very good at understanding this Mathematical Explanation. Lets Say I have 11 pair of co-ordinates how will I calculate the geometric median for the same.
I am trying to solve this problem Grid CIty. I was given a Hint that geometric median will help me achieve it. I am not looking for a final solution. If someone can guide me to a right path that would help.
Thanks is Advance
Below is the list of co-ordinates a (test case). result : 3 4
1 2
1 7
2 2
2 3
2 5
3 4
4 2
4 5
4 6
5 3
6 5

I don't think this is solvable without an iterative algorithm.
Here is a pseudocode solution similar to the hill-climbing version, except that it works to arbitrary accuracy, and in higher dimensions.
CurrentPoint = Mean(Points)
While (CurrentPoint - PreviousPoint) Length > 0.01 Do
For Each Point in Points Do
Vector = CurrentPoint - Point
Vector Length = Vector Length - 1.0
Point2 = Point + Vector
Add Point2 To Points2
Loop
PreviousPoint = CurrentPoint
CurrentPoint = Mean(Points2)
Loop
Notes:
The constant 0.01 does not guarantee the result to be within 0.01 of the true value. Use smaller values for better precision.
The constant 1.0 should be adjusted to (I'm guessing) about 1/5 the distance between the furthest points. Too small values will slow down the algorithm, but too large values will cause inaccuracies probably leading an to infinite loop.

To resolve this problem, you just have to compute the mean for each coordinate and round up the result.
It should resolve your problem.

You are not obliged to use the concept of Geometric median; so seeing that it is not easy to calculate, you better solve your problem without calculating it!
Here is an idea for an algorithm/implementation.
Start at any point (e.g. the first point in the given data).
Calculate the sum of distances for current point and the 8 neighboring points (+/-1 in each direction, x and y)
If one of the neighbors is better than current point, update the current point and start from 1
(Found the optimal distance; now choose the best point among those with equal distance)
Calculate the sum of distances for current point and the 3 neighboring points (-1 in each direction, x and y)
If one of the neighbors is the same as current point, update the current point and continue from 5

The answer is (xi, yj) where xi
is the median of all the x's and yj is the median of all the y's.

As I comment the solution to your problem is not the geometric mean, but the arithmetic mean.
If you have to calculate the arithmetic mean, you need to sum all the values of the column and divide the answer by the number of elements.

Related

How to efficiently evaluate or approximate a road Clothoid?

I'm facing the problem of computing values of a clothoid in C in real-time.
First I tried using the Matlab coder to obtain auto-generated C code for the quadgk-integrator for the Fresnel formulas. This essentially works great in my test scnearios. The only issue is that it runs incredibly slow (in Matlab as well as the auto-generated code).
Another option was interpolating a data-table of the unit clothoid connecting the sample points via straight lines (linear interpolation). I gave up after I found out that for only small changes in curvature (tiny steps along the clothoid) the results were obviously degrading to lines. What a surprise...
I know that circles may be plotted using a different formula but low changes in curvature are often encountered in real-world-scenarios and 30k sampling points in between the headings 0° and 360° didn't provide enough angular resolution for my problems.
Then I tried a Taylor approximation around the R = inf point hoping that there would be significant curvatures everywhere I wanted them to be. I soon realized I couldn't use more than 4 terms (power of 15) as the polynom otherwise quickly becomes unstable (probably due to numerical inaccuracies in double precision fp-computation). Thus obviously accuracy quickly degrades for large t values. And by "large t values" I'm talking about every point on the clothoid that represents a curve of more than 90° w.r.t. the zero curvature point.
For instance when evaluating a road that goes from R=150m to R=125m while making a 90° turn I'm way outside the region of valid approximation. Instead I'm in the range of 204.5° - 294.5° whereas my Taylor limit would be at around 90° of the unit clothoid.
I'm kinda done randomly trying out things now. I mean I could just try to spend time on the dozens of papers one finds on that topic. Or I could try to improve or combine some of the methods described above. Maybe there even exists an integrate function in Matlab that is compatible with the Coder and fast enough.
This problem is so fundamental it feels to me I shouldn't have that much trouble solving it. any suggetions?

about the 4 terms in Taylor series - you should be able to use much more. total theta of 2pi is certainly doable, with doubles.
you're probably calculating each term in isolation, according to the full formula, calculating full factorial and power values. that is the reason for losing precision extremely fast.
instead, calculate the terms progressively, the next one from the previous one. Find the formula for the ratio of the next term over the previous one in the series, and use it.
For increased precision, do not calculate in theta by rather in the distance, s (to not lose the precision on scaling).
your example is an extremely flat clothoid. if I made no mistake, it goes from (25/22) pi =~ 204.545° to (36/22) pi =~ 294.545° (why not include these details in your question?). Nevertheless it should be OK. Even 2 pi = 360°, the full circle (and twice that), should pose no problem.
given: r = 150 -> 125, 90 degrees turn :
r s = A^2 = 150 s = 125 (s+x)
=> 1+(x/s) = 150/125 = 1 + 25/125 x/s = 1/5
theta = s^2/2A^2 = s^2 / (300 s) = s / 300 ; = (pi/2) * (25/11) = 204.545°
theta2 = (s+x)^2/(300 s) = (6/5)^2 s / 300 ; = (pi/2) * (36/11) = 294.545°
theta2 - theta = ( 36/25 - 1 ) s / 300 == pi/2
=> s = 300 * (pi/2) * (25/11) = 1070.99749554 x = s/5 = 214.1994991
A^2 = 150 s = 150 * 300 * (pi/2) * (25/11)
a = sqrt (2 A^2) = 300 sqrt ( (pi/2) * (25/11) ) = 566.83264608
The reference point is at r = Infinity, where theta = 0.
we have x = a INT[u=0..(s/a)] cos(u^2) d(u) where a = sqrt(2 r s) and theta = (s/a)^2. write out the Taylor series for cos, and integrate it, term-by-term, to get your Taylor approximation for x as function of distance, s, along the curve, from the 0-point. that's all.
next you have to decide with what density to calculate your points along the clothoid. you can find it from a desired tolerance value above the chord, for your minimal radius of 125. these points will thus define the approximation of the curve by line segments, drawn between the consecutive points.

I am doing my thesis in the same area right now.
My approach is the following.
at each point on your clothoid, calculate the following (change in heading / distance traveled along your clothoid), by this formula you can calculate the curvature at each point by this simple equation.
you are going to plot each curvature value, your x-axis will be the distance along the clothoid, the y axis will be the curvature. By plotting this and applying very easy linear regression algorithm (search for Peuker algorithm implementation in your language of choice)
you can easily identify where are the curve sections with value of zero (Line has no curvature), or linearly increasing or decreasing (Euler spiral CCW/CW), or constant value != 0 (arc has constant curvature across all points on it).
I hope this will help you a little bit.
You can find my code on github. I implemented some algorithms for such problems like Peuker Algorithm.

What is the advantage of linspace over the colon ":" operator?

Is there some advantage of writing
t = linspace(0,20,21)
over
t = 0:1:20
?
I understand the former produces a vector, as the first does.
Can anyone state me some situation where linspace is useful over t = 0:1:20?

It's not just the usability. Though the documentation says:
The linspace function generates linearly spaced vectors. It is
similar to the colon operator :, but gives direct control over the
number of points.
it is the same, the main difference and advantage of linspace is that it generates a vector of integers with the desired length (or default 100) and scales it afterwards to the desired range. The : colon creates the vector directly by increments.
Imagine you need to define bin edges for a histogram. And especially you need the certain bin edge 0.35 to be exactly on it's right place:
edges = [0.05:0.10:.55];
X = edges == 0.35
edges = 0.0500 0.1500 0.2500 0.3500 0.4500 0.5500
X = 0 0 0 0 0 0
does not define the right bin edge, but:
edges = linspace(0.05,0.55,6); %// 6 = (0.55-0.05)/0.1+1
X = edges == 0.35
edges = 0.0500 0.1500 0.2500 0.3500 0.4500 0.5500
X = 0 0 0 1 0 0
does.
Well, it's basically a floating point issue. Which can be avoided by linspace, as a single division of an integer is not that delicate, like the cumulative sum of floting point numbers. But as Mark Dickinson pointed out in the comments:
You shouldn't rely on any of the computed values being exactly what you expect. That is not what linspace is for. In my opinion it's a matter of how likely you will get floating point issues and how much you can reduce the probabilty for them or how small can you set the tolerances. Using linspace can reduce the probability of occurance of these issues, it's not a security.
That's the code of linspace:
n1 = n-1
c = (d2 - d1).*(n1-1) % opposite signs may cause overflow
if isinf(c)
y = d1 + (d2/n1).*(0:n1) - (d1/n1).*(0:n1)
else
y = d1 + (0:n1).*(d2 - d1)/n1
end
To sum up: linspace and colon are reliable at doing different tasks. linspace tries to ensure (as the name suggests) linear spacing, whereas colon tries to ensure symmetry
In your special case, as you create a vector of integers, there is no advantage of linspace (apart from usability), but when it comes to floating point delicate tasks, there may is.
The answer of Sam Roberts provides some additional information and clarifies further things, including some statements of MathWorks regarding the colon operator.

linspace and the colon operator do different things.
linspace creates a vector of integers of the specified length, and then scales it down to the specified interval with a division. In this way it ensures that the output vector is as linearly spaced as possible.
The colon operator adds increments to the starting point, and subtracts decrements from the end point to reach a middle point. In this way, it ensures that the output vector is as symmetric as possible.
The two methods thus have different aims, and will often give very slightly different answers, e.g.
>> a = 0:pi/1000:10*pi;
>> b = linspace(0,10*pi,10001);
>> all(a==b)
ans =
0
>> max(a-b)
ans =
3.5527e-15
In practice, however, the differences will often have little impact unless you are interested in tiny numerical details. I find linspace more convenient when the number of gaps is easy to express, whereas I find the colon operator more convenient when the increment is easy to express.
See this MathWorks technical note for more detail on the algorithm behind the colon operator. For more detail on linspace, you can just type edit linspace to see exactly what it does.

linspace is useful where you know the number of elements you want rather than the size of the "step" between them. So if I said make a vector with 360 elements between 0 and 2*pi as a contrived example it's either going to be
linspace(0, 2*pi, 360)
or if you just had the colon operator you would have to manually calculate the step size:
0:(2*pi - 0)/(360-1):2*pi
linspace is just more convenient
For a simple real world application, see this answer where linspace is helpful in creating a custom colour map

How to calculate distance between 2 points in a 2D matrix

I am both new to this website and new to C. I need a program to find the average 'jumps' it takes from all points.
The idea is this: Find "jump" distance from 1 to 2, 1 to 3, 1 to 4 ... 1 to 9, or find 2 to 1, 2 to 3, 2 to 4 2 to 5 etc.
Doing them on the first row is simple, just (2-1) or (3-1) and you get the correct number. But if I want to find the distance between 1 and 4 or 1 to 8 then I have absolutely no idea.
The dimensions of the matrix should potentially be changeable. But I just want help with a 3x3 matrix.
Anyone could show me how to find it?
Jump means vertical or horizontal move from one point to another. from 1 to 2 = 1, from 1 to 9 = 4 (shortest path only)

The definition of "distance" on this kind of problems is always tricky.
Imagine that the points are marks on a field, and you can freely walk all over it. Then, you could take any path from one point to the other. The shortest route then would be a straight line; its length would be the length of the vector that joins the points, which happens to be the difference vector among two points' positions. This length can be computed with the help of Pythagora's theorem: dist = sqrt((x2-x1)^2 + (y2-y1)^2). This is known as the Euclidian distance between the points.
Now imagine that you are in a city, and each point is a building. You can't walk over a building, so the only options are to go either up/down or left/right. Then, the shortest distance is given by the sum of the components of the difference vector; which is the mathematical way of saying that "go down 2 blocks and then one block to the left" means walking 3 blocks' distance: dist = abs(x2-x1) + abs(y2-y1). This is known as the Manhattan distance between the points.
In your problem, however, it looks like the only possible move is to jump to an adjacent point, in a single step, diagonals allowed. Then the problem gets a bit trickier, because the path is very irregular. You need some Graph Theory here, very useful when modeling problems with linked elements, or "nodes". Each point would be a node, connected to their neighbors, and the problem would be to find the shortest path to another given point. If jumps had different weights (for instance, is jumping in diagonal was harder), an easy way to solve this is would be with the Dijkstra's Algorithm; more details on implementation at Wikipedia.
If the cost is always the same, then the problem is reduced to counting the number of jumps in a Breadth-First Search of the destination point from the source.

Let's define the 'jump' distance : "the number of hops required to reach from Point A [Ax,Ay] to Point B [Bx,By]."
Now there can be two ways in which the hops are allowed :
Horizontally/VerticallyIn this case, you can go up/down or left/right. As you have to travel X axis and Y axis independently, your ans is:jumpDistance = abs(Bx - Ax) + abs(By - Ay);
Horizontally/Vertically and also Diagonally
In this case, you can go up/down or left/right and diagonally as well. How it differs from Case 1 is that now you have the ability to change your X axis and Y axis together at the cost of only one jump . Your answer now is:jumpDistance = Max(abs(Bx - Ax),abs(By - Ay));

What is the definition of "jump-distance" ?
If you mean how many jumps a man needs to go from square M to N, if he can only jumps vertically and horizontally, one possibility can:
dist = abs(x2 - x1) + abs(y2 - y1);
For example jump-distance between 1 and 9 is: |3-1|+|3-1| = 4

There are two ways to calculate jump distance.
1) when only horizontal and vertical movements are allowed, in that case all you need to do is form a rectangle in between the two points and calculate the length of two adjacent side. Like if you want to move from 1 to 9 then first move from 1 to 3 and then move from 3 to 9. (Convert it to code)
2) when movements in all eight directions are allowed, things get tricky. Like if you want to move from 1 to 6 suppose. What you'll need to do is you'll have to more from 1 to 5. And then from 5 to 6. The way of doing it in code is to find the maximum in between the difference in x and y coordinates. In this example, in x coordinate, difference is 2 (3-1) and in y coordinate, difference is 1 (2-1). So the maximum of this is 2. So here's the answer. (Convert to code)

I need to translate 3d points relative to a triangle as if the triangle was somewhere else

I posted this on twitter a while ago but seeing how none of my followers appears to be a math/programming genius, I'll try my luck here as well. I got here because I found this which might contain part of my solution.
I described my problem in the following pdf document, containing a picture of what I'm trying to achieve.
To give some more details, I divided the pentagon's of a dodecahedron (12 pentagons) into triangles (5/pentagon, 60 triangles in total), then collected a set of data points relative to each of these triangles.
The idea is to generate terrain meshes for each individual triangle.
To do so, the data must be represented flat, in a 32K x 32K square (idTech4 Megatexture)
I have vaguely heard of transformation matrices, which when set up properly, could do the trick of passing all the data points trough them to have them show up in the right place.
I looked at this source code here but I don't understand how I'm supposed to get the points in and/or out of there, not to mention how to do the setup so I can present each point in turn and get the result point back.
I got as fas as identifying the point that belongs in the back right corner. All my 3D points are originally stored in latitude / longitude pairs. I retrieve the 3D vectors this way:
coord getcoord(point* p)
{
coord c;
c.x=cos(p->lat*pi/180.l) * cos(p->lon*pi/180.l);
c.y=cos(p->lat*pi/180.l) * sin(p->lon*pi/180.l);
c.z=sin(p->lat*pi/180.l);
return c;
};
My thought is that if I can find the center of my triangle, and discover how to offset my angles so the vector from the center of my sphere to the middle of the triangle moves to 90N then my points would already be in the right plane if I rotated them all along the same angles. If I then convert them all to 3d and subtracti the radius from y, they'll be at the correct y position as well.
Then all I'd need to do is the rotation, the scaling, and the moving to the final position.
There are several kinds of 'centers' for a triangle, I think the one I need is the one that is equidistant to the corners of the triangle (Circumcenter?)
But then there might be an easier approach to the whole problem so while I continue my own research, perhaps some of you can help pointing me in the right direction.
It appears as if some sample data is in order, here are a few of these triangles in obj file format:
v 0.000000 0.000000 3396.000000
v 2061.582356 0.000000 2698.646733
v 637.063983 1960.681333 2698.646733
f 1 2 3
And another:
v -938.631230 2888.810129 1518.737455
v 637.063983 1960.681333 2698.646733
v 1030.791271 3172.449325 637.064076
f 1 2 3
You will notice that each point is at a distance of 3396 from 0,0,0
I mentioned 'on the sphere' meaning that the face away from the center of the sphere is the face that needs to become the 'top' when translated into the square.
Theoretically all these triangles should in fact have identical sizes, but due to rounding errors in the math that generated them, this might not be entirely true.
If I'm not mistaken I already took measures to ensure that the first point you see here is always the one opposite the longest border, so it's the one that should go in the far left corner (testing the above 2 samples confirms this, but I'm measuring anyway just to be sure)
Both legs leading away from this point should theoretically have the same length as well, but again rounding errors might slightly offset that.
If I've done it correctly then the longer side is 1,113587 times longer than the 2 shorter sides. Assuming those are identical, then doing some goal seeking in excel, I can deduct that the final points, assuming I was just translating this triangle, should look like:
v 16384.000000 0.000000 16384.000000
v -16384.000000 0.000000 9916.165306
v 9916.165306 0.000000 -16384.000000
f 1 2 3
So I need to setup the matrix to do this transformation, preferably using the 4x4 matrix as explained below.

I would recommend using transform matrices. The 3d transform matrix is a 4x4 data structure which describes a translation and rotation (and possibly a scale). Once you have a matrix you can transform a point like so
result.x = (tmp->pt.x * m->element[0][0]) +
(tmp->pt.y * m->element[1][0]) +
(tmp->pt.z * m->element[2][0]) +
m->element[3][0];
result.y = (tmp->pt.x * m->element[0][1]) +
(tmp->pt.y * m->element[1][1]) +
(tmp->pt.z * m->element[2][1]) +
m->element[3][1];
result.z = (tmp->pt.x * m->element[0][2]) +
(tmp->pt.y * m->element[1][2]) +
(tmp->pt.z * m->element[2][2]) +
m->element[3][2];
int w = (tmp->pt.x * m->element[0][3]) + (tmp->pt.y * m->element[1][3])
+ (tmp->pt.z * m->element[2][3]) + m->element[3][3];
if (w!=0 || w!=1)
result.x/=w; result.y/=w; result.z/=w;
This will transform the 3D point pt by the matrix m. If you now a little matrix math you'll see i'm just multiplying my origin point as a vector against the matrix (and doing a little normalization if it is a skew matrix.) Matrices can be multiplied together to form complicated transformations so they are very useful.
For details on making matrices suggest reading this link.
http://en.wikipedia.org/wiki/Transformation_matrix

Dynamic Programming Problem.. Array Partitioning..

The question says,
That given an array of size n, we have to output/partition the array into subsets which sum to N.
For E,g,
I/p arr{2,4,5,7}, n=4, N(sum) = 7(given)
O/p = {2,5}, {7}
I saw similar kind of problem/explanation in the url Dynamic Programming3
And I have the following queries in the pdf:-
How could we find the subsets which sum to N, as the logic only tells whether the subset exist or not?
Also, if we change the question a bit, can we find two subsets which has equal average using the same ideology?
Can anybody thrown some light on this Dynamic Programming problem.. :)
Thanks in Advance..

You can try to process recursively:
Given a SORTED array X={x1 ... xn} xi !=0 and an intger N.
First find all the possibilities "made" with just one element:
here if N=xp, eliminate all xi s.t i>=p
second find all the possibilities made with 2 elements:
{ (x1,x2) .... (xp-2,xp-1)}
Sort by sum and elminate all the sums >=N
and you had the rules: xi cannot go with xj when xi+xj >= N
Third with 3 elments:
You create all the part that respect the above rule.
And idem step 2
etc...
Example:
X={1,2,4,7,9,10} N=9
step one:
{9}
X'={1,2,4,7,9}
step 2: cannot chose 9 and 10
X={(1,2) (1,4) (2,4) (1,7) (2,7) (4,7)}
{2,7}
X'={(1,2) (1,4) (2,4) (1,7)}
step 3: 4 and 2 cannot go with 7:
X={(1,2,4)}
no sol
{9} {2,7} are the only solutions
This diminishes the total number of comparaison (that would be 2^n = 2^6=64) you only did : 12 comparaisons
hope it helps

Unfortunately, this is a very difficult problem. Even determining if there exists a single subset summing to your target value is NP-Complete.
If the problem is more restricted, you might be able to find a good algorithm. For example:
Do the subsets have to be contiguous?
Can you ignore subsets with more than K values?
Are the array values guaranteed to be positive?
Are the array values guaranteed to be distinct? What about differing from the other values by at least some constant factor?
Is there some bound on the difference between the smallest and largest value?

The proposed algorithm stores only a single bit of information in the temporary array T[N], namely whether it's reachable at all. Obviously, you can store more information at each index [N], such as the values C[i] used to get there. (It's a variation of the "Dealing with Unlimited Copies" chapter in the PDF)

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight