Algorithm for best fit rectangle

Algorithm for best fit rectangle - c

I'm looking for an algorithm to do a best fit of an arbitrary rectangle to an unordered set of points. Specifically, I'm looking for a rectangle where the sum of the distances of the points to any one of the rectangle edges is minimised. I've found plenty of best fit line, circle and ellipse algorithms, but none for a rectangle. Ideally, I'd like something in C, C++ or Java, but not really that fussy on the language.
The input data will typically be comprised of most points lying on or close to the rectangle, with a few outliers. The distribution of data will be uneven, and unlikely to include all four corners.

Here are some ideas that might help you.
We can estimate if a point is on an edge or on a corner as follows:
Collect the point's n neares neighbours
Calculate the points' centroid
Calculate the points' covariance matrix as follows:
Start with Covariance = ((0, 0), (0, 0))
For each point calculate d = point - centroid
Covariance += outer_product(d, d)
Calculate the covariance's eigenvalues. (e.g. with SVD)
Classify point:
if one eigenvalue is large and the other very small, we are probably on an edge
otherwise we should be on a corner
Extract all corner points and do a segmentation. Choose the four segments with most entries. The centroid of those segments are candidates for the rectangle's corners.
Calculate the normalized direction vectors of two opposite sides and calculate their mean. Calculate the mean of the other two opposite sides. These are the direction vectors of a parallelogram. If you want a rectangle, calculate a perpendicular vector to one of those directions and calculate the mean with the other direction vector. Then the rectangle's direction's are the mean vector and a perpendicular vector.
In order to calculate the corners, you can project the candidates on their directions and move them so that they form the corners of a rectangle.

The idea of a line of best fit is to compute the vertical distances between your points and the line y=ax+b. Then you can use calculus to find the values of a and b that minimize the sum of the squares of the distances. The reason squaring is chosen over absolute value is because the former is differentiable at 0.
If you were to try the same approach with a rectangle, you would run into the problem that the square of the distance to the side of a rectangle is a piecewise defined function with 8 different pieces and is not differentiable when the pieces meet up inside the rectangle.
In order to proceed, you'll need to decide on a function that measures how far a point is from a rectangle that is everywhere differentiable.

Here's a general idea. Make a grid with smallish cells; calculate best fit line for each not-too-empty cell (the calculation is immediate1, there's no search involved). Join adjacent cells while making sure the standard deviation is improving/not worsening much. Thus we detect the four sides and the four corners, and divide our points into four groups, each belonging to one of the four sides.
Next, we throw away the corner cells, put the true rectangle in place of the four approximate
lines and do a bit of hill climbing (or whatever). The calculation of best fit line may be augmented for this case, since the two lines are parallel, and we've already separated our points into the four groups (for a given rectangle, we know the delta-y between the two opposing sides (taking horizontal-ish sides for a moment), so we just add this delta-y to the ys of the lower group of points and make the calculation).
The initial rectangular grid may be replaced with working by stripes (say, vertical). Then, at least half of the stripes will have two pronounced groupings of points (find them by dividing each stripe by horizontal division lines into cells).
1For a line Y = a*X+b, minimize the sum of squares of perpendicular distances of data points {xi,yi} to that line. This is directly solvable for a and b. For more vertical lines, flip the Xs and the Ys.
P.S. I interpret the problem as minimizing the sum of squares of perpendicular distances of each point to its nearest side of the rectangle, not to all the rectangle's sides.

I am not completely sure, but You might play around first 2 (3?) dimensions over the PCA from your points. it will work reasonably fast for the most cases.

Related

Efficient way of calculating minimum distance between point and multiple faces

I have multiple faces in 3D space creating cells. All these faces lie within a predefined cube (e.g. of size 100x100x100).
Every face is convex and defined by a set of corner points and a normal vector. Every cell is convex. The cells are result of 3d voronoi tessellation, and I know the initial seed points of the cells.
Now for every integer coordinate I want the smallest distance to any face.
My current solution uses this answer https://math.stackexchange.com/questions/544946/determine-if-projection-of-3d-point-onto-plane-is-within-a-triangle/544947 and calculates for every point for every face for every possible triple of this faces points the projection of the point to the triangle created by the triple, checks if the projection is inside the triangle. If this is the case I return the distance between projection and original point. If not I calculate the distance from the point to every possible line segment defined by two points of a face. Then I choose the smallest distance. I repeat this for every point.
This is quite slow and clumsy. I would much rather calculate all points that lie on (or almost lie on) a face and then with these calculate the smallest distance to all neighbour points and repeat this.
I have found this Get all points within a Triangle but am not sure how to apply it to 3D space.
Are there any techniques or algorithms to do this efficiently?

Since we're working with a Voronoi tessellation, we can simplify the current algorithm. Given a grid point p, it belongs to the cell of some site q. Take the minimum over each neighboring site r of the distance from p to the plane that is the perpendicular bisector of qr. We don't need to worry whether the closest point s on the plane belongs to the face between q and r; if not, the segment ps intersects some other face of the cell, which is necessarily closer.
Actually it doesn't even matter if we loop r over some sites that are not neighbors. So if you don't have access to a point location subroutine, or it's slow, we can use a fast nearest neighbors algorithm. Given the grid point p, we know that q is the closest site. Find the second closest site r and compute the distance d(p, bisector(qr)) as above. Now we can prune the sites that are too far away from q (for every other site s, we have d(p, bisector(qs)) ≥ d(q, s)/2 − d(p, q), so we can prune s unless d(q, s) ≤ 2 (d(p, bisector(qr)) + d(p, q))) and keep going until we have either considered or pruned every other site. To do pruning in the best possible way requires access to the guts of the nearest neighbor algorithm; I know that it slots right into the best-first depth-first search of a kd-tree or a cover tree.

Robustly finding the local maximum of an image patch with sub-pixel accuracy

I am developing a SLAM algorithm in C, and I have implemented the FAST corner finding method which gives me some strong keypoints in the image. The next step is to get the center of the keypoints with a sub-pixel accuracy, therefore I extract a 3x3 patch around each of them, and do a Least Squares fit of a two dimensional quadratic:
Where f(x,y) is the corner saliency measure of each pixel, similar to the FAST score proposed on the original paper, but modified to also provide a saliency measure in non corner pixels.
And the least squares:
With being the estimated parameters.
I can now calculate the location of the peak of the fitted quadratic, by taking the gradient equal to zero, achieving my original goal.
The issue arises on some corner cases, where the local peak is closer to the edge of the window, resulting in a fit with low residuals but a peak of the quadratic way outside the window.
An example:
The corner saliency and a contour of the fitted quadratic:
The saliency (blue) and fit (red) as 3D meshes:
Numeric values of this example are (row-major ordering):
[336, 522, 483, 423, 539, 153, 221, 412, 234]
And the resulting sub pixel center of (2.6, -17.1) being wrong.
How can I constrain the fit so the center is within the window?
I'm open to alternative methods for finding the sub pixel peak.

The obvious answer is to reject 3x3 (or 5x5, whatever you use) boxes whose discrete maximum is not at the center. In other words, to use a quadratic approximation only to refine the location of a maximum that must be located inside the box.
More generally, in such cases the first questions to ask is not "How do I constrain my model-fitting procedure to shoehorn a solution for this edge case?", but rather
"Does my model apply to this edge case?" and "Is this edge case even worth spending time on, or can I just ignore it?"

I tried my own code to fit a 2D quadratic function to the 3x3 values, using a stable least-squares solving algorithm, and also found a maximum outside of the domain. The 3x3 patch of data does not match a quadratic function, and therefore the fit is not useful.
Fitting a 2D quadratic to a 3x3 neighborhood requires a degree of smoothness in the data that you don't seem to have in your FAST output.
There are many other methods to find the sub-pixel location of the maximum. One that I like because it is more stable and less computationally intensive is the fitting of a "separable" quadratic function. In short, you fit a quadratic function to the three values around the local maximum in one dimension, and then another in the other dimension. Instead of solving 6 parameters with 9 values, this solves 3 parameters with 3 values, twice. The solution is guaranteed stable, as long as the center pixel is larger or equal to all pixels in the 4-connected neighborhood.
z1 = [f(-1,0), f(0,0), f(1,0)]^T
[1,-1,0]
X = [0,0,0]
[1,1,0]
solve: X b1 = z1
and
z2 = [f(0,-1), f(0,0), f(0,1)]^T
[1,-1,0]
X = [0,0,0]
[1,1,0]
solve: X b2 = z2
Now you get the x-coordinate of the centroid from b1 and the y-coordinate from b2.

Connect points to plane/Draw Polygon

I'm currently working on a project where I want to draw different mathematical objects onto a 3D cube. It works as it should for Points and Lines given as a vector equation. Now I have a plane given as a parametric equation. This plane can be somewhere in the 3D space and may be visible on the screen, which is this 3D cube. The cube acts as an AABB.
First thing I needed to know was whether the plane intersects with the cube. To do this I made lines who are identical to the edges of this cube and then doing 12 line/plane intersections, calculating whether the line is hit inside the line segment(edge) which is part of the AABB. Doing this I will get a set of Points defining the visible part of the plane in the cube which I have to draw.
I now have up to 6 points A, B, C, D, E and F defining the polygon ABCDEF I would like to draw. To do this I want to split the polygon into triangles for example: ABC, ACD, ADE, AED. I would draw this triangles like described here. The problem I am currently facing is, that I (believe I) need to order the points to get correct triangles and then a correctly drawn polygon. I found out about convex hulls and found QuickHull which works in three dimensional space. There is just one problem with this algorithm: At the beginning I need to create a three dimensional simplex to have a starting point for the algorithm. But as all my points are in the same plane they simply form a two dimensional plane. Thus I think this algorithm won't work.
My question is now: How do I order these 3D points resulting in a polygon that should be a 2D convex hull of these points? And if this is a limitation: I need to do this in C.
Thanks for your help!

One approach is to express the coordinates of the intersection points in the space of the plane, which is 2D, instead of the global 3D space. Depending on how exactly you computed these points, you may already have these (say (U, V)) coordinates. If not, compute two orthonormal vectors that belong to the plane and take the dot products with the (X, Y, Z) intersections. Then you can find the convex hull in 2D.
The 8 corners of the cube can be on either side of the plane, and have a + or - sign when the coordinates are plugged in the implicit equation of the plane (actually the W coordinate of the vertices). This forms a maximum of 2^8=256 configurations (of which not all are possible).
For efficiency, you can solve all these configurations once for all, and for every case list the intersections that form the polygon in the correct order. Then for a given case, compute the 8 sign bits, pack them in a byte and lookup the table of polygons.
Update: direct face construction.
Alternatively, you can proceed by tracking the intersection points from edge to edge.
Start from an edge of the cube known to traverse the plane. This edge belongs to two faces. Choose one arbitrarily. Then the plane cuts this face in a triangle and a pentagon, or two quadrilaterals. Go to the other the intersection with an edge of the face. Take the other face bordered by this new edge. This face is cut in a triangle and a pentagon...
Continuing this process, you will traverse a set of faces and corresponding segments that define the section polygon.
In the figure, you start from the intersection on edge HD, belonging to face DCGH. Then move to the edge GC, also in face CGFB. From there, move to edge FG, also in face EFGH. Move to edge EH, also in face ADHE. And you are back on edge HD.
Complete discussion must take into account the case of the plane through one or more vertices of the cube. (But you can cheat by slightly translating the plane, constructing the intersection polygon and removing the tiny edges that may have been artificially created this way.)

Check that smaller cubes fill bigger cube

Given one large cube (axis aligned and on integer coordinates), and many smaller cubes (also axis aligned and on integer coordinates). How can we check that the large cube is perfectly filled by the smaller cubes.
Currently we check that:
For each small cube it is fully contained by the large cube.
That it doesn't intersect any other small cube.
The sum of the volumes of the small cubes equals the volume of the large cube.
This is ok for small numbers of cubes but we need to support this test of cubes with dimensions greater than 2^32. Even at 2^16 the number of small cubes required to fill the large cube is large enough that step 2 takes a while (O(n^2) checking each cube intersects no other).
Is there a better algorithm?
EDIT:
There seems to be some confusion over this. I am not trying to split a cube into smaller cubes. That's already done. Part of our program splits large OpenCL ranges (axis aligned cubes on integer coordinates) into lots of smaller ranges that fit into a hardware job.
What I'm doing is hooking into this system and checking that the jobs it produces correctly cover the large initial range. My algorithm above works, but it's slow and given the amount of tests we have to run I'd like to keep these tests as fast as possible.

We are talking about 3D right?
For 2D one can do a similar (but simpler) process (with, I believe, an O(n log n) running time algorithm).
The basic idea of the below is the sweep-line algorithm.
Note that rectangle intersection can done by checking whether any corner of any cube is contained in any other cube.
You can improve on (2) as follows:
Split each cube into 2 rectangles on the y-z plane (so you'd have 2 rectangles defined by the same set of 4 (y,z) coordinates, but the x coordinates will be different between the rectangles).
Define the rectangle with the smaller x-coordinate as the start of a cube and the other rectangle as the end of a cube.
Sort the rectangles by x-coordinate
Have an initially empty interval tree
(each interval should also store a reference to the rectangle to which it belongs)
For each rectangle:
Look up the y-coordinate of each point of the rectangle in the interval tree.
For each matching interval, look up its rectangle and check whether the point is also contained within the z-coordinates (this is all that's required because the tree only contains x-coordinates in the correct range and we check the y-coordinates by doing the interval lookup).
If it is, we have overlap.
If the rectangle is the start of a cube, insert the 2 y-coordinates of the rectangle as an interval into the interval tree.
Otherwise, remove the interval defined by the 2 y-coordinates from the tree.
The running time is between O(n) (best case) and O(n2) (worst case), depending on how much overlap there is in the x- and y-coordinates (more overlap is worse).

order your insert cubes
insert the biggest insert cube in one of the corners of your cube and split up the remaining cube into subcubes
insert the second biggest insert cube in the first of the sub cubes that will fit and add the remaining subcubes of this subcube to the set of subcubes
etc.

Another go, again only addressing step 2 in the original question:
Define a space-filling curve with good spatial locality, such as a 3D Hilbert Curve.
For each cube calculate the pair of coordinates on the curve for the points at which the curve both enters and leaves the cube. The space-filling curve will enter and leave some cubes more than once, calculate more than one pair of coordinates for these cases.
You've now got I don't know how many pairs of coordinates, but I'd guess no more than 2^18. These coordinates define intervals along the space-filling curve, so sort them and look for overlaps.
Time complexity is probably dominated by the sort, space complexity is probably quite big.

Given centers, find minimum radius for set of circles such that they fully cover another

I have the following geometry problem:
You are given a circle with the center in origin - C(0, 0), and radius 1. Inside the circle are given N points which represent the centers of N different circles. You are asked to find the minimum radius of the small circles (the radius of all the circles are equal) in order to cover all the boundary of the large circle.
The number of circles is: 3 ≤ N ≤ 10000 and the problem has to be solved with a precision of P decimals where 1 ≤ P ≤ 6.
For example:
N = 3 and P = 4
and the coordinates:
(0.193, 0.722)
(-0.158, -0.438)
(-0.068, 0.00)
The radius of the small circles is: 1.0686.
I have the following idea but my problem is implementing it. The idea consists of a binary search to find the radius and for each value given by the binary search to try and find all the intersection point between the small circles and the large one. Each intersection will have as result an arc. The next step is to 'project' the coordinates of the arcs on to the X axis and Y axis, the result being a number of intervals. If the reunions of the intervals from the X and the Y axis have as result the interval [-1, 1] on each axis, it means that all the circle is covered.
In order to avoid precision problems I thought of searching between 0 and 2×10P, and also taking the radius as 10P, thus eliminating the figures after the comma, but my problem is figuring out how to simulate the intersection of the circles and afterwards how to see if the reunion of the resulting intervals form the interval [-1, 1].
Any suggestions are welcomed!

Each point in your set has to cover the the intersection of its cell in the point-set's voronoi diagram and the test-circle around the origin.
To find the radius, start by computing the voronoi diagram of your point set. Now "close" this voronoi diagram by intersecting all infinite edges with your target-circle. Then for each point in your set, check the distance to all the points of its "closed" voronoi cell. The maximum should be your solution.
It shouldn't matter that the cells get closed by an arc instead of a straight line by the test-circle until your solution radius gets greater than 1 (because then the "small" circles will arc stronger). In that case, you also have to check the furthest point from the cell center to that arc.

I might be missing something, but it seems that you only need to find the maximal minimal distance between a point in the circle and the given points.
That is, if you consider the set of all points on the circle, and take the minimal distance between each point to one of the given points, and then take the maximal values of all these - you have found your radius.
This is, of course, not an algorithm, as there are uncountably many points.
I think what I'll do would be along the line of:
Find the minimal distance between the circumference and the set of points, this is your initial radius R.
Check if the entire circle was covered, like so:
For any two points whose distance from each other is more than 2R, check if the entire segment was covered (for each point, check if the circle around it intersects, and if so, remove that segment and keep going). That should take about o(N^3) (you iterate over all of the points for each pair of points). If I'm correct (though I didn't formally prove it) the circle is covered iff all of the segments are covered.
Of all the segment which weren't covered, take the long one, and add half it's length to R.
Repeat.
This algorithm will never cover the circle per se, but it's easy to prove that it exponentially converges to a full cover, so it should be able to find the needed radius with arbitrary accuracy within a reasonable amount of iterations.
Hope that helps.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight