I am developing a SLAM algorithm in C, and I have implemented the FAST corner detection method, which gives me some strong keypoints in the image. The next step is to get the center of each keypoint with sub-pixel accuracy, so I extract a 3x3 patch around it and do a least squares fit of a two-dimensional quadratic:

f(x,y) = a + b*x + c*y + d*x^2 + e*x*y + g*y^2

where f(x,y) is the corner saliency measure of each pixel, similar to the FAST score proposed in the original paper, but modified to also provide a saliency measure for non-corner pixels.
And the least squares solution:

beta_hat = (X^T X)^(-1) X^T z

with beta_hat = [a, b, c, d, e, g]^T being the estimated parameters (X is the 9x6 design matrix built from the patch coordinates and z the 9 saliency values).
I can now calculate the location of the peak of the fitted quadratic by setting its gradient to zero, which achieves my original goal.
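(For reference, here is a minimal C sketch of that fit, assuming the patch is stored row-major with coordinates x, y in {-1, 0, 1}; for this fixed 3x3 design the normal equations reduce to closed-form coefficient formulas, and the function names are only illustrative.)

#include <stdio.h>

/* Fit f(x,y) ~ a + b*x + c*y + d*x^2 + e*x*y + g*y^2 to a 3x3 patch
 * (row-major, x = col-1, y = row-1) and locate the stationary point by
 * setting the gradient to zero. The constant term a is not needed for
 * the peak location, so it is not computed.
 * Returns 0 on success, -1 if the quadratic has no unique stationary point. */
static int fit_quadratic_peak(const double p[9], double *px, double *py)
{
    double s0 = 0, sx = 0, sy = 0, sxx = 0, sxy = 0, syy = 0;
    for (int row = 0; row < 3; ++row) {
        for (int col = 0; col < 3; ++col) {
            double x = col - 1, y = row - 1, z = p[row * 3 + col];
            s0  += z;
            sx  += x * z;       sy  += y * z;
            sxx += x * x * z;   syy += y * y * z;
            sxy += x * y * z;
        }
    }
    /* Least-squares coefficients for this fixed design (normal equations
     * solved symbolically for x, y in {-1, 0, 1}). */
    double b = sx / 6.0, c = sy / 6.0, e = sxy / 4.0;
    double d = sxx / 2.0 - s0 / 3.0;
    double g = syy / 2.0 - s0 / 3.0;
    /* Gradient = 0:  2d*x + e*y = -b,  e*x + 2g*y = -c. */
    double det = 4.0 * d * g - e * e;
    if (det == 0.0)
        return -1;
    *px = (e * c - 2.0 * g * b) / det;
    *py = (e * b - 2.0 * d * c) / det;
    return 0;
}

int main(void)
{
    /* The 3x3 example given further below; the exact numbers depend on the
     * axis convention, but the peak lands far outside the window either way. */
    const double patch[9] = {336, 522, 483, 423, 539, 153, 221, 412, 234};
    double x, y;
    if (fit_quadratic_peak(patch, &x, &y) == 0)
        printf("sub-pixel peak at (%.2f, %.2f)\n", x, y);
    return 0;
}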
The issue arises in some corner cases, where the local peak is close to the edge of the window: the fit has low residuals, but the peak of the quadratic lies far outside the window.
An example:
The corner saliency and a contour of the fitted quadratic:
The saliency (blue) and fit (red) as 3D meshes:
Numeric values of this example are (row-major ordering):
[336, 522, 483, 423, 539, 153, 221, 412, 234]
The resulting sub-pixel center of (2.6, -17.1) is clearly wrong.
How can I constrain the fit so the center is within the window?
I'm open to alternative methods for finding the sub-pixel peak.
The obvious answer is to reject 3x3 (or 5x5, whatever you use) boxes whose discrete maximum is not at the center. In other words, to use a quadratic approximation only to refine the location of a maximum that must be located inside the box.
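A small C sketch of that rejection test, assuming a row-major 3x3 patch (a 5x5 box works the same way with the center index adjusted):

/* Return 1 if the center pixel of a row-major 3x3 patch is a (non-strict)
 * maximum, 0 otherwise. Only patches passing this test should be handed
 * to the sub-pixel refinement. */
static int center_is_max(const double p[9])
{
    for (int i = 0; i < 9; ++i)
        if (p[i] > p[4])
            return 0;
    return 1;
}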
More generally, in such cases the first questions to ask are not "How do I constrain my model-fitting procedure to shoehorn a solution for this edge case?", but rather "Does my model apply to this edge case?" and "Is this edge case even worth spending time on, or can I just ignore it?"
I tried my own code to fit a 2D quadratic function to the 3x3 values, using a stable least-squares solving algorithm, and also found a maximum outside of the domain. The 3x3 patch of data does not match a quadratic function, and therefore the fit is not useful.
Fitting a 2D quadratic to a 3x3 neighborhood requires a degree of smoothness in the data that you don't seem to have in your FAST output.
There are many other methods to find the sub-pixel location of the maximum. One that I like, because it is more stable and less computationally intensive, is fitting a "separable" quadratic function: you fit a quadratic to the three values around the local maximum along one dimension, and then another along the other dimension. Instead of solving for 6 parameters with 9 values, this solves for 3 parameters with 3 values, twice. The solution is guaranteed stable as long as the center pixel is larger than or equal to all pixels in its 4-connected neighborhood.
z1 = [f(-1,0), f(0,0), f(1,0)]^T

    [1, -1, 1]
X = [0,  0, 1]
    [1,  1, 1]

solve: X b1 = z1

and

z2 = [f(0,-1), f(0,0), f(0,1)]^T

    [1, -1, 1]
X = [0,  0, 1]
    [1,  1, 1]

solve: X b2 = z2
Now you get the x-coordinate of the peak from b1 and the y-coordinate from b2: each b = [p, q, r]^T describes the 1D quadratic p*x^2 + q*x + r, whose peak lies at x = -q/(2p).
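Here is a minimal C sketch of that separable refinement, assuming the 3x3 patch is row-major with its discrete maximum at the center; for a 1D quadratic through three samples the peak offset reduces to (z(-1) - z(+1)) / (2*(z(-1) - 2*z(0) + z(+1))), so no explicit solve is needed:

/* Sub-pixel offset of the peak of a 1D quadratic fitted through three
 * samples at x = -1, 0, +1. The offset stays within (-0.5, 0.5) whenever
 * the center sample is a strict maximum of the three. */
static double parabolic_offset(double zm1, double z0, double zp1)
{
    double denom = zm1 - 2.0 * z0 + zp1;
    if (denom == 0.0)
        return 0.0;                 /* degenerate (flat): keep the integer location */
    return (zm1 - zp1) / (2.0 * denom);
}

/* Separable sub-pixel refinement on a row-major 3x3 patch:
 * one 1D fit along x (middle row), one along y (middle column). */
static void subpixel_peak(const double p[9], double *dx, double *dy)
{
    *dx = parabolic_offset(p[3], p[4], p[5]);   /* f(-1,0), f(0,0), f(1,0) */
    *dy = parabolic_offset(p[1], p[4], p[7]);   /* f(0,-1), f(0,0), f(0,1) */
}

The offsets dx and dy are then added to the keypoint's integer location to get the sub-pixel position.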
I'm looking for an algorithm to do a best fit of an arbitrary rectangle to an unordered set of points. Specifically, I'm looking for a rectangle where the sum of the distances of the points to any one of the rectangle edges is minimised. I've found plenty of best fit line, circle and ellipse algorithms, but none for a rectangle. Ideally, I'd like something in C, C++ or Java, but not really that fussy on the language.
The input data will typically consist mostly of points lying on or close to the rectangle, with a few outliers. The distribution of the data will be uneven and is unlikely to include all four corners.
Here are some ideas that might help you.
We can estimate whether a point is on an edge or at a corner as follows (a small sketch follows the list):
Collect the point's n nearest neighbours
Calculate the points' centroid
Calculate the points' covariance matrix as follows:
Start with Covariance = ((0, 0), (0, 0))
For each point calculate d = point - centroid
Covariance += outer_product(d, d)
Calculate the covariance's eigenvalues. (e.g. with SVD)
Classify point:
if one eigenvalue is large and the other very small, we are probably on an edge
otherwise we should be on a corner
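Here is a rough C sketch of that test, assuming the n nearest neighbours of the query point have already been gathered into an array; the eigenvalues of the 2x2 covariance matrix are computed in closed form, and the ratio threshold is just an illustrative choice:

#include <math.h>

typedef struct { double x, y; } Point2;

/* Classify a point by the shape of its local neighbourhood:
 * returns 1 if it looks like an edge point (one dominant direction),
 * 0 if it looks like a corner point (two significant directions).
 * pts[0..n-1] are the point's nearest neighbours, already collected. */
static int looks_like_edge(const Point2 *pts, int n, double ratio_threshold)
{
    double cx = 0.0, cy = 0.0;
    for (int i = 0; i < n; ++i) { cx += pts[i].x; cy += pts[i].y; }
    cx /= n; cy /= n;                                /* centroid */

    double a = 0.0, b = 0.0, c = 0.0;                /* covariance [[a, b], [b, c]] */
    for (int i = 0; i < n; ++i) {
        double dx = pts[i].x - cx, dy = pts[i].y - cy;
        a += dx * dx; b += dx * dy; c += dy * dy;
    }

    /* Eigenvalues of a symmetric 2x2 matrix in closed form (no SVD needed). */
    double mean = 0.5 * (a + c);
    double diff = 0.5 * (a - c);
    double root = sqrt(diff * diff + b * b);
    double lambda_big = mean + root, lambda_small = mean - root;

    /* One large and one (near) zero eigenvalue: the neighbours lie along a line. */
    return lambda_big > ratio_threshold * (lambda_small + 1e-12);
}

A ratio threshold around 10 is a reasonable starting point and can be tuned on real data.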
Extract all corner points and do a segmentation. Choose the four segments with most entries. The centroid of those segments are candidates for the rectangle's corners.
Calculate the normalized direction vectors of two opposite sides and calculate their mean. Calculate the mean of the other two opposite sides. These are the direction vectors of a parallelogram. If you want a rectangle, calculate a vector perpendicular to one of those directions and calculate its mean with the other direction vector. Then the rectangle's directions are this mean vector and a vector perpendicular to it.
In order to calculate the corners, you can project the candidates on their directions and move them so that they form the corners of a rectangle.
The idea of a line of best fit is to compute the vertical distances between your points and the line y = ax + b. Then you can use calculus to find the values of a and b that minimize the sum of the squares of the distances. The reason squaring is chosen over the absolute value is that the former is differentiable at 0.
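For reference, the closed form that this calculus gives, as a small C sketch (it assumes at least two distinct x values so the denominator is non-zero):

/* Ordinary least-squares fit of y = a*x + b, minimizing vertical distances. */
static void fit_line_vertical(const double *x, const double *y, int n,
                              double *a, double *b)
{
    double sx = 0.0, sy = 0.0, sxy = 0.0, sxx = 0.0;
    for (int i = 0; i < n; ++i) {
        sx  += x[i];        sy  += y[i];
        sxy += x[i] * y[i]; sxx += x[i] * x[i];
    }
    *a = (n * sxy - sx * sy) / (n * sxx - sx * sx);   /* slope */
    *b = (sy - *a * sx) / n;                          /* intercept */
}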
If you were to try the same approach with a rectangle, you would run into the problem that the square of the distance to the side of a rectangle is a piecewise defined function with 8 different pieces and is not differentiable when the pieces meet up inside the rectangle.
In order to proceed, you'll need to decide on a function that measures how far a point is from a rectangle that is everywhere differentiable.
Here's a general idea. Make a grid with smallish cells; calculate the best fit line for each not-too-empty cell (the calculation is immediate [1], there's no search involved). Join adjacent cells while making sure the standard deviation is improving/not worsening much. Thus we detect the four sides and the four corners, and divide our points into four groups, each belonging to one of the four sides.
Next, we throw away the corner cells, put the true rectangle in place of the four approximate lines, and do a bit of hill climbing (or whatever). The calculation of the best fit line may be augmented for this case, since the two lines are parallel and we've already separated our points into the four groups (for a given rectangle, we know the delta-y between the two opposing sides (taking horizontal-ish sides for a moment), so we just add this delta-y to the ys of the lower group of points and make the calculation).
The initial rectangular grid may be replaced with working by stripes (say, vertical). Then, at least half of the stripes will have two pronounced groupings of points (find them by dividing each stripe by horizontal division lines into cells).
[1] For a line Y = a*X + b, minimize the sum of squares of perpendicular distances of the data points {xi, yi} to that line. This is directly solvable for a and b. For more vertical lines, flip the Xs and the Ys.
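As a concrete version of footnote [1], here is a small C sketch of that perpendicular-distance fit (orthogonal regression). The best-fit line passes through the centroid and its angle has a closed form, so near-vertical lines need no special flipping; the names are only illustrative:

#include <math.h>

typedef struct { double x, y; } Point2;

/* Orthogonal least-squares line through a point set, minimizing the sum of
 * squared perpendicular distances. The line is returned as a point on it
 * (the centroid) plus an angle theta; its direction is (cos theta, sin theta). */
static void fit_line_perpendicular(const Point2 *pts, int n,
                                   double *cx, double *cy, double *theta)
{
    double mx = 0.0, my = 0.0;
    for (int i = 0; i < n; ++i) { mx += pts[i].x; my += pts[i].y; }
    mx /= n; my /= n;

    double sxx = 0.0, sxy = 0.0, syy = 0.0;          /* centered second moments */
    for (int i = 0; i < n; ++i) {
        double dx = pts[i].x - mx, dy = pts[i].y - my;
        sxx += dx * dx; sxy += dx * dy; syy += dy * dy;
    }

    *cx = mx; *cy = my;
    *theta = 0.5 * atan2(2.0 * sxy, sxx - syy);      /* direction of largest spread */
}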
P.S. I interpret the problem as minimizing the sum of squares of perpendicular distances of each point to its nearest side of the rectangle, not to all the rectangle's sides.
I am not completely sure, but you might try working with the first 2 (or 3?) dimensions of a PCA of your points. It should work reasonably fast in most cases.
I made a photo mosaic script (PHP). The script takes one picture and rebuilds it as a mosaic of little pictures: from a distance it looks like the real picture, but when you move closer you see that it is made up of many little pictures. I take a square of a fixed number of pixels and determine the average color of that square. Then I compare this with my database, which contains the average color of a couple of thousand pictures, and determine the color distance to all available images. But running this script fully takes a couple of minutes.
The bottleneck is matching the best picture to a part of the main picture. I have been searching online for how to reduce this and came across "Antipole Clustering." Of course I tried to find some information on how to use this method myself, but I can't seem to figure out what to do.
There are two steps. 1. Database acquisition and 2. Photomosaic creation.
Let's start with step one; when this is all clear, maybe I will understand step 2 myself.
Step 1:
partition each image of the database into 9 equal rectangles arranged in a 3x3 grid
compute the RGB mean values for each rectangle
construct a vector x composed of 27 components (three RGB components for each rectangle)
x is the feature vector of the image in the data structure
Well, points 1 and 2 are easy, but what should I do at point 3? How do I compose a vector x out of the 27 components (9 x R mean, G mean, B mean)?
And when I succeed in composing the vector, what is the next step I should take with it?
Peter
Here is how I think the feature vector is computed:
You have 3 x 3 = 9 rectangles.
Each pixel is essentially 3 numbers, 1 for each of the Red, Green, and Blue color channels.
For each rectangle you compute the mean for the red, green, and blue colors for all the pixels in that rectangle. This gives you 3 numbers for each rectangle.
In total, you have 9 (rectangles) x 3 (mean for R, G, B) = 27 numbers.
Simply concatenate these 27 numbers into a single 27 by 1 (often written as 27 x 1) vector, that is, 27 numbers grouped together. This vector of 27 numbers is the feature vector X that represents the color statistics of your photo. In code, if you are using C++, this will probably be an array of 27 numbers or perhaps an instance of the (aptly named) vector class. You can think of this feature vector as a form of "summary" of what the color in the photo is like. Roughly, things look like this: [R1, G1, B1, R2, G2, B2, ..., R9, G9, B9] where R1 is the mean/average of the red pixels in the first rectangle and so on.
I believe step 2 involves some form of comparing these feature vectors so that those with similar feature vectors (and hence similar color) will be placed together. Comparison will likely involve the use of the Euclidean distance (see here), or some other metric, to compare how similar the feature vectors (and hence the photos' color) are to each other.
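To make this concrete, here is a small C sketch (illustrative names; it assumes a packed 8-bit RGB image, 3 bytes per pixel, row-major) that builds the 27-component feature vector and compares two of them with the squared Euclidean distance:

/* Mean R, G, B over one rectangle of a packed 8-bit RGB image,
 * written into out[0..2]. */
static void rect_mean_rgb(const unsigned char *img, int img_w,
                          int x0, int y0, int w, int h, double out[3])
{
    double sum[3] = {0.0, 0.0, 0.0};
    for (int y = y0; y < y0 + h; ++y)
        for (int x = x0; x < x0 + w; ++x)
            for (int c = 0; c < 3; ++c)
                sum[c] += img[(y * img_w + x) * 3 + c];
    for (int c = 0; c < 3; ++c)
        out[c] = sum[c] / (double)(w * h);
}

/* Build the 27-component feature vector: 3x3 grid of rectangles, 3 mean
 * color channels each, concatenated as [R1,G1,B1, ..., R9,G9,B9].
 * If width or height is not divisible by 3, a few edge pixels are ignored. */
static void build_feature_vector(const unsigned char *img, int img_w, int img_h,
                                 double feat[27])
{
    int rw = img_w / 3, rh = img_h / 3;
    for (int gy = 0; gy < 3; ++gy)
        for (int gx = 0; gx < 3; ++gx)
            rect_mean_rgb(img, img_w, gx * rw, gy * rh, rw, rh,
                          &feat[(gy * 3 + gx) * 3]);
}

/* Squared Euclidean distance between two feature vectors:
 * the smaller it is, the more similar the color layout of the two images. */
static double feature_distance2(const double a[27], const double b[27])
{
    double d2 = 0.0;
    for (int i = 0; i < 27; ++i) {
        double d = a[i] - b[i];
        d2 += d * d;
    }
    return d2;
}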
Lastly, as Anony-Mousse suggested, converting your pixels from RGB to HSB/HSV color would be preferable. If you use OpenCV or have access to it, this is a simple one-liner. Otherwise the HSV Wikipedia article etc. will give you the math formula to perform the conversion.
Hope this helps.
Instead of using RGB, you might want to use HSB space. It gives better results for a wide variety of use cases. Put more weight on Hue to get better color matches for photos, or to brightness when composing high-contrast images (logos etc.)
I have never heard of antipole clustering. But the obvious next step would be to put all the images you have into a large index. Say, an R-Tree. Maybe bulk-load it via STR. Then you can quickly find matches.
Maybe it means vector quantization (VQ). In VQ the image isn't subdivided into rectangles but into density areas. Then you can take the mean point of each cluster. First you need to take all colors and pixels separately and transfer them to a vector with XY coordinates. Then you can use a density clustering such as Voronoi cells and get the mean point. This point you can then compare with other pictures in the database. Read here about VQ: http://www.gamasutra.com/view/feature/3090/image_compression_with_vector_.php.
How to compute a vector from adjacent pixels:
d(x) = I(x+1,y) - I(x,y)
d(y) = I(x,y+1) - I(x,y)
Here's another link: http://www.leptonica.com/color-quantization.html.
Update: When you have already computed the mean color of your thumbnails, you can proceed to sort all the mean colors in an RGB map and use the formula I gave you to compute the vector x. Now that you have a vector for all your thumbnails, you can use the antipole tree to search for a thumbnail. This is possible because the antipole tree is something like a kd-tree and subdivides the 2D space. Read here about the antipole tree: http://matt.eifelle.com/2012/01/17/qtmosaic-0-2-faster-mosaics/. Maybe you can ask the author and download the source code?
I have an array of N measurements that I should present as a graph, but the graph can only be M pixels wide and is scrolled only in steps of M pixels.
While M is constant, N can be anything from tens to thousands.
Each time I need to show the graph, I know what N is; however, since N/M may not be an integer, there is an accumulated error that I want to compensate for somehow.
I am working in plain C, and no math libraries can be used.
EDIT 2:
The data is relatively homogeneous with peaks once in a while, and I do not want to miss these peaks, while interpolating.
EDIT 3:
I am looking for a solution that will work well enough for any N, whether greater than or less than M.
Thanks.
One good solution is not to iterate over your input samples, but over your output positions. That is, you will always draw exactly M pixels. To calculate the nearest sample value for the ith pixel, use the array offset:
[(i*N+M/2)/M]
Of course only using the nearest sample will give a very aliased result (discarding most of your samples in the case where N is large). If you're sure N will always be larger than M, a good but simple approach is to average sufficient neighboring samples with a weighted average such that each sample gets a total weight of 1 (with endpoints having their weight split between neighboring output pixels). Of course there are more elaborate resampling algorithms you can use that may be more appropriate (especially if your data is something like audio samples which are more meaningful in the frequency domain), but for an embedded device with tight memory and clock cycle requirements, averaging is likely the approach you want.
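A small C sketch of both ideas, using only integer arithmetic (no math library); resample_nearest uses the offset formula above, and resample_average is the simpler unweighted variant of the averaging just described (each output pixel averages its own bucket of samples):

/* Draw exactly M output pixels from N input samples by nearest-sample lookup.
 * Because we iterate over output positions, no error accumulates for any N. */
static void resample_nearest(const int *in, int n, int *out, int m)
{
    for (int i = 0; i < m; ++i) {
        int idx = (i * n + m / 2) / m;   /* offset from the answer above */
        if (idx > n - 1)
            idx = n - 1;                 /* guard against rounding past the end when N < M */
        out[i] = in[idx];
    }
}

/* For N >= M: output pixel i averages the samples in [i*N/M, (i+1)*N/M),
 * so every input sample contributes to exactly one output pixel. */
static void resample_average(const int *in, int n, int *out, int m)
{
    for (int i = 0; i < m; ++i) {
        int lo = (i * n) / m;
        int hi = ((i + 1) * n) / m;
        if (hi <= lo)
            hi = lo + 1;                 /* always take at least one sample */
        long sum = 0;
        for (int j = lo; j < hi; ++j)
            sum += in[j];
        out[i] = (int)(sum / (hi - lo));
    }
}

If the peaks mentioned in EDIT 2 must never disappear, a common variation is to take the maximum (or draw a min-max bar) within each bucket instead of the mean.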
In hashlife the field is typically treated as a theoretically infinite grid, with the pattern in question centered near the origin. A quadtree is used to represent the field. Given a square of 2^(2k) cells, 2^k on a side, at the kth level of the tree, the hash table stores the 2^(k-1) by 2^(k-1) square of cells in the center, 2^(k-2) generations in the future. For example, for a 4x4 square it stores the 2x2 center, 1 generation forward; and for an 8x8 square it stores the 4x4 center, 2 generations forward.
So given an 8x8 initial configuration, we get a 4x4 square 1 generation forward, centered w.r.t. the 8x8 square, and a 2x2 square 2 generations forward (1 generation forward w.r.t. the 4x4 square), centered w.r.t. the 8x8 square. With every new generation our view of the grid reduces; in turn we get the next state of the automaton. We cannot go any further after getting the innermost 2x2 square 2^(k-2) generations forward.
So how does the hashlife in Golly go on forever? Also its view of the field never seems to reduce. It seems to show the state of the whole automaton after 2^(k-2) generations. Moreover, given a starting configuration that expands with time, the algorithm's view seems to increase. Does the view of the grid zoom out to show the expanding automaton?
There's a good article on Dr. Dobb's which goes into detail about how HashLife works. The basic answer is that you don't merely run the algorithm on the existing nodes, you also use new shifted nodes to get the next generation.
To be clear (because your ^ symbols were missing), you are asking:
Given a square of 2^(2k) cells, 2^k on a side, at the kth level of the
tree, the hash table stores the 2^(k-1)-by-2^(k-1) square of cells in
the center, 2^(k-2) generations in the future. [...]
So given a 8x8 initial configuration [...] With every new generation
our view of the grid reduces; in turn we get the next state of the
automaton. We cannot go any further after getting the innermost 2x2
square 2^(k-2) generations forward.
So how does the hashlife in Golly go on forever? Also its view of the
field never seems to reduce.
Instead of starting with your 8x8 pattern, imagine instead that you start with a bigger pattern that happens to contain your 8x8 pattern inside it. For example, you could start with a 16x16 pattern that has your 8x8 pattern in the center, and a 4-row margin of blank cells all around the edges. Such a pattern is easy to construct, by assembling blank 4x4 nodes with the 4x4 subnodes of your 8x8 start pattern.
Given such a 16x16 pattern, the HashLife algorithm can give you an 8x8 answer, 4 generations in the future.
You want more? Okay, start with a 32x32 pattern that contains mostly blank space, with the 8x8 pattern in the center. With this you can get a 16x16 answer that is 8 generations into the future.
What if your pattern contains moving objects that move fast enough that they go outside that 16x16 area after 8 generations? Simple -- start with a 64x64 start pattern, but instead of trying to run it for a whole 16 generations, just run it for 8 generations.
In general, all cases of arbitrarily large, possibly expanding patterns, over arbitrarily long periods of time, can be handled (and in fact are handled in Golly) by adding as much blank space as needed around the outside of the pattern.
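Here is a minimal C sketch of that padding step, assuming a plain quadtree node with nw/ne/sw/se children; a real HashLife implementation (like Golly's) would hash-cons these constructors so that identical subtrees, in particular large blank areas, are shared rather than allocated again:

#include <stdlib.h>

/* A level-k node covers a 2^k x 2^k block of cells; level 0 is a single cell. */
typedef struct Node {
    struct Node *nw, *ne, *sw, *se;
    int level;
    int alive;                          /* used only at level 0 */
} Node;

static Node *make_node(Node *nw, Node *ne, Node *sw, Node *se)
{
    /* A real HashLife would canonicalize (hash-cons) nodes here. */
    Node *n = malloc(sizeof *n);
    n->nw = nw; n->ne = ne; n->sw = sw; n->se = se;
    n->level = nw->level + 1;
    n->alive = 0;
    return n;
}

static Node *empty_node(int level)
{
    if (level == 0)
        return calloc(1, sizeof(Node)); /* a single dead cell */
    Node *b = empty_node(level - 1);
    return make_node(b, b, b, b);
}

/* Embed a pattern (level >= 1) in the center of a blank node twice as wide,
 * e.g. turn an 8x8 pattern into the middle of a 16x16 pattern with a blank
 * margin. Repeating this as often as needed provides the "as much blank
 * space as needed" described above. */
static Node *expand(Node *n)
{
    Node *b = empty_node(n->level - 1);
    return make_node(make_node(b, b, b, n->nw),
                     make_node(b, b, n->ne, b),
                     make_node(b, n->sw, b, b),
                     make_node(n->se, b, b, b));
}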
The centered squares are only the precomputed stuff. The algorithm indeed keeps the whole universe at all times and updates all parts of it, not just the centers.