Is the Leptonica implementation of 'Modified Median Cut' not using the median at all? - c

I'm playing around a bit with image processing and decided to read up on how color quantization worked and after a bit of reading I found the Modified Median Cut Quantization algorithm.
I've been reading the code of the C implementation in Leptonica library and came across something I thought was a bit odd.
Now I want to stress that I am far from an expert in this area, nor am I a math-head, so I predict that this all comes down to me not understanding all of it, and not to the implementation of the algorithm being wrong.
The algorithm states that the vbox should be split along the largest axis, and that it should be split using the following logic:
The largest axis is divided by locating the bin with the median pixel
(by population), selecting the longer side, and dividing in the center
of that side. We could have simply put the bin with the median pixel
in the shorter side, but in the early stages of subdivision, this
tends to put low density clusters (that are not considered in the
subdivision) in the same vbox as part of a high density cluster that
will outvote it in median vbox color, even with future median-based
subdivisions. The algorithm used here is particularly important in
early subdivisions, and is useful for giving visible but low
population color clusters their own vbox. This has little effect on
the subdivision of high density clusters, which ultimately will have
roughly equal population in their vboxes.
For the sake of the argument, let's assume that we have a vbox that we are in the process of splitting and that the red axis is the largest. In the Leptonica algorithm, on line 01297, the code appears to do the following
Iterate over all the possible green and blue variations of the red color
For each iteration it adds to the total number of pixels (population) it's found along the red axis
For each red color it sums up the population of the current red and the previous ones, thus storing an accumulated value for each red
note: when I say 'red' I mean each point along the axis that is covered by the iteration, the actual color may not be red but contains a certain amount of red
So for the sake of illustration, assume we have 9 "bins" along the red axis and that they have the following populations
4 8 20 16 1 9 12 8 8
After the iteration of all red bins, the partialsum array will contain the following count for the bins mentioned above
4 12 32 48 49 58 70 78 86
And total would have a value of 86
Once that's done it's time to perform the actual median cut and for the red axis this is performed on line 01346
It iterates over the bins and checks their accumulated sum. And here's the part that throws me off from the description of the algorithm: it looks for the first bin that has a value greater than total/2.
Wouldn't total/2 mean that it is looking for a bin that has a value greater than the average value, and not the median? The median for the above bins would be 49.
The use of 43 or 49 could potentially have a huge impact on how the boxes are split, even though the algorithm then proceeds by moving to the center of the larger side of where the matched value was.
Another thing that puzzles me a bit is that the paper specifies that the bin with the median value should be located, but does not mention how to proceed if there is an even number of bins. The median would be the result of (a+b)/2, and it's not guaranteed that any of the bins contains that population count. So this is what makes me think that there are some approximations going on that are negligible because the split actually takes place at the center of the larger side of the selected bin.
Sorry if it got a bit long-winded, but I wanted to be as thorough as I could because it's been driving me nuts for a couple of days now ;)

In the 9-bin example, 49 is the number of pixels in the first 5 bins. 49 is the median number in the set of 9 partial sums, but we want the median pixel in the set of 86 pixels, which is 43 (or 44), and it resides in the 4th bin.
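The lookup described above can be sketched in a few lines of C. This is a minimal illustration, not Leptonica's actual code: the function name median_pixel_bin and its signature are made up for this example, and it simply returns the first bin whose running population sum exceeds total/2.

```c
#include <assert.h>
#include <stddef.h>

/* Return the 0-based index of the bin that contains the median pixel,
 * i.e. the first bin whose running population sum exceeds total/2.
 * Returns -1 if every bin is empty.
 * Hypothetical helper for illustration; not taken from colorquant2.c. */
int median_pixel_bin(const int *population, size_t nbins)
{
    long total = 0;
    for (size_t i = 0; i < nbins; i++) {
        assert(population[i] >= 0);
        total += population[i];
    }

    long partial = 0;
    for (size_t i = 0; i < nbins; i++) {
        partial += population[i];      /* same role as the partialsum array */
        if (partial > total / 2)
            return (int)i;
    }
    return -1;
}
```

For the populations 4 8 20 16 1 9 12 8 8, total is 86 and total/2 is 43; the running sums are 4, 12, 32, 48, so the function returns index 3, which is the 4th bin.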
Inspection of the modified median cut algorithm in colorquant2.c of leptonica shows that the actual cut location for the 3d box does not necessarily occur adjacent to the bin containing the median pixel. The reasons for this are explained in the function medianCutApply(). This is one of the "modifications" to Paul Heckbert's original method. The other significant modification is to make the decision of which 3d box to cut next based on a combination of both population and the product (population * volume), thus permitting splitting of large but sparsely populated regions of color space.

I do not know the algo, but I would assume your array contains the population of each red; let's explain this with an example:
Assume you have four gradations of red: A,B,C and D
And you have the following sequence of red values:
AABDCADBBBAAA
To find the median, you would have to sort them according to red value and take the middle:
      median
      v
AAAAAABBBBCDD
Now let's use their approach:
A:6 => 6
B:4 => 10
C:1 => 11
D:2 => 13
13/2 = 6.5 => B
I think the mismatch happened because you are counting the population; the average color would be:
(6*A+4*B+1*C+2*D)/13

Related

Robustly finding the local maximum of an image patch with sub-pixel accuracy

I am developing a SLAM algorithm in C, and I have implemented the FAST corner finding method which gives me some strong keypoints in the image. The next step is to get the center of the keypoints with a sub-pixel accuracy, therefore I extract a 3x3 patch around each of them, and do a Least Squares fit of a two dimensional quadratic:
Where f(x,y) is the corner saliency measure of each pixel, similar to the FAST score proposed on the original paper, but modified to also provide a saliency measure in non corner pixels.
And the least squares:
With being the estimated parameters.
I can now calculate the location of the peak of the fitted quadratic, by taking the gradient equal to zero, achieving my original goal.
The issue arises on some corner cases, where the local peak is closer to the edge of the window, resulting in a fit with low residuals but a peak of the quadratic way outside the window.
An example:
The corner saliency and a contour of the fitted quadratic:
The saliency (blue) and fit (red) as 3D meshes:
Numeric values of this example are (row-major ordering):
[336, 522, 483, 423, 539, 153, 221, 412, 234]
The resulting sub-pixel center, (2.6, -17.1), is clearly wrong.
How can I constrain the fit so the center is within the window?
I'm open to alternative methods for finding the sub pixel peak.
The obvious answer is to reject 3x3 (or 5x5, whatever you use) boxes whose discrete maximum is not at the center. In other words, to use a quadratic approximation only to refine the location of a maximum that must be located inside the box.
More generally, in such cases the first questions to ask are not "How do I constrain my model-fitting procedure to shoehorn a solution for this edge case?", but rather "Does my model apply to this edge case?" and "Is this edge case even worth spending time on, or can I just ignore it?"
I tried my own code to fit a 2D quadratic function to the 3x3 values, using a stable least-squares solving algorithm, and also found a maximum outside of the domain. The 3x3 patch of data does not match a quadratic function, and therefore the fit is not useful.
Fitting a 2D quadratic to a 3x3 neighborhood requires a degree of smoothness in the data that you don't seem to have in your FAST output.
There are many other methods to find the sub-pixel location of the maximum. One that I like because it is more stable and less computationally intensive is the fitting of a "separable" quadratic function. In short, you fit a quadratic function to the three values around the local maximum in one dimension, and then another in the other dimension. Instead of solving 6 parameters with 9 values, this solves 3 parameters with 3 values, twice. The solution is guaranteed stable, as long as the center pixel is larger or equal to all pixels in the 4-connected neighborhood.
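z1 = [f(-1,0), f(0,0), f(1,0)]^T
     [1,-1, 1]
X  = [0, 0, 1]
     [1, 1, 1]
solve: X b1 = z1
and
z2 = [f(0,-1), f(0,0), f(0,1)]^T
     [1,-1, 1]
X  = [0, 0, 1]
     [1, 1, 1]
solve: X b2 = z2
Each row of X is [t^2, t, 1] for t = -1, 0, 1, so each solution vector [p, q, r]^T holds the coefficients of p*t^2 + q*t + r, whose peak lies at t = -q/(2p).
Now you get the x-coordinate of the centroid from b1 and the y-coordinate from b2.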
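As a rough sketch of the separable fit in C: with samples at t = -1, 0, 1 the 3x3 system has a closed-form solution, so no solver is needed. The function names below are made up for this example, and the patch is assumed to be stored row-major and centred on the discrete maximum.

```c
#include <assert.h>

/* Offset of the vertex of the parabola through (-1, left), (0, center),
 * (1, right). Requires center >= left and center >= right, which keeps
 * the result within [-0.5, 0.5]. Returns 0 for a flat triple. */
double parabola_peak_offset(double left, double center, double right)
{
    assert(center >= left && center >= right);
    double curvature = left - 2.0 * center + right;  /* twice the t^2 coefficient */
    if (curvature == 0.0)
        return 0.0;
    return 0.5 * (left - right) / curvature;
}

/* patch: 3x3 saliency values in row-major order (assumed layout).
 * dx is the column offset and dy the row offset from the centre pixel. */
void subpixel_peak_3x3(const double patch[9], double *dx, double *dy)
{
    *dx = parabola_peak_offset(patch[3], patch[4], patch[5]);
    *dy = parabola_peak_offset(patch[1], patch[4], patch[7]);
}
```

For the patch from the question, [336, 522, 483, 423, 539, 153, 221, 412, 234], this gives offsets of roughly (-0.27, -0.38) relative to the centre pixel, which stays inside the window.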

Finding groups of clusters in binary array

I have a binary image from which I extracted all the pixels and wrote them to a txt file. I am trying to find how many clusters of 25 or more 1's there are in the array, and where they are.
I am using DBSCAN with Euclidean distances:
db_scan = DBSCAN(eps=1, min_samples=25,metric='euclidean', metric_params=None, algorithm='auto').fit(im_bw)
I expect to find the i, j location of the center of each cluster. I also expect to find the number of clusters, but it says I only have 1.
It probably says 0 clusters, only noise, because within a radius of 1 pixel you won't have 25 pixels. On a pixel grid that radius covers at most 5 pixels: the pixel itself plus its 4-connected neighbours.
Note that you need to make sure you choose the right representation and don't make false assumptions about what the algorithm does. For example, it does not produce centers.
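To see how large eps has to be before 25 neighbours become possible at all, here is a small counting sketch in C. It assumes the samples are integer pixel coordinates, that a neighbourhood includes the point itself, and that the test is distance <= eps; the function name is made up for this example.

```c
#include <assert.h>

/* Count integer grid offsets (dx, dy) with dx^2 + dy^2 <= eps^2.
 * This is the most neighbours (including the pixel itself) that a pixel
 * can have inside a solid blob of 1's, under the assumptions above. */
int grid_points_within(double eps)
{
    assert(eps >= 0.0);
    int r = (int)eps;                  /* truncation is floor for eps >= 0 */
    int count = 0;
    for (int dy = -r; dy <= r; dy++)
        for (int dx = -r; dx <= r; dx++)
            if ((double)(dx * dx + dy * dy) <= eps * eps)
                count++;
    return count;
}
```

With eps = 1 the count is 5, so min_samples = 25 can never be met; with eps = 3 it is 29, which is enough inside a sufficiently large solid region.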

How would I organise a clustered set of 2D coordinates into groups of close together sets?

I have a large amount of 2D sets of coordinates on a 6000x6000 plane (2116 sets), available here: http://pastebin.com/kiMQi7yu (the context isn't really important so I just pasted the raw data).
I need to write an algorithm to group together coordinates that are close to each other by some threshold. The coordinates in my list are already in groups on that plane, but the order is very scattered.
Despite this task being rather brain-melting to me at first, I didn't admit defeat instantly; this is what I tried:
First sort the list by the Y value, then sort it by the X value. Run through the list checking the distance between the current set and the previous. If they are close enough (100 units) then add them to the same group.
This method didn't really work out (as I expected). There are still objects that are pretty close that are in different groups, because I'm only comparing the next set in the list and the list is sorted by the X position.
I'm out of ideas! The language I'm using is C but I suppose that's not really relevant since all I need is an idea for how the algorithm should work. Thanks!
Though I haven't looked at the data set, it seems that you already know how many groups there are. Have you considered using k means? http://en.m.wikipedia.org/wiki/K-means_clustering
I'm just thinking this along while I write.
1. Tile the "arena" with squares that have the diameter of your distance (200) as their diagonal.
2. If there are any points within a square (x,y), they are tentatively part of Cluster(x,y).
3. Within each square (x,y), there are (up to) 4 areas where the circles of Cluster(x-1,y), Cluster(x+1,y), Cluster(x, y-1) and Cluster(x,y+1) overlap "into" the square; of these consider only those Clusters that are tentatively non-empty.
4. If all points of Cluster(x,y) are in the (up to 4) overlapping segments of non-empty neighbouring clusters: reallocate these points to the pertaining Cluster and remove Cluster(x,y) from the set of non-empty Clusters.
Added later: Regarding 3., the set of points to be investigated for one neighbour can be coarsely but quickly (!) determined by looking at the rectangle enclosing the segment. [End of addition]
This is just an idea - I can't claim that I've ever done anything remotely like this.
A simple, often used method for spatially grouping points, is to calculate the distance between each unique pair of points. If the distance does not exceed some predefined limit, then the points belong to the same group.
One way to think about this algorithm, is to consider each point as a limit-diameter ball (made of soft foam, so that balls can intersect each other). All balls that are in contact belong to the same group.
In practice, you calculate the squared distance, (x2 - x1)^2 + (y2 - y1)^2, to avoid the relatively slow square root operation. (Just remember to square the limit, too.)
To track which group each point belongs to, a disjoint-set data structure is used.
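A minimal C sketch of this pairwise grouping with a disjoint-set structure could look as follows. It is not the awk script mentioned in this answer; the function names and the array-based layout are assumptions made for the example, and it uses the brute-force O(n^2) pair loop without any partitioning.

```c
#include <assert.h>
#include <stddef.h>

/* Find the representative of i's set, halving paths along the way. */
static size_t find_root(size_t *parent, size_t i)
{
    while (parent[i] != i) {
        parent[i] = parent[parent[i]];
        i = parent[i];
    }
    return i;
}

/* Group n points: any two points at most `limit` apart share a group.
 * On return, group[i] holds a representative point index for point i's
 * group. Returns the number of groups. */
size_t group_points(const double *x, const double *y, size_t n,
                    double limit, size_t *group)
{
    assert(limit >= 0.0);
    const double limit_sq = limit * limit;   /* compare squared distances */

    for (size_t i = 0; i < n; i++)
        group[i] = i;                        /* every point starts alone */

    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            double dx = x[j] - x[i], dy = y[j] - y[i];
            if (dx * dx + dy * dy <= limit_sq) {
                size_t ri = find_root(group, i);
                size_t rj = find_root(group, j);
                if (ri != rj)
                    group[rj] = ri;          /* union the two sets */
            }
        }
    }

    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        group[i] = find_root(group, i);      /* flatten to final labels */
        if (group[i] == i)
            count++;
    }
    return count;
}
```

Note that points are linked transitively: two points further apart than the limit still end up in the same group if a chain of close neighbours connects them.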
If you have many points (a few thousand is not many), you can use partitioning or other methods to limit the number of pairs to consider. Partitioning is probably the most used, as it is very simple to implement: just divide the space into squares of limit size, and then you only need to consider points within each square, and between points in neighboring squares.
I wrote a small awk script to find the groups (no partitioning, about 84 lines of awk code; it also numbers the groups consecutively from 1 onwards, and outputs each input point, the group number, and the number of points in each group). Here are the results, summarized:
Limit  Singles  Pairs  Triplets  Clusters (of four or more points)
  1.0     1313    290        29        24
  2.0     1062    234        50        52
  3.0      904    179        53        75
  4.0      767    174        55        81
  5.0      638    173        52        84
 10.0      272     99        41        99
 20.0       66     20         8        68
 50.0       21     11         3        39
100.0       13      6         2        29
200.0        6      5         0        23
300.0        3      1         0        20
400.0        1      0         0        18
500.0        0      0         0        15
where Limit is the maximum distance at which the points are considered to belong to the same group.
If the data set is very detailed, you can have intertwined but separate groups. You can easily have a separate group in the hole of a donut-shaped group (or hollow ball in 3D). This is important to remember, so you don't make wrong assumptions on how the groups are separated.
Questions?
You can use a space-filling curve, i.e. a Z-curve, a.k.a. Morton curve. Basically you translate the x and y values to binary and then interleave the bits of the two coordinates. The resulting spatial index puts close coordinates together. You can verify it with the upper bounds and the most significant bits.
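As an illustration of the bit interleaving, here is a small C sketch. The function name morton_encode is made up for this example, and it assumes coordinates that fit in 16 bits, which is enough for a 6000x6000 plane.

```c
#include <stdint.h>

/* Interleave the bits of x and y into one Z-order (Morton) code:
 * bit i of x goes to bit 2*i, bit i of y goes to bit 2*i + 1. */
uint32_t morton_encode(uint16_t x, uint16_t y)
{
    uint32_t code = 0;
    for (int bit = 0; bit < 16; bit++) {
        code |= (uint32_t)((x >> bit) & 1u) << (2 * bit);
        code |= (uint32_t)((y >> bit) & 1u) << (2 * bit + 1);
    }
    return code;
}
```

Sorting the points by this code tends to place spatial neighbours near each other in the list. It is not a guarantee, though: points on either side of a large cell boundary can be close in the plane and still far apart in the sorted order, so a final distance check is still needed.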

Antipole Clustering

I made a photo mosaic script (PHP). This script takes one picture and turns it into a photo built up of little pictures. From a distance it looks like the real picture; when you move closer you see that it consists entirely of little pictures. I take a square of a fixed number of pixels and determine the average color of that square. Then I compare this with my database, which contains the average color of a couple of thousand pictures. I determine the color distance to all available images. But running this script fully takes a couple of minutes.
The bottleneck is matching the best picture with a part of the main picture. I have been searching online for how to reduce this and came across “Antipole Clustering.” Of course I tried to find some information on how to use this method myself, but I can’t seem to figure out what to do.
There are two steps. 1. Database acquisition and 2. Photomosaic creation.
Let’s start with step one. Once that is clear, maybe I can figure out step 2 myself.
Step 1:
1. partition each image of the database into 9 equal rectangles arranged in a 3x3 grid
2. compute the RGB mean values for each rectangle
3. construct a vector x composed of 27 components (three RGB components for each rectangle)
4. x is the feature vector of the image in the data structure
Well, points 1 and 2 are easy, but what should I do at point 3? How do I compose a vector x out of the 27 components (9 * R mean, G mean, B mean)?
And once I succeed in composing the vector, what is the next step I should take with it?
Peter
Here is how I think the feature vector is computed:
You have 3 x 3 = 9 rectangles.
Each pixel is essentially 3 numbers, 1 for each of the Red, Green, and Blue color channels.
For each rectangle you compute the mean for the red, green, and blue colors for all the pixels in that rectangle. This gives you 3 numbers for each rectangle.
In total, you have 9 (rectangles) x 3 (mean for R, G, B) = 27 numbers.
Simply concatenate these 27 numbers into a single 27 by 1 (often written as 27 x 1) vector. That is 27 numbers grouped together. This vector of 27 numbers is the feature vector X that represents the color statistics of your photo. In the code, if you are using C++, this will probably be an array of 27 numbers or perhaps even an instance of the (aptly named) vector class. You can think of this feature vector as some form of "summary" of what the color in the photo is like. Roughly, things look like this: [R1, G1, B1, R2, G2, B2, ..., R9, G9, B9] where R1 is the mean/average of the red values in the first rectangle and so on.
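To make the layout concrete, here is a rough sketch in C (the question uses PHP, but the logic carries over directly). The function name and the image layout are assumptions for this example: the image is taken to be interleaved 8-bit R, G, B in row-major order, and when the size is not divisible by 3 the rectangles are only approximately equal.

```c
#include <assert.h>
#include <stddef.h>

/* rgb: interleaved 8-bit R,G,B, row-major, width*height pixels (assumed).
 * out: 27 values ordered [R1, G1, B1, ..., R9, G9, B9], with the nine
 * rectangles visited in row-major order. */
void feature_vector_3x3(const unsigned char *rgb, int width, int height,
                        double out[27])
{
    assert(width >= 3 && height >= 3);

    for (int gy = 0; gy < 3; gy++) {
        int y0 = gy * height / 3, y1 = (gy + 1) * height / 3;
        for (int gx = 0; gx < 3; gx++) {
            int x0 = gx * width / 3, x1 = (gx + 1) * width / 3;
            double sum[3] = {0.0, 0.0, 0.0};

            for (int y = y0; y < y1; y++) {
                for (int x = x0; x < x1; x++) {
                    const unsigned char *p = rgb + 3 * ((size_t)y * width + x);
                    sum[0] += p[0];
                    sum[1] += p[1];
                    sum[2] += p[2];
                }
            }

            /* Mean of each channel over the rectangle. */
            double npix = (double)(x1 - x0) * (double)(y1 - y0);
            double *cell = out + 3 * (gy * 3 + gx);
            cell[0] = sum[0] / npix;
            cell[1] = sum[1] / npix;
            cell[2] = sum[2] / npix;
        }
    }
}
```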
I believe step 2 involves some form of comparing these feature vectors so that those with similar feature vectors (and hence similar color) will be placed together. Comparison will likely involve the use of the Euclidean distance (see here), or some other metric, to compare how similar the feature vectors (and hence the photos' color) are to each other.
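A plain linear scan with the squared Euclidean distance is the simplest form of that comparison, sketched below in C. This is only a baseline: it is not the antipole method, and the function name and the flat database layout (count vectors of 27 doubles stored back to back) are assumptions for this example. An index structure would aim to avoid visiting every entry.

```c
#include <assert.h>
#include <stddef.h>

#define FEATURE_DIM 27

/* Return the index of the database vector closest to `query`.
 * database holds count * FEATURE_DIM doubles, one vector after another. */
size_t nearest_feature(const double *query, const double *database,
                       size_t count)
{
    assert(count > 0);
    size_t best = 0;
    double best_dist = -1.0;           /* negative means "nothing seen yet" */

    for (size_t i = 0; i < count; i++) {
        double dist = 0.0;
        for (int k = 0; k < FEATURE_DIM; k++) {
            double d = database[i * FEATURE_DIM + k] - query[k];
            dist += d * d;             /* squared distance, no sqrt needed */
        }
        if (best_dist < 0.0 || dist < best_dist) {
            best_dist = dist;
            best = i;
        }
    }
    return best;
}
```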
Lastly, as Anony-Mousse suggested, converting your pixels from RGB to HSB/HSV color would be preferable. If you use OpenCV or have access to it, this is a simple one-liner. Otherwise the wiki page on HSV etc. will give you the math formula to perform the conversion.
Hope this helps.
Instead of using RGB, you might want to use HSB space. It gives better results for a wide variety of use cases. Put more weight on Hue to get better color matches for photos, or to brightness when composing high-contrast images (logos etc.)
I have never heard of antipole clustering. But the obvious next step would be to put all the images you have into a large index. Say, an R-Tree. Maybe bulk-load it via STR. Then you can quickly find matches.
Maybe it means vector quantization (VQ). In VQ the image isn't subdivided into rectangles but into density areas. Then you can take the mean point of each cluster. First you need to take all colors and pixels separately and transfer them to a vector with XY coordinates. Then you can use a density clustering like Voronoi cells and get the mean point. You can compare this point with other pictures in the database. Read here about VQ: http://www.gamasutra.com/view/feature/3090/image_compression_with_vector_.php.
How to plot vector from adjacent pixel:
d(x) = I(x+1,y) - I(x,y)
d(y) = I(x,y+1) - I(x,y)
Here's another link: http://www.leptonica.com/color-quantization.html.
Update: Once you have computed the mean color of your thumbnails, you can sort all the mean colors in an RGB map and use the formula I gave you to compute the vector x. Now that you have a vector for each of your thumbnails, you can use the antipole tree to search for a thumbnail. This is possible because the antipole tree is something like a kd-tree and subdivides the 2D space. Read here about antipole tree: http://matt.eifelle.com/2012/01/17/qtmosaic-0-2-faster-mosaics/. Maybe you can ask the author and download the source code?

simplified _resample_ algorithm in matlab

I am generating rows of samples of variable size from a DSP algorithm.
I mean that each row contains a varying number of elements (well, depending on the input).
I would like to resize each row to a specific number of samples.
Ex: column count in each row: 15 24 41 09 27
Say I would like to make it 30 elements per row.
Each row holds the samples of a digitized curve.
I'm interested in making every row contain the same number of sample elements.
I think you need to resample your row values. The idea is roughly like this:
interpolate each row to a continuous curve
quantize each curve to a fixed number of values (30)
Obviously, for rows with more than 30 values, you will lose some information.
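The two steps above can be sketched with piecewise-linear interpolation. The example is written in C rather than MATLAB to show the arithmetic explicitly; the function name is made up, and linear interpolation is just one possible choice of "continuous curve".

```c
#include <assert.h>
#include <stddef.h>

/* Resample n_in samples to n_out samples by linear interpolation.
 * The first and last input samples map to the first and last outputs. */
void resample_linear(const double *in, size_t n_in, double *out, size_t n_out)
{
    assert(n_in >= 1 && n_out >= 1);

    for (size_t i = 0; i < n_out; i++) {
        /* Position of output sample i on the input index axis [0, n_in - 1]. */
        double pos = (n_out > 1)
            ? (double)i * (double)(n_in - 1) / (double)(n_out - 1)
            : 0.0;
        size_t lo = (size_t)pos;
        size_t hi = (lo + 1 < n_in) ? lo + 1 : lo;
        double frac = pos - (double)lo;
        out[i] = in[lo] + frac * (in[hi] - in[lo]);
    }
}
```

Calling it with n_out = 30 on each row gives rows of equal length. When shrinking a row (for example 41 to 30), plain interpolation applies no low-pass filtering, so rapidly varying curves may alias; MATLAB's resample function applies an anti-aliasing filter for that reason.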
