How to show that the blocks can be arranged so that each box has at most two different colours - permutation

You are given 10 boxes, each large enough to contain exactly 10 wooden blocks, and a total of 100 blocks in 10 different colours. There might not be the same number in each colour, so you might not be able to pack the blocks into the boxes in such a way that each box contains only one colour of block. Show that it is possible to pack the blocks into the boxes so that each box contains at most two different colours.

Arrange the blocks by color. The smallest group will have no more than 10 blocks, and the biggest group will have at least 10.
Put the smallest group into one box, then fill the rest of that box with blocks from the biggest pile.
You now have one box fewer to fill and one color gone.
Repeat.
See http://mei.org.uk/images/Nov09_miotm_solution.JPG
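The greedy argument above can be sketched in Python. This is a minimal illustration, not the linked solution: the function name `pack_blocks` and the dictionary representation are assumptions made for the example. It also handles the case where the biggest pile is used up at the same time as the smallest, which leaves fewer colours than boxes.

```python
def pack_blocks(counts, box_size=10):
    """Pack blocks into boxes so that each box holds at most two colours.

    counts maps colour -> number of blocks. The total must fill a whole
    number of boxes, with at least one box per colour.
    """
    remaining = {colour: n for colour, n in counts.items() if n > 0}
    total = sum(remaining.values())
    if total % box_size or total // box_size < len(remaining):
        raise ValueError("need full boxes and at least one box per colour")

    boxes = []
    while remaining:
        # Start the box with the smallest pile (or a full box of it).
        small = min(remaining, key=remaining.get)
        take = min(remaining[small], box_size)
        box = {small: take}
        remaining[small] -= take
        if remaining[small] == 0:
            del remaining[small]

        # Top the box up from the biggest pile, which always has enough.
        gap = box_size - take
        if gap:
            big = max(remaining, key=remaining.get)
            box[big] = gap
            remaining[big] -= gap
            if remaining[big] == 0:
                del remaining[big]
        boxes.append(box)
    return boxes


if __name__ == "__main__":
    piles = dict(zip("ABCDEFGHIJ", [1, 2, 3, 4, 5, 6, 7, 8, 9, 55]))
    for box in pack_blocks(piles):
        print(box)
```

Each pass removes one box and at least one colour, or fills a box with a single colour when every pile is larger than a box, so the number of colours never exceeds the number of boxes left.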

Related

Antipole Clustering

I made a photo mosaic script (PHP). The script takes one picture and rebuilds it out of many little pictures. From a distance it looks like the original picture; when you move closer you see that it is made up of little pictures. I take a square of a fixed number of pixels and determine the average color of that square. Then I compare this with my database, which contains the average color of a couple of thousand pictures. I compute the color distance to every available image. But a full run of the script takes a couple of minutes.
The bottleneck is matching the best picture with a part of the main picture. I have been searching online for ways to reduce this and came across “Antipole Clustering.” Of course I tried to find some information on how to use this method myself, but I can’t seem to figure out what to do.
There are two steps. 1. Database acquisition and 2. Photomosaic creation.
Let’s start with step one; once that is clear, maybe I can work out step 2 myself.
Step 1:
1. partition each image of the database into 9 equal rectangles arranged in a 3x3 grid
2. compute the RGB mean values for each rectangle
3. construct a vector x composed of 27 components (three RGB components for each rectangle)
4. x is the feature vector of the image in the data structure
Well, points 1 and 2 are easy, but what should I do at point 3? How do I compose a vector x out of the 27 components (9 * R mean, G mean, B mean)?
And once I have composed the vector, what is the next step I should take with it?
Peter
Here is how I think the feature vector is computed:
You have 3 x 3 = 9 rectangles.
Each pixel is essentially 3 numbers, 1 for each of the Red, Green, and Blue color channels.
For each rectangle you compute the mean for the red, green, and blue colors for all the pixels in that rectangle. This gives you 3 numbers for each rectangle.
In total, you have 9 (rectangles) x 3 (mean for R, G, B) = 27 numbers.
Simply concatenate these 27 numbers into a single 27 by 1 (often written as 27 x 1) vector. That is 27 numbers grouped together. This vector of 27 numbers is the feature vector X that represents the color statistics of your photo. In the code, if you are using C++, this will probably be an array of 27 numbers or perhaps even an instance of the (aptly named) vector class. You can think of this feature vector as a "summary" of what the color in the photo is like. Roughly, things look like this: [R1, G1, B1, R2, G2, B2, ..., R9, G9, B9] where R1 is the mean/average of the red values in the first rectangle and so on.
I believe step 2 involves some form of comparing these feature vectors so that those with similar feature vectors (and hence similar color) will be placed together. Comparison will likely involve the use of the Euclidean distance (see here), or some other metric, to compare how similar the feature vectors (and hence the photos' color) are to each other.
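As a rough sketch of the steps above, the following Python builds the 27-component vector and compares two vectors with the Euclidean distance. The image representation (a list of rows of `(r, g, b)` tuples) and the function names are assumptions made for the example; a real script would read the pixels from an image library.

```python
import math


def feature_vector(pixels, grid=3):
    """Return [R1, G1, B1, ..., R9, G9, B9] for a grid x grid partition.

    pixels is a list of rows, each row a list of (r, g, b) tuples.
    The image must be at least grid x grid pixels.
    """
    height, width = len(pixels), len(pixels[0])
    vector = []
    for gy in range(grid):
        y0, y1 = gy * height // grid, (gy + 1) * height // grid
        for gx in range(grid):
            x0, x1 = gx * width // grid, (gx + 1) * width // grid
            cell = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            # Mean of each channel over all pixels in this rectangle.
            for channel in range(3):
                vector.append(sum(p[channel] for p in cell) / len(cell))
    return vector


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


if __name__ == "__main__":
    black = (0, 0, 0)
    image = [[black] * 6 for _ in range(6)]
    for y in range(2):
        for x in range(2):
            image[y][x] = (255, 0, 0)  # top-left rectangle is pure red
    print(feature_vector(image)[:6])
```

A smaller distance between two vectors means the two images have a more similar layout of colors.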
Lastly, as Anony-Mousse suggested, converting your pixels from RGB to HSB/HSV color would be preferable. If you use OpenCV or have access to it, this is a one-liner. Otherwise the wiki pages on HSV etc. will give you the formulas to perform the conversion.
Hope this helps.
Instead of using RGB, you might want to use HSB space. It gives better results for a wide variety of use cases. Put more weight on hue to get better color matches for photos, or on brightness when composing high-contrast images (logos etc.).
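One way to read the weighting suggestion is a weighted distance in HSV space. The sketch below uses Python's standard `colorsys` module; the function name and the default weights are illustrative assumptions, not values taken from this answer.

```python
import colorsys


def hsv_distance(rgb_a, rgb_b, weights=(2.0, 1.0, 1.0)):
    """Weighted distance between two 0-255 RGB colors, measured in HSV.

    weights = (hue, saturation, value). The defaults favor hue and are
    only an example; tune them for your own images.
    """
    h1, s1, v1 = colorsys.rgb_to_hsv(*(c / 255 for c in rgb_a))
    h2, s2, v2 = colorsys.rgb_to_hsv(*(c / 255 for c in rgb_b))
    # Hue is an angle, so 0.99 and 0.01 are close: take the short way round
    # and rescale so the largest possible hue difference is 1.
    dh = abs(h1 - h2)
    dh = min(dh, 1 - dh) * 2
    wh, ws, wv = weights
    return (wh * dh ** 2 + ws * (s1 - s2) ** 2 + wv * (v1 - v2) ** 2) ** 0.5


if __name__ == "__main__":
    print(hsv_distance((255, 0, 0), (255, 0, 10)))   # two reds: small
    print(hsv_distance((255, 0, 0), (0, 255, 255)))  # red vs cyan: large
```

For high-contrast images the same function could be called with a larger third weight, for example `weights=(1.0, 1.0, 3.0)`.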
I have never heard of antipole clustering. But the obvious next step would be to put all the images you have into a large index. Say, an R-Tree. Maybe bulk-load it via STR. Then you can quickly find matches.
Maybe it means vector quantization (VQ). In VQ the image isn't subdivided into rectangles but into density areas. Then you can take the mean point of each cluster. First you need to take all colors and pixels separately and convert them to vectors with XY coordinates. Then you can use a density clustering such as Voronoi cells and get the mean point. You can compare this point with other pictures in the database. Read about VQ here: http://www.gamasutra.com/view/feature/3090/image_compression_with_vector_.php
How to compute the vector from adjacent pixels:
d(x) = I(x+1,y) - I(x,y)
d(y) = I(x,y+1) - I(x,y)
Here's another link: http://www.leptonica.com/color-quantization.html.
Update: Once you have computed the mean color of each thumbnail, you can sort all the mean colors in an RGB map and use the formula I gave you to compute the vector x. Now that you have a vector for each of your thumbnails, you can use the antipole tree to search for a thumbnail. This is possible because the antipole tree is something like a kd-tree and subdivides the 2D space. Read about the antipole tree here: http://matt.eifelle.com/2012/01/17/qtmosaic-0-2-faster-mosaics/. Maybe you can ask the author and download the source code?

Datagridview excessive memory usage

I have an unbound datagridview with 175 columns and 50,000 rows, populated primarily with doubles. According to my calculations, this equates to a memory usage of 175*50000*8 bytes = 70 MB. However, Task Manager says the grid is using about 1.2 GB of memory - a 17x overhead! Can anyone explain why it's consuming so much memory?
From the msdn article on scaling the datagridview ( http://msdn.microsoft.com/en-us/library/ha5xt0d9.aspx ) I don't think I'm doing anything flagrantly wrong. I'm not setting styles or contextmenustrips for individual cells. No modifications other than populating the cell values and setting format strings on column level.
I understand that virtual mode or shared rows might decrease memory consumption, but given my above calculations, I don't think it should be necessary. 17x overhead doesn't sound right to me.
Keep in mind that each cell of your DataGridView holds a DataGridViewCell instance, containing about 33 properties. It's more overhead than just a double value.
Your calculation is based on the System.Double containing 8 bytes. There may be 8 bytes in the value of each cell in the underlying System.Data.DataTable, but that does not mean that the same amount of data in the DataGridView is only 8 bytes.
Each and every cell has multiple properties - height, width, borderstyle, bordercolor, etc. Even if these all are at the default values, those default values consume memory.
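To put the numbers from the question in per-cell terms, here is a small arithmetic sketch. It assumes "1.2 GB" means 1.2e9 bytes and attributes all of it to the grid, which is a simplification; the function name is made up for the example.

```python
def grid_overhead(rows, columns, observed_bytes, value_size=8):
    """Return (raw data bytes, observed bytes per cell, overhead factor)."""
    cells = rows * columns
    raw_bytes = cells * value_size
    return raw_bytes, observed_bytes / cells, observed_bytes / raw_bytes


if __name__ == "__main__":
    raw, per_cell, factor = grid_overhead(50_000, 175, 1.2e9)
    print(f"raw data: {raw / 1e6:.0f} MB")
    print(f"observed per cell: {per_cell:.0f} bytes")
    print(f"overhead factor: {factor:.1f}x")
```

With 8.75 million cells, the observed figure works out to roughly 137 bytes per cell, which is a plausible size for an object that carries its own state in addition to an 8-byte value.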

Is the Leptonica implementation of 'Modified Median Cut' not using the median at all?

I'm playing around a bit with image processing and decided to read up on how color quantization worked and after a bit of reading I found the Modified Median Cut Quantization algorithm.
I've been reading the code of the C implementation in Leptonica library and came across something I thought was a bit odd.
Now I want to stress that I am far from an expert in this area, nor am I a math-head, so I am predicting that this all comes down to me not understanding all of it and not that the implementation of the algorithm is wrong at all.
The algorithm states that the vbox should be split along the largest axis and that it should be split using the following logic:
The largest axis is divided by locating the bin with the median pixel
(by population), selecting the longer side, and dividing in the center
of that side. We could have simply put the bin with the median pixel
in the shorter side, but in the early stages of subdivision, this
tends to put low density clusters (that are not considered in the
subdivision) in the same vbox as part of a high density cluster that
will outvote it in median vbox color, even with future median-based
subdivisions. The algorithm used here is particularly important in
early subdivisions, and is useful for giving visible but low
population color clusters their own vbox. This has little effect on
the subdivision of high density clusters, which ultimately will have
roughly equal population in their vboxes.
For the sake of the argument, let's assume that we have a vbox that we are in the process of splitting and that the red axis is the largest. In the Leptonica algorithm, on line 01297, the code appears to do the following
Iterate over all the possible green and blue variations of the red color
For each iteration it adds to the total number of pixels (population) it's found along the red axis
For each red color it sums up the population of the current red and the previous ones, thus storing an accumulated value for each red
note: when I say 'red' I mean each point along the axis that is covered by the iteration, the actual color may not be red but contains a certain amount of red
So for the sake of illustration, assume we have 9 "bins" along the red axis and that they have the following populations
4 8 20 16 1 9 12 8 8
After the iteration of all red bins, the partialsum array will contain the following count for the bins mentioned above
4 12 32 48 49 58 70 78 86
And total would have a value of 86
Once that's done it's time to perform the actual median cut and for the red axis this is performed on line 01346
It iterates over the bins and checks their accumulated sums. And here's the part that throws me off from the description of the algorithm: it looks for the first bin that has a value greater than total/2.
Wouldn't total/2 mean that it is looking for a bin that has a value greater than the average value and not the median? The median for the above bins would be 49.
The use of 43 or 49 could potentially have a huge impact on how the boxes are split, even though the algorithm then proceeds by moving to the center of the larger side of where the matched value was.
Another thing that puzzles me a bit is that the paper specifies that the bin with the median value should be located, but does not mention how to proceed if there is an even number of bins. The median would be the result of (a+b)/2, and it's not guaranteed that any of the bins contains that population count. So this is what makes me think that there are some approximations going on that are negligible because of how the split actually takes place at the center of the larger side of the selected bin.
Sorry if it got a bit long-winded, but I wanted to be as thorough as I could because it's been driving me nuts for a couple of days now ;)
In the 9-bin example, 49 is the number of pixels in the first 5 bins. 49 is the median number in the set of 9 partial sums, but we want the median pixel in the set of 86 pixels, which is 43 (or 44), and it resides in the 4th bin.
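The distinction can be checked with a few lines of Python. This is a sketch of the search as described in the question (first partial sum greater than total/2), not a transcription of the Leptonica source; `median_bin` is a name chosen for the example.

```python
from itertools import accumulate


def median_bin(populations):
    """Return (index of the bin holding the median pixel, partial sums)."""
    partial_sums = list(accumulate(populations))
    total = partial_sums[-1]
    if total == 0:
        raise ValueError("no pixels in any bin")
    for index, running in enumerate(partial_sums):
        # The first bin whose running count passes half the pixels
        # contains the median pixel.
        if running > total / 2:
            return index, partial_sums


if __name__ == "__main__":
    index, sums = median_bin([4, 8, 20, 16, 1, 9, 12, 8, 8])
    print(sums)       # running pixel counts per bin
    print(index + 1)  # 1-based bin number holding the median pixel
```

For the nine bins in the question the partial sums end at 86, half of that is 43, and the first partial sum above 43 is 48, so the search stops at the 4th bin.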
Inspection of the modified median cut algorithm in colorquant2.c of leptonica shows that the actual cut location for the 3d box does not necessarily occur adjacent to the bin containing the median pixel. The reasons for this are explained in the function medianCutApply(). This is one of the "modifications" to Paul Heckbert's original method. The other significant modification is to make the decision of which 3d box to cut next based on a combination of both population and the product (population * volume), thus permitting splitting of large but sparsely populated regions of color space.
I do not know the algo, but I would assume your array contains the population of each red; let's explain this with an example:
Assume you have four gradations of red: A,B,C and D
And you have the following sequence of red values:
AABDCADBBBAAA
To find the median, you would have to sort them according to red value and take the middle:
      median
      v
AAAAAABBBBCDD
Now let's use their approach:
A:6 => 6
B:4 => 10
C:1 => 11
D:2 => 13
13/2 = 6.5 => B
I think the mismatch happened because you are counting the population; the average color would be:
(6*A+4*B+1*C+2*D)/13

How do I make my spatial index use a Level greater than HIGH?

My spatial geography index in SQL Server has the following level definitions.
HIGH LOW LOW LOW
The problem is that all of my points are in a city and thus all of my points are in a single cell at layer 1. As a result the primary filter is looking at all points which means my index efficiency is 0%. I realized that the HIGH grid means that there are 256 cells. How do I instead use 512 cells or 1024 cells? 256 just isn't enough for me.
Take a look at this page for the different levels.
Does anyone know how to get a higher value than HIGH?
You need to use a bounding box (see: http://technet.microsoft.com/en-us/library/bb934196(v=sql.105).aspx for information about bounding boxes).
Without a bounding box: The issue is that SQL Server uses a sub-gridding methodology. The 256 cells together must span the entire space! This means that your HLLL is restricting the number of cells you use. Think about it this way: The LLL portion creates 4096 cells for each of the initial cells. The 256 cells each must be the same size. That means that your high level cells are splitting up too large of an area!
Instead, if you put in a bounding box, the total area covered will be reduced, and the 4096 grids will be smaller, so splitting that into 256 can be sufficient.
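The effect of shrinking the covered area can be illustrated with simple planar arithmetic. The grid sizes (LOW = 4x4, MEDIUM = 8x8, HIGH = 16x16) are SQL Server's documented densities; the extents of 360 and 0.5 units are made-up example values, and the function name is hypothetical.

```python
CELLS_PER_AXIS = {"LOW": 4, "MEDIUM": 8, "HIGH": 16}


def cell_widths(extent, levels):
    """Width of one cell at each grid level for a given total extent."""
    widths = []
    width = extent
    for level in levels:
        width /= CELLS_PER_AXIS[level]
        widths.append(width)
    return widths


if __name__ == "__main__":
    levels = ["HIGH", "LOW", "LOW", "LOW"]
    print(cell_widths(360.0, levels))  # index spans a very wide area
    print(cell_widths(0.5, levels))    # index spans a city-sized box
```

The number of cells is the same in both cases; only the area they divide changes, so a tight box yields far smaller level-1 cells.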

Search image pattern

I need to write a program that does this: given an image (5*5 pixels), I have to count how many images like it exist in another image, which is composed of many other images. That is, I need to search for a given pattern in an image.
The language to use is C. I have to use parallel computing to search at the 4 angles (0°, 90°, 180° and 270°).
What is the best way to do that?
Seems straightforward.
Create 4 versions of the image rotated by 0°, 90°, 180°, and 270°.
Start four threads each with one version of the image.
For all positions from (0,0) to (width - 5, height - 5)
Compare the 25 pixels of the reference image with the 25 pixels at the current position
If they are equal enough using some metric, report the finding.
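A compact sketch of these steps, written in Python rather than C for brevity. It departs from the steps above in two ways, both chosen for the example: it rotates the small pattern instead of the large image (the counts per angle are the same), and it uses exact equality as the metric. Images are assumed to be lists of rows of pixel values.

```python
from concurrent.futures import ThreadPoolExecutor


def rotate_cw(grid):
    """Rotate a square grid 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def count_matches(image, pattern):
    """Count positions where pattern occurs exactly in image."""
    size = len(pattern)
    height, width = len(image), len(image[0])
    count = 0
    for y in range(height - size + 1):
        for x in range(width - size + 1):
            if all(image[y + dy][x:x + size] == pattern[dy]
                   for dy in range(size)):
                count += 1
    return count


def count_all_rotations(image, pattern):
    """Return match counts for the pattern at 0, 90, 180 and 270 degrees."""
    image = [list(row) for row in image]
    rotations = [[list(row) for row in pattern]]
    for _ in range(3):
        rotations.append(rotate_cw(rotations[-1]))
    # One worker per rotation, as in the steps above.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(lambda p: count_matches(image, p), rotations))
```

In standard CPython the four threads do not run CPU-bound code in parallel, so this only shows the structure; in C each rotation would get a real thread.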
Use normalized correlation to determine a match of templates.
@Daniel: Daniel's solution is good for leveraging your multiple CPUs. He doesn't mention a quality metric, so I would like to suggest one that is very common in image processing.
I suggest using normalized correlation[1] as a comparison metric because it outputs a number from -1 to +1: 0 means no correlation, 1 means the two templates are identical, and -1 means the two templates are exactly opposite.
Once you compute the normalized correlation you can test to see if you have found the template by doing either a threshold test or a peak-to-average test[2].
[1 - footnote] How do you implement normalized correlation? It is pretty simple and only has two for loops. Once you have an implementation that is good enough you can verify your implementation by checking to see if the identical image gets you a 1.
[2 - footnote] You do the ratio of the max(array) / average(array_without_peak). Then threshold to make sure you have a good peak to average ratio.
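A minimal sketch of both footnotes in Python, assuming each 5x5 patch has been flattened into a list of 25 grey values. It uses the zero-mean form of normalized correlation; returning 0 for a completely flat patch is a choice made for this example, since the correlation is undefined there.

```python
import math


def normalized_correlation(a, b):
    """Zero-mean normalized correlation of two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0:
        return 0.0  # flat patch: treat as "no correlation"
    return sum(x * y for x, y in zip(da, db)) / denom


def peak_to_average(scores):
    """Ratio of the best score to the mean of all the other scores."""
    peak_index = max(range(len(scores)), key=scores.__getitem__)
    rest = scores[:peak_index] + scores[peak_index + 1:]
    return scores[peak_index] / (sum(rest) / len(rest))


if __name__ == "__main__":
    patch = [0, 10, 20, 30]
    inverted = [255 - p for p in patch]
    print(normalized_correlation(patch, patch))
    print(normalized_correlation(patch, inverted))
```

The peak-to-average ratio is only meaningful when the remaining scores have a positive mean, so a plain threshold on the correlation may be the safer first test.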
There's no need to create the additional three versions of the image, just address them differently or use something like the class I created here. Better still, just duplicate the 5x5 matrix and rotate those instead. You can then linearly scan the image for all rotations (which is a good thing).
This problem will not scale well for parallel processing since the bottleneck is certainly accessing the image data. Having multiple threads accessing the same data will slow it down, especially if the threads get 'out of sync', i.e. one thread gets further through the image than the other threads so that the other threads end up reloading the data the first thread has discarded.
So, the solution I think will be most efficient is to create four threads that scan 5 lines of the image, one thread per rotation. A fifth thread loads the image data one line at a time and passes the line to each of the four scanning threads, waiting for all four threads to complete, i.e. load one line of image, append to five line buffer, start the four scanning threads, wait for threads to end and repeat until all image lines are read.
5 * 5 = 25
At one bit per pixel, 25 bits fit in an integer.
Each image can be encoded as an array of 4 integers.
Iterate over your larger image (hopefully it is not too big),
pulling out all 5 * 5 sub-images, converting each to an array of 4 integers, and comparing.
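One possible reading of this idea, sketched in Python: with black-and-white pixels, each 5x5 window packs into a single 25-bit integer, and the pattern is stored as four integers, one per rotation. That interpretation of the "4 integers", and all function names, are assumptions made for the example.

```python
def encode(window):
    """Pack a square grid of 0/1 pixels into one integer, row by row."""
    value = 0
    for row in window:
        for bit in row:
            value = (value << 1) | bit
    return value


def rotate_cw(grid):
    """Rotate a square grid 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def rotation_codes(pattern):
    """Encode the pattern at 0, 90, 180 and 270 degrees."""
    codes, current = [], [list(row) for row in pattern]
    for _ in range(4):
        codes.append(encode(current))
        current = rotate_cw(current)
    return codes


def count_encoded_matches(image, pattern):
    """Count matches per rotation by comparing integer codes."""
    size = len(pattern)
    codes = rotation_codes(pattern)
    counts = [0] * 4
    for y in range(len(image) - size + 1):
        for x in range(len(image[0]) - size + 1):
            code = encode([row[x:x + size] for row in image[y:y + size]])
            for i, pattern_code in enumerate(codes):
                if code == pattern_code:
                    counts[i] += 1
    return counts
```

Comparing one integer per window replaces 25 pixel comparisons, at the cost of only supporting exact matches on 1-bit images.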
