OpenCV lower color values - c

I was wondering if there is a way to lower the color scheme of an image. Let's say I have an image with a 32-bit color range in RGB. Would it be possible to scale it down to, say, an 8-bit color scheme? This would be similar to a "cartoon" filter in applications like Photoshop, or to changing your screen color depth from 32-bit true color to 256 colors.
Thanks

If you want the most realistic result, take a look at colour quantisation. Basically, you find blocks of pixels with a similar RGB colour and replace them with a single colour. You are trying to minimise both the number of pixels that are changed and the amount each new pixel differs from its original colour - so it's a space parameterisation problem.

Well, you could do convertTo(newimg, CV_8U) to convert it to 8 bits per channel, but that's still 16 million colors. If the image has integer pixel values you can also do val = val / reductionFactor * reductionFactor + reductionFactor / 2 (or some optimization thereof) on each pixel's R, G, and B values for arbitrary reduction factors, or val = (val & mask) + (reductionFactor >> 1) for reduction factors that are a power of two (the parentheses matter because of C's operator precedence).
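For illustration, here is a minimal C sketch of that per-channel reduction (the reductionFactor of 32 and the sample pixel values are just made-up examples, nothing OpenCV-specific):

#include <stdio.h>

/* Posterize one 8-bit channel value: snap it to the centre of its bucket. */
static unsigned char reduce(unsigned char val, int reductionFactor)
{
    return (unsigned char)(val / reductionFactor * reductionFactor + reductionFactor / 2);
}

int main(void)
{
    int reductionFactor = 32;              /* 256 / 32 = 8 levels per channel */
    unsigned char r = 200, g = 130, b = 7; /* an arbitrary example pixel      */

    printf("%d %d %d -> %d %d %d\n", r, g, b,
           reduce(r, reductionFactor),
           reduce(g, reductionFactor),
           reduce(b, reductionFactor));
    return 0;
}

With 8 levels per channel you end up with at most 8 * 8 * 8 = 512 distinct colors instead of 16 million.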

Have you tried the pyramidal Mean Shift filtering example program included in the OpenCV samples? The mention of a "cartoon" filter reminded me of it - the colors are flattened and subtle shades are merged, reducing the number of colors present.
The reduction is based on a threshold and some experimentation should surely get satisfactory results.


What is the difference between Static HDR and dynamic HDR?

HDR (high dynamic range) is widely used in video devices to provide a better viewing experience.
What is the difference between static HDR and dynamic HDR?
Dynamic HDR can achieve higher HDR media quality across a variety of displays.
The following presentation: SMPTE ST 2094 and Dynamic Metadata summarizes the subject of Dynamic Metadata:
Dynamic Metadata for Color Volume Transforms (DMCVT)
- Can preserve the creative intent in HDR media across a variety of displays
- Carried in files, video streams, packaged media
- Standardized in SMPTE ST 2094
It all starts with digital quantization.
Assume you need to approximate the numbers between 0 and 1,000,000 using only 1000 possible values.
Your first option is uniform quantization:
Values in the range [0, 999] are mapped to 0, values in [1000, 1999] are mapped to 1, values in [2000, 2999] are mapped to 2, and so on...
When you need to restore the original data, you can't restore it exactly, so you pick the value with the minimal average error:
0 is mapped back to 500 (the center of the range [0, 999]).
1 is mapped back to 1500 (the center of the range [1000, 1999]).
When you restore the quantized data, you lose a lot of information.
The information you lose is called "quantization error".
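In code, that uniform quantize/restore round trip looks like this (a small C sketch using the same made-up 0..1,000,000 range and 1000 levels):

#include <stdio.h>

int main(void)
{
    long step     = 1000;             /* 1,000,000 range / 1000 levels           */
    long original = 123456;           /* an arbitrary sample value               */

    long quantized = original / step;            /* 123456 -> 123                */
    long restored  = quantized * step + step/2;  /* 123 -> 123500 (bucket centre) */

    printf("original %ld, quantized %ld, restored %ld, error %ld\n",
           original, quantized, restored, restored - original);
    return 0;
}

Here the round trip gives 123500 back instead of 123456; that difference of 44 is the quantization error.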
Common HDR video uses 10 bits per color component (10 bits for the Y component, 10 bits for U and 10 bits for V), or 10 bits each for red, green and blue in RGB color space.
10 bits can store 1024 possible values (values in range [0, 1023]).
Assume you have a very good monitor that can display 1,000,001 different brightness levels (0 is darkest and 1000000 is the brightest).
Now you need to quantize the 1,000,001 levels to 1024 values.
Since the response of the human visual system to brightness level is not linear, the uniform quantization illustrated above, is sub-optimal.
The quantization to 10 bits is performed after applying a gamma function.
Example of a gamma function: divide each value by 1,000,000 (the new range is [0, 1]), compute the square root of each value, and multiply the result by 1,000,000.
Apply the quantization after the gamma function.
The result is that you keep more accuracy in the darker values, at the expense of the brighter values.
The monitor does the opposite operation (de-quantization, then the inverse gamma).
Performing the quantization after applying the gamma function results in better quality for the human visual system.
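A sketch of that idea in C, using the square-root "gamma" from the example (again with the made-up 0..1,000,000 range); because of the square root, dark values get far more quantization steps than bright ones:

#include <stdio.h>
#include <math.h>

int main(void)
{
    double maxLevel = 1000000.0;
    double levels   = 1024.0;                 /* 10 bits                         */
    double value    = 1000.0;                 /* a dark sample value             */

    /* Encode: normalize, apply the sqrt "gamma", quantize to 10 bits.           */
    long   code     = (long)(sqrt(value / maxLevel) * (levels - 1) + 0.5);

    /* Decode: de-quantize, apply the inverse gamma (square), de-normalize.      */
    double norm     = (double)code / (levels - 1);
    double restored = norm * norm * maxLevel;

    printf("value %.0f -> code %ld -> restored %.0f\n", value, code, restored);
    return 0;
}

For this dark value the restored result is within a few dozen units of the original, whereas uniform quantization of the same range into 1024 levels would place bucket centres roughly 1000 apart everywhere.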
In reality, square root is not the best gamma function.
There are three types of standard HDR static gamma functions:
HLG - Hybrid Log Gamma
PQ - Perceptual Quantizer
HDR10 - Static Metadata
Can we do better?
What if we could select the optimal "gamma functions" for each video frame?
Example for Dynamic Metadata:
Consider the case where all the brightness levels in the image are in range [500000, 501000]:
Now we can map all the levels to 10 bits, without any quantization.
All we need to do is send 500000 as the minimum level and 501000 as the maximum level in the image metadata.
Instead of quantization, we can just subtract 500000 from each value.
The monitor that receives the image reads the metadata and knows to add 500000 to each value - so there is perfect data reconstruction (no quantization errors).
Assume the levels of the next image are in the range 400000 to 401000, so we need to adjust the metadata (dynamically).
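A toy C sketch of that per-frame metadata idea (the struct and field names here are made up purely for illustration):

#include <stdio.h>

/* Made-up metadata: the sender measures this frame's brightness range. */
struct FrameMeta { long minLevel; long maxLevel; };

int main(void)
{
    struct FrameMeta meta = { 500000, 501000 };   /* this frame's range     */
    long level = 500437;                          /* a pixel brightness     */

    long sent     = level - meta.minLevel;        /* fits easily in 10 bits */
    long restored = sent + meta.minLevel;         /* the display adds it back */

    printf("sent %ld, restored %ld (lossless)\n", sent, restored);
    return 0;
}

The next frame simply ships a different minLevel/maxLevel pair - that is the "dynamic" part.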
DMCVT - Dynamic Metadata for Color Volume Transform
The true math of DMCVT is much more complicated than the example above (and much more than quantization), but it's based on the same principle: adjusting the metadata dynamically according to the scene and the display can achieve better quality compared to a static gamma (or static metadata).
In case you are still reading...
I am really not sure that the main advantage of DMCVT is reducing the quantization errors.
(It was just simpler to give an example of reducing the quantization errors).
Reducing the conversion errors:
Accurate conversion from the digital representation of the input (e.g. BT.2100) to the optimal pixel value of the display (like the RGB voltage of the pixel) requires "heavy math".
The conversion process is called Color Volume Transformation.
Displays replace the heavy computation with mathematical approximations (using look-up tables and interpolation, I suppose).
Another advantage of DMCVT, is moving the "heavy math" from the display to the video post-production process.
The computational resources in the video post-production stage are orders of magnitude higher than the display's resources.
In the post-production stage, the computers can calculate metadata that helps the display perform a much more accurate Color Volume Transformation (with fewer computational resources), and reduce the conversion errors considerably.
Why does "HDR static gamma functions" called static?
Opposed to DMCVT, the static gamma functions are fixed across the entire movie, or fixed (pre-defined) across the entire "system".
For example: Most PC systems (PC and monitors) are using sRGB color space (not HDR).
The sRGB standard uses a fixed gamma function: a short linear segment near black followed by a power curve close to gamma 2.2.
Both the PC system and the display know in advance that they are working in the sRGB standard, and know that this is the gamma function being used (without adding any metadata, or adding just one byte of metadata that marks the video data as sRGB).
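For reference, the sRGB encoding curve looks like this in C (the constants are the ones from the sRGB specification):

#include <math.h>

/* Encode a linear light value (0.0 .. 1.0) with the sRGB transfer function. */
double srgb_encode(double linear)
{
    if (linear <= 0.0031308)
        return 12.92 * linear;              /* linear segment near black       */
    return 1.055 * pow(linear, 1.0 / 2.4) - 0.055;
}

For example, srgb_encode(0.18) is about 0.46, which shows how the curve spends more of the code range on the darker values.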

RGB 0-1 nomenclature

Stoopid question time!
RGB colours have three values (red, green and blue, ranging from 0 to 255). If those values are ranged from 0 to 1, what is the name for this colourspace*?
Is it RGB 0-1? RGB digital? Unreal RGB?
*and if alpha channel is included RGBA 0-1.
Unfortunately there is no real nomenclature. And sometimes the same word can be interpreted differently according to which book you studied (computer graphics, TV broadcasting, digital video formats, photography).
First: RGB is not a colour space but a colour model, so it gives just an idea of how colours are made, but nothing precise.
When using RGB, usually we mean either linear RGB or gamma-corrected RGB (R'G'B'). Linear indicates the intensity of light; gamma-corrected follows more closely how we perceive colours. So a half grey (which looks halfway between white and black) is around 18% in linear RGB, or 50% in a gamma-corrected space.
Then we have colour spaces, like sRGB. In a colour space we define the chromaticities (of R, G, and B) and the chromaticity of white. [Usually the chromaticities are given as x,y coordinates in CIE xyY.] Rec.709 (HDTV) has the same chromaticities of R, G, and B as sRGB, but a different gamma.
Often a colour space defines various other characteristics. sRGB defines values from 0 to 255 (originally), so one byte, always gamma-corrected. Previously it was common to have 100 or 1.0 as the value for white (a triplet of such values). Note: values above 1.0 or 100 could be valid; on old analogue TV we could have such values (limited to part of the screen and to a limited number of frames, but still allowed).
In digital signals (e.g. HDMI), we have either full-range RGB (0 to 255) or limited-range RGB (16 to 235).
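A small C sketch of those range conventions (nothing library-specific; 16-235 with a 219-step span is the standard limited-range convention):

#include <stdio.h>

int main(void)
{
    int limited = 235;                          /* limited-range white            */

    int    full = (limited - 16) * 255 / 219;   /* 16-235  -> 0-255               */
    double norm = full / 255.0;                 /* 0-255   -> 0.0-1.0             */

    printf("limited %d -> full %d -> normalized %.3f\n", limited, full, norm);
    return 0;
}

In practice you would also clamp inputs outside 16-235, since limited-range signals may carry under/overshoot values.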
I do not know a good nomenclature, but usually it is obvious from context. In general: linear RGB has (0.0, 0.0, 0.0) for black and (1.0, 1.0, 1.0) for white, as floating-point numbers (half or single precision). Floating-point numbers are already a sort of exponential representation, like gamma correction, while the data stays linear: adding light is just an addition (and it doesn't give unwanted colour casts).
In non-linear colour spaces [gamma corrected] we tend to use integers, often 0 to 255 (or 0 to 1023 in 10-bit); this is the colour depth, sometimes given as a total, sometimes as bits per channel. So if you have a colour depth, you are working with integers, with values from 0 to 2^channel_depth - 1.
It is always good to specify the values, also because the lack of nomenclature often causes confusion. You see many problems caused by people not realizing they have full-range/limited-range images on HDMI signals.
My take: write "8-bit sRGB" (or 24-bit) for 0 to 255, and analogously for 10-bit, etc. Write "linear RGB" (or anything with floating-point constants, or where you also see R'G'B' in the text) for 0 to 1.0. If you also see YCC somewhere, start worrying, because you never know if you have full range or limited range. The 0 to 100 convention is found in textbooks (especially old ones), but in that case I prefer to add a % sign, which automatically gives a value from 0 to 1.0.
But as with file formats, somewhere you should describe your colour space fully: the chromaticities (e.g. by referring to sRGB, DCI-P3, Apple P3, AdobeRGB, etc.), which gamma correction you use (there are various functions; Apple on old hardware used a different one), and I would state black as 0,0,0 and white as 1.0,1.0,1.0 (possibly with the allowed range, e.g. negative numbers for colours out of gamut, or values above 1.0 for ultraluminous values), and the precision (16-bit [half precision] or 32-bit [single precision] per channel, or the classic 8-bit, 10-bit, 12-bit in the case of integers). You need to do this only once, but it is better to be explicit, especially considering that people in different fields have different expectations.
RGB is a color space in which colors are defined as proportions of its components, so RGB colors are 3 values from 0 to 1. True color is sometimes referred to as RGB, but that's not technically correct, since it's a combination of a color space (RGB) and a color depth (3 × 8 bits).
RGB 0-1 is fine, but RGB digital is more fitting for the 0-255 range in my opinion, and RGB unreal is really ill-named since it uses real numbers instead of integers for color representation.
With 0 and 1 as RGB values you get these combinations:
000
001
010
100
110
101
011
111
Eight different colors: black, then blue, green, red, then yellow, magenta, cyan, then white (in the order of the RGB bit patterns listed above).
If you double them with a bold attribute, you get the 16 classic Linux console colors. And I guess that colorspace existed long before: it is what you get when you start with 3 "base" colors and mix them.
A bit of number magic also: 2^3 = 2*3 + 1 + 1.
8 minus black minus white is 6 colors, in two groups.

Dividing a Bitmap for Parallel Processing

How would I approach dividing a bitmap into segments to use for parallel processing? I already have the height and width of the bitmap, but where do I go from here? I've read to use MPI_Cart_shift() and MPI_Sendrecv(); however, I'm unsure how to approach using them.
width = BMP_GetWidth (bmp);
height = BMP_GetHeight (bmp);
new_bmp = BMP_Create(width, height, 24); // BMP_Create(UINT width, UINT height, USHORT depth)
How I'd approach dividing a bitmap into segments for use with parallel processing depends on what type of processing is being done.
Your tags (but not your question) mention Gaussian blur, so that's probably a good place to start.
For Gaussian blur, each output pixel depends on lots of input pixels and nothing else. If each processor has a (read only) copy of all input pixels then you can split the work however you like, but "banding" would work best. Specifically, if there are N processors the first processor would find the first group of "total_pixels/N" output pixels (which is likely to be a band of pixels at the top of the image), the second processor would do the second group of "total_pixels/N" output pixels (likely to be a band of pixels just below the first band), etc. Once all the processors are done you'd just append the output pixels from each processor in the right order to get the whole output bitmap.
Note that (due to rounding) some processors may need to do a different number of pixels - e.g. if the bitmap has 10000 pixels and you have 64 processors, then "10000/64 = 156.25" but a processor can't do a quarter of a pixel, so you end up with 48 processors doing 156 pixels while 16 processors do 157 pixels ("48*156 + 16*157 = 10000").
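A C sketch of that split (the variable names are just for illustration): each processor gets either floor or ceil of total_pixels/N pixels, with the remainder spread over the first few ranks.

#include <stdio.h>

int main(void)
{
    int totalPixels = 10000;            /* example numbers from above          */
    int nProcs      = 64;

    int base  = totalPixels / nProcs;   /* 156                                 */
    int extra = totalPixels % nProcs;   /* 16 ranks do one extra pixel         */

    int start = 0;
    for (int rank = 0; rank < nProcs; rank++)
    {
        int count = base + (rank < extra ? 1 : 0);
        /* processor 'rank' handles output pixels [start, start + count)       */
        if (rank < 3 || rank == nProcs - 1)
            printf("rank %2d: start %5d, count %d\n", rank, start, count);
        start += count;
    }
    return 0;
}

This reproduces the "48 processors doing 156 pixels, 16 doing 157" split mentioned above.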
Also, if processors may be different speeds and/or different latencies, you might want to split the work into more pieces (e.g. if there are 64 processors split the work into 128 pieces, where slower processors might only do 1 piece while faster processors might do 4 pieces).
If the processors don't already have a copy of all input pixels (and if there's no shared memory) then you can send each processor a fraction of all pixels. For example, if you have a Gaussian matrix that's 7 rows high (3 rows above the output position, one row at the output position and 3 rows below the output position), and if each processor outputs a band of 100 rows of pixels, then you'd send each processor a "3+100+3 = 106" band of input pixels to work on (except for the processors that do the first band and last band, which would only get "3+100" or "100+3" rows of input pixels).
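Continuing the sketch for the Gaussian-blur case with a 7-row kernel (so a 3-row "halo" above and below; the names are again made up): the input rows a processor needs are its output rows padded by the halo and clamped at the image edges.

#include <stdio.h>

int main(void)
{
    int imageRows = 1000, bandRows = 100, halo = 3;   /* 7-row kernel => halo of 3 */

    for (int band = 0; band < imageRows / bandRows; band++)
    {
        int outFirst = band * bandRows;               /* first output row of band  */
        int outLast  = outFirst + bandRows - 1;       /* last output row of band   */

        int inFirst  = (outFirst - halo < 0) ? 0 : outFirst - halo;
        int inLast   = (outLast + halo > imageRows - 1) ? imageRows - 1 : outLast + halo;

        if (band < 2 || band == imageRows / bandRows - 1)
            printf("band %2d: output rows %d-%d, needs input rows %d-%d (%d rows)\n",
                   band, outFirst, outLast, inFirst, inLast, inLast - inFirst + 1);
    }
    return 0;
}

Interior bands need 3+100+3 = 106 input rows, while the first and last bands only need 103, matching the description above; those extra rows are what you would ship around with MPI_Sendrecv().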
For something like (e.g.) Floyd–Steinberg dithering things get a lot more complicated because one output pixel depends on all previous output pixels (in addition to the input pixels). In this case you can split the "3 colour" bitmap into three separate monochrome bitmaps (one for each processor, for up to 3 processors) and each processor can dither its monochrome bitmap, and then you can merge the three resulting monochrome bitmaps back together to get a single "3 colour" output bitmap; but it's virtually impossible to use more than 3 processors (without changing to a different dithering algorithm that's more suited to parallelisation).
For drawing one circle or one ellipse, you might get each processor to draw an arc and combine the arcs; for drawing 1234 shapes you might split the image into a grid and get each processor to do a tile within the grid.

Plotting a frequency response from biquad filter

This is a hard one and although I can think of a few kludge methods of doing it, I have a feeling there is a clean mathematical method, although I am having difficulty inventing it myself.
I have a number of parameters which control (software) biquad filters for audio. Essentially there are just 3 parameters: frequency, gain and Q (or bandwidth). In audio terms, the frequency represents the center frequency of the filter. The gain represents whether this frequency is boosted or cut (a gain of 0 results in no change to the audio passing through the filter). Q represents the width of the filter - i.e. a very wide (low Q) filter might affect frequencies far away from the center frequency, whereas a narrow (high Q) filter will only affect frequencies close to the center frequency. The filters take the form of a bell curve, or at least that's an approximation; whether it's mathematically accurate I am not sure.
I want to display the characteristics of these filters graphically - a graph of gain against frequency. There are several of these filters applied to the audio channel, and I want to be able to add the different result graphs to produce an overall graph (i.e. a graph summing all the components of the combined filters). But I also want to be able to access the individual filters' graphs.
I can handle adding the component graphs into a single 'total' graph, but how to produce the original x-y graph from the filter parameters escapes me. I will draw bitmaps, so all I need is to be able to create arrays of the form frequency[x] = y. I'm doing this in C, so I don't have the mathematical tools of MATLAB etc. So I might have a filter with a center frequency of, say, 1000 (Hz), a gain of, say, 20 (dB or linear - I understand how to convert that), and a Q of, say, 3. The Q factor is relative and does not have to be exactly mathematically correct if that causes any complication.
It seems like a quite simple mathematical function, but maths is not my strong point and I don't know enough - I have been messing around with sine functions etc. but it's not working, and I suspect I am probably wasting processing power by over-complicating the maths (although I might be wrong there).
TIA, Pete
I have my doubts about the relationships between biquad filters, Q values, and bell curves. But I'll put those aside and just tell you how to draw a bell curve, since that's what you asked.
From the Wikipedia article on Gaussian functions, the equation for a bell curve is
f(x) = a * exp( -(x - b)^2 / (2c^2) ) + d
where for your application
a corresponds to the gain
b determines the center frequency
2c^2 is related to Q (larger values will make the curve wider)
d is a constant offset (the baseline)
The C code below computes a sample bell curve. For this example, the numbers were chosen based on drawing into a window that is 250 pixels wide by 200 pixels high, with a coordinate system where the origin {0,0} is at the bottom left corner.
#include <math.h>   // for exp()

int main(void)
{
    int width = 250;
    int height = 200;
    int bellCurve[width];   // the output array that holds the f(x) values
    double gain = 180;      // the 'a' value, determines how high the peak is from the baseline
    double offset = 10;     // the 'd' value, determines the y coordinate of the baseline
    double qFactor = 1000;  // the '2c^2' value, determines how fat the curve is
    double center = 100;    // the 'b' value, determines the x coordinate of the peak
    double dx;

    for ( int x = 0; x < width; x++ )
    {
        dx = x - center;
        bellCurve[x] = (int)( gain * exp( -( dx * dx ) / qFactor ) + offset );
    }
    return 0;
}
Plotting the curve gives a bell-shaped image where the peak is at x = 100, y = 10 + 180 = 190.
You could input a unit impulse (an array of all zeros, except one element=1.0) into your digital filters, treating them as black boxes. Then FFT the impulse response output array to get the frequency response of the filter. Plotting the magnitude of the complex frequency samples will give you a pretty picture. Python+numpy+matplotlib would probably be an easier way to go about it. You will need to know the sampling period to get meaningful plots.
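A C sketch of that approach (the coefficients here are placeholders - plug in your own biquad's values): run a unit impulse through the difference equation, then hand the resulting array to any FFT routine (not shown).

#include <stdio.h>

#define N 512   /* length of the impulse response to capture */

int main(void)
{
    /* Placeholder biquad coefficients (Direct Form I, a0 normalized to 1). */
    double b0 = 1.0, b1 = 0.0, b2 = 0.0, a1 = 0.0, a2 = 0.0;

    double h[N];
    double xm1 = 0, xm2 = 0, ym1 = 0, ym2 = 0;   /* previous inputs/outputs */

    for (int n = 0; n < N; n++)
    {
        double x = (n == 0) ? 1.0 : 0.0;                       /* unit impulse          */
        double y = b0*x + b1*xm1 + b2*xm2 - a1*ym1 - a2*ym2;   /* biquad difference eq. */
        h[n] = y;
        xm2 = xm1; xm1 = x;
        ym2 = ym1; ym1 = y;
    }

    /* h[] now holds the impulse response; FFT it and plot the magnitudes. */
    printf("h[0] = %f\n", h[0]);
    return 0;
}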
What you really want is the Bode plot of the filter. This is non-trivial to calculate yourself, and a cursory search for a C library to do it for you yielded nothing. If accuracy is not important, you can approximate the shape once and stretch it based on the parameters of the particular filter. For example, you might have a normalized array of relative values and construct a new curve (array) based on the parameters of the filter and the base curve you generated earlier.
The base curve could be generated from MATLAB if you can or Wolfram Alpha or something like that.
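If computing it from the actual filter math is acceptable, here is a hedged C sketch. It assumes the biquad coefficients come from the widely used "Audio EQ Cookbook" peaking-EQ formulas (an assumption about your filters, not something stated in the question) and evaluates the magnitude of H(z) on the unit circle at each plotted frequency.

#include <math.h>
#include <stdio.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

int main(void)
{
    double fs = 44100.0;            /* sample rate                        */
    double f0 = 1000.0;             /* centre frequency (Hz)              */
    double gainDb = 20.0;           /* boost/cut in dB                    */
    double Q = 3.0;

    /* Peaking-EQ coefficients (Audio EQ Cookbook style), normalized by a0. */
    double A     = pow(10.0, gainDb / 40.0);
    double w0    = 2.0 * M_PI * f0 / fs;
    double alpha = sin(w0) / (2.0 * Q);
    double a0 = 1.0 + alpha / A;
    double b0 = (1.0 + alpha * A) / a0;
    double b1 = (-2.0 * cos(w0)) / a0;
    double b2 = (1.0 - alpha * A) / a0;
    double a1 = (-2.0 * cos(w0)) / a0;
    double a2 = (1.0 - alpha / A) / a0;

    /* Evaluate |H(e^jw)| at a few frequencies; do this once per pixel column for a plot. */
    for (double f = 100.0; f <= 10000.0; f *= 2.0)
    {
        double w   = 2.0 * M_PI * f / fs;
        double cr  = cos(w),     ci  = -sin(w);        /* z^-1 = e^-jw          */
        double c2r = cos(2 * w), c2i = -sin(2 * w);    /* z^-2 = e^-2jw         */

        double numR = b0 + b1 * cr + b2 * c2r,  numI = b1 * ci + b2 * c2i;
        double denR = 1.0 + a1 * cr + a2 * c2r, denI = a1 * ci + a2 * c2i;

        double mag = sqrt((numR * numR + numI * numI) / (denR * denR + denI * denI));
        printf("%7.1f Hz: %6.2f dB\n", f, 20.0 * log10(mag));
    }
    return 0;
}

To fill your frequency[x] = y arrays, map each pixel column x to a frequency on a log axis and store 20*log10(mag) (or the linear magnitude) for that frequency.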
Here is one in JavaScript:
http://www.earlevel.com/main/2013/10/13/biquad-calculator-v2/
The filter you describe is the 'peak' filter. Use a log scale to display frequencies.
—Tom

Antipole Clustering

I made a photo mosaic script (PHP). This script takes one picture and changes it into a photo built up of little pictures. From a distance it looks like the real picture; when you move closer you see that it is all little pictures. I take a square of a fixed number of pixels and determine the average color of that square. Then I compare this with my database, which contains the average color of a couple of thousand pictures. I determine the color distance to all available images. But running this script fully takes a couple of minutes.
The bottleneck is matching the best picture with a part of the main picture. I have been searching online for ways to reduce this and came across "Antipole Clustering." Of course I tried to find some information on how to use this method myself, but I can't seem to figure out what to do.
There are two steps: 1. database acquisition and 2. photomosaic creation.
Let's start with step one; when this is all clear, maybe I'll understand step 2 myself.
Step 1:
partition each image of the database into 9 equal rectangles arranged in a 3x3 grid
compute the RGB mean values for each rectangle
construct a vector x composed by 27 components (three RGB components for each rectangle)
x is the feature vector of the image in the data structure
Well, points 1 and 2 are easy, but what should I do at point 3? How do I compose a vector x out of the 27 components (9 × R mean, G mean, B mean)?
And when I succeed in composing the vector, what is the next step I should take with it?
Peter
Here is how I think the feature vector is computed:
You have 3 x 3 = 9 rectangles.
Each pixel is essentially 3 numbers, 1 for each of the Red, Green, and Blue color channels.
For each rectangle you compute the mean for the red, green, and blue colors for all the pixels in that rectangle. This gives you 3 numbers for each rectangle.
In total, you have 9 (rectangles) x 3 (mean for R, G, B) = 27 numbers.
Simply concatenate these 27 numbers into a single 27 by 1 (often written as 27 x 1) vector, that is, 27 numbers grouped together. This vector of 27 numbers is the feature vector x that represents the color statistics of your photo. In code, if you are using C++, this will probably be an array of 27 numbers, or perhaps even an instance of the (aptly named) vector class. You can think of this feature vector as a form of "summary" of what the color in the photo is like. Roughly, things look like this: [R1, G1, B1, R2, G2, B2, ..., R9, G9, B9] where R1 is the mean/average of red pixels in the first rectangle, and so on.
I believe step 2 involves some form of comparing these feature vectors so that those with similar feature vectors (and hence similar color) will be placed together. Comparison will likely involve the use of the Euclidean distance (see here), or some other metric, to compare how similar the feature vectors (and hence the photos' color) are to each other.
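Here is a small sketch of steps 3-4 and the comparison in C (the function names and the way I lay out the 3x3 grid are my own choices, not part of the antipole paper):

#include <math.h>

#define RECTS    9                /* 3 x 3 grid            */
#define FEATURES (RECTS * 3)      /* 27 numbers per image  */

/* Build the feature vector x = {R1,G1,B1, R2,G2,B2, ..., R9,G9,B9}
   from the per-rectangle mean values computed in steps 1-2. */
void build_feature_vector(const double meanR[RECTS], const double meanG[RECTS],
                          const double meanB[RECTS], double x[FEATURES])
{
    for (int r = 0; r < RECTS; r++)
    {
        x[3 * r + 0] = meanR[r];
        x[3 * r + 1] = meanG[r];
        x[3 * r + 2] = meanB[r];
    }
}

/* Euclidean distance between two feature vectors: the smaller the value,
   the more similar the colour statistics of the two images. */
double feature_distance(const double a[FEATURES], const double b[FEATURES])
{
    double sum = 0.0;
    for (int i = 0; i < FEATURES; i++)
    {
        double d = a[i] - b[i];
        sum += d * d;
    }
    return sqrt(sum);
}

Compute x once for every thumbnail in your database and once per square of the main picture, then pick the thumbnail with the smallest distance (the antipole tree is just a way to avoid checking every thumbnail).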
Lastly, as Anony-Mousse suggested, converting your pixels from RGB to HSB/HSV color would be preferable. If you use OpenCV or have access to it, this is simply a one-liner. Otherwise the Wikipedia article on HSV etc. will give you the math formula to perform the conversion.
Hope this helps.
Instead of using RGB, you might want to use HSB space. It gives better results for a wide variety of use cases. Put more weight on Hue to get better color matches for photos, or to brightness when composing high-contrast images (logos etc.)
I have never heard of antipole clustering. But the obvious next step would be to put all the images you have into a large index. Say, an R-Tree. Maybe bulk-load it via STR. Then you can quickly find matches.
Maybe it means vector quantization (VQ). In VQ the image isn't subdivided into rectangles but into density areas. Then you can take the mean point of each cluster. First off, you need to take all colors and pixels separately and transfer them into a vector with XY coordinates. Then you can use a density clustering like Voronoi cells and get the mean point. You can then compare this point with other pictures in the database. Read here about VQ: http://www.gamasutra.com/view/feature/3090/image_compression_with_vector_.php.
How to compute the vector from adjacent pixels:
d(x) = I(x+1,y) - I(x,y)
d(y) = I(x,y+1) - I(x,y)
Here's another link: http://www.leptonica.com/color-quantization.html.
Update: When you have already computed the mean color of your thumbnail, you can proceed to sort all the mean colors in an RGB map and use the formula I gave you to compute the vector x. Now that you have a vector for all your thumbnails, you can use the antipole tree to search for a thumbnail. This is possible because the antipole tree is something like a kd-tree and subdivides the space. Read here about the antipole tree: http://matt.eifelle.com/2012/01/17/qtmosaic-0-2-faster-mosaics/. Maybe you can ask the author and download the source code?
