RGB 0-1 nomenclature

Stoopid question time!
RGB colours have three values (red, green, and blue), each ranging from 0 to 255. If those values instead range from 0 to 1, what is the name for this colour space*?
Is it RGB 0-1? RGB digital? Unreal RGB?
*and if alpha channel is included RGBA 0-1.

Unfortunately there is no real nomenclature, and sometimes the same word is interpreted differently depending on which field's books you studied (computer graphics, TV broadcasting, digital video formats, photography).
First: RGB is not a colour space but a colour model, so it only gives an idea of how colours are made; it says nothing precise.
When we say RGB we usually mean either linear RGB or gamma-corrected RGB (R'G'B'). Linear values indicate the intensity of light; gamma-corrected values track more closely how we perceive colours. So a half grey (which looks halfway between white and black) is around 18% in linear RGB, but about 50% in a gamma-corrected space.
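To make the 18% / 50% relationship concrete, here is a tiny sketch using the common gamma-2.2 approximation (not the exact sRGB curve, which has a linear segment near black); the function names are just illustrative:

```python
# Rough illustration only: a pure power law with gamma ~2.2, not the exact
# piecewise sRGB curve (which has a short linear segment near black).
def encode(linear, gamma=2.2):
    return linear ** (1.0 / gamma)   # linear light -> gamma-corrected value

def decode(encoded, gamma=2.2):
    return encoded ** gamma          # gamma-corrected value -> linear light

print(encode(0.18))  # ~0.46: 18% linear grey lands near the middle of the encoded range
print(decode(0.50))  # ~0.22: a mid encoded value is only about 22% linear light
```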
Then we have colour spaces, like sRGB. A colour space defines the chromaticities of R, G, and B and the chromaticity of white. [Usually the chromaticities are given as x,y coordinates in CIE xyY.] Rec.709 (HDTV) shares the same R, G, B chromaticities and white point as sRGB, but uses a different transfer function (gamma).
A colour space often defines further characteristics. sRGB originally defined values from 0 to 255 (one byte per channel), always gamma-corrected. Before that it was common to use 100 or 1.0 as the value for white (as a triplet of such values). Note that values above 1.0 or 100 can be valid: old analogue TV allowed such excursions (limited to part of the screen and to a limited number of frames, but still allowed).
In the digital world (e.g. in HDMI signals) we have full-range RGB, 0 to 255, or limited-range RGB, 16 to 235.
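For illustration, a minimal sketch of the usual scaling between 8-bit full-range and limited-range RGB (ignoring chroma ranges and the finer points of rounding; function names are mine):

```python
# Map an 8-bit full-range value (0-255) to limited/video range (16-235) and back.
def full_to_limited(v):
    return round(16 + v * (235 - 16) / 255)

def limited_to_full(v):
    return round((v - 16) * 255 / (235 - 16))

print(full_to_limited(0), full_to_limited(255))    # 16 235
print(limited_to_full(16), limited_to_full(235))   # 0 255
```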
I do not know a good nomenclature, but usually the intent is obvious. In general, linear RGB uses (0.0, 0.0, 0.0) for black and (1.0, 1.0, 1.0) for white, stored as floating-point numbers (half or single precision). Floating-point numbers are already a sort of exponential representation, a bit like gamma correction, while staying linear: adding light is just an addition (and it doesn't introduce unwanted colour casts).
In non-linear (gamma-corrected) colour spaces we tend to use integers, often 0 to 255, or 0 to 1023 at 10 bits (the colour depth is sometimes given in total, sometimes per channel). So if a colour depth is specified, you are working with integers, with values from 0 to 2**channel_depth - 1.
It is always good to specify the values, precisely because the nomenclature is lacking and confusion is common. You see many problems caused by people not realizing whether an HDMI signal is full-range or limited-range.
My take: write "8-bit sRGB" (or 24-bit) for 0 to 255, and analogously for 10-bit, etc. Write "linear RGB" (or anything with floating-point constants, or where you also see R'G'B' in the text) for 0 to 1.0. If you also see YCC somewhere, start worrying, because you never know whether you have full range or limited range. The 0 to 100 convention is found in (especially older) textbooks, but in that case I prefer to add a % sign, which automatically puts the value in 0 to 1.0.
But as with file formats, somewhere you should describe your colour space fully: the chromaticities (e.g. by referring to sRGB, DCI-P3, Apple P3, AdobeRGB, etc.), which gamma correction you use (there are various functions; Apple used a different one on old hardware), and I would also write black: 0,0,0, white: 1.0,1.0,1.0 (possibly with the allowed range, e.g. negative numbers for colours out of gamut, or values above 1.0 for ultra-bright ones), and the precision (16-bit [half precision] or 32-bit [single precision] per channel, or the classic 8, 10, or 12 bits for integers). You need to do this only once, but it is better to be explicit, especially considering that people in different fields have different expectations.

RGB is a color space in which colors are defined as proportions of its components, so RGB colors are 3 values from 0 to 1. True color is sometimes referred to as RGB, but that's not technically correct, since it's a combination of a color space (RGB) and a color depth (3 × 8 bits).
RGB 0-1 is fine, but in my opinion "RGB digital" fits the 0-255 range better, and "RGB unreal" is really ill-named since it uses real numbers instead of integers for color representation.

With 0 and 1 as RGB values you get these combinations:
000
001
010
100
110
101
011
111
Eight different colors: Black, then blue, green, red, then yellow, magenta, cyan, then white.
(In the order that I listed the rgb color bits in my list)
If you double those with a bold attribute, you get the 16 official Linux console colors. And I guess that colour set existed long before: it is what you get when you start with 3 "base" colors and mix them to get 6.
A bit of number magic, too: 2^3 = 2*3 + 1 + 1.
That is, 8 minus black minus white leaves 6 colors, in two groups.
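For completeness, a quick sketch that enumerates those eight combinations with their conventional names:

```python
from itertools import product

# The eight colours you can make when each of R, G, B is either 0 or 1.
names = {
    (0, 0, 0): "black",  (0, 0, 1): "blue",    (0, 1, 0): "green", (1, 0, 0): "red",
    (1, 1, 0): "yellow", (1, 0, 1): "magenta", (0, 1, 1): "cyan",  (1, 1, 1): "white",
}
for r, g, b in product((0, 1), repeat=3):
    print(f"{r}{g}{b} -> {names[(r, g, b)]}")
```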

Related

What is the difference between Static HDR and dynamic HDR?

HDR (high dynamic range) is widely used in video devices to provide a better viewing experience.
What is the difference between static HDR and dynamic HDR?
Dynamic HDR can achieve higher HDR media quality across a variety of displays.
The following presentation: SMPTE ST 2094 and Dynamic Metadata summarizes the subject of Dynamic Metadata:
Dynamic Metadata for Color Volume Transforms (DMCVT)
- Can preserve the creative intent in HDR media across a variety of displays
- Carried in files, video streams, packaged media
- Standardized in SMPTE ST 2094
It all starts with digital Quantization.
Assume you need to approximate the numbers between 0 and 1,000,000 using only 1000 possible values.
Your first option is using uniform quantization:
Values in range [0, 999] are mapped to 0, range [1000, 1999] are mapped to 1, [2000, 2999] are mapped to 2, and so on...
When you need to restore the original data, you can't restore it exactly, so you pick the value with the minimal average error:
0 is restored as 500 (the centre of the range [0, 999]).
1 is restored as 1500 (the centre of the range [1000, 1999]).
When you restore the quantized data, you lose a lot of information.
The information you lose is called "quantization error".
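A minimal sketch of that uniform quantization (step 1000, restoring each code to the centre of its bin) and the resulting quantization error; the function names are just illustrative:

```python
STEP = 1000  # 1,000,000 possible inputs / 1000 codes

def quantize(v):
    return v // STEP               # code in 0..999

def dequantize(q):
    return q * STEP + STEP // 2    # restore to the centre of the bin

for v in (0, 499, 999, 1500, 123456):
    restored = dequantize(quantize(v))
    print(v, "->", quantize(v), "->", restored, "quantization error:", abs(restored - v))
```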
Common HDR video uses 10 bits per color component (10 bits for the Y component, 10 for U and 10 for V; or 10 bits each for red, green and blue in an RGB color space).
10 bits can store 1024 possible values (values in range [0, 1023]).
Assume you have a very good monitor that can display 1,000,001 different brightness levels (0 is darkest and 1000000 is the brightest).
Now you need to quantize the 1,000,001 levels to 1024 values.
Since the response of the human visual system to brightness is not linear, the uniform quantization illustrated above is sub-optimal.
The quantization to 10 bits is therefore performed after applying a gamma function.
An example of a gamma function: divide each value by 1,000,000 (the new range is [0, 1]), compute the square root of each value, and multiply the result by 1,000,000.
Apply the quantization after the gamma function.
The result: more accuracy is kept in the darker values, at the expense of the brighter values.
The monitor does the opposite operations (de-quantization and the inverse gamma).
Performing the quantization after applying a gamma function results in better quality for the human visual system.
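A small sketch of the square-root example above, showing how the spacing between reconstructed levels stays fine in the shadows and becomes coarse in the highlights after 10-bit quantization (again, the function names are only illustrative):

```python
MAX_LEVEL = 1_000_000
CODES = 1024  # 10 bits

def quantize(v):                       # square-root "gamma", then 10-bit quantization
    return round((v / MAX_LEVEL) ** 0.5 * (CODES - 1))

def dequantize(q):                     # inverse gamma (squaring) back to brightness levels
    return (q / (CODES - 1)) ** 2 * MAX_LEVEL

# Spacing between adjacent reconstructed levels: fine in the shadows, coarse in the highlights.
print(dequantize(1) - dequantize(0))         # ~1 brightness level per code near black
print(dequantize(1023) - dequantize(1022))   # ~1950 brightness levels per code near white
```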
In reality, square root is not the best gamma function.
There are three standard types of static HDR gamma functions / formats:
HLG - Hybrid Log-Gamma
PQ - Perceptual Quantizer
HDR10 - PQ with static metadata
Can we do better?
What if we could select the optimal "gamma functions" for each video frame?
Example for Dynamic Metadata:
Consider the case where all the brightness levels in the image are in range [500000, 501000]:
Now we can map all the levels to 10 bits, without any quantization.
All we need to do is send 500000 as the minimum level and 501000 as the maximum level in the image metadata.
Instead of quantization, we can just subtract 500000 from each value.
The monitor that receives the image reads the metadata and knows to add 500000 back to each value - so there is perfect data reconstruction (no quantization errors).
Assume the levels of the next image are in the range 400000 to 401000; then we need to adjust the metadata (dynamically).
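A toy sketch of that per-frame metadata idea. The field names ("min_level", "max_level") are made up for illustration; the real ST 2094 metadata and math are far more involved:

```python
def encode_frame(levels):
    lo, hi = min(levels), max(levels)
    metadata = {"min_level": lo, "max_level": hi}   # made-up field names
    return [v - lo for v in levels], metadata       # fits in 10 bits whenever hi - lo <= 1023

def decode_frame(codes, metadata):
    return [c + metadata["min_level"] for c in codes]

frame = [500000, 500123, 501000]
codes, metadata = encode_frame(frame)
assert decode_frame(codes, metadata) == frame       # perfect reconstruction, no quantization error
```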
DMCVT - Dynamic Metadata for Color Volume Transform
The true math of DMCVT is much more complicated than the example above (and involves much more than quantization), but it is based on the same principle: adjusting the metadata dynamically according to the scene and the display can achieve better quality compared to a static gamma (or static metadata).
In case you are still reading...
I am really not sure that the main advantage of DMCVT is reducing the quantization errors.
(It was just simpler to give an example of reducing the quantization errors).
Reducing the conversion errors:
Accurate conversion from the digital representation of the input (e.g. BT.2100) to the optimal pixel value of the display (like the RGB voltage of the pixel) requires "heavy math".
The conversion process is called Color Volume Transformation.
Displays replace the heavy computation with mathematical approximations (using look-up tables and interpolation, I suppose).
Another advantage of DMCVT is moving the "heavy math" from the display to the video post-production process.
The computational resources in the video post-production stage are orders of magnitude higher than the display's resources.
In the post-production stage, the computers can calculate metadata that helps the display perform a much more accurate Color Volume Transformation (with fewer computational resources), reducing the conversion errors considerably.
Why does "HDR static gamma functions" called static?
Opposed to DMCVT, the static gamma functions are fixed across the entire movie, or fixed (pre-defined) across the entire "system".
For example: Most PC systems (PC and monitors) are using sRGB color space (not HDR).
The sRGB standard uses the following fixed gamma function for each linear component C in [0, 1]:
C' = 12.92 * C                 if C <= 0.0031308
C' = 1.055 * C^(1/2.4) - 0.055   if C > 0.0031308
Both the PC and the display know in advance that they are working in the sRGB standard, and know that this is the gamma function being used (without adding any metadata, or with just one byte of metadata marking the video data as sRGB).
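For reference, a one-function sketch of that sRGB encoding curve (the function name is mine):

```python
def srgb_encode(c):
    """sRGB transfer function for a linear component c in [0, 1]."""
    if c <= 0.0031308:
        return 12.92 * c
    return 1.055 * c ** (1 / 2.4) - 0.055

print(srgb_encode(0.18))  # ~0.46, the "middle grey" value mentioned in the first answer
```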

Can a pixel be divided into smaller pixels? Or is it possible to have 1.54 pixels instead of 1 or 2?

I am doing an exercise where I have to resize an image by a factor "f". "f" is a float, so I have to consider factors like 1.45, 3.54, and so on. I don't want you to solve the problem for me, but I have some doubts about it.
A pixel is 24 bits in a BMP file, right? Because it is RGB, it has 1 byte for red, 1 byte for green and 1 byte for blue. So how am I supposed to divide a pixel? If the factor is 2.67, for example, how would the 0.67 part work? Dividing a pixel means dividing 3 bytes, but there is a limit to how far I can divide them; also, RGB would disappear, because if I divided a pixel in half I would only have 12 bits, not enough to store RGB.
Also, when I am copying pixel by pixel, is it possible to copy 0.01 of a pixel at a time instead of a whole pixel? That would mean that if it takes me 1 step to copy 1 pixel, copying 0.01 of a pixel at a time would take 100 times as long. It sounds completely weird to me, because copying 0.01 of a pixel at a time means copying a fraction of a byte at a time, and that might mess up the image when resizing (I think).
I have tried with integers, but a for loop will not work with a floating-point factor, because of all the possible values.
I don't think you're being asked to split an individual pixel. It sounds like you're being asked to add or remove pixels when an image is resized. For example, suppose you have an image that is 12 x 12 pixels and you are given a factor of 1.3 to expand by. This gives you a new image size of 15.6 x 15.6, which rounds to 16 x 16.
Then you need to perform a mapping of pixels in the original image to pixels in the resized image. A simple way to do this is to take the x and y coordinates of the larger image and divide them by the scaling factor to get the corresponding coordinates in the smaller image, then copy the whole pixel from the old image to the new one. Given the above example, pixel (13,14) in the larger image corresponds to x = 13/1.3 = 10 and y = 14/1.3 = ~10.77 (rounds to 11), so copy pixel (10,11) in the old image to (13,14) in the new image.
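A minimal sketch of that nearest-neighbour mapping, assuming the image is stored as a list of rows of (R, G, B) tuples (the function name and representation are mine):

```python
def resize_nearest(src, factor):
    """src: list of rows, each row a list of (R, G, B) tuples. Returns the resized image."""
    old_h, old_w = len(src), len(src[0])
    new_h, new_w = round(old_h * factor), round(old_w * factor)
    dst = []
    for y in range(new_h):
        src_y = min(old_h - 1, round(y / factor))    # map back into the source image
        row = []
        for x in range(new_w):
            src_x = min(old_w - 1, round(x / factor))
            row.append(src[src_y][src_x])            # copy a whole pixel, never a fraction
        dst.append(row)
    return dst
```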
dbush was very clear. But you can also build a more refined scaling algorithm using these two observations.
Observation 1
In dbush's example, he expands a 12 x 12 image to 16 x 16 because 15.6 x 15.6 is impossible to make (pixels are discrete units). But by doing this the scale factor is no longer 1.3; it is now 16/12 = 1.3333... So you should use that number when making the adjustments he describes.
Observation 2
In dbush's example, the pixel (13, 14) (counting pixels from 0 to 15, I suppose) is mapped to the point (10, 10.77). Since that pixel doesn't exist, he rounds its coordinates to use (10, 11) instead. But (10, 10.77) is really the coordinate of the upper-left corner of a little rectangle inside the original image. A normal pixel is a square of size 1 x 1, but this little rectangle is a pixel scaled down by the same factor of 1.3: its side is 1/1.3 = 0.77 (approx.), which means its lower-right corner is at approximately (10.77, 11.54).
This little rectangle, which has to be mapped to the new image, has 11 - 10.77 = 0.23 units of its height inside pixel (10, 10) and 11.54 - 11 = 0.54 units of its height inside pixel (10, 11). So the RGB values for the new pixel should be a weighted sum of the RGB values of pixels (10, 10) and (10, 11), using 0.23 and 0.54 (normalized so they sum to 1) as the respective weights. This will give your code the power to scale images by factors smaller than 1.
Notes
I used the word "rectangle" because I was considering the fact that an image could have a horizontal scale factor different from the vertical one. In this particular case the scale was 1.3 both horizontally and vertically.
The weighted sum uses only heights as weights because the little rectangle only intersects 2 pixels along the vertical axis; it happened that along the horizontal axis the little rectangle fell inside a single pixel. But there could be a scenario where the rectangle intersects pixels both horizontally and vertically, or even intersects more than 2 pixels along the same axis. So the weighted sum should be prepared to consider more than 2 pixels per axis, and to use areas instead of widths or heights when both axes are involved for a single rectangle.
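As a sketch of the same weighted-sum idea, here is plain bilinear interpolation (greyscale for brevity; for RGB you would repeat it per channel). It is not the full area-weighted scheme described above, but it shows the mechanics; the function names and image representation are assumptions:

```python
def bilinear_sample(img, fx, fy):
    """Sample a greyscale image (list of rows) at fractional coordinates (fx, fy)."""
    x0, y0 = int(fx), int(fy)
    x1, y1 = min(x0 + 1, len(img[0]) - 1), min(y0 + 1, len(img) - 1)
    wx, wy = fx - x0, fy - y0
    top = img[y0][x0] * (1 - wx) + img[y0][x1] * wx
    bottom = img[y1][x0] * (1 - wx) + img[y1][x1] * wx
    return top * (1 - wy) + bottom * wy              # blend the four surrounding pixels

def resize_bilinear(img, factor):
    new_h, new_w = round(len(img) * factor), round(len(img[0]) * factor)
    return [[bilinear_sample(img, x / factor, y / factor) for x in range(new_w)]
            for y in range(new_h)]
```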
A pixel represents a point on the screen and it is atomic. If you need to resize the image, you need to create an algorithm that increases or decreases the number of rows and columns, so that you never deal with fractional pixels.

Reversible approximation of Cb and Cr components

I am trying to develop a lossless image compression algorithm. I know that YCbCr <-> RGB is practically lossy due to rounding errors; similarly, the Original Reversible Color Transform (ORCT) offers reversibility at the cost of storing an extra bit for the U and V components.
Since U and V are in no way equivalent to Cb and Cr, the compression ratio differs greatly (I believe this is due to the different underlying mix of R, G and B that goes into Cb and Cr).
Furthermore, I know that there exist techniques which require extra bits to accommodate reversibility (e.g. YCoCg-R). However, I have tested YCoCg24, YUV (from ORCT) and GCbCr, but none of them comes close to lossy YCbCr.
My question is: is there some reversible transform which approximates Cb and Cr, since these two components play a vital role in overall compression?
Before anyone blames me for not doing my homework, I should clarify that this question is related to "Lossless RGB to Y'CbCr transformation" and "JPEG: YCrCb <-> RGB conversion precision".
EDIT: to clarify that this is a different question:
My question is: does a transformation exist, that converts three eight-bit integers (representing red, green and blue components) into three other eight-bit integers (representing a colour space similar to Y'CbCr, where two components change only slightly with respect to position, or at least less than in an RGB colour space), and that can be inversed without loss of information?
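For reference, a sketch of the YCoCg-R lifting transform mentioned in the question. It is exactly reversible in integer arithmetic, at the cost (as noted above) of one extra bit for each chroma component; the function names are mine:

```python
def rgb_to_ycocg_r(r, g, b):
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, co, cg            # Y stays in 0..255; Co and Cg need a 9th (sign) bit

def ycocg_r_to_rgb(y, co, cg):
    t = y - (cg >> 1)
    g = cg + t
    b = t - (co >> 1)
    r = b + co
    return r, g, b

# Round-trip check on a few samples: the transform is exactly invertible.
for rgb in [(0, 0, 0), (255, 255, 255), (12, 200, 99), (255, 0, 128)]:
    assert ycocg_r_to_rgb(*rgb_to_ycocg_r(*rgb)) == rgb
```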

Antipole Clustering

I made a photo mosaic script in PHP. The script takes one picture and rebuilds it as a mosaic of little pictures: from a distance it looks like the original picture, but when you move closer you see that it is all little pictures. I take a square of a fixed number of pixels and determine the average color of that square. Then I compare this with my database, which contains the average color of a couple of thousand pictures, computing the color distance to every available image. But running this script from start to finish takes a couple of minutes.
The bottleneck is matching the best picture with a part of the main picture. I have been searching online for ways to speed this up and came across "Antipole Clustering." Of course I tried to find some information on how to use this method myself, but I can't seem to figure out what to do.
There are two steps. 1. Database acquisition and 2. Photomosaic creation.
Let's start with step one; once that is clear, maybe I will understand step 2 myself.
Step 1:
partition each image of the database into 9 equal rectangles arranged in a 3x3 grid
compute the RGB mean values for each rectangle
construct a vector x composed by 27 components (three RGB components for each rectangle)
x is the feature vector of the image in the data structure
Well, points 1 and 2 are easy, but what should I do at point 3? How do I compose a vector x out of the 27 components (9 × R mean, G mean, B mean)?
And once I have composed the vector, what is the next step I should take with it?
Peter
Here is how I think the feature vector is computed:
You have 3 x 3 = 9 rectangles.
Each pixel is essentially 3 numbers, 1 for each of the Red, Green, and Blue color channels.
For each rectangle you compute the mean for the red, green, and blue colors for all the pixels in that rectangle. This gives you 3 numbers for each rectangle.
In total, you have 9 (rectangles) x 3 (mean for R, G, B) = 27 numbers.
Simply concatenate these 27 numbers into a single 27 by 1 (often written as 27 x 1) vector; that is, 27 numbers grouped together. This vector of 27 numbers is the feature vector x that represents the color statistics of your photo. In code, if you are using C++, this will probably be an array of 27 numbers or perhaps an instance of the (aptly named) vector class. You can think of this feature vector as a "summary" of what the color in the photo is like. Roughly, it looks like this: [R1, G1, B1, R2, G2, B2, ..., R9, G9, B9], where R1 is the mean of the red pixels in the first rectangle, and so on.
I believe step 2 involves some form of comparing these feature vectors so that those with similar feature vectors (and hence similar color) will be placed together. Comparison will likely involve the use of the Euclidean distance (see here), or some other metric, to compare how similar the feature vectors (and hence the photos' color) are to each other.
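A compact sketch of both parts (in Python, for brevity), assuming the image is already loaded as a 2-D grid of (R, G, B) tuples; the function names are illustrative:

```python
def feature_vector(img):
    """img: list of rows of (R, G, B) tuples. Returns the 27-component vector
    [R1, G1, B1, ..., R9, G9, B9] of per-rectangle channel means."""
    h, w = len(img), len(img[0])
    features = []
    for gy in range(3):
        for gx in range(3):
            block = [img[y][x]
                     for y in range(gy * h // 3, (gy + 1) * h // 3)
                     for x in range(gx * w // 3, (gx + 1) * w // 3)]
            for channel in range(3):   # mean R, G, B of this rectangle
                features.append(sum(p[channel] for p in block) / len(block))
    return features

def distance(a, b):
    """Euclidean distance between two feature vectors (the step-2 comparison)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
```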
Lastly, as Anony-Mousse suggested, converting your pixels from RGB to HSB/HSV would be preferable. If you use OpenCV or have access to it, this is a one-liner; otherwise the Wikipedia article on HSV will give you the math formula to perform the conversion.
Hope this helps.
Instead of using RGB, you might want to use HSB space. It gives better results for a wide variety of use cases. Put more weight on Hue to get better color matches for photos, or to brightness when composing high-contrast images (logos etc.)
I have never heard of antipole clustering. But the obvious next step would be to put all the images you have into a large index. Say, an R-Tree. Maybe bulk-load it via STR. Then you can quickly find matches.
Maybe it means vector quantization (VQ). In VQ the image isn't subdivided into rectangles but into density areas, and you take the mean point of each cluster. First you need to take all colors and pixels separately and transfer them into a vector with XY coordinates. Then you can use a density clustering, like Voronoi cells, and get the mean point. You can compare this point with other pictures in the database. Read here about VQ: http://www.gamasutra.com/view/feature/3090/image_compression_with_vector_.php.
How to compute the vector from adjacent pixels:
d(x) = I(x+1,y) - I(x,y)
d(y) = I(x,y+1) - I(x,y)
Here's another link: http://www.leptonica.com/color-quantization.html.
Update: once you have computed the mean color of your thumbnails you can proceed to sort all the mean colors in an RGB map and use the formula I gave you to compute the vector x. Now that you have a vector for all your thumbnails you can use the antipole tree to search for a thumbnail. This is possible because the antipole tree is something like a kd-tree and subdivides the 2-D space. Read here about the antipole tree: http://matt.eifelle.com/2012/01/17/qtmosaic-0-2-faster-mosaics/. Maybe you can ask the author and download the source code?

OpenCV lower color values

I was wondering if there is a way to reduce the number of colors in an image. Let's say I have an image with a 32-bit color range in RGB. I was wondering if it would be possible to scale it down to, perhaps, an 8-bit color scheme. This would be similar to a "cartoon" filter in applications like Photoshop, or to changing your screen color space from 32-bit true color to 256 colors.
Thanks
If you want the most realistic result, take a look at colour quantisation. Basically you find blocks of pixels with a similar RGB colour and replace them with a single colour; you are trying to minimize both the number of pixels that are changed and the amount by which each new pixel differs from its original colour, so it's a space parameterisation problem.
Well, you could do convertTo(newimg, CV_8U) to convert it to 8-bit, but that's still 16 million colors. If the image has integer pixel values you can also do val = val / reductionFactor * reductionFactor + reductionFactor / 2 (or some optimization thereof) on each pixel's R, G, and B values for arbitrary reduction factors, or val = (val & mask) + (reductionFactor >> 1) for reduction factors that are a power of two.
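A sketch of that integer reduction in NumPy rather than via OpenCV calls, so the reduction factor is easy to experiment with; the helper name reduce_colors is mine:

```python
import numpy as np

def reduce_colors(img, reduction_factor=32):
    """Snap each 8-bit channel to the centre of its bucket, e.g. 32 -> 8 levels per channel."""
    img = img.astype(np.uint16)  # work in a wider type before adding the half-bucket offset
    out = img // reduction_factor * reduction_factor + reduction_factor // 2
    return out.astype(np.uint8)

# e.g. img = cv2.imread("photo.png"); fewer = reduce_colors(img, 64)  # 4 levels per channel
```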
Have you tried the pyramidal Mean Shift filter example program given in the OpenCV samples? The mention of a "cartoon" filter reminded me of it - the colors are flattened and subtle shades are merged and reduced, resulting in a reduction in the number of colors present.
The reduction is based on a threshold and some experimentation should surely get satisfactory results.
