Reversible approximation of Cb and Cr components - rgb

I am trying to develop a lossless image compression algorithm. I know that YCbCr <-> RGB is practically lossy due to rounding errors; similarly, the Original Reversible Color Transform (ORCT) offers reversibility at the cost of storing an extra bit for the U and V components.
Since U and V are in no way equivalent to Cb and Cr, the compression ratio differs greatly (I believe this is due to the particular mix of R, G and B that goes into Cb and Cr).
Furthermore, I know that there exist techniques which require extra bits to accommodate reversibility (e.g. YCoCg-R). However, I have tested YCoCg24, YUV (from ORCT) and GCbCr 1, but none of them come close to lossy YCbCr.
My question is: is there some reversible transform that approximates Cb and Cr, since these two components play a vital role in overall compression?
Before anyone blames me for not doing my homework, I should clarify that this question is related to Lossless RGB to Y'CbCr transformation and JPEG: YCrCb <-> RGB conversion precision.
EDIT: To clarify that this is a different question:
My question is: does a transformation exist that converts three eight-bit integers (representing red, green and blue components) into three other eight-bit integers (representing a colour space similar to Y'CbCr, where two components change only slightly with respect to position, or at least less than in an RGB colour space), and that can be inverted without loss of information?
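For what it's worth, one well-known transform of this kind is the reversible colour transform (RCT) used by lossless JPEG 2000: its luma is an integer approximation of Y, and its two chroma differences behave similarly to Cb and Cr, although each of them needs 9 bits to be stored exactly. A minimal sketch (this is the standard RCT, not something taken from the codecs mentioned above):

// floor division by 4 (plain "/4" would truncate toward zero for negative values)
static int floor_div4(int v) { return (v >= 0) ? v / 4 : -((-v + 3) / 4); }

// Forward RCT: 8-bit R,G,B -> Y in 0..255, Cb',Cr' in -255..255 (9 bits each)
void rct_forward(int R, int G, int B, int &Y, int &Cb, int &Cr) {
    Y  = floor_div4(R + 2 * G + B);
    Cb = B - G;
    Cr = R - G;
}

// Inverse RCT: reconstructs R,G,B exactly
void rct_inverse(int Y, int Cb, int Cr, int &R, int &G, int &B) {
    G = Y - floor_div4(Cb + Cr);
    R = Cr + G;
    B = Cb + G;
}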

Related

Hide information on DCT coefficient

I am learning an algorithm for hiding information in DCT coefficients, and my document says:
For JPEG images, the original data are the DCT tables after quantization. Each DCT table contains 64 coefficients, each of which is an integer whose value is in the range [-2048; 2047]. The high-frequency region often has many consecutive 0 values; if information is hidden there, it can increase the size of the image, because the long run of zeros is interrupted, reducing the image's compressibility. A feature of the DCT table is that the closer to the end of the table, the smaller the values tend to be.
Does anyone know why the coefficients' values are in the range [-2048; 2047]? Please help me with this.
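No authoritative answer here, but a rough bound shows where a number of that order comes from (assuming baseline JPEG, where the 8-bit samples are level-shifted to [-128, 127] before the 8x8 DCT):

F(0,0) = (1/8) * sum of the 64 shifted samples  =>  |F(0,0)| <= (64 * 128) / 8 = 1024

The AC coefficients obey a similar bound, so the raw DCT coefficients fit in 11 signed bits (roughly [-1024, 1023], which, if I recall the specification correctly, is the range baseline JPEG assumes). The DC coefficients, however, are coded as differences between neighbouring blocks, and a difference of two 11-bit values needs up to 12 bits, roughly [-2047, 2047]; one plausible reading of the [-2048; 2047] range in your document is simply that 12-bit signed container.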

What is the difference between Static HDR and dynamic HDR?

HDR (high dynamic range) is widely used in video devices to provide a better viewing experience.
What is the difference between static HDR and dynamic HDR?
Dynamic HDR can achieve higher HDR media quality across a variety of displays.
The following presentation: SMPTE ST 2094 and Dynamic Metadata summarizes the subject of Dynamic Metadata:
Dynamic Metadata for Color Volume Transforms (DMCVT)
- Can preserve the creative intent in HDR media across a variety of displays
- Carried in files, video streams, packaged media
- Standardized in SMPTE ST 2094
It all starts with digital quantization.
Assume you need to approximate the numbers between 0 and 1,000,000 using only 1000 possible values.
Your first option is uniform quantization:
Values in range [0, 999] are mapped to 0, range [1000, 1999] are mapped to 1, [2000, 2999] are mapped to 2, and so on...
When you need to restore the original data, you can't restore it exactly, so you restore each code to the value with minimal average error:
Code 0 is mapped back to 500 (the center of the range [0, 999]).
Code 1 is mapped back to 1500 (the center of the range [1000, 1999]).
When you restore the quantized data, you lose a lot of information.
The information you lose is called "quantization error".
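A tiny sketch of the uniform example above (the step of 1000 and the sample value are just the illustration's numbers):

#include <cstdio>

int main() {
    const int step = 1000;                  // 1,000,000 levels / 1000 codes
    int original = 123456;                  // some value in [0, 999999]
    int code = original / step;             // quantize: 123456 -> 123
    int restored = code * step + step / 2;  // restore to the bin centre: 123500
    std::printf("original=%d code=%d restored=%d error=%d\n",
                original, code, restored, restored - original);
    return 0;
}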
The common HDR video applies 10 bits per color component (10 bits for Y component, 10 bits for U and 10 bits for V). Or 10 bits for red, 10 for green and 10 for blue in RGB color space.
10 bits can store 1024 possible values (values in range [0, 1023]).
Assume you have a very good monitor that can display 1,000,001 different brightness levels (0 is darkest and 1000000 is the brightest).
Now you need to quantize the 1,000,001 levels to 1024 values.
Since the response of the human visual system to brightness level is not linear, the uniform quantization illustrated above is sub-optimal.
The quantization to 10 bits is performed after applying a gamma function.
An example of a gamma function: divide each value by 1,000,000 (the new range is [0, 1]), compute the square root of each value, and multiply the result by 1,000,000.
Apply the quantization after the gamma function.
The result: more accuracy is kept in the darker values, at the expense of the brighter values.
The monitor does the opposite operation (de-quantization and inverse gamma).
Performing the quantization after applying a gamma function results in better quality for the human visual system.
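A sketch of the square-root example above (the square root stands in for a real transfer function, and the numbers are only the illustration's):

#include <cmath>
#include <cstdio>

int main() {
    const double maxLevel = 1000000.0;   // monitor levels 0..1,000,000
    const int codes = 1024;              // 10-bit container

    double level = 2000.0;               // a dark brightness level

    // Encode: normalise, apply the "gamma" (square root), quantize to 10 bits.
    int code = (int)std::lround(std::sqrt(level / maxLevel) * (codes - 1));

    // Decode: de-quantize, apply the inverse "gamma", de-normalise.
    double g = (double)code / (codes - 1);
    double restored = g * g * maxLevel;

    // Dark levels get many more codes than the ~977-wide uniform steps would give.
    std::printf("level=%.0f code=%d restored=%.0f\n", level, code, restored);
    return 0;
}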
In reality, square root is not the best gamma function.
There are three types of standard HDR static gamma functions:
HLG - Hybrid Log Gamma
PQ - Perceptual Quantizer
HDR10 - Static Metadata
Can we do better?
What if we could select the optimal "gamma functions" for each video frame?
Example for Dynamic Metadata:
Consider the case where all the brightness levels in the image are in range [500000, 501000]:
Now we can map all the levels to 10 bits, without any quantization.
All we need to do is send 500000 as the minimum level, and 501000 as the maximum level in the image metadata.
Instead of quantization, we can just subtract 500000 from each value.
The monitor that receives the image, reads the metadata, and knows to add 500000 to each value - so there is a perfect data reconstruction (no quantization errors).
Assume the levels of the next image are in the range 400000 to 401000, so we need to adjust the metadata (dynamically).
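A sketch of that min/max idea (the numbers are the example's; real DMCVT metadata is far richer than a minimum and a maximum):

#include <cstdio>

struct FrameMetadata { int minLevel, maxLevel; };  // per-frame ("dynamic") metadata

int main() {
    FrameMetadata meta = {500000, 501000};  // this frame only uses 1001 levels

    int level = 500763;                     // a brightness level inside that range
    int code = level - meta.minLevel;       // 0..1000, fits easily in 10 bits

    // The display reads the metadata and reverses the mapping exactly.
    int restored = code + meta.minLevel;    // 500763 - no quantization error

    std::printf("level=%d code=%d restored=%d\n", level, code, restored);
    return 0;
}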
DMCVT - Dynamic Metadata for Color Volume Transform
The true math of DMCVT is much more complicated than the example above (and involves much more than quantization), but it's based on the same principles - adjusting the metadata dynamically according to the scene and display can achieve better quality compared to static gamma (or static metadata).
In case you are still reading...
I am really not sure that the main advantage of DMCVT is reducing the quantization errors.
(It was just simpler to give an example of reducing the quantization errors).
Reducing the conversion errors:
Accurate conversion from the digital representation of the input (e.g. BT.2100) to the optimal pixel value of the display (like the RGB voltage of the pixel) requires "heavy math".
The conversion process is called Color Volume Transformation.
Displays replace the heavy computation with mathematical approximations (using lookup tables and interpolation, I suppose).
Another advantage of DMCVT, is moving the "heavy math" from the display to the video post-production process.
The computational resources in the video post-production stage are orders of magnitude greater than the display's resources.
In the post-production stage, the computers can calculate metadata that helps the display perform a much more accurate Color Volume Transformation (with fewer computational resources) and reduces the conversion errors considerably.
Why are the "HDR static gamma functions" called static?
As opposed to DMCVT, the static gamma functions are fixed across the entire movie, or fixed (pre-defined) across the entire "system".
For example: Most PC systems (PC and monitors) are using sRGB color space (not HDR).
The sRGB standard uses the following fixed gamma function:
C' = 12.92 * C                   for C <= 0.0031308
C' = 1.055 * C^(1/2.4) - 0.055   for C > 0.0031308
(where C is the linear value in [0, 1] and C' is the gamma-corrected value)
Both the PC system and the display know in advance that they are working in the sRGB standard, and know that this is the gamma function being used (without adding any metadata, or adding just one byte of metadata that marks the video data as sRGB).

RGB 0-1 nomenclature

Stoopid question time!
RGB colours have three values (red, green and blue), each ranging from 0 to 255. If those values range from 0 to 1 instead, what is the name for this colourspace*?
Is it RGB 0-1? RGB digital? Unreal RGB?
*and if an alpha channel is included, RGBA 0-1.
Unfortunately there is no real nomenclature. And sometimes the same word can be interpreted differently according to which book you studied (computer graphics, TV broadcasting, digital video formats, photography).
First: RGB is not a colour space but a colour model, so it gives just an idea of how colours are made, but nothing precise.
When using RGB, we usually mean linear RGB or gamma-corrected RGB (R'G'B'). Linear indicates the intensity of light; gamma-corrected is closer to how we perceive colours. So a half grey (which looks halfway between white and black) is around 18% in linear RGB, or 50% in a gamma-corrected space.
Then we have colour spaces, like sRGB. In a colour space we define the chromaticities (of R, G and B) and the chromaticity of white. [Usually the chromaticities are given as x,y coordinates in CIE xyY.] Rec.709 (HDTV) has the same chromaticities for R, G and B and the same white, but a different gamma.
Often a colour space defines various other characteristics. sRGB defines values from 0 to 255 (originally), so a byte, always gamma corrected. Previously it was common to have 100 or 1.0 as the value for white (a triplet of such values). Note: values above 1.0 or 100 could be valid; on old analogue TV we could have such values (limited to part of the screen and to a limited number of frames, but still allowed).
In digital signals (e.g. HDMI), we have full-range RGB (0 to 255) or limited-range RGB (16 to 235).
I do not know of a good nomenclature, but usually it is obvious. In general: linear RGB has (0.0, 0.0, 0.0) for black and (1.0, 1.0, 1.0) for white, as floating-point numbers (half or single precision). Floating-point numbers are already a sort of exponential representation, like gamma correction, yet the data stays linear: adding light is just an addition (and it doesn't give an unwanted colour cast).
In non-linear colour spaces [gamma corrected] we tend to use integers, often 0 to 255 (or 0 to 1023 in 10-bit); the colour depth is sometimes given in total, sometimes as bits per channel. So if you have a colour depth, you are working with integers, with values from 0 to 2**channel_depth - 1.
It is always good to specify the values, also because the lack of nomenclature often causes confusion. You see many problems with people not realizing they have full-range/limited-range images in HDMI signals.
My take: define "8-bit sRGB" (or 24-bit) as 0 to 255, and analogously for 10-bit, etc.; "linear RGB" (or anything with floating-point constants, or where you also see R'G'B' in the text) as 0 to 1.0. If you also see YCC somewhere, start worrying, because you never know whether you have full range or limited range. The 0 to 100 form is found in textbooks (especially old ones), but in that case I prefer to add a % sign, so the value is automatically from 0 to 1.0.
But as with file formats, somewhere you should fully describe your colour space: the chromaticities (e.g. by referring to sRGB, DCI-P3, Apple P3, AdobeRGB, etc.), which gamma correction you use (there are various functions; Apple used a different one on old hardware), and I would write black: 0,0,0, white: 1.0,1.0,1.0 (possibly with the range, e.g. negative numbers for colours out of gamut, or values above 1.0 for ultra-luminous values), and the precision (16-bit [half precision] or 32-bit [single precision] per channel, or the classic 8-bit, 10-bit, 12-bit in the case of integers). You need this only once, but it is better to be explicit, especially considering that people in different fields have different expectations.
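As a small illustration of one fully specified convention, here is 8-bit sRGB (gamma corrected, 0-255) decoded to linear RGB in 0.0-1.0 with the standard sRGB transfer function:

#include <cmath>
#include <cstdio>

// 8-bit sRGB (gamma corrected, 0..255) -> linear light (0.0..1.0)
double srgb8_to_linear(int v) {
    double c = v / 255.0;                   // normalised, still gamma corrected
    return (c <= 0.04045) ? c / 12.92
                          : std::pow((c + 0.055) / 1.055, 2.4);
}

int main() {
    // 50% gamma-corrected sRGB decodes to about 21.6% linear light.
    std::printf("%.3f\n", srgb8_to_linear(128));
    return 0;
}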
RGB is a color space in which colors are defined as proportions of its components, so RGB colors are 3 values from 0 to 1. True color is sometimes referred to as RGB, but that's not technically correct, since it's a combination of a color space (RGB) and a color depth (3x8 bits).
RGB 0-1 is fine, but RGB digital is more fitting for the 0-255 range in my opinion, and RGB unreal is really ill-named, since it uses real numbers instead of integers for color representation.
With 0 and 1 as RGB values you get these combinations:
000
001
010
100
110
101
011
111
Eight different colors: Black, then blue, green, red, then yellow, magenta, cyan, then white.
(In the order in which I listed the RGB color bits in my list.)
If you double them with a bold attribute, you get the 16 official Linux colors. And I guess that colorspace existed before: it is what you get when you start with 3 "base" colors and mix them to get 6.
A bit of number magic too: 2^3 = 2*3 + 1 + 1.
8 minus black minus white is 6 colors, in two groups.
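A few lines to enumerate those eight combinations (ordered here simply by the binary value of r,g,b):

#include <cstdio>

int main() {
    const char *names[8] = {"black", "blue",    "green",  "cyan",
                            "red",   "magenta", "yellow", "white"};
    for (int i = 0; i < 8; ++i) {
        int r = (i >> 2) & 1, g = (i >> 1) & 1, b = i & 1;
        std::printf("%d%d%d %s\n", r, g, b, names[i]);
    }
    return 0;
}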

From int array of 4096 elements to a float between 0 and 1

I am currently working on a project and I need to work with images.
However, my images are 64*64, so when I load one I get an array of 4096 ints.
I would like to convert this array to a single float between 0 and 1 (and of course I will also need the inverse function that builds an image back from the float).
Do you have any idea or suggestion of how to do it?
Because I need to make an algorithm, but I don't really know how to proceed.
Best regards and thank you.
The only way this could make some sense is if the image is binary (1 bit per pixel)
but even then the lossless naive conversion would take 64x64 bits, which is far more than a single 32-bit float can hold. So there is some piece of information missing. To make this possible you need to introduce some kind of compression, but even that may not be enough unless lossy compression is used. Anyway, you should add some sample images so we can see what you are dealing with.
I am afraid the only usable compression for this would be a DCT (like in JPEG) on the full image. So do a DCT of the image and store only the first few coefficients. For example, if 4-bit coefficients are used, then you can store 32/4 = 8 coefficients, which could be enough, but it is hard to say whether 4 bits will be enough to reconstruct the image.
In similar cases visual hashes are used
but you have no way to turn them back into the original image. They are pretty much the same as hashes, but their binary representation is visually similar to the image.
float is really not a good way for this
due to precision/rounding problems. You are losing more bits than if a plain integer type were used. Yes, you can store an integer's bits inside a float's binary representation, but the resulting float value can be gibberish, with the possibility of throwing an exception if used as a regular float.
If the target float should be in the range <0.0,1.0>, then exceptions will not occur, but you cannot use the exponent or sign for storage, limiting the usable bits to only 23 of the original 32.
Putting it all together, without additional info I would:
Do a DCT on the 64x64 image matrix
use only 1x4-bit + 6x3-bit cells from the top-left corner of the matrix
encode them into the mantissa bits by concatenating:
mantisa = coeff0 + (coeff1<<4) + (coeff2<<7) + (coeff3<<10) + ...
set the sign and exponent so the range is <0.0,1.0>
If I am not mistaken, that means sign=0 and exponent = -1 + the 32-bit float bias
combine the integer bit fields into the final floating-point value
union { float f; DWORD dw; } x;            // the same 32 bits viewed as float or integer
DWORD sign=...,mantisa=...,exponent=...;   // DWORD = 32-bit unsigned int, values prepared above
x.dw  = sign<<31;                          // bit 31: sign
x.dw |= exponent<<23;                      // bits 23..30: biased exponent
x.dw |= mantisa;                           // bits 0..22: mantissa
return x.f;
To obtain the image back (at least something close to it), reverse the steps. You can improve quality by introducing some filters to get closer to your original images. But without actually seeing any of them, it is hard to tell which one to use or whether it is even possible...
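A minimal sketch of just the bit-packing/unpacking part described above (the DCT itself, and quantizing the seven kept coefficients to 4 and 3 bits, are assumed to be done elsewhere; memcpy is used instead of the union to keep the type punning portable):

#include <cstdint>
#include <cstring>
#include <cstdio>

// Pack one 4-bit + six 3-bit coefficients (22 bits) into the mantissa of a
// float that lies between 0.5 and 1.0, by forcing sign=0 and exponent=126 (-1 + bias 127).
float pack(const uint32_t c[7]) {
    uint32_t mantisa = c[0] & 0xF;
    for (int i = 1, shift = 4; i < 7; ++i, shift += 3)
        mantisa |= (c[i] & 0x7) << shift;
    uint32_t dw = (0u << 31) | (126u << 23) | mantisa;
    float f;
    std::memcpy(&f, &dw, sizeof f);
    return f;
}

void unpack(float f, uint32_t c[7]) {
    uint32_t dw;
    std::memcpy(&dw, &f, sizeof dw);
    c[0] = dw & 0xF;
    for (int i = 1, shift = 4; i < 7; ++i, shift += 3)
        c[i] = (dw >> shift) & 0x7;
}

int main() {
    uint32_t in[7] = {13, 5, 2, 7, 0, 1, 6}, out[7];
    float f = pack(in);   // one float between 0.5 and 1.0
    unpack(f, out);
    std::printf("f=%f coeff0=%u coeff1=%u\n", f, out[0], out[1]);
    return 0;
}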

OpenCV lower color values

I was wondering if there is a way to lower the color depth of an image. Let's say I have an image that has a 32-bit color range in RGB. I was wondering if it would be possible to scale it down to perhaps an 8-bit color scheme. This would be similar to a "cartoon" filter in applications like Photoshop, or to changing your screen color space from 32-bit true color to 256 colors.
Thanks
If you want the most realistic result, take a look at colour quantisation. Basically, find the blocks of pixels with a similar RGB colour and replace them with a single colour; you are trying to minimize the number of pixels that are changed and the amount each new pixel differs from its original colour - so it's a space parameterisation problem.
Well, you could do convertTo(newimg, CV_8U) to convert it to 8-bit, but that's still 16 million colors. If the image has integer pixel values you can also do val = val / reductionFactor * reductionFactor + reductionFactor / 2 (or some optimization thereof) on each pixel's R, G and B values for arbitrary reduction factors, or val = (val & mask) + (reductionFactor >> 1) for reduction factors that are a power of two.
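A small sketch of that per-pixel reduction using a lookup table (the divisor 64 and the file names are just example values; cv::LUT applies the 256-entry table to every channel of an 8-bit image):

#include <opencv2/opencv.hpp>

int main() {
    cv::Mat img = cv::imread("input.png");   // some 8-bit, 3-channel image (example path)
    const int div = 64;                      // reduction factor: 4 levels per channel

    // Lookup table mapping every 8-bit value to the centre of its bucket.
    cv::Mat lut(1, 256, CV_8U);
    for (int v = 0; v < 256; ++v)
        lut.at<uchar>(v) = (uchar)(v / div * div + div / 2);

    cv::Mat reduced;
    cv::LUT(img, lut, reduced);              // 4*4*4 = 64 possible colours remain
    cv::imwrite("reduced.png", reduced);
    return 0;
}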
Have you tried the pyramidal Mean Shift filter example program given in the samples with OpenCV? The mention of a "cartoon" filter reminded me of it - the colors are flattened and subtle shades are merged, resulting in a reduction in the number of colors present.
The reduction is based on a threshold, and some experimentation should surely get satisfactory results.
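For reference, the call itself is a one-liner (the spatial radius 21, colour radius 51 and file names are just starting values to experiment with):

#include <opencv2/opencv.hpp>

int main() {
    cv::Mat img = cv::imread("input.png");   // 8-bit, 3-channel image (example path)
    cv::Mat flattened;
    // Nearby pixels whose colours fall within the colour radius get merged,
    // which flattens subtle shades and reduces the number of colours.
    cv::pyrMeanShiftFiltering(img, flattened, 21, 51);
    cv::imwrite("flattened.png", flattened);
    return 0;
}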

Resources