What is the difference between Static HDR and dynamic HDR? - video-processing

HDR is a high dynamic range which is widely used in video devices to have better viewing experience.
What is the difference between static HDR and dynamic HDR?

Dynamic HDR can achieve higher HDR media quality across a variety of
displays.
The following presentation: SMPTE ST 2094 and Dynamic Metadata summarizes the subject of Dynamic Metadata:
Dynamic Metadata for Color Volume Transforms (DMCVT)
- Can preserve the creative intent in HDR media across a variety of displays
- Carried in files, video streams, packaged media
- Standardized in SMPTE ST 2094
It all starts with digital Quantization.
Assume you need to approximate the numbers between 0 and 1,000,000 using only 1000 possible values.
Your first option is using uniform quantification:
Values in range [0, 999] are mapped to 0, range [1000, 1999] are mapped to 1, [2000, 2999] are mapped to 2, and so on...
When you need to restore the original data, you can't restore it accurately, so you need to get the value with minimal average error.
0 is mapped to 500 (to the center of the range [0, 999]).
1 is mapped to 1500 (to the center of the range [1000, 1999]).
When you restore the quntized data, you are loosing lots of information.
The information you loose is called "Quantization error".
The common HDR video applies 10 bits per color component (10 bits for Y component, 10 bits for U and 10 bits for V). Or 10 bits for red, 10 for green and 10 for blue in RGB color space.
10 bits can store 1024 possible values (values in range [0, 1023]).
Assume you have a very good monitor that can display 1,000,001 different brightness levels (0 is darkest and 1000000 is the brightest).
Now you need to quantize the 1,000,001 levels to 1024 values.
Since the response of the human visual system to brightness level is not linear, the uniform quantization illustrated above, is sub-optimal.
The quantization to 10 bits is performed after applying a gamma function.
Example for gamma function: divide each value by 1000000 (new range is [0,1]), compute square root of each value, and multiply the result by 1000000.
Apply the quantization after the gamma function.
The result is: keeping more accuracy on the darker values, on expanse of the brighter values.
The monitor do the opposite operation (de-quantization, and inverse gamma).
Preforming the quantization after applying gamma function results a better quality for the human visual system.
In reality, square root is not the best gamma function.
There are three types of standard HDR static gamma functions:
HLG - Hybrid Log Gamma
PQ - Perceptual Quantizer
HDR10 - Static Metadata
Can we do better?
What if we could select the optimal "gamma functions" for each video frame?
Example for Dynamic Metadata:
Consider the case where all the brightness levels in the image are in range [500000, 501000]:
Now we can map all the levels to 10 bits, without any quantization.
All we need to do is send 500000 as minimum level, and 501000 as minimum level in the image metadata.
Instead of quantization, we can just subtract 500000 from each value.
The monitor that receives the image, reads the metadata, and knows to add 500000 to each value - so there is a perfect data reconstruction (no quantization errors).
Assume the levels of the next image is in range 400000 to 401000, so we need to adjust the metadata (dynamically).
DMCVT - Dynamic Metadata for Color Volume Transform
The true math of DMCVT is much more complicated than the example above (and much more than quantization), but it's based on the same principles - adjusting the metadata dynamically according to the scene and display, can achieve better quality compared to static gamma (or static metadata).
In case you are still reading...
I am really not sure that the main advantage of DMCVT is reducing the quantization errors.
(It was just simpler to give an example of reducing the quantization errors).
Reducing the conversion errors:
Accurate conversion from the digital representation of the input (e.g BT.2100 to the optimal pixel value of the display (like the RGB voltage of the pixel) requires "heavy math".
The conversion process is called Color Volume Transformation.
Displays replaces the heavy computation with mathematical approximations (using look up tables and interpolations [I suppose]).
Another advantage of DMCVT, is moving the "heavy math" from the display to the video post-production process.
The computational resources in the video post-production stage are in order of magnitudes higher than the display resources.
In the post-production stage, the computers can calculate metadata that helps the display performing much more accurate Color Volume Transformation (with less computational resources), and reduce the conversion errors considerably.
Example from the presentation:
Why does "HDR static gamma functions" called static?
Opposed to DMCVT, the static gamma functions are fixed across the entire movie, or fixed (pre-defined) across the entire "system".
For example: Most PC systems (PC and monitors) are using sRGB color space (not HDR).
The sRGB standard uses the following fixed gamma function:
.
Both the PC system and the display knows from advance, that they are working in sRGB standard, and knows that this is the gamma function that is used (without adding any metadata, or adding one byte of metadata that marks the video data as sRGB).

Related

Hide information on DCT coefficient

I am learning algorithm to hide information on DCT coefficient and my document write like this:
For JPEG images, the original data are DCT tables after quantization. Each DCT table contains 64 coefficients, each of which is an integer whose value is in the range [-2048; 2047]. The high-frequency domain often has many consecutive 0 values, if hiding information here, it can increase the size of the image because the long sequence of zeros is interrupted, reducing the image compression ability. The feature of the DCT table is that the closer to the end of the table, the smaller the value tends to be.
enter image description here
Anyone know why coefficients's value is in the range [-2048; 2047]? Please help me with this

How to compress/archive a temperature curve effectively?

Summary: The industrial thermometer is used to sample temperature at the technology device. For few months, the samples are simply stored in the SQL database. Are there any well-known ways to compress the temperature curve so that much longer history could be stored effectively (say for the audit purpose)?
More details: Actually, there are much more thermometers, and possibly other sensors related to the technology. And there are well known time intervals where the curve belongs to a batch processed on the machine. The temperature curves should be added to the batch documentation.
My idea was that the temperature is a smooth function that could be interpolated somehow -- say the way a sound is compressed using MP3 format. The compression need not to be looseless. However, it must be possible to reconstruct the temperature curve (not necessarily the identical sample values, and the identical sampling interval) -- say, to be able to plot the curve or to tell what was the temperature in certain time.
The raw sample values from the SQL table would be processed, the compressed version would be stored elsewhere (possibly also in SQL database, as a blob), and later the raw samples can be deleted to save the database space.
Is there any well-known and widely used approach to the problem?
A simple approach would be code the temperature into a byte or two bytes, depending on the range and precision you need, and then to write the first temperature to your output, followed by the difference between temperatures for all the rest. For two-byte temperatures you can restrict the range some and write one or two bytes depending on the difference with a variable-length integer. E.g. if the high bit of the first byte is set, then the next byte contains 8 more bits of difference, allowing for 15 bits of difference. Most of the time it will be one byte, based on your description.
Then take that stream and feed it to a standard lossless compressor, e.g. zlib.
Any lossiness should be introduced at the sampling step, encoding only the number of bits you really need to encode the required range and precision. The rest of the process should then be lossless to avoid systematic drift in the decompressed values.
Subtracting successive values is the simplest predictor. In that case the prediction of the next value is the value before it. It may also be the most effective, depending on the noisiness of your data. If your data is really smooth, then you could try a higher-order predictor to see if you get better performance. E.g. a predictor for the next point using the last two points is 2a - b, where a is the previous point and b is the point before that, or using the last three points 3a - 3b + c, where c is the point before b. (These assume equal time steps between each.)

Google Earth Engine: How can I perform the comparison of two maps (different resolutions)?

I have two maps- Hansen 30m forest cover and another product of 500 m resolution. I want to look on correlation between these two maps at 500 m resolution.
Does anyone know, how to aggregate one map (30 m) to the resolution of another (500 m) in Google Earth Engine, so they will perfectly overlay?
Thanks in advance!
In Google Earth Engine you don't have to worry about it: the scaling happens automagically and they will overlay perfectly. Of course lack of control can be an issue for certain applications, but to my knowledge there is not much you can do about it.
Now, to the point: scaling Hansen's map (Global Forest Change - GFC) to lower resolution map:
The GEE loads the data internally at native resolution and then generates down-sampled versions at multiples of two, until the entire image fits in a single tile. Then, GEE fetches the down-sampled pixels from the nearest higher level of the pyramid, and re-samples those to the requested scale. For bands that cannot be interpolated, like GFC's "lossyear" (pixel value indicating the year loss occurred), the GEE selects the first value (of the four "child" values) at each pyramid level.
In other words, if we consider example of a treecover2000 band with the following pixel values
10 20
30 40
the down-sampling will result in value '25'. The categorical values like 'lossyear' will scale differently:
1 2
3 4
Here the result will be simply "1"

Plotting a frequency response from biquad filter

This is a hard one and although I can think of a few kludge methods of doing it, I have a feeling there is a clean mathematical method, although I am having difficulty inventing it myself.
I have a number of parameters which control (software) biquad filters for audio. Essentially there are just 3 parameters, frequency, gain and Q (or bandwidth). In audio terms, the frequency represents the center frequency of the filter. The gain represents whether this frequency is boosted or cut (a gain of 0 results in no change to the audio passing through the filter). Q represents the width of the filter - IE a very wide filter might affect frequencies far away from the center frequency, whereas a narrow (low Q) filter will only affect frequencies close to the center frequency.The filters take the form of a bell curve, or at least thats an approximation, whether its mathematically accurate I am not sure.
I want to display the characteristics of these filters graphically - display a graph of gain against frequency. There are several of these filters applied to the audio channel, and I want to be able to add the different result graphs, to produce an overall graph (IE a graph summing all the components of the combined filters). But I also want to be able to access the individual filters graphs.
I can handle adding the component graphs into a single 'total' graph, but how to produce the original x-y graph from the filter parameters escapes me. I will draw bitmaps so all I need is to be able to create arrays of the form frequency[x]=y. Im doing this in C so I don't have the mathematical tools in matlab etc. So I might have a filter with a center frequency of say 1000 (Hz), a gain of say 20 (db or linear I understand how to convert that), and a Q of say 3. The Q factor is relative and does not have to be exactly mathematically correct if that causes any complication.
It seems like a quite simple mathematical function but maths is not my strong point and I don't know enough - I have been messing round with sine functions etc but its not working and I suspect is probably wasting processing power by over complicating the maths (although I might be wrong there).
TIA, Pete
I have my doubts about the relationships between biquad filters, Q values, and bell curves. But I'll put those aside and just tell you how to draw a bell curve, since that's what you asked.
From this wikipedia article, the equation for a bell curve is
where for your application
a corresponds to the gain
b determines the center frequency
2c^2 is related to Q (larger values will make the curve wider)
The C code below computes a sample bell curve. For this example, the numbers were chosen based on drawing into a window that is 250 pixels wide by 200 pixels high, with a coordinate system where the origin {0,0} is at the bottom left corner.
int width = 250;
int height = 200;
int bellCurve[width]; // the output array that holds the f(x) values
double gain = 180; // the 'a' value, determines how high the peak is from the baseline
double offset = 10; // the 'd' value, determines the y coordinate of the baseline
double qFactor = 1000; // the '2c^2' value, determines how fat the curve is
double center = 100; // the 'b' value, determines the x coordinate of the peak
double dx;
for ( int x = 0; x < width; x++ )
{
dx = x - center;
bellCurve[x] = gain * exp( -( dx * dx ) / qFactor ) + offset;
}
Plotting the curve results in an image like this where the peak is at x=100, y=10+180=190
You could input a unit impulse (an array of all zeros, except one element=1.0) into your digital filters, treating them as black boxes. Then FFT the impulse response output array to get the frequency response of the filter. Plotting the magnitude of the complex frequency samples will give you a pretty picture. Python+numpy+matplotlib would probably be an easier way to go about it. You will need to know the sampling period to get meaningful plots.
What you really want is the bode plot of the filter. This is non trivial to calculate yourself, a cursory search for a library to do it for you in C yielded nothing. If accuracy is not important and you can approximate the shape once and stretch it based on the parameters of the particular filter. For example, you might have a normalized array of relative values and construct a new curve (array) based on the parameters of the filter and the base curve you generated earlier.
The base curve could be generated from MATLAB if you can or Wolfram Alpha or something like that.
Here is one in javascript.
http://www.earlevel.com/main/2013/10/13/biquad-calculator-v2/
The filter you describe is the 'peak' filter. Use the log scale to display frequencies.
—Tom

Binary SVM classifier failed with two classes: one is big, other one is small

I 'm trying to classify some pixels using a binary support vector machine. my training database is made of 28 data files and has two classes, number pixels of class1 is 16571 and number of pixels of class2 is 313.
The test data(each file) has around 600 pixels that only 6-10 pixels are member of class 2 and the remaining pixels are in class1.
My problem is that, after training, when I try to classify data, the SVM classifies all of the pixels in class1.
I think that it maybe because it has seen few samples from class2. but the number of available data files are limited(around 35 data file).
How can I train the svm and get reasonable result?
Thank you for your help.
SVM might indeed be sensitive to large differences in train set sizes. I'd suggest trying the following 2 methods:
Limit class 1's train set by sampling, so that its size is around the size of class 2's, and check for any improvement (sizes between 1X to 10X of class 2's size might fit).
Play with the 'cost' parameter of your SVM (e.g, in SVMLight it's done with the 'J' parameter), it can help balance the classes.
you can of course use both methods together, i.e. somewhat limit the size of class 1's train data, then further balance the classes using the cost parameter.

Resources