How to locate position and get intermediate results in the source code on VTM? - video-compression

I want to implement deep learning-based video compression, but it's difficult to get the intermediate results. So I want to ask whether there are any convenient methods.

Could you clarify what you mean by 'intermediate results'?
If you mean the reconstructed frames of VTM, you can get a buffer from the Picture class.
EncGOP encodes every frame in the GOP and executes the in-loop filters, so you can get the intermediate frames from EncGOP while encoding.
At the decoder side, you can get the same buffer in DecLib.
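For example, a rough sketch of dumping the reconstructed luma plane at the decoder side could look like the following (class and member names follow recent VTM versions and should be checked against your checkout; dumpLumaPlane and its placement are just an illustration):

// Hypothetical helper: call it from DecLib where the current Picture* (e.g. m_pcPic)
// is available, before or after the in-loop filters have run.
void dumpLumaPlane( Picture* pcPic, const char* fileName )
{
  const PelBuf luma = pcPic->getRecoBuf().get( COMPONENT_Y );  // reconstructed luma plane
  FILE* fp = fopen( fileName, "ab" );
  if( !fp ) return;
  for( uint32_t y = 0; y < luma.height; y++ )
  {
    // Pel is a 16-bit sample type in VTM, holding 8- or 10-bit values
    fwrite( luma.buf + y * luma.stride, sizeof( Pel ), luma.width, fp );
  }
  fclose( fp );
}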
I hope this answer helps you.

Related

Extract motion vectors from Versatile Video Coding

How do I go about extracting motion vectors into a .txt or .xml file from the VVC VTM reference software? I managed to extract the motion vectors to a text file, but I don't have a proper index indicating which motion vector belongs where. If anyone could guide me on getting a proper index along with the motion vectors, that would be very helpful.
Are you doing it at the encoder side?
If so, I suggest that you move to the decoder side and do this:
Encode the sequence from which you want to extract MVs.
Modify the decoder so that it prints the MV of each coding unit that has one (i.e. is not intra-coded). To do so, you may go to the CABACReader.cpp file, somewhere inside the coding_unit() function, and find the place where the MV is parsed. There, in addition to the parsed MV, you have access to the coordinates of the ongoing CU (see the sketch after this list).
Decode your encoded bitstream with the modified VTM decoder, and it will print what you asked for.
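As an illustration, the kind of logging you could insert looks roughly like this (member names follow recent VTM versions and should be verified against your checkout; the snippet assumes it is placed right after the MVD of the current prediction unit, pu, has been parsed):

// Hypothetical logging snippet inside CABACReader.cpp
if( !CU::isIntra( *pu.cu ) )
{
  printf( "POC %d  CU (%d,%d) %dx%d  MVD L0 (%d,%d)\n",
          pu.cu->slice->getPOC(),                        // which picture this CU belongs to
          pu.lumaPos().x, pu.lumaPos().y,                 // top-left corner of the block
          (int)pu.lumaSize().width, (int)pu.lumaSize().height,
          pu.mvd[REF_PIC_LIST_0].hor, pu.mvd[REF_PIC_LIST_0].ver );  // parsed MVD for list 0
}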
As in Mosen's answer, I recommend extracting any information (including MVs) from the decoder.
If you just want to extract MVs to a file, you may utilize traverseCU().
VTM's Picture class has a CodingStructure object which traverses all CUs in the picture (even a CTU or CU can be treated as a CodingStructure, so you can use traverseCU() at the block level too).
So I suggest that you:
Access the Picture object (its name might be different, e.g., m_pcPic in DecLib.cpp) at the decoder side (insert your code before/after the loop filters are executed).
Iterate over each CU in the picture by using traverseCU().
Extract the MVs from every CU you visit, and save that information (MVs, indices, etc.), as in the sketch below.
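A rough sketch of these three steps, assuming recent VTM naming (there the traversal method appears as CodingStructure::traverseCUs(); the exact signatures and the firstPU/mv/refIdx members should be checked against your VTM version):

// Hypothetical function: call it at the decoder side with the decoded Picture
// (e.g. m_pcPic in DecLib.cpp), before or after the loop filters are executed.
void dumpMotionVectors( Picture* pcPic, FILE* fp )
{
  CodingStructure& cs = *pcPic->cs;
  for( const CodingUnit& cu : cs.traverseCUs( CS::getArea( cs, cs.area, CHANNEL_TYPE_LUMA ), CHANNEL_TYPE_LUMA ) )
  {
    if( CU::isIntra( cu ) )
      continue;                                   // intra CUs carry no motion vectors
    const PredictionUnit& pu = *cu.firstPU;       // in VVC a CU has a single PU
    for( int list = 0; list < 2; list++ )         // REF_PIC_LIST_0 and REF_PIC_LIST_1
    {
      if( pu.refIdx[list] < 0 )
        continue;                                 // this reference list is unused
      // note: VTM stores MVs internally at 1/16-luma-sample precision
      fprintf( fp, "%d %d %d %d list%d %d %d\n",
               cu.lumaPos().x, cu.lumaPos().y,
               (int)cu.lumaSize().width, (int)cu.lumaSize().height,
               list, pu.mv[list].hor, pu.mv[list].ver );
    }
  }
}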
Although there might be better ways to answer your question, I hope this answer helps you.

Changing HM reference software to display some information about the bitstream

I am very new to the HM HEVC (and the JEM) reference software, and I am currently trying to understand the source code. I want to add some lines to display, for each component, the name of the algorithm (i.e. inter/intra algorithms), the length of its bitstream, and its position in the output bin file.
The goal is to know which components cost more bits to code and how the codec is working. I want to do the same thing for the JEM afterwards.
My first problem is that I am unable to understand a lot of the functions there, and the comments are not sufficient, so are there any references for understanding the code? (I already read the manual; it doesn't help.)
Second, I don't know where and how exactly to add these lines: is it in TEncGOP, TEncSlice or TEncCU? PS: I don't think it is in TEncGOP.compressGOP, so maybe in the two other classes.
(I am putting the answer to the comment that @Mourad posted four hours ago here, because it will be long.)
I assume that you have managed to find where the actual encoding after the RDO loop is implemented. As you correctly mentioned, xEncodeCU is the function you need to look at to make sure you are not still in the RDO.
Now you need to find the exact function in xEncodeCU that is responsible for your target codec tool.
For instance, if you want to count the number of bits for coefficient coding, you should be looking at m_pcEntropyCoder->encodeCoeff() (it's a JEM function and may have a different name in the HM). Once you find this line in xEncodeCU, you may do the following to get the number of bits written inside the encodeCoeff() function:
UInt b_before = m_pcEntropyCoder->getNumberOfWrittenBits();  // bits written so far
m_pcEntropyCoder->encodeCoeff( ... );                        // the call whose rate you want to measure
UInt b_after = m_pcEntropyCoder->getNumberOfWrittenBits();   // bits written after the call
UInt writtenBitsCoeff = b_after - b_before;                  // bits spent inside encodeCoeff()
One important point: as you can see, the function getNumberOfWrittenBits() gives you integer rates, obtained by rounding the sum of the fractional rates corresponding to all syntax elements coded inside the function encodeCoeff. This rounding error might or might not be acceptable, depending on your problem. For example, if instead of the coefficient coding rate you wanted to know the rate of the CBF, then this error would not be acceptable at all, because the CBF rate is mostly less than one bit. If this is your case, then you would need to calculate the fractional bits one by one. That would be completely different and considerably more complicated than this.
Point 1: There is a rule of thumb that logging coding decisions (e.g. prediction mode, MV, IPM, block size) is much easier at the decoder side than at the encoder. This is because of the super complicated RDO process at the encoder side, which can easily make you get lost in the loops. At the decoder side, by contrast, everything appears only once. However, if you insist on doing it at the encoder side, you may find some tips here: Get some information from HEVC reference software
Point 2: Unlike coding decisions, logging rate (i.e. the number of written bits for different syntax elements) is more complicated at the decoder side than at the encoder. This is particularly true for the fractional bits associated with anything that is encoded in non-EP mode (i.e. with CABAC contexts). So you may do this part at the encoder side. But I am afraid it is not easy.
Point 3: I think the best way to understand the code is to read it line by line. It's very time-consuming, but if you theoretically know the standard(s), you will probably be able to distinguish the important parts and ignore the rest.
PS: I think there are too many questions, mostly too general, in your post. It makes it a bit difficult for me to answer them all together. So I'll wait for you to take your next step and ask more precise questions.

Measure difference between two files

I have a question that's very specific, yet very general at the same time. (Also, I don't know if this is quite the right site for this.)
The Scenario
Let's say I have an uncompressed video vid.avi. It is then run through [Some compression algorithm], which is lossy. I want to compare vid.avi and the new, compressed file to determine just how much data was lost in the compression. How can I compare the files and how can I measure the difference between the two, using the original as the reference point? Is it possible at all? I would prefer a generic answer that will work with any language, but I would also gladly accept an answer that's specific to a language.
EDIT: Let me be more specific. I want something that compares two video files in a similar way that the Notepad++ Compare plugin compares text files. I just want to find out how close each individual pixel's colour is to the original file's colour for that pixel.
Thanks in advance, and thank you for taking the time to read this question.
It is generally the change in video quality that people want to measure when comparing compression methods, rather than a loss of data.
If you did want to measure the data loss somehow, you would have to define what you mean by 'data' and how you want to measure it. Video compression is quite complex, and the approach may even differ frame by frame within a video. 'Data' could mean the colour depth for each pixel, the number of frames per second, whether a frame is encoded as a delta relative to other frames, etc.
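If you do want one concrete, objective per-pixel measure, the simplest starting point is the mean squared error between co-located pixels of the original and the decoded compressed video, usually reported as PSNR. A minimal sketch, assuming both videos have already been decoded to raw 8-bit frames of identical dimensions:

#include <cmath>
#include <cstddef>
#include <vector>

// PSNR (in dB) between two raw 8-bit frames of equal size.
double framePsnr(const std::vector<unsigned char>& ref,
                 const std::vector<unsigned char>& test)
{
    double sse = 0.0;
    for (std::size_t i = 0; i < ref.size(); ++i) {
        const double d = double(ref[i]) - double(test[i]);
        sse += d * d;                                    // squared per-pixel difference
    }
    const double mse = sse / double(ref.size());
    if (mse == 0.0) return INFINITY;                     // identical frames
    return 10.0 * std::log10(255.0 * 255.0 / mse);       // 255 is the peak value for 8-bit samples
}

Averaged over all frames this gives a single number, but it is exactly the kind of measure that has to be defined up front, and it only loosely correlates with perceived quality.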
Video quality is subjective, so the reduction in quality after compression will not be an absolute value. The usual way to measure the quality is similar to the technique used for audio - Mean Opinion Score: https://en.wikipedia.org/wiki/Mean_opinion_score. It essentially uses a well-defined process to try to apply some objectivity to a test audience's subjective experience.

Images and Filters in OpenCL

Let's say I have an image called Test.jpg.
I just figured out how to bring an image into the project by the following line:
FILE *infile = fopen("Stonehenge.jpg", "rb");
Now that I have the file, do I need to convert this file into a bmp image in order to apply a filter to it?
I have never worked with images before, let alone OpenCl so there is a lot that is going over my head.
I need further clarification on this part for my own understanding
Does this bmp image also need to be stored in an array in order to have a filter applied to it? I have seen a sliding window technique be used a couple of times in other examples. Is the bmp image pretty much split up into RGB values (0-255)? If someone can provide a link on this item that should help me understand this a lot better.
I know this may seem like a basic question to most but I do not have a mentor on this subject in my workplace.
Now that I have the file, do I need to convert this file into a bmp image in order to apply a filter to it?
Not exactly. bmp is a very specific image serialization format, and actually quite a complicated one (implementing a BMP file parser that deals with all the corner cases correctly is rather difficult).
However, what you have there so far is not even file content data. What you have there is a C stdio FILE handle and that's it. So far you did not even check whether the file could be opened. That's not really useful.
JPEG is a lossy compressed image format. What you need to be able to "work" with it is a pixel value array. Either an array of component tuples, or a number of arrays, one for each component (depending on your application either format may perform better).
Now, implementing image format decoders is tedious work. It's not exactly difficult, but also not something you can write down in a single evening. Of course the devil is in the details, and writing an implementation that is high quality, covers all corner cases and is fast is a major effort. That's why for every image (and video and audio) format out there you can usually find only a small number of encoder and decoder implementations. The de-facto standard codec libraries for JPEG are libjpeg and libjpeg-turbo. If your aim is to read just JPEG files, then these libraries would be the go-to implementations. However, you may also want to support PNG files, and then maybe EXR and so on, and then things become tedious again. So there are meta-libraries which wrap all those format-specific libraries and offer them through a universal API.
In the OpenGL wiki there's a dedicated page on the current state of image loader libraries: https://www.opengl.org/wiki/Image_Libraries
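As a rough sketch of the libjpeg route, decoding a JPEG file into an interleaved 8-bit pixel array could look like this (the calls are libjpeg's standard C API; depending on your platform you may need to wrap the include in extern "C", and error handling is left to libjpeg's default handler, which aborts on fatal errors):

#include <cstdio>
#include <vector>
#include <jpeglib.h>   // libjpeg or libjpeg-turbo

std::vector<unsigned char> loadJpeg(const char* path, int& width, int& height, int& channels)
{
    std::vector<unsigned char> pixels;
    FILE* f = std::fopen(path, "rb");
    if (!f) return pixels;                       // always check that the file actually opened

    jpeg_decompress_struct cinfo;
    jpeg_error_mgr jerr;
    cinfo.err = jpeg_std_error(&jerr);
    jpeg_create_decompress(&cinfo);
    jpeg_stdio_src(&cinfo, f);
    jpeg_read_header(&cinfo, TRUE);
    jpeg_start_decompress(&cinfo);

    width    = cinfo.output_width;
    height   = cinfo.output_height;
    channels = cinfo.output_components;          // 3 for RGB, 1 for grayscale
    pixels.resize(std::size_t(width) * height * channels);

    while (cinfo.output_scanline < cinfo.output_height) {
        unsigned char* row = &pixels[std::size_t(cinfo.output_scanline) * width * channels];
        jpeg_read_scanlines(&cinfo, &row, 1);    // decode one scanline at a time
    }

    jpeg_finish_decompress(&cinfo);
    jpeg_destroy_decompress(&cinfo);
    std::fclose(f);
    return pixels;
}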
Does this bmp image also need to be stored in an array in order to have a filter applied to it?
That actually depends on the kind of filter you want to apply. A simple threshold filter for example does not take a pixel's surroundings into account. If you were to perform scanline signal processing (e.g. when processing old analogue television signals) you may require only a single row of pixels at a time.
The universal solution of course is to keep the whole image in memory, but then some pictures are so HUGE that no average computer's RAM can hold them. There are image processing libraries, like VIPS, that implement processing graphs which can operate on small subregions of an image at a time and can be executed independently.
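To make the threshold example above concrete, here is a minimal sketch of such a pointwise filter over an 8-bit grayscale pixel array (the function name is made up for illustration):

#include <cstddef>

void thresholdInPlace(unsigned char* pixels, std::size_t count, unsigned char threshold)
{
    for (std::size_t i = 0; i < count; ++i)
        pixels[i] = (pixels[i] >= threshold) ? 255 : 0;   // each output depends on exactly one input pixel
}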
Is the bmp image pretty much split up into RGB values (0-255)? If someone can provide a link on this item that should help me understand this a lot better.
In case you mean "pixel array" instead of BMP (remember, BMP is a specific data structure), then no. Pixel component values may be of any scalar type and value range. And there are in fact colour spaces in which there are value regions which are mathematically necessary but do not denote actually sensible colours.
When it comes down to pixel data, an image is just an n-dimensional array of scalar component tuples where each component's value lies in a given range of values. It doesn't get more specific than that. Only when you introduce colour spaces (RGB, CMYK, YUV, CIE-Lab, CIE-XYZ, etc.) do you give those values a specific colour meaning. And the choice of data type is more or less arbitrary. You can use 8 bits per component RGB (0..255), 10 bits (0..1023) or floating point (0.0 .. 1.0); the choice is yours.

How could we get a variable value from GLSL?

I'm doing a project with a lot of calculation, and my idea is to offload pieces of the work to the GPU. But I wonder whether we can retrieve results from GLSL, and if it is possible, how?
GLSL does not provide outputs besides what is placed in the frame buffer.
To program a GPU and get results more conveniently, use CUDA (NVidia only) or OpenCL (cross-platform).
In general, what you want to do is use OpenCL for general-purpose GPU tasks. However, if you are insistent about pretending that OpenGL is not a rendering API...
Framebuffer Objects make it relatively easy to render to multiple outputs. This of course means that you have to structure your processing such that what gets rendered matches what you want. You can render to 32-bit floating-point "images", so you have access to plenty of precision. The biggest difficulty is what I stated: figuring out how to structure your task to match rendering.
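For illustration, rendering to a 32-bit floating-point attachment and reading the result back might look roughly like this (assuming an OpenGL 3.x context and a loader such as GLEW; the actual computation, i.e. the shaders and the draw call, is omitted):

#include <GL/glew.h>
#include <vector>

std::vector<float> renderAndReadBack(int width, int height)
{
    GLuint tex = 0, fbo = 0;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA32F, width, height, 0,
                 GL_RGBA, GL_FLOAT, nullptr);               // full float precision per channel

    glGenFramebuffers(1, &fbo);
    glBindFramebuffer(GL_FRAMEBUFFER, fbo);
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                           GL_TEXTURE_2D, tex, 0);

    // ... bind your shader program and draw a full-screen quad that performs the computation ...

    std::vector<float> result(std::size_t(width) * height * 4);
    glReadPixels(0, 0, width, height, GL_RGBA, GL_FLOAT, result.data());  // copy the output back to the CPU
    glBindFramebuffer(GL_FRAMEBUFFER, 0);
    glDeleteFramebuffers(1, &fbo);
    glDeleteTextures(1, &tex);
    return result;
}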
It's a bit easier when using transform feedback. This is the ability to write the output of the vertex (or geometry) shader processing to a buffer object. This still requires structuring your tasks into something like rendering, but it's easier because vertex shaders have a strict one-vertex-to-one-vertex mapping. For every input vertex, there is exactly one output. And if you draw GL_POINTS, it's not too difficult to use attributes to pass the data that changes.
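A sketch of that transform feedback path (again assuming a GL 3.x context; program and numPoints are assumed to exist, and "result" is a made-up name for the vertex shader output you want to capture):

// Before linking, tell GL which vertex shader output to capture.
const char* varyings[] = { "result" };
glTransformFeedbackVaryings(program, 1, varyings, GL_INTERLEAVED_ATTRIBS);
glLinkProgram(program);

// Allocate a buffer that will receive one float per input vertex.
GLuint tfo = 0;
glGenBuffers(1, &tfo);
glBindBuffer(GL_TRANSFORM_FEEDBACK_BUFFER, tfo);
glBufferData(GL_TRANSFORM_FEEDBACK_BUFFER, numPoints * sizeof(float), nullptr, GL_STATIC_READ);

glEnable(GL_RASTERIZER_DISCARD);                 // we only want the captured values, no rasterization
glBindBufferBase(GL_TRANSFORM_FEEDBACK_BUFFER, 0, tfo);
glBeginTransformFeedback(GL_POINTS);
glDrawArrays(GL_POINTS, 0, numPoints);           // one output value per input vertex
glEndTransformFeedback();
glDisable(GL_RASTERIZER_DISCARD);

// Read the captured values back to the CPU.
std::vector<float> results(numPoints);
glGetBufferSubData(GL_TRANSFORM_FEEDBACK_BUFFER, 0, numPoints * sizeof(float), results.data());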
Both easier and harder is the use of shader_image_load_store. This is effectively the ability to read/write from/to arbitrary images "whenever you want". I put that last part in quotes because there are lots of esoteric rules about data race conditions: reading from a value written by another shader invocation and so forth. These are not trivial to deal with. You can try to structure your code to avoid them, by not writing to the same image location in the same shader. But in many cases, if you could do that, you could just render to the framebuffer.
Ultimately, it's pretty much impossible to answer this question in the general case, without knowing what exactly you're trying to actually do. How you approach GPGPU through a rendering API depends greatly on exactly what you're trying to compute.
