H.264 Video Encoding - video-processing

I'm working on a video encoding component that is supposed to transcode a stream from resolution X to resolution Y (downscaling) and stream it over the network.
I'm getting an encoded stream which I need to decode, rescale and encode again.
What I'm thinking of doing in order to reduce CPU usage is to decode only the key frames and then rescale and encode them.
Would it also be beneficial, CPU-wise, to encode only key frames? That is, each decoded key frame would be encoded as a key frame.
Thanks.

This sounds like a good (patentable) idea! However, most codecs don't really support this right now. Within a given sequence, the resolution of all frames must be the same; the resolution of a key frame cannot differ from that of the other frames. This is partly required by the motion compensation algorithms involved in constructing P and B frames from I frames (a.k.a. IDR frames in H.264) and P frames.
To my knowledge, H.264 doesn't support this either. I'd be happy to learn if it is possible.

I do not understand this question. If you decode, rescale and encode only the key frames, only 1 frame in 30 (assuming a key frame interval of 30) will be rescaled. Is that what you want? That is 3.3% of the stream. What purpose would this serve? Key frames in video compression mean Intra/IDR frames.
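Picking out only the key frames before decoding, as the question proposes, can be sketched as a scan over the encoded bytes. This is a minimal sketch, assuming the stream uses H.264 Annex B framing (NAL units separated by 00 00 01 start codes) rather than length-prefixed MP4/AVCC framing; the function name count_idr_nal_units is hypothetical. It relies on the fact that the low 5 bits of the first byte after a start code hold nal_unit_type, and that type 5 is a coded slice of an IDR picture.

```c
#include <stddef.h>
#include <stdint.h>

/* Count NAL units of type 5 (coded slice of an IDR picture) in an
 * H.264 Annex B byte stream. An IDR picture may be split into several
 * slices, so this counts IDR slice NAL units, not pictures. */
size_t count_idr_nal_units(const uint8_t *buf, size_t len)
{
    size_t count = 0;
    for (size_t i = 0; i + 3 < len; i++) {
        /* 3-byte start code 00 00 01 (a 4-byte start code ends the same way) */
        if (buf[i] == 0 && buf[i + 1] == 0 && buf[i + 2] == 1) {
            uint8_t nal_unit_type = buf[i + 3] & 0x1F; /* low 5 bits of NAL header */
            if (nal_unit_type == 5)
                count++;
            i += 2; /* continue after the start code */
        }
    }
    return count;
}
```

A real transcoder would hand only those NAL units (plus the SPS/PPS parameter sets, types 7 and 8) to the decoder and drop the rest, which gives the reduced frame rate described in the answer above.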

Related

How to locate position and get intermediate results in the source code on VTM?

I want to achieve deep learning-based video compression. But it's difficult to get the intermediate results. So I want to ask if there are some convenient methods.
Could you specify what you mean by 'intermediate results'?
If you mean the reconstructed frame in VTM, you can get its buffer from the picture class.
EncGOP encodes every frame in a GOP and runs the in-loop filters, so you can get an intermediate frame from EncGOP during encoding.
On the decoder side, you can get the same buffer in DecLib.
I hope this answer helps you.

Seeking some guidance on webcam picture display using GTK+ and Cairo in C

In this question I'm mostly seeking advice and guidance on the overall understanding of some concepts of drawing with GTK+ and Cairo in C (IMO information on the topic is rather scarce, and my experience is really modest).
I'm coding some pet application which captures frames from webcam and displays them on a GTK window.
My app is working, but there are some points which I don't feel I've grasped.
Overall process:
I've got a webcam frame as an array of bytes mmapped from the webcam device into my app's process memory. So when another frame is captured, what I have is a 640*480*3-byte array which is denoted as being in RGB24 format. After some searching, it looks like, for the purpose of displaying it in a GTK window, I need to create an object called a drawing area using gtk_drawing_area_new(), add a "draw" callback and do the "drawing" in that callback. According to Cairo, "drawing" is the process of applying a "source" to a "destination". I assume that I already have a source (my mmapped webcam pixels), but it looks like I need to use a "source" that Cairo is able to understand. I found a candidate:
cairo_surface_t* surface = cairo_image_surface_create(CAIRO_FORMAT_RGB24, 640, 480);
As I understand it, this call creates a Cairo-acceptable object and, along the way, allocates a buffer in my app's memory, which I can get using:
unsigned char* surface_data = cairo_image_surface_get_data(surface);
According to the docs, this is a 640x480x4-byte buffer which, on little-endian architectures, should be filled with BGRA-formatted pixel data.
Then I should rearrange my original webcam pixels for EVERY frame captured using this:
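cairo_surface_flush(surface); // finish pending drawing before touching the pixels
for (size_t idx_src = 0, idx_dst = 0; idx_src < 640*480*3; idx_dst += 4, idx_src += 3) {
    surface_data[idx_dst]     = image[idx_src + 2]; // B [3rd pos -> 1st pos]
    surface_data[idx_dst + 1] = image[idx_src + 1]; // G [no change]
    surface_data[idx_dst + 2] = image[idx_src];     // R [1st pos -> 3rd pos]
}
cairo_surface_mark_dirty(surface); // tell Cairo the pixel data was modified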
After this I should do "drawing" with:
cairo_set_source_surface(cr, surface, 0, 0);
cairo_paint(cr);
So, questions:
1. Is this what is supposed to be done for the task at hand, or am I missing something completely here?
2. What confuses me is that I should rearrange my original webcam pixels for EVERY frame captured (this presumably consumes some CPU time and could be a limiting factor for capturing in HD resolution at high frame rates). Is there some other way?
3. Let's suppose I somehow acquire pixels from the webcam in a Cairo-conforming format, e.g. 640x480x4 BGRA-formatted bytes. Is there a way to "wrap" this data in some Cairo-acceptable object to exclude the pixel rearranging part?
4. Any other thoughts I should consider?
Thanks for attention.
For most of your questions: Cairo only supports some image formats. Since your data comes in another format, you will have to convert it. All this copying around will likely be too slow. To make this work with an acceptable speed, you would need some other approach. No, I do not have any helpful suggestions here.
An unhelpful one would be: Is there some example for this webcam that you could look at?
Let's suppose I somehow acquire pixels from webcam in a Cairo conforming format, e.g. 640x480x4 BGRA formatted bytes. Is there a way to "wrap" this data in some Cairo acceptable object to exclude pixel rearranging part ?
Yup. cairo_image_surface_create_for_data.
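The conversion loop from the question can be written so that it does not depend on byte order or on the row stride. This is a minimal sketch, not a definitive implementation: the function name rgb_to_cairo_rgb24 is hypothetical, and it assumes the webcam rows are tightly packed (width * 3 bytes, no padding). CAIRO_FORMAT_RGB24 stores each pixel as one native-endian 32-bit word, 0x00RRGGBB, with the upper 8 bits unused, so writing whole words avoids the little-endian special case. The stride would normally come from cairo_format_stride_for_width(CAIRO_FORMAT_RGB24, width), and a buffer filled this way can be handed to cairo_image_surface_create_for_data(data, CAIRO_FORMAT_RGB24, width, height, stride) without Cairo copying it.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert packed 3-byte RGB pixels to the layout used by Cairo's
 * CAIRO_FORMAT_RGB24: one native-endian uint32_t per pixel (0x00RRGGBB).
 * dst_stride is the destination row length in bytes and must be a
 * multiple of 4; dst must be 4-byte aligned (Cairo buffers are).
 * Assumption: source rows are tightly packed, width * 3 bytes each. */
void rgb_to_cairo_rgb24(const uint8_t *src, uint8_t *dst,
                        int width, int height, int dst_stride)
{
    for (int y = 0; y < height; y++) {
        const uint8_t *s = src + (size_t)y * width * 3;
        uint32_t *d = (uint32_t *)(dst + (size_t)y * dst_stride);
        for (int x = 0; x < width; x++, s += 3)
            d[x] = ((uint32_t)s[0] << 16) | ((uint32_t)s[1] << 8) | s[2];
    }
}
```

If the buffer belongs to an existing surface, calling cairo_surface_mark_dirty on it after each conversion lets Cairo know the pixels changed.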

Measure difference between two files

I have a question that's very specific, yet very general at the same time. (Also, I don't know if this is quite the right site for this.)
The Scenario
Let's say I have an uncompressed video vid.avi. It is then run through [Some compression algorithm], which is lossy. I want to compare vid.avi and the new, compressed file to determine just how much data was lost in the compression. How can I compare the files and how can I measure the difference between the two, using the original as the reference point? Is it possible at all? I would prefer a generic answer that will work with any language, but I would also gladly accept an answer that's specific to a language.
EDIT: Let me be more specific. I want something that compares two video files in a similar way that the Notepad++ Compare plugin compares text files. I just want to find out how close each individual pixel's colour is to the original file's colour for that pixel.
Thanks in advance, and thank you for taking the time to read this question.
It is generally the change in video quality that people want to measure when comparing compression methods, rather than a loss of data.
If you did want to measure the data loss somehow, you would have to define what you mean by 'data' and how you want to measure it. Video compression is quite complex and the approach may even differ frame by frame within a video. Data could mean the colour depth of each pixel, the number of frames per second, whether a frame is encoded as a delta to other frames, etc.
Video quality is subjective, so the reduction in quality after compression will not be an absolute value. The usual way to measure the quality is similar to the technique used for audio, the Mean Opinion Score: https://en.wikipedia.org/wiki/Mean_opinion_score. It essentially uses a well-defined process to try to apply some objectivity to a test audience's subjective experience.
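For the per-pixel comparison described in the question's edit, a simple objective measure (not the subjective Mean Opinion Score discussed above) is the mean squared error between corresponding samples of two decoded frames. This is a minimal sketch under stated assumptions: both videos have already been decoded to raw 8-bit frames of the same size and pixel format, the frames are aligned one-to-one, and the function names frame_mse and frame_max_abs_diff are hypothetical. The commonly quoted PSNR figure can be derived from the result as 10 * log10(255^2 / MSE).

```c
#include <stddef.h>
#include <stdint.h>

/* Mean squared error between two raw 8-bit frames of n samples each.
 * 0.0 means the frames are identical. */
double frame_mse(const uint8_t *ref, const uint8_t *test, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = (double)ref[i] - (double)test[i];
        sum += d * d;
    }
    return n ? sum / (double)n : 0.0;
}

/* Largest absolute difference of any single sample (worst-case error). */
int frame_max_abs_diff(const uint8_t *ref, const uint8_t *test, size_t n)
{
    int max = 0;
    for (size_t i = 0; i < n; i++) {
        int d = (int)ref[i] - (int)test[i];
        if (d < 0)
            d = -d;
        if (d > max)
            max = d;
    }
    return max;
}
```

Running this frame by frame gives a difference curve over the length of the video, with the original as the reference.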

Which video encoding algorithm should I use for a video with just one static image and sound?

I'm doing video processing tasks and one of the problems I need to solve is choosing the appropriate encoding algorithm for a video that has just one static image throughout the entire video.
So far I have tried several codecs, such as DivX and XviD, but they produce a 3MB file for a 1-minute video. The audio is 64kbit/s mp3, so the audio takes just 480KB. So the video is 2.5MB!
As the image in the video is not changing, it could be compressed really efficiently as there is no motion. The image size itself (it's a jpg) is just 50KB.
So ideally I'd expect this video to be about 550KB - 600KB and not 3MB.
Any ideas about how I could optimize the video so it's not that huge?
I hope this is the right stackexchange forum to ask this question.
Set the frames-per-second to be very low. Lower than 1fps if you can. Your goal would be to get as close to two keyframes (one at the start, and one at the end) as possible.
Whether you can do this depends on the scheme/codec you are using, and also the encoder.
Many codecs will have keyframe-related options. For example, here are some open-source encoders:
lavc (libavcodec):
keyint=<0-300> - maximum interval between keyframes in frames (default: 250, or one keyframe every ten seconds in a 25fps movie). This is the recommended default for MPEG-4. Most codecs require regular keyframes in order to limit the accumulation of mismatch error. Keyframes are also needed for seeking, as seeking is only possible to a keyframe - but keyframes need more space than other frames, so larger numbers here mean slightly smaller files but less precise seeking. 0 is equivalent to 1, which makes every frame a keyframe. Values >300 are not recommended as the quality might be bad depending upon decoder, encoder and luck. It is common for MPEG-1/2 to use values <=30.
xvidenc:
max_key_interval= - maximum interval between keyframes (default: 10*fps)
Interestingly, this solution may reduce the ability to seek in the file, so you will want to test that.
I think this problem is related to the implementation of video encoder, not the video encoding standard itself.
Actually, most video encoder implementations are not designed for videos of a static image, so they will not produce the ideal bitstream we might imagine when given such input. Most video encoder implementations are designed for processing "natural" video.
If you really need a better encoding result for a video of a static image, you may hack an open source video encoder so that, from the 2nd frame on, it marks all MBs (macroblocks) as "skip"...
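The size figures in the question can be checked with simple arithmetic: 64 kbit/s for 60 seconds is 480,000 bytes, and the remaining 2.5MB over one minute corresponds to an average video bitrate of roughly 333 kbit/s. A minimal sketch of that calculation, assuming 1 kbit = 1000 bits and 1MB = 1,000,000 bytes; the function names are hypothetical:

```c
#include <stdint.h>

/* Bytes occupied by a constant-bitrate stream (1 kbit = 1000 bits). */
uint64_t stream_bytes(uint64_t kbit_per_s, uint64_t seconds)
{
    return kbit_per_s * 1000u * seconds / 8u;
}

/* Average bitrate in kbit/s implied by a size in bytes.
 * Integer division; seconds must be greater than zero. */
uint64_t average_kbit_per_s(uint64_t bytes, uint64_t seconds)
{
    return bytes * 8u / (seconds * 1000u);
}
```

The same helper shows what the target in the question implies: a 50KB image spread over one minute averages under 7 kbit/s, far below what the encoders tried were producing.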

Image scaling in C

I am designing a JPEG to BMP decoder which scales the image. I have been supplied with the source code for the decoder, so my actual work is to design a scaler. I do not know where to begin. I have scoured the internet for the various scaling algorithms but am not sure where to introduce the scaling. Should I do the scaling after the image is converted into BMP, or should I do it during decoding at the MCU level? I am confused :(
If you guys have some information to help me out, it's appreciated. Any material to read, source code to analyse, etc.
Oh I forgot to mention one more thing, this is a porting project from the pc platform to a fpga, so, not all the library files are available on the target platform.
There are many ways to scale an image.
The easiest way is to decode the image and then scale using a naive scaling algorithm, something like:
dest_pixel [x,y] = src_pixel [x * x_scale_factor, y * y_scale_factor]
where x/y_scale_factor is
src_size / dest_size
Once you have that working, you can look into more complex scaling systems, such as a bilinear filter. For example, the destination pixel is the average of several source pixels when reducing the size, and an interpolation of several source pixels when increasing the size.
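The naive algorithm above can be sketched in plain C with integer arithmetic only, which may suit a target where library support is limited. This is a minimal sketch under assumptions not stated in the answer: the decoded image is an 8-bit buffer with interleaved channels and tightly packed rows (BMP row padding and bottom-up row order would have to be handled separately), and the function name scale_nearest is hypothetical. The scale factor src_size / dest_size is applied as a multiply followed by a divide so that no fractional values are needed.

```c
#include <stddef.h>
#include <stdint.h>

/* Nearest-neighbour scaling:
 * dest_pixel[x, y] = src_pixel[x * src_w / dst_w, y * src_h / dst_h].
 * Assumption: rows are tightly packed, `channels` bytes per pixel. */
void scale_nearest(const uint8_t *src, int src_w, int src_h,
                   uint8_t *dst, int dst_w, int dst_h, int channels)
{
    for (int y = 0; y < dst_h; y++) {
        int sy = (int)((int64_t)y * src_h / dst_h);     /* y * y_scale_factor */
        for (int x = 0; x < dst_w; x++) {
            int sx = (int)((int64_t)x * src_w / dst_w); /* x * x_scale_factor */
            const uint8_t *sp = src + ((size_t)sy * src_w + sx) * channels;
            uint8_t *dp = dst + ((size_t)y * dst_w + x) * channels;
            for (int c = 0; c < channels; c++)
                dp[c] = sp[c];
        }
    }
}
```

Because each destination pixel reads exactly one source pixel, this works for both enlarging and reducing, but reducing simply drops pixels; averaging the skipped source pixels, as described above, gives smoother results.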

Resources