Which video encoding algorithm should I use for a video with just one static image and sound? - video-processing

I'm doing video processing tasks and one of the problems I need to solve is choosing the appropriate encoding algorithm for a video that has just one static image throughout the entire video.
So far I have tried several codecs, such as DivX and XviD, but they produce a 3MB file for a 1-minute video. The audio is 64kbit/s MP3, so the audio takes just 480KB. So the video stream is 2.5MB!
As the image in the video is not changing, it could be compressed really efficiently as there is no motion. The image size itself (it's a jpg) is just 50KB.
So ideally I'd expect this video to be about 550KB - 600KB and not 3MB.
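As a rough check of the numbers above, the size budget can be worked out like this. It is only a back-of-the-envelope sketch: the function name is made up for illustration, 1 KB is assumed to be 1000 bytes, and container overhead is ignored (which is why the estimate lands a little below the 550KB - 600KB range).

```python
def audio_size_kb(bitrate_kbit_per_s: float, duration_s: float) -> float:
    """Size of a constant-bitrate audio stream in kilobytes (1 KB = 1000 bytes)."""
    return bitrate_kbit_per_s * duration_s / 8  # 8 bits per byte


audio_kb = audio_size_kb(64, 60)   # 64 kbit/s MP3, one minute
image_kb = 50                      # the single JPEG frame
ideal_total_kb = audio_kb + image_kb

print(audio_kb, ideal_total_kb)
```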
Any ideas about how I could optimize the video so it's not that huge?
I hope this is the right stackexchange forum to ask this question.

Set the frames-per-second to be very low. Lower than 1fps if you can. Your goal would be to get as close to two keyframes (one at the start, and one at the end) as possible.
Whether you can do this depends on the scheme/codec you are using, and also the encoder.
Many codecs will have keyframe-related options. For example, here are some open-source encoders:
lavc (libavcodec):
keyint=<0-300> - maximum interval between keyframes in frames (default: 250 or one keyframe every ten seconds in a 25fps movie.
This is the recommended default for MPEG-4). Most codecs require regular keyframes in order to limit the accumulation of mismatch error. Keyframes are also needed for seeking, as seeking is only possible to a keyframe - but keyframes need more space than other frames, so larger numbers here mean slightly smaller files but less precise seeking. 0 is equivalent to 1, which makes every frame a keyframe. Values >300 are not recommended as the quality might be bad depending upon decoder, encoder and luck. It is common for MPEG-1/2 to use values <=30.
xvidenc:
max_key_interval= - maximum interval between keyframes (default: 10*fps)
Interestingly, this solution may reduce the ability to seek in the file, so you will want to test that.
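To see why a low frame rate helps, here is a small sketch that counts keyframes for a clip. It assumes, for illustration only, that the encoder emits a keyframe exactly every `keyint` frames starting with the first frame; real encoders may insert extra keyframes for other reasons. The function name is hypothetical.

```python
import math


def keyframe_count(duration_s: float, fps: float, keyint: int) -> int:
    """Keyframes emitted if the encoder inserts one exactly every `keyint` frames."""
    total_frames = math.ceil(duration_s * fps)
    return math.ceil(total_frames / keyint)


# One-minute clip with a maximum keyframe interval of 250 frames
# (the lavc default quoted above).
at_25_fps = keyframe_count(60, 25, 250)  # 1500 frames in total
at_1_fps = keyframe_count(60, 1, 250)    # 60 frames in total

print(at_25_fps, at_1_fps)
```

Under these assumptions the 25fps encode carries several keyframes while the 1fps encode needs only one, which is the effect the answer is after.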

I think this problem is related to the implementation of video encoder, not the video encoding standard itself.
Most video encoder implementations are designed for processing "natural" video, not videos of a static image, so they will not produce the ideal bitstream we might imagine when given a static image as input.
If you really need a better encoding result for a static-image video, you could hack an open-source video encoder so that, from the 2nd frame on, it marks all macroblocks (MBs) as "skip"...
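The "skip" idea can be illustrated with a toy model. This is not how any particular encoder is implemented; real skip decisions in MPEG-4 or H.264 also involve motion vector prediction and rate control. The sketch only shows that, for a static image, every block after the first frame is identical to the co-located block in the previous frame. The function name and the frame representation (lists of rows of integers) are assumptions made for the example.

```python
def block_modes(prev_frame, cur_frame, block=16):
    """Return one mode per block: 'skip' if the block is unchanged, else 'coded'.

    Frames are lists of rows of integer sample values with identical dimensions.
    """
    height, width = len(cur_frame), len(cur_frame[0])
    modes = []
    for top in range(0, height, block):
        for left in range(0, width, block):
            same = all(
                cur_frame[y][left:left + block] == prev_frame[y][left:left + block]
                for y in range(top, min(top + block, height))
            )
            modes.append("skip" if same else "coded")
    return modes


still = [[128] * 32 for _ in range(32)]   # a flat 32x32 "image", i.e. four 16x16 blocks
changed = [row[:] for row in still]
changed[0][0] = 0                         # alter one sample in the top-left block

print(block_modes(still, still))    # every block can be skipped
print(block_modes(still, changed))  # only the top-left block needs coding
```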

Related

Set webcam grabbing resolution with ImageIO in Python

Using simple Windows/Python to read from Webcam:
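import imageio as iio  # the top-level (v2) API provides get_reader
camera = iio.get_reader("<video0>")  # open the first webcam
screenshot = camera.get_data(0)  # grab a single frame
camera.close()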
I'm getting a default resolution of 1980x1920. The webcam had different, larger resolutions available. How do I set that UP?
ALSO -
How do I set exposure time? image comes out pretty dark.
Thanks
You can set the resolution via the size kwarg, e.g. size=(1280, 720)
My webcam is my third device and the resolution defaults to 640x360 but has 1280x720 available, so I would do something like:
import imageio.v3 as iio
frame = iio.imread("<video2>", index=0, size=(1280, 720))
On a tangent, I'd also suggest switching to the easier iio.imiter for stream reading. It tends to produce cleaner code than the old iio.get_reader syntax.
import imageio.v3 as iio
for idx, frame in enumerate(iio.imiter("<video0>", size=(1980, 1920))):
    ...  # do something with the frame
    if idx == 9:
        # read 10 frames
        break
Response to your edit:
Setting webcam exposure is a question that actually hasn't come up yet. Webcams typically feature automatic brightness adjustment, but that might take a few frames depending on the webcam's quality.
Manual adjustment might already be possible and I just don't know about it (never looked into it). This is a separate question though and is probably better tracked as a new issue over at the ImageIO repo.

Measure difference between two files

I have a question that's very specific, yet very general at the same time. (Also, I don't know if this is quite the right site for this.)
The Scenario
Let's say I have an uncompressed video vid.avi. It is then run through [Some compression algorithm], which is lossy. I want to compare vid.avi and the new, compressed file to determine just how much data was lost in the compression. How can I compare the files and how can I measure the difference between the two, using the original as the reference point? Is it possible at all? I would prefer a generic answer that will work with any language, but I would also gladly accept an answer that's specific to a language.
EDIT: Let me be more specific. I want something that compares two video files in a similar way that the Notepad++ Compare plugin compares text files. I just want to find out how close each individual pixel's colour is to the original file's colour for that pixel.
Thanks in advance, and thank you for taking the time to read this question.
It is generally the change in video quality that people want to measure when comparing compression methods, rather than a loss of data.
If you did want to measure the data loss somehow, you would have to define what you mean by 'data' and how you want to measure it. Video compression is quite complex, and the approach may even differ frame by frame within a video. Data could mean the colour depth for each pixel, the number of frames per second, whether a frame is encoded as a delta to other frames, etc.
Video quality is subjective, so the reduction in quality after compression will not be an absolute value. The usual way to measure the quality is similar to the technique used for audio - Mean Opinion Score: https://en.wikipedia.org/wiki/Mean_opinion_score. It essentially uses a well-defined process to try to apply some objectivity to a test audience's subjective experience.
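For the per-pixel comparison asked about in the edit to the question, an objective measure such as mean squared error (MSE) or PSNR can be computed directly. This is a different technique from the Mean Opinion Score above and says nothing about perceived quality. The sketch below assumes both videos have already been decoded into frames of equal size and that each frame is available as a flat sequence of 8-bit sample values; the decoding step and the sample data are not from the original post.

```python
import math


def mse(original, compressed):
    """Mean squared error between two equally sized flat sequences of samples."""
    if len(original) != len(compressed):
        raise ValueError("frames must have the same number of samples")
    return sum((a - b) ** 2 for a, b in zip(original, compressed)) / len(original)


def psnr(original, compressed, max_value=255):
    """Peak signal-to-noise ratio in dB; infinite when the frames are identical."""
    error = mse(original, compressed)
    if error == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / error)


reference = [10, 20, 30, 40]   # made-up samples standing in for a frame of vid.avi
degraded = [12, 18, 30, 44]    # the same samples after lossy compression

print(mse(reference, degraded))
print(psnr(reference, degraded))
```

Running this per frame and averaging gives a single number for the whole clip; a higher PSNR means the compressed pixels are closer to the original.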

Images and Filters in OpenCL

Let's say I have an image called Test.jpg.
I just figured out how to bring an image into the project by the following line:
FILE *infile = fopen("Stonehenge.jpg", "rb");
Now that I have the file, do I need to convert this file into a bmp image in order to apply a filter to it?
I have never worked with images before, let alone OpenCL, so there is a lot that is going over my head.
I need further clarification on this part for my own understanding
Does this bmp image also need to be stored in an array in order to have a filter applied to it? I have seen a sliding window technique be used a couple of times in other examples. Is the bmp image pretty much split up into RGB values (0-255)? If someone can provide a link on this item that should help me understand this a lot better.
I know this may seem like a basic question to most but I do not have a mentor on this subject in my workplace.
Now that I have the file, do I need to convert this file into a bmp image in order to apply a filter to it?
Not exactly. bmp is a very specific image serialization format and actually a quite complicated one (implementing a BMP file parser that deals with all the corner cases correctly is actually rather difficult).
However what you have there so far is not even file content data. What you have there is a C stdio FILE handle and that's it. So far you did not even check if the file could be opened. That's not really useful.
JPEG is a lossy compressed image format. What you need to be able to "work" with it is a pixel value array. Either an array of component tuples, or a number of arrays, one for each component (depending on your application either format may perform better).
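The two layouts mentioned above (an array of component tuples versus one array per component) can be sketched as follows. The helper names are made up for this example, and the pixel values are arbitrary; the input is assumed to be non-empty.

```python
def to_planar(interleaved):
    """Split a non-empty list of (r, g, b) tuples into three per-component lists."""
    reds, greens, blues = (list(channel) for channel in zip(*interleaved))
    return reds, greens, blues


def to_interleaved(reds, greens, blues):
    """Recombine three per-component lists into a list of (r, g, b) tuples."""
    return list(zip(reds, greens, blues))


pixels = [(255, 0, 0), (0, 255, 0), (0, 0, 255)]   # array of component tuples
planes = to_planar(pixels)                          # one array per component

print(planes)
print(to_interleaved(*planes) == pixels)
```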
Now implementing image format decoders becomes tedious. It's not exactly difficult, but also not something you can write in a single evening. Of course the devil is in the details, and writing an implementation that is high quality, covers all corner cases and is fast is a major effort. That's why for every image (and video and audio) format out there you can usually find only a small number of encoder and decoder implementations. The de-facto standard codec libraries for JPEG are libjpeg and libjpeg-turbo. If your aim is to read just JPEG files, then these libraries would be the go-to implementations. However, you may also want to support PNG files, and then maybe EXR and so on, and then things become tedious again. So there are meta-libraries which wrap all those format-specific libraries and offer them through a universal API.
In the OpenGL wiki there's a dedicated page on the current state of image loader libraries: https://www.opengl.org/wiki/Image_Libraries
Does this bmp image also need to be stored in an array in order to have a filter applied to it?
That actually depends on the kind of filter you want to apply. A simple threshold filter for example does not take a pixel's surroundings into account. If you were to perform scanline signal processing (e.g. when processing old analogue television signals) you may require only a single row of pixels at a time.
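The difference between a filter that ignores a pixel's surroundings and one that needs them can be sketched like this. It is plain Python on a tiny made-up grayscale image, not OpenCL, and the function names are only for illustration; the box blur stands in for the "sliding window" technique mentioned in the question.

```python
def threshold(image, level):
    """Point filter: each output pixel depends only on the same input pixel."""
    return [[255 if value >= level else 0 for value in row] for row in image]


def box_blur_3x3(image):
    """Neighbourhood filter: each output pixel averages its 3x3 surroundings.

    Pixels outside the image are ignored, so edges average fewer samples.
    """
    height, width = len(image), len(image[0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [
                image[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            ]
            row.append(sum(window) // len(window))
        result.append(row)
    return result


gray = [[0, 0, 0], [0, 90, 0], [0, 0, 0]]   # a made-up 3x3 grayscale image
print(threshold(gray, 50))
print(box_blur_3x3(gray))
```

The threshold could be applied to one pixel at a time as data streams in, while the blur needs at least three rows available at once.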
The universal solution is of course to keep the whole image in memory, but some pictures are so HUGE that no average computer's RAM can hold them. There are image processing libraries like VIPS that implement processing graphs that can operate on small subregions of an image at a time and can be executed independently.
Is the bmp image pretty much split up into RGB values (0-255)? If someone can provide a link on this item that should help me understand this a lot better.
In case you mean "pixel array" instead of BMP (remember, BMP is a specific data structure), then no. Pixel component values may be of any scalar type and value range. And there are in fact colour spaces in which there are value regions which are mathematically necessary but do not denote actually sensible colours.
When it comes down to pixel data, an image is just an n-dimensional array of scalar component tuples where each component's value lies in a given range of values. It doesn't get more specific than that. Only when you introduce colour spaces (RGB, CMYK, YUV, CIE-Lab, CIE-XYZ, etc.) do you give those values a specific colour meaning. And the choice of data type is more or less arbitrary. You can use 8 bits per component RGB (0..255), 10 bits (0..1023) or floating point (0.0 .. 1.0); the choice is yours.
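As a small illustration of how these representations relate, an unsigned integer component with n bits tops out at 2^n - 1 and can be mapped onto the 0.0 .. 1.0 floating point range by dividing by that maximum. The function below is a hypothetical helper written for this example, not part of any library.

```python
def to_unit_float(value: int, bits: int) -> float:
    """Map an unsigned integer component with `bits` bits onto 0.0 .. 1.0."""
    max_value = (1 << bits) - 1   # 255 for 8 bits, 1023 for 10 bits
    if not 0 <= value <= max_value:
        raise ValueError(f"{value} does not fit in {bits} bits")
    return value / max_value


print(to_unit_float(255, 8))    # full-scale 8-bit component
print(to_unit_float(1023, 10))  # full-scale 10-bit component
print(to_unit_float(0, 8))      # zero maps to zero at any bit depth
```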

Most performant image format for SCNParticles?

I've been using 24bit .png with Alpha, from Photoshop, and just tried a .psd which worked fine with OpenGL ES, but Metal didn't see the Alpha channel.
What's the absolutely most performant texture format for particles within SceneKit?
Here's a sheet to test on, if needs be.
It looks white... right click and save as in the blank space. It's an alpha heavy set of rings. You can probably barely make them out if you squint at the screen:
exaggerated example use case:
https://www.dropbox.com/s/vu4dvfl0aj3f50o/circless.mov?dl=0
// Additional points for anyone who can guess the difference between the left and right rings in the video.
Use a grayscale/alpha PNG, not an RGBA one. Since it uses 16 bits per pixel (8+8) instead of 32 (8+8+8+8), the initial texture load will be faster and it may (depending on the GPU) use less memory as well. At render time, though, you’re not going to see much of a speed difference, since whatever the texture format is it’s still being drawn to a full RGB(A) render buffer.
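To put rough numbers on the 16 versus 32 bits per pixel difference, here is a quick calculation of the uncompressed texture size. The 1024x1024 dimensions are purely hypothetical (the actual sheet size is not stated), mipmaps and padding are ignored, and whether the GPU really keeps the texture at 16 bits per pixel depends on the hardware, as noted above.

```python
def texture_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Uncompressed size of a texture in bytes, ignoring mipmaps and padding."""
    return width * height * bits_per_pixel // 8


# Hypothetical 1024x1024 particle sheet.
gray_alpha = texture_bytes(1024, 1024, 16)   # 8-bit gray + 8-bit alpha
rgba = texture_bytes(1024, 1024, 32)         # 8 bits each for R, G, B, A

print(gray_alpha, rgba, rgba // gray_alpha)
```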
There’s also PVRTC, which can get you down as low as 2–4 bits per pixel, but I tried Imagine’s tool out on your image and even the highest quality settings caused a bunch of artifacts.
Long story short: go with a grayscale+alpha PNG, which you can easily export from Photoshop. If your particle system is hurting your frame rate, reduce the number and/or size of the particles—in this case you might be able to get away with layering a couple of your particle images on top of each other in the source texture atlas, which may not be too noticeable if you pick ones that differ in size enough.

H.264 Video Encoding

I'm working on a video encoding component that is supposed to transcode a stream from resolution X to resolution Y (downscaling) and stream it over the network.
I'm getting an encoded stream which I need to decode, rescale and encode again.
What I'm thinking of doing in order to reduce CPU usage is to decode only the key-frames and then do the rescaling and encoding.
Would it be even more beneficial, CPU-wise, to also encode only key-frames? That is, each decoded key-frame would be encoded as a key-frame.
Thanks.
This sounds like a good (patentable) idea! However, most codecs don't really support this right now. Given a sequence, the resolution of all frames must be the same; the resolution of a key frame cannot differ from that of the other frames. Partly this is needed because of the range of motion compensation algorithms involved in constructing P and B frames from I and P frames (key frames are a.k.a. IDR frames in H.264).
To my knowledge, H.264 doesn't support this either. I'd be happy to learn if it is possible.
I do not understand this question. If you decode, rescale and encode only the keyframes, only 1 frame in 30 (assuming a key frame interval of 30) will be rescaled. Is that what you want? That is 3.3% of the stream. What purpose would this serve? Key frames in video compression mean Intra/IDR frames.

Resources