x264: How to access NAL units from the encoder? (C)

When I call
frame_size = x264_encoder_encode(encoder, &nals, &i_nals, &pic_in, &pic_out);
and subsequently write each NAL to a file like this:
if (frame_size >= 0)
{
    int i;
    for (i = 0; i < i_nals; i++)
    {
        printf("******************* NAL %d (%d bytes) *******************\n", i, nals[i].i_payload);
        fwrite(nals[i].p_payload, 1, nals[i].i_payload, fid);
    }
}
then the resulting file begins with human-readable parameter text before the video data.
My questions are:
1) Is it normal that there are readable parameters at the beginning of the file?
2) How do I configure the x264 encoder so that it returns frames I can send via UDP without the packets getting fragmented (the size must stay below roughly 1390 bytes)?
3) With the x264.exe I pass in these options:
"--threads 1 --profile baseline --level 3.2 --preset ultrafast --bframes 0 --force-cfr --no-mbtree --sync-lookahead 0 --rc-lookahead 0 --keyint 1000 --intra-refresh"
How do I map those to the settings in the x264 parameter structure (x264_param_t)?
4) I have been told that the x264 static library doesn't support bitmap input to the encoder, and that I have to use libswscale to convert the 24-bit RGB input bitmap to YUV2. The encoder supposedly only takes YUV2 as input. Is this true? If so, how do I build libswscale for the x264 static library?

1) Yes. x264 includes that automatically. It's an SEI NAL unit, and you can throw it away if you want.
2) Set i_slice_max_size = 1390; see the sketch after this list.
3) Take a look at x264_param_t in x264.h; the settings are fairly self-explanatory. For the preset and profile, call int x264_param_default_preset( x264_param_t *, const char *preset, const char *tune ) before configuring anything else, and int x264_param_apply_profile( x264_param_t *, const char *profile ) once the rest is set; the sketch after this list shows the whole mapping.
4) Yes, it is true; I wasn't lying when I said that. Look online/on Stack Overflow; there are a million resources on compiling FFmpeg. In fact, if you compiled x264 with avcodec support, you already have it on your system.
5) Yes! You should be a good Stack Overflow citizen and upvote and accept answers from people who donate their free time and knowledge (which takes years to acquire) to helping you.
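To make 2) and 3) concrete, here is a minimal sketch of how those command-line flags might map onto x264_param_t fields. The field names are taken from x264.h, but treat the mapping as an assumption to verify against your x264 version; the width/height handling is mine:

#include <x264.h>

/* Sketch: mirror "--threads 1 --profile baseline --level 3.2 --preset ultrafast
 * --bframes 0 --force-cfr --no-mbtree --sync-lookahead 0 --rc-lookahead 0
 * --keyint 1000 --intra-refresh" in x264_param_t. Verify the field names
 * against your x264.h; the ultrafast preset already implies several of these. */
static int setup_params(x264_param_t *param, int width, int height)
{
    if (x264_param_default_preset(param, "ultrafast", NULL) < 0)
        return -1;

    param->i_threads        = 1;    /* --threads 1 */
    param->i_level_idc      = 32;   /* --level 3.2 */
    param->i_bframe         = 0;    /* --bframes 0 */
    param->b_vfr_input      = 0;    /* --force-cfr */
    param->rc.b_mb_tree     = 0;    /* --no-mbtree */
    param->i_sync_lookahead = 0;    /* --sync-lookahead 0 */
    param->rc.i_lookahead   = 0;    /* --rc-lookahead 0 */
    param->i_keyint_max     = 1000; /* --keyint 1000 */
    param->b_intra_refresh  = 1;    /* --intra-refresh */

    param->i_slice_max_size = 1390; /* keep each NAL below one UDP payload (question 2) */

    param->i_width  = width;
    param->i_height = height;

    /* --profile baseline, applied after everything else */
    return x264_param_apply_profile(param, "baseline");
}

And for 4), a minimal libswscale sketch converting 24-bit RGB to the 4:2:0 planes x264 expects by default. It assumes pic_in was allocated with x264_picture_alloc(&pic_in, X264_CSP_I420, w, h); older FFmpeg headers spell the pixel formats PIX_FMT_* instead of AV_PIX_FMT_*:

#include <libswscale/swscale.h>

/* Sketch: convert one packed 24-bit RGB frame into the planes of an
 * x264 input picture. The SwsContext is created once and reused. */
void rgb24_to_i420(struct SwsContext **sws, const uint8_t *rgb,
                   int w, int h, x264_picture_t *pic_in)
{
    const uint8_t *src[1] = { rgb };
    int src_stride[1] = { 3 * w };

    if (!*sws)
        *sws = sws_getContext(w, h, AV_PIX_FMT_RGB24,
                              w, h, AV_PIX_FMT_YUV420P,
                              SWS_FAST_BILINEAR, NULL, NULL, NULL);
    sws_scale(*sws, src, src_stride, 0, h,
              pic_in->img.plane, pic_in->img.i_stride);
}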

Related

How to FFmpeg decode and extract metadata from last frame?

I am decoding using FFmpeg, in C code; the videos are H.264 or MPEG-4, and I am using the 32-bit libs. I have successfully decoded and extracted the metadata for the first frame. I would now like to decode the last frame. I have a defined duration of the video and felt it was a safe assumption to say that isLastFrame = duration. Here's what I have, any suggestions?
AVFormatContext* pFormatCtx = avformat_alloc_context();
avformat_open_input(&pFormatCtx, filename, NULL, NULL);
int64_t duration = pFormatCtx->duration;

int frameFinished = 0;
while (av_read_frame(pFormatCtx, &packet) >= 0) {
    /* Is this a packet from the video stream? */
    if (packet.stream_index == videoStream) {
        /* Decode video frame; the third argument must be an int flag,
           not the int64_t duration */
        avcodec_decode_video2(pCodecCtx, pFrame, &frameFinished, &packet);
    }
}
Any help is much appreciated! :)
Thanks everyone for your help, but I found that the reason the av_seek_frame duration wasn't working was that you must multiply it by 1000 for it to be usable in read frame. Also, note that I call my own decode_video wrapper instead of the standard decode calls because I was using the 32-bit libs, but if you plug in avcodec_decode_video2 it works just as well. Hopefully this will help fellow decoders in the future.
AVFormatContext *Format;  /* opened earlier with avformat_open_input() */
int64_t duration = Format->duration;
duration = duration * 1000;
if (av_seek_frame(Format, Packet.stream_index, duration, AVSEEK_FLAG_ANY) >= 0)
{
    /* read the frame and decode the packet */
    if (av_read_frame(Format, &Packet) >= 0)
    {
        /* decode the video frame (custom wrapper around avcodec_decode_video2) */
        decode_video(CodecContext, Frame, &duration, &Packet);
    }
}
This might be what you're looking for:
Codecs which have the CODEC_CAP_DELAY capability set have a delay
between input and output, these need to be fed with avpkt->data=NULL,
avpkt->size=0 at the end to return the remaining frames.
Link to FFmpeg documentation
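A minimal sketch of that draining loop, reusing pCodecCtx and pFrame from the code above (avcodec_decode_video2-era API, matching the question):

/* After av_read_frame() runs out of packets, flush the decoder:
 * codecs with CODEC_CAP_DELAY buffer frames internally and only
 * return them when fed an empty packet. */
AVPacket flush_pkt;
av_init_packet(&flush_pkt);
flush_pkt.data = NULL;
flush_pkt.size = 0;

int got_frame = 1;
while (got_frame) {
    if (avcodec_decode_video2(pCodecCtx, pFrame, &got_frame, &flush_pkt) < 0)
        break;
    if (got_frame) {
        /* pFrame holds a delayed frame; the last one returned here
         * is the final frame of the stream. */
    }
}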

List available capture formats

New to V4L, I decided to start using the video4linux2 library in order to capture a frame from my camera in C (I am using the uvcvideo module with a Ricoh Co. camera). I followed several guides and tutorials and managed to get a running program. My question is mainly about this usual line of code:
struct v4l2_format format = {0};
format.fmt.pix.pixelformat = V4L2_PIX_FMT_MJPEG;
// ...
This is where I set the actual video format to use when capturing. As you can see, in this sample I'm using MJPEG (http://lxr.free-electrons.com/source/include/uapi/linux/videodev2.h#L390). Even though this might be a great format and all, my application will probably require simple RGB formatting, pixel per pixel, I guess. For this reason, I tried using RGB format constants such as V4L2_PIX_FMT_RGB24. Now for some reason... v4l2 doesn't like it. I'm guessing this is hardware-related, but I'd like to avoid MJPEG manipulation as much as possible. For testing purposes, I tried other constants and formats, but no matter what I did, v4l2 kept changing the pixelformat field's value:
xioctl(fd, VIDIOC_S_FMT, &format); /* This call succeeds with errno != EINTR. */
if (format.fmt.pix.pixelformat != V4L2_PIX_FMT_RGB24) {
    /* My program always enters this block when not using MJPEG. */
    fprintf(stderr, "Format wasn't accepted by v4l2.");
    exit(4);
}
Now my question is: is there a way I could get a list of accepted video formats (and I mean accepted by my camera/v4l2), from which I could pick something other than MJPEG? If you think I have to stick with MJPEG, would you recommend a library that allows me to manipulate it and, eventually, draw the capture back into a GUI frame?
Barbarian test code
I used the following trick to test all available formats on my hardware. First, some shell script to get a list of all formats...
grep 'V4L2_PIX_FMT' /usr/include/linux/videodev2.h | grep define | tr '\t' ' ' | cut -d' ' -f2 | sed 's/$/,/g'
... the output of which is used in this C program :
int formats[] = {/* result of above command */};
int i = 0;
struct v4l2_format format = {0};
for (i = 0; i < /* number of lines in previous command output */; i++)
{
    format.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    format.fmt.pix.width = 320;
    format.fmt.pix.height = 240;
    format.fmt.pix.pixelformat = formats[i];
    format.fmt.pix.field = V4L2_FIELD_NONE;
    if (xioctl(fd, VIDIOC_S_FMT, &format) != -1 && format.fmt.pix.pixelformat == formats[i])
        fprintf(stderr, "Accepted : %d\n", i);
}
This test reveals that only V4L2_PIX_FMT_YUYV and V4L2_PIX_FMT_MJPEG are functional. Any way I could improve this, or is it hardware-related ?
In Linux, the command-line utility v4l2-ctl displays all of a webcam's natively supported formats. Install it with sudo apt-get install v4l-utils and run it with v4l2-ctl -dX --list-formats-ext, where X is the camera index as in /dev/videoX. These formats are reported to the v4l2 kernel module by the uvcvideo module and are supported natively by the webcam chipset. Only the listed formats are supported by v4l2; anything else would need to be converted by the user, and RGB is very seldom provided, despite virtually all CCDs working in Bayer RGGB. The most common formats by far are YUV 4:2:2 (YUYV or YUY2) and MJPEG, with a certain overlap: MJPEG achieves higher frame rates at large resolutions.
C++ code for listing the camera formats can be found in Chromium's GetDeviceSupportedFormats() implementation for Linux.
If you have to plug in code to convert YUV to RGB, I'd recommend libyuv, which has been optimized for plenty of architectures.
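If you just need something quick before pulling in libyuv, here is a naive, unoptimized YUYV-to-RGB24 sketch (integer BT.601 approximation; fine for testing, not for production):

#include <stdint.h>

static uint8_t clamp_u8(int v) { return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v; }

/* Naive YUYV (packed YUV 4:2:2) -> RGB24. One macropixel = Y0 U Y1 V,
 * i.e. two output pixels share one U/V pair. */
void yuyv_to_rgb24(const uint8_t *yuyv, uint8_t *rgb, int width, int height)
{
    for (int i = 0; i < width * height / 2; i++) {
        int y0 = yuyv[4 * i + 0], u = yuyv[4 * i + 1] - 128;
        int y1 = yuyv[4 * i + 2], v = yuyv[4 * i + 3] - 128;
        for (int j = 0; j < 2; j++) {
            int y = (j ? y1 : y0) - 16;
            rgb[6 * i + 3 * j + 0] = clamp_u8((298 * y + 409 * v + 128) >> 8);
            rgb[6 * i + 3 * j + 1] = clamp_u8((298 * y - 100 * u - 208 * v + 128) >> 8);
            rgb[6 * i + 3 * j + 2] = clamp_u8((298 * y + 516 * u + 128) >> 8);
        }
    }
}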
In order to enumerate the available formats, you can use the ioctl VIDIOC_ENUM_FMT.
To print descriptions of the capture formats supported by /dev/video0, you can proceed like this:
#include <string.h>
#include <stdio.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <libv4l2.h>
#include <linux/videodev2.h>

int main()
{
    int fd = v4l2_open("/dev/video0", O_RDWR);
    if (fd != -1)
    {
        struct v4l2_fmtdesc fmtdesc;
        memset(&fmtdesc, 0, sizeof(fmtdesc));
        fmtdesc.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
        while (ioctl(fd, VIDIOC_ENUM_FMT, &fmtdesc) == 0)
        {
            printf("%s\n", fmtdesc.description);
            fmtdesc.index++;
        }
        v4l2_close(fd);
    }
    return 0;
}
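If you also want the frame sizes each format supports, the loop can be extended with the ioctl VIDIOC_ENUM_FRAMESIZES (a sketch; it only handles discrete sizes and skips stepwise ranges):

/* Inside the VIDIOC_ENUM_FMT loop above, for each fmtdesc: */
struct v4l2_frmsizeenum frmsize;
memset(&frmsize, 0, sizeof(frmsize));
frmsize.pixel_format = fmtdesc.pixelformat;
while (ioctl(fd, VIDIOC_ENUM_FRAMESIZES, &frmsize) == 0)
{
    if (frmsize.type == V4L2_FRMSIZE_TYPE_DISCRETE)
        printf("  %ux%u\n", frmsize.discrete.width, frmsize.discrete.height);
    frmsize.index++;
}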

How can I seek to frame No. X with ffmpeg?

I'm writing a video editor, and I need to seek to an exact frame, knowing the frame number.
Other posts on Stack Overflow told me that ffmpeg may give me a few broken frames after seeking, which is not a problem for playback but a big problem for a video editor.
And I need to seek by frame number, not by time, which becomes inaccurate when converted to a frame number.
I've read dranger's tutorials (which are outdated now) and ended up with:
av_seek_frame(fmt_ctx, video_stream_id, frame, AVSEEK_FLAG_ANY);
It always seeks to frame No. 0 and always returns 0, which means success.
Then I tried to read Blender's source code and found it really complex (maybe I should implement an image buffer?).
So, is there any simple way to seek to a frame with just a simple call like seek(context, frame_number) (while getting a full frame, not a broken one)? Or is there any lightweight library that simplifies this?
EDIT:
Thanks to praks411, I found the solution:
void AV_seek(AV *av, size_t frame)
{
    int frame_delta = frame - av->frame_id;
    if (frame_delta < 0 || frame_delta > 5)
        av_seek_frame(av->fmt_ctx, av->video_stream_id,
                      frame, AVSEEK_FLAG_BACKWARD);
    while (av->frame_id != frame)
        AV_read_frame(av);
}

void AV_read_frame(AV *av)
{
    AVPacket packet;
    int frame_done;
    while (av_read_frame(av->fmt_ctx, &packet) >= 0) {
        if (packet.stream_index == av->video_stream_id) {
            avcodec_decode_video2(av->codec_ctx, av->frame, &frame_done, &packet);
            if (frame_done) {
                ...
                av->frame_id = packet.dts;
                av_free_packet(&packet);
                return;
            }
        }
        av_free_packet(&packet);
    }
}
EDIT2:
Turns out there is a library for this: FFMS2.
It is "an FFmpeg based source library [...] for easy frame accurate access", and is portable (at least across Windows and Linux).
av_seek_frame will only seek, based on timestamp, to a keyframe. Since it seeks to the keyframe, you may not get exactly what you want; hence it is recommended to seek to the nearest keyframe and then read frame by frame until you reach the desired frame.
However, if you are dealing with a fixed FPS value, then you can easily map a timestamp to a frame index; see the sketch after the example below.
Before seeking, you will need to convert your time to AVStream.time_base units if you have specified a stream. Read the FFmpeg documentation of av_seek_frame in avformat.h.
For example, if you want to seek to 1.23 seconds into the clip:
double m_out_start_time = 1.23;
int flgs = AVSEEK_FLAG_ANY;
int64_t seek_ts = (m_out_start_time * (m_in_vid_strm->time_base.den)) /
                  (m_in_vid_strm->time_base.num);
if (av_seek_frame(m_informat, m_in_vid_strm_idx, seek_ts, flgs) < 0)
{
    PRINT_MSG("Failed to seek Video ")
}
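Building on the fixed-FPS remark above, mapping a frame number to a stream timestamp is just a rescale (a sketch; it assumes the frame rate really is constant and that avg_frame_rate is set):

#include <libavformat/avformat.h>
#include <libavutil/rational.h>

/* Frame index -> stream timestamp, assuming constant FPS.
 * One frame lasts 1/fps seconds, so rescale a count of frame
 * durations into AVStream.time_base units. */
int64_t frame_to_ts(AVStream *st, int64_t frame)
{
    return av_rescale_q(frame, av_inv_q(st->avg_frame_rate), st->time_base);
}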

How do I convert a G.726 ADPCM signal into a PCM signal?

I usually look to SoX or Windows' built-in audio libraries for this stuff, but it appears that neither has a G.726 codec.
So I have a sequence of bytes that I know are encoded as G.726, although the bit rate and whether it is mu-law or A-law is not known at this time (experimentation will determine those parameters), and I need to decode them into a normal PCM signal.
So I downloaded the reference implementation from the ITU-T (ITU-T Recommendation G.191), but I'm kind of confused about how to use the G726_decode function. According to the documentation, inp_buf and out_buf need to have the same length smpno, and both buffers are 16-bit buffers. It seems to me like a step is missing; otherwise no compression is accomplished by using G.726. According to the Wikipedia page on G.726, sample size depends on bit rate (from 2 to 5 bits). Am I supposed to do the decompression into samples myself? So if I assume maximum compression (2-bit samples), then each byte will produce 4 samples.
Example:
char b = /* read the code from input */;
short inp[4], output[4];
inp[0] = b & 0x0003;
inp[1] = (b & 0x000C) >> 2;  /* parentheses needed: >> binds tighter than & */
inp[2] = (b & 0x0030) >> 4;
inp[3] = (b & 0x00C0) >> 6;

G726_state state;
memset(&state, 0, sizeof(G726_state));
G726_decode(inp, output, 4, "u", 2, 1, &state);
/* output now contains 4 PCM samples */
Or am I missing something completely?
Looks like ffmpeg actually isn't able to do this, as I thought it surely would be able to... however, while I was googling I did find this post to the ffmpeg mailing list which offers a solution.
Basically, there is a separate program called g72x++ which seems to be able to decode the audio to raw PCM for you.
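If you do end up unpacking the codewords yourself, as in your 2-bit example, here is a generalized sketch for any of the G.726 rates (2, 3, 4, or 5 bits per codeword). The LSB-first packing order below is an assumption; some sources pack codewords MSB-first, so try both:

#include <stdint.h>
#include <stddef.h>

/* Unpack n-bit G.726 codewords from a byte stream into one short per
 * codeword, suitable as inp_buf for G726_decode. Returns the number of
 * codewords produced. Assumes LSB-first packing within each byte. */
size_t unpack_codewords(const uint8_t *in, size_t nbytes, int bits, short *out)
{
    uint32_t acc = 0;   /* bit accumulator */
    int nbits = 0;      /* bits currently in the accumulator */
    size_t n = 0;

    for (size_t i = 0; i < nbytes; i++) {
        acc |= (uint32_t)in[i] << nbits;
        nbits += 8;
        while (nbits >= bits) {
            out[n++] = (short)(acc & ((1u << bits) - 1));
            acc >>= bits;
            nbits -= bits;
        }
    }
    return n;
}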

Jpeglib code gives garbled output, even the bundled example code?

I'm on Ubuntu Intrepid and I'm using jpeglib62 6b-14. I was working on some code, which only gave a black screen with some garbled output at the top when I tried to run it. After a few hours of debugging I got it down to pretty much the JPEG base, so I took the example code, wrote a little piece of code around it and the output was exactly the same.
I'm convinced jpeglib is used in a lot more places on this system, and it's simply the version from the repositories, so I'm hesitant to say that this is a bug in jpeglib or the Ubuntu packaging.
I put the example code below (most comments stripped). The input JPEG file is an uncompressed 640x480 file with 3 channels, so it should be 921600 bytes (and it is). The output image is JFIF and around 9000 bytes.
If you could help me with even a hint, I'd be very grateful.
Thanks!
#include <stdio.h>
#include <stdlib.h>
#include "jpeglib.h"
#include <setjmp.h>

int main()
{
    // read data
    FILE *input = fopen("input.jpg", "rb");
    JSAMPLE *image_buffer = (JSAMPLE*) malloc(sizeof(JSAMPLE) * 640 * 480 * 3);
    if (input == NULL || image_buffer == NULL)
        exit(1);
    fread(image_buffer, 640 * 3, 480, input);

    // initialise jpeg library
    struct jpeg_compress_struct cinfo;
    struct jpeg_error_mgr jerr;
    cinfo.err = jpeg_std_error(&jerr);
    jpeg_create_compress(&cinfo);

    // write to foo.jpg
    FILE *outfile = fopen("foo.jpg", "wb");
    if (outfile == NULL)
        exit(1);
    jpeg_stdio_dest(&cinfo, outfile);

    // set up library
    cinfo.image_width = 640;
    cinfo.image_height = 480;
    cinfo.input_components = 3;      // 3 components (R, G, B)
    cinfo.in_color_space = JCS_RGB;  // RGB
    jpeg_set_defaults(&cinfo);       // set defaults

    // start compressing
    int row_stride = 640 * 3;  // number of bytes in a row
    JSAMPROW row_pointer[1];   // pointer to the current row data
    jpeg_start_compress(&cinfo, TRUE);  // start compressing to jpeg
    while (cinfo.next_scanline < cinfo.image_height) {
        row_pointer[0] = &image_buffer[cinfo.next_scanline * row_stride];
        (void) jpeg_write_scanlines(&cinfo, row_pointer, 1);
    }
    jpeg_finish_compress(&cinfo);

    // clean up
    fclose(outfile);
    jpeg_destroy_compress(&cinfo);
    return 0;
}
You're reading a JPEG file into memory (without decompressing it) and writing out that buffer as if it were uncompressed, that's why you're getting garbage. You need to decompress the image first before you can feed it into the JPEG compressor.
In other words, the JPEG compressor assumes that its input is raw pixels.
You can convert your input image into raw RGB using ImageMagick:
convert input.jpg rgb:input.raw
It should be exactly 921600 bytes in size.
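Alternatively, if you would rather do it in code, here is a minimal decompression sketch using the same library, replacing the fread in your program (error handling elided; input and image_buffer as in the question):

/* Decompress input.jpg into image_buffer before handing it to the compressor. */
struct jpeg_decompress_struct dinfo;
struct jpeg_error_mgr derr;
dinfo.err = jpeg_std_error(&derr);
jpeg_create_decompress(&dinfo);
jpeg_stdio_src(&dinfo, input);
jpeg_read_header(&dinfo, TRUE);
jpeg_start_decompress(&dinfo);

int stride = dinfo.output_width * dinfo.output_components;
while (dinfo.output_scanline < dinfo.output_height) {
    JSAMPROW row = &image_buffer[dinfo.output_scanline * stride];
    jpeg_read_scanlines(&dinfo, &row, 1);
}
jpeg_finish_decompress(&dinfo);
jpeg_destroy_decompress(&dinfo);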
EDIT: Your question is misleading when you state that your input JPEG file is uncompressed. Anyway, I compiled your code and it works fine, compressing the image correctly. If you can upload the file you're using as input, it might be possible to debug further. If not, I suggest you test your program using an image created from a known JPEG with ImageMagick:
convert some_image_that_is_really_a_jpg.jpg -resize 640x480! rgb:input.jpg
You are reading the input file into memory compressed and then recompressing it before writing it to file. You need to decompress the image_buffer before compressing it again. Alternatively, instead of reading in a JPEG, read a .raw image.
What exactly do you mean by "The input JPEG file is an uncompressed"? JPEGs are all compressed.
In your code, it seems that in the loop you give one row of pixels to libjpeg and ask it to compress it. It doesn't work that way. libjpeg has to have at least 8 rows to start compression (sometimes even more, depending on parameters). So it's best to leave libjpeg to control the input buffer and don't do its job for it.
I suggest you read how cjpeg.c does its job. The easiest way I think is to put your data in a raw type known by libjpeg (say, BMP), and use libjpeg to read the BMP image into its internal representation and compress from there.
