How to decode and extract metadata from the last frame with FFmpeg in C?

I am decoding video with FFmpeg in C. The videos are H.264 or MPEG-4, and I am using the 32-bit libraries. I have successfully decoded and extracted the metadata of the first frame. I would now like to decode the last frame. I have a defined duration for the video, and felt it was a safe assumption that the last frame sits at that duration. Here's what I have, any suggestions?
AVFormatContext* pFormatCtx = avformat_alloc_context();
avformat_open_input(&pFormatCtx, filename, NULL, NULL);
int64_t duration = pFormatCtx->duration; /* in AV_TIME_BASE (microsecond) units */

AVPacket packet;
int frameFinished = 0;
while (av_read_frame(pFormatCtx, &packet) >= 0) {
    /* Is this a packet from the video stream? */
    if (packet.stream_index == videoStream) {
        /* Decode the video frame */
        avcodec_decode_video2(pCodecCtx, pFrame, &frameFinished, &packet);
    }
}
Any help is much appreciated! :)

Thanks everyone for your help, but I found that the reason the av_seek_frame duration wasn't working was that you must multiply it by 1000 for it to be applicable in av_read_frame. Also, please note that the reason I call decode_video() instead of the stock decode calls is that I was on 32-bit and wrote my own wrapper; if you plug in avcodec_decode_video2() it works just as well. Hopefully this will help fellow decoders in the future.
AVFormatContext* FormatContext; /* opened and filled as in the question */
AVPacket Packet;
int64_t duration = FormatContext->duration;
duration = duration * 1000;

if (av_seek_frame(FormatContext, Packet.stream_index, duration, AVSEEK_FLAG_ANY) >= 0)
{
    /* Read the frame and decode the packet */
    if (av_read_frame(FormatContext, &Packet) >= 0)
    {
        /* Decode the video frame */
        decode_video(CodecContext, Frame, &frameFinished, &Packet);
    }
}
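For reference, a more conventional way to express that seek is to rescale the container duration (which is in AV_TIME_BASE, i.e. microsecond, units) into the stream's time_base instead of scaling by a constant. A minimal sketch, reusing the pFormatCtx/pCodecCtx/pFrame/videoStream names from the question:

/* Sketch: seek near the end, then decode forward so the last frame
   is the final one the decoder produces. */
AVStream* st = pFormatCtx->streams[videoStream];
int64_t target = av_rescale(pFormatCtx->duration, st->time_base.den,
                            (int64_t)st->time_base.num * AV_TIME_BASE);

if (av_seek_frame(pFormatCtx, videoStream, target, AVSEEK_FLAG_BACKWARD) >= 0) {
    AVPacket packet;
    int frameFinished = 0;
    while (av_read_frame(pFormatCtx, &packet) >= 0) {
        if (packet.stream_index == videoStream)
            avcodec_decode_video2(pCodecCtx, pFrame, &frameFinished, &packet);
        av_free_packet(&packet);
    }
    /* pFrame holds the last frame the decoder delivered. */
}

AVSEEK_FLAG_BACKWARD lands on the keyframe at or before the target, so the loop decodes the closing group of pictures and naturally ends on the last frame.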

This might be what you're looking for:
Codecs which have the CODEC_CAP_DELAY capability set have a delay
between input and output, these need to be fed with avpkt->data=NULL,
avpkt->size=0 at the end to return the remaining frames.
Link to FFmpeg documentation
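As a rough illustration of that flushing step, a minimal sketch reusing the pCodecCtx/pFrame names from the question above:

/* Sketch: drain delayed frames by feeding an empty packet until the
   decoder stops producing output. */
AVPacket flushPacket;
av_init_packet(&flushPacket);
flushPacket.data = NULL;
flushPacket.size = 0;

int frameFinished = 1;
while (frameFinished) {
    if (avcodec_decode_video2(pCodecCtx, pFrame, &frameFinished, &flushPacket) < 0)
        break;
    if (frameFinished) {
        /* pFrame now holds one of the remaining delayed frames. */
    }
}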

Related

DirectShow data copy is TOO slow

I have a USB 3.0 HDMI capture device. It uses the YUY2 format (2 bytes per pixel) at 1920x1080 resolution.
The video capture output pin connects directly to the video renderer input pin.
And it all works well: it shows 1920x1080 without any freezes.
But I need to take a screenshot every second. So this is what I do:
void CaptureInterface::ScreenShoot() {
    HRESULT hr;

    IMemInputPin* p_MemoryInputPin = nullptr;
    hr = p_RenderInputPin->QueryInterface(IID_IMemInputPin, (void**)&p_MemoryInputPin);

    IMemAllocator* p_MemoryAllocator = nullptr;
    hr = p_MemoryInputPin->GetAllocator(&p_MemoryAllocator);

    IMediaSample* p_MediaSample = nullptr;
    hr = p_MemoryAllocator->GetBuffer(&p_MediaSample, 0, 0, 0);

    long buff_size = p_MediaSample->GetSize(); // buff_size = 4147200 bytes

    BYTE* buff = nullptr;
    hr = p_MediaSample->GetPointer(&buff);

    // BYTE CaptureInterface::ScreenBuff[1920*1080*2]; defined in header

    //--------- TOO SLOW (1.5 seconds for 4 MBytes) ----------
    std::memcpy(ScreenBuff, buff, buff_size);
    //--------------------------------------------------------

    p_MediaSample->Release();
    p_MemoryAllocator->Release();
    p_MemoryInputPin->Release();
    return;
}
Any other operation on this buffer is very slow too.
But if I use memcpy on other data (for example, two arrays of the same 4 MB size in my class), it is very fast: <0.01 s.
Video memory is (or might be) slow to read back by its nature (e.g. VMR9 IBasicVideo->GetCurrentImage is very slow, and you can find other references). You normally want to grab the data before it actually reaches video memory.
Additionally, the way you read the data is not quite reliable. You don't know which frame you are actually copying, and it might happen that you read blackness or garbage, or, vice versa, that your acquiring access to the buffer freezes the main video streaming. This is because you are grabbing an unused buffer from the pool of available buffers rather than a buffer that corresponds to a specific video frame. Getting an image from such a buffer rests on the fragile assumption that the unused data from a previously streamed frame was initialized and has not yet been overwritten by anything else.
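A common way to grab the data upstream is DirectShow's Sample Grabber filter. Below is a minimal sketch, not a drop-in implementation: the graph wiring is omitted, and FrameSink/ScreenBuff are illustrative names. The callback runs on the streaming thread while the frame is still in system memory, so the copy is cheap:

#include <dshow.h>
#include <qedit.h>   // ISampleGrabber / ISampleGrabberCB (legacy header)
#include <cstring>

class FrameSink : public ISampleGrabberCB {
public:
    BYTE ScreenBuff[1920 * 1080 * 2]; // YUY2: 2 bytes per pixel

    // Called per frame, before the sample travels on to the renderer.
    STDMETHODIMP BufferCB(double SampleTime, BYTE* pBuffer, long BufferLen) {
        if (BufferLen <= (long)sizeof(ScreenBuff))
            std::memcpy(ScreenBuff, pBuffer, BufferLen); // system RAM, fast
        return S_OK;
    }
    STDMETHODIMP SampleCB(double, IMediaSample*) { return E_NOTIMPL; }

    // Minimal COM plumbing for an object owned by the application.
    STDMETHODIMP QueryInterface(REFIID riid, void** ppv) {
        if (riid == IID_IUnknown || riid == IID_ISampleGrabberCB) {
            *ppv = this;
            return S_OK;
        }
        return E_NOINTERFACE;
    }
    STDMETHODIMP_(ULONG) AddRef() { return 1; }
    STDMETHODIMP_(ULONG) Release() { return 1; }
};

// Wiring sketch (error handling omitted): insert the Sample Grabber filter
// between the capture pin and the renderer, then:
//   pGrabber->SetBufferSamples(FALSE);
//   pGrabber->SetCallback(&sink, 1);   // 1 = deliver via BufferCB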

Recording an H.264 live stream with ffmpeg produces an error

I am trying to record an H.264 live stream using the following code:
AVOutputFormat* fmt = av_guess_format(NULL, "test.mpeg", NULL);
AVFormatContext* oc = avformat_alloc_context();
oc->oformat = fmt;
avio_open2(&oc->pb, "test.mpeg", AVIO_FLAG_WRITE, NULL, NULL);
AVStream* stream = NULL;
...
while (!done)
{
    // Read a frame
    if (av_read_frame(inputStreamFormatCtx, &packet) < 0)
        return false;

    // Is this a packet from the video stream? -> remux it
    if (packet.stream_index == correct_index)
    {
        if (stream == NULL) // create stream in file
        {
            stream = avformat_new_stream(oc, pFormatCtx->streams[videoStream]->codec->codec);
            avcodec_copy_context(stream->codec, pFormatCtx->streams[videoStream]->codec);
            stream->sample_aspect_ratio = pFormatCtx->streams[videoStream]->codec->sample_aspect_ratio;
            stream->sample_aspect_ratio.num = pFormatCtx->streams[videoStream]->codec->sample_aspect_ratio.num;
            stream->sample_aspect_ratio.den = pFormatCtx->streams[videoStream]->codec->sample_aspect_ratio.den;
            // Assume r_frame_rate is accurate
            stream->r_frame_rate = pFormatCtx->streams[videoStream]->r_frame_rate;
            stream->avg_frame_rate = stream->r_frame_rate;
            stream->time_base = av_inv_q(stream->r_frame_rate);
            stream->codec->time_base = stream->time_base;
            avformat_write_header(oc, NULL);
        }
        av_write_frame(oc, &packet);
        ...
    }
}
However, ffmpeg says
encoder did not produce proper pts making some up
when the code reaches av_write_frame(). What's the problem here?
First make sure inputStreamFormatCtx is allocated and filled with the right values (this is the cause of 90% of demuxing/remuxing problems); check some samples on the internet to see how you should allocate it and set its values.
The error tells us what is happening, and it seems it is just a warning.
PTS (Presentation Time Stamp) is a number, based on stream->time_base, that tells us when the decoded frame of this packet should be shown. When you get a live stream via the network, it is possible the server hasn't put a valid number in the PTS of a packet, so when you receive the data it has an INVALID PTS (which you can detect by reading packet.pts and checking whether it is AV_NOPTS_VALUE). libav then tries to generate the right PTS based on the frame rate and time_base of the stream. It's a helpful attempt, and if the recorded file plays back at real speed (fps-wise) you should be happy. If the recorded file plays back in fast or slow motion (fps-wise), you have a problem and can't rely on libav to correct the fps anymore; you should then calculate the right fps by decoding the packets, calculate the right PTS based on stream->time_base, and set it on packet.pts.
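As a minimal sketch of that last step (frame_index is a hypothetical packet counter, and the constant-frame-rate assumption is illustrative, not part of the original answer):

/* Sketch: synthesize a PTS when the source sent none. frame_index is a
   hypothetical int64_t counter of video packets written so far. */
if (packet.pts == AV_NOPTS_VALUE) {
    /* One frame lasts 1/fps seconds; rescale that into stream->time_base.
       With time_base = 1/fps (as set in the question), this is just frame_index. */
    packet.pts = av_rescale_q(frame_index,
                              av_inv_q(stream->r_frame_rate),
                              stream->time_base);
    packet.dts = packet.pts; /* sketch assumes no B-frame reordering */
}
frame_index++;
av_write_frame(oc, &packet);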

x264: How to access NAL units from the encoder?

When I call
frame_size = x264_encoder_encode(encoder, &nals, &i_nals, &pic_in, &pic_out);
and subsequently write each NAL to a file like this:
if (frame_size >= 0)
{
    int i;
    for (i = 0; i < i_nals; i++)
    {
        printf("******************* NAL %d (%d bytes) *******************\n",
               i, nals[i].i_payload);
        fwrite(&(nals[i].p_payload[0]), 1, nals[i].i_payload, fid);
    }
}
then I get this output (the beginning of the file contains readable parameter strings).
My questions are:
1) Is it normal that there are readable parameters at the beginning of the file?
2) How do I configure the x264 encoder so that it returns frames I can send via UDP without the packets getting fragmented (the size must be below 1390, or somewhere around that)?
3) With x264.exe I pass in these options:
"--threads 1 --profile baseline --level 3.2 --preset ultrafast --bframes 0 --force-cfr --no-mbtree --sync-lookahead 0 --rc-lookahead 0 --keyint 1000 --intra-refresh"
How do I map those to the settings in the x264 parameters structure (x264_param_t)?
4) I have been told that the x264 static library doesn't support bitmap input to the encoder and that I have to use libswscale to convert the 24-bit RGB input bitmap to YUV2. The encoder supposedly only takes YUV2 as input; is this true? If so, how do I build libswscale for the x264 static library?
1) Yes. x264 includes them automatically. It's an SEI NAL unit, and you can throw it away if you want.
2) Set i_slice_max_size = 1390.
3) Take a look at x264_param_t in x264.h. The settings are fairly self-explanatory. As for setting the profile and preset, call int x264_param_apply_profile(x264_param_t*, const char* profile) and int x264_param_default_preset(x264_param_t*, const char* preset, const char* tune). See the sketch after this list.
4) Yes, it is true; I was lying when I said otherwise. Look online/on Stack Overflow, there are a million resources on compiling ffmpeg. In fact, if you compiled x264 with avcodec support, you already have it on your system.
5) Yes! You should be a good Stack Overflow citizen and upvote and accept answers from people who donate their free time and knowledge (which takes years to acquire) to helping you.
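Regarding question 3, here is a rough sketch of how those command-line options might map onto x264_param_t fields. The field names follow x264.h, but they drift between x264 versions, so treat this as a starting point rather than a definitive mapping:

x264_param_t param;
x264_param_default_preset(&param, "ultrafast", NULL); /* --preset ultrafast */

param.i_threads        = 1;    /* --threads 1 */
param.i_level_idc      = 32;   /* --level 3.2 */
param.i_bframe         = 0;    /* --bframes 0 */
param.b_vfr_input      = 0;    /* --force-cfr */
param.rc.b_mb_tree     = 0;    /* --no-mbtree */
param.i_sync_lookahead = 0;    /* --sync-lookahead 0 */
param.rc.i_lookahead   = 0;    /* --rc-lookahead 0 */
param.i_keyint_max     = 1000; /* --keyint 1000 */
param.b_intra_refresh  = 1;    /* --intra-refresh */
param.i_slice_max_size = 1390; /* answer 2: keep each NAL under one UDP payload */

x264_param_apply_profile(&param, "baseline"); /* --profile baseline */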

How can I seek to frame No. X with ffmpeg?

I'm writing a video editor, and I need to seek to an exact frame, knowing the frame number.
Other posts on Stack Overflow told me that ffmpeg may give me a few broken frames after seeking, which is not a problem for playback but a big problem for video editors.
And I need to seek by frame number, not by time, which becomes inaccurate when converted to a frame number.
I've read dranger's tutorials (now outdated) and ended up with:
av_seek_frame(fmt_ctx, video_stream_id, frame, AVSEEK_FLAG_ANY);
It always seeks to frame No. 0 and always returns 0, which means success.
Then I tried to read Blender's source code and found it really complex (maybe I should implement an image buffer?).
So, is there any simple way to seek to a frame with just a simple call like seek(context, frame_number) (while getting a full frame, not a broken one)? Or is there any lightweight library that simplifies this?
EDIT:
Thanks to praks411, I found the solution:
void AV_seek(AV * av, size_t frame)
{
    int frame_delta = frame - av->frame_id;
    if (frame_delta < 0 || frame_delta > 5)
        av_seek_frame(av->fmt_ctx, av->video_stream_id,
                      frame, AVSEEK_FLAG_BACKWARD);
    while (av->frame_id != frame)
        AV_read_frame(av);
}

void AV_read_frame(AV * av)
{
    AVPacket packet;
    int frame_done;
    while (av_read_frame(av->fmt_ctx, &packet) >= 0) {
        if (packet.stream_index == av->video_stream_id) {
            avcodec_decode_video2(av->codec_ctx, av->frame, &frame_done, &packet);
            if (frame_done) {
                ...
                av->frame_id = packet.dts;
                av_free_packet(&packet);
                return;
            }
        }
        av_free_packet(&packet);
    }
}
EDIT2:
Turns out there is a library for this: FFMS2.
It is "an FFmpeg based source library [...] for easy frame accurate access", and is portable (at least across Windows and Linux).
av_seek_frame will only seek, based on the timestamp, to a keyframe. Since it lands on a keyframe, you may not get exactly what you want. Hence it is recommended to seek to the nearest preceding keyframe and then read frame by frame until you reach the desired frame.
However, if you are dealing with a fixed FPS value, then you can easily map a timestamp to a frame index (see the sketch after the example below).
Before seeking, you will need to convert your time to AVStream.time_base units if you have a specific stream. Read the ffmpeg documentation of av_seek_frame in avformat.h.
For example, if you want to seek to 1.23 seconds of the clip:
double m_out_start_time = 1.23;
int flgs = AVSEEK_FLAG_ANY;
int64_t seek_ts = (m_out_start_time * (m_in_vid_strm->time_base.den)) /
                  (m_in_vid_strm->time_base.num);
if (av_seek_frame(m_informat, m_in_vid_strm_idx, seek_ts, flgs) < 0)
{
    PRINT_MSG("Failed to seek Video ")
}
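And, under the fixed-FPS assumption above, the frame-index variant is just one more conversion. A sketch reusing the answer's variable names (frame_number is a hypothetical input):

/* Sketch: fixed-fps mapping from a frame index to a stream timestamp. */
double fps = av_q2d(m_in_vid_strm->avg_frame_rate);
int64_t seek_ts = (int64_t)((frame_number / fps) * m_in_vid_strm->time_base.den
                            / m_in_vid_strm->time_base.num);
/* Seek to the keyframe at or before the target, then decode forward. */
if (av_seek_frame(m_informat, m_in_vid_strm_idx, seek_ts, AVSEEK_FLAG_BACKWARD) < 0)
{
    PRINT_MSG("Failed to seek Video ")
}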

Getting individual frames using CV_CAP_PROP_POS_FRAMES in cvSetCaptureProperty

I am trying to jump to a specific frame by setting the CV_CAP_PROP_POS_FRAMES property and then reading the frame like this:
cvSetCaptureProperty( input_video, CV_CAP_PROP_POS_FRAMES, current_frame );
frame = cvQueryFrame( input_video );
The problem I am facing is that OpenCV 2.1 returns the same frame for 12 consecutive values of current_frame, whereas I want to read each individual frame, not just the keyframes. Can anyone please tell me what's wrong?
I did some research and found out that the problem is caused by the decompression algorithm.
MPEG-like algorithms (including the HD variants) do not compress each frame separately; they save a keyframe from time to time and then only the differences between that frame and the subsequent frames.
The problem you reported is caused by the fact that, when you select a frame, the decoder (likely ffmpeg) automatically advances to the next keyframe.
So, is there a way around this? I don't want only keyframes but each individual frame.
I don't know whether this would be precise enough for your purpose, but I've had success getting to a particular point in an MPEG video by grabbing the frame rate, converting the frame number to a time, and then advancing to that time. Like so:
cv::VideoCapture sourceVideo("/some/file/name.mpg");
double frameRate = sourceVideo.get(CV_CAP_PROP_FPS);
double frameTime = 1000.0 * frameNumber / frameRate;
sourceVideo.set(CV_CAP_PROP_POS_MSEC, frameTime);
Due to this limitation in OpenCV, it may be wise to use FFmpeg directly instead. MoviePy is a nice wrapper library.
# Get nth frame from a video
from moviepy.video.io.ffmpeg_reader import FFMPEG_VideoReader
cap = FFMPEG_VideoReader("movie.mov",True)
cap.initialize()
cap.get_frame(n/FPS)
Performance is great too. Seeking to the nth frame with get_frame is O(1), and a speed-up is used if (nearly) consecutive frames are requested. I've gotten better-than-realtime results loading three 720p videos simultaneously.
CV_CAP_PROP_POS_FRAMES jumps to a keyframe. I had the same issue and worked around it using this (Python) code. It's probably not totally efficient, but it gets the job done:
def seekTo(cap, position):
    positiontoset = position
    pos = -1
    cap.set(cv.CV_CAP_PROP_POS_FRAMES, position)
    while pos < position:
        ret, image = cap.read()
        pos = cap.get(cv.CV_CAP_PROP_POS_FRAMES)
        if pos == position:
            return image
        elif pos > position:
            positiontoset -= 1
            cap.set(cv.CV_CAP_PROP_POS_FRAMES, positiontoset)
            pos = -1
I've successfully used the following on OpenCV 3 / Python 3:
# Skip to frame 150, then read the 151st frame
cap.set(cv2.CAP_PROP_POS_FRAMES, 150)
ret, frame = cap.read()
After some years of assuming this was an unsolvable bug, I think I've figured out a way to use it with a good balance between speed and correctness.
A previous solution suggested using the CV_CAP_PROP_POS_MSEC property before reading the frame:
cv::VideoCapture sourceVideo("/some/file/name.mpg");
const auto frameRate = sourceVideo.get(CV_CAP_PROP_FPS);

void readFrame(int frameNumber, cv::Mat& image) {
    const double frameTime = 1000.0 * frameNumber / frameRate;
    sourceVideo.set(CV_CAP_PROP_POS_MSEC, frameTime);
    sourceVideo.read(image);
}
It does return the expected frame, but the problem is that using CV_CAP_PROP_POS_MSEC may be very slow, for example for a video conversion.
Note: using global variables for simplicity.
On the other hand, if you just want to read the video sequentially, it is enough to read frames without seeking at all.
for (int frameNumber = 0; frameNumber < nFrames; ++frameNumber) {
    sourceVideo.read(image);
}
The solution comes from combining both: using a variable, lastFrameNumber, to remember the last queried frame, and only seeking when the requested frame is not the next one. In this way it is possible to keep the speed of sequential reading while still allowing random seeks when necessary.
cv::VideoCapture sourceVideo("/some/file/name.mpg");
const auto frameRate = sourceVideo.get(CV_CAP_PROP_FPS);
int lastFrameNumber = -2; // guarantee seeking the first time

void readFrame(int frameNumber, cv::Mat& image) {
    if (lastFrameNumber + 1 != frameNumber) { // not the next frame? seek
        const double frameTime = 1000.0 * frameNumber / frameRate;
        sourceVideo.set(CV_CAP_PROP_POS_MSEC, frameTime);
    }
    sourceVideo.read(image);
    lastFrameNumber = frameNumber;
}
