How can I seek to frame No. X with ffmpeg? - c

I'm writing a video editor, and I need to seek to exact frame, knowing the frame number.
Other posts on stackoverflow told me that ffmpeg may give me a few broken frames after seeking, which is not a problem for playback but a big problem for video editors.
And I need to seek by frame number, not by time, which will become inaccurate when converted to frame number.
I've read dranger's tuts (which is outdated now), and end up with:
av_seek_frame(fmt_ctx, video_stream_id, frame, AVSEEK_FLAG_ANY);
It always seek to frame No. 0, and always return 0 which means success.
Then I tried to read Blender's source code and found it really complex(maybe I should implement an image buffer?).
So, is there any simple way to seek to a frame with just a simple call like seek(context, frame_number)(while getting a full frame, not a broken one)? Or, is there any lightweight library that simplifies this?
EDIT:
Thanks to praks411,I found the solution:
void AV_seek(AV * av, size_t frame)
{
int frame_delta = frame - av->frame_id;
if (frame_delta < 0 || frame_delta > 5)
av_seek_frame(av->fmt_ctx, av->video_stream_id,
frame, AVSEEK_FLAG_BACKWARD);
while (av->frame_id != frame)
AV_read_frame(av);
}
void AV_read_frame(AV * av)
{
AVPacket packet;
int frame_done;
while (av_read_frame(av->fmt_ctx, &packet) >= 0) {
if (packet.stream_index == av->video_stream_id) {
avcodec_decode_video2(av->codec_ctx, av->frame, &frame_done, &packet);
if (frame_done) {
...
av->frame_id = packet.dts;
av_free_packet(&packet);
return;
}
}
av_free_packet(&packet);
}
}
EDIT2:
Turns out there is a library for this: FFMS2.
It is "an FFmpeg based source library [...] for easy frame accurate access", and is portable (at least across Windows and Linux).

av_seek_frame will only seek based on timestamp to the key-frame. Since it seeks to the keyframe, you may not get what you want. Hence it is recommended to seek to nearest keyframe and then read frame by frame util you reach the desired frame.
However, if you are dealing with fixed FPS value, then you can easily map timestamp to frame index.
Before seeking you will need to convert your time to AVStream.time_base units if you have specified stream. Read ffmpeg documentation of av_seek_frame in avformat.h.
For example, if you want to seek to 1.23 seconds of clip:
double m_out_start_time = 1.23;
int flgs = AVSEEK_FLAG_ANY;
int seek_ts = (m_out_start_time*(m_in_vid_strm->time_base.den))/(m_in_vid_strm->time_base.num);
if(av_seek_frame(m_informat, m_in_vid_strm_idx,seek_ts, flgs) < 0)
{
PRINT_MSG("Failed to seek Video ")
}

Related

Decoding TIFF LZW codes not yet in the dictionary

I made a decoder of LZW-compressed TIFF images, and all the parts work, it can decode large images at various bit depths with or without horizontal prediction, except in one case. While it decodes files written by most programs (like Photoshop and Krita with various encoding options) fine, there's something very strange about the files created by ImageMagick's convert, it produces LZW codes that aren't yet in the dictionary, and I don't know how to handle it.
Most of the time the 9 to 12-bit code in the LZW stream that isn't yet in the dictionary is the next one that my decoding algorithm will try to put in the dictionary (which I'm not sure should be a problem although my algorithm fails on an image that contains such cases), but at times it can even be hundreds of codes into the future. In one case the first code after the clear code (256) is 364, which seems quite impossible given that the clear code clears my dictionary of all codes 258 and above, in another case the code is 501 when my dictionary only goes up to 317!
I have no idea how to deal with it, but it seems that I'm the only one with this problem, the decoders in other programs load such images fine. So how do they do it?
Here's the core of my decoding algorithm, obviously due to how much code is involved I can't provide complete compilable code in a compact manner, but since this is a matter of algorithmic logic this should be enough. It follows closely the algorithm described in the official TIFF specification (page 61), in fact most of the spec's pseudo code is in the comments.
void tiff_lzw_decode(uint8_t *coded, buffer_t *dec)
{
buffer_t word={0}, outstring={0};
size_t coded_pos; // position in bits
int i, new_index, code, maxcode, bpc;
buffer_t *dict={0};
size_t dict_as=0;
bpc = 9; // starts with 9 bits per code, increases later
tiff_lzw_calc_maxcode(bpc, &maxcode);
new_index = 258; // index at which new dict entries begin
coded_pos = 0; // bit position
lzw_dict_init(&dict, &dict_as);
while ((code = get_bits_in_stream(coded, coded_pos, bpc)) != 257) // while ((Code = GetNextCode()) != EoiCode)
{
coded_pos += bpc;
if (code >= new_index)
printf("Out of range code %d (new_index %d)\n", code, new_index);
if (code == 256) // if (Code == ClearCode)
{
lzw_dict_init(&dict, &dict_as); // InitializeTable();
bpc = 9;
tiff_lzw_calc_maxcode(bpc, &maxcode);
new_index = 258;
code = get_bits_in_stream(coded, coded_pos, bpc); // Code = GetNextCode();
coded_pos += bpc;
if (code == 257) // if (Code == EoiCode)
break;
append_buf(dec, &dict[code]); // WriteString(StringFromCode(Code));
clear_buf(&word);
append_buf(&word, &dict[code]); // OldCode = Code;
}
else if (code < 4096)
{
if (dict[code].len) // if (IsInTable(Code))
{
append_buf(dec, &dict[code]); // WriteString(StringFromCode(Code));
lzw_add_to_dict(&dict, &dict_as, new_index, 0, word.buf, word.len, &bpc);
lzw_add_to_dict(&dict, &dict_as, new_index, 1, dict[code].buf, 1, &bpc); // AddStringToTable
new_index++;
tiff_lzw_calc_bpc(new_index, &bpc, &maxcode);
clear_buf(&word);
append_buf(&word, &dict[code]); // OldCode = Code;
}
else
{
clear_buf(&outstring);
append_buf(&outstring, &word);
bufwrite(&outstring, word.buf, 1); // OutString = StringFromCode(OldCode) + FirstChar(StringFromCode(OldCode));
append_buf(dec, &outstring); // WriteString(OutString);
lzw_add_to_dict(&dict, &dict_as, new_index, 0, outstring.buf, outstring.len, &bpc); // AddStringToTable
new_index++;
tiff_lzw_calc_bpc(new_index, &bpc, &maxcode);
clear_buf(&word);
append_buf(&word, &dict[code]); // OldCode = Code;
}
}
}
free_buf(&word);
free_buf(&outstring);
for (i=0; i < dict_as; i++)
free_buf(&dict[i]);
free(dict);
}
As for the results that my code produces in such situations it's quite clear from how it looks that it's only those few codes that are badly decoded, everything before and after is properly decoded, but obviously in most cases the subsequent image after one of these mystery future codes is ruined by virtue of shifting the rest of the decoded bytes by a few places. That means that my reading of the 9 to 12-bit code stream is correct, so this really means that I see a 364 code right after a 256 dictionary-clearing code.
Edit: Here's an example file that contains such weird codes. I've also found a small TIFF LZW loading library that suffers from the same problem, it crashes where my loader finds the first weird code in this image (code 3073 when the dictionary only goes up to 2051). The good thing is that since it's a small library you can test it with the following code:
#include "loadtiff.h"
#include "loadtiff.c"
void loadtiff_test(char *path)
{
int width, height, format;
floadtiff(fopen(path, "rb"), &width, &height, &format);
}
And if anyone insists on diving into my code (which should be unnecessary, and it's a big library) here's where to start.
The bogus codes come from trying to decode more than we're supposed to. The problem is that a LZW strip may sometimes not end with an End-of-Information 257 code, so the decoding loop has to stop when a certain number of decoded bytes have been output. That number of bytes per strip is determined by the TIFF tags ROWSPERSTRIP * IMAGEWIDTH * BITSPERSAMPLE / 8, and if PLANARCONFIG is 1 (which means interleaved channels as opposed to planar), by multiplying it all by SAMPLESPERPIXEL. So on top of stopping the decoding loop when a code 257 is encountered the loop must also be stopped after that count of decoded bytes has been reached.

DirectShow data copy is TOO slow

Have USB 3.0 HDMI Capture device. It uses YUY2 format (2 bytes per pixel) and 1920x1080 resolution.
Video capture Output Pin connects directly to Video Render input Pin.
And all works good. It shows me 1920x1080 without any freezes.
But I need to make screenshot every second. So this is what I do:
void CaptureInterface::ScreenShoot() {
IMemInputPin* p_MemoryInputPin = nullptr;
hr = p_RenderInputPin->QueryInterface(IID_IMemInputPin, (void**)&p_MemoryInputPin);
IMemAllocator* p_MemoryAllocator = nullptr;
hr = p_MemoryInputPin->GetAllocator(&p_MemoryAllocator);
IMediaSample* p_MediaSample = nullptr;
hr = p_MemoryAllocator->GetBuffer(&p_MediaSample, 0, 0, 0);
long buff_size = p_MediaSample->GetSize(); //buff_size = 4147200 Bytes
BYTE* buff = nullptr;
hr = p_MediaSample->GetPointer(&buff);
//BYTE CaptureInterface::ScreenBuff[1920*1080*2]; defined in header
//--------- TOO SLOW (1.5 seconds for 4 MBytes) ----------
std::memcpy(ScreenBuff, buff, buff_size);
//--------------------------------------------
p_MediaSample->Release();
p_MemoryAllocator->Release();
p_MemoryInputPin->Release();
return;
}
Any other operations with this buffer is very slow too.
But If I use memcpy on other data (2 arrays in my class for example same size 4MB) It is very fast. <0.01sec
Video memory is (might be) slow to read back by its nature (e.g. VMR9 IBasicVideo->GetCurrentImage very slow and you can find other references). You normally want to grab the data before it actually reaches video memory.
Additionally, the way you read data is not quite reliable. You don't know what frame you are actually copying and it might so happen that you even read blackness or garbage, or vice versa your acquiring access to buffer freezes the main video streaming. This is because you are grabbing an unused buffer from pool of available buffers rather than a buffer that corresponds to specific video frame. Your getting an image from such buffer happen in a fragile assumption that unused data from previously streamed frame was initialized and is not yet overwritten by anything else.

How to FFmpeg decode and extract metadata from last frame?

I am decoding using FFMpeg. The videos I am decoding are H.264 or MPEG4 videos using C code. I am using the 32bit libs. I have successfully decoded and extracted the metadata for the first frame. I would now like to decode the last frame. I have a defined duration of the video, and felt it was a safe assumption to say that isLastFrame = duration. Here's what I have, any suggestions?
AVFormatContext* pFormatCtx = avformat_alloc_context();
avformat_open_input(&pFormatCtx, filename, NULL, NULL);
int64_t duration = pFormatCtx->duration;
i=0;
while(av_read_frame(pFormatCtx, &packet)>=0) {
/* Is this a packet from the video stream? */
if(packet.stream_index==videoStream) {
/* Decode video frame*/
avcodec_decode_video2(pCodecCtx, pFrame, &duration, &packet);
}
Any help is much appreciated! :)
Thanks everyone for your help but I found that the reason the AV_SEEK_FRAME duration wasn't working was because you must multiply it by 1000 for it to be applicable in read frame. Also please note that the reason I have but decode_video instead of the decode functions calls is because I was using 32 bit and created my own but if you plug in video_decode() or I believe it's decode_video2 it works just as well. Hopefully this will help any fellow decoders in the future.
AVFormat Format;
int64_t duration = Format->duration;
duration = duration * 1000;
if (av_seek_frame(Format, Packet->stream_index, duration, AVSEEK_FLAG_ANY) <= 0)
{
/* read the frame and decode the packet */
if (av_read_frame(FormatContext, &Packet) >= 0)
{
/*decode the video frame*/
decode_video(CodecContext, Frame, &duration, &Packet);
}
This might be what you're looking for:
Codecs which have the CODEC_CAP_DELAY capability set have a delay
between input and output, these need to be fed with avpkt->data=NULL,
avpkt->size=0 at the end to return the remaining frames.
Link to FFmpeg documentation

ffmpeg recording h264 live stream got error

I am trying to record a h.264 live stream using the following code:
AVOutputFormat* fmt = av_guess_format(NULL, "test.mpeg", NULL);
AVFormatContext* oc = avformat_alloc_context();
oc->oformat = fmt;
avio_open2(&oc->pb, "test.mpeg", AVIO_FLAG_WRITE, NULL, NULL);
AVStream* stream = NULL;
...
while(!done)
{
// Read a frame
if(av_read_frame(inputStreamFormatCtx, &packet)<0)
return false;
if(packet.stream_index==correct_index)
{
////////////////////////////////////////////////////////////////////////
// Is this a packet from the video stream -> decode video frame
if (stream == NULL){//create stream in file
stream = avformat_new_stream(oc, pFormatCtx->streams[videoStream]->codec->codec);
avcodec_copy_context(stream->codec, pFormatCtx->streams[videoStream]->codec);
stream->sample_aspect_ratio = pFormatCtx->streams[videoStream]->codec->sample_aspect_ratio;
stream->sample_aspect_ratio.num = pFormatCtx->streams[videoStream]->codec->sample_aspect_ratio.num;
stream->sample_aspect_ratio.den = pFormatCtx->streams[videoStream]->codec->sample_aspect_ratio.den;
// Assume r_frame_rate is accurate
stream->r_frame_rate = pFormatCtx->streams[videoStream]->r_frame_rate;
stream->avg_frame_rate = stream->r_frame_rate;
stream->time_base = av_inv_q(stream->r_frame_rate);
stream->codec->time_base = stream->time_base;
avformat_write_header(oc, NULL);
}
av_write_frame(oc, &packet);
...
}
}
However, the ffmpeg says
encoder did not produce proper pts making some up
when the code runs to av_write_frame(); what's the problem here?
First make sure the inputStreamFormatCtx allocated and filled by right values (which is the cause of 90% of problem with demuxing/remuxing problems) - check some samples on internet to know how u should allocate and set its values.
The error tells us what is happening and it seems it is just a warning.
PTS (Presentation Time Stamp) is a number based on stream->time_base which tells us when we should show the decoded frame of this packet. when u get a live stream via network it's possible the server hasn't put a valid number for PTS of packet and when you receive the data it has a INVALID PTS (which you can find out by reading packet.pts and check if its a AV_NOPTS_VALUE). so then libav tries to generate the right pts based on frame rate and time_base of stream. It's a helpful attempt and if the recorded file can be played on a real motion (fps-wise) you should be happy. and if the recorded file will be played on a fast or slow motion (fps-wise) you got a problem and you can't rely on libav to correct the fps anymore. so then you should calculate the right fps by decoding the packets and then calculate the right pts based on the stream->time_base and set it to packet.pts.

Getting individual frames using CV_CAP_PROP_POS_FRAMES in cvSetCaptureProperty

I am trying to jump to a specific frame by setting the CV_CAP_PROP_POS_FRAMES property and then reading the frame like this:
cvSetCaptureProperty( input_video, CV_CAP_PROP_POS_FRAMES, current_frame );
frame = cvQueryFrame( input_video );
The problem I am facing is that, OpenCV 2.1 returns the same frame for the 12 consecutive values of current_frame whereas I want to read each individual frame, not just the key frames. Can anyone please tell me what's wrong?
I did some research and found out that the problem is caused by the decompression algorithm.
The MPEG-like algorithms (including HD, et all) do not compress each frame separately, but save a keyframe from time to time, and then only the differences between the last frame and subsequent frames.
The problem you reported is caused by the fact that, when you select a frame, the decoder (ffmpeg, likely) automatically advances to the next keyframe.
So, is there a way around this? I don't want only key frames but each individual frame.
I don't know whether or not this would be precise enough for your purpose, but I've had success getting to a particular point in an MPEG video by grabbing the frame rate, converting the frame number to a time, then advancing to the time. Like so:
cv::VideoCapture sourceVideo("/some/file/name.mpg");
double frameRate = sourceVideo.get(CV_CAP_PROP_FPS);
double frameTime = 1000.0 * frameNumber / frameRate;
sourceVideo.set(CV_CAP_PROP_POS_MSEC, frameTime);
Due to this limitation in OpenCV, it may be wise to to use FFMPEG instead. Moviepy is a nice wrapper library.
# Get nth frame from a video
from moviepy.video.io.ffmpeg_reader import FFMPEG_VideoReader
cap = FFMPEG_VideoReader("movie.mov",True)
cap.initialize()
cap.get_frame(n/FPS)
Performance is great too. Seeking to the nth frame with get_frame is O(1), and a speed-up is used if (nearly) consecutive frames are requested. I've gotten better-than-realtime results loading three 720p videos simultaneously.
CV_CAP_PROP_POS_FRAMES jumps to a key frame. I had the same issue and worked around it using this (python-)code. It's probably not totally efficient, but get's the job done:
def seekTo(cap, position):
positiontoset = position
pos = -1
cap.set(cv.CV_CAP_PROP_POS_FRAMES, position)
while pos < position:
ret, image = cap.read()
pos = cap.get(cv.CV_CAP_PROP_POS_FRAMES)
if pos == position:
return image
elif pos > position:
positiontoset -= 1
cap.set(cv.CV_CAP_PROP_POS_FRAMES, positiontoset)
pos = -1
I've successfully used the following on OpenCV 3 / Python 3:
# Skip to 150 frame then read the 151th frame
cap.set(cv2.CAP_PROP_POS_FRAMES, 150))
ret, frame = cap.read()
After some years assuming this as a unsavable bug, I think I've figured out a way to use with a good balance between speed and correctness.
A previous solution suggested to use the CV_CAP_PROP_POS_MSEC property before reading the frame:
cv::VideoCapture sourceVideo("/some/file/name.mpg");
const auto frameRate = sourceVideo.get(CV_CAP_PROP_FPS);
void readFrame(int frameNumber, cv::Mat& image) {
const double frameTime = 1000.0 * frameNumber / frameRate;
sourceVideo.set(CV_CAP_PROP_POS_MSEC, frameTime);
sourceVideo.read(image);
}
It does return the expected frame, but the problem is that using CV_CAP_PROP_POS_MSEC may be very slow, for example for a video conversion.
Note: using global variables for simplicity.
On the other hand, if you just want to read the video sequentially, it is enough to read frame without seeking at all.
for (int frameNumber = 0; frameNumber < nFrames; ++frameNumber) {
sourceVideo.read(image);
}
The solution comes from combining both: using a variable to remember the last queried frame, lastFrameNumber, and only seek when requested frame is not the next one. In this way it is possible to increase the speed in a sequential reading while allowing random seek if necessary.
cv::VideoCapture sourceVideo("/some/file/name.mpg");
const auto frameRate = sourceVideo.get(CV_CAP_PROP_FPS);
const int lastFrameNumber = -2; // guarantee seeking the first time
void readFrame(int frameNumber, cv::Mat& image) {
if (lastFrameNumber + 1 != frameNumber) { // not the next frame? seek
const double frameTime = 1000.0 * frameNumber / frameRate;
sourceVideo.set(CV_CAP_PROP_POS_MSEC, frameTime);
}
sourceVideo.read(image);
lastFrameNumber = frameNumber;
}

Resources