Custom IO writes only the header; the rest of the frames seem to be omitted

I'm using libavformat to read packets from an RTSP stream and remux them to fragmented MP4.
The video frames must stay intact, meaning I don't want to transcode/modify/change anything.
They shall be remuxed into MP4 in their original form (i.e. the NALUs shall remain the same).
I have updated libavformat to the latest version (currently 4.4).
Here is my snippet:
//open input; probesize is set to 32, we don't need to decode anything
avformat_open_input
//open output with custom IO
avformat_alloc_output_context2(&ofctx,...);
ofctx->pb = avio_alloc_context(buffer, bufsize, 1 /*write flag*/, NULL /*opaque*/, NULL /*read*/, &writeOutput, NULL /*seek*/);
ofctx->flags |= AVFMT_FLAG_NOBUFFER | AVFMT_FLAG_FLUSH_PACKETS | AVFMT_FLAG_CUSTOM_IO;
avformat_write_header(...);
//loop
av_read_frame()
LOGPACKET_DETAILS //<- this works, packets are coming
av_write_frame() //<- this doesn't work, my write callback is not called. Also tried av_interleaved_write_frame; that doesn't seem to work either.

int writeOutput(void *opaque, uint8_t *buffer, int buffer_size) {
    printf("writeOutput: writing %d bytes: ", buffer_size);
    return buffer_size; // report the whole buffer as consumed, otherwise avio treats it as an error
}
avformat_write_header works; it prints the header correctly.
I'm looking for the reason why my custom IO is not called after a frame has been read.
There must be some additional flag to set that tells avformat not to care about decoding and just write out whatever comes in.
More information:
The input stream is VBR-encoded H.264. It seems av_write_frame calls my write function only for an SPS, PPS, or IDR frame. Non-IDR frames are not passed through at all.
Update
I found out that if I request an IDR frame every second (I can ask the encoder for this), writeOutput is called every second.
I created a test: after a client joins, I requested the encoder to create IDRs at 1 Hz, 10 times. Libav calls writeOutput at 1 Hz for 10 seconds, but then the encoder sets itself back to creating an IDR only every 10 seconds, and from then on libav calls writeOutput only every 10 s, which makes my decoder fail. With IDRs at 1 Hz, the decoder is fine.
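That behaviour is consistent with how the fragmented MP4 muxer cuts fragments: with keyframe-based fragmentation (frag_keyframe), a new fragment, and therefore a write to the IO layer, can only start at an IDR. If that is the cause, one option is to take control of fragmentation with the muxer's frag_custom mode and flush manually. A sketch, where the exact movflags combination is an assumption to verify against your build:

#include <libavutil/opt.h>

// before avformat_write_header(): fragments are cut only when we say so
av_opt_set(ofctx->priv_data, "movflags", "empty_moov+frag_custom+default_base_moof", 0);

// in the read/write loop:
av_write_frame(ofctx, pkt);  // hand the packet to the muxer
av_write_frame(ofctx, NULL); // a NULL packet cuts a fragment now, which drives writeOutput

Note that a fragment per packet adds moof overhead; setting a fragment duration instead (e.g. av_opt_set(ofctx->priv_data, "frag_duration", "500000", 0) for 0.5 s fragments) may be a better trade-off.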

Related

Output video to both file and pipe: simultaneously using FFmpeg Libav

I have been trying to output video (from my webcam) simultaneously to both a file ('out.mkv') and pipe:
The file gets filtered frames, and the pipe: gets unfiltered rawvideo.
My frame rate is 30 fps. However, I am getting a far lower frame rate in my file output.
Attached is the while loop which reads packets and writes them to output:
while (1) {
    av_read_frame(ifmt_ctx, packet);
    stream_index = packet->stream_index;
    StreamContext *stream = &file_stream_ctx[stream_index];
    av_packet_rescale_ts(packet,
                         ifmt_ctx->streams[stream_index]->time_base,
                         stream->dec_ctx->time_base);
    avcodec_send_packet(stream->dec_ctx, packet);
    for (;;) {
        ret = avcodec_receive_frame(stream->dec_ctx, stream->dec_frame);
        if (ret < 0)
            break; /* AVERROR(EAGAIN): decoder needs more input; or EOF/error */
        stream->dec_frame->pts = stream->dec_frame->best_effort_timestamp;
        ret = filter_encode_write_frame(stream->dec_frame, stream_index, file_stream_ctx, file_filter_ctx, file_ofmt_ctx);
        ret = av_interleaved_write_frame(pipe_ofmt_ctx, packet);
    }
}
'ifmt_ctx' is the AVFormatContext for the webcam.
'file_ofmt_ctx' is the AVFormatContext for the output file, and 'pipe_ofmt_ctx' is the AVFormatContext for the pipe.
'file_stream_ctx' and 'file_filter_ctx' are the stream and filter contexts used for filtering and encoding the file output.
My guess is that writing to the pipe is taking too long and not allowing the next packet to be read on time, causing a lower frame rate. Does that make sense? If so, any suggestions on how to fix it? (I tried using AVFMT_FLAG_NONBLOCK, but it doesn't seem to help.)
Thanks
Hillel
It is hard to tell without any profiling, but you're probably correct given what you're seeing. I'd be curious to know whether you see better performance if you eliminate one of the write-outs.
I always write the packets or frames (depending on the case) into a queue and have a separate thread do the writing. You might even want two threads: one for the video (packet queue) and one for the frames.
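A minimal sketch of that pattern with POSIX threads; the queue type and helper names are made up for illustration (they are not libav API), and pipe_ofmt_ctx is assumed visible from the question's code:

#include <pthread.h>
#include <libavcodec/avcodec.h>

#define QUEUE_CAP 256

typedef struct PacketQueue {
    AVPacket *pkts[QUEUE_CAP];
    int head, tail, count;
    pthread_mutex_t lock; /* init with PTHREAD_MUTEX_INITIALIZER */
    pthread_cond_t cond;  /* init with PTHREAD_COND_INITIALIZER */
} PacketQueue;

/* capture loop calls this instead of writing to the pipe directly */
static void queue_push(PacketQueue *q, const AVPacket *pkt) {
    pthread_mutex_lock(&q->lock);
    if (q->count < QUEUE_CAP) {                  /* drop if the writer falls behind */
        q->pkts[q->tail] = av_packet_clone(pkt); /* reference-counted copy */
        q->tail = (q->tail + 1) % QUEUE_CAP;
        q->count++;
        pthread_cond_signal(&q->cond);
    }
    pthread_mutex_unlock(&q->lock);
}

static AVPacket *queue_pop(PacketQueue *q) {
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->cond, &q->lock);
    AVPacket *pkt = q->pkts[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    pthread_mutex_unlock(&q->lock);
    return pkt;
}

/* writer thread: the blocking pipe write happens here, not in the capture loop */
static void *pipe_writer(void *arg) {
    PacketQueue *q = arg;
    for (;;) {
        AVPacket *pkt = queue_pop(q);
        av_interleaved_write_frame(pipe_ofmt_ctx, pkt); /* consumes the reference */
        av_packet_free(&pkt);
    }
    return NULL;
}

This way the capture loop only pays for a clone and an enqueue per packet, and slow IO on either output can no longer delay av_read_frame.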
As an aside, I think the line avcodec_send_packet(stream->dec_ctx, packet); consumes the packet, so using it afterwards in ret = av_interleaved_write_frame(pipe_ofmt_ctx, packet); seems like it wouldn't work. You could try copying the packet and see if that helps at all:
av_packet_ref(new_packet, packet);
Assuming you've allocated new_packet, of course. It doesn't sound like this is what you're seeing, so what you're doing might be OK. But it's something you could also try.
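Fleshed out, that might look like the following sketch; note that av_interleaved_write_frame() takes ownership of the reference passed to it:

AVPacket *new_packet = av_packet_alloc();
av_packet_ref(new_packet, packet);                           /* independent reference to the same data */
ret = av_interleaved_write_frame(pipe_ofmt_ctx, new_packet); /* consumes the reference */
av_packet_free(&new_packet);                                 /* frees the (now blank) packet struct */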

ALSA - Non blocking (interleaved) read

I inherited some ALSA code that runs on a Linux embedded platform.
The existing implementation does blocking reads and writes using snd_pcm_readi() and snd_pcm_writei().
I am tasked with making this run on an ARM processor, but I find that the blocking interleaved reads push the CPU to 99%, so I am exploring non-blocking reads and writes.
I open the device as can be expected:
snd_pcm_t *handle;
const char* hwname = "plughw:0"; // example name
snd_pcm_open(&handle, hwname, SND_PCM_STREAM_CAPTURE, SND_PCM_NONBLOCK);
Other ALSA stuff then happens which I can supply on request.
It is worth mentioning at this point that:
we set a sampling rate of 48,000 [Hz]
the sample type is signed 32 bit integer
the device always overrides our requested period size to 1024 frames
Reading the stream like so:
int32_t* buffer; // buffer set up to hold period_size samples
int actual = snd_pcm_readi(handle, buffer, period_size);
This call takes approx. 15 ms to complete in blocking mode. Obviously, variable actual reads 1024 on return.
The problem is: in non-blocking mode, this function also takes 15 ms to complete, and actual also always reads 1024 on return.
I would expect the function to return immediately, with actual being <= 1024 and quite possibly -EAGAIN (-11).
In between read attempts I plan to put the thread to sleep for a specific amount of time, yielding CPU time to other processes.
Am I misunderstanding the ALSA API? Or could it be that my code is missing a vital step?
If the function returns a value of 1024, then at least 1024 frames were available at the time of the call.
(It's possible that the 15 ms is time needed by the driver to actually start the device.)
Anyway, blocking or non-blocking mode does not make any difference regarding CPU usage. To reduce CPU usage, replace the default device with plughw or hw, but then you lose features like device sharing or sample rate/format conversion.
I solved my problem by wrapping snd_pcm_readi() as follows:
/*
** Read interleaved stream in non-blocking mode
*/
template <typename SampleType>
snd_pcm_sframes_t snd_pcm_readi_nb(snd_pcm_t* pcm, SampleType* buffer, snd_pcm_uframes_t size, unsigned samplerate)
{
    const snd_pcm_sframes_t avail = ::snd_pcm_avail(pcm);
    if (avail < 0) {
        return avail; // negative value is an error code (e.g. -EPIPE on overrun)
    }
    if ((snd_pcm_uframes_t)avail < size) {
        // not enough frames captured yet: estimate how long until they are,
        // and sleep most of that time away instead of spinning
        snd_pcm_uframes_t remain = size - avail;
        unsigned long msec = (remain * 1000) / samplerate;
        static const unsigned long SLEEP_THRESHOLD_MS = 1;
        if (msec > SLEEP_THRESHOLD_MS) {
            msec -= SLEEP_THRESHOLD_MS;
            // exercise for the reader: sleep for msec
        }
    }
    return ::snd_pcm_readi(pcm, buffer, size);
}
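For instance, a capture loop around it might look like this (a sketch; the -EAGAIN and snd_pcm_recover() handling are my additions, not part of the original wrapper):

int32_t buf[1024]; /* one period, per the question */
for (;;) {
    snd_pcm_sframes_t n = snd_pcm_readi_nb(handle, buf, 1024, 48000);
    if (n < 0) {
        if (n == -EAGAIN)
            continue;                            /* non-blocking mode: nothing to read yet */
        if (snd_pcm_recover(handle, (int)n, 0) < 0)
            break;                               /* unrecoverable (e.g. device disappeared) */
        continue;                                /* recovered from an overrun */
    }
    /* ... process n captured frames ... */
}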
This works quite well for me. My audio process now takes 'only' 19% CPU time.
And it doesn't matter whether the PCM interface was opened using SND_PCM_NONBLOCK or 0.
Next I'm going to perform a callgrind analysis to see if more CPU cycles can be saved elsewhere in the code.

What is the best way to read input of unpredictable and indeterminate (i.e. no EOF) size from stdin in C?

This must be a stupid question because this should be a very common and simple problem, but I haven't been able to find an answer anywhere, so I'll bite the bullet and ask.
How on earth should I go about reading from the standard input when there is no way of determining the size of the data? Obviously if the data ends in some kind of terminator like a NUL or EOF then this is quite trivial, but my data does not. This is simple IPC: the two programs need to talk back and forth and ending the file streams with EOF would break everything.
I thought this should be fairly simple. Clearly programs talk to each other over pipes all the time without needing any arcane tricks, so I hope there is a simple answer that I'm too stupid to have thought of. Nothing I've tried has worked.
Something obvious like (ignoring necessary realloc's for brevity):
int size = 0, max = 8192;
unsigned char *buf = malloc(max);
while (fread((buf + size), 1, 1, stdin) == 1)
    ++size;
won't work since fread() blocks and waits for data, so this loop won't terminate. As far as I know nothing in stdio allows nonblocking input, so I didn't even try any such function. Something like this is the best I could come up with:
struct mydata {
    unsigned char *data;
    int slen; /* size of data */
    int mlen; /* maximum allocated size */
};
...
struct mydata *buf = xmalloc(sizeof *buf);
buf->data = xmalloc((buf->mlen = 8192));
buf->slen = 0;

int nread = read(0, buf->data, 1);
if (nread == (-1))
    err(1, "read error");
buf->slen += nread;

fcntl(0, F_SETFL, oflags | O_NONBLOCK);
do {
    if (buf->slen >= (buf->mlen - 32))
        buf->data = xrealloc(buf->data, (buf->mlen *= 2));
    nread = read(0, (buf->data + buf->slen), 1);
    if (nread > 0)
        buf->slen += nread;
} while (nread == 1);
fcntl(0, F_SETFL, oflags);
where oflags is a global variable containing the original flags for stdin (cached at the start of the program, just in case). This dumb way of doing it works as long as all of the data is present immediately, but fails otherwise. Because this sets read() to be non-blocking, it just returns -1 if there is no data. The program communicating with mine generally sends responses whenever it feels like it, and not all at once, so if the data is at all large this exits too early and fails.
How on earth should I go about reading from the standard input when there is no way of determining the size of the data?
There always has to be a way to determine the size. Otherwise, the program would require infinite memory, and would thus be impossible to run on a physical computer.
Think about it this way: even in the case of a never-ending stream of data, there must be some chunks or points at which you have to process it. For instance, a live-streamed video has to be decoded a portion at a time (e.g. a frame). Or consider a video game, which processes messages one by one even though the game has undetermined length.
This holds true regardless of the type of I/O you decide to use (blocking/non-blocking, synchronous/asynchronous...). For instance, if you want to use typical blocking synchronous I/O, what you have to do is process the data in a loop: each iteration, you read as much data as is available, and process as much as you can. Whatever you can not process (because you have not received enough yet), you keep for the next iteration. Then, the rest of the loop is the rest of the logic of the program.
In the end, regardless of what you do, you (or someone else, e.g. a library, the operating system, the hardware buffers...) have to buffer incoming data until it can be processed.
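As a concrete sketch of that read-and-buffer loop, assuming purely for illustration that a "unit" is a newline-terminated message and that handle_message is a hypothetical handler:

#include <string.h>
#include <unistd.h>

char buf[65536];
size_t used = 0;

for (;;) {
    ssize_t n = read(0, buf + used, sizeof buf - used);
    if (n <= 0)
        break;                                  /* error, or the peer closed the pipe */
    used += (size_t)n;

    /* process every complete message; keep the incomplete tail for next time */
    char *nl;
    while ((nl = memchr(buf, '\n', used)) != NULL) {
        size_t msglen = (size_t)(nl - buf) + 1;
        handle_message(buf, msglen);            /* hypothetical handler */
        memmove(buf, buf + msglen, used - msglen);
        used -= msglen;
    }
}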
Basically, you have two choices -- synchronous or asynchronous -- and both have their advantages and disadvantages.
For synchronous, you need either delimiters or a length field embedded in the record (or fixed-length records, but that is pretty inflexible). This works best for synchronous protocols like synchronous RPC or simplex client-server interactions, where only one side talks at a time while the other side waits. For ASCII/text-based protocols, it is common to use a control-character delimiter like NL/EOL or NUL or ETX to mark the end of a message. Binary protocols more commonly use an embedded length field: the receiver first reads the length and then reads the full amount of (expected) data.
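For the embedded-length-field case, the usual shape is a "read exactly N bytes" helper, since a pipe read may return short. A sketch (byte order on the wire is an assumption here):

#include <arpa/inet.h> /* ntohl */
#include <stdint.h>
#include <unistd.h>

/* keep calling read() until exactly len bytes have arrived */
static int read_exactly(int fd, void *dst, size_t len) {
    char *p = dst;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n <= 0)
            return -1;            /* EOF or error in the middle of a record */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* receiver side: length first, then a payload of exactly that length */
uint32_t msglen;
if (read_exactly(0, &msglen, sizeof msglen) == 0) {
    msglen = ntohl(msglen);       /* assuming network byte order on the wire */
    /* read_exactly(0, payload, msglen); then process the record */
}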
For asynchronous, you use non-blocking mode. It IS possible to use non-blocking mode with stdio streams, it just requires some care. Out-of-data conditions show up to stdio like error conditions, so you need to use ferror() and clearerr() on the FILE * as appropriate.
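The care required looks roughly like this (a sketch; the exact errno behaviour of fread on a non-blocking descriptor can vary between libc implementations):

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>

int flags = fcntl(0, F_GETFL, 0);
fcntl(0, F_SETFL, flags | O_NONBLOCK);

char buf[4096];
size_t n = fread(buf, 1, sizeof buf, stdin);
if (n == 0 && ferror(stdin) && errno == EAGAIN) {
    clearerr(stdin);  /* no data right now, not a real error */
    /* come back later, e.g. after poll()/select() says stdin is readable */
}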
It's possible for both to be used. For example, in client-server interactions the clients may use synchronous I/O (they send a request and wait for a reply) while the server uses asynchronous I/O (to be robust in the presence of misbehaving clients).
The read API on Linux, or the ReadFile API on Windows, returns immediately rather than waiting for the specified number of bytes to fill the buffer (when reading from a pipe or socket). read() then returns the number of bytes read.
This means that when reading from a pipe, you pick a buffer size, read as much as is returned, and then process it. Then you read the next chunk. The only time you are blocked is when there is no data available at all.
This differs from fread(), which only returns once the desired number of bytes have been read or the stream determines that doing so is impossible (e.g. EOF).

Winpcap code - Capture loses packets in loop

I have a loop that captures packets with pcap_next_ex, and in each iteration I make a lot of function calls to process the packets. This work can be simulated by a Sleep() call in the loop. So what happens when I call Sleep() in a pcap_next_ex() loop?
pcap_pkthdr* header = NULL;
const UCHAR* content = NULL;
pcap = pcap_open(adapterName.c_str(), 65536, PCAP_OPENFLAG_PROMISCUOUS, 1000, NULL, NULL);
//Set to nonblock mode?
INT res;
while ((res = pcap_next_ex(pcap, &header, &content)) >= 0) // note the parentheses: assign first, then compare
{
    if (res > 0) // 0 means the read timeout elapsed with no packet
    {
        if (content)
        {
            //Here I do the stuff, which I will simulate with a Sleep() call
            Sleep(200);
        }
    }
}
I have seen code which uses pcap_next_ex and saves the packets into a vector to process them later in another thread; this approach reduces the processing time noticeably, but it doesn't fully convince me. Should I use it?
I would like to use other winpcap functions which capture packets in "non-blocking" mode and raise an event for each packet that comes in... What is the best method to avoid losing packets with winpcap?
Any help will be appreciated. Regards.
WinPcap stores the packets it captures in a ring buffer of limited size.
If the number of buffered bytes reaches the ring buffer size, the oldest packets are discarded so that WinPcap can store new ones.
So, you should call pcap_next_ex as frequently as possible so that you can get as many packets as possible before they are discarded.
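You can also buy yourself time by enlarging that kernel buffer; WinPcap exposes pcap_setbuff() for this. A sketch, with the size chosen arbitrarily:

/* WinPcap extension: grow the kernel ring buffer, here to 16 MiB */
if (pcap_setbuff(pcap, 16 * 1024 * 1024) != 0)
    fprintf(stderr, "pcap_setbuff failed\n");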
Calling pcap_next_ex in a dedicated thread and processing the packets in another thread is good practice, because this way pcap_next_ex can be called as frequently as possible.
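A compressed sketch of that split using a Win32 thread; enqueue_packet_copy is a hypothetical helper backed by any thread-safe queue (such as the pthread one shown in the previous answer):

#include <windows.h>

static DWORD WINAPI capture_thread(LPVOID arg)
{
    pcap_t* pcap = (pcap_t*)arg;
    pcap_pkthdr* header;
    const UCHAR* content;
    int res;
    while ((res = pcap_next_ex(pcap, &header, &content)) >= 0)
    {
        if (res > 0)
            enqueue_packet_copy(header, content); // hypothetical: copy header+data into a queue
    }
    return 0;
}

// started with:
HANDLE h = CreateThread(NULL, 0, capture_thread, pcap, 0, NULL);

The processing thread pops the copies and can Sleep() or do heavy work without stalling capture. The copy is necessary because the header and data returned by pcap_next_ex are only valid until the next call.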

how to play short tones with ALSA

I'm trying to generate short tones of variable length through ALSA in a small C program. A few examples I've tried work fine when playing one second's worth of sound, but anything shorter than that just doesn't produce any sound at all.
I'm filling a buffer with a sine wave like so:
#define BUFFER_LEN 44100
int freq = 700; // tone frequency [Hz]
int fs = 44100; // sampling frequency [Hz]
float buffer[BUFFER_LEN];
int k;
for (k = 0; k < BUFFER_LEN; k++) {
    buffer[k] = sin(2 * M_PI * freq / fs * k);
}
Setting the pcm device parameters:
if ((err = snd_pcm_set_params(handle,
                              SND_PCM_FORMAT_FLOAT,
                              SND_PCM_ACCESS_RW_INTERLEAVED,
                              1,          /* channels */
                              44100,      /* sample rate [Hz] */
                              1,          /* allow software resampling */
                              500000)) < 0) { /* latency in microseconds (0.5 s) */
    printf("Playback open error: %s\n", snd_strerror(err));
    exit(EXIT_FAILURE);
}
Playback:
frames = snd_pcm_writei(handle, buffer, BUFFER_LEN);
But if I want to change it to play the tone for, say, a quarter of a second (i.e. changing BUFFER_LEN to 11025), nothing comes out of the speaker anymore.
I've tried changing types from floats to shorts, setting the PCM format to other values, and trying different ways of filling the buffer with sine waves.
If anything, I hear a little 'bip' sound in the speakers, but just not what I expect. The program doesn't segfault or crash; I'm just puzzled about how to make ALSA play shorter samples.
I don't know if I need to work with some exact frame or buffer size multiple but I'm very open to suggestions.
Your application writing samples to the buffer, and the hardware reading samples from the buffer and playing them are two different processes that run asynchronously.
If there is not enough free space in the buffer for the amount of samples you're trying to write, then snd_pcm_writei() will wait until enough space is available. But when snd_pcm_writei() returns, up to a full buffer of samples might not yet have been played.
To wait until the buffer is empty, use snd_pcm_drain().
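Applied to the question's code, that means following the write with a drain (a sketch; the snd_pcm_recover() call is an addition for robustness, not part of the original):

frames = snd_pcm_writei(handle, buffer, BUFFER_LEN);
if (frames < 0)
    frames = snd_pcm_recover(handle, frames, 0); /* recover from an underrun if one occurred */
snd_pcm_drain(handle); /* blocks until everything queued has actually been played */

Note that snd_pcm_drain() leaves the PCM stopped, so call snd_pcm_prepare() before writing the next tone.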
