Trouble syncing libavformat/ffmpeg with x264 and RTP - c

I've been working on some streaming software that takes live feeds
from various kinds of cameras and streams over the network using
H.264. To accomplish this, I'm using the x264 encoder directly (with
the "zerolatency" preset) and feeding NALs as they are available to
libavformat to pack into RTP (ultimately RTSP). Ideally, this
application should be as real-time as possible. For the most part,
this has been working well.
Unfortunately, however, there is some sort of synchronization issue:
any video playback on clients seems to show a few smooth frames,
followed by a short pause, then more frames; repeat. Additionally,
there appears to be approximately a 4-second delay. This happens with
every video player I've tried: Totem, VLC, and basic gstreamer pipes.
I've boiled it all down to a somewhat small test case:
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h> // malloc, free
#include <string.h> // memset
#include <unistd.h>
#include <x264.h>
#include <libavformat/avformat.h>
#include <libswscale/swscale.h>
#define WIDTH 640
#define HEIGHT 480
#define FPS 30
#define BITRATE 400000
#define RTP_ADDRESS "127.0.0.1"
#define RTP_PORT 49990
struct AVFormatContext* avctx;
struct x264_t* encoder;
struct SwsContext* imgctx;
uint8_t test = 0x80;
void create_sample_picture(x264_picture_t* picture)
{
    // create a frame to store in
    x264_picture_alloc(picture, X264_CSP_I420, WIDTH, HEIGHT);
    // fake image generation
    // disregard how wrong this is; just writing a quick test
    int strides = WIDTH / 8;
    uint8_t* data = malloc(WIDTH * HEIGHT * 3);
    memset(data, test, WIDTH * HEIGHT * 3);
    test = (test << 1) | (test >> (8 - 1));
    // scale the image
    sws_scale(imgctx, (const uint8_t* const*) &data, &strides, 0, HEIGHT,
              picture->img.plane, picture->img.i_stride);
    // release the temporary source buffer
    free(data);
}
int encode_frame(x264_picture_t* picture, x264_nal_t** nals)
{
    // encode a frame
    x264_picture_t pic_out;
    int num_nals;
    int frame_size = x264_encoder_encode(encoder, nals, &num_nals, picture, &pic_out);
    // ignore bad frames
    if (frame_size < 0)
    {
        return frame_size;
    }
    return num_nals;
}
void stream_frame(uint8_t* payload, int size)
{
    // initialize a packet
    AVPacket p;
    av_init_packet(&p);
    p.data = payload;
    p.size = size;
    p.stream_index = 0;
    p.flags = AV_PKT_FLAG_KEY;
    p.pts = AV_NOPTS_VALUE;
    p.dts = AV_NOPTS_VALUE;
    // send it out
    av_interleaved_write_frame(avctx, &p);
}
int main(int argc, char* argv[])
{
    // initialize ffmpeg
    av_register_all();
    // set up image scaler
    // (in-width, in-height, in-format, out-width, out-height, out-format, scaling-method, 0, 0, 0)
    imgctx = sws_getContext(WIDTH, HEIGHT, PIX_FMT_MONOWHITE,
                            WIDTH, HEIGHT, PIX_FMT_YUV420P,
                            SWS_FAST_BILINEAR, NULL, NULL, NULL);
    // set up encoder presets
    x264_param_t param;
    x264_param_default_preset(&param, "ultrafast", "zerolatency");
    param.i_threads = 3;
    param.i_width = WIDTH;
    param.i_height = HEIGHT;
    param.i_fps_num = FPS;
    param.i_fps_den = 1;
    param.i_keyint_max = FPS;
    param.b_intra_refresh = 0;
    param.rc.i_bitrate = BITRATE;
    param.b_repeat_headers = 1; // whether to repeat headers or write just once
    param.b_annexb = 1; // place start codes (1) or sizes (0)
    // initialize
    x264_param_apply_profile(&param, "high");
    encoder = x264_encoder_open(&param);
    // at this point, x264_encoder_headers can be used, but it has had no effect
    // set up streaming context. a lot of error handling has been omitted
    // for brevity, but this should be pretty standard.
    avctx = avformat_alloc_context();
    struct AVOutputFormat* fmt = av_guess_format("rtp", NULL, NULL);
    avctx->oformat = fmt;
    snprintf(avctx->filename, sizeof(avctx->filename), "rtp://%s:%d", RTP_ADDRESS, RTP_PORT);
    if (url_fopen(&avctx->pb, avctx->filename, URL_WRONLY) < 0)
    {
        perror("url_fopen failed");
        return 1;
    }
    struct AVStream* stream = av_new_stream(avctx, 1);
    // initialize codec
    AVCodecContext* c = stream->codec;
    c->codec_id = CODEC_ID_H264;
    c->codec_type = AVMEDIA_TYPE_VIDEO;
    c->flags = CODEC_FLAG_GLOBAL_HEADER;
    c->width = WIDTH;
    c->height = HEIGHT;
    c->time_base.den = FPS;
    c->time_base.num = 1;
    c->gop_size = FPS;
    c->bit_rate = BITRATE;
    avctx->flags = AVFMT_FLAG_RTP_HINT;
    // write the header
    av_write_header(avctx);
    // make some frames
    for (int frame = 0; frame < 10000; frame++)
    {
        // create a sample moving frame
        x264_picture_t* pic = (x264_picture_t*) malloc(sizeof(x264_picture_t));
        create_sample_picture(pic);
        // encode the frame
        x264_nal_t* nals;
        int num_nals = encode_frame(pic, &nals);
        if (num_nals < 0)
            printf("invalid frame size: %d\n", num_nals);
        // send out NALs
        for (int i = 0; i < num_nals; i++)
        {
            stream_frame(nals[i].p_payload, nals[i].i_payload);
        }
        // free up resources
        x264_picture_clean(pic);
        free(pic);
        // stream at approx 30 fps
        printf("frame %d\n", frame);
        usleep(33333);
    }
    return 0;
}
This test shows black lines on a white background that
should move smoothly to the left. It has been written for ffmpeg 0.6.5
but the problem can be reproduced on 0.8 and 0.10 (from what I've tested so far). I've taken some shortcuts in error handling to make this example as short as
possible while still showing the problem, so please excuse some of the
nasty code. I should also note that while an SDP is not used here, I
have tried using that already with similar results. The test can be
compiled with:
gcc -g -std=gnu99 streamtest.c -lswscale -lavformat -lx264 -lm -lpthread -o streamtest
It can be played with GStreamer directly:
gst-launch udpsrc port=49990 ! application/x-rtp,payload=96,clock-rate=90000 ! rtph264depay ! decodebin ! xvimagesink
You should immediately notice the stuttering. One common "fix" I've
seen all over the Internet is to add sync=false to the pipeline:
gst-launch udpsrc port=49990 ! application/x-rtp,payload=96,clock-rate=90000 ! rtph264depay ! decodebin ! xvimagesink sync=false
This causes playback to be smooth (and near-realtime), but is a
non-solution and only works with gstreamer. I'd like to fix the
problem at the source. I've been able to stream with near-identical
parameters using raw ffmpeg and haven't had any issues:
ffmpeg -re -i sample.mp4 -vcodec libx264 -vpre ultrafast -vpre baseline -b 400000 -an -f rtp rtp://127.0.0.1:49990 -an
So clearly I'm doing something wrong. But what is it?

1) You didn't set a PTS on the frames you send to libx264 (you should probably see "non-strictly-monotonic PTS" warnings).
2) You didn't set PTS/DTS on the packets you send to libavformat's RTP muxer. (I'm not 100% sure they need to be set, but I'd guess it is better to set them; from the source code it looks like the RTP muxer uses the PTS.)
3) IMHO usleep(33333) is bad. It also stalls the encoder for that time (increasing latency), when you could be encoding the next frame during that time even if you don't yet need to send it over RTP.
P.S. By the way, you didn't set param.rc.i_rc_method to X264_RC_ABR, so libx264 will use CRF 23 instead and ignore your "param.rc.i_bitrate = BITRATE". It can also be a good idea to use VBV when encoding for network transmission.
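Points 1 to 3 can be illustrated with a little arithmetic. The following is only a sketch, assuming the question's 30 fps and the 90 kHz clock rate that appears in the gst-launch caps; frame_pts_90khz and frame_deadline_us are hypothetical helper names, not part of x264 or libavformat. The first value is the kind of monotonically increasing timestamp one would assign to the picture's i_pts and to the packet's pts/dts (rescaled to whatever time base the stream actually uses); the second is an absolute deadline to sleep until, instead of a fixed sleep after every encode.

```c
#include <stdint.h>

#define FPS 30          /* frame rate used by the test program */
#define RTP_CLOCK 90000 /* assumed 90 kHz video clock, as in the gst-launch caps */

/* Hypothetical helper: timestamp of frame n expressed in RTP clock ticks.
   It increases by RTP_CLOCK / FPS ticks per frame, so it is strictly monotonic. */
static int64_t frame_pts_90khz(int64_t frame)
{
    return frame * RTP_CLOCK / FPS;
}

/* Hypothetical helper: absolute send deadline of frame n, in microseconds
   since the start of the stream. Sleeping until this deadline (rather than
   for a fixed 33333 us) keeps encode time from accumulating as drift. */
static int64_t frame_deadline_us(int64_t frame)
{
    return frame * 1000000 / FPS;
}
```

With these assumptions, frame 30 gets timestamp 90000 (exactly one second of RTP clock) and a send deadline of 1000000 microseconds after the start.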

Related

How to pipe to ffmpeg RGB value 10?

I am trying to create a video file using ffmpeg. I have all the RGB pixel data for each frame, and following this blogpost I have code which sends the data frame by frame via a pipe. And it works mostly. However if any pixel has a value of 10 in any of the 3 channels (e.g. #00000A, #0AFFFF, etc) then it produces these errors:
[rawvideo @ 0000020c3787f040] Packet corrupt (stream = 0, dts = 170)
pipe:: corrupt input packet in stream 0
[rawvideo @ 0000020c3789f100] Invalid buffer size, packet size 32768 < expected frame_size 49152
Error while decoding stream #0:0: Invalid argument
And the output video is garbled.
I suspect that, because 10 is the newline character in ASCII, this is confusing the pipe somehow.
What exactly is happening here and how do I fix it so that I can use RGB values like #00000a?
Below is C code that reproduces the problem:
#include <stdio.h>

unsigned char frame[128][128][3];

int main() {
    int x, y, i;
    FILE *pipeout = popen("ffmpeg -y -f rawvideo -vcodec rawvideo -pix_fmt rgb24 -s 128x128 -r 24 -i - -f mp4 -q:v 1 -an -vcodec mpeg4 output.mp4", "w");
    for (i = 0; i < 128; i++) {
        for (x = 0; x < 128; ++x) {
            for (y = 0; y < 128; ++y) {
                frame[y][x][0] = 0;
                frame[y][x][1] = 0;
                frame[y][x][2] = 10;
            }
        }
        fwrite(frame, 1, 128*128*3, pipeout);
    }
    fflush(pipeout);
    pclose(pipeout);
    return 0;
}
EDIT: for clarity, I am using Windows
I've just tried your code on Linux and it worked for me. I think @Craig Estey's suggestion is probably the answer.
If it doesn't work, you could try writing the data with write instead of fwrite, if available. (I've had issues writing binary data to pipes with the fread/fwrite family of functions in the past.)
So you could try changing this line:
fwrite(frame, 1, 128*128*3, pipeout);
To something like:
int fd = fileno(pipeout);
write(fd, frame, sizeof(frame));
And also remove the following line:
fflush(pipeout);
EDIT: There are some troubleshooting tips in the comment section of the blog post you linked, especially regarding the Windows version of this program.
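If the cause is text-mode newline translation (on Windows, a stream opened in text mode expands each byte of value 10 into a carriage return plus line feed, so the frame grows beyond the size ffmpeg expects), then opening the stream in binary mode should keep the byte count intact. The sketch below demonstrates this with an ordinary file rather than a pipe; write_tens is a hypothetical helper, and passing "wb" as the mode to the Windows pipe call is an assumption that is not confirmed anywhere in this thread.

```c
#include <stdio.h>

/* Hypothetical helper: write `count` bytes of value 10 to `path` using the
   given fopen mode, then return the resulting file size in bytes
   (-1 on error). In binary mode ("wb") the size always equals `count`. */
static long write_tens(const char *path, const char *mode, size_t count)
{
    FILE *f = fopen(path, mode);
    if (f == NULL)
        return -1;
    for (size_t i = 0; i < count; i++)
        fputc(10, f);
    fclose(f);

    /* Reopen in binary mode and measure the size actually stored. */
    f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    fseek(f, 0, SEEK_END);
    long size = ftell(f);
    fclose(f);
    return size;
}
```

Calling write_tens("frame.bin", "wb", 128 * 128 * 3) returns 49152 on any platform, whereas a text-mode "w" stream on Windows would store more bytes than were written.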

SDL_OpenAudioDevice: Continuous play from real time processed source buffer

I'm porting an emulator to SDL. There is a method, called at each frame, that passes a buffer with the new audio samples for the next frame.
I opened a device with SDL_OpenAudioDevice, and at each frame the SDL callback plays the samples from the audio buffer.
It works, but the sound is not perfect: there are clicks, metallic noise and so on.
Sound is 16 bit signed.
EDIT: Ok, I found a solution!
With the code of the opening post I was playing the samples for the next frame during the current frame, in real time. That was wrong!
So I implemented a circular buffer into which I put the samples for the next frame that the underlying code passes to me at each (current) frame.
That buffer has two pointers, one for the read point and one for the write point. SDL calls the callback function when its audio stream has no more data to play; when the callback is called I play audio samples from the read point of the circular buffer and then update the read pointer.
When the underlying code gives me the audio samples for the next frame, I write them into the circular buffer at the write point and then update the write pointer.
The read and write pointers are advanced by the number of samples to be played at each frame.
The code has been updated; it needs some adjustment when samplesPerFrame is not an integer, but it works ;-)
Circular buffer structure:
typedef struct circularBufferStruct
{
    short *buffer;
    int cells;
    short *readPoint;
    short *writePoint;
} circularBuffer;
This method is called at initialization:
int initialize_audio(int stereo)
{
    if (stereo)
        channel = 2;
    else
        channel = 1;
    // Check if sound is disabled
    if (sampleRate != 0)
    {
        // Initialize SDL Audio
        if (SDL_InitSubSystem(SDL_INIT_AUDIO) < 0)
        {
            SDL_Log("SDL fails to initialize audio subsystem!\n%s", SDL_GetError());
            return 1;
        }
        // Number of samples per frame
        samplesPerFrame = (double)sampleRate / (double)framesPerSecond * channel;
        audioSamplesSize = samplesPerFrame * bytesPerSample; // Bytes
        audioBufferSize = audioSamplesSize * 10; // Bytes
        // Set and clear circular buffer
        audioBuffer.buffer = malloc(audioBufferSize); // Bytes, must be a multiple of audioSamplesSize
        memset(audioBuffer.buffer, 0, audioBufferSize);
        audioBuffer.cells = (audioBufferSize) / sizeof(short); // Cells, not Bytes!
        audioBuffer.readPoint = audioBuffer.buffer;
        audioBuffer.writePoint = audioBuffer.readPoint + (short)samplesPerFrame;
    }
    else
        samplesPerFrame = 0;
    // First frame
    return samplesPerFrame;
}
This is the SDL callback assigned to want.callback:
void audioCallback(void *userdata, uint8_t *stream, int len)
{
    SDL_memset(stream, 0, len);
    if (audioSamplesSize == 0)
        return;
    if (len > audioSamplesSize)
    {
        len = audioSamplesSize;
    }
    SDL_MixAudioFormat(stream, (const Uint8 *)audioBuffer.readPoint, AUDIO_S16SYS, len, SDL_MIX_MAXVOLUME);
    audioBuffer.readPoint += (short)samplesPerFrame;
    if (audioBuffer.readPoint >= audioBuffer.buffer + audioBuffer.cells)
        audioBuffer.readPoint = audioBuffer.readPoint - audioBuffer.cells;
}
This method is called at each frame (after first pass we require only the amount of samples):
int update_audio(short *buffer)
{
    // Check if sound is disabled
    if (sampleRate != 0)
    {
        memcpy(audioBuffer.writePoint, buffer, audioSamplesSize); // Bytes
        audioBuffer.writePoint += (short)samplesPerFrame; // Cells
        if (audioBuffer.writePoint >= audioBuffer.buffer + audioBuffer.cells)
            audioBuffer.writePoint = audioBuffer.writePoint - audioBuffer.cells;
        if (firstTime)
        {
            // Set required audio specs
            want.freq = sampleRate;
            want.format = AUDIO_S16SYS;
            want.channels = channel;
            want.samples = samplesPerFrame / channel; // total samples divided by channel count
            want.padding = 0;
            want.callback = audioCallback;
            want.userdata = NULL;
            device = SDL_OpenAudioDevice(SDL_GetAudioDeviceName(0, 0), 0, &want, &have, 0);
            SDL_PauseAudioDevice(device, 0);
            firstTime = 0;
        }
    }
    else
        samplesPerFrame = 0;
    // Next frame
    return samplesPerFrame;
}
I hope this question/answer will be useful to others in the future, because I found almost nothing on the net about SDL audio.
Ok, I found a solution!
With the code of the opening post I was playing samples for next frame at the current frame in real time. It was wrong!
So, I implemented a circular buffer where I put samples for next frame that underlying code passes to me at each (current) frame. From that buffer I read and write in different position, see opening post
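The read-point/write-point scheme described above can be sketched independently of SDL. This is a minimal illustration, not the poster's code: ring_t, ring_write, ring_read and the tiny RING_CELLS capacity are all hypothetical, indices are used instead of raw pointers, and a fill count is kept so that a full buffer can be told apart from an empty one.

```c
#include <stddef.h>
#include <string.h>

#define RING_CELLS 8 /* hypothetical capacity, in samples */

typedef struct {
    short  data[RING_CELLS];
    size_t read_pos;  /* next cell the audio callback will consume */
    size_t write_pos; /* next cell the emulator will fill */
    size_t count;     /* number of cells currently holding unread samples */
} ring_t;

/* Producer side (called once per emulated frame): copy up to n samples in.
   Returns how many were stored; fewer than n means the buffer was full. */
static size_t ring_write(ring_t *r, const short *src, size_t n)
{
    size_t written = 0;
    while (written < n && r->count < RING_CELLS) {
        r->data[r->write_pos] = src[written++];
        r->write_pos = (r->write_pos + 1) % RING_CELLS; /* wrap around */
        r->count++;
    }
    return written;
}

/* Consumer side (called from the audio callback): copy up to n samples out.
   On underrun the remainder of dst is filled with silence. */
static size_t ring_read(ring_t *r, short *dst, size_t n)
{
    size_t got = 0;
    while (got < n && r->count > 0) {
        dst[got++] = r->data[r->read_pos];
        r->read_pos = (r->read_pos + 1) % RING_CELLS; /* wrap around */
        r->count--;
    }
    memset(dst + got, 0, (n - got) * sizeof(short));
    return got;
}
```

This sketch is not thread-safe. Because SDL invokes the audio callback on its own thread, the producer side would normally be wrapped in SDL_LockAudioDevice and SDL_UnlockAudioDevice (or use another form of synchronization).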

Confused about Passing user data to PortAudio Callbacks

This is my first post here and I'm fairly new to programming, especially C. A couple of weeks ago I started working through The Audio Programming Book (MIT Press) and have been expanding on some examples to try to understand things further.
I think my question lies with how I'm trying to pass data (retrieved from the user in an initialization function) to a PortAudio callback. I feel like what I've done isn't that different from the examples (both from the book and PortAudio's examples like paex_sine.c), but for some reason I can't get my code to work and I've been banging my head against a wall trying to understand why. I've tried searching pretty extensively for solutions or example code to study, but I kind of don't know what I don't know, so that hasn't returned much.
How do I get user data into the callback?
Am I just not understanding how pointers and structs work and trying to force them to do things they don't want to?
Or, am I just overlooking something really obvious?
The following code either gives a really high pitched output, short high pitched blips, or no (audible) output:
#include <stdio.h>
#include <math.h>
#include "portaudio.h"
#define FRAME_BLOCK_LEN 64
#define SAMPLING_RATE 44100
#define TWO_PI (3.14159265f * 2.0f)
PaStream *audioStream;
double si = 0;

typedef struct
{
    float frequency;
    float phase;
}
paTestData;

int audio_callback (const void *inputBuffer, void *outputBuffer,
                    unsigned long framesPerBuffer,
                    const PaStreamCallbackTimeInfo* timeinfo,
                    PaStreamCallbackFlags statusFlags,
                    void *userData )
{
    paTestData *data = (paTestData*)userData;
    float *out = (float*)outputBuffer;
    unsigned long i;
    // data->frequency = 400;
    for(i = 0; i < framesPerBuffer; i++){
        si = TWO_PI * data->frequency / SAMPLING_RATE; // calculate sampling-incr
        *out++ = sin(data->phase);
        *out++ = sin(data->phase);
        data->phase += si; // add sampling-incr to phase
    }
    return paContinue;
}
void init_stuff()
{
    float frequency;
    int i;
    PaStreamParameters outputParameters;
    paTestData data;
    printf("type the modulator frequency in Hz: ");
    scanf("%f", &data.frequency); // get modulator frequency
    printf("you chose data.frequency %.2f\n",data.frequency);
    data.phase = 0.0;
    printf("initializing Portaudio. Please wait...\n");
    Pa_Initialize(); // initialize Portaudio
    outputParameters.device = Pa_GetDefaultOutputDevice(); /* default output device */
    outputParameters.channelCount = 2; /* stereo output */
    outputParameters.sampleFormat = paFloat32; /* 32 bit floating point output */
    outputParameters.suggestedLatency = Pa_GetDeviceInfo( outputParameters.device )->defaultLowOutputLatency;
    outputParameters.hostApiSpecificStreamInfo = NULL;
    Pa_OpenStream( // open paStream object
        &audioStream, // portaudio stream object
        NULL, // input params
        &outputParameters, // output params
        SAMPLING_RATE, // SampleRate
        FRAME_BLOCK_LEN, // frames per buffer
        paNoFlag, // set no Flag
        audio_callback, // callback function address
        &data ); // user data
    Pa_StartStream(audioStream); // start the callback mechanism
    printf("running... press space bar and enter to exit\n");
}
void terminate_stuff()
{
    Pa_StopStream(audioStream); // stop callback mechanism
    Pa_CloseStream(audioStream); // destroy audio stream object
    Pa_Terminate(); // terminate portaudio
}

int main(void)
{
    init_stuff();
    while(getchar() != ' ') Pa_Sleep(100);
    terminate_stuff();
    return 0;
}
Uncommenting data->frequency = 400; at least plays a 400 Hz sine wave, but that ignores any user input entered in init_stuff().
If I put a printf("%f\n",data->frequency); inside the callback, it prints 0.000000 or something like -146730090609497866240.000000.
It's pretty unpredictable, and this really makes me think it's pointer related.
My goal for this code is to eventually incorporate envelope generators to change the pitch and possibly incorporate wavetable oscillators so I'm not calculating sin(x) for every iteration.
I can get envelopes and wavetables to work while using a blocking API like portsf that's used in the book, but trying to adapt any of that code from earlier chapters to use PortAudio callbacks is turning my brain to mush.
Thanks so much!
The problem with your callback data is that it is a local variable of init_stuff: it goes out of scope, and its memory is released, as soon as init_stuff returns, while the callback keeps using the pointer.
You should allocate memory for your callback data using malloc (or new in C++) and pass that pointer to Pa_OpenStream as the user data.
For example:
void init_stuff()
{
    float frequency;
    int i;
    PaStreamParameters outputParameters;
    paTestData *data = (paTestData *) malloc(sizeof(paTestData));
    printf("type the modulator frequency in Hz: ");
    scanf("%f", &(data->frequency)); // get modulator frequency
    printf("you chose data.frequency %.2f\n",data->frequency);
    data->phase = 0.0;
    ...
    Pa_OpenStream( // open paStream object
        &audioStream, // portaudio stream object
        NULL, // input params
        &outputParameters, // output params
        SAMPLING_RATE, // SampleRate
        FRAME_BLOCK_LEN, // frames per buffer
        paNoFlag, // set no Flag
        audio_callback, // callback function address
        data );
    ...
I wasn't able to get the original code working using malloc but based on both suggestions, I realized another workable solution. Because running init_stuff() caused my data to get deallocated, I'm for now just making all my assignments and calls to Pa_OpenStream() from main.
Works beautifully and I can now send whatever data I want to the callback. Thanks for the help!
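The lifetime issue discussed in this thread can be reproduced without PortAudio. In the sketch below, open_stream and run_callback are hypothetical stand-ins for an audio library that, like Pa_OpenStream, only stores the user-data pointer and hands it back to the callback later; the point is that the pointed-to storage must outlive the function that registered it.

```c
#include <stdlib.h>

typedef struct {
    float frequency;
    float phase;
} paTestData;

/* Hypothetical stand-in for an audio library: it keeps the callback and the
   user-data pointer, and calls the callback later from somewhere else. */
typedef float (*callback_fn)(void *userData);
static callback_fn g_callback;
static void *g_user_data;

static void open_stream(callback_fn cb, void *userData)
{
    g_callback = cb;
    g_user_data = userData; /* only the pointer is stored, not a copy */
}

static float run_callback(void)
{
    return g_callback(g_user_data);
}

/* Callback: reads the frequency through the stored pointer. */
static float read_frequency(void *userData)
{
    return ((paTestData *)userData)->frequency;
}

/* The data lives on the heap, so it remains valid after init_stuff returns.
   The caller frees it once the stream has been closed. */
static paTestData *init_stuff(float frequency)
{
    paTestData *data = malloc(sizeof *data);
    if (data == NULL)
        return NULL;
    data->frequency = frequency;
    data->phase = 0.0f;
    open_stream(read_frequency, data);
    return data;
}
```

Declaring the struct with static storage duration, or keeping it in main as the original poster ended up doing, satisfies the same requirement: the object must still exist every time the callback runs.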

Writing multichannel audio for MATLAB with libsndfile

I am trying to use libsndfile to write a multichannel wav that can be read by MATLAB 2010+.
The following code writes a 4-channel interleaved WAV. All samples on channel 1 should be 0.1, on channel 2 they should be 0.2, and so on for the remaining channels.
Each channel is 44100 samples in length.
I drag the wave file onto the MATLAB workspace and unfortunately MATLAB keeps returning "File contains uninterpretable data".
It may also be worth noting that when all samples are set to 0.0, MATLAB successfully reads the file, although very slowly.
I have successfully used libsndfile to read multichannel data written by MATLAB's wavwrite.m, so I believe the library is set up correctly.
Audacity can read the resulting file from the code below.
VS 2012 64 bit compiler,
Win7 64bit, MATLAB 2015a
ref: the code has been adapted from http://www.labbookpages.co.uk/audio/wavFiles.html
Any suggestions? I presume I'm making a simple error here.
Thanks
#include <sndfile.h>
#include <stdio.h>
#include <stdlib.h>

int main()
{
    // Create interleaved audio data
    int numFrames_out = 44100;
    int channels = 4;
    float *int_y;
    int_y = (float*)malloc(channels*numFrames_out*sizeof(float));
    long q=0;
    for (long i = 0; i<numFrames_out; i++)
    {
        for (int j = 0; j<channels; j++)
        {
            int_y[q+j] = ((float)(j+1))/10.0;
        }
        q+=channels;
    }
    // Set multichannel file settings
    SF_INFO info;
    info.format = SF_FORMAT_WAV | SF_FORMAT_PCM_32;
    info.channels = channels;
    info.samplerate = 44100;
    // Open sound file for writing
    char out_filename[] = "out_audio.wav";
    SNDFILE *sndFile = sf_open(out_filename, SFM_WRITE, &info);
    if (sndFile == NULL)
    {
        fprintf(stderr, "Error opening sound file '%s': %s\n", out_filename, sf_strerror(sndFile));
        return -1;
    }
    // Write frames
    long writtenFrames = sf_writef_float(sndFile, int_y, numFrames_out);
    // Check correct number of frames saved
    if (writtenFrames != numFrames_out) {
        fprintf(stderr, "Did not write enough frames for source\n");
        sf_close(sndFile);
        free(int_y);
        return -1;
    }
    sf_close (sndFile);
    // Release the sample buffer on the success path as well
    free(int_y);
    return 0;
}
It looks like you are only closing the output file (using sf_close()) in the error case. The output file will not be a well formed WAV file unless you call sf_close() at the end of your program.
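For reference, the interleaved layout that the question's fill loop relies on can be written as a single index formula: the sample for a given frame and channel sits at frame * channels + channel. The sketch below uses hypothetical helper names (interleaved_index, fill_constant_channels) and does not depend on libsndfile.

```c
#include <stddef.h>

/* Position of (frame, channel) inside an interleaved buffer. */
static size_t interleaved_index(size_t frame, size_t channel, size_t channels)
{
    return frame * channels + channel;
}

/* Fill an interleaved buffer so that channel 1 holds 0.1, channel 2 holds
   0.2, and so on, as in the question. buf must hold frames * channels floats. */
static void fill_constant_channels(float *buf, size_t frames, size_t channels)
{
    for (size_t frame = 0; frame < frames; frame++) {
        for (size_t ch = 0; ch < channels; ch++) {
            buf[interleaved_index(frame, ch, channels)] = (float)(ch + 1) / 10.0f;
        }
    }
}
```

A frame here means one sample for every channel, which is why sf_writef_float takes a frame count rather than a count of individual float values.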

Audio samplerate converter using libsndfile and libsamplerate. Not sure if using function src_simple correctly

I have been building a simple sample rate converter in C using libsndfile and libsamplerate. I just can't seem to get the src_simple function of libsamplerate to work, whatever I try. I have stripped my code back to be as simple as possible, and it now just outputs a silent audio file with an identical sampling rate:
#include <stdio.h>
#include <sndfile.h>
#include <samplerate.h>

#define BUFFER_LEN 1024
#define MAX_CHANNELS 6

int main ()
{
    static double datain [BUFFER_LEN];
    static double dataout [BUFFER_LEN];
    SNDFILE *infile, *outfile;
    SF_INFO sfinfo, sfinfo2 ;
    int readcount ;
    const char *infilename = "C:/Users/Oli/Desktop/MARTYTHM.wav" ;
    const char *outfilename = "C:/Users/Oli/Desktop/Done.wav" ;
    SRC_DATA src_data;
    infile = sf_open (infilename, SFM_READ, &sfinfo);
    outfile = sf_open (outfilename, SFM_WRITE, &sfinfo);
    src_data.data_in = datain;
    src_data.input_frames = BUFFER_LEN;
    src_data.data_out = dataout;
    src_data.output_frames = BUFFER_LEN;
    src_data.src_ratio = 0.5;
    src_simple (&src_data, SRC_SINC_BEST_QUALITY, 1);
    while ((readcount = sf_read_double (infile, datain, BUFFER_LEN)))
    {
        src_simple (&src_data, SRC_SINC_BEST_QUALITY, 1);
        sf_write_double (outfile, dataout, readcount) ;
    };
    sf_close (infile);
    sf_close (outfile);
    sf_open ("C:/Users/Oli/Desktop/Done.wav", SFM_READ, &sfinfo2);
    printf("%d", sfinfo2.samplerate);
    return 0;
}
It's really starting to stress me out. The program is a uni project and is due very soon, it is making me very anxious as whatever I try seems to result in failure. Can anyone please help me?
I'm not an expert on this particular library, but just from looking at the online documentation I see a few problems with your code:
src_simple apparently works with floats, yet your buffers are doubles. I think you need to change the buffers to float and use sf_read_float/sf_write_float for I/O.
src_simple is the "simple" interface and is intended to be applied to an entire waveform in one call, not in chunks as you are doing (see http://www.mega-nerd.com/SRC/faq.html#Q004). You should first get the input file size, then allocate sufficient memory, read in the whole file, convert it in one go, and then write the converted output data to your output file.
When changing the sample rate you will get a different number of samples in the output file than in the input file (around half as many in this case), yet you're writing the same number of samples that you read (readcount). You should probably be using src_data.output_frames_gen as the number of frames to write, not readcount.
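The last point, that the number of frames produced differs from the number read, can be seen with a toy converter. The sketch below is a plain linear-interpolation stand-in for mono float data; it is not libsamplerate's sinc algorithm, and resample_linear is a hypothetical name. Its return value plays the role that src_data.output_frames_gen plays after a call to src_simple: it is the count that should be written out.

```c
#include <stddef.h>

/* Linear-interpolation stand-in for a sample rate converter (mono only).
   ratio = output rate / input rate, as with src_ratio.
   Returns the number of output frames actually generated. */
static long resample_linear(const float *in, long in_frames,
                            float *out, long out_capacity, double ratio)
{
    long generated = 0;
    if (in_frames <= 0 || ratio <= 0.0)
        return 0;
    while (generated < out_capacity) {
        double pos = (double)generated / ratio; /* position in input frames */
        long i = (long)pos;
        if (i >= in_frames)
            break; /* ran past the end of the input */
        double frac = pos - (double)i;
        /* Hold the last sample when there is no right-hand neighbour. */
        float next = (i + 1 < in_frames) ? in[i + 1] : in[i];
        out[generated++] = (float)(in[i] + frac * (next - in[i]));
    }
    return generated;
}
```

With a ratio of 0.5, eight input frames yield four output frames, so writing readcount frames (eight) would emit four frames of stale buffer contents after the converted data.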
