How to multiplex Vorbis and Theora streams using libogg - C

I am currently writing a simple Theora video encoder that uses libogg, libvorbis and libtheora. I can submit frames to the Theora encoder and PCM samples to the Vorbis encoder, pass the resulting packets to Ogg streams (one for Theora and one for Vorbis), and get pages out.
When the program starts, it flushes the headers first from the Theora encoder, then from the Vorbis encoder to the output file (obviously, both streams have unique serial numbers). Then, I write interleaved pages to the file from both of the streams.
When writing just the video or just the audio, I am able to play back the output in mplayer just fine; however, when I attempt to write both, I get the following:
Ogg demuxer error : we met an unknown stream
I'm guessing I'm doing the multiplexing wrong. I have read through the documentation for multiplexing streams on Xiph.org, and I can't see where I differ. I cannot seem to find any example code for doing this, short of going through the source of an open-source encoder (which I'm having some trouble understanding). Would anyone be able to explain how to multiplex streams correctly using libogg? I'm trying to do this in C on Ubuntu 10.04, using the libraries from the Ubuntu repository.
Many thanks in advance!
Tom

Ok, for anyone who was reading this, I have to some extent solved it.
You should not flush all of the header packets from each stream - just the first header packet, which for Vorbis and Theora gets its own page by default. Put the other header packets into their respective streams, but do not flush them until the first pages from all streams have been written to the file.
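In case it's useful, here is a rough sketch of that ordering using the libogg / libtheora / libvorbis calls (encoder setup and error handling are omitted, and the serial numbers are placeholders - they just have to be unique):

#include <ogg/ogg.h>
#include <theora/theoraenc.h>
#include <vorbis/vorbisenc.h>
#include <stdio.h>

static void write_page(FILE *out, ogg_page *og)
{
    fwrite(og->header, 1, og->header_len, out);
    fwrite(og->body, 1, og->body_len, out);
}

void write_headers(FILE *out, th_enc_ctx *td, th_comment *tc,
                   vorbis_dsp_state *vd, vorbis_comment *vc)
{
    ogg_stream_state to, vo;
    ogg_page og;
    ogg_packet op, op_comm, op_code;

    ogg_stream_init(&to, 1);    /* one stream per codec, unique serials */
    ogg_stream_init(&vo, 2);

    /* 1. The first header packet of each stream gets its own page,
     *    and those pages come first in the file. */
    th_encode_flushheader(td, tc, &op);      /* Theora info header */
    ogg_stream_packetin(&to, &op);
    while (ogg_stream_flush(&to, &og) > 0)
        write_page(out, &og);

    vorbis_analysis_headerout(vd, vc, &op, &op_comm, &op_code);
    ogg_stream_packetin(&vo, &op);           /* Vorbis info header */
    while (ogg_stream_flush(&vo, &og) > 0)
        write_page(out, &og);

    /* 2. The remaining header packets follow, flushed before any
     *    data pages are written. */
    while (th_encode_flushheader(td, tc, &op) > 0)
        ogg_stream_packetin(&to, &op);
    ogg_stream_packetin(&vo, &op_comm);
    ogg_stream_packetin(&vo, &op_code);
    while (ogg_stream_flush(&to, &og) > 0)
        write_page(out, &og);
    while (ogg_stream_flush(&vo, &og) > 0)
        write_page(out, &og);
}

After this, audio and video packets go into their streams as normal and interleaved pages come out via ogg_stream_pageout().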
Once you have done this, try to keep the streams as closely synced as possible (mplayer gave me errors when they drifted too far apart). At 24 fps video and 44.1 kHz audio, one video frame should span 44100 / 24 = 1837.5 audio samples (with 16-bit stereo PCM, that is 7,350 bytes).
If anyone else has any tips / info, it would be good to hear - I've never done anything with audio / video before!
Thanks!
Tom

Related

Convert a stereo wav to mono in C

I have developed a Synchronous Audio Interface (SAI) driver for a proprietary Real-Time Operating System (RTOS) in C. The driver is configured to output left- and right-channel data (I2S) to the amplifier. But since the attached amplifier is mono, only the left or the right channel's audio data reaches the speaker. I have a stereo PCM 16-bit audio file, and I want to mix the left and right channel audio data in my application and send the result to either channel of the SAI driver. That way I can play the combined stereo audio as mono on the speaker attached to the mono amplifier.
Can anyone suggest the best possible way to do this?
As said in a comment, the usual way to mix two stereo channels into a mono one is to halve each channel's sample and add them.
Example in C:
int left_channel_sample, right_channel_sample;
int mono_channel = (left_channel_sample / 2) + (right_channel_sample / 2);
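If you are processing whole buffers of interleaved 16-bit PCM, a sketch of the same idea might look like this (the function name and layout are just illustrative; halving before adding keeps the sum within the 16-bit range):

#include <stdint.h>
#include <stddef.h>

/* Mix interleaved 16-bit stereo PCM (L R L R ...) down to mono. */
void stereo_to_mono(const int16_t *stereo, int16_t *mono, size_t frames)
{
    for (size_t i = 0; i < frames; i++)
        mono[i] = (int16_t)(stereo[2 * i] / 2 + stereo[2 * i + 1] / 2);
}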
You mentioned a driver you coded; modify it or add the feature there. It's hard to help more without more detail in the question...

FFmpeg: what does av_parser_parse2 do?

When sending H.264 data for frame decoding, it seems like a common approach is to first call av_parser_parse2 from the libav library on the raw data.
I looked for documentation but couldn't find anything beyond some example code. Does it group packets of data so that the resulting data starts with NAL headers and can be treated as a frame?
The following is a link to a sample code that uses av_parser_parse2:
https://github.com/DJI-Mobile-SDK-Tutorials/Android-VideoStreamDecodingSample/blob/master/android-videostreamdecodingsample/jni/dji_video_jni.c
I would appreciate if anyone could explain those library details to me or link me resources for better understanding.
Thank you.
It is like you guessed: for H.264, av_parser_parse2() consumes input data, looks for NAL start codes (0x000001), checks the NAL unit type looking for frame starts, and outputs the input data with a different framing.
That is, it consumes the input data, ignores its framing by putting all consecutive data into a big buffer, and then restores the framing from the H.264 byte stream alone, which is possible because of the start codes and the NAL unit types. It does not increase or decrease the amount of data given to it: if you get 30k out, you put 30k in. But maybe you put it in in little pieces of around 1500 bytes, the payload of the network packets you received.
Btw, when the function declaration is not documented well, it is a good idea to look at the implementation.
Merely recovering the framing would not be involved enough to call it parsing, but the H.264 parser in FFmpeg also gathers further information from the H.264 stream, e.g. whether it is interlaced, so it really deserves its name.
It does not, however, decode the image data of the H.264 stream.
DJI's video transmission does not guarantee the data in each packet belongs to a single video frame. Mostly a packet contains only part of the data needed for a single frame. It also does not guarantee that a packet contains data from one frame and not two consecutive frames.
Android's MediaCodec needs to be queued with buffers, each holding the full data for a single frame.
This is where av_parser_parse2() comes in: it gathers packets until it has enough data for a full frame, which is then sent to MediaCodec for decoding.
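To make that concrete, a typical parse-and-decode loop looks roughly like this (modern FFmpeg API names, error handling trimmed; assume parser came from av_parser_init(AV_CODEC_ID_H264) and pkt from av_packet_alloc()):

#include <libavcodec/avcodec.h>

void parse_and_decode(AVCodecContext *ctx, AVCodecParserContext *parser,
                      AVPacket *pkt, const uint8_t *data, int size)
{
    while (size > 0) {
        /* Feed raw bytes in; pkt->data / pkt->size are only set once the
         * parser has reassembled a complete frame. */
        int used = av_parser_parse2(parser, ctx, &pkt->data, &pkt->size,
                                    data, size,
                                    AV_NOPTS_VALUE, AV_NOPTS_VALUE, 0);
        data += used;                        /* bytes of input consumed */
        size -= used;
        if (pkt->size > 0)                   /* one whole frame assembled */
            avcodec_send_packet(ctx, pkt);   /* hand it to the decoder */
    }
}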

Is it possible to force ffmpeg/libav decoder to output an H.264 frame with only current information?

I'm currently using a slightly legacy version of ffmpeg/libav to decode H.264 frames.
It decodes them with a call to:
avcodec_decode_video2(context, &outPicture, &gotPicture, inNALPacket);
For this I provide a series of NAL packets, and once it has 'enough' it produces the image frame, as outPicture.
So far, so good.
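For context, the surrounding logic is roughly this (simplified; allocations omitted):

AVFrame outPicture;       /* really allocated/initialised elsewhere */
int gotPicture = 0;

/* called for each NAL packet received from the network */
avcodec_decode_video2(context, &outPicture, &gotPicture, inNALPacket);
if (gotPicture) {
    /* a complete decoded frame is now available in outPicture */
}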
However, sometimes (due to network issues) a packet/NAL goes missing.
I can detect this.
When this happens I would like to give up on this frame, and tell the decoder to just give me its best shot at the image, given the data so far.
Is there any way of doing this? E.g. can I construct an inNALPacket that essentially tells the decoder to give up and move on?

Capturing sound by ALSA

I am trying to capture the sound from the sound card with ALSA on a Linux system. The data is read as PCM samples. I need to find the right way to capture the audio, save it to a file, and play it back to check whether the received data is correct.
To capture audio to a file with ALSA, you can use arecord; it simply captures input audio to a file. Or you can write your own application that reads the PCM data itself, using the snd_pcm_readi API.
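For example, a minimal capture sketch using snd_pcm_readi might look like this (the device name, format and rate are assumptions - adjust them for your hardware; the raw result can be checked with something like aplay -t raw -f cd capture.raw):

#include <alsa/asoundlib.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    snd_pcm_t *pcm;
    int16_t buf[2 * 441];            /* ~10 ms of 16-bit stereo at 44.1 kHz */
    FILE *f = fopen("capture.raw", "wb");

    if (snd_pcm_open(&pcm, "default", SND_PCM_STREAM_CAPTURE, 0) < 0)
        return 1;
    /* interleaved S16_LE, 2 channels, 44.1 kHz, resampling allowed,
     * 0.5 s maximum latency */
    if (snd_pcm_set_params(pcm, SND_PCM_FORMAT_S16_LE,
                           SND_PCM_ACCESS_RW_INTERLEAVED,
                           2, 44100, 1, 500000) < 0)
        return 1;

    for (int i = 0; i < 1000; i++) { /* about 10 seconds of audio */
        snd_pcm_sframes_t n = snd_pcm_readi(pcm, buf, 441);
        if (n < 0)
            snd_pcm_recover(pcm, (int)n, 0);  /* try to recover from overruns */
        else
            fwrite(buf, 2 * sizeof(int16_t), (size_t)n, f);
    }

    fclose(f);
    snd_pcm_close(pcm);
    return 0;
}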

C Linux Device Programming - Reading Straight from /Dev

I have been playing with creating sounds using mathematical wave functions in C. The next step in my project is getting user input from a MIDI keyboard controller in order to modulate the waves to different pitches.
My first notion was that this would be relatively simple and that Linux, being Linux, would allow me to read the raw data stream from my device like I would any other file.
However, research overwhelmingly advises that I write a device driver for the MIDI controller. The general idea is that even though the device file may be present, the kernel will not know what system calls to execute when my application calls functions like read() and write().
Despite these warnings, I did an experiment. I plugged in the MIDI controller and cat'ed the "/dev/midi1" device file. A steady stream of null characters appeared, and when I pressed a key on the MIDI controller several bytes appeared corresponding to the expected Message Chunks that a MIDI device should output. MIDI Protocol Info
So my questions are:
Why does the cat'ed stream behave this way?
Does this mean that there is a plug and play device driver already installed on my system?
Should I still go ahead and write a device driver, or can I get away with reading it like a file?
Thank you in advance for sharing your wisdom in these areas.
Why does the cat'ed stream behave this way?
Because that is presumably the raw MIDI data being received from the controller. The null bytes are probably some sort of sync tick.
Does this mean that there is a plug and play device driver already installed on my system?
Yes.
However, research overwhelmingly advises that I write a device driver for the MIDI controller. The general idea is that even though the device file may be present, the kernel will not know what system calls to execute when my application calls functions like read() and write().
<...>
Should I still go ahead and write a device driver, or can I get away with reading it like a file?
I'm not sure what you're reading or how you're coming to this conclusion, but it's wrong. :) You've already got a perfectly good driver installed for your MIDI controller -- go ahead and use it!
Are you sure you are reading NUL bytes and not 0xf8 bytes? 0xf8 is the MIDI timing-clock status byte, which is usually sent periodically to keep the instruments in sync. Try reading the device using od:
od -vtx1 /dev/midi1
If you're seeing a bunch of 0xf8, it's okay. If you don't need the tempo information sent by your MIDI controller, either disable it on your controller or ignore those 0xf8 status bytes.
Also, for MIDI, keep in mind that the current MIDI status byte is usually sent once (to save on bytes) and then the payload bytes follow for as long as needed. For example, the pitch-bend status is byte 0xeK (where K is the channel number, i.e. 0 to 15) and its payload is two data bytes: the 7 least significant bits of the value, followed by its 7 most significant bits. So you may have a weird controller and be seeing only repeated payloads of some status, but any controller that's not stupid won't repeat what it doesn't need to.
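For example, decoding such a pitch-bend message is just a little bit-twiddling (the byte values below are only an example):

#include <stdio.h>

int main(void)
{
    /* Example pitch-bend message: status 0xe0 (channel 0), then the two
     * 7-bit data bytes, least significant first. */
    unsigned char status = 0xe0, lsb = 0x00, msb = 0x40;
    int channel = status & 0x0f;
    int bend = (msb << 7) | lsb;    /* range 0..16383; 8192 = center */
    printf("channel %d, bend %d\n", channel, bend);
    return 0;
}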
Now for the driver: have a look at dmesg when you plug in your MIDI controller. If the OSS device /dev/midi1 appears when you plug in your device (udev does this job) and dmesg doesn't show any errors, you don't need anything else. The MIDI protocol is yet another serial protocol with a fixed baud rate that transmits/receives bytes; there's nothing complicated about that... just read from or write to the device and you're done.
The only issue is that queuing somewhere along the way could result in bad audio latency (if you're using the MIDI commands to control live audio, which I believe is what you're doing). The OSS raw devices seem mostly intended for system-exclusive messages - for example, downloading some patch/preset for a synthesizer online and uploading it to the device using MIDI. Latency doesn't really matter in that situation.
Also have a look at the ALSA way of playing with MIDI on Linux.
If you are not developing new MIDI controller hardware, you shouldn't worry about writing a driver for it. Installing the hardware is the user's concern, and supplying the drivers is the vendor's obligation.
Under Linux, you just read the file. The real work is interpreting the data and doing something useful with it.
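For instance, a minimal sketch of that "just read the file" approach, skipping the real-time ticks discussed above:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/dev/midi1", O_RDONLY);   /* the device from the question */
    if (fd < 0) { perror("open"); return 1; }

    unsigned char byte;
    while (read(fd, &byte, 1) == 1) {
        if (byte == 0xf8 || byte == 0xfe)
            continue;                  /* timing clock / active sensing */
        printf("%02x ", byte);         /* e.g. 90 3c 64 = note-on, middle C */
        fflush(stdout);
    }
    close(fd);
    return 0;
}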
