FFmpeg AVFrame Audio Data Modification (C)

I'm trying to figure out how FFmpeg saves data in an AVFrame after the audio has been decoded.
Basically, if I print the data in the AVFrame->data[] array I get a number of unsigned 8 bit integers that is the audio in raw format.
From what I understand from the FFmpeg Doxygen, the data format is expressed by the enum AVSampleFormat, and there are two main categories: interleaved (packed) and planar. In an interleaved format, samples for all channels are kept together in the first row of the AVFrame->data array, with AVFrame->linesize[0] giving the size of that buffer. In a planar format, each channel of the audio is kept in a separate row of the AVFrame->data array, and each plane again has size AVFrame->linesize[0].
Is there a guide/tutorial that explains what do the numbers in the array mean for each of the formats?

Values in each of the data arrays (planes) are actual audio samples, interpreted according to the specified format. E.g. if the format is AV_SAMPLE_FMT_S16P, the data arrays are really arrays of int16_t PCM samples. If you are dealing with a mono signal, only data[0] is valid; if it is stereo, data[0] and data[1] are valid, and so on.
I'm not sure there is a guide that explains each particular case, but the approach described above is quite simple and easy to understand. Just play with it a bit and things should become clear.
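For illustration, here is a minimal sketch (not from the original answer) that walks a decoded planar frame, assuming AV_SAMPLE_FMT_S16P; the channel-count field differs across FFmpeg versions, so treat that access as an assumption:
#include <stdio.h>
#include <stdint.h>
#include <libavutil/frame.h>
/* Minimal sketch: print every sample of a decoded AVFrame, assuming
 * the planar format AV_SAMPLE_FMT_S16P (one int16_t plane per channel).
 * For an interleaved format such as AV_SAMPLE_FMT_S16, all channels
 * would instead sit in data[0], alternating sample by sample. */
void print_s16p_frame(const AVFrame *frame)
{
    int ch, i;
    int channels = frame->ch_layout.nb_channels; /* frame->channels on older FFmpeg */
    for (ch = 0; ch < channels; ch++) {
        const int16_t *plane = (const int16_t *)frame->data[ch];
        for (i = 0; i < frame->nb_samples; i++)
            printf("channel %d, sample %d: %d\n", ch, i, plane[i]);
    }
}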


About the wav data sub-chunk

I am working on a project in which I have to merge two 8-bit .wav files using C, and I still have no clue how to do it.
I have read about wav files and I want to start by reading one of the files.
There's one thing I didn't understand:
Let's say I have an 8-bit WAV audio file, and I was able to read (even though I am still trying to) the data that starts after byte 44. I will logically get numbers between 0 and 255.
My question is:
What do those numbers mean?
If I get 255 or 0, what do they mean?
Are they samples from the wave?
Can anyone please explain?
Thanks in advance
Assuming we're not dealing with file format issues, getting values between 0 and 255 means that the audio samples are of unsigned eight-bit format, as you have put it.
One way of merging data would consist of reading data from files into buffers, arrays a and b and summing them value by value: c[i] = a[i] + b[i]. By doing so, you'd have to take care of the following:
the lengths of the files may not be equal
summing unsigned 8-bit buffers such as yours will almost certainly overflow
This is usually done with a for loop. You first get the sizes of the data chunks. Write the loop so that it neither reads past either array's boundary nor ignores data that can still be read. To prevent overflow you can either:
divide the values by two on reading,
or
read (convert) into a format which wouldn't overflow, then normalize the merged data and convert it back into the original format, or whichever format is desired.
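As a minimal sketch of that idea (names and the averaging policy are my illustration, not from the original answer): widen each pair of samples to int, combine them, and narrow back:
#include <stddef.h>
#include <stdint.h>
/* Mix two unsigned 8-bit PCM buffers of possibly different lengths,
 * assuming na >= nb; where both inputs exist the samples are averaged,
 * which cannot overflow the 0-255 range. */
void mix_u8(const uint8_t *a, size_t na, const uint8_t *b, size_t nb,
            uint8_t *out)
{
    size_t i;
    for (i = 0; i < na; i++) {
        int sum = a[i];                 /* widen before arithmetic */
        if (i < nb)
            sum = (a[i] + b[i]) / 2;    /* average instead of raw sum */
        out[i] = (uint8_t)sum;
    }
}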
For all particulars of reading from and writing to a .wav format file you may use some of the existing audio file libraries, or write your own routine. Dealing with audio file format is not a trivial thing, though. Here's a reference on .wav format.
Here are a few audio file APIs worth looking at:
libsndfile
sndlib
Hope this can help.
See any good guide to WAVE for information on the format of samples in the data chunk, such as this one I found: http://www.neurophys.wisc.edu/auditory/riff-format.txt
Relevant excerpts:
In a single-channel WAVE file, samples are stored consecutively. For stereo WAVE files, channel 0 represents the left channel, and channel 1 represents the right channel. The speaker position mapping for more than two channels is currently undefined. In multiple-channel WAVE files, samples are interleaved.
Data Format of the Samples
Each sample is contained in an integer i. The size of i is the smallest number of bytes required to contain the specified sample size. The least significant byte is stored first. The bits that represent the sample amplitude are stored in the most significant bits of i, and the remaining bits are set to zero.
For example, if the sample size (recorded in nBitsPerSample) is 12 bits, then each sample is stored in a two-byte integer. The least significant four bits of the first (least significant) byte are set to zero.
The data format and maximum and minimum values for PCM waveform samples of various sizes are as follows:
Sample Size        Data Format       Maximum Value                 Minimum Value
One to eight bits  Unsigned integer  255 (0xFF)                    0
Nine or more bits  Signed integer i  Largest positive value of i   Most negative value of i
N.B.: Even if the file has >8 bits of audio resolution, you should read the file as an array of unsigned char and reconstitute the larger samples manually as per the above spec. Don't try to do anything like reading the samples directly over an array of native C ints, as their layout and size is platform-dependent and therefore should not be relied upon in any code.
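A sketch of that reconstitution for 16-bit little-endian samples (buffer names are illustrative):
#include <stddef.h>
#include <stdint.h>
/* Rebuild little-endian 16-bit samples from a raw byte array,
 * least significant byte first, as the excerpt above specifies. */
void bytes_to_s16(const unsigned char *buf, size_t nbytes, int16_t *out)
{
    size_t i;
    for (i = 0; i + 1 < nbytes; i += 2)
        out[i / 2] = (int16_t)(uint16_t)(buf[i] | (buf[i + 1] << 8));
}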
Note also that the header is not guaranteed to be 44 bytes long: How can I detect whether a WAV file has a 44 or 46-byte header? You need to read the length fields and process the header based on them, not on any assumption.

Get raw samples in array from iPhone library song

This question has been asked many times; however, I have not found a sufficient answer. I am trying to extract PCM data from a song in the user's iPod library and then put it into an array so that I can run an FFT on it. I can grab any song off the iPod with the MediaPlayer framework and obtain its URL. I have converted it into an AVURLAsset and experimented with many AVFoundation tools with no luck.
MPMediaItem *currentItem = [appMusicPlayer nowPlayingItem];
NSURL * assetURL_ = [currentItem valueForProperty: MPMediaItemPropertyAssetURL];
AVURLAsset *songAsset = [AVURLAsset URLAssetWithURL:assetURL_ options:nil];
From this point I need to put its data into an array from which I can run the FFT on the song.
Thanks for any help!
You can use the AVAssetReader and AVAssetWriter APIs to read in a compressed file from a URL and write out raw samples or a WAV/RIFF file, which is just raw samples with a small header of known size (usually 44 bytes) that can be skipped before analysis. Use different audio format settings for the reader and the writer if a conversion is needed.
Once you have the file as raw samples, the app can just read it in as NSData to get at the raw bytes (a 16-bit audio sample is 2 bytes, which you can get at by casting the C pointer to the bytes).
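As a sketch of that last step (plain C; the buffer would come from the NSData object's bytes, and all names here are illustrative), converting the 16-bit samples to floats ready for an FFT:
#include <stddef.h>
#include <stdint.h>
/* Interpret a raw byte buffer as native-endian signed 16-bit samples
 * and scale them into [-1, 1) floats for FFT input. */
void samples_to_float(const void *bytes, size_t len, float *out)
{
    const int16_t *s = (const int16_t *)bytes;
    size_t i, n = len / sizeof(int16_t);
    for (i = 0; i < n; i++)
        out[i] = s[i] / 32768.0f;
}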

audio to 8-bit text sample conversion

I have an interesting question today.
I need to convert some pokemon audio files to a list of 8-bit samples (0-255 values). I am writing an assembly routine on the MC6800 chipset that will require these sounds to be played. I plan on including an array with the 8-bit samples that the program will loop through when a function is called.
Does anyone know a way to convert audio files (wav/mp3) into a list of comma-separated 8-bit sample values in text form? Or anything along those lines?
Thank you so much in advance!
You can use the command-line "sox" tool or the Audacity audio editor to convert the file to a raw, unsigned 8-bit mono audio file.
In Audacity 1.3 or higher, open the audio then select Export, choose "Wave, AIFF, and other uncompressed types" as the format, then click Options... - then choose "Other..." for the Format, "RAW" for the Header, and Signed 8-bit PCM as the encoding. (Sorry, unsigned isn't available.)
From the command line, try sox with -c 1 for 1 channel, -t raw for no header, -u for unsigned linear, and -1 for 1 byte per sample.
Then you can use a tool like "hexdump" to dump out the bytes of the file as numbers and paste them into your code.
If your copy of sox doesn't have that, you can still use it to generate raw (headerless) files and convert the raw bytes to comma-separated text yourself.
EDIT: sox has "Raw textual data" as one of its formats, from the web page. You can make it convert your sound files to unsigned 8-bit linear samples in a first pass and then probably get exactly the output you want using this option for output.
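If you end up doing the text conversion yourself, here is a minimal sketch (the file name comes from the command line; this is my illustration, not part of the original answer):
#include <stdio.h>
/* Dump a raw unsigned 8-bit audio file as comma-separated decimal
 * values, suitable for pasting into a source-code array. */
int main(int argc, char **argv)
{
    FILE *in;
    int c, first = 1;
    if (argc < 2 || !(in = fopen(argv[1], "rb")))
        return 1;
    while ((c = fgetc(in)) != EOF) {
        printf(first ? "%d" : ",%d", c);
        first = 0;
    }
    putchar('\n');
    fclose(in);
    return 0;
}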
For .wav it is a very simple process. You can find the .wav specification easily with a Google search. It comprises a header followed by the raw samples. Read the header first, then loop through all the samples. Usually they are 16-bit samples, so you will want to rescale them from the range -32768..32767 to your 0..255 range. I suggest simple scaling at first; if that's not successful, find the actual min and max amongst the samples and adjust your scale accordingly.
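A sketch of that simple scaling (my illustration): shift the signed range up to unsigned and keep the high byte:
#include <stdint.h>
/* Map a signed 16-bit sample (-32768..32767) onto 0..255. */
uint8_t s16_to_u8(int16_t s)
{
    return (uint8_t)((uint16_t)(s + 32768) >> 8);
}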
Well, a lot depends on your audio format. The wave format, for example, consists of uncompressed interleaved PCM data.
i.e. for an 8-bit stereo file the samples are arranged as follows:
[Left Sample 1][Right Sample 1][Left Sample 2][Right Sample 2]...[Left Sample n][Right Sample n]
i.e. each 8-bit stereo sample pair is stored in 2 bytes: 1 for the left channel and 1 for the right. This is the data format your sound hardware will most likely require.
A 16- or 24-bit audio file works the same way, but the left and right samples will be 2 or 3 bytes each, respectively.
Obviously a wave file has a load of extra information in it. It follows the RIFF format. You can find info on it and the "chunks" wave files use at places such as www.wotsit.org.
Decompressing an MP3 is more complicated. You are best off getting hold of a decoder and running it on the MP3-encoded audio. It will spit out PCM data as above on the other side.

reading 16-bit greyscale TIFF

I'm trying to read a 16-bit greyscale TIFF file (BitsPerSample=16) using a small C program to convert into an array of floating point numbers for further analysis. The pixel data are, according to the header information, in a single strip of 2048x2048 pixels. Encoding is little-endian.
With that header information, I was expecting to be able to read a single block of 2048x2048x2 bytes and interpret it as 2048x2048 2-byte integers. What I in fact get is a picture split into four quadrants of 1024x1024 pixels each, the lower two of which contain only zeros. Each of the top two quadrants looks like I expected the whole picture to look: http://users.aber.ac.uk/ruw/unlinked/15_inRT_0p457.png
If I read the same file into Gimp or ImageMagick, both tell me that they have to reduce to 8-bit (which doesn't help me - I need the full range), but the pixels turn up in the right places: http://users.aber.ac.uk/ruw/unlinked/15_inRT_0p457_gimp.png
This would suggest that my idea about how the data are arranged within the one strip is wrong. On the other hand, the file must be correctly formatted in terms of the header information as otherwise Gimp wouldn't get it right. Where am I going wrong?
Output from tiffdump:
15_inRT_0p457.tiff:
Magic: 0x4949 Version: 0x2a
Directory 0: offset 8 (0x8) next 0 (0)
ImageWidth (256) LONG (4) 1<2048>
ImageLength (257) LONG (4) 1<2048>
BitsPerSample (258) SHORT (3) 1<16>
Compression (259) SHORT (3) 1<1>
Photometric (262) SHORT (3) 1<1>
StripOffsets (273) LONG (4) 1<4096>
Orientation (274) SHORT (3) 1<1>
RowsPerStrip (278) LONG (4) 1<2048>
StripByteCounts (279) LONG (4) 1<8388608>
XResolution (282) RATIONAL (5) 1<126.582>
YResolution (283) RATIONAL (5) 1<126.582>
ResolutionUnit (296) SHORT (3) 1<3>
34710 (0x8796) LONG (4) 1<0>
(Tag 34710 is camera information; to make sure this doesn't somehow make any difference, I've zeroed the whole range from the end of the image file directory to the start of data at 0x1000, and that in fact doesn't make any difference.)
I've found the problem - it was in my C program...
I had allocated memory for an array of longs and used fread() to read in the data:
#define PPR 2048
#define BPP 2
long *pix;
pix = malloc(PPR * PPR * sizeof(long));
fread(pix, BPP, PPR * PPR, in);
But since the data come in 2-byte chunks (BPP = 2) while sizeof(long) is 4, fread() packs the data densely inside the allocated memory rather than packing them into long-sized parcels. Thus I ended up with two rows packed together into one and the second half of the picture empty.
I've changed it to loop over the number of pixels, read two bytes each time, and store them in the allocated memory instead:
int m, b1, b2;
for (m = 0; m < PPR * PPR; m++) {
    b1 = fgetc(in);
    b2 = fgetc(in);
    *(pix + m) = 256 * b1 + b2; /* combine two bytes into one sample */
}
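An equivalent fix (my suggestion, not part of the original answer) is to make the buffer's element size match the 2-byte on-disk samples, so that fread's packing and the array layout agree; byte order must still match the file's:
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
/* Read npixels 2-byte samples into a uint16_t array; assumes the
 * host's byte order matches the file's (swap bytes otherwise). */
uint16_t *read_strip(FILE *in, size_t npixels)
{
    uint16_t *pix = malloc(npixels * sizeof *pix);
    if (pix && fread(pix, sizeof *pix, npixels, in) != npixels) {
        free(pix);
        return NULL;
    }
    return pix;
}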
You understand that if StripOffsets is an array, it is an offset to an array of offsets, right? You might not be doing that dereference properly.
What's your platform? What are you trying to do? If you're willing to work in .NET on Windows, my company sells an image processing toolkit that includes a TIFF codec that works on pretty much anything you can throw at it and will return 16 bpp images. We also have many tools that operate natively on 16bpp images.

libsox to convert wav file sample rate etc.

EDIT: I want to use libsox to programmatically convert a wav file's sample rate, audio format, number of channels, etc.
In the libsox man page there are a bunch of functions I can use, but I'm clueless as hell about what to do. Can anyone give me a set of steps for how to do it?
Help?
Can anyone please explain this?
The function sox_write writes len samples from buf using the format handler specified by ft. Data in buf must be 32-bit signed samples and will be converted during the write process. The value of len is specified in total samples. If its value is not evenly divisible by the number of channels, undefined behavior will occur.
I'd recommend a combination of libsndfile and libsamplerate
http://www.mega-nerd.com/SRC
SRC provides a small set of converters to allow quality to be traded off against computation cost. The current best converter provides a signal-to-noise ratio of 145dB with -3dB passband extending from DC to 96% of the theoretical best bandwidth for a given pair of input and output sample rates.
http://www.mega-nerd.com/libsndfile/
Ability to read and write a large number of file formats.
A simple, elegant and easy to use Applications Programming Interface.
Usable on Unix, Win32, MacOS and others.
On the fly format conversion, including endian-ness swapping, type conversion and bitwidth scaling.
Optional normalisation when reading floating point data from files containing integer data.
Ability to open files in read/write mode.
The ability to write the file header without closing the file (only on files open for write or read/write).
Ability to query the library about all supported formats and retrieve text strings describing each format.
Well, I guess your question has something to do with that last sentence. If you have an interleaved buffer, the number of samples in it has to be divisible by the number of channels, because len/channels is the number of per-channel samples that will be written. For example, let's say you have L and R channels; your data will be laid out in the buffer like this:
[0] 1st sample - L
[1] 1st sample - R
[2] 2nd sample - L
[3] 2nd sample - R
...
[n-2] (n/2)-th sample - L
[n-1] (n/2)-th sample - R
Hope it helps.
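For the original libsox question, here is a rough sketch of how a conversion pipeline is typically assembled, modelled on the example programs shipped with the sox sources; the file names and the 8000 Hz target rate are illustrative, and details may differ between libsox versions:
#include <assert.h>
#include <stdlib.h>
#include <sox.h>

int main(void)
{
    sox_format_t *in, *out;
    sox_effects_chain_t *chain;
    sox_effect_t *e;
    sox_signalinfo_t interm_signal, out_signal;
    char *args[1];

    assert(sox_init() == SOX_SUCCESS);
    assert(in = sox_open_read("in.wav", NULL, NULL, NULL));

    /* Same signal parameters as the input, but at 8000 Hz. */
    out_signal = in->signal;
    out_signal.rate = 8000;
    assert(out = sox_open_write("out.wav", &out_signal, NULL, NULL, NULL, NULL));

    chain = sox_create_effects_chain(&in->encoding, &out->encoding);
    interm_signal = in->signal; /* effects may modify this copy */

    /* "input" and "output" are pseudo-effects that hook files into the chain. */
    e = sox_create_effect(sox_find_effect("input"));
    args[0] = (char *)in;
    assert(sox_effect_options(e, 1, args) == SOX_SUCCESS);
    assert(sox_add_effect(chain, e, &interm_signal, &in->signal) == SOX_SUCCESS);
    free(e);

    e = sox_create_effect(sox_find_effect("rate")); /* does the resampling */
    assert(sox_effect_options(e, 0, NULL) == SOX_SUCCESS);
    assert(sox_add_effect(chain, e, &interm_signal, &out->signal) == SOX_SUCCESS);
    free(e);

    e = sox_create_effect(sox_find_effect("output"));
    args[0] = (char *)out;
    assert(sox_effect_options(e, 1, args) == SOX_SUCCESS);
    assert(sox_add_effect(chain, e, &interm_signal, &out->signal) == SOX_SUCCESS);
    free(e);

    sox_flow_effects(chain, NULL, NULL); /* run the whole conversion */

    sox_delete_effects_chain(chain);
    sox_close(out);
    sox_close(in);
    sox_quit();
    return 0;
}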
