This question has been asked many times, but I have not found a sufficient answer. I am trying to extract PCM data from a song in the user's iPod library and put it into an array so that I can run an FFT on it. I can grab any song off the iPod with the MediaPlayer framework and obtain its URL. I have converted it into an AVURLAsset and experimented with many AVFoundation tools, with no luck.
// Grab the currently playing item from the iPod library
MPMediaItem *currentItem = [appMusicPlayer nowPlayingItem];
// Ask for its asset URL and wrap it in an AVURLAsset for AVFoundation
NSURL *assetURL_ = [currentItem valueForProperty:MPMediaItemPropertyAssetURL];
AVURLAsset *songAsset = [AVURLAsset URLAssetWithURL:assetURL_ options:nil];
From this point I need to get the song's sample data into an array on which I can run the FFT.
Thanks for any help!
You can use the AVAssetReader and AVAssetWriter APIs to read in a compressed file from a URL, and write out raw samples or a WAV/RIFF file, which is just raw samples with a small header of known size (usually 44 bytes) that can be skipped before analysis. You can configure different audio format settings for the reader and the writer if needed.
Once you have the file as raw samples, the app can just read it in as NSData to get at the raw bytes (a 16-bit audio sample is 2 bytes, which you can get at by casting the C pointer to the bytes).
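As a rough sketch in C of that last step (the same pointer-casting idea applies to NSData's bytes pointer; the function name is illustrative, and it assumes a canonical 44-byte header with 16-bit samples):
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

/* Load 16-bit PCM samples from a WAV with a canonical 44-byte header
 * into an int16_t array, ready to hand to an FFT routine. Real files
 * can carry extra header chunks, so treat this as an illustration. */
int16_t *read_pcm16(const char *path, size_t *count)
{
    FILE *f = fopen(path, "rb");
    if (!f) return NULL;

    fseek(f, 0, SEEK_END);
    long size = ftell(f);
    fseek(f, 44, SEEK_SET);           /* skip the 44-byte header */

    *count = (size_t)(size - 44) / 2; /* 2 bytes per 16-bit sample */
    int16_t *samples = malloc(*count * sizeof(int16_t));
    if (samples && fread(samples, sizeof(int16_t), *count, f) != *count) {
        free(samples);
        samples = NULL;
    }
    fclose(f);
    return samples;                   /* caller frees */
}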
I use the Azure Speech SDK to convert a set of text files to voice. All of the texts are converted successfully, and an ArrayBuffer instance is returned for each text. I then convert each of them to a Buffer and concatenate all of the buffers into one with Buffer.concat(). Then I pass the concatenated buffer to fs.writeFile() to create an mp3 file. However, only the first buffer seems to be included in the audio file instead of the concatenated buffer.
What should I do?
To provide a little background, audio files generally consist of some header data that contains information about the audio (e.g. the sampling rate, how many channels of audio, etc...), followed by the actual audio data. Generally speaking, you should only have one header per audio file.
If you simply concatenate the audio data together, the header from the first file will be read by your media player. Since this doesn't contain any information about the other audio files you've concatenated, you'll get some non-deterministic behaviour. It may play the audio from the first file only, it may give you an error, or it may try to play the header sections of the remaining audio files as audio data (which is not correct).
For things to work correctly, you'd need to update the header to reflect all of the audio data, and strip out the remaining headers. You'd also need to make sure that the byte alignment, formatting, etc... of all your audio data is consistent.
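To make the header problem concrete, here is a heavily simplified C sketch (file names are placeholders, and it assumes both inputs are canonical 44-byte-header WAVs with identical formats; your mp3 case would need a real audio library instead). It keeps the first header, strips the second, and patches the two size fields:
#include <stdio.h>
#include <stdint.h>

/* Write a 32-bit little-endian value, as used by RIFF size fields. */
static void write_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v; p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16); p[3] = (uint8_t)(v >> 24);
}

int main(void)
{
    /* Placeholder names: two same-format, canonical-header WAV inputs. */
    FILE *a = fopen("a.wav", "rb"), *b = fopen("b.wav", "rb");
    FILE *out = fopen("combined.wav", "wb");
    if (!a || !b || !out) return 1;

    uint8_t header[44], buf[4096];
    size_t n;
    uint32_t dataLen = 0;

    fread(header, 1, 44, a);   /* keep the first file's header */
    fseek(b, 44, SEEK_SET);    /* strip the second file's header */

    fseek(out, 44, SEEK_SET);  /* leave room to rewrite the header */
    while ((n = fread(buf, 1, sizeof buf, a)) > 0) { fwrite(buf, 1, n, out); dataLen += n; }
    while ((n = fread(buf, 1, sizeof buf, b)) > 0) { fwrite(buf, 1, n, out); dataLen += n; }

    /* Patch the RIFF chunk size (offset 4) and data chunk size (offset 40). */
    write_le32(header + 4, 36 + dataLen);
    write_le32(header + 40, dataLen);
    fseek(out, 0, SEEK_SET);
    fwrite(header, 1, 44, out);

    fclose(a); fclose(b); fclose(out);
    return 0;
}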
Your best bet here is to use some software that understands how to read and parse audio files to stitch your files together. Using your search engine of choice to search for something like "combine mp3 files" should help.
I'm trying to figure out how FFmpeg saves data in an AVFrame after the audio has been decoded.
Basically, if I print the data in the AVFrame->data[] array, I get a series of unsigned 8-bit integers that is the audio in raw format.
From what I can understand from the FFmpeg doxygen, the format of the data is expressed in the enum AVSampleFormat, and there are 2 main categories: interleaved and planar. In the interleaved type, the data is all kept in the first row of the AVFrame->data array with size AVFrame->linesize[0], while in the planar type each channel of the audio file is kept in a separate row of the AVFrame->data array, and each of those arrays has size AVFrame->linesize[0].
Is there a guide/tutorial that explains what do the numbers in the array mean for each of the formats?
Values in each of the data arrays (planes) are actual audio samples in the specified format. E.g. if the format is AV_SAMPLE_FMT_S16P, the data arrays are actually arrays of int16_t PCM data. If we are dealing with a mono signal, only data[0] is valid; if it is stereo, data[0] and data[1] are valid, and so on.
I'm not sure there is any guide that explains each particular case, but the approach described is quite simple and easy to understand. You should just play with it a bit and things should become clear.
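A minimal sketch of that planar case (the channel count is passed in as a parameter to sidestep differences between FFmpeg versions):
#include <stdint.h>
#include <libavutil/frame.h>

/* Walk a decoded frame in AV_SAMPLE_FMT_S16P: one plane per channel,
 * each plane an array of nb_samples int16_t values. */
static void process_s16p(const AVFrame *frame, int nb_channels)
{
    for (int ch = 0; ch < nb_channels; ch++) {
        const int16_t *plane = (const int16_t *)frame->data[ch];
        for (int i = 0; i < frame->nb_samples; i++) {
            int16_t sample = plane[i]; /* i-th PCM sample of channel ch */
            (void)sample;              /* ...analyze or copy it here */
        }
    }
}
For the interleaved AV_SAMPLE_FMT_S16 layout you would instead index a single plane: ((const int16_t *)frame->data[0])[i * nb_channels + ch].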
I have an H264 stream (IIS - smooth streaming) that I would like to play with Silverlight. Apparently Silverlight can do it, but how?
Note: a VC-1 stream can be played by Silverlight, but H264 cannot. Also, I can provide a stream and any additional information required. The H264 encoder is the one in Media Foundation (MFT). The same goes for the VC-1 that works (although it is impossible to create equal chunks for smooth streaming, because forcing key-frame insertion makes the video jerky). EDIT: MPEG2VIDEOINFO values for H264:
Just a guess, based on your question 18009152: I am guessing you are encoding H264 using the Annex B bitstream format. According to the comments, you cannot tell the encoder to use AVCC format, so you must perform this conversion manually (Annex B WILL NOT work in an ISO container). You can do this by looking for start codes in your AVC stream. A start code is 3 or 4 bytes (0x000001, 0x00000001). You get the length of a NALU by locating the next start code, or the end of the stream. Strip the start code (throw it away) and in its place write the size of the NALU as a 32-bit big-endian integer. Then write this data to the container. To be clear, this is performed on the video frames that come out of the encoder. The extra data is a separate step that it appears you have mostly figured out (except for the NALUSizeLength). Because we use a 4-byte integer to write the NALU sizes, you MUST set NALUSizeLength to 4.
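A rough C sketch of that rewrite (function names are illustrative, and the output buffer is assumed to be large enough):
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Find the next Annex B start code at or after pos; returns its index,
 * or len if none. scLen receives the start code length (3 or 4). */
static size_t find_start_code(const uint8_t *buf, size_t len,
                              size_t pos, size_t *scLen)
{
    for (size_t i = pos; i + 3 <= len; i++) {
        if (buf[i] == 0 && buf[i + 1] == 0) {
            if (buf[i + 2] == 1) { *scLen = 3; return i; }
            if (i + 4 <= len && buf[i + 2] == 0 && buf[i + 3] == 1) {
                *scLen = 4; return i;
            }
        }
    }
    *scLen = 0;
    return len;
}

/* Rewrite one Annex B access unit into AVCC: every start code becomes a
 * 32-bit big-endian NALU length (matching NALUSizeLength == 4). Returns
 * the number of bytes written to out. */
size_t annexb_to_avcc(const uint8_t *in, size_t in_len, uint8_t *out)
{
    size_t scLen, out_pos = 0;
    size_t pos = find_start_code(in, in_len, 0, &scLen);

    while (pos < in_len) {
        size_t nalu_start = pos + scLen, next_scLen;
        size_t next_sc = find_start_code(in, in_len, nalu_start, &next_scLen);
        uint32_t nalu_len = (uint32_t)(next_sc - nalu_start);

        out[out_pos++] = (uint8_t)(nalu_len >> 24);  /* big-endian size */
        out[out_pos++] = (uint8_t)(nalu_len >> 16);
        out[out_pos++] = (uint8_t)(nalu_len >> 8);
        out[out_pos++] = (uint8_t)nalu_len;
        memcpy(out + out_pos, in + nalu_start, nalu_len);
        out_pos += nalu_len;

        pos = next_sc;
        scLen = next_scLen;
    }
    return out_pos;
}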
Silverlight 3 can play H264 files. Use MediaStreamSource for this.
Here is the interface description: http://msdn.microsoft.com/en-us/library/system.windows.media.mediastreamsource(v=vs.95).aspx
Also, this blog entry is related to playing H264 using Silverlight 3: http://nonsenseinbasic.blogspot.ru/2011/05/silverlights-mediastreamsource-some.html
It will help you with other issues that may arise.
I want to read PCM samples from a file using fread and want to determine the signal strength of the samples. How do I go about it?
For reading, how many bytes constitute one PCM sample? Can I read more than one PCM sample at a time?
This is for WAV and AAC files.
You have to understand that WAV files (and even more so AAC files) are not all the same. I will only explain WAV files; hopefully you'll understand how it is with AAC files then. As you pointed out, a WAV file has PCM-encoded data. However, that can be: 8-bit, 16-bit, 32-bit, ...; mono, stereo, 5.1, ...; 8kHz, 16kHz, 44.1kHz, etc. Depending on these values you have to interpret the data (e.g. when reading it with the fread() function) differently. Therefore WAV files have a header. You have to read that header first, in the standard way (I do not know the details). Then you know how to read the actual data.
Since it is not that easy, I suggest you use one of the libraries out there that read WAV files for you, e.g. http://www.mega-nerd.com/libsndfile/ . Of course you can also google or use SO to find others. Or you do it the hard way: find out how WAV file headers look, decode that data first, then move on to the actual PCM-encoded data.
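For example, a minimal libsndfile sketch (it parses the header for you and can deliver the samples as 16-bit shorts) that computes RMS as one simple measure of signal strength:
#include <math.h>
#include <stdio.h>
#include <sndfile.h>

int main(int argc, char **argv)
{
    if (argc < 2) { fprintf(stderr, "usage: %s file.wav\n", argv[0]); return 1; }

    SF_INFO info = {0};
    SNDFILE *sf = sf_open(argv[1], SFM_READ, &info);
    if (!sf) { fprintf(stderr, "could not open %s\n", argv[1]); return 1; }

    short buf[4096];       /* libsndfile converts samples to 16-bit for us */
    double sum_sq = 0.0;
    sf_count_t total = 0, n;

    while ((n = sf_read_short(sf, buf, 4096)) > 0) {
        for (sf_count_t i = 0; i < n; i++)
            sum_sq += (double)buf[i] * buf[i];
        total += n;
    }
    sf_close(sf);

    /* RMS over all channels, as a rough signal-strength figure */
    printf("RMS: %.1f\n", total ? sqrt(sum_sq / (double)total) : 0.0);
    return 0;
}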
I have no experience with WAV files, but I once read data from an mp3 file. In an mp3 file, every 576 PCM samples are encoded into a frame. All the frames are stored directly in the file, along with some side information. When processing the encoded data, I read binary data from the mp3 file into a buffer, decoded the buffered data, and extracted what was meaningful to me.
I think processing a wav file (which stores PCM samples, as I understand it) is not very different. You can read the binary data from the file directly and perform some transformation according to the WAV encoding specification.
The file itself does not know what kind of data, or even what format of data, is in it. You can treat everything in a file as bytes (even plain text), read the bytes from the file, and interpret the binary data yourself.
I have an interesting question today.
I need to convert some pokemon audio files to a list of 8-bit samples (0-255 values). I am writing an assembly routine on the MC6800 chipset that will require these sounds to be played. I plan on including an array with the 8-bit samples that the program will loop through when a function is called.
Does anyone know a way to convert audio files (wav/mp3) into a list of comma-separated 8-bit text sample values? Or any similar approach?
Thank you so much in advance!
You can use the command-line "sox" tool or the Audacity audio editor to convert the file to a raw, unsigned 8-bit mono audio file.
In Audacity 1.3 or higher, open the audio then select Export, choose "Wave, AIFF, and other uncompressed types" as the format, then click Options... - then choose "Other..." for the Format, "RAW" for the Header, and Signed 8-bit PCM as the encoding. (Sorry, unsigned isn't available.)
From the command line, try sox with -c 1 for 1 channel, -t raw for no header, -u for unsigned linear, and -1 for 1 byte per sample.
Then you can use a tool like "hexdump" to dump out the bytes of the file as numbers and paste them into your code.
If sox can't produce text output directly, you will have to use it to generate raw (headerless) files and convert the raw bytes to comma-separated text yourself.
EDIT: according to its web page, sox has "Raw textual data" as one of its formats. You can make it convert your sound files to unsigned 8-bit linear samples in a first pass and then probably get exactly the output you want using this option for the output.
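If you end up doing the last step by hand, here is a small C sketch (the file name is a placeholder) that reads a raw signed 8-bit file such as Audacity exports, re-biases each sample to unsigned 0-255, and prints the comma-separated list:
#include <stdio.h>

int main(void)
{
    FILE *f = fopen("sound.raw", "rb"); /* raw signed 8-bit mono export */
    if (!f) { perror("sound.raw"); return 1; }

    int c, first = 1;
    while ((c = fgetc(f)) != EOF) {
        signed char s = (signed char)c;            /* -128..127 */
        unsigned int u = (unsigned int)(s + 128);  /* re-bias to 0..255 */
        printf(first ? "%u" : ",%u", u);
        first = 0;
    }
    putchar('\n');
    fclose(f);
    return 0;
}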
For .wav it is a very simple process. You can find the .wav specification easily with a Google search. It comprises a header, then simply raw samples. You should read the header first, then loop through all the samples. Usually they are 16-bit samples, so you want to normalize them from the range -32768 to 32767 to your 0-255 range. I suggest simple scaling at first. If that's not successful, maybe find the actual min and max among the samples and adjust your scale accordingly.
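One way to do the simple scaling (re-bias to unsigned, then keep the top 8 bits):
#include <stdint.h>

/* Map a signed 16-bit PCM sample (-32768..32767) onto unsigned 8-bit
 * (0..255) by re-biasing and keeping the most significant byte. */
static inline uint8_t pcm16_to_u8(int16_t s)
{
    return (uint8_t)(((int32_t)s + 32768) >> 8);
}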
Well a lot depends on your audio format. The wave format, for example, consists of uncompressed interleaved PCM data.
i.e. for an 8-bit stereo file the samples are arranged as follows:
[Left Sample 1][Right Sample 1][Left Sample 2][Right Sample 2]...[Left Sample n][Right Sample n]
i.e. each 8-bit stereo sample frame is stored in 2 bytes: 1 for the left channel and 1 for the right. This is the data format your sound hardware will most likely require.
A 16-bit or 24-bit audio file works the same way, but the left and right samples will be 2 or 3 bytes each, respectively.
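A small sketch of de-interleaving such data, using the 16-bit case:
#include <stddef.h>
#include <stdint.h>

/* Split interleaved 16-bit stereo PCM (L R L R ...) into separate
 * left and right buffers, one sample frame at a time. */
void deinterleave_s16(const int16_t *interleaved, size_t frames,
                      int16_t *left, int16_t *right)
{
    for (size_t i = 0; i < frames; i++) {
        left[i]  = interleaved[2 * i];     /* left sample of frame i */
        right[i] = interleaved[2 * i + 1]; /* right sample of frame i */
    }
}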
Obviously a wave file has a load of extra information in it. It follows the RIFF format. You can find info on it and the "chunks" wave files use at places such as www.wotsit.org.
To decompress an MP3 is more complicated. You are best off getting hold of a decompressor and running it on the MP3-encoded audio. It will spit out PCM data as above from the other side.