How to read PCM samples from a file using fread and fwrite? - c

I want to read PCM samples from a file using fread and determine the signal strength of the samples. How do I go about it?
For reading, how many bytes make up one PCM sample? Can I read more than one PCM sample at a time?
This is for WAV and AAC files.

You have to understand that WAV files (and even more so AAC files) are not all the same. I will only explain WAV files; hopefully you will then understand how it works for AAC files. As you pointed out, a WAV file holds PCM-encoded data. However, that data can be 8-bit, 16-bit, 32-bit, ...; mono, stereo, 5.1, ...; 8 kHz, 16 kHz, 44.1 kHz, etc. Depending on these values you have to interpret the data (e.g. when reading it with fread()) differently. One sample of one channel takes the bit depth divided by 8 bytes, and fread() can read as many samples per call as your buffer holds.
That is why WAV files have a header. You have to read that header first, in the standard way (I do not know the details). Then you know how to read the actual data.
Since it is not that easy, I suggest you use one of the libraries out there that read WAV files for you, e.g. http://www.mega-nerd.com/libsndfile/ . Of course you can also google or search SO to find others. Or you do it the hard way: find out what WAV file headers look like, decode the header first, then move on to the actual PCM-encoded data.
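For the "hard way", here is a minimal sketch of reading the header with fread(). It assumes the common RIFF/WAVE layout (a "fmt " chunk followed, possibly after other chunks, by a "data" chunk); the wav_info struct and the function names are made up for illustration, and unusual files (e.g. WAVE_FORMAT_EXTENSIBLE, RF64) need more care than this.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical struct: the fields needed to interpret the sample data. */
typedef struct {
    uint16_t format_tag;      /* 1 means integer PCM */
    uint16_t channels;
    uint32_t sample_rate;
    uint16_t bits_per_sample;
    uint32_t data_bytes;      /* payload size of the "data" chunk */
} wav_info;

/* WAV stores all header numbers little-endian. */
static uint32_t le32(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint16_t le16(const unsigned char *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Walks the chunks until "data" is found. On success (return 0) the stream
   is positioned at the first sample byte. Returns -1 on any error. */
int wav_read_header(FILE *f, wav_info *out) {
    unsigned char b[16];
    int have_fmt = 0;

    if (fread(b, 1, 12, f) != 12) return -1;
    if (memcmp(b, "RIFF", 4) != 0 || memcmp(b + 8, "WAVE", 4) != 0) return -1;

    for (;;) {
        if (fread(b, 1, 8, f) != 8) return -1;   /* chunk id + chunk size */
        uint32_t size = le32(b + 4);

        if (memcmp(b, "fmt ", 4) == 0) {
            if (size < 16 || fread(b, 1, 16, f) != 16) return -1;
            out->format_tag      = le16(b);
            out->channels        = le16(b + 2);
            out->sample_rate     = le32(b + 4);
            out->bits_per_sample = le16(b + 14);
            have_fmt = 1;
            /* Skip extension bytes, plus the pad byte of odd-sized chunks. */
            if (fseek(f, (long)(size - 16) + (long)(size & 1), SEEK_CUR) != 0)
                return -1;
        } else if (memcmp(b, "data", 4) == 0) {
            if (!have_fmt) return -1;
            out->data_bytes = size;
            return 0;
        } else {
            /* Unknown chunk (LIST, fact, ...): skip it. */
            if (fseek(f, (long)size + (long)(size & 1), SEEK_CUR) != 0)
                return -1;
        }
    }
}
```

After a successful call, one sample frame occupies channels * bits_per_sample / 8 bytes, so a single fread() into a buffer that is a multiple of that size reads many frames at once.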

I have no experience with WAV files, but I once read data from an MP3 file. In an MP3 file the PCM samples are encoded in fixed-size groups: granules of 576 samples, with an MPEG-1 Layer III frame carrying two of them (1152 samples). All the frames are stored directly in the file along with some side information. To process the encoded data, I read binary data from the MP3 file into a buffer, decoded the buffered data and extracted what was meaningful to me.
I think processing a WAV file (which stores PCM samples, as I understand it) is not very different. You can read the binary data from the file directly and transform it according to the WAV format specification.
The file itself does not know what kind of data it holds, or in what format. You can treat everything in a file as bytes (even plain text): read the bytes from the file and interpret the binary data yourself.
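As a rough illustration of "interpreting the bytes yourself", the sketch below reads raw 16-bit little-endian PCM with fread() and measures its level. It assumes the stream is already positioned at the first sample (i.e. any header has been skipped) and that the samples are signed 16-bit; pcm_level and pcm16_level are hypothetical names.

```c
#include <stdio.h>

/* Hypothetical result type for the level measurement. */
typedef struct {
    double mean_square;        /* mean of (sample / 32768)^2, in [0, 1] */
    int peak;                  /* largest absolute sample value seen */
    unsigned long long count;  /* number of samples read */
} pcm_level;

/* Reads signed 16-bit little-endian samples until EOF.
   Returns 0 on success, -1 if the stream reported a read error. */
int pcm16_level(FILE *f, pcm_level *out) {
    unsigned char buf[4096];   /* 2048 samples per fread() call */
    double sum_sq = 0.0;
    size_t n;

    out->peak = 0;
    out->count = 0;

    while ((n = fread(buf, 1, sizeof buf, f)) > 0) {
        /* A trailing odd byte (only possible at EOF) is ignored. */
        for (size_t i = 0; i + 1 < n; i += 2) {
            int v = buf[i] | (buf[i + 1] << 8);   /* assemble 0..65535 */
            if (v >= 32768) v -= 65536;           /* convert to signed */
            int a = v < 0 ? -v : v;
            if (a > out->peak) out->peak = a;
            double x = v / 32768.0;
            sum_sq += x * x;
            out->count++;
        }
    }
    out->mean_square = out->count ? sum_sq / (double)out->count : 0.0;
    return ferror(f) ? -1 : 0;
}
```

The square root of mean_square (sqrt() from <math.h>) is the RMS level; 20 * log10 of that gives dB relative to full scale. For stereo data the loop mixes both channels into one figure, which may or may not be what you want.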

Related

How to concatenate Buffer data and writeFile in JS?

I use the Azure Speech SDK to convert a set of text files to voice. All of the texts are converted successfully, and an ArrayBuffer instance is returned for each text. I then convert each of them to a Buffer and concatenate all of the buffers into one with Buffer.concat(). Then I pass the concatenated buffer to fs.writeFile() to create an mp3 file. However, only the audio from the first buffer ends up in the audio file instead of the whole concatenated buffer.
What should I do?
To provide a little background, audio files generally consist of some header data that contains information about the audio (e.g. the sampling rate, how many channels of audio, etc...), followed by the actual audio data. Generally speaking, you should only have one header per audio file.
If you simply concatenate the audio data together, the header from the first file will be read by your media player. Since this doesn't contain any information about the other audio files you've concatenated, you'll get some non deterministic behaviour. It may play the audio from the first file only, it may give you an error, or it may try to play the header section for the remaining audio files as audio data (which is not correct).
For things to work correctly, you'd need to update the header to reflect all of the audio data, and strip out the remaining headers. You'd also need to make sure that the byte alignment, formatting, etc... of all your audio data is consistent.
Your best bet here is to use software that understands how to read and parse audio files to stitch your files together. Searching for e.g. "combine mp3 files" with your search engine of choice should help.
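To make the "one header for all the data" idea concrete, here is a sketch for the simplest case, uncompressed PCM in a WAV container. This is not how MP3 works (MP3 has per-frame headers and optional tags), so treat it only as an illustration of the principle. It assumes the canonical 44-byte header and that every piece has the same channel count, sample rate and bit depth; wav_make_header is a hypothetical helper.

```c
#include <stdint.h>
#include <string.h>

static void put_le16(unsigned char *p, uint16_t v) {
    p[0] = (unsigned char)(v & 0xFF);
    p[1] = (unsigned char)(v >> 8);
}

static void put_le32(unsigned char *p, uint32_t v) {
    p[0] = (unsigned char)(v & 0xFF);
    p[1] = (unsigned char)((v >> 8) & 0xFF);
    p[2] = (unsigned char)((v >> 16) & 0xFF);
    p[3] = (unsigned char)(v >> 24);
}

/* Fills a canonical 44-byte PCM WAV header that describes data_bytes
   bytes of sample data (the combined size of all concatenated pieces). */
void wav_make_header(unsigned char h[44], uint16_t channels,
                     uint32_t sample_rate, uint16_t bits_per_sample,
                     uint32_t data_bytes) {
    uint16_t block_align = (uint16_t)(channels * (bits_per_sample / 8));

    memcpy(h, "RIFF", 4);
    put_le32(h + 4, 36 + data_bytes);           /* file size minus 8 */
    memcpy(h + 8, "WAVE", 4);
    memcpy(h + 12, "fmt ", 4);
    put_le32(h + 16, 16);                       /* fmt chunk size */
    put_le16(h + 20, 1);                        /* 1 = integer PCM */
    put_le16(h + 22, channels);
    put_le32(h + 24, sample_rate);
    put_le32(h + 28, sample_rate * block_align);/* bytes per second */
    put_le16(h + 32, block_align);
    put_le16(h + 34, bits_per_sample);
    memcpy(h + 36, "data", 4);
    put_le32(h + 40, data_bytes);
}
```

To join two such files you would strip the header from each, write one header built with the sum of both payload sizes, then write the two payloads back to back.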

Basics of file formats

I know it might sound silly, but while working on a project I sometimes feel the need to know the very basics of file formats.
I know everything is stored as binary 1s and 0s on the hard disk, and that I can get an input stream of it.
But what if:
I don't know the format of the file: how do I determine it from the input stream?
I know its format: which parts of the input stream represent the different portions of the file?
For example, take a JPEG file with a red background: which part of the stream represents this information?
I need urgent help; links of any kind (blogs, e-books) would be highly appreciated.
Thank you
See the Wikipedia articles "List of file signatures" and "Magic number (programming)".
Why do you need to operate at such a low level as input streams? Use a library to get the information you need about the given file. And by the way, your JPEG example is a bad one. JPEG is a pixel-based image format which has no such thing as a "background". That "background" exists only because the viewer interprets the red pixels as background.
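For the first part of the question (guessing a format from the stream), the usual trick is to compare the first few bytes against known signatures. A small sketch, covering only a handful of well-known magic numbers; guess_format is a hypothetical name, and a match is only a hint, not proof that the rest of the file is valid.

```c
#include <stddef.h>
#include <string.h>

/* Returns a short format name based on the leading bytes of a file,
   or "unknown" if no signature in this (very incomplete) list matches. */
const char *guess_format(const unsigned char *p, size_t n) {
    if (n >= 12 && memcmp(p, "RIFF", 4) == 0 && memcmp(p + 8, "WAVE", 4) == 0)
        return "wav";
    if (n >= 8 && memcmp(p, "\x89PNG\r\n\x1a\n", 8) == 0)
        return "png";
    if (n >= 4 && memcmp(p, "fLaC", 4) == 0)
        return "flac";
    if (n >= 4 && memcmp(p, "OggS", 4) == 0)
        return "ogg";
    if (n >= 3 && p[0] == 0xFF && p[1] == 0xD8 && p[2] == 0xFF)
        return "jpeg";
    if (n >= 3 && memcmp(p, "ID3", 3) == 0)
        return "mp3 (ID3v2 tag)";
    return "unknown";
}
```

Typical use is to fread() the first 12 or so bytes of the file into a buffer and pass the buffer and the number of bytes actually read.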

How are files (especially audio files) organized internally?

I try to grok that: Apple is talking about "packets" in audio files, and there is a fancy function called AudioFileReadPackets which takes a lot of arguments. One of them specifies the "start packet", and another one the number of packets which you want to read.
So I imagine an audio file to look like this internally: it's made up of a lot of packets. If it's an audio file with a variable bit rate format, then every packet may have a different size. If the file has a constant bit rate format, then every packet is the same size. So an audio file is like a truck full of boxes, and every box contains some interesting stuff.
Is that correct? Does it apply to any kind of file? Is this what files actually look like?
The question (even with the "especially audio files" qualification) is far too broad; different file formats are, well, different!
So to answer the question you will first have to specify a particular file type; the answer will then invariably be to look at its specification. Proprietary formats may not have a publicly available specification.
Specifications for many files (official and reverse engineered) can be found at the brilliant Wotsit's Format site.
AAC used by Apple iTunes and others is defined by ISO/IEC 13818-7:2006. The document will cost you 252 Swiss Francs (about US$233)! You'd have to be really interested (commercially) to pay that rather than use an existing AAC Codec.
"Packet" is a term commonly used in data transmission, so it may be more applicable to audio streaming than to audio files, where "frame" may be more appropriate, or, for data files in general, "record". But the terminology is flexible because it means whatever the person who wrote it thought it meant! If enough people misuse a term, it essentially becomes redefined (or multiply defined) to mean that, so I would not get too hung up on it. The author was no doubt using it for a unit that has a defined format within a file that contains many such units repeated sequentially.
"Packet" looks to me like Apple-specific terminology. I just did a lot of reading and coding to process WAV and MP3 files and I don't believe I saw the term "packet" once.
Files contain whatever the application that created them chose to place in them. Files are essentially a sequence of bytes. Any further organisation is a semantic distinction made by the program that created them. It is untrue to think of all files containing the same structure.
That said, certain data storage problems are similar enough to be solved in similar ways, and patterns start to emerge. Splitting data into records or packets is an example of that.
That's pretty much what audio files look like: a series of chunks of data, or frames. AudioFileReadPacketData and AudioFileReadPackets shield you from the details of, for instance, how big a frame might be in bytes (because you might be reading from a WAV file, which has a different structure to an MP3 file, or your MP3 file uses a variable bit rate).
The concept of frames doesn't apply in general to any file, but then you wouldn't be using the Audio File Services API to access just any old file.
For MP3 (and MP1, MP2) the file consists of frames. And yes, your understanding is correct: in VBR files the packets have different sizes. In WAV files the packets all have the same length, if memory serves (I wrote a decoder/player 11 years ago).

audio to 8-bit text sample conversion

I have an interesting question today.
I need to convert some pokemon audio files to a list of 8-bit samples (0-255 values). I am writing an assembly routine on the MC6800 chipset that will require these sounds to be played. I plan on including an array with the 8-bit samples that the program will loop through when a function is called.
Does anyone know a way to convert audio files (wav/mp3) into a list of comma-separated 8-bit sample values as text? Or any similar method?
Thank you so much in advance!
You can use the command-line "sox" tool or the Audacity audio editor to convert the file to a raw, unsigned 8-bit mono audio file.
In Audacity 1.3 or higher, open the audio then select Export, choose "Wave, AIFF, and other uncompressed types" as the format, then click Options... - then choose "Other..." for the Format, "RAW" for the Header, and Signed 8-bit PCM as the encoding. (Sorry, unsigned isn't available.)
From the command line, try sox with -c 1 for 1 channel, -t raw for no header, -u for unsigned linear, and -1 for 1 byte per sample. (Newer sox releases spell the last two -e unsigned-integer and -b 8.)
Then you can use a tool like "hexdump" to dump out the bytes of the file as numbers and paste them into your code.
If sox doesn't have a text output format, you will have to use it to generate raw (headerless) files and convert the raw files to comma-separated text yourself.
EDIT: sox has "Raw textual data" as one of its formats, from the web page. You can make it convert your sound files to unsigned 8-bit linear samples in a first pass and then probably get exactly the output you want using this option for output.
For .wav it is a very simple process. You can find the .wav specification easily with a google search. It comprises a header then simply raw samples. You should read the header first, then loop through all the samples. Usually they are 16 bit samples, so you want to normalize them from the range -32768 to 32767 to your 0-255 range. I suggest simple scaling at first. If that's not successful maybe find the actual min and max amongst the samples and adjust your scale accordingly.
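The "simple scaling" step could look roughly like this in C. It assumes the samples have already been read from the file into an array of signed 16-bit values; pcm16_to_u8 and print_csv are hypothetical helper names.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Maps signed 16-bit samples (-32768..32767) to unsigned 8-bit (0..255)
   by shifting the range up to 0..65535 and keeping the top 8 bits. */
void pcm16_to_u8(const int16_t *in, uint8_t *out, size_t n) {
    for (size_t i = 0; i < n; i++)
        out[i] = (uint8_t)(((int)in[i] + 32768) >> 8);
}

/* Writes the samples as one comma-separated line of decimal numbers. */
void print_csv(FILE *out, const uint8_t *samples, size_t n) {
    for (size_t i = 0; i < n; i++)
        fprintf(out, "%s%u", i ? "," : "", (unsigned)samples[i]);
    fputc('\n', out);
}
```

Silence (sample value 0) maps to 128, the midpoint of the unsigned range. If the recording is quiet, rescaling by the actual min and max before this step, as suggested above, makes better use of the 8 bits.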
Well a lot depends on your audio format. The wave format, for example, consists of uncompressed interleaved PCM data.
I.e. for an 8-bit stereo file the samples are arranged as follows:
[Left Sample 1][Right Sample 1][Left Sample 2][Right Sample 2]...[Left Sample n][Right Sample n].
I.e. each 8-bit stereo sample pair is stored in 2 bytes: 1 for the left channel and 1 for the right. This is the data format your sound hardware will most likely require.
A 16-bit or 24-bit audio file works the same way, but the left and right samples are 2 or 3 bytes each, respectively.
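As a sketch of what that layout means in code, the function below splits interleaved 16-bit stereo data into separate left and right arrays. It assumes little-endian signed samples, as WAV files use for 16-bit PCM; deinterleave_stereo16 is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Decodes one signed 16-bit little-endian sample from two bytes. */
static int16_t read_s16le(const unsigned char *p) {
    int v = p[0] | (p[1] << 8);       /* 0..65535 */
    if (v >= 32768) v -= 65536;       /* convert to signed */
    return (int16_t)v;
}

/* Splits `frames` interleaved stereo frames (4 bytes each: L lo, L hi,
   R lo, R hi) into two arrays of `frames` samples each. */
void deinterleave_stereo16(const unsigned char *bytes, size_t frames,
                           int16_t *left, int16_t *right) {
    for (size_t i = 0; i < frames; i++) {
        left[i]  = read_s16le(bytes + 4 * i);
        right[i] = read_s16le(bytes + 4 * i + 2);
    }
}
```

For mono output you could average left[i] and right[i] instead of keeping both.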
Obviously a wave file has a load of extra information in it. It follows the RIFF format. You can find info on it and the "chunks" wave files use at places such as www.wotsit.org.
Decompressing an MP3 is more complicated. You are best off getting hold of a decompressor and running it on the MP3-encoded audio. It will spit out PCM data as above on the other side.

Reading tag data for Ogg/Flac files

I'm working on a C library that reads tag information from music files. I've already got ID3v2 taken care of, but I can't figure out how Ogg files are structured.
I opened a .ogg file in a hexeditor and I could find the tag data because that was all human readable. But everything from the beginning of the file to the tag data looked like garbage. How is this data encoded?
I don't need any help with the actual code; I just need help visualizing what an Ogg header looks like and what encoding it uses so that I can read it. I'd like to use a non-hacky approach to reading Ogg files.
I've been looking at the Flac format, which has been helpful.
The Flac file I'm looking at has about 350 bytes between the "fLaC" identifier and the human-readable Comments section, and none of it is human readable in my hex editor, so I'm sure there has to be something important in there.
I'm using Linux, and I have no intention of porting to Windows or OS X. So if I need to use a glibc only function to convert the encoding, I'm fine with that.
The Ogg file format is documented here. There is a very nice graphical visualization as you requested with a detailed written description.
You may also want to look at libogg, which is an open-source, BSD-licensed library for reading and writing Ogg files.
As is described in the link you provided, the following metadata blocks can occur between the "fLaC" marker and the VORBIS_COMMENT metadata block.
STREAMINFO: This block has information about the whole stream, like sample rate, number of channels, total number of samples, etc. It must be present as the first metadata block in the stream. Other metadata blocks may follow, and ones that the decoder doesn't understand, it will skip.
APPLICATION: This block is for use by third-party applications. The only mandatory field is a 32-bit identifier. This ID is granted upon request to an application by the FLAC maintainers. The remainder of the block is defined by the registered application. Visit the registration page if you would like to register an ID for your application with FLAC.
PADDING: This block allows for an arbitrary amount of padding. The contents of a PADDING block have no meaning. This block is useful when it is known that metadata will be edited after encoding; the user can instruct the encoder to reserve a PADDING block of sufficient size so that when metadata is added, it will simply overwrite the padding (which is relatively quick) instead of having to insert it into the right place in the existing file (which would normally require rewriting the entire file).
SEEKTABLE: This is an optional block for storing seek points. It is possible to seek to any given sample in a FLAC stream without a seek table, but the delay can be unpredictable since the bitrate may vary widely within a stream. By adding seek points to a stream, this delay can be significantly reduced. Each seek point takes 18 bytes, so 1% resolution within a stream adds less than 2k. There can be only one SEEKTABLE in a stream, but the table can have any number of seek points. There is also a special 'placeholder' seekpoint which will be ignored by decoders but which can be used to reserve space for future seek point insertion.
Just after the above description, there's also the specification of the format of each of those blocks. The link also says
All numbers used in a FLAC bitstream are integers; there are no floating-point representations. All numbers are big-endian coded. All numbers are unsigned unless otherwise specified.
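To show how those blocks can be skipped, here is a minimal sketch that walks the metadata block headers in a buffer holding the start of a FLAC file. Each block starts with a 4-byte header: one byte holding a "last block" flag (top bit) and the block type (low 7 bits), then a 24-bit big-endian payload length. VORBIS_COMMENT is block type 4. The function name flac_find_block is made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Searches the FLAC metadata blocks in p[0..n) for the first block of
   type wanted_type. Returns the offset of that block's payload and
   stores its length in *len_out, or returns -1 if it is not found or
   the data is truncated or not FLAC. */
long flac_find_block(const unsigned char *p, size_t n,
                     int wanted_type, size_t *len_out) {
    if (n < 4 || memcmp(p, "fLaC", 4) != 0) return -1;

    size_t pos = 4;
    while (pos + 4 <= n) {
        int last = p[pos] >> 7;        /* 1 if this is the final block */
        int type = p[pos] & 0x7F;      /* 0 STREAMINFO, 1 PADDING, ... */
        size_t len = ((size_t)p[pos + 1] << 16) |
                     ((size_t)p[pos + 2] << 8) |
                      (size_t)p[pos + 3];   /* 24-bit big-endian */
        pos += 4;
        if (len > n - pos) return -1;  /* payload runs past the buffer */
        if (type == wanted_type) {
            *len_out = len;
            return (long)pos;
        }
        if (last) break;
        pos += len;
    }
    return -1;
}
```

One thing to watch for: the length fields inside the VORBIS_COMMENT payload itself follow the Vorbis comment layout and are little-endian, unlike the rest of the FLAC stream.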
So, what are you missing? You say
I'd like a non-hacky approach to reading Ogg files.
Why re-write a library to do that when they already exist?

Resources