How are files (especially audio files) organized internally? - c

I try to grok that: Apple is talking about "packets" in audio files, and there is a fancy function called AudioFileReadPackets which takes a lot of arguments. One of them specifies the "start packet", and another one the number of packets which you want to read.
So I imagine an audio file to look like this, internally: It's made up of a lot of packets. If it's an audio file which has an variable bit rate format, then every packet may have a different size. If the file has an constant bit rate format, then every packet is the same size. So an audio file is like a truck full of boxes, and every box contains some interesting stuff.
Is that correct? Does it apply to any kind of file? Is this how files actually look like?

The question (even with the "especially audio files" qualification) is far too broad; different file formats are, well, different!
So to answer the question you will first have to specify a particular file type; then the answer to the question will invariably to look at its specification. Proprietary formats may not have a publicly available specification.
Specifications for many files (official and reverse engineered) can be found at the brilliant Wotsit's Format site.
AAC used by Apple iTunes and others is defined by ISO/IEC 13818-7:2006. The document will cost you 252 Swiss Francs (about US$233)! You'd have to be really interested (commercially) to pay that rather than use an existing AAC Codec.
"Packet" is a term commonly used in data transmission, so may be more applicable to audio streaming than audio files, where a "frame" may be more appropriate, or for data files in general a "record", but the terminology is flexible because it means whatever the person that wrote it thought it meant! If enough people misuse a term, it essentially becomes redefined (or multiply defined) to mean that, so I would not get too hung up on that. The author was do doubt using it to define a unit that has a defined format within a file that has multiple such units repeated sequentially.

"Packet" looks to me like Apple-specific terminology. I just did a lot of reading and coding to process WAV and MP3 files and I don't believe I saw the term "packet" once.

Files contain whatever the application that created them chose to place in them. Files are essentially a sequence of bytes. Any further organisation is a semantic distinction made by the program that created them. It is untrue to think of all files containing the same structure.
That said, certain data storage problems are similar enough to be solved in similar ways, and patterns start to emerge. Splitting data into records or packets is an example of that.

That's pretty much what audio files look like: a series of chunks of data, or frames. AudioFileReadPacketData and AudioFileReadPackets shield you from the details of, for instance, how big a frame might be in bytes (because you might be reading from a WAV file, which has a different structure to an MP3 file, or your MP3 file uses a variable bit rate).
The concept of frames doesn't apply in general to any file, but then you wouldn't be using the Audio File Services API to access just any old file.

For MP3 (and MP1, MP2) the file consists of frames. And yes, your understanding is correct - in VBR files packets have different size. In WAV files packets have the same length if memory serves (I wrote a decoder / player 11 years ago,).

Related

How to filter specific frequencies from an Audio file in C

After searching on various search engines, and also here, there is very little information applicable to my situation.
Basically I want to make a program in C that does the following:
Open an Audio File (flac Mp3 and wav, to represent a bit of variety)
Filter and cut out a specific set of frequencies (for Example 4000-5200hz, the frequencies should be entered upon inquiry)
Save the new file (without the filtered frequencies) in the same format as the input file.
Things that would be of interest to me:
Open-Source examples of software that does the same or a similar thing, preferably in C
ANY literature on audio programming in C
Explanations on how the different formats are structured, any sources appreciated
Ps.: I apologise if some parts of the question can be easily googled, but I tried, and there wasn't anything that described this well in detail.
Thanks a lot!!
Answers:
FFmpeg does a lot of audio slicing and dicing, and it's written in pure C. It's pretty big, though, and might be difficult to digest in one go.
"Audio programming" is a bit vague. But from the rest of your question, it sounds like you want to open an audio file from disk, apply some transformations to the audio, and write the data to a new file. (Other areas under the "audio programming" umbrella would include accessing platform-specific APIs to read from a microphone and write audio to an output device).
Broad topic again, but we'll start simple.
I suggest getting (or generating) a .WAV file to start with. WAV files are probably the simplest audio files to read and write manually. Here is a page that describes what you need to know about the WAV format.
Pulse code modulation (PCM) is the simplest audio format to work with since you don't need to worry about decompressing it first. Here is a page (that I wrote) describing different PCM formats.
As for filtering and cutting different frequencies, I think what you're looking for would be low-pass, high-pass, or band-pass filters.
I hope that helps you get started. Ask more questions here on Stack Overflow as needed.

How are the binary data inside a certain format parsed?

Considering a binary data (video/images/audio/executable) can be regarded as a long sequence of random bytes,
when the data is inside a special format (SQL, BOLB in database, MP3, JSON, XML etc), how does the parser know that a special char(or sequence of chars, like {,},\t,space,EOF) is used in formatting, not a part of the binary data and vice versa?
Also, I am not quite sure which category this question fits in, so I put lexical analysis and linguistics. What subject/fields of computer science studies this?
This is indeed an odd place for this question. I'm a little unclear on exactly what you're asking here, but in sum, not all binary data (assuming you mean machine readable data) are equal. For instance: audio, images, and video are not executable data, they are parsed data; as such they are handled differently.
Also, "binary data" are not as arbitrary as you might think upon opening a hex editor for the first time :). Executables are structured into DATA and CODE segments, so with those flags the computer knows how to treat things appropriately. As for the other three types you mentioned, they are all structured differently depending upon their file format, which is why so many different file formats are out there! The executable program which parses these files knows how to handle them based upon information contained in the code about the file format, which of course means that the program has to know how to handle the file format and have info on how it is segmented to load it properly, which is why you can't open an MP3 in Microsoft Paint.
As for the study of file formats and data storage, that has applications in a lot of areas, it's not really a field unto itself so much as a topic that comes up in a lot of areas. Information Theory, Reverse Engineering, Natural Language Processing, and many others have uses for understanding different file types and how they store data. Anyhow, this was only a brief, cursory explanation, and there's plenty of things you can google (try .exe file formats or .jpg/.png file formats to start).

read thunderbird address mab files content

I have several address list's on my TBIRD address book.
every time I need to edit an address that is contained in several lists, is a pain on the neck to find which list contains the address to be modified.
As a help tool I want to read the several files and just gave the user a list of which
xxx.MAB files includes the searched address on just one search.
having the produced list, the user can simply go to edit just the right address list's.
Will like to know a minimum about the format of mentioned MAB files, so I can OPEN + SEARCH for strings into the files.
thanks in advance
juan
PD have asked mozilla forum, but there are no plans from mozilla to consolidate the address on one master file and have the different list's just containing links to the master. There is one individual thinking to do that, but he has no idea when due to lack of resources,
on this forum there is a similar question mentioning MORK files, but my actual TBIRD looks like to have all addresses contained on MAB files
I am afraid there is no answer that will give you a proper solution for this question.
MORK is a textual database containing the files Address Book Data (.mab files) and Mail Folder Summaries (.msf files).
The format, written by David McCusker, is a mix of various numerical namespaces and is undocumented and seem to no longer be developed/maintained/supported. The only way you would be able to get the grips of it is to reverse engineer it parallel with looking at source code using this format.
However, there have been experienced people trying to write parsers for this file format without any success. According to Wikipedia former Netscape engineer Jamie Zawinski had this to say about the format:
...the single most brain-damaged file format that I have ever seen in
my nineteen year career
This page states the following:
In brief, let's count its (Mork's) sins:
Two different numerical namespaces that overlap.
It can't decide what kind of character-quoting syntax to use: Backslash? Hex encoding with dollar-sign?
C++ line comments are allowed sometimes, but sometimes // is just a pair of characters in a URL.
It goes to all this serious compression effort (two different string-interning hash tables) and then writes out Unicode strings
without using UTF-8: writes out the unpacked wchar_t characters!
Worse, it hex-encodes each wchar_t with a 3-byte encoding, meaning the file size will be 3x or 6x (depending on whether whchar_t is 2
bytes or 4 bytes.)
It masquerades as a "textual" file format when in fact it's just another binary-blob file, except that it represents all its magic
numbers in ASCII. It's not human-readable, it's not hand-editable, so
the only benefit there is to the fact that it uses short lines and
doesn't use binary characters is that it makes the file bigger. Oh
wait, my mistake, that isn't actually a benefit at all."
The frustration shines through here and it is obviously not a simple task.
Consequently there apparently exist no parsers outside Mozilla products that is actually able to parse this format.
I have reversed engineered complex file formats in the past and know it can be done with the patience and right amount of energy.
Sadly, this seem to be your only option as well. A good place to start would be to take a look at Thunderbird's source code.
I know this doesn't give you a straight-up solution but I think it is the only answer to the question considering the circumstances for this format.
And of course, you can always look into the extension API to see if that allows you to access the data you need in a more structured way than handling the file format directly.
Sample code which reads mork
Node.js: https://www.npmjs.com/package/mork-parser
Perl: http://metacpan.org/pod/Mozilla::Mork
Python: https://github.com/KevinGoodsell/mork-converter
More links: https://wiki.mozilla.org/Mork

Audio (mp3) editing in C

What I wish to do is to "split" an .mp3 into two separate files, taking duration as an argument to specify how long the first file should be.
However, I'm not entirely sure how to achieve this in C. Is there a particular library that could help with this? Thanks in advance.
I think you should use the gstreamer framework for this purpose. So you can write an application where you can use existing plugins to do this job. I am not sure of this but you can give it a try. Check out http://gstreamer.freedesktop.org/
For queries related to gstreamer: http://gstreamer-devel.966125.n4.nabble.com/
If you don't find any library in the end, then understand the header of a mp3 file and divide the original file into any number of parts and add individual headers to them. Its not gonna be easy but its possible.
It can be as simple as cutting the file at a position determined by the mp3 bitrate and duration requested. But:
your file should be CBR - that method won't work on ABR or VBR files, because they have different densityes
your player should be robust enough not to break if it gets partial mp3 frame at a start. Most of the playback libraries will handle mp3s split that way very gracefully.
If you need more info, just ask, I am on the mobile and can get you more info later.
If you want to be extra precise when cutting, you can use some library to parse mp3 frame headers, and then write frames that you need. That way, and the way mentioned before, you'll get only frame alignment as a minimum, and you have to live with thaty and that's 40ms.
If that isn't enough, you must decode mp3 to PCM, split at sample boundary, then recompress to mp3 again.
Good luck
P.s.
When you say split, I hope you don't expect them to play one after another with no audible 'artifacts'. Mp3 frames aren't self-sufficient, as they carry information from the frame before.

Reading tag data for Ogg/Flac files

I'm working on a C library that reads tag information from music files. I've already got ID3v2 taken care of, but I can't figure out how Ogg files are structured.
I opened a .ogg file in a hexeditor and I could find the tag data because that was all human readable. But everything from the beginning of the file to the tag data looked like garbage. How is this data encoded?
I don't need any help in the actual code, I just need help visualizing what a Ogg header looks like and what encoding it uses so I that I can read it. I'd like to use a non-hacky approach to reading Ogg files.
I've been looking at the Flac format, which has been helpful.
The Flac file I'm looking at has about 350 bytes between the "fLac" identifier and the human readable Comments section, and none of it is human readable in my hex editor, so I'm sure there has to be something important in there.
I'm using Linux, and I have no intention of porting to Windows or OS X. So if I need to use a glibc only function to convert the encoding, I'm fine with that.
The Ogg file format is documented here. There is a very nice graphical visualization as you requested with a detailed written description.
You may also want to look at libogg which is a open source BSD-licensed library for reading and writing Ogg files.
As is described in the link you provided, the following metadata blocks can occur between the "fLaC" marker and the VORBIS_COMMENT metadata block.
STREAMINFO: This block has information about the whole stream, like sample rate, number of channels, total number of samples, etc. It must be present as the first metadata block in the stream. Other metadata blocks may follow, and ones that the decoder doesn't understand, it will skip.
APPLICATION: This block is for use by third-party applications. The only mandatory field is a 32-bit identifier. This ID is granted upon request to an application by the FLAC maintainers. The remainder is of the block is defined by the registered application. Visit the registration page if you would like to register an ID for your application with FLAC.
PADDING: This block allows for an arbitrary amount of padding. The contents of a PADDING block have no meaning. This block is useful when it is known that metadata will be edited after encoding; the user can instruct the encoder to reserve a PADDING block of sufficient size so that when metadata is added, it will simply overwrite the padding (which is relatively quick) instead of having to insert it into the right place in the existing file (which would normally require rewriting the entire file).
SEEKTABLE: This is an optional block for storing seek points. It is possible to seek to any given sample in a FLAC stream without a seek table, but the delay can be unpredictable since the bitrate may vary widely within a stream. By adding seek points to a stream, this delay can be significantly reduced. Each seek point takes 18 bytes, so 1% resolution within a stream adds less than 2k. There can be only one SEEKTABLE in a stream, but the table can have any number of seek points. There is also a special 'placeholder' seekpoint which will be ignored by decoders but which can be used to reserve space for future seek point insertion.
Just after the above description, there's also the specification of the format of each of those blocks. The link also says
All numbers used in a FLAC bitstream are integers; there are no floating-point representations. All numbers are big-endian coded. All numbers are unsigned unless otherwise specified.
So, what are you missing? You say
I'd like a non-hacky approach to reading Ogg files.
Why re-write a library to do that when they already exist?

Resources