audio to 8-bit text sample conversion - c

I have an interesting question today.
I need to convert some pokemon audio files to a list of 8-bit samples (0-255 values). I am writing an assembly routine on the MC6800 chipset that will require these sounds to be played. I plan on including an array with the 8-bit samples that the program will loop through when a function is called.
Does anyone know a way to convert audio files (wav/mp3) into a list of comma separated 8-bit text sample values? Or anything of this relative method?
Thank you so much in advance!

You can use the command-line "sox" tool or the Audacity audio editor to convert the file to a raw, unsigned 8-bit mono audio file.
In Audacity 1.3 or higher, open the audio then select Export, choose "Wave, AIFF, and other uncompressed types" as the format, then click Options... - then choose "Other..." for the Format, "RAW" for the Header, and Signed 8-bit PCM as the encoding. (Sorry, unsigned isn't available.)
From the command line, try sox with -c 1 for 1 channel, -t raw for no header, -u for unsigned linear, and -1 for 1 byte per sample.
Then you can use a tool like "hexdump" to dump out the bytes of the file as numbers and paste them into your code.

If sox doesn't have it, you will have to use it to generate raw (headerless) files and convert the raw files to comma-separated yourself.
EDIT: sox has "Raw textual data" as one of its formats, from the web page. You can make it convert your sound files to unsigned 8-bit linear samples in a first pass and then probably get exactly the output you want using this option for output.

For .wav it is a very simple process. You can find the .wav specification easily with a google search. It comprises a header then simply raw samples. You should read the header first, then loop through all the samples. Usually they are 16 bit samples, so you want to normalize them from the range -32768 to 32767 to your 0-255 range. I suggest simple scaling at first. If that's not successful maybe find the actual min and max amongst the samples and adjust your scale accordingly.

Well a lot depends on your audio format. The wave format, for example, consists of uncompressed interleaved PCM data.
ie for an 8-bit stereo file each sample will be arranged as follows.
[Left Sample 1][Right Sample 1][Left Sample 2][Right Sample2]...[Left Sample n][Right sample n].
ie each 8 bit stereo sample is stored in 2 bytes. 1 for the left channel and 1 for the right. This is the data format your sound hardware will most likely require.
A 16 or 24-bit audio file will work in each way but the left and right samples will be 2 or 3 bytes each, respectively.
Obviously a wave file has a load of extyra information in it. It follows the RIFF format. You can find info on it and the "chunks" wave files use at places such as www.wotsit.org.
To decompress an MP3 is more complicated. You are best off getting hold of a decompressor and running it on the MP3 encoded audio. IT will spit out PCM data as above from the other side.

Related

About the wav data sub-chunk

I am working on a project in which I have to merge two 8bits .wav files using C and i still have no clue how to do it.
I have read about wav files and I want to start by reading one of the files.
There's one thing i didn't understand:
Let's say i have an 8bit WAV audio file, And i was able to read (even tho I am still trying to) the Data that starts after the 44 byte, I will get numbers between 0 and 255 logically.
My question is:
What do those numbers mean?
If I get 255 or 0 what do they mean?
Are they samples from the wave?
Can anyone please explain?
Thanks in advance
Assuming we're not dealing with file format issues, getting values between 0 and 255 means that the audio samples are of unsigned eight-bit format, as you have put it.
One way of merging data would consist of reading data from files into buffers, arrays a and b and summing them value by value: c[i] = a[i] + b[i]. By doing so, you'd have to take care of the following:
length of the files may not be equal
on summing the unsigned 8-bit buffers, such as yours will almost certainly overflow
This is usually achieved using a for loop. You first get the sizes of the chunks. Your for loop has to be written in such a way that it neither reads past the array boundary, nor ignores what can be read. For preventing overflows you can either:
divide values by two on reading
or
read (convert) into a format which wouldn't overflow, then normalize and convert the merged data back into the original format or whichever format desired.
For all particulars of reading from and writing to a .wav format file you may use some of the existing audio file libraries, or write your own routine. Dealing with audio file format is not a trivial thing, though. Here's a reference on .wav format.
Here are few audio file APIs worth of looking at:
libsndfile
sndlib
Hope this can help.
See any good guide to WAVE for information on the format of samples in the data chunk, such as this one I found: http://www.neurophys.wisc.edu/auditory/riff-format.txt
Relevant excerpts:
In a single-channel WAVE file, samples are stored consecutively. For
stereo WAVE files, channel 0 represents the left channel, and channel
1 represents the right channel. The speaker position mapping for more
than two channels is currently undefined. In multiple-channel WAVE
files, samples are interleaved.
Data Format of the Samples
Each sample is contained in an integer i. The size of i is the
smallest number of bytes required to contain the specified sample
size. The least significant byte is stored first. The bits that
represent the sample amplitude are stored in the most significant bits
of i, and the remaining bits are set to zero.
For example, if the sample size (recorded in nBitsPerSample) is 12
bits, then each sample is stored in a two-byte integer. The least
significant four bits of the first (least significant) byte is set to
zero.
The data format and maximum and minimums values for PCM waveform
samples of various sizes are as follows:
Sample Size Data Format Maximum Value Minimum Value
One to Unsigned 255 (0xFF) 0
eight bits integer
Nine or Signed Largest Most negative more bits
integer i positive value of i
value of i
N.B.: Even if the file has >8 bits of audio resolution, you should read the file as an array of unsigned char and reconstitute the larger samples manually as per the above spec. Don't try to do anything like reading the samples directly over an array of native C ints, as their layout and size is platform-dependent and therefore should not be relied upon in any code.
Note also that the header is not guaranteed to be 44 bytes long: How can I detect whether a WAV file has a 44 or 46-byte header? You need to read the length and process the header based on that, not any assumption.

comparing these two audio files

i have a two audio files one is original file and another i have corrupted it by reversing some bits, how to compare the quality of these two files is there any algorithm or an software where i can compare the quality of the two files.
"any algorithm or an software": Do you want to program or not?
If you want a software to do this for you: stackoverflow cannot help you
If you are willing to program (at least call functions in a library) that's a different story:
There are some libraries which can do this, specifically to convert the audio from compressed to WAVEFORM format in the first place (the library-to-choose depends on which format your audio is in). Or is your audio in waveform format already? you didnt tell. If you have the audio in waveform format (raw audio in e.g. * signed 16bit mono at 22khz) you can easily program this yourself: Since the only damage you did to your audio is bitflips you can iterate throught them and just sum the differences up: you have to take in account the format the waveform is in tho: you cannot compare the bit-level (because each bit has different significance); if you have * signed 16bit audio you have to use in C the type int so that A) the comparison is signed and B) the difference does not overflow.
One physical measurement for the quality of sound is the SNR: http://en.wikipedia.org/wiki/Signal-to-noise_ratio. I don't know of any lib that does that for you, but it is not to hard to do yourself:
calculate the noise: noise[n] = manipulated[n] - original[n], n = sample index
calculate the power of "noise" and of "original": p_noise[n] = noise[n] * noise[n], ...
get the SNR by dividing the values = SNR[n] = p_original[n]/p_noise[n]
You may want to calculate an average(!) SNR... I hope you can figure out how to do that yourself. This should put your on the right track.

how to read pcm samples from a file using fread and fwrite?

I want to read pcm samples from a file using fread and want to determine the signal strength of the samples.How do I go about it?
For reading, how many bytes constitute 1 pcm sample? Can I read more than 1 pcm sample at a time?
This is for WAV and AAC files.
You have to understand that WAV-files (and even more so AAC-files) are not all the same. I will only explain about WAV-files, you'll hopefully understand how it is with AAC-files then. As you pointed out, a WAV-file has PCM-encoded data. However that can be: 8-bit, 16-bit, 32-bit, ...Mono, Stereo, 5.1, ...,8kHz, 16kHz, 44.1kHz, etc. Depending on these values you have to interpret the data (e.g. when reading it with the fread()-function) differently. Therefore WAV-files have a header. You have to read that header first, in the standard way (I do not know the details). Then you know how to read the actual data. Since it is not that easy, I suggest you use on of the libraries out there, that read WAV-files for you, e.g. http://www.mega-nerd.com/libsndfile/ . Of course you can also google or use SO to find others. Or you do it the hard way and find out how WAV-file headers look like and decode that data first, then move on to the actual PCM-encoded data.
I have no experience tackling with WAV file, but once read data from mp3 file. As to the mp3 file, each 576 pcm samples are encoded into a frame. All the frames are stored directly into a file alone with some side information. When processing encoded data, I read binary data from the mp3 file and stored in a buffer, decoding buffered data and extract what is meaningful to me.
I think processing wav file(which stores pcm samples per my understand) is not quite different. You can read the binary data from file directly and perform some transformation according to wav encoding specification.
The file itself does not know what kind of data even what format of the data is in it. You can take everything in a file as bytes(even plain text), read byte from file interpreting the binary data yourself.

WAV file data recovery

I have a situation where there is a corrupt WAV file from which I'm trying to recover data.
My colleagues have sliced up the large WAV file into smaller WAV files with proper headers. This has produced some interesting results.
Sliced into 1MB segments we get these results:
The first wave file segment is all noise.
The second wave file segment is distorted.
The third wave file segment is clear.
This pattern is repeated for the entire length of the file (after it's been broken into smaller files).
For 20MB slices:
The first wave file segment is all noise.
The second wave file segment is clear.
The third wave file segment is distorted.
Again, this pattern is repeated for the entire length of the file (after it's been broken into smaller files).
Would anyone know why this is occurring?
Assuming the WAV contains uncompressed (raw) samples, recovery should be easy. You need to know the sample format. For example: 16 bits, two channels, 44100 Hz (which is cd quality). Because one of the segments is okay, then you can look at this to figure out what the right values are.
Then just open the WAV using these values in, e.g., Adobe Audition (formerly Cool Edit), or any other wave editor that supports import of raw data.
Edit: Okay, now to answer your question. Some segments are clear, because then the alignment is right. Take the cd quality again, as I described before. The bytes of one sample look like this:
left_channel_high | left_channel_low | right_channel_high | right_channel_low
(I'm not sure about the ordering here! But it's just an example.) So the first data byte had better be the most significant byte of the left channel, or else you'll end up with fragments of two samples being interpreted as one whole sample:
left_channel_low | right_channel_high | right_channel_low || left_channel_high
-------------------part of first sample------------------ || --second sample--
You can see that everything "shifted" here, which happens because the size of your file slices is not a multiple of the sample size in bytes.
If you're lucky, this just causes the channels to be swapped. If you're unlucky, high and low bytes get swapped. Interestingly, this does lead to kind-of recognizable, but severely distorted audio.
What puzzles me is that the pattern you report repeats in blocks of three. From the above, I'd expect either two or four. Perhaps you are using an unusual sample format, such as 24-bits (3 bytes)?

libsox to convert wav file sample rate etc.

EDIT: i want to use libsox to programatically convert a wav file's sample rate, audio format, channels, and etc.
in the libsox man page, there are a bunch of functions I can use but I'm clueless as hell on what to do. Can anyone give me a sort of steps on how to do it?
Help?
Can anyone please explain this?
The function sox_write writes len samples from buf using the format
handler specified by ft. Data in buf must be 32-bit signed samples and
will be converted during the write process. The value of len is speci-
fied in total samples. If its value is not evenly divisable by the num-
ber of channels, undefined behavior will occur.
I'd recommend a combination of libsndfile and libsamplerate
http://www.mega-nerd.com/SRC
SRC provides a small set of converters
to allow quality to be traded off
against computation cost. The current
best converter provides a
signal-to-noise ratio of 145dB with
-3dB passband extending from DC to 96% of the theoretical best bandwidth for
a given pair of input and output
sample rates.
http://www.mega-nerd.com/libsndfile/
Ability to read and write a large number of file formats.
A simple, elegant and easy to use Applications Programming
Interface.
Usable on Unix, Win32, MacOS and others.
On the fly format conversion, including endian-ness swapping, type
conversion and bitwidth scaling.
Optional normalisation when reading floating point data from files
containing integer data.
Ability to open files in read/write mode.
The ability to write the file header without closing the file (only
on files open for write or
read/write).
Ability to query the library about all supported formats and
retrieve text strings describing each
format.
Well, I guess your question has something to do with the last sentence. If you have an interleaved buffer, the number of samples in the buffer has to be divisable by the number of channels, because this is the number of per-channel samples you will write. For example, let's say you have L and R channels; your data will be like this on the buffer:
[0] 1st sample - L
[1] 1st sample - R
[2] 2nd sample - L
[3] 2nd sample - R
...
[n-1] n/2-th sample - L
[n] n-th sample - R
Hope it helps.

Resources