I have two audio files: one is the original, and the other I corrupted by flipping some bits. How can I compare the quality of these two files? Is there any algorithm or software I can use to compare them?
"any algorithm or software": Do you want to program or not?
If you want a piece of software to do this for you: Stack Overflow cannot help you.
If you are willing to program (at least call functions in a library), that's a different story:
There are libraries that can help, in particular for converting the audio from a compressed format to waveform (PCM) data in the first place; which library to choose depends on the format your audio is in. Or is your audio in waveform format already? You didn't say.
If you have the audio in waveform format (raw audio, e.g. signed 16-bit mono at 22 kHz), you can easily program this yourself. Since the only damage you did to your audio is bit flips, both files have the same number of samples, so you can iterate through them and sum up the differences. You have to take the sample format into account, though: you cannot just compare at the bit level, because each bit has a different significance. If you have signed 16-bit audio, do the arithmetic in C in a type wider than 16 bits (such as int on common platforms) so that A) the comparison is signed and B) the difference does not overflow.
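The sample-by-sample comparison described above could look roughly like this. It is only a sketch: it assumes both files are already loaded into memory as signed 16-bit samples with the same count, and the function name sum_abs_diff is made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum of absolute differences between two signed 16-bit sample buffers.
 * Samples are widened to 32 bits before subtracting, so the difference
 * (at most 65535 in magnitude) cannot overflow. */
uint64_t sum_abs_diff(const int16_t *original, const int16_t *corrupted,
                      size_t count)
{
    uint64_t total = 0;
    for (size_t i = 0; i < count; i++) {
        int32_t diff = (int32_t)corrupted[i] - (int32_t)original[i];
        total += (uint64_t)(diff < 0 ? -diff : diff);
    }
    return total;
}
```

A result of 0 means the buffers are identical; a flipped high-order bit contributes far more to the sum than a flipped low-order bit, which is the point of comparing sample values rather than raw bits.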
One physical measurement for the quality of sound is the SNR: http://en.wikipedia.org/wiki/Signal-to-noise_ratio. I don't know of any lib that does that for you, but it is not too hard to do yourself:
calculate the noise: noise[n] = manipulated[n] - original[n], n = sample index
calculate the power of "noise" and of "original": p_noise[n] = noise[n] * noise[n], ...
get the SNR by dividing the values: SNR[n] = p_original[n] / p_noise[n]
You may want to calculate an average(!) SNR, since the per-sample value is undefined wherever the noise is zero. I hope you can figure out how to do that yourself. This should put you on the right track.
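One way to get an average SNR from the steps above is to sum the powers over all samples first and divide once at the end. The following is a sketch under that assumption, for signed 16-bit samples; the name snr_ratio is made up for illustration.

```c
#include <math.h>    /* only for the INFINITY macro */
#include <stddef.h>
#include <stdint.h>

/* Average SNR as a plain power ratio: total signal power divided by
 * total noise power, where noise[n] = manipulated[n] - original[n].
 * Returns INFINITY when the two buffers are identical (no noise). */
double snr_ratio(const int16_t *original, const int16_t *manipulated,
                 size_t count)
{
    double p_original = 0.0;
    double p_noise = 0.0;
    for (size_t n = 0; n < count; n++) {
        double signal = (double)original[n];
        double noise = (double)manipulated[n] - signal;
        p_original += signal * signal;
        p_noise += noise * noise;
    }
    if (p_noise == 0.0)
        return INFINITY;
    return p_original / p_noise;
}
```

To express the result in decibels, take 10 * log10(ratio).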
Related
Currently, I am trying to read the volume / gain of a raw 16-bit PCM audio stream. Basically, I'd like to be able to pipe a stream into an executable and print a number in a range from 0 (no sound) to 1 (at volume limit / 0 dB from clipping) followed by a new line. This should continue until it is manually cancelled or the audio source exits. It does not need to be very accurate, nor do the values need to be smoothed heavily (like an RMS value, although a bit of smoothing would probably be better than none at all). The problem is that I understand very little of how to work with raw PCM, and when I found the math, I realized that I currently do not possess the skills required to do something like this. So my question is: could someone who's more familiar with the matter provide me with working example code that does what is described above, ideally with an explanation of what exactly is done?
The language isn't too important, although something fast like Rust, C, C++, etc. would be preferred.
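A possible starting point for the behaviour described above is a simple peak meter: take the largest absolute sample value in each block and divide by 32768. This is only a sketch. It assumes signed 16-bit samples whose byte order matches the machine (e.g. little-endian PCM on x86), the block size of 1024 is arbitrary, and peak_level and meter_stream are made-up names.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Peak level of a block of signed 16-bit samples, scaled to 0.0 .. 1.0.
 * 1.0 is reached only by the most negative sample value, -32768. */
double peak_level(const int16_t *samples, size_t count)
{
    int32_t peak = 0;
    for (size_t i = 0; i < count; i++) {
        int32_t v = samples[i];
        if (v < 0)
            v = -v;
        if (v > peak)
            peak = v;
    }
    return (double)peak / 32768.0;
}

/* Reads blocks of samples from `in` until the stream ends and prints one
 * level per block, each followed by a newline. */
void meter_stream(FILE *in, FILE *out)
{
    int16_t block[1024];
    size_t got;
    while ((got = fread(block, sizeof block[0], 1024, in)) > 0) {
        fprintf(out, "%.3f\n", peak_level(block, got));
        fflush(out);
    }
}
```

Calling meter_stream(stdin, stdout) from main gives the piped behaviour; it returns when the source closes the stream. For stereo input the peak is taken across both channels.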
I'm writing a program in C which should visualize audio. As the audio source I use a microphone and the Alsa C Sound library. I take the sound frames from the Alsa library, make some transformations (Fourier analysis and similar), and then visualise them. I almost have a working program, with one exception: it seems I'm converting the Alsa frames to doubles in the wrong way.
This is how I do it:
unsigned char x=getFirstByte();
signed char y=getSecondByte();
double analog_signal=(y*256+x)/32768.;
Now this code works, but sometimes (relatively often) I get spikes where the value of analog_signal is about 0.99... where it shouldn't be.
So I started printing the values of x and y when such spikes occur.
The output was quite clear: always when such a spike occurred y was equal to 127 and x was some value around 230.
My conversion is still correct as far as I understand, but it seems that Alsa treats its values in a different way, so that this special value of 127 in the second byte has to be converted differently, for whatever reason?!
I don't want to believe that my microphone is broken, so could someone who has worked with the Alsa library kindly give me some advice on my problem?
I would also be happy with a function of the Alsa-library which does this conversion for me as I haven't found one but maybe overlooked it.
I've been doing some reading on file formats and I'm very interested in them. I'm wondering what the process is to create a format. For example, a .jpeg, or .gif, or an audio format. What programming language would you use (if you use a programming language at all)?
The site warned me that this question might be closed, but that's just a risk I'll take in the pursuit of knowledge. :)
what the process is to create a format. For example, a .jpeg, or .gif, or an audio format.
Step 1. Decide what data is going to be in the file.
Step 2. Design how to represent that data in the file.
Step 3. Write it down so other people can understand it.
That's it. A file format is just an idea. Properly, it's an "agreement". Nothing more.
Everyone agrees to put the given information in the given format.
What programming language would you use (if you use a programming language at all)?
All programming languages that can do I/O can have file formats. Some have limitations on which file formats they can handle. Some languages don't handle low-level bytes as well as others.
But a "format" is not an "implementation".
The format is a concept. The implementation is -- well -- an implementation.
You do not need a programming language to write the specification for a file format, although a word processor might prove to be a handy tool.
Basically, you need to decide how the information of the file is to be stored as a sequence of bits. This might be trivial, or it might be exceedingly difficult. As a trivial example, a very primitive bitmap image format could start with one unsigned 32-bit integer representing the width of the bitmap, and then one more such integer representing the height of the bitmap. Then you could decide to simply write out the colour of the pixels sequentially, left-to-right and top-to-bottom (row 1 of pixels, row 2 of pixels, ...), using 24 bits per pixel, in the form 8 bits for red + 8 bits for green + 8 bits for blue. For instance, an 8×8 bitmap consisting of alternating blue and red pixels would be stored as
00000008000000080000FFFF00000000FFFF0000...
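A writer for this primitive format could be sketched as follows. The byte order of the two integers is not stated in the description; the hex dump suggests most-significant byte first, so that is what this sketch assumes. The names put_u32_be and encode_bitmap are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit value most-significant byte first (assumed byte order). */
static void put_u32_be(uint8_t *out, uint32_t value)
{
    out[0] = (uint8_t)(value >> 24);
    out[1] = (uint8_t)(value >> 16);
    out[2] = (uint8_t)(value >> 8);
    out[3] = (uint8_t)value;
}

/* Encode an image into `out`, which must hold 8 + 3 * width * height bytes.
 * `rgb` holds the pixels row by row, 3 bytes each: red, green, blue.
 * Returns the number of bytes written. */
size_t encode_bitmap(uint8_t *out, uint32_t width, uint32_t height,
                     const uint8_t *rgb)
{
    size_t pixel_bytes = (size_t)width * height * 3;
    put_u32_be(out, width);
    put_u32_be(out + 4, height);
    for (size_t i = 0; i < pixel_bytes; i++)
        out[8 + i] = rgb[i];
    return 8 + pixel_bytes;
}
```

A reader does the same in reverse: read the two integers, then expect exactly width × height × 3 bytes of pixel data.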
In a less trivial example, it really depends on the data you wish to save. Typically you would define a lot of records/structures, such as BITMAPINFOHEADER, and specify in what order they should come and how they should be nested, and you might need to write a lot of indices and look-up tables. I have written quite a few file formats myself, most recently the ASD (AlgoSim Data) file format used to save AlgoSim structures. Such files consist of a number of records (possibly nested), look-up tables, magic words (indicating structure begin, structure end, etc.) and strings in a custom-defined format. One typical thing that often simplifies the file format is that the records contain data about their own size, and the sizes of the custom data parts following the record (in case the record is some sort of header preceding data in a custom format, e.g. pixel colours or sound samples).
If you haven't worked with file formats before, I would suggest that you learn a very simple format, such as the Windows 3 Bitmap format, and write your own BMP encoder/decoder, i.e. programs that create and read BMP files (from scratch) and display the BMP files they read. Then you will know the basic ideas.
Fundamentally, files only exist to store information that needs to be loaded back in the future, either by the same program or a different one. A really good file format is designed so that:
Any programming language can be used to read or write it.
The information a program would most likely need from the file can be accessed quickly and efficiently.
The format can be extended and expanded in the future, without breaking backwards compatibility.
The format accommodates any special requirements (e.g. error resiliency, compression, encoding, etc.) present in the domain in which the file will be used.
You will almost certainly be interested in looking into Protocol Buffers and Thrift. These tools provide a modern, principled way of designing forward- and backward-compatible file formats.
I have an interesting question today.
I need to convert some pokemon audio files to a list of 8-bit samples (0-255 values). I am writing an assembly routine on the MC6800 chipset that will require these sounds to be played. I plan on including an array with the 8-bit samples that the program will loop through when a function is called.
Does anyone know a way to convert audio files (wav/mp3) into a list of comma-separated 8-bit sample values as text? Or anything along those lines?
Thank you so much in advance!
You can use the command-line "sox" tool or the Audacity audio editor to convert the file to a raw, unsigned 8-bit mono audio file.
In Audacity 1.3 or higher, open the audio then select Export, choose "Wave, AIFF, and other uncompressed types" as the format, then click Options... - then choose "Other..." for the Format, "RAW" for the Header, and Signed 8-bit PCM as the encoding. (Sorry, unsigned isn't available.)
From the command line, try sox with -c 1 for 1 channel, -t raw for no header, -u for unsigned linear, and -1 for 1 byte per sample.
Then you can use a tool like "hexdump" to dump out the bytes of the file as numbers and paste them into your code.
If sox doesn't have it, you will have to use it to generate raw (headerless) files and convert the raw files to comma-separated yourself.
EDIT: sox has "Raw textual data" as one of its formats, from the web page. You can make it convert your sound files to unsigned 8-bit linear samples in a first pass and then probably get exactly the output you want using this option for output.
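If you end up converting a raw file to comma-separated text yourself, the formatting step is small. This is a sketch that assumes the unsigned 8-bit samples are already in memory; samples_to_csv is a made-up name.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Format unsigned 8-bit samples as "0,128,255" into `out`.
 * Returns the number of characters written (excluding the terminating
 * NUL), or 0 if `out_size` is too small to hold the whole list. */
size_t samples_to_csv(const uint8_t *samples, size_t count,
                      char *out, size_t out_size)
{
    size_t used = 0;
    if (out_size == 0)
        return 0;
    out[0] = '\0';
    for (size_t i = 0; i < count; i++) {
        int n = snprintf(out + used, out_size - used,
                         i ? ",%u" : "%u", (unsigned)samples[i]);
        if (n < 0 || (size_t)n >= out_size - used)
            return 0;   /* output would have been truncated */
        used += (size_t)n;
    }
    return used;
}
```

Each sample needs at most 4 characters (three digits plus a comma), so a buffer of 4 * count + 1 bytes is always enough.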
For .wav it is a very simple process. You can find the .wav specification easily with a google search. It comprises a header then simply raw samples. You should read the header first, then loop through all the samples. Usually they are 16 bit samples, so you want to normalize them from the range -32768 to 32767 to your 0-255 range. I suggest simple scaling at first. If that's not successful maybe find the actual min and max amongst the samples and adjust your scale accordingly.
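Both scaling variants mentioned above can be sketched in a few lines, assuming the 16-bit samples have already been read out of the file. The function names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Simple scaling: shift the signed range -32768..32767 up to 0..65535,
 * then keep the top 8 bits. */
uint8_t s16_to_u8(int16_t sample)
{
    return (uint8_t)(((int32_t)sample + 32768) >> 8);
}

/* Min/max variant: stretch the range actually used by the samples so
 * that the smallest maps to 0 and the largest to 255. */
void s16_to_u8_stretched(const int16_t *in, uint8_t *out, size_t count)
{
    if (count == 0)
        return;
    int32_t lo = in[0], hi = in[0];
    for (size_t i = 1; i < count; i++) {
        if (in[i] < lo) lo = in[i];
        if (in[i] > hi) hi = in[i];
    }
    for (size_t i = 0; i < count; i++) {
        if (hi == lo)
            out[i] = 128;   /* constant input: use mid-scale */
        else
            out[i] = (uint8_t)(((int32_t)in[i] - lo) * 255 / (hi - lo));
    }
}
```

With simple scaling, silence (0) maps to 128. The stretched variant uses the full 0-255 range for quiet recordings, but it only keeps silence at mid-scale if the input happens to be symmetric around zero.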
Well a lot depends on your audio format. The wave format, for example, consists of uncompressed interleaved PCM data.
i.e. for an 8-bit stereo file the samples will be arranged as follows:
[Left Sample 1][Right Sample 1][Left Sample 2][Right Sample 2]...[Left Sample n][Right Sample n]
i.e. each 8-bit stereo sample is stored in 2 bytes: 1 for the left channel and 1 for the right. This is the data format your sound hardware will most likely require.
A 16- or 24-bit audio file works in the same way, but the left and right samples will be 2 or 3 bytes each, respectively.
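The interleaved layout above boils down to a bit of index arithmetic. As a sketch (names made up for illustration; `frame` counts stereo samples from 0, and channel 0 is left, 1 is right):

```c
#include <stddef.h>
#include <stdint.h>

/* Byte offset of one channel's sample within interleaved PCM data. */
size_t sample_offset(size_t frame, size_t channel, size_t channels,
                     size_t bytes_per_sample)
{
    return (frame * channels + channel) * bytes_per_sample;
}

/* Split interleaved 8-bit stereo data into separate left/right buffers,
 * each of which must hold `frames` bytes. */
void split_stereo_u8(const uint8_t *interleaved, size_t frames,
                     uint8_t *left, uint8_t *right)
{
    for (size_t i = 0; i < frames; i++) {
        left[i] = interleaved[2 * i];
        right[i] = interleaved[2 * i + 1];
    }
}
```

For example, in a 24-bit stereo file the right-channel sample of the third frame (frame index 2) starts at byte (2 * 2 + 1) * 3 = 15 of the sample data.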
Obviously a wave file has a load of extra information in it. It follows the RIFF format. You can find info on it and the "chunks" wave files use at places such as www.wotsit.org.
Decompressing an MP3 is more complicated. You are best off getting hold of a decompressor and running it on the MP3-encoded audio. It will spit out PCM data as above from the other side.
EDIT: I want to use libsox to programmatically convert a wav file's sample rate, audio format, channels, etc.
In the libsox man page there are a bunch of functions I can use, but I'm clueless as hell about what to do. Can anyone give me some steps on how to do it?
Help?
Can anyone please explain this?
The function sox_write writes len samples from buf using the format handler specified by ft. Data in buf must be 32-bit signed samples and will be converted during the write process. The value of len is specified in total samples. If its value is not evenly divisable by the number of channels, undefined behavior will occur.
I'd recommend a combination of libsndfile and libsamplerate
http://www.mega-nerd.com/SRC
SRC provides a small set of converters to allow quality to be traded off against computation cost. The current best converter provides a signal-to-noise ratio of 145dB with -3dB passband extending from DC to 96% of the theoretical best bandwidth for a given pair of input and output sample rates.
http://www.mega-nerd.com/libsndfile/
Ability to read and write a large number of file formats.
A simple, elegant and easy to use Applications Programming Interface.
Usable on Unix, Win32, MacOS and others.
On the fly format conversion, including endian-ness swapping, type conversion and bitwidth scaling.
Optional normalisation when reading floating point data from files containing integer data.
Ability to open files in read/write mode.
The ability to write the file header without closing the file (only on files open for write or read/write).
Ability to query the library about all supported formats and retrieve text strings describing each format.
Well, I guess your question has something to do with the last sentence. If you have an interleaved buffer, the number of samples in the buffer has to be divisible by the number of channels, because the buffer is written as whole frames, each holding one sample per channel. For example, let's say you have L and R channels and a buffer of n samples; your data will be laid out like this in the buffer:
[0] 1st sample - L
[1] 1st sample - R
[2] 2nd sample - L
[3] 2nd sample - R
...
[n-2] n/2-th sample - L
[n-1] n/2-th sample - R
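Building such a buffer from separate channel arrays could be sketched like this. No libsox calls are made here; the 32-bit signed sample type comes from the man page text quoted in the question, and the function names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* True if `len` total samples form a whole number of frames. */
int is_whole_frames(size_t len, size_t channels)
{
    return channels != 0 && len % channels == 0;
}

/* Interleave separate left/right arrays of `frames` samples each into
 * `buf`, which must hold 2 * frames samples. Returns the total number
 * of samples in `buf`, i.e. the value to pass as the length. */
size_t interleave_stereo(const int32_t *left, const int32_t *right,
                         size_t frames, int32_t *buf)
{
    for (size_t i = 0; i < frames; i++) {
        buf[2 * i] = left[i];
        buf[2 * i + 1] = right[i];
    }
    return frames * 2;
}
```

Because the length is computed as frames * 2, it is always divisible by the channel count, which avoids the undefined behaviour the man page warns about.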
Hope it helps.