I'm following along with The Audio Programming Book by Richard Boulanger. In Chapter two we create several basic audio rate oscillators using C and the standard libraries to generate simple WAV files.
Both my implementation and the book's code exhibit a strange issue:
I am able to generate a simple sine wave using sin() (from math.h), but noticed the playback has a bit of static. Upon investigation I noticed that occasionally, some of the audio frames at peak amplitude are getting "flipped" to a negative value.
To debug this situation, I'm outputting the values of the audio frames generated to stdout and this flipping behavior lines up with the peak value of 0.999999.
When I scale the output by 0.99 this problem disappears. What's going on?
Looking into portsf.c (the library code used by siggen.c), I found that the real problem has to do with the conversion of the maximum value to the short datatype:
portsf.c:78
#define MAX_16BIT (32768.0)
portsf.c:1618
ssamp = (short) psf_round(fsamp * MAX_16BIT);
When we have a float sample near 1.0, this results in the conversion (short)(32768). Since a short can only hold values up to 32767, the value wraps around to the smallest possible short, -32768.
To fix this, I suggest modifying line 1618 to instead read:
ssamp = (short) psf_round(fsamp * 32767.0);
This reduces the peak-to-peak value range by 1, but in my opinion that's well worth it to avoid the wrap-around.
Related
I am quite new to signal processing, so forgive me if I ramble on a bit. I have downloaded and installed FFTW for Windows. The documentation is OK but I still have queries.
My overall aim is to capture raw audio data sampled at 44100 samps/sec from the sound card on the computer (this task is already implemented using libraries and my code), and then perform the DFT on blocks of this audio data.
I am only interested in finding a range of frequency components in the audio and I will not be performing any inverse DFT. In this case, is a real to real transformation all that is necessary, hence the fftw_plan_r2r_1d() function?
My blocks of data to be transformed are 11025 samples long. My function is called as shown below. This will result in a spectrum array of 11025 bins. How do I know the maximum frequency component in the result?
I believe that the bin spacing is Fs/N, i.e. 44100/11025 = 4 Hz. Does this mean that I will have a frequency spectrum in the array from 0 Hz all the way up to 44100 Hz in steps of 4 Hz, or only up to the Nyquist frequency, 22050 Hz?
This would be a problem for me as I only wish to search for frequencies from 60Hz up to 3000Hz. Is there some way to limit the transform range?
I don't see any arguments for the function, or maybe there is another way?
Many thanks in advance for any help with this.
p = fftw_plan_r2r_1d(11025, audioData, spectrum, FFTW_REDFT00, FFTW_ESTIMATE);
To answer some of your individual questions from the above:
you need a real-to-complex transform, not real-to-real
you will calculate the magnitude of the complex output bins at the frequencies of interest (magnitude = sqrt(re*re + im*im))
the frequency resolution is indeed Fs / N = 44100 / 11025 = 4 Hz, i.e. the width of each output bin is 4 Hz
for a real-to-complex transform you get N/2 + 1 output bins giving you frequencies from 0 to Fs / 2
you just ignore frequencies in which you are not interested - the FFT is very efficient so you can afford to "waste" unwanted output bins (unless you are only interested in a relatively small number of output frequencies)
Additional notes:
plan creation does not actually perform an FFT - typically you create a plan once and then use it many times (by calling fftw_execute)
for performance you probably want to use the single precision calls (e.g. fftwf_execute rather than fftw_execute, and similarly for plan creation etc)
Some useful related questions/answers on StackOverflow:
How do I obtain the frequencies of each value in an FFT?
How to get frequency from fft result?
How to generate the audio spectrum using fft in C++?
There are many more similar questions and answers which you might also want to read - search for the fft and fftw tags.
Also note that dsp.stackexchange.com is the preferred site for questions on DSP theory, as opposed to specific programming problems.
I've got two 8-bit chars. They're the product of some 16-bit signed float being broken up into MSB and LSB inside a gyroscope.
The standard method I know of combining two bytes is this:
(signed float) = (((MSB value) << 8) | (LSB value));
Just returns garbage.
How can I do this?
Okay, so, dear me from ~4 years ago:
First of all, the gyroscope you're working with is a MAX21000. The datasheet, as far as future you can see, doesn't actually describe the endianness of the I2C connection, which probably also tripped you up. However, the SPI connection does state that the data is transmitted MSB-first, with the top 8-bits of the axis data in the first byte, and the additional 8 in the next.
To your credit, the datasheet doesn't really go into what type those 16 bits represent - however, that's because it's standardized across manufacturers.
The real reason why you got such meaningless values when converting to float is that the gyro isn't sending a float. Why'd you even think it would?
The gyro sends a plain ol' int16 (short). A simple search for "i2c gyro interface" would have made that clear. How do you get that into a decimal angular rate? You divide by 32,768 (2^15, the magnitude of the int16 range), then multiply by the full-scale range set on the gyro.
Simple! Here, want a code example?
float X_angular_rate = (((int16_t)(((uint16_t)byte_1 << 8) | byte_2)) / 32768.0f) * GYRO_SCALE;
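A slightly fuller sketch of the same conversion (GYRO_SCALE here is an assumed ±250 deg/s configuration; substitute whatever full-scale range your gyro is actually set to):

```c
#include <stdint.h>

/* Assumed configuration: full-scale range of +/-250 deg/s. */
#define GYRO_SCALE 250.0f

/* Combine MSB-first bytes into a signed 16-bit reading and scale
 * it to an angular rate in deg/s. */
static float gyro_rate(uint8_t msb, uint8_t lsb)
{
    int16_t raw = (int16_t)(((uint16_t)msb << 8) | lsb);  /* MSB-first */
    return ((float)raw / 32768.0f) * GYRO_SCALE;          /* scale to deg/s */
}
```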
However, I think that it's important to note that the data from these gyroscopes alone is not, in itself, as useful as you thought; to my current knowledge, due to their poor zero-rate drift characteristics, MEMS gyros are almost always used in a sensor fusion setup with an accelerometer and a Kalman filter to make a proper IMU.
Any position and attitude derived from dead-reckoning without this added complexity is going to be hopelessly inaccurate after mere minutes, which is why you added an accelerometer to the next revision of the board.
You have shown two bytes, and float is 4 bytes on most systems. What did you do with the other two bytes of the original float you deconstructed? You should preserve and re-construct all four original bytes if possible. If you can't, and you have to omit any bytes, set them to zero, and make them the least significant bits in the fractional part of the float and hopefully you'll get an answer with satisfactory precision.
The IEEE-754 single-precision layout (1 sign bit, 8 exponent bits, 23 fraction bits) gives the bit positions, so acting in accordance with the endianness of your system, you should be able to construct a valid float based on how you deconstructed the original. It can really help to write a function to display values as binary numbers, and to line up and display initial, intermediate and end results to ensure that you're really accomplishing what you think (hope) you are.
To get a valid result you have to put something sensible into those bits.
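As an illustration, here is one way to reassemble a float from four raw bytes, assuming IEEE-754 single precision and that the bytes arrived least-significant first (swap the order if yours are big-endian):

```c
#include <stdint.h>
#include <string.h>

/* Reassemble a 32-bit float from its four raw bytes, b0 being the
 * least significant. memcpy avoids any strict-aliasing trouble. */
static float bytes_to_float(uint8_t b0, uint8_t b1, uint8_t b2, uint8_t b3)
{
    uint32_t bits = (uint32_t)b0
                  | ((uint32_t)b1 << 8)
                  | ((uint32_t)b2 << 16)
                  | ((uint32_t)b3 << 24);
    float f;
    memcpy(&f, &bits, sizeof f);   /* reinterpret the bit pattern as a float */
    return f;
}
```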
This question already has an answer here:
How to generate the audio spectrum using fft in C++? [closed]
(1 answer)
Closed 9 years ago.
How would I go about implementing a spectrum analyser like the ones in WinAmp below?
Just by looking at it, I think that these bars are rendered to display the 'volume level' of a specific frequency band of the incoming audio data; however, I'm not sure how to actually calculate this data needed for the rather easy task of drawing the bars.
From what I've been told and understand, calculating these values can be done by using an FFT; however, I'm not exactly sure how to calculate those, given a buffer of input data. Am I on the right track about FFTs? How would I apply an FFT to the input data and get, say, an integer out of the FFT that represents the 'volume' of a specific frequency band?
The drawing part isn't a problem, since I can just draw directly to my framebuffer and render that out. I'm doing this as a project on an FPGA, using a Nios II soft-CPU, in case anyone's wondering about potential hardware limitations. Audio data comes in as 24-bit data at 96kHz.
You're probably looking for FFTW.
edit:
To elaborate on your question:
calculating these values can be done by using an FFT; however, I'm not exactly sure how to calculate those, given a buffer of input data: yes, you're right; that's exactly how it's done. You take a (necessarily small, due to the time-frequency uncertainty principle) sample segment out of the currently playing audio data, and feed it to a (typically) discrete, real-only FFT (one of the best known, most widely used and fastest being the DCT family of DFTs; in fact there are highly optimized versions of most DCTs in FFTW). Then you take out the next sample segment and repeat the process.
The output of the FFT will be the frequency decomposition of the audio signal that has been fed in; you then need to decide how to display it (i.e. which function to use on the outputs of the FFT, common candidates being f(x) = x, f(x) = sqrt(x) and f(x) = log(x)) and also how to present/animate successive readings (e.g. you could average each band in the temporal direction, or you could have the maximums "fall off" slowly).
rage-edit:
Additional links since it appears somebody knows how to downvote but not how to use google:
http://en.wikipedia.org/wiki/FFTW
http://webcache.googleusercontent.com/search?q=cache:m6ou54tn_soJ:www.fftw.org/+&cd=1&hl=en&ct=clnk&gl=it&client=firefox-a
http://web.archive.org/web/20130123131356/http://fftw.org/
It's pretty straightforward: just use one of the many FFT algorithms! Most of them require floating-point calculations, but a Google search brings up methods that use just integers. You are spot on though; FFTs are what you want.
http://en.wikipedia.org/wiki/Fast_Fourier_transform
To understand how to apply an FFT, you should have a read of this page on discrete Fourier transforms, although it's quite heavy on maths:
http://en.wikipedia.org/wiki/Discrete_Fourier_transform
To implement it on your FPGA, I'd have a look at the source code of this project:
http://qt-project.org/doc/qt-4.8/demos-spectrum.html
Here's a previous SO question that gives a summary of how it works (in any language).
How to generate the audio spectrum using fft in C++?
There's an immense amount of information on creating what by the way is called a "Spectrum Analyser", there must be dozens of complete implementations where the source code is freely available. Just page through a google search for "Spectrum analyser source code C" for example.
I'm looking for a way to extract data from a WAV file that will be useful for an FFT algorithm I'm trying to implement. So far what I have are a bunch of hex values for left and right audio channels, but I am a little lost on how to translate this over to time and frequency domains for an FFT.
Here's what I need for example:
3.6 2.6
2.9 6.3
5.6 4.0
4.8 9.1
3.3 0.4
5.9 4.8
5.0 2.6
4.3 4.1
And this is the prototype of the function taking in the data for the FFT:
void fft(int N, double (*x)[2], double (*y)[2])
Where N is the number of points for the FFT, x is a pointer to the time-domain samples, y is a pointer to the frequency-domain samples.
Thanks!
For testing purposes you don't need to extract waveform data from WAV files. You can just generate a few signals in memory (e.g. 0, non-zero constant, sinusoid, 2 superimposed sinusoids, white noise) and then test your FFT function on them and see whether or not you're getting what you should (0 for 0, peak at zero frequency for non-zero constant signal, 2 peaks for every sinusoid, uniform non-zero magnitude across all frequencies for white noise).
If you really want to parse WAV files, see Wikipedia on the format (follow the links). Use either raw PCM encoding or A/µ-law PCM encoding (AKA G.711).
FFT is usually implemented using an in-place algorithm, meaning that the output replaces the input. If you do the same, you don't really need the second pointer.
The most commonly found WAVE/RIFF file format has a 44 byte header followed by 16-bit or 2-byte little-endian signed integer samples, interleaved for stereo. So if you know how to skip bytes, and read short ints into doubles, you should be good to go.
Just feed your desired length of time domain data to your FFT as the real component vector; the result of the FFT will be a complex frequency domain vector.
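Putting the two previous paragraphs together, a minimal reader for that canonical layout might look like the following. It assumes the 44-byte header and 16-bit little-endian stereo PCM described above; a robust reader would walk the RIFF chunks instead of skipping a fixed header:

```c
#include <stdio.h>
#include <stdint.h>

/* Read up to max_frames stereo frames from a canonical-header WAV file
 * into left/right double arrays scaled to [-1, 1). Assumes 16-bit
 * little-endian PCM after a fixed 44-byte header. Returns the number
 * of frames read, or -1 on open failure. */
static long read_wav16_stereo(const char *path, double *left, double *right,
                              long max_frames)
{
    FILE *f = fopen(path, "rb");
    if (!f) return -1;
    fseek(f, 44, SEEK_SET);                  /* skip the canonical header */
    long n = 0;
    uint8_t raw[4];
    while (n < max_frames && fread(raw, 1, 4, f) == 4) {
        int16_t l = (int16_t)(raw[0] | (raw[1] << 8));  /* little-endian */
        int16_t r = (int16_t)(raw[2] | (raw[3] << 8));
        left[n]  = l / 32768.0;
        right[n] = r / 32768.0;
        n++;
    }
    fclose(f);
    return n;
}
```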
Let us say that I have a WAV file. In this file, is a series of sine tones at precise 1 second intervals. I want to use the FFTW library to extract these tones in sequence. Is this particularly hard to do? How would I go about this?
Also, what is the best way to write tones of this kind into a WAV file? I assume I would only need a simple audio library for the output.
My language of choice is C
To get the power spectrum of a section of your file:
collect N samples, where N is a power of 2 - if your sample rate is 44.1 kHz for example and you want to sample approx every second then go for say N = 32768 samples.
apply a suitable window function to the samples, e.g. Hanning
pass the windowed samples to an FFT routine - ideally you want a real-to-complex FFT, but if all you have is a complex-to-complex FFT then pass 0 for all the imaginary input parts
calculate the squared magnitude of your FFT output bins (re * re + im * im)
(optional) calculate 10 * log10 of each magnitude squared output bin to get a magnitude value in dB
Now that you have your power spectrum you just need to identify the peak(s), which should be pretty straightforward if you have a reasonable S/N ratio. Note that frequency resolution improves with larger N. For the above example of 44.1 kHz sample rate and N = 32768 the frequency resolution of each bin is 44100 / 32768 = 1.35 Hz.
You are basically interested in estimating a spectrum, assuming you've already gone past the stage of reading the WAV file and converting it into a discrete-time signal.
Among the various methods, the most basic is the periodogram, which amounts to taking a windowed Discrete Fourier Transform (with an FFT) and keeping its squared magnitude. This corresponds to Paul's answer. You need a window which spans several periods of the lowest frequency you want to detect. Example: if your sinusoids can be as low as 10 Hz (period = 100 ms), you should take a window of 200 ms or 300 ms or so (or more). However, the periodogram has some disadvantages, though it's simple to compute and it's more than enough if high precision is not required:
"The raw periodogram is not a good spectral estimate because of spectral bias and the fact that the variance at a given frequency does not decrease as the number of samples used in the computation increases."
The periodogram can perform better by averaging several windows, with a judicious choice of window widths (Bartlett's method). And there are many other methods for estimating the spectrum (e.g. AR modelling).
Actually, you are not exactly interested in estimating a full spectrum, but only in the location of a single frequency. This can be done by seeking a peak of an estimated spectrum (computed as explained above), but also by more specific and powerful (and more complicated) methods (Pisarenko, the MUSIC algorithm). They would probably be overkill in your case.
WAV files contain linear pulse code modulated (LPCM) data. That just means that it is a sequence of amplitude values at a fixed sample rate. A RIFF header is contained at the beginning of the file to convey information like sampling rate and bits per sample (e.g. 8 kHz signed 16-bit).
The format is very simple and you could easily roll your own. However, there are several libraries available to speed the process, such as libsndfile. Simple DirectMedia Layer (SDL)/SDL_mixer and PortAudio are two nice libraries for playback.
As for feeding the data into FFTW, you would need to buffer 1 second chunks (determine size by the sample rate and bits per sample). Then convert all of the samples to IEEE floating-point (i.e. float or double depending on the FFTW configuration--libsndfile can do this for you). Next create another array to hold the frequency domain output. Finally, create and execute an FFTW plan by passing both buffers to fftw_plan_dft_r2c_1d and calling fftw_execute with the returned fftw_plan handle.
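As a sketch of the buffering/conversion step (a one-second chunk is simply sample_rate frames; the FFTW calls themselves are only indicated in comments, since plan creation and execution work exactly as named above):

```c
#include <stdint.h>
#include <stdlib.h>

/* Convert one second of 16-bit mono samples to double, producing the
 * real input array for an FFTW real-to-complex plan. Caller frees. */
static double *chunk_to_double(const int16_t *pcm, int sample_rate)
{
    double *out = malloc(sizeof(double) * sample_rate);
    if (!out) return NULL;
    for (int i = 0; i < sample_rate; i++)
        out[i] = pcm[i] / 32768.0;   /* scale to [-1, 1) */
    /* This buffer would then go to fftw_plan_dft_r2c_1d(sample_rate,
     * out, spectrum, FFTW_ESTIMATE) and fftw_execute(plan); the
     * output holds sample_rate/2 + 1 complex bins. */
    return out;
}
```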