How to use KissFFT with audio? - c

I have an array of 2048 samples of an audio file at 44.1 khz and want to transform it into a spectrum for an LED effect. I don't know too much about the inner workings of fft but I tryed it using kiss fft:
kiss_fft_cpx *cpx_in = malloc(FRAMES * sizeof(kiss_fft_cpx));
kiss_fft_cpx *cpx_out = malloc(FRAMES * sizeof(kiss_fft_cpx));
kiss_fft_cfg cfg = kiss_fft_alloc( FRAMES , 0 ,0,0 );
for(int j = 0;j<FRAMES;j++) {
float x = (alsa_buffer[(fft_last_index+j+BUFFER_OVERSIZE*FRAMES)%(BUFFER_OVERSIZE*FRAMES)] - offset);
cpx_in[j] = (kiss_fft_cpx){.r = x, .i = x};
}
kiss_fft(cfg, cpx_in, cpx_out);
My output seems really off. When I play a simple sine, there multiple outputs with values way above zero. Also it generally seems like the first entries are way higher. Do I have to weigh the outputs?
I also don't understand how I have to treat the complex numbers, I'm currently using my input values on the real and imaginary part and for the output I use the abs, is that right?
Also usually spectrum analyzers for audio have logarithmic scaling, so I tried that but the problem is that the fft output as far as I know isn't logarithmic, so the first band for example is say 0-100hz but optimally my first LED on the effect should be only up to like 60hz (so a fraction of the first outputs band), while the last LED would be say 8khz to 10khz which would in that case be 20 fft outputs.
Is there any way to make the output logarithmic? How do I limit the spectrum to 20khz (or know what the bands of the output are in general) and is there any other thing to look out for when working with audio signals?

Related

Inverse FFT with CMSIS is wrong

I am attempting to perform an FFT on a signal and use the resulting data to retrieve the original samples via an IFFT. I am using the CMSIS DSP library on an STM32 with a M3.
My issue is understanding the scaling that occurs with the FFT, and also how to get a correct IFFT. Currently the IFFT results in a similar wave as the input, but points are scaled anywhere between 120x-140x of the original. Is this simply the result of precision errors of q15? Am I too scale the IFFT results by 7 bits? My code is below
The documentation also mentions "For the RIFFT, the source buffer must at least have length fftLenReal + 2. The last two elements must be equal to what would be generated by the RFFT: (pSrc[0] - pSrc[1]) >> 1 and 0". What is this for? Applying these operations to FFT_SIZE2 - 2, and FFT_SIZE2 - 1 respectively did not change the results of the IFFT at all.
//128 point FFT
#define FFT_SIZE 128
arm_rfft_instance_q15 fft_instance;
arm_rfft_instance_q15 ifft_instance;
//time domain signal buffers
float32_t sinetbl_in[FFT_SIZE];
float32_t sinetbl_out[FFT_SIZE];
//a copy for comparison after RFFT since function modifies input buffer
volatile q15_t fft_in_buf_cpy[FFT_SIZE];
q15_t fft_in_buf[FFT_SIZE];
//output for FFT, RFFT provides real and complex data organized as re[0], im[0], re[1], im[1]
q15_t fft_out_buf[FFT_SIZE*2];
q15_t fft_out_buf_mag[FFT_SIZE*2];
//inverse fft buffer result
q15_t ifft_out_buf[FFT_SIZE];
//generate 1kHz sinewave with a sample frequency of 8kHz for 128 samples, amplitude is 1
for(int i = 0; i < FFT_SIZE; ++i){
sinetbl_in[i] = arm_sin_f32(2*3.14*1000 *i/8000);
sinetbl_out[i] = 0;
}
//convert buffer to q15 (not enough flash to use f32 fft functions)
arm_float_to_q15(sinetbl_in, fft_in_buf, FFT_SIZE);
memcpy(fft_in_buf_cpy, fft_in_buf, FFT_SIZE*2);
//perform RFFT
arm_rfft_init_q15(&fft_instance, FFT_SIZE, 0, 1);
arm_rfft_q15(&fft_instance, fft_in_buf, fft_out_buf);
//calculate magnitude, skip 1st real and img numbers as they are DC and both real
arm_cmplx_mag_q15(fft_out_buf + 2, fft_out_buf_mag + 1, FFT_SIZE/2-1);
//weird operations described by documentation, does not change results
//fft_out_buf[FFT_SIZE*2 - 2] = (fft_out_buf[0] - fft_out_buf[1]) >> 1;
//fft_out_buf[FFT_SIZE*2 - 1] = 0;
//perform inverse FFT
arm_rfft_init_q15(&ifft_instance, FFT_SIZE, 1, 1);
arm_rfft_q15(&ifft_instance, fft_out_buf, ifft_out_buf);
//closest approximation to get to original scaling
//arm_shift_q15(ifft_out_buf, 7, ifft_out_buf, FFT_SIZE);
//convert back to float for comparison with input
arm_q15_to_float(ifft_out_buf, sinetbl_out, FFT_SIZE);
I feel like I answered my own question with the precision comment, but I'd like to be sure. Am I doing this FFT stuff right?
Thanks in advance
As Cris pointed out some libraries skip the normalization process. CMSIS DSP is one of those libraries as it is intended to be fast. For CMSIS, depending on the FFT size you must left shift your data a certain amount to get back to the original range. In my case with a FFT size of 128 and also the magnitude calculation, it was 7 as I originally surmised.

Phase angle from FFT using atan2 - weird behaviour. Phase shift offset? Unwrapping?

I'm testing and performing simple FFT's and I'm interested in phase shift.
I geneate simple array of 256 samples with sinusoid with 10 cycles.
I perform an FFT of those samples and receiving complex data (2x128).
Than I calculate magnitude of those data and FFT looks like expected:
Then I want to calculate phase shift from fft complex output. I'm using atan2.
Combined output fft_magnitude (blue) + fft+phase (red) looks like this:
This is pretty much what I expect with a "small" problem. I know this is wrapping but if I imagine unwrapping it, the phase shift in the magnitude peak is reading 36 degrees and I think it should be 0 because my input sinusiod was not shifted at all.
If I shift this -36 deg (blue is in-phase, red is shifted, blue is printed only for reference) the sinusiod looks like this:
And than if I perform an FFT of this red data the magnitude + phase output looks like this:
So it is easy to imagine that unwrapped phase will be close to 0 at the magniture peak.
So there is 36 deg offset. But what happenes if I prepare data with 20 cycles per 256 samples and 0 phase shift
If I then perform an FFT, this is an output (magnitude + phase):
And I can tell you if will cross the peak point at 72 degrees. So there is now 72 degrees offset.
Can anyone give me a hint why is that happening?
Is it right that atan2() phase output is frequency dependent with offset of 2pi/cycles (360 deg/cycles) ?
How to unwrap it and get correct results (I couldn't find working C library to unwrap).
This is running on ARM Cortex-M7 processor (embedded).
#define phaseShift 0
#define cycles 20
#include <arm_math.h>
#include <arm_const_structs.h>
float32_t phi = phaseShift * PI / 180; //phase shift in radians
float32_t data[256]; //input data for fft
float32_t output_buffer[256]; //output buffer from fft
float32_t phase_data[128]; //will contain atan2 values of output from fft (output values are complex)
float32_t magnitude[128]; //will contain absolute values of output from fft (output values are complex)
float32_t incrFactorRadians = cycles * 2 * PI / 255;
arm_rfft_fast_instance_f32 RealFFT_Instance;
void setup()
{
Serial.begin(115200);
delay(2000);
arm_rfft_fast_init_f32(&RealFFT_Instance, 256); //initializing fft to be ready for 256 samples
for (int i = 0; i < 256; i++) //print sinusoids
{
data[i] = arm_sin_f32(incrFactorRadians * i + phi);
Serial.print(arm_sin_f32(incrFactorRadians * i), 8); Serial.print(","); Serial.print(data[i], 8); Serial.print("\n"); //print reference in-phase sinusoid and shifted sinusoid (data for fft)
}
Serial.print("\n\n");
delay(10000);
arm_rfft_fast_f32(&RealFFT_Instance, data, output_buffer, 0); //perform fft
for (int i = 0; i < 128; i++) //calculate absolute values of an fft output (fft output is complex), and phase shift
{
magnitude[i] = output_buffer[i * 2] * output_buffer[i * 2] + output_buffer[(i * 2) + 1] * output_buffer[(i * 2) + 1];
__ASM("VSQRT.F32 %0,%1" : "=t"(magnitude[i]) : "t"(magnitude[i])); //fast square root ARM DSP function
phase_data[i] = atan2(output_buffer[i * 2], output_buffer[i * 2 +1]) * 180 / PI;
}
}
void loop() //print magnitude of fft and phase output every 10 seconds
{
for (int i = 0; i < 128; i++)
{
Serial.print(magnitude[i], 8); Serial.print(","); Serial.print(phase_data[i], 8); Serial.print("\n");
}
Serial.print("\n\n");
delay(10000);
}
To break down the excellent answer by hotpaw2. (Their answers are always so loaded with golden nuggets of information that I spend days learning enough to comprehend the brilliance.)
When an engineer says "integer periodic" they mean your samples that you are feeding into the FFT (the aperture) needs to sample in a way the ensures you capture one full wave of the frequency sin wave.
Think of the sin wave starting at zero and cresting at one then falling below zero into the trough at negative one and then coming back up to zero.
This is one "full cycle". Now if your wave has a period of 10 cycles per second and you sample at 100 samples per second you will have 10 samples per wave.
So now you put 13 samples into an FFT and your phase is off. Why?
Well the phase is looking for the wave to smoothly continue forever. You just started a zero for the first sample and dropped off as .25 on the 13th sample. Now the phase calculation tries to connect the two ends and has this jump in the wave. This causes the phase to come out wrong.
What you need to do is select a number of samples to feed into your FFT that you know will contain full waves only.
(NOTE) You are only concerned with the phase of one freq at a time.
AND your sample aperture must not start and end at the sin waves same point.
IF you start at zero and end at zero the calculation pasting the two ends together in a forever circle will get two zeros at the transition. So you have to stop one sample short of the repeat point.
Code demonstrating this can be found: Scipy FFT - how to get phase angle
An bare FFT plus an atan2() only correctly measures the starting phase of an input sinusoid if that sinusoid is exactly integer periodic in the FFT's aperture width.
If the signal is not exactly integer periodic (some other frequency), then you have to recenter the data by doing an FFTshift (rotate the data by N/2) before the FFT. The FFT will then correctly measure the phase at the center of the original data, and away from the circular discontinuity produced by the FFT's finite length rectangular window on non-periodic-in-aperture signals.
If you want the phase at some point in the data other than the center, you can use the estimate of the frequency and phase at the center to recalculate the phase at other positions.
There are other window functions (Blackman-Nutall, et.al.) that might produce a better phase estimate than a rectangular window, but usually not as good an estimate as using an FFTShift.

FFTW Output differs from Matlab with the same input dataset

I am developing an application that should analyze data coming from an A/D stage and find the frequency peaks in a defined frequency range (0-10kHz).
We are using the FFTW3 library, version 3.3.6, running on 64bit Slackware Linux (GCC version 5.3.0). As you can see in the piece of code included, we run the FFTW plan getting result in complex vector result[]. We have verified the operations using MATLAB. We run the FFT on MATLAB (that claims to use the same library) with exactly the same input datasets (complex signal[] as in the source code). We observe some difference between FFTW (Linux ANSI C) and MATLAB run. Each plot is done using MATLAB. In particular, we would like to understand (mag[] array):
Why is the noise floor so different?
After the main peak (at more or less 3kHz) we observe a negative peak in the Linux result, while MATLAB shows correctly a secondary peak as from the input signal.
In these examples, we do not perform any output normalization, neither in Linux nor in MATLAB. The two plots show the magnitude of the FFT results (not converted to dB).
The correct result is the MATLAB one. Does someone have any suggestion about this differences? And how can we produce with the FFTW library results closer to MATLAB?
Below the piece of C source code and the two plots.
//
// Part of source code:
//
// rup[] is filled with unsigned char data coming from an A/D conversion stage (8 bit depth)
// Sampling Frequency is 45.454 KHz
// Frequency Range: 0 - 10.0 KHz
//
#define CONVCOST 0.00787401574803149606
double mag[4096];
unsigned char rup[4096];
int i;
fftw_complex signal[1024];
fftw_complex result[1024];
...
fftw_plan plan = fftw_plan_dft_1d(1024,signal,result,FFTW_FORWARD,FFTW_ESTIMATE);
for(i=0;i<1024;i++)
{
signal[i][REAL] = (double)rup[i] * CONVCOST;
signal[i][IMAG] = 0.0;
}
fftw_execute(plan);
for (i = 0; i < 512; ++i)
{
mag[i] = sqrt(result[i][REAL] * result[i][REAL] + result[i][IMAG] * result[i][IMAG]);
}
fftw_destroy_plan(plan);

Audio tone for Microcontroller p18f4520 using a for loop

I'm using C Programming to program an audio tone for the microcontroller P18F4520.
I am using a for loop and delays to do this. I have not learned any other ways to do so and moreover it is a must for me to use a for loop and delay to generate an audio tone for the target board. The port for the speaker is at RA4. This is what I have done so far.
#include <p18f4520.h>
#include <delays.h>
void tone (float, int);
void main()
{
ADCON1 = 0x0F;
TRISA = 0b11101111;
/*tone(38.17, 262); //C (1)
tone(34.01, 294); //D (2)
tone(30.3, 330); //E (3)
tone(28.57, 350); //F (4)
tone(25.51, 392); //G (5)
tone(24.04, 416); //G^(6)
tone(20.41, 490); //B (7)
tone(11.36, 880); //A (8)*/
tone(11.36, 880); //A (8)
}
void tone(float n, int cycles)
{
unsigned int i;
for(i=0; i<cycles; i++)
{
PORTAbits.RA4 = 0;
Delay10TCYx(n);
PORTAbits.RA4 = 1;
Delay10TCYx(n);
}
}
So as you can see what I have done is that I have created a tone function whereby n is for the delay and cycles is for the number of cycles in the for loop. I am not that sure whether my calculations are correct but so far it is what I have done and it does produce a tone. I'm just not sure whether it is really a A tone or G tone etc. How I calculate is that firstly I will find out the frequency tone for example A tone has a frequency of 440Hz. Then I will find the period for it whereby it will be 1/440Hz. Then for a duty cycle, I would like the tone to beep only for half of it which is 50% so I will divide the period by 2 which is (1/440Hz)/2 = 0.001136s or 1.136ms. Then I will calculate delay for 1 cycle for the microcontroller 4*(1/2MHz) which is 2µs. So this means that for 1 cycle it will delay for 2µs, the ratio would be 2µs:1cyc. So in order to get the number of cycles for 1.136ms it will be 1.136ms:1.136ms/2µs which is 568 cycles. Currently at this part I have asked around what should be in n where n is in Delay10TCYx(n). What I have gotten is that just multiply 10 for 11.36 and for a tone A it will be Delay10TCYx(11.36). As for the cycles I would like to delay for 1 second so 1/1.136ms which is 880. That's why in my method for tone A it is tone(11.36, 880). It generates a tone and I have found out the range of tones but I'm not really sure if they are really tones C D E F G G^ B A.
So my questions are
1. How do I really calculate the delay and frequency for tone A?
2. for the state of the for loop for the 'cycles' is the number cycles but from the answer that I will get from question 1, how many cycles should I use in order to vary the period of time for tone A? More number of cycles will be longer periods of tone A? If so, how do I find out how long?
3. When I use a function to play the tone, it somehow generates a different tone compared to when I used the for loop directly in the main method. Why is this so?
4. Lastly, if I want to stop the code, how do I do it? I've tried using a for loop but it doesn't work.
A simple explanation would be great as I am just a student working on a project to produce tones using a for loop and delays. I've searched else where whereby people use different stuff like WAV or things like that but I would just simply like to know how to use a for loop and delay for audio tones.
Your help would be greatly appreciated.
First, you need to understand the general approach to how you generate an interrupt at an arbitrary time interval. This lets you know you are able to have a specific action occur every x microseconds, [1-2].
There are already existing projects that play a specific tone (in Hz) on a PIC like what you are trying to do, [3-4].
Next, you'll want to take an alternate approach to generating the tone. In the case of your delay functions, you are effectively using up the CPU for nothing, when it could be done something else. You would be better off using timer interrupts directly so you're not "burning the CPU by idling".
Once you have this implemented, you just need to know the corresponding frequency for the note you are trying to generate, either by using a formula to generate a frequency from a specific musical note [5], or using a look-up table [6].
What is the procedure for PIC timer0 to generate a 1 ms or 1 sec interrupt?, Accessed 2014-07-05, <http://www.edaboard.com/thread52636.html>
Introduction to PIC18′s Timers – PIC Microcontroller Tutorial, Accessed 2014-07-05, <http://extremeelectronics.co.in/microchip-pic-tutorials/introduction-to-pic18s-timers-pic-microcontroller-tutorial/>
Generate Ring Tones on your PIC16F87x Microcontroller, Accessed 2014-07-05, <http://retired.beyondlogic.org/pic/ringtones.htm>http://retired.beyondlogic.org/pic/ringtones.htm
AN655 - D/A Conversion Using PWM and R-2R Ladders to Generate Sine and DTMF Waveforms, Accessed 2014-07-05, <http://www.microchip.com/stellent/idcplg?IdcService=SS_GET_PAGE&nodeId=1824&appnote=en011071>
Equations for the Frequency Table, Accessed 2014-07-05, <http://www.phy.mtu.edu/~suits/NoteFreqCalcs.html>
Frequencies for equal-tempered scale, A4 = 440 Hz, Accessed 2014-07-05, <http://www.phy.mtu.edu/~suits/notefreqs.html>
How to compute the number of cycles for delays to get a tone of 440Hz ? I will assume that your clock speed is 1/2MHz or 500kHz, as written in your question.
1) A 500kHz clock speed corresponds to a tic every 2us. Since a cycle is 4 clock tics, a cycle lasts 8 us.
2) A frequency of 440Hz corresponds to a period of 2.27ms, or 2270us, or 283 cycles.
3) The delay is called twice per period, so the delays should be about 141 cycles for A.
About your tone function...As you compile your code, you must face some kind of warning, something like warning: (42) implicit conversion of float to integer... The prototype of the delay function is void Delay10TCYx(unsigned char); : it expects an unsigned char, not a float. You will not get any floating point precision. You may try something like :
void tone(unsigned char n, unsigned int cycles)
{
unsigned int i;
for(i=0; i<cycles; i++)
{
PORTAbits.RA4 = 0;
Delay1TCYx(n);
PORTAbits.RA4 = 1;
Delay1TCYx(n);
}
}
I changed for Delay1TCYx() for accuracy. Now, A 1 second A-tone would be tone(141,440). A 1 second 880Hz-tone would be tone(70,880).
There is always a while(1) is all examples about PIC...if you just need one beep at start, do something like :
void main()
{
ADCON1 = 0x0F;
TRISA = 0b11101111;
tone(141,440);
tone(70,880);
tone(141,440);
while(1){
}
}
Regarding the change of tone when embedded in a function, keep in mind that every operation takes at least one cycle. Calling a function may take a few cycles. Maybe declaring static inline void tone (unsigned char, int) would be a good thing...
However, as signaled by #Dogbert , using delays.h is a good start for beginners, but do not get used to it ! Microcontrollers have lots of features to avoid counting and to save some time for useful computations.
timers : think of it as an alarm clock. 18f4520 has 4 of them.
interruption : the PIC stops the current operation, performs the code specified for this interruption, erases the flag and comes back to its previous task. A timer can trigger an interruption.
PWM pulse wave modulation. 18f4520 has 2 of them. Basically, it generates your signal !

KissFFT output of kiss_fftr

I'm receiving PCM data trough socket connection in packets containing 320 samples. Sample rate of sound is 8000 samples per second. I am doing with it something like this:
int size = 160 * 2;//160;
int isinverse = 1;
kiss_fft_scalar zero;
memset(&zero,0,sizeof(zero));
kiss_fft_cpx fft_in[size];
kiss_fft_cpx fft_out[size];
kiss_fft_cpx fft_reconstructed[size];
kiss_fftr_cfg fft = kiss_fftr_alloc(size*2 ,0 ,0,0);
kiss_fftr_cfg ifft = kiss_fftr_alloc(size*2,isinverse,0,0);
for (int i = 0; i < size; i++) {
fft_in[i].r = zero;
fft_in[i].i = zero;
fft_out[i].r = zero;
fft_out[i].i = zero;
fft_reconstructed[i].r = zero;
fft_reconstructed[i].i = zero;
}
// got my data through socket connection
for (int i = 0; i < size; i++) {
// samples are type of short
fft_in[i].r = samples[i];
fft_in[i].i = zero;
fft_out[i].r = zero;
fft_out[i].i = zero;
}
kiss_fftr(fft, (kiss_fft_scalar*) fft_in, fft_out);
kiss_fftri(ifft, fft_out, (kiss_fft_scalar*)fft_reconstructed);
// lets normalize samples
for (int i = 0; i < size; i++) {
short* samples = (short*) bufTmp1;
samples[i] = rint(fft_reconstructed[i].r/(size*2));
}
After that I fill OpenAL buffers and play them. Everything works just fine but I would like to do some filtering of audio between kiss_fftr and kiss_fftri. Starting point as I think for this is to convert sound from time domain to frequency domain, but I don't really understand what kind of data I'm receiving from kiss_fftr function. What information is stored in each of those complex number, what its real and imaginary part can tell me about frequency. And I don't know which frequencies are covered (what frequency span) in fft_out - which indexes corresponds to which frequencies.
I am total newbie in signal processing and Fourier transform topics.
Any help?
Before you jump in with both feet into a C implementation, get familiar with digital filters, esp FIR filters.
You can design the FIR filter using something like GNU Octave's signal toolbox. Look at the command fir1(the simplest), firls, or remez. Alternately, you might be able to design a FIR filter through a web page. A quick web search for "online fir filter design" found this (I have not used it, but it appears to use the equiripple design used in the remez or firpm command )
Try implementing your filter first with a direct convolution (without FFTs) and see if the speed is acceptable -- that is an easier path. If you need an FFT-based approach, there is a sample implementation of overlap-save in the kissfft/tools/kiss_fastfir.c file.
I will try to answer your questions directly.
// a) the real and imaginary components of the output need to be combined to calculate the amplitude at each frequency.
float ar,ai,scaling;
scaling=1.0/(float)size;
// then for each output [i] from the FFT...
ar = fft_out[i].r;
ai = fft_out[i].i;
amplitude[i] = 2.0 * sqrtf( ar*ar + ai*ai ) * scaling ;
// b) which index refers to which frequency? This can be calculated as follows. Only the first half of the FFT results are needed (assuming your 8KHz sampling rate)
for(i=1;i<(size/2);i++) freq = (float)i / (1/8000) / (float)size ;
// c) phase (range +/- PI) for each frequency is calculated like this:
phase[i] = phase = atan2(fft_out[i].i / fft_out[i].r);
What you might want to investigate is FFT fast convolution using overlap add or overlap save algorithms. You will need to expand the length of each FFT by the length of the impulse of your desired filter. This is because (1) FFT/IFFT convolution is circular, and (2) each index in the FFT array result corresponds to almost all frequencies (a Sinc shaped response), not just one (even if mostly near one), so any single bin modification will leak throughout the entire frequency response (except certain exact periodic frequencies).

Resources