computing fft and ifft with fftw.h in C - c

Hi all
I am using the fftw C libraries to compute the frequency spectrum for some signal processing applications on embedded systems. However, in my project I have run into a slight hinderence.
Below is a simple program I wrote to ensure I am implementing the fftw functions correctly. Basically I want to calculate the fft of a sequence of 12 numbers, then do the ifft and obtain the same sequence of numbers again. If you have fftw3 and gcc installed this program should work if you compile with:
gcc -g -lfftw3 -lm fftw_test.c -o fftw_test
Currently my fft length is the same size as the input array.
#include <stdio.h>
#include <stdlib.h>
#include <sndfile.h>
#include <stdint.h>
#include <math.h>
#include <fftw3.h>
int main(void)
{
double array[] = {0.1, 0.6, 0.1, 0.4, 0.5, 0, 0.8, 0.7, 0.8, 0.6, 0.1,0};
//double array2[] = {1, 6, 1, 4, 5, 0, 8, 7, 8, 6, 1,0};
double *out;
double *err;
int i,size = 12;
fftw_complex *out_cpx;
fftw_plan fft;
fftw_plan ifft;
out_cpx = (fftw_complex*) fftw_malloc(sizeof(fftw_complex)*size);
out = (double *) malloc(size*sizeof(double));
err = (double *) malloc(size*sizeof(double));
fft = fftw_plan_dft_r2c_1d(size, array, out_cpx, FFTW_ESTIMATE); //Setup fftw plan for fft
ifft = fftw_plan_dft_c2r_1d(size, out_cpx, out, FFTW_ESTIMATE); //Setup fftw plan for ifft
fftw_execute(fft);
fftw_execute(ifft);
//printf("Input: \tOutput: \tError:\n");
printf("Input: \tOutput:\n");
for(i=0;i<size;i++)
{
err[i] = abs(array[i] - out[i]);
printf("%f\t%f\n",(array[i]),out[i]);
//printf("%f\t%f\t%f\n",(array[i]),out[i],err[i]);
}
fftw_destroy_plan(fft);
fftw_destroy_plan(ifft);
fftw_free(out_cpx);
free(err);
free(out);
return 0;
}
Which Produces the following output:
Input: Output:
0.100000 1.200000
0.600000 7.200000
0.100000 1.200000
0.400000 4.800000
0.500000 6.000000
0.000000 0.000000
0.800000 9.600000
0.700000 8.400000
0.800000 9.600000
0.600000 7.200000
0.100000 1.200000
0.000000 0.000000
So obviously the ifft is producing some scaled up result. In the fftw docs found here:
fftw docs about scaling.
It mentions about some scaling, however I am using the "r2c" and "c2r" transforms rather than the FFT_FORWARD and FFT_BACKWARD. Any insight would be appreciated.

Looking at the great documentation for the functions you use, you will see you are using FFT_FORWARD and FFT_BACKWARD, and exactly where it is intended. Therefore, the scaling information you found previously also applies here.

Sorry to be pedantic, but your size for out_cpx is incorrect. instead of being size long, it should be size/2 + 1. This is because FFT of a real signal is Hermitian. You can verify what I say by initializing out_cpx to some random number (all 3.14159). Run both the forward and backward and then print out out_cpx from size/2 + 1 to size. It will not have changed.
http://www.fftw.org/fftw3_doc/Real_002ddata-DFT-Array-Format.html#Real_002ddata-DFT-Array-Format

r2c and c2r do essentially the same as the regular Fourier transform. The only difference is that both the input and output array need to hold half of the numbers. Please take a look at the last paragraph of the manual of FFTW r2c and c2r. So the normalization factor is precisely the number of elements of the real array, or the variable size (== 12) in your case.

Related

Inverse FFT with CMSIS is wrong

I am attempting to perform an FFT on a signal and use the resulting data to retrieve the original samples via an IFFT. I am using the CMSIS DSP library on an STM32 with a M3.
My issue is understanding the scaling that occurs with the FFT, and also how to get a correct IFFT. Currently the IFFT results in a similar wave as the input, but points are scaled anywhere between 120x-140x of the original. Is this simply the result of precision errors of q15? Am I too scale the IFFT results by 7 bits? My code is below
The documentation also mentions "For the RIFFT, the source buffer must at least have length fftLenReal + 2. The last two elements must be equal to what would be generated by the RFFT: (pSrc[0] - pSrc[1]) >> 1 and 0". What is this for? Applying these operations to FFT_SIZE2 - 2, and FFT_SIZE2 - 1 respectively did not change the results of the IFFT at all.
//128 point FFT
#define FFT_SIZE 128
arm_rfft_instance_q15 fft_instance;
arm_rfft_instance_q15 ifft_instance;
//time domain signal buffers
float32_t sinetbl_in[FFT_SIZE];
float32_t sinetbl_out[FFT_SIZE];
//a copy for comparison after RFFT since function modifies input buffer
volatile q15_t fft_in_buf_cpy[FFT_SIZE];
q15_t fft_in_buf[FFT_SIZE];
//output for FFT, RFFT provides real and complex data organized as re[0], im[0], re[1], im[1]
q15_t fft_out_buf[FFT_SIZE*2];
q15_t fft_out_buf_mag[FFT_SIZE*2];
//inverse fft buffer result
q15_t ifft_out_buf[FFT_SIZE];
//generate 1kHz sinewave with a sample frequency of 8kHz for 128 samples, amplitude is 1
for(int i = 0; i < FFT_SIZE; ++i){
sinetbl_in[i] = arm_sin_f32(2*3.14*1000 *i/8000);
sinetbl_out[i] = 0;
}
//convert buffer to q15 (not enough flash to use f32 fft functions)
arm_float_to_q15(sinetbl_in, fft_in_buf, FFT_SIZE);
memcpy(fft_in_buf_cpy, fft_in_buf, FFT_SIZE*2);
//perform RFFT
arm_rfft_init_q15(&fft_instance, FFT_SIZE, 0, 1);
arm_rfft_q15(&fft_instance, fft_in_buf, fft_out_buf);
//calculate magnitude, skip 1st real and img numbers as they are DC and both real
arm_cmplx_mag_q15(fft_out_buf + 2, fft_out_buf_mag + 1, FFT_SIZE/2-1);
//weird operations described by documentation, does not change results
//fft_out_buf[FFT_SIZE*2 - 2] = (fft_out_buf[0] - fft_out_buf[1]) >> 1;
//fft_out_buf[FFT_SIZE*2 - 1] = 0;
//perform inverse FFT
arm_rfft_init_q15(&ifft_instance, FFT_SIZE, 1, 1);
arm_rfft_q15(&ifft_instance, fft_out_buf, ifft_out_buf);
//closest approximation to get to original scaling
//arm_shift_q15(ifft_out_buf, 7, ifft_out_buf, FFT_SIZE);
//convert back to float for comparison with input
arm_q15_to_float(ifft_out_buf, sinetbl_out, FFT_SIZE);
I feel like I answered my own question with the precision comment, but I'd like to be sure. Am I doing this FFT stuff right?
Thanks in advance
As Cris pointed out some libraries skip the normalization process. CMSIS DSP is one of those libraries as it is intended to be fast. For CMSIS, depending on the FFT size you must left shift your data a certain amount to get back to the original range. In my case with a FFT size of 128 and also the magnitude calculation, it was 7 as I originally surmised.

FFTW Output differs from Matlab with the same input dataset

I am developing an application that should analyze data coming from an A/D stage and find the frequency peaks in a defined frequency range (0-10kHz).
We are using the FFTW3 library, version 3.3.6, running on 64bit Slackware Linux (GCC version 5.3.0). As you can see in the piece of code included, we run the FFTW plan getting result in complex vector result[]. We have verified the operations using MATLAB. We run the FFT on MATLAB (that claims to use the same library) with exactly the same input datasets (complex signal[] as in the source code). We observe some difference between FFTW (Linux ANSI C) and MATLAB run. Each plot is done using MATLAB. In particular, we would like to understand (mag[] array):
Why is the noise floor so different?
After the main peak (at more or less 3kHz) we observe a negative peak in the Linux result, while MATLAB shows correctly a secondary peak as from the input signal.
In these examples, we do not perform any output normalization, neither in Linux nor in MATLAB. The two plots show the magnitude of the FFT results (not converted to dB).
The correct result is the MATLAB one. Does someone have any suggestion about this differences? And how can we produce with the FFTW library results closer to MATLAB?
Below the piece of C source code and the two plots.
//
// Part of source code:
//
// rup[] is filled with unsigned char data coming from an A/D conversion stage (8 bit depth)
// Sampling Frequency is 45.454 KHz
// Frequency Range: 0 - 10.0 KHz
//
#define CONVCOST 0.00787401574803149606
double mag[4096];
unsigned char rup[4096];
int i;
fftw_complex signal[1024];
fftw_complex result[1024];
...
fftw_plan plan = fftw_plan_dft_1d(1024,signal,result,FFTW_FORWARD,FFTW_ESTIMATE);
for(i=0;i<1024;i++)
{
signal[i][REAL] = (double)rup[i] * CONVCOST;
signal[i][IMAG] = 0.0;
}
fftw_execute(plan);
for (i = 0; i < 512; ++i)
{
mag[i] = sqrt(result[i][REAL] * result[i][REAL] + result[i][IMAG] * result[i][IMAG]);
}
fftw_destroy_plan(plan);

DSP libraries - RFFT - strange results

Recently I've been trying to do FFT calculations on my STM32F4-Discovery evaluation board then send it to PC. I have looked into my problem - I think that I'm doing something wrong with FFT functions provided by manufacturer.
I'm using CMSIS-DSP libraries.
For now I've have been generating samples with code (if that works correct I'll do sampling by microphone).
I'm using arm_rfft_fast_f32 as my data are going to be floats in the future, but results I get in my output array are insane (I think) - I'm getting frequencies below 0.
number_of_samples = 512; (l_probek in code)
dt = 1/freq/number_of_samples
Here is my code
float32_t buffer_input[l_probek];
uint16_t i;
uint8_t mode;
float32_t dt;
float32_t freq;
bool DoFlag = false;
bool UBFlag = false;
uint32_t rozmiar = 4*l_probek;
union
{
float32_t f[l_probek];
uint8_t b[4*l_probek];
}data_out;
union
{
float32_t f[l_probek];
uint8_t b[4*l_probek];
}data_mag;
union
{
float32_t f;
uint8_t b[4];
}czest_rozdz;
/* Pointers ------------------------------------------------------------------*/
arm_rfft_fast_instance_f32 S;
arm_cfft_radix4_instance_f32 S_CFFT;
uint16_t output;
/* ---------------------------------------------------------------------------*/
int main(void)
{
freq = 5000;
dt = 0.000000390625;
_GPIO();
_LED();
_NVIC();
_EXTI(0);
arm_rfft_fast_init_f32(&S, l_probek);
GPIO_SetBits(GPIOD, LED_Green);
mode = 2;
//----------------- Infinite loop
while (1)
{
if(true)//(UBFlag == true)
for(i=0; i<l_probek; ++i)
{
buffer_input[i] = (float32_t) 15*sin(2*PI*freq*i*dt);
}
//Obliczanie FFT
arm_rfft_fast_f32(&S, buffer_input, data_out.f, 0);
//Obliczanie modulow
arm_cmplx_mag_f32(data_out.f, data_mag.f, l_probek);
USART_putdata(USART1, data_out.b, data_mag.b, rozmiar);
//USART_putdata(USART1, czest_rozdz.b, data_mag.b, rozmiar);
GPIO_ToggleBits(GPIOD, LED_Orange);
//mode++;
//UBFlag = false;
}
}
}
I'm using arm_rfft_fast_f32 as my data are going to be floats in the future, but results I get in my output array are insane (I think) - I'm getting frequencies below 0.
The arm_rfft_fast_f32 function does not return frequencies, but rather complex-valued coefficients computed using the Fast Fourier Transform (FFT). It is thus perfectly reasonable for those coefficients to be negative. More specifically, the expected coefficients for your single-cycle sin test tone input with an amplitude of 15 would be:
0.0, 0.0; // special case packing real-valued X[0] and X[N/2]
0.0, -3840.0; // X[1]
0.0, 0.0; // X[2]
0.0, 0.0; // X[3]
...
0.0, 0.0; // X[255]
Note that as indicated in the documentation the first two outputs correspond to the purely real coefficients X[0] and X[N/2] (you should be particularly careful about this special case in your subsequent call to arm_cmplx_mag_f32; see last point below).
The frequency of each of those frequency components are given by k*fs/N, where N is the number of samples (in your case l_probek) and fs = 1/dt is the sampling rate (in your case freq*l_probek):
X[0] -> 0*freq*l_probek/l_probek = 0
X[1] -> 1*freq*l_probek/l_probek = freq = 5000
X[2] -> 2*freq*l_probek/l_probek = 2*freq = 10000
X[3] -> 3*freq*l_probek/l_probek = 2*freq = 15000
...
Finally, due to the special packing of the first two values, you need to be careful when computing the N/2+1 magnitudes:
// General case for the magnitudes
arm_cmplx_mag_f32(data_out.f+2, data_mag.f+1, l_probek/2 - 1);
// Handle special cases
data_mag.f[0] = data_out.f[0];
data_mag.f[l_probek/2] = data_out.f[1];
As a follow-up to the above answer, which is awesome, some further clarifications which took me an age to figure out.
The frequency bins are centered on the target frequency, so for instance in the example above X[0] represents -2500Hz to 2500Hz, centered on zero, X[1] is 2500Hz to 7500Hz centered on 5000Hz and so on
It's common to interpolate frequencies within the bin by looking at the energy of the adjacent bins (see https://dspguru.com/dsp/howtos/how-to-interpolate-fft-peak/) if you do this you will need to make sure that your magnitude array is large enough for the bins + Nyquist and that the bin above Nyquist is 0, but note many interpolation techniques require the complex values (e.q. Quinn, Jacobson) so make sure you interpolate before finding the magnitudes.
The special case code above works because there is no complex component of the DC and Nyquist values and thus the magnitude is simply the real part
There is a bug in the code above however - although the imaginary parts of the DC and Nyquist components is always zero, the real part could still be negative, so you need to take the absolute value to get the magnitude:
// Handle special cases
data_mag.f[0] = fabs(data_out.f[0]);
data_mag.f[l_probek/2] = fabs(data_out.f[1]);

Gaussian random number generator

I'm trying to implement a gaussian distributed random number generator in the interval [0,1].
float rand_gauss (void) {
float v1,v2,s;
do {
v1 = 2.0 * ((float) rand()/RAND_MAX) - 1;
v2 = 2.0 * ((float) rand()/RAND_MAX) - 1;
s = v1*v1 + v2*v2;
} while ( s >= 1.0 );
if (s == 0.0)
return 0.0;
else
return (v1*sqrt(-2.0 * log(s) / s));
}
It's pretty much a straight forward implementation of the algorithm in Knuth's 2nd volume of TAOCP 3rd edition page 122.
The problem is that rand_gauss() sometimes returns values outside the interval [0,1].
Knuth describes the polar method on p 122 of the 2nd volume of TAOCP. That algorithm generates a normal distribution with mean = 0 and standard deviation = 1. But you can adjust that by multiplying by the desired standard deviation and adding the desired mean.
You might find it fun to compare your code to another implementation of the polar method in the C-FAQ.
Change your if statement to (s >= 1.0 || s == 0.0). Better yet, use a break as seen in the following example for a SIMD Gaussian random number generating returning a complex pair (u,v). This uses the Mersenne twister random number generator dsfmt(). If you only want a single, real, random-number, return only u and save the v for the next pass.
inline static void randn(double *u, double *v)
{
double s, x, y; // SIMD Marsaglia polar version for complex u and v
while (1){
x = dsfmt_genrand_close_open(&dsfmt) - 1.;
y = dsfmt_genrand_close_open(&dsfmt) - 1.;
s = x*x + y*y;
if (s < 1) break;
}
s = sqrt(-2.0*log(s)/s);
*u = x*s; *v = y*s;
return;
}
This algorithm is surprisingly fast. Execution times for computing two random numbers (u,v) for four different Gaussian random number generators are:
Times for delivering two Gaussian numbers (u + iv)
i7-2600K # 4GHz, gcc -Wall -Ofast -msse2 ..
gsl_ziggurat = 20.3 (ns)
Box-Muller = 78.8 (ns)
Box-Muller with fast_sin fast_cos = 28.1 (ns)
SIMD Marsaglia polar = 35.0 (ns)
The fast_sin and fast_cos polynomial routines of Charles K. Garrett speed up the Box-Muller computation by a factor 2.9 using a nested polynomial implementation of cos() and sin(). The SIMD Box Muller and polar algorithms are certainly competitive. Also they can be parallelized easily. Using gcc -Ofast -S, the assembly code dump shows that the square root is the SIMD SSE2: sqrt --> sqrtsd %xmm0, %xmm0
Comment: it is really hard and frustrating to get accurate timings with gcc5, but I think these are ok: as of 2/3/2016: DLW
[1] Related link: c malloc array pointer return in cython
[2] A comparison of algorithms, but not necessarily for SIMD versions: http://www.doc.ic.ac.uk/~wl/papers/07/csur07dt.pdf
[3] Charles K. Garrett: http://krisgarrett.net/papers/l2approx.pdf

cblas_dgemm - works ONLY if (beta) is power-of-two

I am totally stumped. I have a fairly large recursive program written in c that calls cblas_dgemm(). The result is verified independently by a program that works correctly.
C = alpha*A*B + beta*C
On repeated tests using random matrices and all possible combination of parameters the program gives correct answer ONLY if abs(beta) = 2^n (1,2,4,8..). Any value works for alpha. Any other positive/negative, odd/even value for beta gives correct answer b/w 10-30% of the time.
I am using Ubuntu 10.04, GCC 4.4.x, I have tried system installed blas/cblas/atlas as well as manually compiled atlas.
Any hints or suggestions would be greatly appreciated. I am amazed at the wonderfully generous (and smart) folks lurking at this site.
Thanking you all in advance,
Russ
Two completely unrelated errors conspired to produce an illusive picture. It made me look for problems in the wrong place.
(1) There was a simple error in the logic of the function calling dgemm. Would have been easily fixed if I was not chasing the wrong problem.
(2) My double-compare function: double version of AlmostEqual2sComplement() (http://www.cygnus-software.com/papers/comparingfloats/comparingfloats.htm) used incorrect sized integer - resulting in an incorrect TRUE under certain rare circumstances. This was the first time the error bit me!
Thanks again for the useful suggestion of using the scientific method when trying to debug a program.
Russ
Yes, a full example would be handy. Here is an old example I had hanging around using GSL's sgemm variant; should be easy to fix to double. Please try and see if this gives the result shown in the GSL manual:
/* from the gsl info documentation in node 'gsl cblas examples' */
/* compile via 'gcc -o $file $file.c -lgslcblas' */
/* edd 15 Nov 2003 */
#include <stdio.h>
#include <gsl/gsl_cblas.h>
int
main (void)
{
int lda = 3;
float A[] = { 0.11, 0.12, 0.13,
0.21, 0.22, 0.23 };
int ldb = 2;
float B[] = { 1011, 1012,
1021, 1022,
1031, 1032 };
int ldc = 2;
float C[] = { 0.00, 0.00,
0.00, 0.00 };
/* Compute C = A B */
cblas_sgemm (CblasRowMajor,
CblasNoTrans, CblasNoTrans, 2, 2, 3,
1.0, A, lda, B, ldb, 0.0, C, ldc);
printf ("[ %g, %g\n", C[0], C[1]);
printf (" %g, %g ]\n", C[2], C[3]);
return 0;
}

Resources