Convolution Using FFTW3 and PortAudio - c

Edit (2017, Apr 27)
My fully working code is here. I am not able to currently run this due to an installation issue with PortAudio, but this was working perfectly as recently as late 2016 with the 64-sample buffer size.
Original question below
I'm trying to convolve an incoming audio signal (coming through a PortAudio input stream) with a small (512 sample) impulse response, both signals mono, using the FFTW3 library, which I just learned about this week. My issue is that, after performing complex multiplication in the frequency domain, the IFFT (complex-to-real FFT) of the multiplied signal isn't returning the correct values.
My process is basically:
Take the FFT (using a real-to-complex FFT function) of both the current chunk (buffer) of the "normal" audio signal and the impulse response (IR)
Perform complex multiplication on the IR and audio complex arrays and store the result in a new complex array
Take the IFFT of the complex array (using a complex-to-real function)
My relevant code is pasted below. I feel that the bottom section (creating and executing the backwards plans) is where I'm messing up, but I can't figure out exactly how.
Is my overall approach/structure to performing convolution correct? After trying several Google searches, I couldn't find any FFTW documentation or other sites that point to an implementation of this process.
//framesPerBuffer = 512; is set above
//data->ir_len is also set to 512
int convSigLen = framesPerBuffer + data->ir_len - 1;
//hold time domain audio and IR signals
double *in;
double *in2;
double *inIR;
double *in2IR;
double *convolvedSig;
//hold FFT values for audio and IR
fftw_complex *outfftw;
fftw_complex *outfftwIR;
//hold the frequency-multiplied signal
fftw_complex *outFftMulti;
//hold plans to do real-to-complex FFT
fftw_plan plan_forward;
fftw_plan plan_forwardIR;
//hold plans to do IFFT (complex-to-real)
fftw_plan plan_backward;
fftw_plan plan_backwardIR;
fftw_plan plan_backwardConv;
int nc, ncIR; //number of complex values to store in outfftw arrays
/**** Create the input arrays ****/
//Allocate space
in = fftw_malloc(sizeof(double) * framesPerBuffer );
inIR = fftw_malloc(sizeof(double) * data->ir_len);
//Store framesPerBuffer samples of the audio input to in*
for (i = 0; i < framesPerBuffer; i++)
{
in[i] = data->file_buff[i];
}
//Store the impulse response (IR) to inIR*
for (i = 0; i < data->ir_len; i++)
{
inIR[i] = data->irBuffer[i];
}
/**** Create the output arrays ****/
nc = framesPerBuffer/2 + 1;
outfftw = fftw_malloc(sizeof(fftw_complex) * nc);
ncIR = nc; //data->ir_len/2 + 1;
outfftwIR = fftw_malloc(sizeof(fftw_complex) * nc);
/**** Create the FFTW forward plans ****/
plan_forward = fftw_plan_dft_r2c_1d(framesPerBuffer, in, outfftw, FFTW_ESTIMATE);
plan_forwardIR = fftw_plan_dft_r2c_1d(data->ir_len, inIR, outfftwIR, FFTW_ESTIMATE);
/*********************/
/* EXECUTE THE FFTs!! */
/*********************/
fftw_execute(plan_forward);
fftw_execute(plan_forwardIR);
/***********************/
/*** MULTIPLY FFTs!! ***/
/***********************/
outFftMulti = fftw_malloc(sizeof(fftw_complex) * nc);
for ( i = 0; i < nc; i++ )
{
//calculate real and imaginary components for the multiplied array
outFftMulti[i][0] = outfftw[i][0] * outfftwIR[i][0] - outfftw[i][1] * outfftwIR[i][1];
outFftMulti[i][1] = outfftw[i][0] * outfftwIR[i][1] + outfftw[i][1] * outfftwIR[i][0];
}
/**** Prepare the input arrays to hold the [to be] IFFT'd data ****/
in2 = fftw_malloc(sizeof(double) * framesPerBuffer);
in2IR = fftw_malloc(sizeof(double) * framesPerBuffer);
convolvedSig = fftw_malloc(sizeof(double) * convSigLen);
/**** Prepare the backward plans and execute the IFFT ****/
plan_backward = fftw_plan_dft_c2r_1d(nc, outfftw, in2, FFTW_ESTIMATE);
plan_backwardIR = fftw_plan_dft_c2r_1d(ncIR, outfftwIR, in2IR, FFTW_ESTIMATE);
plan_backwardConv = fftw_plan_dft_c2r_1d(convSigLen, outFftMulti, convolvedSig, FFTW_ESTIMATE);
fftw_execute(plan_backward);
fftw_execute(plan_backwardIR);
fftw_execute(plan_backwardConv);
This is my first post on this site. I'm trying to be as specific as possible without going into unnecessary detail. I would greatly appreciate any help on this.
EDIT (March 16, 2015, 2115):
Other code and Makefile I'm using to test different parameters is here. The overall process is as follows:
Audio signal buffer x has length lenX. Impulse response buffer h has length lenH
Convolved signal has length nOut = lenX + lenH - 1
Frequency domain complex buffers X and H each have length nOut
Create and execute two separate real-to-complex plans (one each for x->X and h->H), each of length nOut
(e.g. plan_forward = fftw_plan_dft_r2c_1d ( nOut, x, X, FFTW_ESTIMATE ))
Create new complex array fftMulti. Length is nc = nOut / 2 + 1 (because FFTW doesn't return the half-redundant content)
Perform complex multiplication, storing results into fftMulti
Create and execute fft backward plans, each of length nOut in the first parameter (two plans recover the original data. The third creates the convolved signal in the time domain)
e.g.
plan_backwardConv = fftw_plan_dft_c2r_1d(nOut, fftMulti, convolvedSig, FFTW_ESTIMATE);
plan_backward = fftw_plan_dft_c2r_1d ( nOut, X, xRecovered, FFTW_ESTIMATE );
plan_backwardIR = fftw_plan_dft_c2r_1d (nOut, H, hRecovered, FFTW_ESTIMATE);
My issue is that even though I can recover the original signals x and h with the correct values, the convolved signal is displaying very high values (between ~8 and 35), even when dividing each value by nOut when printing.
I can't tell which part(s) of my process are causing issues. Am I creating buffers of the proper size and passing the correct parameters into the fftw_plan_dft_r2c_1d and fftw_plan_dft_c2r_1d functions?

One reason for the unexpected results you have is that you do an FFT with length N and an IFFT with length N/2+1 = nc.
The array lengths should be the same.
Furthermore, FFTW does not normalize. That means if you apply y = ifft(fft(a)) to the 4-element vector a = {1,1,1,1}, you get y = {4,4,4,4}.
If you still have trouble, give us code that can be compiled right away.

I got my question answered on DSP Stack Exchange: https://dsp.stackexchange.com/questions/22145/perform-convolution-in-frequency-domain-using-fftw
Basically, I didn't zero-pad my time-domain signals before executing the FFT. For some reason I thought that the library did that automatically (like MATLAB does, if I recall correctly), but obviously I was wrong.

Related

Not possible to do CFFT Frequency binning with CMSIS on STM32?

At the moment I am attempting to implement a program for finding 3 frequencies (xs = 30.1 kHz, ys = 28.3 kHz and zs = 25.9 kHz) through the use of the CMSIS pack on the STM32F411RE board. I cannot get the Complex FFT (CFFT) and complex magnitude working correctly.
In accordance with the frequency bins, I generate an array containing these frequencies so that I can manually look up which bin indexes the signals xs, ys and zs fall on. I then use these indexes to look at the 3 FFT outcomes (Xfft, Yfft, Zfft) to find the results for these signals, but they don't match up.
I use the following order of functions:
DMA ADC Buffer: HAL_ADC_ConvHalfCpltCallback(ADC_HandleTypeDef* hadc)
Frequency bins in binfreqs
Change ADC input to float Xfft
CFFT_F32: arm_cfft_f32(&arm_cfft_sR_f32_len1024, Xfft, 0, 0);
Complex Mag: arm_cmplx_mag_f32(Xfft, Xdsp, fftLen);
// ADC Stuff done via DMA, working correctly
int main(void)
{
HAL_Init();
SystemClock_Config();
MX_GPIO_Init();
MX_DMA_Init();
MX_ADC1_Init();
MX_USART2_UART_Init();
HAL_ADC_Start_DMA(&hadc1, adc_buffer, bufferLen); // adc_buffer needs to be an uint32_t
while (1)
{
/**
* Generate the frequencies
*/
for (int binfreqs = 0; binfreqs < fftLen; binfreqs++) // Generates the frequency bins to relate the amplitude to an actual value, rather than a bin frequency value
{
fftFreq[binfreqs] = binfreqs * binSize;
}
/*
* Find the amplitudes associated with the 3 emitter frequencies and store in an array for each axis. By default these arrays are generated with signal strength 0
* and with frequency index at 0: because of system limits these will indicate invalid values, as system range is from 10 - 60 kHz.
*/
volatile int32_t X_mag[3][4] = // x axis values: [index][frequency][signal_strength][phase]
{
{0, Xfreq, 0, 0}, // For x-freq index [0][0], frequency [0][1] associated with 1st biggest amplitude [0][2], phase [0][3]
{0, Yfreq, 0, 0}, // Ditto for y-freq
{0, Zfreq, 0, 0} // Ditto for z-freq
};
/*
* Finds the index in fftFreq corresponding to respectively X, Y and Z emitter frequencies
*/
for(int binSearch = 0; binSearch < fftLen; binSearch++)
{
if(fftFreq[binSearch] == Xfreq) // Find index for X emitter frequency
{
X_mag[0][0] = binSearch;
}
if(fftFreq[binSearch] == Yfreq) // Find index for Y emitter frequency
{
X_mag[1][0] = binSearch;
}
if(fftFreq[binSearch] == Zfreq) // Find index for Z emitter frequency
{
X_mag[2][0] = binSearch;
}
}
Signal processing
/* Signal processing algorithms --------------------------------------------------
*
* Only to be run once with fresh data from the buffer, [do not run continuous] or position / orientation data will be repeated.
* So only run once when conversionPaused
*/
if(conversionPaused)
{
/*
* Convert signal to voltage (12-bit, 4096)
*/
for (int floatVals = 0; floatVals < fftLen; floatVals++)
{
Xfft[floatVals] = (float) Xin[floatVals] * 3.6f / 4096.0f;
}
/*
* Fourier transform
*/
arm_cfft_f32(&arm_cfft_sR_f32_len1024, Xfft, 0, 0); // Calculate complex fourier transform of x time signal, processing occurs in place
for (int fix_fft = 0 ; fix_fft < half_fftLen ; fix_fft++)
{
Xfft[fix_fft] = 2 * Xfft[fix_fft] / fftLen;
Xfft[fix_fft + half_fftLen] = 0;
}
/*
* Amplitude calculation
*/
arm_cmplx_mag_f32(Xfft, Xdsp, fftLen); // Calculate the magnitude of the fourier transform for x axis
/*
* Finds all signal strengths for allocated frequency indexes
*/
for(int strength_index = 0; strength_index < 3; strength_index++) // Loops through xyz frequencies for all 3 magnetometer axis
{
int x_temp_index = X_mag[strength_index][0]; // temp int necessary to store the strength, otherwise infinite loop?
X_mag[strength_index][2] = Xfft[x_temp_index]; // = Xfft[2*x_temp_index];
}
conversionPaused = 0;
}
} // While() end
} // Main() end
I do not know how to calculate the frequency bins for this combination of CFFT and complex magnitude, as I would expect the even indexes of the array to hold the real values and the odd indexes to hold the imaginary values. I referenced several examples but could not make out what I am doing wrong with my code.
However, as per the images, when applying an input signal of 30.1 kHz, neither the 301 bin index nor the 602 bin index holds the expected output.
[Image: 301 bin index]
[Image: 602 bin index]
EDIT:
I have since tried to implement the arm_cfft_f32 example given here. This latter example is completely broken as the external 10 kHz dataset is no longer included by default and trying to include it is not possible, as the program behaves poorly and keeps erroring about a return data type that is not even present in the first place. Thus I cannot use the example program given for this: it also appears to be 4 years out of date, so that is not surprising.
The arm_max_f32() function also proved not fruitful as it keeps homing in on the noise generated at bin 0 via using an analog generated signal. Manually setting this bin 0 to equal 0 then upsets the algorithm which starts pointing to random values that are not even the largest value present in the system.
Even when going manually through the CFFT data and magnitude it appears as if they are not working correctly. There are random noise values all over the spectrum parts, whilst the oscilloscope confirms that large outcomes should only be present at 0 Hz and the selected signal generator frequency (thus corresponding to a frequency bin).
Using CMSIS is extremely frustrating for me because of the little documentation and examples available, which is then further reduced by most of it simply not working (without major modification).

How to write C code for a long signal and long kernel convolution

I would like to do a linear convolution of a signal of length 4000*270 with a kernel of length 16000. The signal is not fixed, while the kernel is fixed. This needs to be repeated many times for my purpose, so I want to improve the speed as much as possible. I can implement this convolution in either R or C.
At first, I tried doing the convolution in R, but the speed did not meet my needs. I tried doing it by iteration and it was too slow. I also tried doing it using FFT, but because both the signal and the kernel are long, FFT didn't improve the speed much.
Then I decided to do the convolution iteratively in C. But C did not seem able to handle this amount of calculation and reported errors very often. Even when it worked, it was still very slow. I also tried doing FFT convolution in C, but the program always shut down.
I got this code from a friend of mine and am not sure about the original source. I will delete it if there is a copyright issue. This is the C code I used for doing the FFT, but the program cannot handle the long vector of length 2097152 (the smallest power of 2 greater than or equal to the signal vector length).
#include <math.h> /* cos, sin */
#define q 21 /* for 2^21 points */
#define N 2097152 /* N-point FFT, iFFT */
typedef float real;
typedef struct{real Re; real Im;} complex;
#ifndef PI
# define PI 3.14159265358979323846264338327950288
#endif
void fft( complex *v, int n, complex *tmp )
{
if(n>1) { /* otherwise, do nothing and return */
int k,m;
complex z, w, *vo, *ve;
ve = tmp;
vo = tmp+n/2;
for(k=0; k<n/2; k++) {
ve[k] = v[2*k];
vo[k] = v[2*k+1];
}
fft( ve, n/2, v ); /* FFT on even-indexed elements of v[] */
fft( vo, n/2, v ); /* FFT on odd-indexed elements of v[] */
for(m=0; m<n/2; m++) {
w.Re = cos(2*PI*m/(double)n);
w.Im = -sin(2*PI*m/(double)n);
z.Re = w.Re*vo[m].Re - w.Im*vo[m].Im; /* Re(w*vo[m]) */
z.Im = w.Re*vo[m].Im + w.Im*vo[m].Re; /* Im(w*vo[m]) */
v[ m ].Re = ve[m].Re + z.Re;
v[ m ].Im = ve[m].Im + z.Im;
v[m+n/2].Re = ve[m].Re - z.Re;
v[m+n/2].Im = ve[m].Im - z.Im;
}
}
return;
}
void ifft( complex *v, int n, complex *tmp )
{
if(n>1) { /* otherwise, do nothing and return */
int k,m;
complex z, w, *vo, *ve;
ve = tmp;
vo = tmp+n/2;
for(k=0; k<n/2; k++) {
ve[k] = v[2*k];
vo[k] = v[2*k+1];
}
ifft( ve, n/2, v ); /* FFT on even-indexed elements of v[] */
ifft( vo, n/2, v ); /* FFT on odd-indexed elements of v[] */
for(m=0; m<n/2; m++) {
w.Re = cos(2*PI*m/(double)n);
w.Im = sin(2*PI*m/(double)n);
z.Re = w.Re*vo[m].Re - w.Im*vo[m].Im; /* Re(w*vo[m]) */
z.Im = w.Re*vo[m].Im + w.Im*vo[m].Re; /* Im(w*vo[m]) */
v[ m ].Re = ve[m].Re + z.Re;
v[ m ].Im = ve[m].Im + z.Im;
v[m+n/2].Re = ve[m].Re - z.Re;
v[m+n/2].Im = ve[m].Im - z.Im;
}
}
return;
}
I found this page talking about long signal convolution https://ccrma.stanford.edu/~jos/sasp/Convolving_Long_Signals.html
But I'm not sure how to use the idea in it. Any thoughts would be truly appreciated and I'm ready to provide more information about my question.
The most common efficient long FIR filter method is to use FFT/IFFT overlap-add (or overlap-save) fast convolution, as per the CCRMA paper you referenced. Just chop your data into shorter blocks more suitable for your FFT library and processor data cache sizes, zero-pad by at least the filter kernel length, FFT filter, and sequentially overlap-add the remainder/tails after each IFFT.
Huge long FFTs will most likely trash your processor's caches, which will likely dominate over any algorithmic O(NlogN) speedup.
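The block bookkeeping of overlap-add can be sketched as follows. To stay self-contained, each block is convolved directly into a scratch buffer; in a fast implementation that inner step would be replaced by an FFT of the zero-padded block, a multiply with the precomputed FFT of the kernel, and an inverse FFT. The function name overlap_add and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Overlap-add convolution of x with h; y must hold len_x + len_h - 1 values.
   Returns 0 on success, -1 if allocation fails. */
static int overlap_add(const double *x, size_t len_x,
                       const double *h, size_t len_h,
                       size_t block, double *y)
{
    assert(len_x > 0 && len_h > 0 && block > 0);
    const size_t seg_len = block + len_h - 1;   /* length of one block result */
    double *seg = malloc(seg_len * sizeof *seg);
    if (seg == NULL)
        return -1;

    memset(y, 0, (len_x + len_h - 1) * sizeof *y);

    for (size_t start = 0; start < len_x; start += block) {
        const size_t n = (len_x - start < block) ? len_x - start : block;

        /* Linear convolution of this block with h (direct form here;
           this is the step an FFT-based version would replace). */
        memset(seg, 0, seg_len * sizeof *seg);
        for (size_t i = 0; i < n; i++)
            for (size_t j = 0; j < len_h; j++)
                seg[i + j] += x[start + i] * h[j];

        /* Add the block result at its offset. Its last len_h - 1 samples
           (the tail) overlap the beginning of the next block's result. */
        for (size_t i = 0; i < n + len_h - 1; i++)
            y[start + i] += seg[i];
    }

    free(seg);
    return 0;
}
```

With x = {1,2,3,4,5}, h = {1,1,1} and a block size of 2, the result is {1,3,6,9,12,9,5}, identical to convolving the whole signal at once. The block size is the tuning knob: block + len_h - 1 should match an FFT length that fits comfortably in cache.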

Calculating the Power spectral density

I am trying to get the PSD of a real data set using the fftw3 library.
To test, I wrote a small program, shown below, that generates a signal following a sinusoidal function:
#include <stdio.h>
#include <math.h>
#define PI 3.14
int main (){
double value= 0.0;
float frequency = 5;
int i = 0 ;
double time = 0.0;
FILE* outputFile = NULL;
outputFile = fopen("sinvalues","wb+");
if(outputFile==NULL){
printf(" couldn't open the file \n");
return -1;
}
for (i = 0; i<=5000;i++){
value = sin(2*PI*frequency*time);
fwrite(&value,sizeof(double),1,outputFile);
time += (1.0/frequency);
}
fclose(outputFile);
return 0;
}
Now I'm reading the output file of the above program and trying to calculate its PSD as shown below:
#include <stdio.h>
#include <fftw3.h>
#include <complex.h>
#include <stdlib.h>
#include <math.h>
#define PI 3.14
int main (){
FILE* inp = NULL;
FILE* oup = NULL;
double* value;// = 0.0;
double* result;
double spectr = 0.0 ;
int windowsSize =512;
double power_spectrum = 0.0;
fftw_plan plan;
int index=0,i ,k;
double multiplier =0.0;
inp = fopen("1","rb");
oup = fopen("psd","wb+");
value=(double*)malloc(sizeof(double)*windowsSize);
result = (double*)malloc(sizeof(double)*(windowsSize)); // what is the length that I have to choose here ?
plan =fftw_plan_r2r_1d(windowsSize,value,result,FFTW_R2HC,FFTW_ESTIMATE);
while(!feof(inp)){
index =fread(value,sizeof(double),windowsSize,inp);
// zero padding
if( index != windowsSize){
for(i=index;i<windowsSize;i++){
value[i] = 0.0;
}
}
// windowing Hann
for (i=0; i<windowsSize; i++){
multiplier = 0.5*(1-cos(2*PI*i/(windowsSize-1)));
value[i] *= multiplier;
}
fftw_execute(plan);
for(i = 0;i<(windowsSize/2 +1) ;i++){ //why only tell the half size of the window
power_spectrum = result[i]*result[i] +result[windowsSize/2 +1 -i]*result[windowsSize/2 +1 -i];
printf("%lf \t\t\t %d \n",power_spectrum,i);
fprintf(oup," %lf \n ",power_spectrum);
}
}
fclose(oup);
fclose(inp);
return 0;
}
I am not sure this approach is correct, but below are the results I have obtained:
Can anyone help me trace the errors in the above approach?
Thanks in advance.
UPDATE
After hartmut's answer I've edited the code but still got the same result:
and the input data look like:
UPDATE
After increasing the sample frequency and using a window size of 2048, here is what I've got:
UPDATE
After using the ADD-ON, here is how the result looks using the window:
You combine the wrong output values to power spectrum lines. There are windowsSize / 2 + 1 real values at the beginning of result and windowsSize / 2 - 1 imaginary values at the end in reverse order. This is because the imaginary components of the first (0Hz) and last (Nyquist frequency) spectral lines are 0.
int spectrum_lines = windowsSize / 2 + 1;
double *power_spectrum = (double *)malloc( sizeof(double) * spectrum_lines ); // an array now, replacing the scalar power_spectrum
power_spectrum[0] = result[0] * result[0]; // DC line: real part only
for ( i = 1 ; i < windowsSize / 2 ; i++ )
    power_spectrum[i] = result[i]*result[i] + result[windowsSize-i]*result[windowsSize-i];
power_spectrum[i] = result[i] * result[i]; // here i == windowsSize / 2: Nyquist line, real part only
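The halfcomplex layout described above can be wrapped in a small helper that also covers odd transform lengths. This is a sketch: the function name halfcomplex_power is hypothetical, and it only reads an array that is already in FFTW's FFTW_R2HC output order, so it does not call FFTW itself.

```c
#include <assert.h>
#include <stddef.h>

/* hc holds n values in halfcomplex order:
   r0, r1, ..., r(n/2), i((n+1)/2 - 1), ..., i2, i1
   so the imaginary part of line k is stored at hc[n - k].
   power must hold n/2 + 1 values. */
static void halfcomplex_power(const double *hc, size_t n, double *power)
{
    assert(n >= 2);

    power[0] = hc[0] * hc[0];               /* DC: imaginary part is zero */

    for (size_t k = 1; k < (n + 1) / 2; k++)
        power[k] = hc[k] * hc[k] + hc[n - k] * hc[n - k];

    if (n % 2 == 0)                         /* Nyquist line exists for even n */
        power[n / 2] = hc[n / 2] * hc[n / 2];
}
```

For n = 4 and hc = {1, 2, 3, 4} (r0 = 1, r1 = 2, r2 = 3, i1 = 4) the helper returns {1, 20, 9}.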
And there is a minor mistake: You should apply the window function only to the input signal and not to the zero-padding part.
ADD-ON:
Your test program generates 5001 samples of a sinusoidal signal, and then you read and analyse the first 512 samples of this signal. The result is that you analyse only a fraction of a period. Due to the hard cut-off of the signal, it contains a wide spectrum of energy with almost unpredictable energy levels, because you do not even use an accurate value of PI but only 3.14, which is not precise enough for any predictable calculation.
You need to guarantee that an integer number of periods fits exactly into your analysis window of 512 samples. Therefore, you should change your test signal creation program so that the test signal contains exactly numberOfPeriods periods (e.g. numberOfPeriods=1 means that one period of the sinusoid is exactly 512 samples long, 2 => 256, 3 => 512/3, 4 => 128, ...). This way, you are able to generate energy at a specific spectral line. Keep in mind that windowSize must have the same value in both programs, because different sizes make this effort useless.
#define PI 3.141592653589793 // This has to be absolutely exact!
int windowSize = 512; // Total number of created samples in the test signal
int numberOfPeriods = 64; // Total number of sinusoid periods in the test signal
for ( n = 0 ; n < windowSize ; ++n ) {
value = sin( (2 * PI * numberOfPeriods * n) / windowSize );
fwrite( &value, sizeof(double), 1, outputFile );
}
Some remarks to your expected output function.
Your input is a function with pure real values.
The result of a DFT has complex values.
So you have to declare the variable out not as double but as fftw_complex *out.
In general the number of dft input values is the same as the number of output values.
However, the output spectrum of a dft contains the complex amplitudes for positive
frequencies as well as for negative frequencies.
In the special case for pure real input, the amplitudes of the positive frequencies are
conjugated complex values of the amplitudes of the negative frequencies.
For that reason, only the frequencies of the positive spectrum are calculated,
which means that the number of complex output values is half of
the number of real input values plus one (n/2 + 1, rounded down).
If your input is a simple sinewave, the spectrum contains only a single frequency component.
This is true for 10, 100, 1000 or even more input samples.
All other values are zero. So it doesn't make any sense to work with a huge number of input values.
If the input data set contains a single period, the complex output value is
contained in out[1].
If the input data set contains M complete periods, in your case 5,
the result is stored in out[5].
I did some modifications on your code. To make some facts more clear.
#include <stdio.h>
#include <math.h>
#include <complex.h>
#include "fftw3.h"
int performDFT(int nbrOfInputSamples, char *fileName)
{
int nbrOfOutputSamples;
double *in;
fftw_complex *out;
fftw_plan p;
// In the case of pure real input data,
// the output values of the positive frequencies and the negative frequencies
// are conjugated complex values.
// This means, that there no need for calculating both.
// If you have the complex values for the positive frequencies,
// you can calculate the values of the negative frequencies just by
// changing the sign of the value's imaginary part
// So the number of complex output values (amplitudes of frequency components)
// is half the number of real input values (amplitudes in the time domain) plus one:
// fftw_plan_dft_r2c_1d writes n/2 + 1 complex values.
nbrOfOutputSamples = nbrOfInputSamples/2 + 1;
// Create a plan for a 1D DFT with real input and complex output
in = (double*) fftw_malloc(sizeof(double) * nbrOfInputSamples);
out = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * nbrOfOutputSamples);
p = fftw_plan_dft_r2c_1d(nbrOfInputSamples, in, out, FFTW_ESTIMATE);
// Read data from input file to input array
FILE* inputFile = NULL;
inputFile = fopen(fileName,"r");
if(inputFile==NULL){
fprintf(stdout,"couldn't open the file %s\n", fileName);
return -1;
}
double value;
int idx = 0;
// Stop at the end of the array or at the first failed read, whichever comes first
while (idx < nbrOfInputSamples && fscanf(inputFile, "%lf", &value) == 1) {
in[idx++] = value;
}
fclose(inputFile);
// Perform the dft
fftw_execute(p);
// Print output results
char outputFileName[] = "dftvalues.txt";
FILE* outputFile = NULL;
outputFile = fopen(outputFileName,"w+");
if(outputFile==NULL){
fprintf(stdout,"couldn't open the file %s\n", outputFileName);
return -1;
}
double realVal;
double imagVal;
double powVal;
double absVal;
fprintf(stdout, " Frequency Real Imag Abs Power\n");
for (idx=0; idx<nbrOfOutputSamples; idx++) {
realVal = out[idx][0]/nbrOfInputSamples; // Indeed, nbrOfInputSamples is correct!
imagVal = out[idx][1]/nbrOfInputSamples; // Indeed, nbrOfInputSamples is correct!
powVal = 2*(realVal*realVal + imagVal*imagVal);
absVal = sqrt(powVal/2);
if (idx == 0) {
powVal /=2;
}
fprintf(outputFile, "%10i %10.4lf %10.4lf %10.4lf %10.4lf\n", idx, realVal, imagVal, absVal, powVal);
fprintf(stdout, "%10i %10.4lf %10.4lf %10.4lf %10.4lf\n", idx, realVal, imagVal, absVal, powVal);
// The total signal power of a frequency is the sum of the power of the positive and the negative frequency line.
// Because only the positive spectrum is calculated, the power is multiplied by two.
// However, there is only one single line in the spectrum for DC.
// This means the DC value must not be doubled.
}
fclose(outputFile);
// Clean up
fftw_destroy_plan(p);
fftw_free(in); fftw_free(out);
return 0;
}
int main(int argc, const char * argv[]) {
// Set basic parameters
float timeIntervall = 1.0; // in seconds
int nbrOfSamples = 50; // number of Samples per time intervall, so the unit is S/s
double timeStep = timeIntervall/nbrOfSamples; // in seconds
float frequency = 5; // frequency in Hz
// The period time of the signal is 1/5Hz = 0.2s
// The number of samples per period is: nbrOfSamples/frequency = (50S/s)/5Hz = 10S
// The number of periods per time intervall is: frequency*timeIntervall = 5Hz*1.0s = (5/s)*1.0s = 5
// Open file for writing signal values
char fileName[] = "sinvalues.txt";
FILE* outputFile = NULL;
outputFile = fopen(fileName,"w+");
if(outputFile==NULL){
fprintf(stdout,"couldn't open the file %s\n", fileName);
return -1;
}
// Calculate signal values and write them to file
double time;
double value;
double dcValue = 0.2;
int idx = 0;
fprintf(stdout, " SampleNbr Signal value\n");
for (idx = 0; idx < nbrOfSamples; idx++){
time = idx * timeStep; // exactly nbrOfSamples samples, so the file holds whole periods
value = sin(2*M_PI*frequency*time) + dcValue;
fprintf(outputFile, "%lf\n",value);
fprintf(stdout, "%10i %15.5f\n",idx, value);
}
fclose(outputFile);
performDFT(nbrOfSamples, fileName);
return 0;
}
If the input of a dft is pure real, the output is complex in any case.
So you have to use the plan r2c (RealToComplex).
If the signal is sin(2*pi*f*t), starting at t=0, the spectrum contains a single frequency line
at f, which is pure imaginary.
If the signal has an offset in phase, like sin(2*pi*f*t+phi), the single line's value is complex.
If your sampling frequency is fs, the range of the output spectrum is -fs/2 ... +fs/2.
The real parts of the positive and negative frequencies are the same.
The imaginary parts of the positive and negative frequencies have opposite signs.
This is called conjugated complex.
If you have the complex values of the positive spectrum you can calculate the values of the
negative spectrum by changing the sign of the imaginary parts.
For this reason there is no need to compute both the positive and the negative spectrum.
One sideband holds all information.
Therefore the number of output samples in the plan r2c is the half+1 of the number
of input samples.
To get the power of a frequency, you have to consider the positive frequency as well
as the negative frequency. However, the plan r2c delivers only the right positive half
of the spectrum. So you have to double the power of the positive side to get the total power.
By the way, the documentation of the fftw3 package describes the usage of plans quite well.
You should invest the time to go over the manual.
I'm not sure what your question is. Your results seem reasonable, with the information provided.
As you must know, the PSD is the Fourier transform of the autocorrelation function. With sine wave inputs, your AC function will be periodic, therefore the PSD will have tones, like you've plotted.
My 'answer' is really some thought starters on debugging. It would be easier for all involved if we could post equations. You probably know that there's a signal processing section on SE these days.
First, you should give us a plot of your AC function. The inverse FT of the PSD you've shown will be a linear combination of periodic tones.
Second, try removing the window, just make it a box or skip the step if you can.
Third, try replacing the DFT with the FFT (I only skimmed the fftw3 library docs, maybe this is an option).
Lastly, try inputting white noise. You can use a Bernoulli dist, or just a Gaussian dist. The AC will be a delta function, although the sample AC will not. This should give you a (sample) white PSD distribution.
I hope these suggestions help.
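For the autocorrelation check suggested above, a biased sample autocorrelation is enough to plot. This is only a sketch: the function name autocorr_biased is hypothetical, and it assumes the mean has already been removed from x if a zero-mean estimate is wanted.

```c
#include <assert.h>
#include <stddef.h>

/* Biased sample autocorrelation:
   r[lag] = (1/n) * sum_{i=0}^{n-1-lag} x[i] * x[i + lag], lag = 0..max_lag.
   r must hold max_lag + 1 values. */
static void autocorr_biased(const double *x, size_t n, size_t max_lag, double *r)
{
    assert(n > 0 && max_lag < n);
    for (size_t lag = 0; lag <= max_lag; lag++) {
        double acc = 0.0;
        for (size_t i = 0; i + lag < n; i++)
            acc += x[i] * x[i + lag];
        r[lag] = acc / (double)n;
    }
}
```

For x = {1,2,3} this gives r = {14/3, 8/3, 1}. A sine input produces a periodic r, and white noise produces a spike at lag 0 with small values elsewhere.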

Kalman Filter implementation - what could be wrong

I am sorry for being this tedious but I reviewed my code several times with the help of a dozen of articles but still my KF doesn't work. By "doesn't work" I mean that the estimates by KF are wrong. Here is a nice paste of Real, Noised and KF estimated positions (just a small chunk).
My example is the same as in every tutorial I've found - I have a state vector of position and velocity. Position is in meters and represents vertical position in air. My real world case is skydiving (with parachute). In my sample generated data I've assumed we start at 3000m and the velocity is 10m/s.
P.S.: I am pretty sure matrix computations are OK - there must be an error with the logic.
Here I generate data:
void generateData(float** inData, float** noisedData, int x, int y){
inData[0][0]= 3000; //start position
inData[1][0]= -10; // 10m/s velocity; minus because we assume it's falling
noisedData[0][0]= 2998;
noisedData[1][0]= -10;
for(int i=1; i<x; i++){
inData[0][i]= inData[0][i-1] + inData[1][i-1];
inData[1][i]= inData[1][i-1]; //the velocity doesn't change for simplicity's sake
noisedData[0][i]=inData[0][i]+(rand()%6-3); //we add noise to real measurement
noisedData[1][i]=inData[1][i]; //velocity has no noise
}
}
And this is my implementation (matrices initialization is based on Wikipedia Kalman example):
int main(int argc, char** argv) {
srand(time(NULL));
float** inData = createMatrix(100,2); //2 rows, 100 columns
float** noisedData = createMatrix(100,2);
float** estData = createMatrix(100,2);
generateData(inData, noisedData, 100, 2);
float sampleRate=0.1; //10hz
float** A=createMatrix(2,2);
A[0][0]=1;
A[0][1]=sampleRate;
A[1][0]=0;
A[1][1]=1;
float** B=createMatrix(1,2);
B[0][0]=pow(sampleRate,2)/2;
B[1][0]=sampleRate;
float** C=createMatrix(2,1);
C[0][0]=1; //we measure only position
C[0][1]=0;
float u=1.0; //acceleration magnitude
float accel_noise=0.2; //acceleration noise
float measure_noise=1.5; //1.5 m standard deviation
float R=pow(measure_noise,2); //measure covariance
float** Q=createMatrix(2,2); //process covariance
Q[0][0]=pow(accel_noise,2)*(pow(sampleRate,4)/4);
Q[0][1]=pow(accel_noise,2)*(pow(sampleRate,3)/2);
Q[1][0]=pow(accel_noise,2)*(pow(sampleRate,3)/2);
Q[1][1]=pow(accel_noise,2)*pow(sampleRate,2);
float** P=createMatrix(2,2); //covariance update
P[0][0]=0;
P[0][1]=0;
P[1][0]=0;
P[1][1]=0;
float** P_est=createMatrix(2,2);
P_est[0][0]=P[0][0];
P_est[0][1]=P[0][1];
P_est[1][0]=P[1][0];
P_est[1][1]=P[1][1];
float** K=createMatrix(1,2); //Kalman gain
float** X_est=createMatrix(1,2); //our estimated state
X_est[0][0]=3000; X_est[1][0]=10;
// !! KALMAN ALGORITHM START !! //
for(int i=0; i<100; i++)
{
float** temp;
float** temp2;
float** temp3;
float** C_trans=matrixTranspose(C,2,1);
temp=matrixMultiply(P_est,C_trans,2,2,1,2); //2x1
temp2=matrixMultiply(C,P_est,2,1,2,2); //1x2
temp3=matrixMultiply(temp2,C_trans,2,1,1,2); //1x1
temp3[0][0]+=R;
K[0][0]=temp[0][0]/temp3[0][0]; // 1. KALMAN GAIN
K[1][0]=temp[1][0]/temp3[0][0];
temp=matrixMultiply(C,X_est,2,1,1,2);
float diff=noisedData[0][i]-temp[0][0]; //diff between meas and est
X_est[0][0]=X_est[0][0]+(K[0][0]*diff); // 2. ESTIMATION CORRECTION
X_est[1][0]=X_est[1][0]+(K[1][0]*diff);
temp=createMatrix(2,2);
temp[0][0]=1; temp[0][1]=0; temp[1][0]=0; temp[1][1]=1;
temp2=matrixMultiply(K,C,1,2,2,1);
temp3=matrixSub(temp,temp2,2,2,2,2);
P=matrixMultiply(temp3,P_est,2,2,2,2); // 3. COVARIANCE UPDATE
temp=matrixMultiply(A,X_est,2,2,1,2);
X_est[0][0]=temp[0][0]+B[0][0]*u;
X_est[1][0]=temp[1][0]+B[1][0]*u; // 4. PREDICT NEXT STATE
temp=matrixMultiply(A,P,2,2,2,2);
float** A_inv=getInverse(A,2);
temp2=matrixMultiply(temp,A_inv,2,2,2,2);
P_est=matrixAdd(temp2,Q,2,2,2,2); // 5. PREDICT NEXT COVARIANCE
estData[0][i]=X_est[0][0]; //just saving here for later to write out
estData[1][i]=X_est[1][0];
}
for(int i=0; i<100; i++) printf("%4.2f : %4.2f : %4.2f \n", inData[0][i], noisedData[0][i], estData[0][i]); // just writing out
return (EXIT_SUCCESS);
}
It looks like you are assuming a rigid body model for the problem. If that is the case, then for the problem you are solving, I would not put in the input u when you do the process update to predict the next state. Maybe I am missing something but the input u does not play any role in generating the data.
Let me put it another way, setting u to +1 looks like your model is assuming that the body should move in the +x direction because there is an input in that direction, but the measurement is telling it to go the other way. So if you put a lot of weight on the measurements, it's going to go in the -ve direction, but if you put a lot of weight on the model, it should go in the +ve direction. Anyway, based on the data generated, I don't see a reason for setting u to anything but zero.
Another thing: your sample period is 0.1 s (the value you call the sampling rate), but when you generate the data you are effectively assuming one second per sample, since the position changes by a full -10 meters every sample at -10 meters per second.
Here is a MATLAB/Octave implementation.
l = 1000;
Ts = 0.1;
y = 3000; %measurement to be fed to KF
v = -10; % METERS PER SECOND
t = [y(1);v]; % truth for checking if its working
for i=2:l
y(i) = y(i-1) + (v)*Ts;
t(:,i) = [y(i);v]; % copy to truth vector
y(i) = y(i) + randn; % noise it up
end
%%%%% Let the filtering begin!
% Define dynamics
A = [1, Ts; 0, 1];
B = [0;0];
C = [1,0];
% Steady State Kalman Gain computed for R = 0.1, Q = [0,0;0,0.1]
K = [0.44166;0.79889];
x_est_post = [3000;0];
for i=2:l
x_est_pre = A*x_est_post(:,i-1); % Process update! That is our estimate in case no measurement comes in.
%%% OMG A MEASUREMENT!
x_est_post(:,i) = x_est_pre + K*(-x_est_pre(1)+y(i));
end
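Since the question is in C, here is a rough C sketch of the same fixed-gain filter. The function name `steady_state_kf` and its signature are made up for illustration; the gain values are the ones quoted in the listing above, and the caller is assumed to supply the measurement array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: runs the fixed-gain (steady-state) Kalman filter over
   n measurements y[0..n-1] taken every Ts seconds.
   x[0] is position and x[1] is velocity; the caller sets the initial guess
   and the final estimate is left in x on return. */
void steady_state_kf(const double *y, size_t n, double Ts, double x[2])
{
    /* Steady-state gain quoted above (computed for R = 0.1, Q = [0,0;0,0.1]). */
    const double K0 = 0.44166, K1 = 0.79889;

    assert(y != NULL && x != NULL);

    for (size_t i = 1; i < n; i++) {
        /* Process update with A = [1 Ts; 0 1] and no input term (B = 0). */
        double pos_pre = x[0] + Ts * x[1];
        double vel_pre = x[1];

        /* Measurement update with C = [1 0]: correct by gain * innovation. */
        double innov = y[i] - pos_pre;
        x[0] = pos_pre + K0 * innov;
        x[1] = vel_pre + K1 * innov;
    }
}
```

Starting from `x = {3000, 0}` and feeding it noise-free positions that drop by `10 * Ts` per sample, the velocity estimate should settle at about -10. Note that the gain is only appropriate for the R, Q and Ts it was computed for.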
You are doing a lot of weird array indexing.
float** A=createMatrix(2,2);
A[0][0]=1;
A[0][3]=sampleRate;
A[1][0]=0;
A[1][4]=1;
What is the expected outcome of indexing outside of the bounds of the array?
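For comparison, a minimal sketch of filling a 2x2 matrix with in-range indices only. `create_matrix` and `free_matrix` are hypothetical stand-ins (the question's `createMatrix` is not shown, and its argument order may differ), and I am assuming the intended matrix is A = [1 Ts; 0 1].

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the question's createMatrix():
   allocates a rows x cols matrix of floats, zero-initialised. */
float **create_matrix(int rows, int cols)
{
    assert(rows > 0 && cols > 0);
    float **m = calloc((size_t)rows, sizeof *m);
    if (!m) return NULL;
    for (int r = 0; r < rows; r++) {
        m[r] = calloc((size_t)cols, sizeof **m);
        if (!m[r]) {                    /* undo the partial allocation */
            while (r-- > 0) free(m[r]);
            free(m);
            return NULL;
        }
    }
    return m;
}

/* Releases a matrix obtained from create_matrix(). */
void free_matrix(float **m, int rows)
{
    if (!m) return;
    for (int r = 0; r < rows; r++) free(m[r]);
    free(m);
}

/* Builds A = [1 Ts; 0 1]. For a 2x2 matrix the only valid indices are 0 and 1. */
float **make_state_transition(float Ts)
{
    float **A = create_matrix(2, 2);
    if (!A) return NULL;
    A[0][0] = 1.0f;  A[0][1] = Ts;
    A[1][0] = 0.0f;  A[1][1] = 1.0f;
    return A;
}
```

Anything like `A[0][3]` on such a matrix reads or writes past the end of the row, which is undefined behaviour in C.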

KissFFT output of kiss_fftr

I'm receiving PCM data through a socket connection in packets containing 320 samples. The sample rate of the sound is 8000 samples per second. I am doing something like this with it:
int size = 160 * 2;//160;
int isinverse = 1;
kiss_fft_scalar zero;
memset(&zero,0,sizeof(zero));
kiss_fft_cpx fft_in[size];
kiss_fft_cpx fft_out[size];
kiss_fft_cpx fft_reconstructed[size];
kiss_fftr_cfg fft = kiss_fftr_alloc(size*2 ,0 ,0,0);
kiss_fftr_cfg ifft = kiss_fftr_alloc(size*2,isinverse,0,0);
for (int i = 0; i < size; i++) {
fft_in[i].r = zero;
fft_in[i].i = zero;
fft_out[i].r = zero;
fft_out[i].i = zero;
fft_reconstructed[i].r = zero;
fft_reconstructed[i].i = zero;
}
// got my data through socket connection
for (int i = 0; i < size; i++) {
// samples are type of short
fft_in[i].r = samples[i];
fft_in[i].i = zero;
fft_out[i].r = zero;
fft_out[i].i = zero;
}
kiss_fftr(fft, (kiss_fft_scalar*) fft_in, fft_out);
kiss_fftri(ifft, fft_out, (kiss_fft_scalar*)fft_reconstructed);
// lets normalize samples
for (int i = 0; i < size; i++) {
short* samples = (short*) bufTmp1;
samples[i] = rint(fft_reconstructed[i].r/(size*2));
}
After that I fill OpenAL buffers and play them. Everything works just fine, but I would like to do some filtering of the audio between kiss_fftr and kiss_fftri. I think the starting point for this is to convert the sound from the time domain to the frequency domain, but I don't really understand what kind of data I'm receiving from the kiss_fftr function. What information is stored in each of those complex numbers, and what can the real and imaginary parts tell me about frequency? I also don't know which frequencies (what frequency span) are covered in fft_out, that is, which indexes correspond to which frequencies.
I am a total newbie in signal processing and Fourier transform topics.
Any help?
Before you jump with both feet into a C implementation, get familiar with digital filters, especially FIR filters.
You can design the FIR filter using something like GNU Octave's signal toolbox. Look at the commands fir1 (the simplest), firls, or remez. Alternatively, you might be able to design a FIR filter through a web page. A quick web search for "online fir filter design" found this (I have not used it, but it appears to use the equiripple design used in the remez or firpm command).
Try implementing your filter first with a direct convolution (without FFTs) and see if the speed is acceptable -- that is an easier path. If you need an FFT-based approach, there is a sample implementation of overlap-save in the kissfft/tools/kiss_fastfir.c file.
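A rough sketch of the direct-convolution route, assuming you already have the taps `h` from a design tool. `fir_filter_block` is a made-up name; the `state` array carries the last `ntaps - 1` input samples from one packet to the next so that the packet boundaries do not click.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: direct-form FIR filter, y[i] = sum_k h[k] * x[i-k].
   h     : ntaps filter coefficients
   state : the previous ntaps-1 input samples, oldest first (zero it before
           the first call); updated on return for the next block
   x, y  : n input and n output samples (y must not alias x) */
void fir_filter_block(const float *h, size_t ntaps, float *state,
                      const float *x, float *y, size_t n)
{
    assert(ntaps >= 1);
    size_t hist = ntaps - 1;

    for (size_t i = 0; i < n; i++) {
        float acc = 0.0f;
        for (size_t k = 0; k < ntaps; k++) {
            /* x[i-k] if it is inside this block, otherwise from the history. */
            float s = (k <= i) ? x[i - k] : state[hist + i - k];
            acc += h[k] * s;
        }
        y[i] = acc;
    }

    /* Keep the newest hist input samples for the next call. */
    for (size_t j = 0; j < hist; j++) {
        size_t idx = j + n;
        state[j] = (idx < hist) ? state[idx] : x[idx - hist];
    }
}
```

With 320-sample packets at 8 kHz you would call this once per packet, converting the `short` samples to `float` first. The cost is `n * ntaps` multiply-adds per packet, which is often fine for short filters.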
I will try to answer your questions directly.
// a) The real and imaginary components of the output need to be combined to calculate the amplitude at each frequency.
float ar,ai,scaling;
scaling=1.0f/(float)size; // size here is the FFT length (the nfft given to kiss_fftr_alloc)
// then for each output [i] from the FFT...
ar = fft_out[i].r;
ai = fft_out[i].i;
amplitude[i] = 2.0f * sqrtf( ar*ar + ai*ai ) * scaling ;
// b) Which index refers to which frequency? Bin i sits at i * (sample rate) / (FFT length). Only the first half of the FFT results are needed (assuming your 8 kHz sampling rate).
// The float constant keeps the arithmetic in floating point, so nothing truncates to zero.
for(i=1;i<(size/2);i++) freq[i] = (float)i * 8000.0f / (float)size ;
// c) Phase (range +/- PI) for each frequency is calculated like this (atan2f takes the imaginary and real parts as two separate arguments):
phase[i] = atan2f(fft_out[i].i, fft_out[i].r);
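To make the index-to-frequency mapping concrete, here is a small sketch with no FFT library involved. The helper names are invented for this example; the only facts relied on are that a real-input FFT of length nfft yields bins 0 to nfft/2, and that bin k sits at k * fs / nfft.

```c
#include <assert.h>
#include <stddef.h>

/* Number of complex output bins from a real-input FFT of length nfft
   (bins 0 .. nfft/2 inclusive). */
size_t real_fft_bins(size_t nfft)
{
    return nfft / 2 + 1;
}

/* Centre frequency in Hz of bin k, for FFT length nfft and sample rate fs. */
double bin_to_hz(size_t k, size_t nfft, double fs)
{
    assert(nfft > 0 && k <= nfft / 2);
    return (double)k * fs / (double)nfft;
}

/* Nearest bin index for a frequency between 0 and fs/2. */
size_t hz_to_bin(double hz, size_t nfft, double fs)
{
    assert(nfft > 0 && hz >= 0.0 && hz <= fs / 2.0);
    return (size_t)(hz * (double)nfft / fs + 0.5);
}
```

For example, with a 320-point transform at 8000 Hz the bins are 25 Hz apart, there are 161 of them, and the last one (index 160) is at 4000 Hz. If you allocate the transform with a different length, the spacing changes accordingly.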
What you might want to investigate is FFT fast convolution using the overlap-add or overlap-save algorithms. You will need to extend the length of each FFT by the length of the impulse response of your desired filter. This is because (1) FFT/IFFT convolution is circular, and (2) each index in the FFT array result corresponds to almost all frequencies (a sinc-shaped response), not just one (even if mostly near one), so any single-bin modification will leak throughout the entire frequency response (except at certain exactly periodic frequencies).
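Here is a sketch of just the overlap-add bookkeeping. To keep it self-contained, the per-block convolution is done directly in the time domain; in an actual fast-convolution version that inner step would be replaced by a zero-padded FFT, a complex multiply and an inverse FFT, all of length at least L + M - 1. The function name and argument layout are my own.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: processes one block with overlap-add.
   x    : L input samples
   h    : M filter taps
   tail : M-1 samples carried between calls (zero it before the first call)
   y    : L output samples
   work : scratch buffer of L+M-1 floats */
void overlap_add_block(const float *x, size_t L, const float *h, size_t M,
                       float *tail, float *y, float *work)
{
    assert(M >= 1 && L >= M - 1);   /* tail then only reaches the next block */
    size_t N = L + M - 1;           /* length of the full linear convolution */

    /* Linear convolution of this block with h (the step an FFT of length
       >= N would perform in a fast-convolution implementation). */
    memset(work, 0, N * sizeof *work);
    for (size_t i = 0; i < L; i++)
        for (size_t k = 0; k < M; k++)
            work[i + k] += x[i] * h[k];

    /* Add the tail left over from the previous block. */
    for (size_t j = 0; j + 1 < M; j++)
        work[j] += tail[j];

    /* First L samples are finished output; the last M-1 become the new tail. */
    memcpy(y, work, L * sizeof *y);
    memcpy(tail, work + L, (M - 1) * sizeof *tail);
}
```

The point to notice is that each block produces L + M - 1 samples, not L. That extra M - 1 is why the FFT has to be longer than the packet; if it is not, the tail wraps around to the start of the block (circular convolution) instead of being carried over.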

Resources