Optimising radix-2 FFT C code - c

I'm a beginner in C programming.
I am currently working on a project that requires a 1024-point FFT implementation using radix-2 decimation-in-frequency.
I attach the FFT function C code below.
How can I increase the performance by modifying the C code?
#include "i_cmplx.h" /* definition of the complex type */
#include "twiddle1024.h" /* quantised and scaled Twiddle factors */
#define LL 1024 /* Maximum length of FFT */
/* fft radix-2 function using Decimation In Frequency */
//#pragma CODE_SECTION(fft, "mycode"); // Makes the program run from internal memory
void fft(COMPLEX *Y, int N) /* FFT(input sample array, # of points) */
{
int temp1R, temp1I, temp2R,temp2I; /* 32 bits temporary storage for */
/* intermediate results */
short tempR, tempI, c, s; /* 16 bits temporary storages */
/* variables */
int TwFStep, /* Step between twiddle factors */
TwFIndex, /* Index of twiddle factors */
BLStep, /* Step for incrementing butterfly index */
BLdiff, /* Difference between upper and lower butterfly legs */
upperIdx,
lowerIdx, /* upper and lower indexes of butterfly leg */
i, j, k; /* loop control variables */
BLdiff=N;
TwFStep=1;
for(k=N;k>1;k=(k>>1)) /* Do Log(base 2)(N) Stages */
{
BLStep=BLdiff;
BLdiff=BLdiff>>1;
TwFIndex=0;
for(j=0;j<BLdiff;j++)/* Nbr of twiddle factors to use=BLDiff */
{
c=w[TwFIndex].real;
s=w[TwFIndex].imag;
TwFIndex=TwFIndex+TwFStep;
/* Now do N/BLStep butterflies */
for(upperIdx=j;upperIdx<N;upperIdx+=BLStep)
{
/* Calculations inside this loop avoid overflow by shifting right once
the result of every addition/subtraction and by shifting right 15
places the result of every multiplication. Double precision temporary
results (32-bit) are used in order to avoid losing information because
of overflow. Final DFT result is scaled down by N (number of points), i.e.,
2^(Nbr of stages) = 2^(log(base 2) N) = N */
lowerIdx=upperIdx+BLdiff;
temp1R = (Y[upperIdx].real - Y[lowerIdx].real)>>1;
temp2R = (Y[upperIdx].real + Y[lowerIdx].real)>>1;
Y[upperIdx].real = (short) temp2R;
temp1I = (Y[upperIdx].imag - Y[lowerIdx].imag)>>1;
temp2I = (Y[upperIdx].imag + Y[lowerIdx].imag)>>1;
Y[upperIdx].imag = (short) temp2I;
temp2R = (c*temp1R - s*temp1I)>>15;
Y[lowerIdx].real = (short) temp2R;
temp2I = (c*temp1I + s*temp1R)>>15;
Y[lowerIdx].imag = (short) temp2I;
}
}
TwFStep = TwFStep<<1; /* update separation of twiddle factors */
}
/* bit reversal for resequencing data */
j=0;
for (i=1;i<(N-1);i++)
{
k=N/2;
while (k<=j)
{
j = j-k;
k=k/2;
}
j=j+k;
if (i<j)
{
tempR=Y[j].real;
tempI=Y[j].imag;
Y[j].real=Y[i].real;
Y[j].imag=Y[i].imag;
Y[i].real=tempR;
Y[i].imag=tempI;
}
}
return;
}
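One possible starting point, offered as a sketch rather than a measured improvement: the inner butterfly can read each array element once into a local variable, so the compiler does not have to reload Y[upperIdx] and Y[lowerIdx] after every store. The `cplx16` type below is a hypothetical stand-in for COMPLEX (its definition in i_cmplx.h is not shown in the question), and `dif_butterfly_q15` is an illustrative name. Like the original code, it assumes that right-shifting a negative int is an arithmetic shift, which is implementation-defined in C.

```c
#include <stdint.h>

/* Hypothetical stand-in for COMPLEX from i_cmplx.h (assumed 16-bit fields). */
typedef struct { int16_t real; int16_t imag; } cplx16;

/* One radix-2 DIF butterfly with the same scaling as the loop body above:
   sums and differences are halved, products are shifted right by 15.
   Each input is loaded once into a 32-bit local before any store. */
void dif_butterfly_q15(cplx16 *up, cplx16 *lo, int16_t c, int16_t s)
{
    int32_t ur = up->real, ui = up->imag;
    int32_t lr = lo->real, li = lo->imag;
    int32_t dr = (ur - lr) >> 1;          /* halved difference, real part */
    int32_t di = (ui - li) >> 1;          /* halved difference, imag part */

    up->real = (int16_t)((ur + lr) >> 1); /* upper leg: halved sum */
    up->imag = (int16_t)((ui + li) >> 1);
    lo->real = (int16_t)((c * dr - s * di) >> 15); /* lower leg: diff * twiddle */
    lo->imag = (int16_t)((c * di + s * dr) >> 15);
}
```

Whether this is faster depends on the compiler and target, so it is worth timing before and after. The bit-reversal loop is another candidate: for a fixed N of 1024 its swap pairs could be precomputed into a table.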

Related

Why does my audio output increase by 100Hz on each cycle?

I have a bug in my audio code.
Expected behavior: sinewave output, sweeping from 100Hz to 200Hz, resetting to 100Hz every second
Actual behavior: sinewave output, sweeping from 100Hz to 200Hz, but then rising 100Hz on each cycle, so on the second cycle it will sweep from 200Hz to 300Hz, then from 300Hz to 400Hz, and so on...
I'm generating a 1Hz rising sawtooth wave, and scaling and offsetting it so it rises from 100 to 200 every second. I'm also printing its value, which shows that it's behaving as expected.
But for some reason, if I use that value as frequency for my sinewave, the resulting sound rises 100Hz on each cycle.
Plugging a fixed frequency into my sinewave function works as expected.
It's only when I use the two together that I'm getting the bug. The thing I really can't explain is that the bug is only in the output audio -- the printed values are still all fine.
I'm using miniaudio as the audio backend, and it's the only dependency. It should compile without errors or warnings on Win, Linux and Mac.
It's a single header library, you only need to include miniaudio.h, so it should be easy to replicate.
Here is my code:
/*
compiling on Win10 with GCC:
gcc -g0 test_nodep.c -o test_nodep.exe -Wall -Wextra -Wshadow -Wvla -pedantic-errors -ansi
*/
#include <stdio.h>
#include <math.h>
#include <float.h>
#include <stdint.h>
#define MA_NO_DECODING
#define MA_NO_ENCODING
#define MINIAUDIO_IMPLEMENTATION
#include "miniaudio.h" /* https://github.com/mackron/miniaudio - single header file audio os backend library */
/* global variables */
int32_t DEVICE_FORMAT = ma_format_f32; /* 32-bit float */
int32_t DEVICE_CHANNELS = 1; /* mono */
int32_t DEVICE_SAMPLE_RATE = 48000;
float clock = 0;
float time = 0;
static __inline__ float tik(float interval, float len, float offset){
return (len<=0)*(fmod(time-offset, interval)==0) +
(len>0)*((fmod(time-offset, interval)>=0)&&(fmod(time-offset, interval)<=(len)));
}
void data_callback(ma_device* pDevice, void* pOutput, const void* pInput, ma_uint32 frameCount){
float* Samples = pOutput;
ma_uint32 SampleIndex;
/* audio-callback variable definitions */
float test_saw;
float test_saw_freq = 1.f;
float i;
for(SampleIndex = 0; SampleIndex < frameCount; SampleIndex++){
test_saw = fmod(clock, (DEVICE_SAMPLE_RATE/test_saw_freq))/(DEVICE_SAMPLE_RATE/test_saw_freq); /* 1Hz rising saw, output range [0..1] */
test_saw = test_saw * 100.f + 100.f; /* shift range into [100..200] */
if(tik(.125f,0.f,0.f)){ /* this is to print the test_saw value every 1/8 of a second */
printf("== test_saw: %.2f", test_saw);
for(i=0.f;i<test_saw/10.f;i++){
printf(" ");
}
printf("%c\n", 254);
}
/* this is the output function, a sinewave, with frequency sweeping continuously from 100Hz to 200Hz */
/* f(t) = sin(2*PI * frequency * time) */
/* instead of a fixed frequency, I'm using test_saw, sweeping from 100Hz to 200Hz every second */
*Samples = (float)sin((double)(time * MA_TAU * test_saw));
/* using the same function with a fixed frequency works as expected, no problems */
/* *Samples = (float)sin((double)(time * MA_TAU * 100.f)); */
clock++;
clock*=(clock<FLT_MAX); /* continuously rising value, +1 on each sample, zeroes out when float is at its max value, to prevent float overflow */
time = clock/DEVICE_SAMPLE_RATE; /* same value, in seconds */
Samples++;
}
(void)pDevice;
(void)pInput;
}
int main(){
ma_device_config deviceConfig;
ma_device device;
/* audio output device configuration */
deviceConfig = ma_device_config_init(ma_device_type_playback); /* initialize for playback */
deviceConfig.playback.format = DEVICE_FORMAT;
deviceConfig.playback.channels = DEVICE_CHANNELS;
deviceConfig.sampleRate = DEVICE_SAMPLE_RATE;
deviceConfig.dataCallback = data_callback;
/* audio output device initialization */
if(ma_device_init(NULL, &deviceConfig, &device) != MA_SUCCESS){
printf("Failed to open playback device.\n");
return -4;
}
printf("== Device Name: %s\n", device.playback.name);
printf("== Sample Rate: %u Hz\n", DEVICE_SAMPLE_RATE);
if (ma_device_start(&device) != MA_SUCCESS) {
printf("== Failed to start playback device.\n");
ma_device_uninit(&device);
return -5;
}
printf("~~~ You should hear sound now ~~~\n");
printf("== Press Enter to quit...");
getchar();
ma_device_uninit(&device); /* turn off sound */
return 0;
}

Not possible to do CFFT Frequency binning with CMSIS on STM32?

At the moment I am attempting to implement a program for finding 3 frequencies (xs = 30.1 kHz, ys = 28.3 kHz and zs = 25.9 kHz) through the use of the CMSIS pack on the STM32F411RE board. I cannot get the Complex FFT (CFFT) and complex magnitude working correctly.
To match the frequency bins, I generate an array containing the bin frequencies so that I can manually look up which bin indexes the signals xs, ys and zs fall on. I then use these indexes to look at the 3 FFT results (Xfft, Yfft, Zfft) to find the values for these signals, but they don't match up.
I use the following order of functions:
DMA ADC Buffer: HAL_ADC_ConvHalfCpltCallback(ADC_HandleTypeDef* hadc)
Frequency bins in binfreqs
Change ADC input to float Xfft
CFFT_F32: arm_cfft_f32(&arm_cfft_sR_f32_len1024, Xfft, 0, 0);
Complex Mag: arm_cmplx_mag_f32(Xfft, Xdsp, fftLen);
// ADC Stuff done via DMA, working correctly
int main(void)
{
HAL_Init();
SystemClock_Config();
MX_GPIO_Init();
MX_DMA_Init();
MX_ADC1_Init();
MX_USART2_UART_Init();
HAL_ADC_Start_DMA(&hadc1, adc_buffer, bufferLen); // adc_buffer needs to be an uint32_t
while (1)
{
/**
* Generate the frequencies
*/
for (int binfreqs = 0; binfreqs < fftLen; binfreqs++) // Generates the frequency bins to relate the amplitude to an actual value, rather than a bin frequency value
{
fftFreq[binfreqs] = binfreqs * binSize;
}
/*
* Find the amplitudes associated with the 3 emitter frequencies and store in an array for each axis. By default these arrays are generated with signal strength 0
* and with frequency index at 0: because of system limits these will indicate invalid values, as system range is from 10 - 60 kHz.
*/
volatile int32_t X_mag[3][4] = // x axis values: [index][frequency][signal_strength][phase]
{
{0, Xfreq, 0, 0}, // For x-freq index [0][0], frequency [0][1] associated with 1st biggest amplitude [0][2], phase [0][3]
{0, Yfreq, 0, 0}, // Ditto for y-freq
{0, Zfreq, 0, 0} // Ditto for z-freq
};
/*
* Finds the index in fftFreq corresponding to respectively X, Y and Z emitter frequencies
*/
for(int binSearch = 0; binSearch < fftLen; binSearch++)
{
if(fftFreq[binSearch] == Xfreq) // Find index for X emitter frequency
{
X_mag[0][0] = binSearch;
}
if(fftFreq[binSearch] == Yfreq) // Find index for Y emitter frequency
{
X_mag[1][0] = binSearch;
}
if(fftFreq[binSearch] == Zfreq) // Find index for Z emitter frequency
{
X_mag[2][0] = binSearch;
}
}
Signal processing
/* Signal processing algorithms --------------------------------------------------
*
* Only to be run once with fresh data from the buffer, [do not run continuous] or position / orientation data will be repeated.
* So only run once when conversionPaused
*/
if(conversionPaused)
{
/*
* Convert signal to voltage (12-bit, 4096)
*/
for (int floatVals = 0; floatVals < fftLen; floatVals++)
{
Xfft[floatVals] = (float) Xin[floatVals] * 3.6f / 4096.0f;
}
/*
* Fourier transform
*/
arm_cfft_f32(&arm_cfft_sR_f32_len1024, Xfft, 0, 0); // Calculate complex fourier transform of x time signal, processing occurs in place
for (int fix_fft = 0 ; fix_fft < half_fftLen ; fix_fft++)
{
Xfft[fix_fft] = 2 * Xfft[fix_fft] / fftLen;
Xfft[fix_fft + half_fftLen] = 0;
}
/*
* Amplitude calculation
*/
arm_cmplx_mag_f32(Xfft, Xdsp, fftLen); // Calculate the magnitude of the fourier transform for x axis
/*
* Finds all signal strengths for allocated frequency indexes
*/
for(int strength_index = 0; strength_index < 3; strength_index++) // Loops through xyz frequencies for all 3 magnetometer axis
{
int x_temp_index = X_mag[strength_index][0]; // temp int necessary to store the strength, otherwise infinite loop?
X_mag[strength_index][2] = Xfft[x_temp_index]; // = Xfft[2*x_temp_index];
}
conversionPaused = 0;
}
} // While() end
} // Main() end
I do not know how to calculate the frequency bins for this combination of CFFT and complex magnitude, as I would expect the even indexes of the array to hold the real values and the odd indexes to hold the imaginary values. I referred to several examples but could not work out what I am doing wrong in my code.
However, when applying an input signal of 30.1 kHz, neither the 301 bin index nor the 602 bin index holds the expected output, as the images show.
[Image: 301 bin index]
[Image: 602 bin index]
EDIT:
I have since tried to implement the arm_cfft_f32 example given here. This latter example is completely broken as the external 10 kHz dataset is no longer included by default and trying to include it is not possible, as the program behaves poorly and keeps erroring about a return data type that is not even present in the first place. Thus I cannot use the example program given for this: it also appears to be 4 years out of date, so that is not surprising.
The arm_max_f32() function also proved not fruitful as it keeps homing in on the noise generated at bin 0 via using an analog generated signal. Manually setting this bin 0 to equal 0 then upsets the algorithm which starts pointing to random values that are not even the largest value present in the system.
Even when going manually through the CFFT data and magnitude it appears as if they are not working correctly. There are random noise values all over the spectrum parts, whilst the oscilloscope confirms that large outcomes should only be present at 0 Hz and the selected signal generator frequency (thus corresponding to a frequency bin).
Using CMSIS is extremely frustrating for me because of the little documentation and examples available, which is then further reduced by most of it simply not working (without major modification).
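For reference, arm_cfft_f32 works in place on interleaved complex data: for an FFT of length N the buffer holds 2*N floats, with the real part of sample i at index 2*i and the imaginary part at 2*i + 1. arm_cmplx_mag_f32 then turns N complex values into N magnitudes, so after the magnitude step bin k sits at plain index k. The sketch below shows the packing and the bin lookup in portable C without calling CMSIS. `pack_real_as_complex` and `freq_to_bin` are hypothetical helper names, and the sample rate is a parameter because the question does not state it.

```c
#include <stddef.h>

/* Copy n real samples into an interleaved complex buffer of 2*n floats:
   out[2*i] is the real part, out[2*i + 1] the imaginary part (zero). */
void pack_real_as_complex(const float *in, size_t n, float *out)
{
    size_t i;
    for (i = 0; i < n; i++) {
        out[2 * i]     = in[i];
        out[2 * i + 1] = 0.0f;
    }
}

/* Nearest bin index for freq_hz: bin = freq * N / fs, rounded. */
size_t freq_to_bin(float freq_hz, float sample_rate_hz, size_t fft_len)
{
    return (size_t)(freq_hz * (float)fft_len / sample_rate_hz + 0.5f);
}
```

With a 1024-point FFT and an assumed sample rate of 102.4 kHz (100 Hz per bin), 30.1 kHz lands in bin 301 of the magnitude array. Computing the index this way also avoids comparing floating-point frequencies for exact equality.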

How to write C code for a long signal and long kernel convolution

I would like to do a linear convolution of a signal of length 4000*270 with a kernel of length 16000. The signal is not fixed, while the kernel is fixed. This needs to be repeated many times for my purpose, so I want to improve the speed as much as possible. I can implement this convolution in either R or C.
At first, I tried doing the convolution in R, but the speed does not satisfy my needs. I tried doing it by iteration and it was too slow. I also tried doing it using FFT, but because both signal and kernel are long, FFT didn't improve the speed much.
Then I decided to do the convolution iteratively in C. But C does not seem to be able to handle this amount of calculation and reported errors very often. Even when it works, it is still very slow. I also tried doing FFT convolution in C, but the program always shut down.
I got this code from a friend of mine and am not sure about the original source. I will delete it if there is a copyright issue. This is the C code I used for doing the FFT in C, but the program cannot handle the long vector of length 2097152 (the smallest power of 2 greater than or equal to the signal vector length).
#define q 3 /* for 2^3 points */
#define N 2097152 /* N-point FFT, iFFT */
typedef float real;
typedef struct{real Re; real Im;} complex;
#ifndef PI
# define PI 3.14159265358979323846264338327950288
#endif
void fft( complex *v, int n, complex *tmp )
{
if(n>1) { /* otherwise, do nothing and return */
int k,m;
complex z, w, *vo, *ve;
ve = tmp;
vo = tmp+n/2;
for(k=0; k<n/2; k++) {
ve[k] = v[2*k];
vo[k] = v[2*k+1];
}
fft( ve, n/2, v ); /* FFT on even-indexed elements of v[] */
fft( vo, n/2, v ); /* FFT on odd-indexed elements of v[] */
for(m=0; m<n/2; m++) {
w.Re = cos(2*PI*m/(double)n);
w.Im = -sin(2*PI*m/(double)n);
z.Re = w.Re*vo[m].Re - w.Im*vo[m].Im; /* Re(w*vo[m]) */
z.Im = w.Re*vo[m].Im + w.Im*vo[m].Re; /* Im(w*vo[m]) */
v[ m ].Re = ve[m].Re + z.Re;
v[ m ].Im = ve[m].Im + z.Im;
v[m+n/2].Re = ve[m].Re - z.Re;
v[m+n/2].Im = ve[m].Im - z.Im;
}
}
return;
}
void ifft( complex *v, int n, complex *tmp )
{
if(n>1) { /* otherwise, do nothing and return */
int k,m;
complex z, w, *vo, *ve;
ve = tmp;
vo = tmp+n/2;
for(k=0; k<n/2; k++) {
ve[k] = v[2*k];
vo[k] = v[2*k+1];
}
ifft( ve, n/2, v ); /* FFT on even-indexed elements of v[] */
ifft( vo, n/2, v ); /* FFT on odd-indexed elements of v[] */
for(m=0; m<n/2; m++) {
w.Re = cos(2*PI*m/(double)n);
w.Im = sin(2*PI*m/(double)n);
z.Re = w.Re*vo[m].Re - w.Im*vo[m].Im; /* Re(w*vo[m]) */
z.Im = w.Re*vo[m].Im + w.Im*vo[m].Re; /* Im(w*vo[m]) */
v[ m ].Re = ve[m].Re + z.Re;
v[ m ].Im = ve[m].Im + z.Im;
v[m+n/2].Re = ve[m].Re - z.Re;
v[m+n/2].Im = ve[m].Im - z.Im;
}
}
return;
}
I found this page talking about long signal convolution https://ccrma.stanford.edu/~jos/sasp/Convolving_Long_Signals.html
But I'm not sure how to use the idea in it. Any thoughts would be truly appreciated and I'm ready to provide more information about my question.
The most common efficient method for long FIR filters is FFT/IFFT overlap-add (or overlap-save) fast convolution, as described on the CCRMA page you referenced. Chop your data into shorter blocks better suited to your FFT library and processor data cache sizes, zero-pad each block by at least the filter kernel length, multiply by the FFT of the kernel, and overlap-add the tail of each IFFT result into the start of the next block.
Very long FFTs will most likely thrash your processor's caches, and the resulting cache misses will likely outweigh any algorithmic O(N log N) speedup.
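The block structure of overlap-add can be sketched as follows. To keep the example short and free of dependencies, each block is convolved directly here rather than with an FFT; that is a deliberate simplification, and in real use `conv_block` would be replaced by an FFT, a multiply with the precomputed kernel spectrum, and an IFFT. All function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Direct linear convolution of one block; out receives xlen + hlen - 1 values.
   Stand-in for the FFT / multiply / IFFT step of fast convolution. */
static void conv_block(const float *x, size_t xlen,
                       const float *h, size_t hlen, float *out)
{
    size_t i, j;
    memset(out, 0, (xlen + hlen - 1) * sizeof *out);
    for (i = 0; i < xlen; i++)
        for (j = 0; j < hlen; j++)
            out[i + j] += x[i] * h[j];
}

/* Overlap-add: split x into blocks, convolve each block with h, and add
   each result into y at the block's offset, so the tails overlap.
   y must hold xlen + hlen - 1 floats, scratch must hold block + hlen - 1. */
void overlap_add(const float *x, size_t xlen, const float *h, size_t hlen,
                 size_t block, float *y, float *scratch)
{
    size_t start, k;

    if (block == 0 || xlen == 0 || hlen == 0)
        return;
    memset(y, 0, (xlen + hlen - 1) * sizeof *y);
    for (start = 0; start < xlen; start += block) {
        size_t len = (xlen - start < block) ? (xlen - start) : block;
        conv_block(x + start, len, h, hlen, scratch);
        for (k = 0; k < len + hlen - 1; k++)
            y[start + k] += scratch[k];   /* tail overlaps the next block */
    }
}
```

Since the kernel is fixed, its FFT only needs to be computed once and can be reused for every block and every signal.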

Calculating the Power spectral density

I am trying to get the PSD of a real data set using the fftw3 library.
To test, I wrote the small program shown below, which generates a signal that follows a sinusoidal function:
#include <stdio.h>
#include <math.h>
#define PI 3.14
int main (){
double value= 0.0;
float frequency = 5;
int i = 0 ;
double time = 0.0;
FILE* outputFile = NULL;
outputFile = fopen("sinvalues","wb+");
if(outputFile==NULL){
printf(" couldn't open the file \n");
return -1;
}
for (i = 0; i<=5000;i++){
value = sin(2*PI*frequency*time);
fwrite(&value,sizeof(double),1,outputFile);
time += (1.0/frequency);
}
fclose(outputFile);
return 0;
}
Now I'm reading the output file of the above program and trying to calculate its PSD as shown below:
#include <stdio.h>
#include <fftw3.h>
#include <complex.h>
#include <stdlib.h>
#include <math.h>
#define PI 3.14
int main (){
FILE* inp = NULL;
FILE* oup = NULL;
double* value;// = 0.0;
double* result;
double spectr = 0.0 ;
int windowsSize =512;
double power_spectrum = 0.0;
fftw_plan plan;
int index=0,i ,k;
double multiplier =0.0;
inp = fopen("1","rb");
oup = fopen("psd","wb+");
value=(double*)malloc(sizeof(double)*windowsSize);
result = (double*)malloc(sizeof(double)*(windowsSize)); // what is the length that I have to choose here ?
plan =fftw_plan_r2r_1d(windowsSize,value,result,FFTW_R2HC,FFTW_ESTIMATE);
while(!feof(inp)){
index =fread(value,sizeof(double),windowsSize,inp);
// zero padding
if( index != windowsSize){
for(i=index;i<windowsSize;i++){
value[i] = 0.0;
}
}
// windowing Hann
for (i=0; i<windowsSize; i++){
multiplier = 0.5*(1-cos(2*PI*i/(windowsSize-1)));
value[i] *= multiplier;
}
fftw_execute(plan);
for(i = 0;i<(windowsSize/2 +1) ;i++){ //why only tell the half size of the window
power_spectrum = result[i]*result[i] +result[windowsSize/2 +1 -i]*result[windowsSize/2 +1 -i];
printf("%lf \t\t\t %d \n",power_spectrum,i);
fprintf(oup," %lf \n ",power_spectrum);
}
}
fclose(oup);
fclose(inp);
return 0;
}
I am not sure about the correctness of the way I am doing this, but below are the results I have obtained:
Can anyone help me trace the errors in the above approach?
Thanks in advance
UPDATE
After hartmut's answer I've edited the code but still got the same result:
and the input data looks like:
UPDATE
After increasing the sample frequency and using a window size of 2048, here is what I've got:
UPDATE
After using the ADD-ON, here is how the result looks using the window:
You combine the wrong output values to power spectrum lines. There are windowsSize / 2 + 1 real values at the beginning of result and windowsSize / 2 - 1 imaginary values at the end in reverse order. This is because the imaginary components of the first (0Hz) and last (Nyquist frequency) spectral lines are 0.
int spectrum_lines = windowsSize / 2 + 1;
power_spectrum = (double *)malloc( sizeof(double) * spectrum_lines );
power_spectrum[0] = result[0] * result[0];
for ( i = 1 ; i < windowsSize / 2 ; i++ )
power_spectrum[i] = result[i]*result[i] + result[windowsSize-i]*result[windowsSize-i];
/* after the loop i == windowsSize / 2: the Nyquist line, which is purely real */
power_spectrum[i] = result[i] * result[i];
And there is a minor mistake: You should apply the window function only to the input signal and not to the zero-padding part.
ADD-ON:
Your test program generates 5001 samples of a sinusoid signal, and then you read and analyse the first 512 samples of this signal. As a result, you analyse only a fraction of a period. Due to the hard cut-off of the signal it contains a wide spectrum of energy with almost unpredictable energy levels, because you do not even use PI but only 3.14, which is not precise enough to do any predictable calculation.
You need to guarantee that an integer number of periods fits exactly into your analysis window of 512 samples. Therefore, you should change your test signal creation program to have exactly numberOfPeriods periods in your test signal (e.g. numberOfPeriods=1 means that one period of the sinusoid spans exactly 512 samples, 2 => 256, 3 => 512/3, 4 => 128, ...). This way, you are able to generate energy at a specific spectral line. Keep in mind that windowSize must have the same value in both programs because different sizes make this effort useless.
#define PI 3.141592653589793 // This has to be absolutely exact!
int windowSize = 512; // Total number of created samples in the test signal
int numberOfPeriods = 64; // Total number of sinusoid periods in the test signal
for ( n = 0 ; n < windowSize ; ++n ) {
value = sin( (2 * PI * numberOfPeriods * n) / windowSize );
fwrite( &value, sizeof(double), 1, outputFile );
}
Some remarks to your expected output function.
Your input is a function with pure real values.
The result of a DFT has complex values.
So you have to declare the variable out not as double but as fftw_complex *out.
In general the number of dft input values is the same as the number of output values.
However, the output spectrum of a dft contains the complex amplitudes for positive
frequencies as well as for negative frequencies.
In the special case of pure real input, the amplitudes of the positive frequencies are
the complex conjugates of the amplitudes of the negative frequencies.
For that reason, only the frequencies of the positive spectrum are calculated,
which means that the number of complex output values is half of
the number of real input values plus one (N/2 + 1, counting DC and Nyquist).
If your input is a simple sinewave, the spectrum contains only a single frequency component.
This is true for 10, 100, 1000 or even more input samples.
All other values are zero. So it doesn't make any sense to work with a huge number of input values.
If the input data set contains a single period, the complex output value is
contained in out[1].
If the input data set contains M complete periods, in your case 5,
the result is stored in out[5].
I did some modifications on your code. To make some facts more clear.
#include <stdio.h>
#include <math.h>
#include <complex.h>
#include "fftw3.h"
int performDFT(int nbrOfInputSamples, char *fileName)
{
int nbrOfOutputSamples;
double *in;
fftw_complex *out;
fftw_plan p;
// In the case of pure real input data,
// the output values of the positive frequencies and the negative frequencies
// are conjugated complex values.
// This means, that there no need for calculating both.
// If you have the complex values for the positive frequencies,
// you can calculate the values of the negative frequencies just by
// changing the sign of the value's imaginary part
// So the number of complex output values ( amplitudes of frequency components)
// is half of the number of the real input values ( amplitudes in time domain) plus one;
// fftw_plan_dft_r2c_1d writes n/2+1 complex values (integer division):
nbrOfOutputSamples = nbrOfInputSamples/2 + 1;
// Create a plan for a 1D DFT with real input and complex output
in = (double*) fftw_malloc(sizeof(double) * nbrOfInputSamples);
out = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * nbrOfOutputSamples);
p = fftw_plan_dft_r2c_1d(nbrOfInputSamples, in, out, FFTW_ESTIMATE);
// Read data from input file to input array
FILE* inputFile = NULL;
inputFile = fopen(fileName,"r");
if(inputFile==NULL){
fprintf(stdout,"couldn't open the file %s\n", fileName);
return -1;
}
double value;
int idx = 0;
// Stop at the array size or at the first failed read, so in[] never overflows
while(idx < nbrOfInputSamples && fscanf(inputFile, "%lf", &value) == 1){
in[idx++] = value;
}
fclose(inputFile);
// Perform the dft
fftw_execute(p);
// Print output results
char outputFileName[] = "dftvalues.txt";
FILE* outputFile = NULL;
outputFile = fopen(outputFileName,"w+");
if(outputFile==NULL){
fprintf(stdout,"couldn't open the file %s\n", outputFileName);
return -1;
}
double realVal;
double imagVal;
double powVal;
double absVal;
fprintf(stdout, " Frequency Real Imag Abs Power\n");
for (idx=0; idx<nbrOfOutputSamples; idx++) {
realVal = out[idx][0]/nbrOfInputSamples; // Indeed nbrOfInputSamples is correct!
imagVal = out[idx][1]/nbrOfInputSamples; // Indeed nbrOfInputSamples is correct!
powVal = 2*(realVal*realVal + imagVal*imagVal);
absVal = sqrt(powVal/2);
if (idx == 0) {
powVal /=2;
}
fprintf(outputFile, "%10i %10.4lf %10.4lf %10.4lf %10.4lf\n", idx, realVal, imagVal, absVal, powVal);
fprintf(stdout, "%10i %10.4lf %10.4lf %10.4lf %10.4lf\n", idx, realVal, imagVal, absVal, powVal);
// The total signal power of a frequency is the sum of the power of the positive and the negative frequency line.
// Because only the positive spectrum is calculated, the power is multiplied by two.
// However, there is only one single line in the spectrum for DC.
// This means, the DC value must not be doubled.
}
fclose(outputFile);
// Clean up
fftw_destroy_plan(p);
fftw_free(in); fftw_free(out);
return 0;
}
int main(int argc, const char * argv[]) {
// Set basic parameters
float timeIntervall = 1.0; // in seconds
int nbrOfSamples = 50; // number of Samples per time intervall, so the unit is S/s
double timeStep = timeIntervall/nbrOfSamples; // in seconds
float frequency = 5; // frequency in Hz
// The period time of the signal is 1/5Hz = 0.2s
// The number of samples per period is: nbrOfSamples/frequency = (50S/s)/5Hz = 10S
// The number of periods per time intervall is: frequency*timeIntervall = 5Hz*1.0s = (5/s)*1.0s = 5
// Open file for writing signal values
char fileName[] = "sinvalues.txt";
FILE* outputFile = NULL;
outputFile = fopen(fileName,"w+");
if(outputFile==NULL){
fprintf(stdout,"couldn't open the file %s\n", fileName);
return -1;
}
// Calculate signal values and write them to file
double time;
double value;
double dcValue = 0.2;
int idx = 0;
fprintf(stdout, " SampleNbr Signal value\n");
for (time = 0; time<=timeIntervall; time += timeStep){
value = sin(2*M_PI*frequency*time) + dcValue;
fprintf(outputFile, "%lf\n",value);
fprintf(stdout, "%10i %15.5f\n",idx++, value);
}
fclose(outputFile);
performDFT(nbrOfSamples, fileName);
return 0;
}
If the input of a dft is pure real, the output is complex in any case.
So you have to use the plan r2c (RealToComplex).
If the signal is sin(2*pi*f*t), starting at t=0, the spectrum contains a single frequency line
at f, which is pure imaginary.
If the signal has an offset in phase, like sin(2*pi*f*t+phi), the single line's value is complex.
If your sampling frequency is fs, the range of the output spectrum is -fs/2 ... +fs/2.
The real parts of the positive and negative frequencies are the same.
The imaginary parts of the positive and negative frequencies have opposite signs.
This is called conjugated complex.
If you have the complex values of the positive spectrum you can calculate the values of the
negative spectrum by changing the sign of the imaginary parts.
For this reason there is no need to compute both the positive and the negative spectrum.
One sideband holds all information.
Therefore the number of output samples in the plan r2c is the half+1 of the number
of input samples.
To get the power of a frequency, you have to consider the positive frequency as well
as the negative frequency. However, the plan r2c delivers only the right positive half
of the spectrum. So you have to double the power of the positive side to get the total power.
By the way, the documentation of the fftw3 package describes the usage of plans quite well.
You should invest the time to go over the manual.
I'm not sure what your question is. Your results seem reasonable, with the information provided.
As you must know, the PSD is the Fourier transform of the autocorrelation function. With sine wave inputs, your AC function will be periodic, therefore the PSD will have tones, like you've plotted.
My 'answer' is really some thought starters on debugging. It would be easier for all involved if we could post equations. You probably know that there's a signal processing section on SE these days.
First, you should give us a plot of your AC function. The inverse FT of the PSD you've shown will be a linear combination of periodic tones.
Second, try removing the window, just make it a box or skip the step if you can.
Third, try replacing the DFT with the FFT (I only skimmed the fftw3 library docs, maybe this is an option).
Lastly, try inputting white noise. You can use a Bernoulli dist, or just a Gaussian dist. The AC will be a delta function, although the sample AC will not. This should give you a (sample) white PSD distribution.
I hope these suggestions help.

Parallelize code that uses struct having pointer to pointer type elements using CUDA

If I have code which takes a struct variable as input and manipulates its elements, how can I parallelize this using CUDA?
void BackpropagateLayer(NET* Net, LAYER* Upper, LAYER* Lower)
{
INT i,j;
REAL Out, Err;
for (i=1; i<=Lower->Units; i++) {
Out = Lower->Output[i];
Err = 0;
for (j=1; j<=Upper->Units; j++) {
Err += Upper->Weight[j][i] * Upper->Error[j];
}
Lower->Error[i] = Net->Gain * Out * (1-Out) * Err;
}
}
Where NET and LAYER are structs defined as:
typedef struct { /* A LAYER OF A NET: */
INT Units; /* - number of units in this layer */
REAL* Output; /* - output of ith unit */
REAL* Error; /* - error term of ith unit */
REAL** Weight; /* - connection weights to ith unit */
REAL** WeightSave; /* - saved weights for stopped training */
REAL** dWeight; /* - last weight deltas for momentum */
} LAYER;
typedef struct { /* A NET: */
LAYER** Layer; /* - layers of this net */
LAYER* InputLayer; /* - input layer */
LAYER* OutputLayer; /* - output layer */
REAL Alpha; /* - momentum factor */
REAL Eta; /* - learning rate */
REAL Gain; /* - gain of sigmoid function */
REAL Error; /* - total net error */
} NET;
What I could think of is to first convert the 2d Weight into 1d. And then send it to kernel to take the product or just use the CUBLAS library. Any suggestions?
If you are implementing your own neural network library then for simple cases (nets with fully connected or sparse layers) I strongly recommend using CUBLAS/CUSPARSE. In such case, all 3 basic linear operations can be elegantly expressed using calls to those libraries:
Feed forward: gemv (gemm in case mini-batch size > 1)
Back prop: gemv (gemm in case mini-batch size > 1) with appropriate transpose flags.
Weight updates: ger (gemm in case mini-batch size > 1).
Momentum can be represented using 3 basic operations (or a separate kernel for better perf).
Things will get much more interesting when you move beyond basic stuff and start adding things like convolutional layers and so on.
In neural nets you have a gazillion of hyper-parameters so I would suggest looking at some existing implementation on how to design your library (like convnet).
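To make the link to gemv concrete, here is a plain C sketch of BackpropagateLayer with the REAL** Weight array flattened into one contiguous row-major array, as suggested in the question. The inner sum is a transposed matrix-vector product, which is the part a gemv call would replace. This is CPU code for illustration, not CUDA; it uses 0-based indexing instead of the original's 1-based loops, and `backprop_layer_flat` is a hypothetical name.

```c
#include <stddef.h>

/* Back-propagate errors through one layer using flattened weights.
   w is row-major with upper_units rows and lower_units columns:
   w[j * lower_units + i] connects lower unit i to upper unit j. */
void backprop_layer_flat(const float *w, size_t upper_units, size_t lower_units,
                         const float *upper_err, const float *lower_out,
                         float gain, float *lower_err)
{
    size_t i, j;

    for (i = 0; i < lower_units; i++) {
        float err = 0.0f;
        for (j = 0; j < upper_units; j++)
            err += w[j * lower_units + i] * upper_err[j];  /* (W^T * e)[i] */
        /* sigmoid derivative out * (1 - out), scaled by the gain */
        lower_err[i] = gain * lower_out[i] * (1.0f - lower_out[i]) * err;
    }
}
```

A contiguous layout like this is also what a BLAS-style library expects, so the same array could be copied to the device in a single transfer.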
