KissFFT output of kiss_fftr - c

I'm receiving PCM data through a socket connection in packets containing 320 samples. The sample rate is 8000 samples per second. I'm doing something like this with it:
int size = 160 * 2; // 320 samples per packet
int isinverse = 1;
kiss_fft_scalar zero;
memset(&zero, 0, sizeof(zero));

kiss_fft_cpx fft_in[size];
kiss_fft_cpx fft_out[size];
kiss_fft_cpx fft_reconstructed[size];

kiss_fftr_cfg fft  = kiss_fftr_alloc(size * 2, 0, 0, 0);
kiss_fftr_cfg ifft = kiss_fftr_alloc(size * 2, isinverse, 0, 0);

for (int i = 0; i < size; i++) {
    fft_in[i].r = zero;
    fft_in[i].i = zero;
    fft_out[i].r = zero;
    fft_out[i].i = zero;
    fft_reconstructed[i].r = zero;
    fft_reconstructed[i].i = zero;
}

// got my data through socket connection
for (int i = 0; i < size; i++) {
    // samples are of type short
    fft_in[i].r = samples[i];
    fft_in[i].i = zero;
    fft_out[i].r = zero;
    fft_out[i].i = zero;
}

kiss_fftr(fft, (kiss_fft_scalar*) fft_in, fft_out);
kiss_fftri(ifft, fft_out, (kiss_fft_scalar*) fft_reconstructed);

// let's normalize samples
for (int i = 0; i < size; i++) {
    short* samples = (short*) bufTmp1;
    samples[i] = rint(fft_reconstructed[i].r / (size * 2));
}
After that I fill OpenAL buffers and play them. Everything works just fine, but I would like to do some filtering of the audio between kiss_fftr and kiss_fftri. The starting point, as I understand it, is converting the sound from the time domain to the frequency domain, but I don't really understand the data I'm getting back from the kiss_fftr function. What information is stored in each of those complex numbers; what do the real and imaginary parts tell me about frequency? And I don't know which frequencies are covered (what frequency span) in fft_out - which indices correspond to which frequencies.
I am a total newbie in signal processing and Fourier transform topics.
Any help?

Before you jump in with both feet into a C implementation, get familiar with digital filters, especially FIR filters.
You can design the FIR filter using something like GNU Octave's signal toolbox. Look at the commands fir1 (the simplest), firls, or remez. Alternatively, you might be able to design a FIR filter through a web page. A quick web search for "online fir filter design" found one (I have not used it, but it appears to use the equiripple design of the remez or firpm command).
Try implementing your filter first with direct convolution (without FFTs) and see if the speed is acceptable -- that is the easier path; a sketch follows below. If you need an FFT-based approach, there is a sample implementation of overlap-save in the kissfft/tools/kiss_fastfir.c file.
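For concreteness, here is a minimal direct-convolution sketch (the tap values, filter length, and the fir_filter_block name are placeholders; real coefficients would come from fir1/firls/remez or the online designer):

#include <string.h>

#define NTAPS 32      /* filter length -- assumed */
#define BLOCK 320     /* samples per packet, per the question */

/* Placeholder coefficients -- substitute the taps from your design tool. */
static const float taps[NTAPS] = { 0 };

/* Tail of the previous block, so filtering is continuous across packets. */
static float history[NTAPS - 1];

void fir_filter_block(short *samples, int n)
{
    float work[BLOCK + NTAPS - 1];

    /* Prepend the saved history, then append the new samples. */
    memcpy(work, history, sizeof(history));
    for (int i = 0; i < n; i++)
        work[NTAPS - 1 + i] = (float)samples[i];

    /* Save the last NTAPS-1 inputs for the next call. */
    memcpy(history, &work[n], sizeof(history));

    /* Direct convolution: each output is a dot product with the taps. */
    for (int i = 0; i < n; i++) {
        float acc = 0.0f;
        for (int t = 0; t < NTAPS; t++)
            acc += taps[t] * work[NTAPS - 1 + i - t];
        samples[i] = (short)acc;
    }
}

If this keeps up with 8000 samples/s in real time (at these sizes it almost certainly will), there is no need for the FFT path at all.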

I will try to answer your questions directly.
// (note: 'size' below means the FFT length, i.e. size*2 = 640 in your code)
// a) the real and imaginary components of the output need to be combined
//    to calculate the amplitude at each frequency.
float ar, ai, scaling;
scaling = 1.0f / (float)size;
// then for each output [i] from the FFT...
ar = fft_out[i].r;
ai = fft_out[i].i;
amplitude[i] = 2.0f * sqrtf(ar * ar + ai * ai) * scaling;
// b) which index refers to which frequency? Bin i sits at i * sample_rate / fft_length.
//    Only the first half of the FFT results are needed (assuming your 8 kHz sampling rate):
for (i = 1; i < (size / 2); i++)
    freq[i] = (float)i * 8000.0f / (float)size;
// c) phase (range +/- PI) for each frequency is calculated like this:
phase[i] = atan2f(fft_out[i].i, fft_out[i].r);

What you might want to investigate is FFT fast convolution using the overlap-add or overlap-save algorithms. You will need to expand the length of each FFT by the length of the impulse response of your desired filter. This is because (1) FFT/IFFT convolution is circular, and (2) each index in the FFT result corresponds to almost all frequencies (a sinc-shaped response), not just one (even if mostly near one), so any single-bin modification will leak throughout the entire frequency response (except at certain exact periodic frequencies).
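If you do go the FFT route, a rough overlap-add sketch with the kiss_fftr API looks like the following. The block and filter lengths are assumptions, filter_freq is presumed to hold the NFFT-point spectrum of the zero-padded impulse response, and fwd/inv would come from kiss_fftr_alloc(NFFT, 0, 0, 0) and kiss_fftr_alloc(NFFT, 1, 0, 0):

#include "kiss_fftr.h"
#include <string.h>

#define BLOCK  320   /* input samples per packet */
#define FIRLEN 129   /* impulse response length -- assumed */
#define NFFT   512   /* >= BLOCK + FIRLEN - 1, and even for kiss_fftr */

static kiss_fft_scalar overlap[FIRLEN - 1]; /* tail carried between blocks */

void fft_filter_block(kiss_fftr_cfg fwd, kiss_fftr_cfg inv,
                      const kiss_fft_cpx *filter_freq, short *samples)
{
    kiss_fft_scalar time[NFFT];
    kiss_fft_cpx freq[NFFT / 2 + 1];
    int i;

    /* Zero-pad the block to NFFT so the circular convolution becomes linear. */
    for (i = 0; i < BLOCK; i++) time[i] = samples[i];
    memset(&time[BLOCK], 0, (NFFT - BLOCK) * sizeof(kiss_fft_scalar));

    kiss_fftr(fwd, time, freq);

    /* Pointwise complex multiply == convolution in the time domain. */
    for (i = 0; i < NFFT / 2 + 1; i++) {
        kiss_fft_scalar re = freq[i].r * filter_freq[i].r - freq[i].i * filter_freq[i].i;
        kiss_fft_scalar im = freq[i].r * filter_freq[i].i + freq[i].i * filter_freq[i].r;
        freq[i].r = re;
        freq[i].i = im;
    }

    kiss_fftri(inv, freq, time);

    /* Overlap-add: fold in the previous tail, save the new one, and divide
       by NFFT because kissfft's inverse transform is unnormalized. */
    for (i = 0; i < FIRLEN - 1; i++) time[i] += overlap[i];
    memcpy(overlap, &time[BLOCK], sizeof(overlap));
    for (i = 0; i < BLOCK; i++) samples[i] = (short)(time[i] / NFFT);
}

kiss_fastfir.c in the kissfft tools directory is the battle-tested version of this idea (overlap-save rather than overlap-add) and is worth reading before rolling your own.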

Related

How to use KissFFT with audio?

I have an array of 2048 samples of an audio file at 44.1 kHz and want to transform it into a spectrum for an LED effect. I don't know too much about the inner workings of FFT, but I tried it using KISS FFT:
kiss_fft_cpx *cpx_in  = malloc(FRAMES * sizeof(kiss_fft_cpx));
kiss_fft_cpx *cpx_out = malloc(FRAMES * sizeof(kiss_fft_cpx));
kiss_fft_cfg cfg = kiss_fft_alloc(FRAMES, 0, 0, 0);

for (int j = 0; j < FRAMES; j++) {
    float x = (alsa_buffer[(fft_last_index + j + BUFFER_OVERSIZE * FRAMES) % (BUFFER_OVERSIZE * FRAMES)] - offset);
    cpx_in[j] = (kiss_fft_cpx){ .r = x, .i = x };
}
kiss_fft(cfg, cpx_in, cpx_out);
My output seems really off. When I play a simple sine, there are multiple outputs with values way above zero. Also, it generally seems like the first entries are way higher. Do I have to weight the outputs?
I also don't understand how I have to treat the complex numbers. I'm currently feeding my input values into both the real and imaginary parts, and for the output I use the absolute value; is that right?
Also, spectrum analyzers for audio usually have logarithmic scaling, so I tried that, but the problem is that the FFT output, as far as I know, isn't logarithmic. The first band, for example, is say 0-100 Hz, but optimally my first LED on the effect should only go up to about 60 Hz (a fraction of the first output's band), while the last LED would be say 8 kHz to 10 kHz, which would in that case be 20 FFT outputs.
Is there any way to make the output logarithmic? How do I limit the spectrum to 20 kHz (or know what the bands of the output are in general), and is there anything else to look out for when working with audio signals?
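For what it's worth, the arithmetic being described can be sketched directly: bin i of a FRAMES-point FFT is centered at i * sample_rate / FRAMES (about 21.5 Hz per bin at 44.1 kHz and 2048 points), and logarithmic LED bands just group geometrically spaced ranges of bins. The LED count and band edges below are assumptions:

#include <math.h>

#define FRAMES      2048
#define SAMPLE_RATE 44100.0f
#define NUM_LEDS    20       /* assumed */
#define F_LO        60.0f    /* assumed lowest band edge */
#define F_HI        10000.0f /* assumed highest band edge */

void bins_to_log_bands(const float *magnitude, float *band)
{
    for (int b = 0; b < NUM_LEDS; b++) {
        /* geometrically spaced band edges, then convert Hz to bin index */
        float lo = F_LO * powf(F_HI / F_LO, (float)b / NUM_LEDS);
        float hi = F_LO * powf(F_HI / F_LO, (float)(b + 1) / NUM_LEDS);
        int i_lo = (int)(lo * FRAMES / SAMPLE_RATE);
        int i_hi = (int)(hi * FRAMES / SAMPLE_RATE);
        if (i_hi <= i_lo) i_hi = i_lo + 1; /* narrow low bands share a bin */

        band[b] = 0.0f;
        for (int i = i_lo; i < i_hi && i < FRAMES / 2; i++)
            band[b] += magnitude[i];
    }
}

Here magnitude[] is assumed to hold sqrt(re*re + im*im) for the first FRAMES/2 bins, computed from real-only input (i.e. with .i = 0, not .i = x).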

Inverse FFT with CMSIS is wrong

I am attempting to perform an FFT on a signal and use the resulting data to retrieve the original samples via an IFFT. I am using the CMSIS DSP library on an STM32 with an M3.
My issue is understanding the scaling that occurs with the FFT, and also how to get a correct IFFT. Currently the IFFT results in a similar wave to the input, but the points are scaled anywhere between 120x and 140x of the original. Is this simply the result of precision errors in q15? Am I to scale the IFFT results by 7 bits? My code is below.
The documentation also mentions: "For the RIFFT, the source buffer must at least have length fftLenReal + 2. The last two elements must be equal to what would be generated by the RFFT: (pSrc[0] - pSrc[1]) >> 1 and 0". What is this for? Applying these operations to indices FFT_SIZE*2 - 2 and FFT_SIZE*2 - 1 respectively did not change the results of the IFFT at all.
// 128-point FFT
#define FFT_SIZE 128

arm_rfft_instance_q15 fft_instance;
arm_rfft_instance_q15 ifft_instance;

// time domain signal buffers
float32_t sinetbl_in[FFT_SIZE];
float32_t sinetbl_out[FFT_SIZE];

// a copy for comparison after the RFFT, since the function modifies its input buffer
volatile q15_t fft_in_buf_cpy[FFT_SIZE];
q15_t fft_in_buf[FFT_SIZE];

// output for FFT; RFFT provides real and complex data organized as re[0], im[0], re[1], im[1], ...
q15_t fft_out_buf[FFT_SIZE*2];
q15_t fft_out_buf_mag[FFT_SIZE*2];

// inverse FFT result buffer
q15_t ifft_out_buf[FFT_SIZE];

// generate a 1 kHz sine wave sampled at 8 kHz for 128 samples, amplitude 1
for (int i = 0; i < FFT_SIZE; ++i) {
    sinetbl_in[i] = arm_sin_f32(2 * 3.14 * 1000 * i / 8000);
    sinetbl_out[i] = 0;
}

// convert buffer to q15 (not enough flash to use the f32 FFT functions)
arm_float_to_q15(sinetbl_in, fft_in_buf, FFT_SIZE);
memcpy(fft_in_buf_cpy, fft_in_buf, FFT_SIZE*2);

// perform the RFFT
arm_rfft_init_q15(&fft_instance, FFT_SIZE, 0, 1);
arm_rfft_q15(&fft_instance, fft_in_buf, fft_out_buf);

// calculate magnitude, skipping the 1st real and imaginary numbers as they are DC and both real
arm_cmplx_mag_q15(fft_out_buf + 2, fft_out_buf_mag + 1, FFT_SIZE/2 - 1);

// operations described by the documentation; did not change the results
//fft_out_buf[FFT_SIZE*2 - 2] = (fft_out_buf[0] - fft_out_buf[1]) >> 1;
//fft_out_buf[FFT_SIZE*2 - 1] = 0;

// perform the inverse FFT
arm_rfft_init_q15(&ifft_instance, FFT_SIZE, 1, 1);
arm_rfft_q15(&ifft_instance, fft_out_buf, ifft_out_buf);

// closest approximation to get back to the original scaling
//arm_shift_q15(ifft_out_buf, 7, ifft_out_buf, FFT_SIZE);

// convert back to float for comparison with the input
arm_q15_to_float(ifft_out_buf, sinetbl_out, FFT_SIZE);
I feel like I answered my own question with the precision comment, but I'd like to be sure. Am I doing this FFT stuff right?
Thanks in advance
As Cris pointed out, some libraries skip the normalization step. CMSIS DSP is one of those libraries, as it is intended to be fast. For CMSIS, depending on the FFT size, you must left-shift your data by a certain amount to get back to the original range. In my case, with an FFT size of 128 and also the magnitude calculation, it was 7 bits, as I originally surmised.
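Concretely, that is just the line commented out in the question, re-enabled (7 = log2(128); other FFT sizes would need their own shift count):

// undo the internal down-scaling: left shift by log2(FFT_SIZE) = 7 bits
arm_shift_q15(ifft_out_buf, 7, ifft_out_buf, FFT_SIZE);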

Convolution Using FFTW3 and PortAudio

Edit (2017, Apr 27)
My fully working code is here. I am currently unable to run this due to an installation issue with PortAudio, but it was working perfectly as recently as late 2016 with the 64-sample buffer size.
Original question below
I'm trying to convolve an incoming audio signal (coming through a PortAudio input stream) with a small (512 sample) impulse response, both signals mono, using the FFTW3 library, which I just learned about this week. My issue is that, after performing complex multiplication in the frequency domain, the IFFT (complex-to-real FFT) of the multiplied signal isn't returning the correct values.
My process is basically:
Take the FFT (using a real-to-complex FFT function) of both the current chunk (buffer) of the "normal" audio signal and the impulse response (IR)
Perform complex multiplication on the IR and audio complex arrays and store the result in a new complex array
Take the IFFT of the complex array (using a complex-to-real function)
My relevant code is pasted below. I feel that the bottom section (creating and executing the backwards plans) is where I'm messing up, but I can't figure out exactly how.
Is my overall approach/structure to performing convolution correct? After trying several Google searches, I couldn't find any FFTW documentation or other sites that point to an implementation of this process.
//framesPerBuffer = 512; is set above
//data->ir_len is also set to 512
int convSigLen = framesPerBuffer + data->ir_len - 1;
//hold time domain audio and IR signals
double *in;
double *in2;
double *inIR;
double *in2IR;
double *convolvedSig;
//hold FFT values for audio and IR
fftw_complex *outfftw;
fftw_complex *outfftwIR;
//hold the frequency-multiplied signal
fftw_complex *outFftMulti;
//hold plans to do real-to-complex FFT
fftw_plan plan_forward;
fftw_plan plan_forwardIR;
//hold plans to do IFFT (complex-to-real)
fftw_plan plan_backward;
fftw_plan plan_backwardIR;
fftw_plan plan_backwardConv;
int nc, ncIR; //number of complex values to store in outfftw arrays
/**** Create the input arrays ****/
//Allocate space
in = fftw_malloc(sizeof(double) * framesPerBuffer );
inIR = fftw_malloc(sizeof(double) * data->ir_len);
//Store framesPerBuffer samples of the audio input to in*
for (i = 0; i < framesPerBuffer; i++)
{
    in[i] = data->file_buff[i];
}
//Store the impulse response (IR) to inIR*
for (i = 0; i < data->ir_len; i++)
{
    inIR[i] = data->irBuffer[i];
}
/**** Create the output arrays ****/
nc = framesPerBuffer/2 + 1;
outfftw = fftw_malloc(sizeof(fftw_complex) * nc);
ncIR = nc; //data->ir_len/2 + 1;
outfftwIR = fftw_malloc(sizeof(fftw_complex) * nc);
/**** Create the FFTW forward plans ****/
plan_forward = fftw_plan_dft_r2c_1d(framesPerBuffer, in, outfftw, FFTW_ESTIMATE);
plan_forwardIR = fftw_plan_dft_r2c_1d(data->ir_len, inIR, outfftwIR, FFTW_ESTIMATE);
/*********************/
/* EXECUTE THE FFTs!! */
/*********************/
fftw_execute(plan_forward);
fftw_execute(plan_forwardIR);
/***********************/
/*** MULTIPLY FFTs!! ***/
/***********************/
outFftMulti = fftw_malloc(sizeof(fftw_complex) * nc);
for (i = 0; i < nc; i++)
{
    //calculate real and imaginary components for the multiplied array
    outFftMulti[i][0] = outfftw[i][0] * outfftwIR[i][0] - outfftw[i][1] * outfftwIR[i][1];
    outFftMulti[i][1] = outfftw[i][0] * outfftwIR[i][1] + outfftw[i][1] * outfftwIR[i][0];
}
/**** Prepare the input arrays to hold the [to be] IFFT'd data ****/
in2 = fftw_malloc(sizeof(double) * framesPerBuffer);
in2IR = fftw_malloc(sizeof(double) * framesPerBuffer);
convolvedSig = fftw_malloc(sizeof(double) * convSigLen);
/**** Prepare the backward plans and execute the IFFT ****/
plan_backward = fftw_plan_dft_c2r_1d(nc, outfftw, in2, FFTW_ESTIMATE);
plan_backwardIR = fftw_plan_dft_c2r_1d(ncIR, outfftwIR, in2IR, FFTW_ESTIMATE);
plan_backwardConv = fftw_plan_dft_c2r_1d(convSigLen, outFftMulti, convolvedSig, FFTW_ESTIMATE);
fftw_execute(plan_backward);
fftw_execute(plan_backwardIR);
fftw_execute(plan_backwardConv);
This is my first post on this site. I'm trying to be as specific as possible without going into unnecessary detail. I would greatly appreciate any help on this.
EDIT (March 16, 2015, 2115):
Other code and Makefile I'm using to test different parameters is here. The overall process is as follows:
Audio signal buffer x has length lenX. Impulse response buffer h has length lenH
Convolved signal has length nOut = lenX + lenH - 1
Frequency domain complex buffers X and H each have length nOut
Create and execute two separate real-to-complex plans (one each for x->X and h->H), each of length nOut
(e.g. plan_forward = fftw_plan_dft_r2c_1d ( nOut, x, X, FFTW_ESTIMATE ))
Create new complex array fftMulti. Length is nc = nOut / 2 + 1 (because FFTW doesn't return the half-redundant content)
Perform complex multiplication, storing results into fftMulti
Create and execute the FFT backward plans, each of length nOut in the first parameter (two plans recover the original data; the third creates the convolved signal in the time domain)
e.g.
plan_backwardConv = fftw_plan_dft_c2r_1d(nOut, fftMulti, convolvedSig, FFTW_ESTIMATE);
plan_backward = fftw_plan_dft_c2r_1d ( nOut, X, xRecovered, FFTW_ESTIMATE );
plan_backwardIR = fftw_plan_dft_c2r_1d (nOut, H, hRecovered, FFTW_ESTIMATE);
My issue is that even though I can recover the original signals x and h with the correct values, the convolved signal shows very high values (between ~8 and 35), even when dividing each value by nOut when printing.
I can't tell which part(s) of my process are causing issues. Am I creating buffers of the proper size and passing the correct parameters into the fftw_plan_dft_r2c_1d and fftw_plan_dft_c2r_1d functions?
One reason for the unexpected results you have is that you do an FFT of length N and an IFFT of length N/2 + 1 = nc.
The array lengths should be the same.
Furthermore, FFTW does not normalize. That means if you compute y = ifft(fft(a)) for the 4-element vector a = {1,1,1,1}, you get y = {4,4,4,4}.
If you still have trouble, give us code that can be compiled directly.
I got my question answered on DSP Stack Exchange: https://dsp.stackexchange.com/questions/22145/perform-convolution-in-frequency-domain-using-fftw
Basically, I didn't zero-pad my time-domain signals before executing the FFT. For some reason I thought the library did that automatically (like MATLAB does, if I recall correctly), but obviously I was wrong.
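In terms of the edited process above, the fix is to allocate the time-domain inputs at the full length nOut and zero the padding before planning. A sketch (audioBuffer and irBuffer are stand-ins for the poster's data->file_buff and data->irBuffer):

int nOut = lenX + lenH - 1;   /* full linear-convolution length */
int nc   = nOut / 2 + 1;      /* complex bins FFTW returns for nOut reals */

double *x = fftw_malloc(sizeof(double) * nOut);
double *h = fftw_malloc(sizeof(double) * nOut);
fftw_complex *X = fftw_malloc(sizeof(fftw_complex) * nc);
fftw_complex *H = fftw_malloc(sizeof(fftw_complex) * nc);

/* Copy the signals and zero the padding; FFTW will not do this for you. */
memcpy(x, audioBuffer, sizeof(double) * lenX);
memset(x + lenX, 0, sizeof(double) * (nOut - lenX));
memcpy(h, irBuffer, sizeof(double) * lenH);
memset(h + lenH, 0, sizeof(double) * (nOut - lenH));

/* Both r2c plans now use the same length nOut, matching the c2r plans. */
fftw_plan pf = fftw_plan_dft_r2c_1d(nOut, x, X, FFTW_ESTIMATE);
fftw_plan ph = fftw_plan_dft_r2c_1d(nOut, h, H, FFTW_ESTIMATE);
fftw_execute(pf);
fftw_execute(ph);
/* ...multiply X and H into an nc-long array, run a c2r plan of length nOut,
   then divide each output sample by nOut to undo FFTW's unnormalized IFFT. */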

How to take advantage of cpu pipelining in intensive processing loops in C

I am wondering how to make sure I take advantage of cpu pipelining in the following audio code:
int sample_count = 100;
// volume array - value to multiply each audio sample by
double volume[4][4];
// fill volume array with values here

// audio sample array - really this is 125 samples by 16 channels, but smaller here for clarity
double samples[sample_count][4];
// fill samples array with audio samples here

double tmp[4];
for (int x = 0; x < sample_count; x++) {
    tmp[0] = samples[x][0]*volume[0][0] + samples[x][1]*volume[1][0] + samples[x][2]*volume[2][0] + samples[x][3]*volume[3][0];
    tmp[1] = samples[x][0]*volume[0][1] + samples[x][1]*volume[1][1] + samples[x][2]*volume[2][1] + samples[x][3]*volume[3][1];
    tmp[2] = samples[x][0]*volume[0][2] + samples[x][1]*volume[1][2] + samples[x][2]*volume[2][2] + samples[x][3]*volume[3][2];
    tmp[3] = samples[x][0]*volume[0][3] + samples[x][1]*volume[1][3] + samples[x][2]*volume[2][3] + samples[x][3]*volume[3][3];
    samples[x][0] = tmp[0];
    samples[x][1] = tmp[1];
    samples[x][2] = tmp[2];
    samples[x][3] = tmp[3];
}
// write sample array out to hardware here.
In case it's not immediately clear, this mixes the 4 input channels into 4 output channels via a 4x4 matrix of volume controls.
I'm actually executing this far more intensively than the example above, and I'm not sure how to tailor my code to take advantage of pipelining (which this seems suitable for). Should I perhaps work on one 'channel' of the samples array at a time, so that the same volume value can be reused for sequential samples of the same channel? That way, however, I would have to check x against sample_count four times as often. I could make tmp two-dimensional and large enough to hold the full buffer, if working through it that way would make the CPU pipeline efficiently. Or will the above code pipeline efficiently as-is? Is there an easy way to check whether pipelining is happening? TIA.
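If it helps to see it, the per-channel restructuring described above would look roughly like this (whether it actually pipelines better than the original is exactly the open question; timing both variants is the only reliable check):

// Process one output channel at a time so each volume coefficient is
// loaded once and reused across the whole buffer.
double tmp2[sample_count][4];

for (int out = 0; out < 4; out++) {
    const double v0 = volume[0][out];
    const double v1 = volume[1][out];
    const double v2 = volume[2][out];
    const double v3 = volume[3][out];
    for (int x = 0; x < sample_count; x++) {
        tmp2[x][out] = samples[x][0]*v0 + samples[x][1]*v1
                     + samples[x][2]*v2 + samples[x][3]*v3;
    }
}
// copy the mixed result back before writing to hardware (memcpy from string.h)
memcpy(samples, tmp2, sizeof(tmp2));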

Identifying a trend in C - Micro controller sampling

I'm working on an MC68HC11 microcontroller and have an analogue voltage signal going in that I have sampled. The scenario is a weighing machine: the large peaks are when the object hits the sensor, then it stabilises (those are the samples I want), and then it peaks again before the object rolls off.
The problem I'm having is figuring out a way for the program to detect this stable point and average it to produce an overall weight, but I can't figure out how. One way I have thought of is comparing previous values to see whether there is a large difference between them, but I haven't had any success. Below is the C code that I am using:
#include <stdio.h>
#include <stdarg.h>
#include <iof1.h>

void main(void)
{
    /* PORTA, DDRA, DDRG etc... are LED and switch ports */
    unsigned char *paddr, *adctl, *adr1;
    unsigned short i = 0;
    unsigned short k = 0;
    unsigned char switched = 1; /* is char the smallest data type? */
    unsigned char data[2000];

    DDRA = 0x00; /* All in */
    DDRG = 0xff;

    adctl = (unsigned char*) 0x30;
    adr1  = (unsigned char*) 0x31;
    *adctl = 0x20; /* single continuous scan */

    while (1)
    {
        if (*adr1 > 40)
        {
            if (PORTA == 128) /* Debugging switch */
            {
                PORTG = 1;
            }
            else
            {
                PORTG = 0;
            }
            if (i < 2000)
            {
                while (((*adctl) & 0x80) == 0x00);
                {
                    data[i] = *adr1;
                }
                /* if(i > 10 && (data[(i-10)] - data[i]) < 20) */
                i++;
            }
            if (PORTA == switched)
            {
                PORTG = 31;
                /* Print a delimiter so teemtalk can send to excel */
                for (k = 0; k < 2000; k++)
                {
                    printf("%d,", data[k]);
                }
                if (switched == 1) /* bitwise manipulation more efficient? */
                {
                    switched = 0;
                }
                else
                {
                    switched = 1;
                }
                PORTG = 0;
            }
            if (i >= 2000)
            {
                i = 0;
            }
        }
    }
}
Looking forward to hearing any suggestions :)
(The graph below shows how these values look; the red box is the area I would like to identify.)
As your sample sequence has glitches (short-lived transients), try to improve the hardware, i.e. change the layout, add decoupling, add filtering, etc.
If that approach fails, then try a median filter [1], say five places long, which takes the last five samples, sorts them, and outputs the middle one, so that two samples of transient have no effect on its output (seven places would tolerate three transient samples); a sketch follows below.
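A five-place median is only a few lines; a minimal sketch (insertion sort is fine at this window size, and the first four calls return medians of a partly zero-filled window):

/* Return the median of the last five samples; call once per new sample. */
unsigned char median5(unsigned char new_sample)
{
    static unsigned char hist[5];
    static unsigned char pos = 0;
    unsigned char sorted[5];
    unsigned char i, j;

    hist[pos] = new_sample;
    pos = (pos + 1) % 5;

    /* insertion-sort the five-sample window into sorted[] */
    for (i = 0; i < 5; i++) {
        j = i;
        while (j > 0 && sorted[j - 1] > hist[i]) {
            sorted[j] = sorted[j - 1];
            j--;
        }
        sorted[j] = hist[i];
    }
    return sorted[2]; /* the middle element */
}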
Then apply a computationally efficient exponential-averaging lowpass filter [2]:
y(n) = y(n-1) + alpha * (x(n) - y(n-1))
choosing alpha (1/2^n, so the division becomes a right shift) to yield a time constant [3] shorter than the underlying response (~50 samples) while still filtering out the noise. Increasing the effective number of fractional bits will avoid quantizing issues.
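In fixed point with alpha = 1/8 and three extra fractional bits of state, a sketch (the shift counts are the tunable parts):

/* Exponential-averaging lowpass: y(n) = y(n-1) + (x(n) - y(n-1)) / 8.
   The state is kept scaled by 8 so small differences are not rounded away. */
static short ema_state = 0; /* y, scaled by 8 */

unsigned char ema_update(unsigned char x)
{
    short diff = ((short)x << 3) - ema_state; /* (x - y) in scaled units */
    ema_state += diff >> 3;                   /* alpha = 1/2^3 via right shift */
    return (unsigned char)(ema_state >> 3);   /* back to sample units */
}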
With this improved sample sequence, thresholds and cycle counts can be applied to detect quiescent durations.
Additionally, if the end of the quiescent period is always followed by a large, abrupt change, then keeping a small delay array of samples enables detecting the abrupt change while still having the last of the quiescent samples available for logging.
[1] http://en.wikipedia.org/wiki/Median_filter
[2] http://www.dsprelated.com/showarticle/72.php
[3] http://en.wikipedia.org/wiki/Time_constant
Note
Adding code for the above filtering operations will lower the maximum possible sample rate, but printf can be replaced with something faster.
Continuously store the current value and the delta from the previous value.
Note when the delta is decreasing, as the start of weight application to the scale.
Note when the delta is increasing, as the end of weight application to the scale.
Take the X values with a small delta and average them (a sketch follows below).
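A minimal sketch of that delta-based approach (the threshold and window length are assumptions to tune against real data):

#define DELTA_THRESHOLD 3  /* "stable" if |delta| is below this (assumed) */
#define STABLE_WINDOW   32 /* samples to average once stable (assumed) */

/* Feed one sample at a time; returns 1 and writes the averaged weight
   once a long enough run of small deltas has been seen. */
int stable_average(unsigned char sample, unsigned char *weight)
{
    static unsigned char prev = 0;
    static unsigned short run = 0;
    static unsigned long sum = 0;
    signed char delta = (signed char)(sample - prev);
    prev = sample;

    if (delta > -DELTA_THRESHOLD && delta < DELTA_THRESHOLD) {
        sum += sample;
        run++;
        if (run >= STABLE_WINDOW) {
            *weight = (unsigned char)(sum / run);
            run = 0;
            sum = 0;
            return 1;
        }
    } else {
        run = 0; /* a large delta means we are on a peak; start over */
        sum = 0;
    }
    return 0;
}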
BTW, I'm sure this has been done a million times before; I'm thinking that a search for "scale PID" or "weight PID" would find a lot of information.
Don't forget to use a ___delay_ms(XX) call somewhere between readings if you compare each value with the previous one; the difference at each step will obviously be small if the code loops continuously.
Looking at your nice graphs, I would say you should look only for the falling edge; it is much more consistent than the leading edge.
In other words: let the samples accumulate and calculate a running average the whole time, with a predefined window size, keeping the deviation of the previous values just for reference. Check for a large negative bump in your values (say, an absolute value ten times smaller than the current running average); at that point, your running average is your reading. You could go back a little (disregarding the last few values in your average and recalculating) to compensate for the small positive bump visible in your picture before each negative bump. No need for heavy math here; you cannot model reality better than your picture has shown. Just make sure your code detects the end of each and every weighing, and that you sample fast enough that no negative bump is missed (or you will have a large timing error in your data averaging).
And you don't need such large arrays; a running average with a smaller window size gives a smaller residual error in your case when you detect the negative bump.
