I'm developing a plugin in C that detects audio peaks using gstreamer-1.0. I don't really have any knowledge of audio programming, and so far my plugin can only detect sound impulses (if there is no audio, nothing happens; if there is sound, I print the energy).
Here is the sample code of my (really simple) algorithm.
gfloat energy_of_sample(guint8 array[], int num_elements, gfloat *p)
{
    gfloat energy = 0.f;
    for (int i = 0; i < num_elements; i++)
    {
        /* square each byte and scale it down; a float divide keeps the
           fractional part instead of truncating it */
        gfloat sq = (gfloat)(array[i] * array[i]) / 4096.f;
        energy += sq;
        if (*p < sq)
            *p = sq;    /* track the largest single-sample energy */
    }
    return energy / num_elements;
}
static void
audio_process (GstBPMdetect *filter, GstBuffer *music)
{
    GstMapInfo info;
    gint threshold = 6;
    gfloat energy = 0;
    gfloat peak = 0;

    /* map the buffer read-only and expose its data through "info" */
    gst_buffer_map (music, &info, GST_MAP_READ);

    /* calculate the average energy and the peak of the buffer data */
    energy = energy_of_sample (info.data, info.size, &peak);
    gst_buffer_unmap (music, &info);   /* every map needs a matching unmap */

    if (energy >= threshold)
        g_print ("energy : %f , peak : %f \n", energy, peak);
}
If the audio source is, for example, a simple hand clap or a kick drum alone, my plugin detects the audio peak just fine. But when the audio source is a song, my plugin constantly detects sound impulses (the energy is always over the threshold).
My solution to that issue was to add a low-pass filter so only bass sounds would be detected. But by doing that I'm cutting out every part of the song that contains only high frequencies, and that is not what I want (it will not work for high-frequency beats).
So my question is: does anyone have an idea how to detect beats (audio impulses) without cutting the high frequencies? Thanks to everyone, and I hope my question is clear!
You should measure energy, not peak values. There is a good way to calculate energy: use the variance formula from statistics. You need to compute the square of the sum and the sum of squares of all points within an interval of 20–50 milliseconds; the variance formula then gives you the energy. The formula is here:
http://staff.icdi.wvu.edu/djhstats/variance1.JPG
As an alternative you may use the existing level element from the gst-plugins-good set.
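A minimal sketch of that windowed variance, assuming unsigned 8-bit samples and a caller that hands over one 20–50 ms window at a time (window_energy is an illustrative name):

#include <glib.h>

/* Variance of one analysis window (20-50 ms of samples):
 * var = E[x^2] - (E[x])^2, built from the sum of squares
 * and the square of the sum, as the formula above shows. */
gfloat window_energy(const guint8 *samples, guint n)
{
    gdouble sum = 0.0, sum_sq = 0.0;
    for (guint i = 0; i < n; i++) {
        sum    += samples[i];
        sum_sq += (gdouble)samples[i] * samples[i];
    }
    gdouble mean = sum / n;
    return (gfloat)(sum_sq / n - mean * mean);
}

A beat can then be flagged whenever a window's variance stands well above the running average of the recent windows, which works for high-frequency beats too since no band is filtered out.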
I have an array of 2048 samples of an audio file at 44.1 kHz and want to transform it into a spectrum for an LED effect. I don't know too much about the inner workings of an FFT, but I tried it using kiss_fft:
kiss_fft_cpx *cpx_in  = malloc(FRAMES * sizeof(kiss_fft_cpx));
kiss_fft_cpx *cpx_out = malloc(FRAMES * sizeof(kiss_fft_cpx));
kiss_fft_cfg cfg = kiss_fft_alloc(FRAMES, 0, 0, 0);

for (int j = 0; j < FRAMES; j++) {
    float x = (alsa_buffer[(fft_last_index + j + BUFFER_OVERSIZE * FRAMES)
                           % (BUFFER_OVERSIZE * FRAMES)] - offset);
    cpx_in[j] = (kiss_fft_cpx){ .r = x, .i = x };
}
kiss_fft(cfg, cpx_in, cpx_out);
My output seems really off. When I play a simple sine, there are multiple outputs with values way above zero, and in general the first entries seem way higher. Do I have to weight the outputs?
I also don't understand how I have to treat the complex numbers. I'm currently putting my input values on both the real and imaginary parts, and for the output I use the absolute value; is that right?
Also, spectrum analyzers for audio usually have logarithmic scaling, so I tried that. The problem is that the FFT output, as far as I know, isn't logarithmic: the first band is, say, 0–100 Hz, but ideally my first LED should only go up to about 60 Hz (a fraction of the first output's band), while the last LED would be, say, 8 kHz to 10 kHz, which would in that case cover 20 FFT outputs.
Is there any way to make the output logarithmic? How do I limit the spectrum to 20 kHz (or know what the bands of the output are in general), and is there anything else to look out for when working with audio signals?
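For orientation, a sketch of the post-processing usually applied here, assuming a 2048-point FFT at 44.1 kHz (NUM_LEDS, the 60 Hz–10 kHz range, and bins_to_led_bands are illustrative). For a real signal the imaginary input should be zero (not a copy of the sample); only bins 0..FRAMES/2-1 carry unique information; bin k lies at k * 44100 / 2048, about 21.5 * k Hz; and a logarithmic display comes from placing band edges at a constant frequency ratio:

#include <math.h>

#define FRAMES      2048
#define SAMPLE_RATE 44100.0f
#define NUM_LEDS    20

/* Group linear FFT bins into logarithmically spaced LED bands.
 * mags[] holds the magnitude sqrtf(r*r + i*i) of bins 0..FRAMES/2-1;
 * bands[] receives one averaged value per LED. */
void bins_to_led_bands(const float *mags, float *bands)
{
    const float hz_per_bin = SAMPLE_RATE / FRAMES;   /* about 21.5 Hz  */
    const float f_min = 60.0f, f_max = 10000.0f;     /* display range  */

    for (int led = 0; led < NUM_LEDS; led++) {
        /* band edges spaced by a constant ratio (logarithmic axis) */
        float lo = f_min * powf(f_max / f_min, (float)led / NUM_LEDS);
        float hi = f_min * powf(f_max / f_min, (float)(led + 1) / NUM_LEDS);
        int b_lo = (int)(lo / hz_per_bin);
        int b_hi = (int)(hi / hz_per_bin);
        if (b_hi <= b_lo) b_hi = b_lo + 1;   /* low bands span < 1 bin */

        float sum = 0.0f;
        for (int b = b_lo; b < b_hi && b < FRAMES / 2; b++)
            sum += mags[b];
        bands[led] = sum / (float)(b_hi - b_lo);
    }
}

Note that the lowest LEDs then resolve less than one FFT bin, which is exactly the 60 Hz problem described above; a longer FFT is the only way to get finer resolution at the bottom end.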
Currently I'm using an Arduino for my project, and what I want is an array that stores sensors. I understand that the resources available for a dynamic array are limited, but by capping the number of items in the array and using a plain struct instead of a class, I managed to cut the SRAM cost. So without further ado, here's my code:
#define MAX_SENSOR 6

namespace Sensors
{
    typedef struct
    {
        byte SlavePin;
        byte LDRPin;
        byte RedPin;
        byte BluePin;
    } Sensor;

    Sensor _sensors[MAX_SENSOR];
    byte _len = 0;

    void Add(Sensor s)
    {
        if (_len > MAX_SENSOR)
            return;
        _len++;
        _sensors[_len] = s;
    }

    Sensor Get(byte index)
    {
        return _sensors[index];
    }
}
And here's how I use it.
#include "Sensors.h"
void setup()
{
for (int i = 0; i < 6; i++)
{
Sensors::Sensor sen;
sen.SlavePin = 0;
Sensors::Add(sen);
}
Serial.print("Length = ");
Serial.println(Sensors::_len);
for (int j = 0; j < Sensors::_len; j++)
{
Serial.print(j);
Serial.print(" = ");
Serial.println(Sensors::Get(i).SlavePin);
}
}
void loop() { //Nothing goes here }
This code compiles successfully. But when I run it, the serial window shows this:
Length : 6
Sensor 0:0
Sensor 1:0
Sensor 2:1
Sensor 3:2
Sensor 4:3
Sensor 5:4
Apparently the first and the second items in the array have the same value, and honestly I don't know why.
Here's the output that I'm expecting:
Length : 6
Sensor 0:0
Sensor 1:1
Sensor 2:2
Sensor 3:3
Sensor 4:4
Sensor 5:5
Any help would be very much appreciated. And by the way, I'm sorry if this kind of question already exists.
The first call to Add() places the structure at index 1:
byte _len = 0;

void Add(Sensor s)
{
    if (_len > MAX_SENSOR)
        return;
    _len++;
    // On the first call, _len is already 1 here
    _sensors[_len] = s;
}
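A corrected Add() stores the element first and only then grows the count, and it must use >= in the guard so that index MAX_SENSOR - 1 is the last one ever written:

void Add(Sensor s)
{
    if (_len >= MAX_SENSOR)   // full: valid indices are 0..MAX_SENSOR-1
        return;
    _sensors[_len] = s;       // store at the current length...
    _len++;                   // ...then grow
}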
I understand the design intent of this code, but consider that this is a wasteful approach for a microcontroller.
Implementing Add() increases the code size. For a desktop library, that size would surely be a fair trade-off for safety; on a microcontroller it is harder to call it a good use of scarce memory.
Implementing Get() increases both code size and execution time. Again, this is a good design for a typical desktop environment and a library that you want to be safe; on a microcontroller it is the wrong trade-off.
The key factor deciding good or bad here is a permanent cost versus a one-time saving. The safe version of Sensor costs code space and execution time on every system deployed, every second the program runs. The benefit is only realized on the first day, while you run and debug the program.
I'm working on an MC68HC11 microcontroller with an analogue voltage signal coming in that I have sampled. The scenario is a weighing machine: the large peaks are when the object hits the sensor; the signal then stabilises (those are the samples I want) and peaks again before the object rolls off.
The problem I'm having is figuring out a way for the program to detect this stable point and average it to produce an overall weight. One way I have thought of is comparing previous values to see whether there is no large difference between them, but I haven't had any success. Below is the C code that I am using:
#include <stdio.h>
#include <stdarg.h>
#include <iof1.h>

void main(void)
{
    /* PORTA, DDRA, DDRG etc. are LED and switch ports */
    unsigned char *paddr, *adctl, *adr1;
    unsigned short i = 0;
    unsigned short k = 0;
    unsigned char switched = 1; /* is char the smallest data type? */
    unsigned char data[2000];

    DDRA = 0x00; /* all in */
    DDRG = 0xff;
    adctl = (unsigned char *) 0x30;
    adr1  = (unsigned char *) 0x31;
    *adctl = 0x20; /* single continuous scan */

    while (1)
    {
        if (*adr1 > 40)
        {
            if (PORTA == 128) /* debugging switch */
            {
                PORTG = 1;
            }
            else
            {
                PORTG = 0;
            }

            if (i < 2000)
            {
                while (((*adctl) & 0x80) == 0x00)
                    ;           /* wait for the conversion-complete flag */
                data[i] = *adr1;
                /* if(i > 10 && (data[(i-10)] - data[i]) < 20) */
                i++;
            }

            if (PORTA == switched)
            {
                PORTG = 31;
                /* print a delimiter so teemtalk can send to Excel */
                for (k = 0; k < 2000; k++)
                {
                    printf("%d,", data[k]);
                }
                if (switched == 1) /* would bitwise manipulation be more efficient? */
                {
                    switched = 0;
                }
                else
                {
                    switched = 1;
                }
                PORTG = 0;
            }

            if (i >= 2000)
            {
                i = 0;
            }
        }
    }
}
I look forward to hearing any suggestions :)
(A graph of the samples shows how these values look; the red box marks the area I would like to identify.)
As your sample sequence has glitches (short-lived transients), try to improve the hardware first: change the layout, add decoupling, add filtering, etc.
If that approach fails, then try a median filter [1], say five places long: it takes the last five samples, sorts them, and outputs the middle one, so two samples of a transient have no effect on its output (seven places would tolerate three transient samples).
Then apply a computationally efficient exponential-averaging low-pass filter [2]:
y(n) = y(n-1) + alpha * (x(n) - y(n-1))
choosing alpha (1/2^k, so the division becomes a right shift) to yield a time constant [3] shorter than the underlying response (~50 samples) but long enough to still filter out the noise. Increasing the number of effective fractional bits avoids quantizing issues.
With this improved sample sequence, thresholds and cycle counts can be applied to detect quiescent durations.
Additionally, if the end of the quiescent period is always followed by a large, abrupt change, then a small delay array of samples lets you detect the abrupt change while still having the last of the quiescent samples available for logging.
[1] http://en.wikipedia.org/wiki/Median_filter
[2] http://www.dsprelated.com/showarticle/72.php
[3] http://en.wikipedia.org/wiki/Time_constant
Note
Adding code for the above filtering operations will lower the maximum possible sample rate, but printf can be substituted with something faster.
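A sketch of both filters for 8-bit samples (the window length, alpha = 1/8, and the function names are illustrative):

#include <string.h>

#define MEDIAN_N 5

/* Median-of-5: sort a copy of the last five samples and return the
   middle one, so up to two transient samples cannot affect the output. */
static unsigned char median5(const unsigned char *w)
{
    unsigned char s[MEDIAN_N];
    int i, j;
    memcpy(s, w, MEDIAN_N);
    for (i = 1; i < MEDIAN_N; i++)          /* insertion sort */
        for (j = i; j > 0 && s[j - 1] > s[j]; j--) {
            unsigned char t = s[j];
            s[j] = s[j - 1];
            s[j - 1] = t;
        }
    return s[2];
}

/* Exponential average y(n) = y(n-1) + alpha*(x(n) - y(n-1)), alpha = 1/8;
   state is kept with 4 fractional bits to avoid quantizing issues. */
static unsigned short y_fix = 0;

unsigned char filter_sample(unsigned char x)
{
    static unsigned char win[MEDIAN_N];
    int m, diff;

    memmove(win, win + 1, MEDIAN_N - 1);    /* slide the 5-sample window */
    win[MEDIAN_N - 1] = x;

    m = (int)median5(win) << 4;             /* 4 fractional bits */
    diff = m - (int)y_fix;
    y_fix = (unsigned short)((int)y_fix + diff / 8);
    return (unsigned char)(y_fix >> 4);
}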
Continuously store the current value and the delta from the previous value.
Note when the delta is decreasing, marking the start of weight application to the scale.
Note when the delta is increasing, marking the end of weight application to the scale.
Take the X values with a small delta and average them.
By the way, I'm sure this has been done a million times before; a search for "scale PID" or "weight PID" should find a lot of information.
Don't forget to use a ___delay_ms(XX) call somewhere between the readings if you compare each value with the previous one. The difference at each step will obviously be small if the code loops continuously.
Looking at your nice graphs, I would say you should look only for the falling edge; it is much more consistent than the leading edge.
In other words, let the samples accumulate and calculate a running average the whole time, with a predefined window size, keeping the deviation of the previous values just for reference. Check for a large negative bump in your values (say, an absolute value ten times smaller than the current running average); your running average is then your weight. You could go back a little (disregarding the last few values in your average and recalculating) to compensate for the small positive bump visible in your picture before each negative bump. No need for heavy math here; you cannot model reality better than your picture shows. Just make sure your code detects the end of each and every sample, and sample fast enough that no negative bump is missed (or you will have a big timing error in your data averaging).
And you don't need such large arrays: a running average works better with a smaller window size, giving a smaller residual error in your case when you detect the negative bump.
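A sketch of that falling-edge detector with a small running-average window (WIN, the 40 floor from the code above, and the avg/4 bump threshold are illustrative):

#define WIN 16                      /* running-average window size */

static unsigned char buf[WIN];
static unsigned int  sum   = 0;
static unsigned char pos   = 0;
static unsigned char count = 0;     /* samples seen, saturates at WIN */

/* Feed one (already filtered) sample. Returns 1 and stores the weight
   when the sharp drop at the end of the stable region is seen. */
int detect_weight(unsigned char x, unsigned char *weight)
{
    if (count == WIN) {
        unsigned char avg = (unsigned char)(sum / WIN);
        /* falling edge: sample drops well below the running average */
        if (avg > 40 && x < avg / 4) {
            *weight = avg;          /* average over the stable region */
            sum = 0;                /* reset for the next object */
            pos = 0;
            count = 0;
            return 1;
        }
    }
    sum -= buf[pos];                /* slide the window */
    buf[pos] = x;
    sum += x;
    pos = (unsigned char)((pos + 1) % WIN);
    if (count < WIN)
        count++;
    return 0;
}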
I'm receiving PCM data through a socket connection in packets containing 320 samples. The sample rate of the sound is 8000 samples per second. I am doing something like this with it:
int size = 160 * 2;
int isinverse = 1;
kiss_fft_scalar zero;
memset(&zero, 0, sizeof(zero));

kiss_fft_cpx fft_in[size];
kiss_fft_cpx fft_out[size];
kiss_fft_cpx fft_reconstructed[size];

kiss_fftr_cfg fft  = kiss_fftr_alloc(size * 2, 0, 0, 0);
kiss_fftr_cfg ifft = kiss_fftr_alloc(size * 2, isinverse, 0, 0);

for (int i = 0; i < size; i++) {
    fft_in[i].r = zero;
    fft_in[i].i = zero;
    fft_out[i].r = zero;
    fft_out[i].i = zero;
    fft_reconstructed[i].r = zero;
    fft_reconstructed[i].i = zero;
}

// got my data through the socket connection
for (int i = 0; i < size; i++) {
    // samples are of type short
    fft_in[i].r = samples[i];
    fft_in[i].i = zero;
    fft_out[i].r = zero;
    fft_out[i].i = zero;
}

kiss_fftr(fft, (kiss_fft_scalar *) fft_in, fft_out);
kiss_fftri(ifft, fft_out, (kiss_fft_scalar *) fft_reconstructed);

// normalize the samples
for (int i = 0; i < size; i++) {
    short *samples = (short *) bufTmp1;
    samples[i] = rint(fft_reconstructed[i].r / (size * 2));
}
After that I fill OpenAL buffers and play them. Everything works just fine, but I would like to do some filtering of the audio between kiss_fftr and kiss_fftri. As I understand it, the starting point is converting the sound from the time domain to the frequency domain, but I don't really understand what kind of data I'm receiving from the kiss_fftr function. What information is stored in each of those complex numbers; what do the real and imaginary parts tell me about the frequency? And I don't know which frequencies are covered (what frequency span) in fft_out, i.e. which indices correspond to which frequencies.
I am a total newbie in signal processing and Fourier transform topics.
Any help?
Before you jump in with both feet into a C implementation, get familiar with digital filters, especially FIR filters.
You can design the FIR filter using something like GNU Octave's signal toolbox; look at the command fir1 (the simplest), firls, or remez. Alternatively, you might be able to design a FIR filter through a web page. A quick web search for "online fir filter design" found this (I have not used it, but it appears to use the equiripple design from the remez or firpm command).
Try implementing your filter first with a direct convolution (without FFTs) and see if the speed is acceptable; that is the easier path. If you need an FFT-based approach, there is a sample implementation of overlap-save in the kissfft/tools/kiss_fastfir.c file.
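A direct-form FIR is only a few lines; a sketch, with an illustrative tap count and all-zero placeholder coefficients where the fir1/firls/remez output would go:

#include <string.h>

#define NTAPS 33                     /* filter length (illustrative) */

/* Coefficients come from the design tool; zeros are placeholders. */
static const float h[NTAPS] = { 0.0f };
static float x_hist[NTAPS];          /* the last NTAPS input samples */

/* Direct convolution: one output per input, y(n) = sum_k h[k] * x(n-k). */
float fir_filter(float x)
{
    float y = 0.0f;
    int k;
    memmove(x_hist + 1, x_hist, (NTAPS - 1) * sizeof(float));
    x_hist[0] = x;
    for (k = 0; k < NTAPS; k++)
        y += h[k] * x_hist[k];
    return y;
}

At 8000 samples per second and a few dozen taps, this costs only a few hundred thousand multiply-adds per second, which is why it is worth trying before any FFT machinery.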
I will try to answer your questions directly.
// a) The real and imaginary components of the output are combined
//    to calculate the amplitude at each frequency:
float ar, ai, scaling;
scaling = 1.0f / (float)size;
// then for each output [i] from the FFT...
ar = fft_out[i].r;
ai = fft_out[i].i;
amplitude[i] = 2.0f * sqrtf(ar * ar + ai * ai) * scaling;

// b) Which index refers to which frequency? Only the first half of the
//    FFT results is needed; with your 8 kHz sampling rate, bin i sits
//    at i * 8000 / size Hz:
for (i = 1; i < (size / 2); i++)
    freq = (float)i * 8000.0f / (float)size;

// c) The phase (range +/- pi) at each frequency is calculated like this:
phase[i] = atan2f(fft_out[i].i, fft_out[i].r);
What you might want to investigate is FFT fast convolution using the overlap-add or overlap-save algorithms. You will need to expand the length of each FFT by the length of the impulse response of your desired filter. This is because (1) FFT/IFFT convolution is circular, and (2) each index in the FFT result corresponds to almost all frequencies (a sinc-shaped response), not just one (even if mostly near one), so any single-bin modification will leak throughout the entire frequency response (except at certain exactly periodic frequencies).
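For the FFT route, a sketch of overlap-add with kiss_fftr, assuming kiss_fft_scalar is float; NBLK, NTAP, NFFT, and fftconv_block are illustrative, and fwd/inv come from kiss_fftr_alloc(NFFT, 0, 0, 0) and kiss_fftr_alloc(NFFT, 1, 0, 0):

#include <string.h>
#include "kiss_fftr.h"

#define NBLK  320                 /* samples per packet                */
#define NTAP  129                 /* filter length (illustrative)      */
#define NFFT  512                 /* >= NBLK + NTAP - 1, and even      */

/* Overlap-add fast convolution, one block per call.
 * H[] is the precomputed NFFT-point spectrum of the zero-padded filter;
 * tail[] carries the NTAP-1 overflow samples between calls. */
void fftconv_block(kiss_fftr_cfg fwd, kiss_fftr_cfg inv,
                   const kiss_fft_cpx *H,
                   const float *in, float *out, float *tail)
{
    float        x[NFFT] = { 0 };
    float        y[NFFT];
    kiss_fft_cpx X[NFFT / 2 + 1];
    int          k, n;

    memcpy(x, in, NBLK * sizeof(float));      /* zero-padded input block */
    kiss_fftr(fwd, x, X);
    for (k = 0; k <= NFFT / 2; k++) {         /* complex multiply X *= H */
        float r = X[k].r * H[k].r - X[k].i * H[k].i;
        float i = X[k].r * H[k].i + X[k].i * H[k].r;
        X[k].r = r;
        X[k].i = i;
    }
    kiss_fftri(inv, X, y);
    for (n = 0; n < NFFT; n++)
        y[n] /= NFFT;                         /* kiss_fftri is unscaled  */
    for (n = 0; n < NTAP - 1; n++)            /* add tail of last block  */
        y[n] += tail[n];
    memcpy(out, y, NBLK * sizeof(float));     /* first NBLK samples out  */
    memcpy(tail, y + NBLK, (NTAP - 1) * sizeof(float)); /* save new tail */
}

The sizing rule is the point of the sketch: the FFT must hold NBLK + NTAP - 1 samples so the circular convolution never wraps into the useful part of the output.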
The DSP board I am currently using is a DSK6416 from Spectrum Digital, and I am implementing a convolution algorithm in C to convolve input voice samples with a pre-recorded impulse-response array. The objective is to speak into the microphone and output the processed effect, so we sound like we are speaking in the environment where the impulse-response array was recorded.
The challenge I am facing is doing the convolution live while keeping up with the input and output speed of the interrupt function at 8 kHz.
Here is my brainstorming idea:
My current, inefficient implementation that does not work is as follows: the interrupt stops the convolution process, outputs the sample at the current index, and resumes the convolution, at 8 kHz, i.e. every 1/8000 s. However, a complete iteration of the convolution runs much slower than 1/8000 s, so when the interrupt wants to output data from the output array, the data is not ready yet.
My ideal implementation would be a fast, pipelined convolution: many convolution processes running in the background, with the completed ones being output as time goes on; many pipes running in parallel. With this pipelining approach we would need N = 10000 pipeline processes running in the background...
Now that I have the idea down (at least I think I do; I might be wrong), I have no clue how to implement it on the DSK board in C, because C does not support object orientation.
The following is the pseudo-code for our C implementation:
#include <stdio.h>
#include "DSK6416_AIC23.h"
Uint32 fs=DSK6416_AIC23_FREQ_48KHZ; //set sampling rate
#define DSK6416_AIC23_INPUT_MIC 0x0015
#define DSK6416_AIC23_INPUT_LINE 0x0011
Uint16 inputsource=DSK6416_AIC23_INPUT_MIC; // select input
//input & output parameters declaration
#define MAX_SIZE 10000
Uint32 curr_input;
Int16 curr_input2;
short input[1];
short impulse[MAX_SIZE ];
short output[MAX_SIZE ];
Int16 curr_output;
//counters declaration
Uint32 a, b, c, d; //dip switch counters
int i, j, k; //convolution iterations
int x; //counter for initializing output;
interrupt void c_int11() //interrupt running at 8 kHz
{
//Reads Input
//Start new pipe
//Outputs output to speaker
}
void main()
{
//Read Impulse.txt into impulse array
comm_intr();
while(1)
{
if (DIP switch pressed)
{
//convolution here (our current inefficient convolution algorithm)
//Need to run multiple of the same process in the background in parallel.
for (int k = 0; k < MAX_SIZE; k++)
{
if (k==MAX_SIZE-1 && i == 0) // special condition overwriting element at i = MAX_SIZE -1
{
output[k] = (impulse[k]*input[0]);
}
else if (k+i < MAX_SIZE) // convolution from i to MAX_SIZE
{
output[k+i] += (impulse[k]*input[0]);
}
else if (k+i-MAX_SIZE != i-1) // convolution from 0 to i-2
{
output[k+i-MAX_SIZE] += (impulse[k]*input[0]);
}
else // overwrite element at i-1
{
output[i-1] = (impulse[k]*input[0]);
}
}
}
else //if DIP switch is not pressed
{
DSK6416_LED_off(0);
DSK6416_LED_off(1);
DSK6416_LED_off(2);
DSK6416_LED_off(3);
j = 0;
curr_output = input[1];
output_sample(curr_output); //outputs unprocessed dry voice
}
} //end of while
fclose(fp);
}
Is there a way to implement pipelining in C, compiled for the hardware DSP board, so we can run multiple convolution iterations in the background at the same time?
I drew some pictures, but I am new to this board so I can't post images.
Please let me know if you need my pictorial ideas to help you help me~
Any help on how to implement this is very much appreciated!
You probably need to process data in chunks of some N samples. While one chunk is being I/O'd in a DAC/ADC interrupt handler, another one is being processed somewhere in main(). The main thing is to make sure that processing a chunk of N samples takes less time than receiving/transmitting N samples.
Here's what it may look like in time (everything within each step, except step 1, happens "in parallel"):
1. buf1 = buf3 = zeroes, buf2 = anything
2. ISR: DAC sends buf1, ADC receives buf2; main(): processes buf3
3. ISR: DAC sends buf3, ADC receives buf1; main(): processes buf2
4. ISR: DAC sends buf2, ADC receives buf3; main(): processes buf1
Repeat indefinitely from step 2.
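A sketch of that rotation in the style of the question's code; BLK, input_sample()/output_sample() (as in the board-support examples), and process_chunk() are illustrative:

#define BLK 64                      /* samples per chunk (illustrative) */

static short bufA[BLK], bufB[BLK], bufC[BLK];
static short *out_buf  = bufA;      /* DAC reads from here   */
static short *in_buf   = bufB;      /* ADC writes here       */
static short *work_buf = bufC;      /* main() processes here */
static volatile int block_done = 0; /* ISR -> main() handshake */
static int pos = 0;

interrupt void c_int11()            /* 8 kHz sample interrupt */
{
    output_sample(out_buf[pos]);    /* send one processed sample */
    in_buf[pos] = input_sample();   /* grab one new sample       */
    if (++pos == BLK) {             /* chunk complete: rotate    */
        short *t = out_buf;
        out_buf  = work_buf;        /* just-processed chunk goes out   */
        work_buf = in_buf;          /* just-captured chunk gets worked */
        in_buf   = t;               /* old output buffer is reused     */
        pos = 0;
        block_done = 1;
    }
}

void main_loop(void)
{
    for (;;) {
        while (!block_done)         /* wait for the ISR hand-over */
            ;
        block_done = 0;
        process_chunk(work_buf, BLK);  /* must finish in < BLK/8000 s */
    }
}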
Also, you may want to implement your convolution in assembly for extra speed. I'd look at some TI app notes or the like for an implementation; perhaps it's available in a library, too.
You may also consider doing convolution via Fast Fourier Transform.
Your DSP has only so many CPU cycles available per second. You need to analyze your algorithm to determine how many CPU cycles it takes to process each sample on average; that needs to be less than the number of CPU cycles between samples. No amount of pipelining or object orientation will help if your algorithm doesn't complete in a small enough number of cycles per sample on average.
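To put rough numbers on that (assuming, for example, a 600 MHz clock): at an 8 kHz sample rate you have 600e6 / 8000 = 75,000 cycles per sample. A direct convolution with a 10,000-tap impulse response costs on the order of 10,000 multiply-accumulates per output sample, which only fits in that budget if the inner loop sustains multiple MACs per cycle; an FFT-based fast convolution brings the per-sample cost down to roughly log2(N) operations, which is why it is usually the way out at these filter lengths.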