FFTW plan segmentation fault - c

I am using FFTW3 to perform an FFT on multiple columns of data (i.e. multichannel audio, where I want the transform of each channel). This works fine on OS X, but when I port the code to Linux I get a segfault.
const int fftwFlags = FFTW_PRESERVE_INPUT|FFTW_PATIENT;
struct fft {
fftw_complex **complexSig;
double **realSig;
fftw_plan forwardR2C;
int fftLen;
int numChan;
};
void createFFT(struct fft *fft) {
int bufLen = 1024;
int numChan = 4;
fft->fftLen = bufLen;
fft->numChan = numChan;
fft->realSig = fftw_malloc(sizeof(double *) * numChan);
for(int i = 0; i < numChan; i++) {
fft->realSig[i] = fftw_malloc(sizeof(double) * bufLen);
}
fft->complexSig = fftw_malloc(sizeof(fftw_complex *) * numChan);
for(int i = 0; i < numChan; i++) {
fft->complexSig[i] = fftw_malloc(sizeof(fftw_complex) * bufLen);
}
fft->forwardR2C = fftw_plan_many_dft_r2c(1, &fft->fftLen, fft->numChan, *fft->realSig, &fft->fftLen, 1, fft->fftLen, *fft->complexSig, &fft->fftLen, 1, fft->fftLen, fftwFlags);
}
Valgrind shows that the FFTW planner is attempting to access past the end of a realSig buffer (by 8 bytes, one sample), which results in the segmentation fault. If I increase the memory allocated to realSig to bufLen * 2, the error goes away.
I am sure this is an error in how I am telling FFTW to read my data, but I cannot spot it!

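Your plan passes only *fft->realSig (that is, realSig[0]) with idist = fftLen, so FFTW expects all numChan channels to sit back to back in one block starting there (the same applies to complexSig). You seem to be assuming that successive malloc calls will return contiguous memory, which of course they are unlikely to do (you probably just "got lucky" on OS X). You can fix this quite easily, though, by making one large allocation, e.g.: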
void createFFT(struct fft *fft)
{
const int bufLen = 1024;
const int numChan = 4;
fft->fftLen = bufLen;
fft->numChan = numChan;
fft->realSig = fftw_malloc(sizeof(double *) * numChan);
// array of numChan pointers
fft->realSig[0] = fftw_malloc(sizeof(double) * numChan * bufLen);
// one large contiguous block of size `numChan * bufLen`
for(int i = 1; i < numChan; i++) // init pointers
{
fft->realSig[i] = fft->realSig[i - 1] + bufLen;
}
// ...
}
Note: when you're done, free the data block first and then the pointer array:
fftw_free(fft->realSig[0]);
fftw_free(fft->realSig);
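The same pattern is needed for complexSig, so it can help to wrap it in a helper that keeps the pointer array and the single data block together. This is a sketch under assumptions: alloc_channels() and free_channels() are hypothetical names, and it uses plain malloc/free so it compiles without FFTW (in the real code you would use fftw_malloc/fftw_free for the SIMD alignment the question relies on). An fftw_complex version would be identical apart from the element type.

```c
#include <stdlib.h>

/* Allocate num_chan row pointers that all point into ONE contiguous block of
 * num_chan * len doubles, so row c starts exactly c * len elements after
 * row 0. That is the layout a many-plan with idist == len expects.
 * Returns NULL on failure. Assumes num_chan > 0. */
double **alloc_channels(size_t num_chan, size_t len) {
    double **rows = malloc(num_chan * sizeof *rows);
    if (!rows) return NULL;
    rows[0] = malloc(num_chan * len * sizeof **rows);  /* the single data block */
    if (!rows[0]) { free(rows); return NULL; }
    for (size_t c = 1; c < num_chan; c++)
        rows[c] = rows[c - 1] + len;  /* each row points into the same block */
    return rows;
}

/* Free the data block first (it is owned by rows[0]), then the pointers. */
void free_channels(double **rows) {
    if (!rows) return;
    free(rows[0]);
    free(rows);
}
```

With this layout, passing rows[0] as the input pointer together with idist = len covers every channel without reading outside the allocation.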

Related

Segfault after refactoring nested loops

I have some MATLAB code from a digital audio course that I've ported to C. Given an array of numeric data (for example, PCM audio encoded as double-precision floating point), it produces an array of data segments of a specified width that overlap each other by a specified amount. Here's the relevant code.
typedef struct AudioFramesDouble {
const size_t n, // number of elements in each frame
num_frames;
double* frames[];
} AudioFramesDouble;
/*
* Produce a doubly-indexed array of overlapping substrings (a.k.a windows, frames,
* segments ...) from a given array of data.
*
* x: array of (i.e., pointer to) data
* sz: number of data elements to consider
* n: number of elements in each frame
* overlap: each frame overlaps the next by a factor of 1 - 1/overlap.
*/
AudioFramesDouble* audio_frames_double(register const double x[], const size_t sz, const unsigned n, const unsigned overlap) {
// Graceful exit on nullptr
if (!x) return (void*) x;
const double hop_d = ((double) n) / ((double) overlap); // Lets us "hop" to the start of the next frame.
const unsigned hop = (unsigned) ceil(hop_d);
const unsigned remainder = (unsigned) sz % hop;
const double num_frames_d = ((double) sz) / hop_d;
const size_t num_frames = (size_t) (remainder == 0
? floor(num_frames_d) // paranoia about floating point errors
: ceil(num_frames_d)); // room for zero-padding
const size_t total_samples = (size_t) n * num_frames;
AudioFramesDouble af = {.n = n, .num_frames = num_frames};
// We want afp->frames to appear as (double*)[num_frames].
AudioFramesDouble* afp = malloc((sizeof *afp) + (sizeof (double*) * num_frames));
if (!afp) return afp;
memcpy(afp, &af, sizeof af);
for (size_t i = 0; i < num_frames; ++i) {
/* Allocate zero-initialized space at the start of each frame. If this
fails, free up the memory and vomit a null pointer. */
afp->frames[i] = calloc(n, sizeof(double));
if (!afp->frames[i]) {
double* p = afp->frames[i];
for (long ii = ((long)i) - 1; 0 <= ii; ii--) {
free(afp->frames[--i]);
}
free(afp);
return (void*) p;
}
for (size_t j = 0, k; j < n; ++j) {
if (sz <= (k = (i*hop) + j)) break;
afp->frames[i][j] = x[k];
}
}
return afp;
}
This performs as expected. I wanted to optimize the nested for loops into a single loop:
for (size_t i = 0, j = 0, k; i < num_frames; (j == n - 1) ? (j = 0,i++) : ++j) {
// If we've reached the end of the frame, reset j to zero.
// Then allocate the next frame and check for null.
if (j == 0 && !!(afp->frames[i] = calloc(n, sizeof(double)))) {
double* p = afp->frames[i];
for (long ii = ((long)i) - 1; 0 <= ii; ii--) {
free(afp->frames[--i]);
}
free(afp);
return (void*) p;
}
if (sz <= (k = (i*hop) + j)) break;
afp->frames[i][j] = x[k];
}
This actually compiles and runs just fine, but in my testing, when I try to access the last frame, as in
xFrames->frames[xFrames->num_frames-1],
I get a segmentation fault. What's going on here? Am I neglecting an edge case in my loop? I've been looking over the code for a while, but I might need a second set of eyes. Sorry if the answer is glaringly obvious; I'm a bit of a C novice.
P.S. I'm a fan of branchless programming, so if anyone has tips for cutting out those IFs, I'm all ears. I was using ternary operators before, but reverted to IFs for readability in debugging.
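Remember that the logical operators && and || use short-circuit evaluation. In j == 0 && !!(afp->frames[i] = calloc(...)), calloc is only called when j == 0, which is what you want, since frame i is allocated on that iteration. The problem is the !!: it turns any non-null pointer into 1, so the condition is true exactly when the allocation succeeds. On the very first iteration you therefore take the error path, free afp, and return the new frame's pointer instead of the struct. Reading num_frames through that pointer most likely gives 0 (the block came from calloc), so num_frames-1 wraps around and the index is far out of bounds. The test should be !(afp->frames[i] = calloc(n, sizeof(double))).
Once that is fixed, note that break now exits the whole flattened loop rather than just the current frame. If an earlier frame reaches the end of x (which can happen when frames overlap), the remaining frames are never allocated and afp->frames keeps indeterminate pointers for them. Skip the copy instead of breaking.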
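A corrected single-loop version might look like the following sketch. It allocates each frame when j == 0, takes the error path only when calloc returns NULL (a single !), and skips the copy past the end of x instead of breaking, so every frame gets allocated. Assumptions: the struct drops the const qualifiers so fields can be assigned directly, audio_frames_free() is a hypothetical helper, and hop and the frame count use integer ceiling division rather than the question's floating-point ceil/floor logic, which can differ by one trailing all-padding frame when n is not a multiple of overlap.

```c
#include <stdlib.h>

/* Simplified copy of the question's struct (const qualifiers dropped). */
typedef struct AudioFramesDouble {
    size_t n;          /* number of elements in each frame */
    size_t num_frames; /* number of frames */
    double *frames[];  /* num_frames pointers, one per frame */
} AudioFramesDouble;

/* Free the first `allocated` frames, then the struct itself. */
void audio_frames_free(AudioFramesDouble *afp, size_t allocated) {
    while (allocated > 0)
        free(afp->frames[--allocated]);
    free(afp);
}

AudioFramesDouble *audio_frames_double(const double x[], size_t sz,
                                       unsigned n, unsigned overlap) {
    if (!x || n == 0 || overlap == 0) return NULL;
    const size_t hop = (n + overlap - 1) / overlap;  /* ceil(n / overlap) */
    const size_t num_frames = (sz + hop - 1) / hop;  /* ceil(sz / hop) */

    AudioFramesDouble *afp = malloc(sizeof *afp + num_frames * sizeof(double *));
    if (!afp) return NULL;
    afp->n = n;
    afp->num_frames = num_frames;

    /* One flattened loop: j walks through a frame, then resets and i advances. */
    for (size_t i = 0, j = 0; i < num_frames; (j == n - 1) ? (j = 0, i++) : ++j) {
        if (j == 0) {
            /* Allocate frame i at its first element; error path only on NULL. */
            afp->frames[i] = calloc(n, sizeof(double));
            if (!afp->frames[i]) {
                audio_frames_free(afp, i);
                return NULL;
            }
        }
        const size_t k = i * hop + j;
        if (k < sz)  /* past the end of x: keep calloc's zero padding */
            afp->frames[i][j] = x[k];  /* no break, so later frames still get allocated */
    }
    return afp;
}
```

The caller releases everything with audio_frames_free(af, af->num_frames).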

C Keep Getting Double Free, despite trying to free in same form as allocation

Hey, I'm trying to write a simple machine-learning application for school, but I keep getting a double free error for some reason I cannot even fathom.
float * evaluate(Network net,float * in)
{
int i,j;
float * out;
Neuron cur_neu;
for(i=0,j=0;i<net.n_lay;i++) j = net.lay_sizes[i]>j?net.lay_sizes[i]:j; //Calculating the maximum lay size for output storage
out = (float *) malloc(j*sizeof(float));
for(i=0;i<net.n_lay;i++) //Cycling through layers
{
for(j=0;j<net.lay_sizes[i];j++) //Cycling through Neurons
{
cur_neu=net.matrix[i][j];
out[j] = cur_neu.af(cur_neu.w,in,net.lay_sizes[i-1]); //Storing each answer in out
}
for(j=0;j<net.lay_sizes[i];j++) in[j] = out[j]; //Transferring answers to in
}
return out;
}
float loss(Network net, float **ins_orig, int t_steps)
{
float **profecies;
float st = .5f;
int d_steps = 4;
int t, i, j;
int out_size = net.lay_sizes[net.n_lay - 1];
int in_size = net.lay_sizes[0];
float out = 0.0f;
float **ins;
/*
d_steps = Divination Steps: Number of time steps forward the network has to predict.
The size of the output layer must be d_steps*#ins (deconsidering any conceptual i/os)
t_steps = Total of Steps: Total number of time steps to simulate.
*/
//Copying ins
ins = (float **)malloc(t_steps * sizeof(float *));
for (i = 0; i < t_steps; i++) //I allocate memory for and copy ins_orig to ins here
{
ins[i] = (float *)malloc(in_size * sizeof(float));
for (j = 0; j < in_size; j++)
ins[i][j] = ins_orig[i][j];
}
//
profecies = (float **)malloc(t_steps * sizeof(float *));
for (t = 0; t < t_steps; t++)
{
profecies[t] = evaluate(net, ins[t]);
/*
Profecy 0:
[[a1,b1,c1,d1]
[e1,f1,g1,h1]
[i1,j1,k1,l1]]
Profecy 1:
[[e2,f2,g2,h2]
[i2,j2,k2,l2]
[m2,n2,o2,q2]]
Verification for:
t=0:
loss+= abs(a1-ins[t][0]+b2-ins[t][1]...)
t=1:
t=0:
loss+= abs(e1-ins[t][0]+f2-ins[t][1]...)
*/
for (i = 0; i < d_steps; i++) //i is distance of prediction
{
if (i <= t) // stops negative profecy indexing
{
for (j = 0; j < in_size; j++)
{
out += (ins[t][j] - profecies[t-i][j+in_size*i]) * (ins[t][j] - profecies[t-i][j+in_size*i]) * (1 + st*i); //(1+st*i) The further the prediction, the bigger reward
}
}
}
}
//Free ins
for (i = 0; i < t_steps; i++) //I try to free it here, but to no avail
{
free(ins[i]);
}
free(ins);
return out;
}
I realize it's probably something very obvious, but I can't figure it out for the life of me and would appreciate the help.
Extra details that probably aren't necessary:
evaluate just passes the input (stored in ins) to the network and returns the output
both inputs and outputs are stored in float "matrices"
Edit: Added evaluate
In your loss() you allocate the same number of floats for each ins:
ins[i] = (float *)malloc(in_size * sizeof(float));
In your evaluate() you calculate the largest lay_size, which implies it may NOT be net.lay_sizes[0]:
for(i=0,j=0;i<net.n_lay;i++) j = net.lay_sizes[i]>j?net.lay_sizes[i]:j; //Calculating the maximum lay size for output storage
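Then you write out of bounds here, because in points to one of those in_size-element rows, and any layer larger than net.lay_sizes[0] writes past its end:
for(j=0;j<net.lay_sizes[i];j++) in[j] = out[j]; //Transferring answers to in
From that point on the heap is corrupted (the allocator's bookkeeping next to the block can be overwritten), so the "double free" that free() reports is a symptom, not the actual cause.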
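One way to fix it is to stop copying results back into the caller's in and use scratch buffers owned by evaluate() instead. This sketch assumes hypothetical definitions of Network and Neuron (only their field names appear in the question, so the field types are guesses) and uses a plain weighted sum as a stand-in activation. It also assumes layer 0 reads net.lay_sizes[0] inputs, which avoids the net.lay_sizes[i-1] read at i == 0 in the original (that indexes before the start of the array).

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical reconstructions of the question's types. */
typedef float (*ActFn)(const float *w, const float *in, int n_in);
typedef struct { ActFn af; const float *w; } Neuron;
typedef struct { int n_lay; const int *lay_sizes; Neuron **matrix; } Network;

/* Example activation: a plain weighted sum (stand-in for the real one). */
float weighted_sum(const float *w, const float *in, int n_in) {
    float s = 0.0f;
    for (int k = 0; k < n_in; k++) s += w[k] * in[k];
    return s;
}

/* Runs the network on `in` without writing to it. Returns a malloc'd buffer
 * whose first net.lay_sizes[net.n_lay - 1] entries are the outputs
 * (caller frees), or NULL on allocation failure. */
float *evaluate(Network net, const float *in) {
    int max = 0;
    for (int i = 0; i < net.n_lay; i++)
        if (net.lay_sizes[i] > max) max = net.lay_sizes[i];

    /* Two scratch buffers sized for the largest layer, swapped every layer. */
    float *cur = malloc(max * sizeof *cur);
    float *next = malloc(max * sizeof *next);
    if (!cur || !next) { free(cur); free(next); return NULL; }

    int n_in = net.lay_sizes[0];
    memcpy(cur, in, n_in * sizeof *cur);  /* read only what the caller owns */
    for (int i = 0; i < net.n_lay; i++) {
        for (int j = 0; j < net.lay_sizes[i]; j++) {
            Neuron neu = net.matrix[i][j];
            next[j] = neu.af(neu.w, cur, n_in);
        }
        n_in = net.lay_sizes[i];  /* this layer's outputs feed the next one */
        float *tmp = cur; cur = next; next = tmp;
    }
    free(next);
    return cur;
}
```

Because in is never written, ins[t] in loss() still holds the original data when it is compared against the predictions, and each row keeps exactly in_size floats.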

Figure out why my RC4 implementation doesn't produce the correct result

OK, I am new to C. I have programmed in C# for around 10 years, so I'm still getting used to the language. I've been doing well at learning it, but I'm still having a few hiccups. Currently I'm trying to write an implementation of the RC4 cipher used on the Xbox 360 to encrypt KeyVault/Account data.
However, I've run into a snag: the code runs, but it outputs the wrong data. Below are the original C# code I am working with, which I know works, and the snippet from my C project. Any help or pointers will be much appreciated :)
Original C# code:
public struct RC4Session
{
public byte[] Key;
public int SBoxLen;
public byte[] SBox;
public int I;
public int J;
}
public static RC4Session RC4CreateSession(byte[] key)
{
RC4Session session = new RC4Session
{
Key = key,
I = 0,
J = 0,
SBoxLen = 0x100,
SBox = new byte[0x100]
};
for (int i = 0; i < session.SBoxLen; i++)
{
session.SBox[i] = (byte)i;
}
int index = 0;
for (int j = 0; j < session.SBoxLen; j++)
{
index = ((index + session.SBox[j]) + key[j % key.Length]) % session.SBoxLen;
byte num4 = session.SBox[index];
session.SBox[index] = session.SBox[j];
session.SBox[j] = num4;
}
return session;
}
public static void RC4Encrypt(ref RC4Session session, byte[] data, int index, int count)
{
int num = index;
do
{
session.I = (session.I + 1) % 0x100;
session.J = (session.J + session.SBox[session.I]) % 0x100;
byte num2 = session.SBox[session.I];
session.SBox[session.I] = session.SBox[session.J];
session.SBox[session.J] = num2;
byte num3 = data[num];
byte num4 = session.SBox[(session.SBox[session.I] + session.SBox[session.J]) % 0x100];
data[num] = (byte)(num3 ^ num4);
num++;
}
while (num != (index + count));
}
Now here is my own C version:
typedef struct rc4_state {
int s_box_len;
uint8_t* sbox;
int i;
int j;
} rc4_state_t;
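unsigned char* HMAC_SHA1(const char* cpukey, const unsigned char* hmac_key) {
unsigned char* digest = malloc(20); // SHA-1 digest is 20 bytes; caller frees
if (!digest) return NULL;
// Have HMAC() write into our buffer; passing NULL makes it use its own static buffer and leaks digest
if (!HMAC(EVP_sha1(), cpukey, 16, hmac_key, 16, digest, NULL)) { free(digest); return NULL; }
return digest;
}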
void rc4_init(rc4_state_t* state, const uint8_t *key, int keylen)
{
state->i = 0;
state->j = 0;
state->s_box_len = 0x100;
state->sbox = malloc(0x100);
// Init sbox.
int i = 0, index = 0, j = 0;
uint8_t buf;
while(i < state->s_box_len) {
state->sbox[i] = (uint8_t)i;
i++;
}
while(j < state->s_box_len) {
index = ((index + state->sbox[j]) + key[j % keylen]) % state->s_box_len;
buf = state->sbox[index];
state->sbox[index] = (uint8_t)state->sbox[j];
state->sbox[j] = (uint8_t)buf;
j++;
}
}
void rc4_crypt(rc4_state_t* state, const uint8_t *inbuf, uint8_t **outbuf, int buflen)
{
int idx = 0;
uint8_t num, num2, num3;
*outbuf = malloc(buflen);
if (*outbuf) { // do not forget to test for failed allocation
while(idx != buflen) {
state->i = (int)(state->i + 1) % 0x100;
state->j = (int)(state->j + state->sbox[state->i]) % 0x100;
num = (uint8_t)state->sbox[state->i];
state->sbox[state->i] = (uint8_t)state->sbox[state->j];
state->sbox[state->j] = (uint8_t)num;
num2 = (uint8_t)inbuf[idx];
num3 = (uint8_t)state->sbox[(state->sbox[state->i] + (uint8_t)state->sbox[state->j]) % 0x100];
(*outbuf)[idx] = (uint8_t)(num2 ^ num3);
printf("%02X", (*outbuf)[idx]);
idx++;
}
}
printf("\n");
}
Usage (C#):
byte[] cpukey = new byte[16]
{
...
};
byte[] hmac_key = new byte[16]
{
...
};
byte[] buf = new System.Security.Cryptography.HMACSHA1(cpukey).ComputeHash(hmac_key);
MessageBox.Show(BitConverter.ToString(buf).Replace("-", ""), "");
Usage (C):
const char cpu_key[16] = { 0xXX, 0xXX, 0xXX };
const unsigned char hmac_key[16] = { ... };
unsigned char* buf = HMAC_SHA1(cpu_key, hmac_key);
uint8_t buf2[20];
uint8_t buf3[8] = { 0x1E, 0xF7, 0x94, 0x48, 0x22, 0x26, 0x89, 0x8E }; // Encrypted Xbox 360 data
uint8_t* buf4;
// Allocated 8 bytes out.
buf4 = malloc(8);
int num = 0;
while(num < 20) {
buf2[num] = (uint8_t)buf[num]; // convert const char
num++;
}
rc4_state_t* rc4 = malloc(sizeof(rc4_state_t));
rc4_init(rc4, buf2, 20);
rc4_crypt(rc4, buf3, &buf4, 8);
I have HMAC-SHA1 figured out (I'm using OpenSSL for that) and I've confirmed I get the correct HMAC/decryption key; it's just the RC4 that isn't working. I'm trying to decrypt the part of the KeyVault that should decrypt to "Xbox 360" (hex 58626F7820333630).
The output is currently "0000008108020000". I don't get any errors during compilation. Again, any help would be great ^.^
Thanks to John's help I was able to fix it; it was an error in the C# version. Thanks, John!
As I remarked in comments, your main problem appeared to involve how the output buffer is managed. You have since revised the question to fix that, but I describe it anyway here, along with some other alternatives for fixing it. The remaining problem is discussed at the end.
Function rc4_crypt() allocates an output buffer for itself, but it has no mechanism to communicate a pointer to the allocated space back to its caller. Your revised usage furthermore exhibits some inconsistency with rc4_crypt() with respect to how the output buffer is expected to be managed.
There are three main ways to approach the problem:
1. Function rc4_crypt() presently returns nothing, so you could let it continue to allocate the buffer itself, and modify it to return a pointer to the allocated output buffer.
2. You could modify the type of the outbuf parameter to uint8_t ** to enable rc4_crypt() to set the caller's pointer value indirectly.
3. You could rely on the caller to manage the output buffer, and make rc4_crypt() just write the output via the pointer passed to it.
The only one of those that might be tricky for you is #2; it would look something like this:
void rc4_crypt(rc4_state_t* state, const uint8_t *inbuf, uint8_t **outbuf, int buflen) {
*outbuf = malloc(buflen);
if (*outbuf) { // do not forget to test for failed allocation
// ...
(*outbuf)[idx] = (uint8_t)(num2 ^ num3);
// ...
}
}
And you would use it like this:
rc4_crypt(rc4, buf3, &buf4, 8);
... without otherwise allocating any memory for buf4.
The caller in any case has the responsibility for freeing the output buffer when it is no longer needed. This is clearer when it performs the allocation itself; you should document that requirement if rc4_crypt() is going to be responsible for the allocation.
The remaining problem appears to be strictly an output problem. You are apparently relying on print statements in rc4_crypt() to report on the encrypted data. I have no problem whatever with debugging via print statements, but you do need to be careful to print the data you actually want to examine. In this case you do not. You update the joint buffer index idx at the end of the encryption loop before printing a byte from the output buffer. As a result, at each iteration you print not the encrypted byte value you've just computed, but rather an indeterminate value that happens to be in the next position of the output buffer.
Move the idx++ to the very end of the loop to fix this problem, or change it from a while loop to a for loop and increment idx in the third term of the loop control statement. In fact, I strongly recommend for loops over while loops where the former are a good fit to the structure of the code (as here); I daresay you would not have made this mistake if your loop had been structured that way.
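For comparison, here is a compact sketch of the standard RC4 algorithm (which is what the C# code above implements) using the third approach, a caller-managed output buffer, with for loops throughout. The rc4_state_t layout is an assumption that differs from the question's: the S-box is stored inline, so there is no separate allocation to free, and i/j are uint8_t, so the % 0x100 reductions happen automatically through unsigned wraparound. Treat it as an illustration rather than a drop-in replacement.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct rc4_state {
    uint8_t sbox[256];  /* permutation of 0..255 */
    uint8_t i, j;       /* uint8_t arithmetic wraps mod 256 */
} rc4_state_t;

/* Key-scheduling algorithm. keylen must be > 0. */
void rc4_init(rc4_state_t *state, const uint8_t *key, size_t keylen) {
    for (int k = 0; k < 256; k++)
        state->sbox[k] = (uint8_t)k;
    uint8_t index = 0;
    for (int k = 0; k < 256; k++) {
        index = (uint8_t)(index + state->sbox[k] + key[k % keylen]);
        uint8_t tmp = state->sbox[index];
        state->sbox[index] = state->sbox[k];
        state->sbox[k] = tmp;
    }
    state->i = state->j = 0;
}

/* XORs buflen bytes of inbuf with the keystream into outbuf.
 * The caller owns outbuf, which must hold at least buflen bytes. */
void rc4_crypt(rc4_state_t *state, const uint8_t *inbuf, uint8_t *outbuf,
               size_t buflen) {
    for (size_t idx = 0; idx < buflen; idx++) {
        state->i = (uint8_t)(state->i + 1);
        state->j = (uint8_t)(state->j + state->sbox[state->i]);
        uint8_t tmp = state->sbox[state->i];
        state->sbox[state->i] = state->sbox[state->j];
        state->sbox[state->j] = tmp;
        uint8_t ks = state->sbox[(uint8_t)(state->sbox[state->i] + state->sbox[state->j])];
        outbuf[idx] = inbuf[idx] ^ ks;
    }
}
```

Because RC4 is symmetric, decryption is the same call with a freshly initialized state, e.g. uint8_t out[8]; rc4_crypt(&st, buf3, out, 8); with no malloc needed for the output.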

Simple reverb algorithm when buffer is small

I'm trying to implement the simple delay/reverb described in this post https://stackoverflow.com/a/5319085/1562784 and I have a problem. On Windows, where I record 16-bit/16 kHz samples and get 8k samples per recording callback, it works fine. But on Linux I get much smaller chunks from the sound card, around 150 samples. Because of that, I modified the delay/reverb code to buffer samples:
#define REVERB_BUFFER_LEN 8000
static void reverb( int16_t* Buffer, int N)
{
int i;
float decay = 0.5f;
static int16_t sampleBuffer[REVERB_BUFFER_LEN] = {0};
//Make room at the end of buffer to append new samples
for (i = 0; i < REVERB_BUFFER_LEN - N; i++)
sampleBuffer[ i ] = sampleBuffer[ i + N ] ;
//copy new chunk of audio samples at the end of buffer
for (i = 0; i < N; i++)
sampleBuffer[REVERB_BUFFER_LEN - N + i ] = Buffer[ i ] ;
//perform effect
for (i = 0; i < REVERB_BUFFER_LEN - 1600; i++)
{
sampleBuffer[i + 1600] += (int16_t)((float)sampleBuffer[i] * decay);
}
//copy output sample
for (i = 0; i < N; i++)
Buffer[ i ] = sampleBuffer[REVERB_BUFFER_LEN - N + i ];
}
This results in white noise on output, so clearly I'm doing something wrong.
On Linux I record at 16-bit/16 kHz, the same as on Windows, and I'm running Linux in VMware.
Thank you!
Update:
As indicated in the answer, I was 'reverbing' old samples over and over again. A simple 'if' solved the problem:
for (i = 0; i < REVERB_BUFFER_LEN - 1600; i++)
{
if((i + 1600) >= REVERB_BUFFER_LEN - N)
sampleBuffer[i + 1600] += (int16_t)((float)sampleBuffer[i] * decay);
}
Your loop that performs the actual reverb effect will be performed multiple times on the same samples, on different calls to the function. This is because you save old samples in the buffer, but you perform the reverb on all samples each time. This will likely cause them to overflow at some point.
You should only perform the reverb on the new samples, not on ones which have already been modified. I would also recommend checking for overflow and clipping to the min/max values instead of wrapping in that case.
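A probably better way to perform reverb, which works for any input buffer size, is to maintain a circular buffer of size REVERB_SAMPLES (1600 in your case) that holds the most recent output samples.
#include <stdint.h>
#define REVERB_SAMPLES 1600 // delay length in samples (0.1 s at 16 kHz)
static const float decay = 0.5f; // feedback gain
void reverb( int16_t* buf, int len) {
static int16_t reverb_buf[REVERB_SAMPLES] = {0}; // last REVERB_SAMPLES outputs
static int reverb_pos = 0; // slot holding the output from REVERB_SAMPLES ago
for (int i=0; i<len; i++) {
float v = buf[i] + reverb_buf[reverb_pos] * decay;
if (v > INT16_MAX) v = INT16_MAX; // clip instead of wrapping on overflow
if (v < INT16_MIN) v = INT16_MIN;
int16_t new_value = (int16_t)v;
reverb_buf[reverb_pos] = new_value;
buf[i] = new_value;
reverb_pos = (reverb_pos + 1) % REVERB_SAMPLES;
}
}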
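To check the claim that the circular buffer works for any input buffer size, here is the same idea with the state held in a struct instead of static variables, so two independent instances can be compared. reverb_state, reverb_init(), reverb_process() and clip16() are hypothetical names; the sketch keeps the answer's 1600-sample delay and 0.5 decay and clips to the int16_t range as recommended above.

```c
#include <stdint.h>

#define REVERB_SAMPLES 1600  /* delay line length: 0.1 s at 16 kHz */

typedef struct {
    int16_t buf[REVERB_SAMPLES];  /* last REVERB_SAMPLES output samples */
    int pos;                      /* oldest slot: read it, then overwrite it */
    float decay;                  /* feedback gain */
} reverb_state;

void reverb_init(reverb_state *st, float decay) {
    for (int k = 0; k < REVERB_SAMPLES; k++) st->buf[k] = 0;
    st->pos = 0;
    st->decay = decay;
}

/* Saturate to the int16_t range instead of wrapping. */
static int16_t clip16(float v) {
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Process len samples in place; chunk size does not affect the result. */
void reverb_process(reverb_state *st, int16_t *samples, int len) {
    for (int i = 0; i < len; i++) {
        int16_t out = clip16(samples[i] + st->buf[st->pos] * st->decay);
        st->buf[st->pos] = out;
        samples[i] = out;
        st->pos = (st->pos + 1) % REVERB_SAMPLES;
    }
}
```

Processing 8000 samples in one call or in 150-sample chunks gives identical output, because each output sample depends only on the current input and the delay line, never on where a chunk boundary falls.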

C Memory Management Issue

I have traced an EXC_BAD_ACCESS to the following allocation and deallocation of memory. It involves the Accelerate framework in Xcode. The main issue is that this code is in a loop. If I force the loop to iterate only once, it works fine, but when it loops (7 times) it causes an error on the second iteration. Does any of this look incorrect?
EDIT: added the actual code. This segment runs if I remove certain parts, but it seems to have poor memory management, which results in issues.
#import <Foundation/Foundation.h>
#include <math.h>
#include <Accelerate/Accelerate.h>
for(int i = 0; i < 8; i++)
{
int XX[M][m]; //M and m are just 2 ints
for(int kk = 0; kk < M; kk++)
{
for (int kk1 = 0; kk1 < m; kk1++)
{
XX[kk][kk1] = [[x objectAtIndex: (kk + kk1 * J)] intValue]; //x is a NSMutableArray of NSNumber objects
}
}
double FreqRes = (double) freqSamp/n;
NSMutableArray *freqs = [[NSMutableArray alloc] initWithCapacity: round((freqSamp/2 - FreqRes) - 1)];
int freqSum = 0;
for(double i = -1 * freqSamp/2; i < (freqSamp/2 - FreqRes); i+= FreqRes)
{
[freqs addObject: [NSNumber numberWithInt: i]];
if(i == 0)
{
freqSum++;
}
}
int num = [x count];
int log2n = (int) log2f(num);
int nOver2 = n / 2;
FFTSetupD fftSetup = vDSP_create_fftsetupD (log2n, kFFTRadix2);
double ffx[num];
DSPDoubleSplitComplex fft_data;
fft_data.realp = malloc(nOver2 * sizeof(double)); //Error usually thrown on this line in the second iteration. Regardless of what I put there. If I add an NSLog here it throws the error on that NSLog
fft_data.imagp = malloc(nOver2 * sizeof(double));
for (int i = 0; i < n; ++i)
{
ffx[i] = [[x objectAtIndex:i] doubleValue];
}
vDSP_ctozD((DSPDoubleComplex *) ffx, 2, &fft_data, 1, nOver2);
vDSP_fft_zripD (fftSetup, &fft_data, 1, log2n, kFFTDirection_Forward);
for (int i = 0; i < nOver2; ++i)
{
fft_data.realp[i] *= 0.5;
fft_data.imagp[i] *= 0.5;
}
int temp = 1;
ffx[0] = fabs(fft_data.realp[0]); // fabs, not abs: abs() takes an int and would truncate the double
for(int i = 1; i < nOver2; i++)
ffx[i] = sqrt((fft_data.realp[i] * fft_data.realp[i]) + (fft_data.imagp[i] * fft_data.imagp[i]));
ffx[nOver2] = fabs(fft_data.imagp[0]);
for(int i = nOver2-1; i > 0; i--)
{
ffx[nOver2 + temp] = sqrt((fft_data.realp[i] * fft_data.realp[i]) + (fft_data.imagp[i] * fft_data.imagp[i]));
temp++;
}
//clear Fxx and freqs data
vDSP_destroy_fftsetupD(fftSetup);
free(fft_data.imagp);
free(fft_data.realp);
[freqs release];
}
Your problem could be that you are casting the return value of malloc. As you've tagged this C, I'm assuming that you are compiling it as C, in which case you should see this answer to a previous question on why casting the result of malloc is bad:
https://stackoverflow.com/a/1565552/1515720
You can get an unpredictable runtime error when using the cast without including stdlib.h.
So the error on your side is not the cast, but forgetting to include stdlib.h. Without a declaration, a C89 compiler assumes that malloc is a function returning int, so the void* pointer actually returned by malloc is converted to int and then to your pointer type because of the explicit cast. On some platforms, int and pointers take up different numbers of bytes, so these conversions may corrupt the pointer value.
Regardless, as that answer says, YOU SHOULD NOT BE CASTING MALLOC RETURNS, because a void* is safely and implicitly converted to whatever pointer type you assign it to.
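For reference, allocating the two halves of a split-complex buffer with stdlib.h included, no casts, and a NULL check might look like the sketch below. SplitComplex is a hypothetical stand-in with the same realp/imagp fields as Accelerate's DSPDoubleSplitComplex, so the example compiles without the Apple framework; split_complex_alloc() and split_complex_free() are likewise illustrative names.

```c
#include <stdlib.h>  /* declares malloc/free; without it a C89 compiler assumes int malloc() */

/* Hypothetical stand-in for DSPDoubleSplitComplex (same two fields). */
typedef struct {
    double *realp;
    double *imagp;
} SplitComplex;

/* Allocate both halves with n_over_2 elements each, without casting malloc.
 * Returns 0 on success, -1 on failure (nothing is left allocated). */
int split_complex_alloc(SplitComplex *z, size_t n_over_2) {
    z->realp = malloc(n_over_2 * sizeof *z->realp);
    z->imagp = malloc(n_over_2 * sizeof *z->imagp);
    if (!z->realp || !z->imagp) {
        free(z->realp);  /* free(NULL) is a no-op */
        free(z->imagp);
        z->realp = z->imagp = NULL;
        return -1;
    }
    return 0;
}

void split_complex_free(SplitComplex *z) {
    free(z->realp);
    free(z->imagp);
    z->realp = z->imagp = NULL;  /* makes an accidental second free harmless */
}
```

Running this alloc/free pair several times in a row, as the loop in the question does, is fine on its own, so if the crash persists it is worth checking the other writes in the loop body.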
As another answerer stated:
vDSP_destroy_fftsetupD(fftSetup);
could also be freeing the memory you allocated, by accident.
Any chance the destructor of DSPDoubleSplitComplex is freeing up those two allocated blocks?
It could also be that you are only allowed to call vDSP_create_fftsetupD and vDSP_destroy_fftsetupD once during your process's lifetime.

Resources