multithreading and parameters mixup - c

I have a function which behaves correctly when called by a single thread (either by calling it directly, or via CreateThread() / WaitForSingleObject() calls), but seems to go haywire when invoked by multiple CreateThread() calls followed by a WaitForMultipleObjects() call.
From the extensive debugging I have tried, it looks as if some of the variables passed as parameters to the main function being called are not kept isolated between different threads, and instead use the same memory address (example below). Here's a summary with some details of the problem:
First, I define a type to hold all the parameters for the function every thread needs to call:
typedef struct {
tDebugInfo DebugParms; int SampleCount; double** Sample; double** Target; double** a; double** F; double** dF; double** prevF; double** prevdF; double*** W; double*** prevW; double*** prevdW; double* e; double* dk; double* dj; double* dj2; double* sk; double* sk2; double* adzev21; double* prevadzev21; double** UW10; double* ro10e; double** dW10d; double** A; double** B; double** C; double** D; double** E; double** G; double** ET; double** AB; double** ABC; double** ABCD; double** ABCDE; double** ABCDH; double** ABCDHG; double** SABCDE; double** SABCDHG; double** I; double** J; double** M; double** x; double** xT; double* xU; double** dW10; int DataSetId; int TestId; int PredictionLen; double* Forecast; double ScaleM; double ScaleP; NN_Parms* ElmanParms; int DP[2][10];} tTrainParams;
I then allocate an array of structures to hold each thread's set of parameters:
HANDLE* HTrain = (HANDLE*)malloc(DatasetsCount*sizeof(HANDLE));
tTrainParams* tp = (tTrainParams*)malloc(DatasetsCount * sizeof(tTrainParams));
DWORD tid = 0; LPDWORD th_id = &tid;
Then, I set function parameters for each thread:
tp[d].ElmanParms = pElmanParams; tp[d].SampleCount = SampleCount; tp[d].Sample = SampleData_Scaled[d]; tp[d].Target = TargetData_Scaled[d]; tp[d].a = a; tp[d].F = F; tp[d].dF = dF; tp[d].prevF = prevF; tp[d].prevdF = prevdF; tp[d].W = W; tp[d].prevW = prevW; tp[d].prevdW = prevdW; tp[d].e = e; tp[d].dk = dk; tp[d].dj = dj; tp[d].dj2 = dj2; tp[d].sk = sk; tp[d].sk2 = sk2; tp[d].adzev21 = adzev21; tp[d].prevadzev21 = prevadzev21; tp[d].UW10 = UW10; tp[d].ro10e = ro10e; tp[d].dW10d = dW10d; tp[d].A = A; tp[d].B = B; tp[d].C = C; tp[d].D = D; tp[d].E = E; tp[d].G = G; tp[d].ET = ET; tp[d].AB = AB; tp[d].ABC = ABC; tp[d].ABCD = ABCD; tp[d].ABCDE = ABCDE; tp[d].ABCDH = ABCDH; tp[d].ABCDHG = ABCDHG; tp[d].SABCDE = SABCDE; tp[d].SABCDHG = SABCDHG; tp[d].I = I; tp[d].J = J; tp[d].M = M; tp[d].x = x; tp[d].xT = xT; tp[d].xU = xU; tp[d].dW10 = dW10; tp[d].DebugParms = pDebugParms; tp[d].ElmanParms = pElmanParams; tp[d].PredictionLen = pPredictionLen; tp[d].Forecast = ForecastData[d]; tp[d].ScaleM = ScaleM[d]; tp[d].ScaleP = ScaleP[d]; tp[d].TestId = pTestId; tp[d].DataSetId = d;
Then, I call a wrapper function GetForecastFromTraining(tTrainParams* parms) for each thread, having set in advance the relevant parameters in the "tp" structure array:
HTrain[d] = CreateThread(NULL, 0, (LPTHREAD_START_ROUTINE)GetForecastFromTraining, &tp[d], 0, th_id);
Finally, I call WaitForMultipleObjects():
WaitForMultipleObjects(DatasetsCount, HTrain, TRUE, INFINITE);
What happens inside GetForecastFromTraining() for most variables (apparently arrays only) is that whenever one thread changes the value of one array element (say, W[0][0][0]), the new value becomes current inside all the other threads, too. This, of course, screws up all the calculations that are being made across all threads, and looks to me to be contrary to the whole segregation story across threads.
One hint of what's going on is that, when I look at "Parallel Watch" debugging window inside VS2013, I see that W has the same address across all the threads (hence the same values); however, &W is different for each thread. Other non-array variables seem to behave fine. Finally, I double-checked the /MTd flag in the compiler option, and it is there.
I'm quite lost on this. Any suggestion?
P.S.: Here is a streamlined version of my program, which displays the same problematic behaviour. In this example, breaking the execution after the Sleep(1000) line shows that a1, a2 and G variables each correctly contains the thread id, while F is the same for all threads.
#include <Windows.h>
#include <stdio.h>
#define MAX_THREADS 5
HANDLE h[MAX_THREADS];
typedef struct{
int a1;
int a2;
double* F;
double G[5];
} tMySumParms;
void MySum(tMySumParms* p){
int tid = GetCurrentThreadId();
Sleep(200);
p->a1 = tid;
p->a2 = -tid;
p->F[0] = tid;
p->F[1] = -tid;
p->G[0] = tid;
p->G[1] = -tid;
Sleep(1000);
}
extern "C" __declspec(dllexport) int GetKaz(){
LPDWORD t = NULL;
tMySumParms* p = (tMySumParms*)malloc(MAX_THREADS*sizeof(tMySumParms));
HANDLE* h = (HANDLE*)malloc(MAX_THREADS*sizeof(HANDLE));
double G[5];
double* F = (double*)malloc(5 * sizeof(double));
for (int i = 0; i < MAX_THREADS; i++){
p[i].a1 = 1;
p[i].a2 = 2 ;
p[i].F = F;
memcpy(p[i].G, G, 5 * sizeof(double));
h[i] = CreateThread(NULL, 0, (LPTHREAD_START_ROUTINE)MySum, &p[i], 0, t);
}
WaitForMultipleObjects(MAX_THREADS, h, TRUE, INFINITE);
return 0;
}

W is declared as double*** in the parameter struct, and later in the question you say you use it as W[0][0][0]. So W points to an array of pointers, each of which points to an array of pointers to arrays of doubles.
My guess is that at least one of those layers is shared by all threads.
To confirm this theory, and to make sure it is not a concurrency problem but a data structure problem, I would create a simple single-threaded test function as follows:
Fill the array intended for thread 1 with 1.0
Then fill the array for thread 2 with 2.0
Check the values for thread 1.
The streamlined version shows the problem: The F array is allocated once and each thread gets a pointer to this single array. So if one thread updates the array, all the others see the changes.
double* F = (double*)malloc(5 * sizeof(double)); // one array!
for (int i = 0; i < MAX_THREADS; i++){
...
p[i].F = F; // all threads use the same array!
Change it to:
for (int i = 0; i < MAX_THREADS; i++){
...
p[i].F = (double*)malloc(5 * sizeof(double)); // each thread has its own array (free each one after the threads finish)

Related

Python C Extension

I am having issues returning a 2D array from a C extension back to Python. When I allocate memory using malloc the returned data is rubbish. When I just initialise an array like sol_matrix[nt][nvar] the returned data is as expected.
#include <Python.h>
#include <numpy/arrayobject.h>
#include <math.h>
#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION
// function to be solved by Euler solver
double func (double xt, double y){
double y_temp = pow(xt, 2);
y = y_temp;
return y;
}
static PyObject* C_Euler(double h, double xn)
{
double y_temp, dydx; //temps required for solver
double y_sav = 0; //temp required for solver
double xt = 0; //starting value for xt
int nvar = 2; //number of variables (including time)
int nt = xn/h; //timesteps
double y = 0; //y starting value
//double sol_matrix[nt][nvar]; //works fine
double **sol_matrix = malloc(nt * sizeof(double*)); //doesn't work
for (int i=0; i<nt; ++i){
sol_matrix[i] = malloc (nvar * sizeof(double));
}
int i=0;
//solution loop - Euler method.
while (i < nt){
sol_matrix[i][0]=xt;
sol_matrix[i][1]=y_sav;
dydx = func(xt, y);
y_temp = y_sav + h*dydx;
xt = xt+h;
y_sav=y_temp;
i=i+1;
}
npy_intp dims[2];
dims[0] = nt;
dims[1] = 2;
//Create Python object to copy solution array into, get pointer to
//beginning of array, memcpy the data from the C solution matrix
//to the Python object.
PyObject *newarray = PyArray_SimpleNew(2, dims, NPY_DOUBLE);
double *p = (double *) PyArray_DATA(newarray);
memcpy(p, sol_matrix, sizeof(double)*(nt*nvar));
// return array to Python
return newarray;
}
static PyObject* Euler(PyObject* self, PyObject* args)
{
double h, xn;
if (!PyArg_ParseTuple(args, "dd", &h, &xn)){
return NULL;
}
return Py_BuildValue("O", C_Euler(h,xn));
}
Could you provide any guidance on where I am going wrong?
Thank you.
The data in sol_matrix is not in contiguous memory; it is in nt separately allocated arrays, and sol_matrix itself only holds the nt row pointers. Therefore the line
memcpy(p, sol_matrix, sizeof(double)*(nt*nvar));
copies the row pointers (and whatever follows them in memory) instead of the doubles, which is why the returned data is rubbish.
I'm not a big fan of pointer-to-pointer arrays, so I believe your best option is to allocate sol_matrix as one big chunk:
double *sol_matrix = malloc(nt*nvar * sizeof(double));
This does mean you can't do 2D indexing, so you will need to compute the index yourself:
// OLD: sol_matrix[i][0]=xt;
sol_matrix[i*nvar + 0] = xt;
In contrast,
double sol_matrix[nt][nvar]; //works fine
is a single big chunk of memory, so the copy works fine.

Returning an array of structs from a function - C programming

So I'm trying to write a function that will return an array of several values. At the moment, it is running correctly but only outputting the final calculated value. How would I make it so the output includes all calculated values?
My code looks like this:
//Practice to output an array of structs
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
struct boat_params {
double V, Uc, Vc;
};
struct boat_params submerged_volume(double L1, double L2, double Lavg, double H) {
struct boat_params volume;
double V_sub, Uc_sub, Vc_sub;
V_sub = 0;
//Boat description
double C, delta;
double theta, theta_rad, theta_min, theta_min2, theta_lim, theta_lim2, theta_lim_deg;
double Ug1, Ug2, Vg1, Vg2, V1, V2;
double pi;
pi = 4*atan(1);
C = sqrt(L1*L1 + L2*L2);
delta = acos(L1/C);
theta_lim = asin(H/L1);
theta_lim_deg = (theta_lim/pi) * 180.0;
theta_min = asin(H/C) - delta;
theta_min2 = 0;
//Calculating the submerged volume and centre of gravity for each different angle
for (theta = 0; theta <= 10; theta ++) {
//**Note: I've taken out the actual calculations of V_sub, Uc_sub, and Vc_sub for brevity**
volume.V = V_sub;
volume.Uc = Uc_sub;
volume.Vc = Vc_sub;
}
return volume;
}
int main () {
double L1, L2, Lavg, H;
struct boat_params volume;
L1 = 17.6;
L2 = 3;
Lavg = 4;
H = 4.5;
volume = submerged_volume(L1, L2, Lavg, H);
printf("V = %lf\nUc = %lf\nVc = %lf\n", volume.V, volume.Uc, volume.Vc);
return 0;
}
I can get it to correctly output the last calculated value (for theta = 10) but that's the only value I'm getting. How would I calculate V_sub, Uc_sub, and Vc_sub for each theta value and output each of them? I'm assuming this means turning the struct into an array and filling each element of the array with the values of the struct for that theta, but I don't know how to do this!
I really appreciate any help and thank you in advance.
Also: If possible I'd like to avoid pointers but understand this may not be possible! I'm still very new and not good at using them!
You are quite right, you will need an array for that. If the number of elements in the array is constant, you could also create a struct that contains exactly that number of elements, but please don't do that.
To operate on arrays you will, unfortunately, need pointers. A very common way to do this in C is not to return a pointer but to pass a 'result' pointer in. This means it is up to the caller to allocate the space and free it, and the caller can still use array syntax. In your code the number of values seems to be constant, which makes this solution possible. Alternatively, you could allocate space on the heap (using malloc) and return a pointer, but then the caller has to free memory it never allocated, which is counterintuitive and results in a memory leak if forgotten. Consider the following solution:
void submerged_volume(double L1, double L2, double Lavg, double H, struct boat_params *result) {
int i; // integer index: a double such as theta cannot be used in pointer arithmetic
// your calculations here
for (i = 0; i <= 10; i++) { // i plays the role of theta
result[i].V = V_sub;
result[i].Uc = Uc_sub;
result[i].Vc = Vc_sub;
}
}
// somewhere in your code where you want to use your function
struct boat_params values[11];
int i;
submerged_volume(/* parameters */, values);
for (i = 0; i <= 10; ++i) {
printf("V = %lf\nUc = %lf\nVc = %lf\n", values[i].V, values[i].Uc, values[i].Vc);
}
Try this, just add your logic to the loop and maths:
#include <stdio.h>
#include <stdlib.h>
#define ARRSIZE 100
typedef struct boat_params {
double V, Uc, Vc;
} Volume;
void submerged_volume(double L1, double L2, double Lavg, double H, Volume *volumes[]) { /* returns nothing: results go into volumes[] */
double theta;
int i = 0; /* only example, change as needed */
Volume *p;
for (theta = 0; theta <= 10; theta ++) {
p = malloc(sizeof(* p));
if (p == NULL) {
fprintf(stderr, "malloc failed to allocate a new space\n");
exit(EXIT_FAILURE);
}
p->V = 1; //V_sub;
p->Uc = 2; //Uc_sub;
p->Vc = 3; //Vc_sub;
volumes[i] = p;
i++;
}
}
int main () {
double L1, L2, Lavg, H;
L1 = 17.6;
L2 = 3;
Lavg = 4;
H = 4.5;
Volume *volumes[ARRSIZE];
submerged_volume(L1, L2, Lavg, H, volumes);
printf("V = %lf\nUc = %lf\nVc = %lf\n", volumes[0]->V, volumes[0]->Uc, volumes[0]->Vc); /* first element for example */
return 0;
}
If you don't know the size of the volumes array in advance, you should consider using a linked list.

sending struct array to cuda kernel

I'm working on a project and I have to send a struct array to a CUDA kernel. The struct also contains an array. To test it I have written a simple program.
struct Point {
short x;
short *y;
};
my kernel code:
__global__ void addKernel(Point *a, Point *b, Point *c)
{
int i = threadIdx.x;
c[i].x = a[i].x + b[i].x;
for (int j = 0; j<4; j++){
c[i].y[j] = a[i].y[j] + a[i].y[j];
}
}
my main code:
int main()
{
const int arraySize = 4;
const int arraySize2 = 4;
short *ya, *yb, *yc;
short *dev_ya, *dev_yb, *dev_yc;
Point *a;
Point *b;
Point *c;
Point *dev_a;
Point *dev_b;
Point *dev_c;
size_t sizeInside = sizeof(short) * arraySize2;
ya = (short *)malloc(sizeof(short) * arraySize2);
yb = (short *)malloc(sizeof(short) * arraySize2);
yc = (short *)malloc(sizeof(short) * arraySize2);
ya[0] = 1; ya[1] =2; ya[2]=3; ya[3]=4;
yb[0] = 2; yb[1] =3; yb[2]=4; yb[3]=5;
size_t sizeGeneral = (sizeInside+sizeof(short)) * arraySize;
a = (Point *)malloc( sizeGeneral );
b = (Point *)malloc( sizeGeneral );
c = (Point *)malloc( sizeGeneral );
a[0].x = 2; a[0].y = ya;
a[1].x = 2; a[1].y = ya;
a[2].x = 2; a[2].y = ya;
a[3].x = 2; a[3].y = ya;
b[0].x = 4; b[0].y = yb;
b[1].x = 4; b[1].y = yb;
b[2].x = 4; b[2].y = yb;
b[3].x = 4; b[3].y = yb;
cudaMalloc((void**)&dev_a, sizeGeneral);
cudaMalloc((void**)&dev_b, sizeGeneral);
cudaMalloc((void**)&dev_c, sizeGeneral);
cudaMemcpy(dev_a, a, sizeGeneral, cudaMemcpyHostToDevice);
cudaMemcpy(dev_b, b, sizeGeneral, cudaMemcpyHostToDevice);
addKernel<<<1, 4>>>(dev_a, dev_b, dev_c);
cudaError_t err = cudaMemcpy(c, dev_c, sizeGeneral, cudaMemcpyDeviceToHost);
printf("{%d-->%d,%d,%d,%d} \n err= %d",c[0].x,c[0].y[0],c[1].y[1],c[1].y[2],c[2].y[3], err);
cudaFree(dev_a);
cudaFree(dev_b);
cudaFree(dev_c);
return 0;
}
It seems the CUDA kernel is not working. Actually I can access the struct's 'x' variable but I cannot access the 'y' array. What can I do to access the 'y' array? Thanks in advance.
When you send this struct to the device, you send a short and a pointer to short, and that pointer still points into host memory, not device memory. This is crucial. For a simple type such as short this works, because its value is copied along with the struct. So after the copy you have moved x and the pointer value y to the device, but not the memory that y points to. You have to do that manually: allocate device memory for the array, copy the data into it, and update the pointer y to point to that device memory.
You are not passing the array to the device. You can either make the array a part of the struct, by defining it like this:
struct Point {
short normalVal;
short inStructArr[4];
};
Or copy the array into device memory and update the pointer in the struct.

MPI_Reduce() with custom Datatype containing dynamically allocated arrays : segmentation fault

I don't understand why MPI_Reduce() causes a segmentation fault as soon as I use a custom MPI datatype which contains dynamically allocated arrays. Does anyone know? The following code crashes with 2 processes, inside MPI_Reduce().
However, if I remove the member double *d in MyType and change the operator and MPI type routines accordingly, the reduction is done without any problem.
Is there a problem with using dynamically allocated arrays, or is there something fundamentally wrong with what I do:
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
typedef struct mytype_s
{
int c[2];
double a;
double b;
double *d;
} MyType;
void CreateMyTypeMPI(MyType *mt, MPI_Datatype *MyTypeMPI)
{
int block_lengths[4]; // # of elt. in each block
MPI_Aint displacements[4]; // displac.
MPI_Datatype typelist[4]; // list of types
MPI_Aint start_address, address; // use for calculating displac.
MPI_Datatype myType;
block_lengths[0] = 2;
block_lengths[1] = 1;
block_lengths[2] = 1;
block_lengths[3] = 10;
typelist[0] = MPI_INT;
typelist[1] = MPI_DOUBLE;
typelist[2] = MPI_DOUBLE;
typelist[3] = MPI_DOUBLE;
displacements[0] = 0;
MPI_Address(&mt->c, &start_address);
MPI_Address(&mt->a, &address);
displacements[1] = address - start_address;
MPI_Address(&mt->b,&address);
displacements[2] = address-start_address;
MPI_Address(&mt->d, &address);
displacements[3] = address-start_address;
MPI_Type_struct(4,block_lengths, displacements,typelist,MyTypeMPI);
MPI_Type_commit(MyTypeMPI);
}
void MyTypeOp(MyType *in, MyType *out, int *len, MPI_Datatype *typeptr)
{
int i;
int j;
for (i=0; i < *len; i++)
{
out[i].a += in[i].a;
out[i].b += in[i].b;
out[i].c[0] += in[i].c[0];
out[i].c[1] += in[i].c[1];
for (j=0; j<10; j++)
{
out[i].d[j] += in[i].d[j];
}
}
}
int main(int argc, char **argv)
{
MyType mt;
MyType mt2;
MPI_Datatype MyTypeMPI;
MPI_Op MyOp;
int rank;
int i;
MPI_Init(&argc,&argv);
MPI_Comm_rank(MPI_COMM_WORLD,&rank);
mt.a = 2;
mt.b = 4;
mt.c[0] = 6;
mt.c[1] = 8;
mt.d = calloc(10,sizeof *mt.d);
for (i=0; i<10; i++) mt.d[i] = 2.1;
mt2.a = 0;
mt2.b = 0;
mt2.c[0] = mt2.c[1] = 0;
mt2.d = calloc(10,sizeof *mt2.d);
CreateMyTypeMPI(&mt, &MyTypeMPI);
MPI_Op_create((MPI_User_function *) MyTypeOp,1,&MyOp);
if(rank==0) printf("type and operator are created now\n");
MPI_Reduce(&mt,&mt2,1,MyTypeMPI,MyOp,0,MPI_COMM_WORLD);
if(rank==0)
{
for (i=0; i<10; i++) printf("%f ",mt2.d[i]);
printf("\n");
}
free(mt.d);
free(mt2.d);
MPI_Finalize();
return 0;
}
Let's look at your struct:
typedef struct mytype_s
{
int c[2];
double a;
double b;
double *d;
} MyType;
...
MyType mt;
mt.d = calloc(10,sizeof *mt.d);
And your description of this struct as an MPI type:
displacements[0] = 0;
MPI_Address(&mt->c, &start_address);
MPI_Address(&mt->a, &address);
displacements[1] = address - start_address;
MPI_Address(&mt->b,&address);
displacements[2] = address-start_address;
MPI_Address(&mt->d, &address);
displacements[3] = address-start_address;
MPI_Type_struct(4,block_lengths, displacements,typelist,MyTypeMPI);
The problem is that this MPI struct type is only ever going to apply to the one instance of the structure you used in the definition here. You have no control at all over where calloc() decides to grab memory from; it could be anywhere in virtual memory. For the next one of these types you create and instantiate, the displacement for your d array will be completely different; and even with the same struct, if you change the size of the array with realloc() on the current mt, it could end up having a different displacement. (As written, the code actually takes the address of the pointer member itself, &mt->d, rather than of the data it points to, so MPI treats the pointer and the bytes after it as ten doubles.)
So when you send, receive, reduce, or do anything else with one of these types, the MPI library will dutifully go to a probably meaningless displacement and try to read or write there, and that will likely cause a segfault.
Note that this isn't an MPI thing; with any low-level communications library, or for that matter when writing to or reading from disk, you would have the same problem.
Your options include manually "marshalling" the array into a message, either with the other fields or without; or adding some predictability to where d is located, such as by defining it to be an array of some defined maximum size.

Getting value from a dynamic allocated 2d array by pointers

I have filled a dynamically allocated 2D float array in a function.
A second function has to read the values of the array through a pointer to the first element of the array defined in the first function.
The second function does not access the correct memory location, so it doesn't work, but it does work if the array is defined statically.
Does somebody know why?
eval_cell should get the values defined in div_int.
float f_imp(float x, float y){
return pow(x,2)+pow(y,2)-1;
}
int eval_cell(float* p){
int s[4];
s[0] = f_imp(*p, *(p+1)) <= 0;
printf("%f %f\n",*p, *(p+1));
s[1] = f_imp(*(p+3), *(p+4)) <= 0;
printf("%f %f\n",*(p+3), *(p+4));
s[2] = f_imp(*(p+9), *(p+10)) <= 0;
printf("%f %f\n",*(p+9), *(p+10));
s[3] = f_imp(*(p+6), *(p+7)) <= 0;
printf("%f %f\n",*(p+6), *(p+7));
printf("%d%d%d%d\n",s[0],s[1],s[2],s[3]);
return s[0];
}
void div_int(float* x1, float* y1,float* x2,float* y2,
float* f0, float* f2,float* f6,float* f8){
int i,j,m;
float* p;
float** a_cell; // 9x3 array holding vertex coordinates and function value
*a_cell = (float**) malloc(9*sizeof(float*));
for (i=0;i<9;i++){
a_cell[i] = (float*) malloc(3*sizeof(float));
}
a_cell[0][0] = *x1;
a_cell[0][1] = *y1;
a_cell[0][2] = *f0;
a_cell[2][0] = *x2;
a_cell[2][1] = *y1;
a_cell[2][2] = *f2;
a_cell[6][0] = *x1;
a_cell[6][1] = *y2;
a_cell[6][2] = *f6;
a_cell[8][0] = *x2;
a_cell[8][1] = *y2;
a_cell[8][2] = *f8;
/*** compute the unknown values of a_cell ***/
a_cell[1][0] = (*x1+*x2)/2;
a_cell[1][1] = *y1;
a_cell[1][2] = f_imp(a_cell[1][0], a_cell[1][1]);
a_cell[3][0] = *x1;
a_cell[3][1] = (*y1+*y2)/2;
a_cell[3][2] = f_imp(a_cell[3][0], a_cell[3][1]);;
a_cell[4][0] = (*x2+*x1)/2;
a_cell[4][1] = (*y2+*y1)/2;
a_cell[4][2] = f_imp(a_cell[4][0], a_cell[4][1]);
a_cell[5][0] = *x2;
a_cell[5][1] = (*y2+*y1)/2;
a_cell[5][2] = f_imp(a_cell[5][0], a_cell[5][1]);
a_cell[7][0] = (*x1+*x2)/2;
a_cell[7][1] = *y2;
a_cell[7][2] = f_imp(a_cell[7][0], a_cell[7][1]);
for (j=0;j<2;j++){
m = j*3;
for(i=0;i<2;i++){
m += i;
eval_cell(&a_cell[m][0]);
}
}
p = *a_cell;
for (i=0;i<9;i++){
for (j=0;j<3;j++){
printf("%f \n",*(p+3*i+j));
printf("%f \n",a_cell[i][j]);
printf("\n");
}
}
free(a_cell);
return;
}
It's because you are using the pointer incorrectly.
a_cell is a pointer to a dynamic array of 9 pointers, each pointing to a separate dynamic array of 3 floats.
So when you do eval_cell(&a_cell[m][0]) (or just eval_cell(a_cell[m]), which is the same thing), you actually pass a pointer to an array of 3 floats. And after that you do:
int eval_cell(float* p){
...
s[2] = f_imp(*(p+9), *(p+10)) <= 0;
*(p+9) reads the element at index 9 of an array that holds only 3 floats, so this access is out of bounds.
It works with a static array because a static multidimensional array is laid out in memory as a single one-dimensional array, for which the compiler does the multi-index arithmetic. That's why in the static case you address valid memory.
If you want a completely dynamic matrix (2d array), you have to make your own element access function:
double *
make_array (unsigned int rows, unsigned int cols)
{
return malloc (rows * cols * sizeof (double));
}
double *
array_element (double *a, unsigned int cols, unsigned int i, unsigned int j)
{
return a + i * cols + j;
}
#define A(i,j) (*array_element ((a), (cols), (i), (j)))
double *a;
unsigned int rows = 5, cols = 6; /* example dimensions */
a = make_array (rows, cols);
A(3,4) = 3.14;
printf ("%f\n", A(3,4));
EDIT:
In your program
*a_cell = (float**) malloc(9*sizeof(float*));
should be
a_cell = (float**) malloc(9*sizeof(float*));
And likewise for
p = *a_cell;
