Problems with malloc on OS X - c

I have written some C code running on OS X 10.6. It happens to be slow, so I am using valgrind to check for memory leaks etc. One of the things I have noticed whilst doing this:
If I allocate memory to a 2D array like this:
double** matrix = NULL;
allocate2D(matrix, 2, 2);
void allocate2D(double** matrix, int nrows, int ncols) {
matrix = (double**)malloc(nrows*sizeof(double*));
int i;
for(i=0;i<nrows;i++) {
matrix[i] = (double*)malloc(ncols*sizeof(double));
}
}
and then check the memory address of matrix, it is 0x0.
However if I do
double** matrix = allocate2D(2,2);
double** allocate2D(int nrows, int ncols) {
double** matrix = (double**)malloc(nrows*sizeof(double*));
int i;
for(i=0;i<nrows;i++) {
matrix[i] = (double*)malloc(ncols*sizeof(double));
}
return matrix;
}
This works fine, i.e. the pointer to the newly created memory is returned.
I also have a free2D function to free up the memory, but it doesn't seem to free properly: the pointer still points to the same address as before the call to free, not 0x0 (which I thought might be the default).
void free2D(double** matrix, int nrows) {
int i;
for(i=0;i<nrows;i++) {
free(matrix[i]);
}
free(matrix);
}
My question is: am I misunderstanding how malloc/free work? Otherwise, can someone suggest what's going on?
Alex

When you free a pointer, the value of the pointer does not change. You have to set it to 0 explicitly if you want it to be null.
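A minimal sketch of that idiom, assuming a hypothetical helper named free_and_null (it is not a standard function). It takes the address of the caller's pointer so that it can overwrite the stale value after freeing:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: frees the block *pp points to, then nulls the
   caller's pointer. free() alone never modifies the pointer variable. */
void free_and_null(double **pp)
{
    assert(pp != NULL);  /* caller must pass the address of a pointer */
    free(*pp);           /* free(NULL) is a harmless no-op */
    *pp = NULL;          /* the old address can no longer be used by accident */
}
```

Called as free_and_null(&p), it leaves p equal to NULL, and calling it a second time is safe.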

In the first example, you've only stored the pointer returned by malloc in a local variable. It's lost when the function returns.
Usual practice in the C language is to use the function's return value to pass the pointer to an allocated object back to the caller. As Armen pointed out, you can also pass a pointer to where the function should store its output:
void Allocate2D(double*** pMatrix...)
{
*pMatrix = malloc(...)
}
but I think most people would scream as soon as they see ***.
You might also consider that arrays of pointers are not an efficient implementation of matrices. Allocating each row separately contributes to memory fragmentation, malloc overhead (because each allocation involves some bookkeeping, not to mention the extra pointers you have to store), and cache misses. And each access to an element of the matrix involves 2 pointer dereferences rather than just one, which can introduce stalls. Finally, you have a lot more work to do allocating the matrix, since you have to check for failure of each malloc and clean up everything you've already done if any of them fail.
A better approach is to use a one-dimensional array:
double *matrix;
matrix = malloc(nrows*ncols*sizeof *matrix);
then access element (i,j) as matrix[i*ncols+j]. The potential disadvantages are the multiplication (which is slow on ancient cpus but fast on modern ones) and the syntax.
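As a rough sketch of the one-dimensional layout, the following uses two hypothetical helpers, matrix_alloc and matrix_at (names chosen here for illustration). The whole matrix is one block, so there is one malloc to check and one free to call:

```c
#include <assert.h>
#include <stdlib.h>

/* Allocates an nrows x ncols matrix as a single block, filled with 0.0.
   Returns NULL on failure; release it with one call to free(). */
double *matrix_alloc(size_t nrows, size_t ncols)
{
    double *m = malloc(nrows * ncols * sizeof *m);
    if (m != NULL) {
        for (size_t k = 0; k < nrows * ncols; k++)
            m[k] = 0.0;
    }
    return m;
}

/* Returns the address of element (i,j) in row-major order. */
double *matrix_at(double *m, size_t ncols, size_t i, size_t j)
{
    assert(m != NULL);
    return &m[i * ncols + j];
}
```

For a 2x3 matrix, *matrix_at(m, 3, 1, 2) refers to the same element as m[5].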
A still-better approach is not to seek excess generality. Most matrix code on SO is not for advanced numerical mathematics where arbitrary matrix sizes might be needed, but for 3d gaming where 2x2, 3x3, and 4x4 are the only matrix sizes of any practical use. If that's the case, try something like
double (*matrix)[4] = malloc(4*sizeof *matrix);
and then you can access element (i,j) as matrix[i][j] with a single dereference and an extremely fast multiply-by-constant. And if your matrix is only needed at local scope or inside a structure, just declare it as:
double matrix[4][4];
If you're not extremely adept with the C type system and the declarations above, it might be best to just wrap all your matrices in structs anyway:
struct matrix4x4 {
double x[4][4];
};
Then declarations, pointer casts, allocations, etc. become a lot more familiar. The only disadvantage is that you need to do something like matrix.x[i][j] or matrix->x[i][j] (depending on whether matrix is a struct or a pointer to struct) instead of matrix[i][j].
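One thing the struct wrapper buys you, sketched here with a hypothetical matrix4x4_identity function: a struct can be returned and assigned by value, which a bare array cannot.

```c
struct matrix4x4 {
    double x[4][4];
};

/* Returns a 4x4 identity matrix by value. */
struct matrix4x4 matrix4x4_identity(void)
{
    struct matrix4x4 m = {{{0}}};   /* all 16 elements start at 0.0 */
    for (int i = 0; i < 4; i++)
        m.x[i][i] = 1.0;
    return m;
}
```

A caller can then write struct matrix4x4 a = matrix4x4_identity(); and copy it with plain assignment, with no malloc or free involved.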
Edit: I did think of one useful property of implementing your matrices as arrays of row pointers: it makes permutation of rows a trivial operation. If your algorithms need to perform a lot of row permutation, this may be beneficial. Note that the benefit will not be much for small matrices, though, and that column permutation cannot be optimized this way.

In C++ you should pass the pointer by reference :)
Allocate2D(double**& matrix...)
As to what's going on: you have a pointer that is NULL, and you pass a copy of that pointer to the function, which allocates memory and initializes the copy with the address of the newly allocated memory, but your original pointer remains NULL. As for free, you don't need to pass by reference, since only the value of the pointer is relevant. HTH
Since there are no references in C, you can pass by pointer, that is
Allocate2D(double*** pMatrix...)
{
*pMatrix = malloc(...)
}
and later call like
Allocate2D(&matrix ...)
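Filling in the elided parts, a complete version might look like the sketch below. The int return code and the cleanup-on-failure behaviour are assumptions added here; the fragment above only shows the out-parameter idea.

```c
#include <assert.h>
#include <stdlib.h>

/* Stores a newly allocated nrows x ncols array of row pointers in *pMatrix.
   Returns 0 on success. On failure returns -1, frees anything already
   allocated, and sets *pMatrix to NULL. */
int Allocate2D(double ***pMatrix, int nrows, int ncols)
{
    assert(pMatrix != NULL && nrows > 0 && ncols > 0);
    double **rows = malloc(nrows * sizeof *rows);
    if (rows == NULL) {
        *pMatrix = NULL;
        return -1;
    }
    for (int i = 0; i < nrows; i++) {
        rows[i] = malloc(ncols * sizeof *rows[i]);
        if (rows[i] == NULL) {
            while (i-- > 0)          /* undo the rows allocated so far */
                free(rows[i]);
            free(rows);
            *pMatrix = NULL;
            return -1;
        }
    }
    *pMatrix = rows;                 /* write through to the caller's pointer */
    return 0;
}
```

The caller declares double **matrix = NULL; and passes &matrix, after which matrix holds the new address.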


Pointer Copying for Two Dynamically Growing Arrays in C

UPDATE: I think I've answered my own question, except for some possible issues with memory leaks.
ORIGINAL QUESTION HERE, ANSWER BELOW.
Background: I'm doing some numerical computing, but I almost never use languages that require me to manage memory on my own. I'm piping something out to C now and am having trouble with (I think) pointer reference issues.
I have two arrays of doubles that are growing in a while loop, and at each iteration, I want to free the memory for the smaller, older array, set 'old' to point to the newer array, and then set 'new' to point to a larger block of memory.
After looking around a bit, it seemed as though I should be using pointers to pointers, so I've tried this, but am running into "lvalue required as unary ‘&’ operand" errors.
I start with:
double ** oldarray;
oldarray = &malloc(1*sizeof(double));
double ** newarray;
newarray = &malloc(2*sizeof(double));
These initializations give me an "lvalue required as unary ‘&’ operand" error, and I'm not sure whether I should replace it with
*oldarray = (double *) malloc(1*sizeof(double));
When I do that, I can compile a simple program (It just has the lines I have above and returns 0) but I get a seg fault.
The rest of the program is as follows:
while ( <some condition> ) {
// Do a lot of processing, most of which is updating
// the values in newarray using old array and other newarray values.
// Now I'm exiting the loop, and growing and resetting arrays.
free(*oldarray) // I want to free the memory used by the smaller, older array.
*oldarray = *newarray // Then I want oldarray to point to the larger, newer array.
newarray = &malloc( <previous size + 1>*sizeof(double))
}
So I'd like to be, at each iteration, updating an array of size (n) using itself and an older array of size (n-1). Then I want to free up the memory of the array of size (n-1), set 'oldarray' to point to the array I just created, and then set 'newarray' to point to a new block of size (n+1) doubles.
Do I actually need to be using pointers to pointers? I think my main issue is that, when I set old to new, they share a pointee, and I then don't know how to set new to a new array. I think that using pointers to pointers gets me out of this, but, I'm not sure, and I still have the lvalue errors with pointers to pointers.
I've checked out C dynamically growing array and a few other stack questions, and have been googling pointers, malloc, and copying for about half a day.
Thanks!
HERE IS MY OWN ANSWER
I've now got a working solution. My only worry is that it might contain some memory leaks.
Using realloc() works, and I also need to be careful to make sure I'm only free()ing pointers that I initialized using malloc or realloc, and not pointers initialized with double * oldarray;.
The working version goes like this:
double * olddiagonal = (double *) malloc(sizeof(double));
olddiagonal[0] = otherfunction(otherstuff);
int iter = 1;
// A bunch of other stuff
while (<error tolerance condition>) {
double * newdiagonal = (double *) malloc((iter+1)*sizeof(double));
newdiagonal[0] = otherfunction(moreotherstuff);
// Then I do a bunch of things and fill in the values of new diagonal using
// its other values and the values in olddiagonal.
// To finish, I free the old stuff, and use realloc to point old to new.
free(olddiagonal);
olddiagonal = (double *) realloc(newdiagonal, sizeof(double) * (iter+1));
iter++;
}
This seems to work for my purposes. My only concern is possible memory leaks, but for now, it's behaving well and getting the correct values.
Here are some explanations:
double ** oldarray;
oldarray = &malloc(1*sizeof(double));
is wrong, because you don't store the result of malloc() anywhere, and since it is not stored anywhere, you can't take its address. You can get the effect that you seem to have had in mind by adding an intermediate variable:
double* intermediatePointer;
double** oldarray = &intermediatePointer;
intermediatePointer = malloc(1*sizeof(*intermediatePointer));
oldarray is now a pointer to the memory location of intermediatePointer, which in turn points to the allocated memory block.
*oldarray = (double *) malloc(1*sizeof(double));
is wrong, because you are dereferencing an uninitialized pointer. When you declare oldarray with double** oldarray;, you are only reserving memory for one pointer, not for anything the pointer is supposed to point to (the memory reservation is independent of what the pointer points to!). The value that you find in that pointer variable is undefined, so you have absolutely no control over what memory address you are writing to when you assign something to *oldarray.
Whenever you declare a pointer, you must initialize the pointer before you dereference it:
int* foo;
*foo = 7; //This is always a bug.
int bar;
int* baz = &bar; //Make sure the pointer points to something sensible.
*baz = 7; //OK.
Your answer code is indeed correct. However, it can be improved concerning style:
The combination of
int iter = 1;
while (<error tolerance condition>) {
...
iter++;
}
calls for the use of the for() loop, which encapsulates the definition and incrementation of the loop variable into the loop control statement:
for(int iter = 1; <error tolerance condition>; iter++) {
...
}
In C, the cast of the return value of malloc() is entirely superfluous, it only clutters your code. (Note however that C++ does not allow the implicit conversion of void*s as C does, so int *foo = malloc(sizeof(*foo)) is perfectly valid C, but not legal in C++. However, in C++ you wouldn't be using malloc() in the first place.)
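For the grow-and-swap loop itself, neither pointers to pointers nor realloc() is strictly needed: after freeing the old block, a plain pointer assignment hands the newer block over. The sketch below shows that pattern. The function name grow_rows and the Pascal's-triangle update are stand-ins chosen for illustration, not the asker's actual computation.

```c
#include <assert.h>
#include <stdlib.h>

/* Builds row n of Pascal's triangle, growing the array by one element per
   iteration. Returns a malloc'd array of n+1 doubles that the caller must
   free, or NULL if an allocation fails. */
double *grow_rows(int n)
{
    assert(n >= 0);
    double *old = malloc(sizeof *old);
    if (old == NULL)
        return NULL;
    old[0] = 1.0;
    for (int iter = 1; iter <= n; iter++) {
        double *new_row = malloc((iter + 1) * sizeof *new_row);
        if (new_row == NULL) {
            free(old);
            return NULL;
        }
        new_row[0] = 1.0;
        new_row[iter] = 1.0;
        for (int k = 1; k < iter; k++)
            new_row[k] = old[k - 1] + old[k];   /* uses the smaller array */
        free(old);       /* release the smaller, older array */
        old = new_row;   /* old now owns the larger block */
    }
    return old;
}
```

Every block is freed exactly once (inside the loop or by the caller), so this shape does not leak.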

Triple pointers in C: is it a matter of style?

I feel like triple pointers in C are looked at as "bad". For me, it makes sense to use them at times.
Starting from the basics, the single pointer has two purposes: to create an array, and to allow a function to change its contents (pass by reference):
char *a;
a = malloc...
or
void foo (char *c) //means I'm going to modify the parameter in foo.
{ *c = 'f'; }
char a;
foo(&a);
The double pointer can be a 2D array (or array of arrays, since each "column" or "row" need not be the same length). I personally like to use it when I need to pass a 1D array:
void foo (char **c) //means I'm going to modify the elements of an array in foo.
{ (*c)[0] = 'f'; }
char *a;
a = malloc...
foo(&a);
To me, that helps describe what foo is doing. However, it is not necessary:
void foo (char *c) //am I modifying a char or just passing a char array?
{ c[0] = 'f'; }
char *a;
a = malloc...
foo(a);
will also work.
According to the first answer to this question, if foo were to modify the size of the array, a double pointer would be required.
One can clearly see how a triple pointer (and beyond, really) would be required. In my case if I were passing an array of pointers (or array of arrays), I would use it. Evidently it would be required if you are passing into a function that is changing the size of the multi-dimensional array. Certainly an array of arrays of arrays is not too common, but the other cases are.
So what are some of the conventions out there? Is this really just a question of style/readability combined with the fact that many people have a hard time wrapping their heads around pointers?
Using triple+ pointers harms both readability and maintainability.
Let's suppose you have a little function declaration here:
void fun(int***);
Hmmm. Is the argument a three-dimensional jagged array, or a pointer to a two-dimensional jagged array, or a pointer to pointer to array (as in, the function allocates an array and assigns a pointer to int within the function)?
Let's compare this to:
void fun(IntMatrix*);
Surely you can use triple pointers to int to operate on matrices. But that's not what they are. The fact that they're implemented here as triple pointers is irrelevant to the user.
Complicated data structures should be encapsulated. This is one of the central ideas of Object Oriented Programming. Even in C, you can apply this principle to some extent. Wrap the data structure in a struct (or, very common in C, use "handles", that is, pointers to an incomplete type; this idiom is explained later in the answer).
Let's suppose that you implemented the matrices as jagged arrays of double. Compared to contiguous 2D arrays, they are worse when iterating over them (as they don't belong to a single block of contiguous memory) but allow for accessing with array notation and each row can have different size.
The problem is that you can't change the representation now: the use of pointers is hard-wired into user code, and you're stuck with an inferior implementation.
This wouldn't even be a problem if you had encapsulated it in a struct.
typedef struct Matrix_
{
double** data;
} Matrix;
double get_element(Matrix* m, int i, int j)
{
return m->data[i][j];
}
simply gets changed to
typedef struct Matrix_
{
int width;
double data[]; //C99 flexible array member
} Matrix;
double get_element(Matrix* m, int i, int j)
{
return m->data[i*m->width+j];
}
The handle technique works like this: in the header file, you declare an incomplete struct and all the functions that work on the pointer to the struct:
// struct declaration with no body.
struct Matrix_;
// optional: allow people to declare the matrix with Matrix* instead of struct Matrix*
typedef struct Matrix_ Matrix;
Matrix* create_matrix(int w, int h);
void destroy_matrix(Matrix* m);
double get_element(Matrix* m, int i, int j);
double set_element(Matrix* m, double value, int i, int j);
in the source file you declare the actual struct and define all the functions:
typedef struct Matrix_
{
int width;
double data[]; //C99 flexible array member
} Matrix;
double get_element(Matrix* m, int i, int j)
{
return m->data[i*m->width+j];
}
/* definition of the rest of the functions */
The rest of the world doesn't know what struct Matrix_ contains, and it doesn't know its size. This means users can't declare values directly, but only through a pointer to Matrix and the create_matrix function. However, the fact that the user doesn't know the size means the user doesn't depend on it, which means we can remove or add members to struct Matrix_ at will.
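The "rest of the functions" could be defined along these lines. This is only a sketch: the height member and the zero-filling in create_matrix are assumptions added here, and the header shown earlier does not promise either.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Matrix_
{
    int width;
    int height;      /* assumption: kept so accessors could bounds-check */
    double data[];   /* C99 flexible array member */
} Matrix;

Matrix* create_matrix(int w, int h)
{
    if (w <= 0 || h <= 0)
        return NULL;
    size_t count = (size_t)w * (size_t)h;
    /* one allocation covers the struct header plus w*h elements */
    Matrix* m = malloc(sizeof *m + count * sizeof m->data[0]);
    if (m == NULL)
        return NULL;
    m->width = w;
    m->height = h;
    for (size_t k = 0; k < count; k++)
        m->data[k] = 0.0;
    return m;
}

void destroy_matrix(Matrix* m)
{
    free(m);   /* header and elements live in the same block */
}

double get_element(Matrix* m, int i, int j)
{
    assert(m != NULL);
    return m->data[i*m->width+j];
}
```

Because callers only ever hold a Matrix*, members such as height can be added later without touching user code.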
Most of the time, the use of 3 levels of indirection is a symptom of bad design decisions made elsewhere in the program. Therefore it is regarded as bad practice and there are jokes about "three star programmers" where, unlike the rating for restaurants, more stars means worse quality.
The need for 3 levels of indirection often originates from the confusion about how to properly allocate multi-dimensional arrays dynamically. This is often taught incorrectly even in programming books, partially because doing it correctly was burdensome before the C99 standard. My Q&A post Correctly allocating multi-dimensional arrays addresses that very issue and also illustrates how multiple levels of indirection will make the code increasingly hard to read and maintain.
Though as that post explains, there are some situations where a type** might make sense. A variable table of strings with variable length is such an example. And when that need for type** arises, you might soon be tempted to use type***, because you need to return your type** through a function parameter.
Most often this need arises in a situation where you are designing some manner of complex ADT. For example, let's say that we are coding a hash table, where each index is a 'chained' linked list, and each node in the linked list is an array. The proper solution then is to re-design the program to use structs instead of multiple levels of indirection. The hash table, linked list and array should be distinct, autonomous types without any awareness of each other.
So by using proper design, we will avoid the multiple stars automatically.
But as with every rule of good programming practice, there are always exceptions. It is perfectly possible to have a situation like:
Must implement an array of strings.
The number of strings is variable and may change in run-time.
The length of the strings is variable.
You can implement the above as an ADT, but there may also be valid reasons to keep things simple and just use a char* [n]. You then have two options to allocate this dynamically:
char* (*arr_ptr)[n] = malloc( sizeof(char*[n]) );
or
char** ptr_ptr = malloc( sizeof(char*[n]) );
The former is more formally correct, but also cumbersome, because it has to be used as (*arr_ptr)[i] = "string";, while the alternative can be used as ptr_ptr[i] = "string";.
Now suppose we have to place the malloc call inside a function and the return type is reserved for an error code, as is custom with C APIs. The two alternatives will then look like this:
err_t alloc_arr_ptr (size_t n, char* (**arr)[n])
{
*arr = malloc( sizeof(char*[n]) );
return *arr == NULL ? ERR_ALLOC : OK;
}
or
err_t alloc_ptr_ptr (size_t n, char*** arr)
{
*arr = malloc( sizeof(char*[n]) );
return *arr == NULL ? ERR_ALLOC : OK;
}
It is quite hard to argue and say that the former is more readable, and it also comes with the cumbersome access needed by the caller. The three star alternative is actually more elegant, in this very specific case.
So it does us no good to dismiss 3 levels of indirection dogmatically. But the choice to use them must be well-informed, with an awareness that they may create ugly code and that there are other alternatives.
So what are some of the conventions out there? Is this really just a question of style/readability combined with the fact that many people have a hard time wrapping their heads around pointers?
Multiple indirection is not bad style, nor black magic, and if you're dealing with high-dimension data then you're going to be dealing with high levels of indirection; if you're really dealing with a pointer to a pointer to a pointer to T, then don't be afraid to write T ***p;. Don't hide pointers behind typedefs unless whoever is using the type doesn't have to worry about its "pointer-ness". For example, if you're providing the type as a "handle" that gets passed around in an API, such as:
typedef ... *Handle;
Handle h = NewHandle();
DoSomethingWith( h, some_data );
DoSomethingElseWith( h, more_data );
ReleaseHandle( h );
then sure, typedef away. But if h is ever meant to be dereferenced, such as
printf( "Handle value is %d\n", *h );
then don't typedef it. If your user has to know that h is a pointer to int (see note 1) in order to use it properly, then that information should not be hidden behind a typedef.
I will say that in my experience I haven't had to deal with higher levels of indirection; triple indirection has been the highest, and I haven't had to use it more than a couple of times. If you regularly find yourself dealing with >3-dimensional data, then you'll see high levels of indirection, but if you understand how pointer expressions and indirection work it shouldn't be an issue.
1. Or a pointer to pointer to int, or pointer to pointer to pointer to pointer to struct grdlphmp, or whatever.
After two levels of indirection, comprehension becomes difficult. Moreover if the reason you're passing these triple (or more) pointers into your methods is so that they can re-allocate and re-set some pointed-to memory, that gets away from the concept of methods as "functions" that just return values and don't affect state. This also negatively affects comprehension and maintainability beyond some point.
But more fundamentally, you've hit upon one of the main stylistic objections to the triple pointer right here:
One can clearly see how a triple pointer (and beyond, really) would be required.
It's the "and beyond" that is the issue here: once you get to three levels, where do you stop? Surely it's possible to have an arbitrary number of levels of indirection. But it's better to just have a customary limit someplace where comprehensibility is still good but flexibility is adequate. Two's a good number. "Three star programming", as it's sometimes called, is controversial at best; it's either brilliant, or a headache for those who need to maintain the code later.
Unfortunately you have misunderstood the concept of pointers and arrays in C. Remember that arrays are not pointers.
Starting from the basics, the single pointer has two purposes: to create an array, and to allow a function to change its contents (pass by reference):
When you declare a pointer, you need to initialize it before using it in the program. This can be done either by assigning it the address of a variable or by dynamic memory allocation.
In the latter case, the pointer can be indexed like an array (but it is not an array).
The double pointer can be a 2D array (or array of arrays, since each "column" or "row" need not be the same length). I personally like to use it when I need to pass a 1D array:
Again wrong. Arrays are not pointers and vice versa. A pointer to pointer is not a 2D array.
I suggest you read section 6 of the c-faq, Arrays and Pointers.

Malloc or normal array definition?

When should I use malloc instead of a normal array definition in C?
I can't understand the difference between:
int a[3]={1,2,3};
int array[sizeof(a)/sizeof(int)];
and:
array=(int *)malloc(sizeof(int)*sizeof(a));
In general, use malloc() when:
the array is too large to be placed on the stack
the lifetime of the array must outlive the scope where it is created
Otherwise, use a stack allocated array.
int a[3]={1,2,3};
int array[sizeof(a)/sizeof(int)];
If used as local variables, both a and array would be allocated on the stack. Stack allocation has its pros and cons:
pro: it is very fast - it only takes one register subtraction operation to create stack space and one register addition operation to reclaim it back
con: stack size is usually limited (and also fixed at link time on Windows)
In both cases the number of elements in each array is a compile-time constant: 3 is obviously a constant, while sizeof(a)/sizeof(int) can be computed at compile time since both the size of a and the size of int are known at the point where array is declared.
When the number of elements is known only at run-time or when the size of the array is too large to safely fit into the stack space, then heap allocation is used:
array=(int *)malloc(sizeof(int)*sizeof(a));
As already pointed out, this should be malloc(sizeof(a)) since the size of a is already the number of bytes it takes and not the number of elements and thus additional multiplication by sizeof(int) is not necessary.
Heap allocation and deallocation are relatively expensive operations (compared to stack allocation), and this should be carefully weighed against the benefits they provide, e.g. in code that gets called many times in tight loops.
Modern C compilers support the C99 version of the C standard, which introduces so-called variable-length arrays (or VLAs) that resemble similar features available in other languages. A VLA's size is specified at run-time, like in this case:
void func(int n)
{
int array[n];
...
}
array is still allocated on the stack as if memory for the array has been allocated by a call to alloca(3).
You definitely have to use malloc() if you don't want your array to have a fixed size. Depending on what you are trying to do, you might not know in advance how much memory you are going to need for a given task, or you might need to dynamically resize your array at runtime; for example, you might enlarge it if there is more data coming in. The latter can be done using realloc() without data loss.
Instead of initializing an array as in your original post, you should just declare a pointer to int, like this:
int* array; // this variable will just contain the address of an integer-sized block in memory
int length = 5; // how long do you want your array to be;
array = malloc(sizeof(int) * length); // this allocates the memory needed for your array and sets the pointer created above to first block of that region;
int newLength = 10;
array = realloc(array, sizeof(int) * newLength); // increase the size of the array while leaving its contents intact;
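One caveat about the last line: if realloc() fails it returns NULL and leaves the old block allocated, so assigning the result straight back to array would lose the only pointer to that block. A common way around this, sketched here with a hypothetical resize_ints helper, is to go through a temporary:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: resizes *parray to hold newLength ints.
   Returns 0 on success. On failure returns -1 and leaves *parray
   pointing at the original, still-valid block. */
int resize_ints(int **parray, size_t newLength)
{
    assert(parray != NULL && newLength > 0);
    int *tmp = realloc(*parray, newLength * sizeof *tmp);
    if (tmp == NULL)
        return -1;       /* old block untouched; caller still owns it */
    *parray = tmp;       /* existing elements were preserved by realloc */
    return 0;
}
```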
Your code is very strange.
The answer to the question in the title is probably something like "use automatically allocated arrays when you need quite small amounts of data that is short-lived, heap allocations using malloc() for anything else". But it's hard to pin down an exact answer, it depends a lot on the situation.
Not sure why you are showing first an array, then another array that tries to compute its length from the first one, and finally a malloc() call which tries to do the same.
Normally you have an idea of the number of desired elements, rather than an existing array whose size you want to mimic.
The second line is better as:
int array[sizeof a / sizeof *a];
No need to repeat a dependency on the type of a, the above will define array as an array of int with the same number of elements as the array a. Note that this only works if a is indeed an array.
Also, the third line should probably be:
array = malloc(sizeof a);
No need to get too clever (especially since you got it wrong) about the sizeof argument, and no need to cast malloc()'s return value.
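Putting those two corrections together, a minimal sketch might look like this. The function name copy_to_heap is made up for illustration; it also shows the lifetime point, since the heap copy outlives the local array a.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copies a local array into heap memory that outlives this function.
   Returns NULL on allocation failure; the caller must free the result. */
int *copy_to_heap(void)
{
    int a[3] = {1, 2, 3};
    size_t count = sizeof a / sizeof *a;   /* element count, no type repeated */
    assert(count == 3);
    int *array = malloc(sizeof a);         /* sizeof a is already in bytes */
    if (array != NULL)
        memcpy(array, a, sizeof a);
    return array;
}
```

Note that sizeof a only gives the array size inside the scope where a is declared as an array; once it has decayed to a pointer, the element count has to be passed separately.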

Why is a pointer used to access shared memory?

I have seen a lot of parallel programming code, like finding the maximum of an array, matrix multiplication, etc., use pointers. I don't understand why they are used. Example: *(shseg+(offset*sizeof(float))) = sum;
The code for matrix multiplication:
shseg = shmat(handle,NULL,0);
for(row=SIZE/2;row<SIZE;row++){
for(column=0;column<SIZE;column++){
sum = 0;
for(tindex=0;tindex<SIZE;tindex++){
sum+=a[row][tindex]*b[tindex][column];
}
*(shseg+(offset*sizeof(float))) = sum;
offset++;
}
}
Can anyone explain why a pointer is used?
This is because the example you show uses the shared memory API, which provides you with a flat chunk of memory, not an array of, say, floats. Therefore, you need to do all your pointer manipulations manually.
You could also cast your shared pointer to float* and use an index, like this:
shseg = shmat(handle,NULL,0);
float *fshseg = (float*)shseg;
...
fshseg[index++] = sum;
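The reason the cast is convenient: arithmetic on a float* already advances in units of sizeof(float), so no manual multiplication is needed (and if shseg were itself declared as a float*, multiplying the offset by sizeof(float) would step too far). Below is a small sketch with a hypothetical store_results function. It accepts any flat block, so the void* that shmat() returns would work as the first argument.

```c
#include <assert.h>
#include <stddef.h>

/* Writes count floats into a flat block of memory, e.g. a shared
   memory segment. Indexing through a float* scales by sizeof(float)
   automatically. */
void store_results(void *segment, const float *values, size_t count)
{
    assert(segment != NULL && values != NULL);
    float *out = (float *)segment;
    for (size_t i = 0; i < count; i++)
        out[i] = values[i];    /* equivalent to *(out + i) = values[i] */
}
```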
Well, you have an allocated region of memory that is shared with your program, and you will be walking all the way through it. If you didn't use a pointer, you would not be able to get at the memory addresses, and that's why you need one.

How to locally allocate array-pointer in C?

Think of a pointer datatype, for instance a pointer to a floating-point number.
typedef float* flPtrt;
How would I allocate an array of 3 elements in the local scope? I guess using malloc and free within the same scope produces overhead, but what's the alternative?
void foo() {
flPtrt ptr = malloc(sizeof(float)*3);
// ...
free(ptr);
}
If 3 is known at compile time and is small enough, you can declare a local array and use it as a pointer
void foo() {
float array[3];
flPtrt ptr = array;
}
If the size is bigger or variable, you have to use dynamic memory as in your example.
I think what you're looking for is the alloca() function.
It is not part of standard C, but it exists in GNU, and it worked on my Visual Studio.
So this is how you use it:
int n = 5;
int* a = (int*) alloca(sizeof(int) * n);
It creates an array of elements on the stack (rather than on the heap with malloc).
Advantages: less overhead, no need to free manually (when you return from your method, the stack folds back and the memory is lost)
Disadvantage: If you want to return a pointer from a method NEVER use alloca, since you will be pointing at something that no longer exists after exiting the function. One can also argue that the stack is usually smaller than the heap, so if you want larger space use malloc.
If you know the required size of the array ahead of time, you could just allocate it as a stack variable and avoid heap memory management.
Otherwise, the approach you outlined is appropriate and there is not really an alternative.
Use an array.
void foo(void) // note that "void foo()" is obsolete
{
float data[3];
float *ptr = data;
// ...
}
