Why does free work like this? - c

Given the following code:
typedef struct Tokens {
char **data;
size_t count;
} Tokens;
void freeTokens(Tokens *tokens) {
int d;
for(d = 0;d < tokens->count;d++)
free(tokens->data[d]);
free(tokens->data);
free(tokens);
tokens = NULL;
}
Why do I need that extra:
free(tokens->data);
Shouldn't that be handled in the for loop?
I've tested both against valgrind/drmemory and indeed the top loop correctly deallocates all dynamic memory, however if I remove the identified line I leak memory.
How come?

Let's look at a diagram of the memory you're using in the program:
+---------+ +---------+---------+---------+-----+
| data | --> | char * | char * | char * | ... |
+---------+ +---------+---------+---------+-----+
| count | | | |
+---------+ v v v
+---+ +---+ +---+
| a | | b | | c |
+---+ +---+ +---+
|...| |...| |...|
+---+ +---+ +---+
In C, we can dynamically allocate space for a group (more simply, an array) of elements. However, we can't use an array type to reference that dynamic allocation, and instead use a pointer type. In this case, the pointer just points to the first element of the dynamically allocated array. If you add 1 to the pointer, you'll get a pointer to the second element of the dynamically allocated array, add 2 to get a pointer to the third element, and so on.
In C, the bracket syntax (data[1]) is shorthand for addition and dereferencing to a pointer. So pointers in C can be used like arrays in this way.
In the diagram, data points to the first char * in the dynamically allocated array, which is elsewhere in memory.
Each member of the array pointed to by data is a string, itself dynamically allocated (since the elements are char *s).
So, the loop deallocates the strings ('a...', 'b...', 'c...', etc), free(tokens->data) deallocates the array data points to, and finally, free(tokens) frees the entire struct.
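To make the bracket shorthand concrete, here is a minimal sketch. The function names element_by_index and element_by_arithmetic are made up for illustration and are not part of the question's code; both are expected to return the same element.

```c
#include <stddef.h>

/* Returns element i of the array that data points to, using bracket syntax. */
char *element_by_index(char **data, size_t i)
{
    return data[i];
}

/* Returns the same element using explicit pointer addition and dereference. */
char *element_by_arithmetic(char **data, size_t i)
{
    return *(data + i);
}
```

Both functions assume that i is a valid index into the array; neither performs any bounds check.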

data is a pointer to a pointer. This means data points to a dynamically allocated array of pointers, each of which points to the actual data. The for loop frees the memory that each pointer IN the array points to, but you still need to free the array of pointers itself, which is what data points TO. That's the reason for the line you pointed out.

As a general rule of thumb, every malloc() should have a corresponding call to free(). If you look at the code which allocates the memory in this program, you will very likely see a very strict correspondence with the code you posted here that frees the memory.
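The question does not show the allocating code, so the following is only a guess at what it might look like. makeTokens is a hypothetical constructor invented for this sketch; the point is that each of its three kinds of allocation has exactly one matching free in freeTokens.

```c
#include <stdlib.h>
#include <string.h>

typedef struct Tokens {
    char **data;
    size_t count;
} Tokens;

/* Frees in the reverse order of allocation: strings, array, struct. */
void freeTokens(Tokens *tokens)
{
    if (tokens == NULL)
        return;
    for (size_t d = 0; d < tokens->count; d++)
        free(tokens->data[d]);   /* matches allocation 3 */
    free(tokens->data);          /* matches allocation 2 */
    free(tokens);                /* matches allocation 1 */
}

/* Hypothetical constructor: copies `count` strings into a new Tokens. */
Tokens *makeTokens(const char *const *words, size_t count)
{
    Tokens *tokens = malloc(sizeof *tokens);              /* allocation 1 */
    if (tokens == NULL)
        return NULL;
    tokens->count = 0;
    tokens->data = calloc(count, sizeof *tokens->data);   /* allocation 2 */
    if (tokens->data == NULL && count > 0) {
        free(tokens);
        return NULL;
    }
    for (size_t i = 0; i < count; i++) {
        size_t len = strlen(words[i]) + 1;
        tokens->data[i] = malloc(len);                    /* allocation 3 */
        if (tokens->data[i] == NULL) {
            freeTokens(tokens);   /* count covers only the strings made so far */
            return NULL;
        }
        memcpy(tokens->data[i], words[i], len);
        tokens->count++;
    }
    return tokens;
}
```

Because freeTokens receives its pointer by value, it cannot reset the caller's variable; a caller that wants a null pointer afterwards has to assign NULL itself.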

Related

Is a null check required for array initialization in C? Can array initialization in C fail?

I know that in C it is good practice to check for NULL pointer every time malloc() or calloc() is called. Do I have to do the same for array initialization? For example:
int sigcheck[5];
if (sigcheck == NULL) {return;}
Is line 2 necessary? If I'm not wrong, array initialization works like calling calloc() under the hood, does this under-the-hood functioning take the possibility of NULL into account or is it necessary/good practice for us to do it ourselves.
it is good practice to check for NULL pointer every time malloc() or calloc() is called
Yes, because those are functions documented to return NULL upon failure. That's the sole reason why.
do I have to do the same for array initialization
No.
Is line 2 necessary?
No, it doesn't make any sense. The array can't have address null, it always holds a memory position somewhere - where it ends up depends on something that C calls storage duration. In this case it either has static storage duration or automatic storage duration, depending on if it was allocated outside a function or inside a function.
If I'm not wrong, array initialization works like calling calloc() under the hood
You are wrong. calloc is a special case used for explicit memory allocation by the programmer, so called allocated storage duration. malloc family of functions is never called implicitly or silently. (Unless you call a function which claims to call malloc in turn, such as strdup.)
You might find this question interesting: A program uses different regions of memory for static objects, automatic objects, and dynamically allocated objects
Memory for arrays is allocated like memory for any other variable type - there's no separate calloc or similar call under the hood. After the line
int sigcheck[5];
what you wind up with in memory is
+---+
sigcheck: | | sigcheck[0]
+---+
| | sigcheck[1]
+---+
...
+---+
| | sigcheck[4]
+---+
So there's no need to perform a NULL check against sigcheck in this case.
Where people get confused is that under most conditions the expression sigcheck will be converted, or "decay", from type "5-element array of int" to type "pointer to int", and the value of the expression will be the address of the first element of the array. This concept often gets garbled to where people think sigcheck is a pointer object separate from the array itself, but it isn't.
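One way to convince yourself that no separate pointer object exists is to compare addresses. The sketch below uses a made-up function name, array_is_its_own_storage; it is expected to return true because the decayed array expression, the address of the whole array, and the address of its first element all designate the same location.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true if sigcheck, &sigcheck and &sigcheck[0] all designate
   the same, non-null address. */
bool array_is_its_own_storage(void)
{
    int sigcheck[5] = {0};
    const void *decayed = sigcheck;       /* decays to &sigcheck[0] */
    const void *whole   = &sigcheck;      /* address of the array object */
    const void *first   = &sigcheck[0];   /* address of the first element */
    return decayed == whole && whole == first && decayed != NULL;
}
```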
When you allocate memory dynamically through malloc or calloc, such as
int *sigcheck = calloc( 5, sizeof *sigcheck );
then (assuming the request succeeds) what you wind up with in memory is
+---+
sigcheck: | | ---+
+---+ |
+------+
|
V
+---+
| | sigcheck[0]
+---+
| | sigcheck[1]
+---+
...
+---+
| | sigcheck[4]
+---+
In this case the sigcheck is a separate object from the array elements. And because malloc, calloc, and realloc will return NULL if the memory request cannot be satisfied, then you do need to make a NULL check on sigcheck:
int *sigcheck = calloc( 5, sizeof *sigcheck );
if ( sigcheck )
{
// do stuff
}
else
{
// memory allocation failed, handle as appropriate
}
From the documentation of malloc:
If the function failed to allocate the requested block of memory, a null pointer is returned.
So you should check whether malloc returns NULL precisely because it might fail to allocate you the requested chunk (although this is usually unlikely).
Static allocation does not call calloc under the hood, since that would place your array on the heap rather than in static storage or on the stack. The space needed for static allocations is determined at compile time, and if a sufficient amount of memory cannot be allocated your program will fail to load. Take a look at this question.

Allocating a pointer with calloc, and then dynamically allocate each cell with malloc = memory leakage?

In a recent exam question I got this code with following options:
char **mptr, *pt1;
int i;
mptr = calloc(10, sizeof(char*));
for (i=0; i<10; i++)
{
mptr[i] = ( char *)malloc(10);
}
Which of the following de-allocation strategies creates a memory leakage?
A. free(mptr);
B. for(i = 0; i < 10; i++) { free(mptr[i]); }
C. All of them
The answer is C. But it seems to me that free(mptr); alone would suffice to avoid the memory leak, and B as well, although I'm less sure of that. Can someone explain to me why all of them would cause a memory leak?
I'm guessing option C expects that each operation (A or B) is applied separately.
P.S.
I don't see the point of this code, if you already have allocated memory with calloc (and initialized it) why would you go as far as to allocate each cell with a cycle? Am I wrong to believe that?
Which of the following de-allocation strategies creates a memory leakage?
In my pedantic opinion the correct answer would have to be option A, it creates a memory leak because it deallocates mptr, making mptr[i] pointers inaccessible. They cannot be deallocated afterwards, assuming that the memory is completely inaccessible by other means.
Option B does not lead to a memory leak per se: mptr is still accessible after you free the mptr[i] pointers. You can reuse it or deallocate it later. A memory leak would only occur if and when you lose access to the memory pointed to by mptr.
I believe the question is somewhat ill-formed, if the question was "Which option would you use to correctly deallocate all the memory?", then yes, option C would be correct.
I do agree that the correct strategy to deallocate all the memory is B + A, albeit A first will cause immediate memory leak whereas B first will allow for later deallocation of mptr, as long as the access to the memory pointed by it is not lost.
I don't see the point of this code, if you already have allocated memory with calloc (and initialized it) why would you go as far as to allocate each cell with a cycle? Am I wrong to believe that?
The allocation is correct.
//pointer to pointer to char, has no access to any memory
char **mptr;
//allocates memory for 10 pointers to char
mptr = calloc(10, sizeof(char*));
//allocates memory for each of the 10 mptr[i] pointers to point to
for (i = 0; i < 10; i++)
{
mptr[i] = malloc(10); //no cast needed, #include <stdlib.h>
}
Check this thread for more info.
Let's draw this out:
char ** char * char
+---+ +---+ +---+---+ +---+
mptr: | | -------->| | mptr[0] -------->| | | ... | |
+---+ +---+ +---+---+ +---+
| | mptr[1] ------+
+---+ | +---+---+ +---+
... +->| | | ... | |
+---+ +---+---+ +---+
| | mptr[9] ----+
+---+ | +---+---+ +---+
+--->| | | ... | |
+---+---+ +---+
If all you do is free the memory pointed to by mptr, you wind up with this:
char ** char
+---+ +---+---+ +---+
mptr: | | | | | ... | |
+---+ +---+---+ +---+
+---+---+ +---+
| | | ... | |
+---+---+ +---+
+---+---+ +---+
| | | ... | |
+---+---+ +---+
The allocations for each mptr[i] are not freed. Those are all separate allocations, and each must be freed independently before you free mptr. free does not examine the contents of the memory it's deallocating to determine if there are any nested allocations that also need to be freed. The proper procedure would be to write
for ( int i = 0; i < 10; i++ )
free( mptr[i] );
free( mptr );
If all you do is free each mptr[i] but not mptr, you wind up with this:
char ** char *
+---+ +---+
mptr: | | -------->| | mptr[0]
+---+ +---+
| | mptr[1]
+---+
...
+---+
| | mptr[9]
+---+
You still have the array of pointers you allocated initially. Now, this isn't a memory leak yet - it only becomes one when you lose track of mptr.
So, these are the rules for memory management in C:
Every malloc, calloc, or realloc call must eventually have a corresponding free;
When doing nested allocations, always deallocate in the reverse order of allocation (i.e., deallocate each ptr[i] before deallocating ptr);
The argument to free must be a pointer returned from a malloc, calloc, or realloc call, or a null pointer (in which case free does nothing).
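As an illustration of these rules, here is a sketch of a nested allocation that cleans up after itself if one of the inner allocations fails partway through. The names alloc_rows and free_rows are invented for this example, and it assumes row_len is greater than zero.

```c
#include <stdlib.h>

/* Frees `count` rows first, then the array of row pointers. */
void free_rows(char **rows, size_t count)
{
    if (rows == NULL)
        return;
    for (size_t i = 0; i < count; i++)
        free(rows[i]);
    free(rows);
}

/* Hypothetical helper: allocates `count` rows of `row_len` bytes each.
   Returns NULL on failure, having released anything already allocated. */
char **alloc_rows(size_t count, size_t row_len)
{
    char **rows = calloc(count, sizeof *rows);
    if (rows == NULL)
        return NULL;
    for (size_t i = 0; i < count; i++) {
        rows[i] = malloc(row_len);
        if (rows[i] == NULL) {
            free_rows(rows, i);   /* frees rows[0..i-1], then rows itself */
            return NULL;
        }
    }
    return rows;
}
```

With these helpers, the exam snippet would correspond to alloc_rows(10, 10) followed eventually by free_rows(mptr, 10).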
P.S. I don't see the point of this code, if you already have allocated memory with calloc (and initialized it) why would you go as far as to allocate each cell with a cycle? Am I wrong to believe that?
This is an example of a "jagged" array, where each "row" can be a different length (which you can't do with a regular 2D array). This can be handy if you're storing (for example) a list of words of all different lengths:
char ** char * char
+---+ +---+ +---+---+---+---+
| | -------->| |-------->|'f'|'o'|'o'| 0 |
+---+ +---+ +---+---+---+---+
| | -----+
+---+ | +---+---+---+---+---+---+---+
| | ---+ +->|'b'|'l'|'u'|'r'|'g'|'a'| 0 |
+---+ | +---+---+---+---+---+---+---+
... |
| +---+---+---+---+---+---+
+--->|'h'|'e'|'l'|'l'|'o'| 0 |
+---+---+---+---+---+---+
If necessary, you can easily resize each "row" without affecting any of the others, and you can easily add more "rows". This looks like a 2D array when you index it - you can access individual elements using mptr[i][j] like any other 2D array - but the "rows" are not contiguous in memory.
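Resizing a single row might look like the sketch below. resize_row is a made-up helper, and it assumes that rows[i] was obtained from malloc, calloc, or realloc and that new_len is greater than zero.

```c
#include <stdlib.h>

/* Resizes rows[i] to new_len bytes. Returns 1 on success and 0 on
   failure; on failure the original row is left untouched. */
int resize_row(char **rows, size_t i, size_t new_len)
{
    char *tmp = realloc(rows[i], new_len);
    if (tmp == NULL)
        return 0;
    rows[i] = tmp;   /* only this row's pointer changes */
    return 1;
}
```

Assigning the result of realloc to a temporary first avoids losing the original row if the request fails.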
Compare this with a "real" 2D array, where all the rows are the same size and laid out contiguously:
+---+---+---+---+---+---+---+
|'f'|'o'|'o'| 0 | ? | ? | ? |
+---+---+---+---+---+---+---+
|'b'|'l'|'u'|'r'|'g'|'a'| 0 |
+---+---+---+---+---+---+---+
|'h'|'e'|'l'|'l'|'o'| 0 | ? |
+---+---+---+---+---+---+---+
The main disadvantage is some wasted space. Your array has to be sized for the longest word you want to store. If you have a table of 100 strings, one of which is 100 characters long and the rest 10, then you have a lot of wasted space. You can't have one row that's longer than the others.
The advantage is that the rows are contiguous, so it's easier to "walk" down the array.
Note that you can allocate a regular 2D array dynamically as well:
char (*ptr)[10] = calloc( 10, sizeof *ptr );
This allocates enough space for a 10x10 array of char in a single allocation, which you can index into like any other 2D array:
strcpy( ptr[0], "foo" );
ptr[0][0] = 'F';
Since this is a single allocation, you only need a single deallocation:
free( ptr );
free(mptr); just frees the memory allocated for the pointers, but not the memory the pointers point to.
If you free() the memory for the pointers before freeing the memory those pointers point to, you no longer have any reference to the pointed-to memory, and hence you have a memory leak.
for(i = 0; i < 10; i++) { free(mptr[i]); }, on the other hand, frees only the memory pointed to but not the pointers. Depending on how strictly you see it, you could also consider this a memory leak because the memory for the pointers is not deallocated.
So, depending on the point of view and one's own opinion, either A. or C. is correct.
I don't see the point of this code, if you already have allocated memory with calloc (and initialized it) why would you go as far as to allocate each cell with a cycle?
With
mptr = calloc(10, sizeof(char*));
you allocated memory for the pointers themselves, which is the memory that the pointer to pointer mptr points to.
But pointers need to point to data memory which you can access by using the pointers. The pointers themselves can't store any data other than the address of the memory they point to.
Thus, you allocate memory to point to for each pointer by each iteration inside of the for loop.
A pointer always needs a place to point to in order to use it correctly as pointer (Exception: null pointers).

Reinitializing Pointers for C Language

I'm currently learning C Programming through Dan Gookin's book Beginning C Programming for Dummies.
One of the topics I'm currently reading about is the claim that arrays are in fact pointers. Dan attempted to prove that with the following code:
#include <stdio.h>
int main()
{
int numbers[10];
int x;
int *pn;
pn = numbers; /* initialize pointer */
/* Fill array */
for(x=0;x<10;x++)
{
*pn=x+1;
pn++;
}
pn = numbers;
/* Display array */
for(x=0;x<10;x++)
{
printf("numbers[%d] = %d, address %p\n",
x+1,*pn,(void *)pn); /* %p expects a void * argument */
pn++;
}
return(0);
}
My question is really about line 17 (the second pn = numbers; statement). I realized that if I do not reinitialize the pointer as in line 17, the values displayed through pn in the second for loop are garbage that does not make sense. Why is it necessary to reinitialize the pointer pn for the code to work as intended?
An array is not a pointer, but C lets you assign an array to a pointer whose type matches the array's element type, with the effect that the pointer points to the first item in the array. That's what pn = numbers does.
pn is a pointer to an int, not to an array. It points to a single integer. When you increment the pointer, it just shifts to the next memory location. The shift it makes is the size of the type of the pointer, so int in this case.
So what does this prove? Not that an array is a pointer, but only that an array is a contiguous block of memory that consists of N times the size of the type of your array item.
If you run the second loop without resetting the pointer, it starts at a piece of memory that doesn't belong to the array anymore, and so you get 'garbage', which is just whatever happens to exist at that location.
If you want to iterate over the array again by incrementing a pointer, you have to reinitialize that pointer to the first item. The for loop does only one thing, which is counting to 10. It doesn't know about the array and it doesn't know about the pointer, so the loop isn't going to automatically reset the pointer for you.
Since pn is incremented in the first loop, after the first loop is finished, pn will point to an address beyond the numbers array. Therefore, you must initialize pn to the beginning of the array before the second loop since you use the same pointer for printing the contents.
Because you have changed the address contained in pn in the statement pn++ in the following code snippet.
for(x=0;x<10;x++)
{
*pn=x+1;
pn++;
}
The pn pointer is being used to point into the numbers array.
The first for-loop uses pn to set the values, stepping pn through the data element by element. After the end of the loop, pn points off the end of numbers (at a non-allocated 11th element).
For the second for-loop to work, i.e. to use pn to loop through numbers again by stepping through the array, pn needs to be moved to the front of the numbers array, otherwise you'll access memory that you shouldn't be looking at (non-allocated memory).
First, arrays are not pointers. They decay to pointers in most expressions (for example, when passed to a function) and can be used (almost) the same way.
Some subtle differences
int a[5]; /* array */
int *pa = a; /* pointer */
pa[0] = 5;
printf("%d\n", a[0]); /* ok it is the same here */
printf("address of array %p - address of pointer %p, value of pointer %p\n",
(void *)&a, (void *)&pa, (void *)pa); /* &a is the same address as pa, not &pa */
printf("size of array %zu - size of pointer %zu\n", sizeof(a), sizeof(pa)); /* sizeof yields size_t, so use %zu */
sizeof(a) is here 5 * sizeof(int) whereas sizeof(pa) is the size of a pointer.
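The sizeof difference can be checked with a small sketch. ARRAY_LEN, elements_via_array, and bytes_via_pointer are names invented for this example; the macro only gives a meaningful result when applied to a true array, not to a pointer.

```c
#include <stddef.h>

/* Element count of a true array; must not be applied to a pointer. */
#define ARRAY_LEN(a) (sizeof (a) / sizeof (a)[0])

/* sizeof on the array sees all 5 elements. */
size_t elements_via_array(void)
{
    int a[5];
    return ARRAY_LEN(a);
}

/* sizeof on the pointer sees only the pointer object itself. */
size_t bytes_via_pointer(void)
{
    int a[5];
    int *pa = a;
    return sizeof pa;
}
```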
Now for your question:
After the first loop, pn points to numbers[10] (one past the last element) and no longer to numbers[0]. That's the reason why you must reset it.
Just to drive the point home, arrays are not pointers. When you declare numbers as int numbers[10], you get the following in memory:
+---+
numbers: | | numbers[0]
+---+
| | numbers[1]
+---+
...
+---+
| | numbers[9]
+---+
There's no storage set aside for a separate pointer to the first element of numbers. What happens is that when the expression numbers appears anywhere, and it isn't the operand of the sizeof or unary & operators, it is converted ("decays") to an expression of type "pointer to int", and the value of the expression is the address of the first element of the array.
What you're doing with pn is setting it to point to the first element of numbers, and then "walking" through the array:
+---+
numbers: | | <------+
+---+ |
| | |
+---+ |
... |
+---+ |
| | |
+---+ |
... |
|
+---+ |
pn: | | -------+
+---+
The expression pn++ advances pn to point to the next integer object, which in this case is the next element of the array:
+---+
numbers: | |
+---+
| | <------+
+---+ |
... |
+---+ |
| | |
+---+ |
... |
|
+---+ |
pn: | | -------+
+---+
Each pn++ advances the pointer until, at the end of the first loop, you have the following:
+---+
numbers: | |
+---+
| |
+---+
...
+---+
| |
+---+
... <------+
|
+---+ |
pn: | | -------+
+---+
At this point, pn is pointing to the object immediately following the end of the array. This is why you have to reset pn before the next loop; otherwise you're walking through the memory immediately following numbers, which can contain pretty much anything, including trap representations (i.e., bit patterns that don't correspond to a legal value for the given type).
Dereferencing a pointer at or beyond the end of an array (or even computing a pointer more than one past the end) invokes undefined behavior, which can mean anything from your code crashing outright to displaying garbage to working as expected.
While the array is being filled, the pointer pn is incremented as each value is stored. The same pointer variable is then used to print the array contents, which is why it is reinitialized first.
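One way to avoid the reset altogether is to give each loop its own pointer, so nothing carries over from one loop to the next. This is a sketch rather than the book's code; fill_sequence and sum_sequence are made-up names.

```c
#include <stddef.h>

/* Fills numbers[0..n-1] with 1..n, using a pointer local to the loop. */
void fill_sequence(int *numbers, size_t n)
{
    int value = 1;
    for (int *pn = numbers; pn < numbers + n; pn++)
        *pn = value++;
}

/* Sums the array with a fresh pointer, so no reset is needed. */
int sum_sequence(const int *numbers, size_t n)
{
    int total = 0;
    for (const int *pn = numbers; pn < numbers + n; pn++)
        total += *pn;
    return total;
}
```

The loop condition compares against numbers + n, the one-past-the-end pointer, which may be computed and compared but never dereferenced.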

Why both the pointers show same memory address

.....
// Some code
char *options[] = {"\nDATA:","\nSUBJECT:","\nMAILFROM:","\nRCPTO:"};
char *data[3] = {};
I am initializing this array of pointers.
But when I try to access each member of array of pointers, I can see that
options[0] = data[3]
0x40873b = 0x40873b
they both point to same memory location.
This happens even though I declared the 'options' array before the 'data' array.
So how do I resolve this?
How can I be sure that they are at different memory locations and store their contents properly, without overlapping, with different data at two different locations?
When you write
char *data[3] = {};
the [3] means "allocate space for three elements of the array". It does not mean that you have just created a pointer named data[3] (no such element exists), nor that data[3] is part of the memory that was just allocated. Rather, the three elements of the array are data[0], data[1], and data[2], which live at the addresses data, data + 1, and data + 2, each sizeof(char *) bytes after the previous one.
If you write
data[3] == options[0]
then data[3] means whatever is at the memory location data + 3, which is the first thing after the last allocated element of data. The compiler happens to have placed options there, so reading data[3] gives back the pointer value stored in options[0], which is the 0x40873b you see.
It looks like the compiler laid your objects out as follows:
+---+
data: | | data[0]
+---+
| | data[1]
+---+
| | data[2] <-- last element of data array
+---+
options: | | options[0], data[3]
+---+
| | options[1], data[4]
+---+
| | options[2], data[5]
+---+
| | options[3], data[6] <-- last element of options array
+---+
Your data array contains 3 elements, indexed from 0 to 2. When you access data[3], you're accessing an object one past the end of the data array, and it just so happens to be the first object of the options array.
Note that attempting to read an object one past the end of an array invokes undefined behavior; C doesn't do any bounds checking on array accesses, so doing this won't raise an OutOfBounds exception or anything like that. In this particular case, you got a reasonable-looking value because the object following the last element of the data array has the same type as that element (char *). You could theoretically iterate through the entire options array using data (as shown above), although that will only "work" in this specific case; if you add another variable or change the code, the compiler could change the order in which things are laid out in memory, and this would suddenly not "work" anymore.
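Since the language performs no bounds check, any check has to be written by hand. The following is a minimal sketch with a made-up helper, checked_get, which relies on the caller passing the true element count.

```c
#include <stddef.h>

/* Returns arr[i] if i is a valid index, otherwise NULL.
   The caller is responsible for passing the real element count. */
const char *checked_get(const char *const *arr, size_t count, size_t i)
{
    return i < count ? arr[i] : NULL;
}
```

With a three-element data array, checked_get(data, 3, 3) returns NULL instead of reading whatever happens to follow the array. One limitation of this design is that a stored null pointer cannot be told apart from an out-of-range index.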

How is memory allocated for an implicitly defined multidimensional array in C99?

I'm trying to write a C99 program and I have an array of strings implicitly defined as such:
char *stuff[] = {"hello","pie","deadbeef"};
Since the array dimensions are not defined, how much memory is allocated for each string? Are all strings allocated the same amount of elements as the largest string in the definition? For example, would this following code be equivalent to the implicit definition above:
char stuff[3][9];
strcpy(stuff[0], "hello");
strcpy(stuff[1], "pie");
strcpy(stuff[2], "deadbeef");
Or is each string allocated just the amount of memory it needs at the time of definition (i.e. stuff[0] holds an array of 6 elements, stuff[1] holds an array of 4 elements, and stuff[2] holds an array of 9 elements)?
Pictures can help — ASCII Art is fun (but laborious).
char *stuff[] = {"hello","pie","deadbeef"};
+----------+ +---------+
| stuff[0] |--------->| hello\0 |
+----------+ +---------+ +-------+
| stuff[1] |-------------------------->| pie\0 |
+----------+ +------------+ +-------+
| stuff[2] |--------->| deadbeef\0 |
+----------+ +------------+
The memory allocated for the 1D array of pointers is contiguous, but there is no guarantee that the pointers held in the array point to contiguous sections of memory (which is why the pointer lines are different lengths).
char stuff[3][9];
strcpy(stuff[0], "hello");
strcpy(stuff[1], "pie");
strcpy(stuff[2], "deadbeef");
+---+---+---+---+---+---+---+---+---+
| h | e | l | l | o | \0| x | x | x |
+---+---+---+---+---+---+---+---+---+
| p | i | e | \0| x | x | x | x | x |
+---+---+---+---+---+---+---+---+---+
| d | e | a | d | b | e | e | f | \0|
+---+---+---+---+---+---+---+---+---+
The memory allocated for the 2D array is contiguous. The x's denote uninitialized bytes. Note that in most expressions stuff[0] decays to a pointer to the 'h' of 'hello', stuff[1] to a pointer to the 'p' of 'pie', and stuff[2] to a pointer to the first 'd' of 'deadbeef' (and stuff[3] would be a non-dereferenceable pointer to the byte beyond the null byte after 'deadbeef').
The pictures are quite, quite different.
Note that you could have written either of these:
char stuff[3][9] = { "hello", "pie", "deadbeef" };
char stuff[][9] = { "hello", "pie", "deadbeef" };
and you would have the same memory layout as shown in the 2D array diagram (except that the x's would be zeroed).
char *stuff[] = {"hello","pie","deadbeef"};
Is not a multidimensional array! It is simply an array of pointers.
how much memory is allocated for each string?
The number of characters plus a null terminator. Same as any string literal.
I think you want this:
char foo[][10] = {"hello","pie","deadbeef"};
Here, 10 is the amount of space per string and all the strings are in contiguous memory. Thus, there will be padding for strings less than size 10.
In the first example, it is a jagged array I suppose.
It declares an array of pointers to char, each initialized to point at a string literal (which must not be modified). So each string literal can be as long as you like; the length of a string is independent of the array dimensions.
In the second one, each row (string) can hold at most 9 characters, including the null terminator, as specified by your column size.
Are all strings allocated the same amount of elements as the largest string in the definition?
No, only 3 pointers are allocated and they point to 3 string literals.
char *stuff[] = {"hello","pie","deadbeef"};
and
char stuff[3][9];
are not at all equivalent. The first is an array of 3 pointers, whereas the second is a 2D array.
For the first, only the pointers are allocated in the array, and the string literals they point to may be stored in a read-only section. The second, if declared inside a function, is allocated in automatic storage (usually the stack).
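The difference in storage can be measured with sizeof and strlen. The sketch below uses invented function names; literal_bytes assumes its index is 0, 1, or 2.

```c
#include <stddef.h>
#include <string.h>

/* Size of the array of pointers only: 3 * sizeof(char *). */
size_t pointer_array_bytes(void)
{
    const char *stuff[] = { "hello", "pie", "deadbeef" };
    return sizeof stuff;
}

/* Bytes occupied by string literal i, including its null terminator. */
size_t literal_bytes(size_t i)
{
    const char *stuff[] = { "hello", "pie", "deadbeef" };
    return strlen(stuff[i]) + 1;
}

/* Size of the 2D array: 3 rows of 9 chars, padding included. */
size_t two_d_array_bytes(void)
{
    char stuff[3][9] = { "hello", "pie", "deadbeef" };
    return sizeof stuff;
}
```

The literals need 6, 4, and 9 bytes respectively, while the 2D array always occupies 27 bytes regardless of how short the individual strings are.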

Resources