realloc-ing a struct with flexible array - c

I ran into a rather weird problem,
I have the following code:
typedef struct{
char *a;
char *b;
char *c;
}Str;
typedef struct{
int size;
str array[]; //flexible array.
}strArr;
The purpose here is to allocate a,b, and c for the new element from the realloc.
StrArr *arr;
int arrSize;
arrSize = 1;
arr = malloc(sizeof(strArr)+sizeof(int)*arrSize);
arr->size++;
arr = realloc(arr, sizeof(strArr)+sizeof(int)*arr->size);
arr->array[arr->size-1].a = malloc(sizeof(char)*75);
arr->size++;
card = realloc(arr, sizeof(strArr)+sizeof(int)*arr->size);
The question is: whenever arr is realloc'd to be one bigger, do you have to allocate memory for the strings of the new element? This code will fail if it is run because it gives me glibc detected at the second realloc. What am I doing wrong? If i take off the malloc statement in the middle it runs. Also, if i try a strcpy into arr->array[arr->size-1].a, it would segfault.
Any help would be appreciated.
Thank you.

There are numerous issues with this code, enough to suggest that whatever you're experiencing can't be reproduced. Nonetheless, there are sufficient problems to cause instability (i.e. segmentation violations). I'm going to assume you meant to use a lowercase s in str rather than an uppercase S in Str; it only makes sense that way. The same goes for StrArr, which I take to mean strArr.
At which point have you assigned arr->size a value in order for arr->size++; to be useful? That itself is a mistake, but that's interlaced into another mistake:
arr = realloc(arr, sizeof(strArr)+sizeof(int)*arr->size);
That turns out to be a major issue, as you continue to use the uninitialised variable in critical pieces of logic, again and again. Nonetheless, once that issue is resolved, the next mistake here is:
Anything that resembles the pattern X = realloc(X, Y); is suspicious. It's the Xes. Those should be different. You're not supposed to just replace the values like that. I mean, it'll work, kind of... but it's not much more effort to do it properly, and unless done properly, this won't be valgrind-friendly. That should be a big deal to you, because valgrind is a tool that helps us identify memory leaks!
You should store this into a temporary variable:
void *temp = realloc(X, Y);
... and then you can handle errors, perhaps by cleaning up and exiting properly:
if (temp == NULL) {
perror("realloc");
/* free(X); // what would valgrind cease complaining about? */
exit(EXIT_FAILURE);
}
... and replacing X with temp:
X = temp;
sizeof(int) should not be assumed to be the same size as sizeof str (whatever str is). Given the type of arr->array, I would expect sizeof str or, better yet, here's a nice pattern to keep in mind:
// X = realloc(Y, Z); or ...
void *temp = realloc(arr, sizeof *arr + arr->size * sizeof arr->array[0]);
// XXX: handle errors
The question is: whenever arr is realloc'd to be one bigger, do you have to allocate memory for the strings of the new element?
The strings themselves should be in a separate storage location to the list nodes. What is this? Strings and list nodes, in the same array?!
I suppose it might make sense if by strings you mean fixed-width, null padded fields. Fixing the width of the field makes expressing the array in a one-dimensional space much easier.
Otherwise, you should keep your strings allocated separately from your list nodes... in a manner with which the down-stream programmer has complete control over, if I may add, is kinda nice, though you lose that the moment you use realloc, malloc, etc (and thus the moment you use VLAs, hmmmm!)...
What am I doing wrong?
I think I've picked apart your code sufficiently to say:
Initialise all of your variables before you use them. In this case, there are some variables pointed at by arr which are used without first being initialised.
Don't assume sizeof(int) and sizeof (/*any pointer type*/) have the same width. There are very real systems where this won't be true.
Remember to use that X = realloc(Y, Z); pattern, followed by error handling, followed by Y = X;.
I'm still not sure whether forcing down-stream programmers to rely upon malloc/realloc/etc and free is necessary, or even beneficial, here.
Also, if i try a strcpy into arr->array[arr->size-1].a, it would segfault.
Yes, well... there's that phantom arr->size-related issue again!
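Putting those points together, here is a minimal sketch of how the container could be grown one element at a time. It assumes the lowercase type names discussed above; str_arr_new and str_arr_push are hypothetical helper names, and the 75-byte buffer is simply the size used in the question. Treat it as one possible arrangement rather than the definitive fix.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    char *a;
    char *b;
    char *c;
} str;

typedef struct {
    int size;      /* number of elements currently stored in array[] */
    str array[];   /* flexible array member */
} strArr;

/* Hypothetical helper: allocate an empty container with size initialised. */
strArr *str_arr_new(void)
{
    strArr *arr = malloc(sizeof *arr);
    if (arr != NULL)
        arr->size = 0;
    return arr;
}

/* Hypothetical helper: grow by one element and give the new element's `a`
 * its own 75-byte buffer holding a copy of text (truncated if too long).
 * Returns the possibly moved container, or NULL on failure, in which case
 * the original container is untouched and still owned by the caller. */
strArr *str_arr_push(strArr *arr, const char *text)
{
    char *buf = malloc(75);
    if (buf == NULL)
        return NULL;
    strncpy(buf, text, 74);
    buf[74] = '\0';

    /* Size the block from the element type, never from sizeof(int). */
    strArr *temp = realloc(arr, sizeof *arr
                                + (size_t)(arr->size + 1) * sizeof arr->array[0]);
    if (temp == NULL) {
        free(buf);
        return NULL;
    }
    temp->array[temp->size].a = buf;
    temp->array[temp->size].b = NULL;
    temp->array[temp->size].c = NULL;
    temp->size++;
    return temp;
}
```

So yes, realloc only makes room for the three pointers of the new element; the strings they point at still have to be allocated separately, and freed separately before the container itself is freed.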

Related

Dynamically allocating and copying an array

I sometimes see code like this:
char* copyStr(char* input) {
int inputLength;
char *answer;
inputLength = strlen(input);
answer = malloc(inputLength + 1);
answer = input;
return answer;
}
People often say this code doesn't work and that this pattern
answer = malloc(inputLength + 1);
answer = input;
makes no sense. Why is it so? To my eye, the code is OK. It allocates the right amount of memory for the answer, and then copies the input to the answer. And it seems to work in my tests, for example
int main()
{
printf ("%s\n", copyStr("Hello world!"));
}
does what I expect it to do. So what's wrong with it?
To put it simply. This code:
var = foo();
var = bar();
is 100% equivalent to this in all[1] situations:
foo();
var = bar();
Furthermore, if foo() has no side effects, it's 100% equivalent to just the last line:
// foo();
var = bar();
This goes for ANY function, including malloc. If we for a moment forget what malloc does and just focus on what has just been said, we can quickly see what's written in the comments in this code:
answer = malloc(inputLength + 1);
// Here, the variable answer contains the return value from the call to malloc
answer = input;
// Here, it contains the value of input. The old value is overwritten, and
// is - unless you saved it in another variable - permanently lost.
What malloc does is really simple. It returns a pointer to a memory block, or a NULL pointer if the allocation failed.[2] That's it. What you are doing with a call like ptr = malloc(size) is absolutely nothing more fancy than storing that address in the pointer variable ptr. And pointer variables are in the same way no more fancy than other variables like int or float. An int stores an integer. A pointer stores a memory address. There's no magic here.
[1] It's 100% equivalent unless you're doing really fancy stuff, like reading the variable var with an external program.
[2] malloc(0) can return a non-null pointer, but in practice it does not make a difference since it would be undefined behavior to dereference it, and allocating zero bytes is a pretty pointless (haha, point) operation.
To answer this question, let's look at a somewhat simpler code fragment first.
int answer;
answer = 42;
answer = 0;
Even the most cursory of observers would notice that the first assignment
answer = 42;
is useless. It places the value of 42 into answer, only to be thrown away and replaced with 0 at the very next instant of time. So that line of code can be thrown away completely.
Let's verify this by looking at optimised assembly code generated by a C compiler. As we can see, the line answer = 42; indeed has no effect on the resulting machine code.
Now compare this to the code in question
answer = malloc(inputLength + 1);
answer = input;
If reasoning by analogy is valid in this case, then we must conclude that the first assignment is useless and can be omitted. We place something (the result of malloc) in answer, only for it to be thrown away and replaced by something else a moment later.
Of course we cannot say whether it is applicable without further research, but we can confirm our suspicion by looking at the generated assembly again. And it is confirmed. The compiler does not even generate any calls to malloc and strlen! They are indeed useless.
So where does this intuition
It allocates the right amount of memory for the answer, and then copies the input to the answer
break down?
The problem lies in the eternal confusion between pointers and arrays.
One may often see claims that in C, arrays are pointers, or that pointers are arrays, or that arrays and pointers are interchangeable, or any number of variations thereof. These claims are all false and misleading. Pointers and arrays are completely different things. They often work together, but that's a far cry from being one and the same. Let's break down pointers and arrays in the code example.
input is a pointer variable
input (presumably) points into a string, which is an array of char
answer is another pointer variable
malloc(...) dynamically allocates a new array of char and returns a pointer that points into said array
answer = malloc(...) copies that pointer to answer, now answer points into the array allocated by malloc
answer = input copies another pointer (that we have already seen above) into answer
now answer and input point into the same string, and the result of malloc is forgotten and thrown away
So this explains why your code is doing what you expect it to do. Instead of having two identical copies of the string "Hello world!" you have just one string and two different pointers into it. Which might seem like that's just what the doctor ordered, but it breaks down as soon as we do something ever so slightly complicated. For example, code like this
char *lineArray[MAX_LINES];
char buffer[BUF_LEN];
int i = 0;
while (i < MAX_LINES && fgets(buffer, BUF_LEN, stdin)) {
lineArray[i++] = copyStr(buffer);
}
will end up with every element of lineArray pointing into the same string, instead of into a bunch of different lines taken from stdin.
OK, so now we have established that answer = input copies a pointer. But we want to copy an array, which we have just allocated space for! How do we do that?
Since our arrays are presumably NUL-terminated character strings, we can use a standard library function designed for copying NUL-terminated character strings.
strcpy(answer, input);
For other arrays we can use memcpy. The main difference is that we have to pass down the array length.
memcpy(answer, input, inputLength + 1);
Both variants will work in our case, but the first one is preferred because it reaffirms that we are dealing with strings. Here's the fixed copyStr for completeness:
char* copyStr(char* input) {
int inputLength;
char *answer;
inputLength = strlen(input);
answer = malloc(inputLength + 1);
strcpy(answer, input);
return answer;
}
Incidentally, it works almost the same as the strdup function, which is specified by POSIX (and part of ISO C since C23) and is widely available. strdup has a better signature and working error checks, which we have omitted here.
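For completeness, here is a sketch of what the omitted checks might look like. The name copy_str_checked is made up for this example, and returning NULL for a NULL input is a choice made here, not something strdup guarantees.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a freshly allocated copy of input, or NULL
 * if input is NULL or the allocation fails. The caller frees the result. */
char *copy_str_checked(const char *input)
{
    if (input == NULL)
        return NULL;

    size_t len = strlen(input);        /* size_t, not int, for lengths */
    char *answer = malloc(len + 1);    /* +1 for the terminating NUL */
    if (answer == NULL)
        return NULL;

    memcpy(answer, input, len + 1);    /* copies the array, not the pointer */
    return answer;
}
```

Because the result is a separate array, later changes to the source buffer (such as the next fgets call in the loop above) no longer affect the copies already stored.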

What happens to the array elements when changing the pointer (C)?

So I'm aware that this question has most likely been asked before, but after nearly an hour of searching I decided to ask all the same. Pointing me to a duplicate question which has already been answered would really be appreciated.
I'm programming in basic C, and I'm curious what happens to the array elements when the pointer is changed to point to something else. Is it safe to do so without first freeing it? For instance,
int main()
{
const int size = 3;
int *p_arr = malloc(size * sizeof(int));
for( int i=0; i<size; i++)
p_arr[i] = i;
int arr[3] = {0,0,0}; // fixed size: a variable-length array such as arr[size] cannot take an initializer in C
p_arr = arr; // safe!?
// What happens to the data previously allocated
// and stored in *p_arr? Should one first call,
// free(p_arr)
// and then reallocate ..?
}
Essentially, changing the pointer will leave the data {0,1,2} in memory. Is this okay?
Thanks a lot for any help!
Nothing happens to the data, except that it becomes unreachable ("leaked"), and thus the memory is wasted: it can't be used for anything else until your program terminates (typically).
Don't do this, it's very bad practice to leak memory.
You should free() the memory when you no longer need it.
Also, the allocation can be written:
p_arr = malloc(size * sizeof *p_arr);
to remove the duplication of the int type, and lock the size to the actual variable. This is at least somewhat safer.
The array {0,1,2} is still in memory, but you can't reach it unless you point a pointer at the head address of the array again.
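A minimal sketch of the ordering that avoids the leak: free the heap block while you still hold its address, and only then redirect the pointer. The function name sum_then_release and its return value are made up for this illustration.

```c
#include <stdlib.h>

/* Hypothetical helper: fills a heap array with 0..size-1, releases it, and
 * only then points p_arr at a local array. Returns the sum of the heap
 * values plus the first local value, or -1 if the allocation fails. */
int sum_then_release(int size)
{
    int *p_arr = malloc((size_t)size * sizeof *p_arr);
    if (p_arr == NULL)
        return -1;

    int sum = 0;
    for (int i = 0; i < size; i++) {
        p_arr[i] = i;
        sum += p_arr[i];
    }

    int arr[3] = {0, 0, 0};
    free(p_arr);   /* release the heap block while its address is still known */
    p_arr = arr;   /* now safe: nothing is leaked */
    return sum + p_arr[0];
}
```

After the assignment, p_arr must not be passed to free, because it now points at an automatic array rather than at memory obtained from malloc.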

Pointer Copying for Two Dynamically Growing Arrays in C

UPDATE: I think I've answered my own question, except for some possible issues with memory leaks.
ORIGINAL QUESTION HERE, ANSWER BELOW.
Background: I'm doing some numerical computing, but I almost never use languages that require me to manage memory on my own. I'm piping something out to C now and am having trouble with (I think) pointer reference issues.
I have two arrays of doubles that are growing in a while loop, and at each iteration, I want to free the memory for the smaller, older array, set 'old' to point to the newer array, and then set 'new' to point to a larger block of memory.
After looking around a bit, it seemed as though I should be using pointers to pointers, so I've tried this, but am running into "lvalue required as unary ‘&’ operand" errors.
I start with:
double ** oldarray;
oldarray = &malloc(1*sizeof(double));
double ** newarray;
newarray = &malloc(2*sizeof(double));
These initializations give me an "lvalue required as unary ‘&’ operand" error, and I'm not sure whether I should replace it with
*oldarray = (double *) malloc(1*sizeof(double));
When I do that, I can compile a simple program (It just has the lines I have above and returns 0) but I get a seg fault.
The rest of the program is as follows:
while ( <some condition> ) {
// Do a lot of processing, most of which is updating
// the values in newarray using old array and other newarray values.
// Now I'm exiting the loop, and growing and resetting arrays.
free(*oldarray) // I want to free the memory used by the smaller, older array.
*oldarray = *newarray // Then I want oldarray to point to the larger, newer array.
newarray = &malloc( <previous size + 1>*sizeof(double))
}
So I'd like to be, at each iteration, updating an array of size (n) using itself and an older array of size (n-1). Then I want to free up the memory of the array of size (n-1), set 'oldarray' to point to the array I just created, and then set 'newarray' to point to a new block of size (n+1) doubles.
Do I actually need to be using pointers to pointers? I think my main issue is that, when I set old to new, they share a pointee, and I then don't know how to set new to a new array. I think that using pointers to pointers gets me out of this, but, I'm not sure, and I still have the lvalue errors with pointers to pointers.
I've checked out C dynamically growing array and a few other stack questions, and have been googling pointers, malloc, and copying for about half a day.
Thanks!
HERE IS MY OWN ANSWER
I've now got a working solution. My only worry is that it might contain some memory leaks.
Using realloc() works, and I also need to be careful to make sure I'm only free()ing pointers that I initialized using malloc or realloc, and not pointers initialized with double * oldarray;.
The working version goes like this:
double * olddiagonal = (double *) malloc(sizeof(double));
olddiagonal[0] = otherfunction(otherstuff);
int iter = 1;
// A bunch of other stuff
while (<error tolerance condition>) {
double * newdiagonal = (double *) malloc((iter+1)*sizeof(double));
newdiagonal[0] = otherfunction(moreotherstuff);
// Then I do a bunch of things and fill in the values of new diagonal using
// its other values and the values in olddiagonal.
// To finish, I free the old stuff, and use realloc to point old to new.
free(olddiagonal);
olddiagonal = (double *) realloc(newdiagonal, sizeof(double) * (iter+1));
iter++;
}
This seems to work for my purposes. My only concern is possible memory leaks, but for now, it's behaving well and getting the correct values.
Here are some explanations:
double ** oldarray;
oldarray = &malloc(1*sizeof(double));
is wrong, because you don't store the result of malloc() anywhere, and since it is not stored anywhere, you can't take its address. You can get the effect that you seem to have had in mind by adding an intermediate variable:
double* intermediatePointer;
double** oldarray = &intermediatePointer;
intermediatePointer = malloc(1*sizeof(*intermediatePointer));
oldarray is now a pointer to the memory location of intermediatePointer, which in turn points to the allocated memory slab.
*oldarray = (double *) malloc(1*sizeof(double));
is wrong, because you are dereferencing an uninitialized pointer. When you declare oldarray with double** oldarray;, you are only reserving memory for one pointer, not for anything the pointer is supposed to point to (the memory reservation is independent of what the pointer points to!). The value that you find in that pointer variable is undefined, so you have absolutely no control over what memory address you are writing to when you assign something to *oldarray.
Whenever you declare a pointer, you must initialize the pointer before you dereference it:
int* foo;
*foo = 7; //This is always a bug.
int bar;
int* baz = &bar; //Make sure the pointer points to something sensible.
*baz = 7; //OK.
Your answer code is indeed correct. However, it can be improved concerning style:
The combination of
int iter = 1;
while (<error tolerance condition>) {
...
iter++;
}
calls for the use of the for() loop, which encapsulates the definition and incrementation of the loop variable into the loop control statement:
for(int iter = 1; <error tolerance condition>; iter++) {
...
}
In C, the cast of the return value of malloc() is entirely superfluous, it only clutters your code. (Note however that C++ does not allow the implicit conversion of void*s as C does, so int *foo = malloc(sizeof(*foo)) is perfectly valid C, but not legal in C++. However, in C++ you wouldn't be using malloc() in the first place.)
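To answer the "do I need pointers to pointers" part directly: plain double * variables are enough, and handing the new array over to the old name is an ordinary pointer assignment. Here is a minimal sketch under stated assumptions: grow_diagonals and fill_next are made-up names, and the update rule inside fill_next is only a stand-in for the real numerical work.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the real update:
 * next[0] = 1, next[i + 1] = old[i] + 1. */
static void fill_next(double *next, const double *old, size_t old_len)
{
    next[0] = 1.0;
    for (size_t i = 0; i < old_len; i++)
        next[i + 1] = old[i] + 1.0;
}

/* Grows from 1 element up to final_len elements. Returns the last array
 * (the caller frees it), or NULL if final_len is 0 or an allocation fails. */
double *grow_diagonals(size_t final_len)
{
    if (final_len == 0)
        return NULL;

    double *old = malloc(sizeof *old);
    if (old == NULL)
        return NULL;
    old[0] = 1.0;

    for (size_t len = 1; len < final_len; len++) {
        double *next = malloc((len + 1) * sizeof *next);
        if (next == NULL) {
            free(old);          /* do not leak the previous array */
            return NULL;
        }
        fill_next(next, old, len);
        free(old);              /* the smaller array is no longer needed */
        old = next;             /* plain assignment: no realloc, no double ** */
    }
    return old;
}
```

Each block is freed exactly once, either inside the loop or by the caller, which is the property a leak checker such as valgrind would verify.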

How to prevent dangling pointers/junk in c?

I'm new to C and haven't really grasped when C decides to free an object and when it decides to keep an object.
heap_t is pointer to a struct heap.
heap_t create_heap(){
heap_t h_t = (heap_t)malloc(sizeof(heap));
h_t->it = 0;
h_t->len = 10;
h_t->arr = (token_t)calloc(10, sizeof(token));
//call below a couple of times to fill up arr
app_heap(h_t, ENUM, "enum", 1);
return h_t;
}
putting h_t through
int app_heap(heap_t h, enum symbol s, char* word, int line){
int it = h->it;
int len = h->len;
if (it + 1 < len ){
token temp;
h->arr[it] = temp;
h->arr[it].sym = s;
h->arr[it].word = word;
h->arr[it].line = line;
h->it = it + 1;
printf(h->arr[it].word);
return 1;
} else {
h->len = len*2;
h->arr = realloc(h->arr, len*2);
return app_heap(h, s, word, line);
}
}
Why does my h_t->arr fill up with junk and eventually I get a segmentation fault? How do I fix this? Any C coding tips/styles to avoid stuff like this?
First, to answer your question about the crash, I think the reason you are getting segmentation fault is that you fail to multiply len by sizeof(token) in the call to realloc. You end up writing past the end of the block that has been allocated, eventually triggering a segfault.
As far as "deciding to free an object and when [...] to keep an object" goes, C does not decide any of it for you: it simply does it when you tell it to by calling free, without asking you any further questions. This "obedience" ends up costing you sometimes, because you can accidentally free something you still need. It is a good idea to NULL out the pointer, to improve your chance of catching the issue faster (unfortunately, this is not enough to eliminate the problem altogether, because of shared pointers).
free(h->arr);
h -> arr = NULL; // Doing this is a good practice
To summarize, managing memory in C is a tedious task that requires a lot of thinking and discipline. You need to check the result of every allocation call to see if it has failed, and perform many auxiliary tasks when it does.
C does not "decide" anything, if you have allocated something yourself with an explicit call to e.g. malloc(), it will stay allocated until you free() it (or until the program terminates, typically).
I think this:
token temp;
h->arr[it] = temp;
h->arr[it].sym = s;
/* more accesses */
is very weird, the first two lines don't do anything sensible.
As pointed out by dasblinkenlight, you're failing to scale the re-allocation into bytes, which will cause dramatic shrinkage of the array when it tries to grow, and corrupt it totally.
You shouldn't cast the return values of malloc() and realloc(), in C.
Remember that realloc() might fail, in which case you will lose your pointer if you overwrite it like you do.
Lots of repetition in your code, i.e. realloc(h->arr, len*2) instead of realloc(h->arr, h->len * sizeof *h->arr) and so on.
Note how the last bullet point also fixes the realloc() scaling bug mentioned above.
You're not reallocating to the proper size, the realloc statement needs to be:
realloc(h->arr, sizeof(token) * len*2);
                ^^^^^^^^^^^^^
(Or perhaps better realloc(h->arr, sizeof *h->arr * h->len);)
In C, you are responsible for freeing the memory you allocate. You have to free() the memory you've malloc/calloc/realloc'ed when it's suitable to do so. The C runtime never frees anything, except when the program has terminated (some more esoteric systems might not release the memory even then).
Also, try to be consistent: the general form for allocating is always T *foo = malloc(sizeof *foo), and don't duplicate stuff.
e.g.
h_t->arr = (token_t)calloc(10, sizeof(token));
           ^^^^^^^^^       ^^  ^^^^^^^^^^^^^
Don't cast the return value of malloc in C. It's unnecessary and might hide a serious compiler warning and bug if you forget to include stdlib.h.
the cast is token_t but the sizeof applies to token, why are they different, and are they the same type as *h_t->arr ?
You already have the magic 10 value, use h_t->len
If you ever change the type of h_t->arr, you have to remember to change the sizeof(..)
So make this
h_t->arr = calloc(h_t->len, sizeof *h_t->arr);
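Applying the same advice to the append function gives something like the sketch below. The question does not show the definitions of token and heap, so the layouts here are assumptions; the symbol is passed as a plain int for the same reason, and the function assumes the initial capacity is greater than zero.

```c
#include <stdlib.h>

/* Assumed definitions; the question does not show them. */
typedef struct {
    int sym;
    const char *word;
    int line;
} token;

typedef struct {
    int it;       /* number of tokens stored so far */
    int len;      /* capacity of arr, counted in tokens */
    token *arr;
} heap;

/* Appends one token, doubling the capacity when the array is full.
 * Returns 1 on success, 0 if the array could not be grown. */
int app_heap(heap *h, int sym, const char *word, int line)
{
    if (h->it == h->len) {
        int new_len = h->len * 2;
        /* Scale by the element size, and keep the old pointer until
         * realloc is known to have succeeded. */
        token *temp = realloc(h->arr, (size_t)new_len * sizeof *h->arr);
        if (temp == NULL)
            return 0;           /* h->arr is still valid and unchanged */
        h->arr = temp;
        h->len = new_len;
    }
    h->arr[h->it].sym = sym;
    h->arr[h->it].word = word;
    h->arr[h->it].line = line;
    h->it++;
    return 1;
}
```

Growing first and then falling through to the store also removes the recursion from the original, and the capacity is only updated once the larger block actually exists.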
The two main causes of dangling pointers in C are failing to assign NULL to a pointer after freeing its allocated memory, and shared pointers.
The first problem can be addressed by nulling out the pointer automatically:
void SaferFree(void *AFree[])
{
free(AFree[0]);
AFree[0] = NULL;
}
The caller, instead of calling
free(p);
will call
SaferFree(&p);
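One caveat: void ** is not a generic pointer-to-pointer in C, so SaferFree(&p) only converts implicitly when p is itself a void *; for an int * or a struct pointer the compiler will diagnose the call unless you add a cast. A macro avoids that restriction. The sketch below uses a made-up name, SAFER_FREE, and a small demonstration function that exists only for illustration.

```c
#include <stdlib.h>

/* Hypothetical macro: frees the block and nulls the caller's own variable.
 * p must be a modifiable lvalue; note that it is evaluated twice. */
#define SAFER_FREE(p) do { free(p); (p) = NULL; } while (0)

/* Demonstration only: returns 1 if the pointer is NULL afterwards. */
int demo_safer_free(void)
{
    int *numbers = malloc(4 * sizeof *numbers);
    if (numbers == NULL)
        return 0;
    numbers[0] = 42;

    SAFER_FREE(numbers);
    SAFER_FREE(numbers);   /* harmless: free(NULL) does nothing */
    return numbers == NULL;
}
```

This works for any object pointer type, but it still only clears the one variable you pass in; other copies of the same address remain dangling.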
As for the second issue, which is harder to solve:
The rule of three (a C++ guideline) says:
If you need to explicitly declare either the destructor, copy constructor or copy assignment operator yourself, you probably need to explicitly declare all three of them.
Sharing a pointer in C is simply copying it (copy assignment). Applying the rule of three (or the more general rule of zero) when programming in C therefore obliges the programmer to supply a way to construct, and especially to destroy, such a copy. That is possible, but it is not an easy task, particularly because C does not supply a destructor that is invoked implicitly as in C++.

Code crashes unless I put a printf statement in it

This is a snippet of code from an array library I'm using. This runs fine on Windows, but when I compile with gcc on Linux it crashes in this function. When trying to narrow down the problem, I added a printf statement to it, and the code stopped crashing.
void _arrayCreateSize( void ***array, int capacity )
{
(*array) = malloc( (capacity * sizeof(int)) + sizeof(ArrayHeader) );
((ArrayHeader*)(*array))->size = 0;
((ArrayHeader*)(*array))->capacity = capacity;
// printf("Test!\n");
*(char**)array += sizeof(ArrayHeader);
}
As soon as that printf is taken out it starts crashing on me again. I'm completely baffled as to why it's happening.
The last line in the function is not doing what was intended. The code is obscure to the point of impenetrability.
It appears that the goal is to allocate an array of int, because of the sizeof(int) in the first memory allocation. At the very least, if you are meant to be allocating an array of structure pointers, you need to use sizeof(SomeType *), the size of some pointer type (sizeof(void *) would do). As written, this will fail horribly in a 64-bit environment.
The array is allocated with a structure header (ArrayHeader) followed by the array proper. The returned value is supposed to be the start of the array proper; the ArrayHeader will presumably be found by subtraction from the pointer. This is ugly as sin, and unmaintainable to boot. It can be made to work, but it requires extreme care, and (as Brian Kernighan said) "if you're as clever as possible when you write the code, how are you ever going to debug it?".
Unfortunately, the last line is wrong:
void _arrayCreateSize( void ***array, int capacity )
{
(*array) = malloc( (capacity * sizeof(int)) + sizeof(ArrayHeader) );
((ArrayHeader*)(*array))->size = 0;
((ArrayHeader*)(*array))->capacity = capacity;
// printf("Test!\n");
*(char**)array += sizeof(ArrayHeader);
}
It adds sizeof(ArrayHeader) * sizeof(char *) to the address, instead of the intended sizeof(ArrayHeader) * sizeof(char). The last line should read, therefore:
*(char *)array += sizeof(ArrayHeader);
or, as noted in the comments and an alternative answer:
*(ArrayHeader **)array += 1;
(*(ArrayHeader **)array)++;
I note in passing that the function name should not really start with an underscore. External names starting with an underscore are reserved to the implementation (of the C compiler and library).
The question asks "why does the printf() statement 'fix' things". The answer is because it moves the problem around. You've got a Heisenbug because there is abuse of the allocated memory, and the presence of the printf() manages to alter the behaviour of the code slightly.
Recommendation
Run the program under valgrind. If you don't have it, get it.
Revise the code so that the function checks the return value from malloc(), and so it returns a pointer to a structure for the allocated array.
Use the clearer code outlined in Michael Burr's answer.
Crashes that appear or vanish when seemingly unrelated printf() statements are added are often a sign of a corrupted heap. The memory allocator often stores bookkeeping information about allocated blocks directly on the heap itself. Overwriting that metadata leads to surprising runtime behavior.
A few suggestions:
are you sure that you need void ***?
try replacing your argument to malloc() with 10000. Does it work now?
Moreover, if you just want arrays that store some metadata, your current code is a bad approach. A clean solution would probably use a structure like the following:
struct Array {
size_t nmemb; // size of an array element
size_t size; // current size of array
size_t capacity; // maximum size of array
void *data; // the array itself
};
Now you can pass an object of type Array to functions that know about the Array type, and Array->data cast to the proper type to everything else. The memory layout might even be the same as in your current approach, but access to the metadata is significantly easier and especially more obvious.
Your main audience is the poor guy that has to maintain your code 5 years from now.
Now that Jonathan Leffler has pointed out what the bug was, might I suggest that the function be written in a manner that's a little less puzzling?:
void _arrayCreateSize( void ***array, int capacity )
{
// allocate a header followed by an appropriately sized array of pointers
ArrayHeader* p = malloc( sizeof(ArrayHeader) + (capacity * sizeof(void*)));
p->size = 0;
p->capacity = capacity;
*array = (void**)(p+1); // return a pointer to just past the header
// (pointing at the array of pointers)
}
Mix in your own desired handling of malloc() failure.
I think this will probably help the next person who needs to look at it.
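As one way of mixing in that failure handling, here is a sketch that returns the element pointer instead of writing through a void ***, and adds the matching helpers for finding the header again and releasing the block. The ArrayHeader layout is an assumption (the question does not show it), and array_create, array_header and array_destroy are made-up names.

```c
#include <stdlib.h>

/* Assumed layout; the question does not show ArrayHeader. With two ints the
 * header size is a multiple of the pointer alignment on common platforms,
 * which the pointer array placed after it relies on. */
typedef struct {
    int size;
    int capacity;
} ArrayHeader;

/* Allocates a header followed by capacity pointers.
 * Returns a pointer to the element array, or NULL on failure. */
void **array_create(int capacity)
{
    if (capacity < 0)
        return NULL;
    ArrayHeader *h = malloc(sizeof *h + (size_t)capacity * sizeof(void *));
    if (h == NULL)
        return NULL;
    h->size = 0;
    h->capacity = capacity;
    return (void **)(h + 1);   /* just past the header */
}

/* Recovers the header that sits immediately before the elements. */
ArrayHeader *array_header(void **array)
{
    return (ArrayHeader *)array - 1;
}

/* Frees the whole block; the address malloc returned is the header's. */
void array_destroy(void **array)
{
    if (array != NULL)
        free(array_header(array));
}
```

Keeping the pointer arithmetic inside these three functions means callers never have to do the header subtraction by hand.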
