So I'm looking at a solution to some coding-interview-type questions, and there's an array inside a struct:
#define MAX_SIZE 1000000

typedef struct _heap {
    int data[MAX_SIZE];
    int heap_size;
} heap;

heap* init(heap* h) {
    h = (heap*)malloc(sizeof(heap));
    h->heap_size = 0;
    return h;
}
This heap struct is later created like so:
heap* max_heap = NULL;
max_heap = init(max_heap);
First of all, I wish this were written in C++ style rather than C, but secondly, if I'm just concerned about the array, I'm assuming it is equivalent to analyze only the array portion by changing the code like this:
int* data = NULL;
data = (int*)malloc(1000000 * sizeof(int));
Now in that case, are there any problems with declaring the array with the max size if you are probably just using a little bit of it?
I guess this boils down to the question of, when an array is created on the heap, how does the system block out that portion of memory? And in which cases does the system prevent you from accessing memory that is part of the array? I wouldn't want a giant array holding up space if I'm not using much of it.
Are there any problems with declaring the array with the max size if you are probably just using a little bit of it?
Yes. The larger the allocation size, the greater the risk of an out-of-memory error; if not here, then elsewhere in the code.
Yet some memory allocation systems handle this well: the physical memory is not committed immediately, but only later, when it is actually needed.
I guess this boils down to the question of, when an array is created on the heap, how does the system block out that portion of memory?
That is an implementation-defined issue, not specified by C. It might happen immediately or be deferred.
For maximum portability, code would take a more conservative approach and allocate large memory chunks only as needed, rather than rely on physical allocation occurring in a delayed fashion.
Alternative
In C, consider a struct with a flexible array member.
typedef struct _heap {
    size_t heap_size;
    int data[];
} heap;
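With a flexible array member you allocate the header and exactly as many elements as you need in a single malloc call. A minimal sketch (heap_new is a hypothetical helper name, not part of the original code):

#include <stdlib.h>

heap *heap_new(size_t capacity) {
    /* one allocation: the header plus `capacity` ints for the flexible member */
    heap *h = malloc(sizeof *h + capacity * sizeof h->data[0]);
    if (h != NULL) {
        h->heap_size = 0;
    }
    return h;
}

That way the capacity is chosen at run time instead of being fixed at MAX_SIZE.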
I have a user-defined struct called MyStruct and allocate a 2D dynamic array:
MyStruct **arr = (MyStruct **)malloc(sizeof(MyStruct *) * size);
I want to process the array in a function:
void compute(MyStruct** lst)
{
    int index = 0;
    while (lst[index] != NULL)
    {
        // do something
        index++;
    }
}
I called compute(arr) and it works fine. But valgrind complains that there is an invalid read of size sizeof(MyStruct) at the while(...) line. I understand that at this point index is out of bounds by 1 element. An easy fix is to pass size to the function and check index < size in the loop.
Out of curiosity, is there any way I can still traverse the array without indexing that extra element AND without passing size to the function?
There is no standard way, no.
That said, there may be some nonstandard ways you can get the allocated size of a malloced piece of memory. For example, my machine has a size_t malloc_size(const void *); function in <malloc/malloc.h>; glibc has a malloc_usable_size function with a similar signature; and Microsoft’s libc implementation has an _msize function, also with a similar signature.
These cannot simply be dropped in, though; besides the obvious portability concerns, these return the actual amount of memory allocated for you, which might be slightly more than you requested. That might be okay for some applications, but perhaps not for iterating through your array.
You probably ought to just pass the size as a second parameter. Boring, I know.
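For completeness, a minimal sketch of that boring approach (compute_n is a made-up name; MyStruct is the poster's type):

#include <stddef.h>

/* Hypothetical reworking of compute() that takes the element count
   instead of relying on a NULL sentinel. */
void compute_n(MyStruct **lst, size_t size)
{
    for (size_t index = 0; index < size; index++)
    {
        /* do something with lst[index] */
    }
}

Called as compute_n(arr, size). The other option is to allocate size + 1 pointers and store NULL in the last slot yourself, so the original sentinel loop stops before reading past the allocation.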
Is there a way to initialize an array without defining the size?
I want the size of the array to increase on its own: as the loop runs, the array gets reallocated.
There is no such thing out of the box. You will have to create your own array-like data structure that does this. It shouldn't be very hard to implement, if you're careful.
What you're looking for is, roughly, a data structure that, when created, allocates (using malloc, for instance) a predefined size and starts using the consecutive space inside it as slots of an array. Then, as more items are added, it reallocates (say, using realloc) that space.
Of course, you won't be able to use the indexer syntax you're used to with simple arrays. Instead, your data structure will have to provide its own pair of set/get functions that take care of the above, under the hood. Therefore, the set function will check the index specified in its arguments and, if that index is greater than the current size of the array, perform a reallocation. Then, in any case, set the value provided to the specified index.
You can initialize an array without specifying the size but it would not be useful unless you allocated space for it before you used it. Normally when you declare a variable in C, the compiler reserves a specific amount of memory for that variable on the "stack". If you want an array to be able to grow throughout the program, however, this is not what you are looking for because the amount of space allocated for a variable on the "stack" is static.
The solution, therefore, is to have the program decide how much memory to allocate to your variable at run-time, instead of compile-time. This way, while the program is running, you will be able to decide how much space your variable needs to have reserved.
In practice, this is called dynamic memory allocation and it is accomplished in C using the functions malloc() and realloc(). I would suggest reading up on these functions, I think they will be very useful to you.
If you have follow up questions feel free to ask.
One last thing!
Whenever you use malloc() to allocate memory for a variable, you should remember to call the function free() on that variable at the end of the program or whenever you are done using the variable.
Here's a simple implementation of such a data structure (for ints, but you can replace the int with whatever type you need). I've omitted error handling for clarity.
#include <stdlib.h>

typedef struct array_s {
    int len, cap;
    int *a;
} array_s, *array_t;

/* Create a new array with 0 length, and the given capacity. */
array_t array_new(int cap) {
    array_t result = malloc(sizeof(array_s));
    array_s a = {0, cap, malloc(sizeof(int) * cap)};
    *result = a;
    return result;
}

/* Destroy an array. */
void array_free(array_t a) {
    free(a->a);
    free(a);
}

/* Change the capacity of an array, truncating the length if necessary. */
void array_resize(array_t a, int new_cap) {
    a->cap = new_cap;
    a->a = realloc(a->a, new_cap * sizeof(int));
    if (a->len > a->cap) {
        a->len = a->cap;
    }
}

/* Add a new element to the end of the array, growing it if necessary. */
void array_append(array_t a, int x) {
    if (a->len == a->cap) {
        /* Double the capacity; use 4 as the minimum in case cap is 0. */
        int new_cap = a->cap * 2;
        if (new_cap < 4) new_cap = 4;
        array_resize(a, new_cap);
    }
    a->a[a->len++] = x;
}
By storing len (the current length of the array) and cap (the amount of space you've reserved for the array), you can extend the array in O(1) up to the point where len equals cap, then resize the array (e.g., using realloc), perhaps by multiplying the existing cap by 2 or 1.5 or something. This is what most vector or list types do in languages that support resizable arrays. I've coded this in array_append as an example.
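A quick usage sketch of the above (my example; error handling still omitted):

#include <stdio.h>

int main(void) {
    array_t a = array_new(0);          /* starts empty */
    for (int i = 0; i < 10; i++) {
        array_append(a, i * i);        /* grows automatically as needed */
    }
    for (int i = 0; i < a->len; i++) {
        printf("%d ", a->a[i]);
    }
    printf("\n");
    array_free(a);
    return 0;
}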
This is a snippet of code from an array library I'm using. It runs fine on Windows, but when I compile with gcc on Linux it crashes in this function. When trying to narrow down the problem, I added a printf statement to it, and the code stopped crashing.
void _arrayCreateSize( void ***array, int capacity )
{
    (*array) = malloc( (capacity * sizeof(int)) + sizeof(ArrayHeader) );
    ((ArrayHeader*)(*array))->size = 0;
    ((ArrayHeader*)(*array))->capacity = capacity;
    // printf("Test!\n");
    (*array) += sizeof( ArrayHeader );
}
As soon as that printf is taken out it starts crashing on me again. I'm completely baffled as to why it's happening.
The last line in the function is not doing what was intended. The code is obscure to the point of impenetrability.
It appears that the goal is to allocate an array of int, because of the sizeof(int) in the first memory allocation. At the very least, if you are meant to be allocating an array of structure pointers, you need to use sizeof(SomeType *), the size of some pointer type (sizeof(void *) would do). As written, this will fail horribly in a 64-bit environment.
The array is allocated with a structure header (ArrayHeader) followed by the array proper. The returned value is supposed to be the start of the array proper; the ArrayHeader will presumably be found by subtraction from the pointer. This is ugly as sin, and unmaintainable to boot. It can be made to work, but it requires extreme care, and (as Brian Kernighan said) "if you're as clever as possible when you write the code, how are you ever going to debug it?".
Unfortunately, the last line is wrong:
void _arrayCreateSize( void ***array, int capacity )
{
    (*array) = malloc( (capacity * sizeof(int)) + sizeof(ArrayHeader) );
    ((ArrayHeader*)(*array))->size = 0;
    ((ArrayHeader*)(*array))->capacity = capacity;
    // printf("Test!\n");
    (*array) += sizeof( ArrayHeader );
}
It advances the pointer by sizeof(ArrayHeader) * sizeof(void *) bytes (because *array is a void ** and pointer arithmetic on it works in units of sizeof(void *)), instead of the intended sizeof(ArrayHeader) bytes. The last line should read, therefore:
*(char **)array += sizeof(ArrayHeader);
or, as noted in the comments and an alternative answer:
*(ArrayHeader **)array += 1;
++*(ArrayHeader **)array;
I note in passing that the function name should not really start with an underscore. External names starting with an underscore are reserved to the implementation (of the C compiler and library).
The question asks "why does the printf() statement 'fix' things". The answer is because it moves the problem around. You've got a Heisenbug because there is abuse of the allocated memory, and the presence of the printf() manages to alter the behaviour of the code slightly.
Recommendation
Run the program under valgrind. If you don't have it, get it.
Revise the code so that the function checks the return value from malloc(), and so it returns a pointer to a structure for the allocated array.
Use the clearer code outlined in Michael Burr's answer.
Arbitrary random crashing that changes when you add seemingly unrelated printf() statements is often a sign of a corrupted heap. The memory allocator stores bookkeeping information about allocated blocks directly on the heap itself; overwriting that metadata leads to surprising runtime behavior.
A few suggestions:
are you sure that you need void ***?
try replacing your argument to malloc() with 10000. Does it work now?
Moreover, if you just want arrays that store some metadata, your current code is a bad approach. A clean solution would probably use a structure like the following:
struct Array {
    size_t nmemb;    // size of an array element
    size_t size;     // current size of the array
    size_t capacity; // maximum size of the array
    void *data;      // the array itself
};
Now you can pass an object of type Array to functions that know about the Array type, and pass Array->data, cast to the proper type, to everything else. The memory layout might even be the same as in your current approach, but access to the metadata is significantly easier and especially more obvious.
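As a rough sketch of what that might look like in use (array_create and fill_with_squares are made-up names, and error handling is omitted):

#include <stdlib.h>

struct Array array_create(size_t nmemb, size_t capacity) {
    struct Array a;
    a.nmemb = nmemb;                    /* size of one element */
    a.size = 0;                         /* nothing stored yet */
    a.capacity = capacity;
    a.data = malloc(nmemb * capacity);  /* the raw storage */
    return a;
}

/* Code that knows the element type casts data exactly once: */
void fill_with_squares(struct Array *a) {
    int *elems = a->data;
    while (a->size < a->capacity) {
        elems[a->size] = (int)(a->size * a->size);
        a->size++;
    }
}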
Your main audience is the poor guy that has to maintain your code 5 years from now.
Now that Jonathan Leffler has pointed out what the bug was, might I suggest that the function be written in a manner that's a little less puzzling?
void _arrayCreateSize( void ***array, int capacity )
{
    // allocate a header followed by an appropriately sized array of pointers
    ArrayHeader* p = malloc( sizeof(ArrayHeader) + (capacity * sizeof(void*)));
    p->size = 0;
    p->capacity = capacity;
    *array = (void**)(p+1);   // return a pointer to just past the header
                              // (pointing at the array of pointers)
}
Mix in your own desired handling of malloc() failure.
I think this will probably help the next person who needs to look at it.
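To make the header-before-the-array scheme concrete, here is a sketch of how a caller might use the clearer version above and find the metadata again (the ArrayHeader layout is assumed; the question never shows it):

#include <stdlib.h>

typedef struct { int size; int capacity; } ArrayHeader;  /* assumed layout */

/* _arrayCreateSize() as defined above. */

int main(void)
{
    void **items = NULL;
    _arrayCreateSize(&items, 16);                /* items points just past the header */

    ArrayHeader *hdr = (ArrayHeader *)items - 1; /* step back to reach the metadata */
    items[hdr->size++] = "first element";        /* items[0] .. items[15] are usable */

    free(hdr);                                   /* free() wants the original pointer */
    return 0;
}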
I've always heard that in C you have to really watch how you manage memory. I'm still beginning to learn C, but thus far, I have not had to do any memory-management-related activities at all. I always imagined having to release variables and do all sorts of ugly things. But this doesn't seem to be the case.
Can someone show me (with code examples) an example of when you would have to do some "memory management" ?
There are two places where variables can be put in memory. When you create a variable like this:
int a;
char c;
char d[16];
The variables are created in the "stack". Stack variables are automatically freed when they go out of scope (that is, when the code can't reach them anymore). You might hear them called "automatic" variables, but that has fallen out of fashion.
Many beginner examples will use only stack variables.
The stack is nice because it's automatic, but it also has two drawbacks: (1) The compiler needs to know in advance how big the variables are, and (2) the stack space is somewhat limited. For example: in Windows, under default settings for the Microsoft linker, the stack is set to 1 MB, and not all of it is available for your variables.
If you don't know at compile time how big your array is, or if you need a big array or struct, you need "plan B".
Plan B is called the "heap". You can usually create variables as big as the Operating System will let you, but you have to do it yourself. Earlier postings showed you one way you can do it, although there are other ways:
int size;
// ...
// Set size to some value, based on information available at run-time. Then:
// ...
char *p = (char *)malloc(size);
(Note that variables in the heap are not manipulated directly, but via pointers)
Once you create a heap variable, the problem is that the compiler can't tell when you're done with it, so you lose the automatic releasing. That's where the "manual releasing" you were referring to comes in. Your code is now responsible to decide when the variable is not needed anymore, and release it so the memory can be taken for other purposes. For the case above, with:
free(p);
What makes this second option "nasty business" is that it's not always easy to know when the variable is not needed anymore. Forgetting to release a variable when you don't need it will cause your program to consume more memory than it needs. This situation is called a "leak". The "leaked" memory cannot be used for anything until your program ends and the OS recovers all of its resources. Even nastier problems are possible if you release a heap variable by mistake before you are actually done with it.
In C and C++, you are responsible to clean up your heap variables like shown above. However, there are languages and environments such as Java and .NET languages like C# that use a different approach, where the heap gets cleaned up on its own. This second method, called "garbage collection", is much easier on the developer but you pay a penalty in overhead and performance. It's a balance.
(I have glossed over many details to give a simpler, but hopefully more leveled answer)
Here's an example. Suppose you have a strdup() function that duplicates a string:
#include <stdlib.h>
#include <string.h>

char *strdup(const char *src)
{
    char *dest;
    dest = malloc(strlen(src) + 1);
    if (dest == NULL)
        abort();
    strcpy(dest, src);
    return dest;
}
And you call it like this:
#include <stdio.h>

int main(void)
{
    char *s;
    s = strdup("hello");
    printf("%s\n", s);
    s = strdup("world");
    printf("%s\n", s);
    return 0;
}
You can see that the program works, but you have allocated memory (via malloc) without freeing it up. You have lost your pointer to the first memory block when you called strdup the second time.
This is no big deal for this small amount of memory, but consider the case:
for (i = 0; i < 1000000000; ++i) /* billion times */
s = strdup("hello world"); /* 11 bytes */
You have now used up 11 gig of memory (possibly more, depending on your memory manager) and if you have not crashed your process is probably running pretty slowly.
To fix, you need to call free() for everything that is obtained with malloc() after you finish using it:
s = strdup("hello");
free(s); /* now not leaking memory! */
s = strdup("world");
...
Hope this example helps!
You have to do "memory management" when you want to use memory on the heap rather than the stack. If you don't know how large to make an array until runtime, then you have to use the heap. For example, you might want to store something in a string, but don't know how large its contents will be until the program is run. In that case you'd write something like this:
char *string = malloc(stringlength); // stringlength is the number of bytes to allocate
// Do something with the string...
free(string); // Free the allocated memory
I think the most concise way to answer the question in to consider the role of the pointer in C. The pointer is a lightweight yet powerful mechanism that gives you immense freedom at the cost of immense capacity to shoot yourself in the foot.
In C, the responsibility of ensuring that your pointers point to memory you own is yours and yours alone. This requires an organized and disciplined approach; the alternative is to forsake pointers, which makes it hard to write effective C.
The posted answers to date concentrate on automatic (stack) and heap variable allocations. Using stack allocation does make for automatically managed and convenient memory, but in some circumstances (large buffers, recursive algorithms) it can lead to the horrendous problem of stack overflow. Knowing exactly how much memory you can allocate on the stack is very dependent on the system. In some embedded scenarios a few dozen bytes might be your limit, in some desktop scenarios you can safely use megabytes.
Heap allocation is less inherent to the language. It is basically a set of library calls that grants you ownership of a block of memory of a given size until you are ready to return ('free') it. It sounds simple, but is associated with untold programmer grief. The problems are simple to state (freeing the same memory twice, or not at all [memory leaks], not allocating enough memory [buffer overflow], etc.) but difficult to avoid and debug. A highly disciplined approach is absolutely necessary in practice, but of course the language doesn't actually mandate it.
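As a small illustration of one of those simple-but-painful problems (my own sketch, not part of the answer above), here is a double free and the usual defensive habit that avoids it:

#include <stdlib.h>

int main(void)
{
    char *buf = malloc(64);
    /* ... use buf ... */
    free(buf);
    /* free(buf);   <- calling free() again on the same pointer is undefined behaviour */

    buf = NULL;      /* defensive: free(NULL) is defined to do nothing, */
    free(buf);       /* so an accidental second free is now harmless    */
    return 0;
}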
I'd like to mention another type of memory allocation that's been ignored by other posts. It's possible to statically allocate variables by declaring them outside any function. I think in general this type of allocation gets a bad rap because it's used by global variables. However there's nothing that says the only way to use memory allocated this way is as an undisciplined global variable in a mess of spaghetti code. The static allocation method can be used simply to avoid some of the pitfalls of the heap and automatic allocation methods. Some C programmers are surprised to learn that large and sophisticated C embedded and games programs have been constructed with no use of heap allocation at all.
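A small sketch of that idea (my example, not the poster's): a file-scope array is reserved for the whole life of the program, with no malloc/free calls and no stack usage to worry about:

#include <stdio.h>

/* Statically allocated: this lives for the entire run of the program,
   neither on the stack nor on the heap. */
static int samples[4096];
static int sample_count = 0;

static void record_sample(int value)
{
    if (sample_count < (int)(sizeof samples / sizeof samples[0])) {
        samples[sample_count++] = value;
    }
}

int main(void)
{
    record_sample(42);
    printf("%d sample(s) stored\n", sample_count);
    return 0;
}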
There are some great answers here about how to allocate and free memory, and in my opinion the more challenging side of using C is ensuring that the only memory you use is memory you've allocated. If this isn't done correctly, what you end up with is the cousin of this site, a buffer overflow, and you may be overwriting memory that's in use by some other part of your program, with very unpredictable results.
An example:
#include <stdlib.h>
#include <string.h>

int main() {
    char* myString = (char*)malloc(5*sizeof(char));
    strcpy(myString, "abcd");
}
At this point you've allocated 5 bytes for myString and filled it with "abcd\0" (strings end in a null - \0).
If the string you copied in was
strcpy(myString, "abcde");
You would be copying "abcde" into the 5 bytes allocated to you, and the trailing null character would be written just past the end of them, into a part of memory that hasn't been allocated for your use. It might happen to be free, but it could equally well be in use by some other part of your program. This is the critical part of memory management, where a mistake will have unpredictable (and sometimes unrepeatable) consequences.
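A sketch of the safer pattern, sizing the allocation from the string itself (my own example):

#include <stdlib.h>
#include <string.h>

int main(void) {
    const char *src = "abcde";
    char *myString = malloc(strlen(src) + 1);  /* +1 for the terminating '\0' */
    if (myString != NULL) {
        strcpy(myString, src);                 /* now the '\0' fits inside the block */
        free(myString);
    }
    return 0;
}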
A thing to remember is to always initialize your pointers to NULL, since an uninitialized pointer may contain a pseudo-random but valid-looking memory address, which can let pointer errors proceed silently. By forcing a pointer to start out as NULL, you can always catch the case where it is used without being initialized. The reason is that operating systems "wire" the virtual address 0x00000000 to a general protection exception, precisely to trap null-pointer usage.
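A tiny sketch of the habit (my example):

#include <stdlib.h>

int main(int argc, char **argv)
{
    (void)argv;                   /* unused */
    char *buffer = NULL;          /* always start pointers at NULL */

    if (argc > 1) {               /* only sometimes allocated */
        buffer = malloc(128);
    }

    if (buffer != NULL) {         /* easy to tell whether it was ever set */
        /* ... use buffer ... */
        free(buffer);
    }
    return 0;
}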
Also, you might want to use dynamic memory allocation when you need to define a huge array, say int[10000000]. You can't just put it on the stack because then, hm... you'll get a stack overflow.
Another good example would be an implementation of a data structure, say linked list or binary tree. I don't have a sample code to paste here but you can google it easily.
(I'm writing because I feel the answers so far aren't quite on the mark.)
The reason you have to do memory management worth mentioning is when you have a problem / solution that requires you to create complex structures. (If your program crashes because you allocated too much space on the stack at once, that's a bug.) Typically, the first data structure you'll need to learn is some kind of list. Here's a singly linked one, off the top of my head:
#include <stdlib.h>

typedef struct listelem { struct listelem *next; void *data; } listelem;

listelem * create(void * data)
{
    listelem *p = calloc(1, sizeof(listelem));
    if(p) p->data = data;
    return p;
}

listelem * delete(listelem * p)
{
    listelem *next = p->next;
    free(p);
    return next;
}

void deleteall(listelem * p)
{
    while(p) p = delete(p);
}

void foreach(listelem * p, void (*fun)(void *data) )
{
    for( ; p != NULL; p = p->next) fun(p->data);
}

listelem * merge(listelem *p, listelem *q)
{
    listelem *head = p;
    while(p != NULL && p->next != NULL) p = p->next;
    if(p) {
        p->next = q;
        return head;
    } else
        return q;
}
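A brief usage sketch of those functions (my own example, using the definitions above):

#include <stdio.h>

static void print_string(void *data) { printf("%s\n", (char *)data); }

int main(void)
{
    listelem *list = create("world");
    list = merge(create("hello"), list);  /* prepend by merging a one-element list */
    foreach(list, print_string);          /* prints "hello", then "world" */
    deleteall(list);                      /* free every node */
    return 0;
}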
Naturally, you'd like a few other functions, but basically, this is what you need memory management for. I should point out that there are a number of tricks that are possible with "manual" memory management, e.g.,
Using the fact that malloc is guaranteed (by the language standard) to return a pointer suitably aligned for any built-in type (so, in practice, an address divisible by at least 4),
allocating extra space for some sinister purpose of your own,
creating memory pools...
Get a good debugger... Good luck!
@Euro Micelli
One negative to add is that pointers to the stack are no longer valid when the function returns, so you cannot return a pointer to a stack variable from a function. This is a common error and a major reason why you can't get by with just stack variables. If your function needs to return a pointer, then you have to malloc and deal with memory management.
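A sketch of the mistake and the fix (my example):

#include <stdlib.h>
#include <string.h>

/* Wrong: buf lives on the stack and dies when the function returns. */
char *bad_greeting(void)
{
    char buf[16];
    strcpy(buf, "hello");
    return buf;               /* dangling pointer! */
}

/* Right: heap memory survives the return; the caller must free() it later. */
char *good_greeting(void)
{
    char *buf = malloc(16);
    if (buf) strcpy(buf, "hello");
    return buf;
}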
@Ted Percival:
...you don't need to cast malloc()'s return value.
You are correct, of course. I believe that has always been true, although I don't have a copy of K&R to check.
I don't like a lot of the implicit conversions in C, so I tend to use casts to make "magic" more visible. Sometimes it helps readability, sometimes it doesn't, and sometimes it causes a silent bug to be caught by the compiler. Still, I don't have a strong opinion about this, one way or another.
This is especially likely if your compiler understands C++-style comments.
Yeah... you caught me there. I spend a lot more time in C++ than C. Thanks for noticing that.
In C, you actually have two different choices. One, you can let the system manage the memory for you. Alternatively, you can do it yourself. Generally, you would want to stick to the former as long as possible. However, auto-managed memory in C is extremely limited, and you will need to manage memory manually in many cases, such as:
a. You want the variable to outlive the function that creates it, and you don't want to have a global variable. For example:
struct pair {
    int val;
    struct pair *next;
};

struct pair* new_pair(int val){
    struct pair* np = malloc(sizeof(struct pair));
    np->val = val;
    np->next = NULL;
    return np;
}
b. You want to have dynamically allocated memory. The most common example is an array without a fixed length:
int *my_special_array;
my_special_array = malloc(sizeof(int) * number_of_element);
for (i = 0; i < number_of_element; i++) {
    my_special_array[i] = 0;  /* ...initialize or use each element... */
}
c. You want to do something REALLY dirty. For example, I would want a struct to represent many kinds of data, and I don't like unions (unions look soooo messy):
struct data {
    int data_type;
    long data_in_mem;
};

struct animal { /* something */ };
struct person { /* some other thing */ };

struct animal* read_animal();
struct person* read_person();

/* In main */
struct data sample;
sample.data_type = input_type;
switch (input_type) {
case DATA_PERSON:
    sample.data_in_mem = (long)read_person();
    break;
case DATA_ANIMAL:
    sample.data_in_mem = (long)read_animal();
    break;
default:
    printf("Oh hoh! I warn you, that again and I will seg fault your OS");
}
See, a long value is enough to hold ANYTHING, at least on platforms where a long is as wide as a pointer. Just remember to free it, or you WILL regret it. This is among my favorite tricks to have fun in C :D.
However, generally, you would want to stay away from your favorite tricks (T___T). You WILL break your OS, sooner or later, if you use them too often. As long as you don't use *alloc and free, it is safe to say that you are still virgin, and that the code still looks nice.
Sure. If you create an object that exists outside of the scope you use it in. Here is a contrived example (bear in mind my syntax will be off; my C is rusty, but this example will still illustrate the concept):
class MyClass
{
    SomeOtherClass *myObject;

public:
    MyClass()
    {
        // The object is created when the class is constructed
        myObject = (SomeOtherClass*)malloc(sizeof(SomeOtherClass));
    }

    ~MyClass()
    {
        // The class is destructed
        // If you don't free the object here, you leak memory
        free(myObject);
    }

    void SomeMemberFunction()
    {
        // Some use of the object
        myObject->SomeOperation();
    }
};
In this example, I'm using an object of type SomeOtherClass during the lifetime of MyClass. The SomeOtherClass object is used in several functions, so I've dynamically allocated the memory: the SomeOtherClass object is created when MyClass is created, used several times over the life of the object, and then freed once MyClass is freed.
Obviously if this were real code, there would be no reason (aside from possibly stack memory consumption) to create myObject in this way, but this type of object creation/destruction becomes useful when you have a lot of objects, and want to finely control when they are created and destroyed (so that your application doesn't suck up 1GB of RAM for its entire lifetime, for example), and in a Windowed environment, this is pretty much mandatory, as objects that you create (buttons, say), need to exist well outside of any particular function's (or even class') scope.