I am sure someone must have implemented something like this already!
What I am looking for is the ability to "checkpoint" the heap state and then clear all allocations that have happened since the last checkpoint.
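Basically what I am looking for is a natural corollary of the _CrtMemCheckpoint family of debug-heap APIs (for example _CrtMemDumpAllObjectsSince, which dumps those allocations rather than freeing them).
Something like this (preferably cross-platform):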
//we save the heap state here in s1
_CrtMemCheckpoint( &s1 );
//allocs and frees
//Get rid of all allocs since checkpoint s1 that have not been freed!
_CrtMemClearAllObjectsSince(&s1);
There is no standard way to do mark/release memory allocation in C. If you know for a fact that all malloc/free calls will be made in LIFO order, you may be able to link in your own malloc/free functions using something like the following:
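#include <stddef.h>

#define MY_HEAP_SIZE 12345678
static unsigned char my_mem[MY_HEAP_SIZE];
static unsigned char *my_alloc_ptr = my_mem;   /* next free byte */

/* Note: sizes are not rounded up, so returned blocks are not guaranteed
   to be suitably aligned; round size up if your platform needs that. */
void *malloc(size_t size)
{
    void *ret = my_alloc_ptr;
    /* Compare against the space left, which cannot overflow. */
    if (size <= MY_HEAP_SIZE - (size_t)(my_alloc_ptr - my_mem))
    {
        my_alloc_ptr += size;
        return ret;
    }
    else
        return NULL;   /* out of space */
}

/* LIFO release: this also releases every block allocated after ptr. */
void free(void *ptr)
{
    if (ptr)
        my_alloc_ptr = ptr;
}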
This approach requires zero bytes of overhead per allocation block, but calling free() on any block also frees all blocks that were allocated after it. If the external code doesn't use malloc/free in LIFO order, but it would be okay for blocks not to be freed until your code does so, an alternative would be to make free() do nothing and have some other function that behaves like free above. More sophisticated variations are possible as well, but in cases where the first approach suffices, there's no beating its efficiency. Very nice for embedded systems (though I'd usually call it something other than malloc).
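The "some other function which behaves like free above" variant is essentially a mark/release arena, which is also the checkpoint/clear-since idea from the question. The sketch below assumes you control the allocation calls and can give them their own names: arena_alloc, arena_mark and arena_release are hypothetical, not an existing library API. Unlike the malloc replacement above, it rounds sizes up so every block is aligned for any object type (this needs C11 for max_align_t and _Alignof).

```c
#include <stddef.h>

/* Hypothetical mark/release arena: the names are made up for this
   sketch and are not part of any standard or library API. */
#define ARENA_SIZE 4096

static max_align_t arena_storage[ARENA_SIZE / sizeof(max_align_t)];
static size_t arena_used = 0;   /* bytes handed out so far */

/* Allocate size bytes, rounded up so that every block is aligned for
   any object type. Returns NULL when the arena is full. */
void *arena_alloc(size_t size)
{
    const size_t align = _Alignof(max_align_t);
    const size_t capacity = sizeof arena_storage;
    void *p;

    if (size > capacity - arena_used)
        return NULL;
    size = (size + align - 1) / align * align;
    if (size > capacity - arena_used)
        return NULL;
    p = (unsigned char *)arena_storage + arena_used;
    arena_used += size;
    return p;
}

/* A checkpoint is just the current fill level. */
size_t arena_mark(void)
{
    return arena_used;
}

/* Discard every allocation made since checkpoint m, in one step. */
void arena_release(size_t m)
{
    if (m <= arena_used)
        arena_used = m;
}
```

A caller takes `size_t m = arena_mark();`, lets the code in question allocate through arena_alloc, then calls `arena_release(m)`. Blocks allocated before the mark stay valid, and the released space is reused by the next allocation.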
You can modify malloc()/free() using hooks to remember allocated memory (for example, suppose that you record each new pointer in an array of pointers). Then you can have two functions:
int get_checkpoint(), that returns the next free array index,
void free_until(int checkpoint), that frees memory from the current stored pointer in the array backwards, until checkpoint is reached.
This way, you can do:
int cpoint = get_checkpoint();
LibraryDoSomething();
free_until(cpoint);
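Of course, this technique is still dangerous: a C library function can have side effects you cannot see (for example, keeping pointers to memory it allocated in an internal cache), and freeing that memory behind its back can easily break it. The best advice is still that of Amardeep.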
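Malloc hooks are platform-specific (glibc's old __malloc_hook variables, for instance, are gone from recent glibc versions), so the sketch below replaces them with explicit wrapper functions while keeping the get_checkpoint()/free_until() interface described above. tracked_malloc, tracked_free and the fixed-size table are assumptions for illustration; code you don't control would still have to be routed through the wrappers somehow (for example via the LD_PRELOAD trick mentioned in the next answer).

```c
#include <stdlib.h>

/* Hypothetical wrappers standing in for real malloc hooks. */
#define MAX_TRACKED 1024

static void *tracked[MAX_TRACKED];   /* pointers in allocation order */
static int tracked_count = 0;

void *tracked_malloc(size_t size)
{
    void *p;
    if (tracked_count == MAX_TRACKED)
        return NULL;             /* table full: refuse rather than lose track */
    p = malloc(size);
    if (p != NULL)
        tracked[tracked_count++] = p;
    return p;
}

/* Free a block early; its slot is cleared so free_until() skips it. */
void tracked_free(void *p)
{
    int i;
    for (i = tracked_count - 1; i >= 0; --i) {
        if (tracked[i] == p) {
            tracked[i] = NULL;
            break;
        }
    }
    free(p);
}

/* The next free index in the table. */
int get_checkpoint(void)
{
    return tracked_count;
}

/* Free everything recorded after the checkpoint, newest first. */
void free_until(int checkpoint)
{
    while (tracked_count > checkpoint) {
        --tracked_count;
        free(tracked[tracked_count]);   /* free(NULL) is a no-op */
        tracked[tracked_count] = NULL;
    }
}
```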
Another possible and interesting solution could be the use of LD_PRELOAD. As the man page for LD_PRELOAD states "This can be used to selectively override functions in other shared libraries."
Thus, you can have your own implementations of malloc and free wherein you can implement the required checks and then call the default malloc or free.
You can check the details here: http://somethingswhichidintknow.blogspot.com/2009/10/dll-injection.html
I have the following C function:
#include <assert.h>
#include <stdlib.h>
#include <string.h>

void mySwap(void * p1, void * p2, int elementSize)
{
    void * temp = (void*) malloc(elementSize);
    assert(temp != NULL);
    memcpy(temp, p1, elementSize);
    memcpy(p1, p2, elementSize);
    memcpy(p2, temp, elementSize);
    free(temp);
}
that I want to use in a generic sorting function. Let's suppose that I use it to sort a dynamically allocated array owned by main(). Now let's suppose that at some point temp in mySwap() is actually NULL and the whole program is aborted without freeing the dynamically allocated array in main(). I thought that both mySwap() and the sorting function could return a bool value indicating whether the allocation was successful, and by using if statements I could free the array in main() and exit(EXIT_FAILURE), but it doesn't seem like a very elegant solution. What would be a good way to prevent a memory leak in such an instance?
assert is typically used during debugging to identify problems/errors that should never occur.
Out of memory is something that can occur, so it either should not be handled by assert or, if you do use assert, beware that it will abort the program. Once the program aborts, a mainstream operating system reclaims all the memory the program used, so don't worry about that.
Note: If you don't want to have unwieldy if statements everywhere just to handle errors that hardly ever occur, you can use setjmp/longjmp to return to a recoverable state.
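A minimal sketch of that setjmp/longjmp idea applied to the question's sort: mySwap jumps back to a recovery point when malloc fails, the wrapper reports the failure, and the caller, which still owns the array, frees it as usual. The jmp_buf named sort_failed, the bubble sort and sort_ints_safely are assumptions for illustration. Keep in mind that longjmp skips any cleanup in the frames it unwinds, and it may only target a setjmp whose calling function is still active.

```c
#include <setjmp.h>
#include <stdlib.h>
#include <string.h>

static jmp_buf sort_failed;   /* hypothetical recovery point */

/* The question's swap, but it jumps back on allocation failure
   instead of aborting the whole program. */
static void mySwap(void *p1, void *p2, size_t elementSize)
{
    void *temp = malloc(elementSize);
    if (temp == NULL)
        longjmp(sort_failed, 1);
    memcpy(temp, p1, elementSize);
    memcpy(p1, p2, elementSize);
    memcpy(p2, temp, elementSize);
    free(temp);
}

/* Simple bubble sort of ints, used only to exercise mySwap. */
static void sort_ints(int *a, size_t n)
{
    size_t i, j;
    for (i = 0; i + 1 < n; ++i)
        for (j = 0; j + 1 < n - i; ++j)
            if (a[j] > a[j + 1])
                mySwap(&a[j], &a[j + 1], sizeof a[j]);
}

/* Returns 0 on success, -1 if an allocation failed mid-sort.
   Either way the caller still owns (and must free) the array. */
int sort_ints_safely(int *a, size_t n)
{
    if (setjmp(sort_failed) != 0)
        return -1;            /* reached via longjmp from mySwap */
    sort_ints(a, n);
    return 0;
}
```

main() then needs only one check: call sort_ints_safely, free the array, and exit with EXIT_FAILURE if the result was -1.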
You have to realize that malloc fails because your computer has run out of memory (or at least out of memory available to your process). From that point onwards, there's nothing meaningful that your program can do, except terminate as gracefully as it can.
The OS will free the memory upon program termination, so that's not something you need to worry about.
Still, in the normal case, it is of course good practice to free() the memory yourself, manually. Not so much for the sake of "making the memory available again" (the OS will ensure that), but to verify that your program has not gone terribly wrong along the way and created heap corruption, leaks or other bugs. If you have such bugs in your program, it may well crash during the free() call, which is a good thing, as the bugs will surface.
assert should preferably not be used in production code. Build your own error handling if needed; that's better than violently terminating your own program in the middle of execution.
Avoid the problem by not using malloc.
Instead of allocating a block of memory for every swap, do the swap one byte at a time;
for (int i = 0; i < elementSize; ++i) {
    char tmp = ((char*)p1)[i];
    ((char*)p1)[i] = ((char*)p2)[i];
    ((char*)p2)[i] = tmp;
}
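A variation that keeps the no-malloc property but copies more than one byte per step is to swap through a small fixed-size buffer on the stack. This chunked version is a sketch of my own rather than the answer's code; the 64-byte CHUNK size is arbitrary, and p1 and p2 must point to distinct, non-overlapping objects, as memcpy requires.

```c
#include <stddef.h>
#include <string.h>

/* Chunk size for the stack buffer; an arbitrary choice for this sketch. */
#define CHUNK 64

/* Swap two elements of any size without malloc, copying through a
   fixed-size stack buffer one chunk at a time. */
void mySwap(void *p1, void *p2, size_t elementSize)
{
    unsigned char tmp[CHUNK];
    unsigned char *a = p1;
    unsigned char *b = p2;

    while (elementSize > 0) {
        size_t n = elementSize < CHUNK ? elementSize : CHUNK;
        memcpy(tmp, a, n);
        memcpy(a, b, n);
        memcpy(b, tmp, n);
        a += n;
        b += n;
        elementSize -= n;
    }
}
```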
Only use assert() to catch programmer errors during development; in release builds (compiled with NDEBUG defined) it doesn't do anything. If you need to test other things, use proper error handling, whether that means abort(), return codes or emulating exceptions using setjmp()/longjmp().
As an aside, do not cast the result of malloc().
Which is the correct way to release the memory in this case? Is there any difference between the two methods?
#include <stdlib.h>

void allocateArray1(int size, int value)
{
    int* arr = malloc(size * sizeof(int));
    /* ... */
    free(arr);
}

int* allocateArray2(int size, int value)
{
    int* arr = malloc(size * sizeof(int));
    /* ... */
    return arr;
}

int main()
{
    int* vector = allocateArray2(5,45);
    free(vector);
    allocateArray1(5,45);
    return 0;
}
They are equivalent, because both allocate with malloc and release with free. allocateArray1 does it all in one function, which makes it easier to remember to free the memory. But sometimes you need the function to provide main (or some other function) with memory so that it can use it. In that case you'll just have to free it later, as with allocateArray2.
This is sometimes what's known as “ownership semantics”, i.e. who owns the object (and therefore who is responsible for freeing the object).
Some functions require the caller to free the returned object, e.g. strdup(), or sometimes the POSIX getline() function too. In these cases, the strdup() and getline() functions can't know what you plan to do with the result or how long you'll need the result for, so they delegate the task of freeing the object to the caller of the function.
Other library functions may return an object whose lifetime is already maintained by the library itself, so there is no need to free anything.
It's important when developing a project to have consistent ownership semantics. For example, perhaps any function that delegates the task of freeing objects could start with alloc (or new or create etc.), and then you'll always know that freeing the result of these functions is your responsibility. It's not really important how the ownership semantics are defined, as long as they are consistent.
Which is the correct way to release the memory in this case? Is there any difference between the two methods?
Both methods are correct.
However, I would prefer the int* allocateArray2(int size, int value) function, which allocates memory on the heap inside the function and returns a pointer to the allocated space.
The primary reason malloc is needed is when you have data that must have a lifetime that is different from code scope. Your code calls malloc in one routine, stores the pointer somewhere and eventually calls free in a different routine.
The void allocateArray1(int size, int value) function, which asks for some memory, does some processing and frees the memory before returning, is not the most efficient approach if the size is small. You can instead create the array on the stack and use it for further processing. The advantage of using the stack to store variables is that memory is managed for you: you don't have to allocate memory by hand, or free it once you don't need it any more. What's more, stack allocation is cheap (typically just an adjustment of the stack pointer), so reading from and writing to stack variables is very fast. However, it may cause a stack overflow if you attempt to allocate more memory on the stack than will fit, for example by creating local array variables that are too large.
An example of a very large stack variable in C:
void foo(void)
{
    double x[1048576];   /* 1048576 doubles, all on the stack */
}
The declared array consumes 8 mebibytes of data (assuming each double is 8 bytes); if this is more memory than is available on the stack (as set by thread creation parameters or operating system limits), a stack overflow will occur.
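A common middle ground between allocateArray1's malloc/free and a plain stack array is to use the stack when the size is small and fall back to the heap only for large sizes. A sketch, assuming an arbitrary threshold SMALL_N of 256 ints; sum_filled is a hypothetical stand-in for "asks for some memory, does some processing".

```c
#include <stdlib.h>

#define SMALL_N 256   /* arbitrary cut-off chosen for this sketch */

/* Fill size ints with value and store their sum in *out. Uses a stack
   array for small sizes and malloc only for large ones.
   Returns 0 on success, -1 on a bad size or allocation failure. */
int sum_filled(int size, int value, long *out)
{
    int small[SMALL_N];
    int *arr = small;
    long sum = 0;
    int i;

    if (size < 0)
        return -1;
    if (size > SMALL_N) {
        arr = malloc((size_t)size * sizeof *arr);
        if (arr == NULL)
            return -1;
    }
    for (i = 0; i < size; ++i)       /* stands in for real processing */
        arr[i] = value;
    for (i = 0; i < size; ++i)
        sum += arr[i];
    if (arr != small)
        free(arr);                   /* only the heap case needs free */
    *out = sum;
    return 0;
}
```

The pointer comparison `arr != small` is what keeps the function from calling free() on stack memory.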
I am pretty new to C programming and I have several functions returning type char *.
Say I declare char a[some_int];, and I fill it later on. When I attempt to return it at the end of the function, it will only return the char at the first index. One thing I noticed, however, is that it will return the entirety of a if I call any sort of function on it prior to returning it. For example, my function to check the size of a string (calling something along the lines of strLength(a);).
I'm very curious what the situation is with this exactly. Again, I'm new to C programming (as you probably can tell).
EDIT: Additionally, if you have any advice concerning the best method of returning this, please let me know. Thanks!
EDIT 2: For example:
I have char ret[my_strlen(a) + my_strlen(b)]; in which a and b are strings and my_strlen returns their length.
Then I loop through filling ret using ret[i] = a[i]; and incrementing.
When I call my function that prints an input string (as a test), it prints out how I want it, but when I do
return ret;
or even
char *ptr = ret;
return ptr;
it never supplies me with the full string, just the first char.
One way that does not work is to return a chunk of char data in memory that was temporarily allocated on the stack during the execution of your function: after the function returns, that memory is (most probably) already being reused for another purpose.
A working alternative is to allocate the chunk of memory on the heap. Make sure you read up on and understand the difference between stack and heap memory! The malloc() family of functions is your friend if you choose to return your data in a chunk of memory allocated on the heap (see man malloc).
char* a = (char*) malloc(some_int * sizeof(char)) should help in your case. Make sure you don't forget to free up memory once you don't need it any more.
char* ret = (char*) malloc((my_strlen(a) + my_strlen(b) + 1) * sizeof(char)) for the second example given; the + 1 leaves room for the terminating '\0'. Again, don't forget to free it once the memory isn't used any more.
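Putting the heap approach together for the second example: a sketch of a concatenation function that returns a heap copy the caller must free. The name concat is an assumption, and it uses the standard strlen in place of the question's my_strlen. Note the extra byte for the terminating '\0'; without it, printing the result reads past the end of the buffer.

```c
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated string holding a followed by b,
   or NULL if the allocation fails. The caller must free() it. */
char *concat(const char *a, const char *b)
{
    size_t la = strlen(a);
    size_t lb = strlen(b);
    char *ret = malloc(la + lb + 1);   /* + 1 for the terminating '\0' */
    if (ret == NULL)
        return NULL;
    memcpy(ret, a, la);
    memcpy(ret + la, b, lb + 1);       /* also copies b's '\0' */
    return ret;
}
```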
As MByD correctly pointed out, it is not forbidden in general to use memory allocated on the stack to pass chunks of data in and out of functions. This works fine as long as the chunk is not allocated on the stack frame of the function that is returning it.
In the scenario below, function b works on a chunk of memory allocated in the stack frame that was created when function a was entered, and that lives until a returns. So everything is fine even though no heap memory is involved.
#define BUFFER_SIZE 64   /* example size */

void b(char input[]){
    /* do something useful here */
}

void a(){
    char buf[BUFFER_SIZE];
    b(buf);
    /* use data filled in by b here */
}
As still another option, you may choose to leave the memory allocation to the compiler by using a global variable (which has static storage, not heap storage). I'd put this option in the last-resort category, because global variables, when not handled properly, are the main culprits behind problems with reentrancy and multithreaded applications.
Happy hacking and good luck on your learning C mission.
In C, it is possible for functions to return pointers to memory that the function dynamically allocated, and to require the calling code to free it. It's also common to require that the calling code supply a buffer to a second function, which then sets the contents of that buffer. For example:
#include <stdlib.h>

struct mystruct {
    int a;
    char *b;
};

struct mystruct *get_a_struct(int a, char *b)
{
    struct mystruct *m = malloc(sizeof(struct mystruct));
    if (m == NULL)
        return NULL;
    m->a = a;
    m->b = b;
    return m;
}

int init_a_struct(int a, char *b, struct mystruct *s)
{
    int success = 0;
    if (a < 10) {
        success = 1;
        s->a = a;
        s->b = b;
    }
    return success;
}
Is one or the other method preferable? I can think of arguments for both: for the get_a_struct method the calling code is simplified because it only needs to free() the returned struct; for the init_a_struct method there is a very low likelihood that the calling code will fail to free() dynamically-allocated memory since the calling code itself probably allocated it.
It depends on the specific situation but in general supplying the allocated buffer seems to be preferable.
As mentioned by Jim, DLLs can cause problems if the called function allocates memory. That would be the case if you decide to distribute the code as a DLL and get_a_struct is exported to, or visible to, the users of the DLL. Then the users have to figure out, hopefully from documentation, whether they should free the memory using free, delete or some other OS-specific function. Furthermore, even if they use the correct function to free the memory, they might be using a different version of the C/C++ runtime, which can lead to bugs that are rather hard to find. Check this Raymond Chen post or search for "memory allocation dll boundaries". The typical solution is to export your own free function from the DLL, so you have the pair get_a_struct/release_a_struct.
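A sketch of that pairing for the question's struct. It assumes the struct should own a private copy of the string b, which is a design choice not in the question; release_a_struct is the name the answer suggests. Because allocation and deallocation live in the same module, callers never need to know which free() matches.

```c
#include <stdlib.h>
#include <string.h>

struct mystruct {
    int a;
    char *b;
};

/* Allocate a struct plus a private copy of b; NULL on failure.
   Release it with release_a_struct(), never with free() directly. */
struct mystruct *get_a_struct(int a, const char *b)
{
    struct mystruct *m = malloc(sizeof *m);
    if (m == NULL)
        return NULL;
    m->b = malloc(strlen(b) + 1);
    if (m->b == NULL) {
        free(m);
        return NULL;
    }
    strcpy(m->b, b);
    m->a = a;
    return m;
}

/* Free everything get_a_struct() allocated; NULL is accepted. */
void release_a_struct(struct mystruct *m)
{
    if (m == NULL)
        return;
    free(m->b);
    free(m);
}
```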
On the other hand, sometimes only the called function knows the amount of memory that needs to be allocated. In this case it makes more sense for the called function to do the allocation. If that is not possible, say because of the DLL boundary issue, a typical albeit ugly solution is to provide a way to query that size. For example, on Windows the GetCurrentDirectory function returns the required buffer size if you pass 0 and NULL as its parameters.
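Standard C uses the same size-query convention: snprintf with a NULL buffer and a size of 0 writes nothing and returns the number of characters the output would need (not counting the '\0'). A sketch of the two-call pattern, with format_greeting as a hypothetical helper:

```c
#include <stdio.h>
#include <stdlib.h>

/* Build "Hello, <name>!" in a buffer sized exactly by a first
   snprintf call. Returns a malloc'd string or NULL; caller frees. */
char *format_greeting(const char *name)
{
    int needed = snprintf(NULL, 0, "Hello, %s!", name);   /* size query */
    char *buf;
    if (needed < 0)
        return NULL;
    buf = malloc((size_t)needed + 1);                       /* + 1 for '\0' */
    if (buf == NULL)
        return NULL;
    snprintf(buf, (size_t)needed + 1, "Hello, %s!", name);
    return buf;
}
```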
I think that providing the already allocated struct as an argument is preferable, because in most cases you wouldn't need to call malloc/calloc in the calling code, and therefore wouldn't need to worry about freeing it. Example:
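struct some_struct {
    int x;               /* illustrative field */
};

int init_struct(struct some_struct *ss, int x)
{
    ss->x = x;           /* init ss */
    return 0;
}

int main(void)
{
    struct some_struct foo;
    init_struct(&foo, 42);
    /* free is not needed: foo lives on the stack */
    return 0;
}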
Passing a pointer in is preferred, unless it's absolutely required that every object be a new object allocated from the heap for some logistical reason, e.g. it's going to be put into a linked list as a node and the linked-list handler will eventually destroy the elements by calling free, or some other situation where everything created here will be passed to free later on.
Note that "not calling malloc" is always the preferred solution if possible. Not only does calling malloc take some time, it also means that somewhere you will have to call free on the allocated memory, and every allocated object takes several bytes (typically 12-40 bytes) of "overhead", so allocating space for small objects is definitely wasteful.
I agree with other answers that passing the allocated struct is preferred, but there is one situation where returning a pointer may be preferred.
In case you need to explicitly free some resource at the end (close a file or socket, free some memory internal to the struct, join a thread, or anything else that would require a destructor in C++), I think it may be better to allocate internally and then return the pointer.
I think so because, in C, this implies a kind of contract: if you allocate your own struct, you shouldn't have to do anything to destroy it, and it will be cleaned up automatically at the end of the function. On the other hand, if you receive a dynamically allocated pointer, you feel compelled to call something to destroy it at the end, and that destroy_a_struct function is where you put the other cleanup tasks needed, alongside free.
I've always heard that in C you have to really watch how you manage memory. I'm still beginning to learn C, but thus far I have not had to do any memory-management-related activities at all. I always imagined having to release variables and do all sorts of ugly things, but this doesn't seem to be the case.
Can someone show me (with code examples) an example of when you would have to do some "memory management" ?
There are two places where variables can be put in memory. When you create a variable like this:
int a;
char c;
char d[16];
The variables are created on the "stack" (assuming they are declared inside a function). Stack variables are automatically freed when they go out of scope (that is, when the code can't reach them anymore). You might hear them called "automatic" variables, which is the term the C standard itself uses (automatic storage duration).
Many beginner examples will use only stack variables.
The stack is nice because it's automatic, but it also has two drawbacks: (1) The compiler needs to know in advance how big the variables are, and (2) the stack space is somewhat limited. For example: in Windows, under default settings for the Microsoft linker, the stack is set to 1 MB, and not all of it is available for your variables.
If you don't know at compile time how big your array is, or if you need a big array or struct, you need "plan B".
Plan B is called the "heap". You can usually create variables as big as the Operating System will let you, but you have to do it yourself. Earlier postings showed you one way you can do it, although there are other ways:
int size;
// ...
// Set size to some value, based on information available at run-time. Then:
// ...
char *p = (char *)malloc(size);
(Note that variables in the heap are not manipulated directly, but via pointers)
Once you create a heap variable, the problem is that the compiler can't tell when you're done with it, so you lose the automatic releasing. That's where the "manual releasing" you were referring to comes in. Your code is now responsible to decide when the variable is not needed anymore, and release it so the memory can be taken for other purposes. For the case above, with:
free(p);
What makes this second option "nasty business" is that it's not always easy to know when the variable is not needed anymore. Forgetting to release a variable when you don't need it will cause your program to consume more memory than it needs to. This situation is called a "leak". The "leaked" memory cannot be used for anything until your program ends and the OS recovers all of its resources. Even nastier problems are possible if you release a heap variable by mistake before you are actually done with it.
In C and C++, you are responsible for cleaning up your heap variables as shown above. However, there are languages and environments, such as Java and .NET languages like C#, that use a different approach, where the heap gets cleaned up on its own. This second method, called "garbage collection", is much easier on the developer, but you pay a penalty in overhead and performance. It's a balance.
(I have glossed over many details to give a simpler, but hopefully more leveled answer)
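A self-contained sketch of the allocate, use, free cycle described above, including the NULL check real code needs (malloc returns NULL when it cannot satisfy the request). make_filled is a hypothetical helper; in a real program the size would come from information available only at run time.

```c
#include <stdlib.h>
#include <string.h>

/* Return a heap buffer of size bytes, all set to ch, or NULL if the
   allocation fails. size is only known at run time, and the caller
   owns the result and must free() it. */
char *make_filled(size_t size, char ch)
{
    char *p = malloc(size);
    if (p == NULL)
        return NULL;
    memset(p, ch, size);
    return p;
}
```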
Here's an example. Suppose you have a strdup() function that duplicates a string:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char *strdup(const char *src)
{
    char *dest;
    dest = malloc(strlen(src) + 1);
    if (dest == NULL)
        abort();
    strcpy(dest, src);
    return dest;
}
And you call it like this:
int main(void)
{
    char *s;
    s = strdup("hello");
    printf("%s\n", s);
    s = strdup("world");   /* the first block is now unreachable */
    printf("%s\n", s);
    return 0;
}
You can see that the program works, but you have allocated memory (via malloc) without freeing it up. You have lost your pointer to the first memory block when you called strdup the second time.
This is no big deal for this small amount of memory, but consider the case:
for (i = 0; i < 1000000000; ++i) /* billion times */
    s = strdup("hello world"); /* 12 bytes, counting the '\0' */
You have now used up at least 12 GB of memory (more, depending on your memory manager), and if your process has not crashed, it is probably running pretty slowly.
To fix, you need to call free() for everything that is obtained with malloc() after you finish using it:
s = strdup("hello");
free(s); /* now not leaking memory! */
s = strdup("world");
...
Hope this example helps!
You have to do "memory management" when you want to use memory on the heap rather than the stack. If you don't know how large to make an array until runtime, then you have to use the heap. For example, you might want to store something in a string, but don't know how large its contents will be until the program is run. In that case you'd write something like this:
char *string = malloc(stringlength); // stringlength is the number of bytes to allocate
// Do something with the string...
free(string); // Free the allocated memory
I think the most concise way to answer the question is to consider the role of the pointer in C. The pointer is a lightweight yet powerful mechanism that gives you immense freedom at the cost of immense capacity to shoot yourself in the foot.
In C the responsibility of ensuring your pointers point to memory you own is yours and yours alone. This requires an organized and disciplined approach, unless you forsake pointers, which makes it hard to write effective C.
The posted answers to date concentrate on automatic (stack) and heap variable allocations. Using stack allocation does make for automatically managed and convenient memory, but in some circumstances (large buffers, recursive algorithms) it can lead to the horrendous problem of stack overflow. Knowing exactly how much memory you can allocate on the stack is very dependent on the system. In some embedded scenarios a few dozen bytes might be your limit, in some desktop scenarios you can safely use megabytes.
Heap allocation is less inherent to the language. It is basically a set of library calls that grants you ownership of a block of memory of a given size until you are ready to return ('free') it. It sounds simple, but is associated with untold programmer grief. The problems are simple (freeing the same memory twice, or not at all [memory leaks], not allocating enough memory [buffer overflow], etc.) but difficult to avoid and debug. A highly disciplined approach is absolutely mandatory in practice, but of course the language doesn't actually mandate it.
I'd like to mention another type of memory allocation that's been ignored by other posts. It's possible to statically allocate variables by declaring them outside any function. I think in general this type of allocation gets a bad rap because it's used by global variables. However there's nothing that says the only way to use memory allocated this way is as an undisciplined global variable in a mess of spaghetti code. The static allocation method can be used simply to avoid some of the pitfalls of the heap and automatic allocation methods. Some C programmers are surprised to learn that large and sophisticated C embedded and games programs have been constructed with no use of heap allocation at all.
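To illustrate the static-allocation approach, here is a sketch of a fixed pool of list nodes declared at file scope and handed out through a free list, with no heap use at all. The node type, POOL_SIZE and the node_get/node_put names are assumptions for illustration, and the pool as written is not thread-safe.

```c
#include <stddef.h>

/* A fixed pool of nodes with static storage duration: no malloc/free. */
#define POOL_SIZE 32

struct node {
    int value;
    struct node *next;       /* list link, or free-list link when unused */
};

static struct node pool[POOL_SIZE];
static struct node *free_list = NULL;
static int pool_ready = 0;

/* Thread every node onto the free list the first time it is needed. */
static void pool_init(void)
{
    int i;
    for (i = 0; i < POOL_SIZE; ++i) {
        pool[i].next = free_list;
        free_list = &pool[i];
    }
    pool_ready = 1;
}

/* Take a node from the pool, or NULL if all POOL_SIZE are in use. */
struct node *node_get(void)
{
    struct node *n;
    if (!pool_ready)
        pool_init();
    n = free_list;
    if (n != NULL)
        free_list = n->next;
    return n;
}

/* Return a node to the pool so it can be reused. */
void node_put(struct node *n)
{
    n->next = free_list;
    free_list = n;
}
```

The worst-case memory use is fixed at compile time, which is exactly what makes this attractive on embedded targets.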
There are some great answers here about how to allocate and free memory, and in my opinion the more challenging side of using C is ensuring that the only memory you use is memory you've allocated. If this isn't done correctly, what you end up with is the cousin of this site, a buffer overflow, and you may be overwriting memory that's being used by another part of your program, with very unpredictable results.
An example:
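#include <stdlib.h>
#include <string.h>

int main(void) {
    char *myString = malloc(5 * sizeof(char));
    if (myString == NULL)
        return 1;
    strcpy(myString, "abcd");   /* copies 'a','b','c','d','\0': exactly 5 bytes */
    free(myString);
    return 0;
}
After the strcpy, you've allocated 5 bytes for myString and filled them with "abcd\0" (strings end in a null - \0). Note that the characters have to be copied, e.g. with strcpy: a plain assignment such as myString = "abcd"; would not touch the allocated bytes at all; it would just point myString at a string literal and leak the malloc'd block.
If your copy was
strcpy(myString, "abcde");
you would be writing "abcde" into the 5 bytes allocated to your program, and the trailing null character would be written past the end of them, into a part of memory that hasn't been allocated for your use. That memory could be unused, but it could equally be in use by another part of your program (or, on systems without memory protection, by another program). This is the critical part of memory management, where a mistake will have unpredictable (and sometimes unrepeatable) consequences.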
A thing to remember is to always initialize your pointers to NULL, since an uninitialized pointer may contain a pseudorandom but valid memory address, which can make pointer errors go unnoticed. By initializing a pointer to NULL, you can usually catch it if you use the pointer before it has been set to something meaningful. The reason is that most operating systems leave the page at virtual address 0 unmapped, so dereferencing a null pointer traps (typically as a segmentation fault or access violation) instead of silently reading or writing memory.
Also you might want to use dynamic memory allocation when you need to define a huge array, say int[10000] or larger. You can't always just put it on the stack, because then... you may well get a stack overflow, especially on systems with small stacks.
Another good example would be an implementation of a data structure, say linked list or binary tree. I don't have a sample code to paste here but you can google it easily.
(I'm writing because I feel the answers so far aren't quite on the mark.)
The main reason you need memory management worth mentioning is when you have a problem/solution that requires you to create complex structures. (If your program crashes because you allocate too much space on the stack at once, that's a bug.) Typically, the first data structure you'll need to learn is some kind of list. Here's a singly linked one, off the top of my head:
#include <stdlib.h>

typedef struct listelem { struct listelem *next; void *data; } listelem;

/* Allocate a new, unlinked element holding data (NULL on failure). */
listelem *create(void *data)
{
    listelem *p = calloc(1, sizeof(listelem));
    if (p) p->data = data;
    return p;
}

/* Free one element and return the element that followed it. */
listelem *delete(listelem *p)
{
    listelem *next = p->next;
    free(p);
    return next;
}

void deleteall(listelem *p)
{
    while (p) p = delete(p);
}

void foreach(listelem *p, void (*fun)(void *data))
{
    for ( ; p != NULL; p = p->next) fun(p->data);
}

/* Append list q to the end of list p; return the head of the result. */
listelem *merge(listelem *p, listelem *q)
{
    listelem *tail = p;
    if (p == NULL)
        return q;
    while (tail->next != NULL) tail = tail->next;
    tail->next = q;
    return p;
}
Naturally, you'd like a few other functions, but basically, this is what you need memory management for. I should point out that there are a number of tricks that are possible with "manual" memory management, e.g.,
Using the fact that malloc is guaranteed (by the language standard) to return a pointer suitably aligned for any object type, which on common platforms means it is divisible by at least 4 (often 8 or 16), so its low bits are free for your own use,
allocating extra space for some sinister purpose of your own,
creating memory pools..
Get a good debugger... Good luck!
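To illustrate the "allocating extra space" trick from the list above: a wrapper can put a small header in front of each block to remember its size, which plain malloc does not let you query portably. A sketch; sized_malloc, sized_size and sized_free are hypothetical names, and the header is a union with max_align_t so the pointer handed back keeps malloc's alignment guarantee (C11).

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Header stored just before each user block. The union pads it to the
   strictest alignment so the user pointer stays suitably aligned. */
typedef union {
    size_t size;
    max_align_t align;
} header;

void *sized_malloc(size_t size)
{
    header *h;
    if (size > SIZE_MAX - sizeof(header))
        return NULL;                 /* total size would overflow */
    h = malloc(sizeof(header) + size);
    if (h == NULL)
        return NULL;
    h->size = size;
    return h + 1;                    /* user memory starts after the header */
}

/* Size originally requested for a block from sized_malloc(). */
size_t sized_size(const void *p)
{
    return ((const header *)p - 1)->size;
}

void sized_free(void *p)
{
    if (p != NULL)
        free((header *)p - 1);       /* free the whole original block */
}
```

Blocks from sized_malloc must only be released with sized_free; passing them to plain free() would hand it a pointer malloc never returned.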
#Euro Micelli
One negative to add is that pointers to the stack are no longer valid when the function returns, so you cannot return a pointer to a stack variable from a function. This is a common error and a major reason why you can't get by with just stack variables. If your function needs to return a pointer, then you have to malloc and deal with memory management.
#Ted Percival:
...you don't need to cast malloc()'s return value.
You are correct, of course. Since C89, malloc() returns void *, which converts implicitly to any object pointer type; only pre-standard C, where malloc() returned char *, needed the cast.
I don't like a lot of the implicit conversions in C, so I tend to use casts to make "magic" more visible. Sometimes it helps readability, sometimes it doesn't, and sometimes it causes a silent bug to be caught by the compiler. Still, I don't have a strong opinion about this, one way or another.
This is especially likely if your compiler understands C++-style comments.
Yeah... you caught me there. I spend a lot more time in C++ than C. Thanks for noticing that.
In C, you actually have two different choices. One, you can let the system manage the memory for you. Alternatively, you can do that by yourself. Generally, you would want to stick to the former as long as possible. However, auto-managed memory in C is extremely limited and you will need to manually manage the memory in many cases, such as:
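a. You want the variable to outlive the function that creates it, and you don't want a global variable. ex:
struct pair{
    int val;
    struct pair *next;
};
struct pair* new_pair(int val){
    struct pair* np = malloc(sizeof(struct pair));
    if (np == NULL)
        return NULL;
    np->val = val;
    np->next = NULL;
    return np;
}
b. You want to have dynamically allocated memory. The most common example is an array without a fixed length: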
b. you want to have dynamically allocated memory. Most common example is array without fixed length:
int *my_special_array;
my_special_array = malloc(sizeof(int) * number_of_element);
for(i=0; i
c. You want to do something REALLY dirty. For example, I want a struct that can represent many kinds of data, and I don't like unions (unions look soooo messy):
struct data{
    int data_type;
    void *data_in_mem;   /* can point to an object of any type */
};
struct animal{/*something*/};
struct person{/*some other thing*/};
struct animal* read_animal();
struct person* read_person();
/*In main*/
struct data sample;
sample.data_type = input_type;
switch(input_type){
case DATA_PERSON:
    sample.data_in_mem = read_person();
    break;
case DATA_ANIMAL:
    sample.data_in_mem = read_animal();
    break;
default:
    printf("Oh hoh! I warn you, that again and I will seg fault your OS");
}
See, a void * is enough to hold a pointer to ANY object type. (A long is not guaranteed to be: on 64-bit Windows, a long is only 32 bits, too small for a pointer.) Just remember to free it, or you WILL regret it. This is among my favorite tricks to have fun in C :D.
However, generally, you would want to stay away from your favorite tricks (T___T). You WILL break your program, sooner or later, if you use them too often. As long as you don't use *alloc and free, it is safe to say that you are still virgin, and that the code still looks nice.
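For case b above (an array without a fixed length), the usual companion to malloc is realloc, which lets the array grow as elements arrive. A sketch of a growable int array, assuming a doubling growth policy; int_vec and its functions are hypothetical names. The result of realloc goes into a temporary first, so the old block is not leaked if realloc fails.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical growable array of ints; start with { NULL, 0, 0 }. */
struct int_vec {
    int *data;
    size_t len;   /* elements in use */
    size_t cap;   /* elements allocated */
};

/* Append x, doubling the capacity when full. Returns 0 or -1. */
int int_vec_push(struct int_vec *v, int x)
{
    if (v->len == v->cap) {
        size_t new_cap = v->cap ? v->cap * 2 : 4;
        int *tmp;
        if (new_cap > SIZE_MAX / sizeof *v->data)
            return -1;              /* size computation would overflow */
        tmp = realloc(v->data, new_cap * sizeof *v->data);
        if (tmp == NULL)
            return -1;              /* v->data is still valid and still ours */
        v->data = tmp;
        v->cap = new_cap;
    }
    v->data[v->len++] = x;
    return 0;
}

/* Free the storage and reset v to empty. */
void int_vec_free(struct int_vec *v)
{
    free(v->data);
    v->data = NULL;
    v->len = v->cap = 0;
}
```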
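Sure. If you create an object that exists outside of the scope you use it in. Here is a contrived example, written in C with plain functions playing the roles of a class's constructor, destructor and member function:
#include <stdlib.h>

/* Placeholder for whatever object MyClass needs to hold on to. */
struct SomeOtherClass { int state; };

struct MyClass {
    struct SomeOtherClass *myObject;
};

/* "Constructor": the object is created when MyClass is created. */
int MyClass_init(struct MyClass *self)
{
    self->myObject = malloc(sizeof *self->myObject);
    if (self->myObject == NULL)
        return 0;
    self->myObject->state = 0;
    return 1;
}

/* "Destructor": if you don't free the object here, you leak memory. */
void MyClass_destroy(struct MyClass *self)
{
    free(self->myObject);
    self->myObject = NULL;
}

void MyClass_SomeMemberFunction(struct MyClass *self)
{
    /* Some use of the object */
    self->myObject->state++;
}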
In this example, I'm using an object of type SomeOtherClass during the lifetime of MyClass. The SomeOtherClass object is used in several functions, so I've dynamically allocated the memory: the SomeOtherClass object is created when MyClass is created, used several times over the life of the object, and then freed once MyClass is freed.
Obviously if this were real code, there would be no reason (aside from possibly stack memory consumption) to create myObject in this way, but this type of object creation/destruction becomes useful when you have a lot of objects, and want to finely control when they are created and destroyed (so that your application doesn't suck up 1GB of RAM for its entire lifetime, for example), and in a Windowed environment, this is pretty much mandatory, as objects that you create (buttons, say), need to exist well outside of any particular function's (or even class') scope.