Why does GCC not attempt memory leak checking? - c

While I rarely use C anymore, I was thinking about the rule I was always told that "if you call malloc() (or new), you must call free() (or delete)". It got me wondering whether GCC (or another C compiler) attempts to perform any kind of memory leak checking and warn the user of a potential issue. I had never heard of it, so I suspected it wasn't the case, but I wanted to find out.
Here's the example I used:
#include <stdlib.h>
int main() {
int* first = malloc(sizeof(int) * 10);
int* second = malloc(sizeof(int) * 10);
int i = 0;
for (; i < 10; i++) {
first[i] = i * 2;
second[i] = i * 10;
}
free(first); /* Forgot to free second */
return 0;
}
When compiling with gcc -Wall free_test.c no warnings were generated. While I can see why the compiler cannot provide a perfect answer because you're dealing with heap memory and managing this at run time, why does the compiler not appear to attempt to provide a warning that there could be a memory leak?
Some of my thoughts on why include:
The compiler may not have a perfect way of telling when memory is freed. For example, if one of those pointers was passed into a function and freed from within the function, the compiler may not be able to tell that.
If a different thread takes ownership of the memory, the compiler would not have any way to tell that somebody else could be freeing memory.

Cases that can't be detected via static analysis (let alone via trivial static analysis) vastly outnumber those that can. The compiler authors presumably decided the benefits of adding this extra complexity to GCC were outweighed by the costs.

It's a bit more complex than counting the free and malloc calls. Imagine a library in which some functions call malloc but expect the user of the library to call free.
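To make the ownership problem concrete, here is a minimal sketch. The function names make_greeting and consume_buffer are hypothetical and not taken from any real library. One function allocates a block and hands it to its caller; the other takes ownership of a block and frees it. Within either function the malloc and free calls never pair up, so a checker that only looks at one function at a time would either miss real leaks or flag correct code.

```c
#include <stdlib.h>
#include <string.h>

/* Allocates a string and transfers ownership to the caller.
 * The caller is expected to free() the result (or pass it on). */
char *make_greeting(const char *name) {
    const char *prefix = "hello, ";
    size_t len = strlen(prefix) + strlen(name) + 1; /* +1 for the terminator */
    char *out = malloc(len);
    if (out == NULL) {
        return NULL;
    }
    strcpy(out, prefix);
    strcat(out, name);
    return out; /* no free() here: not a leak, ownership moves to the caller */
}

/* Takes ownership of buf and releases it; returns the string length. */
size_t consume_buffer(char *buf) {
    size_t n = (buf != NULL) ? strlen(buf) : 0;
    free(buf); /* no malloc() here: the block was allocated elsewhere */
    return n;
}
```

A caller would write something like consume_buffer(make_greeting("gcc")), and only an analysis that follows the pointer across both calls can tell that nothing leaks.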

Related

What does "C6011 dereferencing null pointer" mean in my program?

I have a simple exercise in C (see code below). The program takes a vector with three components and doubles each component. The IDE showed me this warning (green squiggle): C6011 dereferencing null pointer v. in this line: v[0] = 12;. I think it's a bug because in the debugger I read the program exited with code 0. What do you think about it?
#include <stdlib.h>
#include <stdint.h>
void twice_three(uint32_t *x) {
for (size_t i = 0; i < 3; ++i) {
x[i] = 2 * *x;
}
}
int main(void) {
uint32_t *v = malloc(3 * sizeof(uint32_t));
v[0] = 12;
v[1] = 59;
v[2] = 83;
twice_three(v);
free(v);
return 0;
}
NB: I'm using Visual Studio.
First of all, note that the warning is generated by your compiler (or static analyzer, or linter), not by your debugger, as you initially wrote.
The warning is telling you that your program might dereference a null pointer. The reason for this warning is that you call malloc() and then use the result (the pointer) without checking it for NULL. In this specific code example, malloc() will most likely just return the requested block of memory. On any desktop computer or laptop, there's generally no reason why it would fail to allocate 12 bytes. That's why your application runs fine and exits successfully. However, if this were part of a larger application and/or ran on a memory-limited system such as an embedded system, malloc() could fail and return NULL. Note that malloc() does not fail only when there is not enough memory available; it can also fail when fragmentation leaves no contiguous block that is large enough.
According to the C standard, dereferencing a NULL pointer is undefined behavior, meaning that anything could happen. On modern computers it would likely get your application killed (which could lead to data loss or corruption, depending on what the application does). On older computers or embedded systems the problem might be undetected and your application would read from or (worse) write to the address NULL (which is most likely 0, but even that isn't guaranteed by the C standard). This could lead to data corruption, crashes or other unexpected behavior at an arbitrary time after this happened.
Note that the compiler/analyzer/linter doesn't know anything about your application or the platform you will be running it on, and it doesn't make any assumptions about it. It just warns you about this possible problem. It's up to you to determine if this specific warning is relevant for your situation and how to deal with it.
Generally speaking, there are three things you can do about it:
If you know for sure that malloc() would never fail (for example, in such a toy example that you would only run on a modern computer with gigabytes of memory) or if you don't care about the results (because the application will be killed by your OS and you don't mind), then there's no need for this warning. Just disable it in your compiler, or ignore the warning message.
If you don't expect malloc() to fail, but do want to be informed when it happens, the quick-and-dirty solution is to add assert(v != NULL); after the malloc. Note that this will also exit your application when it happens, but in a slightly more controlled way, and you'll get an error message stating where the problem occurred. I would recommend this for simple hobby projects, where you do not want to spend much time on error handling and corner cases but just want to have some fun programming :-)
When there is a realistic chance that malloc() will fail and you want well-defined behavior from your application, you should definitely add code to handle that situation (check for NULL values). If this is the case, you would generally have to do more than just add an if-statement. You would have to think about how the application can continue to work or shut down gracefully without requiring more memory allocations. And on an embedded system, you would also have to think about things such as memory fragmentation.
The easiest fix for the example code in question is to add the NULL check. This would make the warning go away, and (assuming malloc() does not fail) your program would still run the same.
int main(void) {
uint32_t *v = malloc(3 * sizeof(uint32_t));
if (v != NULL) {
v[0] = 12;
v[1] = 59;
v[2] = 83;
twice_three(v);
free(v);
}
return 0;
}
I believe your IDE is warning you that you didn't make sure that malloc returned something other than NULL. malloc can return NULL when you run out of memory to allocate.
It's debatable whether such a check is needed. In the unlikely event malloc returned NULL, your program would end up getting killed (on modern computers with virtualized memory). So the question is whether you want a clean message or not on exit in the very very rare situation that you run out of memory.
If you do add a check, don't use assert. That's useless. For starters, it only works in dev builds (not production builds) where malloc returning NULL is unlikely, and where it's already super easy to find memory leaks (e.g. by using valgrind). Use a proper check (if (!v) { perror(NULL); exit(1); }).
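The "proper check" can be wrapped in a small helper so it does not have to be repeated at every call site. The sketch below is one way to do that; the name checked_malloc is hypothetical, and it assumes a non-zero size (malloc(0) is allowed to return NULL on some implementations, which this helper would treat as a failure).

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns a usable block or terminates the program
 * with a message. Assumes size > 0. */
void *checked_malloc(size_t size) {
    void *p = malloc(size);
    if (p == NULL) {
        perror("malloc");   /* report why the allocation failed */
        exit(EXIT_FAILURE); /* unlike assert, this also runs in release builds */
    }
    return p;
}
```

With this in place, the example would read uint32_t *v = checked_malloc(3 * sizeof *v); and the pointer can be used without a further NULL check.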
Since people are trying to debate the issue in the comments despite the rules, it looks like I'll have to go into my claim in more detail.
A couple of people suggested in the comments that "anything could happen" if one doesn't check for NULL, but that's simply not true on modern computers with virtualized memory.
When the C spec doesn't define the behaviour of something (what is called "undefined behaviour"), it doesn't mean anything can happen; it just means the C language doesn't care what the compiler/machine does in such situations. And a NULL dereference is very well defined on such systems. Catching such situations is a raison d'être of memory virtualization!
Just like you can rely on other compiler-specific features such as gcc's field packing attributes, one can argue it's fine to rely on memory virtualization to detect a failure by malloc.
Always check the result of malloc.
Use objects not types in sizeof
int main(void) {
uint32_t *v = malloc(3 * sizeof(*v));
if(v)
{
v[0] = 12;
v[1] = 59;
v[2] = 83;
twice_three(v);
}
free(v);
return 0;
}

How to make malloc return the same address every time using MSVC?

For debugging purposes, I would like malloc to return the same addresses every time the program is executed, however in MSVC this is not the case.
For example:
#include <stdlib.h>
#include <stdio.h>
int main() {
int test = 5;
printf("Stack: %p\n", &test);
printf("Heap: %p\n", malloc(4));
return 0;
}
Compiling with cygwin's gcc, I get the same Stack address and Heap address every time, while compiling with MSVC with ASLR off...
cl t.c /link /DYNAMICBASE:NO /NXCOMPAT:NO
...I get the same Stack address every time, but the Heap address changes.
I have already tried adding the registry value HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management\MoveImages but it does not work.
Both the stack address and the pointer returned by malloc() may be different every time. As a matter of fact, both differ when the program is compiled and run on macOS multiple times.
The compiler and/or the OS may cause this behavior to try and make it more difficult to exploit software flaws. There might be a way to prevent this in some cases, but if your goal is to replay the same series of malloc() addresses, other factors may change the addresses, such as time sensitive behaviors, file system side effects, not to mention non-deterministic thread behavior. You should try and avoid relying on this for your tests.
Note also that &test should be cast as (void *) as %p expects a void pointer, which is not guaranteed to have the same representation as int *.
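A minimal sketch of the cast described above, using a hypothetical helper named format_int_address. The text that %p produces is implementation-defined, so no particular output is assumed.

```c
#include <stdio.h>

/* Writes the address held in p into buf as text.
 * Returns the value returned by snprintf. */
int format_int_address(char *buf, size_t size, int *p) {
    /* %p expects a void pointer, so convert explicitly. */
    return snprintf(buf, size, "%p", (void *)p);
}
```

In the question's program, the equivalent change is printf("Stack: %p\n", (void *)&test);.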
It turns out that you may not be able to obtain deterministic behaviour from the MSVC runtime libraries. Both the debug and the production versions of the C/C++ runtime libraries end up calling a function named _malloc_base(), which in turn calls the Win32 API function HeapAlloc(). Unfortunately, neither HeapAlloc() nor the function that provides its heap, HeapCreate(), document a flag or other way to obtain deterministic behaviour.
You could roll your own allocation scheme on top of VirtualAlloc(), as suggested by @Enosh_Cohen, but then you'd lose the debug functionality offered by the MSVC allocation functions.
Diomidis' answer suggests making a new malloc on top of VirtualAlloc, so I did that. It turned out to be somewhat challenging because VirtualAlloc itself is not deterministic, so I'm documenting the procedure I used.
First, grab Doug Lea's malloc. (The ftp link to the source is broken; use this http alternative.)
Then, replace the win32mmap function with this (hereby placed into the public domain, just like Doug Lea's malloc itself):
static void* win32mmap(size_t size) {
/* Where to ask for the next address from VirtualAlloc. */
static char *next_address = (char*)(0x1000000);
/* Return value from VirtualAlloc. */
void *ptr = 0;
/* Number of calls to VirtualAlloc we have made. */
int tries = 0;
while (!ptr && tries < 100) {
ptr = VirtualAlloc(next_address, size,
MEM_RESERVE|MEM_COMMIT, PAGE_READWRITE);
if (!ptr) {
/* Perhaps the requested address is already in use. Try again
* after moving the pointer. */
next_address += 0x1000000;
tries++;
}
else {
/* Advance the request boundary. */
next_address += size;
}
}
/* Either we got a non-NULL result, or we exceeded the retry limit
* and are going to return MFAIL. */
return (ptr != 0)? ptr: MFAIL;
}
Now compile and link the resulting malloc.c with your program, thereby overriding the MSVCRT allocator.
With this, I now get consistent malloc addresses.
But beware:
The exact address I used, 0x1000000, was chosen by enumerating my address space using VirtualQuery to look for a large, consistently available hole. The address space layout appears to have some unavoidable non-determinism even with ASLR disabled. You may have to adjust the value.
I confirmed this works, in my particular circumstances, to get the same addresses during 100 sequential runs. That's good enough for the debugging I want to do, but the values might change after enough iterations, or after rebooting, etc.
This modification should not be used in production code, only for debugging. The retry limit is a hack, and I've done nothing to track when the heap shrinks.

How to deal with assert() in a function, when you have dynamically allocated memory in main?

I have the following C function:
void mySwap(void * p1, void * p2, int elementSize)
{
void * temp = (void*) malloc(elementSize);
assert(temp != NULL);
memcpy(temp, p1, elementSize);
memcpy(p1, p2, elementSize);
memcpy(p2, temp, elementSize);
free(temp);
}
that I want to use in a generic sorting function. Let's suppose that I use it to sort a dynamically allocated array owned by main(). Now let's suppose that at some point temp in mySwap() is actually NULL and the whole program is aborted without freeing the dynamically allocated array in main(). I thought that both mySwap() and the sorting function could return a bool value indicating whether the allocation was successful or not and by using if statements I could free the array in main() and exit(EXIT_FAILURE), but it doesn't seem like a very elegant solution. What would be a good way to prevent a memory leak in such an instance?
assert is typically used during debugging to identify problems/errors that should never occur.
Out of memory is something that can occur, and so either should not be handled by assert, or, if you do use assert, beware that it will abort the program. Once the program aborts, all memory used by the program is deallocated, so don't worry about that.
Note: If you don't want to have unwieldy if statements everywhere just to handle errors that hardly ever occur, you can use setjmp/longjmp to return to a recoverable state.
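A minimal sketch of the setjmp/longjmp idea follows. All names are hypothetical, and the simulate_failure parameter exists only so the failure path can be exercised without actually running out of memory. The recovery point is set once in the outer function; any allocation failure deeper in the call chain jumps straight back to it, so the intermediate functions need no error-checking if statements.

```c
#include <setjmp.h>
#include <stdlib.h>
#include <string.h>

/* Recovery point shared by the functions below (not thread-safe). */
static jmp_buf alloc_failure;

/* Returns a valid block, or jumps back to the recovery point. */
static void *alloc_or_jump(size_t size, int simulate_failure) {
    void *p = simulate_failure ? NULL : malloc(size);
    if (p == NULL) {
        longjmp(alloc_failure, 1);
    }
    return p;
}

/* Swap without any error handling of its own. */
static void swap_elements(void *p1, void *p2, size_t size, int simulate_failure) {
    void *temp = alloc_or_jump(size, simulate_failure);
    memcpy(temp, p1, size);
    memcpy(p1, p2, size);
    memcpy(p2, temp, size);
    free(temp);
}

/* Puts *a and *b in ascending order.
 * Returns 0 on success, -1 if an allocation failed along the way. */
int order_pair(int *a, int *b, int simulate_failure) {
    if (setjmp(alloc_failure) != 0) {
        return -1; /* reached via longjmp: the caller can clean up and exit */
    }
    if (*a > *b) {
        swap_elements(a, b, sizeof *a, simulate_failure);
    }
    return 0;
}
```

On a -1 result, main() can free its array and call exit(EXIT_FAILURE). Note that memory allocated between the setjmp and the longjmp is not released automatically, so this approach needs care in anything larger than a sketch.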
You have to realize that the reason malloc fails is that your computer has run out of memory. From that point onwards, there's nothing meaningful that your program can do, except terminate as gracefully as it can.
The OS will free the memory upon program termination, so that's not something you need to worry about.
Still, in the normal case, it is of course good practice to free() yourself, manually. Not so much for the sake of "making the memory available again" - the OS will ensure that - but to verify that your program has not gone terribly wrong along the way and created heap corruption, leaks or other bugs. If you have such bugs in your program, it may well crash during the free() call, which is a good thing, as the bugs will surface.
assert should preferably not be used in production code. Build your own error handling if needed, that's something better than just violently terminating your own program in the middle of execution.
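One simple form of "your own error handling" is the bool-returning variant the question already considers. The sketch below uses hypothetical names (try_swap, sort_ints) and a plain bubble sort purely for illustration; it assumes element_size is greater than zero.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Swaps two elements; returns false if the temporary buffer
 * could not be allocated (in which case nothing is modified). */
bool try_swap(void *p1, void *p2, size_t element_size) {
    void *temp = malloc(element_size);
    if (temp == NULL) {
        return false;
    }
    memcpy(temp, p1, element_size);
    memcpy(p1, p2, element_size);
    memcpy(p2, temp, element_size);
    free(temp);
    return true;
}

/* Bubble sort over ints; stops and reports failure if a swap fails. */
bool sort_ints(int *items, size_t count) {
    for (size_t pass = 0; pass + 1 < count; ++pass) {
        for (size_t i = 0; i + 1 < count - pass; ++i) {
            if (items[i] > items[i + 1] &&
                !try_swap(&items[i], &items[i + 1], sizeof items[0])) {
                return false; /* propagate the failure to the caller */
            }
        }
    }
    return true;
}
```

In main(), a false result from sort_ints is the point at which to free the array and call exit(EXIT_FAILURE).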
Avoid the problem by not using malloc.
Instead of allocating a block of memory for every swap, do the swap one byte at a time;
for (int i = 0; i < elementSize; ++i) {
char tmp = ((char*)p1)[i];
((char*)p1)[i] = ((char*)p2)[i];
((char*)p2)[i] = tmp;
}
Only use assert() to catch programmer-error during development, in release-builds it doesn't do anything. If you need to test other things, use proper error-handling, whether that means abort(), return-codes or emulating exceptions using setjmp()/longjmp().
As an aside, do not cast the result of malloc().

Working of malloc in C

I am a beginner with C. I am wondering how malloc works.
Here is some sample code I wrote while trying to understand how it works.
CODE:
#include<stdio.h>
#include<stdlib.h>
int main() {
int i;
int *array = malloc(sizeof *array);
for (i = 0; i < 5; i++) {
array[i] = i+1;
}
printf("\nArray is: \n");
for (i = 0; i < 5; i++) {
printf("%d ", array[i]);
}
free(array);
return 0;
}
OUTPUT:
Array is:
1 2 3 4 5
In the program above, I have only allocated space for 1 element, but the array now holds 5 elements. So as the programs runs smoothly without any error, what is the purpose of realloc().
Could anybody explain why?
Thanks in advance.
The fact that the program runs smoothly does not mean it is correct!
Try increasing the 5 in the for loop to something much larger (500000, for instance, should suffice). At some point it will stop working and give you a SEGFAULT.
This is called Undefined Behaviour.
valgrind would also warn you about the issue with something like the following.
==16812== Invalid write of size 4
==16812== at 0x40065E: main (test.cpp:27)
If you don't know what valgrind is, check this out: How do I use valgrind to find memory leaks?. (BTW it's a fantastic tool)
This should help give you some more clarification: Accessing unallocated memory C++
This is typical undefined behavior (UB).
You are not allowed to code like that. As a beginner, think of it as a mistake, a fault, a sin, something very dirty etc.
Could anybody explain why?
If you need to understand what is really happening (and the details are complex) you need to dive into your implementation details (and you don't want to). For example, on Linux, you could study the source code of your C standard library, of the kernel, of the compiler, etc. And you need to understand the machine code generated by the compiler (so with GCC compile with gcc -S -O1 -fverbose-asm to get an .s assembler file).
See also this (which has more references).
As soon as possible, read Lattner's blog post What Every C Programmer Should Know About Undefined Behavior. Everyone should have read it!
The worst thing about UB is that sadly, sometimes, it appears to "work" like you want it to (but in fact it does not).
So learn as quickly as possible to avoid UB systematically.
BTW, enabling all warnings in the compiler might help (but perhaps not in your particular case). Get into the habit of compiling with gcc -Wall -Wextra -g if using GCC.
Notice that your program doesn't have any arrays. The array variable is a pointer (not an array), so it is very badly named. You need to read more about pointers and C dynamic memory allocation.
int *array = malloc(sizeof *array); //WRONG
is very wrong. The name array is very poorly chosen (it is a pointer, not an array; you should spend days reading about the difference, and about what "arrays decay into pointers" means). You allocate sizeof(*array) bytes, which is exactly the same as sizeof(int) (generally 4 bytes, at least on my machine). So you allocate space for only one int element. Any access beyond that (i.e. with any positive index, however small, e.g. array[1] or array[i] with some positive i) is undefined behavior. And you don't even test against failure of malloc (which can happen).
If you want to allocate memory space for (let's say) 8 int-s, you should use:
int* ptr = malloc(sizeof(int) * 8);
and of course you should check against failure, at least:
if (!ptr) { perror("malloc"); exit(EXIT_FAILURE); }
and you need to initialize that array (the memory you've got contains unpredictable junk), e.g.
for (int i=0; i<8; i++) ptr[i] = 0;
or you could clear all bits (with the same result on all machines I know of) using
memset(ptr, 0, sizeof(int)*8);
Notice that after such a malloc, whether it succeeded or failed, sizeof(ptr) is always the same (on my Linux/x86-64 box, it is 8 bytes), since it is the size of a pointer (even if you malloc-ed a memory zone for a million int-s).
In practice, when you use C dynamic memory allocation you need to know conventionally the allocated size of that pointer. In the code above, I used 8 in several places, which is poor style. It would have been better to at least
#define MY_ARRAY_LENGTH 8
and use MY_ARRAY_LENGTH everywhere instead of 8, starting with
int* ptr = malloc(MY_ARRAY_LENGTH*sizeof(int));
In practice, allocated memory often has a size that is only known at runtime, and you would keep that size somewhere (in a variable, a parameter, etc.).
Study the source code of some existing free software project (e.g. on github), you'll learn very useful things.
Read also (perhaps in a week or two) about flexible array members. Sometimes they are very useful.
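As a sketch of both points (keeping the runtime size next to the data, and flexible array members), the following hypothetical IntArray type stores its length in the same allocation as its elements. The type and function names are illustrative only.

```c
#include <stdint.h>
#include <stdlib.h>

/* The length travels with the data; items[] is a flexible array member
 * and must be the last member of the struct. */
typedef struct {
    size_t length;
    int items[];
} IntArray;

/* Allocates an IntArray with `length` zero-initialized elements.
 * Returns NULL on overflow or allocation failure. */
IntArray *int_array_create(size_t length) {
    /* Guard against overflow in the size computation below. */
    if (length > (SIZE_MAX - sizeof(IntArray)) / sizeof(int)) {
        return NULL;
    }
    IntArray *a = malloc(sizeof *a + length * sizeof a->items[0]);
    if (a == NULL) {
        return NULL;
    }
    a->length = length;
    for (size_t i = 0; i < length; ++i) {
        a->items[i] = 0;
    }
    return a;
}
```

A single free(a) releases both the header and the elements, and code that receives the pointer can always consult a->length instead of relying on a constant such as 8.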
So as the programs runs smoothly without any error
That's just because you were lucky. Keep running this program and you might segfault soon. You were relying on undefined behaviour (UB), which is always A Bad Thing™.
What is the purpose of realloc()?
From the man pages:
void *realloc(void *ptr, size_t size);
The realloc() function changes the size of the memory block pointed to
by ptr to size bytes. The contents will be unchanged in the range
from the start of the region up to the minimum of the old and new
sizes. If the new size is larger than the old size, the added memory
will not be initialized. If ptr is NULL, then the call is equivalent
to malloc(size), for all values of size; if size is equal to zero,
and ptr is not NULL, then the call is equivalent to free(ptr). Unless
ptr is NULL, it must have been returned by an earlier call to
malloc(), calloc() or realloc(). If the area pointed to was moved, a
free(ptr) is done.
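To show what realloc() is for in practice, here is a minimal sketch of a growable int buffer. The IntVec type and intvec_push function are hypothetical names. The result of realloc() is stored in a temporary first, because on failure realloc() returns NULL and leaves the original block untouched; assigning directly to the only pointer would leak it.

```c
#include <stdlib.h>

/* A growable buffer: data holds cap slots, of which len are in use. */
typedef struct {
    int *data;
    size_t len;
    size_t cap;
} IntVec;

/* Appends value, growing the buffer when it is full.
 * Returns 0 on success, -1 if the buffer could not be grown. */
int intvec_push(IntVec *v, int value) {
    if (v->len == v->cap) {
        size_t new_cap = v->cap ? v->cap * 2 : 1;
        /* realloc(NULL, n) behaves like malloc(n), so the first push works too. */
        int *grown = realloc(v->data, new_cap * sizeof *grown);
        if (grown == NULL) {
            return -1; /* v->data is still valid and still owned by v */
        }
        v->data = grown;
        v->cap = new_cap;
    }
    v->data[v->len++] = value;
    return 0;
}
```

Starting from IntVec v = {NULL, 0, 0};, five pushes store 1 to 5 in memory that was actually allocated for them, and free(v.data) releases it at the end.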

How do I annotate BoehmGC-collected code for Splint?

Splint does a good job tracking down memory leaks in C code. Every malloc() should have a matching free(). But BoehmGC-collected code uses GC_MALLOC() with no matching GC_FREE(). This makes Splint go crazy with tons of messages about memory leaks that aren't actually there.
Does anyone know the proper annotation for such code so that Splint no longer shows spurious memory leak messages?
In particular, could someone annotate Wikipedia's BoehmGC example?
#include <assert.h>
#include <stdio.h>
#include <gc.h>
int main(void)
{
int i;
GC_INIT();
for (i = 0; i < 10000000; ++i)
{
int **p = GC_MALLOC(sizeof(int *));
int *q = GC_MALLOC_ATOMIC(sizeof(int));
assert(*p == 0);
*p = GC_REALLOC(q, 2 * sizeof(int));
if (i % 100000 == 0)
printf("Heap size = %zu\n", GC_get_heap_size());
}
return 0;
}
I think that you should annotate the BoehmGC API itself, and then the annotations needed for the example (if any) will become obvious.
For starters, any pointer returned by a function with no annotation is implicitly @only, which means that you must release the associated memory before the reference is lost. Therefore, the first step would be to annotate the allocators so that they no longer return an @only reference. Instead, the manual advises using shared references:
If Splint is used to check a program designed to be used in a
garbage-collected environment, there may be storage that is shared by
one or more references and never explicitly released. The shared
annotation declares storage that may be shared arbitrarily, but never
released.
If you didn't want to modify the BoehmGC API, you could work around it by creating properly annotated wrapper functions. In addition, you would need to disable specific transfer errors within your wrapper functions (because they get an implicit @only reference from the BoehmGC API and then return it as @shared).
For example, this is the way you would disable the "Statement has no effect" error at a given point of your code:
/*@-noeffectuncon@*/
not_annotated_void_function();
/*@=noeffectuncon@*/
The wrapper function would be something like this:
/*@shared@*/ /*@null@*/ /*@out@*/ static void * MY_GC_MALLOC(size_t size) /*@*/{
/*@-onlytrans@*/
return( GC_MALLOC(size) );
/*@=onlytrans@*/
}
Then in the example you'd use MY_GC_MALLOC rather than GC_MALLOC.

Resources