How can I do automatic memory management in C? - c

In C, memory allocation and deallocation are done with malloc and free.
In C++, memory allocation and deallocation are done with new and delete.
There are some solutions in C++ for automatic memory management like:
Smart Pointers.
RAII (Resource Acquisition Is Initialization)
Reference counting and cyclic references
...
But how can I do automatic memory management in C?
Are there any solutions for AUTOMATIC memory management in C?
Are there any guidelines or something like that for C?
When I forget to free a block of memory, I want one of these to happen:
My code doesn't compile
-- or --
Memory automatically deallocated
And then I say: Oh, C is better than C++, Java and C#. :-)

You may use a Boehm garbage collector library.

As answered by Juraj Blaho, you can use a garbage collection library, such as the Boehm conservative garbage collector, but there are others: Ravenbrook's Memory Pool System, my (unmaintained) Qish GC, Matthew Plant's GC, etc.
And often, you can write your own garbage collector specialized for your use case. You could use in C the techniques mentioned in your question (smart pointers, reference counting), but you can also implement a mark & sweep GC, or a copying GC.
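As a rough illustration of the reference-counting option in plain C, here is a minimal sketch. The names rc_buf, rc_new, rc_retain, rc_release and rc_demo are hypothetical, invented for this example, and the rc_live_objects counter exists only so the demo can observe when memory is actually freed. It does not handle cyclic references.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reference-counted string buffer; all names are illustrative. */
typedef struct rc_buf {
    size_t refs;   /* number of live owners */
    size_t len;
    char data[];   /* flexible array member (C99) */
} rc_buf;

static int rc_live_objects = 0;   /* demo-only counter to observe frees */

static rc_buf *rc_new(const char *s) {
    size_t len = strlen(s);
    rc_buf *b = malloc(sizeof *b + len + 1);
    if (!b)
        return NULL;
    b->refs = 1;                  /* the creator is the first owner */
    b->len = len;
    memcpy(b->data, s, len + 1);
    rc_live_objects++;
    return b;
}

static rc_buf *rc_retain(rc_buf *b) {
    if (b)
        b->refs++;
    return b;
}

static void rc_release(rc_buf *b) {
    if (b && --b->refs == 0) {    /* last owner gone: free the block */
        rc_live_objects--;
        free(b);
    }
}

/* Returns the number of objects still alive afterwards (0 if balanced). */
static int rc_demo(void) {
    rc_buf *a = rc_new("hello");
    rc_buf *b = rc_retain(a);     /* second owner of the same block */
    rc_release(a);                /* block survives: b still owns it */
    assert(rc_live_objects == 1);
    rc_release(b);                /* now the block is freed */
    return rc_live_objects;
}
```

The discipline is still manual: every rc_retain needs a matching rc_release, which is exactly the bookkeeping that C++ smart pointers automate.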
An important issue when coding your GC is to keep track of local pointer variables (to garbage collected data). You could keep them in local struct and chain them together.
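One way to read "keep them in local struct and chain them together" is a linked list of per-function root frames that the collector can walk. The sketch below assumes a single-threaded program; gc_frame, gc_push, gc_pop and gc_count_roots are hypothetical names, and gc_count_roots only counts non-NULL slots where a real collector would mark or copy the referenced objects.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical root-frame layout; names are illustrative only. */
typedef struct gc_frame {
    struct gc_frame *prev;   /* frame of the calling function */
    size_t nroots;           /* number of pointer slots */
    void **roots;            /* the function's local GC pointers */
} gc_frame;

static gc_frame *gc_top = NULL;   /* head of the chain (single-threaded sketch) */

static void gc_push(gc_frame *f, void **slots, size_t n) {
    f->prev = gc_top;
    f->nroots = n;
    f->roots = slots;
    gc_top = f;
}

static void gc_pop(void) {
    gc_top = gc_top->prev;
}

/* Stand-in for a collector's root scan: visit every registered slot. */
static size_t gc_count_roots(void) {
    size_t n = 0;
    for (gc_frame *f = gc_top; f; f = f->prev)
        for (size_t i = 0; i < f->nroots; i++)
            if (f->roots[i] != NULL)
                n++;
    return n;
}

static size_t inner(void) {
    static int obj;               /* stand-in for a heap object */
    void *locals[1] = { NULL };
    gc_frame fr;
    gc_push(&fr, locals, 1);
    locals[0] = &obj;
    size_t seen = gc_count_roots();   /* sees this frame and the caller's */
    gc_pop();
    return seen;
}

static size_t roots_demo(void) {
    static int a;                 /* stand-in for a heap object */
    void *locals[2] = { NULL, NULL };
    gc_frame fr;
    gc_push(&fr, locals, 2);
    locals[0] = &a;               /* locals[1] stays NULL */
    size_t seen = inner();
    gc_pop();
    return seen;
}
```

Every function that holds GC pointers must push its frame on entry and pop it on every exit path, which is the kind of boilerplate that generated code or a preprocessor can take over.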
I strongly suggest reading more about GC, e.g. the GC handbook. The algorithms there are useful in many situations.
You could even customize your GCC compiler (e.g. using MELT) to add checks or to generate code (e.g. code to scan local variables) for your particular GC implementation. Or you could use some pre-processor (e.g. GPP) for that.
In practice, Boehm's GC is often good enough.
Notice that liveness of some data is a whole-program property. So it is better to think about GC very early in the design phase of your software development.
Notice also that reliably detecting memory leaks by static source code analysis is in general impossible (undecidable), since it can be proven equivalent to the halting problem.

For Linux, I use valgrind. Sure, the original reason valgrind was built was to debug your code, but it does a lot more. It will even tell you where potentially erroneous code could be in a non-invasive way. My own command line of choice is as follows.
# Install valgrind. Remove this line of code if you already have it installed
apt install valgrind
# Now, compile and valgrind the C
gcc main.c -Werror -fshort-enums -std=gnu11 -Og -g3 -dg -gdwarf-2 -rdynamic -o main
valgrind --quiet --leak-check=yes --tool=memcheck ./main
Hope this helps. ~ Happy Coding!

Related

is it possible to execute C code during C++ stack unwinding / exception

I need to write a C library which will be integrated into a C++ code base. This library may call C++ code passed as a callback. These functions may throw C++ exceptions.
I'd like to ensure that the cleanup code is run during the stack unwinding process. I could use the cleanup attribute to ensure that:
If -fexceptions is enabled, then cleanup_function is run during the stack unwinding that happens during the processing of the exception.
From the GCC docs.
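For readers who have not seen it, the cleanup attribute is a GNU C extension (GCC and Clang), not ISO C. A minimal sketch of how it is typically used follows; free_charp, AUTO_FREE, cleanups_run and cleanup_demo are hypothetical names made up for this illustration.

```c
#include <assert.h>
#include <stdlib.h>

static int cleanups_run = 0;   /* demo-only counter */

/* The handler receives a pointer to the variable that is leaving scope. */
static void free_charp(char **p) {
    free(*p);
    cleanups_run++;
}

/* Hypothetical convenience macro; GNU C extension, not ISO C. */
#define AUTO_FREE __attribute__((cleanup(free_charp)))

/* Returns how many cleanup handlers ran while the inner block was left. */
static int cleanup_demo(void) {
    int before = cleanups_run;
    {
        AUTO_FREE char *buf = malloc(64);
        if (buf)
            buf[0] = 'x';
    }   /* free_charp(&buf) runs here, when buf goes out of scope */
    return cleanups_run - before;
}
```

Per the documentation quoted above, building with -fexceptions makes the same handler run during exception unwinding as well.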
Unfortunately I can't use the cleanup attribute. I'd like to register the cleanup function to be run during stack unwinding programmatically using portable C.
Is this possible?
I'd like to register the cleanup function to be run during stack unwinding programmatically using portable C.
Not possible in portable C.
The C11 standard n1570 does not even require any call stack and permits compiler optimizations not using it. In some cases, there is no "stack unwinding". Think of tail-call optimizations (try gcc -Wall -O3 -S -fverbose-asm with a recent GCC) and read this draft report explaining some gcc optimizations (work in progress in June 2020). If you think of C++, read n3337, a C++11 standard draft.
However, if you decide to use (specifically) a recent enough GCC (so GCC 10 in June 2020) you could consider using specific builtins or pragmas. GCC has a chapter about C language extensions and another one on C++ extensions and also one about invoking it.
You might even be interested in writing your GCC plugin, or in using its libgccjit or in reusing its libbacktrace by Ian Taylor.
On Linux, see also dlopen(3) and dlsym(3) and consider using Clang.
You could ask for help from e.g. AdaCore or on the gcc-help@gcc.gnu.org public mailing list.

Is it possible to generate ansi C functions with type information for a moving GC implementation?

I am wondering what methods there are to add typing information to generated C methods. I'm transpiling a higher-level programming language to C and I'd like to add a moving garbage collector. However to do that I need the method variables to have typing information, otherwise I could modify a primitive value that looks like a pointer.
An obvious approach would be to encapsulate all (primitive and non-primitive) variables in a struct that has an extra (enum) variable for typing information. However, this would cause memory and performance overhead, and the transpiled code is meant for embedded platforms. If I were to accept the memory overhead, the obvious option would be to use a heap handle for all objects, and then I'd be able to freely move heap blocks. However, I'm wondering if there's a more efficient approach.
I've come up with a potential solution, namely to predeclare and group variables based on whether they're primitives or not (I can do that in the transpiler), and add an offset variable to each method at the end (I need to be able to find it accurately when scanning the stack area) that tells me where the non-primitive variables begin and where they end, so I can scan only those. This means that each method will use an additional 16/32 bits (depending on the architecture) of memory; however, this should still be more memory-efficient than the heap handle approach.
Example:
void my_func() {
    int i = 5;
    int z = 3;
    bool b = false;
    void* person;
    void* person_info = ...;
    .... // logic
    volatile int offset = 0x034;
}
My aim is for something that works universally across GCC compilers, thus my concerns are:
Can the compiler reorder the variables from how they're declared in the source code?
Can I force the compiler to put some data in the method's stack frame (using volatile)?
Can I find the offset accurately when scanning the stack?
I'd like to avoid assembly so this approach can work (by default) across multiple platforms, however I'm open for methods even if they involve assembly (if they're reliable).
Typing information could be somehow encoded in the C function name; this is done by C++ and other implementations and called name mangling.
Actually, you could decide, since all your C code is generated, to adopt a different convention: generate long C identifiers which are practically unique and sort-of random program-wide, such as tiziw_7oa7eIzzcxv03TmmZ and keep their typing information elsewhere (e.g. some database). On Linux, such an approach is friendly to both libbacktrace and dlsym(3) + dladdr(3) (and of course nm(1) or readelf(1) or gdb(1)), so used in both bismon and RefPerSys projects.
Typing information is practically tied to calling conventions and ABIs. For example, the x86-64 ABI for Linux mandates different processor registers for passing floating points or pointers.
Read the Garbage Collection handbook or at least P. Wilson's Uniprocessor Garbage Collection Techniques survey. You could decide to use tagged integers instead of boxing them, and you could decide to have a conservative GC (e.g. Boehm's GC) instead of a precise one. In my old GCC MELT project I generated C or C++ code for a generational copying GC. Similar techniques are used both in Bismon and in RefPerSys.
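To make the tagged-integer idea concrete, here is a minimal sketch of one common encoding: the low bit of a machine word marks a small integer, so a precise collector can tell integers from pointers without a separate type field. The names value, make_int, int_of, make_ptr, ptr_of and tag_demo are hypothetical. The sketch assumes that heap objects are at least 2-byte aligned, and that converting to intptr_t and right-shifting a negative value behave as they do with GCC (two's complement, arithmetic shift), which ISO C leaves implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t value;   /* hypothetical uniform value representation */

/* Low bit 1: small integer. Low bit 0: pointer (needs >= 2-byte alignment). */
static value make_int(intptr_t n) { return ((uintptr_t)n << 1) | 1u; }
static int is_int(value v)        { return (int)(v & 1u); }
/* Assumes an arithmetic right shift for negative values, as GCC provides. */
static intptr_t int_of(value v)   { return (intptr_t)v >> 1; }
static value make_ptr(void *p)    { return (uintptr_t)p; }
static void *ptr_of(value v)      { return (void *)v; }

/* Returns 1 if encoding and decoding round-trip as expected. */
static int tag_demo(void) {
    static long cell = 42;     /* aligned object standing in for a heap block */
    value a = make_int(-3);
    value b = make_ptr(&cell);
    /* A precise collector would follow only values that are not tagged ints. */
    return is_int(a) && !is_int(b)
        && int_of(a) == -3 && ptr_of(b) == (void *)&cell;
}
```

The price is one bit of integer range and a shift on arithmetic, which is often acceptable on embedded targets compared with boxing every integer.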
Since you are transpiling to C, consider also alternatives, such as libgccjit or LLVM. Look into libjit and asmjit.
Study also the implementation of other transpilers (compilers to C), including Chicken/Scheme and Bigloo.
Can the GCC compiler reorder the variables from how they're declared in the source code?
Of course yes, depending upon the optimizations you are asking for. Some variables won't even exist in the binary (e.g. those staying in registers).
Can I force the compiler to put some data in the method's stack frame (using volatile)?
Better generate a single struct variable containing all your language variables, and leave optimizations to the compiler. You will be surprised (see this draft report).
Can I find the offset accurately when scanning the stack?
This is the most difficult, and depends a lot on compiler optimizations (e.g. whether you run gcc with -O1 or -O3 on the generated C code; in some cases a recent GCC, e.g. GCC 9 or GCC 10 on x86-64 for Linux, is capable of tail-call optimizations; check by compiling using gcc -O3 -S -fverbose-asm then looking into the produced assembler code). If you accept some small target processor and compiler specific tricks, this is doable. Study the implementation of the Ocaml compiler.
Send me (to basile@starynkevitch.net) an email for discussion. Please mention the URL of your question in it.
If you want to have an efficient generational copying GC with multi-threading, things become extremely tricky. The question is then how many years of development can you afford spending.
If you have exceptions in your language, take great care too. You could, with great caution, generate calls to longjmp.
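As a rough picture of what generated exception code based on longjmp might look like, here is a minimal sketch with a single global handler. The names handler, thrown_code, throw_error and try_demo are hypothetical; a real runtime would keep a stack of handlers rather than one global.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf handler;            /* hypothetical single, global handler */
static volatile int thrown_code;   /* payload carried across the jump */

static void throw_error(int code) {
    thrown_code = code;
    longjmp(handler, 1);           /* never returns to the caller */
}

/* Returns 0 on normal completion, or the thrown code otherwise. */
static int try_demo(int fail) {
    if (setjmp(handler) == 0) {    /* the "try" part */
        if (fail)
            throw_error(7);
        return 0;
    }
    return thrown_code;            /* the "catch" part */
}
```

The caution is warranted because longjmp skips every intermediate frame without running any cleanup, so a garbage-collected runtime has to restore its own bookkeeping (for instance a chain of root frames) in the catch branch.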
See of course this answer of mine.
With transpiling techniques, the devil is in the details.
On Linux (specifically!) see also my manydl.c program. It demonstrates that on a Linux x86-64 laptop you could generate, in practice, hundreds of thousands of dlopen(3)-ed plugins. Then read How to write shared libraries.
Study also the implementation of SBCL and of GNU Prolog, at least for inspiration.
PS. The dream of a totally architecture-neutral and operating-system independent transpiler is an illusion.

warning for not using free() malloc

Are there any safeguards built into GCC that check for memory leaks? If so how can I use them? When I compile with "gcc -Wall -o run run.c", the compiler does not seem to care if any allocated heap-space is being freed at the end of the code. I could not find any simple fixes for this on Google.
Thanks much for your time.
EDIT:
Google searches did point to Valgrind among other tools. But I was curious as to why the compiler can't deal with this issue. As a newbie, it seemed a simple enough task to check if every "malloc" has a "free" associated with it.
There are two ways to analyze code for problems - static analysis and run-time analysis. Static analysis reads the code - this is what compilers do really well. Run-time analysis for code problems happens when the code is linked against another set of libraries that see what the code actually does as it runs under surveillance. Finding memory leaks is difficult for static analysis but not for a run-time analysis package.
Other run-time analyses include code coverage: do all parts of your code run? gcov does this, while valgrind and Electric Fence look for memory problems such as leaks.
So, no, there are no really good compiler safeguards for testing memory leaks.
There is the -fsanitize=leak GCC flag.
It overrides malloc/calloc/free to make them count allocated and freed blocks of memory.
If your program is compiled with this flag, it prints information about detected leaks to the terminal after execution.
You can read about it here and here.
Also, I have never used it, so this answer is completely based on GCC manual.
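A small function to try the flag on might look like the sketch below. copy_len and its forget_free parameter are hypothetical names for this illustration. Whether the block is freed depends on a run-time argument, which also hints at why the compiler cannot simply pair every malloc with a free.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copies s, measures the copy, and frees it only if forget_free is 0. */
static size_t copy_len(const char *s, int forget_free) {
    char *copy = malloc(strlen(s) + 1);
    if (!copy)
        return 0;
    strcpy(copy, s);
    size_t n = strlen(copy);
    if (!forget_free)
        free(copy);        /* skipped when forget_free is non-zero: a leak */
    return n;
}
```

Calling copy_len("abc", 1) from main in a program built with something like gcc -g -fsanitize=leak should produce a leak report when the program exits, on platforms where the leak sanitizer is supported. The exact wording of the report varies between versions, and higher optimization levels may remove the allocation altogether.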

Examples showing how switching to a modern C compiler can help discover bugs?

I am preparing a note to convince people that switching from GCC2 to GCC4 (as a C compiler) is a good idea.
In particular, I think it can reveal existing bugs. I would like to give examples, but as a Java programmer my experience of these situations is limited. One example is return type checking, I guess.
What are other convincing examples showing that switching to a modern compiler can help discover bugs that exist in C code?
Well, some gcc options are very useful for discovering bugs:
-finstrument-functions helps to build a function call stack tracer, especially on architectures where the built-in __builtin_return_address() is limited to the current function. A stack tracer, together with the linker's symbol file generated with the -Map linker option, is an indispensable tool for detecting memory leaks (suppose you develop an embedded system on which Valgrind can't be run).
-fstack-protector-all is very useful for detecting code that writes bytes outside the bounds of a stack buffer, so this option detects buffer-overflow bugs.
Errr... just those two options are in mind. Possibly there are more which I don't know ...
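To sketch how -finstrument-functions is typically used: GCC emits calls to two hooks, __cyg_profile_func_enter and __cyg_profile_func_exit, around every instrumented function, and the program supplies their definitions. The hook names and signatures come from the GCC manual; call_depth and the NO_INSTR macro are hypothetical helpers for this example.

```c
#include <assert.h>
#include <stdio.h>

/* The hooks themselves must not be instrumented, or they would recurse. */
#define NO_INSTR __attribute__((no_instrument_function))

static int call_depth = 0;   /* demo-only nesting counter */

/* Called on entry to each function built with -finstrument-functions. */
NO_INSTR void __cyg_profile_func_enter(void *fn, void *call_site) {
    (void)call_site;
    fprintf(stderr, "%*senter %p\n", call_depth * 2, "", fn);
    call_depth++;
}

/* Called on exit from each instrumented function. */
NO_INSTR void __cyg_profile_func_exit(void *fn, void *call_site) {
    (void)call_site;
    call_depth--;
    fprintf(stderr, "%*sexit  %p\n", call_depth * 2, "", fn);
}
```

The printed addresses can then be matched against the linker map file to recover function names, which gives a crude call trace on targets where heavier tools are unavailable.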
I assume these people have a particular piece of code they're using gcc2 with. The best thing to do might be to just take that code and compile it in gcc4 with all possible warnings turned on and compare the difference.
Some other differences between gcc2 and gcc4 are likely to be:
Better compile times (gcc4 is probably faster)
Much better code run times (gcc4 is better at optimizing, and has knowledge of CPU architecture that did not exist when gcc2 came out).
Better warning/error messages
I'm sure there are some interesting new GNU C extensions in gcc4

dlmalloc crash on Win7

For some time now I've been happily using dlmalloc for a cross-platform project (Windows, Mac OS X, Ubuntu). Recently, however, it seems that using dlmalloc leads to a crash-on-exit on Windows 7.
To make sure that it wasn't something goofy in my project, I created a super-minimal test program-- it doesn't do anything but return from main. One version ("malloctest") links to dlmalloc and the other ("regulartest") doesn't. On WinXP, both run fine. On Windows 7, malloctest crashes. You can see screencasts of the tests here.
My question is: why is this happening? Is it a bug in dlmalloc? Or has the loader in Windows 7 changed? Is there a workaround?
fyi, here is the test code (test.cpp):
#include <stdio.h>
int main() {
    return 0;
}
and here is the nmake makefile:
all: regulartest.exe malloctest.exe
malloctest.exe: malloc.obj test.obj
    link /out:$@ $**
regulartest.exe: test.obj
    link /out:$@ $**
clean:
    del *.exe *.obj
For brevity, I won't include the dlmalloc source in this post, but you can get it (v2.8.4) here.
Edit: See these other relevant SO posts:
Is there a way to redefine malloc at link time on Windows?
Globally override malloc in visual c++
Looks like a bug in the C runtime. Using Visual Studio 2008 on Windows 7, I reproduced the same problem. After some quick debugging by putting breakpoints in dlmalloc and dlfree, I saw that dlfree was getting called with an address that it never returned earlier from dlmalloc, and then it was hitting an access violation shortly thereafter.
Thankfully, the C runtime's source code is distributed along with VS, so I could see that this call to free was coming from the __endstdio function in _file.c. The corresponding allocation was in __initstdio, and it was calling _calloc_crt to allocate its memory. _calloc_crt calls _calloc_impl, which calls HeapAlloc to get memory. _malloc_crt (used elsewhere in the C runtime, such as to allocate memory for the environment and for argv), on the other hand, calls straight to malloc, and _free_crt calls straight to free.
So, for the memory that gets allocated with _malloc_crt and freed with _free_crt, everything is fine and dandy. But for the memory that gets allocated with _calloc_crt and freed with _free_crt, bad things happen.
I don't know if replacing malloc like this is supported -- if it is, then this is a bug with the CRT. If not, I'd suggest looking into a different C runtime (e.g. MinGW or Cygwin GCC).
Using dlmalloc in cross-platform code is an oxymoron. Replacing any standard C functions (especially malloc and family) results in undefined behavior. The closest thing to a portable way to replace malloc is using search-and-replace (not #define; that's also UB) on the source files to call (for example) my_malloc instead of malloc. Note that internal C library functions will still use their standard malloc, so if the two conflict, things will still blow up. Basically, trying to replace malloc is just really misguided. If your system really has a broken malloc implementation (too slow, too much fragmentation, etc.) then you need to do your replacement in an implementation-specific way, and disable the replacement on all systems except ones where you've carefully checked that your implementation-specific replacement works correctly.
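The "call my_malloc instead of malloc" approach could be sketched as below. my_malloc, my_free, my_outstanding and wrapper_demo are hypothetical names; here the wrappers simply forward to the platform allocator, whereas a project using dlmalloc would presumably forward to its prefixed entry points instead.

```c
#include <assert.h>
#include <stdlib.h>

static long my_outstanding = 0;   /* blocks handed out and not yet returned */

/* Project-specific allocation entry point; application code calls this
   explicitly instead of trying to replace the C library's malloc. */
static void *my_malloc(size_t n) {
    void *p = malloc(n);          /* forward to the platform allocator */
    if (p)
        my_outstanding++;
    return p;
}

static void my_free(void *p) {
    if (p)
        my_outstanding--;
    free(p);
}

/* Returns the number of blocks still outstanding (0 if balanced). */
static long wrapper_demo(void) {
    void *a = my_malloc(16);
    void *b = my_malloc(32);
    my_free(a);
    my_free(b);
    return my_outstanding;
}
```

Because the C runtime keeps using its own malloc and free internally, memory must never cross between the two families: a block obtained from my_malloc goes back through my_free only.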

Resources