Garbage collection libraries - C

Are there any "good" C libraries for garbage collection?
I know about the Boehm GC, is it maintained nowadays?
What about http://tinygc.sourceforge.net?
What are your experiences with these libraries?

You could use Boehm's Garbage Collector. Many projects I have worked with use it.
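For reference, a minimal sketch of how the collector is typically used, assuming libgc is installed with its <gc.h> header and the program is linked with -lgc:

#include <gc.h>
#include <stdio.h>

int main(void)
{
    GC_INIT();                              /* initialize the collector */

    for (int i = 0; i < 1000000; i++) {
        /* GC_MALLOC replaces malloc(); no matching free() is needed,
           unreachable blocks are reclaimed automatically. */
        char *s = GC_MALLOC(64);
        snprintf(s, 64, "block %d", i);     /* becomes garbage next iteration */
    }

    printf("collector heap size: %lu bytes\n",
           (unsigned long)GC_get_heap_size());
    return 0;
}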

Although I do not know of a good, easy, and effective GC for C, I would like to weigh in some thoughts.
For many years, I avoided heap allocations for fear of memory leaks. And I do not know of any way to have a memory leak without allocating from the heap.
But I do not know of any better way to return memory from a function (such as a string, or an array of structs) than by heap allocation. Heap allocation allows you to allocate precisely the amount of memory you need without knowing how much memory you need at compile time. If your program will load files to memory, you probably will not know at compile time how much memory you might need. If you go the static memory route, then you always have to allocate the maximum memory you might ever need. Then your program can become a memory hog when it does not need to be. Heap is better.
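To make that concrete, here is a minimal sketch of a function that returns a heap-allocated buffer whose size is only known at run time (error handling kept deliberately small):

#include <stdio.h>
#include <stdlib.h>

/* Reads a whole file into a freshly malloc'ed, NUL-terminated buffer.
   The caller owns the buffer and must free() it eventually. */
char *read_file(const char *path, size_t *out_len)
{
    FILE *f = fopen(path, "rb");
    if (!f) return NULL;

    fseek(f, 0, SEEK_END);
    long len = ftell(f);                    /* size discovered at run time */
    if (len < 0) { fclose(f); return NULL; }
    rewind(f);

    char *buf = malloc((size_t)len + 1);    /* exactly as much as needed */
    if (buf) {
        size_t got = fread(buf, 1, (size_t)len, f);
        buf[got] = '\0';
        if (out_len) *out_len = got;
    }
    fclose(f);
    return buf;
}

With static storage you would instead have to reserve space for the largest file you ever expect, which is exactly the memory-hog problem described above.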
But then keeping track of heap allocations can be difficult in some kinds of programs, for example, a program that has many modules inserting and deleting elements from an in-memory database. And memory leaks can be so difficult to solve. A good GC seems to me a very good way to stop memory leaks because of the automated tracking and freeing of heap allocations.
So, this post responds to the comment that "If you have problems in C using malloc() and free() correctly, you should switch to another language." If the programming problem is complicated and the solution needs heap allocations, then even the best programmers will have to debug memory leaks.
In some programs, there could be an alternative to using garbage collection: you can assume that the operating system will free all of a program's allocations when the program terminates. So your main program might be able to call a second program that uses heap allocations, writes its results to a file, and terminates. I tested this on Windows: Windows reclaimed the heap memory my test program had allocated as soon as the test program terminated. I ran the test program many times, and there was no diminishing of available memory.
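A hedged sketch of that pattern; the worker executable name and the result file below are made up for illustration:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* The worker may malloc() freely and never call free(); the OS
       reclaims its whole address space when it exits. */
    if (system("./worker input.dat result.txt") != 0) {
        fprintf(stderr, "worker failed\n");
        return 1;
    }

    FILE *f = fopen("result.txt", "r");
    if (!f) return 1;
    /* ... read the worker's results ... */
    fclose(f);
    return 0;
}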
Of course, having a second program that runs a short while and terminates is often not a viable solution.

Related

Free() before return 0;

What happens if I end execution with return 0; after using malloc without freeing the allocated memory?
int *var;
var = (int *)malloc(sizeof(int));
free(var);
return 0;
The program contains a memory leak, as explained here. To answer the question specifically, the effect of the memory leak depends on the environment; in the best case, nothing happens, in the worst case, the machine might crash sooner or later. The existence of the memory leak is to be regarded as a bug in any case.
It is implementation specific.
On most operating systems (notably desktop or server OSes, e.g. Linux, MacOSX, Windows...), once a process is terminated, every resource it used is released, and that includes its virtual address space. Hence even unfreed heap memory is released.
In particular, if you code a quickly-running program (e.g. you know that it will always run in less than a few seconds), it might be easier to accept some memory leaks (but if you do, please comment on that in your code and/or documentation). Many real-life programs do that (in particular the GCC compiler, and probably several Unix shells).
On the contrary, if you are coding a server (e.g. a database or compute server), you should avoid any memory leaks, which would make the server process's RSS grow indefinitely (until some crash). You usually should take great care that all heap memory allocated to handle one request gets freed after replying to that request.
On some embedded operating systems, if you don't release all the resources explicitly (free all heap allocated memory, fclose all opened streams) and properly, you have a resource leak.
See also this related question. On many OSes (including Linux) you can use valgrind to hunt memory leak bugs. With a recent GCC you might use debugging options like -g -fsanitize=address.
Read also (at least for the concepts and the terminology) something about garbage collection and about fragmentation. If programming in C, you might consider using Boehm's garbage collector.
There is however a very good practical reason to systematically free all previously malloc-ed memory: it is a good programming discipline, and it makes tools like valgrind, which help debug genuine memory bugs, much more useful. It also keeps your code cleaner, and you can reuse it in different contexts (e.g. as part of some library, usable in long-running processes).
If you don't free the memory, your application will have a memory leak. So if you don't free the memory and leave the application running for a couple of days, it will start to slow down and eventually crash.

Do we really have to free() when we malloc()? What makes it different from an automatic variable then?

The OS will just recover it (after the program exits) right? So what's the use other than good programming style? Or is there something I'm misunderstanding? What makes it different from "automatic" allocation since both can be changed during run time, and both end after program execution?
When your application is working with vast amounts of data, you must free in order to conserve heap space. If you don't, several bad things can happen:
the OS will stop allocating memory for you (crashing)
the OS will start swapping your data to disk (thrashing)
other applications will have less space to put their data
The fact that the OS collects all the space you allocate when the application exits does not mean you should rely upon this to write a solid application. This would be like trying to rely on the compiler to optimize poor programming. Memory management is crucial for good performance, scalability, and reliability.
As others have mentioned, malloc allocates space in the heap, while auto variables are created on the stack. There are uses for both, but they are indeed very different. Heap space must be allocated and managed by the OS and can store data dynamically and of different sizes.
If you call malloc() a thousand times without calling free(), the system will hand you a thousand different addresses; but if you call free() after each malloc(), the same memory can be reused every time.
So the chances of a memory leak, bus error, out-of-bounds access, or crash are minimized.
It's safe to use free().
In C/C++ "auto" variables are allocated on the stack. They are destroyed automatically when execution leaves their scope, at the latest when the function returns. You do not need to write anything for this.
Heap allocations (result of a call to malloc) are either released explicitly (with a call to free) or they are cleaned up when the process ends.
If you are writing a small program that will be used maybe once or twice, then it is OK not to free your heap allocations. This is not nice, but acceptable.
If you are writing a medium or big project, or are planning to include your code in another project, you should definitely release every heap allocation. Not doing so will create HUGE trouble. The heap memory is not endless; a program may use it all. Even if you only allocate small amounts of memory, this still creates unneeded pressure on the OS, causes swapping, etc.
The bottom line: freeing allocations is much more than just a style or a good habit.
An automatic variable is destroyed (and its memory is re-usable) as soon as you exit the scope in which it is defined. For most variables that's much earlier than program exit.
If you malloc and don't free, then the memory isn't re-usable until the program exits. Not even then, on some systems with very minimal OS.
So yes, there's a big difference between an automatic variable and a leaked memory allocation. Call a function that leaks an allocation enough times, and you'll run out of memory. Call a function with an automatic variable in it as many times as you like; the memory is re-usable.
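A small sketch contrasting the two cases (the loop count is only there to make the growth visible):

#include <stdlib.h>

void leaky(void)
{
    int *p = malloc(1000 * sizeof *p);  /* ~4 KB from the heap, never freed */
    (void)p;                            /* leaked: stays allocated until exit */
}

void automatic(void)
{
    int buf[1000];                      /* stack space, reclaimed on return */
    (void)buf;
}

int main(void)
{
    for (int i = 0; i < 100000; i++) {
        leaky();        /* process memory grows by roughly 400 MB overall */
        automatic();    /* contributes nothing once each call returns */
    }
    return 0;
}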
It is good programming style and it's more than that. Not doing proper memory management in non-trivial programs will eventually influence the usability of your program. Sure the OS can reclaim any resources that you've allocated/used after your program terminates, but that doesn't alleviate the burden or potential issues during program execution.
Consider the web browser that you've used to post this question: if the browser is written in a language that requires memory management, and the code didn't do it properly, how long do you think it would be before you'd notice that it's eating up all your memory? How long do you think the browser would remain usable? Now consider that users often leave browsers open for long periods of time: without proper memory management, they would become unusable after a few page loads.
If your program does not exit immediately and you're not freeing your memory you're going to end up wasting it. Either you'll run out of memory eventually, or you'll start swapping to disk (which is slow, and also not unlimited).
An automatic variable lives on the stack, and its size must be known at compile time. If you need to store data whose size you don't know in advance, for example a binary tree where the user adds and removes objects, you need the heap. Besides that, stack size may be limited (it depends on your target); in the Linux kernel, for example, the stack is usually 4k-8k. Large stack usage can also thrash the cache, which affects performance.
Yes, you absolutely have to use free() after malloc() (as well as closing files and other resources when you're done). While it's true that the OS will recover it after execution, a long-running process will leak memory that way. If your program is as simple as a main method that runs a single method and then exits, it's probably not a big deal, albeit incredibly sloppy. You should get in the habit of managing memory properly in C, because one day you may want to write a nontrivial program that runs for more than a second, and if you don't learn how to do it in advance, you'll have a huge headache dealing with memory leaks.

Linux heap - is doing a ton of new/deletes okay or does the heap become badly fragmented?

I'm not familiar with how the Linux heap is allocated.
I'm calling malloc()/free() many many times a second, always with the same sizes (there are about 10 structs, each fixed size). Aside from init time, none of my memory remains allocated for long periods of time.
Is this considered poor form with the standard heaps? (I'm sure someone will ask 'what heap are you using?' - 'Ugh. The standard static heap' ..meaning I'm unsure.)
Should I instead use a free list, or does the heap tolerate lots of allocations of the same size? I'm trying to balance readability with performance.
Any tools to help me measure?
First of all, unless you have measured a problem with memory usage blowing up, don't even think about using a custom allocator. It's one of the worst forms of premature optimization.
At the same time, even if you do have a problem, a better solution than a custom allocator would be figuring out why you're allocating and freeing objects so much, and fixing the design issue that's causing it.
To address your specific question, glibc's allocator is based on the dlmalloc algorithm, which is near-optimal when it comes to fragmentation. The only way you'll get it to badly fragment memory is the unavoidable way: by allocating objects with radically different lifetimes in alternation, e.g. allocating a large number of objects but only freeing every other one. I think you'll have a hard time working out an allocation pattern that will give worse total memory usage than pools...
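That said, if measurement ever did show the allocator dominating, the free list the question mentions is simple to sketch; the node type and payload size below are made up for illustration:

#include <stdlib.h>

typedef struct node {
    struct node *next;      /* doubles as the free-list link while unused */
    char payload[56];       /* whatever the real fixed-size struct holds */
} node_t;

static node_t *free_list = NULL;

static node_t *node_alloc(void)
{
    if (free_list) {                    /* reuse a node freed earlier */
        node_t *n = free_list;
        free_list = n->next;
        return n;
    }
    return malloc(sizeof(node_t));      /* otherwise fall back to the heap */
}

static void node_free(node_t *n)
{
    n->next = free_list;                /* push the node back onto the list */
    free_list = n;
}

int main(void)
{
    node_t *a = node_alloc();           /* first call goes through malloc */
    node_free(a);
    node_t *b = node_alloc();           /* second call reuses the same node */
    node_free(b);
    return a == b ? 0 : 1;              /* exits 0: the node was reused */
}

Nodes are never returned to the system here; in steady state the list simply recycles them, which is the whole point.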
Valgrind has a special tool Massif for measuring memory usage. This should help to profile heap allocations.
I think the best performance optimization is to avoid heap allocation wherever it is possible (and reasonable). Whenever an object is stack allocated the compiler will just move the stack pointer up instead of trying to find a free spot or return the allocated memory to some free list.
How is the lifetime of your structures defined? If you can express your object lifetime by scoping then this would indeed increase performance.

C - Design your own free() function

Today, I appeared for an interview and the interviewer asked me this,
Tell me the steps of how you would design your own free() function to deallocate allocated memory.
How can it be more efficient than C's default free() function? What can you conclude?
I was confused, couldn't think of the way to design.
What do you think guys ?
EDIT: Since we need to know how malloc() works, can you tell me the steps to write our own malloc() function?
That's actually a pretty vague question, and that's probably why you got confused. Does he mean, given an existing malloc implementation, how would you go about trying to develop a more efficient way to free the underlying memory? Or was he expecting you to start discussing different kinds of malloc implementations and their benefits and problems? Did he expect you to know how virtual memory functions on the x86 architecture?
Also, by more efficient, does he mean more space efficient or more time efficient? Does free() have to be deterministic? Does it have to return as much memory to the OS as possible because it's in a low-memory, multi-tasking environment? What's our criteria here?
It's hard to say where to start with a vague question like that, other than to start asking your own questions to get clarification. After all, in order to design your own free function, you first have to know how malloc is implemented. So chances are, the question was really about whether or not you knew anything about how malloc can be implemented.
If you're not familiar with the internals of memory management, the easiest way to get started with understanding how malloc is implemented is to first write your own.
Check out this IBM DeveloperWorks article called "Inside Memory Management" for starters.
But before you can write your own malloc/free, you first need memory to allocate/free. Unfortunately, in a protected mode OS, you can't directly address the memory on the machine. So how do you get it?
You ask the OS for it. With the virtual memory features of the x86, any piece of RAM or swap memory can be mapped to a memory address by the OS. What your program sees as memory could be physically fragmented throughout the entire system, but thanks to the kernel's virtual memory manager, it all looks the same.
The kernel usually provides system calls that allow you to map in additional memory for your process. On older UNIX OS's this was usually brk/sbrk to grow heap memory onto the edge of your process or shrink it off, but a lot of systems also provide mmap/munmap to simply map a large block of heap memory in. It's only once you have access to a large, contiguous looking block of memory that you need malloc/free to manage it.
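For example, on a Unix-like system the heap segment can be grown with the historical sbrk() call; this tiny sketch only shows where the raw memory comes from, not how to manage it:

#define _DEFAULT_SOURCE         /* expose sbrk() in glibc */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    void *old_brk = sbrk(0);            /* current end of the heap segment */
    void *block   = sbrk(4096);         /* ask the kernel for 4 KiB more */
    if (block == (void *)-1) {
        perror("sbrk");
        return 1;
    }
    printf("heap grew from %p to %p\n", old_brk, sbrk(0));
    /* 'block' now points at 4096 bytes a malloc implementation could
       split into chunks and hand out. */
    return 0;
}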
Once your process has some heap memory available to it, it's all about splitting it into chunks, with each chunk containing its own meta information about its size and position and whether or not it's allocated, and then managing those chunks. A simple list of structs, each containing some fields for meta information and a large array of bytes, could work, in which case malloc has to run through the list until it finds a large enough unallocated chunk (or chunks it can combine), and then map in more memory if it can't find a big enough chunk. Once you find a chunk, you just return a pointer to the data. free() can then use that pointer to step back a few bytes to the member fields that exist in the structure, which it can then modify (i.e. marking chunk.allocated = false;). If there are enough unallocated chunks at the end of your list, you can even remove them from the list and unmap or shrink that memory off your process's heap.
That's a real simple method of implementing malloc though. As you can imagine, there's a lot of possible ways of splitting your memory into chunks and then managing those chunks. There's as many ways as there are data structures and algorithms. They're all designed for different purposes too, like limiting fragmentation due to small, allocated chunks mixed with small, unallocated chunks, or ensuring that malloc and free run fast (or sometimes even more slowly, but predictably slowly). There's dlmalloc, ptmalloc, jemalloc, Hoard's malloc, and many more out there, and many of them are quite small and succinct, so don't be afraid to read them. If I remember correctly, "The C Programming Language" by Kernighan and Ritchie even uses a simple malloc implementation as one of their examples.
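To make the header-before-the-data scheme concrete, here is a deliberately tiny, non-production sketch: a first-fit allocator over a static arena, with no chunk splitting, coalescing, or thread safety (real allocators such as dlmalloc handle all of that):

#include <stddef.h>

typedef union header {
    struct {
        size_t size;    /* bytes in the data area that follows */
        int    in_use;  /* nonzero while allocated */
    } s;
    max_align_t align;  /* forces suitable alignment for the data area */
} header_t;

#define ARENA_UNITS 65536               /* arena capacity, in header-sized units */
static header_t arena[ARENA_UNITS];
static size_t arena_used = 0;           /* units handed out so far */

void *my_malloc(size_t n)
{
    /* round the request up to a whole number of header-sized units */
    size_t units = (n + sizeof(header_t) - 1) / sizeof(header_t);

    /* first-fit search over chunks created earlier */
    size_t i = 0;
    while (i < arena_used) {
        header_t *h = &arena[i];
        size_t h_units = h->s.size / sizeof(header_t);
        if (!h->s.in_use && h_units >= units) {
            h->s.in_use = 1;
            return h + 1;               /* data begins right after the header */
        }
        i += 1 + h_units;
    }

    /* nothing reusable: carve a fresh chunk off the end of the arena */
    if (arena_used + 1 + units > ARENA_UNITS)
        return NULL;
    header_t *h = &arena[arena_used];
    h->s.size = units * sizeof(header_t);
    h->s.in_use = 1;
    arena_used += 1 + units;
    return h + 1;
}

void my_free(void *p)
{
    if (!p) return;
    header_t *h = (header_t *)p - 1;    /* step back to the chunk header */
    h->s.in_use = 0;                    /* the chunk can now be reused */
}

int main(void)
{
    char *a = my_malloc(100);
    my_free(a);
    char *b = my_malloc(50);            /* reuses the chunk a occupied */
    return a == b ? 0 : 1;
}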
You can't blindly design free() without knowing how malloc() works under the hood because your implementation of free() would need to know how to manipulate the bookkeeping data and that's impossible without knowing how malloc() is implemented.
So an answerable question could be how you would design malloc() and free() instead, which is not a trivial question, but you could answer it partially, for example by proposing some very simple implementation of a memory pool. It would not be equivalent to malloc() of course, but it would demonstrate that you know what is involved.
One common approach when you only have access to user space (generally known as memory pool) is to get a large chunk of memory from the OS on application start-up. Your malloc needs to check which areas of the right size of that pool are still free (through some data structure) and hand out pointers to that memory. Your free needs to mark the memory as free again in the data structure and possibly needs to check for fragmentation of the pool.
The benefits are that you can do allocation in nearly constant time, the drawback is that your application consumes more memory than actually is needed.
Tell me the steps of how you would design your own free() function to deallocate allocated memory.
#include <stdlib.h>
#undef free
#define free(X) my_free(X)
inline void my_free(void *ptr) { }
How can it be more efficient than C's default free() function?
It is extremely fast, requiring zero machine cycles. It also makes use-after-free bugs go away. It's a very useful free function for use in programs which are instantiated as short-lived batch processes; it can usefully be deployed in some production situations.
What can you conclude?
I really want this job, but in another company.
Memory usage patterns could be a factor. A default implementation of free can't assume anything about how often you allocate/deallocate and what sizes you allocate when you do.
For example, if you frequently allocate and deallocate objects that are of similar size, you could gain speed, memory efficiency, and reduced fragmentation by using a memory pool.
EDIT: as sharptooth noted, it only makes sense to design free and malloc together. So the first thing would be to figure out how malloc is implemented.
malloc and free only have a meaning if your app is to work on top of an OS. If you would like to write your own memory management functions, you would have to know how to request memory from that specific OS, or you could reserve heap memory right away using the existing malloc and then use your own functions to distribute/redistribute the allocated memory throughout your app.
There is an architecture that malloc and free are supposed to adhere to -- essentially a class architecture permitting different strategies to coexist. Then the version of free that is executed corresponds to the version of malloc used.
However, I'm not sure how often this architecture is observed.
The knowledge of how malloc() works is necessary to implement free(). You can find an implementation of malloc() and free() using the sbrk() system call in K&R, The C Programming Language, Chapter 8, Section 8.7, "Example - A Storage Allocator", pp. 185-189.

Linux mechanism to measure process memory consumption

What is the most efficient and accurate way/API to measure heap memory consumption from within the same running process, programmatically? I want to estimate (as accurately as is reasonably possible) how much memory has been allocated with new or malloc since startup, minus the memory that has been released with free or delete.
The scope of the question is Linux, and possibly other similar environments. The language is either C or C++.
EDIT
It is enough for my purposes to know the actual number (and size) of blocks allocated and still held by any malloc implementation; I don't need the detail of total malloc'ed memory minus the freed memory.
Assuming new uses malloc, look here.
For more details of a process's memory allocation, look at /proc/[pid]/maps.
Also note that linux implements copy-on-write. This means that sometimes processes can share memory. This is especially true if the process was forked without calling exec afterwards.
You can use mallinfo for an estimation. I've just found this; not sure whether it reports per-process or system-wide numbers... :/
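For reference, mallinfo is declared in glibc's <malloc.h>; a quick hedged sketch (note the counters are plain ints and can wrap on very large heaps, which is why newer glibc also offers mallinfo2):

#include <malloc.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    void *p = malloc(64 * 1024);        /* 64 KiB, small enough to come from the main arena */

    struct mallinfo mi = mallinfo();
    printf("total allocated space (uordblks): %d bytes\n", mi.uordblks);
    printf("total free space      (fordblks): %d bytes\n", mi.fordblks);

    free(p);
    return 0;
}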
I'm not totally sure what you are asking. Malloc'ed minus freed is less than the actual usage because of memory fragmentation. If you really need that number, you have to use custom allocators (which can be tiny wrappers around the existing ones) everywhere in your code, which is going to be painful.
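One hedged sketch of such a wrapper, keeping a running count by storing each block's size in a small header (function names are made up):

#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Header stored in front of every block so counted_free() knows its size;
   the union keeps the user's data suitably aligned. */
typedef union {
    size_t size;
    max_align_t align;
} hdr_t;

static size_t bytes_live = 0;           /* bytes currently allocated via these wrappers */

void *counted_malloc(size_t n)
{
    hdr_t *h = malloc(sizeof(hdr_t) + n);
    if (!h) return NULL;
    h->size = n;
    bytes_live += n;
    return h + 1;                       /* hand the caller the area after the header */
}

void counted_free(void *p)
{
    if (!p) return;
    hdr_t *h = (hdr_t *)p - 1;          /* step back to the header */
    bytes_live -= h->size;
    free(h);
}

int main(void)
{
    void *a = counted_malloc(1024);
    void *b = counted_malloc(4096);
    counted_free(a);
    printf("live heap bytes (by our count): %zu\n", bytes_live);   /* prints 4096 */
    counted_free(b);
    return 0;
}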
Have you considered reading from /proc/u/stat? (where "u" is your pid)
If you use valgrind and run your program to completion, it gives you a report on memory usage.
http://valgrind.org/
You can get quite a bit of information about your heap usage by linking against tcmalloc from Google Perftools. It is designed to locate memory leaks and determine "who the heck allocated all that RAM", but it provides enough instrumentation to answer most questions you might have about your heap.
