Pb in using nedflush for memory leaks - c

I try to use the debug functionalities of nedmalloc to find potential memory leaks in my code. So I activate the flags ENABLE_LOGGING and NEDMALLOC_TESTLOGENTRY.
in my program, I only use the system memory pool. At the very end of my program, I call the function neddestroysyspool in order to flush all memory events.
First of all, I don't manage to activate the stack trace functionality. When I change this depth, the program crashes after a few allocations. In order to compile Under VS2010, I had to define DeinitSym myself with a call to CloseHandle; I hope I'm doing right ... but it does not work properly. So I don't use it.
So I just parse the file nedmalloc.csv: I sort it thanks to addresses, sum allocated sizes and substract freed ones wrt the address. For an unknown reason, for several big chunks (size>400kb), the size given at allocation is right but the size given at free is different, above the allocated size. For example, I allocated a block of 840352 bytes, but when freed, the recorded size was 851932 bytes. Is is normal?
Does anyone has some answer(s) or hint(s) for this problem?

Firstly, I really wouldn't use nedmalloc's logging facilities for memory leak detection. valgrind is a vastly superior way of finding resource leaks. I even have a hack there in nedmalloc to have it use the system allocator instead of dlmalloc precisely in order to avail of valgrind.
Secondly, yes the stack backtracing code can be a little brittle. If I remember rightly I did something naughty to get the performance considerably higher at the expense of it not quite working well. I'll be blunt in saying almost no one but me ever used that code path, so I never had much reason to debug it properly. It worked well enough for my purposes.
Thirdly for larger blocks, dlmalloc will round up allocations including their bookkeeping to a 64Kb multiple, so it is expected that a 12 chunk allocation would be turned into a 13 chunk allocation.
Lastly, I as the author of nedmalloc would recommend you not use nedmalloc on Windows 7 or better, or any Linux with a 3.x kernel, or any Mac OS X produced in the past three years. Why? The system allocator is probably good enough and as I personally haven't needed nedmalloc for some years, I'll freely admit the code is bit rotting.
Hope that helps.
Niall

Related

System malloc vs DLMalloc on large malloc

I haven't coded in a while, so excuse me upfront. I have this odd problem. I am trying to malloc 8GB in one go and I plan to manage that heap with TLSF later on. That is, I want to avoid mallocing throughout my application at all, just get one big glob at the beginning and freeing it in the end. Here is the peculiarity though. I was always using dlmalloc until now in my programs. Linking it in and everything went well. However, now when I try to malloc 8GB at once and link in dlmalloc to use it I get segmentation fault 11 on OSX when I run it, without dlmalloc everything goes well. Doesn't matter if I use either gcc or clang. System doesn't have 8GB of RAM though, but it has 4GB. Interestingly enough same thing happens on Windows machine which has 32GB of RAM and Ubuntu one that has 16GB of RAM. With system malloc it all works, allocation goes through and simple iteration through allocated memory works as expected on all three systems. But, when I link in dlmalloc it fails. Tried it both with malloc and dlmalloc function calls.
Allocation itself is nothing extraordinary, plain c99.
[...]
size_t bytes = 1024LL*1024LL*1024LL*8LL;
unsigned long *m = (unsigned long*)malloc(bytes);
[...]
I'm confused by several things here. How come system malloc gives me 8GB malloc even without system having 4GB or RAM, are those virtual pages? Why dlmalloc doesn't do the same? I am aware there might not be a continuos block of 8GB of RAM to allocate, but why segmentation fault then, why not a null ptr?
Is there a viable robust (hopefully platform neutral) solution to get that amount of RAM in one go from malloc even if I'm not sure system will have that much RAM?
edit: program is 64-bit as are OS' which I'm running on.
edit2:
So I played with it some more. Turns out if I break down allocation into 1GB chunks, that is 8 separate mallocs, then it works with dlmalloc. So it seems to be an issue with contiguous range allocation where dlmalloc probably tries to allocate only if there is a contiguous block. This makes my question then even harder to formulate. Is there a somewhat sure way to get that size of a memory chunk with or without dlmalloc across platforms, and not have it fail if there is no physical memory left (can be in swap, as long as it doesn't fail). Also would it be possible in a cross platform manner to tell if malloc is in ram or swap.
I will give you just a bit of perspective, if not an outright answer. When I see you attempting to allocate 8GB of contiguous RAM, I cringe. Yes, with 64-bit computing and all, that is probably "legal", but on a normal machine, you are probably going to run into a lot of edge cases, 32-bit legacy code choking on a 64-bit size, and just plain usability issues getting a chunk of memory big enough to make this work. If you want to try this sort of thing, perhaps attempt to malloc the single chunk, then if that fails, use smaller chunks. This somewhat defeats the purpose of a 1 chunk system though. Perhaps there is some sort of "page size" in the OS that you could link your malloc size to - in order to help performance and just plain ability to get memory in the amount you wish.
On game consoles, this approach to memory management is somewhat common - allocate 1 buffer from the OS at bootup as big as possible, then place your own memory manager on there to avoid OS overhead and possible inferior allocation code. It also allows one to better control memory fragmentation on such systems where virtual memory doesn't exist. But on these systems, you also know up front exactly how much RAM you have.
Is there a way to see if memory is physical or virtual in a platform independent way? I don't think so, but perhaps someone else can give a good answer to that and I'll edit this part away.
So not a 100% answer, but some random thoughts to help out and my internally wondering what you are doing that wants 8GB of RAM in one chunk when it sounds like multiple chunks will work fine. :)

How to keep track of the memory usage in C?

I have to do a project in C where I have to constantly allocate memory for big data structures and then free it. Does there exista a library with a function that helps to keep track of the memory usage so I can be sure if I am doing things correctly? (I'm new to C)
For example, a function that returns:
A) The total of memory used by the program at the moment, OR
B) The total of memory left,
would do the job. I already googled for that and searched in other answers.
Thanks!
Try tcmalloc: you are looking for a heap profiler, although valgrind might be more useful initially.
If you're worried about memory leaks, valgrind is probably what you need. On the other hand, if you're more concerned just with whether you're data structures are using excessive memory, you might just use the common mallinfo function included as an extension to malloc in many unix standard libraries including glibc on Linux.
Although some people excoriate it, the book "Writing Solid Code" by Steve Maguire has a lot of reasonable ideas about how to track your memory usage without modifying the system memory allocation functions. Basically, instead of calling the raw malloc() etc functions directly, you call your own memory allocation API built on top of the standard one. Your API can track allocations and frees, detect double frees, frees of non-allocated memory, unreleased (leaked) memory, complete dumps of what is allocated, etc. You either need to crib the code from the book or write your own equivalent code. One interesting problem is providing a stack trace for each allocation; there isn't a standard way to determine the call stack. (The book is a bit dated now; it was written just a few years after the C89 standard was published and does not exploit const qualifiers.)
Some will argue that these services can be provided by the system malloc(); indeed, they can, and these days often are. You should look carefully at the manual provided for your version of malloc(), and decide whether it provides enough for you. If not, then the wrapper API mechanism is reasonable. Note that using your own API means you track what you explicitly allocate, while leaving library functions not written to use your API using the system services - as, indeed, does your code, under the covers.
You should also look into valgrind. It does a superb job tracking memory abuses, and in particular will report leaked memory (memory that was allocated but not freed). It also spots when you read or write outside the bounds of an allocated space, spotting buffer overflows.
Nevertheless, ultimately, you need to be disciplined in the way you write your code, ensuring that every time you allocate memory, you know when it will be released.
Every time you allocate/free memory, you could log how big your data structure is.

linux mechanism to measure process memory consumption f

what is the most efficient and accurate way/ API to measure heap memory consumption from the same running process programmatically? I want to estimate (as accurately as is reasonably possible) how much memory has been new or malloc since startup, minus the memory that has been free or delete
the scope of the question is linux and possibly other linux environments. The language is either C or C++
EDIT
It is enough for my purposes to know the actual number (and size) of allocated/held blocks by any malloc implementation, i don't need the detail of actual malloc memory minus the the freed memory
Assuming new uses malloc look here.
For more details of a processes memory allocation, look at the /proc/[pid]/maps.
Also note that linux implements copy-on-write. This means that sometimes processes can share memory. This is especially true if the process was forked without calling exec afterwards.
You can use mallinfo for an estimation. I've just found this, not sure whether this is process or system.. :/
I'm not totally sure what you are asking, malloc minus freed is less than the actual usage because of the memory fragmentation, if you really need that number you have to use custom allocators (which are tiny wrappers around existing ones) everywhere in your code which is going to be painful.
Have you considered reading from /proc/u/stat? (where "u" is your pid)
If you use valgrind and run your program to completion, it gives you a report on memory usage.
http://valgrind.org/
You can get quite a bit of information about your heap usage by linking against tcmalloc from Google Perftools. It is designed to locate memory leaks and determine "who the heck allocated all that RAM", but it provides enough instrumentation to answer most questions you might have about your heap.

What are alternatives to malloc() in C?

I am writing C for an MPC 555 board and need to figure out how to allocate dynamic memory without using malloc.
Typically malloc() is implemented on Unix using sbrk() or mmap(). (If you use the latter, you want to use the MAP_ANON flag.)
If you're targetting Windows, VirtualAlloc may help. (More or less functionally equivalent to anonymous mmap().)
Update: Didn't realize you weren't running under a full OS, I somehow got the impression instead that this might be a homework assignment running on top of a Unix system or something...
If you are doing embedded work and you don't have a malloc(), I think you should find some memory range that it's OK for you to write on, and write your own malloc(). Or take someone else's.
Pretty much the standard one that everybody borrows from was written by Doug Lea at SUNY Oswego. For example glibc's malloc is based on this. See: malloc.c, malloc.h.
You might want to check out Ralph Hempel's Embedded Memory Manager.
If your runtime doesn't support malloc, you can find an open source malloc and tweak it to manage a chunk of memory yourself.
malloc() is an abstraction that is use to allow C programs to allocate memory without having to understand details about how memory is actually allocated from the operating system. If you can't use malloc, then you have no choice other than to use whatever facilities for memory allocation that are provided by your operating system.
If you have no operating system, then you must have full control over the layout of memory. At that point for simple systems the easiest solution is to just make everything static and/or global, for more complex systems, you will want to reserve some portion of memory for a heap allocator and then write (or borrow) some code that use that memory to implement malloc.
An answer really depends on why you might need to dynamically allocate memory. What is the system doing that it needs to allocate memory yet cannot use a static buffer? The answer to that question will guide your requirements in managing memory. From there, you can determine which data structure you want to use to manage your memory.
For example, a friend of mine wrote a thing like a video game, which rendered video in scan-lines to the screen. That team determined that memory would be allocated for each scan-line, but there was a specific limit to how many bytes that could be for any given scene. After rendering each scan-line, all the temporary objects allocated during that rendering were freed.
To avoid the possibility of memory leaks and for performance reasons (this was in the 90's and computers were slower then), they took the following approach: They pre-allocated a buffer which was large enough to satisfy all the allocations for a scan-line, according to the scene parameters which determined the maximum size needed. At the beginning of each scan-line, a global pointer was set to the beginning of the scan line. As each object was allocated from this buffer, the global pointer value was returned, and the pointer was advanced to the next machine-word-aligned position following the allocated amount of bytes. (This alignment padding was including in the original calculation of buffer size, and in the 90's was four bytes but should now be 16 bytes on some machinery.) At the end of each scan-line, the global pointer was reset to the beginning of the buffer.
In "debug" builds, there were two scan buffers, which were protected using virtual memory protection during alternating scan lines. This method detects stale pointers being used from one scan-line to the next.
The buffer of scan-line memory may be called a "pool" or "arena" depending on whome you ask. The relevant detail is that this is a very simple data structure which manages memory for a certain task. It is not a general memory manager (or, properly, "free store implementation") such as malloc, which might be what you are asking for.
Your application may require a different data structure to keep track of your free storage. What is your application?
You should explain why you can't use malloc(), as there might be different solutions for different reasons, and there are several reasons why it might be forbidden or unavailable on small/embedded systems:
concern over memory fragmentation. In this case a set of routines that allocate fixed size memory blocks for one or more pools of memory might be the solution.
the runtime doesn't provide a malloc() - I think most modern toolsets for embedded systems do provide some way to link in a malloc() implementation, but maybe you're using one that doesn't for whatever reason. In that case, using Doug Lea's public domain malloc might be a good choice, but it might be too large for your system (I'm not familiar with the MPC 555 off the top of my head). If that's the case, a very simple, custom malloc() facility might be in order. It's not too hard to write, but make sure you unit test the hell out of uit because it's also easy to get details wrong. For example, I have a set of very small routines that use a brain dead memory allocation strategy using blocks on a free list (the allocator can be compile-time configured for first, best or last fit). I give it an array of char at initialization, and subsequent allocation calls will split free blocks as necessary. It's nowhere near as sophisticated as Lea's malloc(), but it's pretty dang small so for simple uses it'll do the trick.
many embedded projects forbid the use of dynamic memory allocation - in this case, you have to live with statically allocated structures
Write your own. Since your allocator will probably be specialized to a few types of objects, I recommend the Quick Fit scheme developed by Bill Wulf and Charles Weinstock. (I have not been able to find a free copy of this paper, but many people have access to the ACM digital library.) The paper is short, easy to read, and well suited to your problem.
If you turn out to need a more general allocator, the best guide I have found on the topic of programming on machines with fixed memory is Donald Knuth's book The Art of Computer Programming, Volume 1. If you want examples, you can find good ones in Don's epic book-length treatment of the source code of TeX, TeX: The Program.
Finally, the undergraduate textbook by Bryant and O'Hallaron is rather expensive, but it goes through the implementation of malloc in excruciating detail.
Write your own. Preallocate a big chunk of static RAM, then write some functions to grab and release chunks of it. That's the spirit of what malloc() does, except that it asks the OS to allocate and deallocate memory pages dynamically.
There are a multitude of ways of keeping track of what is allocated and what is not (bitmaps, used/free linked lists, binary trees, etc.). You should be able to find many references with a few choice Google searches.
malloc() and its related functions are the only game in town. You can, of course, roll your own memory management system in whatever way you choose.
If there are issues allocating dynamic memory from the heap, you can try allocating memory from the stack using alloca(). The usual caveats apply:
The memory is gone when you return.
The amount of memory you can allocate is dependent on the maximum size of your stack.
You might be interested in: liballoc
It's a simple, easy-to-implement malloc/free/calloc/realloc replacement which works.
If you know beforehand or can figure out the available memory regions on your device, you can also use their libbmmm to manage these large memory blocks and provide a backing-store for liballoc. They are BSD licensed and free.
FreeRTOS contains 3 examples implementations of memory allocation (including malloc()) to achieve different optimizations and use cases appropriate for small embedded systems (AVR, ARM, etc). See the FreeRTOS manual for more information.
I don't see a port for the MPC555, but it shouldn't be difficult to adapt the code to your needs.
If the library supplied with your compiler does not provide malloc, then it probably has no concept of a heap.
A heap (at least in an OS-less system) is simply an area of memory reserved for dynamic memory allocation. You can reserve such an area simply by creating a suitably sized statically allocated array and then providing an interface to provide contiguous chunks of this array on demand and to manage chunks in use and returned to the heap.
A somewhat neater method is to have the linker allocate the heap from whatever memory remains after stack and static memory allocation. That way the heap is always automatically as large as it possibly can be, allowing you to use all available memory simply. This will require modification of the application's linker script. Linker scripts are specific to the particular toolchain, and invariable somewhat arcane.
K&R included a simple implementation of malloc for example.

What's a good C memory allocator for embedded systems? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 6 years ago.
Improve this question
I have an single threaded, embedded application that allocates and deallocates lots and lots of small blocks (32-64b). The perfect scenario for a cache based allocator. And although I could TRY to write one it'll likely be a waste of time, and not as well tested and tuned as some solution that's already been on the front lines.
So what would be the best allocator I could use for this scenario?
Note: I'm using a Lua Virtual Machine in the system (which is the culprit of 80+% of the allocations), so I can't trivially refactor my code to use stack allocations to increase allocation performance.
I'm a bit late to the party, but I just want to share very efficient memory allocator for embedded systems I've recently found and tested: https://github.com/dimonomid/umm_malloc
This is a memory management library specifically designed to work with the ARM7, personally I use it on PIC32 device, but it should work on any 16- and 8-bit device (I have plans to test in on 16-bit PIC24, but I haven't tested it yet)
I was seriously beaten by fragmentation with default allocator: my project often allocates blocks of various size, from several bytes to several hundreds of bytes, and sometimes I faced 'out of memory' error. My PIC32 device has total 32K of RAM, and 8192 bytes is used for heap. At the particular moment there is more than 5K of free memory, but default allocator has maximum non-fragmented memory block just of about 700 bytes, because of fragmentation. This is too bad, so I decided to look for more efficient solution.
I already was aware of some allocators, but all of them has some limitations (such as block size should be a power or 2, and starting not from 2 but from, say, 128 bytes), or was just buggy. Every time before, I had to switch back to the default allocator.
But this time, I'm lucky: I've found this one: http://hempeldesigngroup.com/embedded/stories/memorymanager/
When I tried this memory allocator, in exactly the same situation with 5K of free memory, it has more than 3800 bytes block! It was so unbelievable to me (comparing to 700 bytes), and I performed hard test: device worked heavily more than 30 hours. No memory leaks, everything works as it should work.
I also found this allocator in the FreeRTOS repository: http://svnmios.midibox.org/listing.php?repname=svn.mios32&path=%2Ftrunk%2FFreeRTOS%2FSource%2Fportable%2FMemMang%2F&rev=1041&peg=1041# , and this fact is an additional evidence of stability of umm_malloc.
So I completely switched to umm_malloc, and I'm quite happy with it.
I just had to change it a bit: configuration was a bit buggy when macro UMM_TEST_MAIN is not defined, so, I've created the github repository (the link is at the top of this post). Now, user dependent configuration is stored in separate file umm_malloc_cfg.h
I haven't got deeply yet in the algorithms applied in this allocator, but it has very detailed explanation of algorithms, so anyone who is interested can look at the top of the file umm_malloc.c . At least, "binning" approach should give huge benefit in less-fragmentation: http://g.oswego.edu/dl/html/malloc.html
I believe that anyone who needs for efficient memory allocator for microcontrollers, should at least try this one.
In a past project in C I worked on, we went down the road of implementing our own memory management routines for a library ran on a wide range of platforms including embedded systems. The library also allocated and freed a large number of small buffers. It ran relatively well and didn't take a large amount of code to implement. I can give you a bit of background on that implementation in case you want to develop something yourself.
The basic implementation included a set of routines that managed buffers of a set size. The routines were used as wrappers around malloc() and free(). We used these routines to manage allocation of structures that we frequently used and also to manage generic buffers of set sizes. A structure was used to describe each type of buffer being managed. When a buffer of a specific type was allocated, we'd malloc() the memory in blocks (if a list of free buffers was empty). IE, if we were managing 10 byte buffers, we might make a single malloc() that contained space for 100 of these buffers to reduce fragmentation and the number of underlying mallocs needed.
At the front of each buffer would be a pointer that would be used to chain the buffers in a free list. When the 100 buffers were allocated, each buffer would be chained together in the free list. When the buffer was in use, the pointer would be set to null. We also maintained a list of the "blocks" of buffers, so that we could do a simple cleanup by calling free() on each of the actual malloc'd buffers.
For management of dynamic buffer sizes, we also added a size_t variable at the beginning of each buffer telling the size of the buffer. This was then used to identify which buffer block to put the buffer back into when it was freed. We had replacement routines for malloc() and free() that did pointer arithmetic to get the buffer size and then to put the buffer into the free list. We also had a limit on how large of buffers we managed. Buffers larger than this limit were simply malloc'd and passed to the user. For structures that we managed, we created wrapper routines for allocation and freeing of the specific structures.
Eventually we also evolved the system to include garbage collection when requested by the user to clean up unused memory. Since we had control over the whole system, there were various optimizations we were able to make over time to increase performance of the system. As I mentioned, it did work quite well.
I did some research on this very topic recently, as we had an issue with memory fragmentation. In the end we decided to stay with GNU libc's implementation, and add some application-level memory pools where necessary. There were other allocators which had better fragmentation behavior, but we weren't comfortable enough with them replace malloc globally. GNU's has the benefit of a long history behind it.
In your case it seems justified; assuming you can't fix the VM, those tiny allocations are very wasteful. I don't know what your whole environment is, but you might consider wrapping the calls to malloc/realloc/free on just the VM so that you can pass it off to a handler designed for small pools.
Although its been some time since I asked this, my final solution was to use LoKi's SmallObjectAllocator it work great. Got rid off all the OS calls and improved the performance of my Lua engine for embedded devices. Very nice and simple, and just about 5 minutes worth of work!
Since version 5.1, Lua has allowed a custom allocator to be set when creating new states.
I'd just also like to add to this even though it's an old thread. In an embedded application if you can analyze your memory usage for your application and come up with a max number of memory allocation of the varying sizes usually the fastest type of allocator is one using memory pools. In our embedded apps we can determine all allocation sizes that will ever be needed during run time. If you can do this you can completely eliminate heap fragmentation and have very fast allocations. Most these implementations have an overflow pool which will do a regular malloc for the special cases which will hopefully be far and few between if you did your analysis right.
I have used the 'binary buddy' system to good effect under vxworks. Basically, you portion out your heap by cutting blocks in half to get the smallest power of two sized block to hold your request, and when blocks are freed, you can make a pass up the tree to merge blocks back together to mitigate fragmentation. A google search should turn up all the info you need.
I am writing a C memory allocator called tinymem that is intended to be able to defragment the heap, and re-use memory. Check it out:
https://github.com/vitiral/tinymem
Note: this project has been discontinued to work on the rust implementation:
https://github.com/vitiral/defrag-rs
Also, I had not heard of umm_malloc before. Unfortunately, it doesn't seem to be able to deal with fragmentation, but it definitely looks useful. I will have to check it out.

Resources