I'm experiencing what appears to be a stack/heap collision in an embedded environment (see this question for some background).
I'd like to try rewriting the code so that it doesn't allocate memory on the heap.
Can I write an application without using the heap in C? For example, how would I use the stack only if I have a need for dynamic memory allocation?
I did it once in an embedded environment where we were writing "super safe" code for biomedical machines.
Malloc()s were explicitly forbidden, partly for the resources limits and for the unexpected behavior you can get from dynamic memory (look for malloc(), VxWorks/Tornado and fragmentation and you'll have a good example).
Anyway, the solution was to plan in advance the needed resources and statically allocate the "dynamic" ones in a vector contained in a separate module, having some kind of special purpose allocator give and take back pointers. This approach avoided fragmentation issues altogether and helped getting finer grained error info, if a resource was exhausted.
This may sound silly on big iron, but on embedded systems, and particularly on safety critical ones, it's better to have a very good understanding of which -time and space- resources are needed beforehand, if only for the purpose of sizing the hardware.
Funnily enough, I once saw a database application which completly relied on static allocated memory. This application had a strong restriction on field and record lengths. Even the embedded text editor (I still shiver calling it that) was unable to create texts with more than 250 lines of text. That solved some question I had at this time: why are only 40 records allowed per client?
In serious applications you can not calculate in advance the memory requirements of your running system. Therefore it is a good idea to allocate memory dynamically as you need it. Nevertheless it is common case in embedded systems to preallocate memory you really need to prevent unexpected failures due to memory shortage.
You might allocate dynamic memory on the stack using the alloca() library calls. But this memory is tight to the execution context of the application and it is a bad idea to return memory of this type the caller, because it will be overwritten by later subroutine calls.
So I might answer your question with a crisp and clear "it depends"...
You can use alloca() function that allocates memory on the stack - this memory will be freed automatically when you exit the function. alloca() is GNU-specific, you use GCC so it must be available.
See man alloca.
Another option is to use variable-length arrays, but you need to use C99 mode.
It's possible to allocate a large amount of memory from the stack in main() and have your code sub-allocate it later on. It's a silly thing to do since it means your program is taking up memory that it doesn't actually need.
I can think of no reason (save some kind of silly programming challenge or learning exercise) for wanting to avoid the heap. If you've "heard" that heap allocation is slow and stack allocation is fast, it's simply because the heap involves dynamic allocation. If you were to dynamically allocate memory from a reserved block within the stack, it would be just as slow.
Stack allocation is easy and fast because you may only deallocate the "youngest" item on the stack. It works for local variables. It doesn't work for dynamic data structures.
Edit: Having seen the motivation for the question...
Firstly, the heap and the stack have to compete for the same amount of available space. Generally, they grow towards each other. This means that if you move all your heap usage into the stack somehow, then rather than stack colliding with heap, the stack size will just exceed the amount of RAM you have available.
I think you just need to watch your heap and stack usage (you can grab pointers to local variables to get an idea of where the stack is at the moment) and if it's too high, reduce it. If you have lots of small dynamically-allocated objects, remember that each allocation has some memory overhead, so sub-allocating them from a pool can help cut down on memory requirements. If you use recursion anywhere think about replacing it with an array-based solution.
You can't do dynamic memory allocation in C without using heap memory. It would be pretty hard to write a real world application without using Heap. At least, I can't think of a way to do this.
BTW, Why do you want to avoid heap? What's so wrong with it?
1: Yes you can - if you don't need dynamic memory allocation, but it could have a horrible performance, depending on your app. (i.e. not using the heap won't give you better apps)
2: No I don't think you can allocate memory dynamically on the stack, since that part is managed by the compiler.
Yes, it's doable. Shift your dynamic needs out of memory and onto disk (or whatever mass storage you have available) -- and suffer the consequent performance penalty.
E.g., You need to build and reference a binary tree of unknown size. Specify a record layout describing a node of the tree, where pointers to other nodes are actually record numbers in your tree file. Write routines that let you add to the tree by writing an additional record to file, and walk the tree by reading a record, finding its child as another record number, reading that record, etc.
This technique allocates space dynamically, but it's disk space, not RAM space. All the routines involved can be written using statically allocated space -- on the stack.
Embedded applications need to be careful with memory allocations but I don't think using the stack or your own pre-allocated heap is the answer. If possible, allocate all required memory (usually buffers and large data structures) at initialization time from a heap. This requires a different style of program than most of us are used to now but it's the best way to get close to deterministic behavior.
A large heap that is sub-allocated later would still be subject to running out of memory and the only thing to do then is have a watchdog kick in (or similar action). Using the stack sounds appealing but if you're going to allocate large buffers/data structures on the stack you have to be sure that the stack is large enough to handle all possible code paths that your program could execute. This is not easy and in the end is similar to a sub-allocated heap.
My foremost concern is, does abolishing the heap really helps?
Since your wish of not using heap stems from stack/heap collision, assuming the start of stack and start of heap are set properly (e.g. in the same setting, small sample programs have no such collision problem), then the collision means the hardware has not enough memory for your program.
Not using heap, one may indeed save some waste space from heap fragmentation; but if your program does not use the heap for a bunch of irregular large size allocation, the waste there are probably not much. I will see your collision problem more of an out of memory problem, something not fixable by merely avoiding heap.
My advices on tackling this case:
Calculate the total potential memory usage of your program. If it is too close to but not yet exceeding the amount of memory you prepared for the hardware, then you may
Try using less memory (improve the algorithms) or using the memory more efficiently (e.g. smaller and more-regular-sized malloc() to reduce heap fragmentation); or
Simply buy more memory for the hardware
Of course you may try pushing everything into pre-defined static memory space, but it is very probable that it will be stack overwriting into static memory this time. So improve the algorithm to be less memory-consuming first and buy more memory the second.
I'd attack this problem in a different way - if you think the the stack and heap are colliding, then test this by guarding against it.
For example (assuming a *ix system) try mprotect()ing the last stack page (assuming a fixed size stack) so it is not accessible. Or - if your stack grows - then mmap a page in the middle of the stack and heap. If you get a segv on your guard page you know you've run off the end of the stack or heap; and by looking at the address of the seg fault you can see which of the stack & heap collided.
It is often possible to write your embedded application without using dynamic memory allocation. In many embedded applications the use of dynamic allocation is deprecated because of the problems that can arise due to heap fragmentation. Over time it becomes highly likely that there will not be a suitably sized region of free heap space to allow the memory to be allocated and unless there is a scheme in place to handle this error the application will crash. There are various schemes to get around this, one being to always allocate fixed size objects on the heap so that a new allocation will always fit into a freed memory area. Another to detect the allocation failure and to perform a defragmentation process on all of the objects on the heap (left as an exercise for the reader!)
You do not say what processor or toolset you are using but in many the static, heap and stack are allocated to separate defined segments in the linker. If this is the case then it must be that your stack is growing outside the memory space that you have defined for it. The solution that you require is to reduce the heap and/or static variable size (assuming that these two are contiguous) so that there is more available for the stack. It may be possible to reduce the heap unilaterally although this can increase the probability of fragmentation problems. Ensuring that there are no unnecessary static variables will free some space at the cost of possibly increasing the stack usage if the variable is made auto.
Related
I'm creating a list of elements inside a task in the following way:
l = (dllist*)pvPortMalloc(sizeof(dllist));
dllist is 32 byte big.
My embedded system has 60kB SRAM so I expected my 200 element list can be handled easily by the system. I found out that after allocating space for 8 elements the system is crashing on the 9th malloc function call (256byte+).
If possible, where can I change the heap size inside freeRTOS?
Can I somehow request the current status of heap size?
I couldn't find this information in the documentation so I hope somebody can provide some insight in this matter.
Thanks in advance!
(Yes - FreeRTOS pvPortMalloc() returns void*.)
If you have 60K of SRAM, and configTOTAL_HEAP_SIZE is large, then it is unlikely you are going to run out of heap after allocating 256 bytes unless you had hardly any heap remaining before hand. Many FreeRTOS demos will just keep creating objects until all the heap is used, so if your application is based on one of those, then you would be low on heap before your code executed. You may have also done something like use up loads of heap space by creating tasks with huge stacks.
heap_4 and heap_5 will combine adjacent blocks, which will minimise fragmentation as far as practical, but I don't think that will be your problem - especially as you don't mention freeing anything anywhere.
Unless you are using heap_3.c (which just makes the standard C library malloc and free thread safe) you can call xPortGetFreeHeapSize() to see how much free heap you have. You may also have xPortGetMinimumEverFreeHeapSize() available to query how close you have ever come to running out of heap. More information: http://www.freertos.org/a00111.html
You could also define a malloc() failed hook (http://www.freertos.org/a00016.html) to get instant notification of pvPortMalloc() returning NULL.
For the standard allocators you will find a config option in FreeRTOSConfig.h .
However:
It is very well possible you run out of memory already, depending on the allocator used. IIRC there is one that does not free() any blocks (free() is just a dummy). So any block returned will be lost. This is still useful if you only allocate memory e.g. at startup, but then work with what you've got.
Other allocators might just not merge adjacent blocks once returned, increasing fragmentation much faster than a full-grown allocator.
Also, you might loose memory to fragmentation. Depending on your alloc/free pattern, you quickly might end up with a heap looking like swiss cheese: Many holes between allocated blocks. So while there is still enough free memory, no single block is big enough for the size required.
If you only allocate blocks that size there, you might be better of using your own allocator or a pool (blocks of fixed size). Thaqt would be statically allocated (e.g. array) and chained as a linked list during startup. Alloc/free would then just be push/pop on a stack (or put/get on a queue). That would also be very fast and have complexity O(1) (interrupt-safe if properly written).
Note that normal malloc()/free() are not interrupt-safe.
Finally: Do not cast void *. (Well, that's actually what standard malloc() returns and I expect that FreeRTOS-variant does the same).
If I need to do string manipulation or manipulating any kind of arrays be it stantard types like int or a self defined data structure. What is better local variable or dynamically allocating and de-allocating memory?
I know that if you are using a local variable you will not need to allocate/de-allocate of the memory and that might save us from memory leaks. But I want to know why people like using dynamically allocate memory. Is it just a coding style or does it really have its benefits. Also does it depends on system on which we are compiling or does it depend on the compiler?
Even if the system have enough resources for memory and speed which technique is better suited to optimize the code?
Local, stack-allocated operations will pretty much always trump dynamic memory allocation in terms of speed. The reason being that, during dynamic memory allocation, your program needs to ask the operating system for help. This causes a context switch (which are very slow/expensive), and blocks until the operating system returns you a block of memory (which it may not even be able to).
Your program however, has its own stack already, and so it can manipulate that as necessary without interrupting the flow of execution (other than multi-tasking which is out of your control anyway).
The benefits to dynamic memory allocation are that sometimes we do not know how much we need to allocate until runtime. Without dynamic allocation, we would need to allocate a static buffer upfront, and reserve enough memory for the worst-case scenario (which we may or may not have enough resources for).
The question you should ask in this matter is: do I know the memory size the program's going to need in runtime?
If you know you'll be needing for instance only 3 int variables, then you should go for local variables. No memory leak, and your program won't run unless there's enough memory available.
If you can't predict how much memory you'll need, let's say if you need to read a file into memory, you have no choice but to go for dynamic allocation.
There are good reason for having both as options.
Typically, you will use a heap allocation (e.g. malloc) when:
you don't know the size of the allocation you will need until execution
when it consumes a good amount of the stack memory (or all)
when the allocation needs to live beyond the current scope (e.g. is returned from a function)
Also does it depends on system on which we are compiling or does it depend on the compiler?
It depends on both the compiler and the system you are targeting.
Even if the system have enough resources for memory and speed which technique is better suited to optimize the code?
What's more important is how you need to use and access the memory and the hardware in most cases. A local variable has more advantage for optimization. However, the compiler may optimize away calls to malloc, in some cases.
As a default position, unless you have a need for the allocation to outlive the scope in which it's created, or the allocation is significant, stack allocation is preferred. In your case, you're asking about a local variable, which should presumably be free'd at the end of the enclosing scope. One rule of thumb is that if the allocation is less than 1K, stack is appropriate, otherwise heap. Microsoft's _malloca uses this threshold to switch between stack and heap allocation. Of course, much depends on what state you expect the stack to be in for your allocation. For example, a 1K stack allocation in a deeply recursive function may not be wise.
Local, non-malloc, calls will generally allocate/deallocate faster as they require only stack work and not heap management - and they have the added benefit of limited scope. But, they have limited scope and often you need your memory to last beyond a function.
On some embedded systems with limited stack space, one will call malloc even on local vars just to get access to the larger heap, but in general, you want to limit your memory access to the scope it is intended - regardless of performance considerations.
I've been told not to use stack allocated arrays because the stack is a precious resource.
Other people have suggested to me that in fact it is perfectly fine to use stack allocated arrays so long as the array is relatively small.
I would like to have a general rule of thumb: when should I use a stack allocated array?
And when should I use a heap allocated array?
While all of your memory is limited, even today with enormous amounts of RAM and virtual memory, there is still a limit. However, it's rather large, especially compared with the stack which can be anything from a couple of kb on small embedded systems to a couple of megabytes on a PC.
Besides that, there is also the question about how you are using it, and for what. For example, if you want to return an "array" from a function, it should never be on the stack.
In general, I would say that try to keep arrays on the stack small if you can. If you are creating an array with thousands of entries on the stack you should stop and think about what you want it for.
It depends on your platform.
Nowadays, if working on the popular x64 platform, you don't really have to worry about it.
Depending on the Operating System you use, you can check how much stack space and how much heap space a userland process is allowed to use.
For example, UNIX-like systems have soft and hard limits. Some you can crank up, some you can not.
Bottom line is that you don't usually need to worry about such things. And when you need to know, you are usually tied so closely to the platform you'll be developing for that you know all these details.
Hope I answered your question. If you want specific values please specify your exact hardware, operating system and user privileges.
The answer to this question is context dependent. When you write for an operating system kernel, for example, the stack might be quite limited, and allocating more than a thousand bytes in a stack frame could cause a problem.
In modern consumer systems, the space available for the stack is typically quite large. One problem systems used to have was that address space was limited and, once the stack was assigned an address, it could not grow any further than the next object in the address space in the direction of stack growth, regardless of the availability of physical memory or of virtual memory elsewhere in the address space. This is less of a problem with today’s address spaces.
Commonly, megabytes of space can be allocated in a stack frame, and doing so is cheap and easy. However, if many routines that allocate large amounts of space are called, or one or a few routines that allocate large amounts of space are called recursively, then problems can occur because too much space is used, running into some limit (such as address space or physical memory).
Of course, running into a physical memory limit will not be alleviated by allocating space from the heap. So only the issue of consuming the address space available for the stack is relevant to the question of whether to use stack or heap.
A simple test for whether this is a problem is to insert use of a great deal of stack space in your main routine. If you use additional stack space and your application still functions under a load that uses large amounts of stack space normally, then, when you remove this artificial reservation in main, you will have plenty of margin.
A better way would be to calculate the maximum your program could use and compare that to the stack space available from the system. But that is rarely easy with today’s software.
If you are running into stack space limits, your linker or your operating system may have options to make more available.
Scope of Global and static variables will be through out the life of a process. Memory for these variable will be allocated when a process is started and it will be freed only process exits.
But local variable(stack variable) has scope only to a function on which it is defined. Memory will be allocated when a function is invoked and it will be freed once control exits from the function.
Main intention of dynamic memory is to create a variable of user defined scope. If you want to control a scope of variable means, you can allocate memory for a variable x at one function and then pass the reference(address) to as many function you want and then finally you can free it.
So with the help of dynamic allocated memory, we can create a variable which has scope higher than a local variable and lesser than global or static variable.
Apart from this if the size is very very high its better to go for dynamic memroy, if the architecture contains memory constraint.
The good reason to use heap allocated memory is passing its ownership to some other function/struct. From the other hand, stack gives you memory management for free, you can not forget to deallocate memory from stack, while there is risk of leak if you use heap.
If you create an array just for local usage, the criteria of size of the one to use, however it is hard to give exact size, above which memory should be allocated on heap. One could say that a few hundreds bytes is enough to move to heap, for some others it will be less or more than that.
I'm not familiar with how the Linux heap is allocated.
I'm calling malloc()/free() many many times a second, always with the same sizes (there are about 10 structs, each fixed size). Aside from init time, none of my memory remains allocated for long periods of time.
Is this considered poor form with the standard heaps? (I'm sure someone will ask 'what heap are you using?' - 'Ugh. The standard static heap' ..meaning I'm unsure.)
Should I instead use a free list or does the heap tolerate lots of the same allocations. I'm trying to balance readability with performance.
Any tools to help me measure?
First of all, unless you have measured a problem with memory usage blowing up, don't even think about using a custom allocator. It's one of the worst forms of premature optimization.
At the same time, even if you do have a problem, a better solution than a custom allocator would be figuring out why you're allocating and freeing objects so much, and fixing the design issue that's causing it.
To address your specific question, glibc's allocator is based on the dlmalloc algorithm, which is near-optimal when it comes to fragmentation. The only way you'll get it to badly fragment memory is the unavoidable way: by allocating objects with radically different lifetimes in alternation, e.g. allocating a large number of objects but only freeing every other one. I think you'll have a hard time working out an allocation pattern that will give worse total memory usage than pools...
Valgrind has a special tool Massif for measuring memory usage. This should help to profile heap allocations.
I think the best performance optimization is to avoid heap allocation wherever it is possible (and reasonable). Whenever an object is stack allocated the compiler will just move the stack pointer up instead of trying to find a free spot or return the allocated memory to some free list.
How is the lifetime of your structures defined? If you can express your object lifetime by scoping then this would indeed increase performance.
Today, I appeared for an interview and the interviewer asked me this,
Tell me the steps how will you design your own free( ) function for
deallocate the allocated memory.
How can it be more efficient than C's default free() function ? What can you conclude ?
I was confused, couldn't think of the way to design.
What do you think guys ?
EDIT : Since we need to know about how malloc() works, can you tell me the steps to write our own malloc() function
That's actually a pretty vague question, and that's probably why you got confused. Does he mean, given an existing malloc implementation, how would you go about trying to develop a more efficient way to free the underlying memory? Or was he expecting you to start discussing different kinds of malloc implementations and their benefits and problems? Did he expect you to know how virtual memory functions on the x86 architecture?
Also, by more efficient, does he mean more space efficient or more time efficient? Does free() have to be deterministic? Does it have to return as much memory to the OS as possible because it's in a low-memory, multi-tasking environment? What's our criteria here?
It's hard to say where to start with a vague question like that, other than to start asking your own questions to get clarification. After all, in order to design your own free function, you first have to know how malloc is implemented. So chances are, the question was really about whether or not you knew anything about how malloc can be implemented.
If you're not familiar with the internals of memory management, the easiest way to get started with understanding how malloc is implemented is to first write your own.
Check out this IBM DeveloperWorks article called "Inside Memory Management" for starters.
But before you can write your own malloc/free, you first need memory to allocate/free. Unfortunately, in a protected mode OS, you can't directly address the memory on the machine. So how do you get it?
You ask the OS for it. With the virtual memory features of the x86, any piece of RAM or swap memory can be mapped to a memory address by the OS. What your program sees as memory could be physically fragmented throughout the entire system, but thanks to the kernel's virtual memory manager, it all looks the same.
The kernel usually provides system calls that allow you to map in additional memory for your process. On older UNIX OS's this was usually brk/sbrk to grow heap memory onto the edge of your process or shrink it off, but a lot of systems also provide mmap/munmap to simply map a large block of heap memory in. It's only once you have access to a large, contiguous looking block of memory that you need malloc/free to manage it.
Once your process has some heap memory available to it, it's all about splitting it into chunks, with each chunk containing its own meta information about its size and position and whether or not it's allocated, and then managing those chunks. A simple list of structs, each containing some fields for meta information and a large array of bytes, could work, in which case malloc has to run through the list until if finds a large enough unallocated chunk (or chunks it can combine), and then map in more memory if it can't find a big enough chunk. Once you find a chunk, you just return a pointer to the data. free() can then use that pointer to reverse back a few bytes to the member fields that exist in the structure, which it can then modify (i.e. marking chunk.allocated = false;). If there's enough unallocated chunks at the end of your list, you can even remove them from the list and unmap or shrink that memory off your process's heap.
That's a real simple method of implementing malloc though. As you can imagine, there's a lot of possible ways of splitting your memory into chunks and then managing those chunks. There's as many ways as there are data structures and algorithms. They're all designed for different purposes too, like limiting fragmentation due to small, allocated chunks mixed with small, unallocated chunks, or ensuring that malloc and free run fast (or sometimes even more slowly, but predictably slowly). There's dlmalloc, ptmalloc, jemalloc, Hoard's malloc, and many more out there, and many of them are quite small and succinct, so don't be afraid to read them. If I remember correctly, "The C Programming Language" by Kernighan and Ritchie even uses a simple malloc implementation as one of their examples.
You can't blindly design free() without knowing how malloc() works under the hood because your implementation of free() would need to know how to manipulate the bookkeeping data and that's impossible without knowing how malloc() is implemented.
So an unswerable question could be how you would design malloc() and free() instead which is not a trivial question but you could answer it partially for example by proposing some very simple implementation of a memory pool that would not be equivalent to malloc() of course but would indicate your presence of knowledge.
One common approach when you only have access to user space (generally known as memory pool) is to get a large chunk of memory from the OS on application start-up. Your malloc needs to check which areas of the right size of that pool are still free (through some data structure) and hand out pointers to that memory. Your free needs to mark the memory as free again in the data structure and possibly needs to check for fragmentation of the pool.
The benefits are that you can do allocation in nearly constant time, the drawback is that your application consumes more memory than actually is needed.
Tell me the steps how will you design your own free( ) function for deallocate the allocated memory.
#include <stdlib.h>
#undef free
#define free(X) my_free(X)
inline void my_free(void *ptr) { }
How can it be more efficient than C's default free() function ?
It is extremely fast, requiring zero machine cycles. It also makes use-after-free bugs go away. It's a very useful free function for use in programs which are instantiated as short-lived batch processes; it can usefully be deployed in some production situations.
What can you conclude ?
I really want this job, but in another company.
Memory usage patterns could be a factor. A default implementation of free can't assume anything about how often you allocate/deallocate and what sizes you allocate when you do.
For example, if you frequently allocate and deallocate objects that are of similar size, you could gain speed, memory efficiency, and reduced fragmentation by using a memory pool.
EDIT: as sharptooth noted, only makes sense to design free and malloc together. So the first thing would be to figure out how malloc is implemented.
malloc and free only have a meaning if your app is to work on top of an OS. If you would like to write your own memory management functions you would have to know how to request the memory from that specific OS or you could reserve the heap memory right away using existing malloc and then use your own functions to distribute/redistribute the allocated memory through out your app
There is an architecture that malloc and free are supposed to adhere to -- essentially a class architecture permitting different strategies to coexist. Then the version of free that is executed corresponds to the version of malloc used.
However, I'm not sure how often this architecture is observed.
The knowledge of working of malloc() is necessary to implement free(). You can find a implementation of malloc() and free() using the sbrk() system call in K&R The C Programming Language Chapter 8, Section 8.7 "Example--A Storage Allocator" pp.185-189.