For what is heap allocation necessary? [duplicate] - c

I've always thought that heap allocation is needed for allocating data structures with sizes that aren't known at compile time. But I recently learned about alloca, which allows for dynamic allocation on the stack. So, if heap allocation isn't needed for allocation of dynamic sizes, are there things for which heap allocation is necessary? My first thought is that resizing data structures may be difficult, but I'm not sure whether it's impossible or not without using the heap.

It's about lifetime. Objects on the heap live until you free() them. Objects allocated with alloca live in the current stack frame until your function returns.
Resizing data on the heap causes a new allocation if the first allocation did not reserve sufficient memory, and the original block goes back to the free list. That is impossible in a stack frame: there is no counterpart to realloc.
BTW: Objects on the heap do not need to be of dynamic size. malloc takes a size parameter, which lets you calculate the size at run time, but you often allocate objects whose size is fixed, e.g.
typedef struct {
int a;
char s[3];
} example;
example *ex = malloc(sizeof(example));
free(ex);
Edit: Although you could allocate again and again on the stack until you get a stack overflow, the semantics differ from realloc. A hypothetical realloca could not give the old space back when it allocates a larger block, and it could not even shrink a block. The reason for both is the mechanism by which alloca works: there is no free list, and there are no chunks on the stack that are managed like heap objects. alloca just moves the stack pointer to make room for its result.
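To make the realloc point concrete, here is a minimal sketch of a growable array that relies on the heap. The names int_vec, int_vec_push and int_vec_free are made up for this example and do not come from any library; the doubling growth policy is just one common choice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical growable array of ints; not part of any library. */
typedef struct {
    int *data;
    size_t len;
    size_t cap;
} int_vec;

/* Appends value, growing the heap block with realloc when it is full.
 * Returns 0 on success, -1 if the allocation failed (vec is unchanged). */
int int_vec_push(int_vec *vec, int value)
{
    assert(vec != NULL);
    if (vec->len == vec->cap) {
        size_t new_cap = vec->cap ? vec->cap * 2 : 4;
        /* realloc(NULL, n) behaves like malloc(n), so the first push works too */
        int *tmp = realloc(vec->data, new_cap * sizeof *tmp);
        if (tmp == NULL)
            return -1;
        vec->data = tmp;
        vec->cap = new_cap;
    }
    vec->data[vec->len++] = value;
    return 0;
}

/* Releases the block and resets the vector to the empty state. */
void int_vec_free(int_vec *vec)
{
    assert(vec != NULL);
    free(vec->data);
    vec->data = NULL;
    vec->len = vec->cap = 0;
}
```

Start from a zero-initialized int_vec and call int_vec_push as often as needed. The allocator may extend the block in place or move it, and either way the old space is handed back; alloca has no way to do that.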

If you restrict yourself to using the stack for memory then you restrict the lifetime of your data to the lifetime of the scope they were conceived in. Yes you could make them global but then you have a fixed amount of memory for your data.
How about we reserve a large chunk of global memory? Well now you are just emulating the heap in a memory wasteful way.
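A rough sketch of what "emulating the heap" with a global chunk tends to look like, assuming a C11 compiler. ARENA_SIZE, arena_alloc and arena_reset are hypothetical names chosen for illustration, and this is the simplest possible scheme (a bump pointer), not a full allocator.

```c
#include <assert.h>
#include <stddef.h>

#define ARENA_SIZE 4096  /* arbitrary fixed capacity chosen for this sketch */

/* The "large chunk of global memory": reserved whether it is used or not. */
static _Alignas(max_align_t) unsigned char arena[ARENA_SIZE];
static size_t arena_used = 0;

/* Hands out the next `size` bytes, rounded up so later blocks stay aligned.
 * Returns NULL when the fixed chunk is exhausted. */
void *arena_alloc(size_t size)
{
    const size_t align = _Alignof(max_align_t);
    assert(arena_used <= ARENA_SIZE);
    if (size > ARENA_SIZE)
        return NULL;
    size_t rounded = (size + align - 1) / align * align;
    if (rounded > ARENA_SIZE - arena_used)
        return NULL;
    void *p = &arena[arena_used];
    arena_used += rounded;
    return p;
}

/* Individual blocks cannot be freed; everything is released at once. */
void arena_reset(void)
{
    arena_used = 0;
}
```

The capacity has to be guessed up front and is occupied for the whole run, which is exactly the waste described above. Adding per-block free on top of this means writing your own malloc.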

Heap is not necessary, double is not necessary, int is not necessary; all that is needed is 0 and 1 and a Turing machine.
Different programming models exist to efficiently create and maintain code. Code could exist with just about everything on the stack. Depending on how the OS/environment is set up, the stack does not run out any faster than the heap. Memory is memory; it is just that typical OSes put more stringent limits on the stack size, as code needing a large stack is more likely to be errant than correct.
Processors tend to be designed that way too, allowing a large, but limited stack, and a relatively enormous dynamic data space.
So for now, live with performing most variable, large-size allocations via malloc() and save alloca() for niche applications. Maybe in 20 years everything will be on the stack due to Dr. Whippersnapper's latest programming model.

Related

Performance benefit of using buffer on heap over stack (C)

I have a small buffer of 1024 bytes that I am using to store temporary data in and then write to a larger buffer. I am reusing that small buffer several times.
Is there any performance benefit to creating this buffer on the heap rather than the stack?
It's existing code so it was done on the heap but I'm not sure if it would be faster to use the stack or what exactly the reasoning was to using the heap in the first place.
Any ideas? This is C code.
If you are writing code for a very small system, you may need to get the buffer using malloc (or one of the related routines, such as calloc) so that you do not use limited stack space.
Otherwise, on modern systems, 1024 bytes is a modest amount of stack space to use, and creating a buffer on the stack is typically faster than using malloc. (A normal malloc call requires at least some amount of bookkeeping work that stack allocation does not. However, if a routine merely allocates a fixed-size buffer with malloc, uses it, and frees it, a compiler might optimize the buffer to a stack allocation anyway, in which case they would be equivalent.)
For reference, on macOS, Apple's tools default to 8 MiB of space for the main stack and 2 MiB for each thread.
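As an illustration of the stack variant, here is a sketch of a routine that reuses one 1024-byte automatic buffer for temporary data and copies each piece into a larger caller-supplied buffer. The function name join_ints and its error convention are assumptions made for this example, not taken from the question's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Writes the values as comma-separated decimal text into dst.
 * Returns 0 on success, -1 if dst is too small for the whole result. */
int join_ints(char *dst, size_t dst_size, const int *values, size_t count)
{
    char scratch[1024];   /* automatic buffer, reused on every iteration */
    size_t used = 0;

    assert(dst != NULL && dst_size > 0);
    dst[0] = '\0';
    for (size_t i = 0; i < count; i++) {
        /* Format one item into the small temporary buffer... */
        int n = snprintf(scratch, sizeof scratch, i ? ",%d" : "%d", values[i]);
        if (n < 0 || (size_t)n >= dst_size - used)
            return -1;
        /* ...then append it, terminator included, to the larger buffer. */
        memcpy(dst + used, scratch, (size_t)n + 1);
        used += (size_t)n;
    }
    return 0;
}
```

No malloc or free call is involved; the scratch space appears when the function is entered and disappears when it returns.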
In general, stack allocation is faster than heap allocation.
This is because stack allocation is as easy as a single write to the stack pointer, whereas dynamic memory allocation contains a lot of overhead code during allocation - malloc has to go look for the next free segment, possibly also handling issues with fragmentation etc.
If you re-use a buffer, you should make sure to only allocate it once, no matter where you allocate it. This might be in favour of the heap, since heap-allocated variables don't go out of scope.
As for accessing memory once it is allocated, the stack and heap should perform identically.
Most importantly, allocating a large chunk of data on the stack isn't recommended, since it has a limited size. 1024 bytes is fairly large, so the recommended practice would be to store it on the heap for that reason alone.

Why are auto variables allocated in stack memory and not in heap memory in C? [duplicate]

I am curious to know the necessity of allocating auto variables on Stack memory in C. Please don't say that stack memory is faster. Stack memory generally has less size compared to heap and there is no necessity of implementing stack algorithm for auto variables. Then why are the auto variables stored in stack memory?
It's not necessary. Implicitly heap-allocating all automatic variables (and freeing them at the end of their lifetime) would be entirely correct, it's just a rather bad solution. The stack isn't even the best option, registers are even better. But yes, a stack is the way of allocating automatic storage when registers run out. The code for allocating on a stack is much smaller and faster (just bump a pointer, once). Even the fast path of a general heap allocator is several orders of magnitude more expensive.
Even segmented stacks, where the stack model is kept and only augmented with an overflow check and dynamic growth (to avoid overflow), can make function calls measurably slower than C. Rust abandoned segmented stacks because, aside from being very tricky to implement and optimize, they were an obstacle in competing with C for applications like operating system modules.
Note that you can make stacks arbitrarily large. Of course, then it needs more address space and (if you actually use all that memory) more physical memory, but that was kind of the point of the exercise, wasn't it?

C: Malloc and Free

I am trying to understand the C functions malloc and free. I know this has been discussed a lot on StackOverflow. However, I think I kind of know what these functions do by now. I want to know why to use them. Let's take a look at this piece of code:
int n = 10;
char* array;
array = (char*) malloc(n * sizeof(char));
// Check whether memory could be allocated or not...
// Do whatever with array...
free(array);
array = NULL;
I created a pointer of type char which I called array. Then I used malloc to find a chunk of memory that is currently not used and (10 * sizeof(char)) bytes large. That address I cast to type char pointer before assigning it to my previously created char pointer. Now I can work with my char array. When I am done, I'll use free to free that chunk of memory since it's not being used anymore.
I have one question: Why wouldn't I just do char array[10];? Wikipedia has only one small sentence to give to answer that, and that sentence I unfortunately don't understand:
However, the size of the array is fixed at compile time. If one wishes to allocate a similar array dynamically...
The slide from my university is similarly concise:
It is also possible to allocate memory from the heap.
What is the heap? I know a data structure called heap. :)
However, if someone could explain to me in which case it makes sense to use malloc and free instead of the regular declaration of a variable, that'd be great. :)
C provides three different possible "storage durations" for objects:
Automatic - local storage that's specific to the invocation of the function it's in. There may be more than one instance of objects created with automatic storage, if a function is called recursively or from multiple threads. Or there may be no instances (if/when the function isn't being called).
Static - storage that exists, in exactly one instance, for the entire duration of the running program.
Allocated (dynamic) - created by malloc, and persists until free is called to free it or the program terminates. Allocated storage is the only type of storage with which you can create arbitrarily large or arbitrarily many objects which you can keep even when functions return. This is what malloc is useful for.
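The three durations can be seen side by side in a small sketch. The function count_call and the variable names are invented for this illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Static: exactly one instance, alive for the whole run of the program. */
static int call_count = 0;

/* Returns a heap-allocated copy of the current call number, or NULL if
 * malloc fails. The caller owns the result and must free() it. */
int *count_call(void)
{
    assert(call_count < INT_MAX);   /* guard against signed overflow */

    /* Automatic: one instance per invocation, gone when we return. */
    int local = ++call_count;

    /* Allocated: persists after this function returns, until free(). */
    int *snapshot = malloc(sizeof *snapshot);
    if (snapshot != NULL)
        *snapshot = local;
    return snapshot;
}
```

Returning &local instead would hand the caller a pointer to storage that no longer exists, while each snapshot stays valid for as long as the caller keeps it.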
First of all, there is no need to cast the result of malloc:
array = malloc(n * sizeof(char));
I have one question: Why wouldn't I just do char array[10];?
What will you do if you don't know how much storage you need (say, if you want a structure of arbitrary size, such as a stack or a linked list)?
In this case you have to rely on malloc (in C99 you can use variable-length arrays, but only for small sizes).
The function malloc is used to allocate a certain amount of memory during the execution of a program. It requests a block of memory from the heap; if the request is granted, that block is reserved for your program until you release it.
When the memory is not needed anymore, you must return it to the heap by calling the function free.
Simply put: you use an array when you know at compile time how many elements it will need to hold. You use malloc with pointers when you don't know at compile time how many elements you will need.
For more detail read Heap Management With malloc() and free().
Imagine you want to allocate 1,000 arrays.
If you did not have malloc and free... but needed a declaration in your source for each array, then you'd have to make 1,000 declarations. You'd have to give them all names. (array1, array2, ... array1000).
The idea in general of dynamic memory management is to handle items when the quantity of items is not something you can know in advance at the time you are writing your program.
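With dynamic allocation, the number of arrays becomes an ordinary run-time value. A minimal sketch follows; make_arrays and free_arrays are hypothetical helper names, and zero-filled int arrays are just an example element type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Frees the first `count` arrays and then the table of pointers itself. */
void free_arrays(int **arrays, size_t count)
{
    if (arrays == NULL)
        return;
    for (size_t i = 0; i < count; i++)
        free(arrays[i]);
    free(arrays);
}

/* Allocates `count` zero-filled arrays of `len` ints each.
 * Returns NULL (with nothing leaked) if any allocation fails. */
int **make_arrays(size_t count, size_t len)
{
    assert(count > 0 && len > 0);
    int **arrays = calloc(count, sizeof *arrays);
    if (arrays == NULL)
        return NULL;
    for (size_t i = 0; i < count; i++) {
        arrays[i] = calloc(len, sizeof **arrays);
        if (arrays[i] == NULL) {
            free_arrays(arrays, i);   /* release what was built so far */
            return NULL;
        }
    }
    return arrays;
}
```

A call such as make_arrays(1000, 10) replaces the thousand named declarations, and the 1000 could just as well come from user input.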
Regarding your question: Why wouldn't I just do char array[10];?. You can, and most of the time, that will be completely sufficient. However, what if you wanted to do something similar, but much much bigger? Or what if the size of your data needs to change during execution? These are a few of the situations that point to using dynamically allocated memory (calloc() or malloc()).
Understanding a little about how/when the stack and heap are used would be good: When you use malloc() or calloc(), the memory comes from the heap, whereas automatic variables are given memory on the stack and are freed when you leave the scope of that variable, i.e. the function or block it was declared in. (Static variables live in neither place: they occupy a fixed data area for the whole run of the program.)
Using malloc and calloc becomes very useful when the size of the data you need is not known until run-time. When the size is determined, you can easily call one of these to allocate memory on the heap, then when you are finished, free it with free().
Regarding What is the heap? There is a good discussion on that topic here (slightly different topic, but good discussion)
In response to: However, if someone could explain to me in which case it makes sense to use malloc() and free()...?
In short, If you know what your memory requirements are at build time (before run-time) for a particular variable(s), use static / automatic creation of variables (and corresponding memory usage). If you do not know what size is necessary until run-time, use malloc() or calloc() with a corresponding call to free() (for each use) to create memory. This is of course a rule-of-thumb, and a gross generalization. As you gain experience using memory, you will find scenarios where even when size information is known before run-time, you will choose to dynamically allocate due to some other criteria. (size comes to mind)
If you know in advance that you only require an array of 10 chars, you should just say char array[10]. malloc is useful if you don't know in advance how much storage you need. It is also useful if you need storage that is valid after the current function returns. If you declare array as char array[10], it will be allocated on the stack. This data will not be valid after your function returns. Storage that you obtain from malloc is valid until you call free on it.
Also, there is no need to cast the return value of malloc.
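A short sketch of the "valid after the function returns" case. The function make_greeting is a made-up example, not something from the question.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Builds "Hello, <name>!" in heap storage sized at run time.
 * Returns NULL if malloc fails. The caller must free() the result. */
char *make_greeting(const char *name)
{
    assert(name != NULL);
    /* "Hello, " and "!" contribute 8 characters; +1 for the terminator. */
    size_t len = strlen("Hello, !") + strlen(name) + 1;
    char *out = malloc(len);   /* no cast needed in C */
    if (out == NULL)
        return NULL;
    snprintf(out, len, "Hello, %s!", name);
    return out;
}
```

Had the function built the text in a local char array and returned its address, the caller would be reading storage that ended with the call. The malloc'd block stays valid until the caller frees it.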
Why use free after malloc? It is good style to free memory as soon as you no longer need it. If you don't free it, a short-lived program will not come to much harm, but its memory footprint will keep growing for as long as it runs.
You may also choose to leave memory unfreed when you exit the program. malloc() uses the heap and the complete heap of a process is freed when the process exits. The only reason why people insist on freeing the memory is to avoid memory leaks.
From here:
Allocation Myth 4: Non-garbage-collected programs should always
deallocate all memory they allocate.
The Truth: Omitted deallocations in frequently executed code cause
growing leaks. They are rarely acceptable. but Programs that retain
most allocated memory until program exit often perform better without
any intervening deallocation. Malloc is much easier to implement if
there is no free.
In most cases, deallocating memory just before program exit is
pointless. The OS will reclaim it anyway. Free will touch and page in
the dead objects; the OS won't.
Consequence: Be careful with "leak detectors" that count allocations.
Some "leaks" are good!
Also, the wiki makes a good point about heap-based memory allocation:
The heap method suffers from a few inherent flaws, stemming entirely
from fragmentation. Like any method of memory allocation, the heap
will become fragmented; that is, there will be sections of used and
unused memory in the allocated space on the heap. A good allocator
will attempt to find an unused area of already allocated memory to use
before resorting to expanding the heap. The major problem with this
method is that the heap has only two significant attributes: base, or
the beginning of the heap in virtual memory space; and length, or its
size. The heap requires enough system memory to fill its entire
length, and its base can never change. Thus, any large areas of unused
memory are wasted. The heap can get "stuck" in this position if a
small used segment exists at the end of the heap, which could waste
any magnitude of address space, from a few megabytes to a few hundred.

Why would you ever want to allocate memory on the heap rather than the stack? [duplicate]

I've read a few of the other questions regarding the heap vs stack, but they seem to focus more on what the heap/stack do rather than why you would use them.
It seems to me that stack allocation would almost always be preferred since it is quicker (just moving the stack pointer vs looking for free space in the heap), and you don't have to manually free allocated memory when you're done using it. The only reason I can see for using heap allocation is if you wanted to create an object in a function and then use it outside that function's scope, since stack-allocated memory is automatically deallocated after returning from the function.
Are there other reasons for using heap allocation instead of stack allocation that I am not aware of?
There are a few reasons:
The main one is that with heap allocation, you have the most flexible control over the object's lifetime (from malloc/calloc to free);
Stack space is typically a more limited resource than heap space, at least in default configurations;
A failure to allocate heap space can be handled gracefully, whereas running out of stack space is often unrecoverable.
Without the flexible object lifetime, useful data structures such as binary trees and linked lists would be virtually impossible to write.
1. You want an allocation to live beyond a function invocation
2. You want to conserve stack space (which is typically limited to a few MBs)
3. You're working with re-locatable memory (Win16, databases, etc.), or want to recover from allocation failures.
4. Variable length anything. You can fake around this, but your code will be really nasty.
The big one is #1. As soon as you get into any sort of concurrency or IPC #1 is everywhere. Even most non-trivial single threaded applications are tricky to devise without some heap allocation. That'd practically be faking a functional language in C/C++.
So I want to make a string. I can make it on the heap or on the stack. Let's try both:
char *heap = malloc(14);
if(heap == NULL)
{
// bad things happened!
}
strcpy(heap, "Hello, world!"); // strcpy, not strcat: the new block is uninitialized
And for the stack:
char stack[] = "Hello, world!";
So now I have these two strings in their respective places. Later, I want to make them longer:
char *tmp = realloc(heap, 20);
if(tmp == NULL)
{
// bad things happened!
}
heap = tmp;
memmove(heap + 13, heap + 7, 7); // shift "world!" and its terminator right by 6
memcpy(heap + 7, "cruel ", 6);
And for the stack:
// umm... What?
This is only one benefit, and others have mentioned other benefits, but this is a rather nice one. With the heap, we can at least try to make our allocated space larger. With the stack, we're stuck with what we have. If we want room to grow, we have to declare it all up front, and we all know how it stinks to see this:
char username[MAX_BUF_SIZE];
The most obvious rationale for using the heap is when you call a function and need something of unknown length returned. Sometimes the caller may pass a memory block and size to the function, but at other times this is just impractical, especially if the returned stuff is complex (e.g. a collection of different objects with pointers flying around, etc.).
Size limits are a huge dealbreaker in a lot of cases. The stack is usually measured in the low megabytes or even kilobytes (that's for everything on the stack), whereas all modern PCs allow you a few gigabytes of heap. So if you're going to be using a large amount of data, you absolutely need the heap.
just to add
You can use alloca to allocate memory on the stack, but memory on the stack is limited, and the space exists only while the function is executing.
That does not mean everything should be allocated on the heap. Like all design decisions, this one is somewhat difficult; a "judicious" combination of both should be used.
Besides manual control of object's lifetime (which you mentioned), the other reasons for using heap would include:
Run-time control over object's size (both initial size and its "later" size, during the program's execution).
For example, you can allocate an array of certain size, which is only known at run time.
With the introduction of VLA (Variable Length Arrays) in C99, it became possible to allocate arrays of fixed run-time size without using heap (this is basically a language-level implementation of 'alloca' functionality). However, in other cases you'd still need heap even in C99.
Run-time control over the total number of objects.
For example, when you build a binary tree structure, you can't meaningfully allocate the nodes of the tree on the stack in advance. You have to use the heap to allocate them "on demand".
Low-level technical considerations, such as limited stack space (others already mentioned that).
When you need a large, say, I/O buffer, even for a short time (inside a single function) it makes more sense to request it from the heap instead of declaring a large automatic array.
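The binary tree case mentioned above can be sketched as follows. The type tree_node and the functions tree_insert, tree_size and tree_free are hypothetical names for this example, and the tree is an unbalanced binary search tree to keep it short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct tree_node {
    int key;
    struct tree_node *left;
    struct tree_node *right;
} tree_node;

/* Inserts key into the tree rooted at *root, allocating a node on demand.
 * Returns 1 if inserted, 0 if the key was already present,
 * -1 if the allocation failed (the tree is unchanged). */
int tree_insert(tree_node **root, int key)
{
    assert(root != NULL);
    while (*root != NULL) {
        if (key == (*root)->key)
            return 0;
        root = key < (*root)->key ? &(*root)->left : &(*root)->right;
    }
    tree_node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->key = key;
    n->left = n->right = NULL;
    *root = n;
    return 1;
}

/* Counts the nodes currently in the tree. */
size_t tree_size(const tree_node *root)
{
    return root ? 1 + tree_size(root->left) + tree_size(root->right) : 0;
}

/* Frees every node; the total number was never known in advance. */
void tree_free(tree_node *root)
{
    if (root == NULL)
        return;
    tree_free(root->left);
    tree_free(root->right);
    free(root);
}
```

Each node outlives the tree_insert call that created it, and the total count depends entirely on the input, which is why neither automatic variables nor a VLA fit here.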
Stack variables (often called 'automatic variables') are best used for things you want to always be the same, and always be small.
int x;
char foo[32];
These are all stack allocations, and their sizes are fixed at compile time too.
The best reason for heap allocation is that you can't always know how much space you need. You often only know this once the program is running. You might have an idea of limits but you would only want to use the exact amount of space required.
If you had to read in a file that could be anything from 1k to 50mb, you would not do this:-
int readdata ( FILE * f ) {
char inputdata[50*1024*1024];
...
return x;
}
That would try to allocate 50 MB on the stack, which would usually fail because the default stack is typically limited to somewhere between a few hundred kilobytes and a few megabytes.
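A heap-based alternative might look like the sketch below, which asks for exactly as many bytes as the file holds. The name read_all is made up, and the fseek/ftell sizing assumes a seekable stream opened in binary mode, which the C standard does not guarantee for every stream.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Reads the whole of f into a heap buffer sized to fit and stores the
 * length in *out_len. Returns NULL on any failure. The caller must
 * free() the result. Assumes f is a seekable binary stream. */
unsigned char *read_all(FILE *f, size_t *out_len)
{
    assert(f != NULL && out_len != NULL);

    if (fseek(f, 0, SEEK_END) != 0)
        return NULL;
    long size = ftell(f);
    if (size < 0)
        return NULL;
    rewind(f);

    /* Ask for at least one byte so an empty file does not yield NULL. */
    unsigned char *buf = malloc(size > 0 ? (size_t)size : 1);
    if (buf == NULL)
        return NULL;   /* recoverable, unlike a stack overflow */

    size_t got = fread(buf, 1, (size_t)size, f);
    if (got != (size_t)size) {
        free(buf);
        return NULL;
    }
    *out_len = got;
    return buf;
}
```

A 1k file costs 1k and a 50mb file costs 50mb, and if the larger request cannot be satisfied the caller gets NULL back and can report the error.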
The stack and heap share the same "open" memory space, and if you use the entire segment they eventually reach a point where they meet. Keeping a balance between the space each of them uses keeps the amortized cost of later allocations and deallocations down.

Can I write a C application without using the heap?

I'm experiencing what appears to be a stack/heap collision in an embedded environment (see this question for some background).
I'd like to try rewriting the code so that it doesn't allocate memory on the heap.
Can I write an application without using the heap in C? For example, how would I use the stack only if I have a need for dynamic memory allocation?
I did it once in an embedded environment where we were writing "super safe" code for biomedical machines.
Malloc()s were explicitly forbidden, partly for the resources limits and for the unexpected behavior you can get from dynamic memory (look for malloc(), VxWorks/Tornado and fragmentation and you'll have a good example).
Anyway, the solution was to plan in advance the needed resources and statically allocate the "dynamic" ones in a vector contained in a separate module, having some kind of special purpose allocator give and take back pointers. This approach avoided fragmentation issues altogether and helped getting finer grained error info, if a resource was exhausted.
This may sound silly on big iron, but on embedded systems, and particularly on safety critical ones, it's better to have a very good understanding of which -time and space- resources are needed beforehand, if only for the purpose of sizing the hardware.
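For readers who have not seen this pattern, here is a rough sketch of such a special-purpose allocator over statically reserved memory. It is not the code from that project: the message type, POOL_BLOCKS, and the functions pool_init, message_get and message_put are all invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_BLOCKS 8   /* planned in advance; arbitrary for this sketch */

/* Hypothetical resource type handed out by the pool. */
typedef struct {
    int id;
    char payload[28];
} message;

/* A slot is either a live message or a link in the free list. */
typedef union slot {
    message msg;
    union slot *next_free;
} slot;

static slot pool[POOL_BLOCKS];   /* statically allocated, no malloc */
static slot *free_head = NULL;

/* Threads every slot onto the free list. Call once at start-up. */
void pool_init(void)
{
    for (size_t i = 0; i + 1 < POOL_BLOCKS; i++)
        pool[i].next_free = &pool[i + 1];
    pool[POOL_BLOCKS - 1].next_free = NULL;
    free_head = &pool[0];
}

/* Gives out one message, or NULL when the planned budget is exhausted. */
message *message_get(void)
{
    if (free_head == NULL)
        return NULL;
    slot *s = free_head;
    free_head = s->next_free;
    return &s->msg;
}

/* Takes a message back. All blocks have the same size, so the pool
 * cannot fragment. */
void message_put(message *m)
{
    slot *s = (slot *)m;
    assert(s >= pool && s < pool + POOL_BLOCKS);
    s->next_free = free_head;
    free_head = s;
}
```

Because exhaustion shows up as a NULL from one specific pool, the error can name the resource that ran out, which matches the finer-grained error information described above.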
Funnily enough, I once saw a database application which relied completely on statically allocated memory. This application had a strong restriction on field and record lengths. Even the embedded text editor (I still shiver calling it that) was unable to create texts with more than 250 lines of text. That solved some question I had at this time: why are only 40 records allowed per client?
In serious applications you cannot calculate in advance the memory requirements of your running system. Therefore it is a good idea to allocate memory dynamically as you need it. Nevertheless, it is common in embedded systems to preallocate the memory you really need, to prevent unexpected failures due to memory shortage.
You might allocate dynamic memory on the stack using the alloca() library call. But this memory is tied to the execution context of the calling function, and it is a bad idea to return memory of this type to the caller, because it will be overwritten by later subroutine calls.
So I might answer your question with a crisp and clear "it depends"...
You can use the alloca() function, which allocates memory on the stack; this memory is freed automatically when you exit the function. alloca() is not part of standard C, but GCC supports it, so since you use GCC it should be available.
See man alloca.
Another option is to use variable-length arrays, but you need to use C99 mode.
It's possible to allocate a large amount of memory from the stack in main() and have your code sub-allocate it later on. It's a silly thing to do since it means your program is taking up memory that it doesn't actually need.
I can think of no reason (save some kind of silly programming challenge or learning exercise) for wanting to avoid the heap. If you've "heard" that heap allocation is slow and stack allocation is fast, it's simply because the heap involves dynamic allocation. If you were to dynamically allocate memory from a reserved block within the stack, it would be just as slow.
Stack allocation is easy and fast because you may only deallocate the "youngest" item on the stack. It works for local variables. It doesn't work for dynamic data structures.
Edit: Having seen the motivation for the question...
Firstly, the heap and the stack have to compete for the same amount of available space. Generally, they grow towards each other. This means that if you move all your heap usage into the stack somehow, then rather than stack colliding with heap, the stack size will just exceed the amount of RAM you have available.
I think you just need to watch your heap and stack usage (you can grab pointers to local variables to get an idea of where the stack is at the moment) and if it's too high, reduce it. If you have lots of small dynamically-allocated objects, remember that each allocation has some memory overhead, so sub-allocating them from a pool can help cut down on memory requirements. If you use recursion anywhere think about replacing it with an array-based solution.
You can't do dynamic memory allocation in C without using heap memory. It would be pretty hard to write a real world application without using Heap. At least, I can't think of a way to do this.
BTW, Why do you want to avoid heap? What's so wrong with it?
1: Yes you can - if you don't need dynamic memory allocation, but it could have a horrible performance, depending on your app. (i.e. not using the heap won't give you better apps)
2: No I don't think you can allocate memory dynamically on the stack, since that part is managed by the compiler.
Yes, it's doable. Shift your dynamic needs out of memory and onto disk (or whatever mass storage you have available) -- and suffer the consequent performance penalty.
E.g., You need to build and reference a binary tree of unknown size. Specify a record layout describing a node of the tree, where pointers to other nodes are actually record numbers in your tree file. Write routines that let you add to the tree by writing an additional record to file, and walk the tree by reading a record, finding its child as another record number, reading that record, etc.
This technique allocates space dynamically, but it's disk space, not RAM space. All the routines involved can be written using statically allocated space -- on the stack.
Embedded applications need to be careful with memory allocations but I don't think using the stack or your own pre-allocated heap is the answer. If possible, allocate all required memory (usually buffers and large data structures) at initialization time from a heap. This requires a different style of program than most of us are used to now but it's the best way to get close to deterministic behavior.
A large heap that is sub-allocated later would still be subject to running out of memory and the only thing to do then is have a watchdog kick in (or similar action). Using the stack sounds appealing but if you're going to allocate large buffers/data structures on the stack you have to be sure that the stack is large enough to handle all possible code paths that your program could execute. This is not easy and in the end is similar to a sub-allocated heap.
My foremost concern is: does abolishing the heap really help?
Since your wish to avoid the heap stems from a stack/heap collision, and assuming the start of the stack and the start of the heap are set properly (e.g. in the same setting, small sample programs have no such collision problem), the collision means the hardware does not have enough memory for your program.
By not using the heap, one may indeed save some space otherwise wasted by heap fragmentation; but if your program does not use the heap for a bunch of irregular, large allocations, the waste there is probably not much. I see your collision problem as more of an out-of-memory problem, something not fixable by merely avoiding the heap.
My advice on tackling this case:
Calculate the total potential memory usage of your program. If it is too close to but not yet exceeding the amount of memory you prepared for the hardware, then you may
Try using less memory (improve the algorithms) or using the memory more efficiently (e.g. smaller and more-regular-sized malloc() to reduce heap fragmentation); or
Simply buy more memory for the hardware
Of course you may try pushing everything into predefined static memory space, but then it will very probably be the stack overwriting static memory instead. So improve the algorithm to consume less memory first, and buy more memory second.
I'd attack this problem in a different way: if you think the stack and heap are colliding, then test this by guarding against it.
For example (assuming a *ix system) try mprotect()ing the last stack page (assuming a fixed size stack) so it is not accessible. Or - if your stack grows - then mmap a page in the middle of the stack and heap. If you get a segv on your guard page you know you've run off the end of the stack or heap; and by looking at the address of the seg fault you can see which of the stack & heap collided.
It is often possible to write your embedded application without using dynamic memory allocation. In many embedded applications the use of dynamic allocation is deprecated because of the problems that can arise due to heap fragmentation. Over time it becomes highly likely that there will not be a suitably sized region of free heap space to allow the memory to be allocated, and unless there is a scheme in place to handle this error the application will crash. There are various schemes to get around this, one being to always allocate fixed-size objects on the heap so that a new allocation will always fit into a freed memory area. Another is to detect the allocation failure and to perform a defragmentation process on all of the objects on the heap (left as an exercise for the reader!)
You do not say what processor or toolset you are using but in many the static, heap and stack are allocated to separate defined segments in the linker. If this is the case then it must be that your stack is growing outside the memory space that you have defined for it. The solution that you require is to reduce the heap and/or static variable size (assuming that these two are contiguous) so that there is more available for the stack. It may be possible to reduce the heap unilaterally although this can increase the probability of fragmentation problems. Ensuring that there are no unnecessary static variables will free some space at the cost of possibly increasing the stack usage if the variable is made auto.

Resources