Which is the preferable way to allocate memory for a function that is frequently allocating and freeing memory? Assume this function is called around 500 to 1000 times a second on a 1 GHz processor.
(Please ignore static and global variables/allocation. I am interested only in this specific case:)
void Test()
{
    void *ptr = malloc(512); // 512 bytes
    ...
    free(ptr);
}
OR
void Test()
{
    struct MyStruct localvar; // 512-byte structure
    ...
}
Stack allocation of local variables is faster than heap allocation with malloc. However, the total stack space is limited (e.g. to several megabytes), so you should limit yourself to "small" data on the local stack (512 bytes is small by today's standards, but 256 KB would be too large for local stack allocation).
If your function is very deeply recursive, then perhaps even 512 bytes could be too big, because you'll need that for each recursive call frame.
But calling malloc a few thousand times per second should be painless (IMHO a typical small-sized malloc takes a few dozen microseconds).
For your curiosity, and outside of the C world, you might be interested in A. Appel's old paper "Garbage collection can be faster than stack allocation" (but perhaps cache performance considerations weaken this claim today).
Local variables are allocated essentially "for free", so there is no contest here if we are only interested in performance.
However:
the choice between a local and a heap-allocated variable is not normally something that you are free to decide without constraint; usually there are factors that mandate the choice, so your question is a bit suspect because it seems to disregard this issue
while allocating on the stack is "free" performance-wise, space on the stack might be limited (although of course 512 bytes is nothing)
Which is the preferable way to allocate memory....
Which allocation is faster?
Do you want the faster way, or the preferable way?
Anyway, in the case you mentioned, I think the second option:
struct MyStruct localvar;
is more efficient, since the memory allocation is done on the stack, which is a lot more efficient than using dynamic memory allocation functions like malloc.
Optimizing
Also, if you are doing this for performance & optimizing...
On my PC, using malloc to allocate strings instead of declaring a char array on the stack gives me a lag of ~73 nanoseconds per string. An eye blink takes roughly 350,000,000 nanoseconds, so it takes about 4,757,142 of those allocations to add up to one blink's worth of time.
If you copied 50 strings in your program:
4757142 / 50 = 95142 (and a bit) runs of your program
If I run your program 50 times a day:
95142 / 50 = 1902 (and a bit) days
1902 days = 5 1/5 years
So if you run your program every day for 5 years and 2 months, you'll save the time to blink your eye an extra time. Wow, how rewarding...
Turn on your disassembler when you enter your function, and step through the 2 cases.
The local variable (stack based) will require 0 extra cycles -- you won't even see where the allocation comes, because the function will allocate all the local variables in 1 cycle by just moving the stack pointer, and free all the local variables in 1 cycle by restoring the stack pointer. It doesn't matter if you have 1 or 1000 local variables, the "allocation" takes the same amount of time.
The malloc variable ... well, you will quickly get bored click-stepping through the thousands of instructions that are executed to get memory from the system heap. On top of that, you might notice that the number of cycles varies from call to call, depending on how many things you have already allocated from the heap, as malloc requires a "search" through the heap structure every time you ask for memory.
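If you want to try this yourself, here is a minimal harness for the two cases (the function names are mine; compile without optimization, since otherwise the compiler may remove both allocations entirely):

#include <stdlib.h>
#include <string.h>

struct MyStruct { char data[512]; };

void Test_heap(void)
{
    char *ptr = malloc(512);      /* step into here: heap bookkeeping   */
    if (ptr == NULL)
        return;
    memset(ptr, 0, 512);          /* keeps the buffer from being elided */
    free(ptr);                    /* and here: more bookkeeping         */
}

void Test_stack(void)
{
    struct MyStruct localvar;     /* one stack-pointer adjustment       */
    memset(&localvar, 0, sizeof localvar);
}                                 /* "freed" by restoring the stack pointer */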
My rule of thumb: always use the stack if possible, instead of malloc/free or new/delete. In addition to faster performance, you get the added benefit of not having to worry about memory or resource leaks. In C this just means forgetting to call free(), but in C++ exceptions can ruin your day if something throws an exception before you call delete. If you use stack variables, this is all handled automatically! However, only use the stack if you are talking about "small" pieces of memory (bytes and KB) and not huge objects (not MB or GB!). If you are talking about huge objects anyways, you are not talking about performance any more and you will probably not be calling free/delete in the same function call anyways.
Stack allocation is faster than malloc+free.
Stack allocations are typically measured in instructions, while malloc+free may require multiple locks (as one example of why it takes long in comparison).
The local variable case will be much faster: allocating a variable on the stack takes no extra time, it just changes how far the stack pointer is moved, whereas malloc has to do some bookkeeping.
Another advantage of using the stack is that it does not fragment the memory space, which malloc has a tendency to do. Of course this is just an issue for long-running processes.
Related
You need to implement a memory manager in C with the following 3 functions:
void init() - initialize the memory manager.
void* get(int numOfBytes) - return a memory block (on the heap) of size "numOfBytes". The value of "numOfBytes" can be in the range [1,8k].
void free(void* ptr) - free the memory block pointed by "ptr".
Few rules:
You can call for malloc function only in the "init()" function.
The methods "get" and "free" should be as efficient as possible, but the method "init" doesn't have to be, as long as you don't waste too much memory.
You can assume your memory manager will not need to allocate more than some fixed number of bytes in total, say no more than 1 GB.
My attempt:
I thought of just implementing a fixed-size memory pool where each block is 8k bytes, like in here. This gives O(1) run time for "get" and "free", which is great, but the problem is that we waste a lot of memory that way if the user only calls "get" for a small number of bytes (say, 1 byte each time).
But if I try to implement it with variable block sizes - I'll need to handle fragmentation which will make the run time worse.
So do you have a better idea?
I'd avoid a fixed size block.
A common strategy is to form pools at powers of 2: 16, 32, ... 1G, with everything initially in the largest pool.
Each allocated block is the user's size n plus overhead (est. 4-8 bytes), rounded up ("ceiling") to a power of 2.
If a pool lacks an available block, cut a larger one in half.
As similar allocation sizes tend to occur in groups, this avoids excessive waste.
De-allocation (and coalescing for reuse) only requires checking whether the freed block's paired "buddy" is also free; if so, the two re-form the larger block (which may in turn re-join its own buddy), reducing fragmentation.
Note: all *alloc() functions return a pointer suitably aligned for max_align_t, so that is the alignment expected of get() as well. As part of an interview, mentioning alignment and portability concerns is good.
There are various improvements, such as handling requests that are already exact powers of 2 without rounding them up a full step, but for an interview question you only need to touch on such ideas.
free() is a standard library function - best not to redefine it - use a different name.
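Putting the pieces together, here is a minimal sketch of that buddy strategy, assuming a 1 GiB pool per the problem statement. It stores each block's order in a size_t header (so returned pointers are only 8-byte aligned; a production version would use a max_align_t-sized header), finds a block's buddy with a linear free-list scan (a real implementation would use a bitmap for O(1) frees), and names the de-allocator release() per the note above:

#include <stdlib.h>
#include <string.h>

#define MIN_ORDER 4                      /* smallest block: 16 bytes */
#define MAX_ORDER 30                     /* whole pool: 1 GiB        */
#define NUM_ORDERS (MAX_ORDER - MIN_ORDER + 1)

static char *pool;                       /* base of the managed region   */
static void *free_list[NUM_ORDERS];      /* one LIFO free list per order */

/* Free blocks store a pointer to the next free block of the same order
 * in their first bytes; allocated blocks store their order there instead. */

static int order_for(size_t n)
{
    size_t need = n + sizeof(size_t);    /* user bytes + header */
    int order = MIN_ORDER;
    while (((size_t)1 << order) < need)
        order++;
    return order;
}

void init(void)
{
    pool = malloc((size_t)1 << MAX_ORDER);  /* the only malloc() call */
    if (pool == NULL)
        abort();
    memset(free_list, 0, sizeof free_list);
    free_list[NUM_ORDERS - 1] = pool;       /* one big free block */
    *(void **)pool = NULL;
}

void *get(int numOfBytes)                /* assumes 1 <= numOfBytes <= 8k */
{
    int order = order_for((size_t)numOfBytes), i = order;

    while (i <= MAX_ORDER && free_list[i - MIN_ORDER] == NULL)
        i++;                             /* find a big-enough free block */
    if (i > MAX_ORDER)
        return NULL;                     /* pool exhausted */

    char *block = free_list[i - MIN_ORDER];
    free_list[i - MIN_ORDER] = *(void **)block;

    while (i > order) {                  /* cut larger blocks in half */
        i--;
        char *buddy = block + ((size_t)1 << i);
        *(void **)buddy = free_list[i - MIN_ORDER];
        free_list[i - MIN_ORDER] = buddy;
    }
    *(size_t *)block = (size_t)order;    /* remember the order */
    return block + sizeof(size_t);
}

void release(void *ptr)                  /* "free" per the rules above */
{
    char *block = (char *)ptr - sizeof(size_t);
    int order = (int)*(size_t *)block;

    while (order < MAX_ORDER) {          /* merge with free buddies */
        char *buddy = pool + ((size_t)(block - pool) ^ ((size_t)1 << order));
        void **pp = &free_list[order - MIN_ORDER];
        while (*pp != NULL && *pp != (void *)buddy)
            pp = (void **)*pp;           /* linear scan; bitmap in real life */
        if (*pp == NULL)
            break;                       /* buddy is in use: stop merging */
        *pp = *(void **)buddy;           /* unlink buddy from its list */
        if (buddy < block)
            block = buddy;               /* merged block starts lower */
        order++;
    }
    *(void **)block = free_list[order - MIN_ORDER];
    free_list[order - MIN_ORDER] = block;
}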
This is my problem in essence. In the life of a function, I generate some integers, then use the array of integers in an algorithm that is also part of the same function. The array of integers will only be used within the function, so naturally it makes sense to store the array on the stack.
The problem is I don't know the size of the array until I'm finished generating all the integers.
I know how to allocate a fixed-size and a variable-sized array on the stack. However, I do not know how to grow an array on the stack, and that seems like the best way to solve my problem. I'm fairly certain this is possible to do in assembly: you just increment the stack pointer and store an int for each int generated, so the array of ints would be at the end of the stack frame. Is this possible to do in C though?
I would disagree with your assertion that "so naturally it makes sense to store the array on the stack". Stack memory is really designed for when you know the size at compile time. I would argue that dynamic memory is the way to go here.
C doesn't define what the "stack" is. It only has static, automatic and dynamic allocations. Static and automatic allocations are handled by the compiler, and only dynamic allocation puts the controls in your hands. Thus, if you want to manually deallocate an object and allocate a bigger one, you must use dynamic allocation.
Don't use dynamic arrays on the stack (compare Why is the use of alloca() not considered good practice?); better to allocate memory on the heap using malloc and resize it using realloc.
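A minimal sketch of that approach; the geometric (doubling) growth policy is a common convention, my choice here rather than anything the answer above prescribes:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    size_t cap = 8, count = 0;
    int *arr = malloc(cap * sizeof *arr);
    if (arr == NULL)
        return 1;

    for (int v = 0; v < 100; v++) {      /* stand-in for "generating" ints */
        if (count == cap) {              /* full: double the capacity */
            int *tmp = realloc(arr, cap * 2 * sizeof *arr);
            if (tmp == NULL) { free(arr); return 1; }
            arr = tmp;
            cap *= 2;
        }
        arr[count++] = v;
    }

    printf("stored %zu ints, capacity %zu\n", count, cap);
    free(arr);                           /* one free at the end */
    return 0;
}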
Never Use alloca()
IMHO this point hasn't been made well enough in the standard references.
One rule of thumb is:
If you're not prepared to statically allocate the maximum possible size as a fixed-length C array then you shouldn't do it dynamically with alloca() either.
Why? The reason you're trying to avoid malloc() is performance.
alloca() will be slower and won't work in any circumstance where static allocation will fail. It's generally less likely to succeed than malloc() too.
One thing is sure. Statically allocating the maximum will outdo both malloc() and alloca().
Static allocation is typically damn near a no-op. Most systems will advance the stack pointer for the function call anyway. There's no appreciable difference for how far.
So what you're telling me is you care about performance but want to hold back on a no-op solution? Think about why you feel like that.
The overwhelming likelihood is you're concerned about the size allocated.
But as explained it's free and it gets taken back. What's the worry?
If the worry is "I don't have a maximum or don't know if it will overflow the stack" then you shouldn't be using alloca(), because you don't have a maximum and don't know if it will overflow the stack.
If you do have a maximum and know it isn't going to blow the stack then statically allocate the maximum and go home. It's a free lunch - remember?
That makes alloca() either wrong or sub-optimal.
Every time you use alloca() you're either wasting your time or coding in one of the difficult-to-test-for arbitrary scaling ceilings that sleep quietly until things really matter then f**k up someone's day.
Don't.
PS: If you need a big 'workspace' but the malloc()/free() overhead is a bottle-neck for example called repeatedly in a big loop, then consider allocating the workspace outside the loop and carrying it from iteration to iteration. You may need to reallocate the workspace if you find a 'big' case but it's often possible to divide the number of allocations by 100 or even 1000.
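A minimal sketch of that PS; the sizes[] array stands in for whatever per-iteration workspace requirement your real loop has:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    size_t sizes[] = { 100, 80, 4000, 120, 3500, 90 };
    size_t n = sizeof sizes / sizeof sizes[0];

    size_t cap = 0;
    char *workspace = NULL;              /* carried across iterations */

    for (size_t i = 0; i < n; i++) {
        if (sizes[i] > cap) {            /* grow only on a "big" case */
            char *tmp = realloc(workspace, sizes[i]);
            if (tmp == NULL) { free(workspace); return 1; }
            workspace = tmp;
            cap = sizes[i];
        }
        /* ... use workspace[0 .. sizes[i]-1] here ... */
    }
    printf("final capacity: %zu bytes, with only 2 allocations\n", cap);
    free(workspace);
    return 0;
}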
Footnote:
There must be some theoretical algorithm where a() calls b() and if a() requires a massive environment b() doesn't and vice versa.
In that event there could be some kind of freaky play-off where the stack overflow is prevented by alloca(). I have never heard of or seen such an algorithm. Plausible specimens will be gratefully received!
The innards of the C compiler require stack sizes to be fixed or calculable at compile time. It's been a while since I used C (I'm now a C++ convert) and I don't know exactly why this is. http://gribblelab.org/CBootcamp/7_Memory_Stack_vs_Heap.html provides a useful comparison of the pros and cons of the two approaches.
I appreciate your assembly code analogy, but C is largely managed, if that makes any sense, by the Operating System, which imposes/provides the task, process and stack notions.
In order to address your issue, dynamic memory allocation looks ideal:
int *a = malloc(sizeof(int));
and dereference it to store the value.
Each time a new integer needs to be added to the existing list of integers:
int *temp = realloc(a, sizeof(int) * (n + 1)); /* n = current number of elements */
if (temp != NULL)
    a = temp;
Once done using this memory, free() it.
Is there an upper limit on the size? If you can impose one, so the size is at most a few tens of KiB, then yes alloca is appropriate (especially if this is a leaf function, not one calling other functions that might also allocate non-tiny arrays this way).
Or since this is C, not C++, use a variable-length array like int foo[n];.
But always sanity-check your size, otherwise it's a stack-clash vulnerability waiting to happen. (Where a huge allocation moves the stack pointer so far that it ends up in the middle of another memory region, where other things get overwritten by local variables and return addresses.) Some distros enable hardening options that make GCC generate code to touch every page in between when moving the stack pointer by more than a page.
It's usually not worth it to check the size and use alloca for small, malloc for large, since you also need another check at the end of your function to call free if the size was large. It might give a speedup, but this makes your code more complicated and more likely to get broken during maintenance if future editors don't notice that the memory is only sometimes malloced. So only consider a dual strategy if profiling shows this is actually important, and you care about performance more than simplicity / human-readability / maintainability for this particular project.
A size check for an upper limit (else log an error and exit) is more reasonable, but then you have to choose an upper limit beyond which your program will intentionally bail out, even though there's plenty of RAM you're choosing not to use. If there is a reasonable limit where you can be pretty sure something's gone wrong, like the input being intentionally malicious from an exploit, then great, if(size>limit) error(); int arr[size];.
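Putting the limit check and the VLA together, a minimal sketch (LIMIT and the sum-of-squares body are placeholders of mine):

#include <stdio.h>
#include <stdlib.h>

#define LIMIT (16 * 1024)   /* keep the VLA well under typical stack sizes */

static long sum_squares(size_t n)
{
    if (n == 0 || n > LIMIT) {           /* bail out instead of stack-clashing */
        fprintf(stderr, "size %zu out of range\n", n);
        exit(EXIT_FAILURE);
    }
    int arr[n];                          /* C99 variable-length array */
    for (size_t i = 0; i < n; i++)
        arr[i] = (int)i;
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += (long)arr[i] * arr[i];
    return sum;
}

int main(void)
{
    printf("%ld\n", sum_squares(1000));
    return 0;
}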
If neither of those conditions can be satisfied, your use case is not appropriate for C automatic storage (stack memory) because the allocation might need to be large. Just use dynamic allocation, even if you'd rather have avoided malloc.
On Windows x86/x64, the default user-space stack size is 1 MiB, I think. On x86-64 Linux it's 8 MiB (ulimit -s). Thread stacks are allocated with the same size. But remember, your function will be part of a chain of function calls (so if every function used a large fraction of the total size, you'd have a problem if they called each other). And any stack memory you dirty won't get handed back to the OS even after the function returns, unlike malloc/free where a large allocation can give back the memory instead of leaving it on the free list.
Kernel thread stacks are much smaller, like 16 KiB total for x86-64 Linux, so you never want VLAs or alloca in kernel code, except maybe for a tiny max size, like up to 16 or maybe 32 bytes, not large compared to the size of a pointer that would be needed to store a kmalloc return value.
Apologies if this is a stupid question, but it's been kinda bothering me for a long time.
I'd like to know some details on how the memory manager knows what memory is in use.
Imagine a one-chip microcomputer with 1024B of RAM - not much to spare.
Now you allocate 100 ints - each int is 4 bytes, each pointer 4 bytes too (yes, an 8-bit one-chip will most likely have smaller pointers, but whatever).
So you've just used 800 B of RAM for 100 ints? But it's worse - the allocation system must somehow take note of where the memory is malloc'd and where it's free - 200 extra bytes or something? Or some bit marks?
If this is true, why is C favoured over assembler so often?
Is this really how it works? So super inefficient?
(Or am I having a totally incorrect idea about it?)
It may surprise younger developers to learn that greying old ones like myself used to write in C on systems with 1 or 2k of RAM.
In systems this size, dynamic memory allocation would have been a luxury that we could not afford. It's not just the pointer overhead of managing the free store, but also the effects of free store fragmentation making memory allocation inefficient and very probably leading to a fatal out-of-memory condition (virtual memory was not an option).
So we used to use static memory allocation (i.e. global variables), kept a very tight control on the depth of function call nesting, and an even tighter control over nested interrupt handling.
When writing on these systems, I didn't even link the standard library. I wrote my own C startup routines and provided custom minimal I/O routines.
One program I wrote in a 2k ram system used the lower part of RAM as the data area and the upper part as the stack. In the final cut, I proved that the maximal use of stack reached so far down in memory that it was 1 byte away from the last variable in the data area.
Ah, the good old days...
EDIT:
To answer your question specifically, the original K&R free store manager used to add a header block to the beginning of each block of memory allocated via malloc.
The header block looked something like this:
union header {
    struct {
        union header *ptr;
        unsigned size;
    } s;
};
Where ptr is the address of the next header block and size is the size of the memory allocated (in blocks). The malloc function would actually return the address computed by &header + sizeof(header). The free function would subtract the size of the header from the pointer you provided in order to re-link the block back into the free list.
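To make that pointer arithmetic concrete, here is a tiny self-contained demo of the header trick; the static arena stands in for memory obtained from the OS:

#include <stdio.h>

union header {
    struct {
        union header *ptr;   /* next block on the free list */
        unsigned size;       /* size of this block, in header-sized units */
    } s;
};

static union header arena[64];   /* stand-in for memory from the OS */

int main(void)
{
    union header *h = arena;
    h->s.size = 64;
    h->s.ptr  = NULL;

    void *user = (void *)(h + 1);                    /* what malloc() returns */
    union header *back = (union header *)user - 1;   /* what free() computes  */

    printf("header %p, user %p, recovered %p, size %u units\n",
           (void *)h, user, (void *)back, back->s.size);
    return 0;
}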
There are several approaches you can take:
as you write, malloc() one memory block for every int you have. Completely inefficient, thus I strike it out.
malloc() an array of 100 ints. That needs in total 100*sizeof(int) + 1*sizeof(int*) + whatever malloc() internally needs. Much better.
statically allocate the array. Here you just need 100*sizeof(int).
allocate the array on the stack. That needs the same, but only for the current function call.
Which of these you need depends on how long you need the memory and other criteria.
If you have that little RAM, it might even be questionable whether it is useful to use malloc() at all. It can be an option if several code blocks need a lot of RAM, but not at the same time.
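For illustration, a minimal sketch of options 2-4 from the list above (demo() is just a placeholder name):

#include <stdlib.h>

#define N 100

static int in_data_segment[N];   /* option 3: address fixed at link time */

void demo(void)
{
    int on_stack[N];             /* option 4: lives until demo() returns */

    int *on_heap = malloc(N * sizeof *on_heap);   /* option 2: one block */
    if (on_heap == NULL)
        return;                  /* don't lose the pointer (see the notes below) */

    on_stack[0] = in_data_segment[0];
    on_heap[0] = on_stack[0];
    free(on_heap);
}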
As for how the memory addresses are tracked, that also depends:
for malloc(), you have to put the pointer in a place where you don't lose it.
for an array on the stack, it is expressed relative to the current frame pointer. The code sets up the frame and thus knows the offset, so it normally isn't necessary to store it anywhere.
for an array in the data segment, the compiler and linker know about the address and statically put the address where it is needed.
If this is true, why is C favoured over assembler so often?
You're simplifying the problem too much. C or assembler - doesn't matter, you still need to manage the memory chunks. The main issue is fragmentation, not the actual management overhead. In a system like the one you described, you would probably just allocate the memory and never ever release it, thus no need to check what's free - whatever is below the watermark is free, and that's it.
Is this really how it works? So super inefficient?
There are many algorithms around this problem, but if you're simplifying - yes, it basically is. In reality it's a much more complicated problem - and there are much more complicated systems revolving around servicing memory, dealing with fragmentation, garbage collection (at the OS level), etc.
for example I can do
int *arr;
arr = (int *)malloc(sizeof(int) * 1048575);
but I cannot do this without the program crashing:
int arr[1048575];
why is this so?
Assuming arr is a local variable, declaring it as an array uses memory from the (relatively limited) stack, while malloc() uses memory from the (comparatively limitless) heap.
If you're allocating these as local variables in functions (which is the only place you could have the pointer declaration immediately followed by a malloc call), then the difference is that malloc will allocate a chunk of memory from the heap and give you its address, while directly doing int arr[1048575]; will attempt to allocate the memory on the stack. The stack has much less space available to it.
The stack is limited in size for two main reasons that I'm aware of:
Traditional imperative programming makes very little use of recursion, so deep recursion (and heavy stack growth) is "probably" a sign of infinite recursion, and hence a bug that's going to kill the process. It's therefore best if it is caught before the process consumes the gigabytes of virtual memory (on a 32 bit architecture) that will cause the process to exhaust its address space (at which point the machine is probably using far more virtual memory than it actually has RAM, and is therefore operating extremely slowly).
Multi-threaded programs need multiple stacks. Therefore, the runtime system needs to know that the stack will never grow beyond a certain bound, so it can put another stack after that bound if a new thread is created.
When you declare an array, you are placing it on the stack.
When you call malloc(), the memory is taken from the heap.
The stack is usually more limited compared to the heap, and is usually transient (though that depends on how often you enter and exit the function the array is declared in).
For such a large chunk of memory (maybe not by today's standards?), it is good practice to malloc it, assuming you want the array to stick around for a while.
Possible Duplicate:
When is it best to use a Stack instead of a Heap and vice versa?
I've read a few of the other questions regarding the heap vs stack, but they seem to focus more on what the heap/stack do rather than why you would use them.
It seems to me that stack allocation would almost always be preferred since it is quicker (just moving the stack pointer vs. looking for free space in the heap), and you don't have to manually free allocated memory when you're done using it. The only reason I can see for using heap allocation is if you wanted to create an object in a function and then use it outside that function's scope, since stack-allocated memory is automatically deallocated after the function returns.
Are there other reasons for using heap allocation instead of stack allocation that I am not aware of?
There are a few reasons:
The main one is that with heap allocation, you have the most flexible control over the object's lifetime (from malloc/calloc to free);
Stack space is typically a more limited resource than heap space, at least in default configurations;
A failure to allocate heap space can be handled gracefully, whereas running out of stack space is often unrecoverable.
Without the flexible object lifetime, useful data structures such as binary trees and linked lists would be virtually impossible to write.
You want an allocation to live beyond a function invocation
You want to conserve stack space (which is typically limited to a few MBs)
You're working with re-locatable memory (Win16, databases, etc.), or want to recover from allocation failures.
Variable length anything. You can fake around this, but your code will be really nasty.
The big one is #1. As soon as you get into any sort of concurrency or IPC, #1 is everywhere. Even most non-trivial single-threaded applications are tricky to devise without some heap allocation. That'd practically be faking a functional language in C/C++.
So I want to make a string. I can make it on the heap or on the stack. Let's try both:
char *heap = malloc(14);
if(heap == NULL)
{
    // bad things happened!
}
strcpy(heap, "Hello, world!"); // strcpy, not strcat: the fresh buffer is uninitialized
And for the stack:
char stack[] = "Hello, world!";
So now I have these two strings in their respective places. Later, I want to make them longer:
char *tmp = realloc(heap, 20);
if(tmp == NULL)
{
    // bad things happened!
}
heap = tmp;
memmove(heap + 13, heap + 7, 7); // move "world!" plus its NUL (7 bytes) to make room
memcpy(heap + 7, "cruel ", 6);
And for the stack:
// umm... What?
This is only one benefit, and others have mentioned other benefits, but this is a rather nice one. With the heap, we can at least try to make our allocated space larger. With the stack, we're stuck with what we have. If we want room to grow, we have to declare it all up front, and we all know how it stinks to see this:
char username[MAX_BUF_SIZE];
The most obvious rationale for using the heap is when you call a function and need something of unknown length returned. Sometimes the caller may pass a memory block and size to the function, but at other times this is just impractical, especially if the returned stuff is complex (e.g. a collection of different objects with pointers flying around, etc.).
Size limits are a huge dealbreaker in a lot of cases. The stack is usually measured in the low megabytes or even kilobytes (that's for everything on the stack), whereas all modern PCs allow you a few gigabytes of heap. So if you're going to be using a large amount of data, you absolutely need the heap.
just to add
you can use alloca to allocate memory on the stack, but again, memory on the stack is limited, and the space exists only for the duration of the function's execution.
That does not mean everything should be allocated on the heap. Like all design decisions, this is somewhat difficult; a "judicious" combination of both should be used.
Besides manual control of the object's lifetime (which you mentioned), the other reasons for using the heap include:
Run-time control over the object's size (both its initial size and its "later" size during the program's execution).
For example, you can allocate an array of certain size, which is only known at run time.
With the introduction of VLAs (Variable Length Arrays) in C99, it became possible to allocate arrays with a run-time-determined size without using the heap (this is basically a language-level implementation of the alloca functionality). However, in other cases you'd still need the heap even in C99.
Run-time control over the total number of objects.
For example, when you build a binary tree structure, you can't meaningfully allocate the nodes of the tree on the stack in advance. You have to use the heap to allocate them "on demand" (sketched below, after this list).
Low-level technical considerations, as limited stack space (others already mentioned that).
When you need a large, say, I/O buffer, even for a short time (inside a single function) it makes more sense to request it from the heap instead of declaring a large automatic array.
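For instance, a minimal sketch of the binary tree point above (a standard unbalanced insert, chosen just for illustration):

#include <stdlib.h>

struct node {
    int key;
    struct node *left, *right;
};

struct node *insert(struct node *root, int key)
{
    if (root == NULL) {                 /* allocate only when a node is needed */
        struct node *n = malloc(sizeof *n);
        if (n == NULL)
            return NULL;                /* allocation failure */
        n->key = key;
        n->left = n->right = NULL;
        return n;
    }
    if (key < root->key)
        root->left = insert(root->left, key);
    else
        root->right = insert(root->right, key);
    return root;
}

Each node comes from the heap precisely because the total number of nodes isn't known until run time.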
Stack variables (often called 'automatic variables') are best used for things you want to always be the same, and always be small.
int x;
char foo[32];
are all stack allocations, and their sizes are fixed at compile time too.
The best reason for heap allocation is that you can't always know how much space you need. You often only know this once the program is running. You might have an idea of limits, but you would only want to use the exact amount of space required.
If you had to read in a file that could be anything from 1 KB to 50 MB, you would not do this:
int readdata ( FILE * f ) {
char inputdata[50*1024*1024];
...
return x;
}
That would try to allocate 50 MB on the stack, which would usually fail, as the stack is often limited to a few megabytes at most (and on some systems to as little as 256 KB).
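A heap-based version of the same idea might look like this (a minimal sketch; sizing the buffer with fseek/ftell is my choice of illustration and has its own caveats for non-seekable streams):

#include <stdio.h>
#include <stdlib.h>

long readdata(FILE *f)
{
    fseek(f, 0, SEEK_END);
    long size = ftell(f);            /* file length in bytes */
    rewind(f);
    if (size <= 0)
        return -1;

    char *inputdata = malloc((size_t)size);  /* exactly as much as needed */
    if (inputdata == NULL)
        return -1;

    long got = (long)fread(inputdata, 1, (size_t)size, f);
    /* ... process inputdata[0 .. got-1] ... */
    free(inputdata);
    return got;
}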
In the traditional memory model, the stack and heap share the same "open" memory space, growing toward each other, and would eventually meet if you used the entire segment. Keeping a balance between the space each of them uses keeps the amortized cost of allocating and de-allocating memory at a smaller asymptotic value.