Usage of malloc and free - c

I am trying to decode an mp4 video into YUV frames. I want to allocate memory for each frame to be decoded. Is it OK if I continuously allocate memory and free it? Is there any problem in doing so (i.e., continuously allocating and freeing memory using malloc and free)?

It would be better to allocate a sufficient buffer once and reuse the same buffer. Other than some performance hit, repeated malloc/free doesn't pose any problems.

Technically, there's no problem with that at all as long as you don't try to access memory that you've already freed.
On the other hand, making all these calls repeatedly creates an overhead that you could (and should) avoid by allocating a sufficient amount of memory ahead of time, and then free it all at the end.
The approach of repeatedly allocating/freeing should really only be used if you are under very tight memory constraints, otherwise, reserve a big block of memory and allocate parts of it as you need yourself. Or alternatively, if possible, reuse the same chunk of memory.
Update: As mentioned by @EricPostpischil in a helpful comment (see below), malloc is a library call, not a system call; a system call only happens when the current heap is exhausted. For more information on this see this explanation

If the objects that you allocate have the same size, there shouldn't be much of a performance hit. In case of doubt, measure it.
Correctly tracking allocated memory is often not trivial, so it is probably easier to allocate a buffer once and use this throughout your program. But here the principal rule should be to use what corresponds to the logic of your program, is the easiest to read and to maintain.

Constantly mallocing and freeing will not break the program, but it can cause performance issues, especially since you say you're going to be doing it every frame. Mallocing and freeing that often can cause a noticeable performance decrease.
What you can do is just malloc the memory once, and then re-use the same allocation for each frame. If you don't need to store the memory after you've done what you want with the frame before the next frame is read, there isn't any problem.

Related

Disadvantages of calling realloc in a loop

I'm trying to implement some math algorithms in C on Windows 7, and I need to repeatedly increase size of my array.
Sometimes it fails because realloc can't allocate memory.
But if I allocate a lot of memory at once in the beginning it works fine.
Is it a problem with the memory manager? Can anyone explain this to me, please?
When you allocate/deallocate memory many times, it may create fragmentation in the memory and you may not get a big contiguous chunk of memory.
When you do a realloc, some extra memory may be needed for a short period of time to move the data.
If your algorithm does not need contiguous memory, or can be changed to work on non-contiguous memory, consider using linked lists of arrays (something like std::deque in C++). That avoids copying data, and your code may not suffer OOM. If you know the worst-case memory requirement for the array, it is better to keep that memory allocated from the beginning, as it avoids the cost of allocation and data moving compared with realloc.
If you want your algorithms to work fast, try to do all the memory allocation up front. Memory allocation is an unbounded operation and will kill your performance. So estimate a reasonable worst case and allocate enough for that. If you do need to realloc later, fine, but don't do so continuously.
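The "reallocate to a larger chunk, but not continuously" advice is usually implemented as geometric growth. A minimal sketch with a hypothetical `push` helper (note the temporary pointer, so an allocation failure doesn't leak the original array):

```c
#include <stdlib.h>

/* Append a value to a growable array, doubling capacity as needed.
 * Returns 0 on success, -1 on allocation failure (the old array
 * stays valid and untouched in that case). */
static int push(double **arr, size_t *len, size_t *cap, double value)
{
    if (*len == *cap) {
        size_t new_cap = (*cap == 0) ? 16 : *cap * 2;
        /* Use a temporary so the original pointer isn't lost if
         * realloc fails. */
        double *tmp = realloc(*arr, new_cap * sizeof *tmp);
        if (tmp == NULL)
            return -1;
        *arr = tmp;
        *cap = new_cap;
    }
    (*arr)[(*len)++] = value;
    return 0;
}
```

Doubling keeps the number of realloc calls (and potential copies) logarithmic in the final size, instead of one per append.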

Linked lists. Where to allocate and how to cope with fragmentation?

Location
in the heap, fragmented (malloc for every node) - inefficient in several different ways (slow allocation, slow access, memory fragmentation)
in the heap, in one large chunk - all the flexibility, gained by the data structure is lost, when needing to realloc
in the stack - the stack tends to be rather limited in size, so allocating large structures on it is not recommended at all
Their big advantage, insert O(1), seems rather useless in the environment of fragmented memory and thousands of calls to the memory allocator to give us another 10 bytes.
EDIT to clarify:
This question was asked on an interview. It is not a workplace question and so the usual heuristics of hoping to stumble blindly on the correct decision out of a small set of standard algorithms is not applicable.
The existing answers and comments mention that "malloc is not so slow", "malloc partially fights fragmentation". OK, if we use another data structure, for example a C port of the C++ vector (that is - allocate sequential memory of sufficient size; if data expands, reallocate to a twice larger chunk) all problems are solved, but we lose the fast insert/remove. Any scenarios where a linked list (allocated where?) has vast superiority to a vector?
This sounds like premature optimization. I think the correct way to go about it is to:
use the simplest implementation possible;
profile the entire program;
if profiling shows that there is a performance issue with the list, consider alternative implementations (including alternative allocators).
If you're worried about the standard allocators not handling your specialized 10-byte allocations efficiently, write a custom allocator that grabs a big chunk of memory from the standard (malloc()) allocator and doles out small items efficiently. You should not reallocate when you run out of memory in the initial big chunk; you should allocate a new (extra) big chunk and allocate from that. You get to decide how to handle released memory. You could simply ignore releases until you release all the big chunks at the end of processing with the lists. Or you can complicate life by keeping track of freed memory in the big chunks. This reduces memory usage on the whole, but it is more complex to write initially.
On the other hand, you should be aware of the risk of premature optimization. Have you measured a performance hit? Given what I've seen of your recent questions, you should probably stick with using malloc() and not try writing your own allocators (yet).
Your linked list implementation, ideally, shouldn't use any of the above. It should be up to the caller to allocate and destroy memory. Think about functions like sprintf, fgets, etc... Do they allocate any memory? No. There's a reason for that: Simplicity. Imagine if you had to free everything you got from fgets (or worse yet, fscanf). Wouldn't that be painful? Try to develop your functions to be consistent with the standard library.
It might be helpful to declare a listnode_alloc function, but this would only wrap malloc and a listnode_init function. Consider how realloc handles NULL input...
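A `listnode_alloc` of the kind suggested above might look like this minimal sketch (the node layout is hypothetical):

```c
#include <stdlib.h>

struct listnode {
    int value;
    struct listnode *next;
};

/* Thin convenience wrapper: malloc plus initialization.  The caller
 * still owns the node and must free it - the list code itself never
 * allocates behind the caller's back. */
static struct listnode *listnode_alloc(int value, struct listnode *next)
{
    struct listnode *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;    /* let the caller decide how to handle failure */
    n->value = value;
    n->next = next;
    return n;
}
```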
This may not be the exact solution, but another approach
Handle fragmentation on your own.
Allocate a big pool of memory.
For each new node, provide memory from this pool, until there is free memory left.
If the pool is exhausted, allocate another pool and use that to allocate new chunks.
However, this is much easier said than done. There are many issues you may encounter doing this.
So, I would suggest leaving this kind of optimization to malloc and related functions.
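The pool scheme outlined above can be sketched as a simple bump allocator that chains pools together. All names here are hypothetical, and frees are deliberately ignored until everything is released at once - one of the simplifications that makes a general-purpose version "easier said than done":

```c
#include <stdlib.h>

#define POOL_SIZE (64 * 1024)

struct pool {
    struct pool *prev;          /* previously filled pools, for cleanup */
    size_t used;
    unsigned char data[POOL_SIZE];
};

static struct pool *current = NULL;

/* Hand out `size` bytes from the current pool; grab a fresh pool from
 * malloc when the current one is exhausted.  Individual frees are not
 * supported: pool_destroy_all releases every pool at once. */
static void *pool_alloc(size_t size)
{
    size = (size + 7) & ~(size_t)7;         /* 8-byte alignment */
    if (size > POOL_SIZE)
        return NULL;                        /* too big for any pool */
    if (current == NULL || current->used + size > POOL_SIZE) {
        struct pool *p = malloc(sizeof *p);
        if (p == NULL)
            return NULL;
        p->prev = current;
        p->used = 0;
        current = p;
    }
    void *ptr = current->data + current->used;
    current->used += size;
    return ptr;
}

static void pool_destroy_all(void)
{
    while (current != NULL) {
        struct pool *prev = current->prev;
        free(current);
        current = prev;
    }
}
```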
Allocation from the stack is not a viable option. You can't malloc from the stack. You would have to pre-allocate a big buffer in some function. In that case, an array is better than a linked list.
Well, the memory allocation strategy might vary due to memory fragmentation, thousands of syscalls, etc., and this is exactly why big-O is used! ;-)

C re-alloc/malloc alternative

I have a block of memory that I want to re-allocate to a different size, but I don't care if the memory is discarded or not. Would it be faster to free() the memory and then malloc() a new memory block, or is realloc() the way to go?
I'm thinking that either solution is not optimal because extra work is performed. I'm betting that realloc() is faster at locating a memory block that is large enough, because the current block may already be big enough to hold the new size. But if the block is not large enough, realloc has to copy memory, which malloc() does not.
I'm using Linux. Is there perhaps a special function for this?
Thanks! :)
If you don't care about the content, the standard idiom is to do free followed by malloc. Finding a block is cheaper than copying it, and there is no guarantee that realloc doesn't do some searching of its own.
As always in such situations, if you really care about performance, it is best to benchmark both solutions on the target platform and under realistic workloads, and see which one performs better.
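The free-then-malloc idiom is short enough to wrap in a helper; a sketch (`resize_discard` is a made-up name):

```c
#include <stdlib.h>

/* Resize a buffer whose old contents are no longer needed.
 * free + malloc skips the copy that realloc might perform; the old
 * pointer is always invalid after this call, even on failure. */
static void *resize_discard(void *old, size_t new_size)
{
    free(old);
    return malloc(new_size);
}
```

Unlike realloc, a NULL return here does not leave the old buffer alive - which is fine precisely because the contents are being discarded anyway.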
realloc = alloc, then copy, then free.
The other solution - alloc and free only - is surely faster.
I would go ahead and trust the realloc implementation to do the right thing. Also, in the future you should not be worrying about whether or not the memory is moved, realloced, etc. This sort of preemptive optimization is unneeded, as most of the time spent will be in the context switch from user space to kernel space.

Minimizing the amount of malloc() calls improves performance?

Consider two applications: one (num. 1) that invokes malloc() many times, and the other (num. 2) that invokes malloc() few times.
Both applications allocate the same amount of memory (assume 100MB).
For which application will the next malloc() call be faster, #1 or #2?
In other words: Does malloc() have an index of allocated locations in memory?
You asked 2 questions:
for which application will the next malloc() call be faster, #1 or #2?
In other words: Does malloc() have an index of allocated locations in memory?
You've implied that they are the same question, but they are not. The answer to the latter question is YES.
As for which will be faster, it is impossible to say. It depends on the allocator algorithm, the machine state, the fragmentation in the current process, and so on.
Your idea is sound, though: you should think about how malloc usage will affect performance.
There was once an app I wrote that used lots of little blobs of memory, each allocated with malloc(). It worked correctly but was slow. I replaced the many calls to malloc with just one, and then sliced up that large block within my app. It was much much faster.
I don't recommend this approach; it's just an illustration of the point that malloc usage can materially affect performance.
My advice is to measure it.
Of course this completely depends on the malloc implementation, but in this case, with no calls to free, most malloc implementations will probably give you the same algorithmic speed.
As another answer commented, usually there will be a list of free blocks, but if you have not called free, there will just be one, so it should be O(1) in both cases.
This assumes that the memory allocated for the heap is big enough in both cases. In case #1, you will have allocated more total memory, as each allocation involves memory overhead to store meta-data, as a result you may need to call sbrk(), or equivalent to grow the heap in case #1, which would add an additional overhead.
They will probably be different due to cache and other second order effects, since the memory alignments for the new allocation won't be the same.
If you have been freeing some of the memory blocks, then it is likely that #2 will be faster due to less fragmentation, and so a smaller list of free blocks to search.
If you have freed all the memory blocks, it should end up being exactly the same, since any sane free implementation will have coalesced the blocks back into a single arena of memory.
Malloc has to run through a linked list of free blocks to find one to allocate. This takes time. So, #1 will usually be slower:
The more often you call malloc, the more time it will take - so reducing the number of calls will give you a speed improvement (though whether it is significant will depend on your exact circumstances).
In addition, if you malloc many small blocks, then as you free those blocks, you will fragment the heap much more than if you only allocate and free a few large blocks. So you are likely to end up with many small free blocks on your heap rather than a few big blocks, and therefore your mallocs may have to search further through the free-space lists to find a suitable block to allocate, which again will make them slower.
These are of course implementation details, but typically free() will insert the memory into a list of free blocks. malloc() will then look at this list for a free block that is the right size, or larger. Typically, only if this fails does malloc() ask the kernel for more memory.
There are also other considerations, such as when to coalesce multiple adjacent blocks into a single, larger block.
And, another reason that malloc() is expensive: If malloc() is called from multiple threads, there must be some kind of synchronization on these global structures (i.e. locks). There exist malloc() implementations with different optimization schemes to make it better for multiple threads, but generally, keeping it multi-thread safe adds to the cost, as multiple threads will contend for those locks and block progress on each other.
You can always do a better job using malloc() to allocate a large chunk of memory and sub-dividing it yourself. Malloc() was optimized to work well in the general case and makes no assumptions whether or not you use threads or what the size of the program's allocations might be.
Whether it is a good idea to implement your own sub-allocator is a secondary question. It rarely is, explicit memory management is already hard enough. You rarely need another layer of code that can screw up and crash your program without any good way to debug it. Unless you are writing a debug allocator.
The answer is that it depends, most of the potential slowness rather comes from malloc() and free() in combination and usually #1 and #2 will be of similar speed.
All malloc() implementations do have an indexing mechanism, but the speed of adding a new block to the index is usually not dependent on the number of blocks already in the index.
Most of the slowness of malloc comes from two sources
searching for a suitable free block among the previously freed blocks
multi-processor problems with locking
Writing my own almost-standards-compliant malloc() replacement took malloc() && free() time from 35% down to 3-4%, and it seriously optimised those two factors. It would likely have been a similar speed to use some other high-performance malloc, but having our own was more portable to esoteric devices, and of course allows free to be inlined in some places.
You don't define the relative difference between "many" and "few" but I suspect most mallocs would function almost identically in both scenarios. The question implies that each call to malloc has as much overhead as a system call and page table updates. When you do a malloc call, e.g. malloc(14), in a non-brain-dead environment, malloc will actually allocate more memory than you ask for, often a multiple of the system MMU page size. You get your 14 bytes and malloc keeps track of the newly allocated area so that later calls can just return a chunk of the already allocated memory, until more memory needs to be requested from the OS.
In other words, if I call malloc(14) 100 times or malloc(1400) once, the overhead will be about the same. I'll just have to manage the bigger allocated memory chunk myself.
Allocating one block of memory is faster than allocating many blocks. There is the overhead of the system call and also searching for available blocks. In programming reducing the number of operations usually speeds up the execution time.
Memory allocators may have to search to find a block of memory that is the correct size. This adds to the overhead of the execution time.
However, there may be a better chance of success when allocating small blocks of memory versus one large block. Is your program allocating one small block and releasing it, or does it need to allocate (and preserve) many small blocks? When memory becomes fragmented, there are fewer big chunks available, so the memory allocator may have to coalesce blocks to form a block big enough for the allocation.
If your program is allocating and destroying many small blocks of memory you may want to consider allocating a static array and using that for your memory.

Can I write a C application without using the heap?

I'm experiencing what appears to be a stack/heap collision in an embedded environment (see this question for some background).
I'd like to try rewriting the code so that it doesn't allocate memory on the heap.
Can I write an application without using the heap in C? For example, how would I use the stack only if I have a need for dynamic memory allocation?
I did it once in an embedded environment where we were writing "super safe" code for biomedical machines.
Malloc()s were explicitly forbidden, partly for the resource limits and partly for the unexpected behavior you can get from dynamic memory (look up malloc(), VxWorks/Tornado, and fragmentation and you'll find a good example).
Anyway, the solution was to plan the needed resources in advance and statically allocate the "dynamic" ones in a vector contained in a separate module, having some kind of special-purpose allocator give and take back pointers. This approach avoided fragmentation issues altogether and helped get finer-grained error info if a resource was exhausted.
This may sound silly on big iron, but on embedded systems, and particularly on safety critical ones, it's better to have a very good understanding of which -time and space- resources are needed beforehand, if only for the purpose of sizing the hardware.
Funnily enough, I once saw a database application which relied completely on statically allocated memory. This application had a strong restriction on field and record lengths. Even the embedded text editor (I still shiver calling it that) was unable to create texts with more than 250 lines. That answered a question I had at the time: why are only 40 records allowed per client?
In serious applications you cannot calculate the memory requirements of your running system in advance. Therefore it is a good idea to allocate memory dynamically as you need it. Nevertheless, it is common in embedded systems to preallocate the memory you really need, to prevent unexpected failures due to memory shortage.
You might allocate dynamic memory on the stack using the alloca() library call. But this memory is tied to the execution context of the function, and it is a bad idea to return memory of this type to the caller, because it will be overwritten by later subroutine calls.
So I might answer your question with a crisp and clear "it depends"...
You can use the alloca() function, which allocates memory on the stack - this memory is freed automatically when you exit the function. alloca() is not part of standard C, but you are using GCC, so it is available.
See man alloca.
Another option is to use variable-length arrays, but you need to use C99 mode.
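As a small illustration of the VLA option (compile in C99 mode or later; keep `n` small, since a large VLA risks exactly the stack overflow this question is about):

```c
/* C99 variable-length array: n is only known at run time, yet the
 * storage lives on the stack and vanishes when the function returns -
 * no malloc, no free. */
static long sum_squares(int n)
{
    long squares[n];            /* VLA; size decided at run time */
    long total = 0;
    for (int i = 0; i < n; i++)
        squares[i] = (long)i * i;
    for (int i = 0; i < n; i++)
        total += squares[i];
    return total;
}
```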
It's possible to allocate a large amount of memory from the stack in main() and have your code sub-allocate it later on. It's a silly thing to do since it means your program is taking up memory that it doesn't actually need.
I can think of no reason (save some kind of silly programming challenge or learning exercise) for wanting to avoid the heap. If you've "heard" that heap allocation is slow and stack allocation is fast, it's simply because the heap involves dynamic allocation. If you were to dynamically allocate memory from a reserved block within the stack, it would be just as slow.
Stack allocation is easy and fast because you may only deallocate the "youngest" item on the stack. It works for local variables. It doesn't work for dynamic data structures.
Edit: Having seen the motivation for the question...
Firstly, the heap and the stack have to compete for the same amount of available space. Generally, they grow towards each other. This means that if you move all your heap usage into the stack somehow, then rather than stack colliding with heap, the stack size will just exceed the amount of RAM you have available.
I think you just need to watch your heap and stack usage (you can grab pointers to local variables to get an idea of where the stack is at the moment) and if it's too high, reduce it. If you have lots of small dynamically-allocated objects, remember that each allocation has some memory overhead, so sub-allocating them from a pool can help cut down on memory requirements. If you use recursion anywhere think about replacing it with an array-based solution.
You can't do dynamic memory allocation in C without using heap memory. It would be pretty hard to write a real-world application without using the heap. At least, I can't think of a way to do it.
BTW, Why do you want to avoid heap? What's so wrong with it?
1: Yes, you can - if you don't need dynamic memory allocation. But it could have horrible performance, depending on your app (i.e. not using the heap won't give you better apps).
2: No, I don't think you can allocate memory dynamically on the stack, since that part is managed by the compiler.
Yes, it's doable. Shift your dynamic needs out of memory and onto disk (or whatever mass storage you have available) -- and suffer the consequent performance penalty.
E.g., You need to build and reference a binary tree of unknown size. Specify a record layout describing a node of the tree, where pointers to other nodes are actually record numbers in your tree file. Write routines that let you add to the tree by writing an additional record to file, and walk the tree by reading a record, finding its child as another record number, reading that record, etc.
This technique allocates space dynamically, but it's disk space, not RAM space. All the routines involved can be written using statically allocated space -- on the stack.
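A rough sketch of such a disk-backed tree node, with record numbers standing in for pointers (the record layout and helper names are made up; error handling is minimal):

```c
#include <stdio.h>

/* A tree node stored as a fixed-size record in a file.  "Pointers"
 * are record numbers, and -1 means no child.  All variables here are
 * stack-allocated; the dynamic growth happens on disk. */
struct node_record {
    long left;      /* record number of left child, or -1 */
    long right;     /* record number of right child, or -1 */
    int  key;
};

/* Append a node to the file and return its record number. */
static long write_node(FILE *f, const struct node_record *n)
{
    fseek(f, 0, SEEK_END);
    long recno = ftell(f) / (long)sizeof *n;
    fwrite(n, sizeof *n, 1, f);
    return recno;
}

/* Fetch a node by record number; returns 1 on success, 0 on failure. */
static int read_node(FILE *f, long recno, struct node_record *out)
{
    fseek(f, recno * (long)sizeof *out, SEEK_SET);
    return fread(out, sizeof *out, 1, f) == 1;
}
```

Walking the tree then becomes a chain of read_node calls: read a record, look at its child's record number, read that record, and so on.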
Embedded applications need to be careful with memory allocations but I don't think using the stack or your own pre-allocated heap is the answer. If possible, allocate all required memory (usually buffers and large data structures) at initialization time from a heap. This requires a different style of program than most of us are used to now but it's the best way to get close to deterministic behavior.
A large heap that is sub-allocated later would still be subject to running out of memory and the only thing to do then is have a watchdog kick in (or similar action). Using the stack sounds appealing but if you're going to allocate large buffers/data structures on the stack you have to be sure that the stack is large enough to handle all possible code paths that your program could execute. This is not easy and in the end is similar to a sub-allocated heap.
My foremost concern is: does abolishing the heap really help?
Since your wish to avoid the heap stems from a stack/heap collision, and assuming the start of the stack and start of the heap are set properly (e.g. with the same settings, small sample programs have no such collision problem), the collision means the hardware simply does not have enough memory for your program.
Not using the heap may indeed save some space wasted by heap fragmentation; but unless your program uses the heap for a bunch of irregular, large allocations, the waste there is probably not much. I see your collision problem more as an out-of-memory problem, something not fixable by merely avoiding the heap.
My advices on tackling this case:
Calculate the total potential memory usage of your program. If it is too close to, but not yet exceeding, the amount of memory you prepared for the hardware, then you may:
Try using less memory (improve the algorithms) or using the memory more efficiently (e.g. smaller, more regularly sized malloc()s to reduce heap fragmentation); or
Simply buy more memory for the hardware
Of course you may try pushing everything into pre-defined static memory space, but it is very probable that the stack will then overwrite the static memory instead. So improve the algorithms to use less memory first, and buy more memory second.
I'd attack this problem in a different way - if you think the stack and heap are colliding, then test this by guarding against it.
For example (assuming a *ix system), try mprotect()ing the last stack page (assuming a fixed-size stack) so it is not accessible. Or - if your stack grows - mmap a page between the stack and heap. If you get a SIGSEGV on your guard page, you know you've run off the end of the stack or heap; and by looking at the address of the fault you can see which of the stack and heap collided.
It is often possible to write your embedded application without using dynamic memory allocation. In many embedded applications the use of dynamic allocation is deprecated because of the problems that can arise from heap fragmentation. Over time it becomes highly likely that there will not be a suitably sized region of free heap space to satisfy an allocation, and unless there is a scheme in place to handle this error, the application will crash. There are various schemes to get around this. One is to always allocate fixed-size objects on the heap, so that a new allocation will always fit into a freed memory area. Another is to detect the allocation failure and perform a defragmentation pass over all of the objects on the heap (left as an exercise for the reader!).
You do not say which processor or toolset you are using, but in many, the static, heap, and stack areas are allocated to separate segments defined in the linker. If this is the case, then your stack must be growing outside the memory space you defined for it. The solution is to reduce the heap and/or static variable size (assuming the two are contiguous) so that more is available for the stack. It may be possible to reduce the heap unilaterally, although this can increase the probability of fragmentation problems. Ensuring that there are no unnecessary static variables will free some space, at the cost of possibly increasing stack usage if those variables are made automatic.