Optimizing a realloc function - C

I am writing a realloc function and currently my realloc handles two cases (not counting the null cases):
If enough memory exists next to the block, expand it
else allocate a new block and do a memcpy
My question is, are there any more cases I should be handling? I can't think of any.
I thought of the case where the previous block may be free, letting me expand my block into it, but that would still require a memcpy, so implementing it seemed pointless.

Include the case where the new size is smaller than the old size; ideally you should split your current block and make the end of it free.
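For the shrink case, here is a minimal sketch of the split, assuming a hypothetical header-based allocator (block_t, its fields, MIN_SPLIT and shrink_block are illustrative names, not from your code):

    #include <stddef.h>

    /* Hypothetical allocator internals: every block carries a header. */
    typedef struct block {
        size_t size;            /* payload size in bytes */
        int    is_free;         /* nonzero if on the free list */
        struct block *next;     /* next block in address order */
    } block_t;

    #define MIN_SPLIT (sizeof(block_t) + 16)  /* smallest tail worth keeping */

    /* Shrink case of realloc: keep the front of the block, free the tail. */
    void *shrink_block(block_t *blk, size_t new_size)
    {
        if (blk->size - new_size >= MIN_SPLIT) {
            /* Carve a new free block out of the unused tail. */
            block_t *tail = (block_t *)((char *)(blk + 1) + new_size);
            tail->size    = blk->size - new_size - sizeof(block_t);
            tail->is_free = 1;
            tail->next    = blk->next;
            blk->size = new_size;
            blk->next = tail;
            /* A real allocator would also coalesce 'tail' with a free successor. */
        }
        return blk + 1;  /* payload pointer is unchanged: no memcpy needed */
    }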

Messing around with memory allocation routines is extremely risky; most are already as optimized as they physically can be, without impacting security. Any optimizations you may do may very well open up a hole that can be exploited - there are many 'use after free' style security issues around at the moment.
With this in mind, where better to look than the OpenBSD source for it: http://www.openbsd.org/cgi-bin/cvsweb/src/lib/libc/stdlib/malloc.c?rev=1.140;content-type=text%2Fx-cvsweb-markup

There are those cases where the realloc reduces the block size by a non-trivial amount that is worth reclaiming for allocation elsewhere.

You could strive towards optimising realloc for performance (i.e. avoid moving the block and the memcpy), or you could optimise against memory fragmentation.
If it's the latter, you may consider moving the block to fill the best gap instead of just expanding or shrinking it.
Memory allocators are always a trade-off.

Related

Linked lists. Where to allocate and how to cope with fragmentation?

Location:
in the heap, fragmented (a separate malloc for every node) - inefficient in several ways (slow allocation, slow access, memory fragmentation)
in the heap, in one large chunk - all the flexibility gained by the data structure is lost when you need to realloc
on the stack - the stack tends to be rather limited in size, so allocating large structures on it is not recommended at all
Their big advantage, O(1) insertion, seems rather useless in an environment of fragmented memory and thousands of calls to the memory allocator to get another 10 bytes.
EDIT to clarify:
This question was asked in an interview. It is not a workplace question, so the usual heuristic of hoping to stumble blindly onto the correct decision out of a small set of standard algorithms does not apply.
The existing answers and comments mention that "malloc is not so slow" and "malloc partially fights fragmentation". OK: if we use another data structure, for example a C port of the C++ vector (that is, allocate a sequential block of sufficient size and, when the data outgrows it, reallocate to a chunk twice as large), all of those problems are solved, but we lose the fast insert/remove. Are there any scenarios where a linked list (allocated where?) is vastly superior to a vector?
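For concreteness, the vector-style alternative I mean is roughly this (a sketch only; vec_t and vec_push are illustrative names):

    #include <stdlib.h>
    #include <string.h>

    typedef struct {
        char  *data;
        size_t len, cap;   /* element count and capacity */
        size_t elem;       /* size of one element in bytes */
    } vec_t;

    /* Amortised O(1) append: double the capacity whenever it runs out. */
    int vec_push(vec_t *v, const void *item)
    {
        if (v->len == v->cap) {
            size_t ncap = v->cap ? v->cap * 2 : 8;
            void *p = realloc(v->data, ncap * v->elem);
            if (p == NULL)
                return -1;          /* old buffer is still valid on failure */
            v->data = p;
            v->cap  = ncap;
        }
        memcpy(v->data + v->len * v->elem, item, v->elem);
        v->len++;
        return 0;
    }

Starting from a zero-initialised vec_t (e.g. vec_t v = { NULL, 0, 0, sizeof(int) };), every push is amortised O(1).
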
This sounds like premature optimization. I think the correct way to go about it is to:
use the simplest implementation possible;
profile the entire program;
if profiling shows that there is a performance issue with the list, consider alternative implementations (including alternative allocators).
If you're worried about the standard allocators not handling your specialized 10-byte allocations efficiently, write a custom allocator that grabs a big chunk of memory from the standard (malloc()) allocator and doles out small items efficiently. You should not reallocate when you run out of memory in the initial big chunk; you should allocate a new (extra) big chunk and allocate from that. You get to decide how to handle released memory. You could simply ignore releases until you release all the big chunks at the end of processing with the lists. Or you can complicate life by keeping track of freed memory in the big chunks. This reduces memory usage on the whole, but it is more complex to write initially.
On the other hand, you should be aware of the risk of premature optimization. Have you measured a performance hit? Given what I've seen of your recent questions, you should probably stick with using malloc() and not try writing your own allocators (yet).
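A minimal sketch of what such a custom allocator could look like, assuming small items and releases ignored until the end, as described above (CHUNK_BYTES and all names are illustrative):

    #include <stdlib.h>

    #define CHUNK_BYTES (64 * 1024)   /* illustrative chunk size */

    typedef struct chunk {
        struct chunk *next;
        size_t used;
        char data[CHUNK_BYTES];
    } chunk_t;

    static chunk_t *chunks = NULL;

    /* Dole out small items; grab another big chunk when the current one fills.
       Assumes n is small; a real version would round n up for alignment. */
    void *pool_alloc(size_t n)
    {
        if (chunks == NULL || chunks->used + n > CHUNK_BYTES) {
            chunk_t *c = malloc(sizeof *c);   /* a new big chunk; never realloc'd */
            if (c == NULL)
                return NULL;
            c->next = chunks;
            c->used = 0;
            chunks  = c;
        }
        void *p = chunks->data + chunks->used;
        chunks->used += n;
        return p;
    }

    /* Ignore individual releases; free all the big chunks at the end. */
    void pool_free_all(void)
    {
        while (chunks != NULL) {
            chunk_t *next = chunks->next;
            free(chunks);
            chunks = next;
        }
    }
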
Your linked list implementation, ideally, shouldn't use any of the above. It should be up to the caller to allocate and destroy memory. Think about functions like sprintf, fgets, etc... Do they allocate any memory? No. There's a reason for that: Simplicity. Imagine if you had to free everything you got from fgets (or worse yet, fscanf). Wouldn't that be painful? Try to develop your functions to be consistent with the standard library.
It might be helpful to declare a listnode_alloc function, but it would only wrap malloc plus a listnode_init function. Consider how realloc handles NULL input...
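A sketch of what such a thin wrapper might look like (the listnode layout here is illustrative):

    #include <stdlib.h>

    typedef struct listnode {         /* illustrative node layout */
        void *data;
        struct listnode *next;
    } listnode;

    /* Thin wrapper: malloc plus initialisation; freeing stays with the caller. */
    listnode *listnode_alloc(void *data)
    {
        listnode *n = malloc(sizeof *n);  /* like realloc(NULL, size) */
        if (n != NULL) {
            n->data = data;
            n->next = NULL;
        }
        return n;                         /* NULL signals failure, as with malloc */
    }
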
This may not be the exact solution, but here is another approach:
Handle fragmentation on your own.
Allocate a big pool of memory.
For each new node, hand out memory from this pool while free space remains.
If the pool is exhausted, allocate another pool and use that for new chunks.
However, this is much easier said than done; there are a lot of issues you may run into doing this (a sketch of one variant follows below).
So I would suggest leaving this kind of optimization to malloc and related functions.
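For illustration, a minimal sketch of one such scheme, using fixed-size node slots and a free list for recycling freed nodes (all names and sizes are hypothetical):

    #include <stdlib.h>

    #define NODES_PER_POOL 256        /* illustrative pool size */

    typedef union slot {
        union slot *next_free;        /* valid while the slot is free */
        char payload[32];             /* assumed big enough for one list node */
    } slot_t;

    static slot_t *free_list = NULL;

    /* Hand out one slot: reuse a freed one if possible, else carve a new pool.
       The pools themselves are never returned to malloc in this sketch. */
    void *node_alloc(void)
    {
        if (free_list == NULL) {
            slot_t *pool = malloc(NODES_PER_POOL * sizeof *pool);
            if (pool == NULL)
                return NULL;
            for (int i = 0; i < NODES_PER_POOL; i++) {
                pool[i].next_free = free_list;   /* thread pool onto free list */
                free_list = &pool[i];
            }
        }
        slot_t *s = free_list;
        free_list = s->next_free;
        return s;
    }

    /* Return a slot to the free list instead of calling free(). */
    void node_free(void *p)
    {
        slot_t *s = p;
        s->next_free = free_list;
        free_list = s;
    }
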
Allocating from the stack is not a viable option, since you can't malloc from the stack. You would have to pre-allocate a big buffer in some function, and in that case an array is better than a linked list.
Well, the memory allocation strategy might vary due to memory fragmentation, thousands of syscalls, etc., and that is exactly why big-O notation is used! ;-)

C re-alloc/malloc alternative

I have a block of memory that I want to re-allocate to a different size, but I don't care if the memory is discarded or not. Would it be faster to free() the memory and then malloc() a new memory block, or is realloc() the way to go?
I'm thinking that either solution is not optimal because extra work is performed. I'm betting that realloc() is faster at locating a memory block that is large enough, because the current block may already be large enough to hold the new size. But if the block is not large enough, it has to copy memory, which malloc() does not.
I'm using Linux. Is there perhaps a special function for this?
Thanks! :)
If you don't care about the content, the standard idiom is to do free followed by malloc. Finding a block is cheaper than copying it, and there is no guarantee that realloc doesn't do some searching of its own.
As always in such situations, if you really care about performance, it is best to benchmark both solutions on the target platform and under realistic workloads, and see which one performs better.
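If you do want a helper for this, a sketch of the idiom (grow_discard is an illustrative name, not a standard function):

    #include <stdlib.h>

    /* Resize a buffer whose old contents are disposable:
       free + malloc instead of realloc, so nothing is ever copied. */
    void *grow_discard(void *buf, size_t new_size)
    {
        free(buf);                  /* contents don't matter, discard them */
        return malloc(new_size);    /* may return NULL; check at the call site */
    }
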
realloc = alloc, then copy, then free.
The other solution, alloc and free only, is surely faster.
I would go ahead and trust the realloc implementation to do the right thing. Also, in the future you should not be worrying about whether or not the memory is moved, realloced, etc. This sort of preemptive optimization is unneeded, as most of the time spent will be in the context switch from user space to kernel space.

Usage of malloc and free

I am trying to decode an mp4 video into YUV frames. I want to allocate memory for each frame to be decoded. Is it OK if I continuously allocate memory and free it? Is there any problem in doing so (i.e., continuously allocating and freeing memory using malloc and free)?
It would be better to allocate a sufficient buffer once and reuse the same buffer. Other than some performance hit, repeated malloc-free doesn't pose any problems.
Technically, there's no problem with that at all as long as you don't try to access memory that you've already freed.
On the other hand, making all these calls repeatedly creates an overhead that you could (and should) avoid by allocating a sufficient amount of memory ahead of time and then freeing it all at the end.
The approach of repeatedly allocating/freeing should really only be used if you are under very tight memory constraints, otherwise, reserve a big block of memory and allocate parts of it as you need yourself. Or alternatively, if possible, reuse the same chunk of memory.
Update: As mentioned by @EricPostpischil in a helpful comment (see below), malloc is a library call, not a system call; the system call only happens when the current heap is exhausted. For more information on this, see this explanation.
If the objects that you allocate have the same size, there shouldn't be much of a performance hit. In case of doubt, measure it.
Correctly tracking allocated memory is often not trivial, so it is probably easier to allocate a buffer once and use it throughout your program. But the principal rule here should be to use whatever corresponds to the logic of your program and is the easiest to read and maintain.
Constantly mallocing and freeing will not break the program, but it will cause performance issues, especially since you say you're going to be doing it every frame; mallocing and freeing that often can cause a noticeable performance decrease.
What you can do is just malloc the memory once, and then re-use the same allocation for each frame. If you don't need to store the memory after you've done what you want with the frame before the next frame is read, there isn't any problem.
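A sketch of that pattern for the YUV case (decode_next_frame and process_frame are illustrative stand-ins for the real decoder calls, and the dimensions are examples):

    #include <stdlib.h>

    /* Stand-ins for the real decoder API; names are illustrative. */
    extern int  decode_next_frame(unsigned char *buf, size_t len); /* 0 = got frame */
    extern void process_frame(const unsigned char *buf, size_t len);

    int main(void)
    {
        int width = 1920, height = 1080;                      /* example dimensions */
        size_t frame_size = (size_t)width * height * 3 / 2;   /* YUV 4:2:0 */

        unsigned char *frame = malloc(frame_size);   /* one allocation up front */
        if (frame == NULL)
            return 1;

        while (decode_next_frame(frame, frame_size) == 0)
            process_frame(frame, frame_size);        /* same buffer every frame */

        free(frame);                                 /* one free at the end */
        return 0;
    }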

Is realloc() safe in embedded system?

While developing a piece of software for an embedded system, I used the realloc() function many times. Now I've been told that I "should not use realloc() in embedded" without any explanation.
Is realloc() dangerous for embedded system and why?
Yes, all dynamic memory allocation is regarded as dangerous, and it is banned from most "high integrity" embedded systems, such as industrial/automotive/aerospace/med-tech etc etc. The answer to your question depends on what sort of embedded system you are doing.
The reasons it's banned from high-integrity embedded systems are not only the potential memory leaks, but also a lot of dangerous undefined/unspecified/implementation-defined behaviour associated with those functions.
EDIT: I also forgot to mention heap fragmentation, which is another danger. In addition, MISRA-C also mentions "data inconsistency, memory exhaustion, non-deterministic behaviour" as reasons why it shouldn't be used. The former two seem rather subjective, but non-deterministic behaviour is definitely something that isn't allowed in these kind of systems.
References:
MISRA-C:2004 Rule 20.4 "Dynamic heap memory allocation shall not be used."
IEC 61508 Functional safety, 61508-3 Annex B (normative) Table B1, >SIL1: "No dynamic objects", "No dynamic variables".
It depends on the particular embedded system. Dynamic memory management on an small embedded system is tricky to begin with, but realloc is no more complicated than a free and malloc (of course, that's not what it does). On some embedded systems you'd never dream of calling malloc in the first place. On other embedded systems, you almost pretend it's a desktop.
If your embedded system has a poor allocator or not much RAM, then realloc might cause fragmentation problems. Which is why you avoid malloc too, since it causes the same problems.
The other reason is that some embedded systems must be high reliability, and malloc / realloc can return NULL. In these situations, all memory is allocated statically.
In many embedded systems, a custom memory manager can provide better semantics than are available with malloc/realloc/free. Some applications, for example, can get by with a simple mark-and-release allocator. Keep a pointer to the start of not-yet-allocated memory, allocate things by moving the pointer upward, and jettison them by moving the pointer below them. That won't work if it's necessary to jettison some things while keeping other things that were allocated after them, but in situations where that isn't necessary the mark-and-release allocator is cheaper than any other allocation method. In some cases where the mark-and-release allocator isn't quite good enough, it may be helpful to allocate some things from the start of the heap and other things from the end of the heap; one may free up the things allocated from one end without affecting those allocated from the other.
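A minimal sketch of such a mark-and-release allocator (sizes and names are illustrative):

    #include <stddef.h>

    #define ARENA_SIZE 4096               /* illustrative arena size */

    static unsigned char arena[ARENA_SIZE];
    static size_t top = 0;                /* start of not-yet-allocated memory */

    /* Allocate by moving the pointer upward. */
    void *mr_alloc(size_t n)
    {
        n = (n + 7u) & ~(size_t)7u;       /* keep 8-byte alignment */
        if (top + n > ARENA_SIZE)
            return NULL;
        void *p = &arena[top];
        top += n;
        return p;
    }

    /* Remember a position ("mark")... */
    size_t mr_mark(void) { return top; }

    /* ...and jettison everything allocated after it ("release"). */
    void mr_release(size_t mark) { top = mark; }
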
Another approach that can sometimes be useful in non-multitasking or cooperative-multitasking systems is to use memory handles rather than direct pointers. In a typical handle-based system, there's a table of all allocated objects, built at the top of memory working downward, and objects themselves are allocated from the bottom up. Each allocated object in memory holds either a reference to the table slot that references it (if live) or else an indication of its size (if dead). The table entry for each object will hold the object's size as well as a pointer to the object in memory. Objects may be allocated by simply finding a free table slot (easy, since table slots are all fixed size), storing the address of the object's table slot at the start of free memory, storing the object itself just beyond that, and updating the start of free memory to point just past the object. Objects may be freed by replacing the back-reference with a length indication, and freeing the object in the table. If an allocation would fail, relocate all live objects starting at the top of memory, overwriting any dead objects, and updating the object table to point to their new addresses.
The performance of this approach is non-deterministic, but fragmentation is not a problem. Further, it may be possible in some cooperative multitasking systems to perform garbage collection "in the background"; provided that the garbage collector can complete a pass in the time it takes to chug through the slack space, long waits can be avoided. Further, some fairly simple "generational" logic may be used to improve average-case performance at the expense of worst-case performance.
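As a rough sketch, the core data structures for such a handle-based scheme might look like this (all names illustrative; the compaction pass itself is omitted):

    #include <stddef.h>

    /* Table of all allocated objects; objects grow up, the table grows down. */
    typedef struct {
        void  *addr;    /* current location; updated when objects are moved */
        size_t size;
        int    live;    /* 0 once the object is freed */
    } handle_entry;

    typedef int obj_handle;   /* callers hold a table index, never a raw pointer */

    /* Every access goes through the table, so the collector is free to move
       the object and simply update entry->addr afterwards. */
    static void *deref(handle_entry *table, obj_handle h)
    {
        return table[h].addr;
    }
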
realloc can fail, just like malloc can. This is one reason why you probably should not use either in an embedded system.
realloc is worse than malloc in that you will need to have the old and new pointers valid during the realloc. In other words, you will need 2X the memory space of the original malloc, plus any additional amount (assuming realloc is increasing the buffer size).
Using realloc is going to be very dangerous, because it may return a new pointer to your memory location. This means:
All references to the old pointer must be corrected after realloc.
For a multi-threaded system, the realloc must be atomic. If you are disabling interrupts to achieve this, the realloc time might be long enough to cause a hardware reset by the watchdog.
Update: I just wanted to make it clear. I'm not saying that realloc is worse than implementing realloc using a malloc/free. That would be just as bad. If you can do a single malloc and free, without resizing, it's slightly better, yet still dangerous.
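Independent of the platform, a sketch of the usual pattern for guarding against that failure case (safe_grow is an illustrative helper, not a library function):

    #include <stdlib.h>

    /* Grow *buf to new_size without losing the old block on failure.
       Returns 0 on success, -1 if *buf is left untouched. */
    int safe_grow(void **buf, size_t new_size)
    {
        void *tmp = realloc(*buf, new_size);
        if (tmp == NULL)
            return -1;       /* *buf is still valid; the caller can recover */
        *buf = tmp;          /* any other copies of the old pointer are now stale */
        return 0;
    }
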
The issues with realloc() in embedded systems are no different than in any other system, but the consequences may be more severe in systems where memory is more constrained and the consequences of failure are less acceptable.
One problem not mentioned so far is that realloc() (and any other dynamic memory operation, for that matter) is non-deterministic; that is, its execution time is variable and unpredictable. Many embedded systems are also real-time systems, and in such systems non-deterministic behaviour is unacceptable.
Another issue is that of thread-safety. Check your library's documentation to see if your library is thread-safe for dynamic memory allocation. Generally, if it is, you will need to implement mutex stubs to integrate it with your particular thread library or RTOS.
Not all embedded systems are alike; if your embedded system is not real-time (or the process/task/thread in question is not real-time, and is independent of the real-time elements), and you have large amounts of memory unused, or virtual memory capabilities, then the use of realloc() may be acceptable, if perhaps ill-advised in most cases.
Rather than accept "conventional wisdom" and bar dynamic memory regardless, you should understand your system's requirements and the behaviour of the dynamic memory functions, and make an appropriate decision. That said, if you are building code for reusability and portability to as wide a range of platforms and applications as possible, then reallocation is probably a really bad idea. Don't hide it in a library, for example.
Note too that the same problem exists with C++ STL container classes that dynamically reallocate and copy data when the container capacity is increased.
Well, it's better to avoid using realloc if possible, since the operation is costly, especially inside a loop: for example, if an allocated block needs to be extended and there is no gap between the current block and the next allocated one, the operation is roughly equivalent to malloc + memcpy + free.

Minimizing the amount of malloc() calls improves performance?

Consider two applications: one (num. 1) that invokes malloc() many times, and another (num. 2) that invokes malloc() only a few times.
Both applications allocate the same amount of memory (assume 100MB).
For which application the next malloc() call will be faster, #1 or #2?
In other words: Does malloc() have an index of allocated locations in memory?
You asked 2 questions:
for which application the next malloc() call will be faster, #1 or #2?
In other words: Does malloc() have an index of allocated locations in memory?
You've implied that they are the same question, but they are not. The answer to the latter question is YES.
As for which will be faster, it is impossible to say. It depends on the allocator algorithm, the machine state, the fragmentation in the current process, and so on.
Your idea is sound, though: you should think about how malloc usage will affect performance.
There was once an app I wrote that used lots of little blobs of memory, each allocated with malloc(). It worked correctly but was slow. I replaced the many calls to malloc with just one, and then sliced up that large block within my app. It was much much faster.
I don't recommend this approach; it's just an illustration of the point that malloc usage can materially affect performance.
My advice is to measure it.
Of course this completely depends on the malloc implementation, but in this case, with no calls to free, most malloc implementations will probably give you the same algorithmic speed.
As another answer commented, usually there will be a list of free blocks, but if you have not called free, there will just be one, so it should be O(1) in both cases.
This assumes that the memory allocated for the heap is big enough in both cases. In case #1, you will have allocated more total memory, as each allocation involves memory overhead to store meta-data; as a result, you may need to call sbrk() or equivalent to grow the heap in case #1, which would add additional overhead.
They will probably be different due to cache and other second order effects, since the memory alignments for the new allocation won't be the same.
If you have been freeing some of the memory blocks, then it is likely that #2 will be faster due to less fragmentation, and so a smaller list of free blocks to search.
If you have freed all the memory blocks, it should end up being exactly the same, since any sane free implementation will have coalesced the blocks back into a single arena of memory.
Malloc has to run through a linked list of free blocks to find one to allocate. This takes time. So, #1 will usually be slower:
The more often you call malloc, the more time it will take - so reducing the number of calls will give you a speed improvement (though whether it is significant will depend on your exact circumstances).
In addition, if you malloc many small blocks, then as you free those blocks, you will fragment the heap much more than if you only allocate and free a few large blocks. So you are likely to end up with many small free blocks on your heap rather than a few big blocks, and therefore your mallocs may have to search further through the free-space lists to find a suitable block to allocate. Which again will make them slower.
These are of course implementation details, but typically free() will insert the memory into a list of free blocks. malloc() will then look at this list for a free block that is the right size, or larger. Typically, only if this fails does malloc() ask the kernel for more memory.
There are also other considerations, such as when to coalesce multiple adjacent blocks into a single, larger block.
And another reason that malloc() is expensive: if malloc() is called from multiple threads, there must be some kind of synchronization on these global structures (i.e. locks). There exist malloc() implementations with different optimization schemes to make it better for multiple threads, but generally, keeping it multi-thread safe adds to the cost, as multiple threads will contend for those locks and block progress on each other.
You can always do a better job using malloc() to allocate a large chunk of memory and sub-dividing it yourself. Malloc() was optimized to work well in the general case and makes no assumptions whether or not you use threads or what the size of the program's allocations might be.
Whether it is a good idea to implement your own sub-allocator is a secondary question. It rarely is; explicit memory management is already hard enough. You rarely need another layer of code that can screw up and crash your program without any good way to debug it. Unless you are writing a debug allocator.
The answer is that it depends; most of the potential slowness comes from malloc() and free() in combination, and usually #1 and #2 will be of similar speed.
All malloc() implementations do have an indexing mechanism, but the speed of adding a new block to the index is usually not dependent on the number of blocks already in the index.
Most of the slowness of malloc comes from two sources:
searching for a suitable free block among the previously freed blocks
multi-processor problems with locking
Writing my own almost-standards-compliant malloc() replacement took malloc() and free() time from 35% down to 3-4%, and it seriously optimised those two factors. It would likely have been a similar speed to use some other high-performance malloc, but having our own was more portable to esoteric devices, and of course allows free to be inlined in some places.
You don't define the relative difference between "many" and "few" but I suspect most mallocs would function almost identically in both scenarios. The question implies that each call to malloc has as much overhead as a system call and page table updates. When you do a malloc call, e.g. malloc(14), in a non-brain-dead environment, malloc will actually allocate more memory than you ask for, often a multiple of the system MMU page size. You get your 14 bytes and malloc keeps track of the newly allocated area so that later calls can just return a chunk of the already allocated memory, until more memory needs to be requested from the OS.
In other words, if I call malloc(14) 100 times or malloc(1400) once, the overhead will be about the same. I'll just have to manage the bigger allocated memory chunk myself.
Allocating one block of memory is faster than allocating many blocks. There is the overhead of the system call and also of searching for available blocks. In programming, reducing the number of operations usually reduces execution time.
Memory allocators may have to search to find a block of memory that is the correct size. This adds to the overhead of the execution time.
However, there may be a better chance of success when allocating small blocks of memory versus one large block. Is your program allocating one small block and releasing it, or does it need to allocate (and preserve) many small blocks? When memory becomes fragmented, there are fewer big chunks available, so the memory allocator may have to coalesce all the blocks to form a block big enough for the allocation.
If your program is allocating and destroying many small blocks of memory you may want to consider allocating a static array and using that for your memory.
