C realloc/malloc alternative

I have a block of memory that I want to re-allocate to a different size, but I don't care if the memory is discarded or not. Would it be faster to free() the memory and then malloc() a new memory block, or is realloc() the way to go?
I'm thinking that neither solution is optimal, because extra work is performed either way. I'm betting that realloc() is faster at locating a large enough memory block, because the current block may already be large enough to hold the new size. But if the current block is not large enough, realloc() has to copy memory, which malloc() does not.
I'm using Linux. Is there perhaps a special function for this?
Thanks! :)

If you don't care about the content, the standard idiom is to do free followed by malloc. Finding a block is cheaper than copying it, and there is no guarantee that realloc doesn't do some searching of its own.
As always in such situations, if you really care about performance, it is best to benchmark both solutions on the target platform and under realistic workloads, and see which one performs better.
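A minimal sketch of that free-then-malloc idiom (resize_discard is a hypothetical helper name, not a standard function):

    #include <stdlib.h>

    /* Resize a scratch buffer whose old contents are disposable:
       skip realloc's copy by freeing first and allocating fresh. */
    void *resize_discard(void *buf, size_t new_size)
    {
        free(buf);                /* contents are not needed */
        return malloc(new_size);  /* may return NULL on failure */
    }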

realloc = alloc, then copy, then free.
The other solution - just free followed by alloc, with no copy - is surely faster.

I would go ahead and trust the realloc implementation to do the right thing. Also, in the future you should not be worrying about whether the memory is moved, realloc'd, etc. This sort of premature optimization is unneeded, as most of the time spent will be in the context switch from user space to kernel space.


Linked lists. Where to allocate and how to cope with fragmentation?

Location
in the heap, fragmented (a separate malloc for every node) - inefficient in several different ways (slow allocation, slow access, memory fragmentation)
in the heap, in one large chunk - all the flexibility gained by the data structure is lost when you need to realloc
on the stack - the stack tends to be rather limited in size, so allocating large structures on it is not recommended at all
Their big advantage, O(1) insert, seems rather useless in an environment of fragmented memory and thousands of calls to the memory allocator to give us another 10 bytes.
EDIT to clarify:
This question was asked in an interview. It is not a workplace question, so the usual heuristic of hoping to stumble blindly onto a correct decision from a small set of standard algorithms does not apply.
The existing answers and comments mention that "malloc is not so slow" and "malloc partially fights fragmentation". OK, if we use another data structure, for example a C port of the C++ vector (that is, allocate sequential memory of sufficient size and, if the data expands, reallocate to a chunk twice as large), all those problems are solved, but we lose the fast insert/remove. Are there any scenarios where a linked list (allocated where?) is vastly superior to a vector?
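For reference, a minimal sketch of the doubling-vector idea mentioned above (the vec type and names are illustrative; start from a zero-initialized vec):

    #include <stdlib.h>

    typedef struct {
        int    *data;
        size_t  len, cap;
    } vec;

    /* Append with amortized O(1) cost: double the capacity when full. */
    int vec_push(vec *v, int x)
    {
        if (v->len == v->cap) {
            size_t ncap = v->cap ? v->cap * 2 : 8;
            int *p = realloc(v->data, ncap * sizeof *p);
            if (!p) return -1;   /* old block is still valid on failure */
            v->data = p;
            v->cap  = ncap;
        }
        v->data[v->len++] = x;
        return 0;
    }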
This sounds like premature optimization. I think the correct way to go about it is to:
use the simplest implementation possible;
profile the entire program.
if profiling shows that there is a performance issue with the list, consider alternative implementations (including alternative allocators).
If you're worried about the standard allocators not handling your specialized 10-byte allocations efficiently, write a custom allocator that grabs a big chunk of memory from the standard (malloc()) allocator and doles out small items efficiently. You should not reallocate when you run out of memory in the initial big chunk; you should allocate a new (extra) big chunk and allocate from that. You get to decide how to handle released memory. You could simply ignore releases until you release all the big chunks at the end of processing with the lists. Or you can complicate life by keeping track of freed memory in the big chunks. This reduces memory usage on the whole, but it is more complex to write initially.
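A minimal sketch of such a chunked allocator, under the assumptions above (the names and chunk size are illustrative; individual releases are ignored and everything is freed at once):

    #include <stdlib.h>

    #define CHUNK_SIZE 4096   /* illustrative size */

    typedef struct chunk {
        struct chunk *next;
        size_t        used;
        char          mem[CHUNK_SIZE];
    } chunk;

    typedef struct { chunk *head; } pool;

    /* Dole out small items from the current chunk; when it is exhausted,
       grab a whole new chunk from malloc() instead of reallocating.
       Assumes n <= CHUNK_SIZE; alignment handling omitted for brevity. */
    void *pool_alloc(pool *p, size_t n)
    {
        if (!p->head || p->head->used + n > CHUNK_SIZE) {
            chunk *c = malloc(sizeof *c);
            if (!c) return NULL;
            c->next = p->head;
            c->used = 0;
            p->head = c;
        }
        void *ptr = p->head->mem + p->head->used;
        p->head->used += n;
        return ptr;
    }

    /* Individual frees are ignored; release every chunk at once. */
    void pool_free_all(pool *p)
    {
        while (p->head) {
            chunk *next = p->head->next;
            free(p->head);
            p->head = next;
        }
    }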
On the other hand, you should be aware of the risk of premature optimization. Have you measured a performance hit? Given what I've seen of your recent questions, you should probably stick with using malloc() and not try writing your own allocators (yet).
Your linked list implementation, ideally, shouldn't use any of the above. It should be up to the caller to allocate and destroy memory. Think about functions like sprintf, fgets, etc... Do they allocate any memory? No. There's a reason for that: Simplicity. Imagine if you had to free everything you got from fgets (or worse yet, fscanf). Wouldn't that be painful? Try to develop your functions to be consistent with the standard library.
It might be helpful to declare a listnode_alloc function, but this would do no more than wrap malloc and a listnode_init function. Consider how realloc handles NULL input...
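A sketch of that caller-owns-the-storage style, using the names suggested above (listnode_init only initializes memory the caller has already provided):

    #include <stddef.h>

    typedef struct listnode {
        struct listnode *next;
        void            *value;
    } listnode;

    /* The caller decides where the node lives (stack, heap, pool);
       this function only initializes it, mirroring the standard
       library's convention that the caller owns the storage. */
    void listnode_init(listnode *node, void *value)
    {
        node->next  = NULL;
        node->value = value;
    }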
This may not be the exact solution, but here is another approach:
Handle fragmentation on your own.
Allocate a big pool of memory.
For each new node, provide memory from this pool until it runs out of free space.
If the pool is used up, allocate another pool and hand out new chunks from that.
However, this is much easier said than done; there are many issues you may encounter doing this.
So I would suggest leaving this kind of optimization to malloc and related functions.
Allocation from the stack is not a viable option - you can't malloc from the stack. You would have to pre-allocate a big buffer in some function, and in that case an array is better than a linked list.
Well, the memory allocation strategy might vary due to memory fragmentation, thousands of syscalls, etc., and this is exactly why big-O notation is used! ;-)

Usage of malloc and free

I am trying to decode an mp4 video into YUV frames, and I want to allocate memory for each frame to be decoded. Is it OK if I continuously allocate memory and free it? Is there any problem in doing so (i.e., continuously allocating and freeing memory using malloc and free)?
It would be better to allocate a sufficient buffer once and reuse it. Other than some performance hit, repeated malloc/free doesn't pose any problems.
Technically, there's no problem with that at all as long as you don't try to access memory that you've already freed.
On the other hand, making all these calls repeatedly creates an overhead that you could (and should) avoid by allocating a sufficient amount of memory ahead of time, and then free it all at the end.
The approach of repeatedly allocating/freeing should really only be used if you are under very tight memory constraints; otherwise, reserve a big block of memory and portion it out yourself as needed. Or alternatively, if possible, reuse the same chunk of memory.
Update: As mentioned by @EricPostpischil in a helpful comment (see below), malloc is a library call, not a system call; a system call only happens when the current heap is exhausted. For more information on this, see this explanation.
If the objects that you allocate have the same size, there shouldn't be much of a performance hit. In case of doubt, measure it.
Correctly tracking allocated memory is often not trivial, so it is probably easier to allocate a buffer once and use this throughout your program. But here the principal rule should be to use what corresponds to the logic of your program, is the easiest to read and to maintain.
Constantly mallocing and freeing will not break the program, but it will cause a lot of performance issues, especially since you say you're going to be doing it every frame. Mallocing and freeing that often can cause a noticeable performance decrease.
What you can do is just malloc the memory once, and then re-use the same allocation for each frame. If you don't need to store the memory after you've done what you want with the frame before the next frame is read, there isn't any problem.
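A sketch of that pattern (decode_next_frame and the frame size are hypothetical stand-ins for your decoder's API; the point is the single allocation):

    #include <stdlib.h>

    /* Hypothetical decoder interface, for illustration only. */
    int decode_next_frame(unsigned char *yuv_buf, size_t buf_size);

    void decode_all(size_t frame_size)
    {
        unsigned char *buf = malloc(frame_size);  /* one allocation */
        if (!buf) return;

        /* Reuse the same buffer for every frame. */
        while (decode_next_frame(buf, frame_size) == 0) {
            /* ... consume the YUV data in buf before the next frame ... */
        }

        free(buf);  /* one matching free at the end */
    }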

Optimizing realloc function

I am writing a realloc function and currently my realloc handles two cases (not counting the null cases)
If enough memory exists next to the block, expand it
else allocate a new block and do a memcpy
My question is, are there any more cases I should be handling? I can't think of any.
I thought of a case where the previous block may be free and I could expand my block into it, but that would require copying the data anyway (a memmove, since the regions can overlap), so it would be pointless to implement.
Include the case where the new size is smaller than the old size; ideally you should split your current block and make the end of it free.
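In outline, the cases a realloc typically dispatches on look like this (block_size, can_grow_in_place, grow_in_place, split_block, my_malloc and my_free are hypothetical helpers standing in for your allocator's internals):

    #include <stddef.h>
    #include <string.h>

    /* Hypothetical allocator internals, for illustration only. */
    size_t block_size(void *p);
    int    can_grow_in_place(void *p, size_t n);
    void   grow_in_place(void *p, size_t n);
    void   split_block(void *p, size_t n);   /* frees the tail */
    void  *my_malloc(size_t n);
    void   my_free(void *p);

    void *my_realloc(void *p, size_t n)
    {
        if (p == NULL) return my_malloc(n);        /* null case */
        if (n == 0)  { my_free(p); return NULL; }  /* size-0 behavior varies */

        size_t old = block_size(p);
        if (n <= old) {                /* shrink: split, free the tail */
            split_block(p, n);
            return p;
        }
        if (can_grow_in_place(p, n)) { /* adjacent free space exists */
            grow_in_place(p, n);
            return p;
        }
        void *q = my_malloc(n);        /* fall back: move and copy */
        if (q) {
            memcpy(q, p, old);
            my_free(p);
        }
        return q;
    }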
Messing around with memory allocation routines is extremely risky; most are already as optimized as they physically can be, without impacting security. Any optimizations you may do may very well open up a hole that can be exploited - there are many 'use after free' style security issues around at the moment.
With this in mind, where better to look than the OpenBSD source for it: http://www.openbsd.org/cgi-bin/cvsweb/src/lib/libc/stdlib/malloc.c?rev=1.140;content-type=text%2Fx-cvsweb-markup
There are also cases where realloc reduces the block size by a non-trivial amount that is worth reclaiming for allocation elsewhere.
You could strive towards optimising realloc for performance (i.e. avoid moving the block and the memcpy), or you could optimise against memory fragmentation.
If it's the latter, you may consider moving the block to fill the best gap instead of just expanding or shrinking it.
Memory allocators are always a trade-off.

Linux heap - is doing a ton of new/deletes okay or does the heap become badly fragmented?

I'm not familiar with how the Linux heap is allocated.
I'm calling malloc()/free() many many times a second, always with the same sizes (there are about 10 structs, each fixed size). Aside from init time, none of my memory remains allocated for long periods of time.
Is this considered poor form with the standard heaps? (I'm sure someone will ask 'what heap are you using?' - 'Ugh. The standard static heap' ..meaning I'm unsure.)
Should I instead use a free list, or does the heap tolerate lots of identical allocations? I'm trying to balance readability with performance.
Any tools to help me measure?
First of all, unless you have measured a problem with memory usage blowing up, don't even think about using a custom allocator. It's one of the worst forms of premature optimization.
At the same time, even if you do have a problem, a better solution than a custom allocator would be figuring out why you're allocating and freeing objects so much, and fixing the design issue that's causing it.
To address your specific question, glibc's allocator is based on the dlmalloc algorithm, which is near-optimal when it comes to fragmentation. The only way you'll get it to badly fragment memory is the unavoidable way: by allocating objects with radically different lifetimes in alternation, e.g. allocating a large number of objects but only freeing every other one. I think you'll have a hard time working out an allocation pattern that will give worse total memory usage than pools...
Valgrind has a special tool, Massif, for measuring memory usage. It should help you profile heap allocations.
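Typical usage looks something like this (assuming your binary is ./myprog; Massif writes its output to massif.out.<pid>):

    valgrind --tool=massif ./myprog
    ms_print massif.out.<pid>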
I think the best performance optimization is to avoid heap allocation wherever it is possible (and reasonable). Whenever an object is stack-allocated, the compiler just moves the stack pointer instead of trying to find a free spot or return the allocated memory to some free list.
How is the lifetime of your structures defined? If you can express your object lifetimes by scoping, this would indeed increase performance.
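For example, if a structure's lifetime matches a scope, it can simply live on the stack (a trivial sketch):

    #include <stdio.h>

    struct point { double x, y; };

    void process(void)
    {
        /* Lifetime bound to this scope: no malloc/free needed,
           "allocation" is just a stack-pointer adjustment. */
        struct point p = { 1.0, 2.0 };
        printf("%f %f\n", p.x, p.y);
    }   /* p is released automatically here */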

Minimizing the amount of malloc() calls improves performance?

Consider two applications: one (num. 1) that invokes malloc() many times, and the other (num. 2) that invokes malloc() few times.
Both applications allocate the same amount of memory (assume 100MB).
For which application the next malloc() call will be faster, #1 or #2?
In other words: Does malloc() have an index of allocated locations in memory?
You asked 2 questions:
for which application the next malloc() call will be faster, #1 or #2?
In other words: Does malloc() have an index of allocated locations in memory?
You've implied that they are the same question, but they are not. The answer to the latter question is YES.
As for which will be faster, it is impossible to say. It depends on the allocator algorithm, the machine state, the fragmentation in the current process, and so on.
Your idea is sound, though: you should think about how malloc usage will affect performance.
There was once an app I wrote that used lots of little blobs of memory, each allocated with malloc(). It worked correctly but was slow. I replaced the many calls to malloc with just one, and then sliced up that large block within my app. It was much much faster.
I don't recommend this approach; it's just an illustration of the point that malloc usage can materially affect performance.
My advice is to measure it.
Of course this completely depends on the malloc implementation, but in this case, with no calls to free, most malloc implementations will probably give you the same algorithmic speed.
As another answer commented, usually there will be a list of free blocks, but if you have not called free, there will just be one, so it should be O(1) in both cases.
This assumes that the memory allocated for the heap is big enough in both cases. In case #1, you will have allocated more total memory, as each allocation involves memory overhead to store metadata; as a result, you may need to call sbrk() or equivalent to grow the heap in case #1, which would add additional overhead.
They will probably be different due to cache and other second order effects, since the memory alignments for the new allocation won't be the same.
If you have been freeing some of the memory blocks, then it is likely that #2 will be faster due to less fragmentation, and so a smaller list of free blocks to search.
If you have freed all the memory blocks, it should end up being exactly the same, since any sane free implementation will have coalesced the blocks back into a single arena of memory.
Malloc has to run through a linked list of free blocks to find one to allocate. This takes time. So, #1 will usually be slower:
The more often you call malloc, the more time it will take - so reducing the number of calls will give you a speed improvement (though whether it is significant will depend on your exact circumstances).
In addition, if you malloc many small blocks, then as you free those blocks you will fragment the heap much more than if you only allocate and free a few large blocks. So you are likely to end up with many small free blocks on your heap rather than a few big blocks, and therefore your mallocs may have to search further through the free-space lists to find a suitable block to allocate, which again will make them slower.
These are of course implementation details, but typically free() will insert the memory into a list of free blocks. malloc() will then look at this list for a free block that is the right size, or larger. Typically, only if this fails does malloc() ask the kernel for more memory.
There are also other considerations, such as when to coalesce multiple adjacent blocks into a single, larger block.
And another reason that malloc() is expensive: if malloc() is called from multiple threads, there must be some kind of synchronization on these global structures (i.e. locks). There exist malloc() implementations with different optimization schemes to make them better for multiple threads, but generally, keeping it multi-thread safe adds to the cost, as multiple threads will contend for those locks and block progress on each other.
You can always do a better job by using malloc() to allocate a large chunk of memory and sub-dividing it yourself. malloc() was optimized to work well in the general case and makes no assumptions about whether or not you use threads or what the sizes of your program's allocations might be.
Whether it is a good idea to implement your own sub-allocator is a secondary question. It rarely is, explicit memory management is already hard enough. You rarely need another layer of code that can screw up and crash your program without any good way to debug it. Unless you are writing a debug allocator.
The answer is that it depends; most of the potential slowness comes from malloc() and free() in combination, and usually #1 and #2 will be of similar speed.
All malloc() implementations do have an indexing mechanism, but the speed of adding a new block to the index is usually not dependent on the number of blocks already in the index.
Most of the slowness of malloc comes from two sources:
searching for a suitable free block among the previously freed blocks
multi-processor problems with locking
Writing my own almost-standards-compliant malloc() replacement took malloc() and free() time from 35% of runtime down to 3-4%, and it seriously optimised those two factors. It would likely have been a similar speed to use some other high-performance malloc, but having our own was more portable to esoteric devices, and of course allows free to be inlined in some places.
You don't define the relative difference between "many" and "few" but I suspect most mallocs would function almost identically in both scenarios. The question implies that each call to malloc has as much overhead as a system call and page table updates. When you do a malloc call, e.g. malloc(14), in a non-brain-dead environment, malloc will actually allocate more memory than you ask for, often a multiple of the system MMU page size. You get your 14 bytes and malloc keeps track of the newly allocated area so that later calls can just return a chunk of the already allocated memory, until more memory needs to be requested from the OS.
In other words, if I call malloc(14) 100 times or malloc(1400) once, the overhead will be about the same. I'll just have to manage the bigger allocated memory chunk myself.
Allocating one block of memory is faster than allocating many blocks. There is the overhead of the system call and also searching for available blocks. In programming reducing the number of operations usually speeds up the execution time.
Memory allocators may have to search to find a block of memory that is the correct size. This adds to the overhead of the execution time.
However, there may be a better chance of success when allocating small blocks versus one large block. Is your program allocating one small block and releasing it, or does it need to allocate (and preserve) many small blocks? When memory becomes fragmented, there are fewer big chunks available, so the memory allocator may have to coalesce free blocks to form one big enough for the allocation.
If your program is allocating and destroying many small blocks of memory you may want to consider allocating a static array and using that for your memory.
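A sketch of that suggestion for fixed-size items (the array size and payload size are illustrative): a static array threaded into a free list, so allocating is popping a node and freeing is pushing it back, both O(1).

    #include <stddef.h>

    #define NNODES 1024

    typedef union node {
        union node *next;        /* valid only while the node is free */
        char        payload[32]; /* illustrative fixed item size */
    } node;

    static node  storage[NNODES];
    static node *free_list;

    /* Thread every slot of the static array onto the free list. */
    void node_pool_init(void)
    {
        for (size_t i = 0; i + 1 < NNODES; i++)
            storage[i].next = &storage[i + 1];
        storage[NNODES - 1].next = NULL;
        free_list = &storage[0];
    }

    void *node_alloc(void)      /* O(1): pop the free list */
    {
        node *n = free_list;
        if (n) free_list = n->next;
        return n;               /* NULL when the pool is exhausted */
    }

    void node_free(void *p)     /* O(1): push back onto the list */
    {
        node *n = p;
        n->next = free_list;
        free_list = n;
    }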
