What if I allocate memory using mmap instead of malloc? - c

What are the disadvantages of allocating memory using mmap (with MAP_PRIVATE and MAP_ANONYMOUS) than using malloc? For data in function scope, I would use stack memory anyway and therefore not malloc.
One disadvantage that comes to mind is for dynamic data structures such as trees and linked lists, where you frequently require to allocate and deallocate small chunks of data. Using mmap there would be expensive for two reasons, one for allocating at granularity of 4096 bytes and the other for requiring to make a system call.
But in other scenarios, do you think malloc is better than mmap? Secondly, am I overestimating disadvantage of mmap for dynamic data structures?
One advantage of mmap over malloc I can think of is that memory is immediately returned to the OS, when you do munmap, whereas with malloc/free, I guess memory uptil the data segment break point is never returned, but kept for reusage.

Yes, malloc is better than mmap. It's much easier to use, much more fine-grained and much more portable. In the end, it will call mmap anyway.
If you start doing everyday memory management with mmap, you'll want to implement some way of parceling it out in smaller chunks than pages and you will end up reimplementing malloc -- in a suboptimal way, probably.

First off, mmap() is a platform specific construct, so if you plan on writing portable C, it's already out.
Second, malloc() is essentially implemented in terms of mmap(), but it's a sort of intelligent library wrapper around the system call: it will request new memory from the system when needed, but until then it will pick a piece of memory in an area that's already committed to the process.
Therefore, if you want to do ordinary dynamic memory allocation, use malloc(), end of story. Use of mmap() for memory allocation should be reserved for special situations (e.g. if you actually want a whole page for yourself, aligned at the page boundary), and always abstracted into a single piece of library code so that others may easily understand what you're doing.

One feature that mmap has that malloc doesn't, is mmap allows you to allocate using Huge Pages (flag argument has MAP_HUGETLB set), while malloc doesn't have that option.

Related

Is it safe to use mmap and malloc to allocate memory in same program?

Till now what I understood is as follow:
malloc internally uses sbrk and brk to allocate memory by increasing top of heap.
mmap allocate memory in form of pages.
Now, let's say current top of sbrk/malloc is 0x001000. And I use mmap to allocate a page of 4KB which is allocated at 0x0020000. Later, if I used malloc multiple times and because of that it had to increase sbrk top. So, what if top reaches 0x002000?
So, it will be great if someone can clarify the following.
Is above scenario possible?
If no than please point out flaw in my understanding of malloc and mmap.
If yes than I assume it is not safe to use it in this way. So, is there any other way to use both safely?
Thank you.
malloc is normally not implemented this way today... malloc used sbrk(2) in old implementations, when extending the data segment was the only way to ask the system for more virtual memory. Newer systems use mmap(2) if available, as they allow more flexibility when the virtual space is large enough (each mmaped chunk is managed as a new data segment for the process requesting it). sbrk(2) expands and shrinks the data segment, just like a stack.... so you have to be careful using sbrk(2) in case you are going to use it intermixed with a sbrk implementation of malloc. The way malloc operates, normally disallows you to return any memory obtained with sbrk(2) if you intermix the calls... so you can only use it to grow the data segment safely.
sbrk(2) also allocates memory in pages. Since paged virtual memory emerged, almost all o.s. allocation is made in page units. Newer systems have even more than one pagesize (e.g. 4Kb and 2Mb sizes), so you can get benefit of that, depending on the application.
As 64bit systems get more and more use, there's no problem in allocating address space large enough to allow for both mecanisms to live together. This is an advantage for a multiple heap malloc implementation, as memory is allocated and deallocated independently, and never in LIFO allocated order.
Malloc uses different approaches to allocate memory, but implementations normally try not to interfere with user sbrk(2) usage. You have to be careful, that is, if you intermix malloc(3) calls with sbrk(2) in a sbrk(2) malloc system. then you run the risk of sbrk(2)ing over the malloc adjusted data segment, and breaking the malloc internal data structures. You had better not to use sbrk(2) yourself if you are using a sbrk(2) implementation of malloc.
Finally, to answer your question, mmap(2) allocates memory as malloc(3) does, so malloc is not, and has not to be, aware of the allocated memory you did for your own use with mmap(2).

When is it more appropriate to use valloc() as opposed to malloc()?

C (and C++) include a family of dynamic memory allocation functions, most of which are intuitively named and easy to explain to a programmer with a basic understanding of memory. malloc() simply allocates memory, while calloc() allocates some memory and clears it eagerly. There are also realloc() and free(), which are pretty self-explanatory.
The manpage for malloc() also mentions valloc(), which allocates (size) bytes aligned to the page border.
Unfortunately, my background isn't thorough enough in low-level intricacies; what are the implications of allocating and using page border-aligned memory, and when is this appropriate as opposed to regular malloc() or calloc()?
The manpage for valloc contains an important note:
The function valloc() appeared in 3.0BSD. It is documented as being obsolete in 4.3BSD, and as legacy in SUSv2. It does not appear in POSIX.1-2001.
valloc is obsolete and nonstandard - to answer your question, it would never be appropriate to use in new code.
While there are some reasons to want to allocate aligned memory - this question lists a few good ones - it is usually better to let the memory allocator figure out which bit of memory to give you. If you are certain that you need your freshly-allocated memory aligned to something, use aligned_alloc (C11) or posix_memalign (POSIX) instead.
Allocations with page alignment usually are not done for speed - they're because you want to take advantage of some feature of your processor's MMU, which typically works with page granularity.
One example is if you want to use mprotect(2) to change the access rights on that memory. Suppose, for instance, that you want to store some data in a chunk of memory, and then make it read only, so that any buggy part of your program that tries to write there will trigger a segfault. Since mprotect(2) can only change permissions page by page (since this is what the underlying CPU hardware can enforce), the block where you store your data had better be page aligned, and its size had better be a multiple of the page size. Otherwise the area you set read-only might include other, unrelated data that still needs to be written.
Or, perhaps you are going to generate some executable code in memory and then want to execute it later. Memory you allocate by default probably isn't set to allow code execution, so you'll have to use mprotect to give it execute permission. Again, this has to be done with page granularity.
Another example is if you want to allocate memory now, but might want to mmap something on top of it later.
So in general, a need for page-aligned memory would relate to some fairly low-level application, often involving something system-specific. If you needed it, you'd know. (And as mentioned, you should allocate it not with valloc, but using posix_memalign, or perhaps an anonymous mmap.)
First of all valloc is obsolete, and memalignshould be used instead.
Second thing it's not part of the C (C++) standard at all.
It's a special allocation which is aligned to _SC_PAGESIZE boundary.
When is it useful to use it? I guess never, unless you have some specific low level requirement. If you would need it, you would know to need it, since it's rarely useful (maybe just when trying some micro-optimizations or creating shared memory between processes).
The self-evident answer is that it is appropriate to use valloc when malloc is unsuitable (less efficient) for the application (virtual) memory usage pattern and valloc is better suited (more efficient). This will depend on the OS and libraries and architecture and application...
malloc traditionally allocated real memory from freed memory if available and by increasing the brk point if not, in which case it is cleared by the OS for security reasons.
calloc in a dumb implementation does a malloc and then (re)clears the memory, while a smart implementation would avoid reclearing newly allocated memory that is automatically cleared by the operating system.
valloc relates to virtual memory. In a virtual memory system using the file system, you can allocate a large amount of memory or filespace/swapspace, even more than physical memory, and it will be swapped in by pages so alignment is a factor. In Unix creation of file of a specified file and adding/deleting pages is done using inodes to define the file but doesn't deal with actual disk blocks till needed, in which case it creates them cleared. So I would expect a valloc system to increase the size of the data segment swap without actually allocating physical or swap pages, or running a for loop to clear it all - as the file and paging system does that as needed. Thus valloc should be a heck of a lot faster than malloc. But as with calloc, how particular idiotsyncratic *x/C flavours do it is up to them, and the valloc man page is totally unhelpful about these expectations.
Traditionally this was implemented with brk/sbrk. Of course in a virtual memory system, whether a paged or a segmented system, there is no real need for any of this brk/sbrk stuff and it is enough to simply write the last location in a file or address space to extend up to that point.
Re the allocation to page boundaries, that is not usually something the user wants or needs, but rather is usually something the system wants or needs.
A (probably more expensive) way to simulate valloc is to determine the page boundary and then call aligned_alloc or posix_memalign with this alignment spec.
The fact that valloc is deprecated or has been removed or is not required in some OS' doesn't mean that it isn't still useful and required for best efficiency in others. If it has been deprecated or removed, one would hope that there are replacements that are as efficient (but I wouldn't bet on it, and might, indeed have, written my own malloc replacement).
Over the last 40 years the tradeoffs of real and (once invented) virtual memory have changed periodically, and mainstream OS has tended to go for frills rather than efficiency, with programmers who don't have (time or space) efficiency as a major imperative. In the embedded systems, efficiency is more critical, but even there efficiency is often not well supported by the standard OS and/or tools. But when in doubt, you can roll your own malloc replacement for your application that does what you need, rather than depend on what someone else woke up and decided to do/implement, or to undo/deprecate.
So the real answer is you don't necessarily want to use valloc or malloc or calloc or any of the replacements your current subversion of an OS provides.

Is there a set of guidelines or advices for switching a `malloc()` implementation from using `sbrk()` to `mmap()`?

I am working on an embedded system that contains some of its own memory management code. This code works when compiled with uClibc however modern C libraries like musl disable sbrk(). What do I need to know to start rewriting a sbrk() based malloc() implementation into an mmap() based one.
Several months ago I had to code a malloc implementation as an assignment. I followed this very good tutorial that, unfortunately, used brk and sbrk to code simple malloc, free and realloc functions, while I had to use mmap to code my malloc, free and realloc. If I remember well, those are the things I noticed between mmap and sbrk :
Thou shalt keep track of your allocations
Calling sbrk with a 0 value gives you the current location of the program break. mmap doesn't work like that. Like malloc, a mmap call returns a pointer to a newly allocated zone. And you will have to stock that pointer somewhere. If you allocate several zones, you will have to keep track of all of them, with "handmade" linked lists as described in the above tutorial.
Thou shalt use mmap wisely
mmap is a system call, and a quite slow one. It allocates huge memory pages (a multiple of the default page size of your system, (that might be 4096 bytes). To avoid too many calls to mmap, you will have to allocate a big chunk of memory, and divide it into tiny chunks for your program allocations. Again, read the above tutorial. For my assignment, the trick was to create three mmap' ed "zones". One for tiny allocations, one for medium-sized allocations, big allocations were made with one call to mmap. All this for efficiency and optimization.
Thou shalt munmap your mmap's
If you don't use a mmap'ed zone anymore, that means if all its chunks of memory are not used, you must give it back to the system with the munmap() system call. And to do it efficiently you must pass the pointer to the beginning of the mmap'ed zone to do it. Hence the importance of keeping track of your allocations.
Hope this helps you somehow.
I think that the code and particularly point #7 of malloc(), free(), realloc() using brk() and sbrk()'s review may give you a good starting point :)
(As I'm researching a similar topic myself, I'll try to keep a list of (what I think are) useful links in here.)
Tips of malloc & free: Making your own malloc library for troubleshooting: a beginner-friendly presentation
Inside memory management: The choices, tradeoffs, and implementations of dynamic allocation: a step by step guide to implementing a sbrk()-based malloc() and replacing the standard one (LD_PRELOAD)
A Quick Tutorial on Implementing and Debugging Malloc, Free, Calloc, and Realloc: a walk-through of a publicly-available implementation of malloc()

C - Design your own free( ) function

Today, I appeared for an interview and the interviewer asked me this,
Tell me the steps how will you design your own free( ) function for
deallocate the allocated memory.
How can it be more efficient than C's default free() function ? What can you conclude ?
I was confused, couldn't think of the way to design.
What do you think guys ?
EDIT : Since we need to know about how malloc() works, can you tell me the steps to write our own malloc() function
That's actually a pretty vague question, and that's probably why you got confused. Does he mean, given an existing malloc implementation, how would you go about trying to develop a more efficient way to free the underlying memory? Or was he expecting you to start discussing different kinds of malloc implementations and their benefits and problems? Did he expect you to know how virtual memory functions on the x86 architecture?
Also, by more efficient, does he mean more space efficient or more time efficient? Does free() have to be deterministic? Does it have to return as much memory to the OS as possible because it's in a low-memory, multi-tasking environment? What's our criteria here?
It's hard to say where to start with a vague question like that, other than to start asking your own questions to get clarification. After all, in order to design your own free function, you first have to know how malloc is implemented. So chances are, the question was really about whether or not you knew anything about how malloc can be implemented.
If you're not familiar with the internals of memory management, the easiest way to get started with understanding how malloc is implemented is to first write your own.
Check out this IBM DeveloperWorks article called "Inside Memory Management" for starters.
But before you can write your own malloc/free, you first need memory to allocate/free. Unfortunately, in a protected mode OS, you can't directly address the memory on the machine. So how do you get it?
You ask the OS for it. With the virtual memory features of the x86, any piece of RAM or swap memory can be mapped to a memory address by the OS. What your program sees as memory could be physically fragmented throughout the entire system, but thanks to the kernel's virtual memory manager, it all looks the same.
The kernel usually provides system calls that allow you to map in additional memory for your process. On older UNIX OS's this was usually brk/sbrk to grow heap memory onto the edge of your process or shrink it off, but a lot of systems also provide mmap/munmap to simply map a large block of heap memory in. It's only once you have access to a large, contiguous looking block of memory that you need malloc/free to manage it.
Once your process has some heap memory available to it, it's all about splitting it into chunks, with each chunk containing its own meta information about its size and position and whether or not it's allocated, and then managing those chunks. A simple list of structs, each containing some fields for meta information and a large array of bytes, could work, in which case malloc has to run through the list until if finds a large enough unallocated chunk (or chunks it can combine), and then map in more memory if it can't find a big enough chunk. Once you find a chunk, you just return a pointer to the data. free() can then use that pointer to reverse back a few bytes to the member fields that exist in the structure, which it can then modify (i.e. marking chunk.allocated = false;). If there's enough unallocated chunks at the end of your list, you can even remove them from the list and unmap or shrink that memory off your process's heap.
That's a real simple method of implementing malloc though. As you can imagine, there's a lot of possible ways of splitting your memory into chunks and then managing those chunks. There's as many ways as there are data structures and algorithms. They're all designed for different purposes too, like limiting fragmentation due to small, allocated chunks mixed with small, unallocated chunks, or ensuring that malloc and free run fast (or sometimes even more slowly, but predictably slowly). There's dlmalloc, ptmalloc, jemalloc, Hoard's malloc, and many more out there, and many of them are quite small and succinct, so don't be afraid to read them. If I remember correctly, "The C Programming Language" by Kernighan and Ritchie even uses a simple malloc implementation as one of their examples.
You can't blindly design free() without knowing how malloc() works under the hood because your implementation of free() would need to know how to manipulate the bookkeeping data and that's impossible without knowing how malloc() is implemented.
So an unswerable question could be how you would design malloc() and free() instead which is not a trivial question but you could answer it partially for example by proposing some very simple implementation of a memory pool that would not be equivalent to malloc() of course but would indicate your presence of knowledge.
One common approach when you only have access to user space (generally known as memory pool) is to get a large chunk of memory from the OS on application start-up. Your malloc needs to check which areas of the right size of that pool are still free (through some data structure) and hand out pointers to that memory. Your free needs to mark the memory as free again in the data structure and possibly needs to check for fragmentation of the pool.
The benefits are that you can do allocation in nearly constant time, the drawback is that your application consumes more memory than actually is needed.
Tell me the steps how will you design your own free( ) function for deallocate the allocated memory.
#include <stdlib.h>
#undef free
#define free(X) my_free(X)
inline void my_free(void *ptr) { }
How can it be more efficient than C's default free() function ?
It is extremely fast, requiring zero machine cycles. It also makes use-after-free bugs go away. It's a very useful free function for use in programs which are instantiated as short-lived batch processes; it can usefully be deployed in some production situations.
What can you conclude ?
I really want this job, but in another company.
Memory usage patterns could be a factor. A default implementation of free can't assume anything about how often you allocate/deallocate and what sizes you allocate when you do.
For example, if you frequently allocate and deallocate objects that are of similar size, you could gain speed, memory efficiency, and reduced fragmentation by using a memory pool.
EDIT: as sharptooth noted, only makes sense to design free and malloc together. So the first thing would be to figure out how malloc is implemented.
malloc and free only have a meaning if your app is to work on top of an OS. If you would like to write your own memory management functions you would have to know how to request the memory from that specific OS or you could reserve the heap memory right away using existing malloc and then use your own functions to distribute/redistribute the allocated memory through out your app
There is an architecture that malloc and free are supposed to adhere to -- essentially a class architecture permitting different strategies to coexist. Then the version of free that is executed corresponds to the version of malloc used.
However, I'm not sure how often this architecture is observed.
The knowledge of working of malloc() is necessary to implement free(). You can find a implementation of malloc() and free() using the sbrk() system call in K&R The C Programming Language Chapter 8, Section 8.7 "Example--A Storage Allocator" pp.185-189.

making your own malloc function?

I read that some games rewrite their own malloc to be more efficient. I don't understand how this is possible in a virtual memory world. If I recall correctly, malloc actually calls an OS specific function, which maps the virtual address to a real address with the MMU. So then how can someone make their own memory allocator and allocate real memory, without calling the actual runtime's malloc?
Thanks
It's certainly possible to write an allocator more efficient than a general purpose one.
If you know the properties of your allocations, you can blow general purpose allocators out of the water.
Case in point: many years ago, we had to design and code up a communication subsystem (HDLC, X.25 and proprietary layers) for embedded systems. The fact that we knew the maximum allocation would always be less than 128 bytes (or something like that) meant that we didn't have to mess around with variable sized blocks at all. Every allocation was for 128 bytes no matter how much you asked for.
Of course, if you asked for more, it returned NULL.
By using fixed-length blocks, we were able to speed up allocations and de-allocations greatly, using bitmaps and associated structures to hold accounting information rather than relying on slower linked lists. In addition, the need to coalesce freed blocks was not needed.
Granted, this was a special case but you'll find that's so for games as well. In fact, we've even used this in a general purpose system where allocations below a certain threshold got a fixed amount of memory from a self-managed pre-allocated pool done the same way. Any other allocations (larger than the threshold or if the pool was fully allocated) were sent through to the "real" malloc.
Just because malloc() is a standard C function doesn't mean that it's the lowest level access you have to the memory system. In fact, malloc() is probably implemented in terms of lower-level operating system functionality. That means you could call those lower level interfaces too. They might be OS-specific, but they might allow you better performance than you would get from the malloc() interface. If that were the case, you could implement your own memory allocation system any way you want, and maybe be even more efficient about it - optimizing the algorithm for the characteristics of the size and frequency of allocations you're going to make, for example.
In general, malloc will call an OS-specific function to obtain a bunch of memory (at least one VM page), and will then divide that memory up into smaller chunks as needed to return to the caller of malloc.
The malloc library will also have a list (or lists) of free blocks, so it can often meet a request without asking the OS for more memory. Determining how many different block sizes to handle, deciding whether to attempt to combine adjacent free blocks, and so forth, are the choices the malloc library implementor has to make.
It's possible for you to bypass the malloc library and directly invoke the OS-level "give me some memory" function and do your own allocation/freeing within the memory you get from the OS. Such implementations are likely to be OS-specific. Another alternative is to use malloc for initial allocations, but maintain your own cache of freed objects.
One thing you can do is have your allocator allocate a pool of memory, then service requests from than (and allocate a bigger pool if it runs out). I'm not sure if that's what they're doing though.
If I recall correctly, malloc actually
calls an OS specific function
Not quite. Most hardware has a 4KB page size. Operating systems generally don't expose a memory allocation interface offering anything smaller than page-sized (and page-aligned) chunks.
malloc spends most of its time managing the virtual memory space that has already been allocated, and only occasionally requests more memory from the OS (obviously this depends on the size of the items you allocate and how often you free).
There is a common misconception that when you free something it is immediately returned to the operating system. While this sometimes occurs (particularly for larger memory blocks) it is generally the case that freed memory remains allocated to the process and can then be re-used by later mallocs.
So most of the work is in bookkeeping of already-allocated virtual space. Allocation strategies can have many aims, such as fast operation, low memory wastage, good locality, space for dynamic growth (e.g. realloc) and so on.
If you know more about your pattern of memory allocation and release, you can optimise malloc and free for your usage patterns or provide a more extensive interface.
For instance, you may be allocating lots of equal-sized objects, which may change the optimal allocation parameters. Or you may always free large amounts of objects at once, in which case you don't want free to be doing fancy things.
Have a look at memory pools and obstacks.
See How do games like GTA IV not fragment the heap?.

Resources