According to mmap's manual:
MAP_NORESERVE
Do not reserve swap space for this mapping. When swap space
is reserved, one has the guarantee that it is possible to modify
the mapping. When swap space is not reserved one might get
SIGSEGV upon a write if no physical memory is available.
To my understanding, if a regular file is mapped into the virtual address range, there is no need for any swap space. Only MAP_ANONYMOUS may need some swap space.
So, is it correct to always add MAP_NORESERVE flag in mmap for a regular file?
Update:
To be more specific, is it correct to always add MAP_NORESERVE flag in mmap for a regular file, when MAP_SHARED is used?
To my understanding, if a regular file is mapped into the virtual address range, there is no need for any swap space. Only MAP_ANONYMOUS may need some swap space.
That depends on the mmap flags. If a regular file is mapped with MAP_PRIVATE then the memory region is initialized from the file, but modifications are not written back to it, so modified pages are no longer backed by the file. The system will need swap space for such a mapping if it decides to swap out any of those modified pages.
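For illustration, here is a minimal sketch of the two kinds of file mapping being discussed (the file name and length are placeholders, and error handling is kept short): writes to a MAP_PRIVATE mapping land in private copies that may need swap backing, while writes to a MAP_SHARED mapping go back to the file itself.

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    int fd = open("data.bin", O_RDWR);       /* placeholder: some existing file */
    if (fd < 0) { perror("open"); return EXIT_FAILURE; }

    size_t len = 4096;                        /* assume the file is at least this long */

    /* Copy-on-write mapping: writes create private anonymous pages,
       which is why the kernel may reserve swap for them. */
    char *priv = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (priv == MAP_FAILED) { perror("mmap MAP_PRIVATE"); return EXIT_FAILURE; }
    priv[0] = 'x';                            /* modifies the private copy, not the file */

    /* Shared mapping: dirty pages are flushed back to the file, so the
       file itself is the backing store rather than swap. */
    char *shrd = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (shrd == MAP_FAILED) { perror("mmap MAP_SHARED"); return EXIT_FAILURE; }
    shrd[0] = 'y';                            /* eventually written to data.bin */

    munmap(priv, len);
    munmap(shrd, len);
    close(fd);
    return 0;
}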
So, is it correct to always add MAP_NORESERVE flag in mmap for a regular file?
It is not incorrect to specify MAP_NORESERVE for any mapping. It's simply a question of what guarantees you want to have about program behavior.
Moreover, you seem to be looking at this from the wrong direction. If a particular mapping can never require swap space then the system will not reserve swap space for it, regardless of the flags. It doesn't hurt to use MAP_NORESERVE in such a case, but it doesn't help, either, so what would be the point?
On the other hand, if you want to be sure that mapping cannot fail on account of using MAP_NORESERVE then the most appropriate course of action is to avoid using that flag. You can completely ignore its existence if you wish, and in fact you should do so if you want maximum portability, because MAP_NORESERVE is a Linux extension not specified by POSIX.
Update:
As I understand it, you are asserting that you can reproducibly observe that mapping existing ranges of regular files with MAP_SHARED requires MAP_NORESERVE in order to succeed. That is, two such mapping attempts that differ only in whether MAP_NORESERVE is specified will produce different results for you, in a manner that you can predict and reliably reproduce.
I find that surprising, even dubious. I do not expect pages of a process's virtual address space that the page table maps to existing regions of a regular file to have any association with swap space, and therefore I do not expect the system to try to reserve any swap space for such pages when it establishes a mapping, flags notwithstanding. If you genuinely observe different behavior then I would attribute it to a library or kernel bug, about which you should file an issue.
Consider, for example, the GNU libc manual, which explicitly says that a memory mapping can be larger than physical memory and swap space, but does not document Linux-specific MAP_NORESERVE at all.
With that said, again, it is not incorrect (on Linux) to specify MAP_NORESERVE for any given memory mapping. I expect it to be meaningless for a MAP_SHARED mapping of an existing region of a regular file, however, so I would not consider it good -- much less best -- practice to routinely use that flag for such mappings. On the other hand, if specifying that flag works around a library or kernel bug that otherwise interferes with establishing certain mappings then I see no particular reason to avoid doing that, but I would expect each such use to be accompanied by a documentary comment explaining why that flag is used.
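If you do end up using the flag to work around such a bug, a hedged sketch of what that could look like (the file name, kernel version, and issue number in the comment are placeholders) is:

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    int fd = open("big-data.bin", O_RDWR);    /* placeholder file name */
    if (fd < 0) { perror("open"); return EXIT_FAILURE; }

    struct stat st;
    if (fstat(fd, &st) != 0) { perror("fstat"); close(fd); return EXIT_FAILURE; }

    /* MAP_NORESERVE should be a no-op for a MAP_SHARED file mapping, but
       without it mmap was observed to fail on kernel X.Y (see issue #NNN);
       remove this flag once the underlying bug is fixed. */
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_NORESERVE, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); close(fd); return EXIT_FAILURE; }

    /* ... use the mapping ... */
    munmap(p, (size_t)st.st_size);
    close(fd);
    return 0;
}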
Related
I want to make sure the return address of sbrk is within a certain specific range. I read somewhere that sbrk allocates from an area allocated at program initialization. So I'm wondering if there's any way I can force program initialization to allocate from a specific address. For example, with mmap I would be able to do so with MAP_FIXED_NOREPLACE. Is it possible to have something similar?
No, this is not possible. brk and sbrk refer to the data segment of the program, and that can be loaded at any valid address that meets the needs of the dynamic linker. Different architectures can and do use different addresses, and even machines of the same architecture can use different ranges depending on the configuration of the kernel. Using a fixed address or address range is extremely nonportable and will make your program very brittle to future changes. I fully expect that doing this will cause your program to break in the future simply by upgrading libc.
In addition, modern programs are typically compiled as position-independent executables so that ASLR can be used to improve security. Therefore, even if you knew the address range that was used for one invocation of your program, the very next invocation of your program might use a totally different address range.
In addition, you almost never want to invoke brk or sbrk by hand. In almost all cases, you will want to use the system memory allocator (or a replacement like jemalloc), which will handle this case for you. For example, glibc's malloc implementation, like most others, will allocate large chunks of memory using mmap, which can significantly reduce memory usage in long-running programs, since these large chunks can be freed independently. The memory allocator also may not appreciate you changing the size of the data segment without consulting it.
Finally, in case you care about portability to other Unix systems, not all systems even have brk and sbrk. OpenBSD allocates all memory using mmap which improves security by expanding the use of ASLR (at the cost of performance).
If you absolutely must use a fixed address or address range and there is no alternative, you'll need to use mmap to allocate that range of memory.
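As a rough sketch of that last option (the address and size below are arbitrary, MAP_FIXED_NOREPLACE needs Linux 4.17 or later, and this is only an illustration of the mechanism, not a recommendation):

#define _GNU_SOURCE              /* for MAP_FIXED_NOREPLACE */
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

int main(void)
{
    void *want = (void *)0x40000000;          /* arbitrary example address */
    size_t len = 1 << 20;                     /* 1 MiB, also arbitrary */

    /* MAP_FIXED_NOREPLACE fails with EEXIST instead of clobbering
       an existing mapping at the requested address. */
    void *p = mmap(want, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED_NOREPLACE, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return EXIT_FAILURE;
    }
    printf("got memory at %p\n", p);
    munmap(p, len);
    return 0;
}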
I am writing a 64-bit application in C (with GCC) and NASM under Linux.
Is there a way to specify where I want my heap and stack to be located? Specifically, I want all my malloc'ed data to be anywhere in the range 0x00000000-0x7FFFFFFF. This can be done at compile time, link time, or runtime, via C code or otherwise. It doesn't matter.
If this is not possible, please explain, why.
P.S. For those interested, what the heck I am doing:
The program I am working on is written in C. During runtime it generates NASM code, compiles it, and dynamically links it into the already running program. This is needed for extreme optimization, because that code will be run thousands if not billions of times, and is not known at compile time. So the reason I need addresses in 0x00000000-0x7FFFFFFF is that they fit in immediates in assembly code. If I don't need to load the addresses separately, I can just about halve the number of memory accesses needed and increase locality.
For Linux, the standard way of acquiring any Virtual Address range is using the mmap(2) function.
You can specify the starting virtual address and the size. If the address is not already in use and is not reserved by prior calls (or by the kernel), you will get access to that virtual address range.
Whether you got the address you asked for can be checked by comparing the return value to the start address you passed. If the call fails outright, mmap returns MAP_FAILED rather than NULL.
In general, mmap is used to map virtual addresses to the contents of a file descriptor, but the mapping is realized through physical pages in RAM, since applications cannot access the disk directly.
Since you do not want any file backing, you can use the MAP_ANONYMOUS flag in the mmap call (also pass -1 as the fd).
Here is the relevant excerpt from the man page:
MAP_ANONYMOUS
The mapping is not backed by any file; its contents are
initialized to zero. The fd argument is ignored; however,
some implementations require fd to be -1 if MAP_ANONYMOUS (or
MAP_ANON) is specified, and portable applications should
ensure this. The offset argument should be zero. The use of
MAP_ANONYMOUS in conjunction with MAP_SHARED is supported on
Linux only since kernel 2.4.
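Putting that together for the original question, a minimal sketch (the hint address is an arbitrary value below 0x7FFFFFFF) that asks for anonymous memory in the low 2 GiB and checks whether the kernel honored the hint might look like:

#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

int main(void)
{
    void *hint = (void *)0x10000000;      /* arbitrary address below 0x7FFFFFFF */
    size_t len = 64 * 1024;

    void *p = mmap(hint, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return EXIT_FAILURE;
    }
    if (p != hint) {
        /* The kernel treated the address only as a hint and placed the
           mapping elsewhere; decide here whether that is acceptable. */
        fprintf(stderr, "mapping placed at %p instead of %p\n", p, hint);
    }
    /* ... hand p to the generated code ... */
    munmap(p, len);
    return 0;
}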
C (and C++) include a family of dynamic memory allocation functions, most of which are intuitively named and easy to explain to a programmer with a basic understanding of memory. malloc() simply allocates memory, while calloc() allocates some memory and clears it eagerly. There are also realloc() and free(), which are pretty self-explanatory.
The manpage for malloc() also mentions valloc(), which allocates (size) bytes aligned to a page boundary.
Unfortunately, my background isn't thorough enough in low-level intricacies; what are the implications of allocating and using page border-aligned memory, and when is this appropriate as opposed to regular malloc() or calloc()?
The manpage for valloc contains an important note:
The function valloc() appeared in 3.0BSD. It is documented as being obsolete in 4.3BSD, and as legacy in SUSv2. It does not appear in POSIX.1-2001.
valloc is obsolete and nonstandard - to answer your question, it would never be appropriate to use in new code.
While there are some reasons to want to allocate aligned memory - this question lists a few good ones - it is usually better to let the memory allocator figure out which bit of memory to give you. If you are certain that you need your freshly-allocated memory aligned to something, use aligned_alloc (C11) or posix_memalign (POSIX) instead.
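For example, a small sketch of requesting page-aligned memory through those standard interfaces (error handling kept minimal, page size taken from sysconf at run time):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    long page = sysconf(_SC_PAGESIZE);            /* page size at run time */
    size_t size = 4 * (size_t)page;               /* a multiple of the alignment, as C11 requires */

    void *a = aligned_alloc((size_t)page, size);  /* C11 */
    if (a == NULL) { fprintf(stderr, "aligned_alloc failed\n"); return EXIT_FAILURE; }

    void *b = NULL;
    int err = posix_memalign(&b, (size_t)page, size);   /* POSIX */
    if (err != 0) { fprintf(stderr, "posix_memalign: %d\n", err); free(a); return EXIT_FAILURE; }

    free(a);
    free(b);
    return 0;
}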
Allocations with page alignment usually are not done for speed - they're done because you want to take advantage of some feature of your processor's MMU, which typically works with page granularity.
One example is if you want to use mprotect(2) to change the access rights on that memory. Suppose, for instance, that you want to store some data in a chunk of memory, and then make it read only, so that any buggy part of your program that tries to write there will trigger a segfault. Since mprotect(2) can only change permissions page by page (since this is what the underlying CPU hardware can enforce), the block where you store your data had better be page aligned, and its size had better be a multiple of the page size. Otherwise the area you set read-only might include other, unrelated data that still needs to be written.
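A minimal sketch of that scenario, using an anonymous mapping because it is page-aligned and page-sized by construction (this is just one way to arrange it):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);

    /* An anonymous mapping is page-aligned and page-granular by construction. */
    char *buf = mmap(NULL, page, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) { perror("mmap"); return EXIT_FAILURE; }

    strcpy(buf, "read-mostly configuration data");   /* fill in the data first */

    /* Now drop write permission; any later stray write segfaults. */
    if (mprotect(buf, page, PROT_READ) != 0) {
        perror("mprotect");
        munmap(buf, page);
        return EXIT_FAILURE;
    }

    printf("%s\n", buf);          /* reads are still fine */
    munmap(buf, page);
    return 0;
}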
Or, perhaps you are going to generate some executable code in memory and then want to execute it later. Memory you allocate by default probably isn't set to allow code execution, so you'll have to use mprotect to give it execute permission. Again, this has to be done with page granularity.
Another example is if you want to allocate memory now, but might want to mmap something on top of it later.
So in general, a need for page-aligned memory would relate to some fairly low-level application, often involving something system-specific. If you needed it, you'd know. (And as mentioned, you should allocate it not with valloc, but using posix_memalign, or perhaps an anonymous mmap.)
First of all, valloc is obsolete, and memalign should be used instead.
Second, it's not part of the C (or C++) standard at all.
It's a special allocation which is aligned to a _SC_PAGESIZE boundary.
When is it useful to use it? I would guess never, unless you have some specific low-level requirement. If you needed it, you would know, since it's rarely useful (maybe just when trying some micro-optimizations or creating shared memory between processes).
The self-evident answer is that it is appropriate to use valloc when malloc is unsuitable (less efficient) for the application (virtual) memory usage pattern and valloc is better suited (more efficient). This will depend on the OS and libraries and architecture and application...
malloc traditionally allocated real memory from freed memory if available and by increasing the brk point if not, in which case it is cleared by the OS for security reasons.
calloc in a dumb implementation does a malloc and then (re)clears the memory, while a smart implementation would avoid reclearing newly allocated memory that is automatically cleared by the operating system.
valloc relates to virtual memory. In a virtual memory system using the file system, you can allocate a large amount of memory or filespace/swapspace, even more than physical memory, and it will be paged in on demand, so alignment is a factor. In Unix, creating a file of a specified size and adding or deleting pages is done through inodes that define the file but don't touch actual disk blocks until needed, at which point the blocks are created cleared. So I would expect a valloc implementation to increase the size of the data segment or swap reservation without actually allocating physical or swap pages, or running a loop to clear it all, since the file and paging system does that as needed. Thus valloc should be a heck of a lot faster than malloc. But as with calloc, how particular idiosyncratic *x/C flavours do it is up to them, and the valloc man page is totally unhelpful about these expectations.
Traditionally this was implemented with brk/sbrk. Of course in a virtual memory system, whether a paged or a segmented system, there is no real need for any of this brk/sbrk stuff and it is enough to simply write the last location in a file or address space to extend up to that point.
Re the allocation to page boundaries, that is not usually something the user wants or needs, but rather is usually something the system wants or needs.
A (probably more expensive) way to simulate valloc is to determine the page boundary and then call aligned_alloc or posix_memalign with this alignment spec.
The fact that valloc is deprecated or has been removed or is not required in some OSes doesn't mean that it isn't still useful and required for best efficiency in others. If it has been deprecated or removed, one would hope that there are replacements that are as efficient (but I wouldn't bet on it, and might indeed have written my own malloc replacement).
Over the last 40 years the tradeoffs of real and (once invented) virtual memory have changed periodically, and mainstream OSes have tended to go for frills rather than efficiency, with programmers who don't have (time or space) efficiency as a major imperative. In embedded systems, efficiency is more critical, but even there efficiency is often not well supported by the standard OS and/or tools. But when in doubt, you can roll your own malloc replacement for your application that does what you need, rather than depend on what someone else woke up and decided to do/implement, or to undo/deprecate.
So the real answer is you don't necessarily want to use valloc or malloc or calloc or any of the replacements your current subversion of an OS provides.
I've been looking at the different flags for the mmap function, namely MAP_FIXED, MAP_SHARED, MAP_PRIVATE. Can someone explain to me the purpose of MAP_FIXED? There's no guarantee that the address space will be used in the first place.
MAP_FIXED is dup2 for memory mappings, and it's useful in exactly the same situations where dup2 is useful for file descriptors: when you want to perform a replace operation that atomically reassigns a resource identifier (memory range in the case of MAP_FIXED, or fd in the case of dup2) to refer to a new resource without the possibility of races where it might get reassigned to something else if you first released the old resource then attempted to regain it for the new resource.
As an example, take loading a shared library (by the dynamic loader). It consists of at least three types of mappings: read+exec-only mapping of the program code and read-only data from the executable file, read-write mapping of the initialized data (also from the executable file, but typically with a different relative offset), and read-write zero-initialized anonymous memory (for .bss). Creating these as separate mappings would not work because they must be at fixed relative addresses relative to one another. So instead you first make a dummy mapping of the total length needed (the type of this mapping doesn't matter) without MAP_FIXED just to reserve a sufficient range of contiguous addresses at a kernel-assigned location, then you use MAP_FIXED to map over top of parts of this range as needed with the three or more mappings you need to create.
Further, note that use of MAP_FIXED with a hard-coded address or a random address is always a bug. The only correct way to use MAP_FIXED is to replace an existing mapping whose address was assigned by a previous successful call to mmap without MAP_FIXED, or in some other way where you feel it's safe to replace whole pages. This aspect too is completely analogous to dup2; it's always a bug to use dup2 when the caller doesn't already have an open file on the target fd with the intent to replace it.
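A stripped-down sketch of that reserve-then-carve-up pattern (the sizes and protections are placeholders, and a real loader would map file-backed segments rather than anonymous memory):

#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t total = 16 * page;                 /* placeholder for the library's total span */

    /* Step 1: reserve a contiguous range at a kernel-chosen address. */
    char *base = mmap(NULL, total, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) { perror("mmap reserve"); return EXIT_FAILURE; }

    /* Step 2: replace pieces of the reservation with the real mappings.
       MAP_FIXED is safe here because we own the range being replaced. */
    if (mmap(base, 8 * page, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0) == MAP_FAILED ||
        mmap(base + 8 * page, 8 * page, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0) == MAP_FAILED) {
        perror("mmap MAP_FIXED");
        munmap(base, total);
        return EXIT_FAILURE;
    }

    printf("segments start at %p\n", (void *)base);
    munmap(base, total);
    return 0;
}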
If the file you are loading contains pointers, you will need to load it at a fixed location in order to ensure that the pointers are correct. In some cases, this can merely be an optimization.
Executables which are not position-independent must be loaded at fixed addresses.
Shared memory may contain pointers.
Executables which use prebinding will attempt to load dynamic libraries at predetermined memory locations as an optimization, but will fall back to normal loading techniques if a different location is used (or if the library has changed).
So MAP_FIXED is not typical usage.
I am confused by the specification of mmap.
Let pa be the return address of mmap (using the same notation as the specification):
pa = mmap(addr, len, prot, flags, fildes, off);
In my opinion, after the function call succeeds, the following range is valid:
[ pa, pa+len )
My question is whether the following range is also valid:
[ round_down(pa, pagesize) , round_up(pa+len, pagesize) )
[ base, base + size ) for short
That is to say:
is the base always aligned on a page boundary?
is the size always a multiple of pagesize (the granularity is pagesize in other words)?
Thanks for your help.
I think it is implied in this paragraph:
The off argument is constrained to be aligned and sized according to the value returned by sysconf() when passed _SC_PAGESIZE or _SC_PAGE_SIZE. When MAP_FIXED is specified, the application shall ensure that the argument addr also meets these constraints. The implementation performs mapping operations over whole pages. Thus, while the argument len need not meet a size or alignment constraint, the implementation shall include, in any mapping operation, any partial page specified by the range [pa,pa+len).
But I'm not sure, and I do not have much experience with POSIX.
Please show me some more explicit and definitive evidence, or show me at least one system which supports POSIX and has different behavior.
Thanks again.
Your question is fairly open-ended, considering that mmap has many different modes and configurations, but I'll try to cover the most important points.
Take the case in which you are mapping a file into memory. The beginning of the data in the file will always be rooted at the return address of mmap(). While the operating system may have actually created the mapping at page boundaries, I do not believe the POSIX standard requires the OS to make this extra memory writable (for example, it could force segfaults on these regions if it wanted to). In the case of mapping files it doesn't make sense for these additional address regions to be backed by the file; it makes more sense for them to be undefined.
For MAP_ANONYMOUS, however, the memory is likely writable--but, again, it would be unwise to use this memory.
Additionally, when you are using mmap() you are actually using glibc's version of mmap(), and it may slice and dice memory any way it sees fit. Finally, it is worth noting that on OSX, which is POSIX compliant, none of the quoted text you presented appears in the man page for mmap().
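If you ever do need to reason about the containing whole-page range yourself, the usual arithmetic looks like this (assuming, as on common platforms, that the page size is a power of two; the pa and len values are made up):

#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    uintptr_t pagesize = (uintptr_t)sysconf(_SC_PAGESIZE);

    /* Pretend pa/len came from an earlier mmap call; note that a real pa
       returned by mmap is already page-aligned, so rounding down is a no-op there. */
    uintptr_t pa  = 0x1234567;                /* example value only */
    size_t    len = 1000;                     /* example value only */

    uintptr_t base = pa & ~(pagesize - 1);                          /* round down */
    uintptr_t end  = (pa + len + pagesize - 1) & ~(pagesize - 1);   /* round up   */

    printf("containing page range: [0x%jx, 0x%jx)\n",
           (uintmax_t)base, (uintmax_t)end);
    return 0;
}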