alignment and granularity of mmap - c

I am confused by the specification of mmap.
Let pa be the return address of mmap (the same as the specification)
pa = mmap(addr, len, prot, flags, fildes, off);
In my opinion, after the function call succeeds, the following range is valid:
[ pa, pa+len )
My question is whether the range of the following is still valid?
[ round_down(pa, pagesize) , round_up(pa+len, pagesize) )
[ base, base + size ) for short
That is to say:
is the base always aligned on the page boundary?
is the size always a multiple of pagesize (the granularity is pagesize in other words)?
Thanks for your help.
I think it is implied in this paragraph :
The off argument is constrained to be aligned and sized according to the value returned by sysconf() when passed _SC_PAGESIZE or _SC_PAGE_SIZE. When MAP_FIXED is specified, the application shall ensure that the argument addr also meets these constraints. The implementation performs mapping operations over whole pages. Thus, while the argument len need not meet a size or alignment constraint, the implementation shall include, in any mapping operation, any partial page specified by the range [pa,pa+len).
But I'm not sure, and I do not have much experience with POSIX.
Please show me some more explicit and more definitive evidence,
or show me at least one system which supports POSIX and behaves differently.
Thanks again.

Your question is fairly open-ended, considering that mmap has many different modes and configurations, but I'll try to cover the most important points.
Take the case in which you are mapping a file into memory. The beginning of the data in the file will always be rooted at the return address of mmap(). While the operating system may have actually created maps at page boundaries, I do not believe the POSIX standard requires the OS to make this memory writable (for example, it could force segfaults on these regions if it wanted to). When mapping files, it doesn't make sense for these additional address regions to be backed by a file; it makes more sense for them to be undefined.
For MAP_ANONYMOUS, however, the memory is likely writable--but, again, it would be unwise to use this memory.
Additionally, when you call mmap() you are actually calling glibc's version of mmap(), and it may slice and dice memory any way it sees fit. Finally, it is worth noting that on OS X, which is POSIX compliant, none of the quoted text you presented appears in the man page for mmap().
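For reference, the rounded range from the question can be computed as in the sketch below. The helper names (page_round_down, page_round_up, system_page_size) are made up for illustration, and the code assumes the page size is a power of two, which POSIX does not promise but which holds on common systems.

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/* Round addr down to the start of the page that contains it.
 * Assumption: pagesize is a non-zero power of two. */
uintptr_t page_round_down(uintptr_t addr, uintptr_t pagesize)
{
    assert(pagesize != 0 && (pagesize & (pagesize - 1)) == 0);
    return addr & ~(pagesize - 1);
}

/* Round addr up to the next page boundary (unchanged if already aligned). */
uintptr_t page_round_up(uintptr_t addr, uintptr_t pagesize)
{
    return page_round_down(addr + pagesize - 1, pagesize);
}

/* Page size reported by sysconf(), or 0 if the query fails. */
uintptr_t system_page_size(void)
{
    long ps = sysconf(_SC_PAGESIZE);
    return ps > 0 ? (uintptr_t)ps : 0;
}
```

With pa and len from the question, base would be page_round_down((uintptr_t)pa, ps) and base + size would be page_round_up((uintptr_t)pa + len, ps). Whether the bytes between pa + len and base + size may be touched is exactly what the answer above advises against relying on.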

Related

Is it possible to allocate a single byte of memory at a specific address?

Is it possible to allocate a single byte of memory at a specific desired address, say 0x123?
This suggests follow up questions:
Is it possible to know if a specific address has already been malloced?
Some complications could be:
The byte at the desired address 0x123 was already malloc'ed. In this case, can I move the byte value elsewhere and notify the compiler (or whatever's keeping track of these things) of the new address of the byte?
The byte at the desired address 0x123 was malloc'ed along with other bytes. E.g. char *str = malloc(8); and str <= 0x123 < str + 8, or in other words, 0x123 overlaps some portion of already malloc'ed memory. In this case, is it possible to move the portion of malloc'ed memory elsewhere and notify the compiler (or whatever's keeping track of these things)?
There are also several variations:
Is this possible if the desired address is known at compile time?
Is this possible if the desired address is known at run time?
I know mmap takes a hint addr, but it allocates in multiples of the pagesize and may or may not allocate at the given hint addr.
It is possible to assign a specific value to a pointer as follows:
unsigned char *p = (unsigned char *)0x123;
However dereferencing such a pointer will almost certainly result in undefined behavior on any hosted system.
The only time such a construct would be valid is on an embedded system where it is allowed to access an arbitrary address and the implementation documents specific addresses for specific uses.
As for trying to manipulate the inner workings of a malloc implementation, such a task is very system specific and not likely to yield any benefit.
There are operating-system-specific ways to do this. On Windows, you can use VirtualAlloc (with the MEM_COMMIT | MEM_RESERVE flags). On Linux, you can use mmap (with the MAP_FIXED_NOREPLACE flag). These are the operating system functions which give you full control over your own address space.
In either case, you can only map entire pages. Addresses only become valid and invalid a page at a time. You can't have a page that is only half valid, and you can't have a page where only one address is valid. This is a CPU limitation.
If the page you want is already allocated, then obviously you can't allocate it again.
On both Windows and Linux, you normally can't allocate the first page (on Linux the lowest mappable address is governed by the vm.mmap_min_addr setting). This is so that accesses to NULL pointers (which point to the first page) will always crash.
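A minimal Linux-only sketch of the mmap route follows. The function name map_page_at is invented for this example. MAP_FIXED_NOREPLACE needs Linux 4.17 or later and reasonably recent libc headers; older kernels treat the address as a mere hint, so the code checks the returned address itself.

```c
#define _GNU_SOURCE          /* expose MAP_ANONYMOUS and MAP_FIXED_NOREPLACE */
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Try to map exactly one read/write page at the address `want`.
 * Returns `want` on success, NULL if the page could not be placed there
 * (already mapped, not page aligned, forbidden, and so on). */
void *map_page_at(void *want)
{
    long pagesize = sysconf(_SC_PAGESIZE);
    if (pagesize <= 0)
        return NULL;

    void *p = mmap(want, (size_t)pagesize, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED_NOREPLACE, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    if (p != want) {         /* old kernel ignored the flag: undo and give up */
        munmap(p, (size_t)pagesize);
        return NULL;
    }
    return p;
}
```

Even when this succeeds, a whole page becomes valid, not a single byte, so the byte at 0x123 could at best be one byte of a page starting at address 0, which the default settings rule out.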
Is it possible to allocate a single byte of memory at a specific desired address, say 0x123?
Generally: no. The C language doesn't cover allocation at specific addresses, it only covers how to access a specific address. Many compilers do provide non-standard language extensions for allocating at a fixed address. When sticking to standard C, the actual allocation must either be done:
In hardware, by for example having a MCU which provides a memory-mapped register map, or
By the system-specific linker, through custom linker scripts.
See How to access a hardware register from firmware? for details.
malloc doesn't make any sense in either case, since it exclusively uses heap allocation and the heap sits inside a pre-designated address space.

Always add MAP_NORESERVE flag in mmap for a regular file?

According to mmap's manual:
MAP_NORESERVE
Do not reserve swap space for this mapping. When swap space
is reserved, one has the guarantee that it is possible to modify
the mapping. When swap space is not reserved one might get
SIGSEGV upon a write if no physical memory is available.
To my understanding, if a regular file is mapped into the virtual address range, there is no need for any swap space. Only MAP_ANONYMOUS may need some swap space.
So, is it correct to always add MAP_NORESERVE flag in mmap for a regular file?
Update:
To be more specific, is it correct to always add MAP_NORESERVE flag in mmap for a regular file, when MAP_SHARED is used?
To my understanding, if a regular file is mapped into the virtual address range, there is no need for any swap space. Only MAP_ANONYMOUS may need some swap space.
That depends on the mmap flags. If a regular file is mapped with MAP_PRIVATE then the memory region is initialized from the file but not backed by the file. The system will need swap space for such a mapping if it decides to swap out any of its pages.
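The difference can be observed with a small experiment. This is only a sketch: the function name write_reaches_file and the temporary file path are made up, and it assumes a writable /tmp. It maps a four-byte file, modifies the first byte through the mapping, and then reads the file back with pread().

```c
#define _GNU_SOURCE          /* harmless; makes mkstemp/pread visible everywhere */
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 1 if a store through the mapping is visible in the file,
 * 0 if the file is unchanged, -1 on a setup error.
 * map_flags should be MAP_PRIVATE or MAP_SHARED. */
int write_reaches_file(int map_flags)
{
    char path[] = "/tmp/mmap_demo_XXXXXX";   /* hypothetical scratch file */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                            /* removed once fd is closed */

    int result = -1;
    if (write(fd, "aaaa", 4) == 4) {
        char *m = mmap(NULL, 4, PROT_READ | PROT_WRITE, map_flags, fd, 0);
        if (m != MAP_FAILED) {
            char buf[4];
            m[0] = 'Z';                      /* modify the mapped memory */
            if (map_flags & MAP_SHARED)
                msync(m, 4, MS_SYNC);        /* push the change to the file */
            if (pread(fd, buf, 4, 0) == 4)
                result = (buf[0] == 'Z');
            munmap(m, 4);
        }
    }
    close(fd);
    return result;
}
```

With MAP_PRIVATE the modified page is a private copy that the file can no longer back, which is why such a mapping may need swap; with MAP_SHARED the file itself is the backing store.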
So, is it correct to always add MAP_NORESERVE flag in mmap for a regular file?
It is not incorrect to specify MAP_NORESERVE for any mapping. It's simply a question of what guarantees you want to have about program behavior.
Moreover, you seem to be looking at this from the wrong direction. If a particular mapping can never require swap space then the system will not reserve swap space for it, regardless of the flags. It doesn't hurt to use MAP_NORESERVE in such a case, but it doesn't help, either, so what would be the point?
On the other hand, if you want to be sure that mapping cannot fail on account of using MAP_NORESERVE then the most appropriate course of action is to avoid using that flag. You can completely ignore its existence if you wish, and in fact you should do so if you want maximum portability, because MAP_NORESERVE is a Linux extension not specified by POSIX.
Update:
As I understand it, you are asserting that you can reproducibly observe successful mapping of existing ranges of existing files with MAP_SHARED to require MAP_NORESERVE. That is, two such mapping attempts that differ only in whether MAP_NORESERVE is specified will produce different results for you, in a manner that you can predict and reliably reproduce.
I find that surprising, even dubious. I do not expect pages of a process's virtual address space that the page table maps to existing regions of a regular file to have any association with swap space, and therefore I do not expect the system to try to reserve any swap space for such pages when it establishes a mapping, flags notwithstanding. If you genuinely observe different behavior then I would attribute it to a library or kernel bug, about which you should file an issue.
Consider, for example, the GNU libc manual, which explicitly says that a memory mapping can be larger than physical memory and swap space, but does not document Linux-specific MAP_NORESERVE at all.
With that said, again, it is not incorrect (on Linux) to specify MAP_NORESERVE for any given memory mapping. I expect it to be meaningless for a MAP_SHARED mapping of an existing region of a regular file, however, so I would not consider it good -- much less best -- practice to routinely use that flag for such mappings. On the other hand, if specifying that flag works around a library or kernel bug that otherwise interferes with establishing certain mappings then I see no particular reason to avoid doing that, but I would expect each such use to be accompanied by a documentary comment explaining why that flag is used.

mmap and munmap behaviour

The Open Group standard says that munmap should be called with a page aligned address, but there doesn't seem to be any requirement that mmap should be returning a page aligned address. Is this something you need to handle when you're writing portable code?
mmap will only map whole pages, and can thus only return a page boundary. It's in the short description:
mmap - map pages of memory
(emphasis mine)
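If you would rather check than trust the wording, a quick test along these lines can be run on the target system. The function name is invented for this sketch, and MAP_ANONYMOUS is used for convenience although older POSIX editions do not list it.

```c
#define _GNU_SOURCE          /* expose MAP_ANONYMOUS under strict modes */
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map `len` bytes without MAP_FIXED and report whether the returned
 * address is a multiple of the page size.
 * Returns 1 if aligned, 0 if not, -1 if the mapping could not be made. */
int mmap_result_is_page_aligned(size_t len)
{
    long pagesize = sysconf(_SC_PAGESIZE);
    if (pagesize <= 0 || len == 0)
        return -1;

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    int aligned = ((uintptr_t)p % (uintptr_t)pagesize) == 0;
    munmap(p, len);          /* p is page aligned here, as munmap requires */
    return aligned;
}
```

A result of 1 on one machine is of course an observation, not a proof of what the standard guarantees.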
The mmap documentation does mention this requirement, although in an offhand manner. On my Mac, for example:
[EINVAL] The offset argument was not page-aligned based on the
page size as returned by getpagesize(3).
http://pubs.opengroup.org/onlinepubs/009695399/functions/mmap.html also says
[EINVAL] The addr argument (if MAP_FIXED was specified) or off is not a multiple of the page size as returned by sysconf(), or is considered invalid by the implementation.
I think it's the most natural arrangement (that is, when both the physical and virtual addresses have the same page granularity and alignment). The whole purpose of page translation is to break the virtual address space into spans and independently map them onto blocks of physical memory (pages), with 1 span covering exactly 1 block (page). Even with pages of mixed sizes, the alignment is naturally preserved (e.g. regular page=4KB and large page=2MB/4MB on x86/64).
If I understand it correctly, if MAP_FIXED is not specified, the behavior of mmap is implementation dependent. So the only portable way of using mmap is with MAP_FIXED, which means you have to provide an address that is page aligned. Otherwise you'll receive EINVAL.

What non-NULL memory addresses are free to be used as error values in C?

I need to write my own memory allocation functions for the GMP library, since the default functions call abort() and leave no way I can think of to restore program flow after that occurs (I have calls to mpz_init all over the place, and how to handle the failure changes based upon what happened around that call). However, the documentation requires that the value the function returns not be NULL.
Is there at least one range of addresses that can always be guaranteed to be invalid? It would be useful to know them all, so I could use different addresses for different error codes, or possibly even different ranges for different families of errors.
If the default memory allocation functions abort(), and GMP's code can't deal with a NULL, then GMP is likely not prepared to deal with the possibility of memory allocation failures at all. If you return a deliberately invalid address, GMP's probably going to try to dereference it, and promptly crash, which is just as bad as calling abort(). Worse, even, because the stacktrace won't point at what's really causing the problem.
As such, if you're going to return at all, you must return a valid pointer, one which isn't being used by anything else.
Now, one slightly evil option would be to use setjmp() and longjmp() to exit the GMP routines. However, this will leave GMP in an unpredictable state - you should assume that you can never call a GMP routine again after this point. It will also likely result in memory leaks... but that's probably the least of your concerns at this point.
Another option is to have a reserved pool in the system malloc - that is, at application startup:
emergencyMemory = malloc(bignumber);
Now if malloc() fails, you do free(emergencyMemory), and, hopefully, you have enough room to recover. Keep in mind that this only gives you a finite amount of headroom - you have to hope GMP will return to your code (and that code will check and see that the emergency pool has been used) before you truly run out of memory.
You can, of course, also use these two methods in combination - first use the reserved pool and try to recover, and if that fails, longjmp() out, display an error message (if you can), and terminate gracefully.
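The reserved-pool idea could look roughly like this. All names here (reserve_init, reserve_release, alloc_with_reserve) are invented for the sketch; alloc_with_reserve has the void *(size_t) shape that GMP expects of its allocation function, but the realloc and free counterparts are left out.

```c
#include <stddef.h>
#include <stdlib.h>

static void *emergency_memory;     /* the reserve; NULL once handed back */

/* Grab the reserve at startup. Returns 0 on success, -1 on failure. */
int reserve_init(size_t bytes)
{
    if (emergency_memory == NULL)
        emergency_memory = malloc(bytes);
    return emergency_memory != NULL ? 0 : -1;
}

/* Is the reserve still untouched? The main program should check this. */
int reserve_available(void)
{
    return emergency_memory != NULL;
}

/* Give the reserve back to malloc. Returns 1 if something was released. */
int reserve_release(void)
{
    if (emergency_memory == NULL)
        return 0;
    free(emergency_memory);
    emergency_memory = NULL;
    return 1;
}

/* Allocation function: on failure, release the reserve and retry once. */
void *alloc_with_reserve(size_t size)
{
    void *p = malloc(size);
    if (p == NULL && reserve_release())
        p = malloc(size);
    return p;                      /* may still be NULL: caller must decide */
}
```

Once reserve_available() reports 0, the program is living on borrowed memory and should wind down; if alloc_with_reserve still returns NULL, the longjmp() or abort() fallback applies.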
No, there isn't a portable range of invalid pointer values.
You could use platform-specific definitions, or you could use the addresses of some global objects:
const void *const error_out_of_bounds = &error_out_of_bounds;
const void *const error_no_sprockets = &error_no_sprockets;
[Edit: sorry, missed that you were hoping to return these values to a library. As bdonlan says, you can't do that. Even if you find some "invalid" values, the library won't be expecting them. It is a requirement that your function must return a valid value, or abort.]
You could do something like this in globals:
void (*error_handler)(void*);
void *error_data;
Then in your code:
error_handler = some_handler;
error_data = &some_data;
mpz_init(something);
In your allocator:
if (allocated_memory_ok) return the_memory;
error_handler(error_data);
abort();
Setting up the error handler and data before calling mpz_init might be somewhat tedious, but depending how different the behaviour is in different cases, you might be able to write some function or macro to deal with it.
What you can't do, though, is recover and carry on running if the GMP library isn't designed to cope after an allocation fails. You're at the mercy of your tools in that respect - if the library call doesn't return on error, then who knows what broken state its internals will be left in.
But that's a fully general view, whereas GMP is open source. You can find out what actually happens in mpz_init, at least for a particular release of GMP. There might be some way to ensure in advance that your allocator has enough memory to satisfy the request(s), or there might be some way to wriggle out without doing too much damage (as bdonlan says, a longjmp).
Since nobody has provided the correct answer, the set of non-NULL memory addresses you can safely use as error values is the same as the set of addresses you create for this purpose. Simply declare a static const char (or global const char if you need it to be globally visible) array whose size N is the number of error codes you need, and use pointers to the N elements of this array as the N error values.
If your pointer type is not char * but something else, you may need to use an object of that type instead of a char array, since converting these char pointers into another pointer type is not guaranteed to work.
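A sketch of that array approach, with made-up names (error_slots, error_ptr, error_code) and two example error codes borrowed from the earlier answer:

```c
#include <stddef.h>

enum { ERR_OUT_OF_BOUNDS, ERR_NO_SPROCKETS, ERR_COUNT };

/* One byte per error code. The contents never matter, only the addresses. */
static const char error_slots[ERR_COUNT] = {0};

/* Pointer value that stands for `code`, or NULL for an unknown code. */
const void *error_ptr(int code)
{
    return (code >= 0 && code < ERR_COUNT) ? &error_slots[code] : NULL;
}

/* Inverse mapping: the error code `p` stands for, or -1 if `p` is an
 * ordinary pointer. Uses == only, since ordering comparisons between
 * unrelated objects are not defined in C. */
int error_code(const void *p)
{
    for (int i = 0; i < ERR_COUNT; i++)
        if (p == (const void *)&error_slots[i])
            return i;
    return -1;
}
```

No allocator can return these addresses, because the array already occupies them. As noted above, this is only useful towards code that knows about the convention, not towards GMP itself.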
Only guaranteed on current mainstream operating systems (with virtual memory enabled) and CPU architectures:
-1L (meaning all bits set in a value large enough for a pointer)
This is used by a lot of libraries to mark pointers which have been freed. With this you can easily find out whether the error comes from using a NULL pointer or a dangling reference.
Works on HP-UX, Windows, Solaris, AIX, Linux, FreeBSD, NetBSD and OpenBSD, and with i386, amd64, ia64, parisc, sparc and powerpc.
I think this is enough. I don't see any reason for more than these two values (0, -1).
If you only return e.g. 16-bit or 32-bit aligned pointers, an uneven pointer-address (LSB equal to 1) will be at least "mysterious", and would create an opportunity for using my all-time favorite bogus-value 0xDEADBEEF (for 32-bit pointers) or 0xDEADBEEFBADF00D (for 64-bit pointers).
There are several ranges you can use, they are operating system and architecture specific.
Typically most platforms will reserve the first page (usually 4K bytes in length), to catch dereferencing of null pointers (plus room for a slight offset).
You can also point to the reserved operating system pages, on Linux these occupy the region from 0xc0000000 to 0xffffffff (on a 32 bit system). From userspace you won't have necessary privileges to access this region.
Another option (if you want to allocate several such values) is to allocate a page without read or write permissions using mmap or equivalent, and use offsets into this page for each distinct error value.
The simplest solution is just to use values immediately below 0 (-1, -2, etc.) or immediately above it (1, 2, ...). You can be very certain these addresses are on inaccessible pages.
A possibility is to take C library addresses that are guaranteed to exist and that thus will never be returned by malloc or similar. To be most portable these should be object pointers and not function pointers, but casting ((void*)main) would probably be ok on most architectures. One data pointer that comes to mind is environ, though that is POSIX only, or stdin etc., which are not guaranteed to be "real" variables.
To use this you could just use the following:
extern char** environ; /* guaranteed to exist in POSIX */
#define DEADBEAF ((void*)&environ)

What is the maximum size of buffers memcpy/memset etc. can handle?

What is the maximum size of buffers memcpy and other functions can handle? Is this implementation dependent? Is this restricted by the size of the size_t passed in as an argument?
This is entirely implementation dependent.
This depends on the hardware as much as anything, but also on the age of the compiler. For anyone with a reasonably modern compiler (meaning anything based on a standard from the early 90's or later), the size argument is a size_t. This can reasonably be the largest 16 bit unsigned, the largest 32 bit unsigned, or the largest 64 bit unsigned, depending on the memory model the compiler compiles to. In this case, you just have to find out what size a size_t is in your implementation. However, for very old compilers (that is, before ANSI-C and perhaps for some early versions of ANSI C), all bets are off.
On the standards side, looking at cygwin and Solaris 7, for example, the size argument is a size_t. Looking at an embedded system that I have available, the size argument is an unsigned (meaning 16-bit unsigned). (The compiler for this embedded system was written in the 80's.) I found a web reference to some ANSI C where the size parameter is an int.
You may want to see this article on size_t as well as the follow-up article about a mis-feature of some early GCC versions where size_t was erroneously signed.
In summary, for almost everyone, size_t will be the correct reference to use. For those few using embedded systems or legacy systems with very old compilers, however, you need to check your man page.
Functions normally use a size_t to pass a size as parameter. I say normally because fgets() uses an int parameter, which in my opinion is a flaw in the C standard.
size_t is defined as a type which can contain the size (in bytes) of any object you could access. Generally it's a typedef of unsigned int or unsigned long.
That's why the values returned by the sizeof operator are of type size_t.
So 2 ** (sizeof(size_t) * CHAR_BIT) gives you a maximum amount of memory that your program could handle, but it's certainly not the most precise one.
(CHAR_BIT is defined in limits.h and yields the number of bits contained in a char).
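In actual C the power cannot be written with ** and would overflow anyway, so the practical form of this bound is SIZE_MAX from <stdint.h> (C99 and later), which is the largest value a size_t can hold. A small sketch, with invented function names:

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Number of bits in the object representation of size_t. */
unsigned size_t_bits(void)
{
    return (unsigned)(sizeof(size_t) * CHAR_BIT);
}

/* Largest byte count that can even be passed to memcpy or memset.
 * Equal to (size_t)-1 on every implementation. */
size_t largest_size(void)
{
    return SIZE_MAX;
}
```

This is an upper bound on what the interface can express, not on what the platform can actually provide.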
They take a size_t argument, so it's platform dependent.
Implementation dependent, but you can look in the header (.h) file that you need to include before you can use memcpy. The declaration will tell you (look for size_t or other).
And then you ask what size_t is, well, that's the implementation dependent part.
Right, you cannot copy areas that are greater than 2^(sizeof(size_t)*8) bytes. But that is nothing to worry about, because you cannot allocate more space either, because malloc also takes the size as a size_t parameter.
There is also an issue related to what size_t can represent versus what your platform will allow a process to actually address.
Even with virtual memory on a 64-bit platform, you are unlikely to be able to call memcpy() with sizes of more than a few TB or so this week, and even then that is a pretty hot machine.... it is hard to imagine what a machine on which it would be possible to install a fully covered 64-bit address space would look like.
Never mind the embedded systems with only a few KB of total writable memory, where it can't make sense to attempt to memcpy() more information than the RAM can hold, regardless of the definition of size_t. If you did, think about what would just have happened to the stack holding the return address from that call.
Or systems where the virtual address space seen by a process is smaller than the physical memory installed. This is actually the case with a Win32 process running on a Win64 platform, for example. (I first encountered this under the time sharing OS TSX-11 running on a PDP-11 with 4MB of physical memory, and 64KB virtual address in each process. 4MB of RAM was a lot of memory then, and the IBM PC didn't exist yet.)

Resources