allocating address zero on Linux with mmap fails - c

I am writing a static program loader for Linux: I read the ELF program headers and map the segments into memory.
I have come across an executable which assumes that the virtual address of its first segment is 0. My memory mapping fails; I get an error allocating a virtual page at address 0.
I wonder whether it is possible at all to allocate memory at address 0 from user space.
See this example code:
/* mmaptests.c */
#include <sys/mman.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <errno.h>

int main()
{
    void *p = mmap(0, sysconf(_SC_PAGE_SIZE), PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_FIXED | MAP_ANONYMOUS, -1, 0);
    printf("mmap result %p (errno %s)\n", p, strerror(errno));
    return 0;
}
I compile it with:
gcc mmaptests.c
This is what it returns:
$./a.out
mmap result 0xffffffffffffffff (errno Operation not permitted)
I would be grateful for any insights.
Thanks
B

Linux will only let you mmap the zeroth page if you have the appropriate privileges: mappings below the vm.mmap_min_addr sysctl limit require the CAP_SYS_RAWIO capability, which root has.
gcc mmaptests.c && sudo ./a.out
should get you:
mmap result (nil) (errno Success)
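If you do not want to run your loader as root, another option (a system-wide knob with security implications, so treat this as a sketch rather than a recommendation) is to lower the minimum mapping address:
$ cat /proc/sys/vm/mmap_min_addr
$ sudo sysctl -w vm.mmap_min_addr=0
With the limit set to 0, the unprivileged mmap(0, ..., MAP_FIXED, ...) call above should succeed. Keep in mind that the default (typically 64 KiB) exists precisely so that NULL-pointer dereferences fault, so lowering it weakens that protection.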

Related

Cannot create anonymous mapping with MAP_32BIT on MacOS

I'm on a 64-bit system, but want to use mmap to allocate pages within the first 2GB of memory. On Linux, I can do this with the MAP_32BIT flag:
#include <sys/mman.h>
#include <stdio.h>

int main() {
    void *addr = mmap(
        NULL,                                    // address hint
        4096,                                    // size
        PROT_READ | PROT_WRITE,                  // permissions
        MAP_32BIT | MAP_PRIVATE | MAP_ANONYMOUS, // flags
        -1,                                      // file descriptor
        0                                        // offset
    );
    if (addr == MAP_FAILED)
        perror("mmap");
    else
        printf("%p", addr);
}
Godbolt link demonstrating that this works on Linux. As of version 10.15, MacOS also allegedly supports the MAP_32BIT flag. However, when I compile and run the program on my system (11.3), it fails with ENOMEM. The mapping does work when MAP_32BIT is removed.
I have a few potential explanations for why this doesn't work, but none of them are very compelling:
The permissions are wrong somehow (although removing either PROT_READ or PROT_WRITE didn't solve it).
I need to specify an address hint for this to work, for some reason.
MacOS (or my version of it) simply doesn't support MAP_32BIT for anonymous mappings.
The problem is the "zero page": on some 32-bit Unixes, the lowest page of memory is commonly kept unmapped so that accesses through NULL can be detected and signal an error. On 64-bit systems, MacOS extends this reserved region to the entire first 4 GiB of memory by default. mmap therefore refuses to map addresses in this region, since they are already occupied by the zero-page reservation.
This reservation (the __PAGEZERO segment) can be shrunk to a single page with a linker option:
$ cc -Wl,-pagezero_size,0x1000 test.c
$ ./a.out
0xb0e5000
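To confirm what the linker option did (assuming the Xcode command-line tools are installed), you can inspect the __PAGEZERO load command of the binary:
$ otool -l a.out | grep -A 3 __PAGEZERO
With the default link, the vmsize of that segment is 0x100000000 (4 GiB); with -pagezero_size,0x1000 it should shrink to 0x1000, which is why the low-address mapping becomes possible.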

c/linux - ftruncate and POSIX Shared Memory Segments

The end goal here is that I'd like to be able to extend the size of a shared memory segment and notify processes to remap the segment after the extension. However, it seems that calling ftruncate a second time on a shared memory fd fails with EINVAL. The only other question I could find about this has no answer: ftruncate failed at the second time
The man pages for ftruncate and shm_open make no mention of disallowing the expansion of shared memory segments after creation; in fact, they seem to indicate that segments can be resized via ftruncate, but so far my testing has shown otherwise. The only solution I can think of would be to destroy the shared memory segment and recreate it at a larger size; however, this would require all processes that have mmap'd the segment to unmap it before the object can be destroyed and recreated.
Any thoughts? Thanks!
EDIT: As requested, a simple example
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <sys/types.h>

int main(int argc, char *argv[]){
    const char *name = "testfile";
    size_t sz = 4096; // page size on my sys
    int fd;

    if((fd = shm_open(name, O_CREAT | O_RDWR, 0666)) == -1){
        perror("shm_open");
        exit(1);
    }
    ftruncate(fd, sz);
    perror("First truncate");
    ftruncate(fd, 2*sz);
    perror("second truncate");
    shm_unlink(name);
    return 0;
}
Output:
First truncate: Undefined error: 0
second truncate: Invalid argument
EDIT - Answer: It appears that this is an issue with the OS X implementation of the POSIX standard; the snippet above works on a 3.13.0-53-generic GNU/Linux kernel, and likely on others as well.
With respect to your end goal, here's an open source library I wrote that seems to be a match: rszshm - resizable pointer-safe shared memory.
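If Linux alone is enough for you (where the second ftruncate succeeds), a minimal sketch of the reader side after it has been notified of the growth might look like the following. The segment name and the use of O_RDWR are placeholders of mine, not from the question; on older glibc you may need to link with -lrt.
/* remap_reader.c - sketch: reopen and remap a POSIX shm object after it grew */
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>

int main(void) {
    int fd = shm_open("/testfile", O_RDWR, 0666);  /* placeholder name */
    if (fd == -1) { perror("shm_open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) == -1) { perror("fstat"); return 1; }  /* learn the new size */

    /* Map the object at its current size; a real program would munmap()
       its older, smaller mapping first. */
    void *p = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    printf("mapped %lld bytes at %p\n", (long long)st.st_size, p);
    munmap(p, st.st_size);
    close(fd);
    return 0;
}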

Huge pages on Mac OS X

The Mac OS X mmap man page says that it is possible to allocate superpages, which I gather are the same thing as Linux huge pages.
https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man2/mmap.2.html
However the following simple test fails on Mac OS X (Yosemite 10.10.5):
#include <stdio.h>
#include <sys/mman.h>
#include <mach/vm_statistics.h>

int
main()
{
    void *p = mmap((void *) 0x000200000000, 8 * 1024 * 1024,
                   PROT_READ | PROT_WRITE,
                   MAP_ANON | MAP_FIXED | MAP_PRIVATE,
                   VM_FLAGS_SUPERPAGE_SIZE_2MB, 0);
    printf("%p\n", p);
    if (p == MAP_FAILED)
        perror(NULL);
    return 0;
}
The output is:
0xffffffffffffffff
Cannot allocate memory
The result is the same with MAP_FIXED removed from the flags and NULL supplied as the address argument. Replacing VM_FLAGS_SUPERPAGE_SIZE_2MB with -1 gives the expected result, that is, no error occurs, but the allocated memory then obviously uses regular 4 KiB pages.
What might be a problem with allocating superpages this way?
This minor modification to the posted example works for me on Mac OS 10.10.5:
#include <stdio.h>
#include <sys/mman.h>
#include <mach/vm_statistics.h>

int
main()
{
    void *p = mmap(NULL,
                   8 * 1024 * 1024,
                   PROT_READ | PROT_WRITE,
                   MAP_ANON | MAP_PRIVATE,
                   VM_FLAGS_SUPERPAGE_SIZE_2MB, // mach flags in fd argument
                   0);
    printf("%p\n", p);
    if (p == MAP_FAILED)
        perror(NULL);
    return 0;
}
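As a quick sanity check (my own assumption, not something the man page promises), a 2 MB superpage allocation should come back 2 MB aligned, which you can verify with something like (needs <stdint.h>):
    if (p != MAP_FAILED && ((uintptr_t)p & (2 * 1024 * 1024 - 1)) == 0)
        printf("returned address is 2 MB aligned, consistent with a superpage\n");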

Mmap a block device on Mac OS X?

I want to access an encrypted core storage volume in my program.
My plan is to mmap the decrypting block device to be able to jump around in the file system structures with ease and without having to deal with the crypto myself.
While mapping a big file works like a charm, I am getting an EINVAL error on the mmap syscall in the following code:
#include <stddef.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <sys/mman.h>

int main(int argc, char *argv[])
{
    int fd = open("/dev/disk2", O_RDONLY); // this works fine
    if (fd < 0)
    {
        perror("Could not open file");
        return -1;
    }
    int pagesize = getpagesize(); // page size is 4096 on my system
    void *address = mmap(NULL, pagesize, PROT_READ, MAP_SHARED | MAP_FILE, fd, 0); // try to map first page
    if (address == MAP_FAILED)
    {
        perror("Could not mmap"); // error complaining about an invalid argument
    }
}
The device has a size of 112 GB and I am compiling with clang mmap.c -O0 -o mmap on Mavericks 10.9.3 for x86_64. My system has 4 GB of RAM and > 10 GB of free hard disk space.
The man page mmap(2) only lists the following explanations for an EINVAL error, but these do not seem to apply:
MAP_FIXED was specified and the addr argument was not page aligned, or part of the desired address space resides out of the valid address space for a user process.
flags does not include either MAP_PRIVATE or MAP_SHARED.
The len argument was negative.
The offset argument was not page-aligned based on the page size as returned by getpagesize(3).
[...]
The flags parameter must specify either MAP_PRIVATE or MAP_SHARED.
The size parameter must not be 0.
The off parameter must be a multiple of pagesize, as returned by sysconf().
While I have not figured out all the nitty gritty details of the implementation, the comments on this XNU kernel source file explicitly mention being able to map a block device (as long as it's shared): https://www.opensource.apple.com/source/xnu/xnu-2422.1.72/bsd/kern/kern_mman.c
What am I missing?
Your problem might be the use of MAP_FILE, since it indicates a regular file rather than a block device.
If that doesn't work, try calling fstat() after you open the device and check the file's length. When I use stat -x to get information about the block devices on my system, the file sizes are reported as zero (ls -l reports the sizes as "1,"). A zero-length file might also prevent you from being able to create a mapping.
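A minimal sketch of that check, using the fd from the question's code (needs <sys/stat.h>; the output is only illustrative):
    struct stat st;
    if (fstat(fd, &st) == -1)
        perror("fstat");
    else
        printf("st_size = %lld, block device: %s\n",
               (long long)st.st_size, S_ISBLK(st.st_mode) ? "yes" : "no");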

Behaviour of PROT_READ and PROT_WRITE with mprotect

I've been trying to use mprotect to protect a page first against reading, and then against writing.
Here is my code:
#include <sys/types.h>
#include <sys/mman.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    int pagesize = sysconf(_SC_PAGE_SIZE);
    int *a;

    if (posix_memalign((void **)&a, pagesize, sizeof(int)) != 0)
        perror("memalign");

    *a = 42;
    if (mprotect(a, pagesize, PROT_WRITE) == -1) /* Resp. PROT_READ */
        perror("mprotect");
    printf("a = %d\n", *a);
    *a = 24;
    printf("a = %d\n", *a);
    free(a);
    return 0;
}
Under Linux here are the results:
Here is the output for PROT_WRITE:
$ ./main
a = 42
a = 24
and for PROT_READ
$ ./main
a = 42
Segmentation fault
Under Mac OS X 10.7:
Here is the output for PROT_WRITE:
$ ./main
a = 42
a = 24
and for PROT_READ
$ ./main
[1] 2878 bus error ./main
So far, I understand that OSX / Linux behavior might be different, but I don't understand why PROT_WRITE does not crash the program when reading the value with printf.
Can someone explain this part?
There are two things that you are observing:
mprotect was not designed to be used with heap pages. Linux and OS X have slightly different handling of the heap (remember that OS X uses the Mach VM). OS X does not like its heap pages to be tampered with.
You can get identical behaviour on both OSes if you allocate your page via mmap:
a = mmap(NULL, pagesize, PROT_READ | PROT_WRITE, MAP_ANON | MAP_PRIVATE, -1, 0);
if (a == MAP_FAILED)
    perror("mmap");
This is a restriction of your MMU (x86 in my case). The x86 MMU does not support pages that are writable but not readable. Thus setting
mprotect(a, pagesize, PROT_WRITE)
effectively does nothing, while
mprotect(a, pagesize, PROT_READ)
removes write privileges and you get a SIGSEGV as expected.
Also, although it doesn't seem to be an issue here, you should either compile your code with -O0 or declare a as volatile int * to avoid compiler optimisations.
Most operating systems and/or CPU architectures automatically make something readable when it is writable, so PROT_WRITE most often implies PROT_READ as well; it is simply not possible to make a page writable without making it readable. One can speculate about the reasons: either it is not worth the effort to add a separate readability bit in the MMU and caches, or, as on some earlier architectures, you actually need to read through the MMU into a cache before you can write, so making something unreadable would automatically make it unwritable.
Also, it is likely that printf tries to allocate from memory whose protection you changed with mprotect. When you change a page's protection you want to own the full page; otherwise you are changing the protection of a page that libc still partly owns and does not expect to be protected. That is what happens in your Mac OS test with PROT_READ: printf allocates some internal structures, tries to access them, and crashes because they are now read-only.
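Pulling the two answers together, here is a minimal sketch of my own (not from either answerer) that should behave the same on Linux and OS X: allocate a whole page with mmap instead of the heap, keep the pointer volatile, and then drop write access.
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    long pagesize = sysconf(_SC_PAGE_SIZE);
    volatile int *a = mmap(NULL, pagesize, PROT_READ | PROT_WRITE,
                           MAP_ANON | MAP_PRIVATE, -1, 0);
    if ((void *)a == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    *a = 42;
    if (mprotect((void *)a, pagesize, PROT_READ) == -1)   /* drop write access */
        perror("mprotect");
    printf("a = %d\n", *a);   /* reading still works */
    *a = 24;                  /* this write should now fault (SIGSEGV / bus error) */
    printf("a = %d\n", *a);   /* not reached if the write faults */
    munmap((void *)a, pagesize);
    return 0;
}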
