Huge pages on Mac OS X - c

The Mac OS X mmap man page says that it is possible to allocate superpages and I gather it is the same thing as Linux huge pages.
https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man2/mmap.2.html
However the following simple test fails on Mac OS X (Yosemite 10.10.5):
#include <stdio.h>
#include <sys/mman.h>
#include <mach/vm_statistics.h>
int
main()
{
void *p = mmap((void *) 0x000200000000, 8 * 1024 * 1024,
PROT_READ | PROT_WRITE,
MAP_ANON | MAP_FIXED | MAP_PRIVATE,
VM_FLAGS_SUPERPAGE_SIZE_2MB, 0);
printf("%p\n", p);
if (p == MAP_FAILED)
perror(NULL);
return 0;
}
The output is:
0xffffffffffffffff
Cannot allocate memory
The result is the same with MAP_FIXED removed from the flags and NULL supplied as the address argument. Replacing VM_FLAGS_SUPERPAGE_SIZE_2MB with -1 works as expected, that is, no error occurs, but obviously the allocated memory then uses regular 4k pages.
What might be the problem with allocating superpages this way?

This minor modification to the posted example works for me on Mac OS 10.10.5:
#include <stdio.h>
#include <sys/mman.h>
#include <mach/vm_statistics.h>
int
main()
{
void *p = mmap(NULL,
8*1024*1024,
PROT_READ | PROT_WRITE,
MAP_ANON | MAP_PRIVATE,
VM_FLAGS_SUPERPAGE_SIZE_2MB, // mach flags in fd argument
0);
printf("%p\n", p);
if (p == MAP_FAILED)
perror(NULL);
return 0;
}
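Since the superpage request can evidently fail on some machines (as it did in the question), one option is to fall back to regular pages when it does. The following is only a sketch of that idea: alloc_maybe_superpages is a made-up helper name, and the #ifdef guard is there so the same code still compiles on systems that lack the Mach flag.

```c
#define _DEFAULT_SOURCE          /* exposes MAP_ANON on glibc under strict -std= */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#ifdef __APPLE__
#include <mach/vm_statistics.h>  /* VM_FLAGS_SUPERPAGE_SIZE_2MB */
#endif

/* Hypothetical helper (not part of any API): try 2 MB superpages first,
 * then fall back to regular pages. Returns NULL if both attempts fail.
 * *used_superpages is set to 1 only when the superpage request succeeded. */
void *alloc_maybe_superpages(size_t len, int *used_superpages)
{
    void *p;

    assert(used_superpages != NULL);
    *used_superpages = 0;

#ifdef VM_FLAGS_SUPERPAGE_SIZE_2MB
    /* Mach flags go in the fd argument, as in the answer above. */
    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_ANON | MAP_PRIVATE, VM_FLAGS_SUPERPAGE_SIZE_2MB, 0);
    if (p != MAP_FAILED) {
        *used_superpages = 1;
        return p;
    }
#endif

    /* Regular anonymous mapping: fd is -1. */
    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_ANON | MAP_PRIVATE, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

A caller would pass the length (8 * 1024 * 1024 in the question) and check the returned pointer against NULL before using it.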

Related

allocating address zero on Linux with mmap fails

I am writing a static program loader for Linux. It reads the ELF program headers and maps the segments into memory.
I have come across an executable which assumes that the virtual address of its first segment is 0. My memory mapping fails: I get an error when allocating the virtual page at address 0.
I wonder if it is possible at all to allocate memory at address 0 in user space.
See this example code:
/*mmaptests.c*/
#include <sys/mman.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <errno.h>
int main()
{
void* p = mmap(0, sysconf(_SC_PAGE_SIZE), PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_FIXED | MAP_ANONYMOUS, -1, 0);
printf("mmap result %p (errno %s)\n",p,strerror(errno));
return 0;
}
I compile it with:
gcc mmaptests.c
This is what it returns :
$./a.out
mmap result 0xffffffffffffffff (errno Operation not permitted)
I will be happy for any insights.
Thanks
B
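Linux will only let you mmap the 0-th page if you have privileges. The lowest address an unprivileged process may map is set by the vm.mmap_min_addr sysctl (see /proc/sys/vm/mmap_min_addr).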
gcc mmaptests.c && sudo ./a.out
should get you:
mmap result (nil) (errno Success)
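To see what limit applies on a given machine, the vm.mmap_min_addr sysctl can be read from /proc/sys/vm/mmap_min_addr. The sketch below does that and also retries the mapping from the question, this time reading errno only when mmap actually failed. Both function names are made up for illustration.

```c
#define _DEFAULT_SOURCE          /* exposes MAP_ANONYMOUS on glibc under strict -std= */
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: returns the value of vm.mmap_min_addr, or -1 if
 * the file cannot be read or parsed (for example when not on Linux). */
long read_mmap_min_addr(void)
{
    long value = -1;
    FILE *f = fopen("/proc/sys/vm/mmap_min_addr", "r");

    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}

/* Hypothetical helper: returns 0 if one page could be mapped at address 0,
 * otherwise the errno reported by mmap (EPERM in the question). */
int try_map_page_zero(void)
{
    long pagesize = sysconf(_SC_PAGE_SIZE);
    void *p;

    assert(pagesize > 0);
    p = mmap(NULL, (size_t)pagesize, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_FIXED | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return errno;            /* errno is only meaningful on failure */
    munmap(p, (size_t)pagesize);
    return 0;
}
```

Note that a successful mapping at address 0 returns a pointer equal to NULL, so the result has to be compared against MAP_FAILED rather than NULL.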

can /proc/self/exe be mmap'ed?

Can a process read /proc/self/exe using mmap? This program fails to mmap the file:
$ cat e.c
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
int main()
{
int f=open("/proc/self/exe",O_RDONLY);
char*p=mmap(NULL,0,PROT_READ,0,f,0);
return 0;
}
$ cc e.c -o e
$ strace ./e
[snip]
open("/proc/self/exe", O_RDONLY) = 3
mmap(NULL, 0, PROT_READ, MAP_FILE, 3, 0) = -1 EINVAL (Invalid argument)
exit_group(0) = ?
+++ exited with 0 +++
You are making 2 mistakes here:
Mapped size must be > 0. Zero-size mappings are invalid.
You have to specify, if you want to create a shared (MAP_SHARED) or a private (MAP_PRIVATE) mapping.
The following should work for example:
char *p = mmap(NULL, 4096, PROT_READ, MAP_SHARED, f, 0);
If you wish to map the full executable, you will have to do a stat() on it first, to retrieve the correct file size and then use that as the second parameter to mmap().
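A minimal sketch of that last step, using fstat() on the already opened descriptor instead of stat(). The helper name map_whole_file is made up for illustration, and MAP_PRIVATE is an arbitrary choice here (MAP_SHARED works equally well for a read-only mapping).

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map a whole file read-only.
 * On success returns the mapping and stores its length in *len_out.
 * Returns NULL on any failure, including an empty file, because
 * zero-size mappings are invalid. */
const unsigned char *map_whole_file(const char *path, size_t *len_out)
{
    struct stat st;
    void *p;
    int fd;

    assert(path != NULL && len_out != NULL);

    fd = open(path, O_RDONLY);
    if (fd == -1)
        return NULL;
    if (fstat(fd, &st) == -1 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }

    p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                   /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return NULL;

    *len_out = (size_t)st.st_size;
    return p;
}
```

Called as map_whole_file("/proc/self/exe", &len) on Linux, the first four bytes of the mapping should be the ELF magic, 0x7f 'E' 'L' 'F'.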

Permission error when using mmap to modify a file

I created a simple mmap program that modifies a byte in a file.
I ran it as root on a simple, small file and got this error:
# ./a.out tmp.txt 92
fd=3
mmap: Permission denied
Code snippet
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
int main(int argc, char *argv[]) {
int fd = open(argv[1], O_WRONLY);
printf("fd=%d\n", fd);
char *p = mmap(0, 0x1000, PROT_WRITE, MAP_SHARED, fd, 0);
if (p == MAP_FAILED) {
perror ("mmap");
return 1;
}
p[0] = 0xde;
close(fd);
return 0;
}
Wonder what went wrong. Thanks.
UPDATE1
Fixed a typo in the code snippet, I meant to use PROT_WRITE there.
From the man page for mmap:
EACCES A file descriptor refers to a non-regular file. Or a file mapping was requested, but fd is not open for reading. Or MAP_SHARED was requested and PROT_WRITE is set, but fd is not open in read/write (O_RDWR) mode. Or PROT_WRITE is set, but the file is append-only.
So in order to map a file with MAP_SHARED and PROT_WRITE, you need to open it in read/write mode, not write-only. This makes sense, as the contents of the file need to be read to initialize the parts of the memory you don't write.
In addition, IA-32 does not allow write-only mappings of pages, so mapping with PROT_WRITE on such a machine implicitly enables PROT_READ as well, and will therefore fail for a file descriptor that isn't readable.
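For illustration, here is a sketch of the question's program with the descriptor opened O_RDWR. The helper name poke_first_byte is made up, and the msync() call is an extra that the original program did not have; it is only there to make the write-back explicit.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: overwrite the first byte of an existing,
 * non-empty file through a shared mapping.
 * Returns 0 on success, -1 on failure. */
int poke_first_byte(const char *path, unsigned char value)
{
    struct stat st;
    unsigned char *p;
    int fd, rc;

    assert(path != NULL);

    fd = open(path, O_RDWR);     /* O_RDWR, not O_WRONLY */
    if (fd == -1)
        return -1;
    if (fstat(fd, &st) == -1 || st.st_size < 1) {
        close(fd);               /* touching an empty mapping would fault */
        return -1;
    }

    p = mmap(NULL, 1, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                   /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return -1;

    p[0] = value;
    rc = msync(p, 1, MS_SYNC);   /* flush the change to the file */
    munmap(p, 1);
    return rc;
}
```

With the question's file, poke_first_byte("tmp.txt", 0xde) would be the equivalent call.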

How to mmap in Linux a file under Windows' share (LINUX)?

I am mounting a Windows share in Linux with -o uid=1000,gid=1000, so no permission problems should appear. I made sure the permissions are set correctly in Windows.
I can create, edit, as well as delete directories and files.
However, I cannot mmap a file on the share (on regular mount point it works).
I also cannot fsync directories but this is understandable.
How to mmap the share?
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <errno.h>
int main()
{
const char * file = "/home/lvm/Sources/SharedVM/blabla";
int fd = open(file, O_RDWR | O_CREAT | O_SYNC, S_IWUSR | S_IRUSR);
printf("%d\n", fd);
int frc = posix_fallocate(fd, 0, 1024L);
printf("fallocate rc=%d\n", frc);
void * result = mmap(0, 1024L, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
printf("errno=%d\n", errno);
printf("addr = %p\n", result);
printf("res = %p", result); // => 0xffffffffffffffff on the Windows share, or a valid address on a regular Linux mount point
return 42;
}
Result :
3
fallocate rc=0
errno=22
addr = 0xffffffffffffffff
While if changing the file to "/tmp/blabla" then we get:
3
fallocate rc=0
errno=0
addr = 0x7f9e2de7c000
Well, the answer is that the file system does not support fallocate.
The share was mounted as cifs, which is why only the /tmp file could be fallocated.
Sharing an ntfs directory works; cifs doesn't.

Behaviour of PROT_READ and PROT_WRITE with mprotect

I've been trying to use mprotect to protect a page against reading first, and then against writing.
Here is my code:
#include <sys/types.h>
#include <sys/mman.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
int main(void)
{
int pagesize = sysconf(_SC_PAGE_SIZE);
int *a;
if (posix_memalign((void**)&a, pagesize, sizeof(int)) != 0)
perror("memalign");
*a = 42;
if (mprotect(a, pagesize, PROT_WRITE) == -1) /* Resp. PROT_READ */
perror("mprotect");
printf("a = %d\n", *a);
*a = 24;
printf("a = %d\n", *a);
free (a);
return 0;
}
Under Linux here are the results:
Here is the output for PROT_WRITE:
$ ./main
a = 42
a = 24
and for PROT_READ
$ ./main
a = 42
Segmentation fault
Under Mac OS X 10.7:
Here is the output for PROT_WRITE:
$ ./main
a = 42
a = 24
and for PROT_READ
$ ./main
[1] 2878 bus error ./main
So far, I understand that OSX / Linux behavior might be different, but I don't understand why PROT_WRITE does not crash the program when reading the value with printf.
Can someone explain this part?
There are two things that you are observing:
mprotect was not designed to be used with heap pages. Linux and OS X have slightly different handling of the heap (remember that OS X uses the Mach VM). OS X does not like its heap pages to be tampered with.
You can get identical behaviour on both OSes if you allocate your page via mmap
a = mmap(NULL, pagesize, PROT_READ | PROT_WRITE, MAP_ANON | MAP_PRIVATE, -1, 0);
if (a == MAP_FAILED)
perror("mmap");
This is a restriction of your MMU (x86 in my case). The MMU in x86 does not support writable, but not readable pages. Thus setting
mprotect(a, pagesize, PROT_WRITE)
does nothing, while
mprotect(a, pagesize, PROT_READ)
removes write privileges and you get a SIGSEGV as expected.
Also although it doesn't seem to be an issue here, you should either compile your code with -O0 or set a to volatile int * to avoid any compiler optimisations.
Most operating systems and/or CPU architectures automatically make something readable when it is writeable, so PROT_WRITE most often implies PROT_READ as well. It's simply not possible to make something writeable without making it readable. The reasons can be speculated on: either it's not worth the effort to add an extra readability bit to the MMU and caches, or, as on some earlier architectures, you actually need to read through the MMU into a cache before you can write, so making something unreadable automatically makes it unwriteable.
Also, it's likely that printf tries to allocate from memory that you damaged with mprotect. You want to allocate a full page from libc when you're changing its protection, otherwise you'll be changing the protection of a page that you don't fully own and that libc doesn't expect to be protected. This is what happens in your Mac OS X test with PROT_READ: printf allocates some internal structures, tries to access them, and crashes because they are read-only.
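Putting the points from both answers together, a test that owns its whole page (allocated with mmap rather than from the heap) and keeps libc out of the protected region could look like the sketch below. The helper name is made up, and running the faulting write in a forked child is just one way to observe the signal without killing the test itself.

```c
#define _DEFAULT_SOURCE          /* exposes MAP_ANON on glibc under strict -std= */
#include <assert.h>
#include <signal.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a write to a page protected with
 * mprotect(PROT_READ) kills the child with SIGSEGV or SIGBUS, 0 if the
 * write went through, and -1 if the setup itself failed. */
int write_faults_when_readonly(void)
{
    long pagesize = sysconf(_SC_PAGE_SIZE);
    volatile int *a;             /* volatile: keep the accesses from being optimised away */
    int status = 0;
    pid_t pid;

    assert(pagesize > 0);
    a = mmap(NULL, (size_t)pagesize, PROT_READ | PROT_WRITE,
             MAP_ANON | MAP_PRIVATE, -1, 0);
    if (a == MAP_FAILED)
        return -1;
    *a = 42;

    pid = fork();
    if (pid == 0) {
        /* Child: drop write permission, then attempt the write. */
        if (mprotect((void *)a, (size_t)pagesize, PROT_READ) == -1)
            _exit(2);
        *a = 24;                 /* expected to fault */
        _exit(0);
    }

    if (pid == -1 || waitpid(pid, &status, 0) == -1) {
        munmap((void *)a, (size_t)pagesize);
        return -1;
    }
    munmap((void *)a, (size_t)pagesize);

    if (WIFSIGNALED(status))
        return WTERMSIG(status) == SIGSEGV || WTERMSIG(status) == SIGBUS;
    if (WIFEXITED(status) && WEXITSTATUS(status) == 2)
        return -1;               /* mprotect itself failed in the child */
    return 0;
}
```

Both SIGSEGV and SIGBUS are accepted because the question shows Linux reporting a segmentation fault and OS X a bus error.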

Resources