Mmap a block device on Mac OS X?

I want to access an encrypted core storage volume in my program.
My plan is to mmap the decrypting block device to be able to jump around in the file system structures with ease and without having to deal with the crypto myself.
While mapping a big file works like a charm, I am getting an EINVAL error on the mmap syscall in the following code:
#include <stddef.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <sys/mman.h>
int main(int argc, char *argv[])
{
    int fd = open("/dev/disk2", O_RDONLY); // this works fine
    if (fd < 0)
    {
        perror("Could not open file");
        return -1;
    }

    int pagesize = getpagesize(); // page size is 4096 on my system

    void *address = mmap(NULL, pagesize, PROT_READ, MAP_SHARED | MAP_FILE, fd, 0); // try to map first page
    if (address == MAP_FAILED)
    {
        perror("Could not mmap"); // error complaining about an invalid argument
    }
}
The device has a size of 112 GB and I am compiling with clang mmap.c -O0 -o mmap on Mavericks 10.9.3 for x86_64. My system has 4 GB of RAM and > 10 GB of free hard disk space.
The man page mmap(2) only lists the following explanations for an EINVAL error, but these do not seem to apply:
MAP_FIXED was specified and the addr argument was not page aligned, or part of the desired address space resides out of the valid address space for a user process.
flags does not include either MAP_PRIVATE or MAP_SHARED.
The len argument was negative.
The offset argument was not page-aligned based on the page size as returned by getpagesize(3).
[...]
The flags parameter must specify either MAP_PRIVATE or MAP_SHARED.
The size parameter must not be 0.
The off parameter must be a multiple of pagesize, as returned by sysconf().
While I have not figured out all the nitty-gritty details of the implementation, the comments in this XNU kernel source file explicitly mention being able to map a block device (as long as the mapping is shared): https://www.opensource.apple.com/source/xnu/xnu-2422.1.72/bsd/kern/kern_mman.c
What am I missing?

Your problem might be the use of MAP_FILE, since this indicates a regular file rather than a block device.
If that doesn't work, try calling fstat() after you open the file and check the file's length. When I use stat -x to get information about the block devices on my system, the file sizes are reported as zero (ls -l reports the sizes as "1,"). A zero-length file might also prevent you from being able to create a mapping.
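A minimal sketch of both suggestions (untested against a real block device; it keeps the questioner's /dev/disk2 path, drops MAP_FILE, and adds an fstat() check):

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/dev/disk2", O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    // For block devices st_size is typically reported as 0, which is
    // worth knowing before blaming the mmap flags.
    struct stat st;
    if (fstat(fd, &st) == -1)
        perror("fstat");
    else
        printf("st_size = %lld\n", (long long)st.st_size);

    // MAP_FILE dropped: MAP_SHARED alone.
    void *address = mmap(NULL, (size_t)getpagesize(), PROT_READ, MAP_SHARED, fd, 0);
    if (address == MAP_FAILED)
        perror("mmap");

    return 0;
}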

Related

Cannot create anonymous mapping with MAP_32BIT on MacOS

I'm on a 64-bit system, but want to use mmap to allocate pages within the first 2GB of memory. On Linux, I can do this with the MAP_32BIT flag:
#include <sys/mman.h>
#include <stdio.h>
int main() {
    void *addr = mmap(
        NULL,                                    // address hint
        4096,                                    // size
        PROT_READ | PROT_WRITE,                  // permissions
        MAP_32BIT | MAP_PRIVATE | MAP_ANONYMOUS, // flags
        -1,                                      // file descriptor
        0                                        // offset
    );
    if (addr == MAP_FAILED)
        perror("mmap");
    else
        printf("%p", addr);
}
Godbolt link demonstrating that this works on Linux. As of version 10.15, MacOS also allegedly supports the MAP_32BIT flag. However, when I compile and run the program on my system (11.3), it fails with ENOMEM. The mapping does work when MAP_32BIT is removed.
I have a few potential explanations for why this doesn't work, but none of them are very compelling:
The permissions are wrong somehow (although removing either PROT_READ or PROT_WRITE didn't solve it).
I need to specify an address hint for this to work, for some reason.
MacOS (or my version of it) simply doesn't support MAP_32BIT for anonymous mappings.
The problem is the "zero page": on some 32-bit Unixes, the lowest page of memory is commonly configured to be inaccessible so that accesses to NULL can be detected and signalled as an error. On 64-bit systems, MacOS extends this to the entire first 4 GiB of memory by default, and mmap therefore refuses to place mappings in this region, since it is already reserved for the zero page.
This can be changed simply by shrinking the zero page with a linker option:
$ cc -Wl,-pagezero_size,0x1000 test.c
$ ./a.out
0xb0e5000
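As a follow-up sanity check (my addition, not part of the original answer, and assuming the Linux-style guarantee that MAP_32BIT keeps the mapping in the low 2 GiB), you can assert on the returned address:

#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>

int main(void) {
    void *addr = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                      MAP_32BIT | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (addr == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    // The whole 4096-byte mapping should sit below the 2 GiB mark.
    assert((uintptr_t)addr + 4096 <= (1ULL << 31));
    printf("%p\n", addr);
    return 0;
}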

allocating address zero on Linux with mmap fails

I am writing a static program loader for Linux: I read the ELF program headers and map the segments into memory.
I have come across an executable which assumes that the virtual address of its first segment is 0. My memory mapping fails; I get an error allocating the virtual page at address 0.
I wonder if it is possible at all to allocate memory at address 0 in user space.
See this example code:
/*mmaptests.c*/
#include <sys/mman.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <errno.h>
int main()
{
    void *p = mmap(0, sysconf(_SC_PAGE_SIZE), PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_FIXED | MAP_ANONYMOUS, -1, 0);
    printf("mmap result %p (errno %s)\n", p, strerror(errno));
    return 0;
}
I compile it with:
gcc mmaptests.c
This is what it returns:
$./a.out
mmap result 0xffffffffffffffff (errno Operation not permitted)
I will be happy for any insights.
Thanks
B
Linux will only let you mmap the 0-th page if you have privileges.
gcc mmaptests.c && sudo ./a.out
should get you:
mmap result (nil) (errno Success)
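For background (my addition, not part of the original answer): the limit comes from the vm.mmap_min_addr sysctl, which sets the lowest address an unprivileged process may map with MAP_FIXED; a quick way to inspect it is to read the proc file:

#include <stdio.h>

int main(void)
{
    // On most distributions this prints 65536 (sometimes 4096); mapping
    // below it typically requires root (the CAP_SYS_RAWIO capability).
    FILE *f = fopen("/proc/sys/vm/mmap_min_addr", "r");
    if (!f) {
        perror("fopen");
        return 1;
    }
    unsigned long min_addr;
    if (fscanf(f, "%lu", &min_addr) == 1)
        printf("vm.mmap_min_addr = %lu\n", min_addr);
    fclose(f);
    return 0;
}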

Shared memory ignores read-only flag in Linux C

I'm using shared memory with shmget and shmat for educational purposes.
I'm trying to make a memory chunk that is mutable only by its creator, while all other processes can only read it.
But the reader processes can somehow write without any error.
This is my code for the creator of the shared memory:
#include <sys/shm.h>
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <errno.h>
#include <string.h>
int main(){
    int shmid = shmget((key_t)56666, 1, IPC_CREAT | O_RDONLY);
    if (shmid == -1) {
        perror("Err0:");
        exit(EXIT_FAILURE);
    }
    void *shmaddr = shmat(shmid, (void *)0, 0);
    if (shmaddr == (void *)-1) {
        perror("Err:");
        exit(EXIT_FAILURE);
    }
    *(char *)shmaddr = 'a';
    putchar(*(char *)shmaddr);
    while(1);
    return 0;
}
And this is my code for the reader:
#include <sys/shm.h>
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <errno.h>
#include <string.h>
int main(){
    int shmid = shmget((key_t)56666, 4, O_RDONLY);
    if (shmid == -1) {
        perror("Err0:");
        exit(EXIT_FAILURE);
    }
    void *shmaddr = shmat(shmid, (void *)0, 0);
    if (shmaddr == (void *)-1) {
        perror("Err:");
        exit(EXIT_FAILURE);
    }
    *(char *)shmaddr = 'b';
    putchar(*(char *)shmaddr);
    return 0;
}
As you can see, the reader can edit the memory, but no error occurs, even though I open the memory as read-only in the reader and created it with a read-only flag in the creator of the shared memory.
I have not seen either O_RDONLY or SHM_RDONLY documented as flags for the shmat(2) system call in the Linux or FreeBSD manual pages, so the problem is probably a misuse or a misunderstanding of how it works. More on this at the end: after trying it out, I see that SHM_RDONLY is the flag you should use to control a read-only attachment, not O_RDONLY (which is of no use here).
To implement what you want, you probably have to specify permission bits in the shmget(2) creation call to disable access by other users' processes. With permissions it does work; otherwise you'd have serious security problems in systems that use shared memory (e.g. the postgresql database uses SysV IPC shared memory segments).
To my knowledge, the best way to implement this is to run the writer of the shared memory segment as one user and the processes allowed to read it as different users, adjusting the permission bits so that they can read but not write the segment. Something like having all processes in the same group id, with the writer running as the user who creates the segment, the group having only read access, and other user ids having no permissions at all, would be enough for any application:
shmget((key_t)56666, 1, IPC_CREAT | 0640);
and running the other processes as a different user in the same group id.
EDIT
After testing your code on a FreeBSD machine (sorry, no Linux available, but the IPC calls are SysV AT&T Unix calls, so everything should be compatible), the creator process stops with an error in the shmat(2) call and the following message:
$ shm_creator
Err:: Permission denied
most probably because you didn't give any permissions at shared memory creation, not even to the owner (and I'd like to imagine you are not developing as root on your machine, are you? ;))
ipcs(1) shows:
usr1#host ~$ ipcs -m
Shared Memory:
T ID KEY MODE OWNER GROUP
m 65537 56666 ----------- usr1 usr1
and you can see that no permission bits are active for the shared memory segment, although it has been created. I have modified your program so that, instead of busy-waiting in a while(1); loop, it does a non-CPU-consuming wait with sleep(3600);, which puts it to sleep for a whole hour.
shm_creator.c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>
int main(){
    int shmid = shmget((key_t)56666, 1, IPC_CREAT | 0640);
    if (shmid == -1) {
        perror("Err0:");
        exit(EXIT_FAILURE);
    }
    void *shmaddr = shmat(shmid, (void *)0, 0);
    if (shmaddr == (void *)-1) {
        perror("Err:");
        exit(EXIT_FAILURE);
    }
    *(char *)shmaddr = 'a';
    putchar(*(char *)shmaddr);
    puts("");
    sleep(3600);
    return 0;
}
which I run as user usr1:
usr1#host:~/shm$ shm_creator &
[2] 76950
a
Then I switch to another user, usr2, and run:
$ su usr2
Password:
[usr2#host /home/usr1/shm]$ shm_client &
[1] 76963
[usr2#host /home/usr1/shm]$ Err:: Permission denied
and, as you labelled it, the error happens in the shmat(2) system call. But if I run it as usr1 I get:
usr1#host:~/shm$ shm_client
b
If I use SHM_RDONLY as the flag in the shm_client.c source file, then on running it (either as the same or a different user) I get the following:
usr1#host:~/shm$ shm_client
Segmentation fault (generated `core')
which is the expected behaviour, as you tried to write to unwritable memory (it was attached as read-only memory).
EDIT 2
After browsing the Linux manual pages online, there is a reference to SHM_RDONLY that allows attaching a shared memory segment read-only. No support is offered for write-only shared memory segments. Although it is not documented on FreeBSD, this option is also available there (the constant is included in the proper include files), and some other imprecisions are found in the FreeBSD manual (such as the use of S_IROWN, S_IWOWN, S_IRGRP, S_IWGRP, S_IROTH and S_IWOTH flags to control the permission bits, and the lack of #include <sys/stat.h> in the SYNOPSIS of the manual page).
CONCLUSION
If SHM_RDONLY is available on your system, then you can use it as a non-preemptive way to disallow write access to your shared memory, but if you want a kernel-enforced way, you have to switch to the user permission bits approach.
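For completeness, here is a hedged sketch (not tested on your system) of what the reader looks like when it uses SHM_RDONLY in shmat(2) instead of passing O_RDONLY to shmget(2):

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <stdio.h>
#include <stdlib.h>

int main(){
    // No IPC_CREAT and no permission bits: the segment must already
    // exist, and the creator's permissions decide whether we may attach.
    int shmid = shmget((key_t)56666, 1, 0);
    if (shmid == -1) {
        perror("Err0:");
        exit(EXIT_FAILURE);
    }

    // SHM_RDONLY asks the kernel for a read-only attachment; any write
    // through this pointer should then fault (the segmentation fault
    // shown above).
    void *shmaddr = shmat(shmid, (void *)0, SHM_RDONLY);
    if (shmaddr == (void *)-1) {
        perror("Err:");
        exit(EXIT_FAILURE);
    }

    putchar(*(char *)shmaddr);
    return 0;
}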

c/linux - ftruncate and POSIX Shared Memory Segments

The end goal here is that I'd like to be able to extend the size of a shared memory segment and notify processes to remap the segment after the extension. However it seems that calling ftruncate a second time on a shared memory fd fails with EINVAL. The only other question I could find about this has no answer: ftruncate failed at the second time
The man pages for ftruncate and shm_open make no mention of disallowing the expansion of shared memory segments after creation; in fact, they seem to indicate that segments can be resized via ftruncate, but so far my testing has shown otherwise. The only solution I can think of would be to destroy the shared memory segment and recreate it at a larger size; however, this would require all processes that have mmap'd the segment to unmap it before the object can be destroyed and recreated.
Any thoughts? Thanks!
EDIT: As requested, a simple example
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <sys/types.h>
int main(int argc, char *argv[]){
    const char *name = "testfile";
    size_t sz = 4096; // page size on my sys
    int fd;

    if ((fd = shm_open(name, O_CREAT | O_RDWR, 0666)) == -1){
        perror("shm_open");
        exit(1);
    }

    ftruncate(fd, sz);
    perror("First truncate");

    ftruncate(fd, 2*sz);
    perror("second truncate");

    shm_unlink(name);
    return 0;
}
Output:
First truncate: Undefined error: 0
second truncate: Invalid argument
EDIT - Answer: It appears that this is an issue with the OS X implementation of the POSIX standard; the above snippet works on a 3.13.0-53-generic GNU/Linux kernel and, I'd guess, likely others.
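On OS X a common workaround (a sketch of mine, not from the original post) is to size the object only when it is freshly created, which at least avoids the spurious EINVAL when a process opens an already-sized segment:

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void){
    const char *name = "testfile";
    size_t sz = 4096;

    int fd = shm_open(name, O_CREAT | O_RDWR, 0666);
    if (fd == -1){
        perror("shm_open");
        exit(1);
    }

    struct stat st;
    if (fstat(fd, &st) == -1){
        perror("fstat");
        exit(1);
    }

    // Only call ftruncate on a brand-new (zero-sized) object; on OS X a
    // second ftruncate on the same object fails with EINVAL.
    if (st.st_size == 0 && ftruncate(fd, sz) == -1){
        perror("ftruncate");
        exit(1);
    }

    shm_unlink(name);
    return 0;
}

This does not, of course, give you the resizable segment the question asks for; for that, the destroy-and-recreate approach mentioned above still seems to be needed on OS X.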
With respect to your end goal, here's an open source library I wrote that seems to be a match: rszshm - resizable pointer-safe shared memory.

Behaviour of PROT_READ and PROT_WRITE with mprotect

I've been trying to use mprotect to protect a page against reading first, and then against writing.
Here is my code:
#include <sys/types.h>
#include <sys/mman.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
int main(void)
{
    int pagesize = sysconf(_SC_PAGE_SIZE);
    int *a;

    if (posix_memalign((void **)&a, pagesize, sizeof(int)) != 0)
        perror("memalign");

    *a = 42;

    if (mprotect(a, pagesize, PROT_WRITE) == -1) /* Resp. PROT_READ */
        perror("mprotect");

    printf("a = %d\n", *a);
    *a = 24;
    printf("a = %d\n", *a);

    free(a);
    return 0;
}
Under Linux here are the results:
Here is the output for PROT_WRITE:
$ ./main
a = 42
a = 24
and for PROT_READ
$ ./main
a = 42
Segmentation fault
Under Mac OS X 10.7:
Here is the output for PROT_WRITE:
$ ./main
a = 42
a = 24
and for PROT_READ
$ ./main
[1] 2878 bus error ./main
So far, I understand that OSX / Linux behavior might be different, but I don't understand why PROT_WRITE does not crash the program when reading the value with printf.
Can someone explain this part?
There are two things that you are observing:
mprotect was not designed to be used with heap pages. Linux and OS X have slightly different handling of the heap (remember that OS X uses the Mach VM). OS X does not like its heap pages to be tampered with.
You can get identical behaviour on both OSes if you allocate your page via mmap:
a = mmap(NULL, pagesize, PROT_READ | PROT_WRITE, MAP_ANON | MAP_PRIVATE, -1, 0);
if (a == MAP_FAILED)
perror("mmap");
This is a restriction of your MMU (x86 in my case). The x86 MMU does not support pages that are writable but not readable. Thus setting
mprotect(a, pagesize, PROT_WRITE)
does nothing, while
mprotect(a, pagesize, PROT_READ)
removes write privileges, and you get a SIGSEGV as expected.
Also, although it doesn't seem to be an issue here, you should either compile your code with -O0 or declare a as volatile int * to avoid any compiler optimisations.
Most operating systems and/or CPU architectures automatically make something readable when it is writeable, so PROT_WRITE most often implies PROT_READ as well; it's simply not possible to make something writeable without also making it readable. One can only speculate on the reasons: either it's not worth the effort to add a separate readability bit in the MMU and caches, or, as on some earlier architectures, you actually need to read through the MMU into a cache before you can write, so making something unreadable automatically makes it unwriteable.
Also, it's likely that printf tries to allocate from memory whose protection you changed with mprotect. When changing protection you want to own the whole page you got from libc; otherwise you'll be changing the protection of a page you don't fully own, and libc doesn't expect it to be protected. In your Mac OS test with PROT_READ this is what happens: printf allocates some internal structures, tries to access them, and crashes because they are read-only.
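Putting the two answers together, here is a minimal sketch (my combination, not from either answer) that maps its own page instead of touching libc's heap, so the only fault you should see is the intentional write after mprotect with PROT_READ:

#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    size_t pagesize = (size_t)sysconf(_SC_PAGE_SIZE);

    // Own the whole page, so libc's allocator is never affected.
    volatile int *a = mmap(NULL, pagesize, PROT_READ | PROT_WRITE,
                           MAP_ANON | MAP_PRIVATE, -1, 0);
    if (a == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    *a = 42;

    // Swap PROT_READ for PROT_WRITE to compare; on x86, PROT_WRITE alone
    // still leaves the page readable, so only the PROT_READ case faults.
    if (mprotect((void *)a, pagesize, PROT_READ) == -1)
        perror("mprotect");

    printf("a = %d\n", *a);
    *a = 24; // expected to fault here under PROT_READ
    printf("a = %d\n", *a);

    munmap((void *)a, pagesize);
    return 0;
}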
