When two processes share a segment of memory opened with shm_open and then it gets mmap-ed, does doing an mprotect on a portion of the shared memory in one process affects the permissions seen by the other process on this same portion? In other words, if one process makes part of the shared memory segment read-only, does it become read-only for the other process too?
I always like to address those questions in two parts.
Part 1 - Let's test it
Let's consider an example that is relatively similar to the one at shm_open(3).
Shared header - shared.h
#include <sys/mman.h>
#include <fcntl.h>
#include <semaphore.h>
#include <sys/stat.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#define errExit(msg) do { perror(msg); exit(EXIT_FAILURE); } while (0)
struct shmbuf {
char buf[4096];
sem_t sem;
};
Creator process - creator.c (Compile using gcc creator.c -o creator -lrt -lpthread)
#include <ctype.h>
#include <string.h>
#include "shared.h"
int
main(int argc, char *argv[])
{
if (argc != 2) {
fprintf(stderr, "Usage: %s /shm-path\n", argv[0]);
exit(EXIT_FAILURE);
}
char *shmpath = argv[1];
int fd = shm_open(shmpath, O_CREAT | O_EXCL | O_RDWR,
S_IRUSR | S_IWUSR);
if (fd == -1)
errExit("shm_open");
struct shmbuf *shm;
if (ftruncate(fd, sizeof(*shm)) == -1)
errExit("ftruncate");
shm = mmap(NULL, sizeof(*shm), PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
if (shm == MAP_FAILED)
errExit("mmap");
if (sem_init(&shm->sem, 1, 0) == -1)
errExit("sem_init");
if (mprotect(&shm->buf, sizeof(shm->buf), PROT_READ) == -1)
errExit("mprotect");
if (sem_wait(&shm->sem) == -1)
errExit("sem_wait");
printf("got: %s\n", shm->buf);
shm_unlink(shmpath);
exit(EXIT_SUCCESS);
}
Writer process - writer.c (Compile using gcc writer.c -o writer -lrt -lpthread)
#include <string.h>
#include "shared.h"
int
main(int argc, char *argv[])
{
if (argc != 3) {
fprintf(stderr, "Usage: %s /shm-path string\n", argv[0]);
exit(EXIT_FAILURE);
}
char *shmpath = argv[1];
char *string = argv[2];
int fd = shm_open(shmpath, O_RDWR, 0);
if (fd == -1)
errExit("shm_open");
struct shmbuf *shm = mmap(NULL, sizeof(*shm), PROT_READ | PROT_WRITE,
MAP_SHARED, fd, 0);
if (shm == MAP_FAILED)
errExit("mmap");
strcpy(&shm->buf[0], string);
if (sem_post(&shm->sem) == -1)
errExit("sem_post");
exit(EXIT_SUCCESS);
}
What's supposed to happen?
The creator process, creates a new shared memory object, "sets" its size, maps it into memory (as shm), uses mprotect to allow only writes to the buffer (shm->buf) and waits for a semaphore to know when the writer (which we will discuss in a moment finishes its thing).
The writer process starts, opens the same shared memory object, writes into it whatever we tell it to and signals the semaphore.
Question is, will the writer be able to write to the shared memory object even though the creator changed the protection to READ-ONLY?
Let's find out. We can run it using:
# ./creator.c /shmulik &
# ./writer.c /shmulik hi!
got: hi!
#
[1]+ Done ./creator /shmulik
As you can see, the writer was able to write to the shared memory, even though the creator set it's protection to READ-ONLY.
Maybe the creator does something wrong? Let's try to add the following line to creator.c:
if (mprotect(&shm->buf, sizeof(shm->buf), PROT_READ) == -1)
errExit("mprotect");
memset(&shm->buf, 0, sizeof(shm->buf)); // <-- This is the new line
if (sem_wait(&shm->sem) == -1)
errExit("sem_wait");
Let's recompile & run the creator again:
# gcc creator.c -o creator -lrt -lpthread
# ./creator /shmulik
Segmentation fault
As you can see, the mprotect worked as expected.
How about we let the writer map the shared memory, then we change the protection? Well, it ain't going to change anything. mprotect ONLY affects the memory protection of the process calling it (and it's descendants).
Part 2 - Let's understand it
First, you have to understand that shm_open is a glibc method, it's not a systemcall.
You can get the glibc source code from their website and just look for shm_open to see that yourself.
The underlying implementation of shm_open is a regular call for open, just like the man page suggests.
As we already saw, most of the magic happens in mmap. When calling mmap, we have to use MAP_SHARED (rather than MAP_PRIVATE), otherwise every process is going to get a private memory segment to begin with, and obviously one ain't gonna affect the other.
When we call mmap, the hops are roughly:
ksys_mmap_pgoff.
vm_mmap_pgoff.
do_mmap
mmap_region
At that last point, you could see that we take the process' memory management context mm and allocate a new virtual memory area vma:
struct mm_struct *mm = current->mm;
...
vma = vm_area_alloc(mm);
...
vma->vm_page_prot = vm_get_page_prot(vm_flags);
This memory area is not shared with other processes.
Since mprotect changes only the vm_page_prot on the per-process vma, it doesn't affect other processes that map the same memory space.
The POSIX specification for mprotect() suggests that changes in the protection of shared memory should affect all processes using that shared memory.
Two of the error conditions detailed are:
[EAGAIN]
The prot argument specifies PROT_WRITE over a MAP_PRIVATE mapping and there are insufficient memory resources to reserve for locking the private page.
[ENOMEM]
The prot argument specifies PROT_WRITE on a MAP_PRIVATE mapping, and it would require more space than the system is able to supply for locking the private pages, if required.
These strongly suggest that memory that is mapped with MAP_SHARED should not fail because of a lack of memory for making copies.
See also the POSIX specification for mmap().
Related
Inspired by this example for Windows. In short, they create a file handle (with CreateFileMapping) then create 2 different pointers to the same memory (MapViewOfFileEx or MapViewOfFile3)
So I tried to do the same thing with shm_open, ftruncate and mmap. I used mmap a few times in the past for memory and files but I never mixed it with shm_open or used shm_open.
My code fails on the second mmap with a segfault. I tried doing a syscall directly on both mmaps and it still segfaults :( How do I do this properly? The idea is I can do memcpy(p+len-10, src, 20) and have the first 10bytes of src be at the end of the memory and last 10 written to the start (hence circular)
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <assert.h>
int main()
{
write(2, "Start\n", 6); //prints this
int len = 1024*1024*2;
int fd = shm_open("example", O_RDWR | O_CREAT, 0777);
assert(fd > 0); //ok
int r1 = ftruncate(fd, len);
assert(r1 == 0); //ok
char*p = (char*)mmap(0, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
assert((long long)p>0); //ok
//Segfaults on next line
char*p2 = (char*)mmap(p+len, len, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_FIXED, fd, 0); //segfaults
write(2, "Finish\n", 7); //doesn't print this
return 0;
}
Linux usually selects address space for mappings starting from a certain point and goes lower with each reservation. So your 2nd mmap call replaces one of previous file mappings (likely libc.so), which leads to SIGSEGV with SEGV_ACCERR - invalid access permissions. You are overwriting executable section of libc.so (that is being executed right now) with non-executable data.
Use strace to check what is going on inside:
$ strace ./a.out
...
openat(AT_FDCWD, "/dev/shm/example", O_RDWR|O_CREAT|O_NOFOLLOW|O_CLOEXEC, 0777) = 3
ftruncate(3, 2097152) = 0
mmap(NULL, 2097152, PROT_READ|PROT_WRITE, MAP_PRIVATE, 3, 0) = 0x7f134c1bf000
mmap(0x7f134c3bf000, 2097152, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0) = 0x7f134c3bf000
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_ACCERR, si_addr=0x7f134c4ccc37} ---
+++ killed by SIGSEGV +++
Compare addresses you are passing around with /proc/$pid/maps file and you will see what you are overwriting.
Your mistake was to assume MAP_FIXED can be used without reserving memory beforehand. To do this properly you need to:
Reserve memory by calling mmap with len * 2 size, PROT_NONE and MAP_ANONYMOUS | MAP_PRIVATE (and without file)
Use mmap with MAP_FIXED to overwrite portions of that mapping with the content you need
Additionally, you should prefer using memfd_create instead of shm_open on Linux to avoid shared memory files from staying around. Unlinking them with shm_unlink doesn't help if your program crashes. This also gives you a file that is private to your program instance.
You do not need to call mmap again to gen new pointer. (You even must not to do it.) Just increment it.
The pointer p2 will not point to the address just after the memory block allocated.
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <assert.h>
int main()
{
write(2, "Start\n", 6); //prints this
int len = 1024*1024*2;
int fd = shm_open("example", O_RDWR | O_CREAT, 0777);
assert(fd > 0); //ok
int r1 = ftruncate(fd, len);
assert(r1 == 0); //ok
char*p = (char*)mmap(0, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
assert((long long)p>0); //ok
char*p2 = p+len;
write(2, "Finish\n", 7); //doesn't print this
return 0;
}
I want to write a semaphore to shared memory. My first idea was to pass the pointer returned by mmap to sem_init():
#include <stdio.h>
#include <semaphore.h>
#include <string.h>
#include <errno.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
int main(void)
{
sem_t *sem_ptr;
int shm_fd = shm_open("Shm", O_CREAT | O_RDWR, DEFFILEMODE);
fprintf(stderr, "%s\n", strerror(errno));
sem_ptr = mmap(NULL, sizeof(sem_t), PROT_WRITE, MAP_SHARED, shm_fd, 0);
fprintf(stderr, "%p\n", strerror(errno));
sem_init(sem_ptr, 1, 1);
fprintf(stderr, "%s\n", strerror(errno));
sem_destroy(sem_ptr);
return 0;
}
But it leads to this error(when sem_init() is called): Process finished with exit code 135 (interrupted by signal 7: SIGEMT)
Then I tried to initialize the semaphore with a sem_t variable and write it to the shared memory:
int main(void)
{
sem_t *sem_ptr;
sem_t s;
int shm_fd = shm_open("Shm", O_CREAT | O_RDWR, DEFFILEMODE);
fprintf(stderr, "%s\n", strerror(errno));
sem_ptr = mmap(NULL, sizeof(sem_t), PROT_WRITE, MAP_SHARED, shm_fd, 0);
fprintf(stderr, "%p\n", strerror(errno));
sem_init(&s, 1, 1);
fprintf(stderr, "%s\n", strerror(errno));
*sem_ptr = s;
sem_destroy(&s);
return 0;
}
Now the line *sem_ptr = s; leads to the same error as in the first programm
Can anyone help me please?
Your first strategy for creating the semaphore is correct. You can't necessarily copy a sem_t object to a different memory address and have it still work.
I'm not sure why you're getting SIGEMT, which I thought was never generated by modern Unixes. But when I run either of your programs on my computer, they crash with SIGBUS instead, and that pointed me at a bug that I know how to fix. When you mmap a file (a shared memory object is considered to be a file), and the size you ask for in the mmap call is bigger than the file, and then you access the memory area beyond the end of the file (by far enough that the CPU can trap this), you get a SIGBUS. And let me quote you a key piece of the shm_open manpage:
O_CREAT: Create the shared memory object if it does not exist. [...]
A new shared memory object initially has zero length—the size of
the object can be set using ftruncate(2).
What you need to do is call ftruncate on shm_fd to make the shared memory object big enough to hold the semaphore.
Some less-important bugs you should fix at the same time:
All of the system calls that work with memory maps may malfunction if you give them offsets or sizes that aren't a multiple of the system page size. (They're supposed to round up for you, but historically there have been a lot of bugs in this area.) You get the system page size by calling sysconf(_SC_PAGESIZE), and you round up with a little helper function shown below.
Most C library functions are allowed to set errno to a nonzero value even if they succeed. You should check whether each function actually failed before printing strerror(errno). (In the code below I used perror instead for brevity.)
The name of a shared memory object is required to start with a slash, followed by up to NAME_MAX characters that are not slashes.
sem_init may read from as well as writing to the memory pointed to by sem_ptr, and subsequent use of sem_wait and sem_post definitely will, so you should use PROT_READ|PROT_WRITE in the mmap call.
Putting it all together, this is a revised version of your first program which works on my computer. Because of the SIGEMT thing I can't promise it will work for you.
#include <fcntl.h>
#include <semaphore.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>
#ifndef DEFFILEMODE
# define DEFFILEMODE 0666
#endif
static long round_up(long n, long mult)
{
return ((n + mult - 1) / mult) * mult;
}
int main(void)
{
long pagesize;
long semsize;
sem_t *sem_ptr;
int shm_fd;
pagesize = sysconf(_SC_PAGESIZE);
if (pagesize == -1) {
perror("sysconf(_SC_PAGESIZE)");
return 1;
}
shm_fd = shm_open("/Shm", O_CREAT|O_RDWR, DEFFILEMODE);
if (shm_fd == -1) {
perror("shm_open");
return 1;
}
semsize = round_up(sizeof(sem_t), pagesize);
if (ftruncate(shm_fd, semsize) == -1) {
perror("ftruncate");
return 1;
}
sem_ptr = mmap(0, semsize, PROT_READ|PROT_WRITE, MAP_SHARED, shm_fd, 0);
if (sem_ptr == MAP_FAILED) {
perror("mmap");
return 1;
}
if (sem_init(sem_ptr, 1, 1)) {
perror("sem_init");
return 1;
}
sem_destroy(sem_ptr);
shm_unlink("/Shm");
return 0;
}
An additional complication you should be aware of is that calling sem_init on a semaphore that has already been initialized causes undefined behavior. This means you have to use some other kind of locking around the creation of the shared memory segment and the semaphore within. Off the top of my head I don't know how to do this in a bulletproof way.
Let's say I have the standard "Hello, World! \n" saved to a text file called hello.txt. If I want to change the 'H' to a 'R' or something, can I achieve this with mmap()?
mmap does not exist in the standard C99 (or C11) specification. It is defined in POSIX.
So assuming you have a POSIX system (e.g. Linux), you could first open(2) the file for read & write:
int myfd = open("hello.txt", O_RDWR);
if (myfd<0) { perror("hello.txt open"); exit(EXIT_FAILURE); };
Then you get the size (and other meta-data) of the file with fstat(2):
struct stat mystat = {};
if (fstat(myfd,&mystat)) { perror("fstat"); exit(EXIT_FAILURE); };
Now the size of the file is in mystat.st_size.
off_t myfsz = mystat.st_size;
Now we can call mmap(2) and we need to share the mapping (to be able to write inside the file thru the virtual address space)
void*ad = mmap(NULL, myfsz, PROT_READ|PROT_WRITE, MAP_SHARED,
myfd, 0);
if (ad == MMAP_FAILED) { perror("mmap"); exit(EXIT_FAILURE); };
Then we can overwrite the first byte (and we check that indeed the first byte in that file is H since you promised so):
assert (*(char*ad) == 'H');
((char*)ad) = 'R';
We might call msync(2) to ensure the file is updated right now on the disk. If we don't, it could be updated later.
Notably for very large mappings (notably those much larger than available RAM), we can assist the kernel (and its page cache) with hints given thru madvise(2) or posix_madvise(3)...
Notice that a mapping remains in effect even after a close(2). Use munmap & mprotect or mmap with MAP_FIXED on the same address range to change them.
On Linux, you could use proc(5) to query the address space. So your program could read (e.g. after fopen, using fgets in a loop) the pseudo /proc/self/maps file (or /proc/1234/maps for process of pid 1234).
BTW, mmap is used by dlopen(3); it can be called a lot of times, my manydl.c program demonstrates that on Linux you could have many hundreds of thousands of dlopen-ed shared files (so many hundreds of thousands of memory mappings).
Here's a working example.
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/mman.h>
int main(){
int myFile = open("hello.txt", O_RDWR);
if(myFile < 0){
printf("open error\n");
}
struct stat myStat = {};
if (fstat(myFile, &myStat)){
printf("fstat error\n");
}
off_t size = myStat.st_size;
char *addr;
addr = mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_SHARED, myFile, 0);
if (addr == MAP_FAILED){
printf("mmap error\n");
}
if (addr[0] != 'H'){
printf("Error: first char in file not H");
}
addr[0] = 'J';
return 0;
}
My program passes data pointers to third-party plugins with the intention that the data should be read-only, so it would be nice to prevent the plugins from writing to the data objects. Ideally there would be a segfault if a plugin attempts a write. I've heard there is some way to double-map a memory region, such that a second virtual address range points to the same physical memory pages. The second mapping would not have write permission, and the exported pointers would use this address range instead of the original (writable) one. I would prefer not to change the original memory allocations, whether they happen to use malloc or mmap or whatever. Can someone explain how to do this?
It is possible to get a dual mapping, but it requires some work.
The only way I know how to create such a dual mapping is to use the mmap function call. For mmap you need some kind of file-descriptor. Fortunately Linux allows you to get a shared memory object, so no real file on a storage medium is required.
Here is a complete example that shows how to create the shared memory object, creates a read/write and read-only pointer from it and then does some basic tests:
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
int main()
{
// Lets do this demonstration with one megabyte of memory:
const int len = 1024*1024;
// create shared memory object:
int fd = shm_open("/myregion", O_CREAT | O_RDWR, S_IRUSR | S_IWUSR);
printf ("file descriptor is %d\n", fd);
// set the size of the shared memory object:
if (ftruncate(fd, len) == -1)
{
printf ("setting size failed\n");
return 0;
}
// Now get two pointers. One with read-write and one with read-only.
// These two pointers point to the same physical memory but will
// have different virtual addresses:
char * rw_data = mmap(0, len, PROT_READ|PROT_WRITE, MAP_SHARED, fd,0);
char * ro_data = mmap(0, len, PROT_READ , MAP_SHARED, fd,0);
printf ("rw_data is mapped to address %p\n", rw_data);
printf ("ro_data is mapped to address %p\n", ro_data);
// ===================
// Simple test-bench:
// ===================
// try writing:
strcpy (rw_data, "hello world!");
if (strcmp (rw_data, "hello world!") == 0)
{
printf ("writing to rw_data test passed\n");
} else {
printf ("writing to rw_data test failed\n");
}
// try reading from ro_data
if (strcmp (ro_data, "hello world!") == 0)
{
printf ("reading from ro_data test passed\n");
} else {
printf ("reading from ro_data test failed\n");
}
printf ("now trying to write to ro_data. This should cause a segmentation fault\n");
// trigger the segfault
ro_data[0] = 1;
// if the process is still alive something didn't worked.
printf ("writing to ro_data test failed\n");
return 0;
}
Compile with: gcc test.c -std=c99 -lrt
For some reason I get a warning that ftruncate is not declared. No idea why. The code works well though. Example output:
file descriptor is 3
rw_data is mapped to address 0x7f1778d60000
ro_data is mapped to address 0x7f1778385000
writing to rw_data test passed
reading from ro_data test passed
now trying to write to ro_data. This should cause a segmentation fault
Segmentation fault
I've left the memory deallocation as an exercise for the reader :-)
I've been working with large sparse files on openSUSE 11.2 x86_64. When I try to mmap() a 1TB sparse file, it fails with ENOMEM. I would have thought that the 64 bit address space would be adequate to map in a terabyte, but it seems not. Experimenting further, a 1GB file works fine, but a 2GB file (and anything bigger) fails. I'm guessing there might be a setting somewhere to tweak, but an extensive search turns up nothing.
Here's some sample code that shows the problem - any clues?
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>
int main(int argc, char *argv[]) {
char * filename = argv[1];
int fd;
off_t size = 1UL << 40; // 30 == 1GB, 40 == 1TB
fd = open(filename, O_RDWR | O_CREAT | O_TRUNC, 0666);
ftruncate(fd, size);
printf("Created %ld byte sparse file\n", size);
char * buffer = (char *)mmap(NULL, (size_t)size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
if ( buffer == MAP_FAILED ) {
perror("mmap");
exit(1);
}
printf("Done mmap - returned 0x0%lx\n", (unsigned long)buffer);
strcpy( buffer, "cafebabe" );
printf("Wrote to start\n");
strcpy( buffer + (size - 9), "deadbeef" );
printf("Wrote to end\n");
if ( munmap(buffer, (size_t)size) < 0 ) {
perror("munmap");
exit(1);
}
close(fd);
return 0;
}
The problem was that the per-process virtual memory limit was set to only 1.7GB. ulimit -v 1610612736 set it to 1.5TB and my mmap() call succeeded. Thanks, bmargulies, for the hint to try ulimit -a!
Is there some sort of per-user quota, limiting the amount of memory available to a user process?
My guess is the the kernel is having difficulty allocating the memory that it needs to keep up with this memory mapping. I don't know how swapped out pages are kept up with in the Linux kernel (and I assume that most of the file would be in the swapped out state most of the time), but it may end up needing an entry for each page of memory that the file takes up in a table. Since this file might be mmapped by more than one process the kernel has to keep up with the mapping from the process's point of view, which would map to another point of view, which would map to secondary storage (and include fields for device and location).
This would fit into your addressable space, but might not fit (at least contiguously) within physical memory.
If anyone knows more about how Linux does this I'd be interested to hear about it.