Can I do a copy-on-write memcpy in Linux? - c

I have some code where I frequently copy a large block of memory, often after making only very small changes to it.
I have implemented a system which tracks the changes, but I thought it might be nice, if possible to tell the OS to do a 'copy-on-write' of the memory, and let it deal with only making a copy of those parts which change. However while Linux does copy-on-write, for example when fork()ing, I can't find a way of controlling it and doing it myself.

Your best chance is probably to mmap() the original data to file, and then mmap() the same file again using MAP_PRIVATE.

Depending on what exactly it is that you are copying, a persistent data structure might be a solution for your problem.

Its easier to implement copy-on-write in a object oriented language, like c++. For example, most of the container classes in Qt are copy-on-write.
But if course you can do that in C too, it's just some more work. When you want to assign your data to a new data block, you don't do a copy, instead you just copy a pointer in a wrapper strcut around your data. You need to keep track in your data blocks of the status of the data. If you now change something in your new data block, you make a "real" copy and change the status.
You can't of course no longer use the simple operators like "=" for assignment, instead need to have functions (In C++ you would just do operator overloading).
A more robust implementation should use reference counters instead of a simple flag, I leave it up to you.
A quick and dirty example:
If you have a
struct big {
//lots of data
int data[BIG_NUMBER];
}
you have to implement assign functions and getters/setters yourself.
// assume you want to implent cow for a struct big of some kind
// now instead of
struct big a, b;
a = b;
a.data[12345] = 6789;
// you need to use
struct cow_big a,b;
assign(&a, b); //only pointers get copied
set_some_data(a, 12345, 6789); // now the stuff gets really copied
//the basic implementation could look like
struct cow_big {
struct big *data;
int needs_copy;
}
// shallow copy, only sets a pointer.
void assign(struct cow_big* dst, struct cow_big src) {
dst->data = src.data;
dst->needs_copy = true;
}
// change some data in struct big. if it hasn't made a deep copy yet, do it here.
void set_some_data(struct cow_big* dst, int index, int data } {
if (dst->needs_copy) {
struct big* src = dst->data;
dst->data = malloc(sizeof(big));
*(dst->data) = src->data; // now here is the deep copy
dst->needs_copy = false;
}
dst->data[index] = data;
}
You need to write constructors and destructors as well. I really recommend c++ for this.

The copy-on-write mechanism employed e.g. by fork() is a feature of the MMU (Memory Management Unit), which handles the memory paging for the kernel. Accessing the MMU is a priviledged operation, i.e. cannot be done by a userspace application. I am not aware of any copy-on-write API exported to user-space, either.
(Then again, I am not exactly a guru on the Linux API, so others might point out relevant API calls I have missed.)
Edit: And lo, MSalters rises to the occasion. ;-)

You should be able to open your own memory via /proc/$PID/mem and then mmap() the interesting part of it with MAP_PRIVATE to some other place.

Here's a working example:
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#define SIZE 4096
int main(void) {
int fd = shm_open("/tmpmem", O_RDWR | O_CREAT, 0666);
int r = ftruncate(fd, SIZE);
printf("fd: %i, r: %i\n", fd, r);
char *buf = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
MAP_SHARED, fd, 0);
printf("debug 0\n");
buf[SIZE - 2] = 41;
buf[SIZE - 1] = 42;
printf("debug 1\n");
// don't know why this is needed, or working
//r = mmap(buf, SIZE, PROT_READ | PROT_WRITE,
// MAP_FIXED, fd, 0);
//printf("r: %i\n", r);
char *buf2 = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
MAP_PRIVATE, fd, 0);
printf("buf2: %i\n", buf2);
buf2[SIZE - 1] = 43;
buf[SIZE - 2] = 40;
printf("buf[-2]: %i, buf[-1]: %i, buf2[-2]: %i, buf2[-1]: %i\n",
buf[SIZE - 2],
buf[SIZE - 1],
buf2[SIZE - 2],
buf2[SIZE - 1]);
unlink(fd);
return EXIT_SUCCESS;
}
I'm a little unsure of whether I need to enable the commented out section, for safety.

Related

Mapping existing memory (data segment) to another memory segment

As the title suggests, I would like to ask if there is any way for me to map the data segment of my executable to another memory so that any changes to the second are updated instantly on the first. One initial thought I had was to use mmap, but unfortunately mmap requires a file descriptor and I do not know of a way to somehow open a file descriptor on my running processes memory. I tried to use shmget/shmat in order to create a shared memory object on the process data segment (&__data_start) but again I failed ( even though that might have been a mistake on my end as I am unfamiliar with the shm API). A similar question I found is this: Linux mapping virtual memory range to existing virtual memory range? , but the replies are not helpful.. Any thoughts are welcome.
Thank you in advance.
Some pseudocode would look like this:
extern char __data_start, _end;
char test = 'A';
int main(int argc, char *argv[]){
size_t size = &_end - &__data_start;
char *mirror = malloc(size);
magic_map(&__data_start, mirror, size); //this is the part I need.
printf("%c\n", test) // prints A
int offset = &test - &__data_start;
*(mirror + offset) = 'B';
printf("%c\n", test) // prints B
free(mirror);
return 0;
}
it appears I managed to solve this. To be honest I don't know if it will cause problems in the future and what side effects this might have, but this is it (If any issues arise I will try to log them here for future references).
Solution:
Basically what I did was use the mmap flags MAP_ANONYMOUS and MAP_FIXED.
MAP_ANONYMOUS: With this flag a file descriptor is no longer required (hence the -1 in the call)
MAP_FIXED: With this flag the addr argument is no longer a hint, but it will put the mapping on the address you specify.
MAP_SHARED: With this you have the shared mapping so that any changes are visible to the original mapping.
I have left in a comment the munmap function. This is because if unmap executes we free the data_segment (pointed to by &__data_start) and as a result the global and static variables are corrupted. When at_exit function is called after main returns the program will crash with a segmentation fault. (Because it tries to double free the data segment)
Code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#define _GNU_SOURCE 1
#include <unistd.h>
#include <sys/mman.h>
extern char __data_start;
extern char _end;
int test = 10;
int main(int argc, char *argv[])
{
size_t size = 4096;
char *shared = mmap(&__data_start, 4096, PROT_READ | PROT_WRITE, MAP_FIXED | MAP_ANONYMOUS | MAP_SHARED, -1, 0);
if(shared == (void *)-1){
printf("Cant mmap\n");
exit(-1);
}
printf("original: %p, shared: %p\n",&__data_start, shared);
size_t offset = (void *)&test - (void *)&__data_start;
*(shared+offset) = 50;
msync(shared, 4096, MS_SYNC);
printf("test: %d :: %d\n", test, *(shared+offset));
test = 25;
printf("test: %d :: %d\n", test, *(shared+offset));
//munmap(shared, 4096);
}
Output:
original: 0x55c4066eb000, shared: 0x55c4066eb000
test: 50 :: 50
test: 25 :: 25

Segmentation fault with Posix-C program using mmap and mapfile

Well I have this program and I get a segmentation fault: 11 (core dumped). After lots of checks I get this when the for loop gets to i=1024 and it tries to mapfile[i]=0. The program is about making a server and a client program that communicates by read/writing in a common file made in the server program. This is the server program and it prints the value inside before and after the change. I would like to see what's going on, if it's a problem with the mapping or just problem with memory of the *mapfile. Thanks!
#include <sys/shm.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <errno.h>
#include <math.h>
int main()
{
int ret, i;
int *mapfile;
system("dd if=/dev/zero of=/tmp/c4 bs=4 count=500");
ret = open("/tmp/c4", O_RDWR | (mode_t)0600);
if (ret == -1)
{
perror("File");
return 0;
}
mapfile = mmap(NULL, 2000, PROT_READ | PROT_WRITE, MAP_SHARED, ret, 0);
for (i=1; i<=2000; i++)
{
mapfile[i] = 0;
}
while(mapfile[0] != 555)
{
mapfile = mmap(NULL, 2000, PROT_READ | PROT_WRITE, MAP_SHARED, ret, 0);
if (mapfile[0] != 0)
{
printf("Readed from file /tmp/c4 (before): %d\n", mapfile[0]);
mapfile[0]=mapfile[0]+5;
printf("Readed from file /tmp/c4 (after) : %d\n", mapfile[0]);
mapfile[0] = 0;
}
sleep(1);
}
ret = munmap(mapfile, 2000);
if (ret == -1)
{
perror("munmap");
return 0;
}
close(ret);
return 0;
}
mapfile = mmap(NULL, 2000, PROT_READ | PROT_WRITE, MAP_SHARED, ret, 0);
for (i=1; i<=2000; i++)
{
mapfile[i] = 0;
}
In this code here, you see that you are requesting 2000 units of memory. In this case mmap is taking in a size_t type meaning that its looking for a size, and not an amount of things for memory. As #Mat mentioned, you will need t use the sizeof(int) operator in order to feed mmap the proper size it requires.
The other issue that should be noted about this code that may cause a problem for you down the road, is beginning your loop index at i=1 rather than i=0. Starting your index at 0 wil ensure that you are going from the indices 0 - 1999, which corresponds to the memory you are trying to allocate.
Overall here, it looks like what your trying to do is initialize the values of your memory to 0. perhaps you could do this easier by relying on a builtin function called memset:
void *memset(void *str, int c, size_t n)
your code then becomes:
mapfile = mmap(NULL, 2000*sizeof(int), PROT_READ | PROT_WRITE, MAP_SHARED, ret, 0);
void *returnedPointer = memset(mapfile, 0, 2000*sizeof(int));
docs for memset can be found here:
http://www.tutorialspoint.com/c_standard_library/c_function_memset.htm
You're requesting 2000 bytes from mmap, but treating the returned value as an array of 2000 ints. That can't work, an int is usually 4 or 8 bytes these days. You'll be writing past the end of the reserved memory in your loop.
Change the mmap calls to use 2000*sizeof(int). And while you're at it, give that 2000 constant a name (e.g. const int num_elems = 2000; near the top) and don't repeat the magic constant all over the place. And once that's done change it to 1024 or 2048 so that the resulting size is a multiple of the page size (if you're not sure of your page size, getconf PAGE_SIZE on the command line).
And also change your dd command to create a large-enough file. It is currently creating a 2000 byte file, you'll need to increase that as well.
And validate the return value of mmap - it can fail, and you should detect that.
Finally, don't continuously remap, you're using MAP_SHARED modifications through other shared mappings of the same file and offset will be visible to your process. (Must really be the same file, if the other process also does a dd, that might not work. Only one process should have the responsibility of creating that file.)
If you do want to remap, you must also unmap each time. Otherwise you're leaking mappings.

Can anyone help me make a Shared Memory Segment in C

I need to make a shared memory segment so that I can have multiple readers and writers access it. I think I know what I am doing as far as the semaphores and readers and writers go...
BUT I am clueless as to how to even create a shared memory segment. I want the segment to hold an array of 20 structs. Each struct will hold a first name, an int, and another int.
Can anyone help me at least start this? I am desperate and everything I read online just confuses me more.
EDIT: Okay, so I do something like this to start
int memID = shmget(IPC_PRIVATE, sizeof(startData[0])*20, IPC_CREAT);
with startData as the array of structs holding my data initialized and I get an error saying
"Segmentation Fault (core dumped)"
The modern way to obtain shared memory is to use the API, provided by the Single UNIX Specification. Here is an example with two processes - one creates a shared memory object and puts some data inside, the other one reads it.
First process:
#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>
#include <fcntl.h>
#define SHM_NAME "/test"
typedef struct
{
int item;
} DataItem;
int main (void)
{
int smfd, i;
DataItem *smarr;
size_t size = 20*sizeof(DataItem);
// Create a shared memory object
smfd = shm_open(SHM_NAME, O_RDWR | O_CREAT, 0600);
// Resize to fit
ftruncate(smfd, size);
// Map the object
smarr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, smfd, 0);
// Put in some data
for (i = 0; i < 20; i++)
smarr[i].item = i;
printf("Press Enter to remove the shared memory object\n");
getc(stdin);
// Unmap the object
munmap(smarr, size);
// Close the shared memory object handle
close(smfd);
// Remove the shared memory object
shm_unlink(SHM_NAME);
return 0;
}
The process creates a shared memory object with shm_open(). The object is created with an initial size of zero, so it is enlarged using ftruncate(). Then the object is memory mapped into the virtual address space of the process using mmap(). The important thing here is that the mapping is read/write (PROT_READ | PROT_WRITE) and it is shared (MAP_SHARED). Once the mapping is done, it can be accessed as a regular dynamically allocated memory (as a matter of fact, malloc() in glibc on Linux uses anonymous memory mappings for larger allocations). Then the process writes data into the array and waits until Enter is pressed. Then it unmaps the object using munmap(), closes its file handle and unlinks the object with shm_unlink().
Second process:
#include <stdio.h>
#include <sys/mman.h>
#include <fcntl.h>
#define SHM_NAME "/test"
typedef struct
{
int item;
} DataItem;
int main (void)
{
int smfd, i;
DataItem *smarr;
size_t size = 20*sizeof(DataItem);
// Open the shared memory object
smfd = shm_open(SHM_NAME, O_RDONLY, 0600);
// Map the object
smarr = mmap(NULL, size, PROT_READ, MAP_SHARED, smfd, 0);
// Read the data
for (i = 0; i < 20; i++)
printf("Item %d is %d\n", i, smarr[i].item);
// Unmap the object
munmap(smarr, size);
// Close the shared memory object handle
close(smfd);
return 0;
}
This one opens the shared memory object for read access only and also memory maps it for read access only. Any attempt to write to the elements of the smarr array would result in segmentation fault being delivered.
Compile and run the first process. Then in a separate console run the second process and observe the output. When the second process has finished, go back to the first one and press Enter to clean up the shared memory block.
For more information consult the man pages of each function or the memory management portion of the SUS (it's better to consult the man pages as they document the system-specific behaviour of these functions).

Does any operating system allow an application programmer to create pointers out of thunks?

Many operating systems allow one to memory map files, and read from them lazily. If the operating system can do this then effectively it has the power to create regular pointers out of thunks.
Does any operating system allow an application programmer to create pointers out of his own thunks?
I know that to a limited extent operating systems already support this capability because one can create a pipe, map it into memory, and connect a process to the pipe to accomplish some of what I'm talking about so this functionality doesn't seem too impossible or unreasonable.
A simple example of this feature is a pointer which counts the number of times it's been dereferenced. The following program would output zero, and then one.
static void setter(void* environment, void* input) {
/* You can't set the counter */
}
static void getter(void* environment, void* output) {
*((int*)output) = *((int*)environment)++;
}
int main(int argc, char** argv) {
volatile int counter = 0;
volatile int * const x = map_special_voodoo_magic(getter, setter, sizeof(*x),
&counter);
printf("%i\n", *x);
printf("%i\n", *x);
unmap_special_voodoo_magic(x);
}
P.S. A volatile qualifier is required because the value pointed to by x changes unexpectedly right? As well, the compiler has no reason to think that dereferencing x would change counter so a volatile is needed there too right?
The following code has no error checking, and is not a complete implementation. However, the following code does illustrate the basic idea of how to make a thunk into a pointer. In particular, the value 4 is calculated on demand and not assigned to memory earlier.
#include <stdlib.h>
#include <stdio.h>
#include <sys/mman.h>
#include <signal.h>
void * value;
static void catch_function(int signal, siginfo_t* siginfo,
void* context) {
if (value != siginfo->si_addr) {
puts("ERROR: Wrong address!");
exit(1);
}
mmap(value, sizeof(int), PROT_READ | PROT_WRITE, MAP_FIXED | MAP_SHARED
| MAP_ANON, -1, 0);
*((int*)value) = 2 * 2;
}
int main(void) {
struct sigaction new_action;
value = mmap(NULL, sizeof(int), PROT_NONE, MAP_ANON | MAP_SHARED, -1,
0);
sigemptyset(&new_action.sa_mask);
new_action.sa_sigaction = catch_function;
new_action.sa_flags = SA_SIGINFO;
sigaction(SIGBUS, &new_action, NULL);
printf("The square of 2 is: %i\n", *((int*)value));
munmap(value, sizeof(int));
return 0;
}

How to map two virtual adresses on the same physical memory on linux?

I'm facing a quite tricky problem. I'm trying to get 2 virtual memory areas pointing to the same physical memory. The point is to have different page protection parameters on different memory areas.
On this forum, the user seems to have a solution, but it seems kinda hacky and it's pretty clear that something better can be done performance-wise :
http://www.linuxforums.org/forum/programming-scripting/19491-map-two-virtual-memory-addres-same-physical-page.html
As I'm facing the same problem, I want to give a shot here to know if somebody has a better idea. Don't be afraid to mention the dirty details behind the hood, this is what this question is about.
Thank by advance.
Since Linux kernel 3.17 (released in October 2014) you can use memfd_create system call to create a file descriptor backed by anonymous memory. Then mmap the same region several times, as mentioned in the above answers.
Note that glibc wrapper for the memfd_create system call was added in glibc 2.27 (released in February 2018). The glibc manual also describes how the descriptor returned can be used to create multiple mappings to the same underlying memory.
I'm trying to get 2 virtual memory area pointing on the same physical memory.
mmap the same region in the same file, twice, or use System V shared memory (which does not require mapping a file in memory).
I suppose if you dislike Sys V shared memrory you could use POSIX shared memory objects. They're not very popular but available on Linux and BSDs at least.
Once you get an fd with shm_open you could immediately call shm_unlink. Then no other process can attach to the same shared memory, and you can mmap it multiple times. Still a small race period available though.
As suggested by #PerJohansson, I wrote & tested following code, it works well on linux, using mmap with MAP_SHARED|MAP_FIXED flag, we can map the same physical page allocated by POSIX shm object multiple times and continuously into very large virtual memory.
#include "stdio.h"
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/stat.h> /* For mode constants */
#include <fcntl.h> /* For O_* constants */
void * alloc_1page_mem(int size) {
int fd;
char * ptr_base;
char * rptr;
/* Create shared memory object and set its size */
fd = shm_open("/myregion", O_CREAT | O_RDWR, S_IRUSR | S_IWUSR);
if (fd == -1) {
perror("error in shm_open");
return NULL;
}
if (ftruncate(fd, 4096) == -1) {
perror("error in ftruncate");
return NULL;
}
// following trick reserves big enough holes in VM space
ptr_base = rptr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
munmap(rptr, size);
for(int i=0; i<size; i+=4096) {
rptr = mmap(rptr, 4096, PROT_READ | PROT_WRITE, MAP_SHARED|MAP_FIXED, fd, 0);
if (rptr == MAP_FAILED) {
perror("error in mmap");
return NULL;
}
rptr += 4096;
}
close(fd);
shm_unlink("/myregion");
return ptr_base;
}
void check(int * p, int total_cnt){
for (int i=0;i<4096/sizeof(int);i++) {
p[i] = i;
}
int fail_cnt = 0;
for (int k=0; k<total_cnt; k+= 4096/sizeof(int)) {
for (int i=0;i<4096/sizeof(int);i++) {
if (p[k+i] != i)
fail_cnt ++;
}
}
printf("fail_cnt=%d\n", fail_cnt);
}
int main(int argc, const char * argv[]) {
const char * cmd = argv[1];
int sum;
int total_cnt = 32*1024*1024;
int * p = NULL;
if (*cmd++ == '1')
p = alloc_1page_mem(total_cnt*sizeof(int));
else
p = malloc(total_cnt*sizeof(int));
sum = 0;
while(*cmd) {
switch(*cmd++) {
case 'c':
check(p, total_cnt);
break;
case 'w':
// save only 4bytes per cache line
for (int k=0;k<total_cnt;k+=64/sizeof(int)){
p[k] = sum;
}
break;
case 'r':
// read only 4bytes per cache line
for (int k=0;k<total_cnt;k+=64/sizeof(int)) {
sum += p[k];
}
break;
case 'p':
// prevent sum from being optimized
printf("sum=%d\n", sum);
}
}
return 0;
}
You can observe very low cache miss rate on memory allocated in such method:
$ sudo perf stat -e mem_load_retired.l3_miss -- ./a.out 0wrrrrr
# this produces L3 miss linearly increase with number of 'r' charaters
$ sudo perf stat -e mem_load_retired.l3_miss -- ./a.out 1wrrrrr
# this produces almost constant L3 miss.
If you are root, you can mmap("/dev/mem", ...) but there are caveats in the newer kernels, see accessing mmaped /dev/mem?

Resources