Is it possible to have persistent memory allocated to a process? - c

Suppose process A allocates some memory in which it stores some data, say a set of key -> value pairs that are expensive to create. I want to allocate memory such that, even if process A dies for some reason, it can still access this data in RAM when it is restarted. I understand I can store the data to a file and read it back when A restarts, but I want to explore whether other methods are available, assuming the amount of memory available is not an issue.
Is there a mechanism (API) to allocate memory such that it stays pinned in memory until it is freed? If not, is it possible to achieve this with shared memory techniques? For example, two processes allocate and share the same memory, so even if one process dies the memory is not freed because the other is still alive. When the dead process is restarted, can it regain access to that shared memory? If yes, how?
Finally, if this is not possible, I am curious why the kernel does not provide such a mechanism.

Yes. What you're looking for is called shared memory segments. Run man 7 shm_overview for an overview, but basically the calls are:
shm_open - allocate or re-open a shared memory segment (POSIX)
shmget - allocate a shared memory segment (System V)
shmat - attaches to a shared memory segment (System V)
shmdt - detaches from a shared memory segment (System V)
shm_unlink - remove the shared memory segment (POSIX)
If you have a copy of "Advanced UNIX Programming", 2nd edition, the chapter "Advanced Interprocess Communication" covers this in more detail in the sections "System V Shared Memory" and "POSIX Shared Memory".
Also, this feature predates Linux: it has been around since 1983, assuming the dates on https://en.wikipedia.org/wiki/UNIX_System_V are correct.
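To make this concrete, here is a minimal sketch of the POSIX route, assuming Linux with glibc (link with -lrt on glibc older than 2.34). The struct kv_cache layout and the object name are hypothetical. A forked child plays the first incarnation of process A: it creates the object, fills it and exits without cleaning up. The parent then plays the restarted process and reopens the object by name. On Linux the object lives in a tmpfs under /dev/shm, so it survives until it is removed with shm_unlink() or the machine reboots.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>      /* O_RDWR, O_CREAT */
#include <string.h>
#include <sys/mman.h>   /* shm_open, mmap, munmap */
#include <sys/stat.h>   /* permission bits */
#include <sys/types.h>
#include <sys/wait.h>   /* waitpid */
#include <unistd.h>     /* ftruncate, fork, close, _exit */

/* Hypothetical fixed-size layout for the expensive key -> value data. */
struct kv_cache {
    int count;
    char keys[8][16];
    int values[8];
};

/* Creates the named object if needed, sizes it and maps it read/write.
   Returns NULL on failure. */
static struct kv_cache *open_cache(const char *name)
{
    int fd = shm_open(name, O_RDWR | O_CREAT, 0600);
    if (fd == -1)
        return NULL;
    /* A new object has size 0; ftruncate to the same size keeps existing data. */
    if (ftruncate(fd, sizeof(struct kv_cache)) == -1) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, sizeof(struct kv_cache), PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    close(fd);  /* the mapping remains valid after the descriptor is closed */
    return p == MAP_FAILED ? NULL : p;
}

/* The first incarnation of process A fills the cache and dies; the
   "restarted" process then reopens it by name. Returns the value stored
   under "answer", or -1 on error. */
static int demo_restart(const char *name)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        struct kv_cache *c = open_cache(name);
        if (c == NULL)
            _exit(1);
        strcpy(c->keys[0], "answer");
        c->values[0] = 42;
        c->count = 1;
        _exit(0);  /* dies without munmap() or shm_unlink() */
    }
    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status) ||
        WEXITSTATUS(status) != 0)
        return -1;

    struct kv_cache *c = open_cache(name);  /* the restarted process */
    if (c == NULL)
        return -1;
    int result = (c->count == 1 && strcmp(c->keys[0], "answer") == 0)
                     ? c->values[0] : -1;
    munmap(c, sizeof *c);
    return result;
}
```

A real restarted process would just call open_cache() with the same name after being exec'd again, and call shm_unlink() only when the cache is truly no longer wanted. Store plain data only (no pointers), since the object may be mapped at a different address next time.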

Related

How is memory layout shared with other processes/threads?

I'm currently learning about memory layout in C. So far I know there are several sections in a C program's memory: text, data, bss, heap and stack. Some sources also say the heap is shared with other things beyond the program.
My questions are these.
What exactly is the heap shared with? One source states that "Heap must always be freed in order to make it available for other processes", whereas another says "The heap area is shared by all threads, shared libraries, and dynamically loaded modules in a process". If it is not shared with other processes, do I really have to free it while my program is running (not at the end of it)?
Some sources also single out the high addresses (a sixth section) for command-line arguments and environment variables. Should this be considered another section and part of the program's memory?
Are the other sections shared with anything else beyond a program?
The heap is per-process memory: each process has its own heap, which is shared only within that process (for example between its threads, as you said). Why should you free it? Not primarily to give space to other processes (at least on modern operating systems, which reclaim a process's memory when it dies), but to avoid exhausting your own process's memory: in C, heap regions you never deallocate stay marked as in use even after you stop using them. So, to prevent undesired errors, it's good practice to free heap memory as soon as you no longer need it.
In a C program, the command-line arguments reach main as its parameters (argc and argv); the strings themselves, together with the environment, are typically placed at the top of the process's initial stack before main runs. The stack is usually located in the highest portion of the process's address space, so these values end up at high addresses (which is probably why some sources point out what you wrote). Generally speaking, though, there is no sixth memory area.
As others have said, the text area can be shared between processes. It usually contains the binary code, which is identical for all processes running the same binary, so for performance reasons the OS can map a single copy into all of them (think, for example, of forking a child process).
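A small experiment shows the per-process part, as a sketch assuming a POSIX system (the function name is made up for illustration). After fork() the parent and the child see the heap block at the same virtual address, but the child's write lands in its own copy-on-write page, so the parent still reads the old value.

```c
#define _DEFAULT_SOURCE
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Allocates an int on the heap, lets a forked child overwrite it, and
   returns the value the parent sees afterwards (-1 on error). */
static int parent_value_after_child_write(void)
{
    int *value = malloc(sizeof *value);
    if (value == NULL)
        return -1;
    *value = 1;

    pid_t pid = fork();
    if (pid == -1) {
        free(value);
        return -1;
    }
    if (pid == 0) {
        *value = 99;  /* same virtual address, but the child's own copy of the page */
        _exit(0);
    }
    waitpid(pid, NULL, 0);
    int seen = *value;  /* still 1: the child's heap is not the parent's heap */
    free(value);
    return seen;
}
```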
The heap is shared with other processes in the sense that all processes use the same physical RAM: the more of it you use, the less is available to other programs. Sharing the heap with the other threads of your own program means that all your threads actually see and access the same heap (same virtual address space, same physical RAM, and with some luck the same cache).
As for a separate section for command-line arguments and environment variables: no.
The text section can be shared with other processes. Nowadays it is marked read-only, so having several processes share it makes sense. In practice, if you are already running top and start another instance, there is no point in loading the text section again; that would waste time and physical RAM. An OS that is smart enough maps the same physical pages into the virtual address space of both top instances, saving both time and memory.
From the standard's point of view:
The terms thread, process, text section, data section, bss, heap and stack are not even defined by the C language standard, and every platform is free to implement these components however "it may like".
Threads and processes are typically implemented at the operating-system layer, while the memory sections are laid out by the compiler and linker and set up at run time by the operating system's loader.
From a practical point of view:
For every given process, all these memory sections (text section, data section, bss, heap and stack) are shared by all the threads of that process.
Hence, it is the programmer's responsibility to ensure mutual exclusion when accessing these memory sections from different threads.
Typically, this is achieved via synchronization utilities such as semaphores, mutexes and message queues.
Between processes, it is the operating system's responsibility to keep them isolated from each other.
Typically, this is achieved via the virtual-memory abstraction: each process runs inside its own logical address space, and each logical address space is mapped onto different physical pages (except for pages that are deliberately shared, such as read-only text or shared memory segments).
Disclaimer: some would say that each thread has its own stack. Each thread does get a separate stack region, but all of those stacks live in the same address space (glibc on Linux, for instance, allocates them with mmap(), each with a small guard page), so there's usually no one to prevent a thread from accessing the stacks of other threads, whether intentionally or by mistake (for example through a stack buffer overflow).
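The thread case can be sketched with POSIX threads (an assumption; C11 <threads.h> would work similarly). The hypothetical struct shared_counter is allocated on the heap and every thread receives the same pointer, so they all update the same memory; the mutex provides the mutual exclusion that, as said above, is the programmer's job. Depending on the toolchain you may need to compile with -pthread.

```c
#define _DEFAULT_SOURCE
#include <pthread.h>
#include <stdlib.h>

#define NTHREADS 4
#define ITERATIONS 100000

/* Hypothetical counter object: it lives on the heap, so every thread
   that receives the pointer works on the very same memory. */
struct shared_counter {
    pthread_mutex_t lock;
    long value;
};

static void *worker(void *arg)
{
    struct shared_counter *c = arg;
    for (int i = 0; i < ITERATIONS; i++) {
        pthread_mutex_lock(&c->lock);  /* without this, increments could be lost */
        c->value++;
        pthread_mutex_unlock(&c->lock);
    }
    return NULL;
}

/* Returns the final count after NTHREADS workers finish, or -1 on error. */
static long run_workers(void)
{
    struct shared_counter *c = malloc(sizeof *c);
    if (c == NULL)
        return -1;
    pthread_mutex_init(&c->lock, NULL);
    c->value = 0;

    pthread_t tids[NTHREADS];
    int started = 0;
    while (started < NTHREADS &&
           pthread_create(&tids[started], NULL, worker, c) == 0)
        started++;
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    long result = (started == NTHREADS) ? c->value : -1;
    pthread_mutex_destroy(&c->lock);
    free(c);
    return result;
}
```

With the lock in place the result is always NTHREADS * ITERATIONS; removing the lock/unlock pair typically makes the total come out lower on a multi-core machine.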

POSIX shared memory - How many copies of memory are there

Situation:
Processes a and b each use mmap() to create a shared memory mapping, with the same shared memory object /shm-a as the backing file.
My guess:
I originally thought there was only one copy of the memory, which the processes write to and read from.
But later I started to think there are actually three copies, right? Each process has one copy created by mmap(), and the third copy is the shared memory object, which is used to sync between the processes, but I am not sure.
The questions are:
So how many copies of the memory are there: 1 or n+1 (where n is the process count)?
If it's n+1, isn't that a waste of memory? And is it proper for a process to read/write the shared memory object via its fd directly?
So how many copies of the memory are there: 1 or n+1 (where n is the process count)?
There is only one copy of the shared memory.
The same physical memory is mapped into the different processes, though possibly at different virtual addresses in each.
And is it proper for a process to read/write the shared memory object via its fd directly?
Yes, it is. That is, in fact, the purpose of shared memory: what one process writes into it can be read by the other. This is a very fast form of IPC, but you do have to be careful in how you use it. In particular, you need to handle concurrent access, and pointers stored in shared memory are only meaningful if every process maps the segment at the same address.
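One way to convince yourself that there is only one copy is to map the same object twice, standing in for processes a and b, and check that a write through either mapping, or through the descriptor, is visible in both. This is a sketch assuming Linux with glibc (-lrt on glibc older than 2.34); the object name is hypothetical. As far as I know POSIX doesn't promise that read()/write() work on a shared memory descriptor, but on Linux the object is a tmpfs file, and the pwrite() step relies on that.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define SHM_SIZE 4096

/* Maps one POSIX shared memory object twice and checks that both mappings
   see the same bytes, including a write made through the descriptor.
   Returns 1 if so, 0 if not, -1 on error. */
static int one_copy_demo(const char *name)
{
    int fd = shm_open(name, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd == -1)
        return -1;
    int result = -1;
    char *a = MAP_FAILED, *b = MAP_FAILED;
    if (ftruncate(fd, SHM_SIZE) == -1)
        goto out;
    a = mmap(NULL, SHM_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    b = mmap(NULL, SHM_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (a == MAP_FAILED || b == MAP_FAILED)
        goto out;

    result = (a != b);                   /* two different virtual addresses... */
    strcpy(a, "written via a");
    result = result && strcmp(b, "written via a") == 0;  /* ...one copy of the data */

    /* Linux assumption: the object is a tmpfs file, so pwrite() works on it. */
    static const char msg[] = "written via fd";
    result = result && pwrite(fd, msg, sizeof msg, 0) == (ssize_t)sizeof msg &&
             strcmp(a, msg) == 0 && strcmp(b, msg) == 0;
out:
    if (a != MAP_FAILED)
        munmap(a, SHM_SIZE);
    if (b != MAP_FAILED)
        munmap(b, SHM_SIZE);
    close(fd);
    shm_unlink(name);
    return result;
}
```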

Memory Management for Mapped Data in Shared Memory Segments

I'm working on a project in C that uses shared memory for IPC on a Linux system. However, I'm a little bit confused about memory management in these segments. I'm using the POSIX API for this project.
I understand how to create the shared segments, and that they persist until a reboot if you fail to remove them properly with shm_unlink(). I also understand how to do the actual mapping and unmapping with mmap() and munmap() respectively. However, how these operations affect the data stored in these shared segments is confusing me.
Here is what I'm trying to properly understand:
Let's say I create a segment using shm_open() with the O_CREAT flag. This gives me a file descriptor, which I've named msfd in the example below. Now I have a struct that I map into that address space with the following:
mystruct* ms = (mystruct*)mmap(NULL, sizeof(mystruct), PROT_READ | PROT_WRITE, MAP_SHARED, msfd, 0);
//set the elements of the struct here using ms->element = X as usual
Part 1)
Here's where my confusion begins. Let's say this process is now done accessing that location, since it was only setting data for another process to read. Do I still call munmap()?
I want the other process to still have access to all of the data that the current process has set. Normally, you wouldn't call free() on a malloc'ed pointer until it is permanently no longer needed. However, I understand that when this process exits, the unmapping happens automatically anyway. Is the data persisted inside the segment, or does that segment just get reserved with its allotted size and name?
Part 2)
We're now in the other application, which needs to access and read from that shared segment. I understand that we open that segment with shm_open() and then perform the same mapping operation with mmap(). Now we have access to the structure in that segment. When we call munmap() from this process (NOT the one that created the data), it "unlinks" us from that pointer, but the data is still accessible. Does this assume that process 1 (the creator) has NOT called munmap()?
Is the data persisted inside the segment,
Yes.
does that segment just get reserved with its allotted size and name?
Also yes.
Does this assume that process 1 (the creator) has NOT called munmap()?
No.
The shared memory object is created by shm_open() with O_CREAT and sized with ftruncate() (its memory is taken from what the OS has available), and from that moment on it carries whatever content has been written into it until it is removed with shm_unlink() and no process has it mapped any more.
shm_open() and shm_unlink() are system-oriented, in the sense that the shared memory is a system-wide resource, not a per-process one.
mmap() and munmap() are process-oriented: they map and unmap that system resource into and out of the calling process's address space.
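Here is a sketch of the sequence the question describes, assuming Linux and a hypothetical mystruct layout, with both roles run in one process for brevity. The writer sizes the new object with ftruncate() (a freshly created object has size 0), maps it, fills it and unmaps it; a later shm_open() without O_CREAT and a new mmap() still find the data. Only shm_unlink() removes the name; if some process still has the object mapped at that point, the memory itself is released when the last mapping goes away.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical stand-in for the question's mystruct. */
typedef struct {
    int id;
    char text[32];
} mystruct;

/* Returns 1 if the data outlived munmap() and the name vanished after
   shm_unlink(), 0 if not, -1 on error. */
static int lifetime_demo(const char *name)
{
    /* Process 1: create and size the object, map it, write, unmap. */
    int fd = shm_open(name, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd == -1)
        return -1;
    if (ftruncate(fd, sizeof(mystruct)) == -1) {
        close(fd);
        shm_unlink(name);
        return -1;
    }
    mystruct *ms = mmap(NULL, sizeof(mystruct), PROT_READ | PROT_WRITE,
                        MAP_SHARED, fd, 0);
    close(fd);
    if (ms == MAP_FAILED) {
        shm_unlink(name);
        return -1;
    }
    ms->id = 7;
    strcpy(ms->text, "hello");
    munmap(ms, sizeof(mystruct));  /* the data stays in the object */

    /* Process 2: open the existing object (no O_CREAT) and map it read-only. */
    fd = shm_open(name, O_RDONLY, 0);
    if (fd == -1) {
        shm_unlink(name);
        return -1;
    }
    const mystruct *rd = mmap(NULL, sizeof(mystruct), PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    int ok = rd != MAP_FAILED && rd->id == 7 && strcmp(rd->text, "hello") == 0;
    if (rd != MAP_FAILED)
        munmap((void *)rd, sizeof(mystruct));

    /* Remove the name; a later open without O_CREAT fails with ENOENT. */
    shm_unlink(name);
    fd = shm_open(name, O_RDONLY, 0);
    if (fd != -1) {
        close(fd);
        return 0;
    }
    return ok && errno == ENOENT;
}
```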

shmat for attaching shared memory segment

When I looked through the man page of shmat, it describes the function's primary purpose as attaching the memory segment associated with shmid to the calling process's address space.
The questions I have are the following.
The term "attach" looks generic to me. I have difficulty understanding what underlying activity "attach" refers to.
What does it mean to map a segment of memory?
Use it as: char *ptr = shmat(seg_id, NULL, 0); (it returns (void *) -1 on failure)
It attaches the segment created by shmget() to the address space of the process that runs this code.
seg_id is the identifier of the newly created segment, as returned by shmget()
NULL means the operating system chooses the starting address of the segment on the user's behalf
0 is the flags argument: with no flags the segment is attached for both reading and writing (SHM_RDONLY attaches it read-only)
When a process no longer needs the segment, it should detach it. Attaching does not lock the segment, so several processes can be attached at the same time; if their access needs to be coordinated, that takes a separate locking mechanism such as a semaphore.
To detach: shmdt(ptr);
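A sketch of these System V calls, assuming a Linux/POSIX system; the function name is hypothetical. IPC_PRIVATE is used so that the child simply inherits the segment id, and the parent stays attached while the child attaches and writes, which shows that both attachments map the same memory at once.

```c
#define _DEFAULT_SOURCE
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Parent and child attach the same System V segment at the same time.
   Returns 1 if the parent sees the child's write, 0 if not, -1 on error. */
static int sysv_attach_demo(void)
{
    /* IPC_PRIVATE creates a new segment; the child inherits its id. */
    int seg_id = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
    if (seg_id == -1)
        return -1;

    char *ptr = shmat(seg_id, NULL, 0);  /* NULL: kernel picks the address; 0: read/write */
    if (ptr == (void *)-1) {
        shmctl(seg_id, IPC_RMID, NULL);
        return -1;
    }
    ptr[0] = '\0';

    pid_t pid = fork();
    if (pid == 0) {
        /* A second attachment while the parent is still attached. */
        char *child_ptr = shmat(seg_id, NULL, 0);
        if (child_ptr == (void *)-1)
            _exit(1);
        strcpy(child_ptr, "written by child");
        shmdt(child_ptr);
        _exit(0);
    }
    int result = -1;
    if (pid != -1 && waitpid(pid, NULL, 0) == pid)
        result = strcmp(ptr, "written by child") == 0;

    shmdt(ptr);
    shmctl(seg_id, IPC_RMID, NULL);  /* destroyed once every process has detached */
    return result;
}
```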
There's a good explanation here: http://www.makelinux.net/alp/035
"Under Linux, each process's virtual memory is split into pages. Each process maintains a mapping from its memory addresses to these virtual memory pages, which contain the actual data. Even though each process has its own addresses, multiple processes' mappings can point to the same page, permitting sharing of memory"

Freeing memory in C under Linux

If you do not free memory that you malloc'd in a C program under Linux, when is it released? After the program terminates? Or is the memory still locked up until some unforeseen time (probably at reboot)?
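Memory allocated by malloc() is freed when the process ends. However, memory allocated using shmget() is not freed when the process ends: the segment stays around until it is marked for removal with shmctl() and IPC_RMID (and the last process has detached), or until the system reboots. Be careful with that.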
Edit: mmap() is not shmget(). Read about the difference here: http://www.opengroup.org/onlinepubs/000095399/functions/mmap.html and http://www.opengroup.org/onlinepubs/009695399/functions/shmget.html
They are different system calls that do very different things.
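The difference is easy to observe. In this sketch (assuming Linux/POSIX; the function name is hypothetical), a child creates a System V segment with shmget(), writes to it and exits without calling shmctl() with IPC_RMID. The parent, which never created it, then finds the segment still present with zero attachments and its data intact, and removes it. Segments left over like this show up in the output of ipcs -m.

```c
#define _DEFAULT_SOURCE
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the segment (and its data) outlived the process that
   created it, 0 if not, -1 on error. */
static int segment_outlives_creator(void)
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;
    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        close(fds[0]);
        int id = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
        if (id != -1) {
            char *p = shmat(id, NULL, 0);
            if (p != (void *)-1) {
                strcpy(p, "still here");
                shmdt(p);
            }
        }
        /* Report the id to the parent, then exit without IPC_RMID. */
        if (write(fds[1], &id, sizeof id) != (ssize_t)sizeof id)
            _exit(1);
        _exit(0);
    }
    close(fds[1]);
    int id = -1;
    ssize_t n = read(fds[0], &id, sizeof id);
    close(fds[0]);
    waitpid(pid, NULL, 0);
    if (n != (ssize_t)sizeof id || id == -1)
        return -1;

    /* The creator is gone, yet the segment exists with no attachments. */
    struct shmid_ds info;
    int ok = shmctl(id, IPC_STAT, &info) == 0 && info.shm_nattch == 0;
    char *p = shmat(id, NULL, 0);
    if (p == (void *)-1) {
        ok = 0;
    } else {
        ok = ok && strcmp(p, "still here") == 0;
        shmdt(p);
    }
    shmctl(id, IPC_RMID, NULL);  /* short of a reboot, only this frees it */
    return ok;
}
```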
Yes, the memory is freed when the process terminates.
malloc()ed memory is definitely freed when the process ends, but not by calling free(). The process's whole memory mapping (presented to the process as linear RAM) is simply torn down and removed from the MMU tables.
In Linux (and most other Unixes) when you invoke a program from a command shell, it creates a new process to run that program in. All resources a process reserves, including heap memory, should get released back to the OS when your process terminates.
Memory 'allocation' has two distinct meanings that are commonly overlapped in questions like this.
memory is made available to the process by the brk/sbrk system call
memory is assigned to some purpose within the context of the program by malloc()
The sbrk call tells the kernel to get some more memory ready in case the process needs it. The memory becomes part of the process's address space, but physical pages are not actually assigned to it until it is first touched. This is called demand paging. Typically, the right to access this memory is never revoked by the operating system until the process exits. If memory becomes scarce, kswapd (part of the kernel) will shuffle the least used parts off to disk to make room. The kernel can also enforce a hard limit on the amount of memory in a process's working set if you would like :)
The second context is what you are talking about when you call malloc/free. malloc keeps a list of available memory and hands chunks out to your functions when requested. If it detects that it doesn't have enough memory on hand to meet a request, it calls sbrk to give the process access to more (glibc's malloc also uses mmap() directly for large requests).
When you look at the output of top, you will see two numbers for a process's memory usage: the virtual size and the resident size. The virtual size is the total amount the process has requested access to; the resident size is the amount currently held in physical RAM.
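Demand paging and the virtual/resident split can be watched from inside a process. This sketch is Linux-specific: it reads /proc/self/statm, whose first two fields are the total program size and the resident set size, both in pages. It assumes that a large malloc() returns untouched pages (glibc serves such requests with mmap()); the struct and function names are made up. The virtual size grows as soon as the block is allocated, but the resident size only grows once each page is written.

```c
#define _DEFAULT_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Reads the virtual size and resident set size, both in pages, from
   /proc/self/statm (Linux-specific). Returns 0 on success. */
static int read_statm(long *size, long *resident)
{
    FILE *f = fopen("/proc/self/statm", "r");
    if (f == NULL)
        return -1;
    int n = fscanf(f, "%ld %ld", size, resident);
    fclose(f);
    return n == 2 ? 0 : -1;
}

struct paging_report {
    long pages;            /* pages requested */
    long virt_growth;      /* virtual-size growth caused by malloc() alone */
    long resident_growth;  /* resident-size growth caused by touching the pages */
};

static int demand_paging_demo(size_t bytes, struct paging_report *rep)
{
    long page = sysconf(_SC_PAGESIZE);
    long v0, r0, v1, r1, v2, r2;
    if (page <= 0 || read_statm(&v0, &r0) != 0)
        return -1;

    volatile char *buf = malloc(bytes);  /* volatile: keep the writes below */
    if (buf == NULL)
        return -1;
    int err = read_statm(&v1, &r1);      /* allocated but not yet touched */
    for (size_t off = 0; off < bytes; off += (size_t)page)
        buf[off] = 1;                    /* first write to each page faults it in */
    err = err || read_statm(&v2, &r2);
    free((void *)buf);
    if (err)
        return -1;

    rep->pages = (long)(bytes / (size_t)page);
    rep->virt_growth = v1 - v0;
    rep->resident_growth = r2 - r1;
    return 0;
}
```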
From the program's point of view, the memory is released when it exits. It is not tied up in any way after that and can be reused by other processes.
Modern operating systems release all memory allocated by a process when that process terminates. The only exceptions would probably be very old operating systems (perhaps DOS?) and some embedded systems.

Resources