I am curious to know the details of how the kernel deals with concurrent writes / access to a shared mapping created through mmap. Looking in the man pages, all that I see for MAP_SHARED is the following:
Share this mapping. Updates to the mapping are visible to
other processes mapping the same region, and (in the case of
file-backed mappings) are carried through to the underlying
file. (To precisely control when updates are carried through
to the underlying file requires the use of msync(2).)
From my understanding, though, msync deals with synchronizing the page cache to the underlying storage device. It doesn't seem to cover the case where two processes have mmaped the same file. What happens when two processes write to the mapped region at the same time? If I create a shared mapping, how can I be sure that an unrelated process that happens to map the same file isn't doing concurrent writes?
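To make the scenario concrete, this is roughly what each of the two processes would be doing (file name and size are placeholders; the file is assumed to exist and be at least one page long):

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        /* Both processes open and map the same file with MAP_SHARED. */
        int fd = open("shared.dat", O_RDWR);   /* hypothetical file */
        char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

        p[0] += 1;   /* unsynchronized read-modify-write: racy if both processes do it */

        munmap(p, 4096);
        close(fd);
        return 0;
    }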
The kernel on its own isn't going to do anything to prevent multiple processes (or even threads within a single process) from trashing the shared memory contents. You have to use synchronization constructs (such as a mutex) and ensure that all code that accesses the shared memory agrees on how to go about it.
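A minimal sketch of one such construct, a process-shared pthread mutex placed inside the mapping itself, so every process that maps the region sees the same lock (the structure layout is an assumption, error handling is omitted, compile with -pthread):

    #include <pthread.h>

    /* Hypothetical layout of the shared region: the mutex lives in the
       mapping itself. */
    struct shared_region {
        pthread_mutex_t lock;
        int data;
    };

    /* Run once by whichever process creates and initializes the region. */
    void init_region(struct shared_region *r)
    {
        pthread_mutexattr_t attr;
        pthread_mutexattr_init(&attr);
        pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
        pthread_mutex_init(&r->lock, &attr);
        pthread_mutexattr_destroy(&attr);
    }

    /* Every accessor, in every process, agrees to take the lock first. */
    void update(struct shared_region *r)
    {
        pthread_mutex_lock(&r->lock);
        r->data++;
        pthread_mutex_unlock(&r->lock);
    }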
The way to protect against another process simply writing to the file is to use flock before mmapping the region. You then need some way for the process that performs the mapping to communicate the file descriptor to the second process.
The answer to this question suggests using Unix domain sockets for that purpose.
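A rough sketch of the flock-before-mmap approach (the file name is a placeholder; remember that flock locks are advisory, so they only guard against processes that also take the lock):

    #include <fcntl.h>
    #include <sys/file.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("mapped.dat", O_RDWR);   /* hypothetical file */
        flock(fd, LOCK_EX);                    /* blocks until we hold the lock */

        char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        p[0] = 42;                             /* exclusive access while the lock is held */
        munmap(p, 4096);

        flock(fd, LOCK_UN);
        close(fd);
        return 0;
    }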
Do file systems or disk drivers support the concept of a file-system modification fence, analogous to a memory fence in a CPU or shared-memory system?
A fence is an instruction that separates memory operations such that accesses issued after the fence do not become globally visible until all those issued before it have.
Is such a feature available for file content modification (and directory modification) in an efficient way? Of course, a simplistic solution would be to wait until all writes have reached stable storage; that, however, blocks the application and could be inefficient if many synchronization points are needed. It could also cause many small individual writes where one big write (comprising many writes separated by fences) would satisfy the same constraints on a fully journalled system, or where the big write is guaranteed to be atomic by the disk driver.
Can file system drivers be forced to order writes with file system access fences? Has the concept been explored?
CLARIFICATION
The context of the question is not multiple processes accessing the same files in a racy way, but one process saving data to a database such that an interruption of the process (even a machine crash) leaves at most one sequence of modifications (between two fences) partially written.
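For reference, the blocking baseline mentioned above ("wait until all writes are written to stable storage") looks roughly like this in C; each fdatasync acts as a durability barrier but stalls the caller until the data reaches the device:

    #include <unistd.h>

    /* Hypothetical batch writer: everything before the fdatasync is on stable
       storage before anything after it is issued, at the cost of blocking.
       Error handling is omitted. */
    void write_batch(int fd, const void *buf, size_t len)
    {
        write(fd, buf, len);   /* one "sequence of modifications"           */
        fdatasync(fd);         /* blocking "fence": wait for stable storage */
    }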
I am working with some older real-time control system code written using the RTAI extensions to Linux. I see four different mechanisms in use to create and share memory across process boundaries.
1) RTAI shared memory (rt_shm_alloc & rt_shm_free):
This uses an unsigned long globally unique value as a key for accessing the shared memory. Behind the scenes (from user space, at least) it uses an ioctl on a character device to allocate the memory, then mmap to make it available.
2) System V (ftok, shmget, shmat, shmctl, etc):
This uses ftok to generate a key that is used, along with an index value, to find and map a block of memory. I've not tried to see how this is actually implemented, but I'm assuming that somewhere behind the curtain it is using mmap.
3) Posix shared memory (shm_open, mmap, shm_unlink, etc):
This takes a string (with some restrictions on its content) and provides a file descriptor which can be used to mmap the linked block of memory. It seems to be supported by a virtual filesystem (a minimal sketch of this mechanism follows the list below).
4) direct use of mmap and character driver ioctl calls
Certain kernel modules provide an interface which directly supports using mmap to create and share a block of memory.
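As a concrete illustration of mechanism 3, here is a minimal POSIX shared memory sketch (the name and size are placeholders, error handling is omitted, and older glibc needs -lrt):

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        /* Creates (or opens) /dev/shm/io_table on a typical Linux setup. */
        int fd = shm_open("/io_table", O_CREAT | O_RDWR, 0600);  /* hypothetical name */
        ftruncate(fd, 4096);                                     /* size the region  */

        void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        /* ... read/write the shared block through p ... */

        munmap(p, 4096);
        close(fd);
        /* shm_unlink("/io_table") once the region is no longer needed */
        return 0;
    }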
All of these mechanisms seem to use mmap, either explicitly or implicitly, to alter the virtual memory subsystem and set up and manage the shared memory.
The question is: if a block of memory is shared using one of these systems, is there any way to set up an alias that will access the same memory via the other systems?
A use case:
I have two I/O subsystems. The first is implemented using a Linux kernel driver and exports its current I/O state in one chunk of shared memory created using the RTAI shared memory mechanism. The second is based on the EtherLab EtherCAT master, which uses a custom kernel module and directly uses ioctl and mmap to create a shared memory block.
I have 40 or so other systems which need access to certain I/O fields, but don't really need to know which I/O subsystem the data came from.
What I want is a way to open and access the different types of shared memory in a single coherent way, isolating the underlying implementation details from the users. Does such a mechanism exist?
I have altered the ethercat stack to use the RTAI shared memory mechanism to solve this instance, but that's just a temporary solution (Read: hack).
Let's say 4 simultaneous processes are running on a processor, and data needs to be copied from an HDFS (used with Spark) file system to a local directory. I want only one process to copy that data, while the other processes just wait for it to be copied by the first process.
So, basically, I want some kind of semaphore mechanism, where every process tries to acquire the semaphore before copying the data, but only one process gets it. All processes that failed to acquire the semaphore would then just wait for it to be released (the process that acquired it would release it once it is done copying), and once it is released they know the data has already been copied. How can I do that in Linux?
There are a lot of different ways to implement semaphores. The classical System V way is described in man semop; the POSIX alternative is covered more broadly in man sem_overview.
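A rough System V sketch of the "one process copies, the rest wait" pattern (the key path, the marker check, and the copy routine are placeholders, and the well-known initialization race of System V semaphores is only noted, not solved):

    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/sem.h>

    extern int  data_already_copied(void);   /* hypothetical: e.g. check a marker file */
    extern void copy_from_hdfs(void);        /* hypothetical copy routine */

    void copy_once(void)
    {
        key_t key   = ftok("/tmp/hdfs-copy.key", 'S');   /* assumed existing file */
        int   semid = semget(key, 1, IPC_CREAT | 0600);
        /* The creator must semctl(semid, 0, SETVAL, 1) exactly once; see
           semctl(2) and svipc(7) for how to handle that initialization race. */

        struct sembuf down = { .sem_num = 0, .sem_op = -1, .sem_flg = SEM_UNDO };
        struct sembuf up   = { .sem_num = 0, .sem_op = +1, .sem_flg = SEM_UNDO };

        semop(semid, &down, 1);        /* blocks until we hold the semaphore */
        if (!data_already_copied())    /* only the first holder actually copies */
            copy_from_hdfs();
        semop(semid, &up, 1);
    }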
You might still want to do something more easily scalable and modern. Many IPC frameworks (Apache has one or two of those, too!) have atomic IPC operations. These can be used to implement semaphores, but I'd be very very careful.
Generally, I regularly encourage people who write multi-process or multi-threaded applications to use C++ instead of C. It's often simpler to see where a shared state must be protected if your state is nicely encapsulated in an object which might do its own locking. Hence, I urge you to have a look at Boost's IPC synchronization mechanisms.
In addition to Marcus Müller's answer, you could use some file locking mechanism to synchronize.
File locking might not work very well on networked or remote file systems. You should use it on a locally mounted file system (e.g. Ext4, BTRFS, ...), not on a remote one (e.g. NFS).
For example, you might adopt the convention that your directory contains some .lock file (creating it if needed) and take an advisory lock on it with flock(2) (or POSIX lockf(3)) before accessing the directory.
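A minimal C sketch of that convention (the directory path is a placeholder, error handling is omitted):

    #include <fcntl.h>
    #include <sys/file.h>
    #include <unistd.h>

    void with_directory_locked(void)
    {
        /* Hypothetical path; the .lock file is created on first use. */
        int lfd = open("/var/data/mydir/.lock", O_CREAT | O_RDWR, 0644);
        flock(lfd, LOCK_EX);        /* blocks until the advisory lock is ours */

        /* ... access the directory contents ... */

        flock(lfd, LOCK_UN);
        close(lfd);
    }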
If using flock, you could even lock the directory directly....
The advantage of such a file-lock approach is that you could code shell scripts using flock(1).
And on Linux, you might also use inotify(7) (e.g. to be notified when some file is created in that directory).
Notice that most solutions are advisory, so they presuppose that every process accessing that directory follows the same convention (in other words, without more precautions such as flock(1), a careless user could access that directory, e.g. with a plain cp command, or files under it, while your locking process is accessing it). If you don't accept that, you might look for mandatory file locking (which is a feature of some Linux kernels and filesystems; AFAIK it is sort of deprecated).
BTW, you might read more about ACID properties and consider using some database, etc...
I'm writing a real time library which exports a standardized interface (VST) and is hosted by external applications.
The library must publish a table that is viewable by any thread in the same process (if it knows where to look); to be clear, this table must be viewable by ALL DLLs in the process space, if they know where to look.
Accessing the table must be fast. Virtual memory seems like overkill, and I've considered using a window handle and its message pump (and I still may), but I'd prefer an even faster method, if one is available.
Also, a shared data segment in the PE is something I'd like to avoid if possible. I think I'd almost rather use a window handle.
I'm not concerned with synchronization at the moment, I can handle that after the fact. I'd just like some suggestions for the fastest technique to publish the table within a process space.
You seem to be confused. All threads in the same process share the same address space, so you don't need any form of IPC: if a thread knows the address of the table, it can access it.
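One way to make the address known is to export the table (or a pointer to it) from the library and resolve it by name from the other DLLs; a hedged sketch, with the module and symbol names as placeholders:

    #include <windows.h>

    /* In the VST library: export the table so other modules can find it. */
    __declspec(dllexport) int g_table[256];      /* hypothetical table */

    /* In any other DLL loaded in the same process: */
    int *find_table(void)
    {
        HMODULE h = GetModuleHandleA("myvst.dll");   /* assumed module name */
        if (!h)
            return NULL;
        /* GetProcAddress also resolves exported data, not just functions. */
        return (int *)GetProcAddress(h, "g_table");
    }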
Use CreateFileMapping and pass INVALID_HANDLE_VALUE as the file handle.
This creates a named shared memory page (or pages) that is accessible by anyone who knows the name.
Don't be alarmed by the fact that the MSDN docs say it's backed by the paging file: it only goes to disk if your physical memory is exhausted, just like regular system memory.
In all regards, since it's backed by the hardware MMU, it's identical to regular memory.
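A short sketch of that approach (the mapping name and size are placeholders); other DLLs in the same process, or other processes, can open the same mapping by name:

    #include <windows.h>

    void *publish_table(void)
    {
        /* Backed by the paging file, not a real file on disk. */
        HANDLE h = CreateFileMappingA(INVALID_HANDLE_VALUE,
                                      NULL,                 /* default security  */
                                      PAGE_READWRITE,
                                      0, 4096,              /* size: high, low   */
                                      "Local\\MyVstTable"); /* hypothetical name */
        if (!h)
            return NULL;
        return MapViewOfFile(h, FILE_MAP_ALL_ACCESS, 0, 0, 0);
    }

    void *open_table(void)
    {
        /* Anyone else who knows the name maps the same pages. */
        HANDLE h = OpenFileMappingA(FILE_MAP_ALL_ACCESS, FALSE, "Local\\MyVstTable");
        if (!h)
            return NULL;
        return MapViewOfFile(h, FILE_MAP_ALL_ACCESS, 0, 0, 0);
    }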
I have a Linux server process that loads big resources on startup. The process forks on request. The resources loaded on startup are the biggest part and do not change at runtime. The forked child processes use read/write control structures to handle requests against the constant resources.
How do I find out how much memory is shared between the processes and how much is unique to each process? Or which pages have been duplicated (copy-on-write) because of write access from any of the processes?
You can get this information from the /proc/$pid/pagemap, /proc/kpagecount, and /proc/kpageflags virtual files in the proc filesystem. Access to the latter two requires root because they could leak privileged information about memory mappings of processes you don't own. Read Documentation/vm/pagemap.txt in the kernel docs for details on the data format.
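A hedged sketch of reading those files (bit layout as described in Documentation/vm/pagemap.txt; on recent kernels the PFN bits in pagemap, and kpagecount itself, are only readable with root privileges):

    #include <fcntl.h>
    #include <stdint.h>
    #include <unistd.h>

    /* Returns how many times the physical page backing vaddr (in the calling
       process) is mapped, or 0 if the page is not present. Error handling is
       omitted. */
    uint64_t page_map_count(uintptr_t vaddr)
    {
        long     psz   = sysconf(_SC_PAGESIZE);
        uint64_t entry = 0, count = 0;

        int pm = open("/proc/self/pagemap", O_RDONLY);
        pread(pm, &entry, sizeof entry, (vaddr / psz) * sizeof entry);
        close(pm);

        if (!(entry & (1ULL << 63)))                 /* bit 63: page present    */
            return 0;
        uint64_t pfn = entry & ((1ULL << 55) - 1);   /* bits 0-54: frame number */

        int kc = open("/proc/kpagecount", O_RDONLY);
        pread(kc, &count, sizeof count, pfn * sizeof count);
        close(kc);
        return count;   /* >1 means the page is mapped more than once, i.e. shared */
    }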