How can I serialize access to a directory in Linux? - c

Lets say 4 simultaneous processes are running on a processor, and data needs to be copied from an HDFS (used with Spark) file system to a local directory. Now I want only one process to copy that data, while the other processes just wait for that data to be copied by the first process.
So, basically, I want some kind of a semaphore mechanism, where every process tries to obtain semaphore to try copying the data, but only one process gets the semaphore. All processes who failed to acquire the semaphore would then just wait for the semaphore to be cleared (the process who was able to acquire the semaphore would clear it after its done with copying), and when its cleared they know the data has already been copied. How can I do that in Linux?

There's a lot of different ways to implement semaphores. The classical, System V semaphore way is described in man semop and more broadly in man sem_overview.
You might still want to do something more easily scalable and modern. Many IPC frameworks (Apache has one or two of those, too!) have atomic IPC operations. These can be used to implement semaphores, but I'd be very very careful.
Generally, I regularly encourage people who write multi-process or multi-threaded applications to use C++ instead of C. It's often simpler to see where a shared state must be protected if your state is nicely encapsulated in an object which might do its own locking. Hence, I urge you to have a look at Boost's IPC synchronization mechanisms.

In addition of Marcus Müller's answer, you could use some file locking mechanism to synchronize.
File locking might not work very well on networked or remote file systems. You should use it on a locally mounted file system (e.g. Ext4, BTRFS, ...) not on a remote one (e.g. NFS)
For example, you might adopt the convention that your directory contains (or else you'll create it) some .lock file and use an advisory lock flock(2) (or a POSIX lockf(3)) on that .lock file before accessing the directory.
If using flock, you could even lock the directory directly....
The advantage of using such a file lock approach is that you could code shell scripts using flock(1)
And on Linux, you might also use inotify(7) (e.g. to be notified when some file is created in that directory)
Notice that most solutions are (advisory, so) presupposing that every process accessing that directory is following some convention (in other words, without more precautions like using flock(1), a careless user could access that directory - e.g. with a plain cp command -, or files under it, while your locking process is accessing the directory). If you don't accept that, you might look for mandatory file locking (which is a feature of some Linux kernels & filesystems, AFAIK it is sort-of deprecated).
BTW, you might read more about ACID properties and consider using some database, etc...

Related

Can different processes create different files at the same time in the same directory?

I'm coding a server process that can be accessed by multiple client processes at the same time. Server process might need to create some files into a directory depending on the client.
As there can be many clients connected at the same time, I obviously have a dedicated server for each one (which is a thread), so my question is, do I need to add mutex handling (e.g pthread_mutex_lock / pthread_mutex_unlock when accessing the directory? (I can guarantee the same file won't be modified or created more than once, so my question is just regarding accessing the directory).
It is the operating system's responsibility to control access to shared resources. It would be a pretty poor OS that could not handle multiple files open simultaneously.
The only time you would need consider that perhaps is where you are implementing the filesystem itself on a bare-metal system lacking an OS, or at least an OS lacking an intrinsic filesystem of its own, which is pretty much restricted to embedded systems RTOS / kernels, or where you are writing the OS itself.
Accessing the same file concurrently may be a different matter. It is usually necessary to explicitly request/permit shared access of a file, but not a directory.

Cross-platform synchronization primitives which allow determining which PID is using them

I need to design a wrapper for a process synchronization primitive which acts like a semaphore with let's say limit 1 (so that only one client can have it locked at the same time). If this was the only requirement then I could just use named semaphores. But I'd also like to know, in the scenarios where a client can not lock the primitive, who actually has locked it. The best would be to know the locking process id. I see how I can achieve this on POSIX systems with semctl and GETPID but Windows does not expose anything like that. I am also aware that I can easily achieve this with files (e.g. opening a known file with shared read and non shared write permissions - when locking the client creates that file and writes it's PID so that the others can read it), but if possible I'd like to use actual OS API primitives instead of filesystem. Is this possible?
In Windows there is the Wait Chain Traversal which allows you to see who has locked what.

Posix named lock inter process what work with multi-thread application?

I need to create named lock that work correctly with multi-thread application for Linux. Each instance of application could use more than one named-lock with different names.
I know about fcntl/flock, but it doesn't work if try to lock twice from different thread of one application or from one thread.
I know about open(..., O_CREATE | O_EXCL), but this file-lock will not be removed if application was killed by signal KILL or was crashed with segmentation fault and there is needed manual removing of lock-files after restart application.
Any another ways?
If you just need to run under modern Linux, you could use file-private locks. If that's not an option, you'll have to build your own thread-safe locking abstraction on top of fcntl locks. SQLite is public domain and has implemented that, so you could look at that for inspiration. If GPLed code is okay: OpenJDK has another, incompatible implementation of the same thing.
O_EXCL does not perform locking (beyond the file creation step), so that's usually not helpful.
Other options are System V and POSIX semaphores, but these usually do not work as well as fcntl locks when processes day. A robust, process-shared mutex in a file mapping could be an option as well, but you need to be careful to stay within the POSIX semantics as far as serialization to disk is concerned (basically, you need to reinitialize the mutex every time the application starts from scratch, after a reboot or libc update).

Is it possible to write (or firstly, open) a disk file in two different programs simultaneously?

I need to update a log file according to the messages produced by two different modules which may be running simultaeously.
So is it possible to open and write a file simultaneously in two programs?
Sys Spec: SLES 11 x86_64.
You can do one of the following:
Use flock() (or a similar mechanism) to synchronize the writes on the open file descriptors (as already answered).
Use open() and close() (or similar) repeatedly on systems that support (or even enforce) exclusive open().
Depend on buffered output to send out log lines uninterrupted. This is often used with stderr logging, as a possible race condition isn't usually a problem here.
Use a logging service and only open() the file there. Other processes communicate with the logging service via IPC. You can use a custom logging service or a tool like syslog or journald. Both of them AFAIK support logging from non-root processes as well.
I would personally prefer the last option because its design is the cleanest one and it doesn't depend so much on OS-specific behavior. If your application consists of multiple processes started by the main process, then the main process may perform as the logging service as well and create pipes before spawning the child processes. If the processes are started separately, you can have a separate service that listens on a TCP/IP socket or (if your system supports it) a local domain socket.
Yes. A file can be opened by several processes/programs simulatneously. Multiple processes/programs can read & write in a file simultaneously but the end result of writing in the same file at the same time may be undefined. So it is better to use locks.
On Linux you can use: flocks

fcntl, lockf, which is better to use for file locking?

Looking for information regarding the advantages and disadvantages of both fcntl and lockf for file locking. For example which is better to use for portability? I am currently coding a linux daemon and wondering which is better suited to use for enforcing mutual exclusion.
What is the difference between lockf and fcntl:
On many systems, the lockf() library routine is just a wrapper around fcntl(). That is to say lockf offers a subset of the functionality that fcntl does.
Source
But on some systems, fcntl and lockf locks are completely independent.
Source
Since it is implementation dependent, make sure to always use the same convention. So either always use lockf from both your processes or always use fcntl. There is a good chance that they will be interchangeable, but it's safer to use the same one.
Which one you chose doesn't matter.
Some notes on mandatory vs advisory locks:
Locking in unix/linux is by default advisory, meaning other processes don't need to follow the locking rules that are set. So it doesn't matter which way you lock, as long as your co-operating processes also use the same convention.
Linux does support mandatory locking, but only if your file system is mounted with the option on and the file special attributes set. You can use mount -o mand to mount the file system and set the file attributes g-x,g+s to enable mandatory locks, then use fcntl or lockf. For more information on how mandatory locks work see here.
Note that locks are applied not to the individual file, but to the inode. This means that 2 filenames that point to the same file data will share the same lock status.
In Windows on the other hand, you can actively exclusively open a file, and that will block other processes from opening it completely. Even if they want to. I.e., the locks are mandatory. The same goes for Windows and file locks. Any process with an open file handle with appropriate access can lock a portion of the file and no other process will be able to access that portion.
How mandatory locks work in Linux:
Concerning mandatory locks, if a process locks a region of a file with a read lock, then other processes are permitted to read but not write to that region. If a process locks a region of a file with a write lock, then other processes are not permitted to read nor write to the file. What happens when a process is not permitted to access the part of the file depends on if you specified O_NONBLOCK or not. If blocking is set it will wait to perform the operation. If no blocking is set you will get an error code of EAGAIN.
NFS warning:
Be careful if you are using locking commands on an NFS mount. The behavior is undefined and the implementation widely varies whether to use a local lock only or to support remote locking.
Both interfaces are part of the POSIX standard, and nowadays both interfaces are available on most systems (I just checked Linux, FreeBSD, Mac OS X, and Solaris). Therefore, choose the one that fits better your requirements and use it.
One word of caution: it is unspecified what happens when one process locks a file using fcntl and another using lockf. In most systems these are equivalent operations (in fact under Linux lockf is implemented on top of fcntl), but POSIX says their interaction is unspecified. So, if you are interoperating with another process that uses one of the two interfaces, choose the same one.
Others have written that the locks are only advisory: you are responsible for checking whether a region is locked. Also, don't use stdio functions, if you want the to use the locking functionality.
Your main concerns, in this case (i.e. when "coding a Linux daemon and wondering which is better suited to use for enforcing mutual exclusion"), should be:
will the locked file be local or can it be on NFS?
e.g. can the user trick you into creating and locking your daemon's pid file on NFS?
how will the lock behave when forking, or when the daemon process is terminated with extreme prejudice e.g. kill -9?
The flock and fcntl commands behave differently in both cases.
My recommendation would be to use fcntl. You may refer to the File locking article on Wikipedia for an in-depth discussion of the problems involved with both solutions:
Both flock and fcntl have quirks which
occasionally puzzle programmers from
other operating systems. Whether flock
locks work on network filesystems,
such as NFS, is implementation
dependent. On BSD systems flock calls
are successful no-ops. On Linux prior
to 2.6.12 flock calls on NFS files
would only act locally. Kernel 2.6.12
and above implement flock calls on NFS
files using POSIX byte range locks.
These locks will be visible to other
NFS clients that implement
fcntl()/POSIX locks.1 Lock upgrades
and downgrades release the old lock
before applying the new lock. If an
application downgrades an exclusive
lock to a shared lock while another
application is blocked waiting for an
exclusive lock, the latter application
will get the exclusive lock and the
first application will be locked out.
All fcntl locks associated with a file
for a given process are removed when
any file descriptor for that file is
closed by that process, even if a lock
was never requested for that file
descriptor. Also, fcntl locks are not
inherited by a child process. The
fcntl close semantics are particularly
troublesome for applications which
call subroutine libraries that may
access files.
I came across an issue while using fcntl and flock recently that I felt I should report here as searching for either term shows this page near the top on both.
Be advised BSD locks, as mentioned above, are advisory. For those who do not know OSX (darwin) is BSD. This must be remembered when opening a file to write into.
To use fcntl/flock you must first open the file and get its ID. However if you have opened the file with "w" the file will instantly be zeroed out. If your process then fails to get the lock as the file is in use elsewhere, it will most likely return, leaving the file as 0kb. The process which had the lock will now find the file has vanished from underneath it, catastrophic results normally follow.
To remedy this situation, when using file locking, never open the file "w", but instead open it "a", to append. Then if the lock is successfully acquired, you can then safely clear the file as "w" would have, ie. :
fseek(fileHandle, 0, SEEK_SET);//move to the start
ftruncate(fileno((FILE *) fileHandle), 0);//clear it out
This was an unpleasant lesson for me.
As you're only coding a daemon which uses it for mutual exclusion, they are equivalent, after all, your application only needs to be compatible with itself.
The trick with the file locking mechanisms is to be consistent - use one and stick to it. Varying them is a bad idea.
I am assuming here that the filesystem will be a local one - if it isn't, then all bets are off, NFS / other network filesystems handle locking with varying degrees of effectiveness (in some cases none)

Resources