file lock leases via NFS v4 in C - c

does anybody know how to use the fancy file locking features of NFS v4? (described in e.g. About the NFS protocol (scroll down)). supposedly NFS v4 supports file lock leasing with a 45 second lifetime. I would like to believe that the linux kernel (I'm using gentoo 2.6.30) happily takes care of these details, and I can use fcntl() and it all comes out in the wash. I am guessing, however, that I have to do something special somehow to get, maintain, and release the lock lease. all help appreciated.

you are right, fcntl takes care of all this business for you. The lease management is done by the nfs client(kernel module in linux)

Related

How can I serialize access to a directory in Linux?

Lets say 4 simultaneous processes are running on a processor, and data needs to be copied from an HDFS (used with Spark) file system to a local directory. Now I want only one process to copy that data, while the other processes just wait for that data to be copied by the first process.
So, basically, I want some kind of a semaphore mechanism, where every process tries to obtain semaphore to try copying the data, but only one process gets the semaphore. All processes who failed to acquire the semaphore would then just wait for the semaphore to be cleared (the process who was able to acquire the semaphore would clear it after its done with copying), and when its cleared they know the data has already been copied. How can I do that in Linux?
There's a lot of different ways to implement semaphores. The classical, System V semaphore way is described in man semop and more broadly in man sem_overview.
You might still want to do something more easily scalable and modern. Many IPC frameworks (Apache has one or two of those, too!) have atomic IPC operations. These can be used to implement semaphores, but I'd be very very careful.
Generally, I regularly encourage people who write multi-process or multi-threaded applications to use C++ instead of C. It's often simpler to see where a shared state must be protected if your state is nicely encapsulated in an object which might do its own locking. Hence, I urge you to have a look at Boost's IPC synchronization mechanisms.
In addition of Marcus Müller's answer, you could use some file locking mechanism to synchronize.
File locking might not work very well on networked or remote file systems. You should use it on a locally mounted file system (e.g. Ext4, BTRFS, ...) not on a remote one (e.g. NFS)
For example, you might adopt the convention that your directory contains (or else you'll create it) some .lock file and use an advisory lock flock(2) (or a POSIX lockf(3)) on that .lock file before accessing the directory.
If using flock, you could even lock the directory directly....
The advantage of using such a file lock approach is that you could code shell scripts using flock(1)
And on Linux, you might also use inotify(7) (e.g. to be notified when some file is created in that directory)
Notice that most solutions are (advisory, so) presupposing that every process accessing that directory is following some convention (in other words, without more precautions like using flock(1), a careless user could access that directory - e.g. with a plain cp command -, or files under it, while your locking process is accessing the directory). If you don't accept that, you might look for mandatory file locking (which is a feature of some Linux kernels & filesystems, AFAIK it is sort-of deprecated).
BTW, you might read more about ACID properties and consider using some database, etc...

Are there any distributed high-availability filesystems (for Linux) that are actively-developed?

Are there any distributed, high-availability filesystems (for Linux) that are actively-developed?
Let me be more specific:
Distributed means it deals gracefully with client-to-server latencies like you'd find over the public worldwide internet (300ms and up being commonplace) and occasional connectivity flakiness. This means really good client-side caching (i.e. with callbacks) is required. NFS does not do this. It also means encryption of on-the-wire data without needing an IPSEC VPN.
High availability means that data can be stored on multiple servers and the client is smart enough to try another server if it encounters problems. Putting that intelligence in the client is really important, and it's why this sort of thing can't just be grafted onto NFS. At a minimum this needs to be possible for read-only data. It would be nice for read-write data but I know that's hard.
Filesystem means a kernel driver exporting a POSIX interface and permissions and access control are enforced in the face of untrustworthy clients. SAN systems often assume the clients are trustworthy.
I'm an OpenAFS refugee. I love it but at this point I can no longer accept its requirement that all the file servers effectively "have root" on all other file servers. The proprietary disk format and overhead of having to run Kerberos infrastructure (which I wouldn't otherwise need) are also becoming increasingly problematic.
Are there any systems other than OpenAFS with these properties? Intermezzo and Coda probably qualify but aren't active projects any longer. Lustre is cool but seems to be designed for ultra-low-latency data centres. Ceph is awesome but not really a filesystem, more of a thing that runs under a filesystem (yes, there's CephFS, but it's really a showcase for Ceph and explicitly not production-ready and there's no timetable for that). Tahoe-LAFS is cool but it and GoogleFS aren't really filesystems in that they don't export a POSIX interface through a kernel module. My understanding of GFS (Global Filesystem) is that the clients can manipulate the on-disk data structures directly, so they're implicitly root-level trusted (and this is part of why it's fast) -- correct me if I'm wrong here.
Needs to be open source since I can't afford to have my data locked up in something proprietary. I don't mind paying for software, but I can't be held hostage in this situation.
Thanks,
First of all you can use local file system (mounted with -o user_xattr) to cache NFS (mounted with -o fsc) using cachefilesd (provided by cachefilesd package on Debian) through fscache facility.
Although file system that you are looking for probably do not exist, IMHO two projects came pretty close with fairly good FUSE client implementations:
LizardFS (GPL-3 licensed, hosted at Github), fork of now proprietary MooseFS.
Gfarm file system (BSD/Apache-2.0, hosted at SourceForge)
After evaluating Ceph for quite a while I came to conclusion that it is flawed (with no hope for improvement in the foreseeable future) and not suitable for serious use. XtreemFS is a disappointment too. I hope that upcoming OrangeFS version 3 (with promised data integrity checks) might not be too bad but that's remains to be seen...

Is it possible to interrupt a process and checkpoint it to resume it later on?

Lets say, you have an application, which is consuming up all the computational power. Now you want to do some other necessary work. Is there any way on Linux, to interrupt that application and checkpoint its state, so that later on it could be resumed from the state it was interrupted?
Especially I am interested in a way, where the application could be stopped and restarted on another machine. Is that possible too?
In general terms, checkpointing a process is not entirely possible (because a process is not only an address space, but also has other resources likes file descriptors, and TCP/IP sockets ...).
In practice, you can use some checkpointing libraries like BLCR etc. With certain limiting conditions, you might be able to migrate a checkpoint image from one system to another one (very similar to the source one: same kernel, same versions of libraries & compilers, etc.).
Migrating images is also possible at the virtual machine level. Some of them are quite good for that.
You could also design and implement your software with your own checkpointing machinery. Then, you should think of using garbage collection techniques and terminology. Look also into Emacs (or Xemacs) unexec.c file (which is heavily machine dependent).
Some languages implementation & runtime have checkpointing primitives. SBCL (a free Common Lisp implementation) is able to save a core image and restart it later. SML/NJ is able to export an image. Squeak (a Smalltalk implementation) also has such ability.
As an other example of checkpointing, the GCC compiler is actually able to compile a single *.h header (into a pre-compiled header file which is a persistent image of GCC heap) by using persistence techniques.
Read more about orthogonal persistence. It is also a research subject. serialization is also relevant (and you might want to use textual formats à la JSON, YAML, XML, ...). You might also use hibernation techniques (on the whole system level).
From the man pages man kill
Interrupting a process requires two steps:
To stop
kill -STOP <pid>
and
To continue
kill -CONT <pid>
Where <pid> is the process-id.
Type: Control + Z to suspend a process (it sends a SIGTSTP)
then bg / fg to resume it in background or in foreground
Checkingpointing an individual process is fundamentally impossible on POSIX. That's because processes are not independent; they can interact. If nothing else, a process has a unique process ID, which it might have stored somewhere internally, and if you resume it with a different process ID, all hell could break loose. This is especially true if the process uses any kind of locks/synchronization primitives. Of course you also can't resume the process with the same process ID it originally had, since that might be taken by a new process.
Perhaps you could solve the problem by making process (and thread) ids 128-bit or so, such that they're universally unique...
On linux it is achivable by sending this process STOP signal. Leter on you resume it by sending CONT signal. Please refer to kill manual.

libevent2 and file io

I've been toying around with libevent2, and I've got reading files working, but it blocks. Is there any way to make file reading not block just within libevent. Or, do I need to use another IO library for files and make it pump events that I need.
fd = open("/tmp/hello_world",O_RDONLY);
evbuffer_read(buf,fd,4096);
The O_NONBLOCK flag doesn't work either.
In POSIX disks are considered "fast devices" meaning that they always block (which is why O_NONBLOCK didn't work for you). Only network sockets can be non-blocking.
There is POSIX AIO, but e.g. on Linux that comes with a bunch of restrictions making it unsuitable for general-purpose usage (only for O_DIRECT, I/O must be sector-aligned).
If you want to integrate normal POSIX IO into an asynchronous event loop it seems people resort to thread pools, where the blocking syscalls are executed in the background by one of the worker threads. One example of such a library is libeio
No.
I've yet to see a *nix where you can do non-blocking i/o on regular files without resorting to the more special AIO library (Though for some, e.g. solaris, O_NONBLOCK has an effect if e.g. someone else holds a lock on the file)
Please take a look at libuv, which is used by node.js / io.js: https://github.com/libuv/libuv
It's a good alternative to libeio, because it does perform well on all major operating systems, from Windows to the BSDs, Mac OS X and of course Linux.
It supports I/O completion ports, which makes it a better choice than libeio if you are targeting Windows.
The C code is also very readable and I highly recommend this tutorial: https://nikhilm.github.io/uvbook/

fcntl, lockf, which is better to use for file locking?

Looking for information regarding the advantages and disadvantages of both fcntl and lockf for file locking. For example which is better to use for portability? I am currently coding a linux daemon and wondering which is better suited to use for enforcing mutual exclusion.
What is the difference between lockf and fcntl:
On many systems, the lockf() library routine is just a wrapper around fcntl(). That is to say lockf offers a subset of the functionality that fcntl does.
Source
But on some systems, fcntl and lockf locks are completely independent.
Source
Since it is implementation dependent, make sure to always use the same convention. So either always use lockf from both your processes or always use fcntl. There is a good chance that they will be interchangeable, but it's safer to use the same one.
Which one you chose doesn't matter.
Some notes on mandatory vs advisory locks:
Locking in unix/linux is by default advisory, meaning other processes don't need to follow the locking rules that are set. So it doesn't matter which way you lock, as long as your co-operating processes also use the same convention.
Linux does support mandatory locking, but only if your file system is mounted with the option on and the file special attributes set. You can use mount -o mand to mount the file system and set the file attributes g-x,g+s to enable mandatory locks, then use fcntl or lockf. For more information on how mandatory locks work see here.
Note that locks are applied not to the individual file, but to the inode. This means that 2 filenames that point to the same file data will share the same lock status.
In Windows on the other hand, you can actively exclusively open a file, and that will block other processes from opening it completely. Even if they want to. I.e., the locks are mandatory. The same goes for Windows and file locks. Any process with an open file handle with appropriate access can lock a portion of the file and no other process will be able to access that portion.
How mandatory locks work in Linux:
Concerning mandatory locks, if a process locks a region of a file with a read lock, then other processes are permitted to read but not write to that region. If a process locks a region of a file with a write lock, then other processes are not permitted to read nor write to the file. What happens when a process is not permitted to access the part of the file depends on if you specified O_NONBLOCK or not. If blocking is set it will wait to perform the operation. If no blocking is set you will get an error code of EAGAIN.
NFS warning:
Be careful if you are using locking commands on an NFS mount. The behavior is undefined and the implementation widely varies whether to use a local lock only or to support remote locking.
Both interfaces are part of the POSIX standard, and nowadays both interfaces are available on most systems (I just checked Linux, FreeBSD, Mac OS X, and Solaris). Therefore, choose the one that fits better your requirements and use it.
One word of caution: it is unspecified what happens when one process locks a file using fcntl and another using lockf. In most systems these are equivalent operations (in fact under Linux lockf is implemented on top of fcntl), but POSIX says their interaction is unspecified. So, if you are interoperating with another process that uses one of the two interfaces, choose the same one.
Others have written that the locks are only advisory: you are responsible for checking whether a region is locked. Also, don't use stdio functions, if you want the to use the locking functionality.
Your main concerns, in this case (i.e. when "coding a Linux daemon and wondering which is better suited to use for enforcing mutual exclusion"), should be:
will the locked file be local or can it be on NFS?
e.g. can the user trick you into creating and locking your daemon's pid file on NFS?
how will the lock behave when forking, or when the daemon process is terminated with extreme prejudice e.g. kill -9?
The flock and fcntl commands behave differently in both cases.
My recommendation would be to use fcntl. You may refer to the File locking article on Wikipedia for an in-depth discussion of the problems involved with both solutions:
Both flock and fcntl have quirks which
occasionally puzzle programmers from
other operating systems. Whether flock
locks work on network filesystems,
such as NFS, is implementation
dependent. On BSD systems flock calls
are successful no-ops. On Linux prior
to 2.6.12 flock calls on NFS files
would only act locally. Kernel 2.6.12
and above implement flock calls on NFS
files using POSIX byte range locks.
These locks will be visible to other
NFS clients that implement
fcntl()/POSIX locks.1 Lock upgrades
and downgrades release the old lock
before applying the new lock. If an
application downgrades an exclusive
lock to a shared lock while another
application is blocked waiting for an
exclusive lock, the latter application
will get the exclusive lock and the
first application will be locked out.
All fcntl locks associated with a file
for a given process are removed when
any file descriptor for that file is
closed by that process, even if a lock
was never requested for that file
descriptor. Also, fcntl locks are not
inherited by a child process. The
fcntl close semantics are particularly
troublesome for applications which
call subroutine libraries that may
access files.
I came across an issue while using fcntl and flock recently that I felt I should report here as searching for either term shows this page near the top on both.
Be advised BSD locks, as mentioned above, are advisory. For those who do not know OSX (darwin) is BSD. This must be remembered when opening a file to write into.
To use fcntl/flock you must first open the file and get its ID. However if you have opened the file with "w" the file will instantly be zeroed out. If your process then fails to get the lock as the file is in use elsewhere, it will most likely return, leaving the file as 0kb. The process which had the lock will now find the file has vanished from underneath it, catastrophic results normally follow.
To remedy this situation, when using file locking, never open the file "w", but instead open it "a", to append. Then if the lock is successfully acquired, you can then safely clear the file as "w" would have, ie. :
fseek(fileHandle, 0, SEEK_SET);//move to the start
ftruncate(fileno((FILE *) fileHandle), 0);//clear it out
This was an unpleasant lesson for me.
As you're only coding a daemon which uses it for mutual exclusion, they are equivalent, after all, your application only needs to be compatible with itself.
The trick with the file locking mechanisms is to be consistent - use one and stick to it. Varying them is a bad idea.
I am assuming here that the filesystem will be a local one - if it isn't, then all bets are off, NFS / other network filesystems handle locking with varying degrees of effectiveness (in some cases none)

Resources