Exclusive/Synchronized access to a file (Queuing)

We have a shared folder which contains some files that need to be processed.
We also have three UNIX servers, each of which runs a shell script that takes and processes one file at a time. At the end of the script the file is moved away.
The three UNIX servers don't communicate with each other, and they are not aware of each other.
In your opinion, what is the best way to guarantee that each file will be processed exactly once, without raising concurrent-access issues or errors?

One way or another you need some kind of file locking mechanism. Some of the possibilities:
First, you can create a temporary lock file for every file being worked on. For example, for file name.ext you would create name.ext.lock just before you start processing it. If this file already exists, i.e. its creation fails with a "file exists" error, it means somebody is already working on that file, and you shouldn't touch it.
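A minimal sketch of that check in C (name.ext.lock is taken from the example above; the same atomic create-if-absent can usually be done from a shell script with a noclobber redirect, and how reliably O_EXCL behaves depends on the filesystem backing the share):

/* Sketch: atomically create name.ext.lock before touching name.ext.
 * O_CREAT | O_EXCL fails with EEXIST if another server got there first. */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int lockfd = open("name.ext.lock", O_CREAT | O_EXCL | O_WRONLY, 0644);
    if (lockfd < 0) {
        if (errno == EEXIST)
            fprintf(stderr, "name.ext is already being processed elsewhere\n");
        else
            perror("open lock file");
        return 1;
    }

    /* ... process name.ext and move it away ... */

    close(lockfd);
    unlink("name.ext.lock");   /* release the lock for good */
    return 0;
}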
Second, you could use advisory locks. Advisory locking doesn't work on every type of file share, and it only has a libc-level interface, so it can't readily be used from plain shell scripting. I suggest digging into the manual of the flock libc call.
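For completeness, a small sketch of the flock call (advisory only: it constrains nothing unless every cooperating process also takes the lock, and it may not work on some network shares; the file name is the one from the example above):

/* Sketch: advisory whole-file lock with flock(2). */
#include <fcntl.h>
#include <stdio.h>
#include <sys/file.h>
#include <unistd.h>

int main(void)
{
    int fd = open("name.ext", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    /* LOCK_NB: fail immediately instead of blocking if someone holds the lock. */
    if (flock(fd, LOCK_EX | LOCK_NB) < 0) {
        perror("flock");   /* EWOULDBLOCK: another server is processing the file */
        return 1;
    }

    /* ... process the file ... */

    flock(fd, LOCK_UN);    /* the lock is also dropped when fd is closed */
    close(fd);
    return 0;
}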
Third, and this is the hardest and most deeply Unix-specific option: mandatory locking. Mandatory locking means that the locks are effective even against processes that know nothing about them. You can read more about it here: Mandatory file lock on linux
In your place I would use the first approach, if I could modify how the processing works (for example, if I could hook it with a script, or if I were developing the processing script myself). If not, you probably need the third, although it doesn't always work.

Related

C - Shared library with management of the same file

I'd like to create a shared library in C for Linux, some abstract implementation of database management. The shared library would be responsible for reading the file containing the database and writing differences into it. But I have no idea how to handle the multiprocessing problems of file handling in this case, e.g.: App1 tries to write differences into the database file while App2 currently has the database file open for reading. In this example I'd like to inform App1 that the file is currently open and delay the write sequence until App2 has finished reading the database file.
I was thinking of using some mutual exclusion mechanism, or a global enum variable to manage the current file status, but after reading some posts I understood that every application using a shared library creates its own copy of it in memory, and they don't share any memory section while running.
The shared library would be responsible for reading the file containing the database and writing differences into it.
That is possible, but it is quite a complicated solution.
You would need to make sure that multiple processes do not interfere with each other. It is possible to do so with file locks (see man flock) and record locking (see man fcntl), but you must ensure that multiple processes update disjoint file chunks "in place" (without resizing the file).
This is also prone to deadlocks: if one of the processes takes a lock on a region and then goes into an infinite loop, other processes may get stuck as well.
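A small sketch of the record-locking variant with fcntl (the file name, offset and length here are hypothetical; the point is that each process locks only the region it updates in place):

/* Sketch: byte-range (record) locking with fcntl(2). */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd = open("database.dat", O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    struct flock fl = {
        .l_type   = F_WRLCK,    /* exclusive write lock (F_RDLCK for readers) */
        .l_whence = SEEK_SET,
        .l_start  = 4096,       /* hypothetical record offset */
        .l_len    = 512,        /* lock only this record, not the whole file */
    };

    if (fcntl(fd, F_SETLKW, &fl) < 0) {   /* F_SETLKW blocks until the region is free */
        perror("fcntl");
        return 1;
    }

    /* ... rewrite the 512-byte record in place (no resizing) ... */

    fl.l_type = F_UNLCK;
    fcntl(fd, F_SETLK, &fl);
    close(fd);
    return 0;
}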
A much simpler solution involves client-server, where the server implements all writes and clients send it read and modify requests.
P.S. There are existing libraries implementing either approach. You will likely save yourself several months of development time using existing solutions.

c/c++ flock as mutex on linux not robust to file delete

File locking in C using flock is commonly used to implement cross-platform cooperative inter-process locking/mutex.
I have an implementation (mac/linux/win) that works well, but it is not robust to file deletion (at least under Linux).
One or more processes have started creating and using a lockfile (/tmp/lockfile) and cooperatively interlock on a shared resource with it.
Some time later, I manually delete the lockfile (rm /tmp/lockfile). The running processes keep on cooperating with each other, but any new process that wants to start using the same resource lock and lockfile breaks the overall mutex logic: it will create a new version of /tmp/lockfile that is different from the one already in use in the already-running processes.
What can be done to prevent the lockfile from being unlinked while any process has it open?
What other solutions can be used?
I can't use a semaphore because I need the lock to self-release if the owning process crashes.
You can indeed use a semaphore. Linux provides the flag SEM_UNDO, which will undo the semaphore operation on process termination (see semop(2)).
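A sketch of that with a System V semaphore (the ftok key and the initialisation step are simplified assumptions; a real program needs a proper creation/initialisation handshake):

/* Sketch: one-slot SysV semaphore used as a mutex. SEM_UNDO makes the kernel
 * undo our "down" operation if the process terminates without releasing it. */
#include <stdio.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>

union semun { int val; struct semid_ds *buf; unsigned short *array; };

int main(void)
{
    key_t key = ftok("/tmp", 'L');                 /* hypothetical well-known key */
    int semid = semget(key, 1, IPC_CREAT | 0666);
    if (semid < 0) { perror("semget"); return 1; }

    union semun arg = { .val = 1 };
    semctl(semid, 0, SETVAL, arg);                 /* naive init: races if several
                                                      processes create it at once */

    struct sembuf down = { .sem_num = 0, .sem_op = -1, .sem_flg = SEM_UNDO };
    struct sembuf up   = { .sem_num = 0, .sem_op = +1, .sem_flg = SEM_UNDO };

    if (semop(semid, &down, 1) < 0) { perror("semop"); return 1; }
    /* ... critical section; if we crash here, the kernel releases it for us ... */
    semop(semid, &up, 1);
    return 0;
}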
The rm command does not actually delete files. Rather, it unlinks them from the file system, as if via the unlink(2) syscall. The files will not be removed from disk as long as any process holds them open (or any other hard links to them exist), and processes that do hold them open continue to refer to the same file, even though it no longer appears in directory listings. Nothing prevents another file from being created and linked to the file system at the same place as the previous one, but that is an altogether different file. This behavior is desirable for consistent program behavior, and some programs intentionally use it to their advantage for managing temporary files.
There is nothing you can do to prevent a process with sufficient privileges from unlinking the lock file. Any process that has sufficient privilege to create the lock file has sufficient privilege to unlink it, with the consequences you describe. One usually mitigates this problem by creating a temporary file with an unpredictable name for use with flock(), so that the file name or an open file handle must be exchanged between processes that want to synchronize actions by locking that file. For the particular case of child processes, you can rely on the child inheriting open file descriptors from its parent to enable them to get at the lock file even if it has been unlinked.
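A sketch of the unpredictable-name mitigation (the /tmp/mylock.XXXXXX prefix and the way the name is handed to peers are assumptions; child processes simply inherit the open descriptor):

/* Sketch: lock file with an unpredictable name, created with mkstemp().
 * The name (or the open descriptor, for children) is what cooperating
 * processes exchange; a well-known /tmp name is what made deletion easy. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/file.h>
#include <unistd.h>

int main(void)
{
    char path[] = "/tmp/mylock.XXXXXX";     /* hypothetical prefix */
    int fd = mkstemp(path);
    if (fd < 0) { perror("mkstemp"); return 1; }
    printf("lock file: %s\n", path);        /* hand this name to peers somehow */

    if (flock(fd, LOCK_EX) < 0) { perror("flock"); return 1; }
    /* ... critical section ...
     * A child created with fork() inherits fd and can keep using it for
     * flock() even after the file has been unlinked from /tmp. */
    flock(fd, LOCK_UN);
    close(fd);
    return 0;
}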
On the other hand, if you are relying on a lock file with a well-known name then the solution may be to create the file in advance, make root its owner and the owner of every directory in the path of hard links leading to it, and deny all other users write access to the file and the directories. You could consider further wrapping it up with mandatory access controls (SELinux policy) if you wanted to be even more careful.

A way to make a file contents snapshot in Linux

What is the best way to create an "atomic" snapshot of file contents in Linux? Emphasis is not on performance, but on getting contents as a whole.
I may think of using sendfile(2) (since 2.6.33) or splice(2), but neither gives any indication of operation atomicity. Both run entirely in kernel space, but at least sendfile(2) implies it's using mmap(2), and mmap gives no guarantee that writes to the same region mmapped as MAP_SHARED in other processes won't be visible even through a MAP_PRIVATE mapping (probably they will, because those are the same pages).
Given that these functions are written with performance in mind and sendfile(2) is optimized to be used with DMA, I can only assume that they just copy memory in some background kernel thread, and it's quite possible that other operations may also affect the data being copied.
So the only possible solution I see is to place a read lease with fcntl(2) (F_SETLEASE) and copy the file as normal, but if someone opens it for writing, either try to "rush" it (very reliable, I know) and beat the timer, or just give up and try again later. Is that correct?
So the only possible [filesystem-independent] solution I see is to place a read lease with fcntl(2) (F_SETLEASE) and copy the file as normal, but if someone opens it for writing, either try to "rush" it (very reliable, I know) and beat the timer, or just give up and try again later. Is that correct?
Almost; there is also fanotify. Plus, as mentioned in a comment, there are some filesystem-specific options, and some possibilities only available in certain configurations.
The lease break timer is configurable, /proc/sys/fs/lease_break_time in seconds, and the default is 45 seconds.
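A sketch of the lease-then-copy approach (file names are placeholders; taking a read lease requires owning the file or CAP_LEASE, and here SIGIO simply flags that the copy should be abandoned and retried):

/* Sketch: take a read lease, copy the file, drop the lease. A writer opening
 * the file triggers SIGIO; we then have up to lease_break_time seconds. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static volatile sig_atomic_t lease_broken = 0;
static void on_sigio(int sig) { (void)sig; lease_broken = 1; }

int main(void)
{
    signal(SIGIO, on_sigio);                   /* lease-break notification */

    int fd = open("data.file", O_RDONLY);      /* hypothetical file name */
    if (fd < 0) { perror("open"); return 1; }

    if (fcntl(fd, F_SETLEASE, F_RDLCK) < 0) {  /* fails if a writer has it open */
        perror("F_SETLEASE");
        return 1;
    }

    int out = open("data.snapshot", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0) { perror("open snapshot"); return 1; }

    char buf[65536];
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0 && !lease_broken)
        write(out, buf, (size_t)n);            /* "rush it" before the timer runs out */

    fcntl(fd, F_SETLEASE, F_UNLCK);            /* release the lease */
    close(out);
    close(fd);
    return lease_broken ? 1 : 0;               /* non-zero: give up and retry later */
}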
"Just give up and try later" is also a bit defeatist; you do have ways to monitor when the snapshot might work. Consider placing an inotify IN_CLOSE_WRITE and IN_CLOSE_NOWRITE watch on the file, and try the snapshot whenever you receive such an event.
fanotify:
For a few years now, I've been monitoring the progress of Linux fanotify, in the hopes that it would grow enough features that it could be used for automagic file versioning. Essentially, whenever someone opens the file with write permissions, the current file would be snapshotted to temporary storage, marked with some metadata (timestamp, real human user (backtracked through sudo/su), and so on). When that descriptor is closed, another snapshot is taken, and a helper thread/process diffs the two, annotating the changes (or even pushing it to git).
It is limited to local filesystems, but with 2.6.37 and later kernels (including 3.x), the interface is sufficient for specific files, or an entire mount. In your case, the fanotify interface allows similar features to file leases, except for local filesystems only, but you can simply deny any accesses during the snapshot. (One can argue whether that is a good idea at all, especially if the file to be snapshotted is a system or configuration file; many programmers overlook error checking, because "some files just have to be always accessible, or your system is broken".)
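As an illustration of the deny-during-snapshot idea, a sketch with fanotify permission events (local filesystems only, needs CAP_SYS_ADMIN; data.file is a placeholder, and the listener's own accesses are allowed so the snapshotting process does not block itself):

/* Sketch: intercept opens of one file with FAN_OPEN_PERM and deny them while a
 * snapshot is in progress; our own opens are always allowed. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/fanotify.h>
#include <unistd.h>

int main(void)
{
    int fan = fanotify_init(FAN_CLASS_CONTENT, O_RDONLY);
    if (fan < 0) { perror("fanotify_init"); return 1; }

    if (fanotify_mark(fan, FAN_MARK_ADD, FAN_OPEN_PERM, AT_FDCWD, "data.file") < 0) {
        perror("fanotify_mark");
        return 1;
    }

    struct fanotify_event_metadata events[16];
    for (;;) {
        ssize_t len = read(fan, events, sizeof events);
        if (len <= 0) break;

        struct fanotify_event_metadata *md = events;
        while (FAN_EVENT_OK(md, len)) {
            if (md->mask & FAN_OPEN_PERM) {
                struct fanotify_response resp = {
                    .fd = md->fd,
                    /* allow ourselves; deny everyone else while snapshotting */
                    .response = (md->pid == getpid()) ? FAN_ALLOW : FAN_DENY,
                };
                write(fan, &resp, sizeof resp);
            }
            if (md->fd >= 0)
                close(md->fd);
            md = FAN_EVENT_NEXT(md, len);
        }
    }
    close(fan);
    return 0;
}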
As far as my change monitoring goes, fanotify should now have all sufficient features, but only if an entire mount is monitored. I was hoping to monitor configuration files on multi-admin clusters, but those files reside on the same mount as all system libraries and binaries do, so the monitoring causes considerable overhead. So much so, that it seems more appropriate to just modify SSH configuration, console configuration (getty etc.), sudo configuration, and possibly su, to always include a dynamic library that interposes file access syscalls, and basically does the versioning on behalf of the user. This way service binaries are not affected, only user actions are monitored.
This might work under some circumstances:
1. (Optional) Do something to prevent new processes from opening the file:
   a/ rename the file
   b/ restrict file permissions
2. Find all existing readers/writers via lsof and kill -STOP them
3. Do your snapshot
4. kill -CONT all readers/writers
5. (Optional) Restore action 1.

Wait for file to be unlocked - Windows

I'm writing a TFTP server program for university, which needs exclusive access to the files it opens for reading. It can therefore be configured so that, if a file is locked by another process, it waits for the file to become unlocked.
Is there any way on Win32 to wait for a file become unlocked without creating a handle for it first?
The reason I ask, is that if another process calls CreateFile() with a dwShareMode that is incompatible to the one my process uses, I won't even be able to get a file handle to use for waiting on the lock using LockFileEx().
Thanks for your help in advance!
If you take a look at the Stack Overflow questions What Win32 API can be used to find the process that has a given file open? and SYSTEM_HANDLE_INFORMATION structure, you will find links to code that can be used to enumerate processes and all open handles of each running process. This information can be used to obtain a HANDLE to the process that has the file open as well as its HANDLE for the file. You would then use DuplicateHandle() to create a copy of the file HANDLE, but in the TFTP process' handle table. The duplicated HANDLE could then be used by the TFTP process with LockFileEx().
This solution relies on an internal function, NtQuerySystemInformation(), and an undocumented system information class value that can be used to enumerate open handles. Note that this feature of NtQuerySystemInformation() "may be altered or unavailable in future versions of Windows". You might want to use an SEH handler to guard against access violations were that to happen.
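A sketch of the last step, assuming the enumeration described above has already yielded the owning process id and its handle value for the file (both hypothetical inputs here); note that whether LockFileEx blocks or completes asynchronously depends on how the handle was originally opened:

/* Sketch: duplicate the file handle out of the owning process and wait on it. */
#include <windows.h>

int wait_for_file_lock(DWORD ownerPid, HANDLE remoteHandleValue)
{
    HANDLE owner = OpenProcess(PROCESS_DUP_HANDLE, FALSE, ownerPid);
    if (owner == NULL) return -1;

    HANDLE local = NULL;
    if (!DuplicateHandle(owner, remoteHandleValue,
                         GetCurrentProcess(), &local,
                         0, FALSE, DUPLICATE_SAME_ACCESS)) {
        CloseHandle(owner);
        return -1;
    }

    OVERLAPPED ov = {0};                 /* lock the whole file from offset 0 */
    BOOL ok = LockFileEx(local, LOCKFILE_EXCLUSIVE_LOCK, 0,
                         MAXDWORD, MAXDWORD, &ov);   /* waits for the lock */
    if (ok)
        UnlockFileEx(local, 0, MAXDWORD, MAXDWORD, &ov);

    CloseHandle(local);
    CloseHandle(owner);
    return ok ? 0 : -1;
}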
Since tools from MS like OH and Process Explorer do it, it is definitely possible to get all the handles opened by a process. From there to waiting on the one you'd like, the road is still long, but it is a beginning :)
If you have no success with the Win32 API, one place to look at is for sure the NT Native API http://en.wikipedia.org/wiki/Native_API
You can start from here http://msdn.microsoft.com/en-us/library/windows/desktop/ms724509%28v=vs.85%29.aspx and see if it works with the SystemProcessInformation flag.
Look also here for a start http://nsylvain.blogspot.com/2007/09/how-list-all-open-handles.html
The native API is poorly documented, but you can find resources online (like here http://www.osronline.com/article.cfm?id=91)
As a disclaimer, I should add that the Native API is somewhat "internal", and therefore subject to change in future versions. Some functions, however, are also exposed publicly in the DDK at kernel level, so the likelihood of these functions changing is low.
Good luck!

Sharing file locks

I am currently working on a file processing service that looks at a fileshare, where files are uploaded to via FTP.
For scalability I've been asked to make this service to be able to be load balanced, so the service has to expect that other services on different machines may also be trying to process these files.
OK, so I thought I should be able to achieve this by obtaining an exclusive lock for my process before processing a file, and skipping any files that may already be locked by another process.
The crux of this approach is shown below (I've left out the error handling for simplicity):
using (FileStream fs = File.Open(myFile, FileMode.Open, FileAccess.ReadWrite, FileShare.Read | FileShare.Delete))
{
//Do work
}
Q1: My process now has a lock on this file. I thought this would mean I could then access the same file (without using the stream) and still have the correct access to it, but based on testing it seems I only have the benefits of the lock through the stream. Is this correct?
(For example, before I included FileShare.Delete, File.Delete(myFile) failed)
The above lock ultimately uses the 'Write' permission to determine which service has the file, but is intended to allow other processes to still read the file. This is because the process that has the lock attempts to verify whether the file is a valid zip file, using a third-party library (Xceed.Zip). However, this fails, saying the file "is being used by another process". Using Reflector I ultimately found the problematic call is:
stream = this.m_info.Open(FileMode.Open, FileAccess.Read, FileShare.Read);
Now I would have expected this to work as it only wants to read the file, but it fails. The reason appears to be outlined in a similar question. However, as this is a 3rd party API I can't change their code to use ReadWrite.
Q2: Is there a way I can correctly lock the file so it will not be picked up by the other services, but it can still be verified as a zip file using the external API?
I feel like there should be a 'correct' way to do this, but at the moment the best I can come up with is to lock the file, move it away from the shared directory, and then verify it at the new location.
If you're planning to reactively handle this situation by handling UnauthorizedAccessException I think you're making a serious mistake.
This can be handled by proactively renaming files. For example you can configure your service to only read files whose name is in the format 'Filename.YYYYMMDD.txt'. Prior to processing the file, you can rename it to 'Filename.YYYYMMDD.processing'. Then after processing the file you rename it to 'Filename.YYYYMMDD.done'.
You can even take it a step further by making another service that enqueues the filenames. This service would be a FileSystemWatcher that listens for file-add operations. Once it receives that event, it queues the filename onto a global message queue. Then each of your services just dequeues filenames and no longer has to worry about concurrent access.
HTH
