Is there an os-independent way to atomically overwrite a file? - file

If the file already exists, I want to overwrite it. If it doesn't exist, I want to create it and write to it. I'd prefer to not have to use a 3rd party library like lockfile (which seems to handle all types of locking.)
My initial idea was to:
Write to a temporary file with a randomly generated large id to avoid conflict.
Rename the temp filename -> new path name.

os.Rename calls syscall.Rename which for Linux/UNIXs uses the rename syscall (which is atomic*). On Windows syscall.Rename calls MoveFileW which assuming the source and destination are on the same device (which can be arranged) and the filesystem is NTFS (which is often the case) is atomic*.
I would take care to make sure the source and destination are on the same device so the Linux rename does not fail, and the Windows rename is actually atomic. As Dave C mentions above creating your temporary file (usually using ioutil.TempFile) in the same directory as existing file is the way to go; this is how I do my atomic renames.
This works for me in my use case which is:
One Go process gets updates and renames files to swap updates in.
Another Go process is watching for file updates with fsnotify and re-mmaps the file when it is updated.
In the above use case simply using os.Rename has worked perfectly well for me.
Some further reading:
Is rename() atomic? "Yes and no. rename() is atomic assuming the OS does not crash...."
Is an atomic file rename (with overwrite) possible on Windows?
*Note: I do want to point out that when people talk about atomic filesystem file operations, from an application perspective, they usually mean the operation happens or does not happen (which journaling can help with) from the users perspective. If you are using atomic in the sense of an atomic memory operation, very few filesystem operations (outside of direct I/O [O_DIRECT] one block writes and reads with disk buffering disabled) can be considered truly atomic.

Related

What is the most efficient way to copy many files programmatically?

Once upon a time long ago, we had a bash script that works out a list of files that need to be copied based on some criteria (basically like a filtered version of cp -rf).
This was too slow and was replaced by a C++ program.
What the C++ program does is essentially:
foreach file
read entire file into buffer
write entire file
The program uses Posix calls open(), read() and write() to avoid buffering and other overheads vs iostream and fopen, fread & fwrite.
Is it possible to improve on this?
Notes:
I am assuming these are not sparse files
I am assuming GNU/Linux
I am not assuming a particular filesystem is available
I am not assuming prior knowledge of whether the source and destination are on the same disk.
I am not assuming prior knowledge of the kind of disk, SSD, HDD maybe even NFS or sshfs.
We can assume the source files are on the same disk as each other.
We can assume the destination files will also be on the same disk as each other.
We cannot assume whether the source and destinations are on the same disk or or not.
I think the answer is yes but it is quite nuanced.
Copying speed is of course limited by disk IO not CPU.
But how can we be sure to optimise our use of disk IO?
Maybe the disk has the equivalent of multiple read or write heads available? (perhaps an SSD?)
In which case performing multiple copies in parallel will help.
Can we determine and exploit this somehow?
This is surely well trod territory so rather than re-invent the wheel straight away (though that is always fun) it would be nice to hear what others have tried or would recommend.
Otherwise I will try various things and answer my own question sometime in the distant future.
This is what my evolving answer looks like so far...
If the source and destination are different physical disks then
we can at least read and write at the same time with something like:
writer thread
read from write queue
write file
reader thread
foreach file
read file
queue write on writer thread
If the source and destination are on the same physical disk and we happen to be on a filesystem
with copy on write semantics (like xfs or btrfs) we can potentially avoid actually copying the file at all.
This is apparently called "reflinking".
The cp command supports this using --reflink=auto.
See also:
https://www.reddit.com/r/btrfs/comments/721rxp/eli5_how_does_copyonwrite_and_deduplication_work/
https://unix.stackexchange.com/questions/80351/why-is-cp-reflink-auto-not-the-default-behaviour
From this question
and https://github.com/coreutils/coreutils/blob/master/src/copy.c
it looks as if this is done using an ioctl as in:
ioctl (dest_fd, FICLONE, src_fd);
So a quick win is probably:
try FICLONE on first file.
If it succeeds then:
foreach file
srcFD = open(src);
destFD = open(dest);
ioctl(destFD,FICLONE,srcFD);
else
do it the other way - perhaps in parallel
In terms of low-level system APIs we have:
copy_file_range
ioctl FICLONE
sendfile
I am not clear when to choose one over the other except that copy_file_range is not safe to use with some filesystems notably procfs.
This answer gives some advice and suggests sendfile() is intended for sockets but in fact this is only true for kernels before 2.6.33.
https://www.reddit.com/r/kernel/comments/4b5czd/what_is_the_difference_between_splice_sendfile/
copy_file_range() is useful for copying one file to another (within
the same filesystem) without actually copying anything until either
file is modified (copy-on-write or COW).
splice() only works if one of the file descriptors refer to a pipe. So
you can use for e.g. socket-to-pipe or pipe-to-file without copying
the data into userspace. But you can't do file-to-file copies with it.
sendfile() only works if the source file descriptor refers to
something that can be mmap()ed (i.e. mostly normal files) and before
2.6.33 the destination must be a socket.
There is also a suggestion in a comment that reading multiple files then writing multiple files will result in better performance.
This could use some explanation.
My guess is that it tries to exploit the heuristic that the source files and destination files will be close together on the disk.
I think the parallel reader and writer thread version could perhaps do the same.
The problem with such a design is it cannot exploit any performance gain from the low level system copy APIs.
The general answer is: Measure before trying another strategy.
For HDD this is probably your answer: https://unix.stackexchange.com/questions/124527/speed-up-copying-1000000-small-files
Ultimately I did not determine the "most efficient" way but I did end up with a solution that was sufficiently fast for my needs.
generate a list of files to copy and store it
copy files in parallel using openMP
#pragma omp parallel for
for (auto iter = filesToCopy.begin(); iter < filesToCopy.end(); ++iter)
{
copyFile(*iter);
}
copy each file using copy_file_range()
falling back to using splice() with a pipe() when compiling for old platforms not supporting copy_file_range().
Reflinking, as supported by copy_file_range(), to avoid copying at all when the source and destination are on the same filesystem is a massive win.

Atomic copy in c

I'm trying to implement an atomic version of copy on write. I have certain conditions if met that will make a copy of the original file.
I implemented something like this pseudo code.
//write operations//
if(some condition)
//create a temp file//
rename(srcfile, copied-version)
rename(tmpfile, srcfile)
problem with this logic :
Hardlinks.
I want to transfer the Hardlink from copied version to new srcfile.
You can't.
Hardlinks are one directional pointers. So you can't modify or remove other hardlinks that you don't explicitly know about. All you can do is write to the same file data, and that's not atomic.
This rule applies uniformly to both hadlinks and file descriptors. What that means is that you can't modify the content pointed to by an unknown hardlink and not modify the content pointed to by another process with the same file open.
That effectively prevents you from modifying the file an unknown hardlink points
to atomically.
If you have control over every process which might modify or access these files (if they are only modified by programs you've written), then you might be able to use flock() to signal to other processes that the file is in use. This won't work if the file is stored on an NFS remote file system, but should generally work otherwise.
In some cases, file leases can be a solution to the underlying issue – ensuring atomic content updates – but only if each reader and writer opens and closes the file for each snapshot.
Because a similar limitation happens for the traditional copy–update–rename-over sequence, perhaps the file lease solution would also work for OP.
For details, see man 2 fcntl Leases and Managing signals sections. The process must either have the same owner as the file, or have the CAP_LEASE capability (usually granted to the process via filesystem capabilities). Superuser processes (running as root) have the capability by default.
The idea is that when the process wishes to make "atomic" changes to the file, it acquires a write lease on the file. This only succeeds if no other process has the file open. If another process tries to open the file, the lease holder receives a signal, and has up to lease-break-time (about a minute, typically) to downgrade the lease (or simply close the file); during that time, the opener will block.
Note that there is no way to divert the opener. The situation is that the opener already has a handle to the underlying inode (so access checks and filename resolution has already occurred); it is just that kernel won't return it to the userspace process before the lease is released or broken.
Your lease owner can, however, create a copy of the current contents to a temporary file, acquiring a write lease on that as well, and then rename it over the target file name. This way, each (set of) opener(s) obtain a handle to the file contents as they were at the time of the opening; if they do any modifications, they will be "private", and not reflected on the original file. Since the underlying inode is no longer referred to by any filename, when they (the last process having it open) close it, the inode is deleted and the storage released back to the file system. The Linux page cache also caches such accesses very well, so in many cases the "temporary copy file" never even hits actual storage media (unless there is memory pressure, i.e. memory needed for non-pagecache purposes).
A pure "atomic modification" does not require any kind of copies or renames, only holding the lease for the duration of the set of writes that must appear atomic for the readers.
Note that taking a write lease will normally block until no other process has the file open any longer, so the time at which such a lease-based atomic update can occur, is restricted, and not guaranteed to be always available. (For example, you may have a lazy process that just keeps the file open, and occasionally polls it. If you have such processes, this lease-based approach won't work – but nor would the copy–rename-over approach either.)
Also, leases work only on local files.
If you need record-based atomicity, just use fcntl-based record locks, and have all readers take a read-lock for the region they want to access atomically, and all writers take a write-lock for the region to be updated, as record-locks are advisory (i.e., do not block reads or writes, only other record locks).

How to preserve ownership and permissions when doing an atomic file replace?

So, the normal POSIX way to safely, atomically replace the contents of a file is:
fopen(3) a temporary file on the same volume
fwrite(3) the new contents to the temporary file
fflush(3)/fsync(2) to ensure the contents are written to disk
fclose(3) the temporary file
rename(2) the temporary file to replace the target file
However, on my Linux system (Ubuntu 16.04 LTS), one consequence of this process is that the ownership and permissions of the target file change to the ownership and permissions of the temporary file, which default to uid/gid and current umask.
I thought I would add code to stat(2) the target file before overwriting, and fchown(2)/fchmod(2) the temporary file before calling rename, but that can fail due to EPERM.
Is the only solution to ensure that the uid/gid of the file matches the current user and group of the process overwriting the file? Is there a safe way to fall back in this case, or do we necessarily lose the atomic guarantee?
Is the only solution to ensure that the uid/gid of the file matches the current user and group of the process overwriting the file?
No.
In Linux, a process with the CAP_LEASE capability can obtain an exclusive lease on the file, which blocks other processes from opening the file for up to /proc/sys/fs/lease-break-time seconds. This means that technically, you can take the exclusive lease, replace the file contents, and release the lease, to modify the file atomically (from the perspective of other processes).
Also, a process with the CAP_CHOWN capability can change the file ownership (user and group) arbitrarily.
Is there a safe way to [handle the case where the uid or gid does not match the current process], or do we necessarily lose the atomic guarantee?
Considering that in general, files may have ACLs and xattrs, it might be useful to create a helper program, that clones the ownership including ACLs, and extended attributes, from an existing file to a new file in the same directory (perhaps with a fixed name pattern, say .new-################, where # indicate random alphanumeric characters), if the real user (getuid(), getgid(), getgroups()) is allowed to modify the original file. This helper program would have at least the CAP_CHOWN capability, and would have to consider the various security aspects (especially the ways it could be exploited). (However, if the caller can overwrite the contents, and create new files in the target directory -- the caller must have write access to the target directory, so that they can do the rename/hardlink replacement --, creating a clone file on their behalf with empty contents ought to be safe. I would personally exclude target files owned by root user or group, though.)
Essentially, the helper program would behave much like the mktemp command, except it would take the path to the existing target file as a parameter. It would then be relatively straightforward to wrap it into a library function, using e.g. fork()/exec() and pipes or sockets.
I personally avoid this problem by using group-based access controls: dedicated (local) group for each set. The file owner field is basically just an informational field then, indicating the user that last recreated (or was in charge of) said file, with access control entirely based on the group. This means that changing the mode and the group id to match the original file suffices. (Copying ACLs would be even better, though.) If the user is a member of the target group, they can do the fchown() to change the group of any file they own, as well as the fchmod() to set the mode, too.
I am by no means an expert in this area, but I don't think it's possible. This answer seems to back this up. There has to be a compromise.
Here are some possible solutions. Every one has advantages and disadvantages and weighted and chosen depending on the use case and scenario.
Use atomic rename.
Advantage: atomic operation
Disadvantage: possible to not keep owner/permissions
Create a backup. Write file in place
This is what some text editor do.
Advantage: will keep owner/permissions
Disadvantage: no atomicity. Can corrupt file. Other application might get a "draft" version of the file.
Set up permissions to the folder such that creating a new file is possible with the original owner & attributes.
Advantages: atomicity & owner/permissions are kept
Disadvantages: Can be used only in certain specific scenarios (knowledge at the time of creation of the files that would be edited, the security model must allow and permit this). Can decrease security.
Create a daemon/service responsible for editing the files. This process would have the necessary permissions to create files with the respective owner & permissions. It would accept requests to edit files.
Advantages: atomicity & owner/permissions are kept. Higher and granular control to what and how can be edited.
Disadvantages. Possible in only specific scenarios. More complex to implement. Might require deployment and installation. Adding an attack surface. Adding another source of possible (security) bugs. Possible performance impact due to the added intermediate layer.
Do you have to worry about the file that's named being a symlink to a file somewhere else in the file system?
Do you have to worry about the file that's named being one of multiple links to an inode (st_nlink > 1).
Do you need to worry about extended attributes?
Do you need to worry about ACLs?
Does the user ID and group IDs of the current process permit the process to write in the directory where the file is stored?
Is there enough disk space available for both the old and the new files on the same file system?
Each of these issues complicates the operation.
Symlinks are relatively easy to deal with; you simply need to establish the realpath() to the actual file and do file creation operations in the directory containing the real path to the file. From here on, they're a non-issue.
In the simplest case, where the user (process) running the operation owns the file and the directory where the file is stored, can set the group on the file, the file has no hard links, ACLs or extended attributes, and there's enough space available, then you can get atomic operation with more or less the sequence outlined in the question — you'd do group and permission setting before executing the atomic rename() operation.
There is an outside risk of TOCTOU — time of check, time of use — problems with file attributes. If a link is added between the time when it is determined that there are no links and the rename operation, then the link is broken. If the owner or group or permissions on the file change between the time when they're checked and set on the new file, then the changes are lost. You could reduce the risk of that by breaking atomicity but renaming the old file to a temporary name, renaming the new file to the original name, and rechecking the attributes on the renamed old file before deleting it. That is probably an unnecessary complication for most people, most of the time.
If the target file has multiple hard links to it and those links must be preserved, or if the file has ACLs or extended attributes and you don't wish to work out how to copy those to the new file, then you might consider something along the lines of:
write the output to a named temporary file in the same directory as the target file;
copy the old (target) file to another named temporary file in the same directory as the target;
if anything goes wrong during steps 1 or 2, abandon the operation with no damage done;
ignoring signals as much as possible, copy the new file over the old file;
if anything goes wrong during step 4, you can recover from the extra backup made in step 2;
if anything goes wrong in step 5, report the file names (new file, backup of original file, broken file) for the user to clean up;
clean up the temporary output file and the backup file.
Clearly, this loses all pretense at atomicity, but it does preserve links, owner, group, permissions, ACLS, extended attributes. It also requires more space — if the file doesn't change size significantly, it requires 3 times the space of the original file (formally, it needs size(old) + size(new) + max(size(old), size(new)) blocks). In its favour is that it is recoverable even if something goes wrong during the final copy — even a stray SIGKILL — as long as the temporary files have known names (the names can be determined).
Automatic recovery from SIGKILL probably isn't feasible. A SIGSTOP signal could be problematic too; a lot could happen while the process is stopped.
I hope it goes without saying that errors must be detected and handled carefully with all the system calls used.
If there isn't enough space on the target file system for all the copies of the files, or if the process cannot create files in the target directory (even though it can modify the original file), you have to consider what the alternatives are. Can you identify another file system with enough space? If there isn't enough space anywhere for both the old and the new file, you clearly have major issues — irresolvable ones for anything approaching atomicity.
The answer by Nominal Animal mentions Linux capabilities. Since the question is tagged POSIX and not Linux, it isn't clear whether those are applicable to you. However, if they can be used, then CAP_LEASE sounds useful.
How crucial is atomicity vs accuracy?
How crucial is POSIX compliance vs working on Linux (or any other specific POSIX implementation)?

How do I make files save for concurrent C access?

I have several C-programs, which are accessing (read: fprintf/ write fopen) at the same time different files on the file system. What is the best way to do this concurrent access save? should I write some sort of file locks (and whats the best way to do this?) or are there any better reading methods (preferably in the C99 standard lib, additional dependencies would be a problem)? or should I use something like SQLite?
edit:
I am using Linux as operating system.
edit:
I don't really want to write with different processes in same files, I'm dealing with a legacy monolith code, which saves intermediate steps in files for recycling. I want a way to speed the calculations up by running several calculations at the same time, which have the same intermediate results.
You could use fcntl() with F_SETLK or F_SETLKW:
struct flock lock;
...
fcntl( fd, F_SETLKW, &lock );
See more from man page fcntl(3) or this article.
You can make sure that your files do not get corrupted on concurrent writes from multiple threads/processes by using copy-on-the-write technique:
A writer opens the file it would like to update for reading.
The writer creates a new file with a unique name (mkostemps) and copies the original file into the copy.
The writer modifies the copy.
The writer renames the copy to the original name using rename. This happens atomically, so that users of the file either see the old version of it or the new, but never a partially updated file.
See Things UNIX can do atomically for more details.

Atomically write 64kB

I need to write something like 64 kB of data atomically in the middle of an existing file. That is all, or nothing should be written. How to achieve that in Linux/C?
I don't think it's possible, or at least there's not any interface that guarantees as part of its contract that the write would be atomic. In other words, if there is a way that's atomic right now, that's an implementation detail, and it's not safe to rely on it remaining that way. You probably need to find another solution to your problem.
If however you only have one writing process, and your goal is that other processes either see the full write or no write at all, you can just make the changes in a temporary copy of the file and then use rename to atomically replace it. Any reader that already had a file descriptor open to the old file will see the old contents; any reader opening it newly by name will see the new contents. Partial updates will never be seen by any reader.
There are a few approaches to modify file contents "atomically". While technically the modification itself is never truly atomic, there are ways to make it seem atomic to all other processes.
My favourite method in Linux is to take a write lease using fcntl(fd, F_SETLEASE, F_WRLCK). It will only succeed if fd is the only open descriptor to the file; that is, nobody else (not even this process) has the file open. Also, the file must be owned by the user running the process, or the process must run as root, or the process must have the CAP_LEASE capability, for the kernel to grant the lease.
When successful, the lease owner process gets a signal (SIGIO by default) whenever another process is opening or truncating the file. The opener will be blocked by the kernel for up to /proc/sys/fs/lease-break-time seconds (45 by default), or until the lease owner releases or downgrades the lease or closes the file, whichever is shorter. Thus, the lease owner has dozens of seconds to complete the "atomic" operation, without any other process being able to see the file contents.
There are a couple of wrinkles one needs to be aware of. One is the privileges or ownership required for the kernel to allow the lease. Another is the fact that the other party opening or truncating the file will only be delayed; the lease owner cannot replace (hardlink or rename) the file. (Well, it can, but the opener will always open the original file.) Also, renaming, hardlinking, and unlinking/deleting the file does not affect the file contents, and therefore are not affected at all by file leases.
Remember also that you need to handle the signal generated. You can use fcntl(fd, F_SETSIG, signum) to change the signal. I personally use a trivial signal handler -- one with an empty body -- to catch the signal, but there are other ways too.
A portable method to achieve semi-atomicity is to use a memory map using mmap(). The idea is to use memmove() or similar to replace the contents as quickly as possible, then use msync() to flush the changes to the actual storage medium.
If the memory map offset in the file is a multiple of the page size, the mapped pages reflect the page cache. That is, any other process reading the file, in any way -- mmap() or read() or their derivatives -- will immediately see the changes made by the memmove(). The msync() is only needed to make sure the changes are also stored on disk, in case of a system crash -- it is basically equivalent to fsync().
To avoid preemption (kernel interrupting the action due to the current timeslice being up) and page faults, I'd first read the mapped data to make sure the pages are in memory, and then call sched_yield(), before the memmove(). Reading the mapped data should fault the pages into page cache, and sched_yield() releases the rest of the timeslice, making it extremely likely that the memmove() is not interrupted by the kernel in any way. (If you do not make sure the pages are already faulted in, the kernel will likely interrupt the memmove() for each page separately. You won't see that in the process, but other processes see the modifications to occur in page-sized chunks.)
This is not exactly atomic, but it is practical: it does not give you any guarantees, only makes the race window very very short; therefore I call this semi-atomic.
Note that this method is compatible with file leases. One could try to take a write lease on the file, but fall back to leaseless memory mapping if the lease is not granted within some acceptable time period, say a second or two. I'd use timer_create() and timer_settime() to create the timeout timer, and the same empty-body signal handler to catch the SIGALRM signal; that way the fcntl() is interrupted (returns -1 with errno == EINTR) when the timeout occurs -- with the timer interval set to some small value (say 25000000 nanoseconds, or 0.025 seconds) so it repeats very often after that, interrupting syscalls if the initial interrupt is missed for any reason.
Most userspace applications create a copy of the original file, modify the contents of the copy, then replace the original file with the copy.
Each process that opens the file will only see complete changes, never a mix of old and new contents. However, anyone keeping the file open, will only see their original contents, and not be aware of any changes (unless they check themselves). Most text editors do check, but daemons and other processes do not bother.
Remember that in Linux, the file name and its contents are two separate things. You can open a file, unlink/remove it, and still keep reading and modifying the contents for as long as you have the file open.
There are other approaches, too. I do not want to suggest any specific approach, because the optimal one depends heavily on the circumstances: Do the other processes keep the file open, or do they always (re)open it before reading the contents? Is atomicity preferred or absolutely required? Is the data plain text, structured like XML, or binary?
EDITED TO ADD:
Please note that there are no ways to guarantee beforehand that the file will be successfully modified atomically. Not in theory, and not in practice.
You might encounter a write error with the disk full, for example. Or the drive might hiccup at just the wrong moment. I'm only listing three practical ways to make it seem atomic in typical use cases.
The reason write leases are my favourite is that I can always use fcntl(fd,F_GETLEASE,&ptr) to check whether the lease is still valid or not. If not, then the write was not atomic.
High system load is unlikely to cause the lease to be broken for a 64k write, if the same data has been read just prior (so that it will likely be in page cache). If the process has superuser privileges, you can use setpriority(PRIO_PROCESS,getpid(),-20) to temporarily raise the process priority to maximum while taking the file lease and modifying the file. If the data to be overwritten has just been read, it is extremely unlikely to be moved to swap; thus swapping should not occur, either.
In other words, while it is quite possible for the lease method to fail, in practice it is almost always successful -- even without the extra tricks mentioned in this addendum.
Personally, I simply check if the modification was not atomic, using the fcntl() call after the modification, prior to msync()/fsync() (making sure the data hits the disk in case a power outage occurs); that gives me an absolutely reliable, trivial method to check whether the modification was atomic or not.
For configuration files and other sensitive data, I too recommend the rename method. (Actually, I prefer the hardlink approach used for NFS-safe file locking, which amounts to the same thing but uses a temporary name to detect naming races.) However, it has the problem that any process keeping the file open will have to check and reopen the file, voluntarily, to see the changed contents.
Disk writes cannot be atomic without a layer of abstraction. You should keep a journal and revert if a write is interrupted.
As far as I know a write below the size of PIPE_BUF is atomic. However I never rely on this. If the programs that access the file are written by you, you can use flock() to achieve exclusive access. This system call sets a lock on the file and allows other processes that know about the lock to get access or not.

Resources