I have a service A which constantly updates a set of files in an S3 bucket.
More or less, it is equivalent to something like this:
while true
do
generate file
aws s3 cp <file> s3://<bucket>/<file>
sleep a little
done
I have a service B which reads that file once in a while to update the data inside itself. I want a single instance of service A while service B runs 100 instances.
So service B has an equivalent to:
while true
do
aws s3 cp s3://<bucket>/<file> <file>
update variable holding this data
sleep a little
done
At the moment, the <file> name always remains the same. I'm wondering whether this can cause issues. What happens when I upload a new version of the file? Is the old version still available until the copy in service B is done, or does the file get overwritten by service A?
That is, under all operating systems I know of, if I write to a file while another process reads at the same location, the read sees the new data, not the old. In other words, with a standard OS file, the read may see mangled data (a mix of old and new data).
Are S3 files the same as standard OS files in this respect, or are they safer, i.e. not overwritten until the upload is complete?
Note: I'm particularly interested in an official S3 document describing how this specific case works. My searches have so far come up empty.
The answer is in the Amazon S3 User Guide, in the section on the Amazon S3 data consistency model.
Here is the pertinent paragraph:
Updates to a single key are atomic. For example, if you make a PUT request to an existing key from one thread and perform a GET request on the same key from a second thread concurrently, you will get either the old data or the new data, but never partial or corrupt data.
This clearly says that the data you GET will not be partially overwritten, as could happen with a standard file. You also won't know whether you got the old or the new version (unless you define some metadata or have a date or serial number in the file).
However, when dealing with large files, the higher-level transfer APIs automatically switch to multi-part transfers, which are performed as several separate requests; a multi-part copy may therefore end up combining parts of the old object with parts of the new one. To avoid the issue, make sure the copy is done as a single request rather than in multiple parts.
Related
Does the maxTransactionSize set in the NetworkParameters include attachments?
I have a situation where I need to send a large file as an attachment and I want to make sure it's less than the maxTransactionSize enforced by the network params.
maxTransactionSize does include attachments when calculated.
If you want to take a look for yourself, the calculation is done in WireTransaction.checkTransactionSize: https://github.com/corda/corda/blob/master/core/src/main/kotlin/net/corda/core/transactions/WireTransaction.kt
Unfortunately, maxTransactionSize is hard-coded at 10 MB for the time being:
@property maxMessageSize This is currently ignored. However, it will be wired up in a future release.
This is an interesting implementation because the file is NOT sent in the initial transaction. In fact, only a hash of the file is included in the transaction itself. Files are only sent to the requesting node when needed.
Still, the issue with a large maxTransactionSize is that every node must read the entire file into RAM, which will cause nodes with insufficient RAM to fail to verify massive transactions.
Work is being done to chunk files and enable attachments of arbitrary size!
So, the normal POSIX way to safely, atomically replace the contents of a file (a C sketch follows the list) is:
fopen(3) a temporary file on the same volume
fwrite(3) the new contents to the temporary file
fflush(3)/fsync(2) to ensure the contents are written to disk
fclose(3) the temporary file
rename(2) the temporary file to replace the target file
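A minimal C sketch of that sequence (the function name and error handling are simplified for illustration; real code would also generate a unique temporary name in the target's directory):

#include <stdio.h>
#include <unistd.h>

/* Atomically replace `target` with `len` bytes of `data`,
   using `tmp` as a temporary file on the same volume. */
int replace_file(const char *target, const char *tmp,
                 const char *data, size_t len)
{
    FILE *fp = fopen(tmp, "w");                      /* 1. open temporary file    */
    if (fp == NULL)
        return -1;
    if (fwrite(data, 1, len, fp) != len) {           /* 2. write the new contents */
        fclose(fp);
        return -1;
    }
    if (fflush(fp) != 0 || fsync(fileno(fp)) != 0) { /* 3. force data to disk     */
        fclose(fp);
        return -1;
    }
    if (fclose(fp) != 0)                             /* 4. close temporary file   */
        return -1;
    return rename(tmp, target);                      /* 5. atomic replacement     */
}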
However, on my Linux system (Ubuntu 16.04 LTS), one consequence of this process is that the ownership and permissions of the target file change to those of the temporary file, which default to the process's uid/gid and the current umask.
I thought I would add code to stat(2) the target file before overwriting, and fchown(2)/fchmod(2) the temporary file before calling rename, but that can fail due to EPERM.
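For reference, a sketch of that approach (illustrative helper name; tmpfd is the already-open temporary file). The fchown() call is the part that fails with EPERM for an unprivileged caller:

#include <sys/stat.h>
#include <unistd.h>

int clone_owner_and_mode(const char *target, int tmpfd)
{
    struct stat st;
    if (stat(target, &st) == -1)
        return -1;
    if (fchown(tmpfd, st.st_uid, st.st_gid) == -1)   /* EPERM unless privileged  */
        return -1;
    return fchmod(tmpfd, st.st_mode & 07777);        /* copy the permission bits */
}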
Is the only solution to ensure that the uid/gid of the file matches the current user and group of the process overwriting the file? Is there a safe way to fall back in this case, or do we necessarily lose the atomic guarantee?
Is the only solution to ensure that the uid/gid of the file matches the current user and group of the process overwriting the file?
No.
In Linux, a process with the CAP_LEASE capability can obtain an exclusive lease on the file, which blocks other processes from opening the file for up to /proc/sys/fs/lease-break-time seconds. This means that technically, you can take the exclusive lease, replace the file contents, and release the lease, to modify the file atomically (from the perspective of other processes).
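A sketch of that lease-based approach (illustrative function name; it assumes the caller owns the file or has CAP_LEASE, and that no other process currently has the file open, which is required to obtain a write lease). Note that this replaces the contents in place, so it is atomic only with respect to other processes, not against a crash:

#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/* Replace the contents while holding an exclusive (write) lease, so other
   processes block in open() until the lease is released or the lease-break
   time expires. */
int replace_with_lease(const char *path, const char *data, size_t len)
{
    int fd = open(path, O_WRONLY);
    if (fd == -1)
        return -1;
    if (fcntl(fd, F_SETLEASE, F_WRLCK) == -1) {      /* take the exclusive lease */
        close(fd);
        return -1;
    }
    int rc = 0;
    if (ftruncate(fd, 0) == -1 ||
        write(fd, data, len) != (ssize_t)len ||
        fsync(fd) == -1)
        rc = -1;
    fcntl(fd, F_SETLEASE, F_UNLCK);                  /* release the lease */
    close(fd);
    return rc;
}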
Also, a process with the CAP_CHOWN capability can change the file ownership (user and group) arbitrarily.
Is there a safe way to [handle the case where the uid or gid does not match the current process], or do we necessarily lose the atomic guarantee?
Considering that, in general, files may have ACLs and xattrs, it might be useful to create a helper program that clones the ownership, ACLs, and extended attributes from an existing file to a new, empty file in the same directory (perhaps with a fixed name pattern, say .new-################, where each # is a random alphanumeric character), provided the real user (getuid(), getgid(), getgroups()) is allowed to modify the original file. This helper program would need at least the CAP_CHOWN capability, and would have to consider the various security aspects (especially the ways it could be exploited). However, since the caller can already overwrite the contents and create new files in the target directory (they must have write access to the directory in order to do the rename/hardlink replacement), creating an empty clone file on their behalf ought to be safe. I would personally exclude target files owned by the root user or group, though.
Essentially, the helper program would behave much like the mktemp command, except it would take the path to the existing target file as a parameter. It would then be relatively straightforward to wrap it into a library function, using e.g. fork()/exec() and pipes or sockets.
I personally avoid this problem by using group-based access controls: dedicated (local) group for each set. The file owner field is basically just an informational field then, indicating the user that last recreated (or was in charge of) said file, with access control entirely based on the group. This means that changing the mode and the group id to match the original file suffices. (Copying ACLs would be even better, though.) If the user is a member of the target group, they can do the fchown() to change the group of any file they own, as well as the fchmod() to set the mode, too.
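A short sketch of that group-based variant (illustrative name; the caller owns the temporary file, so changing only its group, to a group the caller belongs to, plus the mode, needs no extra privileges):

#include <sys/stat.h>
#include <unistd.h>

int clone_group_and_mode(const char *target, int tmpfd)
{
    struct stat st;
    if (stat(target, &st) == -1)
        return -1;
    if (fchown(tmpfd, (uid_t)-1, st.st_gid) == -1)   /* change the group only */
        return -1;
    return fchmod(tmpfd, st.st_mode & 07777);
}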
I am by no means an expert in this area, but I don't think it's possible. This answer seems to back this up. There has to be a compromise.
Here are some possible solutions. Each has advantages and disadvantages, and should be weighed and chosen depending on the use case and scenario.
Use atomic rename.
Advantage: atomic operation
Disadvantage: may not keep owner/permissions
Create a backup. Write file in place
This is what some text editors do.
Advantage: will keep owner/permissions
Disadvantage: no atomicity. Can corrupt the file. Other applications might get a "draft" version of the file.
Set up permissions to the folder such that creating a new file is possible with the original owner & attributes.
Advantages: atomicity & owner/permissions are kept
Disadvantages: can be used only in certain specific scenarios (it requires knowing, at the time the files are created, which ones will be edited, and the security model must permit this). Can decrease security.
Create a daemon/service responsible for editing the files. This process would have the necessary permissions to create files with the respective owner & permissions. It would accept requests to edit files.
Advantages: atomicity & owner/permissions are kept. Higher and granular control to what and how can be edited.
Disadvantages: possible only in specific scenarios. More complex to implement. Might require deployment and installation. Adds an attack surface. Adds another source of possible (security) bugs. Possible performance impact due to the added intermediate layer.
Do you have to worry about the file that's named being a symlink to a file somewhere else in the file system?
Do you have to worry about the file that's named being one of multiple links to an inode (st_nlink > 1)?
Do you need to worry about extended attributes?
Do you need to worry about ACLs?
Do the user ID and group IDs of the current process permit the process to write in the directory where the file is stored?
Is there enough disk space available for both the old and the new files on the same file system?
Each of these issues complicates the operation.
Symlinks are relatively easy to deal with; you simply need to establish the realpath() to the actual file and do file creation operations in the directory containing the real path to the file. From here on, they're a non-issue.
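A small sketch of that step (illustrative; the .new-XXXXXX suffix is just a placeholder for whatever unique-name scheme, e.g. mkstemp(), you actually use):

#include <libgen.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Resolve symlinks, then build a temporary-file name inside the directory
   that really contains the target. */
int temp_name_for(const char *target, char *tmp, size_t tmpsize)
{
    char real[PATH_MAX];
    if (realpath(target, real) == NULL)   /* resolve all symlinks */
        return -1;
    char dir[PATH_MAX];
    strncpy(dir, real, sizeof dir - 1);
    dir[sizeof dir - 1] = '\0';
    /* dirname() may modify its argument, hence the copy above */
    snprintf(tmp, tmpsize, "%s/.new-XXXXXX", dirname(dir));
    return 0;
}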
In the simplest case, where the user (process) running the operation owns the file and the directory where the file is stored, can set the group on the file, the file has no hard links, ACLs or extended attributes, and there's enough space available, then you can get atomic operation with more or less the sequence outlined in the question — you'd do group and permission setting before executing the atomic rename() operation.
There is an outside risk of TOCTOU (time of check, time of use) problems with file attributes. If a link is added between the time when it is determined that there are no links and the rename operation, then the link is broken. If the owner, group, or permissions on the file change between the time when they're checked and the time they're set on the new file, then the changes are lost. You could reduce that risk by giving up some atomicity: rename the old file to a temporary name, rename the new file to the original name, and recheck the attributes on the renamed old file before deleting it. That is probably an unnecessary complication for most people, most of the time.
If the target file has multiple hard links to it and those links must be preserved, or if the file has ACLs or extended attributes and you don't wish to work out how to copy those to the new file, then you might consider something along the lines of:
1. write the output to a named temporary file in the same directory as the target file;
2. copy the old (target) file to another named temporary file in the same directory as the target;
3. if anything goes wrong during steps 1 or 2, abandon the operation with no damage done;
4. ignoring signals as much as possible, copy the new file over the old file;
5. if anything goes wrong during step 4, recover from the extra backup made in step 2;
6. if anything goes wrong in step 5, report the file names (new file, backup of original file, broken file) for the user to clean up;
7. clean up the temporary output file and the backup file.
Clearly, this loses all pretense at atomicity, but it does preserve links, owner, group, permissions, ACLS, extended attributes. It also requires more space — if the file doesn't change size significantly, it requires 3 times the space of the original file (formally, it needs size(old) + size(new) + max(size(old), size(new)) blocks). In its favour is that it is recoverable even if something goes wrong during the final copy — even a stray SIGKILL — as long as the temporary files have known names (the names can be determined).
Automatic recovery from SIGKILL probably isn't feasible. A SIGSTOP signal could be problematic too; a lot could happen while the process is stopped.
I hope it goes without saying that errors must be detected and handled carefully with all the system calls used.
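A rough C sketch of step 4, the copy-over itself (error reporting trimmed; it assumes the backup from step 2 already exists for recovery). Opening the target with O_TRUNC keeps its inode, so hard links, owner, group, mode, ACLs and extended attributes are all preserved:

#include <fcntl.h>
#include <unistd.h>

int copy_over(const char *newfile, const char *target)
{
    int in = open(newfile, O_RDONLY);
    if (in == -1)
        return -1;
    int out = open(target, O_WRONLY | O_TRUNC);      /* same inode, truncated */
    if (out == -1) {
        close(in);
        return -1;
    }
    char buf[65536];
    ssize_t n;
    int rc = 0;
    while ((n = read(in, buf, sizeof buf)) > 0)
        if (write(out, buf, (size_t)n) != n) {       /* short write: give up */
            rc = -1;
            break;
        }
    if (n == -1 || fsync(out) == -1)
        rc = -1;
    close(in);
    if (close(out) == -1)
        rc = -1;
    return rc;
}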
If there isn't enough space on the target file system for all the copies of the files, or if the process cannot create files in the target directory (even though it can modify the original file), you have to consider what the alternatives are. Can you identify another file system with enough space? If there isn't enough space anywhere for both the old and the new file, you clearly have major issues — irresolvable ones for anything approaching atomicity.
The answer by Nominal Animal mentions Linux capabilities. Since the question is tagged POSIX and not Linux, it isn't clear whether those are applicable to you. However, if they can be used, then CAP_LEASE sounds useful.
How crucial is atomicity vs accuracy?
How crucial is POSIX compliance vs working on Linux (or any other specific POSIX implementation)?
I'm developing a small piece of software in C that reads and writes messages on a notice-board. Every message is a .txt file named with a progressive number.
The software is multithreaded, with many users that can do concurrent operations.
The operations that a user can do are:
Read the whole notice-board (concatenation of all the .txt file contents)
Add a message (add a file named "id_max++.txt")
Remove a message. When a message is removed there will be a hole in that number (e.g, "1.txt", "2.txt", "4.txt") that will never be filled up.
Now, I'd like to know if there is some I/O problem (*) that I should manage (and how), or whether the OS (Unix-like) takes care of it all by itself.
(*) such as 2 users that want to read and delete the same file
As you are on a Unix-like system, the OS will take care of a file being deleted while it is open in another thread: the directory entry is removed immediately, and the file itself (the inode) is deleted on the last close.
The only problem I can see is between the directory scan and the opening of a file: a race condition could mean that the file has already been deleted.
IMHO you simply have to treat a "file does not exist" error as normal, and move on to the next file.
What you describe is not really bad, since it is analogous to MH folders for mail, which can be accessed by many different processes, even if locking is involved. But depending on the load and on the size of the messages, you could consider using a database. Rule of thumb (my opinion):
few concurrent accesses and big files: keep using the file system
many accesses and small files (a few KB max): use a database
Of course, you must use a mutex-protected routine to find the next number when creating a new message (credit to @merlin2011 for noticing the problem).
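A minimal sketch of such a routine (pthread-based; next_id is assumed to be initialised from the highest existing message number at start-up):

#include <pthread.h>

static pthread_mutex_t id_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long next_id;                  /* set to id_max + 1 at start-up */

/* Hand out the next message number; safe to call from any thread. */
unsigned long get_next_id(void)
{
    pthread_mutex_lock(&id_lock);
    unsigned long id = next_id++;
    pthread_mutex_unlock(&id_lock);
    return id;
}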
You said in a comment that your specs do not allow a database. Continuing the analogy with mail handling, you could also use a single file (like the traditional mail format):
one single file
each message is preceded with a fixed size header saying whether it is active or deleted
read access need not be synchronized
write accesses must be synchronized
It would be a poor man's database where all synchronization is done by hand, but you have only one file descriptor per thread and you save all the open and close operations. It makes sense when there are many reads and few writes or deletes.
A possible improvement would be (still like mail readers do) to build an index with the offset and status of each message. The index could be on disk or in memory depending on your requirements.
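A sketch of what such a fixed-size header and an in-memory index entry might look like (field names and sizes are illustrative, not a prescribed format):

#include <stdint.h>

struct msg_header {
    uint32_t status;   /* 0 = deleted, 1 = active                      */
    uint32_t length;   /* number of message bytes following the header */
};

struct msg_index_entry {
    long     offset;   /* offset of the header within the file */
    uint32_t status;   /* cached copy of msg_header.status     */
    uint32_t length;
};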
The easier solution is to use a database like SQLite or MySQL, both of which provide transactions that you can use to achieve consistency. If you still want to go down this route, read on.
The issue is not an I/O problem; it's a concurrency problem if you do not implement proper monitors. Consider the following scenario (it is not the only problematic one, but it is one example).
User 1 reads the maximum id and stores it in a local variable.
Meanwhile, User 2 reads the same maximum id and stores it in a local variable also.
User 1 writes first, and then User 2 overwrites what User 1 just wrote, because it had the same idea of what the maximum id was.
This particular scenario can be solved by keeping the current maximum id as a variable that is initialized when the program is initialized, and protecting the get_and_increment operation with a lock. However, this is not the only problematic scenario that you will need to reason through if you go with this approach.
If the file already exists, I want to overwrite it. If it doesn't exist, I want to create it and write to it. I'd prefer not to have to use a 3rd-party library like lockfile (which seems to handle all types of locking).
My initial idea was to:
Write to a temporary file with a randomly generated large id to avoid conflict.
Rename the temp filename -> new path name.
os.Rename calls syscall.Rename, which on Linux/UNIX systems uses the rename syscall (which is atomic*). On Windows, syscall.Rename calls MoveFileW, which, assuming the source and destination are on the same device (which can be arranged) and the filesystem is NTFS (which is often the case), is atomic*.
I would take care to make sure the source and destination are on the same device, so that the Linux rename does not fail and the Windows rename is actually atomic. As Dave C mentions above, creating your temporary file (usually using ioutil.TempFile) in the same directory as the existing file is the way to go; this is how I do my atomic renames.
This works for me in my use case which is:
One Go process gets updates and renames files to swap updates in.
Another Go process is watching for file updates with fsnotify and re-mmaps the file when it is updated.
In the above use case simply using os.Rename has worked perfectly well for me.
Some further reading:
Is rename() atomic? "Yes and no. rename() is atomic assuming the OS does not crash...."
Is an atomic file rename (with overwrite) possible on Windows?
*Note: I do want to point out that when people talk about atomic filesystem file operations, from an application perspective, they usually mean that the operation happens or does not happen (which journaling can help with) from the user's perspective. If you are using atomic in the sense of an atomic memory operation, very few filesystem operations (outside of direct I/O [O_DIRECT] single-block writes and reads with disk buffering disabled) can be considered truly atomic.
Suppose we have an already existing file, say <File>. This file has been opened by a C program for update (r+b). We use fseek to navigate to a point inside <File>, other than the end of it. Now we start writing data using fwrite/fputc. Note that we don't delete any data previously existing in <File>...
How does the system handle those writes? Does it rewrite the whole file to another position in the Disk, now containing the new data? Does it fragment the file and write only the new data in another position (and just remember that in the middle there is some free space)? Does it actually overwrite in place only the part that has changed?
There is a good reason for asking: in the first case, if you continuously update a file, the system can get slow. In the second case, it could be faster but will mess up the file system if done to many files. In the third case, especially if you have a solid-state disk, updating the same spot of a file over and over again may render that part of the disk useless.
Actually, that's where my question originates from. I've read that, to save disk sectors from overuse, solid-state disks move data to less-used sectors, using different techniques. But how exactly do the stdio functions handle such situations?
Thanks in advance for your time! :D
The filesystem driver keeps a kind of dictionary mapping files to sectors on the disk, so when you update the content of the file, the filesystem looks up that dictionary on the disk, which tells it in which sectors the file data is located. Then it waits until the disk has spun to the right place and updates the appropriate sectors.
That's the short version.
So, in the case of updating the file, the file is normally not moved to a new place. When you write new data to the file, appending to it, and the data doesn't fit into the existing sectors, then additional sectors are allocated and the data is written there.
If you delete a file, the sectors are usually marked as free and are reused. So only if you open a new file and rewrite it can it happen that the file is put in different sectors than before.
But the details can vary depending on the hardware. AFAIK, if you overwrite data on a CD, the data is written anew (as long as the session is not finalized), because you cannot update data on a CD once it is written.
Your understanding is incorrect: "Note that we don't delete any data previously existing in File"
If you seek into the middle of a file and start writing it will write over whatever was at that position before.
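A tiny C demonstration of that behaviour (demo.bin is just an assumed existing file): bytes 100 to 102 are replaced, and everything before and after them stays unchanged:

#include <stdio.h>

int main(void)
{
    FILE *fp = fopen("demo.bin", "r+b");   /* existing file, update mode */
    if (fp == NULL)
        return 1;
    if (fseek(fp, 100L, SEEK_SET) != 0) {  /* move to byte offset 100 */
        fclose(fp);
        return 1;
    }
    fwrite("XYZ", 1, 3, fp);               /* overwrite three bytes in place */
    fclose(fp);
    return 0;
}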
How this is done under the covers probably depends on how the computer inside the hard disk implements it. It's supposed to be invisible outside the hard disk and shouldn't matter.