Suppose streams A and B are both seeded from baseline bl_C_5 of some stream C. From there, parallel development goes on in A and B, and at some point the baseline in A is bl_A_3 and in B is bl_B_6. At this point, is it possible to deliver bl_B_6 from B to A so that A will then contain an exact copy of bl_B_6 (without it being merged with bl_A_3)? In other words, is it possible to completely overwrite a stream with a deliver operation? If not, is there any other way or command for achieving this?
A "copy merge" isn't supported at the deliver operation (it can be supported at the type level, by associating a "copy-merge" policy, but that isn't practical when you want to merge everything)
One way to quickly do a "copy-merge" is to:
start the deliver as usual
instead of completing it right away, run a clearfsimport -rmname from a view on the source stream to the destination view (see this question for instance, and man clearfsimport): you will get a mirror image of stream B on stream A.
complete the deliver
check in whatever files are still checked out.
I have a service A which constantly updates a set of files in an S3 bucket.
More or less, it is equivalent to something like this:
while true
do
    # generate the file
    aws s3 cp <file> s3://<bucket>/<file>
    # sleep a little
done
I have a service B which reads that file once in a while to update the data inside itself. I want a single instance of service A while service B runs 100 instances.
So service B has the equivalent of:
while true
do
    aws s3 cp s3://<bucket>/<file> <file>
    # update the variable holding this data
    # sleep a little
done
At the moment, the <file> name always remains the same. I'm wondering whether this can cause issues. What happens when I upload a new version of the file? Is the old version still available until the copy in service B is done, or does the file get overwritten by service A?
Under all operating systems I know of, if I write to a file, a concurrent read at the same location sees the new data, not the old. In other words, with a standard OS file, the reader may see mangled data (a mix of old and new data).
Are S3 files the same as standard OS files in this respect, or are they safer, i.e. not overwritten until an upload is complete?
Note: I'm particularly interested in an official S3 document about how this specific case works. My searches have, so far, come up empty.
The answer is in the Amazon S3 User Guide, in the section on the consistency model.
Here is the pertinent paragraph:
Updates to a single key are atomic. For example, if you make a PUT request to an existing key from one thread and perform a GET request on the same key from a second thread concurrently, you will get either the old data or the new data, but never partial or corrupt data.
This clearly says that the data you GET will not be partially overwritten, as it could be with a standard file. You also won't know whether you received the old or the new version (unless you add some metadata, or put a date or serial number inside the file).
However, when dealing with large files, the high-level tools (such as the aws s3 commands) automatically switch to multipart transfers, which issue several requests per object; that means you may end up copying parts of the old object and parts of the new one. To avoid the issue, make sure the copy is performed as a single request rather than in multiple parts.
So, the normal POSIX way to safely, atomically replace the contents of a file is the following (a minimal sketch in C follows the list):
fopen(3) a temporary file on the same volume
fwrite(3) the new contents to the temporary file
fflush(3)/fsync(2) to ensure the contents are written to disk
fclose(3) the temporary file
rename(2) the temporary file to replace the target file
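Here is a minimal sketch of that sequence in C, using mkstemp() for the temporary file rather than a fixed name; the function name replace_file and the abbreviated error handling are mine, not part of the question:

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    /* Atomically replace 'target' with 'len' bytes of 'data'. */
    int replace_file(const char *target, const char *data, size_t len)
    {
        char tmp[4096];
        snprintf(tmp, sizeof tmp, "%s.XXXXXX", target);   /* same directory, same volume */

        int fd = mkstemp(tmp);                    /* 1. create the temporary file */
        if (fd < 0) return -1;
        FILE *f = fdopen(fd, "w");
        if (f == NULL) { close(fd); unlink(tmp); return -1; }

        int ok = fwrite(data, 1, len, f) == len   /* 2. write the new contents    */
              && fflush(f) == 0                   /* 3. flush stdio buffers...    */
              && fsync(fileno(f)) == 0;           /*    ...and push them to disk  */
        ok = (fclose(f) == 0) && ok;              /* 4. close the temporary file  */
        if (!ok || rename(tmp, target) != 0) {    /* 5. atomically replace target */
            unlink(tmp);
            return -1;
        }
        return 0;
    }

The rename() is only atomic if the temporary file lives on the same file system as the target, which is why the temporary name is derived from the target's own path.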
However, on my Linux system (Ubuntu 16.04 LTS), one consequence of this process is that the ownership and permissions of the target file change to those of the temporary file, which default to the process's uid/gid and the current umask.
I thought I would add code to stat(2) the target file before overwriting, and fchown(2)/fchmod(2) the temporary file before calling rename(2), but that can fail with EPERM.
Is the only solution to ensure that the uid/gid of the file matches the current user and group of the process overwriting the file? Is there a safe way to fall back in this case, or do we necessarily lose the atomic guarantee?
Is the only solution to ensure that the uid/gid of the file matches the current user and group of the process overwriting the file?
No.
In Linux, a process with the CAP_LEASE capability can obtain an exclusive lease on the file, which blocks other processes from opening the file for up to /proc/sys/fs/lease-break-time seconds. This means that technically, you can take the exclusive lease, replace the file contents, and release the lease, to modify the file atomically (from the perspective of other processes).
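As a rough sketch of that lease-based approach (assuming Linux; the function name is mine), the in-place rewrite would look something like the following. Note that it is only "atomic" with respect to processes that try to open the file while the lease is held, not to ones that already have it open:

    #define _GNU_SOURCE          /* for F_SETLEASE */
    #include <fcntl.h>
    #include <unistd.h>

    int rewrite_in_place(const char *path, const char *data, size_t len)
    {
        int fd = open(path, O_WRONLY);
        if (fd < 0) return -1;

        /* Take the exclusive (write) lease; fails with EAGAIN if anyone
         * else already has the file open. */
        if (fcntl(fd, F_SETLEASE, F_WRLCK) < 0) {
            close(fd);
            return -1;
        }

        int rc = 0;
        if (ftruncate(fd, 0) != 0 ||
            write(fd, data, len) != (ssize_t)len ||
            fsync(fd) != 0)
            rc = -1;

        fcntl(fd, F_SETLEASE, F_UNLCK);          /* release the lease */
        close(fd);
        return rc;
    }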
Also, a process with the CAP_CHOWN capability can change the file ownership (user and group) arbitrarily.
Is there a safe way to [handle the case where the uid or gid does not match the current process], or do we necessarily lose the atomic guarantee?
Considering that, in general, files may have ACLs and xattrs, it might be useful to create a helper program that clones the ownership, ACLs, and extended attributes from an existing file to a new file in the same directory (perhaps with a fixed name pattern, say .new-################, where # indicates a random alphanumeric character), provided the real user (getuid(), getgid(), getgroups()) is allowed to modify the original file. This helper program would have at least the CAP_CHOWN capability, and would have to consider the various security aspects (especially the ways it could be exploited). However, if the caller can overwrite the contents and create new files in the target directory (the caller must have write access to the target directory to do the rename/hardlink replacement), then creating a clone file on their behalf with empty contents ought to be safe. I would personally exclude target files owned by the root user or group, though.
Essentially, the helper program would behave much like the mktemp command, except that it would take the path to the existing target file as a parameter. It would then be relatively straightforward to wrap it in a library function, using e.g. fork()/exec() and pipes or sockets.
I personally avoid this problem by using group-based access controls: a dedicated (local) group for each set of files. The file owner field is then basically just an informational field, indicating the user that last recreated (or was in charge of) said file, with access control based entirely on the group. This means that changing the mode and the group id to match the original file suffices. (Copying ACLs would be even better, though.) If the user is a member of the target group, they can do the fchown() to change the group of any file they own, as well as the fchmod() to set the mode.
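For that scheme, the pre-rename fix-up is small. A rough sketch (the function name is mine; it assumes the calling user belongs to the target file's group):

    #include <sys/types.h>
    #include <sys/stat.h>
    #include <unistd.h>

    /* Copy the original file's group and mode onto the not-yet-renamed temp file. */
    int clone_group_and_mode(int tmpfd, const char *target)
    {
        struct stat st;
        if (stat(target, &st) != 0)
            return -1;
        /* (uid_t)-1 leaves the owner untouched; changing only the group is
         * allowed for an unprivileged owner who is a member of st.st_gid. */
        if (fchown(tmpfd, (uid_t)-1, st.st_gid) != 0)
            return -1;
        return fchmod(tmpfd, st.st_mode & 07777);
    }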
I am by no means an expert in this area, but I don't think it's possible. This answer seems to back this up. There has to be a compromise.
Here are some possible solutions. Each has advantages and disadvantages, to be weighed and chosen depending on the use case and scenario.
Use atomic rename.
Advantage: atomic operation
Disadvantage: may not keep owner/permissions
Create a backup, then write the file in place.
This is what some text editors do.
Advantage: will keep owner/permissions
Disadvantage: no atomicity. Can corrupt the file. Other applications might see a "draft" version of the file.
Set up permissions on the folder such that creating a new file with the original owner & attributes is possible.
Advantages: atomicity & owner/permissions are kept
Disadvantages: can be used only in certain specific scenarios (the files to be edited must be known when they are created, and the security model must permit this). Can decrease security.
Create a daemon/service responsible for editing the files. This process would have the necessary permissions to create files with the respective owner & permissions. It would accept requests to edit files.
Advantages: atomicity & owner/permissions are kept. Higher and more granular control over what can be edited and how.
Disadvantages: possible only in specific scenarios. More complex to implement. Might require deployment and installation. Adds an attack surface. Adds another source of possible (security) bugs. Possible performance impact due to the added intermediate layer.
Do you have to worry about the file that's named being a symlink to a file somewhere else in the file system?
Do you have to worry about the file that's named being one of multiple links to an inode (st_nlink > 1)?
Do you need to worry about extended attributes?
Do you need to worry about ACLs?
Do the user ID and group IDs of the current process permit the process to write in the directory where the file is stored?
Is there enough disk space available for both the old and the new files on the same file system?
Each of these issues complicates the operation.
Symlinks are relatively easy to deal with; you simply need to establish the realpath() to the actual file and do file creation operations in the directory containing the real path to the file. From here on, they're a non-issue.
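A small sketch of that step (assuming a POSIX.1-2008 realpath() that accepts a NULL buffer; the function name is mine):

    #include <libgen.h>
    #include <stdlib.h>
    #include <string.h>

    /* Return the directory that really holds the target, following all symlinks;
     * the caller free()s the result and creates the temporary file in there. */
    char *dir_of_real_file(const char *target)
    {
        char *real = realpath(target, NULL);   /* resolve every symlink component */
        if (real == NULL)
            return NULL;
        char *dir = strdup(dirname(real));     /* dirname() may modify 'real' */
        free(real);
        return dir;
    }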
In the simplest case, where the user (process) running the operation owns the file and the directory where the file is stored, can set the group on the file, the file has no hard links, ACLs, or extended attributes, and there's enough space available, you can get an atomic operation with more or less the sequence outlined in the question: you'd set the group and permissions before executing the atomic rename() operation.
There is an outside risk of TOCTOU (time of check, time of use) problems with the file attributes. If a link is added between the time when it is determined that there are no links and the rename operation, then that link is broken. If the owner, group, or permissions of the file change between the time they're checked and the time they're set on the new file, then the changes are lost. You could reduce that risk by breaking atomicity: rename the old file to a temporary name, rename the new file to the original name, and recheck the attributes on the renamed old file before deleting it. That is probably an unnecessary complication for most people, most of the time.
If the target file has multiple hard links to it and those links must be preserved, or if the file has ACLs or extended attributes and you don't wish to work out how to copy those to the new file, then you might consider something along the lines of the following (a sketch of the in-place copy step appears after the list):
1. write the output to a named temporary file in the same directory as the target file;
2. copy the old (target) file to another named temporary file in the same directory as the target;
3. if anything goes wrong during steps 1 or 2, abandon the operation with no damage done;
4. ignoring signals as much as possible, copy the new file over the old file;
5. if anything goes wrong during step 4, you can recover from the extra backup made in step 2;
6. if anything goes wrong in step 5, report the file names (new file, backup of original file, broken file) for the user to clean up;
7. clean up the temporary output file and the backup file.
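A sketch of step 4 only, copying the new contents into the existing target so that the inode (and hence the links, owner, group, mode, ACLs, and extended attributes) is preserved; the function name and the abbreviated error handling are mine:

    #include <fcntl.h>
    #include <unistd.h>

    int copy_into(const char *newfile, const char *target)
    {
        int in = open(newfile, O_RDONLY);
        if (in < 0) return -1;
        int out = open(target, O_WRONLY | O_TRUNC);   /* same inode as before */
        if (out < 0) { close(in); return -1; }

        char buf[65536];
        ssize_t n;
        int rc = 0;
        while ((n = read(in, buf, sizeof buf)) > 0)
            if (write(out, buf, (size_t)n) != n) { rc = -1; break; }
        if (n < 0 || fsync(out) != 0) rc = -1;

        close(in);
        if (close(out) != 0) rc = -1;
        return rc;
    }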
Clearly, this loses all pretense at atomicity, but it does preserve links, owner, group, permissions, ACLs, and extended attributes. It also requires more space: if the file doesn't change size significantly, it requires 3 times the space of the original file (formally, it needs size(old) + size(new) + max(size(old), size(new)) blocks). In its favour, it is recoverable even if something goes wrong during the final copy (even a stray SIGKILL), as long as the temporary files have known names (or names that can be determined).
Automatic recovery from SIGKILL probably isn't feasible. A SIGSTOP signal could be problematic too; a lot could happen while the process is stopped.
I hope it goes without saying that errors must be detected and handled carefully with all the system calls used.
If there isn't enough space on the target file system for all the copies of the files, or if the process cannot create files in the target directory (even though it can modify the original file), you have to consider what the alternatives are. Can you identify another file system with enough space? If there isn't enough space anywhere for both the old and the new file, you clearly have major issues — irresolvable ones for anything approaching atomicity.
The answer by Nominal Animal mentions Linux capabilities. Since the question is tagged POSIX and not Linux, it isn't clear whether those are applicable to you. However, if they can be used, then CAP_LEASE sounds useful.
How crucial is atomicity vs accuracy?
How crucial is POSIX compliance vs working on Linux (or any other specific POSIX implementation)?
I'm developing a small piece of software in C that reads and writes messages on a notice board. Every message is a .txt file named with a progressive number.
The software is multithreading, with many users that can do concurrent operations.
The operations that a user can do are:
Read the whole notice-board (concatenation of all the .txt file contents)
Add a message (add a file named "id_max++.txt")
Remove a message. When a message is removed there will be a hole in the numbering (e.g., "1.txt", "2.txt", "4.txt") that will never be filled.
Now, I'd like to know whether there is some I/O problem (*) that I should manage (and how), or whether the (Unix-like) OS handles it all by itself.
(*) such as 2 users that want to read and delete the same file
Since you have a Unix-like system, the OS will take care of deleting a file while it is open in another thread: the directory entry is removed immediately, and the file itself (the inode) is deleted on the last close.
The only problem I can see is between the directory scan and the opening of a file: a race condition could mean that the file has already been deleted.
IMHO you simply must treat a "file does not exist" error as normal and move on to the next file.
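A minimal sketch of that idea (the directory name "board" and the function name are mine, not from the question):

    #include <dirent.h>
    #include <stdio.h>

    /* Concatenate every message that still exists; silently skip the ones
     * that were deleted between the directory scan and the fopen(). */
    void read_board(void)
    {
        DIR *d = opendir("board");
        if (d == NULL) return;

        struct dirent *e;
        while ((e = readdir(d)) != NULL) {
            if (e->d_name[0] == '.')
                continue;                     /* skip "." and ".." */
            char path[512];
            snprintf(path, sizeof path, "board/%s", e->d_name);
            FILE *f = fopen(path, "r");
            if (f == NULL)
                continue;                     /* deleted since the scan: normal */
            int c;
            while ((c = getc(f)) != EOF)
                putchar(c);
            fclose(f);
        }
        closedir(d);
    }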
What you describe is not really bad, since it is analogous to MH mail folders, which can be accessed by many different processes, even if locking is involved. But depending on the load and the size of the messages, you could consider using a database. Rule of thumb (my opinion):
few concurrent accesses and big files: keep using the file system
many accesses and small files (a few KB max.): use a database
Of course, you must use a mutex-protected routine to find the next number when creating a new message (credit to @merlin2011 for noticing the problem).
You said in a comment that your specs do not allow a database. Following the analogy with mail handling, you could also use a single file (like a traditional mail format):
one single file
each message is preceded with a fixed size header saying whether it is active or deleted
read access need not be synchronized
write accesses must be synchronized
It would be a poor man's database where all synchronization is done by hand, but you have only one file descriptor per thread and save all the open and close operations. It makes sense where there are many reads and few writes or deletes.
A possible improvement would be (again, as mail readers do) to build an index with the offset and status of each message. The index could be kept on disk or in memory, depending on your requirements.
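For illustration, such a fixed-size record header could look like this (the field names and sizes are mine, purely hypothetical):

    #include <stdint.h>

    struct msg_header {
        uint32_t status;    /* 1 = active, 0 = deleted (tombstone)        */
        uint32_t id;        /* the progressive message number             */
        uint32_t length;    /* payload bytes that follow this header      */
        uint32_t reserved;  /* padding, keeps the header a fixed 16 bytes */
    };

With a layout like this, deleting a message becomes a single in-place write of the status field, and the index is just a list of (id, offset) pairs.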
The easier solution is to use a database like SQLite or MySQL, both of which provide transactions that you can use to achieve consistency. If you still want to go down this route, read on.
The issue is not an I/O problem; it's a concurrency problem if you do not implement proper monitors. Consider the following scenario (it is not the only problematic one, but it is one example).
User 1 reads the maximum id and stores it in a local variable.
Meanwhile, User 2 reads the same maximum id and stores it in a local variable also.
User 1 writes first, and then User 2 overwrites what User 1 just wrote, because it had the same idea of what the maximum id was.
This particular scenario can be solved by keeping the current maximum id as a variable that is initialized when the program is initialized, and protecting the get_and_increment operation with a lock. However, this is not the only problematic scenario that you will need to reason through if you go with this approach.
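A minimal sketch of that fix with pthreads (the names are mine):

    #include <pthread.h>

    static pthread_mutex_t id_lock = PTHREAD_MUTEX_INITIALIZER;
    static int max_id;                  /* set once at program start-up */

    int next_message_id(void)
    {
        pthread_mutex_lock(&id_lock);
        int id = ++max_id;              /* the protected get_and_increment */
        pthread_mutex_unlock(&id_lock);
        return id;
    }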
I have a list of strings that I want to output to different files according to a key for each file (the key is present in the list node, so if the key in a certain node is 1 then the string needs to be written to 1.txt, and if the key is 2 then the output should be redirected to 2.txt, and so on...).
What I was thinking is to assign each list member a unique key, which makes it a unique record, and then spawn multiple threads depending on the number of processors available in the system. Each thread redirects the output of a node from a pool of nodes (that is, my list) to the relevant file. I am skeptical about whether this is a good design for batch processing, or whether I should just have one thread do the whole output.
PS: Before I get bashed or anything, let me tell you I am just a curious learner.
Make it single-threaded. Then run it and find out what your bottleneck is. If you find that your bottleneck is the CPU and not disk I/O, then enable parallel processing.
As I understand it, your processing steps are:
select file by the key
write item to file
I don't think this is a case where parallel processing can result in a performance improvement. If you want to speed up this code, use buffering and asynchronous I/O:
for each file maintain a flag - write-in-progress
when you want to write something to file - check this flag
if write-in-progress is False:
set write-in-progress = True
add your item to buffer
start writing this buffer to file asynchronously
if write-in-progress is True:
add your item to buffer
when pending asynchronous operation is completed
check whether there is a nonempty buffer; if so, start an async write
There is a simpler approach: use buffering and synchronous I/O. It will be slower than the asynchronous approach described above, but not by much. You can start several threads and traverse the list in each thread independently, with each thread processing only its own unique set of keys. For example, you can use two threads: the first writes only items with odd keys, the second only items with even keys.
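A rough sketch of that two-thread split (the list layout and names are mine, just to show the idea; compile with -pthread):

    #include <pthread.h>
    #include <stdio.h>

    struct node {
        int key;                   /* determines the output file, e.g. 1 -> 1.txt */
        const char *text;
        struct node *next;
    };

    struct task {
        struct node *head;         /* shared, read-only list */
        int parity;                /* 0: even keys, 1: odd keys */
    };

    static void *write_partition(void *arg)
    {
        struct task *t = arg;
        for (struct node *n = t->head; n != NULL; n = n->next) {
            if ((n->key & 1) != t->parity)
                continue;                          /* not this thread's keys */
            char name[32];
            snprintf(name, sizeof name, "%d.txt", n->key);
            FILE *f = fopen(name, "a");            /* only this thread touches it */
            if (f != NULL) {
                fprintf(f, "%s\n", n->text);
                fclose(f);
            }
        }
        return NULL;
    }

    int main(void)
    {
        struct node c = { 2, "second", NULL };
        struct node b = { 1, "also first", &c };
        struct node a = { 1, "first", &b };

        pthread_t tid[2];
        struct task tasks[2] = { { &a, 0 }, { &a, 1 } };
        for (int i = 0; i < 2; i++)
            pthread_create(&tid[i], NULL, write_partition, &tasks[i]);
        for (int i = 0; i < 2; i++)
            pthread_join(tid[i], NULL);
        return 0;
    }

Because the odd and even key sets are disjoint, no two threads ever write to the same file, so no locking is needed around the file writes.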
You need a concurrency model for that - however serious it sounds :)
First, analyze what can be done at the same time and is unrelated to everything else. Imagine each step of your program is executed on a different machine, with some sort of communication between them, e.g. an IP network.
Then draw the flow between these instances (actions/machines). Mark which resources the actions need to do their work, e.g. a list or a file. Mark resources as separate instances as well (just like actions and machines).
Put the file system into your picture to see whether writing separate files can be sped up, or whether everything ends up in the same file system and is therefore serialized again.
Connect the instances and see if you get any benefit. It could look like this:
list
|
list reader
/ \ \
/ \ ----------\
file file file
writer writer writer
| | |
file 1 file 2 file 3
\ / |
\ / |
file system 1 file system 2
In the example you can see that it may make sense to have some parallel execution.
I have an algorithm that takes 7 days to run to completion (and a few more algorithms like it, too).
Problem 1: In order to successfully run the program, I need a continuous power supply. If I'm out of luck and there is a power loss in the middle, I need to restart it from scratch.
So I would like to ask for a way to make my program execute in phases (say each phase generates results A, B, C, ...), so that in case of a power loss I can somehow use these intermediate results and continue/resume the run from that point.
Problem 2: How can I prevent a file from being reopened every time a loop iterates? (fopen was placed in a loop that runs nearly a million times; this was needed as the file is being changed with each iteration.)
You can separate it into several source files and use make.
When each result phase is complete, branch off to a new universe. If the power fails in the new universe, destroy it and travel back in time to the point at which you branched. Repeat until all phases are finished, and then merge your results into the original universe via a transcendental wormhole.
Well, a couple of options, I guess:
You split your algorithm along sensible lines, with a defined output from each phase that can be the input to the next phase. Then configure your algorithm as a workflow (ideally soft-configured through some declaration file).
You add logic to your algorithm by which it knows what it has successfully completed (committed). Then, on failure, you can restart the algorithm; it bins all uncommitted data and restarts from the last commit point.
Note that both these options may draw out your 7-day run time further!
So, to improve the overall runtime, could you also separate your algorithm so that it has "worker" components that can work on "jobs" in parallel? This usually means pulling out some "dumb" but intensive logic (such as a computation) that can be parameterised. Then you have the option of running your algorithm on a grid/space/cloud/whatever, which at least gives you options to reduce the run time. It doesn't even need to be a space... just use queues (IBM MQ Series has a C interface) and have listeners on other boxes listening to your jobs queue and processing the results before persisting them. You can still phase the algorithm as discussed above, too.
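As an illustration of the commit-point idea in the second option (the checkpoint layout, file names, and commit interval are mine, purely as a sketch):

    #include <stdio.h>
    #include <unistd.h>

    struct checkpoint {
        long next_item;       /* where to resume the main loop          */
        double partial_sum;   /* whatever intermediate result we carry  */
    };

    /* Write the checkpoint to a temp file, flush it to disk, then rename(),
     * so a power loss never leaves a half-written checkpoint behind. */
    static int save_checkpoint(const struct checkpoint *cp)
    {
        FILE *f = fopen("checkpoint.tmp", "wb");
        if (f == NULL) return -1;
        int ok = fwrite(cp, sizeof *cp, 1, f) == 1
              && fflush(f) == 0
              && fsync(fileno(f)) == 0;
        ok = (fclose(f) == 0) && ok;
        if (!ok) return -1;
        return rename("checkpoint.tmp", "checkpoint.dat");
    }

    /* Returns 1 if a previous run left a checkpoint to resume from. */
    static int load_checkpoint(struct checkpoint *cp)
    {
        FILE *f = fopen("checkpoint.dat", "rb");
        if (f == NULL) return 0;
        int ok = fread(cp, sizeof *cp, 1, f) == 1;
        fclose(f);
        return ok;
    }

    int main(void)
    {
        struct checkpoint cp = { 0, 0.0 };
        load_checkpoint(&cp);                    /* resume if a checkpoint exists */

        for (long i = cp.next_item; i < 1000000; i++) {
            cp.partial_sum += (double)i;         /* stand-in for the real work */
            if (i % 10000 == 0) {                /* commit every 10000 items   */
                cp.next_item = i + 1;
                save_checkpoint(&cp);
            }
        }
        printf("done: %f\n", cp.partial_sum);
        return 0;
    }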
Problem 2: Opening the file on each iteration of the loop because it's changed
I may not be the best qualified to answer this, but doing an fopen (and fclose) on each iteration does seem wasteful and slow. To answer, or to have anyone more qualified answer, I think we'd need to know more about your data.
For instance:
Is it text or binary?
Are you processing records or a stream of text? That is, is it a file of records or a stream of data? (you aren't cracking genes are you? :-)
I ask because, judging by your comment "because it's changed each iteration", you might be better off using a random-access file. By this, I'm guessing you're re-opening the file to fseek to a point that you may already have passed (in your stream of data) and making a change. However, if you open the file in binary mode, you can seek anywhere in it using fsetpos and fseek; that is, you can "seek" backwards.
Additionally, if your data is record-based or somehow organised, you could also create an index for it. With this, you could use fsetpos to set the file pointer at the record you're interested in and traverse from there, saving time in finding the area of data to change. You could even persist your index in an accompanying index file.
Note that you can write plain text to a binary file. Perhaps worth investigating?
Sounds like a classic batch-processing problem to me.
You will need to define checkpoints in your application and store the intermediate data until a checkpoint is reached.
Checkpoints could be the row number in a database, or the position inside a file.
Your processing might take longer than now, but it will be more reliable.
In general you should think about the bottleneck in your algorithm.
For problem 2, you must use two files; it might be that your application will be days faster if you call fopen a million times less...
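For problem 2, a minimal sketch of keeping the file open across iterations instead of reopening it every time (the record layout and file name are mine; it assumes the file already exists and is opened in update mode):

    #include <stdio.h>

    struct record { long value; };

    int main(void)
    {
        /* "r+b": open the existing file once, for reading and writing, in binary. */
        FILE *f = fopen("data.bin", "r+b");
        if (f == NULL) { perror("fopen"); return 1; }

        for (long i = 0; i < 1000000; i++) {
            struct record r = { i };
            /* Seek to this iteration's record instead of reopening the file. */
            if (fseek(f, i * (long)sizeof r, SEEK_SET) != 0) break;
            if (fwrite(&r, sizeof r, 1, f) != 1) break;
        }

        fclose(f);    /* one open and one close instead of a million of each */
        return 0;
    }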