Is using fs::rename to move files between different file systems atomic?

On Linux, if we move (rename) a file from one file system to another like this:
fs::rename("/src/a", "/dest/a")?;
is it possible for the file /dest/a to become visible/available to other potential readers (processes that scan /dest/, for example) before the whole file data (contents) is fully copied to the destination file system?

From the docs
This will not work if the new name is on a different mount point.
So you shouldn't be using fs::rename to move between mount points. It's a simple filesystem rename and is not actually moving any data (which is why you can fs::rename a 2-terabyte file fairly quickly), so it only works if the source and destination are on the same filesystem.
If the source and destination are on the same mount point, then the answer is no. It's not possible for someone to read it before it's fully available, since, again, no data is actually transferred: it's just a single operation of "this pointer now points here and this one doesn't exist anymore".
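For the cross-filesystem case, a common approach is to copy the data under a temporary name inside the destination directory and rename it into place only once the copy is complete. The sketch below is an illustration in Python rather than Rust, and is not taken from the Rust docs. It assumes the OS reports a cross-device rename as EXDEV, which is what rename(2) does on Linux. The function name move_file and the ".incoming-" prefix are made up for the example, and processes scanning the destination directory would have to ignore files with that prefix.

```python
import errno
import os
import shutil
import tempfile


def move_file(src, dst):
    """Move src to dst so that dst only ever appears with complete contents."""
    try:
        # Same filesystem: one atomic rename, no file data is copied.
        os.rename(src, dst)
        return "renamed"
    except OSError as exc:
        if exc.errno != errno.EXDEV:
            raise  # some other failure (permissions, missing file, ...)

    # Different filesystem: copy under a temporary name next to dst,
    # then rename the finished copy into place.
    dst_dir = os.path.dirname(os.path.abspath(dst))
    fd, tmp_path = tempfile.mkstemp(dir=dst_dir, prefix=".incoming-")
    try:
        with os.fdopen(fd, "wb") as tmp, open(src, "rb") as source:
            shutil.copyfileobj(source, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the data to disk before publishing
        os.replace(tmp_path, dst)   # atomic: tmp and dst share a filesystem
    except BaseException:
        # Do not leave a partial copy behind.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    os.unlink(src)
    return "copied"
```

Note that this sketch does not preserve ownership, permissions or timestamps on the copy path; that would need extra calls such as shutil.copystat.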

Related

How do I get the disk addresses of files in C/C++?

When a file is saved into a drive, its contents are written & then indexed. I want to get the indexes and to access the raw contents of the files.
Any idea how to do it, especially for ext4 & btrfs?
UPDATE: I want to get the addresses of the extents of a file. The information about the addresses must be stored somewhere on the disk. I want to retrieve this info in order to map the physical location of the file contents. Are there any methods to achieve that?
UPDATE: Hello, all! Thanks for your replies. What I want is a function/command which returns a list of extent addresses. debugfs seems to be the function/command with the most relevant functionality.
It depends on the filesystem you are using. If you are running Linux, you can use debugfs to locate the file in the filesystem.
I should mention that all filesystems are mounted through the VFS, a virtual filesystem layer that acts as a simplified interface offering the standard operations (open, close, read...). What does that mean? Neither a filesystem nor its contents (files, directories) are opened directly from disk: when you open something, it is brought into main memory (your RAM), you do your operations there, and when you close it, it goes back to the disk drive.
Now, the question is: can I get the absolute address within a filesystem? Yes, if you open the whole filesystem, e.g. open("/dev/sdaX", O_RDONLY); you then get addresses relative to the start of that filesystem, using lseek in C for example.
And then: can I get the same for the whole drive? No, because you cannot open the whole drive as a file descriptor. Remember /dev/sdaX in UNIX? Partitions can be opened like files because they have a virtual interface running on them.
Your last question: can I read the really raw contents? All files are read as they appear on disk; the only things that change are the descriptor used by the OS and some data about how the file is indexed, all of which acts as a "file header".
I hope all your questions are answered.
The current solution/workaround is to call these functions with popen:
filefrag -e /path/to/file
hdparm --fibmap /path/to/filename
Then one should simply parse the string outputs of these programs. It is not a real solution (i.e. one that produces its results at the C/C++ level), but I'll accept it for now.
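The parsing step could look roughly like the sketch below. It is written in Python for brevity; the same idea applies to popen in C/C++. It assumes the column layout that filefrag -e from e2fsprogs usually prints (extent index, logical range, physical range, length, all in filesystem blocks), and the regular expression and function names are made up for the example. If a given filefrag version prints a different layout, the pattern would need adjusting.

```python
import re
import subprocess

# Matches the data rows of `filefrag -e`, which look roughly like:
#    0:        0..       3:      34816..     34819:      4:   last,eof
EXTENT_ROW = re.compile(
    r"^\s*(\d+):\s*(\d+)\.\.\s*(\d+):\s*(\d+)\.\.\s*(\d+):\s*(\d+):"
)


def parse_filefrag(output):
    """Turn the text printed by `filefrag -e` into a list of extent dicts."""
    extents = []
    for line in output.splitlines():
        match = EXTENT_ROW.match(line)
        if match is None:
            continue  # header, summary, or any other non-extent line
        _, log_start, log_end, phys_start, phys_end, length = map(
            int, match.groups()
        )
        extents.append(
            {
                "logical": (log_start, log_end),
                "physical": (phys_start, phys_end),
                "length": length,
            }
        )
    return extents


def file_extents(path):
    """Run filefrag on path (it must be installed) and parse its output."""
    result = subprocess.run(
        ["filefrag", "-e", path],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_filefrag(result.stdout)
```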
Sources:
https://unix.stackexchange.com/questions/106802/what-command-do-i-use-to-see-the-start-and-end-block-of-a-file-in-the-file-syste
https://serverfault.com/questions/29886/how-do-i-list-a-files-data-blocks-on-linux

How to make atomic operation with both file system and database in Postgres?

I think the following should be a pretty common pattern:
A database is used to store file paths
The files themselves are stored in the file system
Issues may occur when, say, we want to modify a file path: we need both to modify the file path in the database and to move the file in the filesystem. It is important that this is done "atomically". Indeed, while we are doing the modification, another process may read the file path from the database and then try to access the file in the file system. We should make sure that the tuple
("file path", "actual file location")
remains consistent at all times.
Is there a canonical/simple way to achieve this with Postgres/Linux?
One of the major features of the database is that each process sees it in a consistent state. That also means that different clients may see different states of the database.
This means that when you correct a file path in the database and commit the change, any transaction that started before the commit can still see the old path for some time after the commit.
So, to make sure nobody tries to read the old file path, you have to wait until all transactions that started before the commit have ended. That can take milliseconds or, in extreme situations, days. If you have a
I'd try to implement the following scheme (pseudocode):
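# Pseudocode: sql() is a hypothetical helper that runs a statement on one
# open connection and yields any returned rows.
sql("begin")
# Hard-link first, so the file is reachable under both paths.
os.link(old_path, new_path)
sql("update files set path=? where path=?", new_path, old_path)
sql("insert into files_to_clean values (?, txid_current())", old_path)
sql("commit")

# Occasionally clean up old paths whose renaming transaction is older
# than every transaction still in progress.
if random() < CLEANUP_PROBABILITY:
    sql("begin")
    for delete_path in sql("""
        delete from files_to_clean
        where path in (
            select path from files_to_clean
            where txid < txid_snapshot_xmin(txid_current_snapshot())
            for update skip locked)
        returning path
    """):
        os.remove(delete_path)
    sql("commit")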

How to create a virtual file with inode that has references to storage in memory

Let me explain clearly.
The following is my requirement:
Let's say there is a command which has an option specified as '-f' that takes a filename as argument.
Now I have 5 files and I want to create a new file merging those 5 files and give the new filename as argument for the above command.
But there is a difference between
reading a single file and
merging all files & reading the merged file.
There is more IO (read from 5 files + write to the merged file + any IO our command does with the given file) generated in the second case than IO (any IO our command does with the given file) generated in the first case.
Can we reduce this unwanted IO?
In the end, I really don't want the merged file at all. I only create this merged file just to let the command read the merged files content.
And to say, I also don't want this implementation. The file sizes are not so big and it is okay to have that extra negligible IO. But, I am just curious to know if this can be done.
So in order to implement this, I have following understanding/questions:
Generally, all that a command taking a filename argument does is read the file.
In our case, the filename (file path) does not exist yet; it's just a virtual/imaginary filename that stands for the merger of all the files.
So, can we create such virtual filename?
What is a filename? It's an indirect inode entry for a storage location.
In our case, the individual files have different inode entries and all inode entries have different storage locations. And our virtual/imaginary file has in fact no inode and even if we could create an imaginary inode, that can only point to a storage in memory (as there is no reference to the storage location of another file from a storage location of one file in disk)
But, let's say using advanced programming, we are able to create an imaginary filepath with imaginary inode, that points to a storage in memory.
Now, when we give that imaginary filename as an argument and the command tries to open that imaginary file, it finds that its inode entry refers to storage in memory. But the actual content is on disk and not in memory. So the data is not loaded into memory yet, unless we read it explicitly. Hence, again, we would need to read the data first.
Simply saying, as there is no continuity or references at storage in disk to the next file data, the merged data needs to be loaded to memory first.
So, by my deduction, it seems we would at least need to put the data in memory. However, the command itself needs the file to be read anyway (if not the whole file, at least part of it, until the command's operation is done, be it parsing or whatever). So, using this method, we could save significant IO if it's a really big file.
So, how can we create that virtual file?
My first answer is to write the merged file to tmpfs and refer to that file. But is that the only option, or can we actually point to a storage location in memory other than tmpfs? tmpfs is not an option because my script can be run from any server, and we need a solution that works on all servers. If my script creates the merged file in /dev/shm, it may fail on a server that doesn't have /dev/shm. So I should be able to load into memory directly. But I think a normal user will not have access to memory, so it seems this cannot be done without shm.
Please let me know your comments, and kindly correct me if my understanding is wrong anywhere. Even if it is too complicated for my level, kindly post your answer. At least, I might understand it after a few months.
Create a fifo (named pipe) and provide its name as an argument to your program. The process that combines the five input files writes to this fifo:
mkfifo wtf
cat file1 file2 file3 file4 file5 > wtf # this will block...
[from another terminal] cp wtf omg
Here I used cp as your program, and cat as the program combining the five files. You will see that omg will contain the output of your program (here: cp) and that the first terminal will unblock after the program is done.
Your program (here: cp) is not even aware that its first argument wtf refers to a fifo; it just opens it and reads from it like it would with an ordinary file. (This will fail if the program attempts to seek in the file; seeking is not supported on pipes and fifos.)
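The same idea can be driven from a script instead of two terminals. The following is a minimal sketch, assuming a POSIX system where os.mkfifo is available. The names feed_fifo and read_merged, and the consumer callback, are made up for the illustration; consumer stands in for "the command that takes a filename".

```python
import os
import shutil
import tempfile
import threading


def feed_fifo(fifo_path, input_paths):
    """Write the input files, one after another, into the fifo."""
    # Opening a fifo for writing blocks until some reader opens it.
    with open(fifo_path, "wb") as fifo:
        for path in input_paths:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, fifo)


def read_merged(input_paths, consumer):
    """Call consumer(filename), where filename is a fifo that yields the
    concatenation of input_paths. No merged file is written to disk."""
    workdir = tempfile.mkdtemp()
    fifo_path = os.path.join(workdir, "merged.fifo")
    os.mkfifo(fifo_path)
    writer = threading.Thread(target=feed_fifo, args=(fifo_path, input_paths))
    writer.start()
    try:
        # The consumer must actually open and read the fifo; otherwise the
        # writer thread stays blocked and join() below never returns.
        return consumer(fifo_path)
    finally:
        writer.join()
        shutil.rmtree(workdir)
```

A consumer could be as simple as a function that opens the given name and reads it, or one that passes the name to an external command via subprocess. As with the shell version, this only works for consumers that read sequentially.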

Safely writing to and reading from the same file with multiple processes on Linux and Mac OS X

I have three processes designed to run constantly in both Linux and Mac OS X environments. One process (the Downloader) downloads and stores a local copy of a large XML file every 30 seconds. Two other processes (the Workers) use the stored XML file for input. Each Worker starts and runs at random times. Since the XML file is big, it takes a long time to download. The Workers also take a long time to read and parse it.
What is the safest way to setup the processes so the Downloader doesn't clobber the stored file while the Workers are trying to read it?
For Linux and Mac OS X machines that use inode-based file systems, use temporary files to store the data while it's being downloaded (and is in an incomplete state). Once the download is complete, move the temporary file into its final location with an atomic action.
For a little more detail, there are two main things to watch out for when one process (e.g. Downloader) writes a file that's actively read by other processes (e.g. Workers):
Make sure the Workers don't try to read the file before the Downloader has finished writing it.
Make sure the Downloader doesn't alter the file while the Workers are reading it.
Using temporary files accommodates both of these points.
For a more specific example, when the Downloader is actively pulling the XML file, have it write to a temporary location (e.g. 'data-storage.tmp') on the same device/disk* where the final file will be stored. Once the file is completely downloaded and written, have the Downloader move it to its final location (e.g. 'data-storage.xml') via an atomic (aka linearizable) rename, for example with the mv command.
* Note that the reason the temporary file needs to be on the same device as the final file location is to ensure the inode number stays the same and the rename can be done atomically.
This methodology ensures that while the file is being downloaded/written, the Workers won't see it, since it's in the .tmp location. Because of the way renaming works with inodes, it also makes sure that any Worker that has opened the file continues to see the old content even if a new version of the data-storage file is put in place.
The Downloader will point 'data-storage.xml' to a new inode number when it does the rename, but the Worker will continue to access 'data-storage.xml' through the previous inode number, thereby continuing to work with the file in that state. At the same time, any Worker that opens 'data-storage.xml' after the Downloader has done the rename will see the contents of the new inode, since that is now what the name refers to in the file system. So two Workers can be reading from the same filename (data-storage.xml), but each will see a different (and complete) version of the file's contents, depending on which inode the filename pointed to when the file was opened.
To see this in action, I created a simple set of example scripts that demonstrate this functionality on github. They can also be used to test/verify that using a temporary file solution works in your environment.
An important note is that it's the file system on the particular device that matters. If you are using a Linux or Mac machine but working with a FAT file system (for example, a usb thumb drive), this method won't work.
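As a rough illustration of the Downloader side, here is a minimal Python sketch of "write to a temporary file, then rename". It is not the example code mentioned above; the function name publish_atomically is made up, and it assumes the target directory is on a file system with atomic rename semantics, as discussed.

```python
import os
import tempfile


def publish_atomically(final_path, data):
    """Write data to a temporary file beside final_path, then rename it
    over final_path so readers never observe a partial file."""
    directory = os.path.dirname(os.path.abspath(final_path))
    # Same directory as the target, hence the same file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the bytes are on disk first
        # Atomically repoint the name at the new inode. Readers that already
        # have the old file open keep reading the old inode.
        os.replace(tmp_path, final_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

A Worker needs no special code: it opens 'data-storage.xml' as usual and reads whichever complete version the name pointed to at open time.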

Get `df` to show updated information on FreeBSD

I recently ran out of disk space on a drive on a FreeBSD server. I truncated the file that was causing problems but I'm not seeing the change reflected when running df. When I run du -d0 on the partition it shows the correct value. Is there any way to force this information to be updated? What is causing the output here to be different?
In BSD a directory entry is simply one of many references to the underlying file data (called an inode). When a file is deleted with the rm(1) command, only the reference count is decreased. If the reference count is still positive (e.g. the file has other directory entries due to hard links), then the underlying file data is not removed.
Newer BSD users often don't realize that a program that has a file open is also holding a reference. This prevents the underlying file data from going away while the process is using it. When the process closes the file, if the reference count falls to zero, the file space is marked as available. This scheme is used to avoid the Microsoft Windows type of issue where the system won't let you delete a file because some unspecified program still has it open.
An easy way to observe this is to do the following
cp /bin/cat /tmp/cat-test
/tmp/cat-test &
rm /tmp/cat-test
Until the background process is terminated the file space used by /tmp/cat-test will remain allocated and unavailable as reported by df(1) but the du(1) command will not be able to account for it as it no longer has a filename.
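The same effect can be observed from inside a single process. The sketch below is an illustration only, assuming a local POSIX file system (network file systems may behave differently); the function name unlink_while_open is made up. It removes a file's only name while keeping it open, then shows that the link count is zero yet the data is still readable.

```python
import os
import tempfile


def unlink_while_open(payload):
    """Delete a file's only name while it is open and report what is left."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "w+b") as f:
        f.write(payload)
        f.flush()
        os.unlink(path)  # removes the directory entry only
        links = os.fstat(f.fileno()).st_nlink  # directory entries remaining
        name_exists = os.path.exists(path)
        f.seek(0)
        data = f.read()  # the blocks are still allocated and readable
    # Only here, once the last descriptor is closed, is the space released.
    return links, name_exists, data
```

While the file is in that state, a tool that walks directory entries (like du) cannot find it, whereas the file system's own block accounting (what df reports) still counts it.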
Note that if the system should crash without the process closing the file, then the file data will still be present but unreferenced; an fsck(8) run will be needed to recover the filesystem space.
Processes holding files open is one reason why the newsyslog(8) command sends signals to syslogd or other logging programs to inform them they should close and re-open their log files after it has rotated them.
Softupdates can also affect filesystem free space, as the actual inode space recovery can be deferred; the sync(8) command can be used to encourage this to happen sooner.
This probably centres on how you truncated the file. du and df report different things as this post on unix.com explains. Just because space is not used does not necessarily mean that it's free...
Does df --sync work?
