How do I get the disk addresses of files in C/C++? - c

When a file is saved into a drive, its contents are written & then indexed. I want to get the indexes and to access the raw contents of the files.
Any idea on the method how to do it, especially for ex4 & btrfs?
UPDATE: I want to get the addresses of the extents of a file. The information about the addresses must be stored somewhere onto the disk. I want to retrieve this info, in order to map the physical location of the file contents. Any methods in order to achieve that?
UPDATE: Hello, all! Thanks for your replies. What I want is a function/command which returns me a list of extent addresses. debugfs seems the function/command with the most-relevant functionality.

It depends of the filesystem you are using. If you are running Linux you can use debufs to seek the file in the filesystem.
I have to say that all FSs are mounted through a VFS, a virtual filesystem that is like a simplified interface with the standard operations (open, close, read...). What is the meaning of that? No filesystem nor its contents(files, dirs) are opened directly from disk, when you open something, you move it to the main memory(your RAM) you do your operations and when you close something it returns to the disk drive.
Now, the question is: Can I get the absolute address in a FS? Yes, if you open your whole filesystem like open ("/dev/sdaX", 0_RDONLY); so you get the address relative to your filesystem using lseek in C for example.
And then... Can I get the same in the whole drive? No, that is because you cannot open the whole drive as a file descriptor. Remember /dev/sdaXin UNIX? Partitions and its can be opened like files because they have a virtual interface running on them.
Your last answer: Can I read really raw contents? All files are read as they appear on disk, the only thing that changes is the descriptor used by the OS and some data about how is indexed, all this as a "file header".
I hope all your questions are answered.

The current solution/workaround is to call these functions with popen:
filefrag -e /path/to/file
hdparm --fibmap /path/to/filename
Then one should simply parse the stringoutputs of these programs. It is not a real solution (i.e.: outputs at C/C++ level), but I'll accept it for now.
Sources:
https://unix.stackexchange.com/questions/106802/what-command-do-i-use-to-see-the-start-and-end-block-of-a-file-in-the-file-syste
https://serverfault.com/questions/29886/how-do-i-list-a-files-data-blocks-on-linux

Related

Is using fs::rename to move files between different file systems atomic?

On Linux, if we move (rename) a file from one file system to another like this:
fs::rename("/src/a", "/dest/a")?;
is it possible the file /dest/a to become visible/available to other potential readers (processes that scan /dest/ for example) before the whole file data (contents) is fully copied to the destination file system?
From the docs
This will not work if the new name is on a different mount point.
So you shouldn't be using fs::rename to move between mount points. It's a simple filesystem rename and is not actually moving any data (which is why you can fs::rename a 2-terabyte file fairly quickly), so it only works if the source and destination are on the same filesystem.
If the source and destination are on the same mount point, then the answer is no. It's not possible for someone to read it before it's fully available, since, again, no data is actually transferred: it's just a single operation of "this pointer now points here and this one doesn't exist anymore".

How to create a virtual file with inode that has references to storage in memory

Let me explain clearly.
The following is my requirement:
Let's say there is a command which has an option specified as '-f' that takes a filename as argument.
Now I have 5 files and I want to create a new file merging those 5 files and give the new filename as argument for the above command.
But there is a difference between
reading a single file and
merging all files & reading the merged file.
There is more IO (read from 5 files + write to the merged file + any IO our command does with the given file) generated in the second case than IO (any IO our command does with the given file) generated in the first case.
Can we reduce this unwanted IO?
In the end, I really don't want the merged file at all. I only create this merged file just to let the command read the merged files content.
And to say, I also don't want this implementation. The file sizes are not so big and it is okay to have that extra negligible IO. But, I am just curious to know if this can be done.
So in order to implement this, I have following understanding/questions:
Generally what all the commands (that takes the filename argument) does is it reads the file.
In our case, the filename(filepath) is not ready, it's just an virtual/imaginary filename that exists (as the mergation of all files).
So, can we create such virtual filename?
What is a filename? It's an indirect inode entry for a storage location.
In our case, the individual files have different inode entries and all inode entries have different storage locations. And our virtual/imaginary file has in fact no inode and even if we could create an imaginary inode, that can only point to a storage in memory (as there is no reference to the storage location of another file from a storage location of one file in disk)
But, let's say using advanced programming, we are able to create an imaginary filepath with imaginary inode, that points to a storage in memory.
Now, when we give that imaginary filename as argument and when the command tries to open that imaginary file, it finds that it's inode entry is referring to a storage in memory. But the actual content is there in disk and not in the memory. So, the data is not loaded into memory yet, unless we read it explicitly. Hence, again we would need to read the data first.
Simply saying, as there is no continuity or references at storage in disk to the next file data, the merged data needs to be loaded to memory first.
So, with my deduction, it seems we would at least need to put the data in memory. However, as the command itself would need the file to be read (if not the whole file, at least a part of it until the commands's operation is done - let it be parsing or whatever). So, using this method, we could save some significant IO, if it's really a big file.
So, how can we create that virtual file?
My first answer is to write the merged file to tmpfs and refer to that file. But is it the only option or can we actually point to a storage location in memory, other than tmpfs? tmpfs is not option because, my script can be run from any server and we need to have a solution that work from all servers. If I mention to create merged file at /dev/shm in my script, it may fail in the server where it doesn't have /dev/shm. So I should be able to load to memory directly. But I think normal user will not have access to memory and so, it seems can not be done without shm.
Please let me know your comments and also kindly correct me if my understanding anywhere is wrong. Even if it is complicated for my level, kindly post your answer. At least, I might understand it after few months.
Create a fifo (named pipe) and provide its name as an argument to your program. The process that combines the five input files writes to this fifo
mkfifo wtf
cat file1 file2 file3 file4 file5 > wtf # this will block...
[from another terminal] cp wtf omg
Here I used cp as your program, and cat as the program combining the five files. You will see that omg will contain the output of your program (here: cp) and that the first terminal will unblock after the program is done.
Your program (here:cp) is not even aware that its 1st argument wtf refers to a fifo; it just opens it and reads from it like it would do with an ordinary file. (this will fail if the program attempts to seek in the file; seek() is not implemented for pipes and fifos)

Does stat()/fstat() function finally open or read the file to get attributes?

In my program there is a function to frequently call stat() to get the attributes of a file in flash storage. Sometimes after power off and reboot the contents of the file lost. I noticed that the stat() finally calls the file system driver in the Linux kernel.
My questions are: will the Linux kernel fs open or read the file to get the file attributes? Is it possible for the power off during stat() or fstat() corrupt the file in flash?
All the stat() call does is to retrieve the contents of the file's i-node; the file itself isn't touched. However, the file's i-node will be in memory, and it the file was updated in any way [even by being held open by this or another process], the file mtime and such will need to be updated and the i-node will get updated, perhaps wrongly. Poof! No file.
But this behavior is not unique to flash.

Copying a directory using sockets

I'm writing a program in C that sends files across the network using sockets. This works fine for files - they are read into a buffer and then written onto the socket. They are picked up at the other end by reversing this process.
However, how can this apply to directories? I also want to copy directories, keeping the permissions the same (so I don't think mkdir will work). At the moment when I try to run this on a directory, it says the size is -1. How is a directory represented?
To be clear, for example, if I want my program to copy /tmp across the network, it will do this:
/tmp/1.txt - OK
/tmp/2.txt - OK
/tmp/dir/ - Skip
/tmp/dir/3.txt - Can't write to path
There are several possibilities. It would fit fairly will with what you have already to tar the directory to transfer, send the resulting archive across the network, and untar on the other side.
Alternatively, you can walk the directory tree recursively. For each directory you need transfer only the name and whichever attributes you want to preserve, but then you must list the directory contents (probably via readdir()) and transfer each member.
By the way, don't neglect to think about how you're going to handle links, both symbolic ones and hard ones. And if you want your program to be really robust then consider also what to do with special files such as device files and FIFOs.
I guess it is homework, otherwise why not use FTP, scp, rsync, unison etc.
To test if a file path is a plain file, a device, a directory, etc etc... use
stat(2)
To read a directory, use opendir(3) then loop on readdir(3) (then of course closedir). You don't need to know how a directory is represented.
You probably should be interested in nftw(3) to recursively traverse a file tree.
To make one directory, use mkdir(2)
You should read Advanced Linux Programming
BTW, this answer contains useful information too...

Check if another program has a file open

After doing tons of research and nor being able to find a solution to my problem i decided to post here on stackoverflow.
Well my problem is kind of unusual so I guess that's why I wasn't able to find any answer:
I have a program that is recording stuff to a file. Then I have another one that is responsible for transferring that file. Finally I have a third one that gets the file and processes it.
My problem is:
The file transfer program needs to send the file while it's still being recorded. The problem is that when the file transfer program reaches end of file on the file doesn't mean that the file actually is complete as it is still being recorded.
It would be nice to have something to check if the recorder has that file still open or if it already closed it to be able to judge if the end of file actually is a real end of file or if there simply aren't further data to be read yet.
Hope you can help me out with this one. Maybe you have another idea on how to solve this problem.
Thank you in advance.
GeKod
Simply put - you can't without using filesystem notification mechanisms, windows, linux and osx all have flavors of this. I forget how Windows does it off the top of my head, but linux has 'inotify' and osx has 'knotify'.
The easy way to handle this is, record to a tmp file, when the recording is done then move the file into the 'ready-to-transfer-the-file' directory, if you do this so that the files are on the same filesystem when you do the move it will be atomic and instant ensuring that any time your transfer utility 'sees' a new file, it'll be wholly formed and ready to go.
Or, just have your tmp files have no extension, then when it's done rename the file to an extension that the transfer agent is polling for.
Have you considered using stream interface between the recorder program and the one that grabs the recorded data/file? If you have access to a stream interface (say an OS/stack service) which also provides a reliable end of stream signal/primitive you could consider that to replace the file interface.
There is no functions/libraries available in C to do this. But a simple alternative is to rename the file once an activity is over. For example, recorder can open the file with name - file.record and once done with recording, it can rename the file.record to file.transfer and the transfer program should look for file.transfer to transfer and once the transfer is done, it can rename the file to file.read and the reader can read that and finally rename it to file.done!
you can check if file is open or not as following
FILE_NAME="filename"
FILE_OPEN=`lsof | grep $FILE_NAME`
// if [ -z $FILE_NAME ] ;then
// "File NOT open"
// else
// "File Open"
refer http://linux.about.com/library/cmd/blcmdl8_lsof.htm
I think an advisory lock will help. Since if one using the file which another program is working on it, the one will get blocked or get an error. but if you access it in force,the action is Okey, but the result is unpredictable, In order to maintain the consistency, all of the processes who want to access the file should obey the advisory lock rule. I think that will work.
When the file is closed then the lock is freed too.Other processes can try to hold the file.

Resources