how are directories implemented in UNIX filesystem? - file

This question is an extension of https://unix.stackexchange.com/questions/18605/how-are-directories-implemented-in-unix-filesystems
I'm aiming to implement a basic filesystem. After reading the inode number and name from a directory entry we know the file's name, so we can list the directory contents, but we can't determine the type of each entry: whether it's a file or a directory.
If a directory has 1000 entries, reading 1000 inodes just to determine whether each one is a file or a directory seems wasteful.
Am I missing something, or is this simply how it works?

Related

Get the name of file from its inode number [duplicate]

In a C program, I have the inode number of a file, and I need to get the file name for this inode number.
I have a hint that I should get the directory entry for this inode number, which will, of course, contain the file name. But I am unable to figure out how to get the directory entry for a file from its inode number.
I need to do this in a C program on Ubuntu. Any solutions?
A single file, identified by its inode number, may have any number of links (a.k.a. "directory entries" a.k.a. "names") associated with it. (It is a bit more complicated than that, because files that are open have additional links that may or may not be directory entries, but that's not important for our purposes). One can add and remove links that refer to the same file (inode) freely (unless that file is actually a directory). The file doesn't keep track of links associated with it. It only keeps track of their number (the reference count). As soon as the number of links goes down to zero, the file gets deleted.
Given just an inode number, there's absolutely positively no way whatsoever to find any or all of its associated directory entries, short of checking all directory entries of a filesystem.
N.B. Links above are hard links. Soft links are something else entirely.
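The exhaustive check described above can be sketched with POSIX nftw(3). This is only an illustration: find_names and visit are hypothetical names, the search is limited to one mounted filesystem (FTW_MOUNT), and the file-scope variables make it non-reentrant.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <ftw.h>
#include <stdio.h>
#include <sys/stat.h>

/* nftw() callbacks take no user argument, so the target lives at file scope. */
static dev_t wanted_dev;
static ino_t wanted_ino;
static int   matches;

static int visit(const char *path, const struct stat *st, int type, struct FTW *ftw)
{
    (void)ftw;
    if (type == FTW_NS)          /* stat failed: *st is not meaningful */
        return 0;
    if (st->st_dev == wanted_dev && st->st_ino == wanted_ino) {
        printf("%s\n", path);    /* one of possibly several names */
        matches++;
    }
    return 0;                    /* keep walking */
}

/* Prints every name under `root` that refers to (dev, ino).
 * Returns the number of names found, or -1 if the walk failed. */
int find_names(const char *root, dev_t dev, ino_t ino)
{
    assert(root != NULL);
    wanted_dev = dev;
    wanted_ino = ino;
    matches = 0;
    /* FTW_PHYS: do not follow symlinks; FTW_MOUNT: stay on one filesystem. */
    if (nftw(root, visit, 32, FTW_PHYS | FTW_MOUNT) != 0)
        return -1;
    return matches;
}
```

Calling find_names on the mount point of the filesystem with the st_dev and st_ino obtained from fstat lists every hard link that is reachable and readable. The cost is a full tree walk, which is exactly the expense the answer warns about.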

Get path from file descriptor when path is longer than PATH_MAX

I receive filesystem events from fanotify. Sometimes I want to get an absolute path to a file that's being accessed.
Usually, it's not a problem - fanotify_event_metadata contains a file descriptor fd, so I can call readlink on /proc/self/fd/<fd> and get my path.
However, if a path exceeds PATH_MAX readlink can no longer be used - it fails with ENAMETOOLONG. I'm wondering if there's a way to get a file path in this case.
Obviously, I can fstat the descriptor I get from a fanotify and traverse the entire filesystem looking for files with identical device ID and inode number. But this approach is not feasible for me performance-wise (even if I optimize it to ignore paths shorter than PATH_MAX).
I've tried getting a parent directory by reopening fd with O_PATH and calling openat(fd, "..", ...). Obviously, that failed because fd doesn't refer to a directory. I've also tried examining contents of a buffer after a failed readlink call (hoping it contains partial path). That didn't work either.
So far I've managed to get long paths for files inside the working directory of a process that opened them (fanotify events contain a pid of a target process, so I can read /proc/<pid>/cwd and get the path to the root from there). But that is a partial solution.
Is there a way to get an absolute path from a file descriptor without traversing the whole filesystem? Preferably the one that will work with kernel 2.6.32/glibc 2.11.
Update: For the curious. I've figured out why calling readlink("/proc/self/fd/<fd>", ... with a buffer large enough to store the entire path doesn't work.
Look at the implementation of do_proc_readlink. Notice that it doesn't use the provided buffer directly. Instead, it allocates a single page and uses it as a temporary buffer when it calls d_path. In other words, no matter how large the buffer is, d_path will always be limited to the size of a page, which is 4096 bytes on amd64. Same as PATH_MAX! The -ENAMETOOLONG itself is returned by prepend when it runs out of space in that page.
readlink can be used with a link target that's longer than PATH_MAX. There are two restrictions: the name of the link itself must be shorter than PATH_MAX (check, "/proc/self/fd/<fd>" is about 20 characters) and the provided output buffer must be large enough. You might want to call lstat first to figure out how big the output buffer should be, or just call readlink repeatedly with growing buffers.
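A minimal sketch of the "growing buffers" approach from this answer follows; readlink_alloc is a hypothetical helper name. Note that, per the question's update, /proc/self/fd/<fd> on the asker's kernel is still capped at one page inside the kernel, so this helps only for ordinary symlinks.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: returns a malloc'd, NUL-terminated copy of the link
 * target, or NULL on error with errno set. The caller frees the result. */
char *readlink_alloc(const char *link_path)
{
    assert(link_path != NULL);
    size_t cap = 256;

    for (;;) {
        char *buf = malloc(cap);
        if (buf == NULL)
            return NULL;

        ssize_t n = readlink(link_path, buf, cap);
        if (n < 0) {
            int saved = errno;   /* free() may change errno on older libcs */
            free(buf);
            errno = saved;
            return NULL;
        }
        if ((size_t)n < cap) {   /* room left for '\0', so nothing was cut off */
            buf[n] = '\0';
            return buf;
        }
        free(buf);               /* possibly truncated: retry with more space */
        cap *= 2;
    }
}
```

readlink does not report truncation, so the sketch treats a result that fills the whole buffer as "maybe truncated" and retries with a larger one.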
The PATH_MAX limit stems from the fact that Unix (or Linux, from now on) needs to bound the size of the parameters passed to the kernel. There's no limit on how deep a file hierarchy can grow, and it is always possible to access every file, no matter how deep it sits in the filesystem hierarchy. What is actually limited is the length of the string you can pass to or receive from the kernel to represent a file name. This means you cannot create a symlink whose target is longer than this length (because you have to pass the target path), but you can easily have paths far longer than this limit.
When you pass a filename to the kernel, you do so to name a file (or device, or socket, or fifo, or whatever), to open it, and so on. Your filename first goes to a routine that converts the path into an inode (which is what the kernel actually manages). That routine begins scanning from one of two possible points in the filesystem hierarchy: the inode reference of the root inode, or the inode reference of the current working directory of the process. Which inode is used as the starting point depends on the presence of a leading / character at the beginning of the path. From this point, up to PATH_MAX characters will be processed each time, but that can lead us deep enough that we cannot get back to the root in one step only...
Suppose you use the path to change your current directory, and do a chdir A/B/C/D/E/.../Z. Once there, you create new directories and do the same thing, chdir AA/AB/AC/AD/AE/.../AZ, then chdir BA/BB/BC/BD/... and so on. There's nothing in the system that forbids you from getting that deep in the filesystem (you can try that yourself; I have done it and tested it before). You can grow a tree whose paths are far longer than PATH_MAX. But this only means that you cannot get there directly from the filesystem root. You can get there in steps, as far as the system allows you, depending on where you set your root directory (by means of the chroot(2) syscall) or your current directory (by means of the chdir(2) syscall).
You have probably noticed (or not) that Unix traditionally offers no system call to get your current working directory path from the root... There are several reasons for this:
The root inode and the current working inode are two local-to-process concepts. Two processes in the same system can have different working directories, and also different root directories, to the point that they share nothing in common and there is no way to reach one's directory from the other's.
An inode's path can be ambiguous. This is not true for a directory, as two hard links are not allowed to point to the same directory inode (this was possible in older Unices, where directories had to be created with the mknod(2) system call; if you have access to some HP-UX v6 or old Unix SysV R4 you can create directories with a ... entry, pointing to the grandparent of a directory or similar things, just by being root and knowing how to use the mknod(2) syscall). The idea is that when two links point to the same inode and one (or both) of them leads to the root, which one is the right path from the root inode to the current dir?
The current inode and the root can be separated by a path too long to fit in the PATH_MAX limit.
There can be several different filesystems (and filesystem types) involved in getting to the root. So this is not something that can be obtained from the data stored on the disks alone; you must also know the mount table.
For these reasons, there's traditionally no direct support in the kernel for finding the path from the root to a file. The only way to get the path (and this is what the pwd(1) command does) is to follow the .. entry to the parent directory, search there for a link with the inode number of the current dir, and repeat this until the parent inode is the same as the last inode visited. Only then will you be in the root directory (your root directory, which in general differs from other processes' root directories).
Just try this exercise:
i=0
while [ "$i" -lt 10000 ]
do
mkdir dir-$i
cd dir-$i
i=$(expr "$i" + 1)
done
and see how far you can go from the root directory in your hierarchy.
NOTE 1
Another reason it is impossible to get the path to a file from an open descriptor is that you have access only to the inode. The path you used to open(2) it may bear no relationship to the actual path from the root: you may have used symlinks or a path relative to the working directory, or changed the root dir between the open call and the time you want the path, and the path may no longer exist at all, since you may have unlink(2)ed it. The inode holds no reference to any path leading to it, as there can be multiple (even millions of) paths to a file. The inode holds only a ref count: the number of paths that actually end at that inode.
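The unlink(2) case can be shown in a few lines. This is a small sketch with a hypothetical helper name, nlink_after_unlink. On typical local filesystems the link count read through the still-open descriptor is 0, meaning the inode is alive but no path leads to it; network filesystems may behave differently.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: creates `path`, removes its only name while the file
 * is still open, and returns the link count seen through the descriptor
 * (or -1 on error). */
long nlink_after_unlink(const char *path)
{
    assert(path != NULL);
    int fd = open(path, O_CREAT | O_EXCL | O_RDWR, 0600);
    if (fd < 0)
        return -1;

    struct stat st;
    long n = -1;
    /* After unlink() the descriptor still works, but no name refers to it. */
    if (unlink(path) == 0 && fstat(fd, &st) == 0)
        n = (long)st.st_nlink;

    close(fd);   /* the last reference goes away and the inode is freed */
    return n;
}
```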

c - How to find the inode of a directory in an ext2 virtual image (.img) given an absolute path?

In C, how would I find a directory in a virtual disk? I can easily recurse the absolute path and turn that into just the name of the directory I am looking for (i.e. turning /x/y/z into just z). I know that the root is inode 2, and I know how to get to some parts of the file system (superblock, block descriptor, inode table, bg_block/inode bitmap) but I have no clue how to traverse all the data in the image.
This image only has one block group, for what it's worth. Inode size and block size are set to their own predefined variables in the header (EXT2_BLOCK_SIZE and s_inode_size in superblock).
You have to implement the namei algorithm for the ext[234] filesystems to get to the correct place. Just follow the kernel source code for the implementation of the ext[234] filesystems and look for the namei routine.
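The core step of such a lookup is scanning one directory data block for a name. The sketch below covers only that step, assuming the standard ext2 on-disk entry layout (32-bit inode, 16-bit rec_len, 8-bit name_len, 8-bit file_type, then the name, all little-endian). The function name ext2_lookup_in_block is hypothetical, and reading the inode table and following i_block pointers is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read little-endian integers without relying on host byte order. */
static uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static uint16_t le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | p[1] << 8);
}

/* Scan one directory data block for `name`.
 * Returns the entry's inode number, or 0 if the name is not in this block. */
uint32_t ext2_lookup_in_block(const unsigned char *block, size_t block_size,
                              const char *name)
{
    assert(block != NULL && name != NULL);
    size_t want = strlen(name);
    size_t off = 0;

    while (off + 8 <= block_size) {            /* 8 = fixed header size */
        uint32_t ino      = le32(block + off);
        uint16_t rec_len  = le16(block + off + 4);
        uint8_t  name_len = block[off + 6];

        if (rec_len < 8 || off + rec_len > block_size)
            break;                             /* corrupt entry: stop */
        if (ino != 0 && name_len == want &&    /* inode 0 marks an unused slot */
            (size_t)8 + name_len <= rec_len &&
            memcmp(block + off + 8, name, want) == 0)
            return ino;
        off += rec_len;                        /* rec_len links to the next entry */
    }
    return 0;
}
```

To resolve /x/y/z, one would start at inode 2, run this over each data block of the current directory for the next path component, and continue from the inode it returns until the components run out.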

Assign unique numbers to File during runtime

I want to assign unique file numbers to files during run time.
Creating hash for the file name is not an option for me as I do not want any collisions.
One good option is to create running numbers for all files. But I do not have access to the source files, so I cannot walk the directory where I am running my binary.
So I need some option that can extract the file names from the binary (say, using the symbol table, similar to GDB). I am not sure how to do that. Any help is appreciated.
You could try to use the inode number (st_ino) from the file itself -- you get that from using fstat (http://linux.die.net/man/2/fstat).
The inode number is how the file system keeps track of files, and it is unique within a given file system. Hence, as long as the files are not located on different file systems (different mount points), the inode number is unique.
This also holds when there are multiple links to the same file (they all share one inode number), if that worries you as well.

Open every file but not links to other directories when using scandir()

I want to recursively copy one directory into another (like cp -R) using POSIX scandir().
The problem is that when I copy a directory like /sys/bus/, which contains links to higher levels (for example: foo/foo1/foo2/foo/foo1/foo2/foo/... ) the system enters a loop status and copies the directories "in the middle" forever...
How can I check if the file I'm opening with dirent is a link or not?
Look at this: How to check whether two file names point to the same physical file
You need to store a list of inodes that you have visited to make sure that you don't get any duplicates. If you have two hard links to the same file, there is no "one" canonical name. One possibility is to first store all the files and then recurse through all the filenames. You can store the path structure separately from the inodes and file contents.
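Both pieces can be sketched briefly: lstat with S_ISLNK answers "is this entry a link?", and a set of (device, inode) pairs records what has already been visited. The names is_symlink, struct seen_set and seen_add are hypothetical, and the linear search is an assumption chosen for brevity; a hash table would suit large trees better.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>

/* Returns 1 if `path` is a symbolic link, 0 if not, -1 on error.
 * lstat() examines the link itself instead of following it. */
int is_symlink(const char *path)
{
    assert(path != NULL);
    struct stat st;
    if (lstat(path, &st) != 0)
        return -1;
    return S_ISLNK(st.st_mode) ? 1 : 0;
}

struct seen { dev_t dev; ino_t ino; };
struct seen_set { struct seen *items; size_t len, cap; };

/* Records (dev, ino). Returns 1 if it was new, 0 if it had been seen
 * before, and -1 if memory ran out. */
int seen_add(struct seen_set *s, dev_t dev, ino_t ino)
{
    assert(s != NULL);
    for (size_t i = 0; i < s->len; i++)
        if (s->items[i].dev == dev && s->items[i].ino == ino)
            return 0;

    if (s->len == s->cap) {                  /* grow the array geometrically */
        size_t cap = s->cap ? s->cap * 2 : 16;
        struct seen *p = realloc(s->items, cap * sizeof *p);
        if (p == NULL)
            return -1;
        s->items = p;
        s->cap = cap;
    }
    s->items[s->len].dev = dev;
    s->items[s->len].ino = ino;
    s->len++;
    return 1;
}
```

In a copy loop one would call is_symlink on each entry returned by scandir() and either recreate the link or skip it, and descend into a directory only when seen_add returns 1 for its st_dev and st_ino.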
