Special file in UNIX file system when mounting?

I am reading The UNIX Time-Sharing System by D. M. Ritchie and K. Thompson, where they briefly introduce the UNIX OS. In the file system section, when they talk about "mount", they give the following two paragraphs, and I have a few questions about the bold and italic content in them.
Paragraph 1: When an I/O request is made to a file whose i-node
indicates that it is special, the last 12 device address words are
immaterial, and the first specifies an internal device name, which is
interpreted as a pair of numbers representing, respectively, a
device type and subdevice number. The device type indicates which system routine will deal with I/O on that device; the
subdevice number selects, for example, a disk drive attached to a
particular controller or one of several similar terminal interfaces.
Paragraph 2: In this environment, the implementation of the mount
system call (Section 3.4) is quite straightforward. mount maintains a
system table whose argument is the i-number and device name of the ordinary file specified during the mount, and whose
corresponding value is the device name of the indicated special file. This table is searched for each i-number/device pair that turns
up while a path name is being scanned during an open or create; if a
match is found, the i-number is replaced by the i-number of the
root directory and the device name is replaced by the table value.
From the first paragraph, I understand that the device name is something stored in a special file's i-node. Why, then, does the second paragraph say that the ordinary file also has one?
What is the system table that mount maintains? Is paragraph 2 saying that the system table is part of the internal file system, and that mounting builds a table whose entries are special files pointing to files on the mounted external device?

Let's start with 2 observations:
mount /dev/sda2 /mnt/sda2 takes a special file, /dev/sda2, and a non-special (ordinary) file, /mnt/sda2.
File name lookup has to know how to cross into other mounted filesystems.
Let's assume that /dev/sda1 is device 100, and mounted on /. Let's assume that /dev/sda2 is device 200, and mounted on /mnt/sda2. What happens when you look up /mnt/sda2/x? That file is stored as /x on /dev/sda2. Here's what happens:
Assume that the inode number of the root inode is 1 on every filesystem.
The OS looks up mnt in inode 1 of device 100, and finds e.g. inode 5.
The OS checks its global system table of mounts to see if (device 100, inode 5) maps to something - it doesn't.
The OS looks up sda2 in inode 5 of device 100, and finds e.g. inode 17.
The OS checks its global system table of mounts to see if (device 100, inode 17) maps to something.
Because /dev/sda2 is mounted onto /mnt/sda2, the table returns /dev/sda2, which is device 200 - we're crossing into another mounted filesystem.
The OS looks up x in inode 1 of device 200, and finds e.g. inode 11.
Lookup returns (device 200, inode 11).
So to answer your questions:
The ordinary file is the mount point. "Ordinary" means "not a special file". Note that it is possible to mount a regular file, not just a directory. Docker uses this capability.
The system table's entries are mount points which point to mounted devices.
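To make the table concrete, here is a minimal sketch in C of the mount-crossing check described above. The struct, the mount_table array, and the numbers are hypothetical illustrations, not the actual data structures of the kernel described in the paper.

#include <stddef.h>
#include <stdio.h>

/* Hypothetical mount table: maps (device, i-number) of the mount-point
   directory to the device of the mounted filesystem. */
struct mount_entry {
    int covered_dev;    /* device holding the mount-point directory */
    int covered_inum;   /* i-number of the mount-point directory    */
    int mounted_dev;    /* device of the mounted special file       */
};

#define NMOUNT 16
#define ROOT_INUM 1     /* assume the root i-number is 1 on every filesystem */

static struct mount_entry mount_table[NMOUNT];

/* After one path component resolves to (dev, inum), check whether that pair
   is a mount point; if so, continue on the mounted device at its root. */
static void cross_mount(int *dev, int *inum)
{
    for (size_t i = 0; i < NMOUNT; i++) {
        if (mount_table[i].mounted_dev != 0 &&
            mount_table[i].covered_dev == *dev &&
            mount_table[i].covered_inum == *inum) {
            *dev  = mount_table[i].mounted_dev;
            *inum = ROOT_INUM;
            return;
        }
    }
}

int main(void)
{
    /* /dev/sda2 (device 200) is mounted on /mnt/sda2, which is (dev 100, inode 17). */
    mount_table[0] = (struct mount_entry){ 100, 17, 200 };

    int dev = 100, inum = 17;       /* lookup just resolved ".../sda2" */
    cross_mount(&dev, &inum);
    printf("continue on device %d, inode %d\n", dev, inum);   /* prints 200, 1 */
    return 0;
}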

Related

Why read garbage inode while creating a file in OS?

I am studying operating systems from the book OSTEP. In the file-system implementation chapter, there is a file creation timeline, which displays what operations are done on what type of file-system elements (inodes, data blocks, inode/data bitmaps, etc.) when you create a file. The following image displays the timeline.
(Image: File Creation Timeline)
Suppose we want to create a file bar in /foo/, so the file will be /foo/bar. What data elements will be accessed?
I understood the following steps:
(1) Start from the root: to reach the location where we want to create the file, we first read the inode of root so that we can access its data block and find the inode number of the foo directory.
(2) Read the inode number of foo from root's data block.
(3) Now read the inode of foo to find the location of its data block.
(4) In the data block of foo, we check whether there is already a file named bar. We can only proceed with creation if there is not, because Linux does not allow two entries with the same name in the same directory.
(5) Assuming we can proceed, we read the inode bitmap to find which inode is available for allocation.
(6) Then we allocate that inode by writing to the corresponding location in the inode bitmap.
(7) Then we make an entry in foo for bar. The entry is the pair ('bar', <inode no of bar>), so this is a write to foo's data block.
(8) Now I'm stuck at the read done on the inode of bar. Why do we need to read from this inode?
I think we should write straight away to the inode of bar. We have to write bar's metadata to this inode, right? Then why do we read here?
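To make sure I have steps (1)-(7) right, here is a toy, self-contained C sketch of them. The on-disk arrays, sizes, and names are invented for illustration and do not match OSTEP's vsfs layout; error handling is omitted.

#include <stdio.h>
#include <string.h>

#define NINODES 16
#define NENTRIES 8

struct toy_inode  { int data_block; };                  /* one direct pointer */
struct toy_dirent { char name[28]; int ino; };
struct toy_block  { struct toy_dirent e[NENTRIES]; int n; };

static unsigned char    inode_bitmap[NINODES] = { [0] = 1, [1] = 1, [2] = 1 };
static struct toy_inode inode_table[NINODES]  = { [1] = {0}, [2] = {1} };  /* 1 = root, 2 = foo */
static struct toy_block blocks[4] = {
    { { {".", 1}, {"..", 1}, {"foo", 2} }, 3 },         /* block 0: root's data */
    { { {".", 2}, {"..", 1} },             2 },         /* block 1: foo's data  */
};

static int lookup(struct toy_block *b, const char *name)
{
    for (int i = 0; i < b->n; i++)
        if (strcmp(b->e[i].name, name) == 0)
            return b->e[i].ino;
    return -1;
}

int main(void)
{
    struct toy_inode *root = &inode_table[1];                /* (1) read root inode      */
    int foo_ino = lookup(&blocks[root->data_block], "foo");  /* (2) read root data block */
    struct toy_inode *foo = &inode_table[foo_ino];           /* (3) read foo's inode     */
    struct toy_block *foo_data = &blocks[foo->data_block];   /* (4) read foo's data      */
    if (lookup(foo_data, "bar") != -1)
        return 1;                                            /*     bar must not exist   */

    int bar_ino = -1;                                        /* (5) read inode bitmap    */
    for (int i = 0; i < NINODES; i++)
        if (!inode_bitmap[i]) { bar_ino = i; break; }
    inode_bitmap[bar_ino] = 1;                               /* (6) write inode bitmap   */

    strcpy(foo_data->e[foo_data->n].name, "bar");            /* (7) write ('bar', ino)   */
    foo_data->e[foo_data->n].ino = bar_ino;
    foo_data->n++;

    printf("bar gets inode %d\n", bar_ino);
    return 0;
}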

How to create a symbolic link in EXT2 file system

I am working with the EXT2 File System and spent the last 2 days trying to figure out how to create a symbolic link. From http://www.nongnu.org/ext2-doc/ext2.html#DEF-SYMBOLIC-LINKS: "For all symlinks shorter than 60 bytes, the data is stored within the inode itself; it uses the fields which would normally be used to store the pointers to data blocks. This is a worthwhile optimization as it avoids allocating a full block for the symlink, and most symlinks are less than 60 characters long."
To create a symlink at /link1 to /source, I create a new inode; say it gets index 24. Since the target is shorter than 60 characters, I placed the string "/source" starting at the i_block[0] field (so printing new_inode->i_block[0] in gdb shows "/dir2/source"), set i_links_count to 1, and set i_size and i_blocks to 0. I then created a directory entry in inode 2 (the root inode) with the properties 24, "link1", and file type EXT2_FT_SYMLINK.
A link called "link1" gets created, but it's a directory, and when I click it, it goes to "/". I'm wondering what I'm doing wrong...
A (very) late response, but just because the symlink's data is in the block pointers doesn't mean the file size is 0! You need to set the i_size field in the symlink's inode equal to the length of the target path.
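For illustration, here is a minimal sketch of filling in such a fast (shorter than 60 bytes) symlink inode. It assumes the struct ext2_inode layout documented for ext2 - the header included below ships with e2fsprogs, so adjust the include to whatever ext2 header your project uses. It also sets the symlink type bits in i_mode, since ls determines the file type from the inode's mode, not only from the directory entry's file type.

#include <stdio.h>
#include <string.h>
#include <ext2fs/ext2_fs.h>   /* struct ext2_inode; adjust to your own ext2 header */

#define MY_S_IFLNK 0xA000     /* "symbolic link" file-type bits for i_mode */

static void fill_fast_symlink(struct ext2_inode *ino, const char *target)
{
    size_t len = strlen(target);                 /* must be < 60 for a fast symlink */

    memset(ino, 0, sizeof(*ino));
    ino->i_mode        = MY_S_IFLNK | 0777;      /* the inode itself must say "symlink" */
    ino->i_links_count = 1;
    ino->i_blocks      = 0;                      /* no data blocks are allocated        */
    ino->i_size        = (unsigned) len;         /* the fix from the answer above       */
    memcpy(ino->i_block, target, len);           /* target path lives in i_block[]      */
}

int main(void)
{
    struct ext2_inode ino;
    fill_fast_symlink(&ino, "/source");
    printf("i_size = %u, target = %.60s\n", ino.i_size, (const char *) ino.i_block);
    return 0;
}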

How is getcwd implemented in the kernel (library)?

One process could do
chdir("/to/some/where");
while from another shell
mv /to/some/where /now/different/path/
then the first process does
print getcwd();
#prints /now/different/path/
How is getcwd implemented (at the lowest level, e.g. at the level of the kernel, inodes, ...)?
I know how a common (inode-based) filesystem works, e.g. what a directory contains (the names of the entries and the corresponding inode numbers).
EDIT
Probably the question was too vague - trying to refine it. One possible scenario (from what I know):
1. the kernel knows the inode of the CWD for the given process (and its threads) - e.g. inode number 1000
2. reads the inode (gets the blocks it needs to read)
3. reads the corresponding blocks (e.g. opens the directory)
4. reads the directory entries (the names of the entries and the inode numbers)
5. gets the inode number of the .. parent directory (for example 900) and the inode number of . (the current directory)
6. reads the content of the parent directory, where it gets:
   - the name of the previous directory (the one with inode 1000)
   - the inode number of the parent's own parent directory
7. continues at 5. - until the root inode is reached.
That means that getcwd for
/some/very/very/very/deep/directory/level
takes more raw IO operations (more directory entries need to be read) than for the short
/tmp
where the whole getcwd is done with two readings?
Is this correct? Or is it done in a totally different way?
First, you're asking in the wrong place. This question is more about the operating system, so unix.stackexchange.com is a better place.
Anyway, your proposed solution holds for some ancient UNIX implementations (for example BSD 2.8) or the like. That pathname resolution could be done as you described.
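For a feel of what that looks like, here is a userspace sketch of the ".."-walking scheme from the question, roughly how the classic /bin/pwd worked. It changes the process's own working directory as it climbs, ignores mount-point crossings, and has only minimal error handling - it shows the idea, it is not a drop-in getcwd().

#include <stdio.h>
#include <string.h>
#include <dirent.h>
#include <unistd.h>
#include <sys/stat.h>

int main(void)
{
    char path[4096] = "";
    struct stat cur, up;

    if (stat(".", &cur) != 0)
        return 1;

    for (;;) {
        if (stat("..", &up) != 0)
            return 1;

        /* At the root, "." and ".." are the same (device, inode) pair. */
        if (cur.st_dev == up.st_dev && cur.st_ino == up.st_ino)
            break;

        /* Scan the parent directory for the entry whose inode matches ".". */
        DIR *d = opendir("..");
        if (!d)                       /* e.g. the permission problem shown below */
            return 1;
        char component[256] = "";
        struct dirent *e;
        while ((e = readdir(d)) != NULL) {
            char entry[300];
            struct stat st;
            snprintf(entry, sizeof entry, "../%s", e->d_name);
            if (lstat(entry, &st) == 0 &&
                st.st_dev == cur.st_dev && st.st_ino == cur.st_ino) {
                strcpy(component, e->d_name);
                break;
            }
        }
        closedir(d);

        /* Prepend the name we found, then step up one level and repeat. */
        char tmp[4096];
        snprintf(tmp, sizeof tmp, "/%s%s", component, path);
        strcpy(path, tmp);
        cur = up;
        if (chdir("..") != 0)
            return 1;
    }

    printf("%s\n", path[0] ? path : "/");
    return 0;
}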
However, many problems arise - a few of them:
as you said - the pathname resolution is too complicated (and yes, deeper directories need more IO)
it depends on the premise that only ONE ROOT directory exists. This isn't true since BSD 4.2, which introduced the per-process root directory - which is what the chroot system call relies on - allowing the root to be set to any directory without revealing the real path to the process. (One of the coolest FreeBSD features, jails, depends on this.) (Also, ancient Linuxes had only one root - the VFS, the virtual filesystem layer, was introduced only in 0.96c.)
and permission problems - e.g. what happens when:
#shell1
$ mkdir -p /tmp/some
$ cd /tmp/some
second shell
$ su
# mkdir -p /tmp/my
# chmod 700 /tmp/my
# mv /tmp/some /tmp/my/
The /tmp/my directory isn't readable by the first process, so it can't determine the path - then how should it work with the files? So, in shell1 again:
$ pwd
/tmp/some #the original
$ echo $CWD
/tmp/some
$ /bin/pwd
pwd: .: Permission denied
But you can still do, for example:
$ touch bob #works
I.e. the system allows you to work in the "current" directory without letting you know where you are (in both scenarios, i.e. in chroot and in the second one). ;)
That means that every process stores in its table the current working directory as:
device number (e.g. hdd1 or hdd2)
inode number on the device
and
the kernel maintains other global table(s) (in Linux called dentries - directory entries), where it keeps the "inode" -> "path" mapping for every process and every opened file descriptor, and also inode caches (in Linux maintained by the kernel itself; in BSD this is the job of the vnode driver) and the like.
E.g. when some process asks for the pathname of inode X, the kernel searches the dentry table; if the entry is found, it returns immediately, and if not, it calls the lookup process, which does the pathname resolution.
When, for example, a rename occurs, the kernel searches the dentry table and, if it finds the entry, changes it as needed.
All of the above is extremely simplified and, as you can see, highly OS dependent; the common base is defined by POSIX, but what happens behind it (i.e. the implementation) - for that you really need to read the kernel sources and/or google for:
linux dentry
linux vfs
freebsd vnode
pathname resolution
and such.
PS: for the nitpickers :) - as I said, everything is over-simplified, so if you want to correct it or add more details, edit the answer - I converted it to a "community wiki answer".
In current POSIX kernels like Linux (or the *BSDs) the current working directory (as a kernel inode) is part of the process state. So the in-kernel process descriptor (probably some struct task_struct on Linux) contains or refers to that cwd. Then getcwd is "simply" a syscall querying that.
The kernel inodes (for opened file descriptors, including working directories) are related to filesystems and are not the same as disk inodes.
Of course, the devil is in the details!
Key point: chdir() only affects the current process and any child processes launched after that - it is not a global state.
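A quick way to see that it is per-process state: the child below changes its own working directory and the parent's stays untouched (the /tmp path is just an example).

#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    char buf[4096];

    if (fork() == 0) {                 /* child: change only its own cwd */
        chdir("/tmp");
        printf("child : %s\n", getcwd(buf, sizeof buf));
        return 0;
    }
    wait(NULL);                        /* parent: cwd is unchanged */
    printf("parent: %s\n", getcwd(buf, sizeof buf));
    return 0;
}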

What is the difference between inode number and file descriptor?

I understand that a file descriptor is a kernel handle used to identify a file, while the inode number of a file is a pointer to a structure which holds other details about the file (correct me if I am wrong). But I am unable to get the difference between them.
An inode is an artifact of a particular file-system and how it manages indirection. A "traditional *ix" file-system uses this to link together files into directories, and even multiple parts of a file together. That is, an inode represents a physical manifestation of the file-system implementation.
On the other hand, a file descriptor is an opaque identifier to an open file by the Kernel. As long as the file remains open that identifier can be used to perform operations such as reading and writing. The usage of "file" here is not to be confused with a general "file on a disk" - rather a file in this context represents a stream and operations which can be performed upon it, regardless of the source.
A file descriptor is not related to an inode, except insofar as one may be used internally by a particular [file-system] driver.
The difference is not substantial; both are related to the abstract term "file". An inode is a filesystem structure that represents a file, whereas a file descriptor is an integer returned by the open syscall. By definition:
Files are represented by inodes. The inode of a file is a structure kept by the filesystem which holds information about a file, like its type, owner, permissions, inode links count and so on.
On the other hand, a file descriptor:
File Descriptors:
The value returned by an open call is termed a file descriptor and is essentially an index into an array of open files kept by the kernel.
The kernel doesn't represent open files by their names; instead, it uses an array of entries for open files for every process, so a file descriptor in effect is an index into an array of open files. For example, let's assume you're doing the following operation in a process:
read(0, buf, 10)
0 denotes the file descriptor number, and 10 says to read 10 bytes (into buf). In this case, the process requests 10 bytes from the file/stream at index 0, which is stdin. The kernel automatically grants each process three open streams:
Descriptor No.
0 ---> stdin
1 ---> stdout
2 ---> stderr
These descriptors are given to you for free by the kernel.
Now, when you open a file in the process via the open("/home/myname/file.txt") syscall, you'll get index 3 for the newly opened file; open another file and you get index 4, and so forth. These are the descriptors of the opened files in the process:
Descriptor No.
0 ---> stdin
1 ---> stdout
2 ---> stderr
3 ---> /home/user100/out.txt
4 ---> /home/user100/file.txt
See OPEN(2); it explains what goes on underneath the surface when you call open.
The fundamental difference is that an inode represents a file while a file descriptor (fd) represents a ticket to access the file, with limited permissions and a time window. You can think of an inode as a kind of complex ID of the file. Each file object has a unique inode. On the other hand, a file descriptor is an "opened" file by a particular user. The user program is not aware of the file's inode. It uses the fd to access the file. Depending on the user's permissions and the mode the user program chooses to open the file with (read-only, for example), a fd is allowed a certain set of operations on the file. Once the fd is "closed" the user program can't access the file unless it opens another fd. At any given time, there can be multiple fds accessing a file, in the same or different user programs.
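A small userspace demonstration of that distinction (the path is just an example; any existing file works): two separate open() calls on the same path return different descriptor numbers, yet fstat() reports the same inode number for both.

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>

int main(void)
{
    int fd1 = open("/etc/hostname", O_RDONLY);   /* likely returns 3 */
    int fd2 = open("/etc/hostname", O_RDONLY);   /* likely returns 4 */
    struct stat st1, st2;

    fstat(fd1, &st1);
    fstat(fd2, &st2);
    printf("fd1=%d fd2=%d inode1=%llu inode2=%llu\n",
           fd1, fd2,
           (unsigned long long) st1.st_ino,
           (unsigned long long) st2.st_ino);

    close(fd1);
    close(fd2);
    return 0;
}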

Print all files on a filesystem using system call

I am working in the kernel and I am trying to make a system call that takes a partition as input (e.g. /dev/sda1) and then prints every file on the filesystem using printk().
I enter a partition (e.g. /dev/sda1) and I put a printk() inside this system call to print.
First, I tried to do this through a process, because, if I am right, each process is represented by a task_struct, and I tried to access the files through its files_struct. But the problem is that this only gives me the file descriptors of the opened files, not all the files.
So, what I want to do is that I pass the name of the partition and I printk() the names of all the files.
For example:
I enter the path /dev/sda1 as an argument and let's suppose I have the files a.txt and b.txt inside this partition, so the system call should print a.txt and b.txt.
The signature will be like this:
asmlinkage long sys_acall(char *partition_name);
There are a few things that need to be discussed.
The partition_name parameter of your syscall should have the __user tag.
If you want to, strictly speaking, read files from a partition, you will have to implement filesystem recognition (is that partition ext3, reiserfs, ntfs, ...?) and then implement the driver for that kind of filesystem. As Christ pointed out, partitions don't contain files, filesystems do. Another option is to use the drivers already implemented for the filesystem on that partition. This option is just horrible.
If you want to read files from a filesystem, your work gets easier: you can use the VFS interface to access it, but you will need that filesystem to be mounted (you can do it on-the-fly, though).
My final opinion: I would change "implement a system call that prints every file in a partition" to "implement a system call that prints every file in a directory". The signature for that system call would be:
asmlinkage long sys_crazyness(__user const char *dir);
We don't care whether the directory passed is the root of a filesystem or just a folder at any depth within a filesystem.
If you can change your problem to this one it would be much easier ;)
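As a sketch of that easier problem, here is a userspace analogue that walks a directory tree and prints every entry. In the kernel you would go through the VFS instead (e.g. filp_open() and the kernel's directory-iteration helpers) and print with printk(), but the overall logic is the same.

#define _DEFAULT_SOURCE    /* for d_type / DT_DIR */
#include <stdio.h>
#include <string.h>
#include <dirent.h>

static void walk(const char *dir)
{
    DIR *d = opendir(dir);
    if (!d)
        return;

    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
            continue;

        char path[4096];
        snprintf(path, sizeof path, "%s/%s", dir, e->d_name);
        printf("%s\n", path);            /* printk(KERN_INFO ...) in the kernel */

        if (e->d_type == DT_DIR)         /* recurse into subdirectories */
            walk(path);
    }
    closedir(d);
}

int main(int argc, char **argv)
{
    walk(argc > 1 ? argv[1] : ".");
    return 0;
}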
