I am studying operating systems from the book OSTEP. In the file-systems implementation chapter, there is a file creation timeline, which shows what operations are performed on which file-system elements (inodes, data blocks, inode/data bitmaps, etc.) when you create a file. The following image displays the timeline.
File Creation Timeline
Suppose we want to create a file bar in /foo/, so the file will be /foo/bar. What data elements will be accessed?
I understood the following steps:
(1) Start reading from the root to walk to the location where we want to create the file: first we read the inode of the root so that we can access its data block and find the inode number of the foo directory.
(2) Read the inode number of foo from the root's data block.
(3) Now read the inode of foo to find the location of its data block.
(4) In the data block of foo, check whether there is already an entry named bar. We can only proceed with creation if there is not, because Linux does not allow two entries with the same name in the same directory.
(5) Assuming we can proceed, we read the inode bitmap to find which inode is available for allocation.
(6) Then we allocate that inode by writing to the corresponding bit in the inode bitmap.
(7) Then we make an entry in foo for the file bar. The entry will be the pair ('bar', <inode no of bar>), so this is a write to foo's data block.
(8) Now I'm stuck at the read that is done on the inode of bar. Why do we need to read this inode?
I think we should write to the inode of bar straight away. We have to write bar's metadata to this inode, right? Then why do we read here?
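To make my mental model concrete, here is how I picture steps (5) and (6), as a toy in-memory sketch in C (not ext2's real allocator, which works per block group, on disk, and numbers inodes from 1):

/* Toy model of steps (5)-(6): scan a free-inode bitmap for a clear bit
 * (the read), then mark that bit used (the write). */
#include <stdio.h>

static int alloc_inode(unsigned char *bitmap, int n_inodes)
{
    for (int i = 0; i < n_inodes; i++) {
        if (!(bitmap[i / 8] & (1 << (i % 8)))) {   /* read: is inode i free? */
            bitmap[i / 8] |= (1 << (i % 8));       /* write: mark it allocated */
            return i;
        }
    }
    return -1;                                     /* no free inode */
}

int main(void)
{
    unsigned char bitmap[2] = { 0x0f, 0x00 };      /* inodes 0-3 already in use */
    printf("allocated inode %d\n", alloc_inode(bitmap, 16));   /* prints 4 */
    return 0;
}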
I am reading The UNIX Time-Sharing System by D. M. Ritchie and K. Thompson, where they briefly introduce the UNIX OS. In the file system section, when they talk about mount, they give the following two paragraphs, and I have a few questions about the bold and italic content in them.
Paragraph 1: When an I/O request is made to a file whose i-node indicates that it is special, the last 12 device address words are immaterial, and the first specifies an internal device name, which is interpreted as a pair of numbers representing, respectively, a device type and subdevice number. The device type indicates which system routine will deal with I/O on that device; the subdevice number selects, for example, a disk drive attached to a particular controller or one of several similar terminal interfaces.
Paragraph 2: In this environment, the implementation of the mount system call (Section 3.4) is quite straightforward. mount maintains a system table whose argument is the i-number and device name of the ordinary file specified during the mount, and whose corresponding value is the device name of the indicated special file. This table is searched for each i-number/device pair that turns up while a path name is being scanned during an open or create; if a match is found, the i-number is replaced by the i-number of the root directory and the device name is replaced by the table value.
From the first paragraph, I understand that the device name is something that exists in a special file's i-node. Why, then, does the second paragraph say the ordinary file also has one?
What is the system table that mount maintains? Is paragraph 2 saying that the system table is part of the internal file system, and that mounting builds a table whose entries are special files pointing to files on the mounted external device?
Let's start with 2 observations:
mount /dev/sda2 /mnt/sda2 takes a special file /dev/sda2, and a non-special file (ordinary file) /mnt/sda2.
File name lookup has to know how to cross into other mounted filesystems.
Let's assume that /dev/sda1 is device 100, and mounted on /. Let's assume that /dev/sda2 is device 200, and mounted on /mnt/sda2. What happens when you look up /mnt/sda2/x? That file is stored as /x on /dev/sda2. Here's what happens:
Assume that the inode number of the root inode is 1 on every filesystem.
The OS looks up mnt in inode 1 of device 100, and finds e.g. inode 5.
The OS checks its global system table of mounts to see if (device 100, inode 5) maps to something - it doesn't.
The OS looks up sda2 in inode 5 of device 100, and finds e.g. inode 17.
The OS checks its global system table of mounts to see if (device 100, inode 17) maps to something.
Because /dev/sda2 is mounted onto /mnt/sda2, the table returns /dev/sda2, which is device 200 - we're crossing into another mounted filesystem.
The OS looks up x in inode 1 of device 200, and finds e.g. inode 11.
Lookup returns (device 200, inode 11).
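A minimal sketch of that mount-table check in C, with made-up structure names and a hard-coded table (the real kernel structures are of course different):

/* If (dev, ino) names a mount point, swap in the mounted filesystem's
 * device and root inode, exactly as in the walkthrough above. */
#include <stdio.h>

#define ROOT_INO 1   /* assume the root inode is 1 on every filesystem */

struct mount_entry {
    int dev, ino;        /* key: (device, inode) of the mount point */
    int mounted_dev;     /* value: device of the mounted filesystem */
};

static struct mount_entry mount_table[] = {
    { 100, 17, 200 },    /* /dev/sda2 (device 200) mounted on (dev 100, inode 17) */
};

static void cross_mount(int *dev, int *ino)
{
    for (size_t i = 0; i < sizeof(mount_table) / sizeof(mount_table[0]); i++) {
        if (mount_table[i].dev == *dev && mount_table[i].ino == *ino) {
            *dev = mount_table[i].mounted_dev;
            *ino = ROOT_INO;
            return;
        }
    }
}

int main(void)
{
    int dev = 100, ino = 17;      /* lookup of "sda2" found inode 17 on device 100 */
    cross_mount(&dev, &ino);
    printf("continue lookup at (device %d, inode %d)\n", dev, ino);   /* 200, 1 */
    return 0;
}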
So to answer your questions:
The ordinary file is the mount point. "Ordinary" means "not a special file". Note that it is possible to mount a regular file, not just a directory. Docker uses this capability.
The system table's entries are mount points which point to mounted devices.
I am working with the EXT2 File System and spent the last 2 days trying to figure out how to create a symbolic link. From http://www.nongnu.org/ext2-doc/ext2.html#DEF-SYMBOLIC-LINKS: "For all symlinks shorter than 60 bytes long, the data is stored within the inode itself; it uses the fields which would normally be used to store the pointers to data blocks. This is a worthwhile optimization as we avoid allocating a full block for the symlink, and most symlinks are less than 60 characters long."
To create a symlink at /link1 pointing to /source, I create a new inode; say it gets index 24. Since the target is < 60 characters, I placed the string "/source" starting at the i_block[0] field (so printing new_inode->i_block[0] in gdb shows "/source"), set i_links_count to 1, and set i_size and i_blocks to 0. I then created a directory entry under inode 2 (the root inode) with the values 24, "link1", and file type EXT2_FT_SYMLINK.
A link called "link1" gets created, but it shows up as a directory, and when I click it, it goes to "/". I'm wondering what I'm doing wrong...
A (very) late response, but just because the symlink's data is in the block pointers doesn't mean the file size is 0! You need to set the i_size field in the symlink's inode to the length of the target path.
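A rough sketch of the fields a fast symlink's inode needs, assuming the struct ext2_inode layout from e2fsprogs' <ext2fs/ext2_fs.h> (or your own equivalent of the on-disk inode); allocating the inode, writing it back to the image, and creating the directory entry are left out:

/* Sketch: fill in an already-allocated inode as a "fast" symlink. */
#include <string.h>
#include <sys/stat.h>            /* S_IFLNK */
#include <ext2fs/ext2_fs.h>      /* struct ext2_inode */

static void fill_fast_symlink(struct ext2_inode *inode, const char *target)
{
    size_t len = strlen(target);              /* must be < 60 bytes */

    memset(inode, 0, sizeof(*inode));         /* set uid/gid/timestamps as needed */
    inode->i_mode = S_IFLNK | 0777;           /* type symlink, conventional 0777 */
    inode->i_links_count = 1;
    inode->i_size = len;                      /* length of the target path, NOT 0 */
    inode->i_blocks = 0;                      /* no data blocks consumed */
    memcpy(inode->i_block, target, len);      /* path lives in the block pointers */
}

Worth double-checking i_mode as well: the inode's mode, not just the directory entry's file_type, determines whether it is treated as a symlink or a directory.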
Let me explain clearly.
The following is my requirement:
Let's say there is a command which has an option '-f' that takes a filename as an argument.
Now I have 5 files, and I want to create a new file merging those 5 files and give the new filename as the argument to the above command.
But there is a difference between
reading a single file and
merging all files & reading the merged file.
There is more IO in the second case (reads of the 5 files + the write of the merged file + whatever IO the command does with the given file) than in the first case (just whatever IO the command does with the given file).
Can we reduce this unwanted IO?
In the end, I really don't want the merged file at all. I only create it to let the command read the merged content.
And to be clear, I don't even need this optimization: the file sizes are not that big, and the extra IO is negligible. I am just curious to know whether it can be done.
So, in order to implement this, I have the following understanding/questions:
Generally, all a command that takes a filename argument does is read that file.
In our case, the filename (filepath) is not ready; it's just a virtual/imaginary filename (the concatenation of all the files).
So, can we create such a virtual filename?
What is a filename? It's an indirect reference, via an inode, to a storage location.
In our case, the individual files have different inode entries, and all those inodes point to different storage locations. Our virtual/imaginary file has no inode at all, and even if we could create an imaginary inode, it could only point to storage in memory (since, on disk, the storage of one file holds no reference to the storage location of another file).
But let's say that, with some advanced programming, we are able to create an imaginary filepath with an imaginary inode that points to storage in memory.
Now, when we give that imaginary filename as the argument and the command tries to open the imaginary file, it finds that its inode refers to storage in memory. But the actual content is on disk, not in memory, so the data is not in memory yet unless we read it explicitly. Hence, we would again need to read the data first.
Simply put, since there is no continuity or reference on disk from one file's data to the next file's data, the merged data needs to be loaded into memory first.
So, by my deduction, it seems we would at least need to put the data in memory. However, the command itself needs the file to be read anyway (if not the whole file, at least part of it, until the command's operation, parsing or whatever, is done). So, using this method, we could still save some significant IO if it's really a big file.
So, how can we create that virtual file?
My first answer is to write the merged file to tmpfs and refer to that file. But is that the only option, or can we actually point to a storage location in memory other than tmpfs? tmpfs is not an option because my script can be run from any server, and the solution needs to work on all of them: if my script creates the merged file in /dev/shm, it may fail on a server that doesn't have /dev/shm. So I would need to load the data into memory directly. But I think a normal user will not have access to raw memory, so it seems this cannot be done without shm.
Please let me know your comments, and kindly correct me if my understanding is wrong anywhere. Even if it is complicated for my level, kindly post your answer; at least I might understand it after a few months.
Create a fifo (named pipe) and provide its name as the argument to your program. The process that combines the five input files writes to this fifo:
mkfifo wtf
cat file1 file2 file3 file4 file5 > wtf # this will block...
[from another terminal] cp wtf omg
Here I used cp as your program, and cat as the program combining the five files. You will see that omg ends up containing the data your program (here: cp) read from the fifo, i.e. the combined content of the five files, and that the first terminal unblocks after the program is done.
Your program (here: cp) is not even aware that its first argument wtf refers to a fifo; it just opens it and reads from it like it would an ordinary file. (This will fail if the program attempts to seek in the file; seek() is not implemented for pipes and fifos.)
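To see the same thing from the program's side: any program that simply open()s and read()s its argument works unchanged on a fifo. A minimal C stand-in for "your program" (nothing in it is fifo-specific):

/* Opens whatever path it is given and streams the contents to stdout.
 * Works the same whether argv[1] is a regular file or the fifo above,
 * as long as it never tries to lseek(). */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <file-or-fifo>\n", argv[0]);
        return 1;
    }

    int fd = open(argv[1], O_RDONLY);   /* on a fifo, blocks until a writer opens it */
    if (fd < 0) {
        perror("open");
        return 1;
    }

    char buf[4096];
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        if (write(STDOUT_FILENO, buf, n) < 0)
            break;

    close(fd);
    return 0;
}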
In C, how would I find a directory in a virtual disk? I can easily recurse over the absolute path and turn it into just the name of the directory I am looking for (i.e. turning /x/y/z into just z). I know that the root is inode 2, and I know how to get to some parts of the file system (superblock, block group descriptor, inode table, bg_block/inode bitmaps), but I have no clue how to traverse all the data in the image.
This image only has one block group, for what it's worth. The inode size and block size are set to their own predefined variables in the header (EXT2_BLOCK_SIZE and s_inode_size in the superblock).
You have to implement the namei algorithm for the ext[234] filesystem to get to the correct place. Just follow the kernel source code for the ext[234] implementation and look for the namei routine.
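In practice the walk boils down to: read an inode, read its directory data block(s), scan the entries for the next path component's name, then repeat from the inode you found, starting at inode 2. Here is a sketch of the per-block scan in C; the struct mirrors the ext2 on-disk entry layout, and reading inodes/blocks out of your image (plus the tiny demo main) is the part you replace with your own code:

/* Scan one ext2 directory data block for a name; return its inode number,
 * or 0 if not found. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct ext2_dirent {
    uint32_t inode;        /* inode number, 0 = unused slot */
    uint16_t rec_len;      /* offset to the next entry */
    uint8_t  name_len;
    uint8_t  file_type;
    char     name[];       /* name_len bytes, not NUL-terminated */
};

static uint32_t find_in_dir_block(const unsigned char *block,
                                  unsigned block_size, const char *name)
{
    unsigned off = 0, len = strlen(name);

    while (off + 8 <= block_size) {
        const struct ext2_dirent *de = (const void *)(block + off);
        if (de->rec_len == 0)
            break;                                /* corrupt block, stop */
        if (de->inode != 0 && de->name_len == len &&
            memcmp(de->name, name, len) == 0)
            return de->inode;
        off += de->rec_len;                       /* jump to the next entry */
    }
    return 0;
}

int main(void)
{
    /* Build a fake one-entry directory block just to exercise the scan. */
    static union { unsigned char bytes[32]; uint32_t align; } blk;
    struct ext2_dirent *de = (struct ext2_dirent *)blk.bytes;
    de->inode = 12;
    de->rec_len = 32;
    de->name_len = 5;
    de->file_type = 2;                            /* EXT2_FT_DIR */
    memcpy(de->name, "mydir", 5);

    printf("inode of \"mydir\": %u\n",
           (unsigned)find_in_dir_block(blk.bytes, sizeof blk.bytes, "mydir"));
    return 0;
}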
I understand that a file descriptor is a kernel handle used to identify an open file, while the inode number of a file points to a structure holding other details about the file (correct me if I am wrong). But I am unable to grasp the difference between them.
An inode is an artifact of a particular file-system and how it manages indirection. A "traditional *ix" file-system uses it to link files into directories, and even to link multiple parts of a file together. That is, an inode represents a physical manifestation of the file-system implementation.
On the other hand, a file descriptor is an opaque identifier the kernel hands out for an open file. As long as the file remains open, that identifier can be used to perform operations such as reading and writing. The use of "file" here is not to be confused with a general "file on a disk"; rather, a file in this context represents a stream and the operations that can be performed on it, regardless of the source.
A file descriptor is not related to an inode, except insofar as one may be used internally by a particular [file-system] driver.
The difference is not substantial; both relate to the abstract term "file". An inode is a filesystem structure that represents a file, whereas a file descriptor is an integer returned by the open syscall. By definition:
Files are represented by inodes. The inode of a file is a structure kept by the filesystem which holds information about a file, like its type, owner, permissions, inode links count and so on.
On the other hand, a file descriptor:
File Descriptors:
The value returned by an open call is termed a file descriptor and is essentially an index into an array of open files kept by the kernel.
The kernel doesn't represent open files by their names; instead, it keeps an array of open-file entries for every process, so a file descriptor is in effect an index into that array. For example, let's assume you're doing the following operation in a process:
read(0, buf, 10)
Here 0 is the file descriptor number, buf is a buffer to receive the data, and 10 is the number of bytes to read. In this case, the process requests 10 bytes from the file/stream at index 0, which is stdin. The kernel automatically grants each process three open streams:
Descriptor No.
0 ---> stdin
1 ---> stdout
2 ---> stderr
These descriptors are given to you for free by the kernel.
Now, when you open a file in the process, e.g. via the open("/home/user100/out.txt", O_RDONLY) syscall, you'll get index 3 for the newly opened file; open another file and you get index 4, and so forth. These are the descriptors of the files opened in the process:
Descriptor No.
0 ---> stdin
1 ---> stdout
2 ---> stderr
3 ---> /home/user100/out.txt
4 ---> /home/user100/file.txt
See open(2); it explains what goes on underneath the surface when you call open.
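For instance, a quick demonstration that the next opens land on the lowest free indices after 0, 1 and 2 (the paths are just placeholders for any readable files):

/* Typically prints "fd1 = 3, fd2 = 4": the kernel hands out the lowest
 * unused slot in the process's open-file array. */
#include <fcntl.h>
#include <stdio.h>

int main(void)
{
    int fd1 = open("/etc/hostname", O_RDONLY);
    int fd2 = open("/etc/hosts", O_RDONLY);
    printf("fd1 = %d, fd2 = %d\n", fd1, fd2);
    return 0;
}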
The fundamental difference is that an inode represents a file while a file descriptor (fd) represents a ticket to access the file, with limited permissions and a limited time window. You can think of an inode as a kind of complex ID of the file; each file object has a unique inode. A file descriptor, on the other hand, is an "opened" file held by a particular user. The user program is not aware of the file's inode; it uses the fd to access the file. Depending on the user's permissions and the mode the program chooses when opening the file (read-only, for example), a fd is allowed a certain set of operations on the file. Once the fd is closed, the user program can't access the file unless it opens another fd. At any given time, there can be multiple fds accessing a file, in the same or different user programs.
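To illustrate that last point: two descriptors opened on the same path get different fd numbers, yet fstat() reports the same inode for both (/etc/hosts here is just a convenient existing file):

/* Same file, two open descriptors: the fds differ, the inode does not. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>

int main(void)
{
    int fd1 = open("/etc/hosts", O_RDONLY);
    int fd2 = open("/etc/hosts", O_RDONLY);

    struct stat st1, st2;
    fstat(fd1, &st1);
    fstat(fd2, &st2);

    printf("fd1=%d fd2=%d  inode=%lu and %lu\n",
           fd1, fd2, (unsigned long)st1.st_ino, (unsigned long)st2.st_ino);
    return 0;
}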