I am required to create my own implementation of a filesystem in C. I am planning on creating a system similar to that of the FAT system. We are given one file of size 10MB, which acts as our own "disk." I understand that the FAT table stores cluster numbers, and the Root Directory stores other pertinent information about each file we create (e.g. file name, size, date and time of last modification, start block in FAT, etc.). But I am confused about how the cluster numbers are translated to physical addresses in the data region on the disk.
For example, let's say an entry in the Root Directory says that a file starts in block 100 in the FAT table, and in block 100 of the FAT table is the integer 327, which is where the next cluster of the file is located. How are these addresses translated to physical addresses in the data region of the disk? Where are these physical addresses translated and stored?
Cluster size varies from volume to volume (and the allowed range differs between FAT12, FAT16, and FAT32), but in general the cluster number points to one of a run of consecutively numbered clusters, each of the size recorded in the volume's boot sector. As I recall (from long ago), FAT12, at least on hard disks, used 2 KiB clusters (made up of four 512-byte sectors each), with 12-bit cluster numbers (at most 4,096 values, of which somewhat fewer are usable). Because the first data cluster is numbered 2, cluster 327 would start (327 - 2) * 2048 = 665,600 bytes from the start of the data area of the disk.
The FAT itself (plus its backup copy) comes before the data area, as does the fixed root directory on FAT12 and FAT16; subdirectories, and the FAT32 root directory, are stored in the data area like ordinary files. Each entry in the FAT holds only the number of the next cluster in the file that occupies that cluster, or a special value marking the end of the chain, a free cluster, or a bad cluster. Everything else is in the directory entry: the file name, first cluster, size, date, etc.
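To make the chain idea concrete for the question's example (the directory entry says the file starts at cluster 100, and FAT entry 100 holds 327), here is a minimal sketch that follows a chain in a FAT16 table. It assumes the whole FAT has already been read from the disk image into a uint16_t array and that values 0xFFF8 to 0xFFFF mark the end of a chain; the function name fat16_follow_chain is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Walk the FAT16 cluster chain that begins at `start`, storing each cluster
 * number in `out` (capacity `max`). Returns the number of clusters stored,
 * or -1 if the chain is corrupt (free/bad/out-of-range entry, or a loop
 * that makes it longer than `max`). */
int fat16_follow_chain(const uint16_t *fat, size_t fat_entries,
                       uint16_t start, uint16_t *out, size_t max)
{
    size_t n = 0;
    uint16_t cluster = start;

    if (start == 0)
        return 0;                       /* empty file: first cluster is 0 */

    while (cluster < 0xFFF8) {          /* 0xFFF8..0xFFFF = end of chain */
        if (cluster < 2 || cluster >= fat_entries ||
            cluster == 0xFFF7 || n == max)
            return -1;                  /* free, bad, out of range, or loop */
        out[n++] = cluster;
        cluster = fat[cluster];         /* entry N holds the cluster after N */
    }
    return (int)n;
}
```

Each returned cluster number can then be turned into a disk position with the cluster-to-sector arithmetic for the data region. FAT12 (12-bit packed entries) and FAT32 (28 significant bits per entry) need different entry decoding.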
A disk is divided into sectors. A hard disk, for example, traditionally has a sector size of 512 bytes (some newer drives use 4096-byte sectors). Addressing data on the disk usually uses these sectors, and data is read and written in blocks of this size. The FAT filesystem groups a number of sectors into clusters; for example, you could have 8 sectors per cluster. This value is stored, along with other information about the filesystem, in the BIOS Parameter Block (BPB) in the boot sector of the partition. The FAT driver uses it to compute the sector number from the cluster number:
SectorNumber = SectorsPerCluster * (ClusterNumber - 2) + FirstDataSector
FirstDataSector is the sector number of the first sector of the data region of the partition, and the "- 2" is there because the first data cluster is numbered 2 (FAT entries 0 and 1 are reserved). You can find the exact details in the FAT Specification.
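A minimal C sketch of that calculation follows. It assumes the BPB fields (named after the FAT specification's BPB_RsvdSecCnt, BPB_NumFATs, BPB_FATSz16/BPB_FATSz32, BPB_RootEntCnt, BPB_BytsPerSec, and BPB_SecPerClus) have already been parsed out of the boot sector; the function names are hypothetical.

```c
#include <stdint.h>

/* Sectors occupied by the fixed FAT12/FAT16 root directory (0 on FAT32,
 * where the root entry count is 0). Each directory entry is 32 bytes. */
static uint32_t root_dir_sectors(uint16_t root_ent_cnt, uint16_t bytes_per_sec)
{
    return ((uint32_t)root_ent_cnt * 32 + (bytes_per_sec - 1)) / bytes_per_sec;
}

/* First sector of the data region: it follows the reserved sectors,
 * every copy of the FAT, and the fixed root directory. */
uint32_t first_data_sector(uint16_t rsvd_sec_cnt, uint8_t num_fats,
                           uint32_t fat_sz, uint16_t root_ent_cnt,
                           uint16_t bytes_per_sec)
{
    return rsvd_sec_cnt + (uint32_t)num_fats * fat_sz
         + root_dir_sectors(root_ent_cnt, bytes_per_sec);
}

/* SectorNumber = SectorsPerCluster * (ClusterNumber - 2) + FirstDataSector.
 * Only clusters >= 2 are valid data clusters. */
uint32_t first_sector_of_cluster(uint32_t cluster, uint8_t sec_per_clus,
                                 uint32_t first_data_sec)
{
    return (cluster - 2) * sec_per_clus + first_data_sec;
}
```

Multiplying the resulting sector number by the bytes per sector gives the byte offset to seek to in a disk image such as the 10 MB file from the first question. For example, with 512-byte sectors, 4 sectors per cluster, 1 reserved sector, two 9-sector FATs, and 512 root entries, the data region starts at sector 51 and cluster 327 starts at sector 1351.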
How can I get the size of a disk sector via the Linux API/ABI? I mean the quantum of disk I/O: normally it is 512 bytes, but other values are possible too (usually multiples of 512 bytes).
This should not be confused with the size of a logical block or the sector size of a file system.
On UNIX, a block device is represented as a file in the file system (/dev/sda, /dev/sr, etc.). This means you can open that file and manipulate its contents just as you would the contents of the corresponding block device.
So working with a real block device is similar to working with a virtual hard disk image (the .vhd format, for instance).
But I don't know how to get the sector size in the general case.
At the moment I have only one solution: get the maximum CHS address and the size of the hard drive, both via the BIOS. But I think that's a bad idea, because it sacrifices portability.
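On Linux there is no need to go through the BIOS: the block layer exposes the sector sizes through the BLKSSZGET (logical sector size) and BLKPBSZGET (physical sector size) ioctls declared in <linux/fs.h>. The sketch below assumes you can open the device node (usually this requires root); get_sector_sizes is a hypothetical helper name. The same values are available from the shell with blockdev --getss and blockdev --getpbsz.

```c
#include <fcntl.h>      /* open */
#include <sys/ioctl.h>  /* ioctl */
#include <unistd.h>     /* close */
#include <linux/fs.h>   /* BLKSSZGET, BLKPBSZGET */

/* Query the logical and physical sector sizes of a Linux block device.
 * Returns 0 on success, -1 if the device cannot be opened or either
 * ioctl fails (for example, because the path is not a block device). */
int get_sector_sizes(const char *dev_path, int *logical, unsigned int *physical)
{
    int fd = open(dev_path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* Logical sector size: the smallest unit the device can address. */
    if (ioctl(fd, BLKSSZGET, logical) < 0) {
        close(fd);
        return -1;
    }
    /* Physical sector size: the unit the drive uses internally. */
    if (ioctl(fd, BLKPBSZGET, physical) < 0) {
        close(fd);
        return -1;
    }
    close(fd);
    return 0;
}
```

For example, calling get_sector_sizes("/dev/sda", &logical, &physical) on a drive that emulates 512-byte sectors on 4 KiB physical sectors would typically report 512 and 4096.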
As I understand it, each file has an inode that holds the locations of its data blocks and some metadata about the file. How are these inodes stored and referenced? I'm not asking about the inode structure itself, but about how each inode is identified by its inode number. Is there a separate inode table structure, kind of like a bitmap? An array where, say, inodes[0] accesses inode number zero would make sense.
Or perhaps I am misunderstanding. In our file system we store inodes in the first blocks of memory: say our first block of memory is inode 1; to access the second inode, you reach the second block of memory by pointing to the start of memory plus the size of one block.
A possible disk layout for a simple file system is as follows. The boot block contains special data used to start the entire system. The superblock contains information describing the file system. After the superblock, there may be a region dedicated to a bitmap that tracks which blocks are unused. After the bitmap comes the inode section.
You could have an inode table that is an array of inode structs that would do the translation of inode number to inode for you. It would be as easy as inode_table[0].
Alternatively (and this is how I learned it in my systems class), if you have functions to read and write sectors on the physical disk, and you know the sector at which the inode section starts, the sector size, and the size of an inode in bytes, then with a bit of division and modulus arithmetic you can fetch any specific inode from the file system without an in-memory inode table.
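The arithmetic described above can be sketched as follows, assuming inodes are numbered from 0, packed back-to-back starting at inode_start_sector, and that the inode size divides the sector size evenly. locate_inode and struct inode_location are hypothetical names.

```c
#include <stdint.h>

/* Where an inode lives on disk: the sector, and the byte offset inside it. */
struct inode_location {
    uint32_t sector;
    uint32_t offset;
};

/* Assumes 0-based inode numbers, inodes packed contiguously from
 * inode_start_sector, and sector_size being a multiple of inode_size. */
struct inode_location locate_inode(uint32_t ino, uint32_t inode_start_sector,
                                   uint32_t sector_size, uint32_t inode_size)
{
    uint32_t per_sector = sector_size / inode_size;   /* inodes per sector */
    struct inode_location loc;

    loc.sector = inode_start_sector + ino / per_sector;
    loc.offset = (ino % per_sector) * inode_size;
    return loc;
}
```

You would then read sector loc.sector with your own read-sector function and copy inode_size bytes starting at loc.offset. If your inode numbers start at 1 (as in ext2), subtract 1 before calling it.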
Is there a way to identify the type of a FAT partition (whether it is FAT16 or FAT32) only by reading its boot sector?
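Not from the file system type string in the boot sector; that label is informational only. Instead, compute the number of data clusters from the BPB fields (total sectors, reserved sectors, number and size of FATs, root directory entries, and sectors per cluster).
The file system subtype is determined by that number:
fewer than 4085: FAT12
4085 to 65524: FAT16
65525 or more: FAT32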
If the sectors per FAT word in the FAT12/FAT16 BPB is zero, it is FAT32. (Regardless of the actual FAT size, FAT32 uses the EBPB's sectors per FAT dword.) Likewise, if the number of root directory entries word is zero, it is FAT32.
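Following the FAT specification's approach, the cluster count can be computed entirely from BPB fields in the boot sector, with the thresholds fewer than 4085 clusters for FAT12, fewer than 65525 for FAT16, and FAT32 otherwise. A minimal sketch, assuming the fields have already been read into the hypothetical struct bpb_fields below (an in-memory struct, not the packed on-disk layout):

```c
#include <stdint.h>

enum fat_type { FAT12, FAT16, FAT32 };

/* Hypothetical container for already-parsed BPB fields. */
struct bpb_fields {
    uint16_t bytes_per_sec;   /* BPB_BytsPerSec */
    uint8_t  sec_per_clus;    /* BPB_SecPerClus */
    uint16_t rsvd_sec_cnt;    /* BPB_RsvdSecCnt */
    uint8_t  num_fats;        /* BPB_NumFATs */
    uint16_t root_ent_cnt;    /* BPB_RootEntCnt (0 on FAT32) */
    uint16_t tot_sec16;       /* BPB_TotSec16 (0 if TotSec32 is used) */
    uint16_t fat_sz16;        /* BPB_FATSz16 (0 on FAT32) */
    uint32_t tot_sec32;       /* BPB_TotSec32 */
    uint32_t fat_sz32;        /* BPB_FATSz32 (FAT32 extended BPB only) */
};

enum fat_type fat_type_from_bpb(const struct bpb_fields *b)
{
    /* Fixed root directory size in sectors, rounded up (0 on FAT32). */
    uint32_t root_dir_sectors =
        ((uint32_t)b->root_ent_cnt * 32 + (b->bytes_per_sec - 1)) / b->bytes_per_sec;
    uint32_t fat_sz  = b->fat_sz16  != 0 ? b->fat_sz16  : b->fat_sz32;
    uint32_t tot_sec = b->tot_sec16 != 0 ? b->tot_sec16 : b->tot_sec32;

    /* Sectors left for file data, then whole clusters in that region. */
    uint32_t data_sec = tot_sec - (b->rsvd_sec_cnt
                                   + (uint32_t)b->num_fats * fat_sz
                                   + root_dir_sectors);
    uint32_t count_of_clusters = data_sec / b->sec_per_clus;

    if (count_of_clusters < 4085)
        return FAT12;
    if (count_of_clusters < 65525)
        return FAT16;
    return FAT32;
}
```

For a standard 1.44 MB floppy (512-byte sectors, 1 sector per cluster, 1 reserved sector, two 9-sector FATs, 224 root entries, 2880 sectors) this yields 2847 clusters, so FAT12.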
ll /srv/node/dcodxx/test.sh
-rw-r--r--. 1 root root 7 Nov 5 11:18 /srv/node/dcodxx/test.sh
The size of the file is shown in bytes. This file is stored in an xfs filesystem with block size 4096 bytes.
xfs_info /srv/node/sdaxx/
meta-data=/dev/sda isize=256 agcount=32, agsize=7630958 blks
= sectsz=4096 attr=2, projid32bit=0
data = bsize=4096 blocks=244190646, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0
log =internal bsize=4096 blocks=119233, version=2
= sectsz=4096 sunit=1 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
Does this mean that a block can hold more than one file? If not, what happens to the remaining bytes (4096 - 7)?
Also, where are the 256 bytes reserved for the inode stored? If they are stored in the same block as the file, shouldn't the file size be larger (256 + 7)?
File data is stored in units of the filesystem block size, and no block sharing is currently possible across multiple files on XFS. So the disk space used is always the number of bytes in the file rounded up to the next multiple of the block size: a 1-byte file will consume 4 KB of disk space on a filesystem with a 4 KB block size.
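The rounding rule above is simple integer arithmetic. A small sketch (allocated_bytes is a hypothetical helper) that ignores the inode itself and any extra space the filesystem may preallocate:

```c
#include <stdint.h>

/* Disk space consumed by file data when blocks are never shared between
 * files: the file size rounded up to a whole number of blocks. */
uint64_t allocated_bytes(uint64_t file_size, uint64_t block_size)
{
    return (file_size + block_size - 1) / block_size * block_size;
}
```

For the 7-byte test.sh on a 4096-byte-block filesystem, allocated_bytes(7, 4096) is 4096, which should match the 4 KB that du typically reports for it.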
The inode itself contains file metadata such as size, timestamps, extent data, etc - and on xfs it can also contain extended attribute information.
The on-disk inode is separate from the file data blocks, and will always consume 256 bytes on a filesystem with 256 byte inodes, regardless of the amount of metadata used. If more than 256 bytes is required to store additional extent information or extended attribute data, additional filesystem-block-sized metadata blocks will be allocated.
Does this mean that a block can hold more than one file? If not, what happens to the remaining bytes (4096 - 7)?
A block cannot contain more than one file. If a file is bigger than one block, multiple blocks are used.
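Some modern filesystems have a feature called "inline data", where a small enough file can be stored inside the inode itself; ext4, for example, can use the 60 bytes of the inode that normally hold block pointers. XFS, however, uses this local format only for small directories and symbolic links, not for regular file data, so a 7-byte regular file still occupies a full data block.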
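Where are the 256 bytes reserved for the inode stored? If they are stored in the same block as the file, shouldn't the file size be larger (256 + 7)?
Inode information is stored separately from the file's data, in the filesystem's inode area (on XFS, in inode chunks allocated within each allocation group), so those 256 bytes do not count toward the file size that ls reports.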
I was wondering about the actual on-disk size of each MFT record. Since the number of clusters per MFT record is set in the boot sector, I guess each record has the same size.
However, each record header stores an additional value: its allocated size (at offset 0x1C). As far as I could observe, this value was always equal to the record size given in the boot sector.
Is it possible that these two are different (and when)?
If not, the Allocated size value in each record is kind of a waste, right?
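For reference, the boot sector field in question is, as commonly documented, a signed byte at offset 0x40: a positive value is a number of clusters, while a negative value n means the record is 2^|n| bytes, so the common value 0xF6 (-10) gives 1024-byte records regardless of cluster size. A minimal decoding sketch (mft_record_size is a hypothetical name):

```c
#include <stdint.h>

/* Decode the NTFS boot sector's "clusters per MFT record" byte.
 * Positive: the record spans that many clusters.
 * Negative n: the record is 2^(-n) bytes (e.g. -10 -> 1024 bytes).
 * Returns 0 for values that cannot be a valid encoding. */
uint32_t mft_record_size(int8_t clusters_per_record, uint32_t cluster_size)
{
    if (clusters_per_record > 0)
        return (uint32_t)clusters_per_record * cluster_size;
    if (clusters_per_record == 0 || clusters_per_record < -31)
        return 0;
    return 1u << (-clusters_per_record);
}
```

Comparing this value with the allocated size at 0x1C in each record header is a cheap consistency check when parsing a volume.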
It's not actually that much of a waste. Try looking at what happens when the attributes stored in a file record exceed 1 KB (by adding additional file names, streams, etc.). It is not clear (to me at least) whether, in different versions of NTFS, the additional attributes are stored in the data area of the volume or in another file record.
In previous versions of NTFS, the size of an MFT file record was equal to the size of a cluster (generally 4 KB), which wasted space, since sometimes all the attributes took less than 1 KB. Since NT 5.0 (I may be wrong), after some research, Microsoft decided that all MFT file records should be 1 KB. So one reason for storing that number may be backward compatibility: imagine you found an old hard drive that still used 4 KB file records and you wanted to add files to it or copy files from it.
Another use for storing that number there is that you don't need to read the boot sector every time you fetch a file record to find out what its size should be. Imagine you were the code that has to handle the transition from 4 KB records to 1 KB records for backward compatibility: without that field, you would have to read the boot sector to find out what record size to expect.
What if you didn't have access to the boot sector, or you were trying to recover files from a drive whose boot sector had been wiped or had bad clusters? What would happen if the volume spans multiple extents and you are reading the MFT from one extent while the boot sector is in another extent that you can't access?
Filesystems are usually designed by more than a few people over a long time. If those values were redundant, I think they would certainly have noticed.