XFS file size, inode size and block size - filesystems

ll /srv/node/dcodxx/test.sh
-rw-r--r--. 1 root root 7 Nov 5 11:18 /srv/node/dcodxx/test.sh
The size of the file is shown in bytes. This file is stored in an XFS filesystem with a block size of 4096 bytes.
xfs_info /srv/node/sdaxx/
meta-data=/dev/sda               isize=256    agcount=32, agsize=7630958 blks
         =                       sectsz=4096  attr=2, projid32bit=0
data     =                       bsize=4096   blocks=244190646, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=119233, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
Does this mean that a block can hold more than one file? If not, what happens to the remaining bytes (4096-7)?
Also, where are the 256 bytes reserved for an inode stored? If they are stored in the same block as the file, shouldn't the file size be larger (256+7)?

File data is stored in units of the filesystem block size, and a block currently cannot be shared between multiple files on XFS. So the disk space used is always the number of bytes in the file rounded up to the next multiple of the block size: a 1-byte file will consume 4 KiB of disk space on a filesystem with a 4 KiB block size.
The inode itself contains file metadata such as size, timestamps, extent data, etc., and on XFS it can also contain extended attribute information.
The on-disk inode is separate from the file data blocks, and will always consume 256 bytes on a filesystem with 256 byte inodes, regardless of the amount of metadata used. If more than 256 bytes is required to store additional extent information or extended attribute data, additional filesystem-block-sized metadata blocks will be allocated.
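The rounding rule can be checked with a little arithmetic. The following is a minimal Python sketch, assuming the 4096-byte block size from the xfs_info output above; the helper name allocated_bytes is made up for illustration, and the calculation ignores the inode and any extra metadata blocks.

```python
def allocated_bytes(file_size: int, block_size: int = 4096) -> int:
    """Data space consumed by a file: its size rounded up to whole blocks."""
    blocks = -(-file_size // block_size)  # ceiling division; 0 bytes -> 0 blocks
    return blocks * block_size


for size in (1, 7, 4096, 4097):
    used = allocated_bytes(size)
    print(f"{size:>5} bytes -> {used} bytes on disk, {used - size} bytes of slack")
```

On Linux, the actual allocation of an existing file can be compared against this by looking at os.stat(path).st_blocks, which counts 512-byte units.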

Does this mean that a block can hold more than one file? If not, what happens to the remaining bytes (4096-7)?
A block cannot contain more than one file. If a file is bigger than one block, multiple blocks are used.
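Some modern filesystems have a feature called "inline data" (ext4's inline_data option, for example), where files that are small enough (no more than about 60 bytes) can be stored in the inode itself, in the space otherwise used to store pointers to the blocks. XFS does not do this for regular file data; it only keeps small directories, symbolic links and extended attributes inside the inode.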
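Where are the 256 bytes reserved for an inode stored? If they are stored in the same block as the file, shouldn't the file size be larger (256+7)?
Inodes are stored separately from the file's data blocks (in an inode table on ext-style filesystems, in dynamically allocated inode chunks on XFS), so the 256 bytes of the inode are not part of the file size.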

Related

Block Size and inode size in a filesystem

I am going through the book "Practical File System Design" by Dominic Giampaolo.
The two important concepts are:
Block: The smallest readable or writable unit of memory for a filesystem.
Inode: An area that stores the data about a file, including where the blocks composing the file are stored.
The author discusses the simplicity gained by storing a few block addresses directly in the i-node. He then mentions the tradeoff between the size of the i-node and how much data the i-node can map.
He goes on to say that the size of the i-node works best when it is an even divisor of the block size.
How can this statement be reasoned out? Are there any calculations to support it?
Since all read/write operations operate at the block-level, then having your inodes block-aligned and occupying entire blocks ensures that your reads/writes are not wasteful.
If a block is 4096 bytes, but an inode is just 4000 bytes, then either:
1. our inodes are block-aligned: we're not very efficient since we always waste 96 bytes of every block.
2. our inodes are not block-aligned: we're not very efficient since when we want to read an inode, we often need to read two blocks - and none of them will be 100% occupied by inode data.
We remain efficient when:
1. The size of an inode equals the size of a block (1:1 ratio)
2. The size of an inode is an exact multiple of the size of a block (1:n ratio)
3. The size of a block is an exact multiple of the size of an inode (n:1 ratio)
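The two inefficient cases and the efficient ones can be put into numbers. This is a rough Python sketch, not taken from the book: the function names are made up for illustration, and it assumes inodes are laid out back to back starting at offset 0.

```python
def wasted_per_block(block_size: int, inode_size: int) -> int:
    """Bytes left unused in each block when only whole inodes are placed in it
    (block-aligned layout, inode_size <= block_size)."""
    return block_size % inode_size


def straddling_inodes(block_size: int, inode_size: int, count: int) -> int:
    """How many of `count` tightly packed inodes cross a block boundary
    (unaligned layout), i.e. need two block reads instead of one."""
    crossing = 0
    for i in range(count):
        first_block = (i * inode_size) // block_size
        last_block = (i * inode_size + inode_size - 1) // block_size
        if first_block != last_block:
            crossing += 1
    return crossing


for inode_size in (4000, 256):
    print(
        f"inode={inode_size}: "
        f"aligned waste per block = {wasted_per_block(4096, inode_size)} bytes, "
        f"straddling inodes out of 10 = {straddling_inodes(4096, inode_size, 10)}"
    )
```

With a 4000-byte inode either the waste or the straddle count is non-zero; with a 256-byte inode, which divides 4096 evenly, both are zero.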

What is a sparse file and why do we need it?

The only thing I have been able to gather is that it is a very large file (in gigabytes) and that it is efficient. How is it efficient?
Say you have a file with many empty bytes (\x00). Runs of such empty bytes that are not backed by disk blocks are called holes. Storing empty bytes is just not efficient: we know there are many of them in the file, so why store them on the storage device? We could instead store metadata describing those zeros. When a process reads the file, those zero-byte blocks are generated dynamically instead of being read from physical storage.
This is why a sparse file is efficient, because it does not store the zeros on disk, instead it holds enough data describing the zeros that will be generated.
Note: the logical file size is greater than the physical file size for sparse files. This is because we have not stored the zeros physically on a storage device.
Edit:
When you run:
$ dd if=/dev/zero of=output bs=1G count=4
The command here copies four 1 GiB blocks of null bytes (4 GiB in total) to output. To see that:
$ stat output
File: output
Size: 4294967296 Blocks: 8388616 IO Block: 4096 regular file
--omitted--
You can see that this file has 8388616 blocks allocated to it (stat counts blocks in 512-byte units). These blocks store nothing but empty bytes copied from /dev/zero, and they do occupy physical disk space: they are zeros physically stored on disk, not holes yet. dd did what you asked for, copying blocks of data from one file to another.
Now, run this command to detect the zero-filled blocks and make the file sparse in-place:
$ fallocate -d output
$ stat output
File: output
Size: 4294967296 Blocks: 0 IO Block: 4096 regular file
--omitted--
Do you notice something? The number of blocks is now 0, because the blocks that were storing only empty bytes were deallocated. Remember, output's blocks held nothing but zeros; fallocate -d detected the blocks that contain only zeros and deallocated them, and since every block of this file contained zeros, they were all deallocated.
Also notice how the size remained the same. This is the logical (virtual) size of the file, not its size on disk. It's crucial to know that output doesn't occupy physical storage space now: it has 0 blocks allocated to it and thus it doesn't really use disk space. The size is preserved after running fallocate -d, so when you later read from the file, the filesystem generates the empty bytes for you at runtime.
To summarize: the size reported by stat is the logical size, while the physical size of output is zero. When a process reads the whole file, the filesystem has to generate 4 GiB of empty bytes on the fly.
To generate a sparse file using dd:
$ dd if=/dev/zero of=output2 bs=1G seek=4 count=0
$ stat output2
File: output2
Size: 4294967296 Blocks: 0 IO Block: 4096 regular file
GNU dd internally uses lseek and ftruncate, so check truncate(2) and lseek(2).
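The same effect can be reproduced directly from a program. The following is a small Python sketch, assuming a Linux or other Unix system whose filesystem supports sparse files; the file name and the 64 MiB size are arbitrary choices for illustration. It extends an empty file with truncate and then compares the logical size with the space actually allocated.

```python
import os
import tempfile

SIZE = 64 * 1024 * 1024  # 64 MiB logical size (arbitrary, for illustration)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sparse.bin")  # hypothetical file name
    with open(path, "wb") as f:
        f.truncate(SIZE)  # extends the file without writing any data

    st = os.stat(path)
    logical = st.st_size            # size as seen by readers
    physical = st.st_blocks * 512   # st_blocks counts 512-byte units on Linux

print(f"logical size : {logical} bytes")
print(f"allocated    : {physical} bytes")
```

On a filesystem with sparse file support the allocated figure should be zero or close to it, while the logical size is the full 64 MiB.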
A sparse file is a file that is mostly empty, i.e. it contains large blocks of bytes whose value is 0 (zero).
On the disk, the content of a file is stored in blocks of fixed size (usually 4 KiB or more). When all the bytes contained in such a block are 0, a file system that implements sparse files does not store the block on disk, instead it keeps the information somewhere in the file meta-data.
Advantages of using sparse files:
empty blocks of data do not occupy disk space; they are not stored as the regular blocks of data, their identifiers (that use only several bytes) are stored instead in the file meta-data; this way 4 KiB of disk space (or more) are saved for each empty block;
reading an empty block of data from a sparse file takes almost no time; this happens because no data is read from disk; since the file system knows all the bytes in the block are 0, it just sets to 0 all the bytes in the input buffer and the data is ready; there is no need to access the slow storage device;
creating an empty block in a sparse file takes almost no time; when a program skips over a region (for example by seeking past it before writing, or by extending the file with ftruncate), the file system only records the hole in the file meta-data; no data is written to the disk. Note that explicitly writing zeros usually does allocate blocks, as file systems generally do not scan written data for zeros.
More information about sparse files can be found on the Wikipedia page.
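A hole is typically created when a program seeks past a region and writes further on. The sketch below, in Python with an arbitrary 1 MiB gap and a made-up file name, shows that the skipped region still reads back as zeros even though the program never wrote it. Whether the gap is really left unallocated depends on the filesystem.

```python
import os
import tempfile

HOLE = 1024 * 1024   # skip 1 MiB before writing (arbitrary)
PAYLOAD = b"data"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "holey.bin")  # hypothetical file name
    with open(path, "wb") as f:
        f.seek(HOLE)       # move past the region without writing it
        f.write(PAYLOAD)   # only these bytes are actually written

    with open(path, "rb") as f:
        head = f.read(HOLE)  # bytes from the skipped region
        tail = f.read()      # the payload that follows it

    size = os.path.getsize(path)

print("file size:", size)
print("skipped region is all zeros:", head == bytes(HOLE))
print("payload:", tail)
```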

Does a zero-length file take up a block on disk?

I understand if a file has 1 byte, it will still take up an entire block on disk (e.g. 4KB). Is the same true for a zero-length file? I am specifically wondering about NTFS but insight on other file systems welcome!
No. In the case of NTFS, a file of 1 byte does not use any block either. In general, if a file has fewer than approximately 300 bytes (assuming a 512-byte file record in the MFT; the exact value depends on the file name length, the size of the MFT file record, etc.), its data are stored in the MFT (Master File Table) itself. Only if the data do not fit in the file record in the MFT are they moved out to blocks (usually 4 KB).

How does memory translation work in the FAT filesystem?

I am required to create my own implementation of a filesystem in C. I am planning on creating a system similar to that of the FAT system. We are given one file of size 10MB, which acts as our own "disk." I understand that the FAT table stores cluster numbers, and the Root Directory stores other pertinent information about each file we create (e.g. file name, size, date and time of last modification, start block in FAT, etc.). But I am confused about how the cluster numbers are translated to physical addresses in the data region on the disk.
For example, let's say an entry in the Root Directory says that a file starts in block 100 in the FAT table, and in block 100 of the FAT table is the integer 327, which is where the next cluster of the file is located. How are these addresses translated to physical addresses in the data region of the disk? Where are these physical addresses translated and stored?
Clusters vary in size between different versions of FAT (FAT12, FAT16, and FAT32), but in general the cluster number points to a consecutively numbered cluster of whatever size is present in the format for the existing file system. As I recall (from long ago) FAT12, at least on hard disks, used 2 kibibyte clusters (made up of four 512-byte sectors each), with 12-bit cluster numbers (at most 2^12 = 4096 values, a few of which are reserved). Since the first two FAT entries are reserved and the first data cluster is number 2, cluster 327 would be (327 - 2) * 2048 bytes from the start of the data area of the disk.
The data area holds file contents and subdirectories; the FAT, the backup FAT and (on FAT12 and FAT16) the root directory are stored in front of it. Each entry in the FAT contains either the number of the next cluster of the file that occupies that cluster or a special marker (free, bad, or end of chain), while the directory entry contains the file name, first cluster, size/date/etc.
A disk is divided into sectors. A hard disk for example has a sector size of 512 bytes. Addressing data on the disk usually uses these sectors and data is read/written in blocks of this size. The FAT filesystem groups a number of sectors into clusters. For example you could have 8 sectors per cluster. This constant is stored along with other information about the filesystem in the first few sectors of the partition. The FAT driver uses this value to compute the sector number from the cluster number. The formula is something like this:
SectorNumber = SectorsPerCluster * (ClusterNumber - 2) + FirstDataSector
FirstDataSector is the sector number of the first sector of the data region of the partition, and 2 is subtracted because the first data cluster is cluster number 2. You can find the exact formula in the FAT Specification.
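For a home-made FAT-like filesystem stored in a single file, the translation can be sketched as follows. This is an illustrative Python sketch, not the specification's code: the sector size, the first data sector and the end-of-chain marker are assumed values, the function names are made up, and clusters are numbered from 2 as in real FAT volumes. The toy FAT reuses the 100 -> 327 example from the question.

```python
SECTOR_SIZE = 512          # bytes per sector (assumed)
SECTORS_PER_CLUSTER = 8    # as in the example above
FIRST_DATA_SECTOR = 100    # hypothetical; a real driver derives it from the boot sector
END_OF_CHAIN = 0xFFFF      # hypothetical end marker for this toy FAT


def cluster_to_offset(cluster: int) -> int:
    """Byte offset of a cluster inside the disk image (first data cluster is 2)."""
    sector = (cluster - 2) * SECTORS_PER_CLUSTER + FIRST_DATA_SECTOR
    return sector * SECTOR_SIZE


def cluster_chain(fat: dict, first_cluster: int) -> list:
    """Follow FAT entries from the first cluster until the end-of-chain marker."""
    chain = []
    cluster = first_cluster
    while cluster != END_OF_CHAIN:
        chain.append(cluster)
        cluster = fat[cluster]  # each entry holds the number of the next cluster
    return chain


fat = {100: 327, 327: END_OF_CHAIN}  # toy FAT: the file occupies clusters 100 -> 327
for c in cluster_chain(fat, 100):
    print(f"cluster {c} starts at byte offset {cluster_to_offset(c)}")
```

Nothing is stored for the translation itself: the offset is recomputed from the cluster number every time, using constants kept in the filesystem header.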

File system block size

What is the significance of the file system block size? If my filesystem block size is set at, say 8K, does that mean that all read/write I/O will happen at size 8K? So if my application wants to read say 16 bytes at offset 4097 then a 4K block starting from offset 4096 will be read?
How do writes work in this case? Suppose I want to write say 64 bytes.
You are right. The block size is the unit of work for the file system. Every read and write is done in full multiples of the block size.
The block size is also the smallest size on disk a file can have. If you have a 16-byte block size, then even a file of 1 byte occupies a full 16-byte block on disk.
The book "Practical file system design" states:
Block: The smallest unit writable by a disk or file system. Everything a
file system does is composed of operations done on blocks. A file system
block is always the same size as or larger (in integer multiples) than the
disk block size.
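Which blocks a given request touches is simple integer arithmetic. A minimal Python sketch, with a made-up helper name, using the 16-byte read at offset 4097 from the question:

```python
def blocks_touched(offset: int, length: int, block_size: int) -> range:
    """Block numbers that a request of `length` bytes at `offset` overlaps."""
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return range(first, last + 1)


# 16 bytes at offset 4097 with 8 KiB blocks: still inside block 0 (bytes 0..8191)
print(list(blocks_touched(4097, 16, 8192)))
# the same request with 4 KiB blocks lands in block 1 (bytes 4096..8191)
print(list(blocks_touched(4097, 16, 4096)))
# a 64-byte write near a block boundary spans two blocks
print(list(blocks_touched(8190, 64, 8192)))
```

A write that covers only part of a block generally means the file system has to read the block, modify the affected bytes in memory and write the whole block back; in practice the operating system's cache sits in between and hides this from the application.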
Normally, when you have to deal with files in programming, you should use the stream abstraction.
I/O operations in code are usually reads and writes on streams; reading from and writing to streams can be buffered, so that files are read or written in chunks.
The block size of a filesystem determines how the disk surface is mapped: the smaller the individual block, the larger the number of blocks (and so the more entries in the table that keeps track of file allocation).
So the OS can map files onto the disk in discrete units based on the block size, and a larger block size gives it a smaller "map of files".
As far as I know, this doesn't affect the stream abstraction in the APIs of programming languages.
