File system block size - filesystems

What is the significance of the file system block size? If my filesystem block size is set at, say 8K, does that mean that all read/write I/O will happen at size 8K? So if my application wants to read say 16 bytes at offset 4097 then a 4K block starting from offset 4096 will be read?
How do writes work in this case? Suppose I want to write say 64 bytes.

You are right. The block size is the unit of work for the file system. Every read and write is done in full multiples of the block size.
The block size is also the smallest size on disk a file can have. If you have a 16 byte Block size,then a file with 16 bytes size occupies a full block on disk.
The book "Practical file system design" states:
Block: The smallest unit writable by a disk or file system. Everything a
file system does is composed of operations done on blocks. A file system
block is always the same size as or larger (in integer multiples) than the
disk block size.

Normally when you have to deal with files in programming you should use Stream abstraction.
I/O operations through code are often reads and writes to streams; reading and writing from and to streams, can be buffered so that chunks of file can be read or written.
Block size on fs refers to mapping disk surface; minor the size of the single block major the number of blocks (and so the elements in the table that keeps information on allocation of files).
So OS's so can map file on disk discretely based on block size and have a smaller "map of files".
As I know this doesn't affect stream abstraction in API's of programming language.

Related

How to get size of disk driver sector

Can get size of disk sector via the Linux API/ABI? It's about the quantum of I/O disk, normally it's equal 512 bytes, but others values can be too (usually multiple 512 bytes).
Also it should not confuse to size of logical block or to size of sector of a file system.
A block device is reflected as file in a file system of an UNIX (/dev/sda, /dev/sr etc.) It means, can open that file and make some manipulations to its content like with content of the corresponded block device.
So specifically the work to a true block device similar the work to a virtual hard disk (the .vhd format for instance).
But i don't know how to get size of sector in general case.
At moment i've single solution: get the maximal CHS address and size of hard drive, both action via BIOS. But i think, it's bad idea, because portability lost

How do you properly read/write a block device in C?

I know you can use regular I/O functions for block devices (/dev/sda etc), but can you just read some data of size n, or does it have to be divisible by 512? Is there any overheard to reading in smaller sizes? Some devices have blocks bigger than 512 bytes, if there is overhead for smaller sizes, how can I know the optimal block size?
According to Wikipedia and for Unix and Unix-like systems (hence Linux):
Block special files or block devices provide buffered access to hardware devices, and provide some abstraction from their specifics. Unlike character devices, block devices will always allow the programmer to read or write a block of any size (including single characters/bytes) and any alignment. The downside is that because block devices are buffered, the programmer does not know how long it will take before written data is passed from the kernel's buffers to the actual device, or indeed in what order two separate writes will arrive at the physical device...
That means that you can use use any size for a read. Because of the in driver buffering, the physical reads will always read physical sectors.

Giving read() a start position

When you give read a start position - does it slow down read()? Does it have to read everything before the position to find the text it's looking for?
In other words, we have two different read commands,
read(fd,1000,2000)
read(fd,50000,51000)
where we give it two arguments:
read(file descriptor, start, end)
is there a way to implement read so that the two commands take the same amount of computing time?
You don't name a specific file system implementation or one specific language library so I will comment in general.
In general, a file interface will be built directly on top of the OS level file interface. In the OS level interface for most types of drives, data can be read in sectors with random access. The drive can seek to the start of a particular sector (without reading data) and can then read that sector without reading any of the data before it in the file. Because data is typically read in chunks by sector, if the data you request doesn't perfectly align on a sector boundary, it's possible the OS will read the entire sector containing the first byte you requested, but it won't be a lot and won't make a meaningful difference in performance as once the read/write head is positioned correctly, a sector is typically read in one DMA transfer.
Disk access times to read a given set of bytes for a spinning hard drive are not entirely predictable so it's not possible to design a function that will take exactly the same time no matter which bytes you're reading. This is because there's OS level caching, disk controller level caching and a difference in seek time for the read/write head depending upon what the read/write head was doing beforehand. If there are any other processes or services running on your system (which there always are) some of them may also be using the disk and contending for disk access too. In addition, depending upon how your files were written and how many bytes you're reading and how well your files are optimized, all the bytes you read may or may not be in one long readable sequence. It's possible the drive head may have to read some bytes, then seek to a new position on the disk and then read some more. All of that is not entirely predictable.
Oh, and some of this is different if it's a different type of drive (like an SSD) since there's no drive head to seek.
When you give read a start position - does it slow down read()?
No. The OS reads the directory entry to find out where the file is located on the disk, then calculates where on the disk your desired read should be, seeks to that position on the disk and starts reading.
Does it have to read everything before the position to find the text it's looking for?
No. Since it reads sectors at a time, it may read a few bytes before what you requested (whatever is before it in the sector), but sectors are not huge (often 8K) and are typically read in one fell swoop using DMA so that extra part of the sector before your desired data is not likely noticeable.
Is there a way to implement read so that the two commands take the same amount of computing time?
So no, not really. Disk reads, even of identical number of bytes vary a bit depending upon the situation and what else might be happening on the computer and what else might be cached already by the OS or the drive itself.
If you share what problem you're really trying to solve, we could probably suggest alternate approaches rather than relying on a given disk read taking an exact amount of time.
Well, filesystems usually split the data in a file in even-sized blocks. In most file systems the allocated blocks are organized in trees with high branching factor so it is effectively the same time to find the the nth data block than the first data block of the file, computing-wise.
The only general exception to this rule is the brain-damaged floppy disk file system FAT from Microsoft that should have become extinct in 1980s, because in it the blocks of the file are organized in a singly-linked list so to find the nth block you need to scan through n items in the list. Of course decent operating systems then have all sorts of tricks to address the shortcomings here.
Then the next thing is that your reads should touch the same number of blocks or operating system memory pages. Usually operating system pages are 4K nowadays and disk blocks something like 4k too so having every count being a multiple of 4096, 8192 or 16384 is better design than to have decimal even numbers.
i.e.
read(fd, 4096, 8192)
read(fd, 50 * 4096, 51 * 4096)
While it does not affect the computing time in a multiprocessing system, the type of media affects a lot: in magnetic disks the heads need to move around to find the new read position, and the disk must have spun to be in the reading position whereas SSDs have identical random access timings regardless of where on disk the data is positioned. And additionally the operating system might cache frequently accessed locations or expect that the block that is read after N would be N + 1 and hence such order be faster. But most of the time you wouldn't care.
Finally: perhaps instead of read you should consider using memory mapped I/O for random accesses!
Read typically reads data from the given file descriptor into a buffer. The amount of data it reads is from start (arg2) - end (arg3). More generically put the amount of data read can be found with (end-start). So if you have the following reads
read(fd1, 0xffff, 0xffffffff)
and
read(fd2, 0xf, 0xff)
the second read will be quicker because the end (0xff) - the start (0xf) is less than the first reads end (0xffffffff) - start (0xffff). AKA less bytes are being read.

Block Size and inode size in a filesystem

I am going through the book , "Practical Filesystem design " by "Dominic Giampaolo" .
The two important concepts are
Block : The smallest readable or writable unit of memory for a filesystem .
Inode : Inode , is , an area, which stores the data about a file , stores the data about where the blocks composing a file are stored .
The author states about the simplicity introduced by storing a few block addresses directly in i-node . Then he mentions about tradeoff that is faced between "the size of the i-node" and how much data the i-node map .
As such he mentions that the size of the i-node works best when it is an even divisor of the block size .
How to reason out the above statement ? Any calculations to support this ?
Since all read/write operations operate at the block-level, then having your inodes block-aligned and occupying entire blocks ensures that your reads/writes are not wasteful.
If a block is 4096 bytes, but an inode is just 4000 bytes, then either:
1. our inodes are block-aligned: we're not very efficient since we always waste 96 bytes of every block.
2. our inodes are not block-aligned: we're not very efficient since when we want to read an inode, we often need to read two blocks - and none of them will be 100% occupied by inode data.
We remain efficient when:
1. The size of an inode equals to the size of a block (1:1 ratio)
2. The size of an inode is an exact multiple of the size of a block (1:n ratio)
3. The size of a block is an exact multiple of the size of an inode (n:1 ratio)

Reading file using fread in C

I lack formal knowledge in Operating systems and C. My questions are as follows.
When I try to read first single byte of a file using fread in C, does the entire disk block containing that byte is brought into memory or just the byte?
If entire block is brought into memory, what happens on reading
second byte since the block containing that byte is already in
memory?.
Is there significance in reading the file in size of disk blocks?
Where is the read file block kept in memory?
Here's my answers
More than 1 block, default caching is 64k. setvbuffer can change that.
On the second read, there's no I/O. The data is read from the disk cache.
No, a file is ussuly smaller than it's disk space. You'll get an error reading past the file size even if you're within the actual disk space size.
It's part of the FILE structure. This is implementation (compiler) specific so don't touch it.
The above caching is used by the C runtime library not the OS. The OS may or may not have disk caching and is a separate mechanism.

Resources