How much do modern filesystems reserve for each block group?

How much do modern filesystems reserve for each block group? - filesystems

In reading about the Unix FFS, I've read that 10% of the disk space is reserved so that files' data blocks can be ensured to be in the same cylinder group. Is this still true with filesystems like ext2/ext3, is there space reserved so that files' data blocks can all be in the same block group? Is it also 10%? or does it vary? Also, is the same true for journaling filesystems as well? Thank you.

first of all i think that ext filesystems implement the same notion of a cylinder group,
they just call it block group.
to find out about it , you can fdisk the partition to find your actual block count
and blocks/group number .Then the number of block groups = block count / (block/group).
They are used in exactly the same way as FFS cgs (to speed up access times).
Now journaling IMO has nothing to do with this operation, except that it actually wastes
some more space on your disk :). As far as i understand , soft updates which is the BSD solution to the problem that a journal would solve in typical ext filesystems, don't require extra space , but are tremendously complex to implement and add new features on (like resizing).
interesting read:
ext3 overhead disclosed part 1
cheers!

My data for fresh ext2 images are:
Size Block size Bl/Gr Total bytes Free bytes Ratio
1MB 1024 8192 1048576 1009664 0.03710
10MB 1024 8192 10485760 10054656 0.04111
100MB 1024 8192 104857600 99942400 0.04688
512M 4096 32768 536870912 528019456 0.01649
1G 4096 32768 1073741824 1055543296 0.01695
10G 4096 32768 10737418240 10545336320 0.01789
So, it's quite predictable that the space efficiency of an Ext2 filesystem depends on block size due to layout described in the above answer: filesystem is a set of block groups, for each group its size is determined as count of blocks which can be described by a 1-block bitmap => for a 4096 byte block there are 8 * 4096 blocks.
Conclusion: for ext2/ext3 family of filesystems average default consumption of space depends on block size:
~ 1.6 - 1.8 % for 4096 byte blocks, ~ 4 % for 1024 ones

Related

AOSP: How to change size of file systems?

When i do df -h, I can see that /dev/block/dm-2 is mounted on vendor, /dev/block/dm-0 on /(system, i guess?) etc. as shown below.
Filesystem Size Used Avail Use% Mounted on
tmpfs 978M 816K 978M 1% /dev
tmpfs 978M 0 978M 0% /mnt
/dev/block/mmcblk2p11 11M 144K 11M 2% /metadata
/dev/block/dm-0 934M 931M 2.8M 100% /
/dev/block/dm-2 228M 227M 708K 100% /vendor
As it can be seen, both the vendor and system partitions are almost full. How can i increase the size of both the file systems?

It may have enable dynamic partitions. Have a look at: https://source.android.com/devices/tech/ota/dynamic_partitions/implement?hl=en.
With dynamic partitions, vendors no longer have to worry about the individual sizes of partitions such as system, vendor, and product. Instead, the device allocates a super partition, and sub-partitions can be sized dynamically within it. Individual partition images no longer have to leave empty space for future OTAs. Instead, the remaining free space in super is available for all dynamic partitions.

How will changing 1K block size in a fixed block size file system to 2K affect disk throughput/utilization

As per my understanding if we change 1K block size in a fixed block size file system
to 2K it will lead to better disk throughput but poorer disk space utilization.
As now the file blocks are large and so disk space utilization could be poor since
now the last nonempty block could have approx 2K space. Whereas when it was
1K the last nonempty block could have approx 1K space.
So there is poor disk utilization. But now the same file that was stored
in say, 100 blocks could now be stored in 50 blocks. and so read/write will be faster.
Is that correct reasoning?

I think your reasoning is sound for most cases, but there is an edge case where larger block sizes may cause lower throughput.
Consider what happens when you have a system with many small files, and you want to read all those small files. Small files implies that these files do not use the entire 2K block size, but a block must be read in its entirety. So, if you have a million 1K files, and each file uses half of a block, then half of the time doing IO is wasted reading parts of the block that were garbage.

Block Size and inode size in a filesystem

I am going through the book , "Practical Filesystem design " by "Dominic Giampaolo" .
The two important concepts are
Block : The smallest readable or writable unit of memory for a filesystem .
Inode : Inode , is , an area, which stores the data about a file , stores the data about where the blocks composing a file are stored .
The author states about the simplicity introduced by storing a few block addresses directly in i-node . Then he mentions about tradeoff that is faced between "the size of the i-node" and how much data the i-node map .
As such he mentions that the size of the i-node works best when it is an even divisor of the block size .
How to reason out the above statement ? Any calculations to support this ?

Since all read/write operations operate at the block-level, then having your inodes block-aligned and occupying entire blocks ensures that your reads/writes are not wasteful.
If a block is 4096 bytes, but an inode is just 4000 bytes, then either:
1. our inodes are block-aligned: we're not very efficient since we always waste 96 bytes of every block.
2. our inodes are not block-aligned: we're not very efficient since when we want to read an inode, we often need to read two blocks - and none of them will be 100% occupied by inode data.
We remain efficient when:
1. The size of an inode equals to the size of a block (1:1 ratio)
2. The size of an inode is an exact multiple of the size of a block (1:n ratio)
3. The size of a block is an exact multiple of the size of an inode (n:1 ratio)

data block size in HDFS, why 64MB?

The default data block size of HDFS/Hadoop is 64MB. The block size in the disk is generally 4KB.
What does 64MB block size mean? ->Does it mean that the smallest unit of reading from disk is 64MB?
If yes, what is the advantage of doing that?-> easy for continuous access of large files in HDFS?
Can we do the same by using the disk's original 4KB block size?

What does 64MB block size mean?
The block size is the smallest data unit that a file system can store. If you store a file that's 1k or 60Mb, it'll take up one block. Once you cross the 64Mb boundary, you need a second block.
If yes, what is the advantage of doing that?
HDFS is meant to handle large files. Let's say you have a 1000Mb file. With a 4k block size, you'd have to make 256,000 requests to get that file (1 request per block). In HDFS, those requests go across a network and come with a lot of overhead. Each request has to be processed by the Name Node to determine where that block can be found. That's a lot of traffic! If you use 64Mb blocks, the number of requests goes down to 16, significantly reducing the cost of overhead and load on the Name Node.

HDFS's design was originally inspired by the design of the Google File System (GFS). Here are the two reasons for large block sizes as stated in the original GFS paper (note 1 on GFS terminology vs HDFS terminology: chunk = block, chunkserver = datanode, master = namenode; note 2: bold formatting is mine):
A large chunk size offers several important advantages. First, it reduces clients’ need to interact with the master because reads and writes on the same chunk require only one initial request to the master for chunk location information. The reduction is especially significant for our workloads because applications mostly read and write large files sequentially. [...] Second, since on a large chunk, a client is more likely to perform many operations on a given chunk, it can reduce network overhead by keeping a persistent TCP connection to the chunkserver over an extended period of time. Third, it reduces the size of the metadata stored on the master. This allows us to keep the metadata
in memory, which in turn brings other advantages that we will discuss in Section 2.6.1.
Finally, I should point out that the current default size in Apache Hadoop is is 128 MB (see dfs.blocksize).

In HDFS the block size controls the level of replication declustering. The lower the block size your blocks are more evenly distributed across the DataNodes. The higher the block size your data are potentially less equally distributed in your cluster.
So what's the point then choosing a higher block size instead of some low value? While in theory equal distribution of data is a good thing, having a too low blocksize has some significant drawbacks. NameNode's capacity is limited, so having 4KB blocksize instead of 128MB means also having 32768 times more information to store. MapReduce could also profit from equally distributed data by launching more map tasks on more NodeManager and more CPU cores, but in practice theoretical benefits will be lost on not being able to perform sequential, buffered reads and because of the latency of each map task.

In normal OS block size is 4K and in hadoop it is 64 Mb.
Because for easy maintaining of the metadata in Namenode.
Suppose we have only 4K of block size in hadoop and we are trying to load 100 MB of data into this 4K then here we need more and more number of 4K blocks required. And namenode need to maintain all these 4K blocks of metadata.
If we use 64MB of block size then data will be load into only two blocks(64MB and 36MB).Hence the size of metadata is decreased.
Conclusion:
To reduce the burden on namenode HDFS prefer 64MB or 128MB of block size. The default size of the block is 64MB in Hadoop 1.0 and it is 128MB in Hadoop 2.0.

It has more to do with disk seeks of the HDD (Hard Disk Drives). Over time the disk seek time had not been progressing much when compared to the disk throughput. So, when the block size is small (which leads to too many blocks) there will be too many disk seeks which is not very efficient. As we make progress from HDD to SDD, the disk seek time doesn't make much sense as they are moving parts in SSD.
Also, if there are too many blocks it will strain the Name Node. Note that the Name Node has to store the entire meta data (data about blocks) in the memory. In the Apache Hadoop the default block size is 64 MB and in the Cloudera Hadoop the default is 128 MB.

If block size was set to less than 64, there would be a huge number of blocks throughout the cluster, which causes NameNode to manage an enormous amount of metadata.
Since we need a Mapper for each block, there would be a lot of Mappers, each processing a piece bit of data, which isn't efficient.

The reason Hadoop chose 64MB was because Google chose 64MB. The reason Google chose 64MB was due to a Goldilocks argument.
Having a much smaller block size would cause seek overhead to increase.
Having a moderately smaller block size makes map tasks run fast enough that the cost of scheduling them becomes comparable to the cost of running them.
Having a significantly larger block size begins to decrease the available read parallelism available and may ultimately make it hard to schedule tasks local to the tasks.
See Google Research Publication: MapReduce
http://research.google.com/archive/mapreduce.html

Below is what the book "Hadoop: The Definitive Guide", 3rd edition explains(p45).
Why Is a Block in HDFS So Large?
HDFS blocks are large compared to disk blocks, and the reason is to
minimize the cost of seeks. By making a block large enough, the time
to transfer the data from the disk can be significantly longer than
the time to seek to the start of the block. Thus the time to transfer
a large file made of multiple blocks operates at the disk transfer
rate.
A quick calculation shows that if the seek time is around 10 ms and
the transfer rate is 100 MB/s, to make the seek time 1% of the
transfer time, we need to make the block size around 100 MB. The
default is actually 64 MB, although many HDFS installations use 128 MB
blocks. This figure will continue to be revised upward as transfer
speeds grow with new generations of disk drives.
This argument shouldn’t be taken too far, however. Map tasks in
MapReduce normally operate on one block at a time, so if you have too
few tasks (fewer than nodes in the cluster), your jobs will run slower
than they could otherwise.

File system block size

What is the significance of the file system block size? If my filesystem block size is set at, say 8K, does that mean that all read/write I/O will happen at size 8K? So if my application wants to read say 16 bytes at offset 4097 then a 4K block starting from offset 4096 will be read?
How do writes work in this case? Suppose I want to write say 64 bytes.

You are right. The block size is the unit of work for the file system. Every read and write is done in full multiples of the block size.
The block size is also the smallest size on disk a file can have. If you have a 16 byte Block size,then a file with 16 bytes size occupies a full block on disk.
The book "Practical file system design" states:
Block: The smallest unit writable by a disk or ﬁle system. Everything a
ﬁle system does is composed of operations done on blocks. A ﬁle system
block is always the same size as or larger (in integer multiples) than the
disk block size.

Normally when you have to deal with files in programming you should use Stream abstraction.
I/O operations through code are often reads and writes to streams; reading and writing from and to streams, can be buffered so that chunks of file can be read or written.
Block size on fs refers to mapping disk surface; minor the size of the single block major the number of blocks (and so the elements in the table that keeps information on allocation of files).
So OS's so can map file on disk discretely based on block size and have a smaller "map of files".
As I know this doesn't affect stream abstraction in API's of programming language.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight

How much do modern filesystems reserve for each block group? - filesystems

Related

AOSP: How to change size of file systems?

How will changing 1K block size in a fixed block size file system to 2K affect disk throughput/utilization

Block Size and inode size in a filesystem

data block size in HDFS, why 64MB?

File system block size

Categories

Resources