AOSP: How to change size of file systems? - filesystems

When I run df -h, I can see that /dev/block/dm-2 is mounted on /vendor, /dev/block/dm-0 on / (system, I guess?), etc., as shown below.
Filesystem Size Used Avail Use% Mounted on
tmpfs 978M 816K 978M 1% /dev
tmpfs 978M 0 978M 0% /mnt
/dev/block/mmcblk2p11 11M 144K 11M 2% /metadata
/dev/block/dm-0 934M 931M 2.8M 100% /
/dev/block/dm-2 228M 227M 708K 100% /vendor
As can be seen, both the vendor and system partitions are almost full. How can I increase the size of both file systems?

Your device may have dynamic partitions enabled. Have a look at: https://source.android.com/devices/tech/ota/dynamic_partitions/implement?hl=en.
With dynamic partitions, vendors no longer have to worry about the individual sizes of partitions such as system, vendor, and product. Instead, the device allocates a super partition, and sub-partitions can be sized dynamically within it. Individual partition images no longer have to leave empty space for future OTAs. Instead, the remaining free space in super is available for all dynamic partitions.
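As a rough illustration only (a sketch, not an exact recipe; the partition name and size are examples, and lpdump and fastbootd availability depend on the build), the logical partitions inside super can be inspected and resized from the shell:

# inspect the layout of the super partition and its dynamic sub-partitions (needs root, e.g. a userdebug build)
adb root
adb shell lpdump

# from userspace fastboot (fastbootd), a logical partition can be resized
adb reboot fastboot
fastboot resize-logical-partition vendor 402653184   # new size in bytes (example; use vendor_a on A/B devices)
fastboot reboot

Note that resizing the logical partition does not grow the ext4 filesystem inside it; when building AOSP you would normally adjust the BoardConfig.mk variables described on that page (BOARD_SUPER_PARTITION_SIZE and the per-group BOARD_*_SIZE / BOARD_*_PARTITION_LIST) and rebuild the images instead.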

Related

Do files in /dev/shm take up memory when grown with ftruncate but are not written?

I'm using mmap to create shared memory segments, and I'm wondering if I can pre-create all the segments I'm going to possibly use in /dev/shm without triggering any memory use. The reason I suspect this may be possible is that I know most filesystems have a concept of all-zero pages, and it's possible when you initially grow a file before you do any writes to have the file not really take up space because of these 'hole' pages. But is this true for tmpfs (filesystem for /dev/shm)? Can I go wild making large files in /dev/shm without triggering memory use as long as I don't write to them?
On Linux, the tmpfs file system supports sparse files. Just resizing the file does not allocate memory (beyond the internal tmpfs data structures). Just like with regular file systems which support sparse files (files with holes), you either have to actually write data or use fallocate to allocate backing storage. As far as I can see, this has been this way since the Linux 2.6 days.
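A minimal shell sketch of the difference (assuming a standard Linux tmpfs mounted at /dev/shm; the sizes are arbitrary):

# growing a file only creates a hole: tmpfs usage barely changes
truncate -s 512M /dev/shm/seg1
df -h /dev/shm                          # "Used" stays roughly the same
du -h --apparent-size /dev/shm/seg1     # reports 512M
du -h /dev/shm/seg1                     # reports ~0, nothing is backed yet

# writing, or fallocate (supported by tmpfs on reasonably recent kernels), actually allocates pages
fallocate -l 512M /dev/shm/seg2
df -h /dev/shm                          # "Used" grows by about 512M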

Fragmentation in modern file systems

I was tinkering with Pintos OS file system and wonder:
How do modern file systems handle fragmentation issue, including internal, external and data?
OK, so it's file fragmentation you are interested in.
The answer is: it depends entirely on the file system and the operating system. In the case of traditional Unix file systems, the disk is inherently fragmented. There is no concept whatsoever of contiguous files; files are stored in whatever data blocks happen to be allocated to them, with no contiguity guarantee. This is why paging is done to partitions and most database systems on Unix use partitions.
"Hard" file systems that allow contiguous files manage them in different ways. A file consists of one or more "extents." If the initial extent gets filled, the file system manager creates a new extent and chains to it.In some systems there are many options for file creation. One can specify the initial size of the file and reserve space for subsequent allocations (ie, the size of the first extent and the size of additional extents).
When a hard file system gets fragmented, there are different approaches for dealing with it. In some systems, the normal way of "defragging" is to do an image backup to secondary storage and then restore it; this can be part of the normal system maintenance process.
Other systems use "defragging" utilities that either run as part of the regular system schedule or are run manually (an example of such tools is sketched below).
The problem of disk fragmentation is often exaggerated. If you have a disk with a reasonable amount of free space, you don't really tend to get much file fragmentation. Disk fragmentation, yes; but that is not really much of a problem as long as there is sufficient free disk space. File fragmentation occurs when (1) you don't have enough free contiguous disk space, or (2) you have a file that continually gets data appended to it (the most likely case when disk space is reasonable).
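On Linux with ext4, for instance, per-file fragmentation can be checked and repaired from the shell (a hedged sketch; the paths are examples, and e4defrag comes with e2fsprogs but is not installed everywhere):

# show how many extents a file occupies
filefrag -v /var/log/syslog

# report a fragmentation score for a directory, then actually defragment it
e4defrag -c /home/user/data
e4defrag /home/user/data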
Most file systems indeed have ways to deal with fragmentation. However, I'll only describe the situation for the usual file systems that are not too complex.
For Ext2, for each file there are 12 direct block pointers that point to the blocks where the file is contained. If they are not enough, there is one singly indirect block that points to block_size / 4 blocks. If they are still not enough, there is a doubly indirect block that points to block_size / 4 singly indirect blocks. If not yet enough, there is a triply indirect block that points to block_size / 4 doubly indirect blocks. This way, the file system allows fragmentation at block boundaries.
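A quick back-of-the-envelope check of that scheme (my own arithmetic, assuming 4096-byte blocks, so block_size / 4 = 1024 pointers per indirect block):

BS=4096; P=$((BS / 4))                    # pointers per indirect block
BLOCKS=$((12 + P + P*P + P*P*P))          # direct + single + double + triple indirect
echo "$BLOCKS blocks, about $((BLOCKS * (BS / 1024) / 1024 / 1024)) GiB addressable"   # ~4100 GiB, i.e. roughly 4 TiB

The on-disk format imposes additional limits, so the actual ext2 maximum file size is lower, but this shows why a handful of pointers per inode is enough for very large files.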
For ISO 9660, which is the usual file system for CDs and DVDs, the file system doesn't support fragmentation as is. However, it's possible to use multiple consecutive directory records in order to split a big (more than 2G/4G, the maximum describable file size) file into describable files. This might cause fragmentation.
For FAT, the file allocation table describes the location and status of all data clusters on the disk, which is what allows fragmentation: when reading the next cluster, the driver looks up the number of the next cluster in the file allocation table.

Crate - What is the minimum memory requirement for a node host?

I can find cheap VPS hosts with 128MB RAM, and I wonder if that is enough to run a crate node for a tiny database, initially for testing. (I'm not looking for recommended memory, but the minimum one, for not running into out-of-memory exceptions. Crate is supposed to be the only service in the node.)
It is possible to run Crate in such an environment. I wouldn't recommend it, though. In any case you need to take a few precautions:
Select a lean Linux distribution that actually boots and runs with such a small memory footprint. Alpine might be one choice.
Install Java. You need at least openjdk7 (update 55 and up).
Install and start Crate from the tarball as explained on the Crate website.
On a virtual machine with 128 MB RAM running Alpine 3.3 installed on disk, I installed openjdk8-jre (you have to enable the community repository in /etc/apk/repositories). I downloaded the Crate 0.54.7 tarball and just extracted it. I set CRATE_HEAP_SIZE=64m, which is the recommended half of the available memory.
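Concretely, the setup steps were roughly the following (a sketch; the tarball is assumed to have already been downloaded from crate.io):

# enable the community repository in /etc/apk/repositories first, then:
apk update
apk add openjdk8-jre

# unpack the Crate tarball and start it in the foreground with a 64 MB heap
tar xzf crate-0.54.7.tar.gz
export CRATE_HEAP_SIZE=64m
./crate-0.54.7/bin/crate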
I created a table "demo"
DROP TABLE IF EXISTS demo;
CREATE TABLE demo (
data string
);
and filled it up with 10,000 records of 10 KB random strings each with a slow bash script:
head -c7380 /dev/urandom | uuencode - | grep ^M | tr -d '\n\047'
This took a few minutes (about 20 records/s), but with bulk inserts it should be way faster and just take seconds.
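A bulk insert of that kind could go through Crate's HTTP interface, roughly like this (a sketch; port 4200 and the _sql endpoint are Crate defaults, and the two-row payload is just a toy example):

curl -sS -X POST 'http://localhost:4200/_sql' \
  -H 'Content-Type: application/json' \
  -d '{"stmt": "INSERT INTO demo (data) VALUES (?)",
       "bulk_args": [["first random 10 KB string"], ["second random 10 KB string"]]}'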
The net amount of data was about 100 MB and took 287 MB gross on disk as reported by the admin UI.
Operating system, the installed software, and the data altogether claimed 820 MB on the disk.
I configured twice the amount of memory as swapspace and got the following footprint (the Crate process itself without data takes up about 40 MB):
# free
total used free shared buffers cached
Mem: 120472 117572 2900 0 652 6676
-/+ buffers/cache: 110244 10228
Swap: 240636 131496 109140
A fulltext search over all 10,000 records (SELECT count(*) FROM demo WHERE data LIKE '%ABC%') took about 1.9 seconds.
Summary: Yes, it's possible, but you lose a lot of features if you actually do so. Your results will heavily depend on the type of queries you actually run.
I just played around a bit to see how far the heap size could be reduced, and it looks like a 64 MB heap (128 MB of memory) could work out for your use case.
Make sure you set the heap size correctly using the CRATE_HEAP_SIZE (docs) environment variable, and also set bootstrap.mlockall: true (docs) so that the JVM does not swap memory.
However, I would recommend at least a 256 MB heap (512 MB of memory).
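A hedged sketch of those two settings, assuming the tarball layout with its config/crate.yml file:

# fixed JVM heap via the environment...
export CRATE_HEAP_SIZE=256m

# ...and keep the heap from being swapped out
echo 'bootstrap.mlockall: true' >> crate-0.54.7/config/crate.yml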

data block size in HDFS, why 64MB?

The default data block size of HDFS/Hadoop is 64 MB, while the block size on disk is generally 4 KB.
What does a 64 MB block size mean? Does it mean that the smallest unit read from disk is 64 MB?
If so, what is the advantage of doing that? Easier continuous access of large files in HDFS?
Can we do the same by using the disk's original 4 KB block size?
What does 64MB block size mean?
The block size is the smallest data unit that a file system can store. If you store a file that's 1 KB or 60 MB, it'll take up one block. Once you cross the 64 MB boundary, you need a second block.
If yes, what is the advantage of doing that?
HDFS is meant to handle large files. Let's say you have a 1000 MB file. With a 4 KB block size, you'd have to make 256,000 requests to get that file (1 request per block). In HDFS, those requests go across a network and come with a lot of overhead. Each request has to be processed by the NameNode to determine where that block can be found. That's a lot of traffic! If you use 64 MB blocks, the number of requests goes down to 16, significantly reducing the overhead and the load on the NameNode.
HDFS's design was originally inspired by the design of the Google File System (GFS). Here are the two reasons for large block sizes as stated in the original GFS paper (a note on GFS vs. HDFS terminology: chunk = block, chunkserver = datanode, master = namenode):
A large chunk size offers several important advantages. First, it reduces clients' need to interact with the master because reads and writes on the same chunk require only one initial request to the master for chunk location information. The reduction is especially significant for our workloads because applications mostly read and write large files sequentially. [...] Second, since on a large chunk, a client is more likely to perform many operations on a given chunk, it can reduce network overhead by keeping a persistent TCP connection to the chunkserver over an extended period of time. Third, it reduces the size of the metadata stored on the master. This allows us to keep the metadata in memory, which in turn brings other advantages that we will discuss in Section 2.6.1.
Finally, I should point out that the current default size in Apache Hadoop is 128 MB (see dfs.blocksize).
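A hedged example of checking and overriding the block size with the stock Hadoop CLI (the paths and the 256 MB value are illustrative):

# what the cluster default is
hdfs getconf -confKey dfs.blocksize

# write a single file with a non-default block size without changing the cluster config
hdfs dfs -D dfs.blocksize=268435456 -put big.log /data/big.log

# see how an existing file was actually split into blocks
hdfs fsck /data/big.log -files -blocks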
In HDFS the block size controls the level of replication declustering. The lower the block size, the more evenly your blocks are distributed across the DataNodes; the higher the block size, the less evenly your data is potentially distributed in your cluster.
So what's the point of choosing a higher block size instead of some low value? While in theory an equal distribution of data is a good thing, a block size that is too low has some significant drawbacks. The NameNode's capacity is limited, so a 4 KB block size instead of 128 MB means 32,768 times more block metadata to store. MapReduce could also profit from equally distributed data by launching more map tasks on more NodeManagers and more CPU cores, but in practice the theoretical benefit is lost because sequential, buffered reads become impossible and because of the latency of each map task.
In a normal OS the block size is 4 KB, while in Hadoop it is 64 MB. The larger size makes it easier to maintain the metadata in the NameNode.
Suppose the block size in Hadoop were only 4 KB and we tried to load 100 MB of data: we would need a very large number of 4 KB blocks, and the NameNode would have to maintain metadata for every one of them.
If we use a 64 MB block size, the data is loaded into only two blocks (64 MB and 36 MB), so the amount of metadata is much smaller.
Conclusion:
To reduce the burden on the NameNode, HDFS prefers a 64 MB or 128 MB block size. The default block size is 64 MB in Hadoop 1.0 and 128 MB in Hadoop 2.0.
It has more to do with disk seeks of HDDs (Hard Disk Drives). Over time, disk seek time has not improved much compared to disk throughput. So when the block size is small (which leads to too many blocks), there will be too many disk seeks, which is not very efficient. As we move from HDDs to SSDs, seek time matters much less, because SSDs have no moving parts.
Also, if there are too many blocks it will strain the NameNode. Note that the NameNode has to store the entire metadata (data about blocks) in memory. In Apache Hadoop the default block size is 64 MB, and in the Cloudera Hadoop distribution the default is 128 MB.
If the block size were set to less than 64 MB, there would be a huge number of blocks throughout the cluster, which would cause the NameNode to manage an enormous amount of metadata.
Since we need a Mapper for each block, there would also be a lot of Mappers, each processing only a tiny piece of data, which isn't efficient.
The reason Hadoop chose 64MB was because Google chose 64MB. The reason Google chose 64MB was due to a Goldilocks argument.
Having a much smaller block size would cause seek overhead to increase.
Having a moderately smaller block size makes map tasks run fast enough that the cost of scheduling them becomes comparable to the cost of running them.
Having a significantly larger block size begins to decrease the available read parallelism and may ultimately make it hard to schedule tasks local to the data.
See Google Research Publication: MapReduce
http://research.google.com/archive/mapreduce.html
Below is what the book "Hadoop: The Definitive Guide", 3rd edition, explains (p. 45):
Why Is a Block in HDFS So Large?
HDFS blocks are large compared to disk blocks, and the reason is to minimize the cost of seeks. By making a block large enough, the time to transfer the data from the disk can be significantly longer than the time to seek to the start of the block. Thus the time to transfer a large file made of multiple blocks operates at the disk transfer rate.
A quick calculation shows that if the seek time is around 10 ms and the transfer rate is 100 MB/s, to make the seek time 1% of the transfer time, we need to make the block size around 100 MB. The default is actually 64 MB, although many HDFS installations use 128 MB blocks. This figure will continue to be revised upward as transfer speeds grow with new generations of disk drives.
This argument shouldn't be taken too far, however. Map tasks in MapReduce normally operate on one block at a time, so if you have too few tasks (fewer than nodes in the cluster), your jobs will run slower than they could otherwise.
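The quoted calculation, spelled out as shell arithmetic with the book's example numbers (10 ms seek, 100 MB/s transfer, seek time held to 1% of transfer time):

SEEK_MS=10; RATE_MB_S=100
# seek = 1% of transfer  =>  transfer_time = 100 * seek = 1 s  =>  block = rate * 1 s
echo "$(( RATE_MB_S * 100 * SEEK_MS / 1000 )) MB"    # prints 100 MB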

How much do modern filesystems reserve for each block group?

In reading about the Unix FFS, I've read that 10% of the disk space is reserved so that files' data blocks can be ensured to be in the same cylinder group. Is this still true with filesystems like ext2/ext3, is there space reserved so that files' data blocks can all be in the same block group? Is it also 10%? or does it vary? Also, is the same true for journaling filesystems as well? Thank you.
First of all, I think ext filesystems implement the same notion as a cylinder group; they just call it a block group.
To find out about it, you can inspect the filesystem (for example with dumpe2fs -h) to get the actual block count and the blocks-per-group number. Then the number of block groups = block count / (blocks per group).
They are used in exactly the same way as FFS cylinder groups (to speed up access times).
Now, journaling IMO has nothing to do with this, except that it wastes some more space on your disk :). As far as I understand, soft updates, which are the BSD solution to the problem that a journal solves in typical ext filesystems, don't require extra space, but they are tremendously complex to implement and to add new features to (like resizing).
Interesting read:
ext3 overhead disclosed part 1
Cheers!
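To see those parameters on a real ext2/3/4 filesystem, dumpe2fs from e2fsprogs prints the superblock summary (a hedged sketch; /dev/sda1 is just an example device):

# block count, blocks per group, reserved block count, free blocks, ...
dumpe2fs -h /dev/sda1

# the reserved-blocks percentage (5% by default on ext2/3/4) is tunable, e.g. down to 1%:
tune2fs -m 1 /dev/sda1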
My data for fresh ext2 images are:
Size Block size Bl/Gr Total bytes Free bytes Ratio
1MB 1024 8192 1048576 1009664 0.03710
10MB 1024 8192 10485760 10054656 0.04111
100MB 1024 8192 104857600 99942400 0.04688
512M 4096 32768 536870912 528019456 0.01649
1G 4096 32768 1073741824 1055543296 0.01695
10G 4096 32768 10737418240 10545336320 0.01789
So it's quite predictable that the space efficiency of an ext2 filesystem depends on the block size, due to the layout described in the answer above: the filesystem is a set of block groups, and each group's size is determined by the number of blocks that a one-block bitmap can describe, i.e. 8 * 4096 = 32,768 blocks for a 4096-byte block.
Conclusion: for the ext2/ext3 family of filesystems, the average default space overhead depends on the block size:
~ 1.6 - 1.8 % for 4096-byte blocks, ~ 4 % for 1024-byte ones
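Numbers like these can be reproduced with a small loopback image (a hedged sketch; mke2fs and dumpe2fs come from e2fsprogs):

# create a 100 MB ext2 image with 1 KiB blocks and inspect the resulting layout
dd if=/dev/zero of=ext2.img bs=1M count=100
mke2fs -F -b 1024 ext2.img
dumpe2fs -h ext2.img | egrep 'Block count|Free blocks|Blocks per group'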
