MBR organization - filesystems

The field at offset 8 in an MBR partition table entry is intended to store the LBA address of the beginning of the partition. LBA addressing was introduced to address larger disks once CHS coordinates were no longer sufficient, but at the time MBR support was introduced, such disks did not yet exist. Hence the question: what did the fields at offsets 8 and 12 originally contain? Were they reserved for future development of the structure, or did they have some other purpose?
I am also interested in the implementation of LBA-48: 32 bits are allocated for storing an LBA address in the MBR, so how can a 48-bit address be stored in 32 bits?

Two Wikipedia articles give useful details: https://en.wikipedia.org/wiki/Master_boot_record and https://en.wikipedia.org/wiki/Logical_block_addressing. Another useful source is The Starman's "All the Details of many versions of both MBR".
The MBR contains "partition table entries", and LBA-compatible entries have both CHS (3-byte) and LBA (4-byte) addresses. CHS gives two addresses: one for the first sector of the partition and one for its last sector. The LBA fields hold the LBA address of the first sector and the total number of sectors in the partition. So both kinds of addresses refer to sectors, not bytes, and Wikipedia says "The sector size used to be considered fixed at 512 (2^9) bytes".
With 4 bytes we can encode disk sizes up to 2 TiB: 4294967295 (2^32 − 1) sectors multiplied by 512 bytes per sector, i.e. 2147483647.5 KiB.
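To make that layout concrete, here is a minimal C sketch of the classic 16-byte partition table entry described above; the field names are mine, and the offsets in the comments match the +8 and +12 fields in question:

    #include <stdint.h>
    #include <stdio.h>

    /* Classic 16-byte MBR partition table entry (on-disk values are
       little-endian). Field names here are illustrative. */
    #pragma pack(push, 1)
    struct mbr_entry {
        uint8_t  status;        /* +0:  0x80 = active/bootable, 0x00 = inactive */
        uint8_t  chs_first[3];  /* +1:  CHS address of the first sector */
        uint8_t  type;          /* +4:  partition type (system indicator) */
        uint8_t  chs_last[3];   /* +5:  CHS address of the last sector */
        uint32_t lba_first;     /* +8:  LBA of the first sector */
        uint32_t sector_count;  /* +12: total number of sectors */
    };
    #pragma pack(pop)

    int main(void) {
        /* The two 4-byte fields cap MBR-addressable capacity at ~2 TiB: */
        uint64_t max_bytes = 4294967295ULL * 512;   /* (2^32 - 1) sectors */
        printf("entry size: %zu bytes\n", sizeof(struct mbr_entry));
        printf("max capacity: %llu bytes\n", (unsigned long long)max_bytes);
        return 0;
    }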
LBA-48 can't be stored in the officially supported MBR partition entry: 48 bits is 6 bytes, and the MBR (and the EBR, for the fifth and later logical partitions) has only 4 bytes each for the starting LBA and the sector count. LBA-48 is used with GPT (the GUID Partition Table) - https://en.wikipedia.org/wiki/Logical_block_addressing#LBA48
The current 48-bit LBA scheme was introduced in 2003 with the ATA-6 standard, raising the addressing limit to 2^48 × 512 bytes, which is exactly 128 PiB or approximately 144.1 PB. ... However, the common DOS style Master Boot Record (MBR) partition table only supports disk partitions up to 2 TiB in size. For larger partitions this needs to be replaced by another scheme, for instance the GUID Partition Table (GPT), which has the same 64-bit limit as the current INT 13h Extensions.
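As a worked check of those figures (my arithmetic): 2^48 sectors × 2^9 bytes per sector = 2^57 bytes = 128 PiB, and 2^57 = 144115188075855872 ≈ 144.1 × 10^15 bytes ≈ 144.1 PB.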
(There is a proposed incompatible MBR format for LBA-48 on some random wiki-like site, but it will not work with most operating systems, which expect a correct classic MBR.)
LBA support was introduced in 1996 "in Windows 95B and DOS 7.10 in order to support disks larger than 8 GB", as Wikipedia says. There is a related IBM patent granted in 1999 (probably expired in October 2019): "Address offset feature for a hard disk drive", US6415383.
Little is known about the pre-LBA epoch of the MBR, but in DOS 2.0 the partition table had a total size of 64 bytes: 4 partitions at 16 bytes per entry. That is the size hard-coded in the MBR parsing code of DOS 2.0:
An Examination of the Assembly Code
062C 83C610 ADD SI,+10 ; Checking the next entry...
; (10h = 16 bytes per entry)
Even The Starman's MBR resource has no information on why 16 bytes were allocated for every partition table entry.
I found an earlier (1990-1992) patent of AST Research (now assigned to Samsung), "System for multiple access hard disk partitioning", US5136711A, which gives the partition table layout (Image 2, Figure 3 of the patent) with something that sounds very much like LBA:
Each of the partition identifier segments 133, 134, 135, and 136 comprises 16 bytes of disk space making up a partition table 140 (FIG. 3) containing identification information for the corresponding disk partition.
Partition table 140 was defined in the figure as:
141 Boot indicator
142 Head number
144 Sector number, cylinder number
148 System indicator
150 Head number
152 Sector number, cylinder number
154 Boot sector address
156 Sector number
And in the text of the 1992 AST patent the partition table is described as:
The partition table 140 comprises a boot indicator byte 141 to identify whether the corresponding partition segment P4 is a bootable partition or a non-bootable partition. Only one partition of P4, P3, P2, and P1 may be bootable at a given time. The partition table further comprises a physical starting head number byte 142, a physical starting cylinder and physical starting sector segment 144, a system indicator byte 148 which identifies the type of operating system, a physical ending head number byte 150, a physical ending cylinder and physical ending sector segment 152, a boot sector address segment 154, and a sector number segment 156 which indicates the number of sectors in the partition P4 as is well understood in the art.
So, my hypothesis is that field +8 may have been used to point to the boot sector of the partition (which perhaps could be placed somewhere other than the first sector?) and +12 may have been used to check partition size calculations. But in the DOS 2.0 code there was no actual reading of the +8 and +12 fields. They may simply have been reserved in the IBM MBR and reused in the AST patent for some LBA-like purpose.
PC Magazine from 1991 (PC Mag, 10 Sep 1991, page 410) also says that the 4-byte fields were already used for LBA-like sector addresses:
Each record in the partition table is 16 bytes, including 4 each for the starting sector and the number of sectors. In addition, one byte is reserved for the partition byte.
The same goes for Mark Minasi's 1992 book "The Hard Disk Survival Guide", at least for the last field at +12 (the partition size), page 279 (there are some snippets in Google Books):
Getting this number to fix a boot record is simple: It is in the MBR. The last four bytes of each partition table entry is the partition length in sectors.

Related

How many bytes is a gigabyte (GB)?

When I convert 1 GB to bytes using online tools, I get different answers. For instance, Google's converter gives 1 GB = 1e+9, while another converter gives 1 GB = 1073741824. I suppose the unit is used in different fashions depending on whether 1 KB = 1024 B or 1 KB = 1000 B (the latter is Google's unit).
How can I know which unit my machine uses with a small C program or function? Does C have a macro for that? I want to do this because my program will possibly be run on various operating systems.
The two tools are converting two different units.
1 GB = 10^9 bytes while 1 GiB = 2^30 bytes.
Try using google converter with GiB instead of GB and the mystery will be solved.
The following will help you understand the conversion a little better.
Factor Name Symbol Origin Derivation
2^10 kibi Ki kilobinary: (2^10)^1 kilo: (10^3)^1
2^20 mebi Mi megabinary: (2^10)^2 mega: (10^3)^2
2^30 gibi Gi gigabinary: (2^10)^3 giga: (10^3)^3
2^40 tebi Ti terabinary: (2^10)^4 tera: (10^3)^4
2^50 pebi Pi petabinary: (2^10)^5 peta: (10^3)^5
2^60 exbi Ei exabinary: (2^10)^6 exa: (10^3)^6
Note that the new prefixes for binary multiples are not part of the International System of Units (SI). However, for ease of understanding and recall, they were derived from the SI prefixes for positive powers of ten. As shown in the table, the name of each new prefix is derived from the name of the corresponding SI prefix by retaining the first two letters of the SI prefix and adding the letters bi.
There's still a lot of confusion about the usage of GB and GiB; in fact, GB is very often used when GiB should be, or was intended to be.
Think about the hard drive world:
Your operating system assumes that 1 MB equals 1 048 576 bytes, i.e. 1 MiB. Drive manufacturers consider (correctly) 1 MB as equal to 1 000 000 bytes. Thus, if a drive is advertised as 6.4 GB (6 400 000 000 bytes), the operating system sees it as approximately 6.1 GB: 6 400 000 000 bytes / 1 048 576 bytes per MB ≈ 6104 MB ≈ 6.1 GB in the OS's binary-based units.
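The same arithmetic as a small C sketch, printing one advertised capacity in both decimal (SI) and binary (IEC) units; note there is no standard C macro for this, since GB vs GiB is a matter of unit definition rather than of the machine:

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        uint64_t bytes = 6400000000ULL;            /* drive advertised as 6.4 GB */
        double gb  = (double)bytes / 1000000000.0; /* decimal gigabytes (SI) */
        double gib = (double)bytes / 1073741824.0; /* binary gibibytes (IEC) */
        printf("%llu bytes = %.2f GB = %.2f GiB\n",
               (unsigned long long)bytes, gb, gib); /* 6.40 GB = 5.96 GiB */
        return 0;
    }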
Take a look at this for more info on prefixes for binary units, and this on metric prefixes.
This is just a confusion of units. There are actually two prefixes: G for 10⁹ and Gi for 2³⁰. Byte counts should usually be measured with the second, so the correct writing would be GiB.
The “gibibyte” is a multiple of the unit byte for digital information.
The binary prefix gibi means 2^30, therefore one gibibyte is equal to 1073741824 bytes = 1024 mebibytes.
The unit symbol for the gibibyte is GiB. It is one of the units with binary prefixes defined by the International Electrotechnical Commission (IEC) in 1998.
The “gibibyte” is closely related to the gigabyte (GB), which is defined as 10^9 bytes = 1000000000 bytes; 1 GiB ≈ 1.074 GB.
1024 gibibytes are equal to one tebibyte.
In the context of computer memory, gigabyte and GB are customarily used to mean 1024^3 (2^30) bytes, although not in the context of data transmission and not necessarily for hard drive sizes.

What's the biggest difference between a 64-bit file system and a 32-bit file system

May I ask what the biggest difference is between a 64-bit file system and a 32-bit file system?
More available inodes? Bigger partitions?
There is no hard-and-fast standard for exactly what bit size means for filesystems, but I usually see it refer to the data type that stores block addresses. More bits translates to a larger maximum partition size and the ability to use bigger drives as a single filesystem. It can sometimes also mean larger maximum file size or a larger number of files allowed per directory.
It's not directly analogous to CPU bit size, and you'll find filesystems that are 48 bits and ones that are 128 bits. The bit size of a particular filesystem is usually very low in importance as it doesn't give you any indication of how fast, resilient, or manageable a filesystem is.
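As an illustration of how the width of the block-address type bounds the filesystem size (the numbers below are illustrative, not tied to any particular filesystem):

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        unsigned addr_bits  = 32;    /* width of the block-address type */
        uint64_t block_size = 4096;  /* a common block size, in bytes */
        /* 2^32 blocks * 4 KiB per block = 2^44 bytes = 16 TiB */
        uint64_t max_bytes = ((uint64_t)1 << addr_bits) * block_size;
        printf("max filesystem size: %llu bytes (= %llu TiB)\n",
               (unsigned long long)max_bytes,
               (unsigned long long)(max_bytes >> 40));
        return 0;
    }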

SCSI Read10 vs Read16

Which case would be considered correct?
Doing reads with a Read (16) command regardless of whether the LBAs are 32-bit or 64-bit.
If the max LBA is 32-bit, doing a Read (10) command; if the max LBA is 64-bit, doing a Read (16) command.
What are the pros and cons of each choice?
I know that for a Read Capacity command it is correct to run a (10) and, if it returns FFFFFFFFh, then run a (16). Why is this the case? The Read Capacity (16) command works for both cases and avoids even needing Read Capacity (10) at all.
Keep in mind that the reason that SCSI has multiple "sizes" of commands like this is, in many cases, because SCSI is a very old protocol. (It was first standardized in 1986, and was in development for some time before that!) At the time, a large SCSI device would range in the low hundreds of megabytes — even a 32-bit LBA was considered excessive at the time. The 64-bit LBA commands simply didn't exist until much later.
The question here is really just whether you want to support these old devices. If you do, you will need to use Read (10) for reads on "small" devices, as they may not recognize Read (16). Similarly, the use of Read Capacity (10) before Read Capacity (16) is because older devices won't recognize the larger version.
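A sketch of that probing order in C, assuming a hypothetical scsi_exec() transport hook that you would wire to SG_IO, SPTI, or whatever pass-through your platform provides (0x25 and 0x9E with service action 0x10 are the standard READ CAPACITY opcodes):

    #include <stdint.h>
    #include <stddef.h>
    #include <stdbool.h>

    /* Hypothetical transport hook: sends a CDB, fills a response buffer,
       returns true on success. */
    bool scsi_exec(const uint8_t *cdb, size_t cdb_len,
                   uint8_t *resp, size_t resp_len);

    /* Conventional probing order: READ CAPACITY (10) first, falling back
       to READ CAPACITY (16) when the 32-bit LBA field saturates. */
    uint64_t probe_last_lba(void) {
        uint8_t cdb10[10] = { 0x25 };            /* READ CAPACITY (10) */
        uint8_t r10[8];
        if (!scsi_exec(cdb10, sizeof cdb10, r10, sizeof r10))
            return 0;
        uint32_t lba32 = (uint32_t)r10[0] << 24 | (uint32_t)r10[1] << 16
                       | (uint32_t)r10[2] << 8  |  r10[3];
        if (lba32 != 0xFFFFFFFFu)
            return lba32;                        /* old/small device: done */

        /* Field saturated: ask again with the 16-byte form. */
        uint8_t cdb16[16] = { 0x9E, 0x10 };      /* SERVICE ACTION IN(16),
                                                    READ CAPACITY (16) */
        cdb16[13] = 32;                          /* allocation length */
        uint8_t r16[32];
        if (!scsi_exec(cdb16, sizeof cdb16, r16, sizeof r16))
            return 0;
        uint64_t lba64 = 0;
        for (int i = 0; i < 8; i++)              /* big-endian 64-bit LBA */
            lba64 = (lba64 << 8) | r16[i];
        return lba64;
    }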

What does 128-bit file system mean?

In an introduction to the ZFS file system, I saw this statement:
ZFS file system is quite scalable, 128 bit filesystem
What does 128-bit filesystem mean? What makes it scalable?
ZFS is a “128 bit” file system, which means 128 bits is the largest address size for any unit within it. This size allows capacities and sizes not likely to become confining anytime in the foreseeable future. For instance, the theoretical limits it imposes include 2^48 entries per directory, a maximum file size of 16 EiB (2^64 = 16 × 2^60 bytes), and a maximum of 2^64 devices per “zpool”. Source: File System Char.
The ZFS 128-bit addressing scheme can store 256 quadrillion zettabytes, which translates into a scalable file system that exceeds thousands of PB (petabytes) of storage capacity, while allowing it to be managed in single or multiple ZFS Z-RAID arrays. Source: zfs-unlimited-scalability
TL;DR: it can hold much larger files than a 64-bit filesystem such as ext4.
ZFS is a 128-bit file system, so it can address 1.84 × 10^19 times more data than 64-bit systems such as Btrfs (the ratio is 2^128 / 2^64 = 2^64 ≈ 1.84 × 10^19). The limitations of ZFS are designed to be so large that they should not be encountered in the foreseeable future.
Some theoretical limits in ZFS are:
2^48 — number of entries in any individual directory
16 exbibytes (2^64 bytes) — maximum size of a single file
16 exbibytes — maximum size of any attribute
256 zebibytes (2^78 bytes) — maximum size of any zpool
2^56 — number of attributes of a file (actually constrained to 2^48 for the number of files in a ZFS file system)
2^64 — number of devices in any zpool
2^64 — number of zpools in a system
2^64 — number of file systems in a zpool
More here.

LRU Cache design in C with limited size

I'm working on software for a mobile platform where memory is very limited. In an I/O-bottleneck function, I need to read some bytes from an image file using a seek operation (you can assume that a seek is about 10 times slower than reading directly from memory). In my test, this function is called 7480325 times and reads bytes at offsets 6800 to 130000, so every byte is read about 100 times on average (some bytes are read 3~4 times, some more than 1000 times).
Below are my statistics.
bytes offset 6800 ~ 6900: 170884 times
bytes offset 6900 ~ 7000: 220944 times
bytes offset 7000 ~ 7100: 24216 times
bytes offset 7100 ~ 7200: 9576 times
bytes offset 7200 ~ 7300: 14813 times
bytes offset 7300 ~ 7400: 22109 times
bytes offset 7400 ~ 7500: 19748 times
bytes offset 7500 ~ 7600: 43110 times
bytes offset 7600 ~ 7700: 157976 times
...
bytes offset 121200 ~ 121300: 1514 times
bytes offset 121300 ~ 121400: 802 times
bytes offset 121400 ~ 121500: 606 times
bytes offset 121500 ~ 121600: 444 times
bytes offset 121600 ~ 121700: 398 times
max_bytes_offset 121703
min_bytes_offset 6848
I want to build a cache using an LRU scheme to improve the I/O performance. In other questions I have found that a hash table plus a doubly-linked list is a good approach. But what structure fits my case best? My plan is to build 1300 buckets, each owning a doubly-linked list with a maximum size of 10. The total memory it takes is then about 13 KB. It is simple to implement and maintain, but I think the efficiency is not the best.
In my statistics, some byte-offset intervals have a higher hit ratio while others have a lower one. How can I build a structure that adapts to my statistics?
Also, when I search for a key, I need to traverse the whole list of size 10. Is there another method with higher search efficiency?
Some mobile platforms allow more memory for the cache while others allow less. How can I make my cache adapt to the allowed memory, other than changing the number of buckets?
It seems that caf's method is better: using one big doubly-linked list and a big hash table mapping keys to node entries makes more sense and takes more advantage of LRU. But designing the hash function becomes a hard problem.
I'm waiting for your suggestions, thank you~
If you're only going to have a maximum of 10 entries in each bucket, then you will likely be better off dispensing with the doubly-linked list and just making each bucket a circular array (which is just 10 entries and a "top of list" index).
You may well even be better off discarding the 10-way set associative design and going with a direct-mapped cache (where you have a larger hash table, with each bucket storing only one entry). Set associative designs are good in hardware, where you can use dedicated hardware to do the n-way comparisons in parallel, but not so much in software (unless you have a vector unit you can leverage for this).
One way to adapt to your statistics is to design your hash function so that it maps a different size address range onto each bucket, such that each bucket gets a roughly equal frequency of access.
Changing the size of the hash table is the other obvious way of scaling the size of the cache.
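A minimal C sketch of the direct-mapped variant under those suggestions, assuming fixed-size blocks and a hypothetical img_read_block() that performs the slow seek-and-read (block size, slot count, and names are illustrative):

    #include <stdint.h>

    #define BLOCK_SIZE 64   /* bytes cached per slot */
    #define NUM_SLOTS  128  /* 128 slots * 68 bytes each, about 8.5 KB */

    struct slot {
        int32_t tag;                /* block number held, -1 = empty */
        uint8_t data[BLOCK_SIZE];
    };

    static struct slot cache[NUM_SLOTS];

    /* Hypothetical backing read: seek to block_no * BLOCK_SIZE in the
       image file and read BLOCK_SIZE bytes into out. */
    extern void img_read_block(uint32_t block_no, uint8_t *out);

    void cache_init(void) {
        for (int i = 0; i < NUM_SLOTS; i++)
            cache[i].tag = -1;
    }

    uint8_t cached_byte(uint32_t offset) {
        uint32_t block_no = offset / BLOCK_SIZE;
        struct slot *s = &cache[block_no % NUM_SLOTS]; /* direct mapping */
        if (s->tag != (int32_t)block_no) {  /* a miss costs one comparison */
            img_read_block(block_no, s->data);
            s->tag = (int32_t)block_no;
        }
        return s->data[offset % BLOCK_SIZE];
    }

Scaling to each platform's memory budget then reduces to changing NUM_SLOTS (keeping it a power of two makes the modulo a cheap mask), and the skewed access statistics can be absorbed by choosing which bits of block_no feed the index, per the remark about the hash function above.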

Resources