HOW DO I SOLVE THIS SUPPOSEDLY HARD QN? - filesystems

Consider a file system that uses the contiguous allocation method. For a disk
consisting of 100 data blocks, each block 4KB, what is the maximum and
minimum number of files of size 15KB that it can save? Find the number of files the
disk is able to support using (1) linked allocation and (2) indexed allocation, assuming
the address is 32 bits.
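To get started on the contiguous-allocation part, the core of it is block arithmetic; here is a minimal sketch (numbers taken straight from the question; the linked and indexed variants would additionally have to account for the per-block next pointer or the index block, which this sketch does not cover):

    #include <stdio.h>

    int main(void) {
        const unsigned block_kb    = 4;    /* each data block is 4 KB */
        const unsigned disk_blocks = 100;  /* the disk has 100 data blocks */
        const unsigned file_kb     = 15;   /* each file is 15 KB */

        /* A 15 KB file occupies ceil(15 / 4) = 4 whole blocks. */
        unsigned blocks_per_file = (file_kb + block_kb - 1) / block_kb;

        /* With no fragmentation, at most floor(100 / 4) = 25 such files fit. */
        unsigned max_files = disk_blocks / blocks_per_file;

        printf("blocks per 15 KB file: %u\n", blocks_per_file);
        printf("max files with contiguous allocation: %u\n", max_files);
        return 0;
    }

The "minimum" part of the question is about fragmentation: with contiguous allocation, free blocks scattered in runs shorter than 4 blocks cannot hold a 15KB file at all.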

Related

Are Arrays Contiguous? (Virtual vs Physical)

I read that arrays are contiguous in Virtual Memory but probably not in Physical memory, and I don't get that.
Let's suppose I have an array of size 4KB (one page = one frame size); in virtual memory that array is one page.
In virtual memory every page is translated into one frame, so our array is still contiguous...
(In Page Table we translate pages into frames not every byte into its own frame...)
Side question (when answering this, please mention clearly that it's for the side note):
When allocating an array of size one page in virtual memory, does it have to occupy exactly one page, or could it be split across two contiguous pages in virtual memory (for example the bottom half of the first one and the top half of the second)? In this case, at worst, the answer above is 2, am I wrong?
Unless the start of the array happens to be aligned to the beginning of a memory page, it can still occupy two pages; it can start near the end of one page and end on the next page. Arrays allocated on the stack will probably not be forced to occupy a single page, because stack frames are simply allocated sequentially in the stack memory, and the array will usually be at the same offset within each stack frame.
The heap memory allocator (malloc()) could try to ensure that arrays that are smaller than a page will be allocated entirely on the same page, but I'm not sure if this is actually how most allocators are implemented. Doing this might increase memory fragmentation.
I read that arrays are contiguous in Virtual Memory but probably not in Physical memory, and I don't get that.
This statement is missing something very important: the array size.
For small arrays the statement is wrong. For "large/huge" arrays the statement is correct.
In other words: The probability of an array being split over multiple non-contiguous physical pages is a function of the array size.
For small arrays the probability is close to zero, but the probability increases as the array size increases. When the array size increases above the system's page size, the probability gets closer and closer to 1. But an array requiring multiple pages may still be contiguous in physical memory.
For your side question:
With an array size equal to your systems page size, the array can at maximum span two physical pages.
Anything (array, structure, ...) that is larger than the page size must be split across multiple pages; and therefore may be "virtually contiguous, physically non-contiguous".
Without further knowledge or restriction; anything (array, structure, ...) that is between its minimum alignment (e.g. 4 bytes for an array of uint32_t) and the page size has a probability of being split across multiple pages; where the probability depends on its size and alignment. For example, if page size is 4096 bytes and an array has a minimum alignment of 4 bytes and a size of 4092 bytes, then there's 2 chances in 1024 that it will end up on a single page (and a 99.8% chance that it will be split across multiple pages).
Anything (variable, tiny array, tiny structure, ...) that has a size equal to its minimum alignment won't (shouldn't - see note 3) be split across multiple pages.
Note 1: For anything using memory allocated from the heap, the minimum alignment can be assumed to be the (implementation defined) minimum alignment provided by the heap and not the minimum alignment of the object itself. E.g. for an array of uint16_t the minimum alignment would be 2 bytes; but malloc() will return memory with much larger alignment (maybe 16 bytes)
Note 2: When things are nested (e.g. array inside a structure inside another structure) all of the above applies to the outer structure only. E.g. if you have an array of uint16_t inside a structure where the array happens to begin at offset 4094 within the structure; then it will be significantly more likely that the array will be split across pages.
Note 3: It's possible to explicitly break minimum alignment using pointers (e.g. use malloc() to allocate 1024 bytes, then create a pointer to an array that begins at any offset you want within the allocated area).
Note 4: If something (array, structure, ...) is split across multiple pages; then there's a chance that it will still be physically contiguous. For worst case this depends on the amount of physical memory (e.g. if the computer has 1 GiB of usable physical memory and 4096 byte pages, then there's approximately 1 chance in 262000 that 2 virtually contiguous pages will be "physically contiguous by accident"). If the OS implements page/cache coloring (see https://en.wikipedia.org/wiki/Cache_coloring ) it improves the probability of "physically contiguous by accident" by the number of page/cache "colors" (e.g. if the computer has 1 GiB of usable physical memory and 4096 byte pages, and the OS uses 256 page/cache colors, then there's approximately 1 chance in 1024 that 2 virtually contiguous pages will be "physically contiguous by accident").
Note 5: Most modern operating systems use multiple page sizes (e.g. 4 KiB pages and 2 MiB pages, and maybe also 1 GiB pages). This can either make it hard to guess what the page size actually is, or improve the probability of "physically contiguous by accident" if you assume the smallest page size is used.
Note 6: For some CPUs (e.g. recent AMD/Zen) the TLBs behave as if pages are larger (e.g. as if you're using 16 KiB pages and not 4 KiB pages) if and only if page table entries are compatible (e.g. if 4 page table entries describe four physically contiguous 4 KiB pages with the same permissions/attributes). If an OS is optimized for these CPUs the result is similar to having an extra page size (4 KiB, "16 KiB", 2 MiB and maybe 1 GiB).
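To make the probability argument above concrete, here is a small sketch that counts, for a given object size and minimum alignment, how many of the equally likely aligned placements within a page keep the object on a single page; it reproduces the 4092-byte example from above:

    #include <stdio.h>

    /* Fraction of aligned start offsets (within one page) for which an
     * object of `size` bytes ends on the same page it starts on, assuming
     * every aligned offset is equally likely. */
    static double single_page_probability(unsigned size, unsigned align, unsigned page_size) {
        unsigned total = page_size / align;   /* possible aligned start offsets */
        unsigned fits = 0;
        for (unsigned off = 0; off < page_size; off += align)
            if (off + size <= page_size)      /* object ends on the same page */
                fits++;
        return (double)fits / total;
    }

    int main(void) {
        /* 4092-byte array, 4-byte alignment, 4096-byte pages:
         * 2 placements out of 1024 stay on one page (~99.8% chance of a split). */
        double p = single_page_probability(4092, 4, 4096);
        printf("P(single page) = %.4f, P(split) = %.1f%%\n", p, (1.0 - p) * 100.0);
        return 0;
    }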
When allocating an array of size one page in virtual memory, does it have to occupy exactly one page, or could it be split across two contiguous pages in virtual memory (for example the bottom half of the first one and the top half of the second)?
When allocating an array in heap memory of size one page; the minimum alignment would be the implementation defined minimum alignment provided by the heap manager/malloc() (e.g. maybe 16 bytes). However; most modern heap managers switch to using an alternative (e.g. mmap() or VirtualAlloc() or similar) when the amount of memory being allocated is "large enough"; so (depending on the implementation and their definition of "large enough") it might be page aligned.
When allocating an array in raw virtual memory (e.g. using mmap() or VirtualAlloc() or similar yourself, and NOT using the heap and not using something like malloc()); page alignment is guaranteed (mostly because the virtual memory manager doesn't deal with anything smaller).
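As a small illustration of that difference on a Linux/BSD-style system (the exact alignment malloc() returns is implementation defined; mmap() always hands back whole, page-aligned pages):

    #include <stdio.h>
    #include <stdint.h>
    #include <stdlib.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void) {
        long page = sysconf(_SC_PAGESIZE);

        /* Heap allocation: only guaranteed to satisfy malloc()'s own
         * (implementation-defined) alignment, typically 8 or 16 bytes. */
        void *heap = malloc((size_t)page);
        printf("malloc: %p page aligned: %s\n", heap,
               ((uintptr_t)heap % (uintptr_t)page == 0) ? "yes" : "no");

        /* Raw virtual memory: the returned address is always page aligned. */
        void *raw = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        printf("mmap:   %p page aligned: %s\n", raw,
               ((uintptr_t)raw % (uintptr_t)page == 0) ? "yes" : "no");

        munmap(raw, (size_t)page);
        free(heap);
        return 0;
    }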

Use of memory pools in C (Embedded systems)

I'm writing a piece of code for an embedded system (a Cortex-M4 MCU with very limited RAM, < 32KB) where I need big chunks of memory for a very short and well-defined time, and I thought of using some kind of memory pool so that memory won't stay wasted on functions and actions that run once or twice in a lifetime.
I made some efforts but I think I'd like to learn more about memory pools and rewrite my code better.
I saw some examples where a pointer is used to point to the next available free chunk, but I don't understand how I can process that kind of pool.
For example, I can't use strstr in a memory pool where the total string would be spread across more than one chunk. I would need to read chunk by chunk and store the total string in one larger array to carry on further processing. Please correct me if I'm wrong.
So, if I get it right: if I have a memory pool of 1024 bytes with 32-byte chunks, that gives us 32 chunks in total. And if I want to store a string of total length, let's say, 256 chars (bytes), I'd need 8 chunks, but if I want to read the string I'd need to copy those 8 chunks into a 256-char array.
Am I missing something?
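For reference, the "pointer to the next available free chunk" pattern the question mentions usually stores that pointer inside the free chunk itself, so the bookkeeping costs no extra RAM. A minimal sketch of such a fixed-size-chunk pool (sizes match the 1024-byte / 32-chunk example above; this is an illustrative sketch, not production embedded code):

    #include <stddef.h>
    #include <stdint.h>

    #define CHUNK_SIZE  32          /* must be a multiple of sizeof(void *) */
    #define CHUNK_COUNT 32          /* 32 * 32 = 1024 bytes, as in the question */

    /* The pool itself; aligned (C11 _Alignas) so a pointer can be stored
     * at the start of each chunk. */
    static _Alignas(void *) uint8_t pool[CHUNK_COUNT][CHUNK_SIZE];
    static void *free_list;         /* head of the list of free chunks */

    void pool_init(void) {
        /* Each free chunk stores a pointer to the next free chunk in its
         * first bytes; the last chunk points to NULL. */
        for (size_t i = 0; i + 1 < CHUNK_COUNT; i++)
            *(void **)pool[i] = pool[i + 1];
        *(void **)pool[CHUNK_COUNT - 1] = NULL;
        free_list = pool[0];
    }

    void *pool_alloc(void) {
        void *chunk = free_list;
        if (chunk)
            free_list = *(void **)chunk;   /* pop the head of the free list */
        return chunk;                      /* NULL when the pool is exhausted */
    }

    void pool_free(void *chunk) {
        *(void **)chunk = free_list;       /* push the chunk back onto the list */
        free_list = chunk;
    }

Note that pool_alloc() hands out single chunks, and consecutive allocations are not guaranteed to be adjacent, which is exactly why a 256-byte string spread over 8 chunks can't be scanned with strstr directly; either copy it back together as described in the question, or size the chunks to fit the largest object you actually need.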

How to get largest free contiguous block of memory in FatFs

Using FatFs and its API, I am trying to pre-allocate file system space for the remainder of the drive to write files of unknown sizes. At the end of the file write process, any unused space is then truncated (f_truncate). However, I'm having a problem allocating enough space when the file system becomes fragmented after deletion of some files. To overcome this problem, I would like to allocate only as much space as the largest contiguous block of memory on the file system.
In the FatFS API, there are functions to get the amount of free space left on the device, which is f_getfree. There is also the f_expand function, which takes in a size in bytes and returns whether or not a free contiguous block of memory exists for that given size.
Is there an efficient way to calculate the largest free contiguous block of memory available? I'm trying to avoid any sort of brute force "guess and check" method. Thanks
One way would be to create your own extension to the API to count contiguous FAT entries across all the sectors.
Without modifying the API, you could use f_lseek(). Open a file for writing, then use f_lseek() to expand the size of the file until 'disk full' (the end of the contiguous space). This would need to be repeated with new files until all of the disk was allocated. Pick from these the largest allocated file, and delete the others.
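If a few probing calls are acceptable, a related option is a binary search over f_expand() (which the question already describes as reporting whether a contiguous area of a given size exists), needing only about log2(free space) probes rather than brute force. A rough sketch, assuming f_expand() is enabled in your FatFs configuration and returns FR_OK only when a contiguous area of the requested size can be found; check the behaviour of your FatFs version:

    #include "ff.h"

    /* Binary-search the largest contiguous free area (in bytes) that
     * f_expand() will accept for a freshly created file. With opt = 0 the
     * area is only located, not committed, so nothing stays allocated. */
    FSIZE_t largest_contiguous(const TCHAR *probe_path, FSIZE_t free_bytes)
    {
        FIL f;
        FSIZE_t lo = 0, hi = free_bytes;

        if (f_open(&f, probe_path, FA_CREATE_ALWAYS | FA_WRITE) != FR_OK)
            return 0;

        while (lo < hi) {
            FSIZE_t mid = lo + (hi - lo + 1) / 2;
            if (f_expand(&f, mid, 0) == FR_OK)
                lo = mid;          /* a contiguous area of 'mid' bytes exists */
            else
                hi = mid - 1;      /* too big, try smaller */
        }

        f_close(&f);
        f_unlink(probe_path);
        return lo;                 /* 0 if even the smallest probe failed */
    }

The free_bytes upper bound can come from f_getfree() (free clusters times cluster size), and the result is naturally limited to cluster granularity.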

Why did Windows use the FAT structure instead of a conventional linked list with a next pointer for each data block of a file?

Instead of storing references to next nodes in a table, why couldn't it be just stored like a conventional linked list, that is, with a next pointer?
This is due to alignment. FAT (and just about any other file system) stores file data in one or more whole sectors of the underlying storage. Because the underlying storage can only read and write whole sectors such allocation allows efficient access to the contents of a file.
Issues with interleaving
When a program wants to store something in a file it provides a buffer, say 1MB of data to store. Now if the file's data sectors also have to keep next pointers to their next sector, this pointer information needs to be interleaved with the actual user data. So the file system would need to build another buffer (slightly larger than the provided 1MB), copy into it, for each output sector, some of the user data plus the corresponding next pointer, and hand this new buffer to the storage. This would be somewhat inefficient. Unless the file system always stores file data to new sectors (and most usually don't), rewriting these next pointers would also be redundant.
The bigger problem comes when a read operation is attempted on the file. Files would now work like tape devices: with only the location of the first sector known from the file's primary metadata, in order to reach sector 1000 the file system would need to read all sectors before it, in order: read sector 0, find the address of sector 1 from the loaded next pointer, read sector 1, etc. With typical seek times of around 10 ms per random I/O (assuming a hard disk drive), reaching sector 1000 would take about 10 seconds. Even if sectors are sequentially ordered, while the file system driver processes sector N's data, the disk head will be flying over the next sector, and when the read for sector N+1 is issued it may be too late, requiring the disk to rotate a full revolution (8.3 ms for a 7200 RPM drive) before it can read the next sector again. An on-disk cache can and will help with that, though.
Writing a single sector is usually an atomic operation (depending on the hardware): reading back the sector after a power failure returns either its old content or the new one, without intermediate states. Database applications usually need to know which writes will be atomic. If the file system interleaves file data and metadata in the same sectors, it needs to report a smaller size than the actual sector size to the application. For example, instead of say 512 bytes it may need to report 504. But it can't do that, because applications usually assume the sector size is a power of 2. Furthermore, a file stored on such a file system would very likely be unusable if copied to another file system with a different reported sector size.
Better approaches
The FAT format is better because all next pointers are stored in adjacent sectors. For FAT12, FAT16 and not-very-large FAT32 volumes, the entire table is small enough to fit in memory. FAT still records the blocks of a file in a linked list, so to have efficient random access, an implementation needs to cache the chain per file. On large enough volumes (which can hold large enough files) such a cache may no longer fit in memory.
ext3 uses direct and indirect blocks. This simple format avoids the preprocessing that FAT requires and gets by with only a minimal number of additional reads per I/O when indirect blocks are needed. These additional reads are cached by the operating system so that their overhead is often negligible.
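For reference, a simplified sketch of what the ext2/ext3-style block map looks like (the classic layout has 12 direct block numbers plus single, double and triple indirect blocks; the field names here are illustrative, not the actual on-disk structure definition):

    #include <stdint.h>

    #define N_DIRECT 12

    /* Simplified ext2/ext3-style block map: small files are reachable
     * through the direct entries, larger files pay one or more extra reads
     * for the indirect blocks, which the OS then caches. */
    struct inode_block_map {
        uint32_t direct[N_DIRECT];   /* block numbers of the first data blocks */
        uint32_t single_indirect;    /* a block full of data-block numbers */
        uint32_t double_indirect;    /* a block full of single-indirect block numbers */
        uint32_t triple_indirect;    /* a block full of double-indirect block numbers */
    };

    /* With 4 KiB blocks, one indirect block holds 1024 block numbers, so
     * reaching any block of a file takes at most a few reads instead of
     * walking a per-block linked list. */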
Other variants are also possible and used by various file systems.
Random notes
For the sake of completeness, some hard disk drives can be formatted with slightly larger sector sizes (say 520 bytes) so that the file system can pack 512 bytes of file data with several bytes of metadata in the same sector. Yet because of the above, I don't believe anyone has used such formats for storing the address of the file's next sector. These additional bytes can be put to better use: additional checksums and timestamping come to mind. The timestamping I believe is used to improve the performance of some RAID systems. Still such usage is rare, and most software can't work with them at all.
Some file systems can save the content of small enough files in the file metadata directly without occupying distinct sectors. ReiserFS has the controversial tail packing. This is not important here: large files still benefit from having proper mapping to storage sectors.
Any modern OS requires much more than a pointer to the next data block for its file system: attributes (encryption, compression, hidden, ...), security descriptors (ACL entries), support for different hardware, buffering. This is just a tiny fraction of the functionality that any good file system provides.
Have a look at file system at Wikipedia to learn what else any modern file system does.
If we ignore the detail of FAT12 sharing a byte between two entries to pack 12 bits into 1.5 bytes, then we can concentrate on the deeper meaning of the question.
It turns out that the FAT system is equivalent to a linked list with the following points:
The "next" pointer is located in an array (the FAT) instead of being appended or prepended to the actual data
The value written in "next" is an integer instead of the more familiar memory address of the next node.
The nodes are not reserved dynamically but represented by another array. That array is the entire data part of the hard drive.
One fascinating exercise we were assigned as part of our software engineering education was to convert an application using memory pointers into an equivalent application using integer values. The rationale was that some processors (the PDP-11? or another PDP-xx) would perform integer arithmetic much faster than pointer operations, or maybe even forbade arithmetic on pointers.
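A minimal sketch of that equivalence: the "next pointers" live in one array (the FAT) indexed by cluster number, and the values are integers, not memory addresses (the end-of-chain marker below is simplified; real FAT variants reserve several special values for free, bad and end-of-chain clusters):

    #include <stdint.h>
    #include <stdio.h>

    #define END_OF_CHAIN 0xFFFFu     /* simplified end-of-chain marker */

    /* The FAT: fat[c] is the number of the cluster that follows cluster c
     * in its file, or END_OF_CHAIN. The whole "linked list" sits in this
     * one compact array instead of being scattered through the data blocks. */
    static const uint16_t fat[8] = {
        END_OF_CHAIN,   /* cluster 0: reserved */
        END_OF_CHAIN,   /* cluster 1: reserved */
        5,              /* cluster 2: a file occupying clusters 2 -> 5 -> 3 */
        END_OF_CHAIN,   /* cluster 3: last cluster of that file */
        0,              /* cluster 4: free */
        3,              /* cluster 5: middle of the file */
        0, 0            /* clusters 6, 7: free */
    };

    int main(void) {
        /* Walking a file's cluster chain is plain integer indexing. */
        for (uint16_t c = 2; c != END_OF_CHAIN; c = fat[c])
            printf("cluster %u\n", c);
        return 0;
    }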

I need to increase the Maximum possible array size

I have 4GB of RAM installed on a Core 2 Duo PC with a 32-bit Windows 7 operating system. I have increased the paging size up to 106110MB. But after doing all this I am not able to significantly increase the maximum array size.
Following are the specs
memory
Maximum possible array: 129 MB (1.348e+08 bytes) *
Memory available for all arrays: 732 MB (7.673e+08 bytes) **
Memory used by MATLAB: 563 MB (5.899e+08 bytes)
Physical Memory (RAM): 3549 MB (3.722e+09 bytes)
* Limited by contiguous virtual address space available.
** Limited by virtual address space available.
Kindly help me at your earliest convenience. I am not even able to read a file of 48+MB size in double format.
There are two things you can do to clear up memory for MATLAB. Since you're using a 32-bit version of the program, it is normally limited to 2GB of address space. Booting Windows with the /3GB switch can make an additional 1GB of address space available to the program.
Second, you should consider using the pack() function, which rearranges variables in memory to free up more contiguous space. This, more than anything, is what is affecting your ability to create large individual arrays.
Remember: you can figure out how many items an array will hold by dividing the memory amount available by the size of the variable type. Double variables take up 8 bytes each. Your 129MB of space available should allow around 16.85 million double values in a single array.
You can view information about memory usage using the memory functions included in MATLAB.
memory shows the memory information
inmem will show you the variables and functions stored in memory
clear will allow you to clear the memory of specific variables or functions.
You may try to set the /3GB switch; maybe this increases the available memory. Otherwise: switch to a 64-bit OS. Your system wastes 547MB of RAM simply because there are no addresses for it.
