Logical file system vs physical file system

I was preparing for my operating systems end-semester exam and got stuck on this topic. I searched a lot but didn't find the difference.
The difference between Logical file System and Physical file system
I know the difference between a logical address and a physical address, but I don't think that is related to this.

I guess the answer refers to physical vs logical blocks
Files can consist of one or more records. A physical record (or physical block)
is the unit of information actually read from or written to a storage device. A logical
record (or logical block) is a collection of data treated as a unit by software. When
each physical record contains exactly one logical record, the file is said to consist of
unblocked records. When each physical record may contain several logical records,
the file is said to consist of blocked records.

I suspect that there is an error in the question. I imagine they are referring to logical disk I/O and physical disk I/O. File systems do not care which is used.
In ye old days, disk blocks were addressed physically. The OS had to request a block by specifying the platter, track, and sector.
In some cases, the OS would add a layer that would create a logical mapping of 0..N to physical blocks. Thus the operating system would translate a request for block X into a physical disk location (platter, track, sector). The OS would have to keep track of bad blocks and remap them.
Now, disks do this translation in hardware (although some disks allow physical I/O for diagnostics). The interface to the disk is logical I/O. The OS simply requests a logical block number and the hardware translates that into physical block location.
As disks move to solid state, physical disk I/O will disappear entirely.

Related

why mmap is faster than traditional file io [duplicate]

Possible Duplicate:
mmap() vs. reading blocks
I heard (read it on the internet somewhere) that mmap() is faster than sequential IO. Is this correct? If yes then why it is faster?
1) mmap() is not reading sequentially.
2) mmap() has to fetch from the disk itself, same as read() does.
3) The mapped area is not sequential - so no DMA (?).
So mmap() should actually be slower than read() from a file? Which of my assumptions above are wrong?
I heard (read it on the internet somewhere) that mmap() is faster than sequential IO. Is this correct? If yes then why it is faster?
It can be - there are pros and cons, listed below. When you really have reason to care, always benchmark both.
Quite apart from the actual IO efficiency, there are implications for the way the application code tracks when it needs to do the I/O, and does data processing/generation, that can sometimes impact performance quite dramatically.
1) mmap() is not reading sequentially.
2) mmap() has to fetch from the disk itself same as read() does
3) The mapped area is not sequential - so no DMA (?).
So mmap() should actually be slower than read() from a file? Which of my assumptions above are wrong?
Assumption 1) is wrong: mmap() assigns a region of virtual address space corresponding to the file's content. Whenever a page in that address space is accessed, physical RAM is found to back the virtual addresses and the corresponding disk content is faulted into that RAM. So the order in which reads are done from the disk matches the order of access - it's a "lazy" I/O mechanism.
If, for example, you needed to index into a huge hash table that was to be read from disk, then mmap()ing the file and starting to do accesses means the disk I/O is not done sequentially and may therefore result in a longer elapsed time until the entire file is read into memory. But while that's happening, lookups are succeeding and dependent work can be undertaken, and if parts of the file are never actually needed, they're never read (allow for the granularity of disk and memory pages; and note that even when using memory mapping, many OSes let you specify performance-enhancing / memory-efficiency hints about your planned access patterns, so they can proactively read ahead or release memory more aggressively knowing you're unlikely to return to it).
Assumption 2) is absolutely true.
Assumption 3), "the mapped area is not sequential", is vague. Memory-mapped regions are "contiguous" (sequential) in virtual address space. We've discussed disk I/O being sequential above. Or are you thinking of something else? Anyway, while pages are being faulted in, they may indeed be transferred using DMA.
Further, there are other reasons why memory mapping may outperform usual I/O:
there's less copying:
often OS- and library-level routines pass data through one or more buffers before it reaches an application-specified buffer; the application then dynamically allocates storage and copies from the I/O buffer to that storage so the data is usable after the file reading completes
memory mapping allows (but doesn't force) in-place usage (you can just record a pointer and possibly length)
continuing to access data in-place risks increased cache misses and/or swapping later: the file/memory-map could be more verbose than data structures into which it could be parsed, so access patterns on data therein could have more delays to fault in more memory pages
memory mapping can simplify the application's parsing job by letting the application treat the entire file content as accessible, rather than worrying about when to read another buffer full
the application defers more to the OS's wisdom re number of pages that are in physical RAM at any single point in time, effectively sharing a direct-access disk cache with the application
as well-wisher comments below, "using memory mapping you typically use less system calls"
if multiple processes are accessing the same file, they should be able to share the physical backing pages
There are also reasons why mmap may be slower - do read Linus Torvalds' post here, which says of mmap:
...page table games along with the fault (and even just TLB miss)
overhead is easily more than the cost of copying a page in a nice
streaming manner...
And from another of his posts:
quite noticeable setup and teardown costs. And I mean noticeable. It's things like following the page tables to unmap everything cleanly. It's the book-keeping for maintaining a list of all the mappings. It's the TLB flush needed after unmapping stuff.
page faulting is expensive. That's how the mapping gets populated, and it's quite slow.
Linux does have "hugepages" (so one TLB entry per 2MB, instead of per 4kb) and even Transparent Huge Pages, where the OS attempts to use them even if the application code wasn't written to explicitly utilise them.
FWIW, the last time this arose for me at work, memory mapped input was 80% faster than fread et al for reading binary database records into a proprietary database, on 64 bit Linux with ~170GB files.
mmap() can be shared between processes.
DMA will be used whenever possible. DMA does not require contiguous memory - many high-end cards support scatter-gather DMA.
The memory area may be shared with the kernel block cache if possible, so there is less copying.
Memory for mmap is allocated by the kernel; it is always aligned.
"Faster" in absolute terms doesn't exist. You'd have to specify constraints and circumstances.
mmap() is not reading sequentially.
What makes you think that? If you really access the mapped memory sequentially, the system will usually fetch the pages in that order.
mmap() has to fetch from the disk itself same as read() does
sure, but the OS determines the time and buffer size
The mapped area is not sequential - so no DMA (?).
see above
What mmap helps with is that there is no extra user space buffer involved, the "read" takes place there where the OS kernel sees fit and in chunks that can be optimized. This may be an advantage in speed, but first of all this is just an interface that is easier to use.
If you want to know about speed for a particular setup (hardware, OS, use pattern) you'd have to measure.

Giving read() a start position

When you give read a start position - does it slow down read()? Does it have to read everything before the position to find the text it's looking for?
In other words, we have two different read commands,
read(fd,1000,2000)
read(fd,50000,51000)
where we give it two arguments:
read(file descriptor, start, end)
is there a way to implement read so that the two commands take the same amount of computing time?
You don't name a specific file system implementation or one specific language library so I will comment in general.
In general, a file interface will be built directly on top of the OS-level file interface. In the OS-level interface for most types of drives, data can be read in sectors with random access. The drive can seek to the start of a particular sector (without reading data) and can then read that sector without reading any of the data before it in the file. Because data is typically read in chunks by sector, if the data you request doesn't perfectly align on a sector boundary, it's possible the OS will read the entire sector containing the first byte you requested; but that's not much extra data, and it won't make a meaningful difference in performance, since once the read/write head is positioned correctly, a sector is typically read in one DMA transfer.
Disk access times to read a given set of bytes for a spinning hard drive are not entirely predictable so it's not possible to design a function that will take exactly the same time no matter which bytes you're reading. This is because there's OS level caching, disk controller level caching and a difference in seek time for the read/write head depending upon what the read/write head was doing beforehand. If there are any other processes or services running on your system (which there always are) some of them may also be using the disk and contending for disk access too. In addition, depending upon how your files were written and how many bytes you're reading and how well your files are optimized, all the bytes you read may or may not be in one long readable sequence. It's possible the drive head may have to read some bytes, then seek to a new position on the disk and then read some more. All of that is not entirely predictable.
Oh, and some of this is different if it's a different type of drive (like an SSD) since there's no drive head to seek.
When you give read a start position - does it slow down read()?
No. The OS reads the directory entry to find out where the file is located on the disk, then calculates where on the disk your desired read should be, seeks to that position on the disk and starts reading.
Does it have to read everything before the position to find the text it's looking for?
No. Since it reads sectors at a time, it may read a few bytes before what you requested (whatever is before it in the sector), but sectors are not huge (typically 512 bytes to 4K) and are typically read in one fell swoop using DMA, so the extra part of the sector before your desired data is not likely noticeable.
Is there a way to implement read so that the two commands take the same amount of computing time?
So no, not really. Disk reads, even of identical number of bytes vary a bit depending upon the situation and what else might be happening on the computer and what else might be cached already by the OS or the drive itself.
If you share what problem you're really trying to solve, we could probably suggest alternate approaches rather than relying on a given disk read taking an exact amount of time.
Well, filesystems usually split the data in a file into even-sized blocks. In most file systems the allocated blocks are organized in trees with a high branching factor, so it takes effectively the same time to find the nth data block as the first data block of the file, computing-wise.
The only general exception to this rule is the brain-damaged floppy disk file system FAT from Microsoft that should have become extinct in 1980s, because in it the blocks of the file are organized in a singly-linked list so to find the nth block you need to scan through n items in the list. Of course decent operating systems then have all sorts of tricks to address the shortcomings here.
Then the next thing is that your reads should touch the same number of blocks or operating-system memory pages. Usually operating system pages are 4K nowadays and disk blocks something like 4K too, so making every count a multiple of 4096, 8192 or 16384 is better design than using round decimal numbers.
i.e.
read(fd, 4096, 8192)
read(fd, 50 * 4096, 51 * 4096)
While it does not affect the computing time in a multiprocessing system, the type of media affects a lot: in magnetic disks the heads need to move around to find the new read position, and the disk must have spun to be in the reading position whereas SSDs have identical random access timings regardless of where on disk the data is positioned. And additionally the operating system might cache frequently accessed locations or expect that the block that is read after N would be N + 1 and hence such order be faster. But most of the time you wouldn't care.
Finally: perhaps instead of read you should consider using memory mapped I/O for random accesses!
read() typically reads data from the given file descriptor into a buffer. With the signature in the question, the data read runs from start (arg 2) to end (arg 3); more generically, the number of bytes read is (end - start). So if you have the following reads
read(fd1, 0xffff, 0xffffffff)
and
read(fd2, 0xf, 0xff)
the second read will be quicker, because its end (0xff) minus its start (0xf) is less than the first read's end (0xffffffff) minus its start (0xffff). AKA fewer bytes are being read.

Fragmentation in modern file systems

I was tinkering with Pintos OS file system and wonder:
How do modern file systems handle fragmentation issue, including internal, external and data?
OK, so it's file fragmentation you are interested in.
The answer is that it depends entirely on the file system and the operating system. In the case of traditional Unix file systems, the disk is inherently fragmented. There is no concept whatsoever of contiguous files; files are stored in chains of data blocks. This is why paging is done to partitions and most database systems on Unix use partitions.
"Hard" file systems that allow contiguous files manage them in different ways. A file consists of one or more "extents." If the initial extent gets filled, the file system manager creates a new extent and chains to it. In some systems there are many options for file creation: one can specify the initial size of the file and reserve space for subsequent allocations (i.e., the size of the first extent and the size of additional extents).
When a hard file system gets fragmented, there are different approaches for dealing with it. In some systems, the normal way of "defragging" is to do an image back up to secondary storage then restore. This can be part of the normal system maintenance process.
Other system use "defragging" utilities that either run as part of the regular system schedule or are manually run.
The problem of disk fragmentation is often exaggerated. If you have a disk with a reasonable amount of space, you don't really tend to get much file fragmentation. Disk fragmentation—yes; but this is not really much of a problem if you have sufficient free disk space. File fragmentation occurs when (1) you don't have enough free contiguous disk space or (2) [most likely with reasonable disk space] you have a file that continually gets added data.
Most file systems indeed have ways to deal with fragmentation. I'll however describe the situations for the usual file systems that are not too complex.
For Ext2, for each file there are 12 direct block pointers that point to the blocks where the file is contained. If they are not enough, there is one singly indirect block that points to block_size / 4 blocks. If they are still not enough, there is a doubly indirect block that points to block_size / 4 singly indirect blocks. If not yet enough, there is a triply indirect block that points to block_size / 4 doubly indirect blocks. This way, the file system allows fragmentation at block boundaries.
For ISO 9660, which is the usual file system for CDs and DVDs, the file system doesn't support fragmentation as is. However, it's possible to use multiple consecutive directory records in order to split a big (more than 2G/4G, the maximum describable file size) file into describable files. This might cause fragmentation.
For FAT, the file allocation table describes the location and status of all data clusters on the disk in order to allow fragmentation. So when reading the next cluster, the driver looks up in the file allocation table to find the number of the next cluster.

Would reading a file sequentially result in random disk seeks?

I was under the impression that sequential scan of a file would actually be a sequential seek on disk. However, I read recently that the blocks of a file might not be written contiguously on disk by a file system. If inodes are used as a map and each block is obtained by following the block pointer, I am wondering whether the actual mechanism with which a file system retrieves the blocks of a file is actually sequential?
If the answer is file system dependent, it would be great to cite some major filesystems.
Thanks.
Filesystems try to allocate as many sequential blocks as possible during writes. But as they age (i.e., lots of creates + deletes over time), fragmentation becomes inevitable. There are heuristics to reduce fragmentation, like speculative preallocation, delayed allocation, etc. Applications themselves can do things like preallocation (for example fallocate), enabling readahead, and running de-fragmentation tools, depending on the features available in the filesystem, to make the blocks contiguous or at least make reads faster.

difference between logical and physical I/O?

I can't understand the difference between logical and physical I/O.
Can you explain the difference between them?
thanks
The terms logical, physical, and virtual I/O are normally applied to disks. However, there can be application to other types of devices.
In the disk context, logical I/O treats a disk as a sequence of blocks, numbered 0 to N.
Physical I/O requires addressing disk blocks by platter, track, sector, block.
In the past operating systems implemented the physical to logical translation. Newer disks tend to implement logical I/O in the hardware (and automatically handle bad blocks).
There is a big difference between them.
Logical I/O:
File system calls resolved by the file system itself, meaning they never reach the physical block device. For example, you read a file whose content is already in the page cache and buffer cache (all the necessary information - the cached inode plus blocks - is in memory).
Your app gets the content from the VFS + FS.
Another example: when you execute ls, the first time the VFS needs to get all the inode information from the physical block device; the second time, the information will be cached in the dentry cache and there is no need to go down to the physical device.
Physical I/O:
For example, a synchronous write will reach the physical block device. If the write is async, the blocks will be written to an OS buffer (a logical write) and later all the dirty pages will be written together to the block device (a physical write) to improve performance.
That is why it is very important to check how our FS is performing its I/O, to avoid unnecessary physical I/O. Depending on the FS and the kernel parameters, you can tune the caching to fit your needs.
