File systems provide a mechanism for categorizing (and thus navigating) data on a disk. This makes sense to me. If I want to find some "group" of data, I don't want to have to remember byte offsets myself. I would rather have some lookup system that I can dynamically navigate.
However, I don't understand why different file systems must exist. For example, why NTFS, FAT16/32, EXT?
Why should different operating systems (Linux, Windows, etc.) rely on different methods for organizing data on disk?
I think a more appropriate question (and the question you'd like answered) is "Why do multiple file systems exist?". The answer depends on the particular file system, but in many cases it comes down to one of (or a mix of) three reasons:
addressing some type of issue in existing file systems, or
a split due to a difference in opinion, or
corporate interests.
The FAT family
The original FAT file system was introduced in the late 1970s. In many ways, FAT is great: it has a low memory footprint and a simple design. IIRC, it's still used in embedded systems to this day.
The FAT family of file systems comprises the original 8-bit FAT, FAT12, FAT16, and FAT32. (There are several other versions, but they're not relevant to this answer.) There were several feature differences between each version of the FAT file systems, some of which demonstrate the motivation for creating a new version. For example, in moving from 8-bit FAT to FAT12:
the maximum filename length increased from 9 characters to 11 or 255 characters by switching from 6.3 filename encoding to 8.3 filename encoding or LFN extensions, respectively.
support for subdirectories was added.
file size granularity decreased from 128 bytes to 1 byte.
None of these features individually were likely the motivation for the creation of FAT12, but together these features are a clear win over 8-bit FAT. Refer to the FAT Wikipedia page for a more complete list of differences.
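To make the on-disk simplicity concrete, here is the classic 32-byte FAT directory entry as a C struct. This is a sketch based on the commonly documented FAT layout; earlier FAT versions left some of these fields reserved.

```c
#include <stdint.h>

/* Classic 32-byte FAT directory entry (8.3 names), per the commonly
   documented FAT on-disk format. A sketch, not a complete driver. */
#pragma pack(push, 1)
struct fat_dirent {
    uint8_t  name[11];       /* 8 name chars + 3 extension chars, space-padded */
    uint8_t  attr;           /* 0x10 = subdirectory, 0x01 = read-only, ... */
    uint8_t  reserved;
    uint8_t  ctime_tenths;   /* creation time, tenths of a second */
    uint16_t ctime;          /* creation time */
    uint16_t cdate;          /* creation date */
    uint16_t adate;          /* last access date */
    uint16_t cluster_hi;     /* high 16 bits of first cluster (FAT32 only) */
    uint16_t mtime;          /* last modification time */
    uint16_t mdate;          /* last modification date */
    uint16_t cluster_lo;     /* low 16 bits of first cluster */
    uint32_t size;           /* file size in bytes (1-byte granularity) */
};
#pragma pack(pop)
```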
NTFS
Before discussing NTFS, we should look at its predecessor: HPFS. The simple design of FAT turned out to be a problem: it constrained what features FAT could offer, and how it performed. HPFS was created to address the shortcomings of FAT. For example, HPFS provided several features FAT could not:
Support for mixed case file names, in different code pages
More efficient use of disk space (files are not stored using multiple-sector clusters but on a per-sector basis)
An internal architecture that keeps related items close to each other on the disk volume
Separate datestamps for last modification, last access, and creation (as opposed to last-modification-only datestamp in then-times implementations of FAT)
Root directory located at the midpoint, rather than at the beginning of the disk, for faster average access
That should be compelling enough to demonstrate why HPFS was created, but how does NTFS fit into the picture? HPFS was a joint project by Microsoft and IBM. Due to several differences in opinion, they separated, and Microsoft created NTFS. This is another reason new file systems are created: difference in opinion. There's nothing inherently wrong with this, but it does have the side effect of occasionally fragmenting projects.
The extended family
As with NTFS, we need to examine the predecessor of ext to understand why it was created. The predecessor of ext is the MINIX file system. MINIX was created for teaching purposes, so it was simple and elided several complex features the UNIX file system offered. The first file system supported by Linux was the MINIX filesystem. The simplicity of the MINIX file system soon became an issue:
MINIX restricted filename lengths to 14 characters (30 in later versions), it limited partitions to 64 megabytes, and the file system was designed for teaching purposes, not performance.
And thus, the extended file system (i.e. ext) was created to address the shortcomings of the MINIX file system.
In a similar vein, ext2 was created to address the shortcomings of ext, and so on. For example, ext2 added three separate timestamps (atime, ctime, and mtime), ext3 added journaling, and ext4 extended storage limits. These were all breaking changes which required a "new" file system. They weren't the only changes between versions, but these changes demonstrate why creating another file system was necessary.
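Those three timestamps are exactly what stat(2) reports on any POSIX system today. A minimal sketch:

```c
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>

int main(int argc, char **argv)
{
    struct stat st;
    if (argc < 2) {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 1;
    }
    if (stat(argv[1], &st) != 0) {
        perror("stat");
        return 1;
    }
    /* The three timestamps ext2 added on-disk support for: */
    printf("atime: %s", ctime(&st.st_atime)); /* last access */
    printf("mtime: %s", ctime(&st.st_mtime)); /* last modification */
    printf("ctime: %s", ctime(&st.st_ctime)); /* last inode change */
    return 0;
}
```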
Why do different operating systems use different file systems?
Several file systems are widely used today: Apple File System (APFS) on Apple devices, NTFS on Windows devices, and several different file systems on Linux. Why do different operating systems use different file systems? For Linux, the reason is obvious: Linux needed an open source file system. That's why it initially used the MINIX file system.
For Windows and Apple devices, the difference is more, shall we say, political. Microsoft created NTFS to address the issues it thought were important, and Apple created APFS to address the issues it thought were important. Commercial OS vendors also create their own file systems for product differentiation.
Why does Linux use several different file systems?
We can kinda see why different OSs use different file systems, but several file systems are actively in use on Linux alone, e.g. ext4, Btrfs, ZFS, XFS, and F2FS. What gives?
Linux is a different environment. The Linux kernel source is openly available, and can be modified, booted, and tested by any user. So, if one file system does not support the features you want, or offer the performance you need, you can create a new file system (which is, of course, easier said than done). For example,
Btrfs addressed (among other things) the lack of snapshots on ext3/4.
ZFS was created for the Solaris operating system, but later ported to Linux. (ZFS also has a very rich set of features.)
XFS was created to improve performance by using different underlying data structures (i.e. B-trees).
F2FS was created to address performance on solid state media. SSDs offer lower latency and greater throughput compared to spinning disks. It turns out simply using a faster disk does not necessarily equate to better file system performance.
Different OSes use different file systems because each of them has a different philosophy and different goals.
For example, Windows uses NTFS because Microsoft wanted a secure and smart FS (rather than one whose philosophy is being fast or small).
Ubuntu (along with most modern distributions) uses ext4 (and also supports others), mostly because of its simplicity and speed.
I don't think it's anything technical. Different companies simply worked on the same thing at the same time, and the closed-source nature of some OSes (like Windows and macOS) makes it hard for other companies to replicate the full functionality, and illegal to reverse engineer it. It's the same question as why different OSes exist in the first place.
Related
I've read about file systems on Wikipedia and linfo.org, and a question on Super User about "is the filesystem a part of the operating system", and I doubt my understanding.
Wikipedia says: "ext is filesystem that is commonly used by linux kernel".
SU's answer says: "the OS contains a driver that allows it to work with filesystem"
Now what is the form of ext itself? Is it a driver, used by Linux to organize data on disk?
Unfortunately, what "filesystem" exactly means depends on the context.
Most commonly, "filesystem" describes the on-disk format of volumes/partitions. In that sense, APFS, ext, FAT32, and XFS are some of the different filesystems. For example, you may hear people say something like "APFS supports alternate data streams, XFS doesn't".
Many times the term "filesystem" is used to describe the ecosystem: the on-disk format of volumes/partitions, plus the OS drivers that read and write these formats. For example, you may hear people say something like "ext2 is not crash-safe, upgrade your partitions to ext3" (see below).
Sometimes, the term "filesystem" is used to describe the driver that reads and writes the on-disk format. For example, you may hear people say something like "ZFS doesn't work on Windows, but NFS works everywhere".
NFS, in particular, has a driver, but doesn't have an on-disk format, because it just asks a remote server to store everything on its own filesystem, whichever that is.
The distinction between on-disk format and the driver which reads and writes it is particularly confusing for ext2/3/4. The ext family of filesystem drivers shares a common on-disk format - if you dig up an ext* partition, nowhere on the partition will it say "my version is ext3". Instead, the partition will have a list of features.
What does have different versions is the driver - there are drivers for ext2, ext3, and ext4, and each version adds support for new features. So, you can create a partition using the ext3 driver, and the partition will use features supported by the ext3 driver - both the ext3 and ext4 drivers can read it. Then you upgrade it using the ext4 driver in order to use a new on-disk feature (like high-resolution timestamps), but then you have to read it using the ext4 driver - the ext3 driver can no longer read it because it doesn't support the new feature.
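You can inspect that feature list yourself. Below is a minimal sketch that reads the three feature bitmaps out of an ext* superblock; the byte offsets are taken from the ext2 documentation (superblock at byte 1024, magic 0xEF53 at offset 56 within it, feature bitmaps at offsets 92/96/100), so verify them against your headers before relying on them:

```c
#include <stdio.h>
#include <stdint.h>

/* Reads the three ext* feature bitmaps straight from a partition image.
   Offsets per the ext2 documentation; verify before relying on them. */
int main(int argc, char **argv)
{
    if (argc < 2) { fprintf(stderr, "usage: %s <image>\n", argv[0]); return 1; }
    FILE *f = fopen(argv[1], "rb");
    if (!f) { perror("fopen"); return 1; }

    uint8_t sb[1024];
    if (fseek(f, 1024, SEEK_SET) != 0 || fread(sb, 1, sizeof sb, f) != sizeof sb) {
        fprintf(stderr, "short read\n");
        fclose(f);
        return 1;
    }
    fclose(f);

    uint16_t magic = sb[56] | (sb[57] << 8);
    if (magic != 0xEF53) { fprintf(stderr, "not an ext* filesystem\n"); return 1; }

    uint32_t compat    = sb[92]  | (sb[93]<<8)  | (sb[94]<<16)  | ((uint32_t)sb[95]<<24);
    uint32_t incompat  = sb[96]  | (sb[97]<<8)  | (sb[98]<<16)  | ((uint32_t)sb[99]<<24);
    uint32_t ro_compat = sb[100] | (sb[101]<<8) | (sb[102]<<16) | ((uint32_t)sb[103]<<24);

    /* e.g. incompat bit 0x40 is the extents feature characteristic of "ext4" volumes */
    printf("compat=%#x incompat=%#x ro_compat=%#x\n", compat, incompat, ro_compat);
    return 0;
}
```

On a live system, dumpe2fs -h prints these same bitmaps decoded into feature names.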
Occasionally you may also see "filesystem" referring to a specific volume/partition that has files and directories. For example, you may hear people say something like "My filesystem is corrupted".
I have developed a basic kernel in assembly/C that runs a basic terminal. I have set it up to run off of an ISO with GRUB.
I would like to continue this OS, but without a file system, I feel as if there's really nothing else I could do. After much time on the internet, I have found nothing on how to implement one.
People have said to implement FAT or make a VFS, but nothing any further: no tutorials, nor any references to anywhere.
Could someone explain how a file system works, where I can get started/where I can connect a pre-made system, and how to use it?
Also, I do not have access to standard libraries when compiling my OS. I use gcc, nasm, ld, and grub-mkrescue (for the disk image). I use QEMU for emulation.
EDIT to make less OT
Can someone describe, in detail how a file system works, so when I look at the sources of file systems that have been implemented, like FAT, I can understand how to apply it to my own operating system?
EDIT - Simpler
Even easier: how could I directly access the hard drive? My kernel runs completely in protected mode, so could I switch out and write directly to the hard drive? A file system could be implemented with a file looking like this:
name special char text special char
ie:
hello world.script 0x00 println "Hello, world!!" 0x00
You wouldn't need special segmentation; you would just scan until you find the file name and the special character (something not found in a string, like '\0'), and then read until you find the second non-string character.
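A minimal sketch of that scan, assuming the whole image has already been loaded into memory and is well formed (every name and every body NUL-terminated):

```c
#include <stddef.h>
#include <string.h>

/* Scans an image laid out as: name '\0' contents '\0' name '\0' contents '\0' ...
   Returns a pointer to the contents of the named file, or NULL. */
const char *find_file(const char *image, size_t image_len, const char *name)
{
    const char *p = image;
    const char *end = image + image_len;
    while (p < end) {
        const char *entry_name = p;
        p += strlen(p) + 1;              /* skip name and its terminator */
        if (p >= end) break;
        const char *contents = p;
        p += strlen(p) + 1;              /* skip contents and its terminator */
        if (strcmp(entry_name, name) == 0)
            return contents;
    }
    return NULL;
}
```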
Would there be a way to access the hard drive by switching in and out of protected mode, or by writing a hard disk driver, in order to implement this?
First, read the Wikipedia page on file systems to get some broad view.
The relevant resource about operating system development is OSdev (but perhaps your question is off-topic here). Kernelnewbies could also help (explaining how Linux does it). OSdev has wiki pages explaining FAT & Ext2 in detail.
You could design an OS without any files (but some other persistence machinery). See this answer. You could have persistent processes (read also about application checkpointing, garbage collection, continuations, hibernation).
But you should read some good books about Operating Systems (e.g. by Tanenbaum, or the freely downloadable Operating Systems: Three Easy Pieces book). Be fluent with some existing free software OS, e.g. Linux (& POSIX), so read Advanced Linux Programming (at least to understand many concepts and get a good terminology).
IMHO, FAT is such an ugly and inefficient file system that it is not worth looking into (except for legacy and compatibility reasons). Ext4 (see here) should be better, & the wikipage on Ext2 has a nice picture.
You could adapt some library providing a file system (e.g. libext2) to your kernel.
You could perhaps adapt sqlite to work on a raw disk partition.
You might have a notion of file which is not like MSDOS (or Windows) or POSIX or <stdio.h> files. For example, it might be a sequence of fixed size records (e.g. of 1Kbyte), not a stream of bytes.
You could organize your OS as a microkernel and have file systems given by application code. Look into VSTa and HURD.
You need of course a disk driver, which fetches/writes blocks (of 4Kbytes) from your drive (disk I/O is always by blocks or disk sectors; old small disks had 512-byte blocks, new large disks have 4-Kbyte ones, see advanced format). It should be interrupt driven and use DMA. You need a task scheduler. AFAIU, you won't use the BIOS for this (perhaps the UEFI); you need to understand how common hardware (SATA & AHCI) works.
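Before you have interrupts and DMA working, the usual first step in hobby kernels is polled ATA PIO on the legacy ports. A sketch, assuming your kernel already provides the usual x86 port-I/O wrappers (inb/outb/inw, named here as assumptions):

```c
#include <stdint.h>

/* Polled ATA PIO read of one 512-byte sector: the simplest possible
   starting point. A real driver should be interrupt driven, use DMA,
   and check the error bits, which this sketch omits. */
extern uint8_t  inb(uint16_t port);   /* your kernel's port-I/O wrappers */
extern void     outb(uint16_t port, uint8_t val);
extern uint16_t inw(uint16_t port);

void ata_pio_read_sector(uint32_t lba, uint16_t *buf)
{
    while (inb(0x1F7) & 0x80)                  /* wait while BSY is set */
        ;
    outb(0x1F6, 0xE0 | ((lba >> 24) & 0x0F));  /* master drive, LBA mode */
    outb(0x1F2, 1);                            /* sector count = 1 */
    outb(0x1F3, lba & 0xFF);                   /* LBA bits 0-7 */
    outb(0x1F4, (lba >> 8) & 0xFF);            /* LBA bits 8-15 */
    outb(0x1F5, (lba >> 16) & 0xFF);           /* LBA bits 16-23 */
    outb(0x1F7, 0x20);                         /* READ SECTORS command */

    while (!(inb(0x1F7) & 0x08))               /* wait for DRQ */
        ;
    for (int i = 0; i < 256; i++)              /* 256 words = 512 bytes */
        buf[i] = inw(0x1F0);
}
```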
You should publish (today!) your toy OS as free software (e.g. under GPLv3+ on GitHub) to get feedback and contributions.
You might copy (if licenses are compatible) existing code from other free software operating systems, and you certainly will study their source code to understand things.
So code some task scheduler, a page fault handler, a virtual memory subsystem, then add interrupt-driven disk IO, and some file system code above that. Then you'll begin to understand that an OS cannot be a small toy.... You might consider a microkernel or exokernel approach.
It would be simplest to use an existing open-source filesystem if the licence terms suit your needs. ELM FatFs is one such library, with no usage restrictions whatsoever. You only need to provide the device control interface layer using the provided stubs and examples.
I want to get information about the battery in C on Linux. I don't want to read or parse any file! Is there any low-level interface to ACPI/the kernel or any other module to get the information I want?
I have already searched the web, but every question results in the answer "parse /proc/foo/bar". I really don't want to do this because I think low-level interfaces won't change as fast as files do.
The /proc filesystem does not exist on a disk. Instead, the kernel creates it in memory. Its files are generated on demand by the kernel when accessed. As such, your concerns are invalid: the /proc files will change as quickly as the kernel becomes aware of changes.
Check this for more info about the /proc file system.
In any case, I don't believe there's any alternative interface.
You might be looking for UPower: http://upower.freedesktop.org/
This is a common need for both desktop environments and mobile devices, so there have been many solutions over time. For example, one of the oldest ones was acpid, which is pretty much obsolete now.
While I'd recommend using a light-weight abstraction like UPower for code clarity reasons, the files in /proc and (to some extent) /sys are considered part of the Linux kernel ABI, which means that changing them is generally frowned upon.
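Given that stability guarantee, the pragmatic low-level option is to read /sys/class/power_supply directly. A minimal sketch in C; the "BAT0" name varies between machines, so treat it as an assumption:

```c
#include <stdio.h>

/* Reads the battery charge percentage from sysfs. "BAT0" is an
   assumption: enumerate /sys/class/power_supply/ to find yours. */
int main(void)
{
    FILE *f = fopen("/sys/class/power_supply/BAT0/capacity", "r");
    if (!f) { perror("fopen"); return 1; }
    int percent;
    if (fscanf(f, "%d", &percent) == 1)
        printf("battery: %d%%\n", percent);
    fclose(f);
    return 0;
}
```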
Most games come with their resources (models, textures, etc.) packed into special files (like .pk3 files in Quake 3 for example). Apparently, those files somehow get "mounted" and are used as if they were separate file systems.
I'd like to know how this is achieved. The only strategy I came up with so far is placing offset-size information in the file's header, then memory-mapping the file and accessing the resources as if they were independent write-protected chunks of memory.
I'd like to know if my strategy is viable and if there are better alternatives.
Thanks!
Your strategy is quite reasonable; in fact, it's an exact analogue of what a filesystem driver does with a raw block device. What problems are you having with that implementation?
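For concreteness, here is a minimal sketch of that offset-size table idea; the layout, magic, and field names are invented for illustration, not any real game's format:

```c
#include <stdint.h>
#include <string.h>

/* A toy pack-file layout: a header, then a table of entries, then raw data.
   All names and fields here are illustrative, not a real game's format. */
struct pack_entry {
    char     name[56];   /* resource name, NUL-terminated */
    uint32_t offset;     /* byte offset of the data within the pack file */
    uint32_t size;       /* length of the data in bytes */
};

struct pack_header {
    char     magic[4];   /* e.g. "PACK" */
    uint32_t count;      /* number of entries that follow */
};

/* Given a memory-mapped pack file, return a read-only view of a resource. */
const void *pack_find(const uint8_t *mapped, const char *name, uint32_t *size_out)
{
    const struct pack_header *hdr = (const struct pack_header *)mapped;
    const struct pack_entry *tab =
        (const struct pack_entry *)(mapped + sizeof *hdr);
    for (uint32_t i = 0; i < hdr->count; i++) {
        if (strcmp(tab[i].name, name) == 0) {
            *size_out = tab[i].size;
            return mapped + tab[i].offset; /* write-protected chunk of the map */
        }
    }
    return NULL;
}
```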
Your approach sounds reasonable. Basically you'll have an ISAM file with (optional) metadata. You can split the file into sections ("directories") based on criteria (content type, frequency of use, locality of use, etc) and have a separate index/table of contents for each section. If you allow a section as a content type, then you can nest them and handle them in a consistent fashion.
If your requirements are fairly basic, you could consider simply using a zip/tar file as a container. Looking at the code for these would probably be a good place to start, regardless.
I can't speak to the exact format of Quake 3, but there are several approaches to do what you need:
archive files (ZIP, Tar, etc)
compound storage of different kinds (on Windows, there's Microsoft Structured Storage)
embedded single-file database
a virtual file system, which implements an actual file system but stores it not on a disk partition but somewhere else (in resources, in memory, etc.).
Each of those approaches has its own strengths and weaknesses.
Our SolFS product is an example of virtual file system mentioned above. It was designed for tasks similar to yours.
Archives and some compound file implementations usually have linear sequential structure - the files are written one by one with some directory at the beginning or at the end of file.
Some compound file implementations, databases, and virtual file systems have a page-based structure ("page" is similar to a sector or cluster in FAT or NTFS) where files can be scattered across the storage.
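As an example of the linear layout, tar simply puts a 512-byte header block in front of each member's data. A sketch of the POSIX ustar header, for reading only:

```c
#include <stdlib.h>

/* POSIX ustar header: one 512-byte block before each member's data.
   Numeric fields are ASCII octal, so the archive is easy to walk
   sequentially; there is no central directory at all. */
struct ustar_header {
    char name[100];
    char mode[8];
    char uid[8];
    char gid[8];
    char size[12];     /* member size in octal ASCII */
    char mtime[12];
    char chksum[8];
    char typeflag;
    char linkname[100];
    char magic[6];     /* "ustar" */
    char version[2];
    char uname[32];
    char gname[32];
    char devmajor[8];
    char devminor[8];
    char prefix[155];
    char pad[12];      /* fill to 512 bytes */
};

/* Member data follows the header, padded to a multiple of 512 bytes. */
unsigned long ustar_member_size(const struct ustar_header *h)
{
    return strtoul(h->size, NULL, 8);
}
```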
I am wondering how the OS is reading/writing to the hard drive.
I would like as an exercise to implement a simple filesystem with no directories that can read and write files.
Where do I start?
Will C/C++ do the trick, or do I have to go with a more low-level approach?
Is it too much for one person to handle?
Take a look at FUSE: http://fuse.sourceforge.net/
This will allow you to write a filesystem without having to actually write a device driver. From there, I'd start with a single file. Basically create a file that's (for example) 100MB in length, then write your routines to read and write from that file.
Once you're happy with the results, then you can look into writing a device driver, and making your driver run against a physical disk.
The nice thing is you can use almost any language with FUSE, not just C/C++.
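To show how little is needed to get started, here is a minimal read-only FUSE filesystem in C, written against the libfuse 2.x API. It is a sketch that serves one fixed file from memory instead of a 100MB backing file, but getattr/readdir/read are exactly the hooks where your own on-disk format would plug in:

```c
#define FUSE_USE_VERSION 26
#include <fuse.h>
#include <string.h>
#include <errno.h>
#include <sys/stat.h>

/* A one-file read-only filesystem: the mount point contains just /hello. */
static const char *hello_path = "/hello";
static const char *hello_data = "Hello, FUSE!\n";

static int fs_getattr(const char *path, struct stat *st)
{
    memset(st, 0, sizeof *st);
    if (strcmp(path, "/") == 0) {
        st->st_mode = S_IFDIR | 0755;
        st->st_nlink = 2;
    } else if (strcmp(path, hello_path) == 0) {
        st->st_mode = S_IFREG | 0444;
        st->st_nlink = 1;
        st->st_size = strlen(hello_data);
    } else
        return -ENOENT;
    return 0;
}

static int fs_readdir(const char *path, void *buf, fuse_fill_dir_t fill,
                      off_t off, struct fuse_file_info *fi)
{
    if (strcmp(path, "/") != 0)
        return -ENOENT;
    fill(buf, ".", NULL, 0);
    fill(buf, "..", NULL, 0);
    fill(buf, hello_path + 1, NULL, 0);   /* name without leading slash */
    return 0;
}

static int fs_read(const char *path, char *buf, size_t size, off_t off,
                   struct fuse_file_info *fi)
{
    if (strcmp(path, hello_path) != 0)
        return -ENOENT;
    size_t len = strlen(hello_data);
    if ((size_t)off >= len)
        return 0;
    if (off + size > len)
        size = len - off;
    memcpy(buf, hello_data + off, size);
    return (int)size;
}

static struct fuse_operations fs_ops = {
    .getattr = fs_getattr,
    .readdir = fs_readdir,
    .read    = fs_read,
};

int main(int argc, char *argv[])
{
    return fuse_main(argc, argv, &fs_ops, NULL);
}
```

Compile it with something like gcc hello.c $(pkg-config fuse --cflags --libs) and run it with a mount point as the argument.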
I found it quite easy to understand a simple filesystem while using the FAT filesystem on an AVR microcontroller.
http://elm-chan.org/fsw/ff/00index_e.html
Take a look at the code and you will figure out how FAT works.
For learning the ideas of a file system, it's not really necessary to use a disk, I think. Just create an array of 512-byte byte-arrays. Imagine this as your hard disk and start to experiment a bit.
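That idea in code is tiny; a sketch of a RAM "disk" of 512-byte blocks:

```c
#include <string.h>

/* A toy in-memory "disk": 1024 blocks of 512 bytes, no hardware needed. */
#define BLOCK_SIZE 512
#define NUM_BLOCKS 1024

static unsigned char disk[NUM_BLOCKS][BLOCK_SIZE];

int read_block(unsigned n, void *buf)
{
    if (n >= NUM_BLOCKS) return -1;
    memcpy(buf, disk[n], BLOCK_SIZE);
    return 0;
}

int write_block(unsigned n, const void *buf)
{
    if (n >= NUM_BLOCKS) return -1;
    memcpy(disk[n], buf, BLOCK_SIZE);
    return 0;
}
```

Build your directory and allocation structures on top of read_block/write_block; swapping in a real disk driver later only replaces these two functions.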
Also you may want to have a look at some of the standard OS textbooks like http://codex.cs.yale.edu/avi/os-book/OS8/os8c/index.html
The answer to your first question is that besides FUSE, as someone else told you, you can also use Dokan, which does the same for Windows; from there it is just a question of doing reads and writes to a physical partition (http://msdn.microsoft.com/en-us/library/aa363858%28v=vs.85%29.aspx; read particularly the section on Physical Disks and Volumes).
Of course, in Linux or Unix, besides using something like FUSE, you only have to issue a read or write call to the wanted device in /dev/xxx (if you are root), and in these terms the Unices are more friendly or more insecure, depending on your point of view.
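For example, on Linux, reading the first sector of a raw disk is just an open and a read. A sketch; "/dev/sdb" is an assumption, so point it at a disk you can safely touch, and run it as root:

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Reads the first 512-byte sector of a raw device. Run as root;
   "/dev/sdb" is an assumption, substitute a disk you can touch. */
int main(void)
{
    int fd = open("/dev/sdb", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    unsigned char sector[512];
    if (read(fd, sector, sizeof sector) != (ssize_t)sizeof sector) {
        perror("read");
        close(fd);
        return 1;
    }
    close(fd);

    /* An MBR-partitioned disk ends its first sector with 0x55 0xAA. */
    printf("last two bytes: %02x %02x\n", sector[510], sector[511]);
    return 0;
}
```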
From there, try to implement a simple filesystem like FAT, or something more exotic like a tar filesystem, or even some simple filesystem based on Unix concepts like UFS or Minix, or just something that only logs the calls that are made and their arguments to a log file (this will help you understand the calls that are made to the filesystem driver during regular use of your computer).
Now your second question (which is much simpler to answer): yes, C/C++ will do the trick, since they are the lingua franca of system development; also, a lot of your example code will be in C/C++, so you will at least read C/C++ in your development.
Now for your third question: yes, this is doable by one person. For example, the ext filesystem (widely known in the Linux world through its successors ext2 and ext3) was made by a single developer, Rémy Card, so don't think that these things aren't doable by a single person.
Now the final notes: remember that a real filesystem interacts with a lot of other subsystems in a regular kernel. For example, if you have a laptop and hibernate it, the filesystem has to flush all changes made to the open files. If you have a pagefile on the partition (or even if the pagefile has its own filesystem), that will affect your filesystem, particularly the block sizes, since they will tend to be equal to, or powers of, the page size, because it's easy to place a filesystem block in memory when it happens to equal the page size (that's just one transfer).
And also security, since you will want to control the users and what files they read/write; that usually means that before opening a file, you will have to know which user is logged on and what permissions they have for that file. And obviously, without a filesystem, users can't run any program or interact with the machine. Modern filesystem layers also interact with the network subsystem, due to the fact that there are network and distributed filesystems.
So if you want to go and learn about doing kernel filesystems, those are some of the things you will have to worry about (besides knowing a VFS interface).
P.S.: If you want to make Unix permissions work on Windows, you can use something like what MS uses for NFS on the server versions of Windows (http://support.microsoft.com/kb/262965).