FAT System Identification of free space and structure of entry files?

I've been searching Google for a good explanation of how FAT systems identify free space and of the structure of FAT entry files.
A lot of the explanations I've found are quite hard to follow. Can anyone briefly sum these up?
I understand that clusters are marked as unused, but is this within the root directory or the data region? And is the information on cluster status just marked in a table?
I haven't managed to learn anything about the structure of the entry files either, just that they use chains to keep the clusters together.
Can anyone help?

A file system can be thought of as having three (3) types of data: file data, file meta-data and file system meta-data. File data is file or directory contents. File meta-data is that which tells us where the file data is stored on the disk. File system meta-data tells us how the file system allocates the blocks used in the file system.
The FAT file system however does not keep the lines so clear cut. Its disk structures often blur these distinctions.
The File Allocation Table (FAT) itself blurs the lines between file meta-data and file system meta-data. That is, each FAT entry identifies the cluster number where the next cluster of file (or directory) data can be found, and it also tells the file system whether the cluster identified by the index into the FAT is available (an entry of zero means the cluster is free). So yes, the status of every cluster is simply recorded in this table, not in the root directory or the data region. As you indicated in your question, the entries form a chain. A special end-of-chain marker (0x0FFFFFF8 or above on FAT32, with the narrower FAT12 and FAT16 entries using the analogous top values) indicates that the cluster identified by the index into the FAT is the last cluster in the chain.
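To make the chain idea concrete, here is a minimal Python sketch, assuming a FAT32 volume whose FAT has already been read into memory as raw bytes. The function names (read_fat32, cluster_chain, free_clusters) are made up for illustration, and the toy table in the demo is not from a real disk.

```python
import struct

FAT32_MASK = 0x0FFFFFFF     # the top 4 bits of a FAT32 entry are reserved
FAT32_FREE = 0x00000000     # entry value for an unallocated cluster
FAT32_BAD = 0x0FFFFFF7      # entry value for a cluster marked bad
FAT32_EOC_MIN = 0x0FFFFFF8  # this value and above mean "last cluster in chain"


def read_fat32(raw):
    """Decode a raw FAT (bytes) into a list of 28-bit entry values."""
    count = len(raw) // 4
    values = struct.unpack("<%dI" % count, raw[:count * 4])
    return [v & FAT32_MASK for v in values]


def cluster_chain(fat, first_cluster):
    """Follow the chain from first_cluster and return every cluster in it."""
    chain = []
    cluster = first_cluster
    # Valid data clusters start at 2; bad and end-of-chain values stop the walk.
    while 2 <= cluster < len(fat) and cluster < FAT32_BAD:
        chain.append(cluster)
        if len(chain) > len(fat):
            raise ValueError("loop detected in FAT chain")
        cluster = fat[cluster]
    return chain


def free_clusters(fat):
    """Entries 0 and 1 are reserved; every other zero entry is a free cluster."""
    return [i for i in range(2, len(fat)) if fat[i] == FAT32_FREE]


if __name__ == "__main__":
    # Toy FAT: a file occupies clusters 2 -> 3 -> 5; clusters 4, 6 and 7 are free.
    raw = struct.pack("<8I", 0x0FFFFFF8, 0x0FFFFFFF, 3, 5, 0, 0x0FFFFFFF, 0, 0)
    fat = read_fat32(raw)
    print(cluster_chain(fat, 2))  # [2, 3, 5]
    print(free_clusters(fat))     # [4, 6, 7]
```

The same walk works for directories, since a directory's contents are stored in a cluster chain just like a file's.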
Directory entries in a FAT based file system are both file data and file meta-data. They read like files with their entries being the "file data". However, their entries are also interpreted as file meta-data, for they contain the file attributes (permissions, file size, and the starting cluster number--which is an index into the FAT).
The root directory is a special directory on a FAT file system. If memory serves, it has neither a "." nor a ".." entry. On FAT12 and FAT16 systems, the size of the root directory is specified when the disk is formatted and is thus fixed; it occupies its own region between the FATs and the data region, so it has no cluster chain in the FAT. On FAT32, the root directory size is not set at format time and can grow, because it is stored as an ordinary cluster chain. Its starting cluster is stored in a field of the boot sector's BIOS Parameter Block (BPB_RootClus in Microsoft's specification).
Hope this helps.

Here is a fairly long article that has lots of information about FAT file systems.
It should provide all the details you need.
http://en.wikipedia.org/wiki/File_Allocation_Table

Related

What is the real size of a file?

How is it possible that a text file has a size equal to the number of characters inside it? For example, if file.txt contains the string "abc", its size is 3 bytes. That is fine, but what about the file icon, the filename and other file information? Where is this data stored?
I checked it on Windows, but on Unix systems the situation is probably the same.
When a file is written to disk, it is by means of a low-level system call like write(), and the operating system knows exactly how many bytes it writes to a given file on disk. This information, as well as several other items (creation and modification dates, ownership, etc.), is written along with the file.
In Linux (and Unix generally), this is done by means of an inode that fully describes the file. The information stored in an inode is:
* access mode
* ids of user and group that owns the file
* size in bytes
* timestamps for last access, modification and inode change (some file systems also record creation time)
* list of disk blocks containing file data
This is more or less the information displayed by ls -l.
You can also see the inode number of each file with ls -i.
You can find here additional details on inodes.
Other information is stored differently. Names, for instance, exist only in the special files that describe a directory, not in the inode. A directory is in effect a list that associates each name with an inode.
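The split between inode metadata and directory names can be inspected from Python with the standard os module. This is only a sketch: the helper names describe and directory_listing are invented for illustration, and the exact values depend on your system.

```python
import os
import tempfile


def describe(path):
    """Return some of the inode metadata that os.stat exposes for path."""
    st = os.stat(path)
    return {
        "inode": st.st_ino,
        "mode": oct(st.st_mode),
        "uid": st.st_uid,
        "gid": st.st_gid,
        "size": st.st_size,    # length of the contents in bytes
        "mtime": st.st_mtime,
        "links": st.st_nlink,  # how many directory entries point at this inode
    }


def directory_listing(dirpath):
    """A directory maps names to inode numbers; the name is not in the inode."""
    with os.scandir(dirpath) as entries:
        return {entry.name: entry.inode() for entry in entries}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "file.txt")
        with open(path, "w") as f:
            f.write("abc")
        print(describe(path))          # "size" is 3: only the contents are counted
        print(directory_listing(tmp))  # maps 'file.txt' to its inode number
```

Note that the filename appears only in the directory listing, which is why the reported size of file.txt can be exactly 3 bytes.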
Icons are generally defined system wide and the association of an icon with a file is done with either filename (and file extension) or with a file "type" that is written in the "inode" (or its equivalent in other OS).
Disks allocate space in blocks. Blocks historically were 512 bytes, but that has increased over the years so that 4K is common. The space a file occupies on disk will always be a multiple of the block size.
Most file systems (and Windows does this) allocate disk space in clusters. A cluster is a number of adjacent blocks. The space a file occupies will then always be a multiple of the block size times the cluster factor. This is usually what the operating system reports as the file's size on disk, as distinct from its length in bytes.
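The rounding described above is simple arithmetic. A small sketch, assuming 512-byte blocks and 8 blocks per cluster as default values (both are illustrative, not taken from any particular disk):

```python
def allocated_size(file_size, block_size=512, blocks_per_cluster=8):
    """Space reserved on disk for file_size bytes, rounded up to whole clusters."""
    cluster_size = block_size * blocks_per_cluster
    clusters = -(-file_size // cluster_size)  # ceiling division
    return clusters * cluster_size


if __name__ == "__main__":
    print(allocated_size(3))     # 4096: a 3-byte file still takes one full cluster
    print(allocated_size(4097))  # 8192: one byte over a cluster costs a second one
```

Some file systems store very small files inside their metadata structures instead, so this calculation is an upper-level approximation rather than a rule for every format.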
This all depends upon the disk format and the operating system:
"That is fine, but what about the file icon, the filename and other file information? Where is this data stored?"
The file information (date, owner, etc.) is usually kept in some kind of master file table. Sometimes this table will have extensions where additional information can be stored. Security information is often stored in such overflows.
A rationally designed file system will have a filename stored in the file header. File names are also stored in directories, and a file can have multiple names if it is linked into multiple directories. The header file name is used to restore the file in case of corruption.
The location of an icon is entirely system specific and can be done in many ways. In the case of executable files, they are often stored in the file itself. They can also be hidden files in the same directory.

FAT32: root directory entries

We are building a FAT32 filesystem manipulation tool in C and are currently trying to access all the entries in the root directory (situated right after the two FAT tables).
The first question is: are all the root directory entries contiguous in the data region? If not, given the first entry, how can we access the next entry?
Does it have anything to do with the "low cluster / high cluster" fields, or do we need to look in the FAT table for it (the root directory)?
Basically, we have the "equation" that leads us to the data region. Based on that, we point at the cluster, but after that, we don't really know how to find the next entry in the root directory.
This might seem confusing, but if you need pieces of code or more information, I will provide them.
Thank you in advance.
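FAT (also FAT32) directory entries are 32 bytes each and appear in sequential order. Within a cluster they are contiguous; when the directory spans more than one cluster, the next cluster is found by following the directory's chain in the FAT, exactly as for a file.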
To store long file names an entry could need multiples of 32 bytes.
On how long file names (LFNs) are marked (from Wikipedia):
Long File Names (LFN) are stored on a FAT file system using a trick—adding (possibly multiple) additional entries into the directory before the normal file entry. The additional entries are marked with the Volume Label, System, Hidden, and Read Only attributes (yielding 0x0F), which is a combination that is not expected in the MS-DOS environment, and therefore ignored by MS-DOS programs and third-party utilities. (ff)
Regarding your second question (from Wikipedia):
[...] VFAT LFN entries always have the cluster value at 0x1A set to 0x0000 and the length entry at 0x1C is never 0x00000000 [...]
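As a rough illustration of the 32-byte layout, here is a Python sketch (the same offsets carry over directly to a C struct). It assumes the directory's clusters have already been read and concatenated into one bytes object; parse_directory is a made-up name, and long-file-name fragments are simply skipped rather than reassembled. The first cluster of a file is split across two 16-bit fields: the high word at offset 0x14 and the low word at offset 0x1A.

```python
import struct

ENTRY_SIZE = 32
ATTR_LFN = 0x0F  # read-only | hidden | system | volume label
# name(11) attr ntres ctime_tenth ctime cdate adate clus_hi wtime wdate clus_lo size
ENTRY_FORMAT = "<11sBBBHHHHHHHI"


def parse_directory(data):
    """Return (name, attributes, first_cluster, size) for each short entry."""
    entries = []
    for offset in range(0, len(data) - ENTRY_SIZE + 1, ENTRY_SIZE):
        raw = data[offset:offset + ENTRY_SIZE]
        if raw[0] == 0x00:  # this entry and all following ones are unused
            break
        if raw[0] == 0xE5:  # deleted entry
            continue
        (name, attr, _ntres, _ctenth, _ctime, _cdate, _adate,
         cluster_hi, _wtime, _wdate, cluster_lo, size) = struct.unpack(ENTRY_FORMAT, raw)
        if (attr & 0x3F) == ATTR_LFN:  # long-file-name fragment, skipped here
            continue
        base = name[:8].decode("ascii", "replace").rstrip()
        ext = name[8:].decode("ascii", "replace").rstrip()
        full_name = base + "." + ext if ext else base
        first_cluster = (cluster_hi << 16) | cluster_lo
        entries.append((full_name, attr, first_cluster, size))
    return entries


if __name__ == "__main__":
    entry = struct.pack(ENTRY_FORMAT, b"README  TXT", 0x20,
                        0, 0, 0, 0, 0, 0x0001, 0, 0, 0x0002, 3)
    print(parse_directory(entry + bytes(ENTRY_SIZE)))
    # [('README.TXT', 32, 65538, 3)]
```

Once first_cluster is known, the file's remaining clusters come from the FAT, not from the directory entry.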

Custom-made archive format question

I was thinking about developing my own file archive format to use for private projects. The thing is that I am not looking for a solution like 7z or RAR; I want to make something different, similar to a file system.
Looking at real file systems, each has two sections in common in its architecture: information about the files stored on disk, and the actual data of the files, as follows:
----------------------------
METADATA | FILE DATA
----------------------------
My question is - how is it possible that these two sections will not overlap? I mean, the FAT STRUCTURE section grows towards the FILE DATA section, while the latter grows towards the end of the disk (partition). How does a file system manage these sections?
This is what I have been trying to figure out for most of the time and any tip would be more than welcome.
Most file systems operate with clusters, pages or blocks, which have a fixed size. In many file systems the directory (metadata) is just a special file, so it can grow in the same way regular data files grow. On other file systems some master metadata block has a fixed size which is pre-allocated when the file system is formatted. In this case the file system can become full before files take up all the available space.
On a side note, is there a reason to reinvent the wheel (a custom file system for private needs)? There are implementations of in-file virtual file systems which are similar to archives but provide more functionality. One example is our SolFS.
All you need is a manifest containing the file list, the archive name and/or a password, and then all the files listed in it.
If you can make the files smaller, then that's even better!
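For an archive (as opposed to a live file system), one way to avoid the overlap problem altogether is to write the file data first and append the manifest at the end, with a small fixed-size footer pointing at it. This is similar in spirit to the central directory that ZIP keeps at the end of the file. The sketch below is only an illustration under that assumption: the layout, the JSON manifest and the function names are all invented here, and there is no compression or password handling.

```python
import io
import json
import struct

FOOTER = struct.Struct("<QQ")  # manifest offset, manifest length (hypothetical layout)


def write_archive(out, files):
    """Write {name: bytes} to out: file data first, manifest and footer last."""
    manifest = {}
    for name, data in files.items():
        manifest[name] = {"offset": out.tell(), "size": len(data)}
        out.write(data)
    manifest_bytes = json.dumps(manifest).encode("utf-8")
    manifest_offset = out.tell()
    out.write(manifest_bytes)
    out.write(FOOTER.pack(manifest_offset, len(manifest_bytes)))


def read_file(src, name):
    """Locate the manifest through the footer, then read one member."""
    src.seek(-FOOTER.size, io.SEEK_END)
    manifest_offset, manifest_len = FOOTER.unpack(src.read(FOOTER.size))
    src.seek(manifest_offset)
    manifest = json.loads(src.read(manifest_len).decode("utf-8"))
    entry = manifest[name]
    src.seek(entry["offset"])
    return src.read(entry["size"])


if __name__ == "__main__":
    buf = io.BytesIO()
    write_archive(buf, {"a.txt": b"abc", "b.bin": b"\x00\x01"})
    print(read_file(buf, "a.txt"))  # b'abc'
```

Because the manifest is written last, neither section has to reserve room for the other; the cost is that adding a file means rewriting the manifest and footer.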

Changing inode behaviour

I am trying to modify the ext3 file system. Basically I want to ensure that the inode for a file is saved in the same (or an adjacent) block as the file that it stores metadata for. Hopefully this should help disk access performance.
I grabbed the kernel source, compiled it, read a bunch about inodes and looked at the inode.c file in the fs subdirectory. However, I am just not sure how I can ensure that any new file being created, and the inode for this file, can be saved in the same or adjacent blocks. Any help or pointers to further reading would be appreciated. Thanks!
Interesting idea.
I'm not deeply familiar with ext3, but I can give you some general pointers.
Currently ext3 stores inodes in predetermined places. Each block group has its own inode table, an array of inodes. So when you have an inode number (i.e., as the result of looking up a filename in a directory), you can find the corresponding inode on disk by using the inode number first to select the correct block group and then to index into that block group's inode table.
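The lookup described above comes down to a division and a remainder. Here is a sketch of the arithmetic in Python (the kernel does the equivalent in C). The parameters correspond to values found in the superblock (inodes per group, inode size, block size); the function name locate_inode is made up, and the starting block of each group's inode table, which comes from the group descriptor, is left out.

```python
def locate_inode(inode_number, inodes_per_group, inode_size, block_size):
    """Return (block_group, block_within_inode_table, byte_offset_in_block)."""
    if inode_number < 1:
        raise ValueError("inode numbers start at 1")
    index = inode_number - 1                    # inode numbers are 1-based
    group = index // inodes_per_group           # which block group's table
    index_in_group = index % inodes_per_group   # slot within that table
    byte_offset = index_in_group * inode_size
    return group, byte_offset // block_size, byte_offset % block_size


if __name__ == "__main__":
    # Inode 2 is conventionally the root directory on ext2/ext3.
    print(locate_inode(2, 8192, 256, 4096))     # (0, 0, 256)
    print(locate_inode(8193, 8192, 256, 4096))  # (1, 0, 0)
```

Any scheme that moves inodes next to their data has to replace this fixed mapping with something else, which is the crux of the design problem.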
If you want to put the inodes next to the corresponding file data, you'll need a new scheme for finding an inode on disk. If you're willing to dedicate a block for each inode, then one possible scheme would be to allocate a new block every time you need an inode and then use the block number as the inode number. This might have the benefit that for small files you could store the data in that same block.
To make something like this happen, creating a new file (i.e., allocating an inode) would have to work very differently than in the current ext3 file system. Instead of using a bitmap to find an unused, pre-allocated and pre-initialized inode, you would have to allocate an empty block and initialize it yourself. So, you'll probably want to look at how the file system allocates blocks when it's writing to a file, then mimic that for allocating an inode.
An alternative scheme would be to store the inode inside the directory. You then save an I/O not because the inode is next to its data, but because when you look up the filename you also read the inode. This was done back in the 90s as an experiment in BSD's FFS file system, and was written up in an excellent USENIX paper. Those ideas never made it into FFS, or into any other mainstream file system that I'm aware of, so it might be interesting to see how they work in ext3.
Regardless of whether you pursue one of these schemes or come up with something of your own, you'll also have to modify mke2fs to initialize the file system on disk in a way that your new file system variant will understand.
Good luck! It sounds like a fun project.
Kudos for getting into file system design!
First, a bit of engineering advice before you get too deep into hacking: make a copy of the ext3 tree and rename the file system to something else. I've found that when introducing experimental changes into a file system, you really don't want it to be used for your main system. Your system should still boot even if you introduce a bug that randomly loses files (it will eventually happen). You'll also need to branch the ext3 userspace tools to work with your new system.
Second, go get a copy of Understanding the Linux Kernel, 3 ed. by Bovet and Cesati. It presents an organized view of kernel subsystems, and I've found its explanations to be worthwhile. It's written for an older kernel (2.6.x for some x < 15; I forget exactly), but it's still accurate in many places. Read through its descriptions of file systems. I believe it covers ext3.
Third, about your actual project, you aren't proposing a simple modification to ext3. That file system has a pretty straightforward way of mapping an inode number to a disk block. You'll need to find a new way of doing this mapping. I would not anticipate any changes to the rest of ext3. Solving this challenge may be one of the key design points of your architecture. Note that keeping around a big array of inode -> disk block maps doesn't solve your problem: it's probably no better than existing ext3.

How is a file represented on a disk

I want to ask, and forgive me if this is an obvious or newbie question:
If I create a file, say a text file, and save it (I'm using Ubuntu), the file has some extra information associated with it, such as the place on my hard drive where it has been saved. How can I examine this information? Where does this information get stored for my specific file? How can I examine the file as it is stored on my disk, I assume in terms of bytes?
Maybe I need to focus this question,
Thanks,
B
This is the responsibility of your file system. Very briefly, a file system is a data structure which is laid out onto your entire disk (that's what "formatting" a disk does), and your files are saved into that data structure. There are lots of file systems, and their details vary quite widely. http://www.forensics.nl/filesystems has a whole bunch of papers on file system design and organization. I'd start with McKusick's A Fast File System for UNIX; it's old, but it contains lots of ideas that are still influential today.
You need a filesystem-specific forensics tool if you want to look at the data structures on your disks. Ubuntu's probably using something in the ext2 family, so try debugfs.
I think maybe you do need to focus it a bit :-)
For UNIX file systems, there are many different types.
The one I'm most familiar with (ext2) has a "file" on disk containing directory entries. These entries are simple names and pointers to the file itself (which is why you can have multiple directory entries pointing to the same file, hard links).
The file itself is represented by an inode, which contains the properties of the file (owner, size, permissions and so on).
The inode also contains direct and indirect pointers to the contents of the file. By direct, I mean a pointer to a data block.
An indirect pointer is a pointer to a block of pointers to the contents. I believe you can go to another two levels of indirection (double and triple indirect), which gives you truly massive file sizes.
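To get a feel for how much the extra levels of indirection buy, here is a back-of-the-envelope calculation in Python. It assumes the classic ext2 layout of 12 direct pointers plus one single, one double and one triple indirect pointer, with 4-byte block pointers. It deliberately ignores other limits (such as 32-bit block counters) that make the real maximum smaller on some configurations.

```python
def max_file_blocks(block_size, direct=12, pointer_size=4):
    """Data blocks addressable through direct and 1-3 level indirect pointers."""
    per_block = block_size // pointer_size  # pointers that fit in one block
    return direct + per_block + per_block ** 2 + per_block ** 3


def max_file_bytes(block_size):
    """Upper bound on file size implied by the pointer scheme alone."""
    return max_file_blocks(block_size) * block_size


if __name__ == "__main__":
    print(max_file_bytes(1024))  # 17247252480, roughly 16 GiB
    print(max_file_bytes(4096))  # 4402345721856, roughly 4 TiB
```

Almost all of the capacity comes from the triple indirect pointer, which is why small files only ever touch the direct pointers.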
More details on Wikipedia.
