how is a file represented on a disk

how is a file represented on a disk - file

so I want to ask, and forgive me if this is obvious, or newbie question:
if I create a file, say a text file - save it, (I'm using Ubuntu), so this file I have created, has some extra information associated with it, such as, the place on my hard drive where it has been saved. How to examine this information? Where does this information get stored for my specific file? How to examine the file as it is stored on my disk, I assume in terms of, what, bytes?
Maybe I need to focus this question,
Thanks,
B

This is the responsibility of your file system. In very brief, a file system is a data structure which is laid out onto your entire disk -- that's what "formatting" a disk does -- and your files are saved into that data structure. There are lots of file systems, and their details vary quite widely. http://www.forensics.nl/filesystems has a whole bunch of papers on file system design and organization. I'd start with McKusick's A Fast File System for UNIX; it's old, but it contains lots of ideas that are still influential today.
You need a filesystem-specific forensics tool if you want to look at the data structures on your disks. Ubuntu's probably using something in the ext2 family, so try debugfs.

I think maybe you do need to focus it a bit :-)
For UNIX file systems, there are many different types.
The one I'm most familiar with (ext2) has a "file" on disk containing directory entries. These entries are simple names and pointers to the file itself (which is why you can have multiple directory entries pointing to the same file, hard links).
The file itself is an inode which contains the properties of the file (owner, size, permissions and so on).
The inode also contains direct and indirect pointers to the contents of the file. By direct, I mean a pointer to a data block.
An indirect pointer is a pointer to a pointer to contents. I believe you can go to another two levels of indirection, which gives you truly massive file sizes:
More details on Wikipedia.

Related

C function to modify inode?

I was learning how linux file system works and I came across the concept of inodes. I have written a C program to read a particular inode and print its contents.
Now I wan't to modify the contents of inode from my C code. I know this could break the filesystem if something goes wrong but still I want to try it.
How can I achieve this?

You need to access what is called the "meta" information of the drive - the information about the information on the drive - not the normal information. To do that, you need to open the drive itself rather than any file or directory inside the drive.
If you're talking i-nodes, then you're on Linux and the ext filesystem, so the drive name will be something like /dev/sdb. Be careful: this is the whole disk, NOT one partition/volume/slice within it. That might be called /dev/sdb2 or something - different types of Linux call them different things.
Once you have the partition open, you can treat it just like a (very large!) file: a succession of bytes that coincidentally happen to be arranged as sectors on the hard disk. You can seek to any position and read the data there. If you want to overwrite it, you can - but as you say:
You may completely destroy the data on your hard disk!
Perhaps mount a USB stick (with nothing important on it) and experiment on that? And make VERY sure that you open ITS name and not your main disk's name!

Creating/modifying extents in XFS

I am working on a data-deduplication scheme between disks and am wondering how I can dig down into the XFS internals and create/modify extents for files.
Here is an example of what I want to do, suppose we have the file:
bippity
boppity
boo
And we have a block size of 8 bytes(enough for bippity and the new line)
Now I change the file to
bip
boppity
boo
Only the first line changed, I would like to create a file that creates an extent(or block) for the first line, writes the data to that extent, and then joins that extent with the extent already on disk, so only one change has to be pushed out to disk.
Is this possible on xfs(or even better, on a general file system)? I don't mind getting down into the nitty-gritty, but I cannot seem to find a lot of info on this particular subject.

How do file systems track free space

I'm writing a program that needs to read and write lots of data in random order, and since I don't want to use hundreds of small files, I'm trying to develop a sort of virtual file system that writes to one large file that keeps track of where the "files" are in the "disk" file.
Thus, I've been trying to find detailed information about file system implementations, but theres one thing that never seems to be explained in a way I can understand: How does the file system track free/deleted sectors for new file creation? FAT, for example, has an index of all sectors at the beginning that seems to be the only place that holds this information, but searching the index for a new area of free space in a linear, O(n) fashion seems like it would be rather inefficient, especially if there are no deleted sectors and you have to insert something at the end of the list. Am I missing something, or is this is how file systems really detect unused sectors for writing? Thanks!

The answer depends on the overall file system architecture. It can be a linear list of free pages, or the free space can be counted in the same way as other files (eg. linked lists).
Practically developing an efficient file system is quite serious task for a side task that you have. So it makes sense to use some already created virtual file system, such as the one CodeBase offers or our Solid File System.

I found a helpful PDF explaining how free space is mapped in linux file systems. This is more along the lines of what I was looking for.
http://www.kernel.org/doc/ols/2010/ols2010-pages-121-132.pdf

it's just like a link list: each file may be seperated into many partitions and at the end of the partition each partition it's refering to begining of the next the same goes for the free speaces. think of free space as one large file that contains every byte which is not inside another file!

Custom-made archive format question

I was thinking about developing an own file archive format to use for private projects. The thing is that I am not looking for a solution like 7z or RAR, but I want to make something different, similar to a file system.
Looking at real file system, each has two sections in common in its architecture - information about files stored on disk and actual data of the files, as follows:
----------------------------
METADATA | FILE DATA
----------------------------
My question is - how is it possible that these two sections will not overlap? I mean, the FAT STRUCTURE section grows towards the FILE DATA section, while the latter grows towards the end of the disk (partition). How does a file system manage these sections?
This is what I have been trying to figure out for most of the time and any tip would be more than welcome.

Most file systems operate with clusters or pages or blocks, which have fixed size. In many filesystems the directory (metadata) is a just a special file, so it can grow in the same way the regular data files grow. On other filesystems some master metadata block has a fixed size which is pre-allocated during file system formatting. In this case the file system can become full before files take all available space.
On a side note, is there a reason to reinvent the wheel (custom file system for private needs)? There exist some implementations of in-file virtual file systems which are similar to archives, but provide more functionality. One of examples is our SolFS.

All you need is a manifest containing the file list, archive name, and or password and then have all the files listed there
if you can make the files smaller than that's even better!

Changing inode behaviour

I am trying to modify the ext3 file system. Basically I want to ensure that the inode for a file is saved in the same (or adjacent) block as the file that it stores metadata for. Hopefully this should help disk access performance
I grabbed the kernel source, compiled it, read a bunch about inodes and looked the inode.c file in the fs subdirectory. However, I am just not sure how I can ensure that any new file being created, and the inode for this file, can be saved in the same or adjacent blocks. Any help or pointers to further readings would be appreciated. Thanks!

Interesting idea.
I'm not deeply familiar with ext3, but I can give you some general pointers.
Currently ext3 stores inodes in predetermined places. Each block group has its own inode table, an array of inodes. So when you have an inode number (i.e., as the result of looking up a filename in a directory), you can find the corresponding inode on disk by using the inode number first to select the correct block group and then to index into that block group's inode table.
If you want to put the inodes next to the corresponding file data, you'll need a new scheme for finding an inode on disk. If you're willing to dedicate a block for each inode, then one possible scheme would be to allocate a new block every time you need an inode and then use the block number as the inode number. This might have the benefit that for small files you could store the data in that same block.
To make something like this happen, creating a new file (i.e., allocating an inode) would have to work very differently than in the current ext3 file system. Instead of using a bitmap to find an unused, pre-allocated and pre-initialized inode, you would have to allocate an empty block and initialize it yourself. So, you'll probably want to look at how the file system allocates blocks when it's writing to a file, then mimic that for allocating an inode.
An alternative scheme would be to store the inode inside the directory. So you save an I/O not because the inode is next to its data, but because when you lookup the filename you also read the inode. This was done back in the 90s as an experiment in BSD's FFS file system, and was written up in an excellent USENIX Paper. Those ideas never made it into FFS, or into any other main stream file system that I'm aware of, so it might be interesting to see how they work in ext3.
Regardless of whether you pursue one of these schemes or come up with something of your own, you'll also have to modify mke2fs to initialize the file system on disk in a way that your new file system variant will understand.
Good luck! It sounds like a fun project.

Kudos for getting into file system design!
First, a bit of engineering advice before you get too deep into hacking: make a copy of the ext3 tree and rename the file system to something else. I've found that when introducing experimental changes into a file system, you really don't want it to be used for your main system. Your system should still boot even if you introduce a bug that randomly loses files (it will eventually happen). You'll also need to branch the ext3 userspace tools to work with your new system.
Second, go get a copy of Understanding the Linux Kernel, 3 ed. by Bovet and Cesati. It presents an organized view of kernel subsystems, and I've found its explanations to be worthwhile. It's written for an older kernel (2.6.x for some x < 15; I forget exactly), but it's still accurate in many places. Read through its descriptions of file systems. I believe it covers ext3.
Third, about your actual project, you aren't proposing a simple modification to ext3. That file system has a pretty straightforward way of mapping an inode number to a disk block. You'll need to find a new way of doing this mapping. I would not anticipate any changes to the rest of ext3. Solving this challenge may be one of the key design points of your architecture. Note that keeping around a big array of inode -> disk block maps doesn't solve your problem: it's probably no better than existing ext3.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight