where are extended attributes stored?

where are extended attributes stored? - filesystems

This is a simple question but I've done some research and can't find any answers... So does anyone know if when we define extended attributes through xattr, those attributes are stored within - as a part of - the file'contents(in the biggining, in the end?), or if the inode has a special region to store these?
And by the way, i've read that in ext4 "each extended attribute is limited to a filesystem block (e.g. 4 KiB)". I can't tell if this is enough if I wanted to store 7 extended attributes to each file in the file system. Is this reallistic?
My last question is if those extended attributes are portable in the sense that if the files move to other machines with different file systems what happens to this attributes?

If you look at the inode spec, you'll discover that there is a small amount of space at the end of the inode to accommodate extended attributes. If you overflow that area, ext4 allows to you allocate another block (basically like a resource fork) for additional extended attributes. That 4KB restriction is per attribute, per file, so you can certainly store 7 extended attributes per file, unless the attribute is a BLOB larger than 4KB. However, if you overrun the space at the end of the inode, you'll be allocating blocks to metadata instead of data, reducing the usable size of your file system.
Source: https://ext4.wiki.kernel.org/index.php/Ext4_Disk_Layout#Extended_Attributes
To answer your question about portability, extended attributes are not copied by default, even within a single file system, and are not portable between file systems.
See: https://unix.stackexchange.com/questions/44253/how-to-clone-copy-all-file-directory-attributes-onto-different-file-directory for some further discussion on this. Rysnc can do a lot for copying extended attributes, but if the target file system doesn't support xattrs, you're out of luck.

Related

Inserting a block of bytes in the middle of a file without rewriting everything?

Since files are stored as blocks on disk, is it possible to insert a block in the middle of the chain of blocks?
This is because, without such an API, if I wanted to insert a 4kb block in the middle of a file at a certain position, using the traditional read/write apis, I would essentially have to rewrite everything in the file after that position and shfit them by 4kb.
I'm ok with an answer that only works for some OS or some file systems. It doesn't have to be cross-platform or work for every file system.
(I also understand that not all file systems or hardware use 4kb for blocks - answers that work for different numbers are also ok).

I am not sure about filesystems that would allow extending a file in the middle easily. Then again many modern filesystems do not actually have a chain of blocks. The chain of blocks was a thing on the FAT filesystem family. Instead the blocks in modern filesystems are often organized as a tree. Within a tree you can find the block containing any byte position in O(lg n) reads, with the logarithm having such a large base that it can be considered essentially constant.
While the chain would allow for the operation of "insert n blocks in between" with comparative ease, the tree unfortunately does not. This doesn't mean that the tree is the wrong structure - on the contrary, many database systems benefit greatly from the fast random access that it offers.
Note that the tree enables you to have another thing that might be useful instead of holes - Unix file systems have sparse files - any blocks of the file that are known to contain zeroes need not actually use disk space at all - instead these blocks are marked unallocated and considered containing zeroes in the tree structure itself.

Changing inode behaviour

I am trying to modify the ext3 file system. Basically I want to ensure that the inode for a file is saved in the same (or adjacent) block as the file that it stores metadata for. Hopefully this should help disk access performance
I grabbed the kernel source, compiled it, read a bunch about inodes and looked the inode.c file in the fs subdirectory. However, I am just not sure how I can ensure that any new file being created, and the inode for this file, can be saved in the same or adjacent blocks. Any help or pointers to further readings would be appreciated. Thanks!

Interesting idea.
I'm not deeply familiar with ext3, but I can give you some general pointers.
Currently ext3 stores inodes in predetermined places. Each block group has its own inode table, an array of inodes. So when you have an inode number (i.e., as the result of looking up a filename in a directory), you can find the corresponding inode on disk by using the inode number first to select the correct block group and then to index into that block group's inode table.
If you want to put the inodes next to the corresponding file data, you'll need a new scheme for finding an inode on disk. If you're willing to dedicate a block for each inode, then one possible scheme would be to allocate a new block every time you need an inode and then use the block number as the inode number. This might have the benefit that for small files you could store the data in that same block.
To make something like this happen, creating a new file (i.e., allocating an inode) would have to work very differently than in the current ext3 file system. Instead of using a bitmap to find an unused, pre-allocated and pre-initialized inode, you would have to allocate an empty block and initialize it yourself. So, you'll probably want to look at how the file system allocates blocks when it's writing to a file, then mimic that for allocating an inode.
An alternative scheme would be to store the inode inside the directory. So you save an I/O not because the inode is next to its data, but because when you lookup the filename you also read the inode. This was done back in the 90s as an experiment in BSD's FFS file system, and was written up in an excellent USENIX Paper. Those ideas never made it into FFS, or into any other main stream file system that I'm aware of, so it might be interesting to see how they work in ext3.
Regardless of whether you pursue one of these schemes or come up with something of your own, you'll also have to modify mke2fs to initialize the file system on disk in a way that your new file system variant will understand.
Good luck! It sounds like a fun project.

Kudos for getting into file system design!
First, a bit of engineering advice before you get too deep into hacking: make a copy of the ext3 tree and rename the file system to something else. I've found that when introducing experimental changes into a file system, you really don't want it to be used for your main system. Your system should still boot even if you introduce a bug that randomly loses files (it will eventually happen). You'll also need to branch the ext3 userspace tools to work with your new system.
Second, go get a copy of Understanding the Linux Kernel, 3 ed. by Bovet and Cesati. It presents an organized view of kernel subsystems, and I've found its explanations to be worthwhile. It's written for an older kernel (2.6.x for some x < 15; I forget exactly), but it's still accurate in many places. Read through its descriptions of file systems. I believe it covers ext3.
Third, about your actual project, you aren't proposing a simple modification to ext3. That file system has a pretty straightforward way of mapping an inode number to a disk block. You'll need to find a new way of doing this mapping. I would not anticipate any changes to the rest of ext3. Solving this challenge may be one of the key design points of your architecture. Note that keeping around a big array of inode -> disk block maps doesn't solve your problem: it's probably no better than existing ext3.

The FAT, Linux, and NTFS file systems

I heard that the NTFS file system is basically a b-tree. Is that true? What about the other file systems? What kind of trees are they?
Also, how is FAT32 different from FAT16?
What kind of tree are the FAT file systems using?

FAT (FAT12, FAT16, and FAT32) do not use a tree of any kind. Two interesting data structures are used, in addition to a block of data describing the partition itself. Full details at the level required to write a compatible implementation in an embedded system are available from Microsoft and third parties. Wikipedia has a decent article as an alternative starting point that also includes a lot of the history of how it got the way it is.
Since the original question was about the use of trees, I'll provide a quick summary of what little data structure is actually in a FAT file system. Refer to the above references for accurate details and for history.
The set of files in each directory is stored in a simple list, initially in the order the files were created. Deletion is done by marking an entry as deleted, so a subsequent file creation might re-use that slot. Each entry in the list is a fixed size struct, and is just large enough to hold the classic 8.3 file name along with the flag bits, size, dates, and the starting cluster number. Long file names (which also includes international character support) is done by using extra directory entry slots to hold the long name alongside the original 8.3 slot that holds all the rest of the file attributes.
Each file on the disk is stored in a sequence of clusters, where each cluster is a fixed number of adjacent disk blocks. Each directory (except the root directory of a disk) is just like a file, and can grow as needed by allocating additional clusters.
Clusters are managed by the (misnamed) File Allocation Table from which the file system gets its common name. This table is a packed array of slots, one for each cluster in the disk partition. The name FAT12 implies that each slot is 12 bits wide, FAT16 slots are 16 bits, and FAT32 slots are 32 bits. The slot stores code values for empty, last, and bad clusters, or the cluster number of the next cluster of the file. In this way, the actual content of a file is represented as a linked list of clusters called a chain.
Larger disks require wider FAT entries and/or larger allocation units. FAT12 is essentially only found on floppy disks where its upper bound of 4K clusters makes sense for media that was never much more than 1MB in size. FAT16 and FAT32 are both commonly found on thumb drives and flash cards. The choice of FAT size there depends partly on the intended application.
Access to the content of a particular file is straightforward. From its directory entry you learn its total size in bytes and its first cluster number. From the cluster number, you can immediately calculate the address of the first logical disk block. From the FAT indexed by cluster number, you find each allocated cluster in the chain assigned to that file.
Discovery of free space suitable for storage of a new file or extending an existing file is not as easy. The FAT file system simply marks free clusters with a code value. Finding one or more free clusters requires searching the FAT.
Locating the directory entry for a file is not fast either since the directories are not ordered, requiring a linear time search through the directory for the desired file. Note that long file names increase the search time by occupying multiple directory entries for each file with a long name.
FAT still has the advantage that it is simple enough to implement that it can be done in small microprocessors so that data interchange between even small embedded systems and PCs can be done in a cost effective way. I suspect that its quirks and oddities will be with us for a long time as a result.

ext3 and ext4 use "H-trees", which are apparently a specialized form of B-tree.
BTRFS uses B-trees (B-Tree File System).
ReiserFS uses B+trees, which are apparently what NTFS uses.
By the way, if you search for these on Wikipedia, it's all listed in the info box on the right side under "Directory contents".

Here is a nice chart on FAT16 vs FAT32.
The numerals in the names FAT16 and
FAT32 refer to the number of bits
required for a file allocation table
entry.
FAT16 uses a 16-bit file allocation
table entry (2 16 allocation units).
Windows 2000 reserves the first 4 bits
of a FAT32 file allocation table
entry, which means FAT32 has a maximum
of 2 28 allocation units. However,
this number is capped at 32 GB by the
Windows 2000 format utilities.
http://technet.microsoft.com/en-us/library/cc940351.aspx

FAT32 uses 32bit numbers to store cluster numbers. It supports larger disks and files up to 4 GiB in size.
As far as I understand the topic, FAT uses File Allocation Tables which are used to store data about status on disk. It appears that it doesn't use trees. I could be wrong though.

Reading tag data for Ogg/Flac files

I'm working on a C library that reads tag information from music files. I've already got ID3v2 taken care of, but I can't figure out how Ogg files are structured.
I opened a .ogg file in a hexeditor and I could find the tag data because that was all human readable. But everything from the beginning of the file to the tag data looked like garbage. How is this data encoded?
I don't need any help in the actual code, I just need help visualizing what a Ogg header looks like and what encoding it uses so I that I can read it. I'd like to use a non-hacky approach to reading Ogg files.
I've been looking at the Flac format, which has been helpful.
The Flac file I'm looking at has about 350 bytes between the "fLac" identifier and the human readable Comments section, and none of it is human readable in my hex editor, so I'm sure there has to be something important in there.
I'm using Linux, and I have no intention of porting to Windows or OS X. So if I need to use a glibc only function to convert the encoding, I'm fine with that.

The Ogg file format is documented here. There is a very nice graphical visualization as you requested with a detailed written description.
You may also want to look at libogg which is a open source BSD-licensed library for reading and writing Ogg files.

As is described in the link you provided, the following metadata blocks can occur between the "fLaC" marker and the VORBIS_COMMENT metadata block.
STREAMINFO: This block has information about the whole stream, like sample rate, number of channels, total number of samples, etc. It must be present as the first metadata block in the stream. Other metadata blocks may follow, and ones that the decoder doesn't understand, it will skip.
APPLICATION: This block is for use by third-party applications. The only mandatory field is a 32-bit identifier. This ID is granted upon request to an application by the FLAC maintainers. The remainder is of the block is defined by the registered application. Visit the registration page if you would like to register an ID for your application with FLAC.
PADDING: This block allows for an arbitrary amount of padding. The contents of a PADDING block have no meaning. This block is useful when it is known that metadata will be edited after encoding; the user can instruct the encoder to reserve a PADDING block of sufficient size so that when metadata is added, it will simply overwrite the padding (which is relatively quick) instead of having to insert it into the right place in the existing file (which would normally require rewriting the entire file).
SEEKTABLE: This is an optional block for storing seek points. It is possible to seek to any given sample in a FLAC stream without a seek table, but the delay can be unpredictable since the bitrate may vary widely within a stream. By adding seek points to a stream, this delay can be significantly reduced. Each seek point takes 18 bytes, so 1% resolution within a stream adds less than 2k. There can be only one SEEKTABLE in a stream, but the table can have any number of seek points. There is also a special 'placeholder' seekpoint which will be ignored by decoders but which can be used to reserve space for future seek point insertion.
Just after the above description, there's also the specification of the format of each of those blocks. The link also says
All numbers used in a FLAC bitstream are integers; there are no floating-point representations. All numbers are big-endian coded. All numbers are unsigned unless otherwise specified.
So, what are you missing? You say
I'd like a non-hacky approach to reading Ogg files.
Why re-write a library to do that when they already exist?

What is the difference between file and record in OS's view?

In terms of general operating system concepts, what is the difference between a file and a record?
How the OS will manage them? I know what a file is and what a record is but how it is distinguished in an
OS?

yeap I got the answer
A file is a collection or set of records.
Typically, In database sense, A Group of records makes a file.
A group of attributes makes a record

These days, on Win32 and *nix at least, there is no difference. A file is just a bag of bytes to the OS, and it's left up to applications to manage and work with those bytes, either all at once or one record at a time.
The days of defining record formats and i/o sources in JCL are long gone.

Many operating systems regard files as a sequence of undistinguished symbols. There is no notion of record. Others, mainly those with a mainframe legacy, consider files to have a fixed record length and block I/O on record boundaries.
Originally, the hierarchy arose from magnetic tape drives where a physical record break was placed between blocks on a tape and sectors on a disk for partitioning a cylinder.
Today applications impose a record structure on files and access them as though there were boundaries and do not make partial accesses. This is particularly applicable to DBMS (as Manoj points out).
The record length does not need to be a constant value, but can change within a single file. They can be implemented with either explicit or implicit record lengths in files that contains multiple record types (.PNG is a good example).
In a sense, even modern OS have a preferred record size in the form of pages. These are the native blocks read from and written to the media by low-level components. This structure may need to be considered for increased performance at the margin.

The Good answer is that 1
""A collection of related fields treated as a single unit is called a record. A collection of related record treated as a single unit is called a file or a data set""