Reading bytes from the empty space on a disk - c

It's widely known that, in general, when you delete a file on most (all?) modern OSes, the bytes of that file aren't erased; the space is simply marked as 'free' and isn't overwritten with other data until it's needed for another write operation.
I'm also aware that on UNIX-like systems I can read the bytes on disk directly via the device node at /dev/whatever.
However, /dev/whatever returns all the bytes of everything on the disk, including files that still 'exist' in the user-facing sense. What I'd really like to do is identify and read only the bytes that remain after some resource has been deleted - that is, the data still sitting in the 'free' space of the disk. (I'm aware that file-recovery and digital-forensics tools can recover such files, but for my purposes I need something slightly closer to the metal: a bytestream of the data remaining in the empty space of the disk, with no additional structure.)
Therefore, is there any way I could access (for instance) the allocated and unallocated ranges of disk space programmatically, then read the bytes corresponding to the unallocated ranges from disk? I'm pretty agnostic when it comes to programming language - I'm assuming this is going to involve low-level APIs callable from a little C program, or something similar.
I'm assuming this is OS/filesystem dependent - I'm on Mac OS X with APFS if that helps, but I'd appreciate tips for any combination of OS/FS, as I'm eventually going to port this project to other platforms.
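For concreteness, the raw read I described above boils down to something like the sketch below (a minimal sketch: the device path is a placeholder, opening it needs elevated privileges, and the part I'm missing - working out which byte ranges are unallocated - is exactly what isn't shown):

/* Minimal sketch: stream raw bytes from a block device.
 * The device path is a placeholder; deciding which of these bytes
 * belong to unallocated space is the open question. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s /dev/<device>\n", argv[0]);
        return 1;
    }

    int fd = open(argv[1], O_RDONLY);   /* needs elevated privileges */
    if (fd < 0) {
        perror("open");
        return 1;
    }

    unsigned char buf[4096];
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        fwrite(buf, 1, (size_t)n, stdout);   /* raw bytestream out */
    if (n < 0)
        perror("read");

    close(fd);
    return 0;
}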
Any tips or insight much appreciated! Thank you.

Related

Why did Windows use the FAT structure instead of a conventional linked list with a next pointer for each data block of a file?

Instead of storing references to the next nodes in a table, why couldn't it just be stored like a conventional linked list, that is, with a next pointer in each data block?
This is due to alignment. FAT (and just about any other file system) stores file data in one or more whole sectors of the underlying storage. Because the underlying storage can only read and write whole sectors, such allocation allows efficient access to the contents of a file.
Issues with interleaving
When a program wants to store something in a file, it provides a buffer, say 1 MB of data. If the file's data sectors also had to hold pointers to their next sector, this pointer information would need to be interleaved with the actual user data. The file system would therefore have to build another buffer (slightly larger than the provided 1 MB), copy into it, for each output sector, a chunk of user data plus the corresponding next pointer, and hand this new buffer to the storage. That is somewhat inefficient. And unless the file system always writes file data to fresh sectors (most don't), rewriting these next pointers is also redundant work.
The bigger problem comes when the file is read. Files would now behave like tape devices: with only the location of the first sector recorded in the file's primary metadata, reaching sector 1000 requires reading every sector before it in order: read sector 0, take the address of sector 1 from its next pointer, read sector 1, and so on. With typical seek times of around 10 ms per random I/O (assuming a hard disk drive), reaching sector 1000 would take about 10 seconds. Even if the sectors happen to be laid out sequentially, while the file system driver processes sector N's data the disk head is already flying over the next sector, so by the time the read for sector N+1 is issued it may be too late, forcing the disk to complete an entire revolution (8.3 ms on a 7200 RPM drive) before that sector can be read. The on-disk cache can and will help with that, though.
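To make the tape-like behaviour concrete, here is a rough sketch of seeking in such a hypothetical format (the layout is invented for illustration and read_sector is a placeholder for the actual device read, not a real API):

#include <stdint.h>

/* Hypothetical layout: each 512-byte sector ends with a 4-byte
 * "next sector" number, so only 508 bytes carry file data. */
#define SECTOR_SIZE 512
#define DATA_BYTES  (SECTOR_SIZE - sizeof(uint32_t))

struct chained_sector {
    uint8_t  data[DATA_BYTES];
    uint32_t next;                   /* number of the next sector, 0 = end */
};

/* Placeholder for the actual device read. */
void read_sector(uint32_t sector_no, struct chained_sector *out);

/* Reaching logical block n takes n+1 reads in strict order, because
 * every hop needs the previous sector's next pointer first. */
uint32_t find_sector(uint32_t first_sector, uint32_t n)
{
    uint32_t cur = first_sector;
    for (uint32_t i = 0; i < n; i++) {
        struct chained_sector s;
        read_sector(cur, &s);        /* one random I/O, ~10 ms on a HDD */
        cur = s.next;
    }
    return cur;                      /* ~n * 10 ms before any data is read */
}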
Writing a single sector is usually an atomic operation (this depends on the hardware): reading the sector back after a power failure returns either its old content or the new one, with no intermediate state. Database applications usually need to know which writes are atomic. If the file system interleaved file data and metadata in the same sectors, it would have to report a size smaller than the actual sector size to applications, for example 504 bytes instead of 512. But it can't do that, because applications usually assume the sector size is a power of 2. Furthermore, a file stored on such a file system would very likely be unusable if copied to another file system with a different reported sector size.
Better approaches
The FAT format is better because all the next pointers are stored in adjacent sectors. For FAT12, FAT16 and not-too-large FAT32 volumes, the entire table is small enough to fit in memory. FAT still records the blocks of a file as a linked list, so for efficient random access an implementation needs to cache the chain per file. On large enough volumes (which can hold correspondingly large files), such a cache may no longer fit in memory.
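With the whole table cached, following the chain becomes pure array indexing and costs no extra I/O. A rough sketch (the end-of-chain marker is simplified; real FAT entries also encode free and bad clusters):

#include <stdint.h>

#define FAT_EOC 0x0FFFFFFF   /* simplified end-of-chain marker */

/* fat[] is the in-memory copy of the table: fat[c] holds the number
 * of the cluster that follows cluster c in a file's chain. */
uint32_t nth_cluster(const uint32_t *fat, uint32_t first, uint32_t n)
{
    uint32_t cur = first;
    while (n-- && cur != FAT_EOC)
        cur = fat[cur];      /* a memory lookup, not a disk read */
    return cur;
}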
ext3 uses direct and indirect blocks. This simple format avoids the preprocessing that FAT requires and gets by with only a minimal number of additional reads per I/O when indirect blocks are needed. These additional reads are cached by the operating system, so their overhead is usually negligible.
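As a rough illustration of the ext2/ext3 scheme (assuming a 4 KiB block size, 4-byte block pointers and the classic 12 direct pointers; real code must also handle holes and file-size limits):

#include <stdio.h>

#define NDIR  12     /* direct pointers in the inode */
#define PTRS  1024   /* pointers per 4 KiB indirect block (4096 / 4) */

/* How many extra metadata-block reads are needed (uncached)
 * to locate logical block n of a file? */
int extra_reads(unsigned long n)
{
    if (n < NDIR)                       return 0;  /* direct */
    n -= NDIR;
    if (n < PTRS)                       return 1;  /* single indirect */
    n -= PTRS;
    if (n < (unsigned long)PTRS * PTRS) return 2;  /* double indirect */
    return 3;                                      /* triple indirect */
}

int main(void)
{
    printf("block 10   -> %d extra read(s)\n", extra_reads(10));
    printf("block 5000 -> %d extra read(s)\n", extra_reads(5000));
    return 0;
}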
Other variants are also possible and used by various file systems.
Random notes
For the sake of completeness, some hard disk drives can be formatted with slightly larger sector sizes (say, 520 bytes) so that the file system can pack 512 bytes of file data plus a few bytes of metadata into the same sector. Yet because of the above, I don't believe anyone has used such formats to store the address of the file's next sector. Those extra bytes can be put to better use: additional checksums and timestamps come to mind. The timestamps, I believe, are used to improve the performance of some RAID systems. Still, such usage is rare, and most software can't work with these sector sizes at all.
Some file systems can store the content of small enough files directly in the file's metadata, without occupying separate sectors; ReiserFS has its controversial tail packing. This doesn't matter here: large files still benefit from having a proper mapping to storage sectors.
Any modern OS requires much more from its file system than a pointer to the next data block: attributes (encryption, compression, hidden, ...), security descriptors (ACL entries), support for different hardware, buffering. And this is only a tiny fraction of what a good file system does.
Have a look at the file system article on Wikipedia to learn what else any modern file system does.
If we ignore the detail of FAT12 sharing a byte between two entries in order to pack 12 bits into 1.5 bytes, we can concentrate on the deeper meaning of the question.
It turns out that the FAT scheme is equivalent to a linked list, with the following differences:
The "next" pointer is located in an array (the FAT) instead of being appended or prepended to the actual data
The value written in "next" is an integer instead of the more familiar memory address of the next node.
The nodes are not reserved dynamically but represented by another array. That array is the entire data part of the hard drive.
One fascinating exercise we were assigned as part of our software engineering education was to convert an application that used memory pointers into an equivalent one that used integer indices. The rationale was that some processors (the PDP-11? or another PDP-xx) performed integer arithmetic much faster than pointer operations, or perhaps even forbade arithmetic on pointers altogether.
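In C terms, the exercise amounts to something like this toy sketch (the names are made up for illustration): the same list, once with pointers and once with array indices standing in for them. Gather the next fields of the second version into their own table and you essentially have the FAT.

#include <stdint.h>

/* The familiar pointer-based node */
struct pnode {
    int           value;
    struct pnode *next;   /* memory address of the following node */
};

/* Index-based equivalent: all nodes live in one big array (think of
 * it as the data area of the drive) and next is an index, not an
 * address. */
#define NIL (-1)

struct arr_node {
    int     value;
    int32_t next;         /* index of the following node, NIL at the end */
};

struct arr_node pool[1024];

int sum_list(int first)
{
    int sum = 0;
    for (int i = first; i != NIL; i = pool[i].next)
        sum += pool[i].value;
    return sum;
}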

Minix Internal Fragmentation [duplicate]

I am in the middle of writing some software in C that recursively lists all files in a given directory and now I need to work out the internal fragmentation.
I have spent a long time researching this and have found out that internal fragmentation on ext2 only occurs in the last block. I know that, in theory, from an inode number you should be able to get the first and last block addresses, but I have no idea how.
I have looked into stat(), fcntl() and all sorts of ways. How do I get the last block address from an inode number?
I have also figured out that once I have the address of the last block, I can check how much free space is left in that block, and that will give me the internal fragmentation.
I know that there are get_inode and get_block functions, but beyond that I have no idea!
I don't think you can get at the addresses of disk blocks via the regular system calls such as stat(). You would probably have to find the raw inode on disk (which means accessing the raw disk, and requires elevated privileges) and process the data from there.
Classically, you'd find direct blocks, indirect blocks, double-indirect blocks and a triple-indirect block for a file. However, the relevant file system type is about as dead as the dodo (I don't think I've seen that file system type this millennium), so that's unlikely to be much help now.
There might be a non-standard system call to get at the information, but I doubt it.
Maybe you are overcomplicating this: roughly, the internal fragmentation is the block size minus (file size modulo block size), i.e. the unused tail of the last block (and zero if the file size is an exact multiple of the block size).
But this is only valid if the file is a "classic" one - with sparse files, or files carrying a lot of "other information" (such as huge ACLs or extended attributes), there may be a difference. (I don't know where those are stored, but I could imagine file systems storing them in the last block, effectively (but unnoticeably) reducing the internal fragmentation.)
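In code, that back-of-the-envelope calculation might look like the sketch below. It assumes the st_blksize reported by stat(2) equals the allocation block size (typically true for ext2/Minix-style file systems, but it is an assumption) and that the file is a plain, non-sparse file:

#include <stdio.h>
#include <sys/stat.h>

/* Rough internal fragmentation of a regular, non-sparse file:
 * the unused tail of its last data block. */
int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 1;
    }

    struct stat st;
    if (stat(argv[1], &st) != 0) {
        perror("stat");
        return 1;
    }

    long bs   = st.st_blksize;   /* assumed equal to the allocation block size */
    long rem  = st.st_size % bs;
    long frag = (st.st_size == 0 || rem == 0) ? 0 : bs - rem;

    printf("%s: ~%ld bytes of internal fragmentation\n", argv[1], frag);
    return 0;
}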

Memory mapped database

I have 8 terabytes of data composed of ~5000 arrays of small sized elements (under a hundred bytes per element). I need to load sections of these arrays (a few dozen megs at a time) into memory to use in an algorithm as quickly as possible. Are memory mapped files right for this use, and if not what else should I use?
Given your requirements I would definitely go with memory-mapped files. It's almost exactly what they were made for. And since memory-mapped files consume few physical resources, your extremely large files will have little impact on the system compared to other methods, especially since smaller views can be mapped into the address space just before performing I/O (e.g., those arrays of elements). The other big benefit is they give you the simplest working environment possible. You can (mostly) just view your data as a large memory address space and let Windows worry about the I/O. Obviously, you'll need to build in locking mechanisms to handle multiple threads, but I'm sure you know that.
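On POSIX systems the "map a small view, work on it, unmap" pattern looks roughly like the sketch below (process_window is a made-up helper name, the offset must be a multiple of the page size, and the Windows equivalent would go through CreateFileMapping/MapViewOfFile):

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Map a few-dozen-MB window of a huge data file, process it, unmap. */
int process_window(const char *path, off_t offset, size_t len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return -1; }

    void *view = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, offset);
    close(fd);                       /* the mapping stays valid after close */
    if (view == MAP_FAILED) { perror("mmap"); return -1; }

    /* ... run the algorithm over the mapped elements here ... */

    munmap(view, len);
    return 0;
}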

When writing to a file, do all the pages in memory map to contiguous disk blocks on the disk?

I am interested in knowing how the file system actually writes files to disk. Does it write them to contiguous blocks and store the starting block # and device # as the file's metadata?
If they are not stored as contiguous blocks (which I think is the case), then how does it choose the disk blocks so that read times are optimized?
That's entirely filesystem dependent, there is no general answer.
Here's a good presentation: ext3 on-disk layout for the EXT3 filesystem. Others might do it similarly, or completely differently. Have a look at The structure of the Reiser file system for the reiserfs 3.6 layout.
For a rather different approach, look at XFS on-disk specification.
The ext2 disk organization could probably be described as a "classic" way of doing things that could be found in other filesystems (including its successors).

temporary files vs malloc (in C)

I have a program that generates a variable amount of data that it has to store for later use.
When should I choose malloc+realloc and when should I choose temporary files?
mmap(2,3p) (or file mappings) means never having to choose between the two.
Use temporary files if the size of your data is larger than the virtual address space of your target system (2-3 GB on 32-bit hosts), or if it's at least big enough to put serious resource strain on the system.
Otherwise use malloc.
If you go the route of temporary files, use the tmpfile function to create them: on good systems they never have names in the filesystem and have no chance of getting left behind if your program terminates abnormally. Most people do not like the temp-file cruft that Microsoft Office products tend to leave all over the place. ;-)
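For what it's worth, the tmpfile() route is about as simple as it gets; a minimal sketch:

#include <stdio.h>

int main(void)
{
    FILE *tmp = tmpfile();           /* anonymous temp file, auto-removed */
    if (tmp == NULL) {
        perror("tmpfile");
        return 1;
    }

    fputs("spill data that doesn't fit comfortably in memory\n", tmp);

    rewind(tmp);                     /* read it back later */
    char line[128];
    if (fgets(line, sizeof line, tmp))
        fputs(line, stdout);

    fclose(tmp);                     /* the file disappears here */
    return 0;
}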
Prefer a temporary file if you need/want the data to be visible to other processes, and malloc/realloc if not. Also consider the amount of data relative to your address space and virtual memory: will the data consume too much swap space if kept in memory? And consider how good a fit each approach is for your application: file read/write can be a pain compared to memory access. Memory-mapped files make it easier, but you may need custom library support to do dynamic memory allocation within them.
In a modern OS, all the memory gets paged out to disk if needed anyway, so feel free to malloc() anything up to a couple of gigabytes.
If you know the maximum size, it's not too big and you only need one copy, you should use a static buffer, allocated at program load time:
char buffer[1000];    /* fixed maximum size, known at compile time */
int buffSizeUsed;     /* how much of the buffer currently holds data */
If any of those pre-conditions are false and you only need the information while the program is running, use malloc:
char *buffer = malloc(actualSize);    /* sized at run time; check for NULL */
Just make sure you check that the allocations work and that you free whatever you allocate.
If the information has to survive the termination of your program or be usable from other programs at the same time, it'll need to go into a file (or long-lived shared memory if you have that capability).
And, if it's too big to fit into your address space at once, you'll need to store it in a file and read it in a bit at a time.
That's basically going from the easiest/least-flexible to the hardest/most-flexible possibilities.
Where your requirements lie along that line is a decision you need to make.
On a 32-bit system, you won't be able to malloc() more than 2GB or 3GB or so. The big advantage of files is that they are limited only by disk size. Even with a 64-bit system, it's unusual to be able to allocate more than 8GB or 16GB because there are usually limits on how large the swap file can grow.
Use RAM for data that is private and only needed for the life of a single process. Use a temp file if the data needs to persist beyond a single process.
