Can I access and change my iNode values of a file? - c

I know that files in unix systems are represented by their inodes.
Can I as a user have access to these values and change them?
Say, replace the values between two adjacent blocks, and in this way change the file?
Can I overwrite only one block in the middle?
I'm asking this in the context of file manipulation in C (I want to write a program that appends to the beginning, or middle part of a file, and not just to the end).

A user can read some of this information with the stat() system call, provided they have search permission on the directories leading to the file.
The information can only be changed indirectly (timestamps, for example, by accessing the file itself). There is no direct way to mess with this information.
Some file systems might give a bit more access possibilities by exposing some of the information in ioctl() calls. What might or might not be exposed is a decision of the driver/file system developer.
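As a minimal sketch of the read-only access described above, the following prints a few inode fields through stat(). The helper name print_inode_info is made up for illustration; the struct stat fields are the standard POSIX ones.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Print a few inode fields of `path`.
 * Returns 0 on success, -1 if stat() fails.
 * (print_inode_info is a hypothetical helper name.) */
int print_inode_info(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0) {
        perror("stat");
        return -1;
    }
    printf("inode:  %llu\n", (unsigned long long)st.st_ino);
    printf("size:   %lld bytes\n", (long long)st.st_size);
    printf("links:  %lu\n", (unsigned long)st.st_nlink);
    printf("blocks: %lld (512-byte units)\n", (long long)st.st_blocks);
    return 0;
}
```

Calling print_inode_info("somefile.txt") from main() would show the values; none of them can be written back through this interface.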

Related

C function to modify inode?

I was learning how linux file system works and I came across the concept of inodes. I have written a C program to read a particular inode and print its contents.
Now I want to modify the contents of the inode from my C code. I know this could break the filesystem if something goes wrong, but I still want to try it.
How can I achieve this?
You need to access what is called the "meta" information of the drive - the information about the information on the drive - not the normal information. To do that, you need to open the drive itself rather than any file or directory inside the drive.
If you're talking i-nodes, then you're probably on Linux with an ext filesystem, so the drive name will be something like /dev/sdb. Be careful: this is the whole disk, NOT one partition/volume/slice within it. A partition would be called /dev/sdb2 or something similar; different Linux distributions name them differently.
Once you have the partition open, you can treat it just like a (very large!) file: a succession of bytes that coincidentally happen to be arranged as sectors on the hard disk. You can seek to any position and read the data there. If you want to overwrite it, you can - but as you say:
You may completely destroy the data on your hard disk!
Perhaps mount a USB stick (with nothing important on it) and experiment on that? And make VERY sure that you open ITS name and not your main disk's name!
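A rough sketch of treating the device as a large file, assuming an ext2/3/4 volume: the superblock starts 1024 bytes into the partition, and its 16-bit magic number (0xEF53, little-endian) sits 56 bytes into the superblock. The function name has_ext_magic is hypothetical, and the sketch deliberately opens the device read-only; overwriting would use pwrite() on a descriptor opened with O_RDWR, with all the risks mentioned above.

```c
#define _XOPEN_SOURCE 700   /* for pread()/pwrite() */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define EXT_SUPERBLOCK_OFFSET 1024   /* superblock starts 1024 bytes into the volume */
#define EXT_MAGIC_OFFSET      56     /* position of s_magic inside the superblock */
#define EXT_MAGIC             0xEF53

/* Returns 1 if `device` carries the ext magic number, 0 if it does not,
 * and -1 on an I/O error or if the file is too short.
 * (has_ext_magic is a hypothetical helper name.) */
int has_ext_magic(const char *device)
{
    unsigned char raw[2];
    int fd = open(device, O_RDONLY);

    if (fd < 0) {
        perror("open");
        return -1;
    }
    ssize_t n = pread(fd, raw, sizeof raw,
                      EXT_SUPERBLOCK_OFFSET + EXT_MAGIC_OFFSET);
    close(fd);
    if (n != (ssize_t)sizeof raw)
        return -1;

    /* The field is stored little-endian on disk. */
    uint16_t magic = (uint16_t)(raw[0] | (raw[1] << 8));
    return magic == EXT_MAGIC;
}
```

It works equally well on an image file (for example one made with dd from a spare USB stick), which is a safer target for experiments than a live device.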

How to add (and use) binary data to compiled executable?

There are several questions dealing with some aspects of this problem, but none seems to answer it wholly. The whole problem can be summarized as follows:
You have an already compiled executable (obviously expecting the use of this technique).
You want to add an arbitrarily sized binary data to it (not necessarily by itself which would be another nasty problem to deal with).
You want the already compiled executable to be able to access this added binary data.
My particular use-case would be an interpreter, where I would like to make the user able to produce a single file executable out of an interpreter binary and the code he supplies (the interpreter binary being the executable which would have to be patched with the user supplied code as binary data).
A similar case is self-extracting archives, where a program (the archiving utility, such as zip) is capable of constructing an executable which contains a pre-built decompressor (the already compiled executable) and user-supplied data (the contents of the archive). Obviously no compiler or linker is involved in this process (thanks, Mathias, for the note and for pointing out 7-zip).
Using existing questions a particular path of solution shows along the following examples:
appending data to an exe - This deals with the aspect of adding arbitrary data to arbitrary exes, without covering how to actually access it (basically simple append usually works, also true with Unix's ELF format).
Finding current executable's path without /proc/self/exe - Together with the above, this would give a file name to use for opening the exe in order to access the added data. There are many more questions of this kind, but none focuses specifically on getting a path suitable for actually opening the binary as a file (a goal which alone might (?) be easier to accomplish; strictly you don't even need the path, just the binary opened for reading).
There also may be other, probably more elegant ways around this problem than padding the binary and opening the file for reading it in. For example could the executable be made so that it becomes rather trivial to patch it later with the arbitrarily sized data so it appears "within" it being in some proper data segment? (I couldn't really find anything on this, for fixed size data it should be trivial though unless the executable has some hash)
Can this be done reasonably well with as little deviation from standard C as possible? Even more or less cross-platform (at least from a maintenance standpoint)? Note that it would be preferred if the program performing the adding of the binary data didn't rely on compiler tools to do it (which the user might not have), but solutions necessitating those might also be useful.
Note the already compiled executable criteria (the first point in the above list), which requires a completely different approach than solutions described in questions like C/C++ with GCC: Statically add resource files to executable/library or SDL embed image inside program executable , which ask for embedding data compile-time.
Additional notes:
The problems with the obvious approach outlined above and suggested in some comments, which is to just append to the binary and use that, are as follows:
Opening the currently running program's binary doesn't seem something trivial (opening the executable for reading is, but not finding the path to supply to the file open call, at least not in a reasonably cross-platform manner).
The method of acquiring the path may provide an attack surface which probably wouldn't exist otherwise. This means that a potential attacker could trick the program into seeing different binary data (provided by them) than the executable actually contains, exposing any vulnerability which might reside in the parser of the data.
It depends on how you want other systems to see your binary.
Digitally signed in Windows
The exe format allows for verifying that the file has not been modified since publishing. This would allow you to:
Compile your file
Add your data packet
Sign your file and publish it.
The advantage of following this system is that "everybody" agrees your file has not been modified since signing.
The easiest way to achieve this scheme is to use a resource. Windows resources can be added post-linking. They are protected by the Authenticode digital signature, and your program can extract the resource data from itself.
It used to be possible to extend the signature to include binary data, and there were binaries which used data in the signature section. Unfortunately this was used maliciously and has since been banned. Some details are on the MSDN blog.
Breaking the signature
If re-signing is not an option, then the result would be treated as insecure. It is worth noting here, that appended data is insecure, and can be modified without people being able to tell, but so is the code in your binary.
Appending data to a binary does break the digital signature, and also means the end-user can't tell if the code has been modified.
This means that any self-protection you add to your code to ensure the data blob is still secure, would not prevent your code from being modified to remove the check.
Running module
Windows GetModuleFileName allows the running path to be found.
Linux offers /proc/self/exe (or /proc/<pid>/exe).
Other Unix systems do not seem to have a method which is reliable.
Data reading
The approach of the zip format is to have a directory written at the end of the file. This means the directory can be found at the end of the file and then followed backwards to the start of the data. The advantage here is that the data blob is signposted from the end of the file, rather than from its natural start.
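The "signposted from the end" idea can be sketched as follows. This is not the real zip layout; it assumes a made-up trailer of an 8-byte little-endian payload length followed by a 4-byte tag "BLOB", written after the payload. The names find_payload, TRAILER_MAGIC and TRAILER_SIZE are all hypothetical.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define TRAILER_MAGIC "BLOB"      /* hypothetical 4-byte tag */
#define TRAILER_SIZE  (8 + 4)     /* 8-byte little-endian length + tag */

/* Locate a payload appended as: [executable][payload][length][tag].
 * On success returns 0 and stores the payload's file offset and length;
 * returns -1 if no valid trailer is found. */
int find_payload(FILE *f, long *offset, uint64_t *length)
{
    unsigned char t[TRAILER_SIZE];

    if (fseek(f, 0, SEEK_END) != 0)
        return -1;
    long end = ftell(f);
    if (end < TRAILER_SIZE)
        return -1;                       /* too short (or ftell failed) */
    if (fseek(f, end - TRAILER_SIZE, SEEK_SET) != 0)
        return -1;
    if (fread(t, 1, TRAILER_SIZE, f) != TRAILER_SIZE)
        return -1;
    if (memcmp(t + 8, TRAILER_MAGIC, 4) != 0)
        return -1;                       /* nothing was appended */

    uint64_t len = 0;
    for (int i = 7; i >= 0; i--)         /* decode little-endian length */
        len = (len << 8) | t[i];
    if (len > (uint64_t)(end - TRAILER_SIZE))
        return -1;                       /* length points before the file start */

    *offset = end - TRAILER_SIZE - (long)len;
    *length = len;
    return 0;
}
```

On Linux the FILE could come from fopen("/proc/self/exe", "rb"); on Windows the path from GetModuleFileName would be used instead. The concerns raised above about trusting that path still apply.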

Saving data into the program?

Is it possible for one portable program to keep save data inside the application?
I don't want the program to create folders or files.
In order to do it in a portable way, you should make no assumptions about the architecture or operating system: you may or may not have access to the executable in the first place (it could be argv[0], but maybe it isn't). If you do have access to the executable file, you may have the rights to open and modify it, but maybe you don't.
If, anyway, you want to try it, you could:
Check if argv[0] is a file, that you have read and write permission, and if it is really your code (looking for a random string you can leave somewhere in your code, for example).
Choose a string to mark your modifications, for example, "Edenia", and check if the last bytes of that file are those. If so, the file has been previously modified, and you can read your data and process it.
When you want to store additional data, add it to the end of the file (if it was not modified yet), or substitute the modifications it had. Don't forget to add the mark at the end of the file ("Edenia", or whatever).
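The marker check and the append step above might look roughly like this. It is only a sketch: the helper names ends_with_mark and append_marked are invented, it does not handle replacing earlier modifications, and it assumes the path passed in really is your own, writable executable.

```c
#include <stdio.h>
#include <string.h>

#define MARK "Edenia"   /* the marker string suggested above */

/* Returns 1 if the file ends with MARK, 0 if it does not, -1 if it cannot be opened. */
int ends_with_mark(const char *path)
{
    char tail[sizeof MARK - 1];
    FILE *f = fopen(path, "rb");
    int result = 0;

    if (!f)
        return -1;
    /* Seek to where the marker would start; this fails on files shorter than it. */
    if (fseek(f, -(long)sizeof tail, SEEK_END) == 0 &&
        fread(tail, 1, sizeof tail, f) == sizeof tail)
        result = memcmp(tail, MARK, sizeof tail) == 0;
    fclose(f);
    return result;
}

/* Appends `data` followed by MARK. Returns 0 on success, -1 on error. */
int append_marked(const char *path, const void *data, size_t len)
{
    FILE *f = fopen(path, "ab");
    int ok;

    if (!f)
        return -1;
    ok = fwrite(data, 1, len, f) == len &&
         fwrite(MARK, 1, sizeof MARK - 1, f) == sizeof MARK - 1;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

Note that many systems refuse to open a running executable for writing, which is one more reason this is not portable.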
Anyway, I still think this is not the proper way to store data: try to use external storage (files, database, etc) if you can.

Changing inode behaviour

I am trying to modify the ext3 file system. Basically I want to ensure that the inode for a file is saved in the same (or adjacent) block as the file that it stores metadata for. Hopefully this should help disk access performance.
I grabbed the kernel source, compiled it, read a bunch about inodes and looked at the inode.c file in the fs subdirectory. However, I am just not sure how I can ensure that any new file being created, and the inode for this file, can be saved in the same or adjacent blocks. Any help or pointers to further reading would be appreciated. Thanks!
Interesting idea.
I'm not deeply familiar with ext3, but I can give you some general pointers.
Currently ext3 stores inodes in predetermined places. Each block group has its own inode table, an array of inodes. So when you have an inode number (i.e., as the result of looking up a filename in a directory), you can find the corresponding inode on disk by using the inode number first to select the correct block group and then to index into that block group's inode table.
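The lookup described above comes down to a division and a remainder. A small sketch, assuming the usual ext2/ext3 convention that inode numbers start at 1; the struct and function names are made up, and the parameters correspond to the superblock's inodes-per-group and inode-size values.

```c
#include <stdint.h>

/* Where an inode lives on disk (hypothetical helper type). */
struct inode_location {
    uint32_t group;    /* block group holding the inode */
    uint32_t index;    /* slot within that group's inode table */
    uint64_t offset;   /* byte offset of the slot from the table start */
};

/* Map an inode number to its block group and table slot. */
struct inode_location locate_inode(uint32_t ino,
                                   uint32_t inodes_per_group,
                                   uint32_t inode_size)
{
    struct inode_location loc;

    loc.group  = (ino - 1) / inodes_per_group;   /* inode numbers start at 1 */
    loc.index  = (ino - 1) % inodes_per_group;
    loc.offset = (uint64_t)loc.index * inode_size;
    return loc;
}
```

The start of each group's inode table comes from the block group descriptors, which this sketch leaves out. Any scheme that stores inodes next to file data has to replace exactly this mapping.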
If you want to put the inodes next to the corresponding file data, you'll need a new scheme for finding an inode on disk. If you're willing to dedicate a block for each inode, then one possible scheme would be to allocate a new block every time you need an inode and then use the block number as the inode number. This might have the benefit that for small files you could store the data in that same block.
To make something like this happen, creating a new file (i.e., allocating an inode) would have to work very differently than in the current ext3 file system. Instead of using a bitmap to find an unused, pre-allocated and pre-initialized inode, you would have to allocate an empty block and initialize it yourself. So, you'll probably want to look at how the file system allocates blocks when it's writing to a file, then mimic that for allocating an inode.
An alternative scheme would be to store the inode inside the directory. So you save an I/O not because the inode is next to its data, but because when you look up the filename you also read the inode. This was done back in the 90s as an experiment in BSD's FFS file system, and was written up in an excellent USENIX paper. Those ideas never made it into FFS, or into any other mainstream file system that I'm aware of, so it might be interesting to see how they work in ext3.
Regardless of whether you pursue one of these schemes or come up with something of your own, you'll also have to modify mke2fs to initialize the file system on disk in a way that your new file system variant will understand.
Good luck! It sounds like a fun project.
Kudos for getting into file system design!
First, a bit of engineering advice before you get too deep into hacking: make a copy of the ext3 tree and rename the file system to something else. I've found that when introducing experimental changes into a file system, you really don't want it to be used for your main system. Your system should still boot even if you introduce a bug that randomly loses files (it will eventually happen). You'll also need to branch the ext3 userspace tools to work with your new system.
Second, go get a copy of Understanding the Linux Kernel, 3rd ed., by Bovet and Cesati. It presents an organized view of kernel subsystems, and I've found its explanations to be worthwhile. It's written for an older kernel (2.6.x for some x < 15; I forget exactly), but it's still accurate in many places. Read through its descriptions of file systems. I believe it covers ext3.
Third, about your actual project, you aren't proposing a simple modification to ext3. That file system has a pretty straightforward way of mapping an inode number to a disk block. You'll need to find a new way of doing this mapping. I would not anticipate any changes to the rest of ext3. Solving this challenge may be one of the key design points of your architecture. Note that keeping around a big array of inode -> disk block maps doesn't solve your problem: it's probably no better than existing ext3.

how is a file represented on a disk

so I want to ask, and forgive me if this is obvious, or newbie question:
if I create a file, say a text file - save it, (I'm using Ubuntu), so this file I have created, has some extra information associated with it, such as, the place on my hard drive where it has been saved. How to examine this information? Where does this information get stored for my specific file? How to examine the file as it is stored on my disk, I assume in terms of, what, bytes?
Maybe I need to focus this question,
Thanks,
B
This is the responsibility of your file system. Very briefly, a file system is a data structure which is laid out onto your entire disk -- that's what "formatting" a disk does -- and your files are saved into that data structure. There are lots of file systems, and their details vary quite widely. http://www.forensics.nl/filesystems has a whole bunch of papers on file system design and organization. I'd start with McKusick's A Fast File System for UNIX; it's old, but it contains lots of ideas that are still influential today.
You need a filesystem-specific forensics tool if you want to look at the data structures on your disks. Ubuntu's probably using something in the ext2 family, so try debugfs.
I think maybe you do need to focus it a bit :-)
For UNIX file systems, there are many different types.
The one I'm most familiar with (ext2) has a "file" on disk containing directory entries. These entries are simple names and pointers to the file itself (which is why you can have multiple directory entries pointing to the same file, hard links).
The file itself is an inode which contains the properties of the file (owner, size, permissions and so on).
The inode also contains direct and indirect pointers to the contents of the file. By direct, I mean a pointer to a data block.
An indirect pointer is a pointer to a pointer to contents. I believe you can go to another two levels of indirection, which gives you truly massive file sizes.
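To put a number on that, here is a back-of-the-envelope sketch assuming the classic ext2 layout: 12 direct pointers plus one single, one double and one triple indirect pointer, each pointer being 4 bytes. The function name is made up, and the result is only the bound implied by the pointer scheme; other limits in a real file system can make the actual maximum smaller.

```c
#include <stdint.h>

#define DIRECT_POINTERS 12   /* direct block pointers per ext2 inode */
#define POINTER_SIZE    4    /* each block pointer is 32 bits */

/* Bytes addressable through direct, single, double and triple indirection. */
uint64_t max_addressable_bytes(uint64_t block_size)
{
    uint64_t p = block_size / POINTER_SIZE;   /* pointers per indirect block */
    uint64_t blocks = DIRECT_POINTERS + p + p * p + p * p * p;

    return blocks * block_size;
}
```

With 1 KiB blocks this gives 17,247,252,480 bytes (about 16 GiB), and with 4 KiB blocks 4,402,345,721,856 bytes (about 4 TiB).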
More details on Wikipedia.
