What's the difference between switch_root and run_init, besides switch_root being made by busybox while run_init is from klibc?
Thanks very much
They both perform exactly the same function, which is to switch to the "real" root and execv(3) the "real" init(8) program from an initramfs. They both assume that the filesystem that should become the root has been mounted on some directory, which they take as an argument.
(An initramfs is a (usually) temporary in-memory filesystem loaded by the bootloader. Its purpose is to do any setup that might be required before mounting the real root and switching to the real init program.)
Recent source code for run-init can be found here. run_init() is the entry point (called from run-init.c, which parses the arguments).
Recent source code for switch_root can be found here. switch_root_main() is the entry point.
The code is short for both implementations (though a bit tricky), which makes it easy to compare them by eye. The only difference seems to be that they perform slightly different sanity checks, and that recent versions of run-init have an extra option to drop selected capabilities(7) before execv()'ing the new init.
Related
File paths are inherently dubious when working with data.
Lets say I have a hypothetical situation with a program called find_brca, and some data called my.genome and both are in the /Users/Desktop/ directory.
find_brca takes a single argument, a genome, runs for about 4 hours, and returns the probability of that individual developing breast cancer in their lifetime. Some people, presented with a very high % probability, might then immediately have both of their breasts removed as a precaution.
Obviously, in this scenario, it is absolutely vital that /Users/Desktop/my.genome actually contains the genome we think it does. There are no do-overs. "oops we used an old version of the file from a previous backup" or any other technical issue will not be acceptable to the patient. How do we ensure we are analysing the file we think we are analysing?
To make matters trickier, lets also assert that we cannot modify find_brca itself, because we didn't write it, its closed source, proprietary, whatever.
You might think MD5 or other cryptographic checksums might be able to come to the rescue, and while they do help to a degree, you can only MD5 the file before and/or after find_brca has run, but you can never know exactly what data find_brca used (without doing some serious low-level system probing with DTrace/ptrace, etc).
The root of the problem is that file paths do not have a 1:1 relationship with actual data. Only in a filesystem where files can only be requested by their checksum - and as soon as the data is modified its checksum is modified - can we ensure that when we feed find_brca the genome's file path 4fded1464736e77865df232cbcb4cd19, we are actually reading the correct genome.
Are there any filesystems that work like this? If I wanted to create such a filesystem because none currently exists, how would you recommend I go about doing it?
I have my doubts about the stability, but hashfs looks exactly like what you want: http://hashfs.readthedocs.io/en/latest/
HashFS is a content-addressable file management system. What does that mean? Simply, that HashFS manages a directory where files are saved based on the file’s hash. Typical use cases for this kind of system are ones where: Files are written once and never change (e.g. image storage). It’s desirable to have no duplicate files (e.g. user uploads). File metadata is stored elsewhere (e.g. in a database).
Note: Not to be confused with the hashfs, a student of mine did a couple of years ago: http://dl.acm.org/citation.cfm?id=1849837
I would say that the question is a little vague, however, there are several answers which can be given to parts of your questions.
First of all, not all filesystems lack path/data correspondence. On many (if not most) filesystems, the file is identified only by its path, not by any IDs.
Next, if you want to guarantee that the data is not changed while the application handles them, then the approach depends on the filesystem being used and the way this application works with the file (if it keeps it opened or opens and closes the file as needed).
Finally, if you are concerned by the attacker altering the data on the filesystem in some way while the file data are used, then you probably have a bigger problem, than just the file paths, and that problem should be addressed beforehand.
On a side note, you can implement a virtual file system (FUSE on Linux, our CBFS on Windows), which will feed your application with data taken from elsewhere, be it memory, a database or a cloud. This approach answers your question as well.
Update: if you want to get rid of file paths at all and have the data addressed by hash, then probably a NoSQL database, where the hash is the key, would be your best bet.
I have this funny idea: write some data (say variable of integer type) to the end of the executable itself and then read it on the next run.
Is this possible? Is it a bad thing to do (I'm pretty sure it's :) )? How one would approach this problem?
Additional:
I would prefer to do this with C under Linux OS, but answers with any combination of programming language/OS would be appreciated.
EDIT:
After some time playing with the idea, it became apparent that Linux won't allow to write to a file while it's being executed. However, it allows to delete it.
My vision of the writing process at this point:
make a copy of the program from withing a program
append data to the end of the copy
make a program to delete itself
rename copy to the original name
Will try to implement that as soon as I have some time.
If anyone is interested about how "delete itself" works under Linux - look for info about inode. It's not possible to do this under Windows, as far as I know (might be wrong).
EDIT 2:
Have implemented a working example under Linux with C.
It basically use a strategy described above, i.e. appending bits of data to the end of the copy program, deletes itself and renaming program to the original name. It accepts integers to save as single argument in the CLI, and prints old data as well.
This surely won't work under Windows (although I found some options on a quick search), but I'm curious how it's gonna behave under OS X.
Efficiency thoughts:
Obviously copying whole executable isn't efficient. I guess that something faster is possible with another helper executable that will do the same after program stops executing.
It's not reusing old space but just appending new data to the end on each run. This can be fixed with some footer reservation process (maybe will try to implement this in the future)
EDIT 3:
Surprisingly, it works with OS X! (ver. 10.11.5, default gcc).
There are several questions dealing with some aspects of this problem, but neither seems to answer it wholly. The whole problem can be summarized as follows:
You have an already compiled executable (obviously expecting the use of this technique).
You want to add an arbitrarily sized binary data to it (not necessarily by itself which would be another nasty problem to deal with).
You want the already compiled executable to be able to access this added binary data.
My particular use-case would be an interpreter, where I would like to make the user able to produce a single file executable out of an interpreter binary and the code he supplies (the interpreter binary being the executable which would have to be patched with the user supplied code as binary data).
A similar case are self-extracting archives, where a program (the archiving utility, such as zip) is capable to construct such an executable which contains a pre-built decompressor (the already compiled executable), and user-supplied data (the contents of the archive). Obviously no compiler or linker is involved in this process (Thanks, Mathias for the note and pointing out 7-zip).
Using existing questions a particular path of solution shows along the following examples:
appending data to an exe - This deals with the aspect of adding arbitrary data to arbitrary exes, without covering how to actually access it (basically simple append usually works, also true with Unix's ELF format).
Finding current executable's path without /proc/self/exe - In companion with the above, this would allow getting a file name to use for opening the exe, to access the added data. There are many more of these kind of questions, however neither focuses especially on the problem of getting a path suitable for the purpose of actually getting the binary opened as a file (which goal alone might (?) be easier to accomplish - truly you don't even need the path, just the binary opened for reading).
There also may be other, probably more elegant ways around this problem than padding the binary and opening the file for reading it in. For example could the executable be made so that it becomes rather trivial to patch it later with the arbitrarily sized data so it appears "within" it being in some proper data segment? (I couldn't really find anything on this, for fixed size data it should be trivial though unless the executable has some hash)
Can this be done reasonably well with as little deviation from standard C as possible? Even more or less cross-platform? (At least from maintenance standpoint) Note that it would be preferred if the program performing the adding of the binary data didn't rely on compiler tools to do it (which the user might not have), but solutions necessiting those might also be useful.
Note the already compiled executable criteria (the first point in the above list), which requires a completely different approach than solutions described in questions like C/C++ with GCC: Statically add resource files to executable/library or SDL embed image inside program executable , which ask for embedding data compile-time.
Additional notes:
The problems with the obvious approach outlined above and suggested in some comments, that to just append to the binary and use that, are as follows:
Opening the currently running program's binary doesn't seem something trivial (opening the executable for reading is, but not finding the path to supply to the file open call, at least not in a reasonably cross-platform manner).
The method of acquiring the path may provide an attack surface which probably wouldn't exist otherwise. This means that a potential attacker could trick the program to see different binary data (provided by him) like which the executable actually has, exposing any vulnerability which might reside in the parser of the data.
It depends on how you want other systems to see your binary.
Digital signed in Windows
The exe format allows for verifying the file has not been modified since publishing. This would allow you to :-
Compile your file
Add your data packet
Sign your file and publish it.
The advantage of following this system, is that "everybody" agrees your file has not been modified since signing.
The easiest way to achieve this scheme, is to use a resource. Windows resources can be added post- linking. They are protected by the authenticode digital signature, and your program can extract the resource data from itself.
It used to be possible to increase the signature to include binary data. Unfortunately this has been banned. There were binaries which used data in the signature section. Unfortunately this was used maliciously. Some details here msdn blog
Breaking the signature
If re-signing is not an option, then the result would be treated as insecure. It is worth noting here, that appended data is insecure, and can be modified without people being able to tell, but so is the code in your binary.
Appending data to a binary does break the digital signature, and also means the end-user can't tell if the code has been modified.
This means that any self-protection you add to your code to ensure the data blob is still secure, would not prevent your code from being modified to remove the check.
Running module
Windows GetModuleFileName allows the running path to be found.
Linux offers /proc/self or /proc/pid.
Unix does not seem to have a method which is reliable.
Data reading
The approach of the zip format, is to have a directory written to the end of the file. This means the data can be found at the end of the location, and then looked backwards for the start of the data. The advantage here, is the data blob is signposted from the end of the data, rather than the natural start.
I am attempting to modify the bio structure (in blk_types.h) for linux-3.2.0 (running Ubuntu). The only thing I need to do to this structure is to add an additional variable to keep track of an integer variable (it is for a tainting algorithm). However, adding a single line such as "int id;" to the structure halts the boot sequence of the OS.
It compiles, but when booting it gives the following error:
>Gave up wiating for root device. Common problems:
>Boot args
>check rootdelay= ...
>check root= ...
>missing modules (cat /proc/modules; ls /dev)
>ALERT! /dev/disk/by-uuid/15448888-84a0-4ccf-a02a-0feb3f150a84 does not exist. Dropping to a shell!
>BusyBox Built In Shell ...
>(initramfs)
I took a look around using the given shell and could not find the desired file system by uuid or otherwise (no /dev/sda). Any ideas what might be going on?
Thanks,
-Misiu
I suppose you are trying to modify the Linux kernel header bio.h, not its userland "friend" bui.h.
Said that I must warn you that in many places around kernel sizeof() may be used which is more portable and perhaps some other implementation or API may expect some fixed size. If the later is true then you'll have problems since bio' struct size has been changed by you.
It is a guessing with no further investigation from my side (I mean I hadn't investigate about bio in detail) but when patching the Linux kernel one must make sure of any possible side effects and take the whole scenario on account, specially when modifying lower levels implementation.
Bio helper functions do lots of low level operations on bio struct, take a loot at bio_integrity.c for example.
I managed to fix the problem with your help Caf. Though re-building/installing the modules did not seem to help immediately, I was able to get the system to boot by building the SATA drivers into the kernel, as advised by this forum thread: https://unix.stackexchange.com/questions/8405/kernel-cant-find-dev-sda-file-during-boot.
Thanks for your help,
-Misiu
According to MSDN an image is larger with "incremental linked" than without:
An incrementally linked program is functionally equivalent to a program that is nonincrementally linked. However, because it is prepared for subsequent incremental links, an incrementally linked executable (.exe) file or dynamic-link library (DLL) is larger than a nonincrementally linked program ...
I made a few tests in release mode (just to test the impact of this feature) and I don't see differences in the size of the image produced. How to explain this? Is the MSDN information wrong? Has anyone tried this and seen which impact this linking feature has on the (released) image file.
The quote you supplied says an incrementally linked program is larger, not smaller.
The size increases because not all of the executable is relinked every time, so some old code remains in the executable and is just no longer used. The next full (non-incremental) build will remove the no longer used code. You should always do a full build (non-incremental) before actual release to remove the extraneous noise (and reduce your executable size).
There's no way to tell know in advance what the size difference will be specifically, because it depends on the exact code in your application, what you change, how many times you change it, choices the linker makes about whether it can just do that change incrementally or not, how long it's been since the last full build, and a lot of other variable information.