I've been trying to figure this out for days. Clearly, I'm too inexperienced to understand the actual code from various examples, and no matter how hard I try, I cannot find an explanation simple enough to follow. This is really not my cup of tea.
My question is, could I get a link (or answer) that has some very easy-to-understand pseudo-code or explanation of how to do the following:
In a C program, load another ELF executable into memory, set up memory and stack and all other necessary variables, and then execute it.
I understand the basic concepts, but it's just not coming together for me. I've checked many other sources, including here on StackOverflow, and they're all too complicated for my idiot brain to understand.
Thank you.
On 32-bit x86 Linux, execve is syscall number 11, which can be invoked directly with:
long execve(const char *filename, char *const argv[], char *const envp[])
{
    long r;
    /* int 0x80 with eax = 11 (__NR_execve); ebx/ecx/edx carry the arguments */
    asm volatile("int $0x80"
                 : "=a"(r)
                 : "a"(11), "b"(filename), "c"(argv), "d"(envp)
                 : "memory");
    return r;
}
This is essentially how most libc implementations do it (more indirectly, though, with error handling and so on).
To see how the execve syscall works, check out the Linux kernel source.
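For example, a hypothetical use of the wrapper above (assuming it is defined in the same file and that /bin/ls exists on your system):

/* Replace the current process image with /bin/ls. On success this
 * call never returns. */
int main(void)
{
    char *args[] = { "/bin/ls", "-l", NULL };
    char *env[]  = { NULL };
    execve("/bin/ls", args, env);
    return 1;   /* only reached if the syscall failed */
}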
In a C program, load another ELF executable into memory, set up memory and stack and all other necessary variables, and then execute it.
You can't really do this in a C program, because the C program is already loaded into memory (from its own ELF image) and running. The two ELF images are going to conflict with each other, so when you try to map the new image, you'll screw up the old (running) image part way through and things will not work.
The first thing the kernel does when exec'ing an image is that it clears out (empties) the user address space, so the new image can be loaded without any conflicts.
Now that said, you CAN (with careful linking scripts) arrange to build two ELF images with no conflicts, so that the first one can load the second one and both can exist in memory at the same time. That's essentially the way that ld.so (the dynamic linker) works -- it's linked at special addresses so it can coexist with 'normal' programs. But any two 'normal' ELF executables are going to want to live at the same address(es).
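You can see the conflict for yourself by dumping the addresses each executable asks to be loaded at. A rough sketch (assuming 64-bit ELF files on Linux, with error handling omitted):

#include <elf.h>
#include <stdio.h>

/* Print the virtual address ranges requested by an ELF file's PT_LOAD
 * segments. Two traditionally linked (non-PIE) executables will
 * typically request overlapping ranges, which is why one cannot
 * simply be mapped over the other. No error checks. */
int main(int argc, char **argv)
{
    FILE *f = fopen(argv[1], "rb");
    Elf64_Ehdr eh;
    fread(&eh, sizeof eh, 1, f);
    for (int i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        fseek(f, eh.e_phoff + i * eh.e_phentsize, SEEK_SET);
        fread(&ph, sizeof ph, 1, f);
        if (ph.p_type == PT_LOAD)
            printf("PT_LOAD wants 0x%lx .. 0x%lx\n",
                   (unsigned long)ph.p_vaddr,
                   (unsigned long)(ph.p_vaddr + ph.p_memsz));
    }
    fclose(f);
    return 0;
}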
Related
Is it possible to execute a raw binary stored in a char array? I tried doing it like so:
#include "stdio.h"
int main(int argc, char **argv)
{
FILE *f = fopen(argv[1],"r");
if(!f)
return 1;
fseek(f,0,SEEK_END);
long l=ftell(f);
rewind(f);
char *buf = malloc(l+1);
fread(buf,1,l,f);
fclose(f);
void (*func)() = (void(*))buf;
func();
}
but it only gives me segfaults. I'm working on my own OS (from scratch), so I'm trying to get rid of them.
Apologies that this isn't exactly an answer but it's too long to fit as a comment...
I'm going to assume the intent of reading the raw binary into a buffer is to get code bytes into RAM, and that you want to execute those bytes. Let's assume you've got the file I/O fixed, so now you have a buffer with code bytes. There are a few reasons why you could still segfault.
First, does your O/S implement virtual memory with page attributes such as read, write and execute? Most modern O/S's won't let you execute code on a page that isn't marked as code. (Marking pages this way is important to know what can be swapped and also to prevent malicious coding.)
Second, is the binary code you've loaded in fully relocatable? In other words, if the code has any JUMPs in it, are they all relative? If there are any absolute JUMP ops in there, then you need to run through and patch them up to match where your buffer sits in memory.
Third, is the binary code 100% self contained? If it calls out to any external functions then you need to patch those up, too.
Finally, does the binary code need to access data? If so, is all of that data also in the binary, and is it addressed relatively rather than absolutely?
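To address the first point on Linux specifically, the usual trick is to put the bytes in a page you have explicitly asked to be executable. A minimal sketch (the code bytes below are x86-64 for "mov eax, 42; ret" and are purely illustrative; real code bytes and flags may differ on your platform):

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Copy position-independent code bytes into a page marked executable,
 * then call it as a function. */
int main(void)
{
    unsigned char code[] = { 0xb8, 0x2a, 0x00, 0x00, 0x00, 0xc3 };

    void *page = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return 1;

    memcpy(page, code, sizeof code);

    int (*func)(void) = (int (*)(void))page;
    printf("returned %d\n", func());

    munmap(page, 4096);
    return 0;
}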
You might be able to do that but:
You cannot (in general) store your executable in the heap as you're doing it here with malloc (nor on the stack, for the same reason), because if your hardware supports it, your OS probably marks those areas as readable and writable but not executable (or at least it should).
You cannot just take the code of a compiled program, extract it to a file and expect to run it, because it usually needs relocation, importing of dynamic libraries, and setting up another virtual memory area for the variables.
You might be able to do this with a simple handcrafted program which makes a system call to exit(0) or prints "Hello World".
You might be able to use compiled code. For this, you would need to (at least):
compile a self-contained program (no imported dynamic libraries; link the libraries statically and rebuild those statically linked libraries the same way);
with position-independent code (-fpic or -fpie);
without any relocation (maybe -fvisibility=hidden might help?).
If you manage to do this, you might be able to generate a raw file from the PT_LOAD sections of the ELF file. It would probably need to be executable, readable and writable (because you'll have code and data). And you will probably have to prepend an instruction to jump to the entry point which might be in the middle of the file.
You might look at how ld.so is compiled: it is expected to be loaded anywhere in the virtual address space and has a subset of itself which is supposed to be functional before relocations (because ld.so relocates itself, as far as I understand).
But you should probably just try to implement a basic ELF loader instead (and properly handle relocations).
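For a sense of scale, here is a very rough skeleton of what even the most naive loader has to do for a statically linked, non-PIE, 64-bit ELF (no relocations, no proper stack/auxv setup, no error handling; just the shape of it):

#include <elf.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Skeleton loader for a statically linked, non-PIE, 64-bit ELF whose
 * load addresses do not collide with this program's own image.
 * A real loader must also zero the BSS tail, apply relocations,
 * tighten protections to p_flags, and build the initial stack
 * (argc/argv/envp/auxv) that the new program expects. */
int main(int argc, char **argv)
{
    int fd = open(argv[1], O_RDONLY);
    Elf64_Ehdr eh;
    read(fd, &eh, sizeof eh);

    for (int i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        pread(fd, &ph, sizeof ph, eh.e_phoff + i * eh.e_phentsize);
        if (ph.p_type != PT_LOAD)
            continue;

        /* Reserve the page-aligned range the segment asks for, then
         * copy the file contents into place. */
        unsigned long start = ph.p_vaddr & ~0xfffUL;
        unsigned long end   = (ph.p_vaddr + ph.p_memsz + 0xfffUL) & ~0xfffUL;
        mmap((void *)start, end - start,
             PROT_READ | PROT_WRITE | PROT_EXEC,
             MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
        pread(fd, (void *)ph.p_vaddr, ph.p_filesz, ph.p_offset);
    }
    close(fd);

    /* Jump to the entry point; with no proper stack setup this only
     * works for trivial hand-crafted programs. */
    ((void (*)(void))eh.e_entry)();
    return 0;
}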
Problem:
run a non-trivial C program, stored in the heap or data section of another C program, as machine instructions.
My progress:
Ran a set of simple instructions that print something to stdout. The instructions are stored on the heap; I marked the page containing them as executable and then called into the raw data as though it were a function. This worked fine.
Next up, given any statically linked C program, I want to read its binary and run its main function while it is in memory, from another C program.
I believe the issues are:
* jumping to where the main function code is
* changing the binary file's addresses which were created when linking so they are relative to where the code lies now in memory
Please let me know if my approach is good or whether I missed something important and what is the best way to go about it.
Thank you
Modern OSes try not to let you execute code in your data exactly because it's a security nightmare. http://en.wikipedia.org/wiki/No-execute_bit
Even if you get past that, there will be lots more 'gotchas', because both programs will think that they 'own' the stack/heap/etc. Once the new program executes, various bits of RAM from the old program will get stomped on. (exec exists just for this reason, to cleanly go from one program to another.)
If you really need to load code, you should make the first one a library, then use dlopen to run it. (You can use objcopy to extract just the subroutine you want and turn it into a library.)
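A minimal sketch of the dlopen route (the library name libplugin.so and the exported function run are placeholders here; link with -ldl on older glibc versions):

#include <dlfcn.h>
#include <stdio.h>

/* Load a shared library at run time and call a function from it. */
int main(void)
{
    void *handle = dlopen("./libplugin.so", RTLD_NOW);
    if (!handle) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return 1;
    }

    void (*run)(void) = (void (*)(void))dlsym(handle, "run");
    if (!run) {
        fprintf(stderr, "dlsym: %s\n", dlerror());
        dlclose(handle);
        return 1;
    }

    run();
    dlclose(handle);
    return 0;
}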
Alternately, you can start the program (in another process) and use ptrace to inject a little bit of your code into its process to control it.
(If you're really trying to get into shell code, you should have said so. That's a whole 'nother can of worms.)
This question is about Mac OS Classic, which has been obsolete for several years now. I hope someone still knows something about it!
I've been building a PEF executable parser for the past few weeks and I've plugged a PowerPC interpreter to it. With a good dose of wizardry, I would expect to be able to run (to some extent) some Mac OS 9 programs under Mac OS X. In fact, I'm now ready to begin testing with small applications.
To help me with that, I have installed an old version of Mac OS inside SheepShaver and downloaded the (now free) MPW Tools1, and I built a "hello world" MPW tool (just your classic puts("Hello World!") C program, except compiled for Mac OS 9).
When built, this generates a program with a code section and a data section. I expected that I would be able to just jump to the main symbol of the executable (as specified in the header of the loader section), but I hit a big surprise: the compiler placed the main symbol inside the data section.
Obviously, there's no executable code in the data section.
Going back to the Mac OS Runtime Architectures document (published in 1997, surprisingly still up on Apple's website), I found out that this is totally legal:
Using the Main Symbol as a Data Structure
As mentioned before, the main symbol does not have to point to a routine, but can point to a block of data instead. You can use this fact to good effect with plug-ins, where the block of data referenced by the main symbol can contain essential information about the plug-in. Using the main symbol in this fashion has several advantages:
The Code Fragment Manager returns the address of the main symbol when you programmatically prepare a fragment, so you do not need to call FindSymbol.
You do not have to reserve and document the specific name of an export for your plug-in.
However, not having a specific symbol name means that the plug-in’s purpose is not quite as obvious. A plug-in can store its name, icon, or information about its symbols in the main symbol data structure. Storing symbolic information in this fashion eliminates the need for multiple FindSymbol calls.
My conclusion, therefore, is that MPW tools run as plugins inside the MPW shell, and that the executable's main symbol points to some data structure that should tell it how to start.
But that still doesn't help me figure out what's in that data structure, and just looking at its hex dump has not been very instructive (I have an idea where the compiler put the __start address for this particular program, but that's definitely not enough to make a generic MPW shell "replacement"). And obviously, most valuable information sources on this topic seem to have disappeared with Mac OS 9 in 2004.
So, what is the format of the data structure pointed by the main symbol of a MPW tool?
1. Apparently, Apple very recently pulled the plug on the FTP server that I got the MPW Tools from, so it probably is not available anymore (though a Google search for "MPW_GM.img.bin" does find some alternatives).
As it turns out, it's not too complicated. That "data structure" is simply a transition vector.
I didn't realize it right away because of bugs in my implementation of the relocation virtual machine that made these two pointers look like garbage.
Transition vectors are structures that contain (in this order) an entry point (4 bytes) and a "table of contents" offset (4 bytes). This offset should be loaded into register r2 before executing the code pointed to by the entry point.
(The Mac OS Classic runtime only uses the first 8 bytes of a transition vector, but they can technically be of any size. The address of the transition vector is always passed in r12 so the callee may access any additional information it would need.)
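In C terms, the structure the main symbol points at looks roughly like this (field names are my own; only the first eight bytes matter to the classic runtime):

#include <stdint.h>

/* Layout of a PowerPC CFM transition vector as described above. */
struct transition_vector {
    uint32_t entry_point;  /* address of the first instruction to execute    */
    uint32_t toc;          /* value to load into r2 (the TOC/RTOC pointer)    */
    /* ...optionally more data; r12 holds the vector's own address,
       so the callee can reach anything placed after these two words. */
};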
How do I inspect which parts of my memory the heap, stack, etc. lie in? I am currently looking at a program in C, and by looking at the .elf file I can see what memory addresses the program uses, but I don't know whether they are in the heap or the stack.
That's quite hard to know from a static analysis of the compiled code itself. You should be able to see any static initialized data areas, and also static uninitialized (BSS) sections, but exactly how those are loaded with respect to stack, heap and so on is down to the platform's executable loader.
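At run time on Linux you can get a more direct answer by printing the addresses of a few representative objects and comparing them against /proc/self/maps, which labels the [heap] and [stack] regions. A rough sketch (Linux-specific; other systems have their own tools, e.g. vmmap on macOS):

#include <stdio.h>
#include <stdlib.h>

int global_var;            /* lands in BSS (static, uninitialized)      */
int global_init = 42;      /* lands in the initialized data section     */

int main(void)
{
    int local_var;                            /* lives on the stack */
    int *heap_var = malloc(sizeof *heap_var); /* lives on the heap  */

    printf("code  : %p\n", (void *)main);
    printf(".data : %p\n", (void *)&global_init);
    printf(".bss  : %p\n", (void *)&global_var);
    printf("heap  : %p\n", (void *)heap_var);
    printf("stack : %p\n", (void *)&local_var);

    /* Dump the process memory map; the [heap] and [stack] lines show
       the ranges the kernel has assigned. */
    char line[256];
    FILE *maps = fopen("/proc/self/maps", "r");
    while (maps && fgets(line, sizeof line, maps))
        fputs(line, stdout);
    if (maps)
        fclose(maps);

    free(heap_var);
    return 0;
}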
If you are working on an embedded platform, you should probably use a linker script (an lcf file) when building the program; then you can identify in detail all the sections (stack, heap, intvec, bss, text, code), their placement in memory (whether in L1 cache, L2 cache or DDR), and their start/end addresses when loading onto the board.
The thing is, have a look at the linker manual (you can find it in the compiler installation directory) for a proper understanding of the keywords in the lcf.
There is also one more way to analyse the sections: you can generate a "map file" for your project and go through it. It will list all the sections in the program and their addresses.
You could try using OllyDbg, which is a free debugger. The one drawback to this is that it shows everything in assembly form, but it will show you what's in your stack, your heap, and even what is in your registers. I'm not sure if this is what you are looking for.
I had a little too much time on my hands and started wondering if I could write a self-modifying program. To that end, I wrote a "Hello World" in C, then used a hex editor to find the location of the "Hello World" string in the compiled executable. Is it possible to modify this program to open itself and overwrite the "Hello World" string?
char* str = "Hello World\n";
int main(int argc, char* argv) {
printf(str);
FILE * file = fopen(argv, "r+");
fseek(file, 0x1000, SEEK_SET);
fputs("Goodbyewrld\n", file);
fclose(file);
return 0;
}
This doesn't work, I'm assuming there's something preventing it from opening itself since I can split this into two separate programs (A "Hello World" and something to modify it) and it works fine.
EDIT: My understanding is that when the program is run, it's loaded completely into RAM. So the executable on the hard drive is, for all intents and purposes, a copy. Why would it be a problem for it to modify itself?
Is there a workaround?
Thanks
On Windows, when a program is run the entire *.exe file is mapped into memory using the memory-mapped-file functions in Windows. This means that the file isn't necessarily all loaded at once, but instead the pages of the file are loaded on-demand as they are accessed.
When the file is mapped in this way, another application (including itself) can't write to the same file to change it while it's running. (Also, on Windows the running executable can't be renamed either, but it can on Linux and other Unix systems with inode-based filesystems).
It is possible to change the bits mapped into memory, but if you do this the OS does it using "copy-on-write" semantics, which means that the underlying file isn't changed on disk, but a copy of the page(s) in memory is made with your modifications. Before being allowed to do this though, you usually have to fiddle with protection bits on the memory in question (e.g. VirtualProtect).
At one time, it used to be common for low-level assembly programs that were in very constrained memory environments to use self-modifying code. However, nobody does this anymore because we're not running in the same constrained environments, and modern processors have long pipelines that get very upset if you start changing code from underneath them.
If you are using Windows, you can do the following:
Step-by-Step Example:
Call VirtualProtect() on the code pages you want to modify, with the PAGE_WRITECOPY protection.
Modify the code pages.
Call VirtualProtect() on the modified code pages, with the PAGE_EXECUTE protection.
Call FlushInstructionCache().
For more information, see How to Modify Executable Code in Memory (Archived: Aug. 2010)
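Roughly, those steps look like this in C (a sketch only; target, newBytes and size are placeholders for whatever instructions you actually intend to patch):

#include <windows.h>
#include <string.h>

/* Overwrite executable bytes in the current process following the
 * steps above. */
BOOL PatchCode(void *target, const void *newBytes, SIZE_T size)
{
    DWORD oldProtect;

    /* 1. Make the page(s) writable (copy-on-write). */
    if (!VirtualProtect(target, size, PAGE_WRITECOPY, &oldProtect))
        return FALSE;

    /* 2. Modify the code. */
    memcpy(target, newBytes, size);

    /* 3. Restore execute permission. */
    if (!VirtualProtect(target, size, PAGE_EXECUTE, &oldProtect))
        return FALSE;

    /* 4. Make sure the CPU doesn't execute stale bytes. */
    FlushInstructionCache(GetCurrentProcess(), target, size);
    return TRUE;
}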
It is very operating system dependent. Some operating systems lock the file, so you could try to cheat by making a new copy of it somewhere, but then you're just running another copy of the program.
Other operating systems do security checks on the file (e.g. the iPhone), so writing to it would be a lot of work; besides, the file is read-only.
With other systems you might not even know where the file is.
All present answers more or less revolve around the fact that today you cannot easily do self-modifying machine code anymore. I agree that that is basically true for today's PCs.
However, if you really want to see your own self-modifying code in action, you have some possibilities available:
Try out microcontrollers; the simpler ones do not have advanced pipelining. The cheapest and quickest choice I found is an MSP430 USB-Stick
If an emulation is ok for you, you can run an emulator for an older non-pipelined platform.
If you wanted self-modifying code just for the fun of it, you can have even more fun with self-destroying code (more exactly enemy-destroying) at Corewars.
If you are willing to move from C to say a Lisp dialect, code that writes code is very natural there. I would suggest Scheme which is intentionally kept small.
If we're talking about doing this in an x86 environment, it shouldn't be impossible. It should be used with caution, though, because x86 instructions are variable-length. A long instruction may overwrite the following instruction(s), and a shorter one will leave residual data from the overwritten instruction, which should be NOPed out (filled with NOP instructions).
When the x86 first gained protected mode, the Intel reference manuals recommended the following method for debugging access to XO (execute-only) areas:
create a new, empty selector ("high" part of far pointers)
set its attributes to that of the XO area
the new selector's access properties must be set RO DATA if you only want to look at what's in it
if you want to modify the data the access properties must be set to RW DATA
So the answer to the problem is in the last step. The RW is necessary if you want to be able to insert the breakpoint instruction which is what debuggers do. More modern processors than the 80286 have internal debug registers to enable non-intrusive monitoring functionality which could result in a breakpoint being issued.
Windows made available the building blocks for doing this starting with Win16. They are probably still in place. I think Microsoft calls this class of pointer manipulation "thunking."
I once wrote a very fast 16-bit database engine in PL/M-86 for DOS. When Windows 3.1 arrived (running on 80386s) I ported it to the Win16 environment. I wanted to make use of the 32-bit memory available but there was no PL/M-32 available (or Win32 for that matter).
To solve the problem, my program used thunking in the following way:
defined 32-bit far pointers (sel_16:offs_32) using structures
allocated 32-bit data areas (i.e., larger than 64 KB) using global memory and received them in 16-bit far pointer (sel_16:offs_16) format
filled in the data in the structures by copying the selector, then calculating the offset using 16-bit multiplication with 32-bit results.
loaded the pointer/structure into es:ebx using the instruction size override prefix
accessed the data using a combination of the instruction size and operand size prefixes
Once the mechanism was bug free it worked without a hitch. The largest memory areas my program used were 2304*2304 double precision which comes out to around 40MB. Even today, I would call this a "large" block of memory. In 1995 it was 30% of a typical SDRAM stick (128 MB PC100).
There are non-portable ways to do this on many platforms. In Windows you can do this with WriteProcessMemory(), for example. However, in 2010 it's usually a very bad idea to do this. This isn't the days of DOS where you code in assembly and do this to save space. It's very hard to get right, and you're basically asking for stability and security problems. Unless you are doing something very low-level like a debugger I would say don't bother with this, the problems you will introduce are not worth whatever gain you might have.
Self-modifying code modifies itself in memory, not in the file (that's what run-time unpackers such as UPX do). Also, the on-disk representation of a program is more difficult to work with because of relative virtual addresses, possible relocations, and the header modifications needed for most updates (e.g. changing "Hello world!" to a longer string means extending the data segment in the file).
I suggest that you first learn to do it in memory. For file updates, the simplest and most generic approach is to run a copy of the program so that it can modify the original.
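As a sketch of that copy-then-patch idea (Unix-flavoured, not hardened; the 0x1000 offset and the replacement string are just the placeholders from the question and would differ for any other binary):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Started normally, the program clones its own file to ./patcher and
 * replaces itself with that clone, passing the original path. The
 * clone then reopens the original file and overwrites bytes at OFFSET. */
#define OFFSET 0x1000

int main(int argc, char **argv)
{
    if (argc == 2) {                       /* we are the clone: patch */
        FILE *f = fopen(argv[1], "r+b");
        if (!f)
            return 1;
        fseek(f, OFFSET, SEEK_SET);
        fputs("Goodbyewrld\n", f);
        fclose(f);
        return 0;
    }

    /* We are the original: copy ourselves, then hand control over. */
    char cmd[512];
    snprintf(cmd, sizeof cmd, "cp '%s' ./patcher", argv[0]);
    if (system(cmd) != 0)
        return 1;
    execl("./patcher", "./patcher", argv[0], (char *)NULL);
    return 1;                              /* only reached if execl failed */
}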
EDIT: And don't forget about the main reasons the self-modifying code is used:
1) Obfuscation, so that the code that is actually executed isn't the code you'll see with simple static analysis of the file.
2) Performance, something like JIT.
None of them benefits from modifying the executable.
If you're operating on Windows, I believe it locks the file to prevent it from being modified while it's being run. That's why you often need to exit a program in order to install an update. The same is not true on a Linux system.
On newer versions of Windows CE (at least 5.x and newer), where apps run in user space (compared to earlier versions where all apps ran in supervisor mode), an app cannot even read its own executable file.