Is it possible to execute a raw binary stored in a char array? I tried doing it like so:
#include "stdio.h"
int main(int argc, char **argv)
{
FILE *f = fopen(argv[1],"r");
if(!f)
return 1;
fseek(f,0,SEEK_END);
long l=ftell(f);
rewind(f);
char *buf = malloc(l+1);
fread(buf,1,l,f);
fclose(f);
void (*func)() = (void(*))buf;
func();
}
but it only gives me segfaults. I'm working on my own OS (from scratch), so I'm getting rid of them.
Apologies that this isn't exactly an answer but it's too long to fit as a comment...
I'm going to assume the intent of reading the raw binary file into a buffer is to get code bytes into RAM, and that you want to execute those bytes. Let's assume you've got the file I/O fixed, so you now have a buffer of code bytes. There are a few reasons why you could still segfault.
First, does your O/S implement virtual memory with page attributes such as read, write and execute? Most modern O/S's won't let you execute code on a page that isn't marked executable. (Marking pages this way is important for knowing what can be swapped and also for preventing malicious code.) If that's the case, the code bytes have to end up in an executable mapping, as in the sketch after this list.
Second, is the binary code you've loaded fully relocatable? In other words, if the code has any JUMPs in it, are they all relative? If there are any absolute JUMP ops in there, then you need to run through and patch them up to match where your buffer sits in memory.
Third, is the binary code 100% self-contained? If it calls out to any external functions then you need to patch those up, too.
Finally, does the binary code need to access data? If so, is all the data also in the binary, and is it addressed relatively rather than absolutely?
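To illustrate the first point, here is a minimal sketch (assuming a POSIX system with mmap, and assuming the bytes are fully self-contained, position-independent code) of putting the buffer into an executable mapping before calling it:

#include <string.h>
#include <sys/mman.h>

/* Sketch only: copies already-loaded code bytes into an executable mapping.
 * Some hardened systems refuse writable+executable mappings outright. */
void run_code(const unsigned char *buf, size_t len)
{
    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE | PROT_EXEC,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return;
    memcpy(mem, buf, len);
    ((void (*)(void))mem)();   /* only valid if the code needs no fixups */
    munmap(mem, len);
}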
You might be able to do that but:
You cannot (in general) store your executable in the heap as you're doing here with malloc (nor on the stack, for the same reason), because if your hardware supports it, your OS probably marks those areas as readable and writable but not executable (or at least it should).
You cannot just take the code of a compiled program, extract it to a file and expect to run it, because it usually needs relocation, importing of dynamic libraries, and setting up of another virtual memory area for its variables.
You might be able to do this with a simple handcrafted program which makes a system call to exit(0) or prints "Hello World".
You might be able to use compiled code. For this, you would need to (at least):
compile a self-contained program (no imported dynamic libraries; link the libraries statically, and recompile those statically linked libraries too);
with position-independent code (-fpic or -fpie);
without any relocations (maybe -fvisibility=hidden might help?).
If you manage to do this, you might be able to generate a raw file from the PT_LOAD sections of the ELF file. It would probably need to be executable, readable and writable (because you'll have code and data). And you will probably have to prepend an instruction to jump to the entry point which might be in the middle of the file.
You might look at how ld.so is compiled: it is expected to be loaded anywhere in the virtual address space, and it has a subset of itself that is supposed to be functional before relocations are applied (because ld.so relocates itself, as far as I understand).
But you should probably just try to implement a basic ELF loader instead (and properly handle relocations).
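For what it's worth, the PT_LOAD part of such a loader is not huge. Here is a very rough sketch, assuming a 64-bit ELF image already read into memory, Linux mmap, and no relocation or per-segment permission handling:

#include <elf.h>
#include <string.h>
#include <sys/mman.h>

/* Map each PT_LOAD segment of a non-PIE ELF image at its p_vaddr.
 * MAP_FIXED will happily clobber whatever is already mapped there,
 * which is exactly the loader-vs-binary conflict discussed elsewhere. */
void *load_segments(const unsigned char *image)
{
    const Elf64_Ehdr *eh = (const Elf64_Ehdr *)image;
    const Elf64_Phdr *ph = (const Elf64_Phdr *)(image + eh->e_phoff);

    for (int i = 0; i < eh->e_phnum; i++) {
        if (ph[i].p_type != PT_LOAD)
            continue;
        unsigned long page = ph[i].p_vaddr & ~0xfffUL;
        size_t span = (ph[i].p_vaddr - page) + ph[i].p_memsz;
        if (mmap((void *)page, span, PROT_READ | PROT_WRITE | PROT_EXEC,
                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0) == MAP_FAILED)
            return NULL;
        memcpy((void *)ph[i].p_vaddr, image + ph[i].p_offset, ph[i].p_filesz);
        /* p_memsz > p_filesz is the .bss part; MAP_ANONYMOUS already zeroed it. */
    }
    return (void *)eh->e_entry;   /* jump here yourself once everything is mapped */
}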
Related
I need to be able to unmap files that were opened through some libraries I'm linking with. The reason for needing to do this is that the mappings made by these libraries hold references to modules that may need to be reloaded while the program is executing (potentially long running executions). The problem is the modules cannot be unloaded while my process is holding a reference.
I've written C code to parse the information in proc/self/maps in an effort to read the address range of the mappings and calculate its length. I've calculated the length by subtracting the starting address from the ending address then I pass the starting address and the calculated length as the respective parameters to munmap. The problem is that munmap fails with EINVAL (Invalid Argument).
I've checked the size of the page my machine uses with sysconf(_SC_PAGESIZE) and it returned 4096, which is the value of my calculated length. The GNU manual says that munmap can fail with EINVAL if:
The memory range given was outside the user mmap range or wasn't page aligned.
Am I missing something, or is this not possible at all? My last resort would be to carefully comb through the system calls made and examine each mmap via strace, but I'd like for this to be a last resort. Thanks.
The method you have chosen will not work, but, not necessarily for the reason(s) you think. However, there is a way, so read on ...
I need to be able to unmap files that were opened through some libraries I'm linking with.
Once you've let the ELF loader (e.g. ld-linux.so) load the libraries for you, you've lost control.
You cannot just unmap the area [regardless of method]. The loader has already done relocations and symbol linkage for these libraries. The unmap removes the area, but now everything breaks, because the various pointers set up by the linker point to empty space. The loader has no knowledge of what you've done.
Then, how would you remap the new version of the library [and where in memory]? Even remapping it to the same address is no guarantee because you can't adjust the things that the loader has already done.
The reason for needing to do this is that the mappings made by these libraries hold references to modules that may need to be reloaded while the program is executing (potentially long running executions).
Most programs that need to update to new versions of libraries simply reexec themselves. If you need to preserve the data, you can work out a dump/restore mechanism.
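A sketch of the re-exec idea (assuming Linux, where /proc/self/exe points at the running binary):

#include <unistd.h>

/* Replace the running process with a fresh copy of itself, which
 * naturally picks up new library versions when the loader runs again. */
void reexec_self(char **argv, char **envp)
{
    execve("/proc/self/exe", argv, envp);
    /* only reached if execve failed */
}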
However, if you truly want to unload/load a newer library version, you can do this using dynamic linking.
Instead of using ld to link with (e.g.) libA, leave it off the ld command line and have the program do its own loading of libA.
You use dlopen/dlsym/dlclose to open/load the library under your control.
You'll have to keep track of the symbol tables, but changing to the new version is easy. When you want the new version, simply do a dlclose and then a dlopen. You'll have to redo the dlsym calls to get the updated addresses, but all of this is fairly easy and standard.
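A minimal sketch of that flow (the library name libA.so and the symbol do_work are made-up placeholders; depending on your libc you may need to link with -ldl):

#include <dlfcn.h>
#include <stdio.h>

int main(void)
{
    void *handle = dlopen("./libA.so", RTLD_NOW);
    if (!handle) { fprintf(stderr, "%s\n", dlerror()); return 1; }

    int (*do_work)(void) = (int (*)(void))dlsym(handle, "do_work");
    if (do_work)
        do_work();

    /* Switch to a newer version: close, reopen, redo every dlsym. */
    dlclose(handle);
    handle = dlopen("./libA.so", RTLD_NOW);
    do_work = (int (*)(void))dlsym(handle, "do_work");
    if (do_work)
        do_work();

    dlclose(handle);
    return 0;
}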
The problem is the modules cannot be unloaded while my process is holding a reference.
The reason is that the ELF loader did this. With dlopen et al., you don't have the same problem.
Yes, it is possible. I had a closer look at my code; there was something I was misunderstanding when it came to passing the start address of the mapping to munmap. I had initially read the start address into an unsigned long long and, for some reason, I was converting this value into a hex string instead of just casting it to a void pointer when calling munmap. In essence:
/* Values assigned here are really read from /proc/self/maps */
unsigned long long vm_start = 140013986873344;
unsigned long long vm_end = 140013986877440;
unsigned long long length = vm_end - vm_start;
munmap((void *)vm_start, length);
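For completeness, here is a sketch of going from a /proc/self/maps line to a munmap call (the helper name and the pathname filter are just illustrative):

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Find the first mapping whose maps line mentions `needle` and unmap it. */
int unmap_first_matching(const char *needle)
{
    FILE *maps = fopen("/proc/self/maps", "r");
    char line[512];

    if (!maps)
        return -1;
    while (fgets(line, sizeof line, maps)) {
        unsigned long long start, end;
        if (strstr(line, needle) &&
            sscanf(line, "%llx-%llx", &start, &end) == 2) {
            fclose(maps);
            return munmap((void *)start, (size_t)(end - start));
        }
    }
    fclose(maps);
    return -1;
}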
I'm trying to write a basic userspace ELF loader that should be able to load statically linked (not dynamically linked), non-relocatable binaries (i.e. not built with -pie, -fPIE and so on). It should work on x86 CPUs for now.
I've followed the code on loading an ELF file in C in user space and it works well when the executable is relocatable, but, as expected, it completely fails if it isn't, since the program is loaded into the wrong virtual memory range and instantly crashes.
I tried modifying it to load the program at the virtual address it expects (using phdr.p_vaddr), but I ran into a complication: my loader is already using that virtual memory range! I can't mmap it, much less write anything into it. How do I proceed so that I can load my non-relocatable binary into my loader's address space without overwriting the loader's own code before it's finished? Do I need to get my loader to run from a completely different virtual memory range, perhaps by getting the linker to link it well above the usual virtual memory range for a non-relocatable binary (which happens to start at 0x400000 in my case), or is there some trick to it?
I've read the ELF documentation (I am working with ELF64 here by the way, but I think ELF32 and ELF64 are very similar) and a lot of documents on the web and I still don't get it.
Can someone explain how an ELF loader deals with this particular complication? Thanks!
Archimedes shouted "eureka" when he discovered that only one object can be at a given location. If your ELF binary must be at one location because you can't rebuild it for another, you have to relocate the loader itself.
The non-relocatable ELF doesn't include enough information to move it to a different address. You could probably write a decompiler that detects all address references in the code, but it's not worth it: you will have problems when you try to analyze data references, such as pointers stored in pre-initialized variables.
Rewrite the loader if you can't get the source code of your ELF binary or a relocatable version of it.
BTW: Archimedes' eureka was deadly for the goldsmith who cheated. I hope it's not so expensive in your case.
I found a C code that looks like this:
#include <stdio.h>
char code[] =
"\x31\xd2\xb2\x30\x64\x8b\x12\x8b\x52\x0c\x8b\x52\x1c\x8b\x42"
"\x08\x8b\x72\x20\x8b\x12\x80\x7e\x0c\x33\x75\xf2\x89\xc7\x03"
"\x78\x3c\x8b\x57\x78\x01\xc2\x8b\x7a\x20\x01\xc7\x31\xed\x8b"
"\x34\xaf\x01\xc6\x45\x81\x3e\x46\x61\x74\x61\x75\xf2\x81\x7e"
"\x08\x45\x78\x69\x74\x75\xe9\x8b\x7a\x24\x01\xc7\x66\x8b\x2c"
"\x6f\x8b\x7a\x1c\x01\xc7\x8b\x7c\xaf\xfc\x01\xc7\x68\x72\x6c"
"\x64\x01\x68\x6c\x6f\x57\x6f\x68\x20\x48\x65\x6c\x89\xe1\xfe"
"\x49\x0b\x31\xc0\x51\x50\xff\xd7";
int main(void)
{
int (*func)();
func = (int(*)()) code;
(int)(*func)();
return 0;
}
For the given HEX CODE this program runs well, printing "HelloWorld". I was thinking that the HEX CODE is just machine instructions, and that by calling a function pointer that points to that CODE we are executing it.
Was my thought right? Is there something to improve?
How does this HEX CODE get generated?
Thanks in advance.
You are correct that by forcing a function pointer like this you are calling into machine instructions written as a hexadecimal string variable.
I doubt that a program like this would work on any CPU since about 2005.
On most RISC CPUs (like ARM) and on all Intel and AMD CPUs that support 64-bit, memory pages have a No Execute bit. Or in reverse an Execute bit.
On memory pages that do not have an Execute bit, the CPU will not run code. Compilers do not put variables into executable memory pages.
In order to run injected shell code, attackers now have to use "return into libc" or function-pointer-overwrite attacks which set things up to call mprotect or VirtualProtect to set the execute bit on their shell code. Either that, or get it injected into an executable space such as the ones the Java, .NET, or JavaScript JIT compilers use.
Security hardened kernels will deny the ability to call mprotect. Once the program's address space is set by the dynamic library loader, it sets a security flag and no new executable pages can be created.
In order to make it always work you could allocate some executable read/write memory (with something like VirtualAlloc or mmap rather than plain malloc), put the code in there, and then execute it. Then there won't be any access-violation faults.
#include <windows.h>
#include <cstring>
#include <iostream>

extern char code[];   // the shellcode array from the question

int main(int argc, char** argv)
{
    void* PointerToNewMemoryRegion = NULL;
    void (*FunctionPointer)();

    // NULL lets the system choose the address; the page is readable,
    // writable and executable so the copied code can run from it
    PointerToNewMemoryRegion = VirtualAlloc(NULL, 113,
                                            MEM_COMMIT | MEM_RESERVE,
                                            PAGE_EXECUTE_READWRITE);
    if (PointerToNewMemoryRegion == NULL)
    {
        std::cout << "Failed to allocate memory region, error code: " << GetLastError();
        return 1;
    }
    memcpy(PointerToNewMemoryRegion, code, 113);
    FunctionPointer = (void (*)()) PointerToNewMemoryRegion;
    (*FunctionPointer)();
    VirtualFree(PointerToNewMemoryRegion, 113, MEM_DECOMMIT);
    return 0;
}
But the shellcode never returns to my code, so the last line is never executed; in effect my code leaks that memory.
To ask this question from a "general C" point of view isn't all that meaningful.
First of all, your code has many major problems:
The literal "\xFF\xFF\xFF" equals 0xFFFFFF00, not 0x00FFFFFF as may or may not have been the intention.
What this hex code means and if it is at all meaningful, is endian-dependent and also depends on the address bus width of the given CPU.
As others have mentioned, casts between function pointers and regular pointers aren't supported or well-defined by C; the C standard lists them as a "common extension".
That being said, code like this has about one single purpose, and that is various forms of boot loaders and self-updating software used in embedded systems.
Suppose for example that you have a boot loader program that is tasked with re-programming something in the very same segment of flash memory where said program itself is executed from. That is impossible because of the way the memory hardware works. So in order to do so, you would have to execute the actual flash programming routine from RAM. Since the array of hex gibberish is stored in RAM, the program can execute from there with the function pointer trick, assuming that the C compiler has a non-standard extension that allows the cast.
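A sketch of that boot-loader trick might look like the following; the routine name, the size constant and the legality of the cast are all toolchain-specific assumptions:

#include <string.h>

extern void flash_write_routine(void);      /* lives in flash */
#define FLASH_ROUTINE_SIZE 128              /* hypothetical size of that routine */

static unsigned char ram_copy[FLASH_ROUTINE_SIZE];   /* lives in RAM */

void program_flash_from_ram(void)
{
    /* Copy the routine's machine code into RAM, then call it there,
     * so the flash bank being reprogrammed is not being executed from. */
    memcpy(ram_copy, (const void *)flash_write_routine, FLASH_ROUTINE_SIZE);
    ((void (*)(void))ram_copy)();            /* relies on a compiler extension */
}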
As for how to generate the code, you either write it all in assembler and then translate the assembler instructions to op codes manually (very tedious), or, more likely, you write the function in C, disassemble it, and copy/paste the op codes from the disassembly.
The latter is more dangerous, though, as the critical part of getting code like this to work is the calling convention: you must be absolutely sure that the function stacks/unstacks things properly when it is called and when it is done, restoring the contents of any CPU registers used, etc. That may force you to write part of the function in assembler anyhow. Needless to say, the code will be completely non-portable.
This question already has an answer here: which part of ELF file must be loaded into the memory?
I've been trying to figure this out for days. Clearly, I'm too inexperienced to understand the actual code from various examples, and no matter how hard I try, I cannot find an explanation simple enough to follow. This is really not my cup of tea.
My question is, could I get a link (or answer) that has some very easy-to-understand pseudo-code or explanation of how to do the following:
In a C program, load another ELF executable into memory, set up the memory, stack and all other necessary variables, and then execute it.
I understand the basic concepts, but it's just not coming together for me. I've checked many other sources, including here on StackOverflow, and they're all too complicated for my idiot brain to understand.
Thank you.
On Linux x86 execve is syscall number 11, which can be called with:
long execve(const char *filename, char *const argv[], char *const envp[])
{
    long r;
    /* 32-bit x86 only: syscall number in eax, arguments in ebx/ecx/edx */
    asm volatile("int $0x80"
                 : "=a"(r)
                 : "a"(11), "b"(filename), "c"(argv), "d"(envp)
                 : "memory");
    return r;
}
This is roughly how most libc implementations do it (though more indirectly, with error handling, etc.).
To see how the execve syscall works, check out the linux kernel source.
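A hypothetical caller for the wrapper above (32-bit x86 Linux assumed; /bin/echo just as an example):

int main(void)
{
    char *args[] = { "/bin/echo", "hello", (char *)0 };
    char *envp[] = { (char *)0 };

    execve("/bin/echo", args, envp);   /* the wrapper defined above */
    return 1;                          /* only reached if execve failed */
}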
In a c program, load another ELF executable into memory, set up memory and stack and all other necessary variables, and then execute it.
You can't really do this in a C program, because the C program is already loaded into memory (from its own ELF image) and running. The two ELF images are going to conflict with each other, so when you try to map the new image, you'll screw up the old (running) image part way through and things will not work.
The first thing the kernel does when exec'ing an image is that it clears out (empties) the user address space, so the new image can be loaded without any conflicts.
Now, that said, you CAN (with careful linker scripts) arrange to build two ELF images with no conflicts, so that the first one can load the second one and both can exist in memory at the same time. That's essentially the way ld.so (the dynamic linker) works: it's linked at special addresses so it can coexist with 'normal' programs. But any two 'normal' ELF executables are going to want to live at the same address(es).
I had a little too much time on my hands and started wondering if I could write a self-modifying program. To that end, I wrote a "Hello World" in C, then used a hex editor to find the location of the "Hello World" string in the compiled executable. Is it possible to modify this program to open itself and overwrite the "Hello World" string?
char* str = "Hello World\n";
int main(int argc, char* argv) {
printf(str);
FILE * file = fopen(argv, "r+");
fseek(file, 0x1000, SEEK_SET);
fputs("Goodbyewrld\n", file);
fclose(file);
return 0;
}
This doesn't work; I'm assuming there's something preventing it from opening itself, since if I split this into two separate programs (a "Hello World" and something to modify it) it works fine.
EDIT: My understanding is that when the program is run, it's loaded completely into RAM, so the executable on the hard drive is, for all intents and purposes, a copy. Why would it be a problem for it to modify itself?
Is there a workaround?
Thanks
On Windows, when a program is run the entire *.exe file is mapped into memory using the memory-mapped-file functions in Windows. This means that the file isn't necessarily all loaded at once, but instead the pages of the file are loaded on-demand as they are accessed.
When the file is mapped in this way, another application (including the program itself) can't write to the same file to change it while it's running. (Also, on Windows the running executable can't be renamed either, but it can be on Linux and other Unix systems with inode-based filesystems.)
It is possible to change the bits mapped into memory, but if you do this the OS does it using "copy-on-write" semantics, which means that the underlying file isn't changed on disk, but a copy of the page(s) in memory is made with your modifications. Before being allowed to do this though, you usually have to fiddle with protection bits on the memory in question (e.g. VirtualProtect).
At one time, it used to be common for low-level assembly programs that were in very constrained memory environments to use self-modifying code. However, nobody does this anymore because we're not running in the same constrained environments, and modern processors have long pipelines that get very upset if you start changing code from underneath them.
If you are using Windows, you can do the following:
Step-by-Step Example:
Call VirtualProtect() on the code pages you want to modify, with the PAGE_WRITECOPY protection.
Modify the code pages.
Call VirtualProtect() on the modified code pages, with the PAGE_EXECUTE protection.
Call FlushInstructionCache().
For more information, see How to Modify Executable Code in Memory (Archived: Aug. 2010)
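A sketch of those four steps (target is a hypothetical function in the current module whose first byte gets patched; error handling mostly omitted):

#include <windows.h>

extern void target(void);   /* hypothetical code to modify */

void patch_target(void)
{
    unsigned char *p = (unsigned char *)target;
    DWORD old;

    if (!VirtualProtect(p, 1, PAGE_WRITECOPY, &old))      /* step 1 */
        return;
    p[0] = 0xC3;                                          /* step 2: e.g. x86 RET */
    VirtualProtect(p, 1, PAGE_EXECUTE, &old);             /* step 3 */
    FlushInstructionCache(GetCurrentProcess(), p, 1);     /* step 4 */
}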
It is very operating-system dependent. Some operating systems lock the file, so you could try to cheat by making a new copy of it somewhere, but then you're just running another copy of the program.
Other operating systems do security checks on the file (e.g. the iPhone), so writing to it would be a lot of work, and on top of that it resides as a read-only file.
With other systems you might not even know where the file is.
All the present answers more or less revolve around the fact that today you cannot easily do self-modifying machine code anymore. I agree that this is basically true for today's PCs.
However, if you really want to see your own self-modifying code in action, you have some possibilities available:
Try out microcontrollers; the simpler ones do not have advanced pipelining. The cheapest and quickest choice I found is an MSP430 USB stick.
If emulation is OK for you, you can run an emulator for an older non-pipelined platform.
If you want self-modifying code just for the fun of it, you can have even more fun with self-destroying code (more exactly, enemy-destroying code) at Corewars.
If you are willing to move from C to, say, a Lisp dialect, code that writes code is very natural there. I would suggest Scheme, which is intentionally kept small.
If we're talking about doing this in an x86 environment, it shouldn't be impossible. It should be used with caution, though, because x86 instructions are variable-length: a long instruction may overwrite the following instruction(s), and a shorter one will leave residual data from the overwritten instruction, which should be padded out with NOP instructions.
When the x86 first got protected mode, the Intel reference manuals recommended the following method for debugging access to XO (execute-only) areas:
create a new, empty selector ("high" part of far pointers)
set its attributes to that of the XO area
the new selector's access properties must be set to RO DATA if you only want to look at what's in it
if you want to modify the data, the access properties must be set to RW DATA
So the answer to the problem is in the last step: the RW is necessary if you want to be able to insert the breakpoint instruction, which is what debuggers do. Processors more modern than the 80286 have internal debug registers that enable non-intrusive monitoring, which can result in a breakpoint being issued.
Windows made available the building blocks for doing this starting with Win16. They are probably still in place. I think Microsoft calls this class of pointer manipulation "thunking."
I once wrote a very fast 16-bit database engine in PL/M-86 for DOS. When Windows 3.1 arrived (running on 80386s) I ported it to the Win16 environment. I wanted to make use of the 32-bit memory available, but there was no PL/M-32 available (or Win32, for that matter).
To solve the problem, my program used thunking in the following way:
defined 32-bit far pointers (sel_16:offs_32) using structures
allocated 32-bit data areas (i.e. larger than 64 KB) using global memory and received them in 16-bit far pointer (sel_16:offs_16) format
filled in the data in the structures by copying the selector, then calculating the offset using 16-bit multiplication with 32-bit results.
loaded the pointer/structure into es:ebx using the instruction size override prefix
accessed the data using a combination of the instruction size and operand size prefixes
Once the mechanism was bug free it worked without a hitch. The largest memory areas my program used were 2304*2304 double precision which comes out to around 40MB. Even today, I would call this a "large" block of memory. In 1995 it was 30% of a typical SDRAM stick (128 MB PC100).
There are non-portable ways to do this on many platforms. On Windows you can do it with WriteProcessMemory(), for example. However, in 2010 it's usually a very bad idea. These aren't the days of DOS, where you coded in assembly and did this to save space. It's very hard to get right, and you're basically asking for stability and security problems. Unless you are doing something very low-level like a debugger, I would say don't bother with this; the problems you will introduce are not worth whatever gain you might have.
Self-modifying code is used for modifications in memory, not in the file (that is what run-time unpackers such as UPX do). Also, the file representation of a program is more difficult to operate on because of relative virtual addresses, possible relocations, and the modifications to the headers needed for most updates (e.g. changing the Hello World string to a longer one requires extending the data segment in the file).
I'd suggest that you first learn to do it in memory. For file updates, the simplest and most generic approach would be running a copy of the program so that it modifies the original.
EDIT: And don't forget about the main reasons self-modifying code is used:
1) Obfuscation, so that the code that is actually executed isn't the code you'll see with simple static analysis of the file.
2) Performance, something like JIT.
Neither of these benefits from modifying the executable on disk.
If you're operating on Windows, I believe it locks the file to prevent it from being modified while it's being run. That's why you often need to exit a program in order to install an update. The same is not true on a Linux system.
On newer versions of Windows CE (at least 5.x and newer), where apps run in user space (compared to earlier versions where all apps ran in supervisor mode), apps cannot even read their own executable file.