I'm trying to write a basic userspace ELF loader that should be able to load statically linked (not dynamically linked) non-relocatable binaries (i.e. not built with -pie, -fPIE and so on). It should work on x86 CPU's for now.
I've followed the code on loading ELF file in C in user space and it works well when the executable is relocatable, but as expected completely fails if it isn't since the program is loaded in the wrong virtual memory range and instantly crashes.
But I tried modifying it to load the program at the virtual offset it expects (using phdr.p_vaddr) but I ran into a complication: my loader is already using that virtual memory range! I can't mmap it, much less write anything into it. How do I proceed so that I can load my non-relocatable binary into my loader's address space without overwriting the loader's own code before it's finished? Do I need to get my loader to run from a completely different virtual memory range, perhaps by getting the linker to link it way above the usual virtual memory range for a non-relocatable binary (which happens to start at 0x400000 in my case) or is there some trick to it?
I've read the ELF documentation (I am working with ELF64 here by the way, but I think ELF32 and ELF64 are very similar) and a lot of documents on the web and I still don't get it.
Can someone explain how an ELF loader deals with this particular complication? Thanks!
Archimedes called "heureka" when he found that at a location can only be one object. If your ELF binary must be at one location because you can't rebuild it for another location you have to relocate the loader itself.
The non-relocatable ELF doesn't include enough Information to move it to a different address. You could probably write a decompiler that detects all address references in the code but it's not worth. You will have problems when you try to analyze data references like pointers stored in pre-initialized variables.
Rewrite the loader if you can't get the source code of you ELF binary or a relocatable version.
BTW: Archimedes heureka was deadly for the goldsmith who cheated. I hope it's not so expensive in your case.
Related
I am trying to use the a.out format for my bootloader and I recall being able to do it in the past. ELF doesn't support 16 bit very well and produces a lot of undefined behavior when linked with C code. I am using BCC/dev86 to compile the code. The problem I'm having is finding any documentation on to where in memory you are supposed to place the text segment on a position-dependent 8086/real mode a.out file. It's in the header where the entry point is but I am unable to locate any sort of documentation of the loading of an a.out. Any help would be much appreciated
Usually, the first part of a bootloader (the 16 bit part) is written in assembly. This is mainly because a bootloader needs to do such low level tasks that using c is not really worth it. Once the bootloader gets itself to protected or long mode, it can use c. Another reason why using c is a bad idea for a bootloader is that a bootloader's task usually is to get the system into a state in which the kernel can start executing. On x86 this usually means going to protected mode. This involves linking some 16bit c code to some 32 bit c code (and maybe even some 64 bit code). This is really hard to do.
If you really think you want to continue with what you're doing: an a.out file is just an elf file. In bootloaders the cpu starts execution at address 0x7c00. So I suppose you should link the .text section to 0x7c00.
One more comment Id like to make, is that on modern cpus you don't even need to worry much about how to get sections from an a.out file. Usually UEFI can just boot elf files directly. And Qemu can as well.
Is it possible to execute a raw binary stored in a char array? I tried doing it like so:
#include "stdio.h"
int main(int argc, char **argv)
{
FILE *f = fopen(argv[1],"r");
if(!f)
return 1;
fseek(f,0,SEEK_END);
long l=ftell(f);
rewind(f);
char *buf = malloc(l+1);
fread(buf,1,l,f);
fclose(f);
void (*func)() = (void(*))buf;
func();
}
but it only gives my segfaults. I'm working on my own OS (from scratch), so I'm getting rid of them.
Apologies that this isn't exactly an answer but it's too long to fit as a comment...
I'm going to assume the intent of the file read with raw binary into a buffer is get code bytes into RAM, and you want to execute these bytes. Let's assume you've got the file I/O fixed, so now you have a buffer with code bytes. There's a few reasons why you could still segfault.
First, does your O/S implement virtual memory with page attributes such as read, write and execute? Most modern O/S's won't let you execute code on a page that isn't marked as code. (Marking pages this way is important to know what can be swapped and also to prevent malicious coding.)
Second, is the binary code you've loaded in fully relocatable? In other words, if the code has any JUMPs in it are they all relative? If there's any absolute JUMP ops in their then you need to run through a patch them up to line up where your buffer is in memory.
Third, is the binary code 100% self contained? If it calls out to any external functions then you need to patch those up, too.
Finally, does the binary code need to access data? If so, is all the data also in the binary and also relative addressed vs. absolute.
You might be able to do that but:
You cannot (in general) store your executable in the heap as your doing it here with malloc (nor in the stack for the same reason) because if your hardware supports it, your OS probably marks those areas as readable, writable but no executable (or at least it should do it).
You cannot just take the code of a compiled program, extract it to a file and expect to run it because it usually needs relocation, importing dynamic libraries, setting up another virtual memory area for the variables.
You could be able to do this with a simple handcrafted program which makes a system call to exit(0) ot prints "Hello World".
You might be able to use compiled code. For this, you would need to (at least):
compile a self-contained program (no imported dynamic libraries, link the libraries statically and recompile those statically linked library);
with position-independant code (-fpic of -fpie);
without any relocation (maybe -fvisibility=hidden might help?).
If you manage to do this, you might be able to generate a raw file from the PT_LOAD sections of the ELF file. It would probably need to be executable, readable and writable (because you'll have code and data). And you will probably have to prepend an instruction to jump to the entry point which might be in the middle of the file.
You might look at how ld.so is compiled: it is expected to be loaded anywhere in the virtual address space and has a subset of itself which is supposed to be functional before relocations (because ld.so relocates itself as fare as I understand).
But you should probably just try to implement a basic ELF loader instead (and properly handle relocations).
Generally the user program binaries will be loaded in low address (usually around 0x400000) in the programs address space which will be specified in the elf binary (in the case of linux).
Can we force a user binary to load at a high address, possibly within the 2GB range of addresses where libc or other such libraries are loaded?
I have tried finding a solution on the net but could not find any concrete solution for this.
(I am working on Ubuntu 12.10 64bit OS)
Thanks
Unless the binary is position-independent (PIE), this is not possible. Normal (non-PIE) binaries are hard-coded for a particular load address at link time, and during linking, the information necessary for relocating to a different address was already lost.
Edit: The above is assuming you're working with an existing binary. If you are producing the binary yourself, you can control the load address that's hard-coded into it with the following link options:
-Wl,-Ttext-segment,0x80000000
replacing 0x80000000 by your desired address. Certain addresses (such as those reserved for kernel use, typically beginning at 0xc0000000) will not work, and the address must be page-aligned (the last 3 hex digits must be 0).
I am developing a mutlicore project for our embedded architecture using the gnu toolchain. In this architecture, all independent cores share the same global flat memory space. Each core has its own internal memory, which is addressable from any other core through its global 32-bit address.
There is no OS implemented and we do low-level programming, but in C instead of assembly. Each core has its own executable, generated with a separate compilation. The current method we use for inter-core communication is through calculation of absolute addresses of objects in the destination core's data space. If we build the same code for all cores, then the objects are located by the linker in the same place, so accessing an object in a remote core is merely changing the high-order bits of the address of the object in the current core and making the transaction. Similar concept allows us to share objects that are located in the external DRAM.
Things start getting complicated when:
The code is not the same in the two cores, so objects may not be allocated in similar addresses,
We sometimes use a "host", which is another processor running some control code that requires access to objects in the cores, as well as shared objects in the external memory.
In order to overcome this problem, I am looking for an elegant way of placing variables in build time. I would like to avoid changing the linker script file as possible. However, it seems like in the C level, I could only control placement up to using a combination of the section attribute (which is too coarse) and the align attribute (which doesn't guarantee the exact place).
A possible hack is to use inline assembly to define the objects and explicitly place them (using the .org and .global keywords), but it seems somewhat ugly (and we did not yet actually test this idea...)
So, here's the questions:
Is there a semistandard way, or an elegant solution for manually placing objects in a C program?
Can I declare an "uber"-extarnel objects in my code and make the linker resolve their addresses using another project's executable?
This question describes a similar situation, but there the user references a pre-allocated resource (like a peripheral) whose address is known prior to build time.
Maybe you should try to use 'placement' tag from new operator. More exactly if you have already an allocated/shared memory you may create new objects on that. Please see: create objects in pre-allocated memory
You don't say exactly what sort of data you'll be sharing, but assuming it's mostly fixed-size statically allocated variables, I would place all the data in a single struct and share only that.
The key point here is that this struct must be shared code, even if the rest of the programs are not. It would be possible to append extra fields (perhaps with a version field so that the reader can interpret it correctly), but existing fields must not be removed or modifed. structs are already used as the interface between libraries everywhere, so their layout can be relied upon (although a little more care will be need in a heterogeneous environment, as long as the type sizes and alignments are the same you should be ok).
You can then share structs by either:
a) putting them in a special section and using the linker script to put that in a known location;
b) allocating the struct in static data, and placing a pointer to that at a known location, say in your assembly start-up files; or
c) as (b), but allocate the struct on the heap, and copy the pointer to the known pointer location at run-time. The has the advantage that the pointer can be pre-adjusted for external consumers, thus avoiding a certain amount of messing about.
Hope that helps
Response to question 1: no, there isn't.
As for the rest, it depends very much of the operating system you use. On our system at the time I was in embedded, we had only one processor's memory to handle (80186 and 68030 based), but had multi-tasking but from the same binary. Our tool chain was extended to handle the memory in a certain way.
The toolchain looked like that (on 80186):
Microsoft C 16bit or Borland-C
Linker linking to our specific crt.o which defined some special symbols and segments.
Microsoft linker, generating an exe and a map file with a MS-DOS address schema
A locator that adjusted the addresses in the executable and generated a flat binary
Address patcher.
An EPROM burner (later a Flash loader).
In our assembly we defined a symbol that was always at the beginning of data segment and we patched the binary with a hard coded value coming from the located map file. This allowed the library to use all the remaining memory as a heap.
In fact, if you haven't the controle on the locator (the elf loader on linux or the exe/dll loader in windows) you're screwed.
You're well off the beaten path here - don't expect anything 'standard' for any of this :)
This answer suggests a method of passing a list of raw addresses to the linker. When linking the external executable, generate a linker map file, then process it to produce this raw symbol table.
You could also try linking the entire program (all cores' programs) into a single executable. Use section definitions and a linker script to put each core's program into its internal memory address space; you can build each core's program separately, incrementally link it to a single .o file, then use objcopy to rename its sections to contain the core ID for the linker script, and rename (hide) private symbols if you're duplicating the same code across multiple cores. Finally, manually supply the start address for each core to your bootstrap code instead of using the normal start symbol.
I am writing a simple user-space ELF loader under Linux (why? for 'fun'). My loader at the moment is quite simple and is designed to load only statically-linked ELF files containing position-independent code.
Normally, when a program is loaded by the kernel's ELF loader, it is loaded into its own address space. As such, the data segment and code segment can be loaded at the correct virtual address as specified in the ELF segments.
In my case, however, I am requesting addresses from the kernel via mmap, and may or may not get the addresses requested in the ELF segments. This is not a problem for the code segment since it is position independent. However, if the data segment is not loaded at the expected address, code will not be able to properly reference anything stored in the data segment.
Indeed, my loader appears to work fine with a simple assembly executable that does not contain any data. But as soon as I add a data segment and reference it, the executable fails to run correctly or SEGFAULTs.
How, if possible, can I fixup any references to the data segment to point to the correct place? Is there a relocation section stored in the (static) ELF file for this purpose?
If you modify the absolute addresses available in the .got section, (global offset table) your program should work. Make sure to modify the absolute address calculation to cater for the new distance between .text and .data, I'm afraid you need to figure out where this information comes from, for your architecture.
See this: Global Offset Table (Processor-Specific)
Good luck.
I don't see any way you can do that, unless you emulate the kernel-provided virtual address space completely, and run the code inside that virtual space. When you mmap the data section from the file, you are intrinsically relocating it to an unknown address of the virtual address space of your ELF interpreter, and your code will not be able to reference to it in any way.
Glad to be proven wrong. There's something very cool to learn here.