How to create an ELF executable from process memory image - c

First of all, Engilish is not my native language. Please excuse if there are any mistakes.
As stated above, I want to create an ELF executable from process memory image. Up until now, I successfully extracted an ELF Header, Program Headers and a list of Elf64_Dyn structures resides in Dynamic segment. I also restored GOT. However, I can't figure out how to reconstruct section headers.
The problem is when an ELF executable is loaded into memory, section headers are not loaded. If we use a list of Elf64_Dyn structures inside Dynamic segment, we can get .rela* sections' address, GOT's address, string table's address, and so on. However, it doesn't contain addresses for sections like .text and .data. To reconstruct section headers we need section's offset and address, but it seems that there is no way to get these information.
How can I reconstruct section headers properly?
Thanks for your consideration.

How can I reconstruct section headers properly?
You can't, but you don't have to -- sections (and section headers) are not used at runtime (at least not by the dynamic loader).
You can also run strip --strip-all a.out to remove them from a "normal" ELF binary, which will continue to run just fine.

Related

Does the Linker OR the Loader make the necessary relocations of a program?

I want to create my own linker and loader. I know that in the linking stage the linker will take into consideration the relocation data in the ELF header for all the object files.
The linker then will create an executable file with all the addresses resolved and will store it in the hard drive.
When the time comes the loader will have to load that executable in main memory but the memmory already contain running programs so there will be conflicts.
Question1: Must the loader relocate the addresses all over again?
Question2: If yes, does that mean that the loader must scan all the text sectors of the executable and change the addresses of all cpu instructions??*
*that means that the loader have a copy of the ISA in memory and must scan instruction per instruction. It's like an execution before the execution.
There are no relocation data in the ELF header. Linkable ELF object files store relocation data in subservient sections named .rela.text, .rela.data etc.
Static linker on Linux will choose the starting address where the executable image will be loaded (usually 0x08048000) and then it uses relocations to update instructions and data in code and data sections. After those .rela.text and .rela.data have been handled, subservient .rela section are no longer needed and may be stripped off the final ELF executable file.
When the time comes to load the linked executable file in memory, loader creates a new process in protected mode. All virtual address space is assigned to the process and it is unoccupied. Other programs may be loaded in the same computer but they run happily each in their private addressing space.
The scenario you're afraid of sometimes happens on Windows, when different dynamic libraries were linked to start at conflicting virtual address. Therefore Portable executable format (PE/DLL) keeps relocation records in subservient section .reloc and yes, the loader must relocate all addresses mentioned in this section then.
Similar situation is on DOS in real mode, where there is only one 1 MiB address space common for all processes. MZ executables are linked to virtual address 0 and all adresses which require relocation are kept in Relocation pointer table following the MZ EXE header, and the loader is responsible for updating segment addresses mentioned in this pointer table.
Answer1: Relocation is necessary only if the executable image is loaded at different address that it was linked to, and if it is not linked to Position-Independed Executable.
Answer2: Relocation does not concern addresses of all CPU instruction, only those fields in instruction body (displacement or immediate address) which refer to an address. Such places must be explicitly specified in relocation records. If the relocation information was stripped off the file, your loader should refuse execution.
Good source of information: Linkers and Blog by Ian Lance Taylor.

Difference between the ELF vs MAP file

The linker can output both the ELF and the MAP file. These files are especially relevant in the embedded systems world because the ELF file is usually used to read out the addresses of variables or functions. Additionally, the ELF file is used by different embedded measurement or analysis tools.
When I open a MAP file, then within it, I can see for each global variable and every external function the following information: allocated address, symbolic name, allocated bytes, memory unit, and memory section.
On the other hand, once I open the ELF file, it is a binary file and not human readable. However, some tools I use are able to read it out and interpret it. Those tools can interpret the ELF file, and obtain the information about the symbolic name of a variable/function and its address or even show a function prototype.
From my understanding the ELF and MAP files are basically containing the same information, it is just that the first one is binary and the latter one is the text file.
So what are the actual differences between these two files from the content perspective?
Thank you in advance!
The primary output of the linker (i.e. its main purpose) is to produce the fully linked executable code. That is the ELF (Executable Linkable Format) file. An ELF file may as you have observed contain symbols - these are used for debug. It may also contain meta-data that associates the machine code with the source code from which it was generated. But the bulk of its content (and the part that is not optional) is the executable machine code and data objects that are your application.
The MAP file is an optional information only human readable output that contains information about the location and size of code and data objects in your application. The MAP file includes a summary that shows the total size and memory usage of your code.
In an embedded cross-development environment, the symbol information in the ELF file is used when the code is loaded into a source-level symbolic debugger. The debugger takes the binary code/data segments in the ELF file and loads them onto the target (typically using a JTAG or other debug/programming hardware tool), it loads the symbols and source-level debug meta-data into the debugger, then while the real machine code is executing on the target, that execution is reflected in the debugger in the original source code where you can view, step and break-point the code at the source level.
In short, the ELF file is your program. The MAP file is, as its name suggests, a map of your executable - it tells you where things are in the executable.

How can I read a custom section within the loader?

I'm trying to embed information (a simple integer) inside the executable object file (Elf) of a process in Linux.
I've accomplished that by writing the integer value as binary inside a file, and then by copying the binary file content using the objcopy command.
objcopy --add-section .customsection=binaryfile processElfFile newProcessElfFile
In this way, inside newProcessElfFile I have a perfectly working copy of the process with a new section containing the integer value, and I can see the section by using
readelf -e newProcessElfFile
I have also verified the section value being correct by using some C code on top of the Libelf library. Everything works fine.
Now, I want to read the integer value contained in the new section and perform a printk when the elf file is loaded to be executed.
In order to achieve this, I need to modify the loader code, which is kernel side.
The problem now is that:
I cannot write code inside the kernel which uses the libelf library, so I cannot access directly the section value as I do with my user-side program.
The elf kernel loader, contained inside linux-VERSION/fs/binfmt_elf.c, in the function load_elf_binary(), doesn't read elf sections, but access the elf program headers, which point towards elf segments, not the single sections.
In order to solve the problem I guess I need to link my custom section within a segment such that I can access it.
So I have 2 related questions:
Can I somehow read directly my custom section within the loader?
If not, How can I make a segment link to the custom section, so that I can access it using the elf file program headers?
Another possibility may be to add the integer value as an element of the already existent .rodata section, but I unfortunately don't know how to perform it and again how to access that section in the kernel.
The ELF header (Elf32_Ehdr or Elf64_Ehdr) contains information pointing to the section header table (members e_shoff, e_shentsize). Together with the section string table index (e_shstrndx), this information can be used to read the section headers and eventually locate the data you are interested in.

__start_section and __stop_section symbols missing when linking to library

I'm using custom elf headers in an autotools C project similar to this thread: How do you get the start and end addresses of a custom ELF section in C (gcc)?. The problem is that the c files that declare the custom sections are linked into a static library which is then linked to the final application.
In this configuration the symbols __start_custom_section and __stop_custom_section do not get generated. I define the elf section like this:
struct mystruct __attribute((__section__("custom_section"))) __attribute((__used__) = {
...
};
If I link to the object file instead of the library the symbols get created and everything works as expected. This isn't a scalable solution though because I'd like new modules to just work by compiling them into the modules library. Any idea why the linker doesn't create these special symbols when the section exists in a library vs a single object file?
I have done something similar to this recently, and my solution does not rely on any compiler specific implementations, internal undocumented symbols, etc. However, it does require a bit more work :)
Background
The ELF binary on disk can be loaded and parsed quite easily by knowing its format and using a couple structures provided to us: http://linux.die.net/man/5/elf. You can iterate through each of its segments and sections (segments are containers for sections). If you do this, you can calculate the the relative start/end virtual addresses of your section. By this logic, you would think that you can do the same thing at runtime by iterating through the segments and sections of the loaded, in-memory version of the ELF binary. But alas, you can only iterate through the segments themselves (via http://linux.die.net/man/3/dl_iterate_phdr), and all section metadata has been lost.
So, how can we retain the section metadata? Store it ourselves.
Solution
If you have a custom section named '.mycustom', then define a metadata struct that should at minimum store two numbers that will indicate the relative start address and the size of your '.mycustom' section. Create a global instance of this metadata struct that will live by itself in another custom section named '.mycustom_meta'.
Example:
typedef struct
{
unsigned long ulStart;
unsinged long ulSize;
} CustomSectionMeta;
__attribute((__section__(".mycustom_meta"))) CustomSectionMeta g_customSectionMeta = { 0, 0 };
You can see that our struct instance is initialized with zero for both start and size values. When you compile this code, your object file will contain a section named '.mycustom_meta' which will be 8 bytes in size for a 32-bit compilation (or 16 bytes for 64-bit), and the values will be all zeroes. Run objdump on it and you will see as much. Go ahead and put that into a static lib (.a) if you want, run readelf on it, and you will see exactly the same thing. Build it into a shared object (.so) if you want, run readelf on it, and again you will see the same thing. Build it into an executable program, run readelf on it, and voila its still there.
Now the trick comes in. You need to write a little executable (lets call it MetaWriter) that will update your ELF file on disk to fill in the start and size values. Here are the basic steps:
Open your ELF file (.o, .so, or executable) in binary mode and read it into a contiguous array. Or, you can mmap it into memory to achieve the same.
Read through the binary using header structures and instructions found in the ELF link I listed above.
Find your '.mycustom' section and read section.sh_addr and section.sh_size.
Find your '.mycustom_meta' section. Create an instance of CustomSectionMeta using the start and size values from step 3. memcpy() your struct over the top of the existing '.mycustom_meta' section data, which up to now was all zeroes.
Save you ELF data back to the original file. It should now be completely unmodified except for the few bytes you wrote into your '.mycustom_meta' section.
What I did was executed this MetaWriter program as part of the build process in my Makefile. So, you would build your .so or executable, then run MetaWriter on it to fill in the meta section. After that, its ready to go.
Now, when the code in your .so or executable runs, it can just read from g_customSectionMeta, which will be populated with the starting address offset of your '.mycustom' section, as well as the size of it, which can be used to easily calculate the end, of course. This start offset must be added to the base address of your loaded ELF binary. There are a couple ways to get this, but the easiest way I found was to run dladdr on a symbol that I know to exist in the binary (such as g_customSectionMeta!) and use the resulting value of dli_fbase to know the base address of the module.
Example:
#include <dlfcn.h>
Dl_info dlInfo;
if (dladdr(&g_customSectionMeta, &dlInfo) != 0)
{
void * vpBase = dlInfo.dli_fbase;
void * vpMyCustomStart = vpBase + g_customSectionMeta.ulStart;
void * vpMyCustomEnd = vpMyCustomStart + g_customSectionMeta.ulSize;
}
It would be a bit overboard to post the full amount of code required to do all this work, especially the parsing of the ELF binary in MetaWriter. However, if you need some help, feel free to reach out to me.
In my case, the variable was not referenced in the code and the section was optimised out in release mode (-O2). Adding used attribute solved the issue. Example:
static const unsigned char unused_var[] __attribute__((used, section("foo"))) = {
0xCA, 0xFE, 0xBA, 0xBE
};

Linker scripts: strategies for debugging?

I'm trying to debug a linker problem that I have, when writing a kernel.
The issue is that I have a variable SCAN_CODE_MAPPING that I'm not able to use -- it appears to be empty or something. I can fix this by changing the way I link my program, but I don't know why.
When I look inside the generated binary file using objdump, the data for the variable is definitely there, so there's just something broken with the reference to it.
Here's a gist with both of the linker scripts and the part of the symbol table that's different between the two files.
What confuses me is that both of the symbol tables have all the same symbols, they're all the same length, and they appear to contain the right data. The only difference that I can see is that they're not in the same order.
So far I've tried
inspecting the SCAN_CODE_MAPPING memory location to make sure it has the data I expect and hasn't been zeroed out
checking that all the symbols are the same
checking that all the symbol contents are the same length
looking at .data.rel.ro.local to make sure it has the address of the data
One possible clue is this warning:
warning: uninitialized space declared in non-BSS section `.text': zeroing
which I get in both the broken and the correct case.
What should I try next?
The problem here turned out to be that I was writing an OS, and only 12k of it was being loaded instead of the whole thing. So the linker script was actually working fine.
The main tools I used to understand binaries were:
nm
objdump
readelf
You can get a ton more information using "readelf".
In particular, take a look at the program headers:
readelf -l program
Your BSS section is quite different than the standard one, which probably causing the warning. Here's what the default looks like on my system:
.bss :
{
*(.dynbss)
*(.bss .bss.* .gnu.linkonce.b.*)
*(COMMON)
/* Align here to ensure that the .bss section occupies space up to
_end. Align after .bss to ensure correct alignment even if the
.bss section disappears because there are no input sections.
FIXME: Why do we need it? When there is no .bss section, we don't
pad the .data section. */
. = ALIGN(. != 0 ? 64 / 8 : 1);
}
If an input section doesn't match anything in your linker script, the linker still has to place it somewhere. Make sure you're covering all the input sections.
Note that there is a difference between sections and segments. Sections are used by the linker, but the only thing the program loader looks at are the segments. The text segment includes the text section, but it also includes other sections. Sections that go into the same segment must be adjacent. So order does matter.
The rodata section usually goes after the text section. These are both read-only during execution and will show up once in your program headers as a LOAD entry with read & execute permissions. That LOAD entry is the text segment.
The bss section usually goes after the data section. These are both writable during execution and will show up once in your program headers as a LOAD entry with read & write permissions. That LOAD entry is the data segment.
If you change the order, it affects how the linker generates the program headers. The program headers, rather than the section headers, are used when loading your program prior to executing it. Make sure to check the program headers when using a custom linker script.
If you can give more specifics about what your actual symptoms are, then it'll be easier to help.

Resources