I'm trying to embed information (a simple integer) inside the executable object file (Elf) of a process in Linux.
I've accomplished that by writing the integer value as binary inside a file, and then by copying the binary file content using the objcopy command.
objcopy --add-section .customsection=binaryfile processElfFile newProcessElfFile
In this way, inside newProcessElfFile I have a perfectly working copy of the process with a new section containing the integer value, and I can see the section by using
readelf -e newProcessElfFile
I have also verified the section value being correct by using some C code on top of the Libelf library. Everything works fine.
Now, I want to read the integer value contained in the new section and perform a printk when the elf file is loaded to be executed.
In order to achieve this, I need to modify the loader code, which is kernel side.
The problem now is that:
I cannot write code inside the kernel which uses the libelf library, so I cannot access directly the section value as I do with my user-side program.
The elf kernel loader, contained inside linux-VERSION/fs/binfmt_elf.c, in the function load_elf_binary(), doesn't read elf sections, but access the elf program headers, which point towards elf segments, not the single sections.
In order to solve the problem I guess I need to link my custom section within a segment such that I can access it.
So I have 2 related questions:
Can I somehow read directly my custom section within the loader?
If not, How can I make a segment link to the custom section, so that I can access it using the elf file program headers?
Another possibility may be to add the integer value as an element of the already existent .rodata section, but I unfortunately don't know how to perform it and again how to access that section in the kernel.
The ELF header (Elf32_Ehdr or Elf64_Ehdr) contains information pointing to the section header table (members e_shoff, e_shentsize). Together with the section string table index (e_shstrndx), this information can be used to read the section headers and eventually locate the data you are interested in.
Related
The linker can output both the ELF and the MAP file. These files are especially relevant in the embedded systems world because the ELF file is usually used to read out the addresses of variables or functions. Additionally, the ELF file is used by different embedded measurement or analysis tools.
When I open a MAP file, then within it, I can see for each global variable and every external function the following information: allocated address, symbolic name, allocated bytes, memory unit, and memory section.
On the other hand, once I open the ELF file, it is a binary file and not human readable. However, some tools I use are able to read it out and interpret it. Those tools can interpret the ELF file, and obtain the information about the symbolic name of a variable/function and its address or even show a function prototype.
From my understanding the ELF and MAP files are basically containing the same information, it is just that the first one is binary and the latter one is the text file.
So what are the actual differences between these two files from the content perspective?
Thank you in advance!
The primary output of the linker (i.e. its main purpose) is to produce the fully linked executable code. That is the ELF (Executable Linkable Format) file. An ELF file may as you have observed contain symbols - these are used for debug. It may also contain meta-data that associates the machine code with the source code from which it was generated. But the bulk of its content (and the part that is not optional) is the executable machine code and data objects that are your application.
The MAP file is an optional information only human readable output that contains information about the location and size of code and data objects in your application. The MAP file includes a summary that shows the total size and memory usage of your code.
In an embedded cross-development environment, the symbol information in the ELF file is used when the code is loaded into a source-level symbolic debugger. The debugger takes the binary code/data segments in the ELF file and loads them onto the target (typically using a JTAG or other debug/programming hardware tool), it loads the symbols and source-level debug meta-data into the debugger, then while the real machine code is executing on the target, that execution is reflected in the debugger in the original source code where you can view, step and break-point the code at the source level.
In short, the ELF file is your program. The MAP file is, as its name suggests, a map of your executable - it tells you where things are in the executable.
I want to create an operating system for embedded device with very limited resources (an ESP8266) that can load ELF files as program or shared object (shared object is in second importance).
I want to know is it possible to link any program for this OS against map file of OS?
for example I implement memcpy in OS and make a header file that declares it as extern, Compile OS and generate map file. then when i want to write a program, include the header to compile it successfully and make linker to peek the address of memcpy from map file of OS.
the OS is place non-independent and its functions are always at a fixed address, but programs are place independent ELF files. it is not necessary to program be loadable for different builds of OS.
This is by no means a complete solution to the problem of running ELFs on a embedded target but for the specific problem of providing known addresses during the linking process, GNU LD allows you to provide addresses for symbols in code defined as extern by adding a PROVIDE statement or a simple assignment to the linker script. LD won't directly read a map file, but you could parse the map file, find the relevant addresses, generate a linker script that has the appropriate symbols provided, and use that linker script in the compilation of the ELF. The documentation for the provide and assignment features can be found at https://sourceware.org/binutils/docs/ld/Assignments.html
First of all, Engilish is not my native language. Please excuse if there are any mistakes.
As stated above, I want to create an ELF executable from process memory image. Up until now, I successfully extracted an ELF Header, Program Headers and a list of Elf64_Dyn structures resides in Dynamic segment. I also restored GOT. However, I can't figure out how to reconstruct section headers.
The problem is when an ELF executable is loaded into memory, section headers are not loaded. If we use a list of Elf64_Dyn structures inside Dynamic segment, we can get .rela* sections' address, GOT's address, string table's address, and so on. However, it doesn't contain addresses for sections like .text and .data. To reconstruct section headers we need section's offset and address, but it seems that there is no way to get these information.
How can I reconstruct section headers properly?
Thanks for your consideration.
How can I reconstruct section headers properly?
You can't, but you don't have to -- sections (and section headers) are not used at runtime (at least not by the dynamic loader).
You can also run strip --strip-all a.out to remove them from a "normal" ELF binary, which will continue to run just fine.
I'm using custom elf headers in an autotools C project similar to this thread: How do you get the start and end addresses of a custom ELF section in C (gcc)?. The problem is that the c files that declare the custom sections are linked into a static library which is then linked to the final application.
In this configuration the symbols __start_custom_section and __stop_custom_section do not get generated. I define the elf section like this:
struct mystruct __attribute((__section__("custom_section"))) __attribute((__used__) = {
...
};
If I link to the object file instead of the library the symbols get created and everything works as expected. This isn't a scalable solution though because I'd like new modules to just work by compiling them into the modules library. Any idea why the linker doesn't create these special symbols when the section exists in a library vs a single object file?
I have done something similar to this recently, and my solution does not rely on any compiler specific implementations, internal undocumented symbols, etc. However, it does require a bit more work :)
Background
The ELF binary on disk can be loaded and parsed quite easily by knowing its format and using a couple structures provided to us: http://linux.die.net/man/5/elf. You can iterate through each of its segments and sections (segments are containers for sections). If you do this, you can calculate the the relative start/end virtual addresses of your section. By this logic, you would think that you can do the same thing at runtime by iterating through the segments and sections of the loaded, in-memory version of the ELF binary. But alas, you can only iterate through the segments themselves (via http://linux.die.net/man/3/dl_iterate_phdr), and all section metadata has been lost.
So, how can we retain the section metadata? Store it ourselves.
Solution
If you have a custom section named '.mycustom', then define a metadata struct that should at minimum store two numbers that will indicate the relative start address and the size of your '.mycustom' section. Create a global instance of this metadata struct that will live by itself in another custom section named '.mycustom_meta'.
Example:
typedef struct
{
unsigned long ulStart;
unsinged long ulSize;
} CustomSectionMeta;
__attribute((__section__(".mycustom_meta"))) CustomSectionMeta g_customSectionMeta = { 0, 0 };
You can see that our struct instance is initialized with zero for both start and size values. When you compile this code, your object file will contain a section named '.mycustom_meta' which will be 8 bytes in size for a 32-bit compilation (or 16 bytes for 64-bit), and the values will be all zeroes. Run objdump on it and you will see as much. Go ahead and put that into a static lib (.a) if you want, run readelf on it, and you will see exactly the same thing. Build it into a shared object (.so) if you want, run readelf on it, and again you will see the same thing. Build it into an executable program, run readelf on it, and voila its still there.
Now the trick comes in. You need to write a little executable (lets call it MetaWriter) that will update your ELF file on disk to fill in the start and size values. Here are the basic steps:
Open your ELF file (.o, .so, or executable) in binary mode and read it into a contiguous array. Or, you can mmap it into memory to achieve the same.
Read through the binary using header structures and instructions found in the ELF link I listed above.
Find your '.mycustom' section and read section.sh_addr and section.sh_size.
Find your '.mycustom_meta' section. Create an instance of CustomSectionMeta using the start and size values from step 3. memcpy() your struct over the top of the existing '.mycustom_meta' section data, which up to now was all zeroes.
Save you ELF data back to the original file. It should now be completely unmodified except for the few bytes you wrote into your '.mycustom_meta' section.
What I did was executed this MetaWriter program as part of the build process in my Makefile. So, you would build your .so or executable, then run MetaWriter on it to fill in the meta section. After that, its ready to go.
Now, when the code in your .so or executable runs, it can just read from g_customSectionMeta, which will be populated with the starting address offset of your '.mycustom' section, as well as the size of it, which can be used to easily calculate the end, of course. This start offset must be added to the base address of your loaded ELF binary. There are a couple ways to get this, but the easiest way I found was to run dladdr on a symbol that I know to exist in the binary (such as g_customSectionMeta!) and use the resulting value of dli_fbase to know the base address of the module.
Example:
#include <dlfcn.h>
Dl_info dlInfo;
if (dladdr(&g_customSectionMeta, &dlInfo) != 0)
{
void * vpBase = dlInfo.dli_fbase;
void * vpMyCustomStart = vpBase + g_customSectionMeta.ulStart;
void * vpMyCustomEnd = vpMyCustomStart + g_customSectionMeta.ulSize;
}
It would be a bit overboard to post the full amount of code required to do all this work, especially the parsing of the ELF binary in MetaWriter. However, if you need some help, feel free to reach out to me.
In my case, the variable was not referenced in the code and the section was optimised out in release mode (-O2). Adding used attribute solved the issue. Example:
static const unsigned char unused_var[] __attribute__((used, section("foo"))) = {
0xCA, 0xFE, 0xBA, 0xBE
};
I have a elf binary which has been statically linked to libc.
I do not have access to its C code.
I would like to use OpenOnload library, which has implementation of sockets in user-space and therefore provides lower latency compared to standard libc versions.
OpenOnload implements standard socket api, and overrides libc version using LD_PRELOAD.
But, since this elf binary is statically linked, it cannot use the OpenOnload version of the socket API.
I believe that converting this binary to dynamically link with OpenOnload is possible, with the following steps:
Add new Program headers: PT_INTERP, PT_DYNAMIC and PT_LOAD.
Add entries in PT_DYNAMIC to list dependency with libc.
Add PLT stubs for required libc functions in the new PT_LOAD section.
Modify existing binary code for libc functions to jump to corresponding PLT stubs.
As a first cut, I tried just adding 3 PT_LOAD segments. New segment headers were added after existing PT_LOAD segment headers. Also, vm_addr of existing segments was not modified. File offsets of existing segments were shifted below to next aligned address based on p_align.
New PT_LOAD segments were added in the file at end of the file.
After re-writing the file, when I ran it, it was loaded properly by the kernel, but then it immediately seg-faulted.
My questions are:
If I just shift the file-offsets in the elf binary, without modifying the vm_addresses, can it cause any error while running the binary?
Is it possible to do what I am attempting? Has anybody attempted it?
What you are attempting is not possible in any automated way. At the time of static linking, all relocation information identifying calls to libc as calls to libc has been resolved and removed. If debugging symbols exist in the binary, it's possible to identify "this range of bytes in the text segment corresponds to such-and-such libc function", but there is no way to identify references to the function, which will be embedded in the instruction byte stream with no markup to identify them. You could use heuristics based on disassembly, but they would be incomplete and unreliable (possibility of both false negatives and false positives).
As far as shifting offsets, you absolutely cannot change anything about the load addresses for a static linked binary. If you need to insert headers before the load segments, you'd have to insert a whole page, and update the file offsets in the program header table (adding 1 page to them) while leaving the virtual address load offsets the same. However, since what you're trying to do is not possible overall, the offset-shifting issue is the least of your worries.
Perhaps, if the program doesn't require high performance, you could run it under qemu app-level emulation, with qemu going through the sockets emulation/wrapper.
OpenOnload ... overrides libc version using LD_PRELOAD. But, since this elf binary is statically linked ...
also answered in intercept system calls of statically linked executable