Foreword
There already exist questions like this one, but the ones I have found so far were either specific to a given toolchain (this solution works with GCC, but not with Clang), or specific to a given format (this one is specific to Mach-O). I tagged ELF in this question merely out of familiarity, but I'm trying to figure out as "portable" a solution as reasonably possible, for other platforms. Anything that can be done with GCC and GCC-compatible toolchains (like MinGW and Clang) would suffice for me.
Problem
I have an ELF section containing a collection of relocatable "gadgets" that are to be copied/injected into something that can execute them. They are completely relocatable in that the raw bytes of the section can be copied verbatim (e.g. by memcpy) to any [correctly aligned] memory location, with little to no relocation processing (done by the injector). The problem is that I don't have a portable way of determining the size of such a section. I could "cheat" a little by using --section-start, and outright determine a start address, but that feels a little hacky, and I still wouldn't have a way to get the section's end/size.
Overview
Most of the gadgets in the section are written in assembly, so the section itself is declared there. No languages were tagged because I'm trying to be portable with this and get it working for various architectures/platforms. I have separate assembly sources for each (e.g. ARM, AArch64, x86_64, etc).
; The injector won't be running this code, so no need to be executable.
; Relocations (if any) will be done by the injector.
.section gadgets, "a", #progbits
...
The more "heavy duty" code is written in C and compiled in via a section attribute.
__attribute__((section("gadgets")))
void do_totally_innocent_things();
Alternatives
I technically don't need to make use of sections like this at all. I could instead figure out the ends of each function in the gadget, and then copy those however I like. I figured using a section would be a more straightforward way to go about it, to keep everything in one modular relocatable bundle.
I'm not sure whether you considered this, or whether it is out of the picture, but you could read the ELF headers.
This would be sort of 'universal', as you can do the same thing with Mach-O binaries.
So, for example, create 3 integer variables inside the 'custom_sect' section. These add up to 12 (0xC) bytes, which we can confirm by reading the headers:
[Image: size with readelf]
And here is how a section is represented in ELF executables:
[Image: ELF section representation]
So each section has its own size property, which you can just read out.
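Reading the size out of the headers can be sketched in C as below. This is only a minimal sketch under stated assumptions: it handles 64-bit ELF images in the host's byte order, relies on the Linux <elf.h> header, and the function name find_section_size is made up for illustration. It looks up a section by name in the section header table and reports its sh_size.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: find the section called `name` in an ELF64 image held
   in memory and store its size in *size_out. Returns 0 on success, -1 if the
   image is malformed or the section is missing. */
int find_section_size(const unsigned char *image, size_t len,
                      const char *name, uint64_t *size_out)
{
    Elf64_Ehdr eh;
    if (len < sizeof eh)
        return -1;
    memcpy(&eh, image, sizeof eh);   /* memcpy avoids alignment problems */

    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64)
        return -1;
    if (eh.e_shentsize != sizeof(Elf64_Shdr) || eh.e_shstrndx >= eh.e_shnum)
        return -1;
    if (eh.e_shoff > len ||
        (uint64_t)eh.e_shnum * sizeof(Elf64_Shdr) > len - eh.e_shoff)
        return -1;

    /* The section name string table is itself a section. */
    Elf64_Shdr strtab;
    memcpy(&strtab,
           image + eh.e_shoff + (size_t)eh.e_shstrndx * sizeof strtab,
           sizeof strtab);
    if (strtab.sh_offset > len || strtab.sh_size > len - strtab.sh_offset)
        return -1;
    const char *names = (const char *)image + strtab.sh_offset;

    size_t want = strlen(name) + 1;  /* compare including the terminator */
    for (size_t i = 0; i < eh.e_shnum; i++) {
        Elf64_Shdr sh;
        memcpy(&sh, image + eh.e_shoff + i * sizeof sh, sizeof sh);
        if (sh.sh_name >= strtab.sh_size)
            continue;
        if (want <= strtab.sh_size - sh.sh_name &&
            memcmp(names + sh.sh_name, name, want) == 0) {
            *size_out = sh.sh_size;
            return 0;
        }
    }
    return -1;
}
```

To use it, read the object or executable into a buffer (or mmap it) and call find_section_size(buf, len, "custom_sect", &size). A 32-bit or cross-endian image would need the Elf32 structures and byte swapping, which this sketch leaves out.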
Related
As far as I know, a compiler converts source code to machine code. But this code does not have any OS-related sections; the linker adds them to the file.
But is it possible to make an executable without a linker?
Answering your question very literally - yes, it is possible to make an executable file without a linker: you don't need a compiler or linker to generate machine code. Binaries are a series of opcodes and relevant information (offsets, addresses etc). You can open a binary editor, type out some opcodes to make a program, then save and run it.
Of course the binary will be processor specific, just as if you had compiled a native executable. Here's a reference to the Intel x86 opcodes:
http://ref.x86asm.net/coder32.html
If you're however asking, "Can I compile a source file directly into an executable file without a linker?" then, strictly speaking: no - unless the compiler has aspects of a linker integrated within it. The compiler generates intermediate objects that are passed on to the linker to "link" them into a binary such as a library or executable. Without the link step the pipeline is not complete.
Let's first make a statement that is to be considered true: compilers do not generate machine code that can be immediately executed (JITs do, but let's ignore that).
Instead they generate files (object, static, dynamic, executable) which describe what they contain, as well as groups of symbols. Symbols can be global variables or functions.
But symbols, just like the file itself, carry metadata. This metadata is very important. The machine code stored under a symbol is the raw instructions for the target architecture, but it does not yet know the final memory addresses of the things it refers to.
While modern CPUs give each process its own address space, a symbol may not, and probably won't, land at the same address twice. In very recent times this is a security measure, but in the past it was so that dynamic linking works correctly.
So when the OS loads an executable or shared library, it can place it wherever it wants, and by doing so make the layout non-repeatable. Otherwise we'd all have to start caring and saying "this file contains 100% of the code I intend to execute". Usually, on load, the raw binary is transformed by patching it with the locations of its symbols in RAM, making everything just work.
In summary, the compiler emits files that allow for dynamic patching of machine code prior to execution. If it didn't, we would be living in a very restrictive and problematic world.
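The load-time patching described above can be illustrated with a deliberately simplified sketch. The struct reloc record and the apply_relocs function are hypothetical and far simpler than real ELF or PE relocation entries: each record only says "the 32-bit word at this offset must hold the load address plus an addend".

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified relocation record (not the ELF Rel/Rela layout):
   the 32-bit word at `offset` must become load_base + addend. */
struct reloc {
    size_t   offset;
    uint32_t addend;
};

/* Patch `image` in place for the address it was actually loaded at.
   Returns 0 on success, -1 if a record points outside the image. */
int apply_relocs(unsigned char *image, size_t image_len, uint32_t load_base,
                 const struct reloc *relocs, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (relocs[i].offset > image_len ||
            image_len - relocs[i].offset < sizeof(uint32_t))
            return -1;
        uint32_t value = load_base + relocs[i].addend;
        /* memcpy because the patched word need not be aligned */
        memcpy(image + relocs[i].offset, &value, sizeof value);
    }
    return 0;
}
```

A real loader does the same kind of work, only with more relocation types (PC-relative, GOT-relative, and so on) and with symbol lookups instead of a single base address.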
Linkers even have scripts to change how they operate. They are a very complex and delicate piece of software required to make our programs work.
Have a read of the PE-COFF and ELF standards if you want to get an idea of just how complex those formats really are.
For my master's thesis I'm trying to adapt a shared library approach for an ARM Cortex-M3 embedded system. As our targeted board has no MMU, I think that it would make no sense to use "normal" dynamic shared libraries. Because .text is executed directly from flash and .data is copied to RAM at boot time I can't address .data relative to the code thus GOT too. GOT would have to be accessed through an absolute address which has to be defined at link time. So why not assigning fixed absolute addresses to all symbols at link time...?
From the book "Linkers and Loaders" I became aware of "static linked shared libraries, that is, libraries where program and data addresses in libraries are bound to executables at link time". The linked chapter describes how such libraries could be created in general and gives references to Unix System V and BSD/OS, but also mentions Linux and its uselib() system call. Unfortunately the book gives no information on how to actually create such libraries, such as tools and/or compiler/linker switches. Apart from that book I hardly found any other information about such libraries "in the wild". The only thing I found in this regard was prelink for Linux. But as this operates on "normal" dynamic libraries, that's not really what I'm searching for.
I fear that the use of these kinds of libraries is very specific, so that no common tools exist to create them, although the mentioned uselib() syscall makes me wonder in this context. But I wanted to make sure that I haven't overlooked anything before starting to hack my own linker... ;) So could anyone give me more information about such libraries?
Furthermore I'm wondering if there is any gcc/ld switch which links and relocates a file but keeps the relocation entries in the file - so that it could be re-relocated? I found the "-r" option, but that completely skips the relocation process. Does anyone have an idea?
edit:
Yes, I'm also aware of linker scripts. With gcc libfoo.c -o libfoo -nostdlib -e initLib -Ttext 0xdeadc0de I managed to get some sort of linked & relocated object file. But so far I haven't found any possibility to link a main program against this and use it as shared library. (The "normal way" of linking a dynamic shared library will be refused by the linker.)
Concepts
The minimum concept of what such a shared library may be about:
same code
different data
There are variations on this. Do you support linking between libraries? Are the references a DAG structure or fully cyclic? Do you want to put the code in ROM, or support code updates? Do you wish to load libraries after a process is initially run? The last one is generally the difference between static shared libraries and dynamic shared libraries, although many people will forbid references between libraries as well.
Facilities
Eventually, everything will come down to the addressing modes of the processor, in this case ARM Thumb. The loader is generally coupled to the OS and the binary format in use. Your tool chain (compiler and linker) must also support the binary format and be able to generate the needed code.
Support for accessing data via a register is intrinsic in the APCS (the ARM Procedure Call Standard). In this case, the data is accessed via the sb (for static base), which is register R9. The static base and stack checking are optional features. I believe you may need to configure/compile GCC to enable or disable these options.
The options -msingle-pic-base and -mpic-register are in the GCC manual. The idea is that an OS will initially allocate separate data for each library user and then load/reload the sb on a context switch. When code calls into a library, the data is accessed via the sb for that instance's data.
GCC's arm.c code has require_pic_register(), which does code generation for data references in a shared library. It may correspond to the ARM ATPCS shared library mechanics; see Sec. 5.5.
You may circumvent the tool chain by using macros and inline assembler and possibly function annotations, like naked and section. However, the library and possibly the process need code modification in this case, i.e. non-standard macros like EXPORT(myFunction), etc.
One possibility
If the system is fully specified (a ROM image), you can pre-generate data offsets that are unique for each library in the system. This is done fairly easily with a linker script. Use NOLOAD and put the library data in some phony section. It is even possible to make the main program a static shared library. For instance, you are making a network device with four Ethernet ports. The main application handles traffic on one port. You can spawn four instances of the application with different data to indicate which port is being handled.
If you have a large mix/match of library types, the footprint for the library data may become large. In this case you need to re-adjust the sb when calls are made through a wrapper function on the external API to the library.
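#include <stddef.h>

extern unsigned int glob_libc;            /* Assumed to be defined elsewhere: libc's static base. */
extern void *__real_malloc(size_t size);  /* Resolved by ld --wrap=malloc. */

void *__wrap_malloc(size_t size) /* Wrapped version. */
{
    /* Locals on stack */
    unsigned int new_sb = glob_libc; /* accessed via current sb. */
    void *rval;
    unsigned int old_sb;
    __asm__ volatile(" mov %0, sb\n" : "=r" (old_sb));  /* save the caller's sb */
    __asm__ volatile(" mov sb, %0\n" :: "r" (new_sb));  /* switch to libc's data */
    rval = __real_malloc(size);
    __asm__ volatile(" mov sb, %0\n" :: "r" (old_sb));  /* restore the caller's sb */
    return rval;
}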
See the GNU ld --wrap option. This complexity is needed if you have a larger homogeneous set of libraries. If your libraries consist of only 'libc/libsupc++', then you may not need to wrap anything.
The ARM ATPCS has veneers inserted by the compiler that do the equivalent,
LDR a4, [PC, #4] ; data address
MOV SB, a4
LDR a4, [PC, #4] ; function-entry
BX a4
DCD data-address
DCD function-entry
The size of the library data using this technique is 4k (possibly 8k, but that might need compiler modification). The limit comes from ldr rN, [sb, #offset], where ARM limits the offset to 12 bits. Using the wrapping, each library has a 4k limit.
If you have multiple libraries that are not known when the original application builds, then you need to wrap each one and place a GOT type table via the OS loader at a fixed location in the main applications static base. Each application will require space for a pointer for each library. If the library is not used by the application, then the OS does not need to allocate the space and that pointer can be NULL.
The library table can be accessed via known locations in .text, via the original processes sb or via a mask of the stack. For instance, if all processes get a 2K stack, you can reserve the lower 16 words for a library table. sp & ~0x7ff will give an implicit anchor for all tasks. The OS will need to allocate task stacks as well.
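The stack-mask anchor can be worked through in a few lines of C. This is a sketch only, assuming every task stack is exactly 2K and aligned on a 2K boundary, with the library table in its lowest 16 words; the macro and function names are made up for illustration.

```c
#include <stdint.h>

#define TASK_STACK_SIZE 0x800u  /* 2K per task, as in the text (assumed 2K-aligned) */
#define LIB_TABLE_WORDS 16u     /* lower 16 words reserved for the library table */

/* Any stack pointer inside a task's stack maps to the base of that stack,
   which is where the library table lives: sp & ~0x7ff. */
uint32_t lib_table_base(uint32_t sp)
{
    return sp & ~(TASK_STACK_SIZE - 1u);
}

/* Address of the table slot holding the data pointer for library `index`
   (index must be below LIB_TABLE_WORDS). */
uint32_t lib_slot_address(uint32_t sp, uint32_t index)
{
    return lib_table_base(sp) + 4u * index;
}
```

For example, a stack pointer of 0x20001ABC gives a table base of 0x20001800, and the slot for library 3 sits at 0x2000180C. On the target, sp would be read from the register rather than passed in.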
Note, this mechanism is different than the ATPCS, which uses sb as a table to get offsets to the actual library data. As the memory is rather limited for the Cortex-M3 described it is unlikely that each individual library will need to use more than 4k of data. If the system supports an allocator this is a work around to this limitation.
References
Xflat technical overview - Technical discussion from the Xflat authors; Xflat is a uCLinux binary format that supports shared libraries. A very good read.
Linkage table and GOT - SO on PLT and GOT.
ARM EABI - The normal ARM binary format.
Assemblers and Loaders, by David Salomon. Especially pg. 262, A.3 Base Registers.
ARM ATPCS, especially Section 5.5, Shared Libraries, pg18.
bFLT is another uCLinux binary format that supports shared libraries.
How much RAM do you have attached? Cortex-M systems have only a few dozen kiB on-chip and for the rest they require external SRAM.
I can't address .data relative to the code
You don't have to. You can place the library symbol jump table in the .data segment (or a segment that behaves similarly) at a fixed position.
thus GOT too. GOT would have to be accessed through an absolute address which has to be defined at link time. So why not assigning fixed absolute addresses to all symbols at link time...?
Nothing prevents you from having a second GOT placed at a fixed location, that's writable. You have to instruct your linker where and how to create it. For this you give the linker a so called "linker script", which is kind of a template-blueprint for the memory layout of the final program.
I'll try to answer your question before commenting about your intentions.
To compile a file in linux/solaris/any platform that uses ELF binaries:
gcc -o libFoo.so.1.0.0 -shared -fPIC foo1.c foo2.c foo3.c ... -Wl,-soname=libFoo.so.1
I'll explain all the options next:
-o libFoo.so.1.0.0
is the name we are going to give to the shared library file, once linked.
-shared
means that you get a shared object file at the end, so there can be unresolved references after compiling and linking, which will be resolved by late binding.
-fPIC
instructs the compiler to generate position independent code, so the library can be linked in a relocatable fashion.
-Wl,-soname=libFoo.so.1
has two parts: first, -Wl instructs the compiler to pass the next option (separated by a comma) to the linker. The option is -soname=libFoo.so.1. This option tells the linker the soname used for this library. The exact value of the soname is a free-form string, but by convention it is the name of the library plus the major version number. This is important because, when you do static linking of a shared library, the soname of the library is recorded in the executable, so only a library with that soname can be loaded to serve this executable. Traditionally, when only the implementation of a library changes, we change only the file name of the library, without changing the soname, as the interface of the library doesn't change. But when you change the interface, you are building a new, incompatible library, so you must change the soname so that it doesn't conflict with other 'versions' of it.
Linking to a shared library is the same as linking to a static one (one that has .a as its extension). Just put it on the command line, as in:
gcc -o bar bar.c libFoo.so.1.0.0
Normally, when you get some library in the system, you get one file and one or two symbolic links to it in /usr/lib directory:
/usr/lib/libFoo.so.1.0.0
/usr/lib/libFoo.so.1 --> /usr/lib/libFoo.so.1.0.0
/usr/lib/libFoo.so --> /usr/lib/libFoo.so.1
The first is the actual library called on executing your program. The second is a link with the soname as the name of the file, just to be able to do the late binding. The third is the one you must have to make
gcc -o bar bar.c -lFoo
work. (gcc and other ELF compilers search for libFoo.so, then for libFoo.a, in /usr/lib directory)
Finally, here is an explanation of the concept of shared libraries that will perhaps make you change your view of statically linked shared code.
Dynamic libraries are a way for several programs to share functionality (that means the code, and perhaps the data also). I think you are a little disoriented, as I feel you have somehow misinterpreted what a statically linked shared library means.
Static linking refers to the association of a program with the shared libraries it's going to use before even launching it, so there's a hardwired link between the program and all the symbols the library has. Once you launch the program, the linking process begins and you get a program running with all of its statically linked shared libraries. The references to the shared library are resolved as the shared library is given a fixed place in the virtual memory map of the process. That's the reason the library has to be compiled with the -fPIC option (relocatable code): it can be placed differently in the virtual space of each program.
By contrast, dynamic linking of shared libraries refers to the use of a library (libdl.so) that allows you to load (once the program is executing) a shared library (even one that was not known about before), search for its public symbols, resolve references, load more libraries related to this one (resolving recursively as the linker would have done) and allow the program to make calls to symbols in it. The program doesn't even need to know the library was there at compile or link time.
Shared libraries are a concept related to the sharing of code. A long time ago, there was UNIX, and it made a great advance by sharing the text segment of a program among all instances of it (with the penalty that a program cannot modify its own code), so you have to wait for it to load just the first time. Nowadays, the concept of code sharing has been extended to libraries, and you can have several programs making use of the same library (perhaps libc, libdl or libm). The kernel keeps a reference count of all the programs that are using it, and it gets unloaded only when no program is using it.
Using shared libraries has only one drawback: the compiler must create relocatable code to generate a shared library, as the space used for it in one program may be used by another library when it is linked into another program. This normally imposes a restriction on the set of opcodes that can be generated, or requires the use of one or several registers to cope with the mobility of the code (the code does not actually move, but several linkings can place it at different addresses).
Believe me, using static code just leads to bigger executables, as you cannot share code effectively except with a shared library.
A small program I made contains a lot of small bitmaps and sound clips that I would prefer to include into the binary itself (they need to be memory mapped anyway). In the MS PE/COFF standard, there is a specific description on how to include resources (the .rsrc section) that has a nice file system-like hierarchy. I have not found anything like that in the Linux ELF specification, thus I assume one is free to include these resources as seemed fit.
What I want to achieve is that I can include all resources in only one ELF section with a symbolic name on the start of each resource (so that I can address them from my C code). What I am doing now is using a small NASM file that has the following layout:
SECTION .rsrc
_resource_1:
incbin "../rsrc/file_name_1"
_resource_1_length:
dw $-_resource_1
_resource_2:
incbin "../rsrc/file_name_2"
_resource_2_length:
dw $-_resource_2
...
I can easily assemble this to an ELF object that can be linked with my C code. However, I dislike the use of assembly as that makes my code platform-dependent.
What would be a better way to achieve the same result?
This question has already been asked on stackoverflow, but the proposed solutions are not applicable to my case:
The solution proposed over here: C/C++ with GCC: Statically add resource files to executable/library
Including the resources as hex arrays in C code is not really useful, as that mixes the code and the data in one section. (Besides, it's not practical either, as I can't preview the resources once they are converted to arrays)
Using objcopy --add-section on every resource works, but then every resource gets its own section (including header and all that). That seems a little wasteful as I have around 120 files to include (each of +/- 4K).
You're wrong saying that using hexarrays mixes data and code, as ELF files will split them by default, in particular, if you define the hexarray as a constant array, it'll end up in .rodata. See an old post of mine for more details on .rodata.
Adding resources with objcopy should create multiple sections in the object file, but then they should all be merged in the output executable, although then you would have some extra padding almost certainly. Another post on a related topic.
Your other alternative, if you want to go from the actual binary file (say a PNG) to ELF, is to use ldscripts, which allow you to build ELF files with arbitrary sections/symbols, reading the data from files. You'll still need custom rules to build your ELF file.
I'm actually surprised this kind of resource management is not used more often for ELF, especially since, for many small files, it'll improve filesystem performance quite quickly, as then you only have one file to map rather than many.
If your resources are not too large, you can translate them into C/C++ source code, for example as an unsigned char array. Then you can access them as global variables, and compile and link them like normal source code.
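A minimal sketch of the array approach might look like the following. The byte values are placeholders rather than real file contents, and the struct resource table and find_resource lookup are hypothetical additions that give each resource a symbolic name; in practice the arrays would be generated from the files by a build step. Because the arrays are const, a typical ELF toolchain places them in .rodata rather than alongside the code.

```c
#include <stddef.h>
#include <string.h>

/* Placeholder bytes; a real build step would generate these from the files. */
static const unsigned char resource_1[] = { 0x89, 'P', 'N', 'G' };
static const unsigned char resource_2[] = { 'R', 'I', 'F', 'F', 0x00 };

/* Hypothetical directory entry: symbolic name, start address, length. */
struct resource {
    const char          *name;
    const unsigned char *data;
    size_t               size;
};

static const struct resource resources[] = {
    { "file_name_1", resource_1, sizeof resource_1 },
    { "file_name_2", resource_2, sizeof resource_2 },
};

/* Look a resource up by name; returns NULL if it is not embedded. */
const struct resource *find_resource(const char *name)
{
    for (size_t i = 0; i < sizeof resources / sizeof resources[0]; i++) {
        if (strcmp(resources[i].name, name) == 0)
            return &resources[i];
    }
    return NULL;
}
```

The lengths come from sizeof, so there is no separate length symbol to keep in sync as there is in the NASM version.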
I have programmed AVR microcontrollers, but I am new to ARM. I just looked at sample code for the SAM7S64 that comes with WinARM. I am confused about these files: rom.ld, ram.ld, the scatter file, and cstartup.s. I never saw these kinds of files when I programmed AVR. Please clarify what each of these files does.
I have even more samples for you to ponder over http://github.com/dwelch67
Assume you have a toolchain that supports a specific instruction set. Tools often try to support different implementations. You might have a microcontroller with X amount of flash and Y amount of RAM. One chip might have the RAM at a different place than another, etc. The instruction set may be the same (or may itself have subtle changes), but in order to encode some of the instructions the toolchain eventually needs to know what your memory layout is. It is possible to write code for some processors that is purely position independent; in general, though, that is not necessarily a goal, as it has a cost. Tools also tend to have a Unix approach to things: going from source language to an object file, which doesn't know the memory layout yet, leaves some holes to be filled in later. You can get there from different languages depending on the toolchain and instruction set, maybe mixing Ada and C and other languages that compile to object files. Then the linker needs to combine all of those things. You as the programmer can, and sometimes have to, control what goes where. You want the vector table to be at the right place, you perhaps want your entry code to be at a certain place, and you definitely want .data in RAM ultimately and .text in flash.
For the GNU tools you tell the linker where things go using a linker script; other toolchains may have other methods. With GNU ld you can also use the ld command line. The .ld files you are seeing are there to control this. Now, sometimes this is buried in the bowels of the toolchain install: there is a default place where the default linker script will be found, and if that is fine then you don't need to craft a linker script and carry it around with the project. Depending on the tools you were using on the AVR, you either didn't need to mess with it (you were using assembly, avra or something, where you control this with .org or other similar statements) or the toolchain/sandbox took care of it for you and it was buried (for example with the Arduino sandbox). For example, if you write a hello world program
#include <stdio.h>
int main ( void )
{
printf("Hello World!\n");
return(0);
}
and compile that on your desktop/laptop
gcc hello.c -o hello
there was a linker script involved, likely a nasty, scary, ugly one. But since you are content with the default linker script and layout for your operating system, you don't need to mess with it; it just works. For these microcontrollers, where one toolchain can support a vast array of chips and vendors, you start to have to deal with this. It is a good idea to keep the linker script with the project, as you don't know from one machine or person to the next what exact GNU cross compiler they have. It is not difficult to create projects that work on many GNU cross compiler installs if you keep a few things with the project rather than force them into the toolchain.
The other half of this, which in particular with the GNU tools has an intimate relationship with the linker script, is the startup code. Before your C program is called there are some expectations, for example that .data is in place and .bss has been zeroed. For a microcontroller you want .data saved in non-volatile memory so it is there when you start your C program, so it needs to be in flash; but it can't be used from there, as .data is read/write, so before the entry point of the C code is called you need to copy .data from flash to the proper place in RAM. The linker script describes both where in flash to keep .data and where in RAM to copy it. The startup code, which you can name whatever you want (startup.s, start.s, crt0.s, etc.), gets variables filled in during the link stage so that it can copy .data to RAM, zero out .bss, and set the stack pointer so you have a stack (another item you need for C to work); then that code calls the C entry point. This is true for any other high-level language as well; if nothing else, everyone needs a stack pointer, so you need some startup code.
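The copy-and-zero step is often written in assembly, but its logic can be sketched in C. This is an illustrative sketch, not the code from any particular startup file: the function name init_memory is made up, and on a real target the five pointers would come from symbols defined in the linker script (whose names vary from project to project) rather than being passed in as parameters.

```c
#include <stdint.h>

/* Sketch of what startup code does before calling main():
   copy the initial values of .data from flash to RAM, then zero .bss.
   All five addresses are assumed to be word-aligned and to come from
   the linker script on a real target. */
void init_memory(const uint32_t *data_load,   /* where .data is stored in flash */
                 uint32_t *data_start,        /* where .data lives in RAM */
                 uint32_t *data_end,
                 uint32_t *bss_start,
                 uint32_t *bss_end)
{
    while (data_start < data_end)
        *data_start++ = *data_load++;

    while (bss_start < bss_end)
        *bss_start++ = 0;
}
```

Setting the stack pointer is not shown, because it has to happen before any C function can run, which is one reason the first few instructions of startup code are normally assembly.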
If you look at some of my examples you will see me doing linker scripts and startup code for avr processors as well.
It's hard to know exactly what the content of each of the files (rom.ld , ram.ld , scatter file , cstartup.s) are in your specific case. However assuming their names are descriptive enough I will give you an idea of what they are intended to do:
1- rom.ld/ram.ld: by the file extensions, these are "linker scripts". These files tell the linker where to put each of the memory sections of the object files (see GNU LD to learn all about linker scripts and their syntax)
2- cstartup.s: Again, from the extension of this file, it appears to be code written in assembly. Generally in this file the software developer will initialize the microcontroller before passing control to your main application. Examples of actions performed by this file are:
Setup the ARM vectors
Configure the oscillator frequency
Initialize volatile memory
Call main()
3- Scatter : Personally I have never used this file. However it appears to be a file used to control the memory layout of your application and how that is laid out in your micro (see reference). This appears to be a Keil specific file no different from any other linker script.
The final images produced by compilers contain both a bin file and an Executable and Linkable Format (ELF) file. What is the difference between the two, and in particular what is the ELF file useful for?
A Bin file is a pure binary file with no memory fix-ups or relocations, more than likely it has explicit instructions to be loaded at a specific memory address. Whereas....
ELF files are in the Executable and Linkable Format, which includes symbol look-up and relocation tables; that is, the file can be loaded at any memory address by the kernel, and all symbols used are automatically adjusted by the offset of the memory address where it was loaded. Usually ELF files have a number of sections, such as 'data', 'text' and 'bss', to name but a few; it is from those sections that the run-time can calculate how to adjust the symbols' memory references dynamically.
A bin file is just the bits and bytes that go into the rom or a particular address from which you will run the program. You can take this data and load it directly as is, you need to know what the base address is though as that is normally not in there.
An ELF file contains the bin information, but it is surrounded by lots of other information: possibly debug info and symbols, and it can distinguish code from data within the binary. It allows for more than one chunk of binary data (when you dump one of these to a bin you get one big bin file with fill data to pad it to the next block). It tells you how much binary you have and how much .bss data there is that wants to be initialised to zeros (GNU tools have problems creating bin files correctly).
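The "one big bin file with fill data" behaviour can be sketched as follows. This is a simplified illustration, not how objcopy is implemented: struct chunk and flatten are hypothetical names, and each chunk stands for one loadable block of bytes together with the address it wants to live at.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One loadable block of an ELF-like image: where it goes and what it holds. */
struct chunk {
    uint32_t             addr;
    const unsigned char *bytes;
    size_t               size;
};

/* Flatten the chunks into a single raw image starting at `base`, padding the
   gaps with `fill`. Returns the number of meaningful bytes in `out`, or 0 if
   a chunk lies below `base` or does not fit in `out_cap`. */
size_t flatten(const struct chunk *chunks, size_t count, uint32_t base,
               unsigned char fill, unsigned char *out, size_t out_cap)
{
    size_t used = 0;
    memset(out, fill, out_cap);
    for (size_t i = 0; i < count; i++) {
        if (chunks[i].addr < base)
            return 0;
        size_t off = chunks[i].addr - base;
        if (off > out_cap || chunks[i].size > out_cap - off)
            return 0;
        memcpy(out + off, chunks[i].bytes, chunks[i].size);
        if (off + chunks[i].size > used)
            used = off + chunks[i].size;
    }
    return used;
}
```

Note that the base address itself is not stored anywhere in the output, which is why you need to know it separately when loading a bin file. Two chunks that are far apart in the address space produce a correspondingly large, mostly padding image.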
The ELF file format is a standard; ARM publishes its enhancements/variations on the standard. I recommend everyone write an ELF parsing program to understand what is in there. Don't bother with a library; it is quite simple to just use the information and structures in the spec. It helps to overcome GNU problems in general creating .bin files, as well as debugging linker scripts and other things that can help to mess up your bin or ELF output.
some resources:
ELF for the ARM architecture
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0044d/IHI0044D_aaelf.pdf
ELF from wiki
http://en.wikipedia.org/wiki/Executable_and_Linkable_Format
ELF format is generally the default output of compiling.
if you use GNU tool chains, you can translate it to binary format by using objcopy, such as:
arm-elf-objcopy -O binary [elf-input-file] [binary-output-file]
or using fromELF utility(built in most IDEs such as ADS though):
fromelf -bin -o [binary-output-file] [elf-input-file]
bin is the final way that the memory looks before the CPU starts executing it.
ELF is a cut-up/compressed version of that, which the CPU/MCU thus can't run directly.
The (dynamic) linker first has to sufficiently reverse that (and thus modify offsets back to the correct positions).
But there is no linker/OS on the MCU, hence you have to flash the bin instead.
Moreover, Ahmed Gamal is correct.
Compiling and linking are separate stages; the whole process is called "building", hence the GNU Compiler Collection has separate executables:
One for the compiler (which technically outputs assembly), another one for the assembler (which outputs object code in the ELF format),
then one for the linker (which combines several object files into a single ELF file), and finally, at runtime, there is the dynamic linker,
which effectively turns an elf into a bin, but purely in memory, for the CPU to run.
Note that it is common to refer to the whole process as "compiling" (as in GCC's name itself), but that then causes confusion when the specifics are discussed,
such as in this case, and Ahmed was clarifying.
It's a common problem due to the inexact nature of human language itself.
To avoid confusion, GCC outputs object code (after internally using the assembler) using the ELF format.
The linker simply takes several of them (with an .o extension), and produces a single combined result, probably even compressing them (into "a.out").
But all of them, even ".so" are ELF.
It is like several Word documents, each ending in ".chapter", all being combined into a final ".book",
where all files technically use the same standard/format and hence could have had ".docx" as the extension.
The bin is then kind of like converting the book into a ".txt" file while adding as much whitespace as necessary to be equivalent to the size of the final book (printed on a single spool),
with places for all the pictures to be overlaid.
I just want to correct a point here. The ELF file is produced by the linker, not the compiler.
The compiler's job ends after producing the object files (*.o) from the source code files. The linker links all the .o files together and produces the ELF.