Add resources to ELF object, grouped in one section - c

A small program I made contains a lot of small bitmaps and sound clips that I would prefer to include into the binary itself (they need to be memory mapped anyway). In the MS PE/COFF standard, there is a specific description on how to include resources (the .rsrc section) that has a nice file system-like hierarchy. I have not found anything like that in the Linux ELF specification, thus I assume one is free to include these resources as seemed fit.
What I want to achieve is that I can include all resources in only one ELF section with a symbolic name on the start of each resource (so that I can address them from my C code). What I am doing now is using a small NASM file that has the following layout:
SECTION .rsrc
_resource_1:
incbin "../rsrc/file_name_1"
_resource_1_length:
dw $-resource_1
_resource_2:
incbin "../rsrc/file_name_2"
_resource_2_length:
dw $-resource_2
...
I can easily assemble this to an ELF object that can be linked with my C code. However, I dislike the use of assembly as that makes my code platform-dependent.
What would be a better way to achieve the same result?
This question has already been asked on stackoverflow, but the proposed solutions are not applicable to my case:
The solution proposed over here: C/C++ with GCC: Statically add resource files to executable/library
Including the resources as hex arrays in C code is not really useful, as that mixes the code and the data in one section. (Besides, it's not practical either, as I can't preview the resources once they are converted to arrays)
Using objcopy --add-section on every resource works, but then every resource gets its own section (including header and all that). That seems a little wasteful as I have around 120 files to include (each of +/- 4K).

You're wrong saying that using hexarrays mixes data and code, as ELF files will split them by default, in particular, if you define the hexarray as a constant array, it'll end up in .rodata. See an old post of mine for more details on .rodata.
Adding resources with objcopy should create multiple sections in the object file, but then they should all be merged in the output executable, although then you would have some extra padding almost certainly. Another post on a related topic.
Your other alternative if you want to go from the actual binary file (say a PNG) to ELF, you can use ldscripts, which allow you to build ELF files with arbitrary sections/symbols and reading the data from files. You'll still need custom rules to build your ELF file.
I'm actually surprised this kind of resource management is not used more often for ELF, especially since, for many small files, it'll improve filesystem performance quite quickly, as then you only have one file to map rather than many.

If your resource is not too large, you can translate them into C/C++ source code, for example, as a unsigned char array. Then you can access them as global variables, and compile & link them like normal source code.

Related

Get start and end of a section

Foreword
There already exist questions like this one, but the ones I have found so far were either specific to a given toolchain (this solution works with GCC, but not with Clang), or specific to a given format (this one is specific to Mach-O). I tagged ELF in this question merely out of familiarity, but I'm trying to figure out as "portable" a solution as reasonably possible, for other platforms. Anything that can be done with GCC and GCC-compatible toolchains (like MinGW and Clang) would suffice for me.
Problem
I have an ELF section containing a collection of relocatable "gadgets" that are to be copied/injected into something that can execute them. They are completely relocatable in that the raw bytes of the section can be copied verbatim (e.g. by memcpy) to any [correctly aligned] memory location, with little to no relocation processing (done by the injector). The problem is that I don't have a portable way of determining the size of such a section. I could "cheat" a little by using --section-start, and outright determine a start address, but that feels a little hacky, and I still wouldn't have a way to get the section's end/size.
Overview
Most of the gadgets in the section are written in assembly, so the section itself is declared there. No languages were tagged because I'm trying to be portable with this and get it working for various architectures/platforms. I have separate assembly sources for each (e.g. ARM, AArch64, x86_64, etc).
; The injector won't be running this code, so no need to be executable.
; Relocations (if any) will be done by the injector.
.section gadgets, "a", #progbits
...
The more "heavy duty" code is written in C and compiled in via a section attribute.
__attribute__((section("gadgets")))
void do_totally_innocent_things();
Alternatives
I technically don't need to make use of sections like this at all. I could instead figure out the ends of each function in the gadget, and then copy those however I like. I figured using a section would be a more straightforward way to go about it, to keep everything in one modular relocatable bundle.
I'm not sure if you considered this or this is out of picture but you could read the elf headers.
This would be sort of 'universal' as you can do the same thing with Mach-O binaries
So for example:
Creating 3 integer variables inside the 'custom_sect' section
These would add up to 12 or 0xC bytes which if we read the headers we can confirm:
Size with readelf
and here is how a section is represented in the ELF executables: ELF section representation
So each section will have its own size property which you can just read out

Is it possible to produce working binary without linker?

As far as I know compiler convert source code to machine code. But this code do not have any OS-related sections and linker add them to file.
But is it's possible to make some executable without linker?
Answering your question very literally - yes, it is possible to make an executive file without a linker: you don't need a compiler or linker to generate machine code. Binaries are a series of opcodes and relevant information (offsets, addresses etc). If you open a binary editor then type out some opcodes and make a program. Save and run it.
Of course the binary will be processor specific, just as if you had compiled a binary (native) executive. Here's a reference to the Intel x86 opcodes.
http://ref.x86asm.net/coder32.html.
If you're however asking, "Can I compile a source file directly into an executive file without a linker?" then speaking purely: no - unless the compiler has aspects of a linker integrated within it. The compiler generates intermediate objects that are passed on to the linker to "link" them into a binary such as a library or executive. Without the link step the pipeline is not complete.
Let's first make a statement that is to be considered true, compilers do not generate machine code that can be immediately executed (JIT's do, but lets ignore that).
Instead they generate files (object, static, dynamic, executable) which describe what they contains as well as groups of symbols. Symbols can be global variables or functions.
But symbols just like the file itself contain metadata. This metadata is very important. See the machine code stored in a symbol is the raw instructions for the target architecture but it does not know where memory is stored.
While modern CPU's give each process its own address space, a symbol may not land and probably won't land in the same address twice. In very recent times this is a security measure, but in past its so that dynamic linking works correctly.
So when the OS loads up an executable or shared library it can place it wherever it wants and by doing so make it not repeatable. Otherwise we'd all have to start caring and saying "this file contains 100% of the code I intend to execute". Usually on load the raw binary in the symbol table get transformed by patching it with the symbol locations in RAM. Making everything just work.
In summary the compiler emits files that allow for dynamic patching of assembly
prior to execution. If it didn't, we would be living in a very restrictive and problematic world.
Linkers even have scripts to change how they operate. They are a very complex and delicate piece of software required to make our programs work.
Have a read of the PE-COFF and ELF standards if you want to get an idea of just how complex those formats really are.

How linkers work

I need to understand how linkers work and how a linker makes an .exe with all its different sections and segments. To be more precise, I wanna be atleast able to make an .exe which contains the following code, with my hands using a hex editor.
int main()
{
}
Please suggest me some necessary reading. I have Linkers & Loaders by John Levine.
Ian Lance Taylor wrote a new ELF linker, called gold. He described the general tasks that linkers do in a series of blog posts.
This is different in details from what a PE linker does, but the general ideas are all the same (and same as in "Linkers & Loaders".
I wanna be atleast able to make an .exe which contains the following code, with my hands using a hex editor
I am not sure why you want a linker in that case. Read the description of PE file format, construct the necessary headers, write a tiny .text section containing xor %eax,%eax; ret, set the entry point to that section, ..., profit?
How (static) linkers work:
Take all necessary object files (module), merging/combining/flattening the addresses so that inter-module calls are possible and correct, structuring them to fulfill the executable format requirements, and perhaps do link time optimizations.
To create an executable (note that here I'm not referring to any specific format), you have to know the executable format, usually consists of headers and sections, plus any required startup code (compilers usually already provided this as an object file, precompiled). For PE (.exe) files, MSDN has a good explanation.

How to ensure unused symbols are not linked into the final executable?

First of all my apologies to those of you who would have followed my questions posted in the last few days. This might sound a little repetitive as I had been asking questions related to -ffunction-sections & -fdata-sections and this one is on the same line. Those questions and their answers didn't solve my problem, so I realized it is best for me to state the full problem here and let SO experts ponder about it. Sorry for not doing so earlier.
So, here goes my problem:
I build a set of static libraries which provide a lot of functionalities. These static libraries will be provided to many products. Not all products will use all of the functionalities provided by my libs. The problem is that the library sizes are quite big and the products want it to be reduced. The main goal is to reduce the final executable size and not the library size itself.
Now, I did some research and found out that, if there are 4 functions in a source file and only one function of that is used by the application, the linker will still include the rest of the 3 functions into the final executable as they all belong to the same object file. I further analyzed and found that -ffunction-sections, -fdata-sections and -gc-sections(this one is a linker option) will ensure only that one function gets linked.
But, these options for some reasons beyond my control cannot be used now.
Is there any other way in which I can ensure that the linker will link only the function which is strictly required and exclude all other functions even if they are in the same object file?
Are there any other ways of dealing with the problem?
Note: Reorganizing my code is almost ruled out as it is a legacy code and big.
I am dealing mainly with VxWorks & GCC here.
Thanks for any help!
Ultimately, the only way to ensure that only the functions you want are linked is to ensure that each source (object) file in the library only exports one function symbol - one (visible) function per file. Typically, there are some files which export several functions which are always all used together - the initialization and finalization functions for a package, for example. Also, there are often functions used by the exported function that do not need to be visible outside the source (object) file - make sure they are static.
If you looked at Plauger's "The Standard C Library", you'll find that every function is implemented in a separate file, even if the file ends up 4 lines long (one header, one function line, an open brace, one line of code, and a close brace).
Jay asked:
In the case of a big project, doesn't it become difficult to manage with so many files? Also, I don't find many open source projects following this model. OpenSSL is one example.
I didn't say it was widely used - it isn't. But it is the way to make sure that binaries are minimized. The compiler (linker) won't do the minimization for you - at least, I'm not aware of any that do. On a large project, you design the source files so that closely related functions that will normally all be used together are grouped in single source files. Functions that are only occasionally used should be placed in separate files. Ideally, the rarely used functions should each be in their own file; failing that, group small numbers of them into small (but non-minimal) files. That way, if one of the rarely used functions is used, you only get a limited amount of extra unused code linked.
As to number of files - yes, the technique espoused does mean a lot of files. You have to weigh the workload of managing (naming) lots of files against the benefit of minimal code size. Automatic build systems remove most of the pain; VCS systems handle lots of files.
Another alternative is to put the library code into a shared object - or dynamic link library (DLL). The programs then link with the shared object, which is loaded into memory just once and shared between programs using it. The (non-constant) data is replicated for each process. This reduces the size of the programs on disk, at the cost of fixups during the load process. However, you then don't need to worry about executable size; the executables do not include the shared objects. And you can update the library (if you're careful) without recompiling the main programs that use it. The reduced size of the executables is one reason shared libraries are popular.

What is the difference between ELF files and bin files?

The final images produced by compliers contain both bin file and extended loader format ELf file ,what is the difference between the two , especially the utility of ELF file.
A Bin file is a pure binary file with no memory fix-ups or relocations, more than likely it has explicit instructions to be loaded at a specific memory address. Whereas....
ELF files are Executable Linkable Format which consists of a symbol look-ups and relocatable table, that is, it can be loaded at any memory address by the kernel and automatically, all symbols used, are adjusted to the offset from that memory address where it was loaded into. Usually ELF files have a number of sections, such as 'data', 'text', 'bss', to name but a few...it is within those sections where the run-time can calculate where to adjust the symbol's memory references dynamically at run-time.
A bin file is just the bits and bytes that go into the rom or a particular address from which you will run the program. You can take this data and load it directly as is, you need to know what the base address is though as that is normally not in there.
An elf file contains the bin information but it is surrounded by lots of other information, possible debug info, symbols, can distinguish code from data within the binary. Allows for more than one chunk of binary data (when you dump one of these to a bin you get one big bin file with fill data to pad it to the next block). Tells you how much binary you have and how much bss data is there that wants to be initialised to zeros (gnu tools have problems creating bin files correctly).
The elf file format is a standard, arm publishes its enhancements/variations on the standard. I recommend everyone writes an elf parsing program to understand what is in there, dont bother with a library, it is quite simple to just use the information and structures in the spec. Helps to overcome gnu problems in general creating .bin files as well as debugging linker scripts and other things that can help to mess up your bin or elf output.
some resources:
ELF for the ARM architecture
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0044d/IHI0044D_aaelf.pdf
ELF from wiki
http://en.wikipedia.org/wiki/Executable_and_Linkable_Format
ELF format is generally the default output of compiling.
if you use GNU tool chains, you can translate it to binary format by using objcopy, such as:
arm-elf-objcopy -O binary [elf-input-file] [binary-output-file]
or using fromELF utility(built in most IDEs such as ADS though):
fromelf -bin -o [binary-output-file] [elf-input-file]
bin is the final way that the memory looks before the CPU starts executing it.
ELF is a cut-up/compressed version of that, which the CPU/MCU thus can't run directly.
The (dynamic) linker first has to sufficiently reverse that (and thus modify offsets back to the correct positions).
But there is no linker/OS on the MCU, hence you have to flash the bin instead.
Moreover, Ahmed Gamal is correct.
Compiling and linking are separate stages; the whole process is called "building", hence the GNU Compiler Collection has separate executables:
One for the compiler (which technically outputs assembly), another one for the assembler (which outputs object code in the ELF format),
then one for the linker (which combines several object files into a single ELF file), and finally, at runtime, there is the dynamic linker,
which effectively turns an elf into a bin, but purely in memory, for the CPU to run.
Note that it is common to refer to the whole process as "compiling" (as in GCC's name itself), but that then causes confusion when the specifics are discussed,
such as in this case, and Ahmed was clarifying.
It's a common problem due to the inexact nature of human language itself.
To avoid confusion, GCC outputs object code (after internally using the assembler) using the ELF format.
The linker simply takes several of them (with an .o extension), and produces a single combined result, probably even compressing them (into "a.out").
But all of them, even ".so" are ELF.
It is like several Word documents, each ending in ".chapter", all being combined into a final ".book",
where all files technically use the same standard/format and hence could have had ".docx" as the extension.
The bin is then kind of like converting the book into a ".txt" file while adding as many whitespace as necessary to be equivalent to the size of the final book (printed on a single spool),
with places for all the pictures to be overlaid.
I just want to correct a point here. ELF file is produced by the Linker, not the compiler.
The Compiler mission ends after producing the object files (*.o) out of the source code files. Linker links all .o files together and produces the ELF.

Resources