The problem definition:
There is a need to have two parts of the code in an AVR microcontroller: a fixed one that is always present and changes rarely, and a transient one that is replaced or extended more often. The challenge is to let the transient code call functions and access global variables of the fixed one, and vice versa.
It is quite obvious that there should be special methods for the fixed code to access transient one -- like having calculated function pointers in RAM and using only them to call transient code procedures.
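The RAM function-pointer idea can be pictured as a small dispatch table that the fixed code owns and the transient code fills in. This is only a sketch under assumptions: the names `transient_api`, `transient_register`, `fixed_call_on_tick`, `fixed_call_command` and the `demo_*` handlers are hypothetical, and on a real AVR the table would have to sit at a RAM address both images agree on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table of entry points provided by the transient code.
   It lives in RAM, so the fixed code needs no link-time knowledge of
   where the transient functions end up in flash. */
typedef struct {
    void    (*on_tick)(void);
    uint8_t (*handle_command)(uint8_t cmd);
} transient_api;

static transient_api g_transient;   /* zero-initialized: nothing registered yet */

/* Called by the transient code (or its installer) to publish its entry points. */
void transient_register(const transient_api *api)
{
    assert(api != NULL);
    g_transient = *api;
}

/* The fixed code calls through the table and tolerates a missing handler. */
void fixed_call_on_tick(void)
{
    if (g_transient.on_tick != NULL)
        g_transient.on_tick();
}

uint8_t fixed_call_command(uint8_t cmd)
{
    if (g_transient.handle_command == NULL)
        return 0xFF;                 /* hypothetical "not available" code */
    return g_transient.handle_command(cmd);
}

/* Stand-ins for transient code, only here to make the sketch complete. */
int demo_tick_count = 0;
static void    demo_on_tick(void)              { demo_tick_count++; }
static uint8_t demo_handle_command(uint8_t c)  { return (uint8_t)(c + 1u); }
const transient_api demo_api = { demo_on_tick, demo_handle_command };
```

After `transient_register(&demo_api)`, calls made by the fixed code reach the demo handlers; before registration they fall back to the "not available" path.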
For calls in the opposite direction, I was thinking about linking the transient code against the existing .elf file of the fixed code.
I'm using avr-gcc toolchain (as in ubuntu 20.20), gcc version 5.4.0
What I've already tried:
adding '-shared' as a link argument when building fixed code -- appears to be unsupported for AVR (linker reports an error).
adding instead '-Wl,--export-dynamic' as a link argument -- it seems to be ignored, no .dynsym section appears in the elf.
There is still a .symtab section in the fixed code elf -- could that be somehow used to link against it?
Note: my division of 'fixed' and 'transient' code has nothing to do with boot-area of some AVR microcontroller, boot is just something I do not care here about.
Note2: The question is much like this one, but gives a clear explanation of the need.
You have to forget all big-computer knowledge. 8-bit AVRs are tiny microcontrollers. Code has to be linked statically. There is no other way.
Foreword
There already exist questions like this one, but the ones I have found so far were either specific to a given toolchain (this solution works with GCC, but not with Clang), or specific to a given format (this one is specific to Mach-O). I tagged ELF in this question merely out of familiarity, but I'm trying to figure out as "portable" a solution as reasonably possible, for other platforms. Anything that can be done with GCC and GCC-compatible toolchains (like MinGW and Clang) would suffice for me.
Problem
I have an ELF section containing a collection of relocatable "gadgets" that are to be copied/injected into something that can execute them. They are completely relocatable in that the raw bytes of the section can be copied verbatim (e.g. by memcpy) to any [correctly aligned] memory location, with little to no relocation processing (done by the injector). The problem is that I don't have a portable way of determining the size of such a section. I could "cheat" a little by using --section-start, and outright determine a start address, but that feels a little hacky, and I still wouldn't have a way to get the section's end/size.
Overview
Most of the gadgets in the section are written in assembly, so the section itself is declared there. No languages were tagged because I'm trying to be portable with this and get it working for various architectures/platforms. I have separate assembly sources for each (e.g. ARM, AArch64, x86_64, etc).
; The injector won't be running this code, so no need to be executable.
; Relocations (if any) will be done by the injector.
.section gadgets, "a", #progbits
...
The more "heavy duty" code is written in C and compiled in via a section attribute.
__attribute__((section("gadgets")))
void do_totally_innocent_things();
Alternatives
I technically don't need to make use of sections like this at all. I could instead figure out the ends of each function in the gadget, and then copy those however I like. I figured using a section would be a more straightforward way to go about it, to keep everything in one modular relocatable bundle.
I'm not sure whether you considered this, or whether it is out of the picture, but you could read the ELF headers.
This would be sort of 'universal', as you can do the same thing with Mach-O binaries.
For example, create 3 integer variables inside a 'custom_sect' section. These add up to 12 (0xC) bytes, which can be confirmed by reading the section headers with readelf.
[Image: section size as reported by readelf]
[Image: how a section is represented in an ELF executable]
Each section header has its own size field, which you can simply read out.
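Reading the size field can also be done programmatically. The following is a minimal sketch, assuming a Linux host with `<elf.h>`, a 64-bit ELF file in the host's byte order, and an intact section header table; the function name `elf64_section_size` and the three variables are made up for illustration.

```c
#include <assert.h>
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Three ints in a custom section, as in the example (3 * 4 = 12 bytes). */
__attribute__((section("custom_sect"), used)) static int g_a = 1;
__attribute__((section("custom_sect"), used)) static int g_b = 2;
__attribute__((section("custom_sect"), used)) static int g_c = 3;

/* Returns sh_size of the named section in a 64-bit ELF file,
   or -1 if the file cannot be parsed or the section is absent. */
long long elf64_section_size(const char *path, const char *name)
{
    assert(path != NULL && name != NULL);

    long long   result = -1;
    Elf64_Ehdr  eh;
    Elf64_Shdr *sh = NULL;
    char       *names = NULL;
    size_t      names_len = 0;
    FILE       *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    /* Accept only 64-bit ELF files with a usable section header table. */
    if (fread(&eh, sizeof eh, 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_shentsize != sizeof(Elf64_Shdr) ||
        eh.e_shnum == 0 || eh.e_shstrndx >= eh.e_shnum)
        goto done;

    /* Load the section header table. */
    sh = malloc((size_t)eh.e_shnum * sizeof *sh);
    if (sh == NULL ||
        fseek(f, (long)eh.e_shoff, SEEK_SET) != 0 ||
        fread(sh, sizeof *sh, eh.e_shnum, f) != eh.e_shnum)
        goto done;

    /* Load the section-name string table and make sure it is terminated. */
    names_len = (size_t)sh[eh.e_shstrndx].sh_size;
    names = malloc(names_len + 1);
    if (names == NULL ||
        fseek(f, (long)sh[eh.e_shstrndx].sh_offset, SEEK_SET) != 0 ||
        fread(names, 1, names_len, f) != names_len)
        goto done;
    names[names_len] = '\0';

    /* Find the section by name and report its size field. */
    for (size_t i = 0; i < eh.e_shnum; i++) {
        if (sh[i].sh_name < names_len &&
            strcmp(names + sh[i].sh_name, name) == 0) {
            result = (long long)sh[i].sh_size;
            break;
        }
    }

done:
    free(names);
    free(sh);
    fclose(f);
    return result;
}
```

On Linux, calling `elf64_section_size("/proc/self/exe", "custom_sect")` inspects the running program itself. Note that this reads the file on disk, so it suits an external injector or build step better than code that only has its own loaded image available.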
I have managed to build an RV32E cross-compiler on my Intel Ubuntu machine by using the official riscv GitHub toolchain (github.com/riscv/riscv-gnu-toolchain) with the following configuration:
./configure --prefix=/home/riscv --with-arch=rv32i --with-abi=ilp32e
The ilp32e ABI specifies soft float for RV32E. This generates a working compiler that works fine on my simple C source code. If I disassemble the created application, it does indeed stick to the RV32E specification: the assembly generated for my code uses only the first 16 registers.
I use static linking and it pulls in the expected set of libgcc support routines such as __divdi3 and __mulsi3. Unfortunately the pulled-in routines use all 32 registers and not the restricted lower 16 for RV32E. Hence, not very useful!
I cannot find where this statically linked code is coming from. Is it compiled from C source, and therefore compiled without the RV32E restriction? Or was it written as hand-coded assembly for the full RV32I only, instead of RV32E? I tried to grep around the source but have had no luck finding anything like the actual code that is statically linked.
Any ideas?
EDIT: I just checked in more detail, and the compiler is not restricting itself to the first 16 registers. With a simple test routine it manages to use only the first 16, but more complex code uses others as well. Maybe RV32E is not implemented yet?
The configure.ac file contains this code:
AS_IF([test "x$with_abi" == xdefault],
  [AS_CASE([$with_arch],
    [*rv64g* | *rv64*d*], [with_abi=lp64d],
    [*rv64*f*], [with_abi=lp64f],
    [*rv64*], [with_abi=lp64],
    [*rv32g* | *rv32*d*], [with_abi=ilp32d],
    [*rv32*f*], [with_abi=ilp32f],
    [*rv32*], [with_abi=ilp32],
    [AC_MSG_ERROR([Unknown arch])]
  )])
Which seems to map your input of rv32i to the ABI ilp32, ignoring the e. So yes, it seems support for the ...e ABIs is not fully implemented yet.
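To see what the quoted patterns do with a given architecture string, the same first-match-wins table can be replayed in C with POSIX `fnmatch`. This is a sketch, not part of the toolchain: the function names are made up, and the quoted snippet only runs when the ABI is left at its default value. It shows that none of the patterns yields an `ilp32e` default.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>
#include <string.h>

/* Mirrors the AS_CASE patterns quoted above; the first match wins.
   Returns NULL for an unknown arch (configure would stop with an error). */
const char *default_abi_for_arch(const char *arch)
{
    static const struct { const char *pattern; const char *abi; } rules[] = {
        { "*rv64g*",  "lp64d"  }, { "*rv64*d*", "lp64d"  },
        { "*rv64*f*", "lp64f"  },
        { "*rv64*",   "lp64"   },
        { "*rv32g*",  "ilp32d" }, { "*rv32*d*", "ilp32d" },
        { "*rv32*f*", "ilp32f" },
        { "*rv32*",   "ilp32"  },
    };

    assert(arch != NULL);
    for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++)
        if (fnmatch(rules[i].pattern, arch, 0) == 0)
            return rules[i].abi;
    return NULL;
}

/* Convenience check: does `arch` default to `abi`? */
int arch_defaults_to(const char *arch, const char *abi)
{
    const char *got = default_abi_for_arch(arch);
    return got != NULL && strcmp(got, abi) == 0;
}
```

With this table, both `rv32i` and `rv32e` fall through to the last `*rv32*` pattern and get `ilp32`.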
I've been looking at examples of C code that is compiled for some lesser known processors (like ZPU) using the gcc cross compiler.
Most of the working examples I see assume a certain architecture (memory map and set of peripherals) and simply give you a recipe to compile for it, and they work.
However, I can find very little information on what needs to be modified if you use the same CPU with a different memory map and set of peripherals.
From what I've read, there are two main files that I need to make sure are done "right": the linker script that is used, and crt0.o (which, if I need to modify it, means recompiling crt0.S, which is assembler). On this last one especially, I find very little information on what it is actually supposed to do (other than setting up reset there is no clear info, and I'm talking conceptually, not for a specific processor, although something specific would also be useful).
Can anyone tell me what the relationship is between the C files of the program (bare-metal development), crt0.S (especially why it is needed), and a working linker script?
PS: Answers of the form "read this book" are welcome and I would love them.
PPS: I realize this kind of question is usually vague and closed quickly, but I don't know where else to turn, so I ask for a bit of leniency.
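As a conceptual illustration of that relationship: the linker script decides where .data and .bss live, and the startup code uses those addresses to prepare RAM before main() runs. Below is a hedged sketch of that preparation step written in C. On a real target it is usually assembly in crt0.S, and the five addresses come from symbols defined in the linker script; the function name and parameters here are invented so the logic can run on a host.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* What a typical crt0 does to memory before calling main():
   1. copy the initial values of .data from ROM/flash to RAM,
   2. clear .bss so uninitialized globals start at zero. */
void crt0_init_memory(const unsigned char *data_load,  /* .data image in ROM/flash */
                      unsigned char *data_start,       /* .data run address in RAM */
                      unsigned char *data_end,
                      unsigned char *bss_start,
                      unsigned char *bss_end)
{
    assert(data_end >= data_start && bss_end >= bss_start);
    memcpy(data_start, data_load, (size_t)(data_end - data_start));
    memset(bss_start, 0, (size_t)(bss_end - bss_start));
}
```

A real crt0 also has to set the stack pointer before any C code can run, and then jumps to main(). If the memory map changes, the linker script changes the addresses, and the startup code keeps working as long as it refers to them only through the linker-provided symbols.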
I was aiming at reducing the size of the executable for my C project and I have tried all compiler/linker options, which have helped to some extent. My code consists of a lot of separate files. My question was whether combining all source code into a single file will help with optimization that I desire? I read somewhere that a compiler will optimize better if it finds all code in a single file in place of separate multiple files. Is that true?
A compiler can indeed optimize better when it finds the needed code in the same compilable (*.c) file. If your program is longer than 1000 lines or so, you'll probably regret putting all the code in one file, because doing so will make your program hard to maintain; but if it is shorter than 500 lines, you might try the one file and see whether it helps.
The crucial consideration is how often code in one compilable file calls or otherwise uses objects (including functions) defined in another. If there are few transfers of control across this boundary, then erasing the boundary will not help performance appreciably. Therefore, when coding for performance, the key is to put tightly related code in the same file.
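A small illustration of why the file boundary matters, as a sketch with made-up function names: when the helper is defined in the same file and marked `static`, the compiler sees its body at every call site and knows no other file can call it, so it is free to inline it and drop the out-of-line copy. If `square` lived in another .c file, a plain per-file build would have to emit a real call.

```c
#include <assert.h>

/* `static`: visible only in this file, so the compiler may inline it
   into the loop below and omit a separate copy from the object file. */
static unsigned square(unsigned x)
{
    return x * x;
}

unsigned sum_of_squares(unsigned n)
{
    assert(n < 1000u);  /* arbitrary guard for this sketch; keeps the sum small */
    unsigned total = 0;
    for (unsigned i = 1; i <= n; i++)
        total += square(i);
    return total;
}
```

Whether inlining actually shrinks the executable depends on the optimization level and on how many call sites there are, so it is worth comparing the sizes of both builds.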
I like your question a great deal. It is the right kind of question to ask, in my view; and, though the complete answer is not simple enough to treat fully in a Stack Exchange answer, your pursuit of the answer will teach you much. Though you may not yet realize it, your question really regards linking, a subject every advancing programmer eventually has to learn. Your question regards symbol tables, inlining, the in-place construction of return values and several other subtle factors.
At any rate, if your program is shorter than 500 lines or so, then you have little to lose by trying the single-file approach. If longer than 1000 lines, then a single file is not recommended.
It depends on the compiler. The Intel C++ Composer XE for example can automatically optimize over multiple files (when building using icc -fast *.c *.cpp or icl /fast *.c *.cpp, for linux/windows respectively).
When you use Microsoft Visual Studio, or a derived product (like Atmel Studio for microcontrollers), every single source file is compiled on its own (i.e. one cl, icl, or gcc command is issued for every c and cpp file in the project). This means no optimization across source files.
For microcontroller projects I sometimes have to put everything in a single file in order to make it fit in the limited flash memory on the controller. If your compiler/IDE does it like Visual Studio, you can use a trick: select all the source files and make them not participate in the build process (but leave them in the project), then create a file (I always use whole_program.c) and #include every single source (i.e. non-header) file in it. Note that including c files is frowned upon by many high-level programmers, but sometimes you have to do it the dirty way, and with microcontrollers that's actually more often than not.
My experience has been that with gnu/gcc the optimization is within the single file plus includes that create a single object. With clang/llvm it is quite easy, and I recommend the following: DO NOT optimize in the clang step. Use clang to get from C to bytecode, then use llvm-link to link all of your bytecode modules into one bytecode module. Then you can optimize the whole project, with all source files optimized together, and llc adds more optimization as it heads for the target. Your best results come from telling clang, using the target triple command line option, what your ultimate target is. For the gnu path to do the same thing, either use includes to make one big file compiled to one object, or, if there is a machine code level optimizer other than the few things the linker does, that is where it would have to happen. Maybe gnu has an exposed IR file format, optimizer, and IR-to-target tool, but I think I would have seen that by now.
At http://github.com/dwelch67, a number of my projects, although very simple programs, have llvm and gnu builds for the same source files. In the llvm builds you can see that I make a binary from unoptimized bytecode and also from optimized bytecode (llvm's optimizer has problems with small while loops and sometimes generates non-working code; a very quick check to see if it is you or them is to try the non-optimized llvm binary and the gnu binary to see if they all behave the same (you) or if only the optimized llvm doesn't work (them)).
I have a big software project with a complicated build process, which works like this:
1. Compile individual source files.
2. Partially link object files for each module together into another .o using ld -r.
3. Hide private symbols in each module using objcopy -G.
4. Partially link module objects together, again using ld -r.
5. Link modules together into a shared object.
Step 3 is required to allow module-private global variables that aren't exported to the rest of the project.
This all works fine with ARM and IA32. Unfortunately, now I have to make things work on mips (specifically, mipsel-linux-gnu for Android). And the MIPS shared object ABI is significantly more complex than on the other platforms and it's not working.
What's happening is that step 5 is failing with this error:
CALL16 reloc at 0x1234 not against global symbol
This seems to be because the compiler generates CALL16 relocations to call functions in another compilation unit, but CALL16 only allows you to call global symbols --- and because of step 3, some of the symbols that we're trying to call aren't global any more.
At this point I can see several possible options:
persuade the linker to resolve the CALL16 relocations to normal intra-compilation-unit PC-relative calls at step 2.
ditto, but at step 4 or 5.
tell the compiler not to generate CALL16 relocations for inter-compilation-unit function calls.
other.
Disabling step 3 is, I'm afraid, not an option due to external requirements.
What I'd really, really like to do is to generate absolute code which gets patched at load time to the right addresses; it's smaller, much faster, and vastly simpler, and we don't need to share the library between processes. Unfortunately, Android's dlopen() doesn't seem to support this.
Currently I'm out of my depth. Anyone have any suggestions?
This is gcc 4.4.5 (from Emdebian), binutils 2.20.1. Target BFD is elf32-tradlittlemips. Host OS is Linux, and I'm cross-compiling for Android.
Addendum
I am also getting warnings like this from step 4.
$MODULE.o: Can't find matching LO16 reloc against `$SYMBOLNAME' for R_MIPS_GOT16 at 0x18 in section `.text.$SYMBOLNAME'
Looking at the disassembly of the input to step 4, I can see that the compiler's generated code like this:
50: 8f9e0000 lw s8,0(gp)
50: R_MIPS_GOT16 $SYMBOLNAME
54: 8fd9001c lw t9,28(s8)
58: 0320f809 jalr t9
5c: 00a02021 move a0,a1
Doesn't GOT16 fix up to the high half of an address, and should be followed with a LO16 for the low half? But the code looks like it's trying to do a GOT indirection. This puzzles me. I've no idea if this is related to my earlier problem, or is a different problem, or is not a problem at all...
Update
Apparently MIPS simply does not support hidden global symbols!
We've gotten around it by mangling the names of the symbols that are supposed to be hidden so that nobody can tell what they are. This is pushing the external requirements quite a lot, but I sold management on it by pointing out that it was the only way to get a shippable product.
That's totally gruesome (and involves some deeply disgusting makefile work to do), so I'd rather like a better solution, if anyone has one...
I'm not sure about the specific GOT issues you are having. There are a lot of bugs and issues with GOT and LO16/HI16 handling in binutils. I think most have been fixed in the version you're using, unless you are targeting MIPS16 (which you don't seem to be doing). LO16 is really only necessary there; beyond MIPS16 you're pulling the full 26-bit offset out of the GOT since you have 32-bit registers. LO16 isn't needed, but is still formally required by some ABIs/APIs, though it was fudged to be at most a warning (you may try removing a -Werror at that phase if you are using it). I only understand the very basics of that part, honestly. For the rest of your situation I have some recommendations, if not an answer (it's hard to be sure given the complexity of your setup).
In MIPS (and most assemblies I'm familiar with) you have your basic three levels of visibility: local, global, and weak. In addition you have comm for shared objects. GNU, of course, likes to have things more complicated and adds more. gas provides protected, hidden, and internal (minimally; it is hard to keep up with all the extensions). With all of this, the steps you're taking to manually fiddle around with visibility seem unnecessary.
If you can remove the intermediate globalness of the variables, it should remove the need to make them unglobal, which can only serve to simplify any GOT issues you run into later.
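For reference, this is roughly how the hidden visibility mentioned above looks from C, using GCC's `visibility` attribute. It is a sketch with invented names, shown as one file for brevity. Whether it avoids the CALL16 error on the MIPS toolchain in question is not something this example can confirm.

```c
#include <assert.h>

/* File A of a module: shared with the other files of the same module,
   but not exported from the final shared object. */
__attribute__((visibility("hidden"))) int module_private_counter = 0;

/* File B of the same module would declare it like this and use it freely. */
extern __attribute__((visibility("hidden"))) int module_private_counter;

/* The module's exported entry point keeps default visibility. */
__attribute__((visibility("default"))) int module_next_id(void)
{
    assert(module_private_counter >= 0);
    return ++module_private_counter;
}
```

The compiler then knows at compile time that the symbol is module-private, so no separate objcopy pass is needed to hide it afterwards.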
The overall problem is a bit confusing. I'm not sure what you mean by hidden global symbols; it's a bit of a contradiction (of course portability and specific projects give crazy problems and restrictions). You seem to want cross-assembly-unit symbols at one stage, but not at a later stage. Without using GNU extensions (something best avoided in my book), you may want to replace the globals in steps 1-2 with comm and/or weak globals. You could even use preprocessor trickery to avoid having multiple sub-units at that stage (ugly, but that's portable code at this level).
You really have a setup of 1) sub-modules 2) sub-modules -> modules 3-5) modules -> shared library. Simplifying that can't hurt. You can always interpose at 2) or 3-5) a C-level interface just to find out what assembly GCC will produce for your architectures, and use that as a basis for breaking visibility up into clean interfaces.
I wish I could give you a tailor-made solution, but that's pretty much impossible without your full project to work from. I can reassure you that while MIPS relocation (especially in the toolchains) has issues, the visibility options (especially if you are using gas, libbfd, and gcc) are the same.
Your binutils is too old. Some changesets in 2.23 may resolve your problem, like "hide symbols without PLT nor GOT references".