Compiling PowerPC binary with gcc and restrict useable registers - c

I have a PowerPC device running a software and I'd like to modify this software by inserting some own code parts.
I can easily write my own assembler code, put it somewhere in an unused region in RAM, replace any instruction in the "official" code by b 0x80001234 where 0x80001234 is the RAM address where my own code extension is loaded.
However, when I compile a C code with powerpc-eabi-gcc, gcc assumes it compiles a complete program and not only "code parts" to be inserted into a running program.
This leads to a problem: The main program uses some of the CPUs registers to store data, and when I just copy my extension into it, it will mess with the previous contents.
For example, if the main program I want to insert code into uses register 5 and register 8 in that code block, the program will crash if my own code writes to r5 or r8. Then I need to convert the compiled binary back to assembler code, edit the appropriate registers to use registers other than r5 and r8 and then compile that ASM source again.
Waht I'm now searching for is an option to the ppc-gcc which tells it "never ever use the PPC registers r5 and r8 while creating the bytecode".
Is this possible or do I need to continue crawling through the ASM code on my own replacing all the "used" registers with other registers?

You should think of another approach to solve this problem.
There is a gcc extension to reserve a register as a global variable:
register int *foo asm ("r12");
Please note that if you use this extension, your program does no longer confirm to the ABI of the operating system you are working on. This means that you cannot call any library functions without risking program crashes, overwritten variables, or crashes.

Related

Compile assembly code from C code without using specific register in gcc

I am injecting some control flow monitoring codes to a program. I get an assembly code generated by GCC C compiler (flag -S). Then I add some monitoring codes in assembly before every indirect branches within the application. Those monitoring codes needs to use some registers and therefore, for every branch I inject the code I have to push and pop the registers I use in order to save the previously written value and return them after.
However since the performance is an issue, I was wondering if I can avoid the push pops when I convert the C code to assembly code and tell the GCC to generate assembly code without using one or two specific register. Therefore, I can avoid using push pops for every indirect branch to save the existing values in the register.
Is there any way to do that?
See the -ffixed-reg option.
Note that if the register in question is required to be used for passing arguments, etc, this won't work (indeed, it appears that in that case, gcc will silently use it anyway).

ARM + gcc: don't use one big .rodata section

I want to compile a program with gcc with link time optimization for an ARM processor. When I compile without LTO, the system gets compiled. When I enable LTO
(with -flto), I get the following assembler-error:
Error: invalid literal constant: pool needs to be closer
Looking around the web I found out that this has something to do with the constants in my system, which are placed in a special section called .rodata, which is called a constant pool and is placed right after the .text section in my system. It seems that when compiling with LTO because of inlining and other optimizations this .rodata section gets too far away from the instructions, so that the addressing of the constants is not possible anymore. Is it possible to place the constants right after the function that uses them? Or is it possible to use another addressing mode so the .rodata section can still be addressed? Thanks.
This is an assembler message, not a linker message, so this happens before sections are generated.
The assembler has a pseudo instruction for loading constants into registers:
ldr r0, =0x12345678
this is expanded into
ldr r0, [constant_12345678, r15]
...
bx lr
constant_12345678:
dw 0x12345678
The constant pool usually follows the return instruction. With function inlining, the function can get long enough that the return instruction is too far away; unfortunately, the compiler has no idea of the distance between memory addresses, and the assembler has no idea of control flow other than "flow does not pass beyond the return instruction, so it is safe to emit the constant pool here".
Unfortunately, there is no good solution at the moment.
You could try an asm block containing
b 1f
.ltorg
1:
This will force-emit the constant pool at this point, at the cost of an extra branch instruction.
It may be possible to instruct the assembler to omit the branch if the constant pool is empty, but I cannot test that at the moment, so this is probably not valid:
.if (2f - 1f)
.b 2f
.endif
1:
.ltorg
2:
"This is an assembler message, not a linker message, so this happens before sections are generated" - I am not sure but I think it is a little bit more complicated with LTO. Compiling (including assembling) of the individual c-files with LTO enabled works fine and does not cause any problems. The problem occurs when I try to link them together with LTO enabled. I don't know how LTO is exactly done, but apparently this also includes calling the assembler again and then I get this error message. When linking without LTO, everything is fine and when I look at the disassemly I can see that my constants are not placed after a function. Instead all constants are placed in the .rodata section. With LTO enabled because of inlining, my functions probably get to large to reach the constant pool...

Why the entry file for uboot is written in assembly?

Why entry point (Start.S) of uboot is written in assembly? Is it for performance reason or there are other issues. why it is not written in C?
Unless the entry point is guaranteed an initial state that fits the form of a C function call in the ABI the C compiler uses, C cannot express an entry point. If there is any relevant state in registers, this would be (1) potentially-clobbered by any prologue code the compiler generates, for call-clobbered registers, and (2) even if the registers are call-saved, the compiler might move them somewhere not exposed to the C code, even if the C code has access to inline assembly extensions. (A side note: uClibc's setjmp implementation for some archs is buggy in this regard; it is wrongly written with inline asm, rather than an asm function, and assumes that the compiler has not modified/moved call-saved registers already when the inline asm is reached.) Many entry points (e.g. for ELF binaries) also have initial state positioned on the stack in ways that are not representable from C.
Each Processor architecture has its own startup sequence and procedure.
They can be too specific, to be generalized under C.
For example
ARM requires that startup and initialization of a kernel be done in supervisor mode, which is enabled by setting the S bit in the control register. And then switch the control to user mode. This procedure varies in x86 and PowerPC.
Yes it can be done in C, but it makes more sense to perform architecture related initialization, in architecture specific assembly language.
The entry point is in assembly because during the early boot phase there is NO facility to call C functions. Before we can call a C function the system should already have a valid stack. The valid stack could be in DDR RAM or in SRAM. Before we use DDR RAM or SRAM we must initialize it first. Before initializing these, we must set the PLLs and other clocks first. You should see a pattern here. Everything starts at the reset vector (well unless the u-boot is a RAMBOOT variant).
All of this early low-level initialization is performed by the start up code (in assembly). After the memory is initialized, the code sets up the stack and heap, and continues running the C-coded part.

Arbitrary code execution using existing code only

Let's say I want to execute an arbitrary mov instruction. I can write the following function (using GCC inline assembly):
void mov_value_to_eax()
{
asm volatile("movl %0, %%eax"::"m"(function_parameter):"%eax");
// will move the value of the variable function_parameter to register eax
}
And I can make functions like this one that will work on every possible register.
I mean -
void movl_value_to_ebx() { asm volatile("movl %0, %%ebx"::"m"(function_parameter):"%ebx"); }
void movl_value_to_ecx() { asm volatile("movl %0, %%ecx"::"m"(function_parameter):"%ecx"); }
...
In a similar way I can write functions that will move memory in arbitrary addresses into specific registers, and specific registers to arbitrary addresses in memory. (mov eax, [memory_address] and mov [memory_address],eax)
Now, I can perform these basic instructions whenever I want, so I can create other instructions. For example, to move a register to another register:
function_parameter = 0x028FC;
mov_eax_to_memory(); // parameter is a pointer to some temporary memory address
mov_memory_to_ebx(); // same parameter
So I can parse an assembly instruction and decide what functions to use based on it, like this:
if (sourceRegister == ECX) mov_ecx_to_memory();
if (sourceRegister == EAX) mov_eax_to_memory();
...
if (destRegister == EBX) mov_memory_to_ebx();
if (destRegister == EDX) mov_memory_to_edx();
...
If it can work, It allows you to execute arbitrary mov instructions.
Another option is to make a list of functions to call and then loop through the list and call each function. Maybe it requires more tricks for making equivalent instructions like these.
So my question is this: Is is possible to make such things for all (or some) of the possible opcodes? It probably requires a lot of functions to write, but is it possible to make a parser, that will build code somehow based on given assembly instructions ,and than execute it, or that's impossible?
EDIT: You cannot change memory protections or write to executable memory locations.
It is really unclear to me why you're asking this question. First of all, this function...
void mov_value_to_eax()
{
asm volatile("movl %0, %%eax"::"m"(function_parameter):"%eax");
// will move the value of the variable function_parameter to register eax
}
...uses GCC inline assembly, but the function itself is not inline, meaning that there will be prologue & epilogue code wrapping it, which will probably affect your intended result. You may instead want to use GCC inline assembly functions (as opposed to functions that contain GCC inline assembly), which may get you closer to what you want, but there are still problems with that.....
OK, so supposing you write a GCC inline assembly function for every possible x86 opcode (at least the ones that the GCC assembler knows about). Now supposing you want to invoke those functions in arbitrary order to accomplish whatever you might wish to accomplish (taking into account which opcodes are legal to execute at ring 3 (or in whatever ring you're coding for)). Your example shows you using C statements to encode logic for determining whether to call an inline assembly function or not. Guess what: Those C statements are using processor registers (perhaps even EAX!) to accomplish their tasks. Whatever you wanted to do by calling these arbitrary inline assembly functions is being stomped on by the compiler-emitted assembly code for the logic (if (...), etc). And vice-versa: Your inline assembly function arbitrary instructions are stomping on the registers that the compiler-emitted instructions expect to not be stomped-on. The result is not likely to run without crashing.
If you want to write code in assembly, I suggest you simply write it in assembly & use the GCC assembler to assemble it. Alternatively, you can write whole C-callable assembly functions within an asm() statement, and call them from your C code, if you like. But the C-callable assembly functions you write need to operate within the rules of the calling convention (ABI) you're using: If your assembly functions use a callee-saved register, your function will need to save the original value in that register (generally on the stack), and then restore it before returning to the caller.
...OK, based on your comment Because if it's working it can be a way to execute code if you can't write it to memory. (the OS may prevent it)....
Of course you can execute arbitrary instructions (as long as they're legal for whatever ring you're running in). How else would JIT work? You just need to call the OS system call(s) for setting the permissions of the memory page(s) in which your instructions reside... change them to "executable" and then call 'em!

Embedded: memcpy/memset not used by most CRT startup code ― why?

Context:
I'm working on an ARM target, more specifically a Cortex-M4F microcontroller from ST. When working on such platforms (microcontrollers in general), there's obviously no OS; in order to get a working C/C++ "environment" (moreover, to be standard compliant in regard to initialization of variables) there must be some kind of startup code run at reset that does the minimum setup required before explicitly calling main. Such startup code, as I hinted, must initialize initialized global and static variables (such as int foo = 42;at global scope) and zero-out the other globals (such as int bar; at global scope). Then, if necessary, global "ctors" are called.
On a microcontroller, that simply means that the startup code has to copy data from flash to ram for every initialized global (all in section '.data') and clear the others (all in '.bss'). Because I use GCC, I must supply such a startup code and I happily analyzed several startup codes (and its associated linker script!) bundled with numerous examples I've found on the Internet, all using the same demo board I'm developing on.
Question:
As stated, I've seen numerous startup codes, and they initialize globals in different ways, some more efficient in term of space and time than others. But they all have something odd in common: they didn't use memset nor memcpy, resorting instead to hand-written loops to do the job. As it appears natural to me to use standard functions when possible (simple "DRY principle"), I tried the following in lieu of the initial hand-written loops:
/* Initialize .data section */
ldr r0, DATA_LOAD
ldr r1, DATA_START
ldr r2, DATA_SIZE
bl memcpy /* memcpy(DATA_LOAD, DATA_START, DATA_SIZE); */
/* Initialize .bss section */
ldr r0, BSS_START
mov r1, #0
ldr r2, BSS_SIZE
bl memset /* memset(BSS_START, 0, BSS_SIZE); */
... and it worked perfectly. The space saving are negligible, but it is clearly dead simple now.
So, I thought about it, and I see no reason to do hand-written loops in this case:
memcpy and memset are very likely to be linked in the executable anyway, because the programmer would use it directly, or indirectly through another library;
It is smaller;
Speed is not a very important factor for startup code, but nevertheless it is likely faster;
It's nearly impossible to get it wrong.
Any idea why one wouldn't rely on memcpy and memset for startup code?
I suspect the startup code does not want to make assumptions about the implementation of memcpy and such in libc. For example, the implementation of memcpy might use a global variable set by libc initialization code to report which cpu extensions are available, in order to provide optimized SIMD copying on machines that support such operations. At the point where the early "crt" startup code is running, the storage for such a global might be completely uninitialized (containing random junk), in which case it would be dangerous to call memcpy. Even if making the call works for you, it's a consequence of the implementation (or maybe even the unpredictable results of UB...) making it work; this is probably not something the crt code wants to depend on.
Whether the standard library is linked at all is decision for the application developer (--nostdlib may be used for example), but the start-up code is required, so it cannot make any assumptions.
Further, the purpose of the start-up code is to establish an environment in which C code can run; before that is complete, it is by no means a given that any library code that might reasonably assume a complete run-time environment will run correctly. For the functions in question this is perhaps not an issue in many cases, but you cannot know that.
The start-up code has to at least establish a stack and initialise static data, in C++ it additionally calls the constructors of global static objects. The standard library might reasonably assume those are established, so using the standard library before then may conceivably result in erroneous behaviour.
Finally you should be clear that the C language and the C standard library are distinct entities. The language must necessarily be capable of standing alone.
I don't think this is likely to have anything to do with "assumptions about the internal state of memcy/memset", they are unlikely to use any global resources (though I suppose some odd cases exist where they do).
All start up code on microcontrollers is usually written "inline assembler" in this manner, simply because it runs at an early stage in the code, where a stack might not yet be present and the MMU setup may not yet have been executed. Init code therefore doesn't want to risk putting anything on the stack, simple as that. Function calls put things on the stack.
So while this happened to be the initialization code of the static storage copy-down, you are likely to find the same inline assembler in other such init code as well. For example you will likely find some fundamental register setup code written in assembler somewhere before the copy-down, and you will also find the MMU setup in assembler somewhere around there too.

Resources