Is the Global Offset Table (GOT) pointer available from C/C++? - c

I'm trying to track down an issue that a handful of users are reporting. I cannot reproduce it at the moment, but I suspect the issue is related to the use of PIC and inline assembly.
PIC uses the Global Offset Table (GOT), and the inline assembly must preserve EBX and RBX according to the ABI. I've audited the code, and it appears to preserve EBX and RBX as required. But that does not mean the generated code is consistent with expectations because GCC will interleave instructions as it sees fit. All GCC guarantees is consecutiveness (i.e., my ASM will not be reordered).
I want to instrument debug builds with code similar to the following:
volatile void* got1 = GlobalOffsetTablePointer();
// Call a routine that uses inline assembly
volatile void* got2 = GlobalOffsetTablePointer();
assert(got1 == got2);
The problem I am experiencing is I cannot locate the function GlobalOffsetTablePointer. I already have a suspicion for bad interactions with inline assembly, so I am trying to avoid more inline assembly to fetch the GOT pointer.
Is the Global Offset Table (GOT) pointer available from C/C++? If so, how do I access it?

Related

Why we need Clobbered registers list in Inline Assembly?

In my guide book it says:
In inline assembly, Clobbered registers list is used to tell the
compiler which registers we are using (So it can empty them before
that).
I don't understand this at all: why does the compiler need to know? What's the problem with leaving those registers as they are? Did they mean instead that the compiler backs them up and restores them after the assembly code?
I hope someone can provide an example, as I spent hours reading about the clobbered registers list without finding a clear answer to this problem.
The problems you'd see from failing to tell the compiler about registers you modify would be exactly the same as if you wrote a function in asm that modified some call-preserved registers (see footnote 1). See more explanation and a partial example in Why should certain registers be saved? What could go wrong if not?
In GNU inline-asm, all registers are assumed preserved, except for ones the compiler picks for "=r" / "+r" or other output operands. The compiler might be keeping a loop counter in any register, or anything else that it's going to read later and expect it to still have the value it put there before the instructions from the asm template. (With optimization disabled, the compiler won't keep variables in registers across statements, but it will when you use -O1 or higher.)
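As a minimal sketch of the point above (the function names add_via_ecx and sum_pairs are hypothetical, and the asm path assumes GCC or Clang on x86), the template below overwrites ECX and the flags, so both are listed as clobbers. With that list in place, the compiler should keep the loop counter and the running total out of ECX across the asm statement, or save them around it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: adds two values using ECX as a scratch register. */
static inline uint32_t add_via_ecx(uint32_t a, uint32_t b)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    uint32_t result;
    __asm__("movl %1, %%ecx\n\t"   /* ecx = a (overwrites ECX)          */
            "addl %2, %%ecx\n\t"   /* ecx += b (also modifies flags)    */
            "movl %%ecx, %0"       /* result = ecx, after inputs read   */
            : "=r"(result)         /* output: any general register      */
            : "r"(a), "r"(b)       /* inputs: any general registers     */
            : "ecx", "cc");        /* clobbers: ECX and condition codes */
    return result;
#else
    return a + b;                  /* portable fallback for other targets */
#endif
}

/* The loop counter and total live wherever the compiler chooses;
   the clobber list is what keeps them safe from the template. */
static uint32_t sum_pairs(const uint32_t *v, uint32_t n)
{
    assert(v != NULL);
    uint32_t total = 0;
    for (uint32_t i = 0; i + 1 < n; i += 2)
        total += add_via_ecx(v[i], v[i + 1]);
    return total;
}
```

Dropping "ecx" from the clobber list would still compile, and might even appear to work, which is exactly the kind of latent bug described here.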
Same for all memory except for locations that are part of an "=m" or "+m" memory output operand. (Unless you use a "memory" clobber.) See How can I indicate that the memory *pointed* to by an inline ASM argument may be used? for more details.
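A rough illustration of the two ways to describe a memory write, assuming GCC or Clang on x86 (store_described and store_opaque are hypothetical names): the first names the exact location as an "=m" output, the second only passes the pointer in a register and therefore needs a "memory" clobber.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The written location is an "=m" output, so the compiler knows
   exactly which object the template modifies. */
static inline void store_described(uint32_t *p, uint32_t v)
{
    assert(p != NULL);
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__("movl %1, %0" : "=m"(*p) : "r"(v));
#else
    *p = v;                        /* portable fallback */
#endif
}

/* Only the pointer value is an operand here; the store through it is
   invisible to the compiler, so a "memory" clobber is required. */
static inline void store_opaque(uint32_t *p, uint32_t v)
{
    assert(p != NULL);
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__ volatile("movl %1, (%0)" : : "r"(p), "r"(v) : "memory");
#else
    *p = v;                        /* portable fallback */
#endif
}
```

The "=m" form is usually the cheaper choice because it does not force the compiler to assume that all of memory may have changed.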
Footnote 1:
Unlike for a function, you should not save/restore any registers with your own instructions inside the asm template. Just tell the compiler about it so it can save/restore at the start/end of the whole function after inlining, and avoid having any values it needs in them. In fact, in ABIs with a red-zone (like x86-64 System V) using push/pop inside the asm would be destructive: Using base pointer register in C++ inline asm
The design philosophy of GNU C inline asm is that it uses the same syntax as the compiler internal machine-description files. The standard use-case is for wrapping a single instruction, which is why you need early-clobber declarations if the asm code in the template string doesn't read all its inputs before it writes some registers.
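To make the early-clobber remark concrete, here is a small sketch (hypothetical function double_sum, GCC or Clang on x86 assumed). The first instruction writes the output before the second input has been read, so the output is declared "=&r". Without the &, the compiler would be allowed to place the output in the same register as b.

```c
#include <assert.h>
#include <stdint.h>

/* Computes 2 * (a + b) with a multi-instruction template. */
static inline uint32_t double_sum(uint32_t a, uint32_t b)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    uint32_t out;
    __asm__("movl %1, %0\n\t"      /* out = a: output written early...  */
            "addl %2, %0\n\t"      /* ...but input b is only read here  */
            "addl %0, %0"          /* out = out + out                   */
            : "=&r"(out)           /* & = early clobber                 */
            : "r"(a), "r"(b)
            : "cc");
    return out;
#else
    return 2u * (a + b);           /* portable fallback */
#endif
}

/* Usage check for the sketch above. */
static void check_double_sum(void)
{
    assert(double_sum(3, 4) == 14);
}
```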
The template is a black box to the compiler; it's up to you to accurately describe it to the optimizing compiler. Any mistake is effectively undefined behaviour, and leaves room for the compiler to mess up other variables in the surrounding code, potentially even in functions that call this one if you modify a call-preserved register that the compiler wasn't otherwise using.
That makes it impossible to verify correctness just by testing. You can't distinguish "correct" from "happens to work with this surrounding code and set of compiler options". This is one reason why you should avoid inline asm unless the benefits outweigh the downsides and risk of bugs. https://gcc.gnu.org/wiki/DontUseInlineAsm
GCC just does a string substitution into the template string, very much like printf, and sends the whole result (including the compiler-generated instructions for the pure C code) to the assembler as a single file. Have a look on https://godbolt.org/ sometime; even if you have invalid instructions in the inline asm, the compiler itself doesn't notice. Only when you actually assemble will there be a problem. ("binary" mode on the compiler-explorer site.)
See also https://stackoverflow.com/tags/inline-assembly/info for more links to guides.

gcc - OS-independent function labels

void foo(){
...
}
Compiling this to assembly, it seems that gcc on linux will create label foo as an entry point but label _foo on OSX.
We can, of course, do an OS-specific selection whenever we need a label, but this is cumbersome.
Is there any way to suppress this so that the labels on both systems are the same (preferably one that is also Windows-compatible)?
No. It's part of the name mangling specifications of the platform.
You can't change that. You're still writing assembly. Don't expect it to be portable in any way, that's what C was invented for.
The early C compilers decorated the name of the functions with an _ to avoid name clashing when linking against the already developed and huge assembly libraries of the times.
Credits for this information go to this excellent old answer.
Today this is not needed anymore but the tradition is still sticking around, mostly for backward compatibility, even though some systems are getting rid of it.
This is not an OS issue, OSes are completely orthogonal to programming languages, name decoration is not something defined by the OS ABI, it is a matter of the compiler/linker designers; though standards have been created to reduce the incompatibilities and an ABI may suggest their use.
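One way to see which decoration a given toolchain applies is the predefined macro __USER_LABEL_PREFIX__, which GCC and Clang provide; whether other compilers define it is an assumption, so the sketch below falls back to an empty prefix. The helper names user_label_prefix and decorated_name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Fall back to "no prefix" if the compiler does not define the macro. */
#ifdef __USER_LABEL_PREFIX__
#define LABEL_PREFIX __USER_LABEL_PREFIX__
#else
#define LABEL_PREFIX
#endif

/* Two-level stringification so LABEL_PREFIX is expanded first. */
#define STRINGIFY_(x) #x
#define STRINGIFY(x) STRINGIFY_(x)

/* Returns the prefix the toolchain puts in front of C symbol names,
   typically "" on ELF targets and "_" on Mach-O targets. */
static const char *user_label_prefix(void)
{
    return STRINGIFY(LABEL_PREFIX);
}

/* Writes the assembler-level name of a C symbol into buf. */
static int decorated_name(char *buf, size_t size, const char *c_name)
{
    assert(buf != NULL && c_name != NULL);
    return snprintf(buf, size, "%s%s", user_label_prefix(), c_name);
}
```

A build script could print this value from a tiny test program and pass it on to the assembler as the prefix symbol.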
In order to fully understand how you can mitigate your problem, it is worth noting that while the OS APIs are language agnostic, a C program rarely invokes them directly; more likely it uses the C run-time.
The C run-time is usually statically linked and it expects names to be decorated according to the scheme of the compiler used to create it.
So if you need to use the C run-time you have to stick with the same name decoration as your system components are using.
This last point rules out the -fno-leading-underscore option as it will generate a linker error on the relevant platforms.
It is better to work on the assembly files, since you have the freedom to define and import names exactly as typed. Furthermore, the amount of assembly code is usually limited.
If you are using NASM (see footnote 1) there is a nice trick you can use: it's called macro indirection, and it allows you to prepend a symbol, defined on the command line, to a name.
Consider:
BITS 32
mov eax, %[p]data
_data db 0
data db 0
If you compile this file twice, the first time as nasm -Dp=_ ... and the second as nasm -Dp= ..., by inspecting the immediate value in the generated opcode for mov eax, %[p]data you can check that in the first case it has been translated as mov eax, _data and in the second as mov eax, data.
Assuming you access external symbols by declaring them as EXTERN symn (the precise syntax is irrelevant here), you can define a macro PEXTERN that works like the directive EXTERN but imports the symbol with or without a leading underscore, based on the value of the macro p (you can change this name), and defines an alias for it so that the name used in your code is the same either way.
BITS 32
%macro PEXTERN 1
EXTERN %[p]%1
%ifnidn %1, %[p]%1
%define %1 %[p]%1
%endif
%endmacro
PEXTERN foo
PEXTERN bar
mov eax, foo
call bar
Running nasm -Dp= -e ... and nasm -Dp=_ -e ... produces the following listings (left column: -Dp=, right column: -Dp=_):
extern foo          extern _foo
extern bar          extern _bar
mov eax, foo        mov eax, _foo
call bar            call _bar
You'll need to update the build scripts/Makefiles. Off the top of my head, you can use two methods:
1. Detect the OS type and define the symbol p accordingly. With Makefiles this may be the easier option.
2. Try compiling a test program. Write a minimal C program that imports/exports a function and a minimal assembly file that exports/imports that function. Define the symbol as _ and try to assemble + compile (redirecting everything into /dev/null). If that fails, redefine the symbol as empty.
Note that besides names, individual OSes may need specific assembly flags, so a universal build script may be more involved, but not necessarily unmanageable.
You'll end up needing something like Cygwin for Windows.
Footnote 1: If not, check if you can port the idea into your assembler.

Compiling PowerPC binary with gcc and restrict useable registers

I have a PowerPC device running some software, and I'd like to modify this software by inserting some code of my own.
I can easily write my own assembler code, put it somewhere in an unused region in RAM, replace any instruction in the "official" code by b 0x80001234 where 0x80001234 is the RAM address where my own code extension is loaded.
However, when I compile a C code with powerpc-eabi-gcc, gcc assumes it compiles a complete program and not only "code parts" to be inserted into a running program.
This leads to a problem: The main program uses some of the CPUs registers to store data, and when I just copy my extension into it, it will mess with the previous contents.
For example, if the main program I want to insert code into uses register 5 and register 8 in that code block, the program will crash if my own code writes to r5 or r8. Then I need to convert the compiled binary back to assembler code, edit the appropriate registers to use registers other than r5 and r8 and then compile that ASM source again.
What I'm now searching for is an option to the ppc-gcc which tells it "never ever use the PPC registers r5 and r8 while creating the bytecode".
Is this possible or do I need to continue crawling through the ASM code on my own replacing all the "used" registers with other registers?
You should think of another approach to solve this problem.
There is a gcc extension to reserve a register as a global variable:
register int *foo asm ("r12");
Please note that if you use this extension, your program no longer conforms to the ABI of the operating system you are working on. This means that you cannot call any library functions without risking crashes or overwritten variables.

Arbitrary code execution using existing code only

Let's say I want to execute an arbitrary mov instruction. I can write the following function (using GCC inline assembly):
void mov_value_to_eax()
{
asm volatile("movl %0, %%eax"::"m"(function_parameter):"%eax");
// will move the value of the variable function_parameter to register eax
}
And I can make functions like this one that will work on every possible register.
I mean -
void movl_value_to_ebx() { asm volatile("movl %0, %%ebx"::"m"(function_parameter):"%ebx"); }
void movl_value_to_ecx() { asm volatile("movl %0, %%ecx"::"m"(function_parameter):"%ecx"); }
...
In a similar way I can write functions that will move memory in arbitrary addresses into specific registers, and specific registers to arbitrary addresses in memory. (mov eax, [memory_address] and mov [memory_address],eax)
Now, I can perform these basic instructions whenever I want, so I can create other instructions. For example, to move a register to another register:
function_parameter = 0x028FC;
mov_eax_to_memory(); // parameter is a pointer to some temporary memory address
mov_memory_to_ebx(); // same parameter
So I can parse an assembly instruction and decide what functions to use based on it, like this:
if (sourceRegister == ECX) mov_ecx_to_memory();
if (sourceRegister == EAX) mov_eax_to_memory();
...
if (destRegister == EBX) mov_memory_to_ebx();
if (destRegister == EDX) mov_memory_to_edx();
...
If this works, it allows you to execute arbitrary mov instructions.
Another option is to make a list of functions to call and then loop through the list and call each function. Maybe it requires more tricks for making equivalent instructions like these.
So my question is this: Is it possible to do this for all (or some) of the possible opcodes? It probably requires writing a lot of functions, but is it possible to make a parser that builds code somehow based on given assembly instructions, and then executes it, or is that impossible?
EDIT: You cannot change memory protections or write to executable memory locations.
It is really unclear to me why you're asking this question. First of all, this function...
void mov_value_to_eax()
{
asm volatile("movl %0, %%eax"::"m"(function_parameter):"%eax");
// will move the value of the variable function_parameter to register eax
}
...uses GCC inline assembly, but the function itself is not inline, meaning that there will be prologue & epilogue code wrapping it, which will probably affect your intended result. You may instead want to use GCC inline assembly functions (as opposed to functions that contain GCC inline assembly), which may get you closer to what you want, but there are still problems with that.....
OK, so supposing you write a GCC inline assembly function for every possible x86 opcode (at least the ones that the GCC assembler knows about). Now supposing you want to invoke those functions in arbitrary order to accomplish whatever you might wish to accomplish (taking into account which opcodes are legal to execute at ring 3 (or in whatever ring you're coding for)). Your example shows you using C statements to encode logic for determining whether to call an inline assembly function or not. Guess what: Those C statements are using processor registers (perhaps even EAX!) to accomplish their tasks. Whatever you wanted to do by calling these arbitrary inline assembly functions is being stomped on by the compiler-emitted assembly code for the logic (if (...), etc). And vice-versa: Your inline assembly function arbitrary instructions are stomping on the registers that the compiler-emitted instructions expect to not be stomped-on. The result is not likely to run without crashing.
If you want to write code in assembly, I suggest you simply write it in assembly & use the GCC assembler to assemble it. Alternatively, you can write whole C-callable assembly functions within an asm() statement, and call them from your C code, if you like. But the C-callable assembly functions you write need to operate within the rules of the calling convention (ABI) you're using: If your assembly functions use a callee-saved register, your function will need to save the original value in that register (generally on the stack), and then restore it before returning to the caller.
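A sketch of the "whole C-callable assembly function within an asm() statement" approach, assuming GCC or Clang targeting x86-64 Linux (System V calling convention, ELF directives). The function name add_one_asm is hypothetical. It uses RBX only to show the save/restore duty for a callee-saved register; on other targets a plain C fallback is compiled instead.

```c
#include <assert.h>

/* Declaration visible to C; the asm label pins the symbol name. */
int add_one_asm(int x) __asm__("add_one_asm");

#if defined(__GNUC__) && defined(__x86_64__) && defined(__linux__)
/* Basic (top-level) asm: no operands, so registers use a single '%'. */
__asm__(
    ".pushsection .text\n"
    ".globl add_one_asm\n"
    ".type add_one_asm, @function\n"
    "add_one_asm:\n"
    "    pushq %rbx\n"             /* RBX is callee-saved: save it first */
    "    movl  %edi, %ebx\n"       /* first int argument arrives in EDI  */
    "    addl  $1, %ebx\n"         /* do the work in RBX                 */
    "    movl  %ebx, %eax\n"       /* int return value goes in EAX       */
    "    popq  %rbx\n"             /* restore RBX before returning       */
    "    ret\n"
    ".size add_one_asm, .-add_one_asm\n"
    ".popsection\n");
#else
/* Portable fallback so the example still builds elsewhere. */
int add_one_asm(int x) { return x + 1; }
#endif

/* Usage: called like any other C function. */
static void check_add_one(void)
{
    assert(add_one_asm(41) == 42);
}
```

Because the compiler sees only an ordinary external function here, none of the register-allocation conflicts described above arise, as long as the assembly follows the ABI.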
...OK, based on your comment "Because if it's working it can be a way to execute code if you can't write it to memory. (the OS may prevent it)"....
Of course you can execute arbitrary instructions (as long as they're legal for whatever ring you're running in). How else would JIT work? You just need to call the OS system call(s) for setting the permissions of the memory page(s) in which your instructions reside... change them to "executable" and then call 'em!
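The approach described in this answer could look roughly like the following on a POSIX system, assuming an x86 target and an OS policy that permits making a page executable (hardened systems may refuse, in which case the sketch returns -1). The function name run_generated_code is hypothetical; mmap, mprotect, munmap and sysconf are the standard POSIX calls.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE            /* exposes MAP_ANONYMOUS on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON
#endif

/* Copies a tiny machine-code routine into a fresh page, makes the page
   executable, calls it, and returns its result (or -1 on any failure). */
static int run_generated_code(void)
{
#if defined(__x86_64__) || defined(__i386__)
    static const unsigned char code[] = {
        0xB8, 0x2A, 0x00, 0x00, 0x00,  /* mov eax, 42 */
        0xC3                           /* ret         */
    };
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;
    size_t len = (size_t)page;
    assert(sizeof code <= len);

    /* Step 1: get a writable (not yet executable) page. */
    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;
    memcpy(mem, code, sizeof code);

    /* Step 2: switch the page from writable to executable. */
    if (mprotect(mem, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, len);
        return -1;
    }

    /* Step 3: call it. memcpy sidesteps the object-to-function
       pointer cast that ISO C does not define. */
    int (*fn)(void);
    assert(sizeof fn == sizeof mem);
    memcpy(&fn, &mem, sizeof fn);
    int result = fn();

    munmap(mem, len);
    return result;
#else
    return -1;                     /* byte sequence above is x86-only */
#endif
}
```

Mapping the page writable first and executable only afterwards avoids ever holding a page that is both writable and executable.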

Pushing a pointer into the eax and ebx registers in GCC

I need to push a pointer into the eax and another into the ebx register. I first solved this with:
register int eax asm("eax");
register int ebx asm("ebx");
int main()
{
eax = ptr1;
ebx = ptr2;
}
Which worked like a charm. However, when I added this into my other code, I got some strange errors about gcc being unable to find a register to spill in class AREG, in a totally unrelated part of the code. I googled, and it turns out to actually be a bug in gcc -.-. So, I need another way to push two pointers into the eax and ebx registers. Does anyone have any ideas?
Edit:
Since people have been asking what I am trying to accomplish here, I thought I'd explain a bit.
I need to change eax and ebx for some assembly code I'm trying to run in my program. I need to execute this assembly code and pass it pointers to its parameters via the eax and ebx registers. I execute the assembly code by putting a pointer to it in ebx and then calling ebx. When I declare the register variables locally rather than globally, the assembly code crashes. If I declare them globally, I get this weird error at the end of a random function. When I remove that function, it throws the same error at another random function. This continues until I run out of functions; then it works, but then I'm missing the rest of the code :P
If you have (inline) assembly code that requires specific parameters in EAX/EBX, the way to do this in gcc is to use the following:
__asm__("transmogrify %0, %1\n" : "+a"(val_for_eax), "+b"(val_for_ebx));
This uses what gcc calls inline assembly constraints which tell the compiler that the assembly code - whatever it is - expects val_for_eax/val_for_ebx in EAX/EBX (that's the a/b part) as well as that it will return potentially modified versions of these variables (that's the +) in these registers as well.
Beyond that, the actual code within the asm() statement doesn't matter to the compiler - it'll only need/want to know where the parameters %0 and %1 live. The above example will, due to a transmogrify instruction not existing in the current x86 instruction set, fail when the assembler runs; just substitute it with something valid.
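As a runnable stand-in for the placeholder transmogrify, the sketch below uses xchg, which simply swaps the two registers. The wrapper name swap_via_eax_ebx is hypothetical, and the asm path assumes GCC or Clang on x86. The "+a" and "+b" constraints make the compiler load the values into EAX and EBX before the template and read the results back from them afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Swaps *x and *y with a template that works on EAX and EBX. */
static inline void swap_via_eax_ebx(uint32_t *x, uint32_t *y)
{
    assert(x != NULL && y != NULL);
    uint32_t val_for_eax = *x;
    uint32_t val_for_ebx = *y;
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    /* %0 is forced into EAX by "a", %1 into EBX by "b";
       '+' marks both as read and written by the template. */
    __asm__("xchgl %0, %1"
            : "+a"(val_for_eax), "+b"(val_for_ebx));
#else
    uint32_t tmp = val_for_eax;    /* portable fallback */
    val_for_eax = val_for_ebx;
    val_for_ebx = tmp;
#endif
    *x = val_for_eax;
    *y = val_for_ebx;
}
```

No global register variables are involved, so the compiler remains free to use EAX and EBX everywhere else in the program.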
The explanation of why gcc behaves this way, and of exactly what you can tell it to do, is in the GCC manual, at:
Extended Assembly - Assembler Instructions with C operands
Constraints for asm operands, in particular the Intel/386 section of the Machine-specific Constraints list for what to say if you need to pass/retrieve a value in a specific register, and the Modifiers section about the meaning of things like the + (to both pass and return a value; there are other such "modifiers" to the constraints)
You can specify a specific register for a variable but due to the way gcc works / the way inline assembly is implemented in gcc, doing so does not mean (!) the register is from then on reserved (out of scope) for gcc to use for its own purposes. That can only be achieved through constraints, for a specific, single asm() block - the constraints tells gcc what to write into those registers before the placement of the actual assembly code, and what to read from them afterwards.
Since the eax register is needed all over the place in a valid program on your architecture, your strategy can't work with global variables that are bound to specific registers. Don't do that; reserving a register globally is not a good idea.
Place the variables that are bound to registers in the particular function, as close as possible to their use.
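A minimal sketch of that advice, assuming GCC or Clang on x86 (negate_in_eax is a hypothetical name): the register variable is local and declared right next to the asm statement that uses it. GCC documents that such a variable is only guaranteed to be in the named register when it is used as an asm operand, which is how it is used here.

```c
#include <assert.h>
#include <stdint.h>

/* Negates v using a template whose operand is pinned to EAX. */
static inline uint32_t negate_in_eax(uint32_t v)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    /* Local register variable: bound to EAX for use as an asm operand. */
    register uint32_t in_eax __asm__("eax") = v;
    __asm__("negl %0"              /* two's-complement negate in place */
            : "+r"(in_eax)         /* read and written by the template */
            :
            : "cc");               /* neg modifies the flags           */
    return in_eax;
#else
    return (uint32_t)(0u - v);     /* portable fallback */
#endif
}

/* Usage check for the sketch above. */
static void check_negate(void)
{
    assert(negate_in_eax(5) == 0xFFFFFFFBu);
}
```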

Resources