How can I run some machine code from within my program - c

How can I run some machine code from within my C program?
Let's say I have the instruction 'B2'; how can I execute it? (Note that the instructions will change at run time.)

Load code into memory.
Either create a function pointer to this memory and call it (void (*foo)(void) = mmap(...), foo();) , or use inline assembly to "jmp" to the code.
Note that on newer systems you will need to make sure you have requested memory which does NOT have the NX (no execute) bit set. If NX is set, jumping to your code will produce a processor exception and your process will be killed.
On Linux this is the PROT_EXEC protection passed to mmap (or mprotect); on Windows there are other means to request DEP-unprotected memory.
Your code should also not rely on fixed addresses; that is, it should be position independent. You cannot guarantee the same load address.
If your code needs to call into your program, it is best to provide it a table via the function call where it can resolve function addresses of your executable or C library, or attempt to use the system linker (you might have some luck using ld.so functionality on Linux, but this is of course non-portable).
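As a rough illustration of the steps above, here is a minimal sketch assuming Linux with GCC or Clang. The helper names run_machine_code and demo_return_42 are made up for this example, and the byte sequences are hand-assembled "return 42" routines for x86 and AArch64 only. The memory is mapped writable first and only then switched to executable with mprotect, since some hardened systems refuse mappings that are writable and executable at the same time.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* for MAP_ANONYMOUS under strict -std= modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: copy `len` bytes of machine code into a fresh mapping,
   make it executable, call it as int (*)(void) and return its result.
   Returns -1 if the mapping could not be set up. */
int run_machine_code(const unsigned char *code, size_t len)
{
    assert(code != NULL && len > 0);

    /* Request readable+writable (not yet executable) anonymous memory. */
    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;

    memcpy(mem, code, len);

    /* Drop write permission and add execute permission. */
    if (mprotect(mem, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, len);
        return -1;
    }

    /* GCC/Clang builtin; matters on CPUs with separate instruction caches. */
    __builtin___clear_cache((char *)mem, (char *)mem + len);

    /* Converting void* to a function pointer is a POSIX-style extension. */
    int (*fn)(void) = (int (*)(void))mem;
    int result = fn();

    munmap(mem, len);
    return result;
}

/* Hypothetical demo: position-independent code that just returns 42. */
int demo_return_42(void)
{
#if defined(__x86_64__) || defined(__i386__)
    /* mov eax, 42 ; ret */
    static const unsigned char code[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 };
#elif defined(__aarch64__)
    /* mov w0, #42 ; ret (little-endian encoding) */
    static const unsigned char code[] = { 0x40, 0x05, 0x80, 0x52,
                                          0xC0, 0x03, 0x5F, 0xD6 };
#else
#error "no sample machine code for this architecture"
#endif
    return run_machine_code(code, sizeof code);
}
```

Calling demo_return_42() from main should give 42 on the architectures covered. Since your instructions change at run time, you would fill the buffer passed to run_machine_code from wherever they come from instead of a static array.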

Related

C startup code is only written in assembly confusion

I understand that the C startup code is for initializing the C runtime environment: it initializes static variables, sets up the stack pointer, etc., and finally branches to main().
They say that this can only be written in assembly language as it's platform-specific. However, can't this still be written in C and compiled for the specific platform?
Function calls of course would not be possible because we "more than likely" don't have the stack pointer set up at that stage. I still can't see any other main reasons. Thanks in advance.
Startup code can be written in C language only if:
The implementation provides all the intrinsic functions needed to set hardware features that cannot be set using standard C.
The implementation provides a mechanism for placing fragments of code and data in specific places and in a specific order (gcc's support for ld linker scripts, for example).
If both conditions are met you can write the startup code in C language.
I use my own startup code written in C (instead of one provided by the chip vendors) for Cortex-M microcontrollers as ARM provides CMSIS header files with all needed inline assembly functions and gcc based toolchain gives me full memory layout control.
Most of the problem with writing early startup code in C is, in fact, the absence of a properly structured stack. It's worse than just not being able to make function calls. All of a C compiler's generated machine code assumes the existence of a stack, pointed to by the ABI-specified register, that can be used for scratch storage at any time. Changing this assumption would be so much work as to amount to a complete second "back end" for the compiler—way more work than continuing to write early startup code by hand in assembly.
Early bootstrap code, bringing up the machine from power-on, also has to do a bunch of special operations that can't usually be accessed from C, like configuring interrupts and virtual memory. And it may have to deal with the code not having been loaded at the address it was linked for, or the relocation table not having been processed, or other similar problems; these also break pervasive assumptions made by the C compiler (e.g. that it can inject a call to memcpy whenever it wants).
Despite all that, most of a user mode C library's startup code will, in fact, be written in C, for exactly the reason you are thinking. Nobody wants to write more code in assembly, over and over for each supported ISA, than absolutely necessary.
A minimal C runtime environment requires a stack, and a jump to a start address. Setting the stack pointer on most architectures requires assembly code. Once a stack is available it is possible to run code generated from C source.
ARM Cortex-M devices load the stack pointer and start address from the vector table on reset, so can in fact boot directly into code generated from C source.
On other architectures, the minimal assembly required is to set a stack pointer and jump to the start address. Thereafter it is possible to write other start-up tasks in C (or even C++). Such startup code is responsible for establishing the full C runtime, so it must not assume static initialisation or library initialisation (no heap or filesystem, for example); those are things the startup code itself must do.
In that sense you can run code generated from C source, but the environment is not strictly conforming until main() has been called, so there are some constraints.
Even where assembly code is used, it need not be the whole start-up code that is in assembly.
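To make the "start-up tasks in C" part concrete, here is a minimal sketch of the usual first two jobs: copying initialised data from its load address to RAM and zeroing .bss. The function names and the linker symbols mentioned in the comment are hypothetical, and the demo simulates flash and RAM with ordinary arrays so it can run on a host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy initialised data from its load address to its run address, then zero
   .bss. Plain loops are used so the source does not call the C library,
   although a compiler may still turn such loops into memcpy/memset calls,
   which real startup code has to account for. */
void init_ram(uint32_t *data_run, const uint32_t *data_load, size_t data_words,
              uint32_t *bss, size_t bss_words)
{
    for (size_t i = 0; i < data_words; i++)
        data_run[i] = data_load[i];
    for (size_t i = 0; i < bss_words; i++)
        bss[i] = 0;
}

/* On a real Cortex-M target the bounds would come from linker-script symbols
   (the names here are assumptions), roughly like this:

       extern uint32_t _sidata, _sdata, _edata, _sbss, _ebss;
       void Reset_Handler(void) {
           init_ram(&_sdata, &_sidata, (size_t)(&_edata - &_sdata),
                    &_sbss, (size_t)(&_ebss - &_sbss));
           main();
           for (;;) { }
       }
*/

/* Host-side demo with arrays standing in for flash and RAM. */
int demo_startup(void)
{
    static const uint32_t flash_image[3] = { 10, 20, 30 };
    uint32_t ram_data[3] = { 0xDEADBEEFu, 0xDEADBEEFu, 0xDEADBEEFu };
    uint32_t ram_bss[4]  = { 1, 2, 3, 4 };

    init_ram(ram_data, flash_image, 3, ram_bss, 4);

    assert(ram_data[0] == 10 && ram_data[1] == 20 && ram_data[2] == 30);
    assert(ram_bss[0] == 0 && ram_bss[1] == 0 && ram_bss[2] == 0 && ram_bss[3] == 0);

    /* Sum of everything: 10 + 20 + 30 + zeros. */
    return (int)(ram_data[0] + ram_data[1] + ram_data[2]
                 + ram_bss[0] + ram_bss[1] + ram_bss[2] + ram_bss[3]);
}
```

Note that init_ram itself only uses locals and its arguments, so it does not depend on .data or .bss already being set up, which is the constraint described above.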

ptrace - Retrieve the (symbol) name of the function called with the 'call' instruction

I am trying, as an exercise, to make some sort of custom profiler for binaries in C, using the ptrace api. I assume all binaries to be traced have been statically linked, I have access to tools such as nm(1), objdump and readelf and use a Linux, x86, 32 bit system.
In the current phase I am trying to create a dynamic call tree/graph (relative calls only) of the traced process and include the total number of instructions executed in each function call. In order to do that, I tried to:
Retrieve all user defined symbols in the ELF file using nm(1) as well as their addresses
Use ptrace to step through the code and identify call and ret instructions
After each call, use the eip register (rip on x86-64) to figure out the address of the current instruction and within which function this instruction is; thus deducing the corresponding symbol name.
My question is relative to this last point. I was wondering if there is a way, using the ptrace api, to identify the call instruction as well as the address of the function to which the execution will jump; or even better directly the symbol name which corresponds to this function.
I have tried reading the documentation for ptrace but it is, at least for me, far from clear. Is there a standard approach to what I am trying to do? Is my approach maybe completely wrong?
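For what it's worth, the address-to-symbol part of the approach described above can be sketched without ptrace at all. The following is only an illustration under some assumptions: the instruction bytes are already in a buffer (in a tracer they could be fetched with PTRACE_PEEKTEXT at the current instruction pointer, which returns one word per request, so a 5-byte instruction needs two reads on 32-bit), only the near relative form of call (opcode E8 followed by a 32-bit displacement) is recognised, and the symbol table is a made-up one sorted by address as nm -n would give. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct symbol { uint32_t addr; const char *name; };

/* If the 5 bytes at insn_addr encode "call rel32" (opcode E8), store the call
   target in *target and return 1; otherwise return 0. Only meaningful when
   insn_addr is a real instruction boundary, as it is when single-stepping. */
int decode_rel_call(uint32_t insn_addr, const unsigned char bytes[5],
                    uint32_t *target)
{
    assert(target != NULL);
    if (bytes[0] != 0xE8)
        return 0;
    uint32_t rel = (uint32_t)bytes[1]
                 | (uint32_t)bytes[2] << 8
                 | (uint32_t)bytes[3] << 16
                 | (uint32_t)bytes[4] << 24;
    /* Target is relative to the next instruction; unsigned wraparound
       takes care of negative displacements. */
    *target = insn_addr + 5 + rel;
    return 1;
}

/* Binary search for the last symbol whose address is <= addr.
   The table must be sorted by ascending address. */
const char *symbol_for(const struct symbol *tab, size_t n, uint32_t addr)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (tab[mid].addr <= addr)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo == 0 ? NULL : tab[lo - 1].name;
}

/* Demo with invented addresses: a forward call into "helper" and a
   backward call into "main". Returns 1 if both resolve as expected. */
int demo_call_lookup(void)
{
    static const struct symbol table[] = {
        { 0x08048000u, "_start" },
        { 0x08048100u, "main"   },
        { 0x08048200u, "helper" },
    };
    const size_t n = sizeof table / sizeof table[0];
    static const unsigned char fwd[5]  = { 0xE8, 0xEB, 0x00, 0x00, 0x00 };
    static const unsigned char back[5] = { 0xE8, 0xEB, 0xFE, 0xFF, 0xFF };
    uint32_t target = 0;

    if (!decode_rel_call(0x08048110u, fwd, &target)) return 0;
    if (target != 0x08048200u) return 0;
    if (strcmp(symbol_for(table, n, target), "helper") != 0) return 0;

    if (!decode_rel_call(0x08048210u, back, &target)) return 0;
    if (target != 0x08048100u) return 0;
    if (strcmp(symbol_for(table, n, target), "main") != 0) return 0;

    return 1;
}
```

Indirect calls and instruction prefixes are deliberately ignored here; handling them would need a proper instruction decoder.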

C dynamically include library, by copying raw function data to memory block

I thought that if I have the raw machine code output of an individual function, I could read it with fread into a block of memory and just create a function pointer to it, to dynamically include a function at runtime.
Would this work?
Yes it would work, but not in a cross-platform way. Memory returned by malloc is not executable, at least it is not guaranteed to be. On Windows you'll have to use VirtualAlloc, followed by VirtualProtect with the PAGE_EXECUTE_READ flag. On Linux, look into mprotect. There can also be differences based on compiler used and machine architecture, for a Linux+GCC example see this code golf.
Though why not just dynamically load the shared library with LoadLibrary+GetProcAddress (Win) or dlopen+dlsym (Linux)?

When is dynamic linking between a program and a shared library performed?

In C, when is dynamic linking between a program and a shared library performed:
On loading the program into memory, but before executing the main() of the program, or
After executing the main() of the program, when the first call to a routine from the library is executed? Will dynamic linking happen again when a second or third or... call to a routine from the library is executed?
I was thinking the first, until I read the following quote, and now I am not sure.
Not sure if OS matters, I am using Linux.
From Operating System Concepts:
With dynamic linking, a stub is included in the image for each
library-routine reference. The stub is a small piece of code that
indicates how to locate the appropriate memory-resident library
routine or how to load the library if the routine is not already
present.
When the stub is executed, it checks to see whether the needed routine is already in memory. If it is not, the program loads the
routine into memory. Either way, the stub replaces itself with the
address of the routine and executes the routine. Thus, the next time
that particular code segment is reached, the library routine is
executed directly, incurring no cost for dynamic linking. Under this
scheme, all processes that use a language library execute only one
copy of the library code.
I was thinking the first, until I read the following quote, and now I am not sure.
It's complicated (and depends on exactly what you call "dynamic linking").
The Linux kernel loads a.out into memory. It then examines the PT_INTERP segment (if any).
If that segment is not present, the binary is statically linked and the kernel transfers control to the Elf{32,64}_Ehdr.e_entry (usually the _start routine).
If the PT_INTERP segment is present, the kernel loads the interpreter it names (the dynamic loader) into memory, and transfers control to its e_entry. It is here that the dynamic linking begins.
The dynamic loader relocates itself, then looks in a.out's PT_DYNAMIC segment for instructions on what else is necessary.
For example, it will usually find one or more DT_NEEDED entries -- shared libraries that a.out was directly linked against. The loader loads any such libraries, initializes them, and resolves any data references between them.
If a.out's PT_DYNAMIC has a DT_FLAGS entry, and if that entry contains the DF_BIND_NOW flag, then function references from a.out will also be resolved. Otherwise (and assuming that LD_BIND_NOW is not set in the environment), lazy PLT resolution will be performed (resolving functions as part of the first call to any given function). Details here.
When the stub is executed, it checks to see whether the needed routine is already in memory. If it is not, the program loads the routine into memory.
I don't know which book you are quoting from, but no current UNIX OS works that way.
The OS (and compiler, etc.) certainly matters: the language itself has nothing to say about dynamic libraries (and very little about linking in general). Even if we know that dynamic linking is occurring, a strictly-conforming program cannot observe any effect from timing among its translation units (since non-local initialization cannot have side effects).
That said, the common toolchains on Linux do support automatic initialization upon loading a dynamic library (for implementing C++, among other things). Executables and the dynamic libraries on which they depend (usually specified with -l) are loaded and initialized recursively to allow initialization in each module to (successfully) use functions from its dependencies. (There is an unfortunate choice of order in some cases.) Of course, dlopen(3) can be used to load and initialize more libraries later.
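One way to see this kind of automatic initialization from C is the constructor attribute, a GCC/Clang extension. The sketch below is only an illustration: the function names are made up, and the constructor is placed in the main program to keep the example self-contained. In a shared library the same attribute would make the function run when the loader (or dlopen) initializes that library.

```c
#include <assert.h>

/* Zero-initialized when the image is loaded, before any constructor runs. */
static int init_count = 0;

/* GCC/Clang extension: run this function during initialization of the module
   that contains it, i.e. before main() for the executable itself, or at load
   time for a shared library. */
__attribute__((constructor))
static void module_init(void)
{
    init_count++;
}

/* Hypothetical accessor: how many times has the initializer run so far? */
int initialised_before_use(void)
{
    return init_count;
}
```

Called from main, initialised_before_use() is expected to report that the initializer has already run exactly once.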

Programmatically calling debugger in GCC

Is it possible to programmatically break into debugger from GCC?
For example I want something like:
#define STOP_EXECUTION_HERE ???
which when put on some code line will force debugger stop there.
Is it possible at all ?
I found some solution, but I can't use it because on my embedded ARM system I don't have signal.h.
(However I can use inline assembly).
What you are trying to do is called a software breakpoint.
It is very hard to say precisely without knowing how you actually debug. I assume your embedded system runs gdbstub. There are various possibilities how this can be supported:
Use the dedicated BKPT instruction
This could be the standard way for your system and debugger to support software breakpoints.
Feed an invalid instruction to the CPU
gdbstub could have installed its own handler for the ARM undefined-instruction (UNDEF) exception. If you go this route you must be aware of the current CPU mode (ARM or THUMB), because the instruction size will be different. Examples of undefined instructions:
ARM: 0xE7F123F4
THUMB: 0xDE56
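At run time the CPU mode can be determined from the T (Thumb) bit of the program status register; it is also reflected in the lowest bit of branch target and return addresses. But the easier way is to know how you compiled the object file in which you placed the software breakpoint.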
Use SWI instruction
We did so when we used RealView ICE. Most likely this does not apply to you if you run some OS on your embedded system, since SWI is usually used by the OS to implement system calls.
Normally, the best way to do this is via a library function supplied with your device's toolchain.
I can't test it out, but a general solution might be to insert an ARM BKPT instruction. It takes an immediate argument, which your debugger might interpret, with potentially odd results.
You could run your application from GDB, and in the code call e.g. abort. This will stop your application at that point. I'm not sure if it's possible to continue after that though.
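Putting the BKPT suggestion into the macro form asked for in the question, a minimal sketch with GCC-style inline assembly could look like the following. It is untested on real hardware, the macro is written with parentheses as STOP_EXECUTION_HERE(), the stop_if helper is made up for illustration, and whether the debugger actually stops on the instruction depends on your debug setup. Without a debugger attached, the breakpoint instruction may simply fault.

```c
#include <assert.h>

/* Pick a breakpoint instruction for the target the code is compiled for.
   GCC/Clang inline assembly syntax is assumed. */
#if defined(__arm__) || defined(__thumb__)
#  define STOP_EXECUTION_HERE() __asm__ volatile ("bkpt #0")  /* ARM and Thumb */
#elif defined(__aarch64__)
#  define STOP_EXECUTION_HERE() __asm__ volatile ("brk #0")
#elif defined(__i386__) || defined(__x86_64__)
#  define STOP_EXECUTION_HERE() __asm__ volatile ("int3")
#else
#  define STOP_EXECUTION_HERE() ((void)0)  /* unknown target: do nothing */
#endif

/* Hypothetical helper: break into the debugger only when `condition` holds.
   Returns 1 if the breakpoint was hit and execution was resumed, else 0. */
int stop_if(int condition)
{
    if (condition) {
        STOP_EXECUTION_HERE();
        return 1;
    }
    return 0;
}
```

Usage would be something like stop_if(value < 0); at the place where you want the debugger to stop, or STOP_EXECUTION_HERE(); for an unconditional stop.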
