Save CPU registers to variables in GCC - c

I want to get the values in EAX/EBX/ESP/EIP etc. and save them in C variables. For example:
int cEax;
asm("mov cEax,%eax"); ...

You can use this
register int eax asm("eax");
register int ebx asm("ebx");
register int esp asm("esp");
//...
int cEax = eax;
int cEbx = ebx;
int cEsp = esp;
//...
You can also use those registers in expressions like any other variable, or use a register's value directly without assigning it to another variable.
Getting eip without inline assembly is trickier, but in GCC you can get it with __builtin_return_address or the labels-as-values extension.
void* getEIP()
{
return __builtin_return_address(0);
}
void *currentInstruction = getEIP();
currentAddr: void *nextInstruction = &&currentAddr;
If you prefer inline assembly, you can use the approach described on this page
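As a minimal sketch of the inline-assembly route, the function below copies the stack pointer into a C variable with GCC extended asm. The name read_sp is hypothetical, and the non-x86 branch is only a stand-in so the file still compiles elsewhere. One reason to prefer this form: as far as I know, the GCC manual describes local register variables as supported only for pinning asm operands, so reading them from plain C expressions is not guaranteed to return the live register contents.

```c
#include <stdint.h>

/* Hypothetical helper: return the current stack pointer.
 * Uses GCC/Clang extended asm; AT&T operand order (source, destination). */
static inline uintptr_t read_sp(void)
{
    uintptr_t sp;
#if defined(__x86_64__)
    __asm__ volatile("mov %%rsp, %0" : "=r"(sp));
#elif defined(__i386__)
    __asm__ volatile("mov %%esp, %0" : "=r"(sp));
#else
    /* Assumption: on other targets, fall back to the frame address,
     * which is near the stack pointer but not identical to it. */
    sp = (uintptr_t)__builtin_frame_address(0);
#endif
    return sp;
}
```

The output constraint "=r" lets the compiler choose which register receives the value, so the statement does not fight the register allocator. The same pattern works for other registers, but a general-purpose register such as eax holds whatever the compiler last put there, so its value is rarely meaningful at the C level.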

Related

C99: compiler optimizations when accessing global variables and aliased memory pointers

I'm writing C code for an embedded system. In this system, there are memory mapped registers at some fixed address in the memory map, and of course some RAM where my data segment / heap is.
I'm finding problems generating optimal code when my code is intermixing accesses to global variables in the data segment and accesses to hardware registers. This is a simplified snippet:
#include <stdint.h>
uint32_t * const restrict HWREGS = 0x20000;
struct {
uint32_t a, b;
} Context;
void example(void) {
Context.a = 123;
HWREGS[0x1234] = 5;
Context.b = Context.a;
}
This is the code generated on x86 (see also on godbolt):
example:
mov DWORD PTR Context[rip], 123
mov DWORD PTR ds:149712, 5
mov eax, DWORD PTR Context[rip]
mov DWORD PTR Context[rip+4], eax
ret
As you can see, after the write to the hardware register, Context.a is reloaded from RAM before being stored into Context.b. This doesn't make sense, because Context is at a different memory address than HWREGS. In other words, the memory pointed to by HWREGS and the memory pointed to by &Context do not alias, but it looks like there is no way to tell the compiler that.
If I change HWREGS definition as this:
extern uint32_t * const restrict HWREGS;
that is, I hide the fixed memory address from the compiler, I get this:
example:
mov rax, QWORD PTR HWREGS[rip]
mov DWORD PTR [rax+18640], 5
movabs rax, 528280977531
mov QWORD PTR Context[rip], rax
ret
Context:
.zero 8
Now the two writes to Context are optimized (even coalesced to a single write), but on the other hand the access to the hardware register does not happen anymore with a direct memory access but it goes through a pointer indirection.
Is there a way to obtain optimal code here? I would like GCC to know that HWREGS is at a fixed memory address and at the same time to tell it that it does not alias Context.
If you want to stop the compiler from repeatedly reloading values from a memory region (possibly because of aliasing), the best option is not to use global variables, or at least not to access them directly. The restrict keyword seems to be ignored for global variables (especially here on HWREGS) by both GCC and Clang. Using the restrict keyword on function parameters solves this problem:
#include <stdint.h>
uint32_t * const HWREGS = (uint32_t *)0x20000;
struct Context {
uint32_t a, b;
} context;
static inline void exampleWithLocals(uint32_t* restrict localRegs, struct Context* restrict localContext) {
localContext->a = 123;
localRegs[0x1234] = 5;
localContext->b = localContext->a;
}
void example() {
exampleWithLocals(HWREGS, &context);
}
Here is the result (see also on godbolt):
example:
movabs rax, 528280977531
mov DWORD PTR ds:149712, 5
mov QWORD PTR context[rip], rax
ret
context:
.zero 8
Please note that the strict aliasing rule does not help in this case, since the type of the read/written variables and fields is always uint32_t.
Besides this, judging by its name, HWREGS points to hardware registers. The pointed-to data should therefore be declared volatile so that the compiler does not cache it in CPU registers or perform similar optimizations (such as assuming the pointed-to value is unchanged if the code does not change it).
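The two suggestions can be combined. The sketch below is one possible shape, not a verified fix: the register block is passed as a pointer to volatile, and the context pointer is restrict-qualified so the compiler may assume the register store does not touch it. REG_INDEX and example_with_locals are names chosen for illustration, and the generated code should still be checked on the real target.

```c
#include <stdint.h>

struct Context {
    uint32_t a, b;
};

/* Register index taken from the snippet above; the name is hypothetical. */
#define REG_INDEX 0x1234

static inline void example_with_locals(volatile uint32_t *regs,
                                       struct Context *restrict ctx)
{
    ctx->a = 123;
    regs[REG_INDEX] = 5; /* volatile store: must be emitted as written */
    ctx->b = ctx->a;     /* restrict lets the compiler reuse the value 123 */
}
```

On the target, the call would look like example_with_locals((volatile uint32_t *)0x20000, &context); so the fixed address stays visible to the compiler as a constant.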

Trying to read register values of a process from task_struct

Currently I'm able to find the register values for the program which was written, but not for other processes.
What I have written so far is:
#include <linux/sched.h>
struct task_struct *task_list;
for_each_process(task_list){
register int* pc asm("%pc");
register int mar asm("%mar");
register int mdr asm("%mdr");
register int cir asm("%cir");
register int acc asm("%acc");
register int ir asm("%ir");
register int eax asm("%eax");
register int ebx asm("%ebx");
register int ecx asm("%ecx");
register int edx asm("%edx");
register int ip asm("%ip");
register int esp asm("%esp");
register int ebp asm("%ebp");
register int esi asm("%esi");
register int edi asm("%edi");
register int of asm("%of");
register int df asm("%df");
register int _if asm("%if");
register int tf asm("%tf");
register int sf asm("%sf");
register int zf asm("%zf");
register int af asm("%af");
register int pf asm("%pf");
register int cf asm("%cf");
}
I realize I need to use task_list and point to an element within the struct here, but I cannot seem to locate which element contains the registers.
You can access the registers from a task_struct using the macro task_pt_regs(). It yields a pointer to a struct pt_regs (definition) which is the saved copy of all the thread's registers from when it entered the kernel.
For example:
struct task_struct *t = /* find the one you want */ ;
unsigned long tasks_eax = task_pt_regs(t)->ax;
Note that, despite the name, the ax member holds the full 32-bit eax register (on x86-32) or the 64-bit rax register (on x86-64).
See also:
Get userspace RBP register from kernel syscall
Where is eax in the pt_regs struct? Only ax is present

How to save the context execution of a newly created user thread, Linux 64 to structure in C?

I am trying to implement a new user thread management library similar to the original pthread but only in C. Only the context switch should be assembler.
Looks like I am missing something fundamental.
I have the following structure for the context execution:
struct exec_ctx {
uint64_t rbp;
uint64_t r15;
uint64_t r14;
uint64_t r13;
uint64_t r12;
uint64_t r11;
uint64_t r10;
uint64_t r9;
uint64_t r8;
uint64_t rsi;
uint64_t rdi;
uint64_t rdx;
uint64_t rcx;
uint64_t rbx;
uint64_t rip;
}__attribute__((packed));
I create a new thread structure and need to store the registers in the fields of this execution-context structure. How can I do that in C? Everything I find talks only about setcontext and getcontext, which is not what I need here.
Also, the only hint I received is that I need some kind of stack-dump function inside the create function, but I am not sure how to do that. Please advise where I can read further or how to do it.
Thanks in advance!
I started with:
char *stack;
stack = malloc(StackSize);
if (!stack)
return -1;
*(uint64_t *)&stack[StackSize - 8] = (uint64_t)stop;
*(uint64_t *)&stack[StackSize - 16] = (uint64_t)f;
pet_thread->ctx.rip = (uint64_t)&stack[StackSize - 16];
pet_thread->thread_state = Ready;
This is how I put a pointer to the thread function on the top of the stack in order to call the thread more easily.
First of all, you do not need to save all the registers. Since your context switch is implemented as a function, any register that the ABI defines as "caller saved" or "clobbered" you can safely leave out. The code generated by the C compiler will assume it might change.
Since this is a school assignment I will not give you the code to do this. I will give you the outline.
Your function needs to both save the registers to the struct for the outgoing micro-thread and load the registers for the incoming micro-thread. The reason is that, logically, there is always exactly one register set "in effect". So your function needs two arguments: the struct for the outgoing micro-thread and the one for the incoming micro-thread.
Those two arguments are passed in two registers, and those two you do not need to save. So your code should have the following structure (assuming your structure, which, as I said, holds more than it needs to):
# save context (Intel syntax: destination first)
mov [rdi], rbp
add rdi, 8
...
# load context
mov rbp, [rsi]
add rsi, 8
...
If you place that in a separate .S file, you'll make sure that the C compiler will not add anything or optimize anything.
This is not the cleanest or most efficient solution, but it is the simplest.
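To illustrate the point about leaving out caller-saved registers, here is a sketch of a reduced context and of the stack setup for a new thread. It assumes the x86-64 System V ABI, where rbx, rbp, r12 to r15 and rsp are preserved across calls, and it assumes the assembly switch routine ends with ret after loading rsp. The names min_ctx, min_ctx_init and on_return are hypothetical; the stack layout mirrors the one in the question (entry function at the top, an exit function just above it).

```c
#include <stddef.h>
#include <stdint.h>

/* Only the registers a callee must preserve under the System V x86-64 ABI. */
struct min_ctx {
    uint64_t rsp;
    uint64_t rbp;
    uint64_t rbx;
    uint64_t r12;
    uint64_t r13;
    uint64_t r14;
    uint64_t r15;
};

/* Prepare a fresh context so that a switch routine which loads rsp and then
 * executes `ret` starts running `entry`. When `entry` itself returns, it
 * returns into `on_return`. */
static void min_ctx_init(struct min_ctx *ctx, void *stack, size_t size,
                         void (*entry)(void), void (*on_return)(void))
{
    /* Round the top of the stack down to a 16-byte boundary. */
    uintptr_t top = ((uintptr_t)stack + size) & ~(uintptr_t)15;
    uint64_t *slot = (uint64_t *)top;

    slot[-1] = (uint64_t)(uintptr_t)on_return; /* return address seen by entry */
    slot[-2] = (uint64_t)(uintptr_t)entry;     /* popped by the switch's ret */

    *ctx = (struct min_ctx){0};                /* other registers start at zero */
    ctx->rsp = (uint64_t)(uintptr_t)&slot[-2];
}
```

With this layout the saved rsp is 16-byte aligned, so after the ret pops the entry address, the entry function sees the same stack alignment it would see after an ordinary call. No rip field is needed, because the resume address always lives on the thread's own stack.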

getvect function is undefined

I am trying to run this program. It hooks an interrupt so that when we press w, it is replaced by s in the keyboard buffer (and vice versa)
#include "stdafx.h"
#include "stdio.h"
#include "bios.h"
#include <dos.h>
void interrupt_oldint15(*oldint15);
void interrupt_newint15(unsigned int BP, unsigned int DI, unsigned int SI, unsigned int DS,
unsigned int ES, unsigned int DX, unsigned int CX, unsigned int BX,
unsigned int AX, unsigned int IP, unsigned int CS, unsigned int flags);
void main ( )
{
oldint15 = getvect (0x15);
setvect (0x15, newint15);
keep (0, 1000);
}
void interrupt_newint15 (unsigned int BP, unsigned int DI, unsigned int SI, unsigned int DS, unsigned int ES, unsigned int DX, unsigned int CX, unsigned int BX, unsigned int AX, unsigned int IP, unsigned int CS, unsigned int flags )
{
if(*((char*)&AX)==0x11)
*((char*)&AX)=0x1F;
else if(*((char*)&AX)==0x1F)
*((char*)&AX)=0x11;
}
but it gives the error in getvect and setvect functions.
For one thing, interrupt functions in C take no parameters and return no value.
Listing all the registers is a waste of space (besides which, it should not compile), because entry into an interrupt already saves all the key registers (typically on the stack of the currently running process).
All the key registers (such as the PC and status registers) are restored on exit from the interrupt.
The compiler arranges for any general-purpose registers changed in the interrupt function to be saved and restored. If you are working at such a low level, you should know exactly where the interrupt vectors are located; you should have one code segment that overlays the interrupt vectors and another that mirrors them.
You then copy the current set of interrupt vectors to the mirror and replace the desired vector with a pointer to the interrupt function you wrote. At the end of your code, you need to copy the saved vector back into the original vector area.
It could be that the functions you are having trouble with do those operations for you.
This, from a very old post on a c.comp newsgroup, may be helpful:
You are mixing languages. In Borland C an "interrupt" has type
void (interrupt*)()
while in Borland C++ it has type
void (interrupt*)(...)
This affects the parameter type of setvect and the return type of
getvect which change in the same way according to the language used.
You obviously compiled your program as a C++ program and not as a C
program because according to the compiler's message 'oldvec' is
declared as a "C interrupt" and getvect returns a "C++ interrupt".
The format for using getvect() and setvect() in C, which looks nothing like your C++ example, is:
tick_isr_old = getvect(0x08);
setvect(0x08, (void interrupt (*)(void)) tick_isr);
As mentioned already, the functions getvect() and setvect() are only available with Borland/Turbo C++. The functions _dos_getvect() and _dos_setvect() are almost identical and offer better portability across compilers (Borland/Turbo C++, MS Visual C++ 1.x, Open Watcom C++). They should be defined in <dos.h>.
Here is an example of their use (prints a '#' roughly every second):
/*** Includes ***/
#include <stdint.h> // int*_t, uint*_t
#include <stdbool.h> // bool, true, false
#include <dos.h> // _chain_intr(), _dos_getvect() , _dos_setvect()
#include <stdio.h> // putchar()
/*** Definitions ***/
#define TICKS_PER_SEC 18ul
#define VECT_TIMER 0x1C
/*** Global Variables ***/
bool timer_hooked = false;
volatile bool timer_1sec_elapsed = false; // written by the ISR, polled in main()
volatile uint32_t ticks = 0; // updated inside the ISR
void (interrupt far * OrigTimerH)( ); // vector to original 0x1C handler
/*** Functions ***/
static void interrupt far TimerH( void ) {
ticks++;
if ( ticks % TICKS_PER_SEC == 0 ) {
timer_1sec_elapsed = true;
}
_chain_intr( OrigTimerH ); // handler callback
}
void TimerStart( void ) {
__asm { cli } // critical section; halt interrupts
OrigTimerH = _dos_getvect( VECT_TIMER ); // save original vector
_dos_setvect( VECT_TIMER, TimerH ); // put our handler in the vector
timer_hooked = true; // remember that we're hooked if we wanted to unhook
__asm { sti } // resume interrupts
}
int main( void ) {
TimerStart();
while ( true ) {
if ( timer_1sec_elapsed ) {
timer_1sec_elapsed = false;
putchar('#');
}
}
}

Compiler flags change code behavior (O2, Ox)

The following code works as expected with flags Od, O1 but fails with O2, Ox. Any ideas why?
edit: by "fails" I mean that the function does nothing, and seems to just return.
void thread_sleep()
{
listIterator nextThread = getNextThread();
void * pStack = 0;
struct ProcessControlBlock * currPcb = pPCBs->getData(currentThread);
struct ProcessControlBlock * nextPcb = pPCBs->getData(nextThread);
if(currentThread == nextThread)
{
return;
}
else
{
currentThread = nextThread;
__asm pushad // push general purpose registers
__asm pushfd // push control registers
__asm mov pStack, esp // store stack pointer in temporary
currPcb->pStack = pStack; // store current stack pointer in pcb
pStack = nextPcb->pStack; // grab new stack pointer from pcb
if(nextPcb->state == RUNNING_STATE)// only pop if function was running before
{
__asm mov esp, pStack // restore new stack pointer
__asm popfd
__asm popad;
}
else
{
__asm mov esp, pStack // restore new stack pointer
startThread(currentThread);
}
}
}
// After implementing suggestions: (still does not work)
listIterator nextThread = getNextThread();
struct ProcessControlBlock * currPcb = pPCBs->getData(currentThread);
struct ProcessControlBlock * nextPcb = pPCBs->getData(nextThread);
void * pStack = 0;
void * pNewStack = nextPcb->pStack; // grab new stack pointer from pcb
pgVoid2 = nextPcb->pStack;
if(currentThread == nextThread)
{
return;
}
else
{
lastThread = currentThread; // global var
currentThread = nextThread;
if(nextPcb->state == RUNNING_STATE)// only pop if function was running before
{
__asm pushad // push general purpose registers
__asm pushfd // push control registers
__asm mov pgVoid1, esp // store stack pointer in temporary
__asm mov esp, pgVoid2 // restore new stack pointer
__asm popfd
__asm popad;
{
struct ProcessControlBlock * pcb = pPCBs->getData(lastThread);
pcb->pStack = pgVoid1; // store old stack pointer in pcb
}
}
else
{
__asm pushad // push general purpose registers
__asm pushfd // push control registers
__asm mov pgVoid1, esp // store stack pointer in temporary
__asm mov esp, pgVoid2 // restore new stack pointer
{
struct ProcessControlBlock * pcb = pPCBs->getData(lastThread);
pcb->pStack = pgVoid1; // store old stack pointer in pcb
}
startThread(currentThread);
}
}
It is likely because your compiler is not using a specific frame pointer register on the higher optimisation levels, which frees up an additional general-purpose register.
This means that the compiler accesses the local variable pStack using an offset from the stack pointer. It cannot do this correctly after the stack pointer has been adjusted by the pushad and pushfd - it is not expecting the stack pointer to change.
To get around this, you shouldn't put any C code after those asm statements, until the stack pointer has been correctly restored: everything from the first pushad to the popad or startThread() should be in assembler. This way, you can load the address of the local variables and ensure that the accesses are done correctly.
Since you are using inline assembler, you probably want to see how (or whether) the code really changes when it is compiled at different optimization levels. Try this on your binary:
objdump -d your_program
It prints a lot of disassembly, but finding the relevant section shouldn't be hard (search for your assembly or for the function names).
By the way, I was taught that heavy optimization doesn't do very well with inline assembly, so I tend to separate assembler routines to .S files because of this.
