Compiler flags change code behavior (O2, Ox) - c

The following code works as expected with flags Od, O1 but fails with O2, Ox. Any ideas why?
edit: by "fails" I mean that the function does nothing, and seems to just return.
void thread_sleep()
{
listIterator nextThread = getNextThread();
void * pStack = 0;
struct ProcessControlBlock * currPcb = pPCBs->getData(currentThread);
struct ProcessControlBlock * nextPcb = pPCBs->getData(nextThread);
if(currentThread == nextThread)
{
return;
}
else
{
currentThread = nextThread;
__asm pushad // push general purpose registers
__asm pushfd // push control registers
__asm mov pStack, esp // store stack pointer in temporary
currPcb->pStack = pStack; // store current stack pointer in pcb
pStack = nextPcb->pStack; // grab new stack pointer from pcb
if(nextPcb->state == RUNNING_STATE)// only pop if function was running before
{
__asm mov esp, pStack // restore new stack pointer
__asm popfd
__asm popad;
}
else
{
__asm mov esp, pStack // restore new stack pointer
startThread(currentThread);
}
}
}
// After implementing suggestions: (still does not work)
listIterator nextThread = getNextThread();
struct ProcessControlBlock * currPcb = pPCBs->getData(currentThread);
struct ProcessControlBlock * nextPcb = pPCBs->getData(nextThread);
void * pStack = 0;
void * pNewStack = nextPcb->pStack; // grab new stack pointer from pcb
pgVoid2 = nextPcb->pStack;
if(currentThread == nextThread)
{
return;
}
else
{
lastThread = currentThread; // global var
currentThread = nextThread;
if(nextPcb->state == RUNNING_STATE)// only pop if function was running before
{
__asm pushad // push general purpose registers
__asm pushfd // push control registers
__asm mov pgVoid1, esp // store stack pointer in temporary
__asm mov esp, pgVoid2 // restore new stack pointer
__asm popfd
__asm popad;
{
struct ProcessControlBlock * pcb = pPCBs->getData(lastThread);
pcb->pStack = pgVoid1; // store old stack pointer in pcb
}
}
else
{
__asm pushad // push general purpose registers
__asm pushfd // push control registers
__asm mov pgVoid1, esp // store stack pointer in temporary
__asm mov esp, pgVoid2 // restore new stack pointer
{
struct ProcessControlBlock * pcb = pPCBs->getData(lastThread);
pcb->pStack = pgVoid1; // store old stack pointer in pcb
}
startThread(currentThread);
}
}

It is likely because your compiler is not using a specific frame pointer register on the higher optimisation levels, which frees up an additional general-purpose register.
This means that the compiler accesses the local variable pStack using an offset from the stack pointer. It cannot do this correctly after the stack pointer has been adjusted by the pushad and pushfd - it is not expecting the stack pointer to change.
To get around this, you shouldn't put any C code after those asm statements, until the stack pointer has been correctly restored: everything from the first pushad to the popad or startThread() should be in assembler. This way, you can load the address of the local variables and ensure that the accesses are done correctly.

As you use inline assembler, you'd probably want to see how's (or whether is) the code really modified when it's compiled with various -Ox options. Try this on your binary:
objdump -s your_program
It gives a heap of code, but finding the corresponding code section shouldn't be that hard (search for your assembly or for function names).
By the way, I was taught that heavy optimization doesn't do very well with inline assembly, so I tend to separate assembler routines to .S files because of this.

Related

Segmentation fault when calling printf in C after setting the stack

I'am doing an exercice for an Operational Systems class and getting an SegFault error when calling printf with arguments.
The objective of the exercice is to simulate the initialization of a thread and print a counter, not very difficult. I have a table of 4 entries each with size 4096 bytes, each entry must represent the thread's stack represented as
#define STACK_SIZE 4096
char table[4][STACK_SIZE];
I defined a type called coroutine that will get only a stack address
typedef void* coroutine_t;
The i have a initialization code. This code must take the end of the routine stack, append the address of the coroutine and the initialization of the registers and return the pointer that will be the stack pointer for the coroutine.
coroutine_t init_coroutine(void *stack_begin, unsigned int stack_size,
void (*initial_pc)(void)) {
char *stack_end = ((char *)stack_begin) + stack_size;
void **ptr = (void**) stack_end;
ptr--;
*ptr = initial_pc;
ptr--;
*ptr = stack_end; /* Frame pointer */
ptr--;
*ptr = 0; /* RBX*/
ptr--;
*ptr = 0; /* R12 */
ptr--;
*ptr = 0; /* R13 */
ptr--;
*ptr = 0; /* R14 */
ptr--;
*ptr = 0; /* R15 */
return ptr;
}
Then i have this code in x86 assembly to enter the coroutine that just pop the register previously pushed
.global enter_coroutine /* Makes enter_coroutine visible to the linker*/
enter_coroutine:
mov %rdi,%rsp /* RDI contains the argument to enter_coroutine. */
/* And is copied to RSP. */
pop %r15
pop %r14
pop %r13
pop %r12
pop %rbx
pop %rbp
ret /* Pop the program counter */
The rest of my code is this
coroutine_t cr;
void test_function() {
int counter = 0;
while(1) {
printf("counter1: %d\n", counter);
counter++;
}
}
int main() {
cr = init_coroutine(table[0], STACK_SIZE, &test_function);
enter_coroutine(cr);
return 0;
}
So for the error
If i run as it is i will get a segfault when the program call printf the output from gdb is
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7dfcfdd in __vfprintf_internal (s=0x7ffff7f9d760 <_IO_2_1_stdout_>, format=0x555555556004 "counter1: %d\n", ap=ap#entry=0x555555558f48 <table+3848>, mode_flags=mode_flags#entry=0) at vfprintf-internal.c:1385
I assume it has some thing happening with the stack for two causes:
If i just print a string without parameters i get no error
If i remove the first ptr-- statement from the init_coroutine function it will also work, but will alocate things in the end of the stack and hence in the other thread's stack
I'am running this in a Intel(R) Core(TM) i5-5200U CPU with ubuntu 21.10 and ggc version 11.2.0
Could you give me some light here ?
I wasn't able to reproduce the problem on my x86_64 Linux box, but I was on compiler explorer, and the problem seems to be simple stack overflow (i.e., 4096 is too small a stack for printf).
Increasing the stack size (or choosing table[1], table[2], or table[3] instead table[0], which is effectively the same as increasing stack size) appears to make it work: https://gcc.godbolt.org/z/rnfMThbjo

How do I get info from the stack, using inline assembly, to program in c?

I have a task to do and I'm asking for some help. (on simple c lang')
What I need to do?
I need to check every command on the main c program (using interrupt num 1) and printing a message only if the next command is the same procedure that was sent earlier to the stack, by some other procedure.
What I want to do?
I want to take info from the stack, using inline assembley, and put it on a variable that can be compare on c program itself after returnning to c. (volatile)
This is the program:
#include <stdio.h>
#include <dos.h>
#include <conio.h>
#include <stdlib.h>
typedef void (*FUN_PTR)(void);
void interrupt (*Int1Save) (void); //pointer to interrupt num 1//
volatile FUN_PTR our_func;
char *str2;
void interrupt my_inter (void) //New interrupt//
{volatile FUN_PTR next_command;
asm { PUSH BP
MOV BP,SP
PUSH AX
PUSH BX
PUSH ES
MOV ES,[BP+4]
MOV BX,[BP+2]
MOV AX,ES:[BX]
MOV word ptr next_command,AX
POP ES
POP BX
POP AX
pop BP}
if (our_func==next_command) printf("procedure %s has been called\n",str2);}
void animate(int *iptr,char str[],void (*funptr)(), char fstr[])
{
str2=fstr;
our_func=funptr;
Int1Save = getvect(1); // save old interrupt//
setvect(1,my_inter);
asm { pushf //TF is ON//
pop ax
or ax,100000000B
push ax
popf}}
void unanimate()
{asm { pushf //TF is OFF//
pop ax
and ax,1111111011111111B
push ax
popf}
setvect (1,Int1Save); //restore old interrupt//}
void main(void)
{int i;
int f1 = 1;
int f2 = 1;
int fibo = 1;
animate(&fibo, "fibo", sleep, "sleep");
for(i=0; i < 8; i++)
{
sleep(2);
f1 = f2;
f2 = fibo;
fibo = f1 + f2;} // for//
unanimate();} // main//
My question...
Off course the problem is at "my inter" on the inline assembly. but can't figure it out.
What am I doing wrong? (please take a look at the code above)
I wanted to save the address of the pointer for the specific procedure (sleep) in the volatile our_func. then take the info (address to each next command) from the stack to volatile next_command and then finaly returnning to c and make the compare each time. If the same value (address) is on both variables then to print a specific message.
Hope I'm clear..
10x,
Nir B
Answered as a comment by the OP
I got the answer I wanted:
asm { MOV SI,[BP+18] //Taking the address of each command//
MOV DI,[BP+20]
MOV word ptr next_command+2,DI
MOV word ptr next_command,SI}
if ((*our_func)==(*next_command)) //Making the next_command compare//
printf("procedure %s has been called\n",str2);

c generate function and call it

#include <stdio.h>
#define uint unsigned int
#define AddressOfLabel(sectionname,out) __asm{mov [out],offset sectionname};
void* CreateFunction(void* start,void *end) {
uint __start=(uint)start,__end=(uint)end-1
,size,__func_runtime;
void* func_runtime=malloc(size=(((__end)-(__start)))+1);
__func_runtime=(uint)func_runtime;
memcpy((void*)(__func_runtime),start,size);
((char*)func_runtime)[size]=0xC3; //ret
return func_runtime;
}
void CallRuntimeFunction(void* address) {
__asm {
call address
}
}
main() {
void* _start,*_end;
AddressOfLabel(__start,_start);
AddressOfLabel(__end,_end);
void* func = CreateFunction(_start,_end);
CallRuntimeFunction(func); //I expected this method to print "Test"
//but this method raised exception
return 0;
__start:
printf("Test");
__end:
}
CreateFunction - takes two points in memory (function scope), allocate, copy it to the allocated memory and returns it (The void* used like a function to call with Assembly)
CallRuntimeFunction - runs the functions that returns from CreateFunction
#define AddressOfLabel(sectionname,out) - Outs the address of label (sectionname) to variable (out)
When I debugged this code and stepped in the call of CallRuntimeFunction and go to disassembly ,
I saw alot of ??? instead of assembly code of between __start and __end labels.
I tried to copy machine code between two labels and then run it. But I don't have any idea why I can't call function that allocated with malloc.
Edit:
I changed some code and done part of the work.
Runtime Function's memory allocate:
void* func_runtime=VirtualAlloc(0, size=(((__end)-(__start)))+1, MEM_COMMIT, PAGE_EXECUTE_READWRITE);
Copy from function scope:
CopyMemory((void*)(__func_runtime),start,size-1);
But when I ran this program I can that:
mov esi,esp
push 0E4FD14h
call dword ptr ds:[0E55598h] ; <--- printf ,after that I don't know what is it
add esp,4
cmp esi,esp
call 000B9DBB ; <--- here
mov dword ptr [ebp-198h],0
lea ecx,[ebp-34h]
call 000B9C17
mov eax,dword ptr [ebp-198h]
jmp 000D01CB
ret
At here it enters to another function and weird stuff.
void CallRuntimeFunction(void* address) {
__asm {
call address
}
}
here address is a "pointer" to a parameter of this function which is also a pointer.
pointer to a pointer
use:
void CallRuntimeFunction(void* address) {
_asm {
mov ecx,[address] //we get address of "func"
mov ecx,[ecx] //we get "func"
call [ecx] //we jump func(ecx is an address. yes)
}
}
you wanna call func which is a pointer. when passed in your CallRunt... function, this generates a new pointer to point to that pointer. Pointer of second degree.
void* func = CreateFunction(_start,_end);
yes func is a pointer
Important: check your compilers "calling convention" options. Try the decl one
Be sure to invalidate the caches (both instruction and data) between the function code generation and its calling. See self-modifying code for further info.

C - Inline asm patching at runtime

I am writing a program in C and i use inline asm. In the inline assembler code is have some addresses where i want to patch them at runtime.
A quick sample of the code is this:
void __declspec(naked) inline(void)
{
mov eax, 0xAABBCCDD
call 0xAABBCCDD
}
An say i want to modify the 0xAABBCCDD value from the main C program.
What i tried to do is to Call VirtualProtect an is the pointer of the function in order to make it Writeable, and then call memcpy to add the appropriate values to the code.
DWORD old;
VirtualProtect(inline, len, PAGE_EXECUTE_READWRITE, &old);
However VirtualProtect fails and GetLastError() returns 487 which means accessing invalid address. Anyone have a clue about this problem??
Thanks
Doesn't this work?
int X = 0xAABBCCDD;
void __declspec(naked) inline(void)
{
mov eax, [X]
call [X]
}
How to do it to another process at runtime,
Create a variable that holds the program base address
Get the target RVA (Relative Virtual Address)
Then calculate the real address like this PA=RVA + BASE
then call it from your inline assembly
You can get the base address like this
DWORD dwGetModuleBaseAddress(DWORD dwProcessID)
{
TCHAR zFileName[MAX_PATH];
ZeroMemory(zFileName, MAX_PATH);
HANDLE hProcess = OpenProcess(PROCESS_ALL_ACCESS, true, dwProcessID);
HANDLE hSnapshot = CreateToolhelp32Snapshot(TH32CS_SNAPMODULE, dwProcessID);
DWORD dwModuleBaseAddress = 0;
if (hSnapshot != INVALID_HANDLE_VALUE)
{
MODULEENTRY32 ModuleEntry32 = { 0 };
ModuleEntry32.dwSize = sizeof(MODULEENTRY32);
if (Module32First(hSnapshot, &ModuleEntry32))
{
do
{
if (wcscmp(ModuleEntry32.szModule, L"example.exe") == 0)
{
dwModuleBaseAddress = (DWORD_PTR)ModuleEntry32.modBaseAddr;
break;
}
} while (Module32Next(hSnapshot, &ModuleEntry32));
}
CloseHandle(hSnapshot);
CloseHandle(hProcess);
}
return dwModuleBaseAddress;
}
Assuming you have a local variable and your base address
mov dword ptr ss : [ebp - 0x14] , eax;
mov eax, dword ptr BaseAddress;
add eax, PA;
call eax;
mov eax, dword ptr ss : [ebp - 0x14] ;
You have to restore the value of your Register after the call returns, since this value may be used somewhere down the code execution, assuming you're trying to patch an existing application that may depend on the eax register after your call. Although this method has it disadvantages, but at least it will give anyone idea on what to do.

Multithreading with inline assembly and access to a c variable

I'm using inline assembly to construct a set of passwords, which I will use to brute force against a given hash. I used this website as a reference for the construction of the passwords.
This is working flawlessly in a singlethreaded environment. It produces an infinite amount of incrementing passwords.
As I have only basic knowledge of asm, I understand the idea. The gcc uses ATT, so I compile with -masm=intel
During the attempt to multithread the program, I realize that this approach might not work.
The following code uses 2 global C variables, and I assume that this might be the problem.
__asm__("pushad\n\t"
"mov edi, offset plaintext\n\t" <---- global variable
"mov ebx, offset charsetTable\n\t" <---- again
"L1: movzx eax, byte ptr [edi]\n\t"
" movzx eax, byte ptr [charsetTable+eax]\n\t"
" cmp al, 0\n\t"
" je L2\n\t"
" mov [edi],al\n\t"
" jmp L3\n\t"
"L2: xlat\n\t"
" mov [edi],al\n\t"
" inc edi\n\t"
" jmp L1\n\t"
"L3: popad\n\t");
It produces a non deterministic result in the plaintext variable.
How can i create a workaround, that every thread accesses his own plaintext variable? (If this is the problem...).
I tried modifying this code, to use extended assembly, but I failed every time. Probably due to the fact that all tutorials use ATT syntax.
I would really appreciate any help, as I'm stuck for several hours now :(
Edit: Running the program with 2 threads, and printing the content of plaintext right after the asm instruction, produces:
b
b
d
d
f
f
...
Edit2:
pthread_create(&thread[i], NULL, crack, (void *) &args[i]))
[...]
void *crack(void *arg) {
struct threadArgs *param = arg;
struct crypt_data crypt; // storage for reentrant version of crypt(3)
char *tmpHash = NULL;
size_t len = strlen(param->methodAndSalt);
size_t cipherlen = strlen(param->cipher);
crypt.initialized = 0;
for(int i = 0; i <= LIMIT; i++) {
// intel syntax
__asm__ ("pushad\n\t"
//mov edi, offset %0\n\t"
"mov edi, offset plaintext\n\t"
"mov ebx, offset charsetTable\n\t"
"L1: movzx eax, byte ptr [edi]\n\t"
" movzx eax, byte ptr [charsetTable+eax]\n\t"
" cmp al, 0\n\t"
" je L2\n\t"
" mov [edi],al\n\t"
" jmp L3\n\t"
"L2: xlat\n\t"
" mov [edi],al\n\t"
" inc edi\n\t"
" jmp L1\n\t"
"L3: popad\n\t");
tmpHash = crypt_r(plaintext, param->methodAndSalt, &crypt);
if(0 == memcmp(tmpHash+len, param->cipher, cipherlen)) {
printf("success: %s\n", plaintext);
break;
}
}
return 0;
}
Since you're already using pthreads, another option is making the variables that are modified by several threads into per-thread variables (threadspecific data). See pthread_getspecific OpenGroup manpage. The way this works is like:
In the main thread (before you create other threads), do:
static pthread_key_y tsd_key;
(void)pthread_key_create(&tsd_key); /* unlikely to fail; handle if you want */
and then within each thread, where you use the plaintext / charsetTable variables (or more such), do:
struct { char *plainText, char *charsetTable } *str =
pthread_getspecific(tsd_key);
if (str == NULL) {
str = malloc(2 * sizeof(char *));
str.plainText = malloc(size_of_plaintext);
str.charsetTable = malloc(size_of_charsetTable);
initialize(str.plainText); /* put the data for this thread in */
initialize(str.charsetTable); /* ditto */
pthread_setspecific(tsd_key, str);
}
char *plaintext = str.plainText;
char *charsetTable = str.charsetTable;
Or create / use several keys, one per such variable; in that case, you don't get the str container / double indirection / additional malloc.
Intel assembly syntax with gcc inline asm is, hm, not great; in particular, specifying input/output operands is not easy. I think to get that to use the pthread_getspecific mechanism, you'd change your code to do:
__asm__("pushad\n\t"
"push tsd_key\n\t" <---- threadspecific data key (arg to call)
"call pthread_getspecific\n\t" <---- gets "str" as per above
"add esp, 4\n\t" <---- get rid of the func argument
"mov edi, [eax]\n\t" <---- first ptr == "plainText"
"mov ebx, [eax + 4]\n\t" <---- 2nd ptr == "charsetTable"
...
That way, it becomes lock-free, at the expense of using more memory (one plaintext / charsetTable per thread), and the expense of an additional function call (to pthread_getspecific()). Also, if you do the above, make sure you free() each thread's specific data via pthread_atexit(), or else you'll leak.
If your function is fast to execute, then a lock is a much simpler solution because you don't need all the setup / cleanup overhead of threadspecific data; if the function is either slow or very frequently called, the lock would become a bottleneck though - in that case the memory / access overhead for TSD is justified. Your mileage may vary.
Protect this function with mutex outside of inline Assembly block.

Resources