I'm trying to compile and run following program without main() function in C. I have compiled my program using the following command.
gcc -nostartfiles nomain.c
And compiler gives warning
/usr/bin/ld: warning: cannot find entry symbol _start; defaulting to 0000000000400340
Ok, No problem. then, I have run executable file(a.out), both printf statements print successfully, and then get segmentation fault.
So, my question is, Why segmentation fault after successfully execute print statements?
my code:
#include <stdio.h>
void nomain()
{
printf("Hello World...\n");
printf("Successfully run without main...\n");
}
output:
Hello World...
Successfully run without main...
Segmentation fault (core dumped)
Note:
Here, -nostartfiles gcc flag prevents the compiler from using standard startup files when linking
Let's have a look at the generated assembly of your program:
.LC0:
.string "Hello World..."
.LC1:
.string "Successfully run without main..."
nomain:
push rbp
mov rbp, rsp
mov edi, OFFSET FLAT:.LC0
call puts
mov edi, OFFSET FLAT:.LC1
call puts
nop
pop rbp
ret
Note the ret statement. Your program's entry point is determined to be nomain, all is fine with that. But once the function returns, it attempts to jump into an address on the call stack... that isn't populated. That's an illegal access and a segmentation fault follows.
A quick solution would be to call exit() at the end of your program (and assuming C11 we might as well mark the function as _Noreturn):
#include <stdio.h>
#include <stdlib.h>
_Noreturn void nomain(void)
{
printf("Hello World...\n");
printf("Successfully run without main...\n");
exit(0);
}
In fact, now your function behaves pretty much like a regular main function, since after returning from main, the exit function is called with main's return value.
In C, when functions/subroutines are called the stack is populated as (in the order):
The arguments,
Return address,
Local variables, --> top of the stack
main() being the start point, ELF structures the program in such a way that whatever instructions comes first would get pushed first, in this case printfs are.
Now, program is sort of truncated without return-address OR __end__ and infact it assumes that whatever is there on the stack at that(__end__) location is the return-address, but unfortunately its not and hence it crashes.
Related
I have an 64-bits ELF binary. I don't have its source code, don't know with which parameters it was compiled, and am not allowed to provide it here. The only relevant information I have is that the source is a .c file (so no hand-crafted assembly), compiled through a Makefile.
While reversing this binary using IDA, I stumbled upon an extremely weird construction I have never seen before and absolutely cannot explain. Here is the raw decompilation of one function with IDA syntax:
mov rax, [rsp+var_20]
xor rax, fs:28h
jnz location
add rsp, 28h
pop rbx
pop rbp
retn
location:
call __stack_chk_fail
nop dword ptr [rax]
db 2Eh
nop word ptr [rax+rax+00000000h]
...then dozens of instructions of normal and functional code
Here, we have a simple canary check, where we return if it is valid, and call __stack_chk_fail otherwise. Everything is perfectly normal. But after this check, there is still assembly, and of fully-functional code.
Looking at the manual of __stack_chk_fail, I made sure that this function does exit the program, and that there is no edge case where it could continue:
Description
The interface __stack_chk_fail() shall abort the function that called it with a message that a stack overflow has been detected. The program that called the function shall then exit.
I also tried to write this small C program, to search for a method to reproduce this:
#include <stdio.h>
#include <stdlib.h>
int foo()
{
int a = 3;
printf("%d\n", a);
return 0;
int b = 7;
printf("%d\n", b);
}
int main()
{
foo();
return 0;
}
But the code after the return is simply omitted by gcc.
It does not appear either that my binary is vulnerable to a buffer overflow that I could exploit to control rip and jump to the code after the canary check. I also inspected every call and jumps using objdump, and this code seems to never be called.
Could someone explain what is going on? How was this code generated in the first place? Is it a joke from the author of the binary?
I suspect you are looking at padding, followed by an unrelated function that IDA does not have a name for.
To test this hypothesis, I need the following additional information:
The address of the byte immediately after call __stack_chk_fail.
The next higher address that is the target of a call or jump instruction.
A raw hex dump of the bytes in between those two addresses.
The disassembly of four or five instructions starting at the next higher address that is the target of a call or jump instruction.
I am trying to write my own _start function using inline assembly. But when I try to read argc and argv from stack (%rsp and %rsp + 8) I get wrong values. I don't know what I am doing wrong.
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <syscall.h>
int main(int argc, char *argv[]) {
printf("%d\n", argc);
printf("%s\n", argv[0]);
printf("got here\n");
return 0;
}
void _start() {
__asm__(
"xor %rbp, %rbp;"
"movl (%rsp), %edi;"
"lea 8(%rsp), %rsi;"
"xor %rax, %rax;"
"call main"
...
Terminal:
$ gcc test.c -nostartfiles
$ ./a.out one two three
0
Segmentation fault (core dumped)
$
Any idea where my fault could be ?
I am using a Ubuntu 20.04 VM
This looks correct for a minimal _start: but you put it inside a non-naked C function. Compiler-generated code will run, e.g. push %rbp / mov %rsp, %rbp, before execution enters before the asm statement. To see this, look at gcc -S output, or single-step in a debugger such as GDB.
Put your asm statement at global scope (like in How Get arguments value using inline assembly in C without Glibc?) or use __attribute__((naked)) on your _start(). Note that _start isn't really a function
As a rule, never use GNU C Basic asm statements in a non-naked function. Although you might get this to work with -O3 because that would imply -fomit-frame-pointer so the stack would still be pointing at argc and argv when your code ran.
A dynamically linked executable on GNU/Linux will run libc startup code from dynamic linker hooks, so you actually can use printf from _start without manually calling those init functions. Unlike if this was statically linked.
However, your main tries to return to your _start, but you don't show _start calling exit. You should call exit instead of making an _exit system call directly, to make sure stdio buffers get flushed even if output is redirected to a file (making stdout full buffered). Falling off the end of _start would be bad, crashing or getting into an infinite loop depending on what execution falls in to.
C code:
#include <stdio.h>
int main(){
printf("hello word!\n");
return;
}
Assembly code:
push offset aHelloWord ; "hello word!\n"
call sub_41104B
add esp, 4
Now, I expect sub_41104B will lead directly to printf, but thats not the case:
sub_41104B proc near
jmp sub_411870
sub_41104B endp
And finally, in sub_411870, the printf function starts. Can someone explain why the compiler didn't use just directly call sub_411870?
Now, I expect sub_41104B will lead directly to printf ...
... or directly to puts().
... but thats not the case
Did you disassemble the object file or the final EXE file?
If you disassembled the EXE file, it is probable that the function you called is implemented in the LIB file as function "renaming" another one:
int puts(const char *text) // this is sub_41104B
{
return __x_puts(text); // __x_puts is sub_411870
}
You see this very often when calling a function in a DLL file. However, in the case of DLL files the jmp instruction is an indirect jump (jmp dword ptr [411870]), not a direct one.
I'm trying to compile and run following program without main() function in C. I have compiled my program using the following command.
gcc -nostartfiles nomain.c
And compiler gives warning
/usr/bin/ld: warning: cannot find entry symbol _start; defaulting to 0000000000400340
Ok, No problem. then, I have run executable file(a.out), both printf statements print successfully, and then get segmentation fault.
So, my question is, Why segmentation fault after successfully execute print statements?
my code:
#include <stdio.h>
void nomain()
{
printf("Hello World...\n");
printf("Successfully run without main...\n");
}
output:
Hello World...
Successfully run without main...
Segmentation fault (core dumped)
Note:
Here, -nostartfiles gcc flag prevents the compiler from using standard startup files when linking
Let's have a look at the generated assembly of your program:
.LC0:
.string "Hello World..."
.LC1:
.string "Successfully run without main..."
nomain:
push rbp
mov rbp, rsp
mov edi, OFFSET FLAT:.LC0
call puts
mov edi, OFFSET FLAT:.LC1
call puts
nop
pop rbp
ret
Note the ret statement. Your program's entry point is determined to be nomain, all is fine with that. But once the function returns, it attempts to jump into an address on the call stack... that isn't populated. That's an illegal access and a segmentation fault follows.
A quick solution would be to call exit() at the end of your program (and assuming C11 we might as well mark the function as _Noreturn):
#include <stdio.h>
#include <stdlib.h>
_Noreturn void nomain(void)
{
printf("Hello World...\n");
printf("Successfully run without main...\n");
exit(0);
}
In fact, now your function behaves pretty much like a regular main function, since after returning from main, the exit function is called with main's return value.
In C, when functions/subroutines are called the stack is populated as (in the order):
The arguments,
Return address,
Local variables, --> top of the stack
main() being the start point, ELF structures the program in such a way that whatever instructions comes first would get pushed first, in this case printfs are.
Now, program is sort of truncated without return-address OR __end__ and infact it assumes that whatever is there on the stack at that(__end__) location is the return-address, but unfortunately its not and hence it crashes.
When compiling C and nasm on Mac OS X, I found it is different to Linux when passing parameters and making system call. My code works but I'm really confused with it.
I write a function myprint in nasm to print string passed from C.
Here is the C code main.c
#include <stdio.h>
void myprint(char* msg, int len);
int main(void){
myprint("hello\n",6);
return 0;
}
Here is the nasm code myprint.asm
section .text
global _myprint
_syscall:
int 0x80
ret
_myprint:
push dword [esp+8] ;after push, esp-4
push dword [esp+8]
push dword 1
mov eax,4
call _syscall
add esp,12
ret
Compile and link them:
nasm -f macho -o myprint.o myprint.asm
gcc -m32 -o main main.c myprint.o
it prints "hello" correctly.
As you can see, OS X(FreeBSD) use push to pass parameters to sys call, but the parameters char* and int are already pushed into the stack and their addresses are esp+4 and esp+8. However, I have to read them from the stack and push them into stack again to make it work.
If I delete
push dword [esp+8] ;after push, esp-4
push dword [esp+8]
it will print lots of error codes and Bus error: 10, like this:
???]?̀?j????????
?hello
`44?4
__mh_execute_headerm"ain1yprint6???;??
<??
(
libSystem.B?
`%?. _syscall__mh_execute_header_main_myprintdyld_stub_binder ??z0&?z?&?z?&?z۽??۽????N?R?N?o?N???N??N?e?N?h?N?0?zR?N???N???t??N?N???N?????#?`#b?`?`?a#c aaU??]N?zBus error: 10
Why it needs to push the parameters into stack again? How can I pass these parameters already in stack to syscall without pushing again?
This is normal for even plain C function calls. The problem is that the return address is still on the stack before the arguments. If you don't push them again below the return address you will have 2 return addresses on the stack before the arguments (the first to main and the second to myprint) but the syscall only expects 1.