I have written a minimal function to test whether I can call/link C and x86_64 assembly code.
Here is my main.c
#include <stdio.h>
extern int test(int);
int main(int argc, char* argv[])
{
int a = 10;
int b = test(a);
printf("b=%d\n", b);
return 0;
}
Here is my test.asm
section .text
global test
test:
mov ebx,2
add eax,ebx
ret
I built an executable using this script
#!/usr/bin/env bash
nasm -f elf64 test.asm -o test.o
gcc -c main.c -o main.o
gcc main.o test.o -o a.out
I wrote test.asm without having any real clue what I was doing. I then went away and did some reading, and now I don't understand how my code appears to be working, as I have convinced myself that it shouldn't be.
Here's a list of reasons why I think this shouldn't work:
I don't save or restore the base pointer (setup the stack frame). I actually don't understand why this is needed, but every example I have looked at does this.
The calling convention for the gcc compiler on Linux systems should be to pass arguments via the stack. Here I assume the arguments are passed using eax and ebx. I don't think that is right.
ret probably expects to pick up a return address from somewhere. I am fairly sure I haven't supplied this.
There may even be other reasons which I don't know about.
Is it a complete fluke that what I have written produces the correct output?
I am completely new to this. While I have heard of some x86 concepts in passing this is the first time I have actually attempted to write some. Got to start somewhere?
Edit: For future reference here is a corrected code
test:
; save old base pointer
push rbp ; sub rsp, 8; mov [rsp] rbp
mov rbp, rsp ; mov rbp, rsp ;; rbp = rsp
; initializes new stack frame
add rdi, 2 ; add 2 to the first argument passed to this function
mov rax, rdi ; return value passed via rax
; did not allocate any local variables, nothing to add to
; stack pointer
; the stack pointer is unchanged
pop rbp ; restore old base pointer
ret ; pop the return address off the stack and jump
; call and ret modify or save the rip instruction pointer
I don't save or restore the base pointer (setup the stack frame). I actually don't understand why this is needed, but every example I have looked at does this.
That's not needed. Try compiling some C code with -O3 and you'll see it doesn't happen then.
The calling convention for the gcc compiler on Linux systems should be to pass arguments via the stack. Here I assume the arguments are passed using eax and ebx. I don't think that is right.
That part is only working because of a fluke. The assembly happens to put 10 in eax too the way you compiled, but there's no guarantee this will always happen. Again, compile with -O3 and it won't anymore.
ret probably expects to pick up a return address from somewhere. I am fairly sure I haven't supplied this.
This part is fine. The return address gets supplied by the caller. It'll always be at the top of the stack when your function gets entered.
There may even be other reasons which I don't know about.
Yes, there's one more: ebx is call-saved but you're clobbering it. If the calling function (or anything above it in the stack) used it, then this would break it.
Is it a complete fluke that what I have written produces the correct output?
Yes, because of the second and fourth points above.
For reference, here's a comparison of the assembly that gcc produces from your C code at the -O0 (the default optimization level) and -O3: https://godbolt.org/z/7P13fbb1a
Related
I am having an issue with some inline assembly. I am writing a compiler, and it is compiling to assembly, and for portability i made it add the main function in C and just use inline assembly. Though even the simplest inline assembly is giving me a segfault. Thanks for your help
int main(int argc, char** argv) {
__asm__(
"push $1\n"
);
return 0;
}
TLDR at bottom. Note: everything here is assuming x86_64.
The issue here is that compilers will effectively never use push or pop in a function body (except for prologues/epilogues).
Consider this example.
When the function begins, room is made on the stack in the prologue with:
push rbp
mov rbp, rsp
sub rsp, 32
This creates 32 bytes of room for main. Then notice how throughout the function, instead of pushing items to the stack, they are mov'd to the stack through offsets from rbp:
mov DWORD PTR [rbp-20], edi
mov QWORD PTR [rbp-32], rsi
mov DWORD PTR [rbp-4], 2
mov DWORD PTR [rbp-8], 5
The reason for this is it allows for variables to be stored anywhere at anytime, and loaded from anywhere at anytime without requiring a huge amount of push/pops.
Consider the case where variables are stored using push and pop. Say a variable is stored early on in the function, let's call this foo. 8 variables on the stack later, you need foo, how should you access it?
Well, you can pop everything until foo, and then push everything back, but that's costly.
It also doesn't work when you have conditional statements. Say a variable is only ever stored if foo is some certain value. Now you have a conditional where the stack pointer could be at one of two locations after it!
For this reason, compilers always prefer to use rbp - N to store variables, as at any point in the function, the variable will still live at rbp - N.
NB: On different ABIs (such as i386 system V), parameters to arguments may be passed on the stack, but this isn't too much of an issue, as ABIs will generally specify how this should be handled. Again, using i386 system V as an example, the calling convention for a function will go something like:
push edi ; 2nd argument to the function.
push eax ; 1st argument to the function.
call my_func
; here, it can be assumed that the stack has been corrected
So, why does push actually cause an issue?
Well, I'll add a small asm snippet to the code
At the end of the function, we now have the following:
push 64
mov eax, 0
leave
ret
There's 2 things that fail now due to pushing to the stack.
The first is the leave instruction (see this thread)
The leave instruction will attempt to pop the value of rbp that was stored at the beginning of the function (notice the only push that the compiler generates is at the start: push rbp).
This is so that the stack frame of the caller is preserved following main. By pushing to the stack, in our case rbp is now going to be set to 64, since the last value pushed is 64. When the callee of main resumes it's execution, and tries to access a value at say, rbp - 8, a crash will occur, as rbp - 8 is 0x38 in hex, which is an invalid address.
But that assumes the callee even get's execution back!
After rbp has it's value restored with the invalid value, the next thing on the stack will be the original value of rbp.
The ret instruction will pop a value from the stack, and return to that address...
Notice how this might be slightly problematic?
The CPU is going to try and jump to the value of rbp stored at the start of the function!
On nearly every modern program, the stack is a "no execute" zone (see here), and attempting to execute code from there will immediately cause a crash.
So, TLDR: Pushing to the stack violates assumptions made by the compiler, most importantly about the return address of the function. This violation causes program execution to end up on the stack (generally), which will cause a crash
Source C Code:
int main()
{
int i;
for(i=0, i < 10; i++)
{
printf("Hello World!\n");
}
}
Dump of Intel syntax x86 assembler code for function main:
1. 0x000055555555463a <+0>: push rbp
2. 0x000055555555463b <+1>: mov rbp,rsp
3. 0x000055555555463e <+4>: sub rsp,0x10
4. 0x0000555555554642 <+8>: mov DWORD PTR [rbp-0x4],0x0
5. 0x0000555555554649 <+15>: jmp 0x55555555465b <main+33>
6. 0x000055555555464b <+17>: lea rdi,[rip+0xa2] # 0x5555555546f4
7. 0x0000555555554652 <+24>: call 0x555555554510 <puts#plt>
8. 0x0000555555554657 <+29>: add DWORD PTR [rbp-0x4],0x1
9. 0x000055555555465b <+33>: cmp DWORD PTR [rbp-0x4],0x9
10. 0x000055555555465f <+37>: jle 0x55555555464b <main+17>
11. 0x0000555555554661 <+39>: mov eax,0x0
12. 0x0000555555554666 <+44>: leave
13. 0x0000555555554667 <+45>: ret
I'm currently working through "Hacking, The Art of Exploitation 2nd Edition by Jon Erickson", and I'm just starting to tackle assembly.
I have a few questions about the translation of the provided C code to Assembly, but I am mainly wondering about my first question.
1st Question: What is the purpose of line 6? (lea rdi,[rip+0xa2]).
My current working theory, is that this is used to save where the next instructions will jump to in order to track what is going on. I believe this line correlates with the printf function in the source C code.
So essentially, its loading the effective address of rip+0xa2 (0x5555555546f4) into the register rdi, to simply track where it will jump to for the printf function?
2nd Question: What is the purpose of line 11? (mov eax,0x0?)
I do not see a prior use of the register, EAX and am not sure why it needs to be set to 0.
The LEA puts a pointer to the string literal into a register, as the first arg for puts. The search term you're looking for is "calling convention" and/or ABI. (And also RIP-relative addressing). Why is the address of static variables relative to the Instruction Pointer?
The small offset between code and data (only +0xa2) is because the .rodata section gets linked into the same ELF segment as .text, and your program is tiny. (Newer gcc + ld versions will put it in a separate page so it can be non-executable.)
The compiler can't use a shorter more efficient mov edi, address in position-independent code in your Linux PIE executable. It would do that with gcc -fno-pie -no-pie
mov eax,0 implements the implicit return 0 at the end of main that C99 and C++ guarantee. EAX is the return-value register in all calling conventions.
If you don't use gcc -O2 or higher, you won't get peephole optimizations like xor-zeroing (xor eax,eax).
This:
lea rdi,[rip+0xa2]
Is a typical position independent LEA, putting the string address into a register (instead of loading from that memory address).
Your executable is position independent, meaning that it can be loaded at runtime at any address. Therefore, the real address of the argument to be passed to puts() needs to be calculated at runtime every single time, since the base address of the program could be different each time. Also, puts() is used instead of printf() because the compiler optimized the call since there is no need to format anything.
In this case, the binary was most probably loaded with the base address 0x555555554000. The string to use is stored in your binary at offset 0x6f4. Since the next instruction is at offset 0x652, you know that, no matter where the binary is loaded in memory, the address you want will be rip + (0x6f4 - 0x652) = rip + 0xa2, which is what you see above. See this answer of mine for another example.
The purpose of:
mov eax,0x0
Is to set the return value of main(). In Intel x86, the calling convention is to return values in the rax register (eax if the value is 32 bits, which is true in this case since main returns an int). See the table entry for x86-64 at the end of this page.
Even if you don't add an explicit return statement, main() is a special function, and the compiler will add a default return 0 for you.
If you add some debug data and symbols to the assembly everything will be easier. It is also easier to read the code if you add some optimizations.
There is a very useful tool godbolt and your example https://godbolt.org/z/9sRFmU
On the asm listing there you can clearly see that that lines loads the address of the string literal which will be then printed by the function.
EAX is considered volatile and main by default returns zero and thats the reason why it is zeroed.
The calling convention is explained here: https://en.wikipedia.org/wiki/X86_calling_conventions
Here you have more interesting cases https://godbolt.org/z/M4MeGk
So I have this c code:
#include <stdio.h>
int main(void)
{
int a;
int b;
int c;
a=b=c=5;
printf("Hi%d%d%dHi",a,b,c);
}
I compiled it on ubuntu with:
gcc program.c -o program -ggdb -m32 -O2
And then disassembled it with:
objdump -M intel program -d
And in main printf() gets called like this:
mov DWORD PTR [esp+0x10],0x5
mov DWORD PTR [esp+0xc],0x5
mov DWORD PTR [esp+0x8],0x5
mov DWORD PTR [esp+0x4],0x8048500
mov DWORD PTR [esp],0x1
call 8048330 <__printf_chk#plt>
What I am wondering right now is what this means:
mov DWORD PTR [esp],0x1
I know what the first 4 mov instructions are for, but I just can't figure out why a '1' gets pushed onto the stack. Also this mov only occurs when optimization is turned on. Any ideas?
The GNU C library (glibc) will use __printf_chk instead of printf if you (or the the compiler) defines _FORTIFY_SOURCE and optimization is enabled. The _chk version of the function behaves just like the function it replaces except it's supposed to check for stack overflow and maybe validate the arguments. The extra first argument indicates how much checking and validation should occur.
Looking at the actual glibc implmenation it appears that doesn't do any additional stack checking over what the compiler automatic provides (and so shouldn't be necessary) and the validation of arguments is very minimal. It will check that %n only appears on read-only format strings, and checks that if the special %m$ argument specifiers are used that they're used for all arguments without any gaps.
I have this little C code
void decode(int *xp,int *yp,int *zp)
{
int a,b,c;
a=*yp;
b=*zp;
c=*xp;
*yp=c;
*zp=a;
*xp=b;
}
Then I compiled it to object file using gcc -c -O1 decode.c, and then dumped the object with objdump -M intel -d decode.o and the equivalent assembly code for this is
mov ecx,DWORD PTR [rsi]
mov eax,DWORD PTR [rdx]
mov r8d,DWORD PTR [rdi]
mov DWORD PTR [rsi],r8d
mov DWORD PTR [rdx],ecx
mov DWORD PTR [rdi],eax
ret
And I noticed that it doesnt use stack at all.But firstly values still need to be loaded to the registers. So my question is how do the arguments get loaded into the registers? does the compiler automatically loads the arguments to the registers behind the scenes? or something else happens? because there is no instructions that would load the arguments into the registers.
And a little off topic. When you increase optimization for compiling the relationship between original source code and machine code decreases,imposing dificulties to relate the machine code back to the source code. By default if you dont specify the optimization flag to the GCC it doesnt optimize the code. So I tried to compile without any optimizations to get expected results from the source, but what I got was 4-5 times bigger machine code that wasnt related to the source and understandable. But when I applied Level 1 optimization the code appeared understandable and related to the source. But why?
The arguments are loaded to registers in the caller. Example:
int a;
int b;
int f(int, int);
int g(void) {
return f(a, b);
}
Look at the code generated for g:
$ gcc -O1 -S t.c
$ cat t.s
…
movl b(%rip), %esi
movl a(%rip), %edi
call f
Second question:
So I tried to compile without any optimizations to get expected results from the source, but what I got was 4-5 times bigger machine code that wasnt related to the source and understandable.
This happens because unoptimized code is stupid. It is a straight translation of an intermediate representation in which each variable is stored in the stack even if it doesn't need to, each conversion is represented by an explicit operation even if one isn't needed, and so on. -O1 is the best level of optimization for reading the generated assembly. It is also possible to disable the frame pointer, which keeps the overhead to the minimum for simple functions.
I am learning assembly using GDB & Eclipse
Here is a simple C code.
int absdiff(int x, int y)
{
if(x < y)
return y-x;
else
return x-y;
}
int main(void) {
int x = 10;
int y = 15;
absdiff(x,y);
return EXIT_SUCCESS;
}
Here is corresponding assembly instructions for main()
main:
080483bb: push %ebp #push old frame pointer onto the stack
080483bc: mov %esp,%ebp #move the frame pointer down, to the position of stack pointer
080483be: sub $0x18,%esp # ???
25 int x = 10;
080483c1: movl $0xa,-0x4(%ebp) #move the "x(10)" to 4 address below frame pointer (why not push?)
26 int y = 15;
080483c8: movl $0xf,-0x8(%ebp) #move the "y(15)" to 8 address below frame pointer (why not push?)
28 absdiff(x,y);
080483cf: mov -0x8(%ebp),%eax # -0x8(%ebp) == 15 = y, and move it into %eax
080483d2: mov %eax,0x4(%esp) # from this point on, I am confused
080483d6: mov -0x4(%ebp),%eax
080483d9: mov %eax,(%esp)
080483dc: call 0x8048394 <absdiff>
31 return EXIT_SUCCESS;
080483e1: mov $0x0,%eax
32 }
Basically, I am asking to help me to make sense of this assembly code, and why it is doing things in this particular order. Point where I am stuck, is shown in assembly comments. Thanks !
Lines 0x080483cf to 0x080483d9 are copying x and y from the current frame on the stack, and pushing them back onto the stack as arguments for absdiff() (this is typical; see e.g. http://en.wikipedia.org/wiki/X86_calling_conventions#cdecl). If you look at the disassembler for absdiff() (starting at 0x8048394), I bet you'll see it pick these values up from the stack and use them.
This might seem like a waste of cycles in this instance, but that's probably because you've compiled without optimisation, so the compiler does literally what you asked for. If you use e.g. -O2, you'll probably see most of this code disappear.
First it bears saying that this assembly is in the AT&T syntax version of x86_32, and that the order of arguments to operations is reversed from the Intel syntax (used with MASM, YASM, and many other assemblers and debuggers).
080483bb: push %ebp #push old frame pointer onto the stack
080483bc: mov %esp,%ebp #move the frame pointer down, to the position of stack pointer
080483be: sub $0x18,%esp # ???
This enters a stack frame. A frame is an area of memory between the stack pointer (esp) and the base pointer (ebp). This area is intended to be used for local variables that have to live on the stack. NOTE: Stack frames don't have to be implemented in this way, and GCC has the optimization switch -fomit-frame-pointer that does away with it except when alloca or variable sized arrays are used, because they are implemented by changing the stack pointer by arbitrary values. Not using ebp as the frame pointer allows it to be used as an extra general purpose register (more general purpose registers is usually good).
Using the base pointer makes several things simpler to calculate for compilers and debuggers, since where variables are located relative to the base does not change while in the function, but you can also index them relative to the stack pointer and get the same results, though the stack pointer does tend to change around so the same location may require a different index at different times.
In this code 0x18 (or 24) bytes are being reserved on the stack for local use.
This code so far is often called the function prologue (not to be confused with the programming language "prolog").
25 int x = 10;
080483c1: movl $0xa,-0x4(%ebp) #move the "x(10)" to 4 address below frame pointer (why not push?)
This line moves the constant 10 (0xA) to a location within the current stack frame relative to the base pointer. Because the base pointer below the top of the stack and since the stack grows downward in RAM the index is negative rather than positive. If this were indexed relative to the stack pointer a different index would be used, but it would be positive.
You are correct that this value could have been pushed rather than copied like this. I suspect that this is done this way because you have not compiled with optimizations turned on. By default gcc (which I assume you are using based on your use of gdb) does not optimize much, and so this code is probably the default "copy a constant to a location in the stack frame" code. This may not be the case, but it is one possible explanation.
26 int y = 15;
080483c8: movl $0xf,-0x8(%ebp) #move the "y(15)" to 8 address below frame pointer (why not push?)
Similar to the previous line of code. These two lines of code put the 10 and 15 into local variables. They are on the stack (rather than in registers) because this is unoptimized code.
28 absdiff(x,y);
gdb printing this meant that this is the source code line being executed, not that this function is being executed (yet).
080483cf: mov -0x8(%ebp),%eax # -0x8(%ebp) == 15 = y, and move it into %eax
In preparation for calling the function the values that are being passed as arguments need to be retrieved from their storage locations (even though they were just placed at those locations and their values are known because of the no optimization thing)
080483d2: mov %eax,0x4(%esp) # from this point on, I am confused
This is the second part of the move to the stack of one of the local variables' value so that it can be use as an argument to the function. You can't (usually) move from one memory address to another on x86, so you have to move it through a register (eax in this case).
080483d6: mov -0x4(%ebp),%eax
080483d9: mov %eax,(%esp)
These two lines do the same thing except for the other variable. Note that since this variable is being moved to the top of the stack that no offset is being used in the second instruction.
080483dc: call 0x8048394 <absdiff>
This pushed the return address to the top of the stack and jumps to the address of absdiff.
You didn't include code for absdiff, so you probably did not step through that.
31 return EXIT_SUCCESS;
080483e1: mov $0x0,%eax
C programs return 0 upon success, so EXIT_SUCCESS was defined as 0 by someone. Integer return values are put in eax, and some code that called the main function will use that value as the argument when calling the exit function.
32 }
This is the end. The reason that gdb stopped here is that there are things that actually happen to clean up. In C++ it is common to see destructor for local class instances being called here, but in C you will probably just see the function epilogue. This is the compliment to the function prologue, and consists of returning the stack pointer and base pointer to the values that they were originally at. Sometimes this is done with similar math on them, but sometimes it is done with the leave instruction. There is also an enter instruction which can be used for the prologue, but gcc doesn't do this (I don't know why). If you had continued to view the disassembly here you would have seen the epilogue code and a ret instruction.
Something you may be interested in is the ability to tell gcc to produce assembly files. If you do:
gcc -S source_file.c
a file named source_file.s will be produced with assembly code in it.
If you do:
gcc -S -O source_file.c
Then the same thing will happen, but some basic optimizations will be done. This will probably make reading the assembly code easier since the code will not likely have as many odd instructions that seem like they could have been done a better way (like moving constant values to the stack, then to a register, then to another location on the stack and never using the push instruction).
You regular optimization flags for gcc are:
-O0 default -- none
-O1 a few optimizations
-O the same as -O1
-O2 a lot of optimizations
-O3 a bunch more, some of which may take a long time and/or make the code a lot bigger
-Os optimize for size -- similar to -O2, but not quite
If you are actually trying to debug C programs then you will probably want the least optimizations possible since things will happen in the order that they are written in your code and variables won't disappear.
You should have a look at the gcc man page:
man gcc
Remember, if you're running in a debugger or debug mode, the compiler reserves the right to insert whatever debugging code it likes and make other nonsensical code changes.
For example, this is Visual Studio's debug main():
int main(void) {
001F13D0 push ebp
001F13D1 mov ebp,esp
001F13D3 sub esp,0D8h
001F13D9 push ebx
001F13DA push esi
001F13DB push edi
001F13DC lea edi,[ebp-0D8h]
001F13E2 mov ecx,36h
001F13E7 mov eax,0CCCCCCCCh
001F13EC rep stos dword ptr es:[edi]
int x = 10;
001F13EE mov dword ptr [x],0Ah
int y = 15;
001F13F5 mov dword ptr [y],0Fh
absdiff(x,y);
001F13FC mov eax,dword ptr [y]
001F13FF push eax
001F1400 mov ecx,dword ptr [x]
001F1403 push ecx
001F1404 call absdiff (1F10A0h)
001F1409 add esp,8
*(int*)nullptr = 5;
001F140C mov dword ptr ds:[0],5
return 0;
001F1416 xor eax,eax
}
001F1418 pop edi
001F1419 pop esi
001F141A pop ebx
001F141B add esp,0D8h
001F1421 cmp ebp,esp
001F1423 call #ILT+300(__RTC_CheckEsp) (1F1131h)
001F1428 mov esp,ebp
001F142A pop ebp
001F142B ret
It helpfully posts the C++ source next to the corresponding assembly. In this case, you can fairly clearly see that x and y are stored on the stack explicitly, and an explicit copy is pushed on, then absdiff is called. I explicitly de-referenced nullptr to cause the debugger to break in. You may wish to change compiler.
Compile with -fverbose-asm -g -save-temps for additional information with GCC.