Differences in dis-assembled C code of GCC and Borland? - c

Recently I have gotten interested in disassembling C code (very simple C code) and followed a tutorial that used Borland C++ Compiler v5.5 (it compiles C code just fine), and everything worked. Then I decided to try my own C code and compiled it in Dev-C++ (which uses gcc). Upon opening it in IDA Pro I got a surprise: the asm from gcc was really different from Borland's. I expected some difference, but the C code was EXTREMELY simple, so is it just that gcc doesn't optimize as much, or do they use different default compiler settings?
The C Code
int main(int argc, char **argv)
{
int a;
a = 1;
}
Borland ASM
.text:00401150 ; int __cdecl main(int argc,const char **argv,const char *envp)
.text:00401150 _main proc near ; DATA XREF: .data:004090D0
.text:00401150
.text:00401150 argc = dword ptr 8
.text:00401150 argv = dword ptr 0Ch
.text:00401150 envp = dword ptr 10h
.text:00401150
.text:00401150 push ebp
.text:00401151 mov ebp, esp
.text:00401153 pop ebp
.text:00401154 retn
.text:00401154 _main endp
GCC ASM (UPDATED BELOW)
.text:00401220 ; =============== S U B R O U T I N E =======================================
.text:00401220
.text:00401220 ; Attributes: bp-based frame
.text:00401220
.text:00401220 public start
.text:00401220 start proc near
.text:00401220
.text:00401220 var_14 = dword ptr -14h
.text:00401220 var_8 = dword ptr -8
.text:00401220
.text:00401220 push ebp
.text:00401221 mov ebp, esp
.text:00401223 sub esp, 8
.text:00401226 mov [esp+8+var_8], 1
.text:0040122D call ds:__set_app_type
.text:00401233 call sub_401100
.text:00401238 nop
.text:00401239 lea esi, [esi+0]
.text:00401240 push ebp
.text:00401241 mov ebp, esp
.text:00401243 sub esp, 8
.text:00401246 mov [esp+14h+var_14], 2
.text:0040124D call ds:__set_app_type
.text:00401253 call sub_401100
.text:00401258 nop
.text:00401259 lea esi, [esi+0]
.text:00401259 start endp
GCC Update
Following JimR's suggestion, I went to see what sub_401100 is, and then followed that code to another routine. This seems to be the code (am I correct in that assumption, and if so, why does GCC have all of its code in the main function?):
.text:00401100 sub_401100 proc near ; CODE XREF: .text:004010F1j
.text:00401100 ; start+13p ...
.text:00401100
.text:00401100 var_28 = dword ptr -28h
.text:00401100 var_24 = dword ptr -24h
.text:00401100 var_20 = dword ptr -20h
.text:00401100 var_1C = dword ptr -1Ch
.text:00401100 var_18 = dword ptr -18h
.text:00401100 var_C = dword ptr -0Ch
.text:00401100 var_8 = dword ptr -8
.text:00401100
.text:00401100 push ebp
.text:00401101 mov ebp, esp
.text:00401103 push ebx
.text:00401104 sub esp, 24h ; lpTopLevelExceptionFilter
.text:00401107 lea ebx, [ebp+var_8]
.text:0040110A mov [esp+28h+var_28], offset sub_401000
.text:00401111 call SetUnhandledExceptionFilter
.text:00401116 sub esp, 4 ; uExitCode
.text:00401119 call sub_4012E0
.text:0040111E mov [ebp+var_8], 0
.text:00401125 mov eax, offset dword_404000
.text:0040112A lea edx, [ebp+var_C]
.text:0040112D mov [esp+28h+var_18], ebx
.text:00401131 mov ecx, dword_402000
.text:00401137 mov [esp+28h+var_24], eax
.text:0040113B mov [esp+28h+var_20], edx
.text:0040113F mov [esp+28h+var_1C], ecx
.text:00401143 mov [esp+28h+var_28], offset dword_404004
.text:0040114A call __getmainargs
.text:0040114F mov eax, ds:dword_404010
.text:00401154 test eax, eax
.text:00401156 jz short loc_4011B0
.text:00401158 mov dword_402010, eax
.text:0040115D mov edx, ds:_iob
.text:00401163 test edx, edx
.text:00401165 jnz loc_4011F6
.text:004012E0 sub_4012E0 proc near ; CODE XREF: sub_401000+C6p
.text:004012E0 ; sub_401100+19p
.text:004012E0 push ebp
.text:004012E1 mov ebp, esp
.text:004012E3 fninit
.text:004012E5 pop ebp
.text:004012E6 retn
.text:004012E6 sub_4012E0 endp

Compiler output is expected to be different, sometimes dramatically different, for the same source, in the same way that a Toyota and a Honda are different. Four wheels and some seats, sure, but more different than the same when you look at the details.
Likewise, the same compiler with different compiler options can and often will produce dramatically different output for the same source code, even for what appear to be simple programs.
In the case of your simple program, which does not actually do anything (the code does not affect the input, the output, or anything outside the function), a good optimizing compiler will produce nothing but main: with a return. Since you didn't specify a return value, a pre-C99 compiler returns whatever happens to be in the return register (C99 and later treat falling off the end of main as return 0), and ideally it would give a warning. The biggest problem I have when comparing compiler output is making something simple enough to see what the compilers are doing, but complicated enough that the compiler does more than just pre-compute the answer and return it.
In the case of x86, which I assume is what you are talking about here, the processors are microcoded these days, so there is really no single answer for good code vs. bad code. With each processor family they change the guts around, so what used to be fast becomes slow, and what is fast now is slow on the older processors. So for compilers like gcc that have continued to evolve with the new cores, the optimization can be either generic to all x86 processors or specific to a particular family (resulting in different code despite maximum optimization).
With your new interest in disassembling, you will continue to see the similarities and differences and find out just how many different ways the same code can be compiled. The differences are expected, even for trivial programs. And I encourage you to try as many compilers as you can. Even within the gcc family (2.x, 3.x, 4.x), the different versions and the different ways to build them will result in different code for what might be thought of as the same compiler.
Good vs. bad output is in the eye of the beholder. Folks who use debuggers will want their code steppable and their variables watchable (in written code order). This makes for very big, bulky, and slow code (particularly for x86). And when you compile for release you end up with a completely different program, which you have so far spent zero time debugging. Also, when optimizing for performance you take the risk of the compiler optimizing out something you wanted it to do (in your example above, no variable will be allocated and there will be no code to step through, even with minor optimization). Or worse, you expose bugs in the compiler and your program simply doesn't work (this is why some people discourage -O3 for gcc). That, and/or you discover the large number of places in the C standard where behavior is implementation-defined.
Unoptimized code is easier to compile, as it is a bit more obvious. In the case of your example, the expectation is that a variable is allocated on the stack, some sort of stack pointer arrangement is set up, the immediate 1 is eventually written to that location, the stack is cleaned up, and the function returns. This is harder for compilers to get wrong and more likely to make your program work as you intended. Detecting and removing dead code is the business of optimization, and that is where it gets risky. Often the risk is worth the reward. But that depends on the user; beauty is in the eye of the beholder.
Bottom line, short answer: differences are expected (even dramatic differences). Default compile options vary from compiler to compiler. Experiment with the compile/optimization options and with different compilers, and continue to disassemble your programs in order to gain a better education about the language and the compilers you use. You are on the right track so far. In the case of the Borland output, it detected that your program does nothing: no input variables are used, no return value is produced from the local variables, and no global variables or other resources external to the function are used. The integer a and the assignment of an immediate are dead code, and a good optimizer will essentially remove/ignore both lines of code. So it bothered to set up a stack frame and then clean it up, which it didn't need to do, and then returned. gcc looks to be setting up an exception handler, which is perfectly fine even though it doesn't need to. Start optimizing, or use a function name other than main(), and you should see different results.

What is most likely happening here is that Borland calls main from its start up code after initializing everything with code present in their run time lib.
The gcc code does not look like main to me, but like generated code that calls main. Disassemble the code at sub_401100 and see if it looks like your main proc.

First of all, make sure you have enabled optimization in gcc (for example with -O2); without an -O flag gcc defaults to -O0 and you get no optimization at all.
With this little example, you aren't really testing optimization; you're seeing how program initialization works. For example, gcc calls __set_app_type to inform Windows of the application type, along with other initialization, and sub_401100 registers atexit handlers for the runtime. Borland might call the runtime initialization beforehand, while gcc does it within main().

Here's the disassembly of main() that I get from MinGW's gcc 4.5.1 in gdb (I added a return 0 at the end so GCC wouldn't complain):
First, when the program is compiled with -O3 optimization:
(gdb) set disassembly-flavor intel
(gdb) disassemble
Dump of assembler code for function main:
0x00401350 <+0>: push ebp
0x00401351 <+1>: mov ebp,esp
0x00401353 <+3>: and esp,0xfffffff0
0x00401356 <+6>: call 0x4018aa <__main>
=> 0x0040135b <+11>: xor eax,eax
0x0040135d <+13>: mov esp,ebp
0x0040135f <+15>: pop ebp
0x00401360 <+16>: ret
End of assembler dump.
And with no optimizations:
(gdb) set disassembly-flavor intel
(gdb) disassemble
Dump of assembler code for function main:
0x00401350 <+0>: push ebp
0x00401351 <+1>: mov ebp,esp
0x00401353 <+3>: and esp,0xfffffff0
0x00401356 <+6>: sub esp,0x10
0x00401359 <+9>: call 0x4018aa <__main>
=> 0x0040135e <+14>: mov DWORD PTR [esp+0xc],0x1
0x00401366 <+22>: mov eax,0x0
0x0040136b <+27>: leave
0x0040136c <+28>: ret
End of assembler dump.
These are a little more complex than Borland's example, but not excessively.
Note, the calls to 0x4018aa are calls to a library/compiler supplied function to construct C++ objects. Here's a snippet from some GCC toolchain docs:
The actual calls to the constructors are carried out by a subroutine called __main, which is called (automatically) at the beginning of the body of main (provided main was compiled with GNU CC). Calling __main is necessary, even when compiling C code, to allow linking C and C++ object code together. (If you use '-nostdlib', you get an unresolved reference to __main, since it's defined in the standard GCC library. Include '-lgcc' at the end of your compiler command line to resolve this reference.)
I'm not sure what exactly IDA Pro is showing in your examples. IDA Pro labels what it's showing as start not main so I'd guess that JimR's answer is right - it's probably the runtime's initialization (perhaps the entry point as described in the .exe header - which is not main(), but the runtime initialization entry point).
Does IDA Pro understand gcc's debug symbols? Did you compile with the -g option so the debug symbols are generated?

It looks like the Borland compiler is recognizing that you never actually do anything with a and is just giving you the equivalent assembly for an empty main function.

The difference here is mostly not in the compiled code, but in what the disassembler shows you.
You may think that main is the only function in your program but it is not. In fact your program is something like this:
void start()
{
... some initialization code here
int result = main();
... some deinitialization code here
ExitProcess(result);
}
IDA Pro knows how Borland works, so it can navigate directly to your main, but it doesn't know how gcc works, so it shows you the true entry point of your program. You can see in the Borland ASM that main is called from some other function. In the GCC ASM you can go through all of these sub_40xxxx routines to find your main.

Related

Assembly: Purpose of loading the effective address before a call to a function?

Source C Code:
#include <stdio.h>
int main()
{
int i;
for(i=0; i < 10; i++)
{
printf("Hello World!\n");
}
}
Dump of Intel syntax x86 assembler code for function main:
1. 0x000055555555463a <+0>: push rbp
2. 0x000055555555463b <+1>: mov rbp,rsp
3. 0x000055555555463e <+4>: sub rsp,0x10
4. 0x0000555555554642 <+8>: mov DWORD PTR [rbp-0x4],0x0
5. 0x0000555555554649 <+15>: jmp 0x55555555465b <main+33>
6. 0x000055555555464b <+17>: lea rdi,[rip+0xa2] # 0x5555555546f4
7. 0x0000555555554652 <+24>: call 0x555555554510 <puts@plt>
8. 0x0000555555554657 <+29>: add DWORD PTR [rbp-0x4],0x1
9. 0x000055555555465b <+33>: cmp DWORD PTR [rbp-0x4],0x9
10. 0x000055555555465f <+37>: jle 0x55555555464b <main+17>
11. 0x0000555555554661 <+39>: mov eax,0x0
12. 0x0000555555554666 <+44>: leave
13. 0x0000555555554667 <+45>: ret
I'm currently working through "Hacking, The Art of Exploitation 2nd Edition by Jon Erickson", and I'm just starting to tackle assembly.
I have a few questions about the translation of the provided C code to Assembly, but I am mainly wondering about my first question.
1st Question: What is the purpose of line 6? (lea rdi,[rip+0xa2]).
My current working theory, is that this is used to save where the next instructions will jump to in order to track what is going on. I believe this line correlates with the printf function in the source C code.
So essentially, its loading the effective address of rip+0xa2 (0x5555555546f4) into the register rdi, to simply track where it will jump to for the printf function?
2nd Question: What is the purpose of line 11? (mov eax,0x0?)
I do not see a prior use of the register, EAX and am not sure why it needs to be set to 0.
The LEA puts a pointer to the string literal into a register, as the first arg for puts. The search term you're looking for is "calling convention" and/or ABI (and also RIP-relative addressing). See also: "Why is the address of static variables relative to the Instruction Pointer?"
The small offset between code and data (only +0xa2) is because the .rodata section gets linked into the same ELF segment as .text, and your program is tiny. (Newer gcc + ld versions will put it in a separate page so it can be non-executable.)
The compiler can't use a shorter more efficient mov edi, address in position-independent code in your Linux PIE executable. It would do that with gcc -fno-pie -no-pie
mov eax,0 implements the implicit return 0 at the end of main that C99 and C++ guarantee. EAX is the return-value register in all calling conventions.
If you don't use gcc -O2 or higher, you won't get peephole optimizations like xor-zeroing (xor eax,eax).
This:
lea rdi,[rip+0xa2]
Is a typical position independent LEA, putting the string address into a register (instead of loading from that memory address).
Your executable is position independent, meaning that it can be loaded at runtime at any address. Therefore, the real address of the argument to be passed to puts() needs to be calculated at runtime every single time, since the base address of the program could be different each time. Also, puts() is used instead of printf() because the compiler optimized the call since there is no need to format anything.
In this case, the binary was most probably loaded with the base address 0x555555554000. The string to use is stored in your binary at offset 0x6f4. Since the next instruction is at offset 0x652, you know that, no matter where the binary is loaded in memory, the address you want will be rip + (0x6f4 - 0x652) = rip + 0xa2, which is what you see above. See this answer of mine for another example.
The purpose of:
mov eax,0x0
Is to set the return value of main(). In Intel x86, the calling convention is to return values in the rax register (eax if the value is 32 bits, which is true in this case since main returns an int). See the table entry for x86-64 at the end of this page.
Even if you don't add an explicit return statement, main() is a special function, and the compiler will add a default return 0 for you.
If you add some debug data and symbols to the assembly everything will be easier. It is also easier to read the code if you add some optimizations.
There is a very useful tool godbolt and your example https://godbolt.org/z/9sRFmU
In the asm listing there you can clearly see that that line loads the address of the string literal, which will then be printed by the function.
EAX is considered volatile, and main returns zero by default; that's the reason why it is zeroed.
The calling convention is explained here: https://en.wikipedia.org/wiki/X86_calling_conventions
Here you have more interesting cases https://godbolt.org/z/M4MeGk

How to pass variables to an external assembly function

How to pass a variable from a C program to an assembly function.
Example:
main.c:
extern int asm_function(unsigned int);
void main() {
unsigned int passthis = 42;
asm_function(passthis);
}
main.asm:
bits 32
global start
section .text
asm_function:
...
How do I access passthis within asm_function.
Edit: Probably should've mentioned that I'm not using an OS; I'm compiling with an i686-elf cross compiler and am going to use it as a kernel.
If you compile your C into 32-bit code with the default GCC options (not -mregparm=3 like the Linux kernel uses), then on function entry the first argument is on the stack just above the return address (at [esp+4]), but that offset changes after you push anything or move ESP around.
You can use [ebp+8] after setting up a traditional stack frame with EBP as the frame pointer (EBP doesn't change during the function even when ESP does).
For example, int asm_function(int) can be implemented as:
;bits 32 ; unneeded, nasm -felf32 implies this.
global asm_function ; include asm_function in ELF .o symbol table for linking
section .text
asm_function:
push ebp
mov ebp, esp
mov eax, [ebp+8] ; return the first argument
mov esp, ebp
pop ebp
ret
For each parameter after this, just simply add another 4 (i.e. for the 2nd parameter, use [ebp+12]). As you can see, setting up EBP as a frame pointer adds a lot of overhead for tiny functions.
Some non-ELF systems/ABIs prepend a leading underscore to C symbol names, so you should declare both asm_function and _asm_function for your code to be roughly equivalent across these ABIs, like so:
global _asm_function
global asm_function ; make both names of the function global
section .text
_asm_function:
asm_function: ; make both symbols point to the same place
push ebp
mov ebp, esp
mov eax, [ebp+8]
mov esp, ebp
pop ebp
ret
x86 has a handful of different calling conventions. Which one applies depends on a number of things, like Windows vs. Linux, what compiler environment you're using, 32-bit vs. 64-bit, etc.
Best probably to look at the output of the compiler you're using to see how the parameter is being passed, and do the corresponding appropriate way of accepting the parameter in your assembly. So if it is pushed on the stack then expect it there, if it is passed in a register, expect it there...
You can look at the output using a disassembler, or using a debugger.

Is the stack frame required for all functions in C on x86-64?

I've made a function to calculate the length of a C string (I'm trying to beat clang's optimizer using -O3). I'm running macOS.
_string_length1:
push rbp
mov rbp, rsp
xor rax, rax
.body:
cmp byte [rdi], 0
je .exit
inc rdi
inc rax
jmp .body
.exit:
pop rbp
ret
This is the C function I'm trying to beat:
size_t string_length2(const char *str) {
size_t ret = 0;
while (str[ret]) {
ret++;
}
return ret;
}
And it disassembles to this:
string_length2:
push rbp
mov rbp, rsp
mov rax, -1
LBB0_1:
cmp byte ptr [rdi + rax + 1], 0
lea rax, [rax + 1]
jne LBB0_1
pop rbp
ret
Every C function sets up a stack frame using push rbp and mov rbp, rsp, and breaks it using pop rbp. But I'm not using the stack in any way here, I'm only using processor registers. It worked without using a stack frame (when I tested on x86-64), but is it necessary?
No, the stack frame is, at least in theory, not always required. An optimizing compiler might in some cases avoid using the call stack. Notably when it is able to inline a called function (in some specific call site), or when the compiler successfully detects a tail call (which reuses the caller's frame).
Read the ABI of your platform to understand requirements related to the stack.
You might try to compile your program with link time optimization (e.g. compile and link with gcc -flto -O2) to get more optimizations.
In principle, one could imagine a compiler clever enough to (for some programs) avoid using any call stack.
BTW, I just compiled a naive recursive long fact(int n) factorial function with GCC 7.1 (on Debian/Sid/x86-64) at -O3 (i.e. gcc -fverbose-asm -S -O3 fact.c). The resulting assembler code fact.s contains no call machine instruction.
Every C function sets up a stack frame using...
This is true for your compiler, not in general. It is possible to compile a C program without using the stack at all—see, for example, the method CPS, continuation passing style. Probably no C compiler on the market does so, but it is important to know that there are other ways to execute programs, in addition to stack-evaluation.
The ISO 9899 standard says nothing about the stack. It leaves compiler implementations free to choose whichever method of evaluation they consider to be the best.

Inline assembly in C with Turbo C 3.0 - how to get address of a label

I'm trying to get address of a label - here is some sample code:
int main() {
asm {
mov ax,1
mov bx,ax
}
_labelname:
asm {
mov ax, OFFSET _labelname
}
return 0;
}
Compilation of this code returns this error: "Undefined symbol _labelname"
If I define the label in asm block, I can't even use
jmp _labelname
I found this, but it doesn't work; for this one, there is no way at all. This one only covers jumping, not taking the address. And this doesn't help at all. Any suggestions?
I've found a way, but you can't use a C label, it has to be an asm label:
int main(void)
{
asm {
mov ax,1
mov bx,ax
}
asm { _labelname: }
asm {
mov ax, OFFSET _labelname
jmp cs:_labelname
}
return 0;
}
There's usually a route in any language, but you need to faff about for a day or two because these things aren't always documented
Declare a global memory op space in your flavour of HLL. DIM LABELNAME1(0)
Then search for the asm syntax which puts the address into eax
mov eax, ^LABELNAME(0)
mov eax, dword [_labelname]
mov eax, ^_labelname
etc etc etc
then pop it in asm
You won't find pop [^ anywhere on google, but it's one which works in certain HLLs
push eax
pop [^LABELNAME1(0)]
Now your HLL and asm can chat to each other whenever you like
So it's well worth figuring out
Undefined symbol _labelname
Probably needs to be declared at the very start of the program
._labelname
mov dword [_labelname], 0
and used later by asm as a label
As I say, you'll have to mess about and suss it out for your particular flavour of HLL, and globals seem to work best
You'll also need to figure out how to declare separate memory zones for storing asm dynamic variables and running the opcodes or you'll get cache overwrites which will cripple the speed advantages of asm
A small routine I wrote without separating these asm areas took 20 hours to run. With separation it took 1 hour
mov ax, OFFSET _labelname
This is 16 bit stuff, (DOS etc, with goofy memory rules) aren't you doing 32 bit stuff with your HLL???
Unless it's all happening in one segment you will need a double memory operand to find _labelname, dx:ax etc, and as mentioned previously, you're 20 years too late
jmp cs:_labelname
Works in the same segment but for a bigger program the cs part will need to be a specific segment override and a far jump/return
Additionally, if your dynamic asm variables are plonked into your asm code segment then a cardinal rule for maximising asm speed has been broken

Help deciphering simple Assembly Code

I am learning assembly using GDB & Eclipse
Here is a simple C code.
#include <stdlib.h> /* for EXIT_SUCCESS */
int absdiff(int x, int y)
{
if(x < y)
return y-x;
else
return x-y;
}
int main(void) {
int x = 10;
int y = 15;
absdiff(x,y);
return EXIT_SUCCESS;
}
Here is corresponding assembly instructions for main()
main:
080483bb: push %ebp #push old frame pointer onto the stack
080483bc: mov %esp,%ebp #move the frame pointer down, to the position of stack pointer
080483be: sub $0x18,%esp # ???
25 int x = 10;
080483c1: movl $0xa,-0x4(%ebp) #move the "x(10)" to 4 address below frame pointer (why not push?)
26 int y = 15;
080483c8: movl $0xf,-0x8(%ebp) #move the "y(15)" to 8 address below frame pointer (why not push?)
28 absdiff(x,y);
080483cf: mov -0x8(%ebp),%eax # -0x8(%ebp) == 15 = y, and move it into %eax
080483d2: mov %eax,0x4(%esp) # from this point on, I am confused
080483d6: mov -0x4(%ebp),%eax
080483d9: mov %eax,(%esp)
080483dc: call 0x8048394 <absdiff>
31 return EXIT_SUCCESS;
080483e1: mov $0x0,%eax
32 }
Basically, I am asking to help me to make sense of this assembly code, and why it is doing things in this particular order. Point where I am stuck, is shown in assembly comments. Thanks !
Lines 0x080483cf to 0x080483d9 are copying x and y from the current frame on the stack, and pushing them back onto the stack as arguments for absdiff() (this is typical; see e.g. http://en.wikipedia.org/wiki/X86_calling_conventions#cdecl). If you look at the disassembler for absdiff() (starting at 0x8048394), I bet you'll see it pick these values up from the stack and use them.
This might seem like a waste of cycles in this instance, but that's probably because you've compiled without optimisation, so the compiler does literally what you asked for. If you use e.g. -O2, you'll probably see most of this code disappear.
First it bears saying that this assembly is in the AT&T syntax version of x86_32, and that the order of arguments to operations is reversed from the Intel syntax (used with MASM, YASM, and many other assemblers and debuggers).
080483bb: push %ebp #push old frame pointer onto the stack
080483bc: mov %esp,%ebp #move the frame pointer down, to the position of stack pointer
080483be: sub $0x18,%esp # ???
This enters a stack frame. A frame is an area of memory between the stack pointer (esp) and the base pointer (ebp). This area is intended to be used for local variables that have to live on the stack. NOTE: Stack frames don't have to be implemented in this way, and GCC has the optimization switch -fomit-frame-pointer that does away with it except when alloca or variable sized arrays are used, because they are implemented by changing the stack pointer by arbitrary values. Not using ebp as the frame pointer allows it to be used as an extra general purpose register (more general purpose registers is usually good).
Using the base pointer makes several things simpler to calculate for compilers and debuggers, since where variables are located relative to the base does not change while in the function, but you can also index them relative to the stack pointer and get the same results, though the stack pointer does tend to change around so the same location may require a different index at different times.
In this code 0x18 (or 24) bytes are being reserved on the stack for local use.
This code so far is often called the function prologue (not to be confused with the programming language "prolog").
25 int x = 10;
080483c1: movl $0xa,-0x4(%ebp) #move the "x(10)" to 4 address below frame pointer (why not push?)
This line moves the constant 10 (0xA) to a location within the current stack frame, addressed relative to the base pointer. Because the stack grows downward in RAM, the locals sit at lower addresses than the base pointer, so the index is negative rather than positive. If this were indexed relative to the stack pointer a different index would be used, and it would be positive.
You are correct that this value could have been pushed rather than copied like this. I suspect that this is done this way because you have not compiled with optimizations turned on. By default gcc (which I assume you are using based on your use of gdb) does not optimize much, and so this code is probably the default "copy a constant to a location in the stack frame" code. This may not be the case, but it is one possible explanation.
26 int y = 15;
080483c8: movl $0xf,-0x8(%ebp) #move the "y(15)" to 8 address below frame pointer (why not push?)
Similar to the previous line of code. These two lines of code put the 10 and 15 into local variables. They are on the stack (rather than in registers) because this is unoptimized code.
28 absdiff(x,y);
gdb printing this means that this is the source code line being executed, not that the function is being executed (yet).
080483cf: mov -0x8(%ebp),%eax # -0x8(%ebp) == 15 = y, and move it into %eax
In preparation for calling the function the values that are being passed as arguments need to be retrieved from their storage locations (even though they were just placed at those locations and their values are known because of the no optimization thing)
080483d2: mov %eax,0x4(%esp) # from this point on, I am confused
This is the second part of moving one of the local variables' values to the stack so that it can be used as an argument to the function. You can't (usually) move from one memory address to another on x86, so you have to move it through a register (eax in this case).
080483d6: mov -0x4(%ebp),%eax
080483d9: mov %eax,(%esp)
These two lines do the same thing except for the other variable. Note that since this variable is being moved to the top of the stack that no offset is being used in the second instruction.
080483dc: call 0x8048394 <absdiff>
This pushes the return address onto the top of the stack and jumps to the address of absdiff.
You didn't include code for absdiff, so you probably did not step through that.
31 return EXIT_SUCCESS;
080483e1: mov $0x0,%eax
C programs return 0 upon success, so EXIT_SUCCESS is defined as 0 (in <stdlib.h>). Integer return values are put in eax, and the code that called the main function will use that value as the argument when calling the exit function.
32 }
This is the end. The reason that gdb stopped here is that there are things that actually happen to clean up. In C++ it is common to see destructors for local class instances being called here, but in C you will probably just see the function epilogue. This is the complement to the function prologue, and consists of returning the stack pointer and base pointer to the values they originally had. Sometimes this is done with similar math on them, but sometimes it is done with the leave instruction. There is also an enter instruction which can be used for the prologue, but gcc doesn't do this (I don't know why). If you had continued to view the disassembly here you would have seen the epilogue code and a ret instruction.
Something you may be interested in is the ability to tell gcc to produce assembly files. If you do:
gcc -S source_file.c
a file named source_file.s will be produced with assembly code in it.
If you do:
gcc -S -O source_file.c
Then the same thing will happen, but some basic optimizations will be done. This will probably make reading the assembly code easier since the code will not likely have as many odd instructions that seem like they could have been done a better way (like moving constant values to the stack, then to a register, then to another location on the stack and never using the push instruction).
The regular optimization flags for gcc are:
-O0 default -- none
-O1 a few optimizations
-O the same as -O1
-O2 a lot of optimizations
-O3 a bunch more, some of which may take a long time and/or make the code a lot bigger
-Os optimize for size -- similar to -O2, but not quite
If you are actually trying to debug C programs then you will probably want the least optimizations possible since things will happen in the order that they are written in your code and variables won't disappear.
You should have a look at the gcc man page:
man gcc
Remember, if you're running in a debugger or debug mode, the compiler reserves the right to insert whatever debugging code it likes and make other nonsensical code changes.
For example, this is Visual Studio's debug main():
int main(void) {
001F13D0 push ebp
001F13D1 mov ebp,esp
001F13D3 sub esp,0D8h
001F13D9 push ebx
001F13DA push esi
001F13DB push edi
001F13DC lea edi,[ebp-0D8h]
001F13E2 mov ecx,36h
001F13E7 mov eax,0CCCCCCCCh
001F13EC rep stos dword ptr es:[edi]
int x = 10;
001F13EE mov dword ptr [x],0Ah
int y = 15;
001F13F5 mov dword ptr [y],0Fh
absdiff(x,y);
001F13FC mov eax,dword ptr [y]
001F13FF push eax
001F1400 mov ecx,dword ptr [x]
001F1403 push ecx
001F1404 call absdiff (1F10A0h)
001F1409 add esp,8
*(int*)nullptr = 5;
001F140C mov dword ptr ds:[0],5
return 0;
001F1416 xor eax,eax
}
001F1418 pop edi
001F1419 pop esi
001F141A pop ebx
001F141B add esp,0D8h
001F1421 cmp ebp,esp
001F1423 call @ILT+300(__RTC_CheckEsp) (1F1131h)
001F1428 mov esp,ebp
001F142A pop ebp
001F142B ret
It helpfully posts the C++ source next to the corresponding assembly. In this case, you can fairly clearly see that x and y are stored on the stack explicitly, and an explicit copy is pushed on, then absdiff is called. I explicitly de-referenced nullptr to cause the debugger to break in. You may wish to change compiler.
Compile with -fverbose-asm -g -save-temps for additional information with GCC.
