Let's say I have declared in the global scope:
const int a = 0x93191;
And in the main function I have the following condition:
if(a>0)
do_something
An awkward thing I have noticed is that the RVDS compiler drops the if statement: there is no branch/jmp in the object file.
But if I write:
if(*(&a)>0)
do_something
the if (cmp and branch) will be present in the compiled object file.
In contrast, GCC optimizes both forms away (with -O1, -O2, or -O3):
#include <stdio.h>
const int a = 3333;
int main()
{
if (a >333)
printf("first\n");
return 0;
}
compiled with -O3:
(gdb) disassemble main
Dump of assembler code for function main:
0x0000000100000f10 <main+0>: push %rbp
0x0000000100000f11 <main+1>: mov %rsp,%rbp
0x0000000100000f14 <main+4>: lea 0x3d(%rip),%rdi # 0x100000f58
0x0000000100000f1b <main+11>: callq 0x100000f2a <dyld_stub_puts>
0x0000000100000f20 <main+16>: xor %eax,%eax
0x0000000100000f22 <main+18>: pop %rbp
0x0000000100000f23 <main+19>: retq
End of assembler dump.
And for
#include <stdio.h>
const int a = 3333;
int main()
{
if (*(&a) >333)
printf("first\n");
return 0;
}
will give:
(gdb) disassemble main
Dump of assembler code for function main:
0x0000000100000f10 <main+0>: push %rbp
0x0000000100000f11 <main+1>: mov %rsp,%rbp
0x0000000100000f14 <main+4>: lea 0x3d(%rip),%rdi # 0x100000f58
0x0000000100000f1b <main+11>: callq 0x100000f2a <dyld_stub_puts>
0x0000000100000f20 <main+16>: xor %eax,%eax
0x0000000100000f22 <main+18>: pop %rbp
0x0000000100000f23 <main+19>: retq
End of assembler dump.
GCC treats both the same (as it should) and RVDS doesn't?
I also tried to examine the effect of using volatile: RVDS still dropped the if (a > 333), but GCC didn't:
#include <stdio.h>
volatile const int a = 3333;
int main()
{
if (a >333)
printf("first\n");
return 0;
}
(gdb) disassemble main
Dump of assembler code for function main:
0x0000000100000f10 <main+0>: push %rbp
0x0000000100000f11 <main+1>: mov %rsp,%rbp
0x0000000100000f14 <main+4>: cmpl $0x14e,0x12a(%rip) # 0x100001048 <a>
0x0000000100000f1e <main+14>: jl 0x100000f2c <main+28>
0x0000000100000f20 <main+16>: lea 0x39(%rip),%rdi # 0x100000f60
0x0000000100000f27 <main+23>: callq 0x100000f36 <dyld_stub_puts>
0x0000000100000f2c <main+28>: xor %eax,%eax
0x0000000100000f2e <main+30>: pop %rbp
0x0000000100000f2f <main+31>: retq
End of assembler dump.
Probably there are some bugs in the version of RVDS I used.
The level of complexity the compiler will go through to figure out "is this something whose actual value I can determine?" is not unbounded. If you write a sufficiently complex statement, the compiler will simply say "I don't know what the value is, I'll generate code to compute it".
It is perfectly possible for a compiler to figure out that the value is not going to change, but it's also possible that some compilers "give up" in the process - it may also depend on where in the compilation chain this analysis is done.
This is probably a fairly typical example of the "as-if" rule - the compiler is allowed to perform any optimisation that generates the same result "as if" the code were executed as written.
Having said all that, this case should be fairly trivial (and as per the comments, the compiler should consider *(&a) the same as a), so it seems strange that it doesn't get rid of the comparison.
Optimizations are implementation details of compilers. They take time and effort to implement, and compiler writers usually focus on the common uses of the language (i.e. the return on investment of optimizing code patterns that are highly infrequent is close to nothing).
That being said, there is an important difference between the two pieces of code: in the first case a is not odr-used, only used as an rvalue, which means it can be processed as a compile-time constant. That is, when a is used directly (no address-of, no references bound to it), the compiler immediately substitutes the value in. The value must be known to the compiler without accessing the variable, since it could be needed in contexts where constant expressions are required (e.g. defining the size of an array).
In the second case a is odr-used: the address is taken and the value at that location is read. The compiler must produce code that performs those steps before handing the result to the optimizer. The optimizer in turn can detect that it is a constant and replace the whole operation with the value, but this is a bit more involved than the previous case, where the compiler itself filled the value in.
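As a rough illustration of the difference (a hypothetical sketch, not code from the question; whether the load in the second function gets folded depends on the compiler and optimization level):
const int a = 3333;

/* Used as a plain rvalue: the front end can substitute 3333 directly. */
int direct(void) { return a > 333; }

/* Address taken: code to load from a's location is generated first, and
   it is the optimizer's job to prove the load always yields 3333. */
int via_pointer(void) { return *(&a) > 333; }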
There's a series of problems on SPOJ about creating a function in a single line under certain constraints. I've already solved the easy, medium, and hard ones, but for the impossible one I keep getting Wrong Answer.
To sum it up, the problem asks you to fill in the code of the return statement such that if x is 1, the return value is 2; for any other x it should return 3. The constraint is that the letter 'x' can't be used, and no other code can be added; one can only write that return statement. Clearly, to solve this, one must create a hack.
So I've used gcc's built-in way to get the stack frame, and then decreased the pointer to get a pointer to the first parameter. Other than that, the statement is just a normal comparison.
On my machine it works fine, but on the cluster (Intel Pentium G860) used by the online judge it doesn't, probably due to a different calling convention. I'm not sure I understood the processor's ABI (in particular whether the stack frame pointer is saved on the stack or only in a register), or even whether I'm reading the correct ABI.
The question is: what would be the correct way to get the first parameter of a function using the stack?
My code is (it must be formatted this way, otherwise it's not accepted):
#include <stdio.h>
int count(int x){
return (*(((int*)__builtin_frame_address(0))-1) == 1) ? 2 : 3;
}
int main(i){
for(i=1;i%1000001;i++)
printf("%d %d\n",i,count(i));
return 0;
}
The question is: what would be the correct way to get the first
parameter of a function using the stack?
There is no portable way to do this. You must assume a specific compiler, its settings and ABI, along with the calling conventions.
gcc is likely to "lay down" an int local variable at offset -0x4 from the frame pointer (assuming that sizeof(int) == 4). You can observe this with the most basic definition of count:
4 {
0x00000000004004c4 <+0>: push %rbp
0x00000000004004c5 <+1>: mov %rsp,%rbp
0x00000000004004c8 <+4>: mov %edi,-0x4(%rbp)
5 return x == 1 ? 2 : 3;
0x00000000004004cb <+7>: cmpl $0x1,-0x4(%rbp)
0x00000000004004cf <+11>: jne 0x4004d8 <count+20>
0x00000000004004d1 <+13>: mov $0x2,%eax
0x00000000004004d6 <+18>: jmp 0x4004dd <count+25>
0x00000000004004d8 <+20>: mov $0x3,%eax
6 }
0x00000000004004dd <+25>: leaveq
0x00000000004004de <+26>: retq
You may also see that the %edi register holds the first parameter. This is the case for the AMD64 ABI (%edi is also not preserved between calls).
Now, with that knowledge, you might write something like:
int count(int x)
{
return *((int*)(__builtin_frame_address(0) - sizeof(int))) == 1 ? 2 : 3;
}
which can be obfuscated as:
return *((int*)(__builtin_frame_address(0)-sizeof(int)))==1?2:3;
However, the trick is that such an optimizing compiler may enthusiastically assume that since x is not referenced in count, it can simply skip moving it onto the stack. For example it produces the following assembly with the -O flag:
4 {
0x00000000004004c4 <+0>: push %rbp
0x00000000004004c5 <+1>: mov %rsp,%rbp
5 return *((int*)(__builtin_frame_address(0)-sizeof(int)))==1?2:3;
0x00000000004004c8 <+4>: cmpl $0x1,-0x4(%rbp)
0x00000000004004cc <+8>: setne %al
0x00000000004004cf <+11>: movzbl %al,%eax
0x00000000004004d2 <+14>: add $0x2,%eax
6 }
0x00000000004004d5 <+17>: leaveq
0x00000000004004d6 <+18>: retq
As you can see, the mov %edi,-0x4(%rbp) instruction is now missing, thus the only way [1] would be to access the value of x from the %edi register:
int count(int x)
{
return ({register int edi asm("edi");edi==1?2:3;});
}
but this method lacks the ability to be "obfuscated", as whitespace is needed for the declaration of the variable that holds the value of %edi.
[1] Not necessarily. Even if the compiler decides to skip the mov from register to stack, there is still a possibility to "force" it to do so, with asm("mov %edi,-0x4(%rbp)"); inline assembly. Beware though: the compiler may have its revenge, sooner or later.
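Spelling that out, the combined version would look something like this (a sketch only; it leans on gcc keeping the frame pointer exactly as in the -O listing above, so treat it as fragile by construction):
int count(int x)
{
    asm("mov %edi,-0x4(%rbp)");  /* force the spill the optimizer skipped */
    return *((int*)(__builtin_frame_address(0)-sizeof(int)))==1?2:3;
}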
The C standard does NOT require a stack in any implementation, so really your problem doesn't make any sense.
In the context of gcc, the behavior is different on x86 and x86-64 (and other targets).
On x86, parameters reside on the stack, but on x86-64 the first 6 parameters (including the implicit ones) reside in registers, so basically you can't do the hack the way you describe.
If you want to hack the code, you need to specify the platform you want to run on; otherwise, there is no point in answering your question.
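For completeness, here is a hedged sketch of the 32-bit x86 case the answer above alludes to (assumptions: cdecl, an unoptimized build that keeps the frame pointer, and __builtin_frame_address(0) returning the saved ebp, so the first argument sits at ebp+8):
#include <stdio.h>

int count(int x){
    /* [ebp] saved ebp, [ebp+4] return address, [ebp+8] first argument */
    return (*(((int*)__builtin_frame_address(0))+2) == 1) ? 2 : 3;
}

int main(void){
    printf("%d %d\n", count(1), count(5)); /* expect: 2 3 */
    return 0;
}
Something like gcc -m32 -O0 is required; any optimization can eliminate the expected frame layout, exactly as discussed above.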
A simple example that demonstrates my issue:
// test.c
#include <stdio.h>
int foo1(int i) {
i = i * 2;
return i;
}
void foo2(int i) {
printf("greetings from foo! i = %i", i);
}
int main() {
int i = 7;
foo1(i);
foo2(i);
return 0;
}
$ clang -o test -O0 -Wall -g test.c
Inside GDB I do the following and start the execution:
(gdb) b foo1
(gdb) b foo2
After reaching the first breakpoint, I disassemble:
(gdb) disassemble
Dump of assembler code for function foo1:
0x0000000000400530 <+0>: push %rbp
0x0000000000400531 <+1>: mov %rsp,%rbp
0x0000000000400534 <+4>: mov %edi,-0x4(%rbp)
=> 0x0000000000400537 <+7>: mov -0x4(%rbp),%edi
0x000000000040053a <+10>: shl $0x1,%edi
0x000000000040053d <+13>: mov %edi,-0x4(%rbp)
0x0000000000400540 <+16>: mov -0x4(%rbp),%eax
0x0000000000400543 <+19>: pop %rbp
0x0000000000400544 <+20>: retq
End of assembler dump.
I do the same after reaching the second breakpoint:
(gdb) disassemble
Dump of assembler code for function foo2:
0x0000000000400550 <+0>: push %rbp
0x0000000000400551 <+1>: mov %rsp,%rbp
0x0000000000400554 <+4>: sub $0x10,%rsp
0x0000000000400558 <+8>: lea 0x400644,%rax
0x0000000000400560 <+16>: mov %edi,-0x4(%rbp)
=> 0x0000000000400563 <+19>: mov -0x4(%rbp),%esi
0x0000000000400566 <+22>: mov %rax,%rdi
0x0000000000400569 <+25>: mov $0x0,%al
0x000000000040056b <+27>: callq 0x400410 <printf@plt>
0x0000000000400570 <+32>: mov %eax,-0x8(%rbp)
0x0000000000400573 <+35>: add $0x10,%rsp
0x0000000000400577 <+39>: pop %rbp
0x0000000000400578 <+40>: retq
End of assembler dump.
GDB obviously uses different offsets (+7 in foo1 and +19 in foo2) with respect to the beginning of the function when setting the breakpoints. How can I determine this offset myself without using GDB?
gdb uses a few methods to decide this information.
First, the very best way is if your compiler emits DWARF describing the function. Then gdb can decode the DWARF to find the end of the prologue.
However, this isn't always available. GCC emits it, but IIRC only when optimization is used.
I believe there's also a convention that if the first line number of a function is repeated in the line table, then the address of the second instance is used as the end of the prologue. That is, if the lines look like:
< function f >
line 23 0xffff0000
line 23 0xffff0010
Then gdb will assume that function f's prologue is complete at 0xffff0010.
I think this is the mode used by gcc when not optimizing.
Finally gdb has some prologue decoders that know how common prologues are written on many platforms. These are used when debuginfo isn't available, though offhand I don't recall what the purpose of that is.
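If you want to inspect that line table yourself, dumping the decoded DWARF line information is one way to do it (the exact output format varies across binutils versions; the thing to look for is the function's first line appearing twice):
$ readelf --debug-dump=decodedline test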
As others mentioned, even without debugging symbols GDB has a function prologue decoder, i.e. heuristic magic.
To disable that, you can add an asterisk before the function name:
break *func
On Binutils 2.25 the skip algorithm seems to be at symtab.c:skip_prologue_sal, which breakpoints.c:break_command (the definition of the break command) calls indirectly.
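For example, against the foo1 listing above, the two forms should land on different addresses (a hypothetical session; addresses and line numbers are taken from that listing):
(gdb) break foo1
Breakpoint 1 at 0x400537: file test.c, line 4.
(gdb) break *foo1
Breakpoint 2 at 0x400530: file test.c, line 3.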
The prologue is common "boilerplate" emitted at the start of functions.
The prologue of foo2 is longer than that of foo1 by two instructions:
sub $0x10,%rsp
foo2 calls another function, so it is not a leaf function. This prevents some optimizations; in particular it must reduce rsp before the call to reserve room for the local state.
Leaf functions don't need that because of the 128 byte ABI red zone, see also: Why does the x86-64 GCC function prologue allocate less stack than the local variables?
foo1 however is a leaf function.
lea 0x400644,%rax
For some reason, clang stores the address of local string constants (stored in .rodata) in registers as part of the function prologue.
We know that rax contains "greetings from foo! i = %i" because it is then passed to %rdi, the first argument of printf.
foo1, however, has no local string constants.
The other instructions of the prologue are common to both functions:
rbp manipulation is discussed at: What is the purpose of the EBP frame pointer register?
mov %edi,-0x4(%rbp) stores the first argument on the stack. This is not required in leaf functions, but clang does it anyway; it makes register allocation easier.
On ELF platforms like linux, debug information is stored in a separate (non-executable) section in the executable. In this separate section there is all the information that is needed by the debugger. Check the DWARF2 specification for the specifics.
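You can list those sections directly; for a binary built with -g you should see entries such as .debug_info and .debug_line among the section headers (section names per the DWARF convention; indices and sizes will differ):
$ objdump -h test | grep debug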
I'm trying to call a simple piece of assembly (as a test for something more complex later); however, when I try to run the program it crashes ("This program has stopped responding").
main.c:
#include <stdio.h>
#include <stdlib.h>
extern int bar(int param);
int main()
{
int i=8;
i = bar(i);
printf("Hello world! - %i\n",i);
return 0;
}
bar.S
.file "bar.S"
.text
.align 8
.global bar
bar:
add %rdi,1000;
mov %rax,%rdi;
ret;
I'm concerned that it might be something to do with the way my compiler is configured (I'm more used to the hand holding of Visual Studio than dealing with a real environment).
You are using AT&T syntax assembly but are apparently not familiar with it. The simple solution would be to stick .intel_syntax noprefix into bar.S so you can use Intel syntax.
AT&T syntax uses reversed operand order and a different effective-address format, among other things. You got a crash because add %rdi,1000 means add [1000], rdi in Intel syntax, that is, add the contents of rdi to memory location 1000, which is out of bounds. Presumably you wanted to do add $1000, %rdi. To return the value you also need to swap the operands of the mov %rax, %rdi.
This code is incorrect:
add %rdi,1000;
mov %rax,%rdi;
Remember that in AT&T syntax the operand order is source, destination. Also, immediate values should be prefixed by a $. So the code should be:
add $1000,%rdi
mov %rdi,%rax
I removed the semicolons since they're not necessary.
Also, since you seem to be compiling for Windows you should be following Microsoft's 64-bit calling convention, not the System V one. So the argument will be in rcx, not in rdi.
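A hedged sketch of what bar.S might look like under the Microsoft convention (untested; it assumes the first integer argument arrives in rcx/ecx and the result is returned in eax):
.global bar
bar:
    mov %ecx,%eax       # Win64: first integer argument is in rcx/ecx
    add $1000,%eax      # add the constant; the result stays in eax
    ret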
Start with this:
int bar ( int param )
{
return(param);
}
Compile it separately and link with main, and see what main is doing and passing; note that main uses edi, not rdi.
Now disassemble the function above.
0000000000000000 <bar>:
0: 89 f8 mov %edi,%eax
2: c3 retq
It uses edi and eax as well. Also note that this is AT&T syntax, not Intel, so it is backwards: the destination is on the right instead of the left.
So let's make different flavors of our own:
.global bark
bark:
mov %edi,%eax
addl $1000,%eax
retq
.global barf
barf:
addl $1000,%edi
mov %edi,%eax
retq
.global bar
bar:
add $1000,%edi
mov %edi,%eax
retq
Assemble these and link with main instead of the C version, and:
./main
Hello world! - 1008
Basically, whatever compiler you are using, get it to generate similar, simple code that follows its convention, then mimic that.
Note, I am using gcc, which is not necessarily the same as what you are running, but the process is the same.
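Concretely, that process can look like this (hypothetical file names; the flags are standard gcc/binutils):
$ gcc -c bar.c          # compile the C reference version of the function
$ objdump -d bar.o      # disassemble it to see which convention was used
$ gcc -S bar.c          # or emit the assembly directly as bar.s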
While studying compiler optimizations, I compiled the following piece of code:
#include<stdio.h>
struct fraction {
int num ;
int denum ;
};
int main()
{
struct fraction pi;
pi.num = 22;
pi.denum = 7;
return 0;
}
using
gcc test.c -o test
When I disassemble this, I get :
push %ebp
mov %esp,%ebp
sub $0x10,%esp
movl $0x16,-0x8(%ebp)
movl $0x7,-0x4(%ebp)
mov $0x0,%eax
leave
ret
But if I apply optimizations like :
gcc test.c -o test -O3
all I get in disassembly is :
push %ebp
xor %eax,%eax
mov %esp,%ebp
pop %ebp
ret
Without optimizations, the values 22 and 7 were clearly visible in the disassembly and I could clearly understand how the code worked, but where are those values after optimization? How is the code working now? Could somebody please explain?
Since your code doesn't effectively do anything externally visible that would have unpredictable side effects, the creation of the struct is simply eliminated completely, and now all your code does is return 0 from main().
(If you tell the compiler that it does indeed need to create the struct because it may be modified by someone/something else, it won't get rid of the code. Declare your variable as volatile and you'll see it in the assembler.)
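For example (the same program with volatile added; a sketch in the sense that the exact code generated still varies by compiler and flags):
#include <stdio.h>

struct fraction {
    int num;
    int denum;
};

int main()
{
    volatile struct fraction pi;  /* volatile: the stores must be performed */
    pi.num = 22;
    pi.denum = 7;
    return 0;
}
With -O3 this should bring the movl $0x16 and movl $0x7 stores back into the disassembly.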
The compiler determined that pi was never used outside of that scope, and since there were no side effects, optimized away that whole variable, along with the assignments.
The resulting assembly code pretty much loads 0 into eax, fiddles with stack pointers, and then returns.
Recently I have gotten interested in disassembling C code (very simple C code) and followed a tutorial that used the Borland C++ Compiler v5.5 (it compiles C code just fine), and everything worked. Then I decided to try my own C code and compiled it in Dev-C++ (which uses gcc). Upon opening it in IDA Pro I got a surprise: the asm from gcc was really different from Borland's. I expected some difference, but the C code was EXTREMELY simple, so is it just that gcc doesn't optimize as much, or do they use different default compiler settings?
The C Code
int main(int argc, char **argv)
{
int a;
a = 1;
}
Borland ASM
.text:00401150 ; int __cdecl main(int argc,const char **argv,const char *envp)
.text:00401150 _main proc near ; DATA XREF: .data:004090D0
.text:00401150
.text:00401150 argc = dword ptr 8
.text:00401150 argv = dword ptr 0Ch
.text:00401150 envp = dword ptr 10h
.text:00401150
.text:00401150 push ebp
.text:00401151 mov ebp, esp
.text:00401153 pop ebp
.text:00401154 retn
.text:00401154 _main endp
GCC ASM (UPDATED BELOW)
.text:00401220 ; =============== S U B R O U T I N E =======================================
.text:00401220
.text:00401220 ; Attributes: bp-based frame
.text:00401220
.text:00401220 public start
.text:00401220 start proc near
.text:00401220
.text:00401220 var_14 = dword ptr -14h
.text:00401220 var_8 = dword ptr -8
.text:00401220
.text:00401220 push ebp
.text:00401221 mov ebp, esp
.text:00401223 sub esp, 8
.text:00401226 mov [esp+8+var_8], 1
.text:0040122D call ds:__set_app_type
.text:00401233 call sub_401100
.text:00401238 nop
.text:00401239 lea esi, [esi+0]
.text:00401240 push ebp
.text:00401241 mov ebp, esp
.text:00401243 sub esp, 8
.text:00401246 mov [esp+14h+var_14], 2
.text:0040124D call ds:__set_app_type
.text:00401253 call sub_401100
.text:00401258 nop
.text:00401259 lea esi, [esi+0]
.text:00401259 start endp
GCC Update
Upon following JimR's suggestion I went to see what sub_401100 is, and then I followed that code to another routine; this seems to be the code. (Am I correct in that assumption, and if so, why does GCC have all of its code in the main function?):
.text:00401100 sub_401100 proc near ; CODE XREF: .text:004010F1j
.text:00401100 ; start+13p ...
.text:00401100
.text:00401100 var_28 = dword ptr -28h
.text:00401100 var_24 = dword ptr -24h
.text:00401100 var_20 = dword ptr -20h
.text:00401100 var_1C = dword ptr -1Ch
.text:00401100 var_18 = dword ptr -18h
.text:00401100 var_C = dword ptr -0Ch
.text:00401100 var_8 = dword ptr -8
.text:00401100
.text:00401100 push ebp
.text:00401101 mov ebp, esp
.text:00401103 push ebx
.text:00401104 sub esp, 24h ; lpTopLevelExceptionFilter
.text:00401107 lea ebx, [ebp+var_8]
.text:0040110A mov [esp+28h+var_28], offset sub_401000
.text:00401111 call SetUnhandledExceptionFilter
.text:00401116 sub esp, 4 ; uExitCode
.text:00401119 call sub_4012E0
.text:0040111E mov [ebp+var_8], 0
.text:00401125 mov eax, offset dword_404000
.text:0040112A lea edx, [ebp+var_C]
.text:0040112D mov [esp+28h+var_18], ebx
.text:00401131 mov ecx, dword_402000
.text:00401137 mov [esp+28h+var_24], eax
.text:0040113B mov [esp+28h+var_20], edx
.text:0040113F mov [esp+28h+var_1C], ecx
.text:00401143 mov [esp+28h+var_28], offset dword_404004
.text:0040114A call __getmainargs
.text:0040114F mov eax, ds:dword_404010
.text:00401154 test eax, eax
.text:00401156 jz short loc_4011B0
.text:00401158 mov dword_402010, eax
.text:0040115D mov edx, ds:_iob
.text:00401163 test edx, edx
.text:00401165 jnz loc_4011F6
.text:004012E0 sub_4012E0 proc near ; CODE XREF: sub_401000+C6p
.text:004012E0 ; sub_401100+19p
.text:004012E0 push ebp
.text:004012E1 mov ebp, esp
.text:004012E3 fninit
.text:004012E5 pop ebp
.text:004012E6 retn
.text:004012E6 sub_4012E0 endp
Compiler output is expected to be different, sometimes dramatically different, for the same source - in the same way that a Toyota and a Honda are different. Four wheels and some seats, sure, but more different than alike when you look at the details.
Likewise the same compiler with different compiler options can and often will produce dramatically different output for the same source code. Even for what appears to be simple programs.
In the case of your simple program, which actually does not do anything (the code affects neither the input, nor the output, nor anything outside the function), a good optimizing compiler will produce nothing but main: with a return of some random number, since you didn't specify the return value. Actually it should give a warning or error. This is the biggest problem I have when comparing compiler output: making something simple enough to see what the compiler is doing, but complicated enough that the compiler does more than just pre-compute the answer and return it.
In the case of x86, which I assume is what you are talking about here, the processors being microcoded these days, there is really no single answer for good code vs bad code; with each processor family they change the guts around, and what used to be fast is slow and what is now fast was slow on the old processor. So for compilers like gcc that have continued to evolve with the new cores, the optimization can be either generic to all x86s or specific to a particular family (resulting in different code despite maximum optimization).
With your new interest in disassembling, you will continue to see the similarities and differences and find out just how many different ways the same code can be compiled. The differences are expected, even for trivial programs, and I encourage you to try as many compilers as you can. Even in the gcc family, 2.x, 3.x, 4.x and the different ways to build it will result in different code for what might be thought of as the same compiler.
Good vs bad output is in the eye of the beholder. Folks that use debuggers will want their code steppable and their variables watchable (in written-code order). This makes for very big, bulky, and slow code (particularly on x86). And when you compile for release you end up with a completely different program which you have so far spent zero time debugging. Also, optimizing for performance you take the risk of the compiler optimizing out something you wanted it to do (your example above: no variable will be allocated, no code to step through, even with minor optimization). Or worse, you expose bugs in the compiler and your program simply doesn't work (this is why -O3 is discouraged for gcc). That, and/or you find out about the large number of places in the C standard whose interpretation is implementation defined.
Unoptimized code is easier to compile, as it is a bit more obvious. In the case of your example the expectation is that a variable is allocated on the stack, some sort of stack pointer arrangement is set up, the immediate 1 is eventually written to that location, the stack is cleaned up, and the function returns. Harder for compilers to get wrong and more likely that your program works as you intended. Detecting and removing dead code is the business of optimization, and that is where it gets risky. Often the risk is worth the reward, but that depends on the user; beauty is in the eye of the beholder.
Bottom line, short answer: differences are expected, even dramatic differences. Default compile options vary from compiler to compiler. Experiment with the compile/optimization options and different compilers, and continue to disassemble your programs in order to gain a better education about the language and the compilers you use. You are on the right track so far. In the case of the Borland output, it detected that your program does nothing: no input variables are used, no return value is used or related to the local variable, and no global variables or other resources external to the function are used. The integer a and the assignment of an immediate are dead code, so a good optimizer will essentially remove/ignore both lines. It still bothered to set up a stack frame and then clean it up, which it didn't need to do, then returned. gcc looks to be setting up an exception handler, which is perfectly fine even though it doesn't need to; start optimizing, or use a function name other than main(), and you should see different results.
What is most likely happening here is that Borland calls main from its start up code after initializing everything with code present in their run time lib.
The gcc code does not look like main to me, but like generated code that calls main. Disassemble the code at sub_401100 and see if it looks like your main proc.
First of all, make sure you have at least enabled the -O2 optimization flag to gcc, otherwise you get no optimization at all.
With this little example you aren't really testing optimization; you're seeing how program initialization works. E.g. gcc calls __set_app_type to inform Windows of the application type, as well as doing other initialization; e.g. sub_401100 registers atexit handlers for the runtime. Borland might call the runtime initialization beforehand, while gcc does it within main().
Here's the disassembly of main() that I get from MinGW's gcc 4.5.1 in gdb (I added a return 0 at the end so GCC wouldn't complain):
First, when the program is compiled with -O3 optimization:
(gdb) set disassembly-flavor intel
(gdb) disassemble
Dump of assembler code for function main:
0x00401350 <+0>: push ebp
0x00401351 <+1>: mov ebp,esp
0x00401353 <+3>: and esp,0xfffffff0
0x00401356 <+6>: call 0x4018aa <__main>
=> 0x0040135b <+11>: xor eax,eax
0x0040135d <+13>: mov esp,ebp
0x0040135f <+15>: pop ebp
0x00401360 <+16>: ret
End of assembler dump.
And with no optimizations:
(gdb) set disassembly-flavor intel
(gdb) disassemble
Dump of assembler code for function main:
0x00401350 <+0>: push ebp
0x00401351 <+1>: mov ebp,esp
0x00401353 <+3>: and esp,0xfffffff0
0x00401356 <+6>: sub esp,0x10
0x00401359 <+9>: call 0x4018aa <__main>
=> 0x0040135e <+14>: mov DWORD PTR [esp+0xc],0x1
0x00401366 <+22>: mov eax,0x0
0x0040136b <+27>: leave
0x0040136c <+28>: ret
End of assembler dump.
These are a little more complex than Borland's example, but not excessively.
Note, the calls to 0x4018aa are calls to a library/compiler supplied function to construct C++ objects. Here's a snippet from some GCC toolchain docs:
The actual calls to the constructors are carried out by a subroutine called __main, which is called (automatically) at the beginning of the body of main (provided main was compiled with GNU CC). Calling __main is necessary, even when compiling C code, to allow linking C and C++ object code together. (If you use '-nostdlib', you get an unresolved reference to __main, since it's defined in the standard GCC library. Include '-lgcc' at the end of your compiler command line to resolve this reference.)
I'm not sure what exactly IDA Pro is showing in your examples. IDA Pro labels what it's showing as start not main so I'd guess that JimR's answer is right - it's probably the runtime's initialization (perhaps the entry point as described in the .exe header - which is not main(), but the runtime initialization entry point).
Does IDA Pro understand gcc's debug symbols? Did you compile with the -g option so the debug symbols are generated?
It looks like the Borland compiler is recognizing that you never actually do anything with a and is just giving you the equivalent assembly for an empty main function.
The difference here is mostly not in the compiled code, but in what the disassembler shows you.
You may think that main is the only function in your program, but it is not. In fact your program is something like this:
void start()
{
... some initialization code here
int result = main();
... some deinitialization code here
ExitProcess(result);
}
IDA Pro knows how Borland works, so it can navigate directly to your main, but it doesn't know how gcc works, so it shows you the true entry point of your program. You can see in the Borland ASM that main is called from some other function. In the GCC ASM you can go through all of those sub_40xxx routines to find your main.