1)How C structures get passed to function in assembly. I mean pass by value, not pass by reference.
2)By the way, how callees return structure to its callers?
I'm so sorry for the poor expression since I'm not a native English speaker.
I wrote a simple program to testify how C structures get passed to function. But the result was quite surpirsed. Some value was passed by register, but some value was passed by pushing them into stack. Here is the code.
source code
#include <stdio.h>
typedef struct {
int age;
enum {Man, Woman} gen;
double height;
int class;
char *name;
} student;
void print_student_info(student s) {
printf("age: %d, gen: %s, height: %f, name: %s\n",
s.age,
s.gen == Man? "Man":"Woman",
s.height, s.name);
}
int main() {
student s;
s.age = 10;
s.gen = Man;
s.height = 1.30;
s.class = 3;
s.name = "Tom";
print_student_info(s);
return 0;
}
asm
6fa: 55 push %rbp
6fb: 48 89 e5 mov %rsp,%rbp
6fe: 48 83 ec 20 sub $0x20,%rsp
702: c7 45 e0 0a 00 00 00 movl $0xa,-0x20(%rbp)
709: c7 45 e4 00 00 00 00 movl $0x0,-0x1c(%rbp)
710: f2 0f 10 05 00 01 00 movsd 0x100(%rip),%xmm0 # 818 <_IO_stdin_used+0x48>
717: 00
718: f2 0f 11 45 e8 movsd %xmm0,-0x18(%rbp)
71d: c7 45 f0 03 00 00 00 movl $0x3,-0x10(%rbp)
724: 48 8d 05 e5 00 00 00 lea 0xe5(%rip),%rax # 810 <_IO_stdin_used+0x40>
72b: 48 89 45 f8 mov %rax,-0x8(%rbp)
72f: ff 75 f8 pushq -0x8(%rbp)
732: ff 75 f0 pushq -0x10(%rbp)
735: ff 75 e8 pushq -0x18(%rbp)
738: ff 75 e0 pushq -0x20(%rbp)
73b: e8 70 ff ff ff callq 6b0 <print_student_info>
740: 48 83 c4 20 add $0x20,%rsp
744: b8 00 00 00 00 mov $0x0,%eax
749: c9 leaveq
74a: c3 retq
74b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
I expected structure was passed to function using the stack, but the code above showed it wasn't.
As has been pointed out by others - passing structures by value is generally frowned upon in most cases, but it is allowable by the C language nonetheless. I'll discuss the code you did use even though it isn't how I would have done it.
How structures are passed is dependent on the ABI / Calling convention. There are two primary 64-bit ABIs in use today (there may be others). The 64-bit Microsoft ABI and the x86-64 System V ABI. The 64-bit Microsoft ABI is simple as all structures passed by value are on the stack. In The x86-64 System V ABI (used by Linux/MacOS/BSD) is more complex as there is a recursive algorithm that is used to determine if a structure can be passed in a combination of general purpose registers / vector registers / X87 FPU stack registers. If it determines the structure can be passed in registers then the object isn't placed on the stack for the purpose of calling a function. If it doesn't fit in registers per the rules then it is passed in memory on the stack.
There is a telltale sign that your code isn't using the 64-bit Microsoft ABI as 32 bytes of shadow space weren't reserved by the compiler before making the function call so this is almost certainly a compiler targeting the x86-64 System V ABI. I can generate the same assembly code in your question using the online godbolt compiler with the GCC compiler with optimizations disabled.
Going through the algorithm for passing aggregate types (like structures and unions) is beyond the scope of this answer but you can refer to section 3.2.3 Parameter Passing, but I can say that this structure is passed on the stack because of a post cleanup rule that says:
If the size of the aggregate exceeds two eightbytes and the first eightbyte isn’t SSE or any other eightbyte isn’t SSEUP, the whole argument is passed in memory.
It happens to be that your structure would have attempted to have the first two 32-bit int values packed in a 64-bit register and the double placed in a vector register followed by the int being placed in a 64-bit register (because of alignment rules) and the pointer passed in another 64-bit register. Your structure would have exceeded two eightbyte (64-bit) registers and the first eightbyte (64-bit) register isn't an SSE register so the structure is passed on the stack by the compiler.
You have unoptimized code but we can break down the code into chunks. First is building the stack frame and allocating room for the local variable(s). Without optimizations enabled (which is the case here), the structure variable s will be built on the stack and then a copy of that structure will be pushed onto the stack to make the call to print_student_info.
This builds the stackframe and allocates 32 bytes (0x20) for local variables (and maintains 16-byte alignment). Your structure happens to be exactly 32 bytes in size in this case following natural alignment rules:
6fa: 55 push %rbp
6fb: 48 89 e5 mov %rsp,%rbp
6fe: 48 83 ec 20 sub $0x20,%rsp
Your variable s will start at RBP-0x20 and ends at RBP-0x01 (inclusive). The code builds and initializes the s variable (student struct) on the stack. A 32-bit int 0xa (10) for the age field is placed at the beginning of the structure at RBP-0x20. The 32-bit enum for Man is placed in field gen at RBP-0x1c:
702: c7 45 e0 0a 00 00 00 movl $0xa,-0x20(%rbp)
709: c7 45 e4 00 00 00 00 movl $0x0,-0x1c(%rbp)
The constant value 1.30 (type double) is stored in memory by the compiler. You can't move from memory to memory with one instruction on Intel x86 processors so the compiler moved the double value 1.30 from memory location RIP+0x100 to vector register XMM0 then moved the lower 64-bits of XMM0 to the height field on the stack at RBP-0x18:
710: f2 0f 10 05 00 01 00 movsd 0x100(%rip),%xmm0 # 818 <_IO_stdin_used+0x48>
717: 00
718: f2 0f 11 45 e8 movsd %xmm0,-0x18(%rbp)
The value 3 is placed on the stack for the class field at RBP-0x10:
71d: c7 45 f0 03 00 00 00 movl $0x3,-0x10(%rbp)
Lastly the 64-bit address of the string Tom (in the read only data section of the program) is loaded into RAX and then finally moved into the name field on the stack at RBP-0x08. Although the type for class was only 32-bits (an int type) it was padded to 8 bytes because the following field name has to be naturally aligned on an 8 byte boundary since a pointer is 8 bytes in size.
724: 48 8d 05 e5 00 00 00 lea 0xe5(%rip),%rax # 810 <_IO_stdin_used+0x40>
72b: 48 89 45 f8 mov %rax,-0x8(%rbp)
At this point we have a structure entirely built on the stack. The compiler then copies it by pushing all 32 bytes (using 4 64-bit pushes) of the structure onto the stack to make the function call:
72f: ff 75 f8 pushq -0x8(%rbp)
732: ff 75 f0 pushq -0x10(%rbp)
735: ff 75 e8 pushq -0x18(%rbp)
738: ff 75 e0 pushq -0x20(%rbp)
73b: e8 70 ff ff ff callq 6b0 <print_student_info>
Then typical stack cleanup and function epilogue:
740: 48 83 c4 20 add $0x20,%rsp
744: b8 00 00 00 00 mov $0x0,%eax
749: c9 leaveq
Important Note: The registers used were not for the purpose of passing parameters in this case, but were part of the code that initialized the s variable (struct) on the stack.
Returning Structures
This is dependent on the ABI as well, but I'll focus on the x86-64 System V ABI in this case since that is what your code is using.
By Reference: A pointer to a structure is returned in RAX. Returning pointers to structures is preferred.
By value: A structure in C that is returned by value forces the compiler to allocate additional space for the return structure in the caller and then the address of that structure is passed as a hidden first parameter in RDI to the function. The called function will place the address that was passed in RDI as a parameter into RAX as the return value when it is finished. Upon return from the function the value in RAX is a pointer to the address where the return structure is stored which is always the same address passed in the hidden first parameter RDI. The ABI discusses this in section 3.2.3 Parameter Passing under the subheading Returning of Values which says:
If the type has class MEMORY, then the caller provides space for the return
value and passes the address of this storage in %rdi as if it were the first
argument to the function. In effect, this address becomes a “hidden” first argument. This storage must not overlap any data visible to the callee through
other names than this argument.
On return %rax will contain the address that has been passed in by the
caller in %rdi.
It depends on the ABI for your system. On x86_64, most systems use SYSV ABI fo AMD64 -- the exception being Microsoft, who use their own non-standard ABI.
On either of those ABIs, this structure will be passed on the stack, which is what is happening in the code -- first s is constructed in main's stack frame, then a copy of it is pushed on the stack (the 4 pushq instructions).
There's no general answer to your question - every compiler works differently and can do things differently according to what optimisations you select. What you've observed is a common optimisation - the first few parameters of suitable types are passed in registers, with extra and/or complex ones passed on the stack.
Related
This question already has an answer here:
Why does the x86-64 System V calling convention pass args in registers instead of just the stack?
(1 answer)
Closed 8 months ago.
This C code
void test_function(int a, int b, int c, int d) {}
int main() {
test_function(1, 2, 3, 4);
return 0;
}
gets compiled by GCC (no flags, version 12.1.1, target x86_64-redhat-linux) into
0000000000401106 <test_function>:
401106: 55 push rbp
401107: 48 89 e5 mov rbp,rsp
40110a: 89 7d fc mov DWORD PTR [rbp-0x4],edi
40110d: 89 75 f8 mov DWORD PTR [rbp-0x8],esi
401110: 89 55 f4 mov DWORD PTR [rbp-0xc],edx
401113: 89 4d f0 mov DWORD PTR [rbp-0x10],ecx
401116: 90 nop
401117: 5d pop rbp
401118: c3 ret
0000000000401119 <main>:
401119: 55 push rbp
40111a: 48 89 e5 mov rbp,rsp
40111d: b9 04 00 00 00 mov ecx,0x4
401122: ba 03 00 00 00 mov edx,0x3
401127: be 02 00 00 00 mov esi,0x2
40112c: bf 01 00 00 00 mov edi,0x1
401131: e8 d0 ff ff ff call 401106 <test_function>
401136: b8 00 00 00 00 mov eax,0x0
40113b: 5d pop rbp
40113c: c3 ret
Why are additional registers (ecx, edx, esi, edi) used as intermediary storage for values 1, 2, 3, 4 instead of putting them into rbp directly?
"as intermediary storage": You confusion seems to be this part.
The ABI specifies that these function arguments are passed in the registers you are seeing (see comments under the question). The registers are not just used as intermediary. The value are never supposed to be put on the stack at all. They stay in the register the whole time, unless the function needs to reuse the register for something else or pass on a pointer to the function parameter or something similar.
What you are seeing in test_function is just an artifact of not compiling with optimizations enabled. The mov instructions putting the registers on the stack are pointless, since nothing is done with them afterwards. The stack pointer is just immediately restored and then the function returns.
The whole function should just be a single ret instruction. See https://godbolt.org/z/qG9GjMohY where -O2 is used.
Without optimizations enabled the compiler makes no attempt to remove instructions even if they are pointless and it always stores values of variables to memory and loads them from memory again, even if they could have been held in registers. That's why it is almost always pointless to look at -O0 assembly.
The registers are used for the arguments to call the function. The standard calling convertion calls for aguments to be placed in certain register, so the code you see in main puts the arguments into those registers and the code in test_function expects them in those registers and reads them from there.
So your follow-on question might be "why is test_function copying those argument on to the stack?". That's because you're compiling without optimization, so the compiler produces inefficient code, allocation space in the stack frame for every argument and local var and copying the arguments from their input register into the stack frame as part of the function prolog. If you were to use those values in th function, you would see it reading them from the stack frame locations even though they are probably still in the registers. If you compile with -O, you'll see the compiler get rid of all this, as the stack frame is not needed.
This may be a simple question but I am trying to understand how the compiler assigns all the elements in the array to zero for example:
''' array[5] = {0}; '''
I checked gcc.gnu.org for some explanation but was unsuccessful. Maybe I am looking in the wrong location. I also checked C99 and stumbled upon section 6.7.8/21, but it does not tell me how it is initialized. I also looked at other posts about this topic but most address what happens, not how it happens.
Anyway, my questions are:
How does the compiler assign all the elements to zero through this single expression?
Where can I find information on how the compiler works, specifically with the example above?
Thanks...
C 2018 6.7.9 19 says:
… all subobjects that are not initialized explicitly shall be initialized implicitly the same as objects that have static storage duration.
C 2018 6.7.9 10 says, for objects with static storage duration that are not initialized explicitly:
— if it has pointer type, it is initialized to a null pointer;
— if it has arithmetic type, it is initialized to (positive or unsigned) zero;
— if it is an aggregate, every member is initialized (recursively) according to these rules, and any padding is initialized to zero bits;
— if it is a union, the first named member is initialized (recursively) according to these rules, and any padding is initialized to zero bits;
Thus, when you initialize part but not all of an array, the rules of the C standard require the compiler to initialize the other elements to zero for arithmetic types and a null pointer for pointer types.
Compilers are free to implement this as they choose. The method is not usually document, and it varies by situation: For a small array, the compiler might generate explicit instructions. For a large array, the compiler might generate a loop (including a single instruction that implements a loop to zero memory) or call a library routine. Or, if the compiler can detect that initialization is not actually necessary for correct program operation, it might omit it. For example, if the compiler can see that a subsequent assignment sets an array element to a value before that array element is used, the compiler might omit any zeroing of the array element and instead let the assignment be the initial value.
This is an implementation detail that will vary widely between different compilers and different source code, but the general answer is the same - the compiler will generate whatever code is necessary to zero out the uninitialized elements.
Here's how things shake out on my implementation (gcc version 7.3.1).
Source code (file init.c):
#include <stdio.h>
int main( void )
{
int arr[5] = {0};
for ( size_t i = 0; i < 5; i++ )
printf( "arr[%zu] = %d\n", i, arr[i] );
return 0;
}
Compiled using the command
gcc -o init -std=c11 -pedantic -Wall -Werror init.c
gives me the following machine code (viewed using objdump -d init):
00000000004004c7 <main>:
4004c7: 55 push %rbp
4004c8: 48 89 e5 mov %rsp,%rbp
4004cb: 48 83 ec 20 sub $0x20,%rsp ;; allocates space for arr and i, rounded up to multiple of 4
4004cf: 48 c7 45 e0 00 00 00 movq $0x0,-0x20(%rbp) ;; zeros out a[0] and a[1]
4004d6: 00
4004d7: 48 c7 45 e8 00 00 00 movq $0x0,-0x18(%rbp) ;; zeros out a[2] and a[3]
4004de: 00
4004df: c7 45 f0 00 00 00 00 movl $0x0,-0x10(%rbp) ;; zeros out a[4]
4004e6: 48 c7 45 f8 00 00 00 movq $0x0,-0x8(%rbp) ;; zeros out i
...
Above a certain size, explicitly zeroing out one or two elements at a time gets cumbersome - when I change the array size to 50, the generated assembly code became
00000000004004c7 <main>:
4004c7: 55 push %rbp
4004c8: 48 89 e5 mov %rsp,%rbp
4004cb: 48 81 ec d0 00 00 00 sub $0xd0,%rsp
4004d2: 48 8d 95 30 ff ff ff lea -0xd0(%rbp),%rdx
4004d9: b8 00 00 00 00 mov $0x0,%eax
4004de: b9 19 00 00 00 mov $0x19,%ecx
4004e3: 48 89 d7 mov %rdx,%rdi
4004e6: f3 48 ab rep stos %rax,%es:(%rdi)
4004e9: 48 c7 45 f8 00 00 00 movq $0x0,-0x8(%rbp)
4004f0: 00
...
The rep stos instruction is essentially a loop (I don't speak x86 assembly fluently, somebody else will have to explain exactly how it works). It zeros out each element of the array in succession, rather than zeroing out the entire array in a single instruction if I understand it correctly.
I am working through the OverTheWire wargames and one of my exploits overwrites the return address of main with the address of system. I have then used the fact that at the point main returns, esp is still pointing at one of my local variables and hence I can fill it with the command I want system to run (e.g. sh;#).
My confusion comes from that I thought functions in C reclaim the stack before returning and hence at the point the return address is called the stack pointer would be pointing at the return address rather than at the local variables. However, my exploit works so it seems that my stack pointer is pointing at the local variables when the return address is called.
The main thing I have noticed about this particular challenge compared to others is that it calls exit(0) at the end, instead of just ending, so the assembly doesn't end with leave, which may be the reason for this behaviour.
I haven't included the actual code since it's quite long and I was hoping there was a general explanation for what I am seeing, but please let me know if the assembly would be useful.
#include <stdio.h>
int main ( void )
{
printf("hello\n");
return(0);
}
the interesting relevant parts.
0000000000400430 <main>:
400430: 48 83 ec 08 sub $0x8,%rsp
400434: bf d4 05 40 00 mov $0x4005d4,%edi
400439: e8 c2 ff ff ff callq 400400 <puts#plt>
40043e: 31 c0 xor %eax,%eax
400440: 48 83 c4 08 add $0x8,%rsp
400444: c3 retq
400445: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
40044c: 00 00 00
40044f: 90 nop
0000000000400450 <_start>:
400450: 31 ed xor %ebp,%ebp
400452: 49 89 d1 mov %rdx,%r9
400455: 5e pop %rsi
400456: 48 89 e2 mov %rsp,%rdx
400459: 48 83 e4 f0 and $0xfffffffffffffff0,%rsp
40045d: 50 push %rax
40045e: 54 push %rsp
40045f: 49 c7 c0 c0 05 40 00 mov $0x4005c0,%r8
400466: 48 c7 c1 50 05 40 00 mov $0x400550,%rcx
40046d: 48 c7 c7 30 04 40 00 mov $0x400430,%rdi
400474: e8 97 ff ff ff callq 400410 <__libc_start_main#plt>
400479: f4 hlt
40047a: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
For the most part there is nothing special about main nor printf, etc these are just functions that conform to the calling convention. As re-asked SO questions will show sometimes the compiler will add extra stack or other calls when it sees a main() that it doesnt otherwise. but still it is a function that needs to conform to the calling convention. As seen in this case where the stack pointer is put back where it was found.
Before an operating system (Linux, Windows, MacOS, etc) can even think about running a program it needs to allocate some space for that program and tag that memory for that program in some way depending on the features of the processor and the OS, etc. Then you load the program from whatever media, and launch it at the binary file specified and/or well known entry point. A clean exit of the program will cause the operating system to free that memory, which the .text, .data, .bss and stack are the trivial/obvious ones that just go away as their memory just goes away. Other items that may have been allocated and associated with this program, open files, runtime allocated (not stack) memory, etc can/should also be freed, depends on the design of the os and/or the C library as to how that happens.
In the above case we see the bootstrap calls main and main returns then hlt is hit which this is an application not kernel code so that should cause a trap that causes the OS to clean up. An explicit exit() should be no different than a printf() or puts() or fopen() or any other function that ultimately makes one or more syscalls to the operating system. All that you can possibly find for these types of operating systems (Linux, Windows, MacOS) is the syscall. The release of memory happens outside the program as the program does not have control over it, would be a chicken and egg problem, program frees mmu tables that is using to free mmu tables...
compile and disassemble the object for main rather than the whole program
0000000000000000 <main>:
0: 48 83 ec 08 sub $0x8,%rsp
4: bf 00 00 00 00 mov $0x0,%edi
9: e8 00 00 00 00 callq e <main+0xe>
e: 31 c0 xor %eax,%eax
10: 48 83 c4 08 add $0x8,%rsp
14: c3 retq
no surprise there same as before, all we needed to see to understand that the stack was cleaned up before return. and that main is not special:
#include <stdio.h>
int notmain ( void )
{
printf("hello\n");
return(0);
}
0000000000000000 <notmain>:
0: 48 83 ec 08 sub $0x8,%rsp
4: bf 00 00 00 00 mov $0x0,%edi
9: e8 00 00 00 00 callq e <notmain+0xe>
e: 31 c0 xor %eax,%eax
10: 48 83 c4 08 add $0x8,%rsp
14: c3 retq
Now if you are asking if there is an exit() within main then sure it wont hit the return point in main so the stack pointer is offset by whatever amount. but if main calls some function and that function calls some function then that function calls exit() then the stack pointer is left at the stack frame point of function number two plus whatever the call (this is an x86) plus the exit() stack frame adds to it. You cannot simply assume that when exit() is called, if it is called, what the stack pointer is pointing at. You would have to examine the disassembly around that call to exit() plus the exit() code and anything it calls, to figure this out.
There is a problem which confuses me a lot.
int main(int argc, char *argv[])
{
int i = 12345678;
return 0;
}
int main(int argc, char *argv[])
{
int i = 0;
return 0;
}
The programs have the same bytes in total. Why?
And where the literal value indeed stored? Text segment or other place?
The programs have the same bytes in total.Why?
There are two possibilities:
The compiler is optimizing out the variable. It isn't used anywhere and therefore doesn't make sense.
If 1. doesn't apply, the program sizes are equal anyway. Why shouldn't they? 0 is just as large in size as 12345678. Two variables of type T occupy the same size in memory.
And where the literal value indeed stored?
On the stack. Local variables are commonly stored on the stack.
Consider your bedroom.if you filled it with stuff or you left it empty,does that change the area of your bedroom?
the size of int is sizeof(int).it does not matter what value you store in it.
Because your program is optimized. At compile time, the compiler found out that i was useless and removed it.
If optimization didn't occurs, another simple explanation is that an int is the same size of another int.
TL;DR
First question: They're the same size because the instructions output of your program are roughly the same (more on that below). Further, they're the same size because the size(number of bytes) of your ints never change.
Second question: i variable is stored in your local variables frame which is in the function stack. The actual value you set to i is in the instructions (hardcoded) in the text-segment.
GDB and Assembly
I know you're using Windows, but consider these codes and output on Linux. I used the exactly same sources you posted.
For the first one, with i = 12345678, the actual main function is these computer instructions:
(gdb) disass main
Dump of assembler code for function main:
0x00000000004004ed <+0>: push %rbp
0x00000000004004ee <+1>: mov %rsp,%rbp
0x00000000004004f1 <+4>: mov %edi,-0x14(%rbp)
0x00000000004004f4 <+7>: mov %rsi,-0x20(%rbp)
0x00000000004004f8 <+11>:movl $0xbc614e,-0x4(%rbp)
0x00000000004004ff <+18>:mov $0x0,%eax
0x0000000000400504 <+23>:pop %rbp
0x0000000000400505 <+24>:retq
End of assembler dump.
As for the other program, with i = 0, main is:
(gdb) disass main
Dump of assembler code for function main:
0x00000000004004ed <+0>: push %rbp
0x00000000004004ee <+1>: mov %rsp,%rbp
0x00000000004004f1 <+4>: mov %edi,-0x14(%rbp)
0x00000000004004f4 <+7>: mov %rsi,-0x20(%rbp)
0x00000000004004f8 <+11>:movl $0x0,-0x4(%rbp)
0x00000000004004ff <+18>:mov $0x0,%eax
0x0000000000400504 <+23>:pop %rbp
0x0000000000400505 <+24>:retq
End of assembler dump.
The only difference between both codes is the actual value being stored in your variable. Lets go in a step by step trough these lines bellow (my computer is x86_64, so if your architecture is different, instructions may differ).
OPCODES
And the actual instructions of main (using objdump):
00000000004004ed <main>:
4004ed: 55 push %rbp
4004ee: 48 89 e5 mov %rsp,%rbp
4004f1: 89 7d ec mov %edi,-0x14(%rbp)
4004f4: 48 89 75 e0 mov %rsi,-0x20(%rbp)
4004f8: c7 45 fc 4e 61 bc 00 movl $0xbc614e,-0x4(%rbp)
4004ff: b8 00 00 00 00 mov $0x0,%eax
400504: 5d pop %rbp
400505: c3 retq
400506: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
40050d: 00 00 00
To get the actual difference of bytes, using objdump -D prog1 > prog1_dump and objdump -D prog2 > prog2_dump and them diff prog1_dump prog2_dump:
2c2
< draft1: file format elf64-x86-64
---
> draft2: file format elf64-x86-64
51,58c51,58
< 400283: 00 bc f6 06 64 9f ba add %bh,-0x45609bfa(%rsi,%rsi,8)
< 40028a: 01 3b add %edi,(%rbx)
< 40028c: 14 d1 adc $0xd1,%al
< 40028e: 12 cf adc %bh,%cl
< 400290: cd 2e int $0x2e
< 400292: 11 77 5d adc %esi,0x5d(%rdi)
< 400295: 79 fe jns 400295 <_init-0x113>
< 400297: 3b .byte 0x3b
---
> 400283: 00 e8 add %ch,%al
> 400285: f1 icebp
> 400286: 6e outsb %ds:(%rsi),(%dx)
> 400287: 8a f8 mov %al,%bh
> 400289: a8 05 test $0x5,%al
> 40028b: ab stos %eax,%es:(%rdi)
> 40028c: 48 2d 3f e9 e2 b2 sub $0xffffffffb2e2e93f,%rax
> 400292: f7 06 53 df ba af testl $0xafbadf53,(%rsi)
287c287
< 4004f8: c7 45 fc 00 00 00 00 movl $0x0,-0x4(%rbp)
---
> 4004f8: c7 45 fc 4e 61 bc 00 movl $0xbc614e,-0x4(%rbp)
Note on address 0x4004f8 your number is there, 4e 61 bc 00 on prog2 and 00 00 00 00 on prog1, both 4 bytes which is equal to sizeof(int). The bytes c7 45 fc are the rest of the instructions (move some value into an offset of rbp). Also note that the first two sections that differ have the same size in bytes (21). So, there you go, although slightly different, they're the same size.
Step by step through Assembly Instructions
push %rbp; mov %rsp, %rbp: This is called setting up the Stack Frame, and is standard for all C functions (unless you tell gcc -fomit-frame-pointer). This enables you to access the stack and your local variables through a fixed register, in this case, rbp.
mov %edi, -0x14(%rbp): This moves the content of register edi into our local variables frame. Specifically, into offset -0x14
mov %rsi, -0x20(%rbp): Same here. But this time it saves rsi. This is part of the x86_64 calling convention (which now uses registers instead of pushing everything on stack like x86_32), but instead of keeping them in registers, we free the registers by saving the contents in our local variables frame - register are way faster and are the only way the CPU can actually process anything, so the more free registers we have, the better.
Note: edi is the 4-bytes part of the rsi register and from the x86_64 calling convention, we know that rsi register is used for the first argument. main's first argument is int argc, so it makes sense we use a 4-byte register to store it. rsi is the second argument, effectively the address of a pointer to pointer to chars (**argv). So, in 64bit architectures, that fits perfectly into a 64bit register.
<+11>: movl $0xbc614e,-0x4(%rbp): This is the actual line int i = 12345678 (0xbc614e = 12345678d). Now, note that the way we "move" that value is very similar to how we stored the main arguments. We use offset -0x4(%rbp) to store it memory, on the "local variables frame" (this answers your question on where it gets stored).
mov $0x0, %eax; pop %rbp; retq: Again, dull stuff to clear up the frame pointer and return (end the program since we're in main).
Note that on the second example, the only difference is the line <+11>: movl $0x0,-0x4(%rbp), which effectively stores the value zero - in C words, int i = 0.
So, by these instructions you can see that the main function of both programs gets translated to assembly in the exact the same way, so their sizes are the same in the end. (Assuming you compiled them the same way, because the compiler also puts lots of other things in the binaries, like data, library functions, etc. In linux, you can get a full disassembly dump using objdump -D program.
Note 2: In these examples, you cannot see how the computer subtracts values from rsp in order to allocate stack space, but that's how it's normally done.
Stack Representation
The stack would be like this for both cases (only the value of i would change, or the value at -0x4(%rbp))
| ~~~ | Higher Memory addresses
| |
+------------------+ <--- Address 0x8(%rbp)
| RETURN ADDRESS |
+------------------+ <--- Address 0x0(%rbp) // instruction push %rbp
| previous rbp |
+------------------+ <--- Address -0x4(%rbp)
| i=0x11223344 |
+------------------+ <---- Address -0x14(%rbp)
| argc |
+------------------+ <---- address -0x20(%rbp)
| argv |
+------------------+
| |
+~~~~~~~~~~~~~~~~~~+ Lower memory addresses
Note 3: The direction to where the stack grows depends on your architecture. How data gets written in memory also depends on your architecture.
Resources
What are the calling conventions for UNIX & Linux system calls on x86-64
Call Stack
GCC Optimization Options
Understanding the Stack
How does the stack work in assembly language?
x86_64 : is stack frame pointer almost useless?
I have some weird question about probably undefined behavior between C calling convention and 64/32 bits compilation.
First here is my code:
int f() { return 0; }
int main()
{
int x = 42;
return f(x);
}
As you can see I am calling f with an argument while f takes no parameters.
My first question was does this argument is really given to f while calling it.
The mysterious lines
After a little objdump I obtained curious results.
While passing x as argument of f:
00000000004004b6 <f>:
4004b6: 55 push %rbp
4004b7: 48 89 e5 mov %rsp,%rbp
4004ba: b8 00 00 00 00 mov $0x0,%eax
4004bf: 5d pop %rbp
4004c0: c3 retq
00000000004004c1 <main>:
4004c1: 55 push %rbp
4004c2: 48 89 e5 mov %rsp,%rbp
4004c5: 48 83 ec 10 sub $0x10,%rsp
4004c9: c7 45 fc 2a 00 00 00 movl $0x2a,-0x4(%rbp)
4004d0: 8b 45 fc mov -0x4(%rbp),%eax
4004d3: 89 c7 mov %eax,%edi
4004d5: b8 00 00 00 00 mov $0x0,%eax
4004da: e8 d7 ff ff ff callq 4004b6 <f>
4004df: c9 leaveq
4004e0: c3 retq
4004e1: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
4004e8: 00 00 00
4004eb: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
Without passing x as a argument:
00000000004004b6 <f>:
4004b6: 55 push %rbp
4004b7: 48 89 e5 mov %rsp,%rbp
4004ba: b8 00 00 00 00 mov $0x0,%eax
4004bf: 5d pop %rbp
4004c0: c3 retq
00000000004004c1 <main>:
4004c1: 55 push %rbp
4004c2: 48 89 e5 mov %rsp,%rbp
4004c5: 48 83 ec 10 sub $0x10,%rsp
4004c9: c7 45 fc 2a 00 00 00 movl $0x2a,-0x4(%rbp)
4004d0: b8 00 00 00 00 mov $0x0,%eax
4004d5: e8 dc ff ff ff callq 4004b6 <f>
4004da: c9 leaveq
4004db: c3 retq
4004dc: 0f 1f 40 00 nopl 0x0(%rax)
So as we can see:
4004d0: 8b 45 fc mov -0x4(%rbp),%eax
4004d3: 89 c7 mov %eax,%edi
happen when I call f with x but because I am not really good in assembly I don't really understand these lines.
The 64/32 bits paradoxe
Otherwise I tried something else and start printing the stack of my program.
Stack with x given to f (compiled in 64bits):
Address of x: ffcf115c
ffcf1128: 0 0
ffcf1130: -3206820 0
ffcf1138: -3206808 134513826
ffcf1140: 42 -3206820
ffcf1148: -145495616 134513915
ffcf1150: 1 -3206636
ffcf1158: -3206628 42
ffcf1160: -143903780 -3206784
Stack with x not given to f (compiled in 64bits):
Address of x: 3c19183c
3c191818: 0 0
3c191820: 1008277568 32766
3c191828: 4195766 0
3c191830: 1008277792 32766
3c191838: 0 42
3c191840: 4195776 0
And for some reason in 32bits x seems to be push on the stack.
Stack with x given to f (compiled in 32bits):
Address of x: ffdc8eac
ffdc8e78: 0 0
ffdc8e80: -2322772 0
ffdc8e88: -2322760 134513826
ffdc8e90: 42 -2322772
ffdc8e98: -145086016 134513915
ffdc8ea0: 1 -2322588
ffdc8ea8: -2322580 42
ffdc8eb0: -143494180 -2322736
Why the hell does x appear in 32 but not 64 ???
Code for printing: http://paste.awesom.eu/yayg/QYw6&ln
Why am I asking such stupid questions ?
First because I didn't found any standard that answer to my question
Secondly, think about calling a variadic function in C without the count of arguments given.
Last but not least, I think undefined behavior is fun.
Thank you for taking the time to read until here and for helping me understanding something or making me realize that my questions are pointless.
The answer is that, as you suspect, what you are doing is undefined behavior (in the case where the superfluous argument is passed).
The actual behavior in many implementations is harmless, however. An argument is prepared on the stack, and is ignored by the called function. The called function is not responsible for removing arguments from the stack, so there no harm (such as an unbalanced stack pointer).
This harmless behavior was what enabled C hackers to develop, once upon a time, a variable argument list facility that used to be under #include <varargs.h> in ancient versions of the Unix C library.
This evolved into the ANSI C <stdarg.h>.
The idea was: pass extra arguments into a function, and then march through the stack dynamically to retrieve them.
That won't work today. For instance, as you can see, the parameter is not in fact put into the stack, but loaded into the RDI register. This is the convention used by GCC on x86-64. If you march through the stack, you won't find the first several parameters. On IA-32, GCC passes parameters using the stack, by contrast: though you can get register-based behavior with the "fastcall" convention.
The va_arg macro from <stdarg.h> will correctly take into account the mixed register/stack parameter passing convention. (Or, rather, when you use the correct declaration for a variadic function, it will perhaps suppress the passage of the trailing arguments in registers, so that va_arg can just march through memory.)
P.S. your machine code might be easier to follow if you added some optimization. For instance, the sequence
4004c9: c7 45 fc 2a 00 00 00 movl $0x2a,-0x4(%rbp)
4004d0: 8b 45 fc mov -0x4(%rbp),%eax
4004d3: 89 c7 mov %eax,%edi
4004d5: b8 00 00 00 00 mov $0x0,%eax
is fairly obtuse due to what look like some wasteful data moves.
How arguments are passed to a function is dependent on the platform ABI (application binary interface). The ABI makes it possible to compile libraries with compiler X and use them with code compiled with compiler Y. None of this is defined by the standard.
There is no requirement by the standard that a "stack" even exist, much less that it be used for function calling.
The x86 chips had limited numbers of registers, and the ABI reflects that fact; the normal 32-bit x86 calling convention uses the stack for all arguments.
That is not the case with the 64-bit architecture, which has many more registers and uses some of them for the first few parameters. This significantly speeds up function calls.
Similarly, the Windows 32-bit "fastcall" calling convention passes a few arguments in registers. (In order to use a non-standard calling convention, you need to appropriately annotate the function declaration, and do so consistently where it is defined.)
You can find more information on various calling conventions in this Wikipedia article. The AMD64 ABI can be found on x86-64.org (PDF document). The original System V IA-32 ABI (the basis of the ABI used on Linux, xBSD and OS X) can still be accessed from www.sco.com (PDF document).
Undefined behaviour?
The code presented in the OP is definitely undefined behaviour.
In a function definition, an empty parameter list means that the function does not take any arguments. In a function declaration, an empty parameter fails to declare how many arguments the function takes.
§6.7.6.3/p.14: An empty list in a function declarator that is part of a definition of that function specifies that the function has no parameters. The empty list in a function declarator that is not part of a definition of that function specifies that no information about the number or types of the parameters is supplied.
When the function is eventually called, it must be called with the correct number of parameters:
§6.5.2.2/p.6: If the expression that denotes the called function has a type that does not include a prototype, the integer promotions are performed on each argument, and arguments that have type float are promoted to double... If the number of arguments does not equal the number of parameters, the behavior is undefined.
If the function is defined as a vararg function (with a trailing ellipsis), the vararg declaration must be visible wherever the function is called.
(Continuing from previous quote): If the function is defined with a type that includes a prototype, and either the prototype ends with an ellipsis (, ...) or the types of the arguments after promotion are not compatible with the types of the parameters, the behavior is undefined.