Can a C compiler assume that two different extern globals cannot be aliased to the same address?
In my case, I have a situation like this:
extern int array_of_int[], array_end;
void some_func(void)
{
int *t;
for (t = &array_of_int[0]; t != &array_end; t++)
{
...
The resulting binary compiled with optimization on does not test the t != &array_end condition before entering the loop. The compiler's optimization is that the loop must execute at least once since t cannot immediately equal &array_end at the outset.
Of course we found this the hard way. Apparently, some assembler hackery with linker sections resulted in a case where the two externs are the same address.
Thanks for any advice!
In short, yes, it's free to make that assumption. There is nothing special about extern variables. Two variables may not be aliases of each other. (If the answer was any different, think about the chaos that would ensue. extern int a, b could alias each other, which would make the semantics of any code using those variables completely insane!)
In fact, you are relying on undefined behaviour here, full stop. It is not valid to compare addresses of unrelated variables in this way.
The C99 says in 6.2.2 "Linages of identifiers":
An identifier declared in different
scopes or in the same scope more than
once can be made to refer to the same
object or function by a process called
linkage. (Footnote 21)
...
Footnote 21: There is no linkage
between different identifiers.
So unfortunately, this somewhat common assembly language trick (that I've used...) isn't well-defined. You'd be better to have your assembly module define array_end to be be actual pointer that the asm code loads with the address of the end of the array. That way the C code can be well-defined since the array_end pointer would be a separate object.
I think here's the fixed code
#include <stdio.h>
extern int array_of_int[];
extern int *array_end;
int main()
{
int *t;
for (t = &array_of_int[0]; t != array_end; t++)
{
printf("%i\n", *t);
}
return 0;
}
in another compilation unit:
int array_of_int[] = { }; // { 1,2,3,4 };
int *array_end = array_of_int + (sizeof(array_of_int)/sizeof(array_of_int[0]));
It compiles into this (-O3, gcc 4.4.5 i686)
080483f0 <main>:
80483f0: 55 push %ebp
80483f1: 89 e5 mov %esp,%ebp
80483f3: 83 e4 f0 and $0xfffffff0,%esp
80483f6: 53 push %ebx
80483f7: 83 ec 1c sub $0x1c,%esp
80483fa: 81 3d 24 a0 04 08 14 cmpl $0x804a014,0x804a024
8048401: a0 04 08
8048404: 74 2f je 8048435 <main+0x45>
8048406: bb 14 a0 04 08 mov $0x804a014,%ebx
804840b: 90 nop
804840c: 8d 74 26 00 lea 0x0(%esi,%eiz,1),%esi
8048410: 8b 03 mov (%ebx),%eax
8048412: 83 c3 04 add $0x4,%ebx
8048415: c7 44 24 04 00 85 04 movl $0x8048500,0x4(%esp)
804841c: 08
804841d: c7 04 24 01 00 00 00 movl $0x1,(%esp)
8048424: 89 44 24 08 mov %eax,0x8(%esp)
8048428: e8 d7 fe ff ff call 8048304 <__printf_chk#plt>
804842d: 39 1d 24 a0 04 08 cmp %ebx,0x804a024
8048433: 75 db jne 8048410 <main+0x20>
8048435: 83 c4 1c add $0x1c,%esp
8048438: 31 c0 xor %eax,%eax
804843a: 5b pop %ebx
804843b: 89 ec mov %ebp,%esp
804843d: 5d pop %ebp
804843e: c3 ret
804843f: 90 nop
Its very simple in case if we do it in arm code -
we have an attribute for it ..
#include <stdio.h>
int oldname = 1;
extern int newname __attribute__((alias("oldname"))); // declaration
void foo(void)
{
printf("newname = %d\n", newname); // prints 1
}
and only extern is enough here.
To import it in other files - its seamless.
for assembly file - you can use IMPORT command and you have alias there. :)
Related
This question already has answers here:
How dangerous is it to access an array out of bounds?
(12 answers)
Closed 3 years ago.
Excuse my bad English.
I have written down some lines to return max, min, sum of all values, and arrange all values in ascending order when five integers are input.
While writing, I mistakenly wrote 'num[4]' when I declared a INT array when I needed to put in 5 integers.
But as I compiled with TDM-GCC 4.9.2 64-bit release, it worked without any problem. As soon as I realized and changed to TDM-GCC 4.9.2 32-bit release, it did not.
This is my whole code;
#include<stdio.h>
int main()
{
int num[4],i,j,k,a,b,c,m,number,sum=0;
printf("This program returns max, min, sum of all values, and arranges all values in ascending order when five integers are input.\n");
printf("Please enter five integers.\n");
for(i=0;i<5;i++)
{
printf("Enter #%d\n",i+1);
scanf("%d",&num[i]);
}
//arrange all values
for(j=0;j<5;j++)
{
for(k=j+1;k<5;k++)
{
if(num[j]>num[k])
{
number=num[j];
num[j]=num[k];
num[k]=number;
}
}
}
//find maximum value
int max=num[0];
for(a=1;a<5;a++)
{
if(max<num[a])
{
max=num[a];
}
}
//find minimum value
int min=num[0];
for(b=1;b<5;b++)
{
if(min>num[b])
{
min=num[b];
}
}
//find sum of all values
for(c=0;c<5;c++)
{
sum=sum+num[c];
}
printf("Max Value : %d\n",max);//print max
printf("Min Value : %d\n",min);//print min
printf("Sum : %d\n",sum); //print sum
printf("In ascending order : "); //print all values in ascending order
for(m=0;m<5;m++)
{
printf("%d ",num[m]);
}
}
I am new to C and all kinds of programming, and don't know how to search these kind of problems. I know my way of asking like this here is very inappropriate, and I sincerely apologize to people who are irritated by these types of questioning posts. But this is my best try, so please don't blame, but I'm willing to accept any kind of advice or tips.
Thank you.
When allocating on the stack, GCC targeting 64-bit (and probably Clang) will align stack allocations to 8 bytes.
For 32-bit targets, it's only going to use 4 bytes of padding.
So when you compiled your program for 64-bit, an extra four bytes was used to pad the stack. That's why when you accessed that last integer, it didn't segfault.
To see this in action, we'll create a test file.
void test_func() {
int n[4];
int b = 11;
for (int i = 0; i < 4; i++) {
n[i] = b;
}
}
And we'll compile it for 32-bit and 64-bit.
gcc -g -c -m64 test.c -o test_64.o
gcc -g -c -m32 test.c -o test_32.o
And now we'll print the disassembly for each.
objdump -S test_64.o >test_64_dis.txt
objdump -S test_32.o >test_32_dis.txt
Here's the contents of the 64-bit version.
test_64.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <func>:
void func() {
0: f3 0f 1e fa endbr64
4: 55 push %rbp
5: 48 89 e5 mov %rsp,%rbp
8: 48 83 ec 30 sub $0x30,%rsp
c: 64 48 8b 04 25 28 00 mov %fs:0x28,%rax
13: 00 00
15: 48 89 45 f8 mov %rax,-0x8(%rbp)
19: 31 c0 xor %eax,%eax
int n[4];
int b = 11;
1b: c7 45 dc 0b 00 00 00 movl $0xb,-0x24(%rbp)
for (int i = 0; i < 4; i++) {
22: c7 45 d8 00 00 00 00 movl $0x0,-0x28(%rbp)
29: eb 10 jmp 3b <func+0x3b>
n[i] = b;
2b: 8b 45 d8 mov -0x28(%rbp),%eax
2e: 48 98 cltq
30: 8b 55 dc mov -0x24(%rbp),%edx
33: 89 54 85 e0 mov %edx,-0x20(%rbp,%rax,4)
for (int i = 0; i < 4; i++) {
37: 83 45 d8 01 addl $0x1,-0x28(%rbp)
3b: 83 7d d8 03 cmpl $0x3,-0x28(%rbp)
3f: 7e ea jle 2b <func+0x2b>
}
}
41: 90 nop
42: 48 8b 45 f8 mov -0x8(%rbp),%rax
46: 64 48 33 04 25 28 00 xor %fs:0x28,%rax
4d: 00 00
4f: 74 05 je 56 <func+0x56>
51: e8 00 00 00 00 callq 56 <func+0x56>
56: c9 leaveq
57: c3 retq
Here's the 32-bit version.
test_32.o: file format elf32-i386
Disassembly of section .text:
00000000 <func>:
void func() {
0: f3 0f 1e fb endbr32
4: 55 push %ebp
5: 89 e5 mov %esp,%ebp
7: 83 ec 28 sub $0x28,%esp
a: e8 fc ff ff ff call b <func+0xb>
f: 05 01 00 00 00 add $0x1,%eax
14: 65 a1 14 00 00 00 mov %gs:0x14,%eax
1a: 89 45 f4 mov %eax,-0xc(%ebp)
1d: 31 c0 xor %eax,%eax
int n[4];
int b = 11;
1f: c7 45 e0 0b 00 00 00 movl $0xb,-0x20(%ebp)
for (int i = 0; i < 4; i++) {
26: c7 45 dc 00 00 00 00 movl $0x0,-0x24(%ebp)
2d: eb 0e jmp 3d <func+0x3d>
n[i] = b;
2f: 8b 45 dc mov -0x24(%ebp),%eax
32: 8b 55 e0 mov -0x20(%ebp),%edx
35: 89 54 85 e4 mov %edx,-0x1c(%ebp,%eax,4)
for (int i = 0; i < 4; i++) {
39: 83 45 dc 01 addl $0x1,-0x24(%ebp)
3d: 83 7d dc 03 cmpl $0x3,-0x24(%ebp)
41: 7e ec jle 2f <func+0x2f>
}
}
43: 90 nop
44: 8b 45 f4 mov -0xc(%ebp),%eax
47: 65 33 05 14 00 00 00 xor %gs:0x14,%eax
4e: 74 05 je 55 <func+0x55>
50: e8 fc ff ff ff call 51 <func+0x51>
55: c9 leave
56: c3 ret
Disassembly of section .text.__x86.get_pc_thunk.ax:
00000000 <__x86.get_pc_thunk.ax>:
0: 8b 04 24 mov (%esp),%eax
3: c3 ret
You can see the compiler is generating 24 bytes and then 20 bytes respectively, if you look right after the variable declarations.
Regarding advice/tips you asked for, a good starting point would be to enable all compiler warnings and treat them as errors. In GCC and Clang, you'd use the -Wall -Wextra -Werror -Wfatal-errors.
I wouldn't recommend this if you're using the MSVC compiler, though, which often issues warnings about declarations from the header files it's distributed with.
Other answers cover what might he actually happening, by analyzing the generated assembly, but the really relevant explanation is: Indexing out of array bounds is Undefined Behavior in C. And that's kinda the end of story.
UB means, the code is "allowed" to do anything by C standard. It could do different thing every time it is run. It could do what you want it to do with no ill effects. It might do what you want, but then something completely unrelated behaves in a funny way. Compiler, operating system, or even phase of the moon could make a difference. Or not.
It is generally not useful to think about what actually happens with Undefined Behavior at C level. You can of course produce the assembly output of a particular compilation, and inspect what it does, but that is result of that one compilation. A new compilation might change things (even if you just do new build at different time, because value of __TIME__ macro depends on time...).
Consider the following code:
bool isFoo(const char* bar) {
return !strcmp(bar, "some_long_complicated_name");
}
Here, the string literal "some_long_complicated_name" is immediately passed to strcmp. Does this mean that everytime isFoo is called, accordingly many bytes of this string literal is allocated on that stack frame? If that was the case, wouldn't this:
const char FOO_NAME[] = "some_long_complicated_name";
bool isFoo(const char* bar) {
return !strcmp(bar, FOO_NAME);
}
be more efficient?
No, they are not inefficient. They are usually placed in the read-only memory part of your compiled binary, as their size is known at compile time and they can't be modified during runtime.
The expensive parts of strings (in terms of runtime performance) is the memory allocation. In both versions of isFoo, there is no memory allocation taking place, so I'd assume that it's quite hard to measure a performance difference between the two. FOO_NAME technically occupies some bytes somewhere, but is likely to be optimized away by the compiler.
Here are both versions on compiler explorer. The assembly with -O3 is not identical, but to be honest, I am not able to further exploit these results.
Constant strings do not get allocated, they are merely stored within the compiled binary and accessed via pointer. So no, there is no difference in speed between either approach.
There is absolute no change in the compiled file. It will result in the exact same binary!
If you compile both versions in a single executable like this:
bool isFoo(const char* bar) {
return !strcmp(bar, "some_long_complicated_name");
}
const char FOO_NAME[] = "some_long_complicated_name";
bool isFoo2(const char* bar) {
return !strcmp(bar, FOO_NAME);
}
int main()
{
isFoo( "nnn" );
isFoo2( "nnn" );
}
You can investigate the binary:
0000000000401156 <isFoo(char const*)>:
401156: 55 push %rbp
401157: 48 89 e5 mov %rsp,%rbp
40115a: 48 83 ec 10 sub $0x10,%rsp
40115e: 48 89 7d f8 mov %rdi,-0x8(%rbp)
401162: 48 8b 45 f8 mov -0x8(%rbp),%rax
401166: be c0 20 40 00 mov $0x4020c0,%esi
40116b: 48 89 c7 mov %rax,%rdi
40116e: e8 cd fe ff ff callq 401040 <strcmp#plt>
401173: 85 c0 test %eax,%eax
401175: 0f 94 c0 sete %al
401178: c9 leaveq
401179: c3 retq
000000000040117a <isFoo2(char const*)>:
40117a: 55 push %rbp
40117b: 48 89 e5 mov %rsp,%rbp
40117e: 48 83 ec 10 sub $0x10,%rsp
401182: 48 89 7d f8 mov %rdi,-0x8(%rbp)
401186: 48 8b 45 f8 mov -0x8(%rbp),%rax
40118a: be e0 20 40 00 mov $0x4020e0,%esi
40118f: 48 89 c7 mov %rax,%rdi
401192: e8 a9 fe ff ff callq 401040 <strcmp#plt>
401197: 85 c0 test %eax,%eax
401199: 0f 94 c0 sete %al
40119c: c9 leaveq
40119d: c3 retq
and here the strings are located:
4020c0 736f6d65 5f6c6f6e 675f636f 6d706c69 some_long_compli
4020d0 63617465 645f6e61 6d650000 00000000 cated_name......
4020e0 736f6d65 5f6c6f6e 675f636f 6d706c69 some_long_compli
4020f0 63617465 645f6e61 6d65006e 6e6e00 cated_name.nnn.
You also see the "nnn" string here!
The output was generated with:
objdump -s -S go | c++filt > x
Attention: You have to compile with -O0 as otherwise the compiler is smart enough to do all the stuff already in compile time. If I use -O2 none of the strings can be seen anymore and all call results are present already in the binary. Good to see how much a compiler can do in compile time!
So exactly NO difference, exactly the same binary code. But with standard optimization, no code generated for string compare, already done in compile time!
I modified main to see that the result of the comparison is used somewhere with:
int main()
{
volatile bool x;
x = isFoo( "nnn" );
x = isFoo2( "nnn" );
}
The resulting binary:
0000000000401060 <main>:
}
int main()
{
volatile bool x;
x = isFoo( "nnn" );
401060: c6 44 24 ff 00 movb $0x0,-0x1(%rsp)
x = isFoo2( "nnn" );
}
401065: 31 c0 xor %eax,%eax
x = isFoo2( "nnn" );
401067: c6 44 24 ff 00 movb $0x0,-0x1(%rsp)
}
40106c: c3 retq
As you can see, the result of the comparison is already present in the compiled code. No string is compared anymore in runtime.
For all questions regarding speed and memory usage: Measure! As you can see in the example, the results are different to most assumptions we see in other answers. If speed or memory footprint is really important: Take a look on the compiler generated results. Mostly it is much more perfect as you think!
i am write a mini os. And when i write this code to show time clock, its goes wrong
7 void timer_callback(pt_regs *regs)
8 {
9 static uint32_t tick = 0;
10 printf("Tick: %dtimes\n", tick);
11 tick++;
12 }
tick is initialise not with 0, but 1818389861. but if tick init with 0x01 or anything else zero, it's ok!!!
so i wirte a simple c file then objdump:
staic.o: file format elf32-i386
Disassembly of section .text:
00000000 <main>:
extern void printf(char *, int);
int main(){
0: 8d 4c 24 04 lea 0x4(%esp),%ecx
4: 83 e4 f0 and $0xfffffff0,%esp
7: ff 71 fc pushl -0x4(%ecx)
a: 55 push %ebp
b: 89 e5 mov %esp,%ebp
d: 51 push %ecx
e: 83 ec 04 sub $0x4,%esp
static int a = 1;
printf("%d\n", a);
11: a1 00 00 00 00 mov 0x0,%eax
16: 83 ec 08 sub $0x8,%esp
19: 50 push %eax
1a: 68 00 00 00 00 push $0x0
1f: e8 fc ff ff ff call 20 <main+0x20>
24: 83 c4 10 add $0x10,%esp
return 0;
27: b8 00 00 00 00 mov $0x0,%eax
}
2c: 8b 4d fc mov -0x4(%ebp),%ecx
2f: c9 leave
30: 8d 61 fc lea -0x4(%ecx),%esp
33: c3 ret
so strange, no memory used!!!
Update: let me say it clearly
the second static.c is an experiment, it was thought it show no memory used, but i was wrong, mov 0x0 %eab is. i confuse 0x0 and $0x0 /..\
my origin problem is why tick not succeed init with 0.(but can init with 1 or anyelsenumber).
i look up it again use gdb, ok, it do use memory like mov
eax,ds:0x106010,but the real strong thing is the memory x 0x106010 is not 0,but it should be, just as i said, if i let tick = 1 or anythingelse, memory do init as i want, that is the strange thing!
the tool: gdb ,objdump return different asm(different means,not formate),because, just learn os,not good at c, so i let it go,ignore it....
Memory is used, be sure of that; however, you won't find that memory in the .text section. Memory for static variables is allocated in either .bss (when zero-initialized; or, in case of C++, dynamically initialized) or .data (when non-zero initialized) section.
When dumping object files with objdump using the -d (disassembly) option, it is important to also use the -r (relocations) option. Without that, the disassembly you get is deceiving and makes little sense.
In your case, the instruction at addresses 11 and 1f must have relocations, at address 11, to the variable a and at address 1f, to the function printf. The instruction at address 11 loads the value from your variable a, without proper relocations it looks as if it loaded a value from address 0.
As to your original question, the value you get, 1818389861, or 0x6C626D65, is quite remarkable. I would bet that somewhere in your program you have a buffer overrun involving a string containing the subsequence embl.
As a side note, I would like to call your attention to the use of correct type specifications in printf calls. The type specification %d corresponds to the type int; on all modern mainstream architectures, int and int32_t are of the same size. However, that is not guaranteed to always be so. There are special type specifications for use with explicitly-sized types, for example, for an int32_t you use "PRId32":
uint32_t x;
printf("%"PRId32, x);
Assembly newbie here... I wrote the following simple C program:
void fun(int x, int* y)
{
char arr[4];
int* sp;
sp = y;
}
int main()
{
int i = 4;
fun(i, &i);
return 0;
}
I compiled it with gcc and ran objdump with -S, but the Assembly code output is confusing me:
000000000040055d <fun>:
void fun(int x, int* y)
{
40055d: 55 push %rbp
40055e: 48 89 e5 mov %rsp,%rbp
400561: 48 83 ec 30 sub $0x30,%rsp
400565: 89 7d dc mov %edi,-0x24(%rbp)
400568: 48 89 75 d0 mov %rsi,-0x30(%rbp)
40056c: 64 48 8b 04 25 28 00 mov %fs:0x28,%rax
400573: 00 00
400575: 48 89 45 f8 mov %rax,-0x8(%rbp)
400579: 31 c0 xor %eax,%eax
char arr[4];
int* sp;
sp = y;
40057b: 48 8b 45 d0 mov -0x30(%rbp),%rax
40057f: 48 89 45 e8 mov %rax,-0x18(%rbp)
}
400583: 48 8b 45 f8 mov -0x8(%rbp),%rax
400587: 64 48 33 04 25 28 00 xor %fs:0x28,%rax
40058e: 00 00
400590: 74 05 je 400597 <fun+0x3a>
400592: e8 a9 fe ff ff callq 400440 <__stack_chk_fail#plt>
400597: c9 leaveq
400598: c3 retq
0000000000400599 <main>:
int main()
{
400599: 55 push %rbp
40059a: 48 89 e5 mov %rsp,%rbp
40059d: 48 83 ec 10 sub $0x10,%rsp
int i = 4;
4005a1: c7 45 fc 04 00 00 00 movl $0x4,-0x4(%rbp)
fun(i, &i);
4005a8: 8b 45 fc mov -0x4(%rbp),%eax
4005ab: 48 8d 55 fc lea -0x4(%rbp),%rdx
4005af: 48 89 d6 mov %rdx,%rsi
4005b2: 89 c7 mov %eax,%edi
4005b4: e8 a4 ff ff ff callq 40055d <fun>
return 0;
4005b9: b8 00 00 00 00 mov $0x0,%eax
}
4005be: c9 leaveq
4005bf: c3 retq
First, in the line:
400561: 48 83 ec 30 sub $0x30,%rsp
Why is the stack pointer decremented so much in the call to 'fun' (48 bytes)? I assume it has to do with alignment issues, but I cannot visualize why it would need so much space (I only count 12 bytes for local variables (assuming 8 byte pointers))?
Second, I thought that in x86_64, the arguments to a function are either stored in specific registers, or if there are a lot of them, just 'above' (with a downward growing stack) the base pointer, %rbp. Like in the picture at http://en.wikipedia.org/wiki/Call_stack#Structure except 'upside-down'.
But the lines:
400565: 89 7d dc mov %edi,-0x24(%rbp)
400568: 48 89 75 d0 mov %rsi,-0x30(%rbp)
suggest to me that they are being stored way down from the base of the stack (%rsi and %edi are where main put the arguments, right before calling 'fun', and 0x30 down from %rbp is exactly where the stack pointer is pointing...). And when I try to do stuff with them , like assigning their values to local variables, it grabs them from those locations near the head of the stack:
sp = y;
40057b: 48 8b 45 d0 mov -0x30(%rbp),%rax
40057f: 48 89 45 e8 mov %rax,-0x18(%rbp)
... what is going on here?! I would expect them to grab the arguments from either the registers they were stored in, or just above the base pointer, where I thought they are 'supposed to be', according to every basic tutorial I read. Every answer and post I found on here related to stack frame questions confirms my understanding of what stack frames "should" look like, so why is my Assembly output so darn weird?
Because that stuff is a hideously simplified version of what really goes on. It's like wondering why Newtonian mechanics doesn't model the movement of the planets down to the millimeter. Compilers need stack space for all sorts of things. For example, saving callee-saved registers.
Also, the fundamental fact is that debug-mode compilations contain all sorts of debugging and checking machinery. The compiler outputs all sorts of code that checks that your code is correct, for example the call to __stack_chk_fail.
There are only two ways to understand the output of a given compiler. The first is to implement the compiler, or be otherwise very familiar with the implementation. The second is to accept that whatever you understand is a gross simplification. Pick one.
Because you're compiling without optimization, the compiler does lots of extra stuff to maybe make things easier to debug, which use lots of extra space.
it does not attempt to compress the stack frame to reuse memory for anything, or get rid of any unused things.
it redundantly copies the arguments into the stack frame (which requires still more memory)
it copies a 'canary' on to the stack to guard against stack smashing buffer overflows (even though they can't happen in this code).
Try turning on optimization, and you'll see more real code.
This is 64 bit code. 0x30 of stack space corresponds to 6 slots on the stack. You have what appears to be:
2 slots for function arguments (which happen also to be passed in registers)
2 slots for local variables
1 slot for saving the AX register
1 slot looks like a stack guard, probably related to DEBUG mode.
Best thing is to experiment rather than ask questions. Try compiling in different modes (DEBUG, optimisation, etc), and with different numbers and types of arguments and variables. Sometimes asking other people is just too easy -- you learn better by doing your own experiments.
I'm doing some experimenting with x86-64 assembly. Having compiled this dummy function:
long myfunc(long a, long b, long c, long d,
long e, long f, long g, long h)
{
long xx = a * b * c * d * e * f * g * h;
long yy = a + b + c + d + e + f + g + h;
long zz = utilfunc(xx, yy, xx % yy);
return zz + 20;
}
With gcc -O0 -g I was surprised to find the following in the beginning of the function's assembly:
0000000000400520 <myfunc>:
400520: 55 push rbp
400521: 48 89 e5 mov rbp,rsp
400524: 48 83 ec 50 sub rsp,0x50
400528: 48 89 7d d8 mov QWORD PTR [rbp-0x28],rdi
40052c: 48 89 75 d0 mov QWORD PTR [rbp-0x30],rsi
400530: 48 89 55 c8 mov QWORD PTR [rbp-0x38],rdx
400534: 48 89 4d c0 mov QWORD PTR [rbp-0x40],rcx
400538: 4c 89 45 b8 mov QWORD PTR [rbp-0x48],r8
40053c: 4c 89 4d b0 mov QWORD PTR [rbp-0x50],r9
400540: 48 8b 45 d8 mov rax,QWORD PTR [rbp-0x28]
400544: 48 0f af 45 d0 imul rax,QWORD PTR [rbp-0x30]
400549: 48 0f af 45 c8 imul rax,QWORD PTR [rbp-0x38]
40054e: 48 0f af 45 c0 imul rax,QWORD PTR [rbp-0x40]
400553: 48 0f af 45 b8 imul rax,QWORD PTR [rbp-0x48]
400558: 48 0f af 45 b0 imul rax,QWORD PTR [rbp-0x50]
40055d: 48 0f af 45 10 imul rax,QWORD PTR [rbp+0x10]
400562: 48 0f af 45 18 imul rax,QWORD PTR [rbp+0x18]
gcc very strangely spills all argument registers onto the stack and then takes them from memory for further operations.
This only happens on -O0 (with -O1 there are no problems), but still, why? This looks like an anti-optimization to me - why would gcc do that?
I am by no means a GCC internals expert, but I'll give it a shot. Unfortunately most of the information on GCCs register allocation and spilling seems to be out of date (referencing files like local-alloc.c that don't exist anymore).
I'm looking at the source code of gcc-4.5-20110825.
In GNU C Compiler Internals it is mentioned that the initial function code is generated by expand_function_start in gcc/function.c. There we find the following for handling parameters:
4462 /* Initialize rtx for parameters and local variables.
4463 In some cases this requires emitting insns. */
4464 assign_parms (subr);
In assign_parms the code that handles where each arguments is stored is the following:
3207 if (assign_parm_setup_block_p (&data))
3208 assign_parm_setup_block (&all, parm, &data);
3209 else if (data.passed_pointer || use_register_for_decl (parm))
3210 assign_parm_setup_reg (&all, parm, &data);
3211 else
3212 assign_parm_setup_stack (&all, parm, &data);
assign_parm_setup_block_p handles aggregate data types and is not applicable in this case and since the data is not passed as a pointer GCC checks use_register_for_decl.
Here the relevant part is:
1972 if (optimize)
1973 return true;
1974
1975 if (!DECL_REGISTER (decl))
1976 return false;
DECL_REGISTER tests whether the variable was declared with the register keyword. And now we have our answer: Most parameters live on the stack when optimizations are not enabled, and are then handled by assign_parm_setup_stack. The route taken through the source code before it ends up spilling the value is slightly more complicated for pointer arguments, but can be traced in the same file if you're curious.
Why does GCC spill all arguments and local variables with optimizations disabled? To help debugging. Consider this simple function:
1 extern int bar(int);
2 int foo(int a) {
3 int b = bar(a | 1);
4 b += 42;
5 return b;
6 }
Compiled with gcc -O1 -c this generates the following on my machine:
0: 48 83 ec 08 sub $0x8,%rsp
4: 83 cf 01 or $0x1,%edi
7: e8 00 00 00 00 callq c <foo+0xc>
c: 83 c0 2a add $0x2a,%eax
f: 48 83 c4 08 add $0x8,%rsp
13: c3 retq
Which is fine except if you break on line 5 and try to print the value of a, you get
(gdb) print a
$1 = <value optimized out>
As the argument gets overwritten since it's not used after the call to bar.
A couple of reasons:
In the general case, an argument to a function has to be treated like a local variable because it could be stored to or have its address taken within the function. Therefore, it is simplest to just allocate a stack slot for every arguments.
Debug information becomes much simpler to emit with stack locations: the argument's value is always at some specific location, instead of moving around between registers and memory.
When you're looking at -O0 code in general, consider that the compiler's top priorities are reducing compile-time as much as possible and generating high-quality debugging information.