Why does the stack behave so strangely? - c

I noticed strange behavior when a function takes a plain struct as an argument, like this:
struct Foo
{
int a;
int b;
};
int foo(struct Foo d)
{
return d.a;
}
Compiling this for ARM Cortex-M3 using GCC 10.2 with -Os (or any other optimization level):
arm-none-eabi-gcc.exe -Os -mcpu=cortex-m3 -o test2.c.obj -c test2.c
generates code where the argument struct's data is saved on the stack for no apparent reason.
Disassembly of section .text:
00000000 <foo>:
0: b082 sub sp, #8
2: ab02 add r3, sp, #8
4: e903 0003 stmdb r3, {r0, r1}
8: b002 add sp, #8
a: 4770 bx lr
Why is the struct's data saved on the stack? The function never uses it.
If I compile this code for the RISC-V architecture, it gets even more interesting:
Disassembly of section .text:
00000000 <foo>:
0: 1141 addi sp,sp,-16
2: 0141 addi sp,sp,16
4: 8082 ret
Here the stack pointer just moves forward and back again. Why? What is the reason?

The optimizer just doesn't "optimize it away", probably because it's relying on a later part of the optimizer to handle it.
Try changing the code to
struct Foo
{
int a;
int b;
};
extern int extern_bar(struct Foo d);
int bar(struct Foo d)
{
return d.a;
}
#include <stdlib.h>
#include <stdio.h>
int main()
{
struct Foo baz;
baz.a = rand();
baz.b = rand();
printf("%d",bar(baz));
printf("%d",extern_bar(baz));
return bar(baz);
}
and compiling it at godbolt.org for the different architectures (make sure to set -Os).
You can see that in many cases it completely optimizes away the call to bar and just uses the value in the register. Although it isn't shown here, the linker could completely cull the function body of bar because it's unnecessary.
The call to extern_bar is still there because the compiler can't know what's going on inside of it, so it dutifully does what it needs to do to pass the struct by value according to the architecture ABI (most architectures push the struct on the stack). That means the function must copy it off the stack.
Apparently the RISC-V EABI is different and passes smaller structs by value in registers. I guess it just has a built-in prologue/epilogue to push and pop the stack, and the optimizer doesn't trim it away because it's a sort of edge case.
Or, who knows.
But the short of it is: if size and cycles REALLY matter, don't trust the compiler. Also don't trust the compiler to keep doing what it's doing. Changing revisions of the toolchain is asking for slight differences in code generation. Even without changing the toolchain revision, you could end up with different code based on heuristics you just aren't privy to or don't realize exist.
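If the by-value copy really matters, one option is to stop depending on the optimizer and pass a pointer instead. This is a minimal sketch, assuming the caller can keep the struct alive for the duration of the call; foo_by_ptr is a hypothetical name, not something from the question:

```c
struct Foo
{
    int a;
    int b;
};

/* Hypothetical variant of foo(): it takes a pointer, so the callee does
   not need its own by-value copy of the struct. */
int foo_by_ptr(const struct Foo *d)
{
    return d->a;
}
```

Whether this actually produces smaller code depends on the target and compiler version, so it is worth looking at the disassembly again after the change.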

According to the AAPCS standard defined by Arm, composite types are passed to functions on the stack. Refer to section 6.4 of the document below for details.
https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#64parameter-passing

Related

Abnormal presence of code after __stack_chk_fail

I have a 64-bit ELF binary. I don't have its source code, don't know with which parameters it was compiled, and am not allowed to provide it here. The only relevant information I have is that the source is a .c file (so no hand-crafted assembly), compiled through a Makefile.
While reversing this binary using IDA, I stumbled upon an extremely weird construction I have never seen before and absolutely cannot explain. Here is the raw disassembly of one function in IDA syntax:
mov rax, [rsp+var_20]
xor rax, fs:28h
jnz location
add rsp, 28h
pop rbx
pop rbp
retn
location:
call __stack_chk_fail
nop dword ptr [rax]
db 2Eh
nop word ptr [rax+rax+00000000h]
...then dozens of instructions of normal and functional code
Here, we have a simple canary check, where we return if it is valid, and call __stack_chk_fail otherwise. Everything is perfectly normal. But after this check there is more assembly, and it is fully functional code.
Looking at the manual of __stack_chk_fail, I made sure that this function does exit the program, and that there is no edge case where it could continue:
Description
The interface __stack_chk_fail() shall abort the function that called it with a message that a stack overflow has been detected. The program that called the function shall then exit.
I also tried to write this small C program, to search for a method to reproduce this:
#include <stdio.h>
#include <stdlib.h>
int foo()
{
int a = 3;
printf("%d\n", a);
return 0;
int b = 7;
printf("%d\n", b);
}
int main()
{
foo();
return 0;
}
But the code after the return is simply omitted by gcc.
It does not appear either that my binary is vulnerable to a buffer overflow that I could exploit to control rip and jump to the code after the canary check. I also inspected every call and jump using objdump, and this code seems to never be called.
Could someone explain what is going on? How was this code generated in the first place? Is it a joke from the author of the binary?
I suspect you are looking at padding, followed by an unrelated function that IDA does not have a name for.
To test this hypothesis, I need the following additional information:
The address of the byte immediately after call __stack_chk_fail.
The next higher address that is the target of a call or jump instruction.
A raw hex dump of the bytes in between those two addresses.
The disassembly of four or five instructions starting at the next higher address that is the target of a call or jump instruction.
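Compilers commonly align function entry points and fill the gap with multi-byte NOPs, which would fit the nop instructions seen after the call. A rough sketch for checking whether the code that follows the padding begins on an aligned boundary; the 16-byte alignment and the sample addresses are assumptions for illustration, not values taken from the binary in question:

```c
#include <stdint.h>

/* Round addr up to the next multiple of alignment.
   alignment must be a power of two. */
uint64_t align_up(uint64_t addr, uint64_t alignment)
{
    return (addr + alignment - 1) & ~(alignment - 1);
}
```

If align_up applied to the address right after call __stack_chk_fail equals the address where the "normal" code starts, that supports the padding hypothesis.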

STM32 MCU GCC Compilation behavior

I am confused about how GCC for MCUs compiles functions that return something other than a 32-bit value.
MCU: STM32 L0 Series (STM32L083)
GCC : gcc version 7.3.1 20180622 (release) [ARM/embedded-7-branch revision 261907] (GNU Tools for Arm Embedded Processors 7-2018-q2-update)
My code is optimized for size (with the -Os option). As I understand it, this lets GCC implicitly use -fshort-enums in order to pack enums.
I have two enum variables, each 1 byte wide:
enum eRadioMode radio_mode // (@ 0x20003200)
enum eRadioFunction radio_func // (@ 0x20003201)
And a function :
enum eRadioMode radio_get_mode(enum eRadioFunction _radio_func);
When I call this code:
radio_mode = radio_get_mode(radio_func);
it produces this assembly at compile time:
; At this point :
; r4 value is 0x20003201 (Address of radio_func)
7820 ldrb r0, [r4, #0] ; GCC correctly treats r4 as a pointer to a 1-byte-wide variable, no problem here
f7ff ffcd bl 80098a8 <radio_get_mode> ; Call to radio_get_mode()
4d1e ldr r5, [pc, #120] ; r5 is loaded with 0x20003200 (Address of radio_mode)
6028 str r0, [r5, #0] ; Why does GCC use 'str' and not 'strb' at this point?
The last line here is the problem: the value of r0, the return value of radio_get_mode(), is stored to the address pointed to by r5 as a 32-bit value.
Since radio_func is 1 byte after radio_mode, its value is overwritten by the second byte of r0 (which is always 0x00 since the enum is only 1 byte wide).
As my function radio_get_mode is declared as returning a single byte, why doesn't GCC use the strb instruction to store this single byte to the address pointed to by r5?
I have tried :
Declaring radio_get_mode() as returning uint8_t : uint8_t radio_get_mode(enum eRadioFunction _radio_func);
Forcing a cast to uint8_t : radio_mode = (uint8_t)radio_get_mode(radio_func);
Going through a third variable (but GCC eliminates that useless move at compile time; not so dumb) :
uint32_t r = radio_get_mode(radio_func);
radio_mode = (uint8_t) r;
But none of these solutions work.
Since size optimization (-Os) is needed first and foremost to reduce ROM usage (not RAM, at this stage of my project), I found that the workaround option -fno-short-enums makes the compiler use 4 bytes per enum, which incidentally removes any overlapping memory in this case.
But, in my opinion, this is a dirty way to hide a real problem:
Is GCC unable to correctly handle return sizes other than 32 bits?
Is there a correct way to do this?
Thanks in advance.
EDIT :
I did NOT use -fshort-enums at any point.
I'm sure that these enums have no value greater than 0xFF.
I have tried declaring radio_mode and radio_func as uint8_t (aka unsigned char): the problem is the same.
When compiled with -Os, Output.map is as follows:
Common symbol size file
...
radio_mode 0x1 src/radio/radio.o
radio_func 0x1 src/radio/radio.o
...
...
...
Section address label
0x2000319c radio_state
0x20003200 radio_mode
0x20003201 radio_func
0x20003202 radio_protocol
...
The map file output clearly shows that radio_mode and radio_func are 1 byte wide and at consecutive addresses.
When compiled without -Os, Output.map clearly shows that enums become 4 bytes wide (with addresses padded to 4).
Compiling with -Os and -fno-short-enums does the same thing as compiling without -Os for all enums (this is why I guess -Os implies -fshort-enums).
I will try to provide a minimal reproducible example.
My analysis is that I'm pretty sure it is a compiler bug. To me, this is clearly a memory overlap. My question is more about the best thing to do to avoid this, in the "best practice" sense.
EDIT 2
My bad: I re-tested after changing all signatures to uint8_t (aka unsigned char) and it works well.
@Peter Cordes seems to have found the problem here: -Os partly enables -fshort-enums, getting some parts of GCC to treat the enum as size 1 and other parts to treat it as size 4.
The ASM code using only uint8_t is:
; Same position as before
7820 ldrb r0, [r4, #0]
f7ff ffcd bl 80098a8 <radio_get_mode>
4d1e ldr r5, [pc, #120]
7028 strb r0, [r5, #0] ; Yes! GCC uses 'strb' and not 'str' like before!
To clarify :
There seems to be a compiler bug when using -Os and enums. It is bad luck that the two enums are at consecutive addresses, so the stores overlap.
Using -fno-short-enums in conjunction with -Os appears to be a good workaround IMO, since the problem concerns only enums and not 1-byte variables in general.
Thanks again.
The ARM port's ABI defines enums as a variable-sized type for none-eabi targets and as a fixed-size type for linux-eabi targets.
That is the reason for the behaviour you observe. It is not related to the optimisation.
In this example you can see how it works. https://godbolt.org/z/-mY_WY
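Independently of which ABI variant is in effect, fixed-width integer types keep the storage size explicit, which is what the uint8_t experiment in EDIT 2 relies on. A minimal sketch under that assumption; the enumerator names, the struct, and neighbour_preserved are invented for illustration and do not come from the original project:

```c
#include <stdint.h>

/* Hypothetical enumerators; the original code does not show them. */
enum eRadioMode { RADIO_MODE_A, RADIO_MODE_B };

/* Two adjacent one-byte objects, mirroring radio_mode and radio_func. */
struct radio_state_bytes {
    uint8_t mode;
    uint8_t func;
};

/* Returns uint8_t rather than the enum type, so the size of the result
   does not depend on how the ABI sizes enums. */
uint8_t radio_get_mode(uint8_t func)
{
    return (func == 0) ? (uint8_t)RADIO_MODE_A : (uint8_t)RADIO_MODE_B;
}

/* Stores the result into the first byte and reports whether the
   neighbouring byte kept its value. */
int neighbour_preserved(uint8_t func_value)
{
    struct radio_state_bytes s = { 0, func_value };
    s.mode = radio_get_mode(s.func);
    return s.func == func_value;
}
```

The enum constants are still usable as names for the values; only the storage and the function signature use uint8_t.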

Saving value of stack pointer in C variable in Code Composer studio for ARM cortex M4f

I would like to know a method that can store the value of the stack pointer onto a variable in C.
I find inline asm to be useless as it is so compiler-specific, especially for something like this; just use an asm function. For the GNU assembler:
.thumb
.thumb_func
.globl GETSP
GETSP:
mov r0,sp
bx lr
In C:
extern unsigned int GETSP ( void );
...
unsigned int sp;
...
sp=GETSP();
Understand that each place you use this will give the same value every time. With many compilers, sp stays the same across the whole body of a function; if that function is called from different functions, then you might see the sp value vary.
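For compilers that support the GNU C builtins (GCC and Clang do; whether the Code Composer Studio toolchain accepts it is an assumption to verify), there is also a way to get an approximate value without writing any assembly. This sketch returns the frame address of the current function, which is close to the stack pointer but not guaranteed to be equal to it; approx_stack_pointer is a hypothetical name:

```c
#include <stdint.h>

/* Returns the frame address of this function. This is a GCC/Clang
   builtin, not ISO C, and it is only an approximation of sp. */
uintptr_t approx_stack_pointer(void)
{
    return (uintptr_t)__builtin_frame_address(0);
}
```

If the exact register value is needed, the GETSP assembly function above remains the more direct route.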

Swap with push / assignment / pop in GNU C inline assembly?

I was reading some answers and questions on here and kept coming across this suggestion, but I noticed no one ever explained exactly what you need to do, on Windows using Intel and the GCC compiler. Commented below is exactly what I am trying to do.
#include <stdio.h>
int main()
{
int x = 1;
int y = 2;
//assembly code begin
/*
push x into stack; < Need Help
x=y; < With This
pop stack into y; < Please
*/
//assembly code end
printf("x=%d,y=%d",x,y);
getchar();
return 0;
}
You can't just push/pop safely from inline asm if it's going to be portable to systems with a red-zone. That includes every non-Windows x86-64 platform. (There's no way to tell gcc you want to clobber it.) Well, you could add rsp, -128 first to skip past the red-zone before pushing/popping anything, then restore it later. But then you can't use an "m" constraint, because the compiler might use RSP-relative addressing with offsets that assume RSP hasn't been modified.
But really this is a ridiculous thing to be doing in inline asm.
Here's how you use inline-asm to swap two C variables:
#include <stdio.h>
int main()
{
int x = 1;
int y = 2;
asm("" // no actual instructions.
: "=r"(y), "=r"(x) // request both outputs in the compiler's choice of register
: "0"(x), "1"(y) // matching constraints: request each input in the same register as the other output
);
// apparently "=m" doesn't compile: you can't use a matching constraint on a memory operand
printf("x=%d,y=%d\n",x,y);
// getchar(); // Set up your terminal not to close after the program exits if you want similar behaviour: don't embed it into your programs
return 0;
}
gcc -O3 output (targeting the x86-64 System V ABI, not Windows) from the Godbolt compiler explorer:
.section .rodata
.LC0:
.string "x=%d,y=%d"
.section .text
main:
sub rsp, 8
mov edi, OFFSET FLAT:.LC0
xor eax, eax
mov edx, 1
mov esi, 2
#APP
# 8 "/tmp/gcc-explorer-compiler116814-16347-5i3lz1/example.cpp" 1
# I used "\n" instead of just "" so we could see exactly where our inline-asm code ended up.
# 0 "" 2
#NO_APP
call printf
xor eax, eax
add rsp, 8
ret
C variables are a high level concept; it doesn't cost anything to decide that the same registers now logically hold different named variables, instead of swapping the register contents without changing the varname->register mapping.
When hand-writing asm, use comments to keep track of the current logical meaning of different registers, or parts of a vector register.
The inline-asm didn't lead to any extra instructions outside the inline-asm block either, so it's perfectly efficient in this case. Still, the compiler can't see through it, and doesn't know that the values are still 1 and 2, so further constant-propagation would be defeated. https://gcc.gnu.org/wiki/DontUseInlineAsm
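For comparison, the plain C version of the swap needs no inline asm at all, and the compiler remains free to constant-propagate through it. A minimal sketch; swap_ints is a hypothetical helper name:

```c
/* Swap two ints through a temporary. With optimization enabled the
   compiler is free to turn this into a simple renaming of registers. */
void swap_ints(int *a, int *b)
{
    int tmp = *a;
    *a = *b;
    *b = tmp;
}
```

Calling swap_ints(&x, &y) in the original main() would print x=2,y=1.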
#include <stdio.h>
int main()
{
int x=1;
int y=2;
printf("x::%d,y::%d\n",x,y);
__asm__( "movl %1, %%eax;"
"movl %%eax, %0;"
:"=r"(y)
:"r"(x)
:"%eax"
);
printf("x::%d,y::%d\n",x,y);
return 0;
}
/* Load x to eax
Load eax to y */
If you want to exchange the values, it can also be done this way. Please note that this instructs GCC to take care of the clobbered EAX register. For educational purposes, it is okay, but I find it more suitable to leave micro-optimizations to the compiler.
You can use extended inline assembly. It is a compiler feature which allows you to write assembly instructions within your C code. A good reference for inline gcc assembly is available here.
The following code copies the value of x into y using pop and push instructions.
( compiled and tested using gcc on x86_64 )
This is only safe if compiled with -mno-red-zone, or if you subtract 128 from RSP before pushing anything. It will happen to work without problems in some functions: testing with one set of surrounding code is not sufficient to verify the correctness of something you did with GNU C inline asm.
#include <stdio.h>
int main()
{
int x = 1;
int y = 2;
asm volatile (
"pushq %%rax\n" /* Push x into the stack */
"movq %%rbx, %%rax\n" /* Copy y into x */
"popq %%rbx\n" /* Pop x into y */
: "=b"(y), "=a"(x) /* OUTPUT values */
: "a"(x), "b"(y) /* INPUT values */
: /*No need for the clobber list, since the compiler knows
which registers have been modified */
);
printf("x=%d,y=%d",x,y);
getchar();
return 0;
}
Result x=2 y=1, as you expected.
The Intel compiler works in a similar way; I think you just have to change the keyword asm to __asm__. You can find info about inline assembly for the Intel compiler here.

Is GCC broken when taking the address of an argument on ARM7TDMI?

My C code snippet takes the address of an argument and stores it in a volatile memory location (preprocessed code):
void foo(unsigned int x) {
*(volatile unsigned int*)(0x4000000 + 0xd4) = (unsigned int)(&x);
}
int main() {
foo(1);
while(1);
}
I used an SVN version of GCC for compiling this code. At the end of function foo I would expect to have the value 1 stored in the stack and, at 0x40000d4, an address pointing to that value. When I compile without optimizations using the flag -O0, I get the expected ARM7TDMI assembly output (commented for your convenience):
.align 2
.global foo
.type foo, %function
foo:
# Function supports interworking.
# args = 0, pretend = 0, frame = 8
# frame_needed = 0, uses_anonymous_args = 0
# link register save eliminated.
sub sp, sp, #8
str r0, [sp, #4] # 3. Store the argument on the stack
mov r3, #67108864
add r3, r3, #212
add r2, sp, #4 # 4. Address of the stack variable
str r2, [r3, #0] # 5. Store the address at 0x40000d4
add sp, sp, #8
bx lr
.size foo, .-foo
.align 2
.global main
.type main, %function
main:
# Function supports interworking.
# args = 0, pretend = 0, frame = 0
# frame_needed = 0, uses_anonymous_args = 0
stmfd sp!, {r4, lr}
mov r0, #1 # 1. Pass the argument in register 0
bl foo # 2. Call function foo
.L4:
b .L4
.size main, .-main
.ident "GCC: (GNU) 4.4.0 20080820 (experimental)"
It clearly stores the argument on the stack first and then stores the address of that stack slot at 0x40000d4. When I compile with optimizations using -O1, I get something unexpected:
.align 2
.global foo
.type foo, %function
foo:
# Function supports interworking.
# args = 0, pretend = 0, frame = 8
# frame_needed = 0, uses_anonymous_args = 0
# link register save eliminated.
sub sp, sp, #8
mov r2, #67108864
add r3, sp, #4 # 3. Address of *something* on the stack
str r3, [r2, #212] # 4. Store the address at 0x40000d4
add sp, sp, #8
bx lr
.size foo, .-foo
.align 2
.global main
.type main, %function
main:
# Function supports interworking.
# args = 0, pretend = 0, frame = 0
# frame_needed = 0, uses_anonymous_args = 0
stmfd sp!, {r4, lr}
mov r0, #1 # 1. Pass the argument in register 0
bl foo # 2. Call function foo
.L4:
b .L4
.size main, .-main
.ident "GCC: (GNU) 4.4.0 20080820 (experimental)"
This time the argument is never stored on the stack even though something from the stack is still stored at 0x40000d4.
Is this just expected/undefined behaviour? Have I done something wrong or have I in fact found a Compiler Bug™?
Once you return from foo(), x is gone, and any pointers to it are invalid. Subsequently using such a pointer results in what the C standard likes to call "undefined behavior," which means the compiler is absolutely allowed to assume you won't dereference it, or (if you insist on doing it anyway) need not produce code that does anything remotely like what you might expect. If you want the pointer to x to remain valid after foo() returns, you must not allocate x on foo's stack, period -- even if you know that in principle, nothing has any reason to clobber it -- because that just isn't allowed in C, no matter how often it happens to do what you expect.
The simplest solution might be to make x a local variable in main() (or in whatever other function has a sufficiently long-lived scope) and to pass the address in to foo. You could also make x a global variable, or allocate it on the heap using malloc(), or set aside memory for it in some more exotic way. You can even try to figure out where the top of the stack is in some (hopefully) more portable way and explicitly store your data in some part of the stack, if you're sure you won't be needing for anything else and you're convinced that's what you really need to do. But the method you've been using to do that isn't sufficiently reliable, as you've discovered.
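The global-variable option could be sketched roughly as follows. To keep the example runnable on a host machine, the hardware register at 0x40000d4 is replaced by an ordinary variable; fake_dma_source, fill_color and read_back are hypothetical names introduced for illustration only:

```c
#include <stdint.h>

/* Stand-in for the register at 0x40000d4 so the sketch can run on a
   host machine. On real hardware this would be the register itself. */
static volatile uintptr_t fake_dma_source;

/* Static storage duration: the object outlives every call to foo(). */
static volatile unsigned int fill_color;

void foo(unsigned int x)
{
    fill_color = x;                           /* write the value first */
    fake_dma_source = (uintptr_t)&fill_color; /* then publish its address */
}

/* Reads the value back through the published address, the way a later
   consumer of that address would. */
unsigned int read_back(void)
{
    return *(volatile unsigned int *)fake_dma_source;
}
```

Because fill_color is not on foo's stack, the published address stays valid after foo() returns.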
I actually don't think the compiler is wrong, although this is an odd case.
From a code analysis point of view, it sees you storing the address of a variable, but that address is never dereferenced and you don't jump outside of the function to external code that could use the address you stored. When you exit the function, the stack address is considered bogus, since it's the address of a variable that no longer exists.
The "volatile" keyword really doesn't do much in C, especially with regards to multiple threads or hardware. It just tells the compiler that it has to do the access. However, since there's no users of the value of x according to the data flow, there's no reason to store the "1" on the stack.
It probably would work if you wrote
void foo(unsigned int x) {
volatile int y = x;
*(volatile unsigned int*)(0x4000000 + 0xd4) = (unsigned int)(&y);
}
although it still may be illegal code, since the address of y is considered invalid as soon as foo returns, but the nature of the DMA system would be to reference that location independently of the program flow.
So you're putting the address of a local stack variable into the DMA controller to be used, and then you're returning from the function where the stack variable is available?
While this might work with your main() example (since you aren't writing on the stack again) it won't work in a 'real' program later - that value will be overwritten before or while DMA is accessing it when another function is called and the stack is used again.
You need to have a structure, or a global variable you can use to store this value while the DMA accesses it - otherwise it's just going to get clobbered!
-Adam
One thing to note is that according to the standard, casts are r-values. GCC used to allow it, but in recent versions has become a bit of a standards stickler.
I don't know if it will make a difference, but you should try this:
void foo(unsigned int x) {
volatile unsigned int* ptr = (unsigned int*)(0x4000000 + 0xd4);
*ptr = (unsigned int)(&x);
}
int main() {
foo(1);
while(1);
}
Also, I doubt you intended it, but you are storing the address of the function local x (which is a copy of the int you passed). You likely want to make foo take an "unsigned int *" and pass the address of what you really want to store.
So I feel a more proper solution would be this:
void foo(unsigned int *x) {
volatile unsigned int* ptr = (unsigned int*)(0x4000000 + 0xd4);
*ptr = (unsigned int)(x);
}
int main() {
int x = 1;
foo(&x);
while(1);
}
EDIT: finally, if your code breaks with optimizations, it is usually a sign that your code is doing something wrong.
I'm darned if I can find a reference at the moment, but I'm 99% sure that you are always supposed to be able to take the address of an argument, and it's up to the compiler to finesse the details of calling conventions, register usage, etc.
Indeed, I would have thought it to be such a common requirement that it's hard to see there can be general problem in this - I wonder if it's something about the volatile pointers which have upset the optimisation.
Personally, I might try this to see if it compiled better:
void foo(unsigned int x)
{
volatile unsigned int* pArg = &x;
*(volatile unsigned int*)(0x4000000 + 0xd4) = (unsigned int)pArg;
}
Tomi Kyöstilä wrote
I was trying out development for the Game Boy Advance. I was reading about its DMA system and I experimented with it by creating single-color tile bitmaps. The idea was to have the indexed color be passed as an argument to a function which would use DMA to fill a tile with that color. The source address for the DMA transfer is stored at 0x40000d4.
That's a perfectly valid thing for you to do, and I can see how the (unexpected) code you got with the -O1 optimization wouldn't work.
I see the (expected) code you got with the -O0 optimization does what you expect: it puts the value of the color you want on the stack, and a pointer to that color in the DMA transfer register.
However, even the (expected) code you got with the -O0 optimization wouldn't work, either.
By the time the DMA hardware gets around to taking that pointer and using it to read the desired color, that value on the stack has (probably) long been overwritten by other subroutines or interrupt handlers or both.
And so both the expected and the unexpected code result in the same thing -- the DMA is (probably) going to fetch the wrong color.
I think you really intended to store the color value in some location where it stays safe until the DMA is finished reading it.
So a global variable, or a function-local static variable such as
// Warning: Three Star Programmer at work
// Warning: untested code.
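void foo(unsigned int x) {
static volatile unsigned int color; // "static" so it's not on the stack
// the pointed-to pointer is volatile too, so the register write is a volatile access
volatile unsigned int* volatile* dma_register =
(volatile unsigned int* volatile*)(0x4000000 + 0xd4);
color = x; // a static object cannot be initialized from a parameter, so assign it here
*dma_register = &color;
}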
int main() {
foo(1);
while(1);
}
Does that work for you?
You see I use "volatile" twice, because I want to force two values to be written in that particular order.
sparkes wrote
If you think you have found a bug in GCC the mailing lists will be glad you dropped by but generally they find some hole in your knowledge is to blame and mock mercilessly :(
I figured I'd try my luck here first before going to the GCC mailing list to show my incompetence :)
Adam Davis wrote
Out of curiosity, what are you trying to accomplish?
I was trying out development for the Game Boy Advance. I was reading about its DMA system and I experimented with it by creating single-color tile bitmaps. The idea was to have the indexed color be passed as an argument to a function which would use DMA to fill a tile with that color. The source address for the DMA transfer is stored at 0x40000d4.
Will Dean wrote
Personally, I might try this to see if it compiled better:
void foo(unsigned int x)
{
volatile unsigned int* pArg = &x;
*(volatile unsigned int*)(0x4000000 + 0xd4) = (unsigned int)pArg;
}
With -O0 that works as well and with -O1 that is optimized to the exact same -O1 assembly I've posted in my question.
Not an answer, but just some more info for you.
We are running 3.4.5 20051201 (Red Hat 3.4.5-2) at my day job.
We have also noticed some of our code (which I can't post here) stops working when
we add the -O1 flag. Our solution was to remove the flag for now :(
In general I would say that it is a valid optimization.
If you want to look deeper into it, you could compile with -da
This generates a .c.Number.Passname, where you can have a look at the RTL (the intermediate representation within GCC). There you can see which pass makes which optimization (and maybe disable just the one you don't want).
I think Even T. has the answer. You passed in a variable by value; you cannot take the address of the caller's variable inside the function, although you can take the address of a copy of it. By the way, that copy is typically in a register, so it doesn't have an address. Once you leave that function it's all gone; the calling function loses it. If you need the address in the function, you have to pass by reference, not by value: send the address. It looks to me that the bug is in your code, not gcc.
BTW, using *(volatile blah *)0xabcd or any other method to try to program registers is going to bite you eventually. gcc and most other compilers have this uncanny way of knowing exactly the worst time to strike.
Say the day you change from this
*(volatile unsigned int *)0x12345 = someuintvariable;
to
*(volatile unsigned int *)0x12345 = 0x12;
A good compiler will realize that you are only storing 8 bits and there is no reason to waste a 32 bit store for that, depending on the architecture you specified, or the default architecture for that compiler that day, so it is within its rights to optimize that to an strb instead of an str.
After having been burned by gcc and others with this dozens of times I have resorted to forcing the issue:
.globl PUT32
PUT32:
str r1,[r0]
bx lr
PUT32(0x12345,0x12);
Costs a few extra clock cycles but my code continues to work yesterday, today, and will work tomorrow with any optimization flag. Not having to re-visit old code and sleeping peacefully through the night is worth a few extra clock cycles here and there.
Also if your code breaks when you compile for release instead of compile for debug, that also means it is most likely a bug in your code.
Is this just expected/undefined behaviour? Have I done something wrong or have I in fact found a Compiler Bug™?
No bug just the defined behaviour that optimisation options can produce odd code which might not work :)
EDIT:
If you think you have found a bug in GCC the mailing lists will be glad you dropped by but generally they find some hole in your knowledge is to blame and mock mercilessly :(
In this case I think it's probably the -O options attempting shortcuts that break your code that need working around.

Resources