I know that when you do certain things in a C program, the results are undefined. However, the compiler should not be generating invalid (machine) code, right? It would be reasonable if the code did the wrong thing, or if the code generated a segfault or something...
Is this supposed to happen according to the compiler spec, or is it a bug in the compiler?
Here's the (simple) program I'm using:
int main() {
char *ptr = 0;
*(ptr) = 0;
}
I'm compiling with -O3. That shouldn't generate invalid hardware instructions though, right? With -O0, I get a segfault when I run the code. That seems a lot more sane.
Edit: It's generating a ud2 instruction...
The ud2 instruction is a "valid instruction" and it stands for Undefined Instruction and generates an invalid opcode exception clang and apparently gcc can generate this code when a program invokes undefined behavior.
From the clang link above the rationale is explained as follows:
Stores to null and calls through null pointers are turned into a
__builtin_trap() call (which turns into a trapping instruction like "ud2" on x86). These happen all of the time in optimized code (as the
result of other transformations like inlining and constant
propagation) and we used to just delete the blocks that contained them
because they were "obviously unreachable".
While (from a pedantic language lawyer standpoint) this is strictly
true, we quickly learned that people do occasionally dereference null
pointers, and having the code execution just fall into the top of the
next function makes it very difficult to understand the problem. From
the performance angle, the most important aspect of exposing these is
to squash downstream code. Because of this, clang turns these into a
runtime trap: if one of these is actually dynamically reached, the
program stops immediately and can be debugged. The drawback of doing
this is that we slightly bloat code by having these operations and
having the conditions that control their predicates.
at the end of the day once your are invoking undefined behavior the behavior of your program is unpredictable. The philosophy here is that is probably better to crash hard and give the developer an indication that something is seriously wrong and allow them to debug fro the right point than to produce a program that seems to work but actually is broken.
As Ruslan notes, it is "valid" in the sense that it guaranteed to raise an invalid opcode exception as opposed to other unused sequences which may in the future become valid.
Related
Consider this demo programme:
#include <string.h>
#include <unistd.h>
typedef struct {
int a;
int b;
int c;
} mystruct;
int main() {
int TOO_BIG = getpagesize();
int SIZE = sizeof(mystruct);
mystruct foo = {
123, 323, 232
};
mystruct bar;
memset(&bar, 0, SIZE);
memcpy(&bar, &foo, TOO_BIG);
}
I compile this two ways:
gcc -O2 -o buffer -Wall buffer.c
gcc -g -o buffer_debug -Wall buffer.c
i.e. the first time with optimizations enabled, the second time with debug flags and no optimization.
The first thing to notice is that there are no warnings when compiling, despite getpagesize returning a value that will cause buffer overflow with memcpy.
Secondly, running the first programme produces:
*** buffer overflow detected ***: terminated
Aborted (core dumped)
whereas the second produces
*** stack smashing detected ***: terminated
Aborted (core dumped)
or, and you'll have to believe me here since I can't reproduce this with the demo programme, sometimes no warning at all. The programme doesn't even interrupt, it runs as normal. This was a behaviour I encountered with some more complex code, which made it difficult to debug until I realised that there was a buffer overflow happening.
My question is: why are there two different behaviours with different build flags? And why does this sometimes execute with no errors when built as a debug build, but always errors when built with optimizations?
..I can't reproduce this with the demo program, sometimes no warning at all...
The undefined behavior directives are very broad, there is no requirement for the compiler to issue any warnings for a program that exhibits this behavior:
why are there two different behaviours with different build flags? And why does this sometimes execute with no errors when built as a debug build, but always errors when built with optimizations?
Compiler optimizations tend to optimize away unused variables, if I compile your code with optimizations enabled I don't get a segmentation fault, looking at the assembly (link above), you'll notice that the problematic variables are optimized away, and memcpy doesn't get called, so there is no reason for it to not compile successfuly, the program exits with success code 0, whereas if don't optimize it, the undefined behavior manifests itself, and the program exits with code 139, classic segmentation fault exit code.
As you can see these results are different from yours and that is one of the features of undefined behavior, different compilers, systems or even compiler versions can behave in a completely different way.
Accessing memory behind what's been allocated is undefined behavior, which means the compiler is allowed to do anything. When there are no optimizations, the compiler may try to guess and do something reasonable. When optimizations are turned on, the compiler may take advantage of the fact that any behavior is allowed to do something that runs faster.
The first thing to notice is that there are no warnings when compiling, despite getpagesize returning a value that will cause buffer overflow with memcpy.
That is the programmer's responsibility to fix, not the compiler. You'll be very lucky if a compiler manages to find potential buffer overflows for you. Its job is to check that your code is valid C then translate it to machine code.
If you want a tool that catches bugs, they are called static analysers and that's a different type of program. At some extent, static analysis might be integrated in a compiler as a feature. There is one for clang, but most static analysers are commercial tools and not open source.
Secondly, running the first programme produces: ... whereas the second produces
Undefined behavior simply means there is no defined behavior. What is undefined behavior and how does it work?. Meaning there's not likely anything to learn from examining the results, no interesting mystery to solve. In one case it apparently accessed forbidden memory, in the other case it mangled a poor little "stack canary". The difference will be related to different memory layouts. Who cares - bugs are bugs. Focus on why the bug happened (you already know!), instead of trying to make sense of the undefined results.
Now when I run your code with optimizations actually enabled for real (gcc -O2 on an x86 Linux), the compiler gives me
main:
subq $8, %rsp
call getpagesize
xorl %eax, %eax
addq $8, %rsp
ret
With optimizations actually enabled, it didn't even bother calling memcpy & friends because there are no side effects and the variables aren't used, so they can be safely removed from the executable.
I have an embedded project that requires at some point that I write to address 0. So naturally I try:
*(int*)0 = 0 ;
But at optimisation level 2 or higher, the gcc compiler rubs its hands and says, in effect, "That is undefined behaviour! I can do what I like! Bwahaha!" and emits an invalid instruction to the code stream!
Here is my source file:
void f (void)
{
*(int*)0 = 0 ;
}
and here is the output listing:
.file "bug.c"
.text
.p2align 4,,15
.globl _f
.def _f; .scl 2; .type 32; .endef
_f:
LFB0:
.cfi_startproc
movl $0, 0
ud2 <-- Invalid instruction!
.cfi_endproc
LFE0:
.ident "GCC: (i686-posix-dwarf-rev0, Built by MinGW-W64 project) 7.3.0"
My question is: Why would anybody do this? What possible benefit could accrue from sabotaging code like this? Surely the obvious course of action is to issue a warning and carry on compiling?
I know the compiler is allowed to do this, I just wonder about the motivation of the compiler writer. It cost me two days and four engineering samples to track this down, so I'm a little peeved.
Edited to add: I have worked around this by using assembly language. So I'm not looking for solutions. I'm just curious why anybody would think this compiler behaviour was a good idea.
(Disclaimer: I'm not an expert on GCC internals, and this is more of a "post hoc" attempt to explain its behavior. But maybe it will be helpful.)
the gcc compiler rubs its hands and says, in effect, "That is undefined behaviour! I can do what I like! Bwahaha!" and emits an invalid instruction to the code stream!
I won't deny that there are cases where GCC does more or less that, but here there's a little more going on, and there is some method to its madness.
As I understand it, GCC isn't treating the null dereference as totally undefined here; it is making some assumptions about what it does. Its handling of null dereferences is controlled by a flag called -fdelete-null-pointer-checks, which is probably enabled by default when you turn on optimizations. From the manual:
-fdelete-null-pointer-checks
Assume that programs cannot safely dereference null pointers, and that no code or data element resides at address zero. This option
enables simple constant folding optimizations at all optimization
levels. In addition, other optimization passes in GCC use this flag to
control global dataflow analyses that eliminate useless checks for
null pointers; these assume that a memory access to address zero
always results in a trap, so that if a pointer is checked after it has
already been dereferenced, it cannot be null.
Note however that in some environments this assumption is not true. Use -fno-delete-null-pointer-checks to disable this optimization
for programs that depend on that behavior.
This option is enabled by default on most targets. On Nios II ELF, it defaults to off. On AVR, CR16, and MSP430, this option is
completely disabled.
Passes that use the dataflow information are enabled independently at different optimization levels.
So, if you are intending to actually access address 0, or if for some other reason your code will go on executing after the dereference, then you want to disable this with -fno-delete-null-pointer-checks. That will achieve the "carry on compiling" part of what you want. It will not give you warnings, however, presumably under the assumption that such dereferences are intentional.
But under default options, why are you seeing the generated code that you do, with the undefined instruction, and why isn't there a warning? I would guess that GCC's logic is running as follows:
Because -fdelete-null-pointer-checks is in effect, the compiler assumes that execution will not continue past the null dereference, but instead will trap. How the trap will be handled, it doesn't know: maybe program termination, maybe a signal or exception handler, maybe a longjmp up the stack. The null dereference itself is emitted as requested, perhaps under the assumption that you are intentionally exercising your trap handler. But either way, whatever code comes after the null dereference is now unreachable.
So now it does what any reasonable optimizing compiler does with unreachable code: it doesn't emit it. In your case, that's nothing but a ret, but whatever it is, as far as GCC is concerned it would just be wasted bytes of memory, and should be omitted.
You might think you should get a warning here, but GCC has a longstanding design decision not to warn about unreachable code, on the grounds that such warnings tended to be inconsistent and the false positives would do more harm than good. See for instance https://gcc.gnu.org/legacy-ml/gcc-help/2011-05/msg00360.html.
However, as a safety feature, GCC emits an undefined instruction (ud2 on x86) in place of the omitted unreachable code. The idea, I believe, is that just in case execution somehow does continue past the null dereference, it is better for the program to die, than to go off into the weeds and try to execute whatever memory contents happen to come next. (And indeed this can happen even on systems that do unmap the zero page; for instance, if you do struct huge *p = NULL; p->x = 0;, GCC understands this as a null dereference, even though p->x may not be on the zero page at all, and could conceivably be located at an accessible address.)
There is a warning flag, -Wnull-dereference, that will trigger a warning on your blatant null dereference. However, it only works if -fdelete-null-pointer-checks is enabled.
When would GCC's behavior be useful? Here's an example, maybe contrived, but it might get the idea across. Imagine your program has some allocation function that might fail:
struct foo *p = get_foo();
// do other stuff for a while
if (!p) {
// 5000 lines of elaborate backup plan in case we can't get a foo
}
frob(p->bar);
Now imagine that you redesign get_foo() so that it can't fail. You forget to take out your "backup plan" code, but you go ahead and use the returned object right away:
struct foo *p = get_foo();
frob(p->bar);
// do other stuff for a while
if (!p) {
// 5000 lines of elaborate backup plan in case we can't get a foo
}
The compiler doesn't know, a priori, that get_foo() will always return a valid pointer. But it can see that you've dereferenced it, and thus can assume that execution will only continue past that point if the pointer was not null. Therefore, it can tell that the elaborate backup plan is unreachable and should be omitted, which will save you a lot of bloat in your binary.
Incidentally, the situation with clang. Although as Eric Postpischil points out you do get a warning, what you don't get is an actual load from address 0: clang omits it and just emits ud2. This is what "doing whatever it likes" would really look like, and if you were hoping to exercise your page zero trap handler, you are out of luck.
In describing Undefined Behavior, the Standard refers to it as resulting "upon use of a nonportable or erroneous program construct or of erroneous data,", and the authors of the Standard clarify their intentions more clearly in the published Rationale: "Undefined behavior gives the implementor license not to catch certain program errors that are difficult to diagnose. It also identifies areas of possible conforming language extension: the implementor may augment the language by providing a definition of the officially undefined behavior." The question of when to extend the language in such fashion--treating various forms of UB as non-portable but correct, was left as a Quality of Implementation issue outside the Standard's jurisdiction.
The maintainers of clang and gcc take the view that the phrase "non-portable or erroneous" should be interpreted as synonymous with "erroneous", since the Standard would not forbid such an interpretation. If a compiler will never be used to process non-portable programs that will never be fed erroneous data, such an interpretation will sometimes allow them to process some strictly conforming programs which are fed exclusively valid data more quickly than would otherwise be possible, at the expense of making them less suitable for other purposes. I personally would view the range of programs that a compiler can usefully process reasonably efficiently as a much better metric of quality than the efficiency with which a compiler can process strictly-conforming programs, but people who are using compilers for different purposes may have different views about what would make a compiler more or less useful for those purposes.
I cant imagine what the compiler does when for instance there is no lvalue for instance like this :
number>>1;
My intuition tells me that the compiler will discard this line from compilation due to optimizations and if the optimization is removed what happens?
Does it use a register to do the manipulation? or does it behave like if it was a function call so the parameters are passed to the stack, and than the memory used is marked as freed? OR does it transform that to an NOP operation?
Can I see what is happening using the VS++ debugger?
Thank your for your help.
In the example you give, it discards the operation. It knows the operation has no side effects and therefore doesn't need to emit the code to execute the statement in order to produce a correct program. If you disable optimizations, the compiler may still emit code. If you enable optimizations, the compiler may still emit code, too -- it's not perfect.
You can see the code the compiler emits using the /FAsc command line option of the Microsoft compiler. That option creates a listing file which has the object code output of the compiler interspersed with the related source code.
You can also use "view disassembly" in the debugger to see the code generated by the compiler.
Using either "view disassembly" or /FAsc on optimized code, I'd expect to see no emitted code from the compiler.
Assuming that number is a regular variable of integer type (not volatile) then any competent optimizing compiler (Microsoft, Intel, GNU, IBM, etc) will generate exactly NOTHING. Not a nop, no registers are used, etc.
If optimization is disabled off (in a "debug build"), then the compiler may well "do what you asked for", because it doesn't realize it doesn't have side-effects from the code. In this case, the value will be loaded into a register, shifted right once. The result of this is not stored anywhere. The compiler will perform "useless code elimination" as one of the optimization steps - I'm not sure which one, but for this sort of relatively simple thing, I expect the compiler to figure out with fairly basic optimization settings. Some cases, where loops are concerned, etc, the compiler may not optimize away the code until some more advanced optimization settings are enabled.
As mentioned in the comments, if the variable is volatile, then the read of the memory reprsented by number will have to be made, as the compiler MUST read volatile memory.
In Visual studio, if you "view disassembly", it should show you the code that the compiler generated.
Finally, if this was C++, there is also the possibility that the variable is not a regular integer type, the function operator>> is being called when this code is seen by the compiler - this function may have side-effects besides returning a result, so may well have to be performed. But this can't be the case in C, since there is no operator overloading.
We have a code written in C that sometimes doesn’t handle zero pointers very well.
The code was originally written on Solaris and such pointers cause a segmentation fault. Not ideal but better than ploughing on.
Our experience is that if you read from a null pointer on AIX you get 0. If you use the xlc compiler you can add an option -qcheck=all to trap these pointers. But we use gcc (and want to continue using that compiler). Does gcc provide such an option?
Does gcc provide such an option?
I'm sheepishly volunteering the answer no, it doesn't. Although I can't cite the absence of information regarding gcc and runtime NULL checks.
The problem you're tackling is that you're trying to make undefined behavior a little more defined in a program that's poorly-written.
I recommend that you bite the bullet and either switch to xlc or manually add NULL checks to the code until the bad behavior has been found and removed.
Consider:
Making a macro to null-check a pointer
Adding that macro after pointer assignments
Adding that macro to the entry point of functions that accept pointers
As bugs are removed, you can begin to remove these checks.
Please do us all a favor and add proper NULL checks to your code. Not only will you have a slight gain in performance by checking for NULL only when needed, rather than having the compiler perform the check everywhere, but your code will be more portable to other platforms.
And let's not mention the fact that you will be more likely to print a proper error message rather than have the compiler drop some incomprehensible stack dump/source code location/error code that will not help your users at all.
AIX uses the concept of a NULL page. Essentially, NULL (i.e. virtual address 0x0) is mapped to a location that contains a whole bunch of zeros. This allows string manipulation code e.t.c. to continue despite encountering a NULL pointer.
This is contrary to most other Unix-like systems, but it is not in violation of the C standard, which considers dereferencing NULL an undefined operation. In my opinion, though, this is woefully broken: it takes an application that would crash violently and turns it into one that ignores programming errors silently, potentially producing totally incorrect results.
As far as I know, GCC has no options to work around fundamentally broken code. Even historically supported patterns, such as writable string literals, have been slowly phased out in newer GCC versions.
There might be some support when using memory debugging options such as -fmudflap, but I don't really know - in any case you should not use debugging code in production systems, especially for forcing broken code to work.
Bottom line: I don't think that you can avoid adding explicit NULL checks.
Unfortunately we now come to the basic question: Where should the NULL checks be added?. I suppose having the compiler add such checks indiscriminately would help, provided that you add an explicit check when you discover an issue.
Unfortunately, there is no Valgrind support for AIX. If you have the cash, you might want to have a look at IBM Rational Purify Plus for AIX - it might catch such errors.
It might also be possible to use xlc on a testing system and gcc for everything else, but unfortunately they are not fully compatible.
I am programming an algorithm that contains 4 nested for loops. The problem is at at each level a pointer is updated. The innermost loop only uses 1 of the pointers. The algorithm does a complicated count. When I include a debugging statement that logs the combination of the indexes and the results of the count I get the correct answer. When the debugging statement is omitted, the count is incorrect. The program is compiled with the -O3 option on gcc. Why would this happen?
Always put your code through something like valgrind, Purify, etc, before blaming the optimizer. Especially when blaming things related to pointers.
It's not to say the optimizer isn't broken, but more than likely, it's you. I've worked on various C++ compilers and seen my share of seg faults that only happen with optimized code. Quite often, people do things like forget to count the \0 when allocating space for a string, etc. And it's just luck at that point on which pages you're allocated when the program runs with different -O settings.
Also, important questions: are you dealing with restricted pointers at all?
Print out the assembly code generated by the compiler, with optimizations. Compare to an assembly language listing of the code without optimizations.
The compiler may have figured out the some of the variables can be eliminated. They were not used in the computation. You can try to match wits with the compiler and factor out variables that are not used.
The compiler may have substituted a for loop with an equation. In some cases (after removing unused variables), the loop can be replaced by a simple equation. For example, a loop that adds 1 to a variable can be replaced by a multiplication statement.
You can tell the compiler to let a variable be by declaring it as volatile. The volatile keyword tells the compiler that the variable's value may be altered by means outside of the program and the compiler should not cache nor eliminate the variable. This is a popular technique in embedded systems programming.
Most likely your program somehow exploits undefined behaviour which works in your favour without optimisation, but with -O3 optimisation it turns against you.
I had a similar experience with one my project - it works fine with -O2 but breaks with -O3. I used setjmp()/longjmp() heavily in my code and I had to make half of variables volatile to get it working so I decided that -O2 is good enough.
Sounds like something is accessing memory that it shouldn't. Debugging symbols are famous for postponing bad news.
Is it pure C or there's any crazy thing like inline assembly?
However, run it on valgrind to check whether this might be happening. Also, did you try compiling with different optimization levels? And without debugging & optimizations?
Without code this is difficult, but here's some things that I've seen before.
Debugging print statements often end up being the only user of a value that the compiler knows about. Without the print statement the compiler thinks that it can do away with any operations and memory requirements that would otherwise be required to compute or store that value.
A similar thing happens when you have side effects included within the argument list of your print statement.
printf("%i %i\n", x, y = x - z);
Another type of error can be:
for( i = 0; i < END; i++) {
int *a = &i;
foo(a);
}
if (bar) {
int * a;
baz(a);
}
This code would likely have the intended result because the compiler would probably choose to store both a variables in the same location, so the second a would have the last value that the other a had.
inline functions can have some strange behavior or you somehow rely on them not being inlined (or sometimes the other way round), which is often the case for unoptimized code.
You should definitely try compiling with warnings turned up to the maximum (-Wall for gcc).
That will often tell you about the risky code.
(edit)
Just thought of another.
If you have more than one way to reference a variable then you can have issues that work right without optimization, but break when optimization is turned up. There are two main ways this can happen.
The first is if a value can be changed by a signal handler or another thread. You need to tell the compiler about that so it will know that any access to assume that the value needs to be reloaded and/or stored. This is done by using the volatile keyword.
The second is aliasing. This is when you create two different ways to access the same memory. Compilers usually are quick to assume that you are aliasing with pointers, but not always. Also, they're are optimization flags for some that tell them to be less quick to make those assumptions, as well as ways that you could fool the compiler (crazy stuff like while (foo != bar) { foo++; } *foo = x; not being obviously a copy of bar to foo).