C optimization breaks algorithm - c

I am programming an algorithm that contains 4 nested for loops. The problem is that at each level a pointer is updated. The innermost loop only uses 1 of the pointers. The algorithm does a complicated count. When I include a debugging statement that logs the combination of the indexes and the results of the count, I get the correct answer. When the debugging statement is omitted, the count is incorrect. The program is compiled with the -O3 option on gcc. Why would this happen?

Always put your code through something like valgrind, Purify, etc, before blaming the optimizer. Especially when blaming things related to pointers.
It's not to say the optimizer isn't broken, but more than likely, it's you. I've worked on various C++ compilers and seen my share of seg faults that only happen with optimized code. Quite often, people do things like forget to count the \0 when allocating space for a string, etc. And it's just luck at that point on which pages you're allocated when the program runs with different -O settings.
Also, important questions: are you dealing with restricted pointers at all?

Print out the assembly code generated by the compiler, with optimizations. Compare to an assembly language listing of the code without optimizations.
The compiler may have figured out that some of the variables can be eliminated because they were not used in the computation. You can try to match wits with the compiler and factor out variables that are not used.
The compiler may have substituted a for loop with an equation. In some cases (after removing unused variables), the loop can be replaced by a simple equation. For example, a loop that adds 1 to a variable can be replaced by a multiplication statement.
You can tell the compiler to let a variable be by declaring it as volatile. The volatile keyword tells the compiler that the variable's value may be altered by means outside of the program and the compiler should not cache nor eliminate the variable. This is a popular technique in embedded systems programming.
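To make the loop-to-equation point concrete, here is a minimal sketch. The function names are made up for illustration, and whether a given compiler actually performs this rewrite depends on the compiler and the optimization level. The first function is the kind of loop an optimizer may turn into the second; the third uses volatile so that every update of total has to be performed.

```c
#include <stdint.h>

/* A loop that repeatedly adds the same step. An optimizer may
   replace it with the closed form in count_by_equation(). */
uint32_t count_by_loop(uint32_t n, uint32_t step) {
    uint32_t total = 0;
    for (uint32_t i = 0; i < n; ++i) {
        total += step;
    }
    return total;
}

/* The equation the loop above is equivalent to
   (unsigned arithmetic wraps the same way in both). */
uint32_t count_by_equation(uint32_t n, uint32_t step) {
    return n * step;
}

/* Same loop, but total is volatile: each read and write of total
   must really happen, so the loop cannot be folded away. */
uint32_t count_by_volatile_loop(uint32_t n, uint32_t step) {
    volatile uint32_t total = 0;
    for (uint32_t i = 0; i < n; ++i) {
        total += step;
    }
    return total;
}
```

All three return the same value; only the generated code differs, which is what you would see when comparing the assembly listings.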

Most likely your program somehow exploits undefined behaviour which works in your favour without optimisation, but with -O3 optimisation it turns against you.
I had a similar experience with one of my projects - it works fine with -O2 but breaks with -O3. I used setjmp()/longjmp() heavily in my code and I had to make half of the variables volatile to get it working, so I decided that -O2 is good enough.
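For anyone wondering why setjmp()/longjmp() and volatile go together: the C standard says that a non-volatile local modified between the setjmp() call and the longjmp() has an indeterminate value afterwards. A small sketch of the safe pattern (the function names are hypothetical, not from the project mentioned above):

```c
#include <setjmp.h>

static jmp_buf env;

/* Stand-in for some deeply nested code that bails out. */
static void fail(void) {
    longjmp(env, 1);
}

/* attempts is modified after setjmp() and read after longjmp(),
   so it has to be volatile to keep a well-defined value. */
int count_attempts(void) {
    volatile int attempts = 0;

    if (setjmp(env) != 0) {
        /* We got here through longjmp(). */
        if (attempts >= 3) {
            return attempts;
        }
    }

    attempts++;
    fail();
    return -1; /* not reached: fail() never returns */
}
```

Without the volatile qualifier an optimizing compiler is free to keep attempts in a register, and the value seen after the jump may be stale.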

Sounds like something is accessing memory that it shouldn't. Debugging symbols are famous for postponing bad news.
Is it pure C or there's any crazy thing like inline assembly?
However, run it on valgrind to check whether this might be happening. Also, did you try compiling with different optimization levels? And without debugging & optimizations?

Without code this is difficult, but here's some things that I've seen before.
Debugging print statements often end up being the only user of a value that the compiler knows about. Without the print statement the compiler thinks that it can do away with any operations and memory requirements that would otherwise be required to compute or store that value.
A similar thing happens when you have side effects included within the argument list of your print statement.
printf("%i %i\n", x, y = x - z);
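A sketch of how that bites, with hypothetical function names: the first version only assigns y as a side effect of the print argument, the second is the same code with the debugging print deleted, and the third does the assignment on its own line so it survives either way.

```c
#include <stdio.h>

/* y is only assigned as a side effect of the printf argument. */
int compute_with_log(int x, int z) {
    int y = 0;
    printf("%i %i\n", x, y = x - z);
    return y;
}

/* Same function after the debugging print was removed:
   the assignment disappeared together with it. */
int compute_without_log(int x, int z) {
    int y = 0;
    (void)x;
    (void)z;
    return y;
}

/* Safer: do the work first, then log the result if you want to. */
int compute_fixed(int x, int z) {
    int y = x - z;
    return y;
}
```

The same thing happens when the print hides behind a debug macro that expands to nothing in release builds.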
Another type of error can be:
for( i = 0; i < END; i++) {
int *a = &i;
foo(a);
}
if (bar) {
int * a;
baz(a);
}
Without optimization this code would likely have the intended result, because the compiler would probably choose to store both a variables in the same location, so the second a would have the last value that the other a had. With optimization there is no such guarantee, since the second a is never initialized.
Inline functions can behave strangely if you somehow rely on them not being inlined (or sometimes the other way round), which is often the case for unoptimized code.
You should definitely try compiling with warnings turned up to the maximum (-Wall for gcc).
That will often tell you about the risky code.
(edit)
Just thought of another.
If you have more than one way to reference a variable then you can have issues that work right without optimization, but break when optimization is turned up. There are two main ways this can happen.
The first is if a value can be changed by a signal handler or another thread. You need to tell the compiler about that so it knows that on every access the value needs to be reloaded and/or stored. This is done by using the volatile keyword.
The second is aliasing. This is when you create two different ways to access the same memory. Compilers usually are quick to assume that you are aliasing with pointers, but not always. Also, there are optimization flags for some compilers that tell them to be less quick to make those assumptions, as well as ways that you could fool the compiler (crazy stuff like while (foo != bar) { foo++; } *foo = x; not being obviously a copy of bar to foo).
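One common aliasing trap is reading an object through a pointer of an unrelated type, which the compiler is allowed to assume never happens. A minimal sketch of the well-defined alternative, assuming float is a 32-bit IEEE-754 type on your platform (float_bits is a made-up name):

```c
#include <stdint.h>
#include <string.h>

/* Assumption for this sketch: float and uint32_t have the same size. */
_Static_assert(sizeof(float) == sizeof(uint32_t),
               "sketch assumes a 32-bit float");

/* Returns the bit pattern of a float.
   Writing *(uint32_t *)&value instead would access a float through
   an incompatible pointer type, which is undefined behaviour and
   may only appear to work at low optimization levels. */
uint32_t float_bits(float value) {
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);
    return bits;
}
```

Compilers typically turn the memcpy into a plain register move, so there is usually no cost to doing it the defined way.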

Related

How do I "tell" to C compiler that the code shouldn't be optimized out?

Sometimes I need some code to be executed by the CPU exactly as I put it in the source. But any C compiler has its optimization algorithms, so I can expect some tricks. For example:
unsigned char flag=0;
interrupt ADC_ISR(){
ADC_result = ADCH;
flag = 1;
}
void main(){
while(!flag);
echo ADC_result;
}
Some compilers will definitely make the while(!flag); loop infinite, as they will assume flag is always false (so !flag is always true).
Sometimes I can use the volatile keyword, and sometimes it helps. But in my case (AVR GCC) the volatile keyword forces the compiler to place the variable in SRAM instead of registers (which is bad for some reasons). Moreover, many articles on the Internet suggest using the volatile keyword with great care, as the result can become unstable (depending on the compiler, its optimization settings, platform and so on).
So I would definitely prefer to somehow point at the source code instruction and tell the compiler that this code should be compiled exactly as it is. Like this: volatile while(!flag);
Is there any standard C instruction to do this?
The only standard C way is volatile. If that doesn't happen to do exactly what you want, you'll need to use something specific for your platform.
You should indeed use volatile as answered by David Schwartz. See also this chapter of GCC documentation.
If you use a recent GCC compiler, you could disable optimizations in a single function by using appropriate function specific options pragmas (or some optimize function attribute), for instance
#pragma GCC optimize ("-O0")
before your main. I'm not sure it is a good idea.
Perhaps you want extended asm statements with the volatile keyword.
You have several options:
Compile without optimisations. Unlike some compilers, GCC doesn't optimise by default so unless you tell it to optimise, you should get generated code which looks very similar to your C source. Of course you can choose to optimise some C files and not others, using simple make rules.
Take the compiler out of the equation and write the relevant functions in assembly. Then you can get exactly the generated code you want.
Use volatile, which prevents the compiler from making any assumptions about a certain variable, so for any use of the variable in C the compiler is forced to generate a LOAD or a STORE even if ostensibly unnecessary.
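Here is a rough hosted-C sketch of the flag loop from the question with the volatile option applied. It is not AVR code: a signal handler stands in for the ISR, raise() stands in for the hardware interrupt, and the names are hypothetical.

```c
#include <signal.h>

/* volatile: the compiler must re-read flag on every loop iteration.
   sig_atomic_t: safe to write from a signal handler. */
static volatile sig_atomic_t flag = 0;

/* Plays the role of the interrupt service routine. */
static void on_signal(int signo) {
    (void)signo;
    flag = 1;
}

int wait_for_flag(void) {
    flag = 0;
    if (signal(SIGINT, on_signal) == SIG_ERR) {
        return -1;
    }

    raise(SIGINT);          /* simulate the interrupt firing */

    while (!flag) {
        /* Without volatile this loop could be compiled as
           "load flag once, then spin forever". */
    }

    signal(SIGINT, SIG_DFL); /* restore the default handler */
    return (int)flag;
}
```

On a real microcontroller the structure is the same: the variable shared with the ISR is volatile, and everything else is left to the optimizer.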

for loop being ignored (optimized?) out

I am using for/while loops for implementing a delay in my code. The duration of the delay is unimportant here though it is sufficiently large to be noticeable. Here is the code snippet.
uint32_t i;
// Do something useful
for (i = 0; i < 50000000U; ++i)
{}
// Do something useful
The issue I am observing is that this for loop won't get executed. It probably gets ignored/optimized by the compiler. However, if I qualify the loop counter i by volatile, the for loop seems to execute and I do notice the desired delay in the execution.
This behavior seems a bit counter-intuitive to my understanding of the compiler optimizations with/without the volatile keyword.
Even if the loop counter is getting optimized and being stored in the processor register, shouldn't the counter still work, perhaps with a lesser delay? (Since the memory fetch overhead is done away with.)
The platform I am building for is Xtensa processor (by Tensilica), and the C compiler is the one provided by Tensilica, Xtensa C/C++ compiler running with highest level of optimizations.
I tried the same with gcc 4.4.7 with the -O3 and -Ofast optimization levels. The delay seems to work in that case.
This is all about observable behavior. The only observable behavior of your loop is that i is 50000000U after the loop. The compiler is allowed to optimize it and replace it by i = 50000000U;. This assignment to i will also be optimized out because the value of i has no observable consequences.
The volatile keyword tells the compiler that writing to and reading from i have an observable behavior, thus preventing it from optimizing.
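In code, the version that keeps the loop looks roughly like this (spin_delay is a made-up name; how long the delay lasts depends entirely on the CPU and clock, so treat the iteration count as something to calibrate):

```c
#include <stdint.h>

/* Busy-wait for the given number of iterations.
   Because i is volatile, every increment and comparison must
   actually be performed, so the loop cannot be removed. */
uint32_t spin_delay(uint32_t iterations) {
    volatile uint32_t i;
    for (i = 0; i < iterations; ++i) {
        /* intentionally empty */
    }
    return i;
}
```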
The compiler will also not optimize away calls to functions whose code it cannot see. Theoretically, if a compiler had access to the whole OS code, it could optimize everything but the volatile variables, which are often put on hardware IO operations.
These optimization rules all conform to what is written in the C standard (cf. comments for references).
Also, if you want a delay, use a specialized function (e.g. an OS API). Such functions are reliable and don't consume CPU, unlike a spin-delay like yours.
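As an illustration of the OS API route, here is a small sketch that assumes a POSIX system where nanosleep() is available. That will not be the case on a bare-metal Xtensa target, where the vendor SDK or RTOS would provide its own delay call. delay_ms is a hypothetical helper name.

```c
#include <time.h>

/* Sleep for roughly ms milliseconds without burning CPU.
   Returns 0 on success, -1 if the sleep was interrupted or failed. */
int delay_ms(long ms) {
    struct timespec req;
    req.tv_sec = ms / 1000;
    req.tv_nsec = (ms % 1000) * 1000000L;
    return nanosleep(&req, NULL);
}
```

The call is opaque to the optimizer, so it cannot be removed the way the empty loop was.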

C code with undefined results, compiler generates invalid code (with -O3)

I know that when you do certain things in a C program, the results are undefined. However, the compiler should not be generating invalid (machine) code, right? It would be reasonable if the code did the wrong thing, or if the code generated a segfault or something...
Is this supposed to happen according to the compiler spec, or is it a bug in the compiler?
Here's the (simple) program I'm using:
int main() {
char *ptr = 0;
*(ptr) = 0;
}
I'm compiling with -O3. That shouldn't generate invalid hardware instructions though, right? With -O0, I get a segfault when I run the code. That seems a lot more sane.
Edit: It's generating a ud2 instruction...
The ud2 instruction is a "valid instruction"; it stands for Undefined Instruction and generates an invalid opcode exception. clang and apparently gcc can generate this code when a program invokes undefined behavior.
From the clang link above the rationale is explained as follows:
Stores to null and calls through null pointers are turned into a
__builtin_trap() call (which turns into a trapping instruction like "ud2" on x86). These happen all of the time in optimized code (as the
result of other transformations like inlining and constant
propagation) and we used to just delete the blocks that contained them
because they were "obviously unreachable".
While (from a pedantic language lawyer standpoint) this is strictly
true, we quickly learned that people do occasionally dereference null
pointers, and having the code execution just fall into the top of the
next function makes it very difficult to understand the problem. From
the performance angle, the most important aspect of exposing these is
to squash downstream code. Because of this, clang turns these into a
runtime trap: if one of these is actually dynamically reached, the
program stops immediately and can be debugged. The drawback of doing
this is that we slightly bloat code by having these operations and
having the conditions that control their predicates.
At the end of the day, once you are invoking undefined behavior, the behavior of your program is unpredictable. The philosophy here is that it is probably better to crash hard, give the developer an indication that something is seriously wrong, and allow them to debug from the right point than to produce a program that seems to work but actually is broken.
As Ruslan notes, it is "valid" in the sense that it is guaranteed to raise an invalid opcode exception, as opposed to other unused sequences which may in the future become valid.
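If you want that trap on purpose rather than as a by-product of undefined behavior, both gcc and clang provide __builtin_trap(). A minimal sketch (store_zero is a made-up name, and the exact trapping instruction is up to the compiler and target):

```c
#include <stddef.h>

/* Stores 0 through ptr, but stops the program immediately
   if ptr is null instead of relying on undefined behaviour. */
void store_zero(char *ptr) {
    if (ptr == NULL) {
        __builtin_trap(); /* gcc/clang builtin; typically ud2 on x86 */
    }
    *ptr = 0;
}
```

With the explicit check the null case has defined, debuggable behavior at every optimization level.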

C dummy operations

I can't imagine what the compiler does when there is no lvalue, for instance like this:
number>>1;
My intuition tells me that the compiler will discard this line from compilation due to optimizations and if the optimization is removed what happens?
Does it use a register to do the manipulation? Or does it behave as if it were a function call, so the parameters are passed on the stack and then the memory used is marked as freed? Or does it transform that into a NOP operation?
Can I see what is happening using the VS++ debugger?
Thank you for your help.
In the example you give, it discards the operation. It knows the operation has no side effects and therefore doesn't need to emit the code to execute the statement in order to produce a correct program. If you disable optimizations, the compiler may still emit code. If you enable optimizations, the compiler may still emit code, too -- it's not perfect.
You can see the code the compiler emits using the /FAsc command line option of the Microsoft compiler. That option creates a listing file which has the object code output of the compiler interspersed with the related source code.
You can also use "view disassembly" in the debugger to see the code generated by the compiler.
Using either "view disassembly" or /FAsc on optimized code, I'd expect to see no emitted code from the compiler.
Assuming that number is a regular variable of integer type (not volatile) then any competent optimizing compiler (Microsoft, Intel, GNU, IBM, etc) will generate exactly NOTHING. Not a nop, no registers are used, etc.
If optimization is disabled (in a "debug build"), then the compiler may well "do what you asked for", because it doesn't realize that the code has no side effects. In this case, the value will be loaded into a register and shifted right once. The result of this is not stored anywhere. The compiler will perform "useless code elimination" as one of the optimization steps - I'm not sure which one, but for this sort of relatively simple thing, I expect the compiler to figure it out with fairly basic optimization settings. In some cases, where loops are concerned, etc, the compiler may not optimize away the code until some more advanced optimization settings are enabled.
As mentioned in the comments, if the variable is volatile, then the read of the memory represented by number will have to be made, as the compiler MUST read volatile memory.
In Visual studio, if you "view disassembly", it should show you the code that the compiler generated.
Finally, if this was C++, there is also the possibility that the variable is not a regular integer type and that the function operator>> is called when the compiler sees this code - this function may have side-effects besides returning a result, so it may well have to be performed. But this can't be the case in C, since there is no operator overloading.
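A tiny sketch that shows the difference at the source level (function names are made up). The first function contains the statement from the question; the second actually keeps the result.

```c
/* The shift is evaluated and its result thrown away, so number is
   unchanged. gcc -Wall reports this line as a statement with no
   effect, which is usually a hint that something was forgotten. */
int shift_discarded(int number) {
    number >> 1;
    return number;
}

/* Here the result is stored, so the shift has to be performed. */
int shift_kept(int number) {
    number = number >> 1;
    return number;
}
```

Comparing the disassembly of the two functions is an easy way to see what your particular compiler and settings do with the dead statement.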

debugging c programs

Programming in a sense is easy. But bugs are something which always causes more trouble. Can anyone help me with good debugging tricks and software for C?
From "The Elements of Programming Style" Brian Kernighan, 2nd edition, chapter 2:
Everyone knows that debugging is twice
as hard as writing a program in the
first place. So if you're as clever as
you can be when you write it, how will
you ever debug it?
So from that; don't be "too clever"!
But apart from that and the answers already given; use a debugger! That is your starting point tool-wise. You'd be amazed how many programmers struggle along without the aid of a debugger, and they are fools to do so.
But before you even get to the debugger, get your compiler to help you as much as possible; set the warning level to high, and set warnings as errors. A static analysis tool such as lint, pclint, or QA-C would be even better.
Tools for debugging are all well and good and for some classes of error they will just point you straight to the problem. The best tip that I have for debugging is that you need to think about it in the right way. What works for me is the following:
The compiler probably isn't broken. I've been working with C for 25 years now and in all that time it's almost invariably something I'm doing wrong.
Read the error messages. Often I've looked back at the error message and in hindsight realized it was telling me exactly what was wrong.
Read the documentation. Make sure you aren't making assumptions about the language or library that aren't true.
Make a mental model of the problem. I ask myself what needs to be happening in my code in order for the results I'm seeing to occur. Then add debug statements, assertions or just step through in the debugger (if you can) to see what is really happening.
Talk the problem through with someone else. Just describing it to a third party often results in a revelation about what might be happening.
Other people will have other ways of approaching debugging, but I find that if you have a structured approach to it rather than flailing around changing stuff at random, you usually get there. And when you do, be prepared for the inevitable "Why didn't I see that straight away!"
Best debugger for C
gdb
Best tools for memory leak checking:
Valgrind
The following are popular debugging tools.
Valgrind
Purify
Duma
Some very simple Tricks/Suggestions
-> Always check that nowhere in your code you have dereferenced a wild/dangling pointer
Example 1)
int main()
{
int *p;
*p=10; //Undefined Behaviour (crash on most implementations)
}
Example 2)
int main()
{
int *p=malloc(sizeof(int));
//do something with p
free(p);
printf("%d", *p); //Undefined Behaviour (crash on most implementations)
}
-> Always initialize variables before using
int main()
{
int k;
for(int i= k;i<10;++i) // Ouch: k is read before it is initialized
printf("%d",i);
}
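For contrast, here is a sketch of how the pointer examples above could look with those suggestions applied (use_and_release is a hypothetical name): the pointer is initialized, the allocation is checked, the value is read before the memory is released, and the pointer is cleared afterwards so a later misuse fails loudly.

```c
#include <stdlib.h>

/* Returns the stored value, or -1 if the allocation failed. */
int use_and_release(void) {
    int *p = malloc(sizeof *p); /* initialized at declaration */
    if (p == NULL) {
        return -1;
    }

    *p = 10;
    int value = *p;             /* read while the memory is still valid */

    free(p);
    p = NULL;                   /* no dangling pointer left behind */

    return value;
}
```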
In addition to all the other suggestions (gdb, valgrind, all that), some simple rules when writing the code help a lot when debugging afterwards.
Always use types with the proper semantics. Unsigned types (best size_t) for array indices and numbers that represent a cardinal, ptrdiff_t for pointer differences, off_t for file offsets etc. enum types for tags and case distinctions. There is almost no need for the builtin types int, long, char or whatever. Avoid them whenever possible. In particular don't use char for arithmetic, the signedness problems with that are a plague. Use uint8_t or int8_t if you feel the need for such a thing.
Always initialize variables, all of them: integer, double, pointers, struct. It is not true that this is less efficient with a modern compiler. In most cases it will just be optimized away when not necessary. But especially pointer variables that are not properly initialized can produce spurious errors and make code hard to debug. If you have them initialized to NULL your program will fail early, and your debugger will show you the place.
Compile with all warnings on, and don't finish tidying your code until the compiler doesn't give a single warning. They are quite good at that nowadays, take advantage.
Compile with different optimization options on, or even better with different versions of your compiler, or still better with completely different compilers on different platforms.
Use the assert macro. This forces you to think of your assumptions and also make your code fail early if they are not fulfilled.
Unit testing. Makes getting your software correct a lot easier.
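A short sketch that combines a few of these rules (size_t for the index, everything initialized, an assert that states the assumption). The function name is invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Sums count elements of values.
   The assert documents and enforces the precondition: a null
   pointer is only acceptable when there is nothing to read. */
int sum_array(const int *values, size_t count) {
    assert(values != NULL || count == 0);

    int total = 0;                       /* initialized before use */
    for (size_t i = 0; i < count; ++i) { /* size_t for array indices */
        total += values[i];
    }
    return total;
}
```

Compiling with -DNDEBUG removes the assert from release builds if you decide you don't want the check there.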
gdb is a debugger to analyse your program.
Another technique is to use printf or logs.
Valgrind provides dynamic analysis of the executable
Purify provides static and dynamic analysis. Sparrow and Prevent are some other tools in competition to Purify.
This can be separated into:
Prevention measures:
Use strict coding styles, don't make a mess
Use comments and code revisions
Use static code analysis tools
Use assertions where it's possible
Don't over complicate
Post-factum
Use debugger/tracer
Use memory checking tools
Use regression testing
Use your brain
Off the top of my head, Valgrind.
You might also want to hone your debugging skills by reading the book Debugging by David Agans. Every programmer should read this early on in their career.
Use valgrind for memory problems if you're on Linux. Use gdb/ddd on Linux as well. On Windows, a lot of programmers don't seem to know about windbg. It is very useful but has a learning curve like gdb; it is more powerful than the built-in debugger in Visual Studio. Learn to use assert: you will catch lots of stuff and you can turn it off in release code if you so choose. Use a unit testing framework like Check, cunit, etc. Always initialize your pointers, to NULL if nothing else. When you free a pointer, set it to NULL. Better for you to catch a segfault than your user. Pick a coding standard and stick to it; consistency will help you make fewer mistakes. Keep your functions small if at all possible; this will keep you from having 10-level-deep braces, which are logic nightmares. If compiling using gcc, use -Wall and -Wextra. Use the strn* functions instead of str* functions. They are well worth the extra thinking they force you to do.

Resources