for loop being ignored (optimized?) out - c

I am using for/while loops for implementing a delay in my code. The duration of the delay is unimportant here though it is sufficiently large to be noticeable. Here is the code snippet.
uint32_t i;
// Do something useful
for (i = 0; i < 50000000U; ++i)
{}
// Do something useful
The issue I am observing is that this for loop won't get executed. It probably gets ignored/optimized by the compiler. However, if I qualify the loop counter i by volatile, the for loop seems to execute and I do notice the desired delay in the execution.
This behavior seems a bit counter-intuitive to my understanding of the compiler optimizations with/without the volatile keyword.
Even if the loop counter is getting optimized and being stored in the processor register, shouldn't the counter still work, perhaps with a lesser delay? (Since the memory fetch overhead is done away with.)
The platform I am building for is Xtensa processor (by Tensilica), and the C compiler is the one provided by Tensilica, Xtensa C/C++ compiler running with highest level of optimizations.
I tried the same with gcc 4.4.7 at the -O3 and -Ofast optimization levels. The delay seems to work in that case.

This is all about observable behavior. The only observable behavior of your loop is that i is 50000000U after the loop. The compiler is allowed to optimize it and replace it by i = 50000000U;. This assignment to i will also be optimized out, because the value of i has no observable consequences.
The volatile keyword tells the compiler that writing to and reading from i have an observable behavior, thus preventing it from optimizing.
The compiler will also not optimize away calls to functions whose code it cannot see. Theoretically, if a compiler had access to the whole OS code, it could optimize everything but the volatile variables, which are often mapped onto hardware I/O operations.
These optimization rules all conform to what is written in the C standard (cf. comments for references).
Also, if you want a delay, use a specialized function (e.g. an OS API). Such functions are reliable and don't consume CPU, unlike a spin delay like yours.
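A minimal sketch of the difference described above, assuming a C99 compiler; the function names are made up for illustration. The first loop has no observable effect and may be removed entirely. In the second, every read and write of the volatile counter counts as observable behavior, so the loop has to run.

```c
#include <stdint.h>

/* May be optimized away: nothing observable depends on i. */
void delay_plain(uint32_t iterations)
{
    uint32_t i;
    for (i = 0; i < iterations; ++i)
    {}
}

/* Each access to the volatile counter is observable behavior,
   so the compiler must keep the loop. Returns the final count. */
uint32_t delay_volatile(uint32_t iterations)
{
    volatile uint32_t i;
    for (i = 0; i < iterations; ++i)
    {}
    return i;
}
```

How long delay_volatile() actually takes still depends on the compiler and the CPU clock, so it is not a calibrated delay.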

Related

Is it necessary to use the "volatile" qualifier even in case the GCC optimisations are turned off?

My question is targeted towards the embedded development, specifically STM32.
I am well aware of the fact that the use of volatile qualifier for a variable is crucial when dealing with a program with interrupt service routines (ISR) in order to prevent the compiler optimising out a variable that is used in both the ISR and the main thread.
In Atollic TrueSTUDIO one can turn off the GCC optimisations with the -O0 flag before the compilation. The question is, whether it is absolutely necessary to use the volatile qualifier for variables that are used inside and outside the ISR, even when the optimisations are turned off like this.
With optimizations disabled it seems unlikely that you'd need volatile. However, the compiler can do trivial optimizations even at -O0. For example, it might remove parts of the code that it can deduce won't be used. So not using volatile will be a gamble. I see no reason why you shouldn't be using volatile, particularly if you run with no optimizations on anyway.
Also, regardless of optimization level, variables may be pre-fetch cached on high end MCUs with data cache. Whether volatile solves/should solve this is debatable, however.
“Programs must be written for people to read, and only incidentally for machines to execute.”
I think we can use this quote here. Imagine the situation user253751 mentioned: you remove the volatile keyword from every variable because optimization is disabled. Then, a few months later, you have to turn optimization on. Can you imagine the disaster that follows?
In addition, I work with code that has an abstraction layer above the bare-metal firmware, and there we use the volatile keyword when a variable shares memory between those layers, to be sure we always read the actual value. So volatile has uses beyond ISRs, which means it is not easy to put it back later and be sure that everything works.
Debugging code in which a variable should have been volatile is not that hard, but bugs like this look like magic: you don't know why something goes wrong, because it may happen, for example, once in 10k executions of that part of the code.
Summary: There is no strict "ban" on removing the volatile keyword when optimization is turned off, but in my view it is VERY bad programming practice.
I am well aware of the fact that the use of volatile qualifier for a variable is crucial when dealing with a program with interrupt service routines (ISR) in order to prevent the compiler optimising out a variable that is used in both the ISR and the main thread.
You should actually keep in mind that volatile is not a synchronization construct.
It does not force any barriers, and does not prevent reordering with other non-volatile variables. It only tells the compiler not to reorder the specific access relative to other volatile variables -- and even then gives no guarantees that the variables won't be reordered by the CPU.
That's why GCC will happily compile this:
volatile int value = 0;
int old_value = 0;
void swap(void)
{
old_value = value;
value = 100;
}
to something like:
// value will be changed before old_value
mov eax, value
mov value, 100
mov old_value, eax
So if your function uses a volatile to signal something like "everything else has been written up to this point", keep in mind that it might not be enough.
Additionally, if you are writing for a multi-core microcontroller, reordering of instructions done by the CPU will render the volatile keyword completely useless.
In other words, the correct approach for dealing with synchronization is to use what the C standard is telling you to use, i.e. the <stdatomic.h> library. Constructs provided by this library are guaranteed to be portable and emit all necessary compiler and CPU barriers.
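As a rough sketch of that approach, assuming a C11 toolchain that ships <stdatomic.h>: the payload is an ordinary variable, and only the flag is atomic. The names publish, try_consume, payload and ready are hypothetical. The release store keeps the payload write from moving after the flag, and the acquire load keeps the payload read from moving before it.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;          /* ordinary data, published via the flag */
static atomic_bool ready;    /* zero-initialized, so it starts out false */

/* Producer: write the data first, then set the flag with release ordering. */
void publish(int value)
{
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Consumer: returns true and copies the data once the flag has been seen set. */
bool try_consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;
    *out = payload;
    atomic_store_explicit(&ready, false, memory_order_relaxed);
    return true;
}
```

This assumes a single producer and a single consumer. Whether the atomic operations are lock-free on a given microcontroller depends on the target.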

Is it possible to disable gcc/g++ optimisations for specific sections of code?

I am compiling some code that works without optimisation but breaks with optimisations enabled. I suspect certain critical sections of the code of being optimised out, resulting in the logic breaking.
I want to do something like:
code...
#disable opt
more code...
#enable opt
Even better if I can set the level of optimisation for that section (like O0, O1...)
For those suggesting it is the code:
The section of the code being deleted is (checked by disassembling the object file):
void wait(uint32_t time)
{
while (time > 0) {
time--;
}
}
I seriously doubt there is something wrong with that code
If optimization causes your program to crash, then you have a bug and should fix it. Hiding the problem by not optimizing this portion of code is poor practice that will leave your code fragile, and it's like leaving a landmine for the next developer who supports your code. Worse, by ignoring it, you will not learn how to debug these problems.
Some Possible Root Causes:
Hardware Accesses being optimized out: Use Volatile
It is unlikely that critical code is being optimized out, although if you are touching hardware registers then you should add the volatile attribute to force the compiler to access those registers regardless of the optimization settings.
Race Condition: Use a Mutex or Semaphore to control access to shared data
It is more likely that you have a race condition that is timing specific, and the optimization causes this timing condition to show itself. That is a good thing, because it means you can fix it. Do you have multiple threads or processes that access the same hardware or shared data? You might need to add a mutex or semaphore to control access to avoid timing problems.
Heisenbug: This is when the behavior of code changes based on whether or not debug statements are added, or whether the code is optimized or not. There is a nice example here where the optimized code does floating point comparisons in registers in high precision, but when printf's are added, then the values are stored as doubles and compared with less precision. This resulted in the code failing one way, but not the other. Perhaps that will give you some ideas.
Timing Loop gets Optimized Out: Creating a wait function that works by creating a timing loop that increments a local variable in order to add a delay is not good programming style. Such loops can be completely optimized out based on the compiler and optimization settings. In addition, the amount of delay will change if you move to a different processor. Delay functions should work based on CPU ticks or real-time, which will not get optimized out. Have the delay function use the CPU clock, or a real time clock, or call a standard function such as nanosleep() or use a select with a timeout. Note that if you are using a CPU tick, be sure to comment the function well and highlight that the implementation needs to be target specific.
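For the last point, a delay built on nanosleep() could look roughly like this. It is a sketch that assumes a POSIX system; delay_ms is a hypothetical helper name, and a bare-metal target would use a hardware timer instead.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* expose nanosleep() */
#endif
#include <errno.h>
#include <time.h>

/* Sleeps for at least ms milliseconds. Returns 0 on success, -1 on error. */
int delay_ms(long ms)
{
    struct timespec req;
    struct timespec rem;

    if (ms < 0)
        return -1;
    req.tv_sec = ms / 1000;
    req.tv_nsec = (ms % 1000) * 1000000L;

    /* nanosleep() may return early when a signal arrives;
       in that case continue with the remaining time. */
    while (nanosleep(&req, &rem) != 0) {
        if (errno != EINTR)
            return -1;
        req = rem;
    }
    return 0;
}
```

Because the call is a real library function with an observable effect, the optimizer cannot drop it, and the delay does not change with the optimization level.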
Bottom line: As others have suggested, place the suspect code in a separate file and compile that single source file without optimization. Test it to ensure it works, then migrate half the code back into the original, and retest with half the code optimized and half not, to determine where your bug is. Once you know which half has the Heisenbug, use divide and conquer to repeat the process until you identify the smallest portion of code that fails when optimized.
If you can find the bug at that point, great. Otherwise post that fragment here so we can help debug it. Provide the compiler optimization flags used to cause it to fail when optimized.
You can poke around the GCC documentation. For example:
Function specific option pragmas
#pragma GCC optimize ("string"...)
This pragma allows you to set global optimization options for functions defined later in the source file. One or more strings can be specified. Each function that is defined after this point is as if __attribute__((optimize("STRING"))) was specified for that function. The parentheses around the options are optional. See Function Attributes, for more information about the optimize attribute and the attribute syntax.
See also the pragmas for push_options, pop_options and reset_options.
Common function attributes
optimize The optimize attribute is used to specify that a function is to be compiled with different optimization options than specified on the command line. Arguments can either be numbers or strings. Numbers are assumed to be an optimization level. Strings that begin with O are assumed to be an optimization option, while other options are assumed to be used with a -f prefix. You can also use the '#pragma GCC optimize' pragma to set the optimization options that affect more than one function. See Function Specific Option Pragmas, for details about the '#pragma GCC optimize' pragma.
This attribute should be used for debugging purposes only. It is not suitable in production code.
Optimize options
This page lists a lot of optimization options that can be used with the -f option on the command line, and hence with the optimize attribute and/or pragma.
The best thing you can do is to move the code you do not want optimized into a separate source file. Compile that without optimization, and link it against the rest of your code.
With GCC you can also declare a function with __attribute__((optimize("O0"))) to inhibit optimization.
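Put together, the pragma form quoted above could be applied to the question's wait loop roughly as follows. This is a GCC-specific sketch: the function is renamed to spin_wait (a hypothetical name) and returns its iteration count only so the result can be checked. Other compilers may ignore the pragmas with a warning.

```c
#include <stdint.h>

#pragma GCC push_options
#pragma GCC optimize ("O0")

/* Under GCC, compiled as if -O0 had been given, whatever the command line says. */
uint32_t spin_wait(uint32_t time)
{
    uint32_t spins = 0;
    while (time > 0) {
        time--;
        spins++;
    }
    return spins;
}

#pragma GCC pop_options
/* Functions defined from here on use the command-line options again. */
```

As the documentation excerpt says, this is meant for debugging. The delay produced by such a loop still depends on the target CPU.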

How do I "tell" to C compiler that the code shouldn't be optimized out?

Sometimes I need some code to be executed by the CPU exactly as I put it in the source. But every C compiler has its own optimization algorithms, so I can expect some tricks. For example:
unsigned char flag=0;
interrupt ADC_ISR(){
ADC_result = ADCH;
flag = 1;
}
void main(){
while(!flag);
echo ADC_result;
}
Some compilers will definitely turn the while(!flag); loop into an infinite one, as they will assume flag stays false (!flag is therefore always true).
Sometimes I can use the volatile keyword, and sometimes it helps. But in my case (AVR GCC) the volatile keyword forces the compiler to place the variable in SRAM instead of registers (which is bad for some reasons). Moreover, many articles on the Internet suggest using the volatile keyword with great care, as the result can become unstable (depending on the compiler, its optimization settings, platform and so on).
So I would definitely prefer to somehow mark the source code instruction and tell the compiler that this code should be compiled exactly as it is. Like this: volatile while(!flag);
Is there any standard C instruction to do this?
The only standard C way is volatile. If that doesn't happen to do exactly what you want, you'll need to use something specific for your platform.
You should indeed use volatile as answered by David Schwartz. See also this chapter of GCC documentation.
If you use a recent GCC compiler, you could disable optimizations in a single function by using appropriate function specific options pragmas (or some optimize function attribute), for instance
#pragma GCC optimize ("-O0")
before your main. I'm not sure it is a good idea.
Perhaps you want extended asm statements with the volatile keyword.
You have several options:
Compile without optimisations. Unlike some compilers, GCC doesn't optimise by default so unless you tell it to optimise, you should get generated code which looks very similar to your C source. Of course you can choose to optimise some C files and not others, using simple make rules.
Take the compiler out of the equation and write the relevant functions in assembly. Then you can get exactly the generated code you want.
Use volatile, which prevents the compiler from making any assumptions about a certain variable, so for any use of the variable in C the compiler is forced to generate a LOAD or a STORE even if ostensibly unnecessary.
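Applied to the flag example from the question, the volatile option might look like the sketch below. The interrupt is simulated by an ordinary function so the code runs on a desktop compiler; adc_isr_sim, adc_try_read and the variable names are hypothetical, and a real AVR build would use the toolchain's ISR syntax instead.

```c
#include <stdint.h>

static volatile uint8_t adc_ready;    /* set by the handler, polled by main code */
static volatile uint8_t adc_result;   /* sample stored by the handler */

/* Stand-in for the ADC interrupt handler. */
void adc_isr_sim(uint8_t sample)
{
    adc_result = sample;
    adc_ready = 1;
}

/* Polls the flag up to max_polls times. Because adc_ready is volatile,
   it is re-read from memory on every pass instead of being cached.
   Returns 1 and stores the sample in *out if one arrived, else 0. */
int adc_try_read(uint8_t *out, uint32_t max_polls)
{
    while (max_polls-- > 0) {
        if (adc_ready) {
            adc_ready = 0;
            *out = adc_result;
            return 1;
        }
    }
    return 0;
}
```

The bounded poll count is a design choice for the sketch; the original while(!flag); waits forever.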

STM32 HAL Library simple C coding error

I am using the STM32 HAL Library for a micro controller project. In the ADC section I found the following code:
uint32_t WaitLoopIndex = 0;
/* ... */
/* Delay for ADC stabilization time. */
/* Delay fixed to worst case: maximum CPU frequency */
while(WaitLoopIndex < ADC_STAB_DELAY_CPU_CYCLES)
{
WaitLoopIndex++;
}
It is my understanding that this code will most likely get optimized away since WaitLoopIndex isn't used anywhere else in the function and is not declared volatile, right?
Technically yes, though in my experience with compilers for embedded targets, that loop will not get optimised out. If you think about it, a pointless loop is not really a construct you are going to see in a program unless the programmer does it on purpose, so I doubt many compilers bother to optimise for it.
The fact that you have to make assumptions about how it might be optimised though means it most certainly is a bug, and one of the worst types at that. More than likely ST wanted to only use C in their library, so did this instead of the inline assembler delay that they should have used. But since the problem they are trying to solve is heavily platform dependent, an annoying platform/compiler dependent solution is unavoidable, and all they have done here is try to hide that dependency.
Declaring the variable volatile will help, but you still really have no idea how long that loop is taking to execute without making assumptions about how the compiler is building it. This is still very bad practice, though if they added an assert reminding you to check the delay manually that might be passable.
This depends on the compiler and the optimization level. To confirm the result, just enter debug mode and check the disassembly code of the piece of code.
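One way to keep such a loop without making the counter volatile is an empty inline-assembly statement in the loop body. The sketch below assumes GCC or Clang (the __asm__ syntax is an extension, not standard C); delay_iterations is a hypothetical name, and a real driver would more likely place a target-specific instruction such as a NOP inside the asm.

```c
#include <stdint.h>

/* Busy-waits for count loop iterations and returns how many ran. */
uint32_t delay_iterations(uint32_t count)
{
    uint32_t done = 0;
    while (done < count) {
        /* Emits no instruction, but the compiler has to assume the
           statement has side effects, so the loop is not deleted. */
        __asm__ __volatile__("" ::: "memory");
        done++;
    }
    return done;
}
```

As noted above, the time per iteration still depends on how the compiler builds the loop, so the disassembly is worth checking.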

C optimization breaks algorithm

I am programming an algorithm that contains 4 nested for loops. The problem is that at each level a pointer is updated. The innermost loop only uses 1 of the pointers. The algorithm does a complicated count. When I include a debugging statement that logs the combination of the indexes and the results of the count, I get the correct answer. When the debugging statement is omitted, the count is incorrect. The program is compiled with the -O3 option on gcc. Why would this happen?
Always put your code through something like valgrind, Purify, etc, before blaming the optimizer. Especially when blaming things related to pointers.
It's not to say the optimizer isn't broken, but more than likely, it's you. I've worked on various C++ compilers and seen my share of seg faults that only happen with optimized code. Quite often, people do things like forget to count the \0 when allocating space for a string, etc. And it's just luck at that point on which pages you're allocated when the program runs with different -O settings.
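The string case mentioned above is easy to get right once the terminator is counted. A small sketch (copy_string is a hypothetical helper):

```c
#include <stdlib.h>
#include <string.h>

/* Returns a heap copy of src, or NULL if allocation fails.
   The caller is responsible for calling free() on the result. */
char *copy_string(const char *src)
{
    size_t size = strlen(src) + 1;   /* +1 for the terminating '\0' */
    char *dst = malloc(size);

    if (dst != NULL)
        memcpy(dst, src, size);      /* copies the terminator as well */
    return dst;
}
```

Allocating only strlen(src) bytes writes one byte past the buffer. Whether that overflow is noticed depends on what happens to sit next to it, which is why such bugs can change with the -O setting.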
Also, important questions: are you dealing with restricted pointers at all?
Print out the assembly code generated by the compiler, with optimizations. Compare to an assembly language listing of the code without optimizations.
The compiler may have figured out that some of the variables can be eliminated because they were not used in the computation. You can try to match wits with the compiler and factor out variables that are not used.
The compiler may have substituted a for loop with an equation. In some cases (after removing unused variables), the loop can be replaced by a simple equation. For example, a loop that adds 1 to a variable can be replaced by a multiplication statement.
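To illustrate the kind of substitution meant here, the two hypothetical functions below return the same value for every input; an optimizing compiler is free to turn the first into the second.

```c
#include <stdint.h>

/* Adds step to the total n times. */
uint32_t count_with_loop(uint32_t n, uint32_t step)
{
    uint32_t total = 0;
    for (uint32_t i = 0; i < n; ++i)
        total += step;
    return total;
}

/* Closed form of the loop above. Unsigned arithmetic wraps
   the same way in both versions. */
uint32_t count_closed_form(uint32_t n, uint32_t step)
{
    return n * step;
}
```

Since the results are identical, this transformation alone cannot change a correct program's output. It only changes how long the code takes.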
You can tell the compiler to let a variable be by declaring it as volatile. The volatile keyword tells the compiler that the variable's value may be altered by means outside of the program and the compiler should not cache nor eliminate the variable. This is a popular technique in embedded systems programming.
Most likely your program somehow exploits undefined behaviour which works in your favour without optimisation, but with -O3 optimisation it turns against you.
I had a similar experience with one of my projects: it works fine with -O2 but breaks with -O3. I used setjmp()/longjmp() heavily in my code and I had to make half of the variables volatile to get it working, so I decided that -O2 is good enough.
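The setjmp()/longjmp() case has a concrete rule behind it: a non-volatile local variable that is modified between the setjmp() call and the longjmp() has an indeterminate value after the jump. A minimal sketch, with hypothetical names:

```c
#include <setjmp.h>

static jmp_buf recovery_point;

/* Jumps back to the setjmp() call below, which then returns 1. */
static void fail(void)
{
    longjmp(recovery_point, 1);
}

int attempts_before_jump(void)
{
    /* volatile: modified after setjmp() and read after longjmp() */
    volatile int attempts = 0;

    if (setjmp(recovery_point) == 0) {
        attempts = 1;
        fail();          /* does not return here */
    }
    return attempts;     /* well defined only because attempts is volatile */
}
```

Without volatile, an optimized build may keep attempts in a register whose contents are not preserved across the jump.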
Sounds like something is accessing memory that it shouldn't. Debugging symbols are famous for postponing bad news.
Is it pure C or there's any crazy thing like inline assembly?
However, run it on valgrind to check whether this might be happening. Also, did you try compiling with different optimization levels? And without debugging & optimizations?
Without code this is difficult, but here's some things that I've seen before.
Debugging print statements often end up being the only user of a value that the compiler knows about. Without the print statement the compiler thinks that it can do away with any operations and memory requirements that would otherwise be required to compute or store that value.
A similar thing happens when you have side effects included within the argument list of your print statement.
printf("%i %i\n", x, y = x - z);
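A sketch of how this bites, assuming the logging goes through a macro that is compiled out in some builds (DEBUG_LOG and both function names are hypothetical):

```c
/* Logging disabled, as it might be in a release build:
   the arguments are discarded without being evaluated. */
#define DEBUG_LOG(...) ((void)0)

/* Fragile: the assignment to y lives inside the log call,
   so it disappears together with the logging. */
int difference_fragile(int x, int z)
{
    int y = 0;
    DEBUG_LOG("%i %i\n", x, y = x - z);
    return y;
}

/* Robust: compute first, then log the finished value. */
int difference_robust(int x, int z)
{
    int y = x - z;
    DEBUG_LOG("%i %i\n", x, y);
    return y;
}
```

With logging disabled, difference_fragile(5, 3) returns 0 while difference_robust(5, 3) returns 2.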
Another type of error can be:
for( i = 0; i < END; i++) {
int *a = &i;
foo(a);
}
if (bar) {
int * a;
baz(a);
}
Without optimization, this code would likely appear to work, because the compiler would probably store both a variables in the same location, so the second a would have the last value that the other a had. It still reads an uninitialized pointer, which is undefined behavior, so an optimized build is free to do something else.
Inline functions can show strange behavior if you somehow rely on them not being inlined, which is usually what happens in unoptimized builds (or, sometimes, the other way round).
You should definitely try compiling with warnings turned up to the maximum (-Wall for gcc).
That will often tell you about the risky code.
(edit)
Just thought of another.
If you have more than one way to reference a variable then you can have issues that work right without optimization, but break when optimization is turned up. There are two main ways this can happen.
The first is if a value can be changed by a signal handler or another thread. You need to tell the compiler about that, so it knows that on every access the value needs to be reloaded and/or stored. This is done by using the volatile keyword.
The second is aliasing. This is when you create two different ways to access the same memory. Compilers usually are quick to assume that you are aliasing with pointers, but not always. Also, there are optimization flags for some compilers that tell them to be less quick to make those assumptions, as well as ways that you could fool the compiler (crazy stuff like while (foo != bar) { foo++; } *foo = x; not being obviously a copy of bar to foo).
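A small sketch of why aliasing matters to the optimizer (store_both is a hypothetical function). Both pointers have the same type, so the compiler has to allow for them pointing at the same object and must reload *a after the store through b.

```c
/* Stores through both pointers and returns the value found at *a. */
int store_both(int *a, int *b)
{
    *a = 1;
    *b = 2;
    return *a;   /* 1 if a and b are distinct objects, 2 if they alias */
}
```

Had the parameters been declared restrict, calling the function with two pointers to the same object would be undefined behavior, and the compiler could return 1 without reloading.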

Resources