if statement, function evaluation & compiler optimization - c

Just a quick question, to save me from testing things (although I really should test things to be absolutely certain):
Given the following C code:
r1 = fun1();
r2 = fun2();
if (r1 && r2)
{
// do something
}
Variables r1 and r2 are not being used anywhere else in the code, except in the if (...) statement. Will both functions be evaluated? I'm worried that the compiler may optimize the code by eliminating r1 and r2, thus making it look like this:
if (fun1() && fun2())
{
// do something
}
In this case, fun1() will be evaluated first, and if it returns FALSE, then fun2() will not be evaluated at all. This is not what I want, and that's the reason why I'm coding it as in the first code segment.
How can I guarantee that a function will always be evaluated? I thought that this could be done by assigning it to a variable, but I'm concerned about compiler optimization if it sees that this variable is never actually being used later on in the code...
I know this can be achieved by declaring r1 and r2 as volatile, but I'd like to know if there's a more elegant solution.
Any comments on this issue are greatly appreciated, thanks!
Edit: Thanks to all who replied. I've just used my first code snippet on my project (it's an ARM Cortex-M7-based embedded system). It appears that the compiler does not optimize the code in the way I showed above, and both fun1() and fun2() are evaluated (as they should). Furthermore, compiling the code with r1 and r2 declared as volatile produces exactly the same binary output as when r1 and r2 are just normal variables (i.e., the volatile keyword doesn't change the compiler output at all). This reassures me that the first code snippet is in fact a guaranteed way of evaluating both functions prior to processing the if (...) statement that follows.

Assuming the code does not exhibit any undefined behavior, compilers may only perform optimizations that preserve the observable behavior of the unoptimized code.
In your example, the two pieces of code do two different things. Specifically, one always calls fun2 while the other calls it conditionally. So you don't need to worry about the first piece of code doing the wrong thing.
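A minimal, runnable sketch of the difference, using stand-in fun1/fun2 that print so the calls are visible (the function bodies here are mine, for illustration only):
#include <stdio.h>

static int fun1(void) { puts("fun1 called"); return 0; }
static int fun2(void) { puts("fun2 called"); return 1; }

int main(void)
{
    int r1 = fun1();          /* always called */
    int r2 = fun2();          /* always called: the call itself is observable behavior */
    if (r1 && r2)
    {
        // do something
    }

    if (fun1() && fun2())     /* fun2() is skipped because fun1() returned 0 */
    {
        // do something
    }
    return 0;
}
Running this prints "fun1 called" and "fun2 called" for the first form, but only "fun1 called" for the second.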

The calls will not be optimized out unless the result of the calls can be computed at compile time.
int fun1(void);    /* external: results unknown at compile time */
int fun2(void);
void func(void);
void func3(void);

void foo(void)
{
    int r1 = fun1();
    int r2 = fun2();
    if (r1 && r2)
    {
        func3();
    }
}

/* Known at compile time: fun3() always returns 1, fun4() calls func() and returns 0. */
int fun3(void) { return 1; }
int fun4(void) { func(); return 0; }

void bar(void)
{
    int r1 = fun3();    /* folded to 1 */
    int r2 = fun4();    /* reduced to the call to func() */
    if (r1 && r2)       /* always false */
    {
        func3();
    }
}
foo:
push {r4, lr}
bl fun1
mov r4, r0
bl fun2
cmp r4, #0
cmpne r0, #0
popeq {r4, pc}
pop {r4, lr}
b func3
fun3:
mov r0, #1
bx lr
fun4:
push {r4, lr}
bl func
mov r0, #0
pop {r4, pc}
bar:
b func


What are the rules that gcc optimize the usage of global variables? [duplicate]

This question has been closed as a duplicate of: Once more volatile: necessary to prevent optimization?
I use gcc to compile a simple test code for ARM Cortex-M4, and it optimizes the usage of the global variable in a way that confuses me. What are the rules by which gcc optimizes the usage of global variables?
GCC compiler: gcc-arm-none-eabi-8-2019-q3-update/bin/arm-none-eabi-gcc
Optimization level: -Os
My test code:
The following code is in "foo.c"; the functions foo1() and foo2() are called in task A, and the function global_cnt_add() is called in task B.
int g_global_cnt = 0;

void dummy_func(void);

void global_cnt_add(void)
{
    g_global_cnt++;
}

int foo1(void)
{
    while (g_global_cnt == 0) {
        // do nothing
    }
    return 0;
}

int foo2(void)
{
    while (g_global_cnt == 0) {
        dummy_func();
    }
    return 0;
}
The function dummy_func() is implemented in bar.c as follows:
void dummy_func(void)
{
// do nothing
}
The assembly code of function foo1() is shown below:
int foo1(void)
{
while (g_global_cnt == 0) {
201218: 4b02 ldr r3, [pc, #8] ; (201224 <foo1+0xc>)
20121a: 681b ldr r3, [r3, #0]
20121c: b903 cbnz r3, 201220 <foo1+0x8>
20121e: e7fe b.n 20121e <foo1+0x6>
// do nothing
}
return 0;
}
201220: 2000 movs r0, #0
201222: 4770 bx lr
201224: 00204290 .word 0x00204290
The assembly code of function foo2() is shown below:
int foo2(void)
{
201228: b510 push {r4, lr}
while (g_global_cnt == 0) {
20122a: 4c04 ldr r4, [pc, #16] ; (20123c <foo2+0x14>)
20122c: 6823 ldr r3, [r4, #0]
20122e: b10b cbz r3, 201234 <foo2+0xc>
dummy_func();
}
return 0;
}
201230: 2000 movs r0, #0
201232: bd10 pop {r4, pc}
dummy_func();
201234: f1ff fcb8 bl 400ba8 <dummy_func>
201238: e7f8 b.n 20122c <foo2+0x4>
20123a: bf00 nop
20123c: 00204290 .word 0x00204290
In the assembly code of function foo1(), the global variable g_global_cnt is loaded only once, so the while loop can never be broken. The compiler has optimized the repeated access to g_global_cnt away, and I know I can add volatile to avoid this optimization.
In the assembly code of function foo2(), g_global_cnt is loaded and checked on every iteration, so the while loop can be broken.
What gcc optimization rules make the difference?
In order to understand this behaviour, you have to think about side effects and sequence points.
For the compiler, a side effect is a result of an operator, expression, statement, or function that persists even after the operator, expression, statement, or function has finished being evaluated.
A sequence point is any point in a program's execution at which it is guaranteed that all side effects of previous evaluations have been performed, and no side effects from subsequent evaluations have yet been performed.
The practical consequence is that, as long as the required value and the needed side effects are produced, the compiler does not have to perform every access that appears in the source.
Citing the C standard:
In the abstract machine, all expressions are evaluated as specified by the semantics. An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced (including any caused by calling a function or accessing a volatile object).
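A tiny illustration of that rule (the names here are hypothetical):
int g;
int get_value(void);    /* defined in another translation unit */

void example(void)
{
    (void)(g + 1);      /* value unused, no needed side effects: may be dropped entirely */
    (void)get_value();  /* the compiler cannot prove the call has no side effects, so it stays */
}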
In your code:
int foo1(void)
{
    while (g_global_cnt == 0) {
        // do nothing
    }
    return 0;
}
After reading g_global_cnt there are no further side effects inside the loop that could influence the value of the variable. The compiler cannot know that it is modified outside the scope of the function, so it assumes a single read is enough.
The way to tell the compiler that every access has to be performed is to qualify the variable with volatile.
With int g_global_cnt = 0;:
adrp x0, g_global_cnt
add x0, x0, :lo12:g_global_cnt
ldr w0, [x0]
cmp w0, 0
beq .L3
mov w0, 0
ret
With volatile int g_global_cnt = 0;:
adrp x0, g_global_cnt
add x0, x0, :lo12:g_global_cnt
ldr w0, [x0]
cmp w0, 0
cset w0, eq
and w0, w0, 255
cmp w0, 0
bne .L3
mov w0, 0
ret
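A sketch of the corresponding source-level fix, assuming a shared header (the file name is an assumption); the volatile qualifier must appear on both the definition and the extern declaration seen by the other task:
/* foo.h */
extern volatile int g_global_cnt;

/* foo.c */
volatile int g_global_cnt = 0;

void global_cnt_add(void)    /* called from task B */
{
    g_global_cnt++;
}

int foo1(void)               /* called from task A */
{
    while (g_global_cnt == 0) {
        /* the load is now repeated on every iteration */
    }
    return 0;
}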

EMBEDDED C - Volatile qualifier does not matter in my interrupt routine

I am new to embedded C, and I recently watched some videos about the volatile qualifier. They all mention the same scenarios for using volatile:
when reading or writing a variable in an ISR (interrupt service routine)
RTOS applications or multiple threads (which is not my case)
memory-mapped I/O (which is also not my case)
My question is that my code does not get stuck in the whiletest() function below when my UART receives data and triggers the HAL_UART_RxCpltCallback(UART_HandleTypeDef *huart) interrupt callback, even though test is not declared volatile.
int test;

void whiletest(void);
/* huart1, ch, Delay() and the MX_..._Init() functions come from the CubeMX-generated project code */

int main(void)
{
    test = 0;
    MX_GPIO_Init();
    MX_USART1_UART_Init();
    HAL_UART_Receive_IT(&huart1, (uint8_t *)&ch, 1);
    while (1)
    {
        Delay(500);
        printf("the main is running\r\n");
        whiletest();
    }
}

void HAL_UART_RxCpltCallback(UART_HandleTypeDef *huart)
{
    if (huart->Instance == USART1)
    {
        test = 1;
        HAL_UART_Receive_IT(&huart1, (uint8_t *)&ch, 1);
    }
}

void whiletest(void)
{
    int count = 0;
    while (!test) {
        count++;
        printf("%d\r\n", count);
        Delay(2000);
    }
}
I use Keil IDE and STM32CubeIDE. I learned that the compiler may optimize some instructions away if you choose the -O2 or -O3 optimization level. Therefore, I chose -O2 in the build options, but it seems to have no effect on my code: the compiler does not optimize the load instruction away in the while loop and does not cache the value 0 of test in main, as the videos on YouTube teach. It is confusing. In what situations am I supposed to use the volatile qualifier while keeping my code optimized (-O2 or -O3)?
note: I am using stm32h743zi (M7)
volatile informs the compiler that the object is prone to side effects: it can be changed by something outside the normal program execution path.
As you never call the interrupt routine directly, the compiler assumes that the test variable will never become 1. You need to tell it (volatile does exactly that) that the variable may change anyway.
example:
volatile int test;
void interruptHandler(void)
{
test = 1;
}
void foo(void)
{
while(!test);
LED_On();
}
The compiler knows that test can change at any time, so it reads it again on every iteration of the while loop:
foo:
push {r4, lr}
ldr r2, .L10
.L6:
ldr r3, [r2] //compiler reads the value of the test from the memory as it knows that it can change.
cmp r3, #0
beq .L6
bl LED_On
pop {r4, lr}
bx lr
.L10:
.word .LANCHOR0
test:
Without volatile, the compiler assumes that test will always be zero:
foo:
ldr r3, .L10
ldr r3, [r3]
cmp r3, #0
bne .L6
.L7:
b .L7 //dead loop here
.L6:
push {r4, lr}
bl LED_On
pop {r4, lr}
bx lr
.L10:
.word .LANCHOR0
test:
In your code you have to use volatile whenever the object is changed by something that is not in the program's execution path.
The compiler may only optimize (change) code if the optimized code behaves as if the optimizer did nothing.
In your case you are calling two functions (Delay and printf) in your while loop. The compiler has no visibility into what these functions do, since they are defined in a separate translation unit. It therefore must assume they may change the value of the global variable test, and so it cannot optimize out the check of test's value. Remove the function calls and the compiler may well optimize that check away.
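To make that concrete, here is a sketch (same hypothetical test flag) of a loop with no opaque calls in its body, where the compiler is free to hoist the load; declaring the flag volatile is the robust fix in either case:
int test;    /* not volatile */

void wait_without_calls(void)
{
    while (!test) {
        /* empty body: nothing the compiler can see modifies test,
           so at -O2 the load may be hoisted and the loop can spin forever */
    }
}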

Unable to implement C Logic in ARM assembly

I have some C code in mind which I want to implement in ARM assembly.
The C code I have in mind is something like this:
int a;
scanf("%d",&a);
if(a == 0 || a == 1){
a = 1;
}
else{
a = 2;
}
What I have tried:
//arm equivalent of taking input to reg r0
//check for first condition
cmp r0,#1
moveq r0,#1
//if false
movne r0,#2
//check for second condition
cmp r0,#0
moveq r0,#1
Is this the correct way of implementing it?
Your code is broken for a=0 - single step through it in your head, or in a debugger, to see what happens.
Given this specific condition, it's equivalent to (unsigned)a <= 1U (because negative integers convert to huge unsigned values). You can do a single cmp and movls / movhi. Compilers already spot this optimization; here's how to ask a compiler to make asm for you so you can learn the tricks clever humans programmed into them:
int foo(int a) {
if(a == 0 || a == 1){
a = 1;
}
else{
a = 2;
}
return a;
}
With ARM GCC10 -O3 -marm on the Godbolt compiler explorer:
foo:
cmp r0, #1
movls r0, #1
movhi r0, #2
bx lr
See How to remove "noise" from GCC/clang assembly output? for more about making functions that will have useful asm output. In this case, r0 is the first arg-passing register in the calling convention, and also the return-value register.
I also included another C version using if (a <= 1U) to show that it compiles to the same asm. (1U is an unsigned constant, so the usual arithmetic conversions implicitly convert a to unsigned, making the types match for the <= operator. You don't need to write (unsigned)a <= 1U explicitly.)
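A reconstruction of that unsigned-compare version might look like this (the function name is mine); per the above it compiles to the same cmp / movls / movhi sequence:
int foo_unsigned(int a) {
    if (a <= 1U) {    /* a is converted to unsigned, so negative values compare as huge */
        a = 1;
    } else {
        a = 2;
    }
    return a;
}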
General case: not a single range
For a case like a==0 || a==3 that isn't a single range-check, you can predicate a 2nd cmp. (Godbolt)
foo:
cmp r0, #3 # sets Z if a was 3
cmpne r0, #0 # leaves Z unmodified if it was already set, else sets it according to a == 0
moveq r0, #1
movne r0, #2
bx lr
You can similarly chain && like a==3 && b==4, or for checks like a >= 3 && a <= 7 you can sub / cmp, using the same unsigned-compare trick as the 0-or-1 range check: after the sub, in-range values map onto 0..n. See the Godbolt link for that.
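A C-level sketch of that subtract-then-unsigned-compare trick for a >= 3 && a <= 7 (the function name is mine):
int in_range_3_to_7(int a)
{
    /* subtracting 3 in unsigned arithmetic maps 3..7 onto 0..4; everything
       else, including negatives, wraps around to a huge value */
    return (unsigned)a - 3u <= 4u;
}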
No, that does not work.
cmp r0,#1    @ is it a one?
moveq r0,#1  @ yes: make it a one again?
movne r0,#2  @ otherwise make it a 2; but what if it was a zero to start? now it is a 2
cmp r0,#0    @ at this point it is either a 1 or a 2 (you forced it), so it cannot be zero; what it started off as is now lost
moveq r0,#1
You have the right concept but need to order things better.
Following that line of thinking, maybe use another register:
x = 2;
if(a==0) x = 1;
if(a==1) x = 1;
a = x;
Ponder this
if(a==0) a = 1;
if(a!=1) a = 2;
Or, as everyone else is going to say, ask the compiler.
Because of the OR (test OR test), the tests generically need to be done separately: the first test being false does not mean you take the else path; you have to do the other test before declaring the whole condition false. But if the first test is true you need to hop over everything and not fall into the second test, because that one might (in this case, will) be false...
As Peter points out, you can use the unsigned lower-or-same and higher conditions (even though in C it is a signed int, bits are bits).
LS Unsigned lower or same
HI Unsigned higher
Depending on the ARM instruction set it can be:
cmp r0, #1
movls r0, #1
movhi r0, #2
bx lr
or
cmp r0, #1
ite ls
movls r0, #1
movhi r0, #2
bx lr
Am I smarter than you? No, I simply used the compiler to compile the C code.
https://godbolt.org/z/dqxv64Eb9

Is there any real-life example of an optimization benefit when passing a const parameter by value?

Here is the case:
I've been trying to investigate the advantages and disadvantages of implementing functions as follows:
void foo(const int a, const int b)
{
...
}
with the common function prototype, which is used as the API and included in the header file, as shown below:
void foo(int a, int b)
I've found quite a large discussion about this topic in the following question:
Similar Q
I agree with the answer there from rlerallut, who talks about self-documenting code and about being a little bit paranoid about the security of your code.
However, and this is the question, someone wrote there that using const for ordinary parameters passed by value can bring some optimization benefits. My question is: does anybody have a real-life example which proves this claim?
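For concreteness, a minimal sketch of the pattern I mean (in C the top-level const on a by-value parameter is not part of the function's type, so the header can keep the plain prototype):
void foo(int a, int b);              /* prototype in the header (the API) */

void foo(const int a, const int b)   /* definition: a and b are read-only locals */
{
    /* a = 5; would not compile here */
}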
"it might help the compiler optimize things a bit (though it's a long shot)."
I can't see how it would make a difference. const on a by-value parameter is most useful for generating compiler warnings/errors when you try to modify the const variable...
If you were to invent an experiment comparing a by-value function parameter declared const against one that isn't, for the purpose of optimization, the experiment could not modify the variable(s), because when const is used you would expect a warning/error. An optimizer that could care would already know, with or without the declaration, that the variable is not modified in the code, and can act accordingly. So how would the declaration matter? If I found such a difference I would file it as a bug against the compiler.
For example, here is a missed opportunity I found when playing with const vs. not.
Note that const or not doesn't matter...
void fun0x ( int a, int b);
void fun1x ( const int a, const int b);
int fun0 ( int a, int b )
{
fun0x(a,b);
return(a+b);
}
int fun1 ( const int a, const int b )
{
fun1x(a,b);
return(a+b);
}
gcc produced this with -O2 and -O3 (and -O1?):
00000000 <fun0>:
0: e92d4038 push {r3, r4, r5, lr}
4: e1a05000 mov r5, r0
8: e1a04001 mov r4, r1
c: ebfffffe bl 0 <fun0x>
10: e0850004 add r0, r5, r4
14: e8bd4038 pop {r3, r4, r5, lr}
18: e12fff1e bx lr
0000001c <fun1>:
1c: e92d4038 push {r3, r4, r5, lr}
20: e1a05000 mov r5, r0
24: e1a04001 mov r4, r1
28: ebfffffe bl 0 <fun1x>
2c: e0850004 add r0, r5, r4
30: e8bd4038 pop {r3, r4, r5, lr}
34: e12fff1e bx lr
Whereas this would have worked, with fewer cycles...
push {r4,lr}
add r4,r0,r1
bl fun1x
mov r0,r4
pop {r4,lr}
bx lr
clang/llvm did the same thing: the add placed after the function call, burning extra stack locations.
Googling showed mostly discussions about const by reference rather than const by value, and then the nuances of C and C++ as to what will or won't, or can or can't, change with the const declaration, etc.
If you use const on a global variable then the item can stay in .text/.rodata and not be placed in .data (and, for your STM32 microcontroller, not have to be copied from flash to RAM). But that doesn't fit your rules. The optimizer may not care and may not actually reach out to that variable's home; it might encode the value directly into an instruction as an immediate, depending on the instruction set, etc. All things held equal, though, a non-const variable could get that same benefit if it is not declared volatile...
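For instance, a minimal sketch of the global-variable case (exact section placement is linker-script and toolchain dependent):
const int lookup[4] = { 1, 2, 3, 4 };  /* typically kept in flash (.rodata/.text) */
int counter = 42;                      /* .data: initial value copied from flash to RAM at startup */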
Following your rules, const mainly saves on some human error: if you try to put that const variable on the left side of an equals sign, the compiler will let you know.
It would be a violation of your rules, but if inside the pass-by-value function you then did some pass-by-reference things, you could play the pass-by-reference const-vs-not optimization game...

matrix multiplication in ARM assembly

A course on ARM assembly recently started at my university, and our assignment is to create an NxM * MxP matrix multiplication program that is called from C code.
Now I have fairly limited knowledge of assembler, but I'm more than willing to learn. What I would like to know is this:
How to read/pass 2D arrays from C to ASM?
How to output a 2D array back to C?
I'm thinking, that i can figure the rest of this out by myself, but these 2 points are what I find difficult.
I am using ARM assembly on QEMU on Ubuntu for this code; it's not targeting any particular device.
When passed as function arguments, C arrays decay to pointers, so when you pass a C array to an assembly function you will receive a pointer to the area of memory that holds the array's contents.
For retrieving the argument, it depends on what calling convention you use. The ARM EABI stipulates that:
The first four registers r0-r3 (a1-a4) are used to pass argument values into a subroutine and to return a result value from a function. They may also be used to hold intermediate values within a routine (but, in general, only between subroutine calls).
For simple functions, then, you should find the pointer to your array in r0 to r3, depending on your function signature. Otherwise, you will find it on the stack. A good technique to find out exactly what the ABI does is to disassemble the object file of the C code that calls your assembly function and check what it does prior to the call.
For instance, on Linux, you can compile the following C code in a file called testasm.c:
extern int myasmfunc(int *);
static int array[] = { 0, 1, 2 };
int mycfunc()
{
return myasmfunc(array);
}
Then compile it with:
arm-linux-gnueabi-gcc -c testasm.c
And finally get a disassembly with:
arm-linux-gnueabi-objdump -S testasm.o
The result is:
testasm.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <mycfunc>:
0: e92d4800 push {fp, lr}
4: e28db004 add fp, sp, #4
8: e59f000c ldr r0, [pc, #12] ; 1c <mycfunc+0x1c>
c: ebfffffe bl 0 <myasmfunc>
10: e1a03000 mov r3, r0
14: e1a00003 mov r0, r3
18: e8bd8800 pop {fp, pc}
1c: 00000000 andeq r0, r0, r0
You can see that the single-parametered function myasmfunc is called by putting the parameter into register r0. The meaning of ldr r0, [pc, #12] is "load into r0 the content of the memory address that is at pc+12". That is where the pointer to the array is stored.
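Putting that together for the assignment, a hypothetical C-side declaration (all names are mine) could pass each matrix as a pointer plus its dimensions; with six arguments, the first four arrive in r0-r3 and the remaining two on the stack:
/* C = A * B, where A is n x m and B is m x p, both flattened row-major */
extern void matmul(const int *a, const int *b, int *c,
                   int n, int m, int p);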
Even though Guillaume's answer helped me a lot, I just thought I'd answer my own question with a bit of code.
What I ended up doing was creating a 1D array and passing it to asm along with the dimensions.
int h1, w1;    /* matrix height and width */
int *p;
scanf("%d", &h1);
scanf("%d", &w1);
int *A = (int *) malloc(sizeof(int) * (w1 * h1));
p = A;
int i;
for (i = 0; i < (w1 * h1); i++)
{
    scanf("%d", p++);
}
That being said, I allocated another array the same way (with malloc) and passed it along as well. I then just stored each int value I needed at the appropriate address in the assembly code, and since the addresses of the array elements don't change, I just used the same array to output the result.
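For reference, the row-major indexing that goes with a flattened h-by-w buffer (helper names are mine): element (i, j) lives at A[i*w + j].
static inline int  mat_get(const int *A, int w, int i, int j) { return A[i * w + j]; }
static inline void mat_set(int *A, int w, int i, int j, int v) { A[i * w + j] = v; }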
