I've noticed that gcc12 does not optimize these two functions to the same code (with -O3):
int x = 0;

void f(bool a)
{
    if (a) {
        ++x;
    }
}

void f2(bool a)
{
    x += a;
}
Basically no transformation is done. That can be seen here: https://godbolt.org/z/1G3n4fxEK
Optimizing f to the code in f2 seems trivial, and no jump would be needed anymore. However, I'm curious whether there's a reason why gcc doesn't do this. Is it somehow still slower? I would assume it's never slower and sometimes faster, but I might be wrong.
Thanks
Such a substitution would be incorrect in a scenario where one thread calls f(1) while another thread calls f(0). If x is never actually accessed outside the first thread, there would be no race condition in the code as written, but the substitution would create one. If x is initially 1, nothing would prevent the code from being processed as:
thread 1: read x (yields 1)
thread 2: read x (yields 1)
thread 1: write 2
thread 2: write 1
This would leave x holding the value 1 even though thread 1 has just written the value 2. Worse than that, if the function was invoked within a context like:
x = 1;
f(1);
if (x != 1)
    launch_nuclear_missiles_if_x_is_1_and_otherwise_make_coffee();
a compiler might recognize that x will always equal 2 following the return from f(1), and thus make the function call unconditional.
To be sure, such substitution would rarely cause problems in real-world situations, but the Standard explicitly forbids transformations that could create race conditions where none would exist in the source code as written.
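To make the hazard concrete, here is roughly what the substituted version does, written out as separate steps (a sketch of the machine-level effect, not actual gcc output):
void f2(bool a)
{
    int tmp = x;  /* unconditional read, even when a == 0 */
    x = tmp + a;  /* unconditional write, even when a == 0 */
}
With the original f, a thread calling f(0) never touches x at all, so it cannot participate in a data race.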
I would have hoped the compiler would change f2 into f. Reading and writing memory may require slow bus transactions to acquire a copy of the memory location and to update other bus controllers about the state of that location (Invalid -> Shared -> Modified).
Jumping around an update based on a register value is quite cheap, especially given the efficacy of modern branch predictors.
A much simpler reason why that optimization isn't done is that, at the ABI level, a bool is passed just like a small integer. Specifically, nothing stops you from smuggling an arbitrary integer into your function through type punning (undefined behavior, but it illustrates the kind of value the generated code may see):
int v = 5;
f2(*(bool *)&v);
// x = 5 here
My question pertains to function calls in general, but I thought of it while I was writing a priority queue using a heap. Just to give some context (not that it matters much): my heap stores items top to bottom, left to right, and I represent the heap as an array of structures. Upon inserting a new item, I just put it in the last place in the heap and then call the function "fix_up" at the bottom, which will move the item to the proper place in the heap. I am wondering if instead of doing...
fix_up(pQueue->heap, pQueue->size);
pQueue->size++;
...I could just do...
fix_up(pQueue->heap, pQueue->size++);
I am unsure whether this is OK, for a few reasons.
1) Since pQueue->size is in the function call, I'm not even sure if it's actually pQueue->size or rather a copy of the integer stored in pQueue->size. If it was a copy then obviously I wouldn't be adding 1 to the actual pQueue->size so there'd be no point in doing this.
2) Since it's a function call, it is going to then go into the function fix_up and execute all the code there. I am wondering if this would have an unintended consequence of maybe when it went to fix_up it would get incremented by 1 and my index would be 1 higher than I intended while executing fix_up? Or would it do what it's supposed to do and wait until after fix_up had finished executing?
3) Even if it is ok, is it considered a good coding practice for C?
Status priority_queue_insert(PRIORITY_QUEUE hQueue, int priority_level, int data_item)
{
    Priority_queue *pQueue = (Priority_queue*)hQueue;
    Item *temp_heap;
    int i;

    /*Resize if necessary*/
    if (pQueue->size >= pQueue->capacity) {
        temp_heap = (Item*)malloc(sizeof(Item) * pQueue->capacity * 2);
        if (temp_heap == NULL)
            return FAILURE;
        for (i = 0; i < pQueue->size; i++)
            temp_heap[i] = pQueue->heap[i];
        free(pQueue->heap);       /* release the old storage... */
        pQueue->heap = temp_heap; /* ...and adopt the grown array */
        pQueue->capacity *= 2;
    }

    /*Either resizing was not necessary or it successfully resized*/
    pQueue->heap[pQueue->size].key = priority_level;
    pQueue->heap[pQueue->size].data = data_item;

    /*Now it is placed as the last item in the heap. Fixup as necessary*/
    fix_up(pQueue->heap, pQueue->size);
    pQueue->size++;

    //continue writing function code here
}
Yes you can.
However, you cannot do this:
foo(myStruct->size++, myStruct->size)
The reason is that the C standard does not specify the order in which the arguments are evaluated, and here the modification of myStruct->size and the second read of it are unsequenced. This leads to undefined behavior.
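If you need both the old and the new value as arguments, a safe rewrite is to perform the increment in its own statement first (a sketch reusing the names above):
int old_size = myStruct->size++;
foo(old_size, myStruct->size);
The sequence point at the end of the first statement guarantees the increment is complete before the second read.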
1) Since pQueue->size is in the function call, I'm not even sure if it's actually pQueue->size or rather a copy of the integer stored in pQueue->size. If it was a copy then obviously I wouldn't be adding 1 to the actual pQueue->size so there'd be no point in doing this.
Whatever argument you pass to a function, it is fully evaluated before the function starts to execute. So
T var = expr;
foo(var);
is always equivalent to
foo(expr);
2) Since it's a function call, it is going to then go into the function fix_up and execute all the code there. I am wondering if this would have an unintended consequence of maybe when it went to fix_up it would get incremented by 1 and my index would be 1 higher than I intended while executing fix_up? Or would it do what it's supposed to do and wait until after fix_up had finished executing?
See above
3) Even if it is ok, is it considered a good coding practice for C?
Somewhat subjective, and a bit OT for this site, but I'll answer it anyway from my personal view. In general, I would try to avoid it.
The other posts already answer this question, but none of them mention the role of sequence points, which in this particular case can greatly help clarify the OP's doubt.
From this [emphasis mine]:
There is a sequence point after the evaluation of all function arguments and of the function designator, and before the actual function call.
From this [emphasis mine]:
Increment operators initiate the side-effect of adding the value 1 of appropriate type to the operand. Decrement operators initiate the side-effect of subtracting the value 1 of appropriate type from the operand. As with any other side-effects, these operations complete at or before the next sequence point.
Also, the post-increment operator increases the value of its operand by 1, but the value of the expression is the operand's original value prior to the increment operation.
So, in this statement:
fix_up(pQueue->heap, pQueue->size++);
the value of pQueue->size will be increased by 1 before the fix_up() function call but the argument value will be the original value prior to the increment operation.
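A minimal stand-alone illustration of that ordering:
#include <stdio.h>

int main(void)
{
    int n = 5;
    printf("%d\n", n++); /* prints 5: the argument is the old value,
                            but n is already 6 when printf runs */
    printf("%d\n", n);   /* prints 6 */
    return 0;
}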
Yes you can use it directly in the expression you pass as argument.
A statement like
fix_up(pQueue->heap, pQueue->size++);
is somewhat equivalent to
{
    int old_value = pQueue->size;
    pQueue->size = pQueue->size + 1;
    fix_up(pQueue->heap, old_value);
}
A note about the "equivalent" example above: since the order in which function arguments are evaluated is unspecified, the increment of pQueue->size may happen before or after pQueue->heap is evaluated; it is only guaranteed to be complete before fix_up itself starts executing. It also means that using pQueue->size a second time in the same call, next to pQueue->size++, would lead to undefined behavior.
Yes, you can use it in function calls, but please note that your two examples are not strictly equivalent: in the second one, pQueue->size has already been incremented by the time fix_up executes. Also, the pQueue->heap argument may be evaluated before or after pQueue->size++, and you can't know or rely on that order. Consider this example:
int func (void)
{
    static int x = 0;
    x++;
    return x;
}
printf("%d %d", func(), func());
This will print 1 2 or 2 1 and we can't know which we'll get. The compiler need not evaluate the function arguments consistently throughout the program, so if we add a second printf("%d %d", func(), func()); we could get something like 1 2 4 3 as output.
The importance here is to not write code which relies on order of evaluation. Which is the same reason as mixing ++ with other operations or side-effects in the same expression is bad practice. It can even lead to undefined behavior in some cases.
To answer your questions:
1) Since pQueue->size is in the function call, I'm not even sure if it's actually pQueue->size or rather a copy of the integer stored in pQueue->size. If it was a copy then obviously I wouldn't be adding 1 to the actual pQueue->size so there'd be no point in doing this.
The ++ is applied to the variable in the caller, so this isn't an issue. The copy handed to the function is made during the call, independently of the ++. However, the result of a ++ operation is not a so-called "lvalue" (addressable data), so this code is not valid:
void func (int* a);
...
func(&x++);
The ++ binds tighter and is evaluated first; its result is not an lvalue and cannot have its address taken.
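If you do need both the address and the increment, split them into two statements (a sketch):
func(&x); /* pass the address of x itself */
x++;      /* increment separately, afterwards */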
2) Since it's a function call, it is going to then go into the function fix_up and execute all the code there. I am wondering if this would have an unintended consequence of maybe when it went to fix_up it would get incremented by 1 and my index would be 1 higher than I intended while executing fix_up? Or would it do what it's supposed to do and wait until after fix_up had finished executing?
This isn't an issue unless the function modifies the original variable through a global pointer or such. In that case you would have problems. For example
int* ptr;

void func (int a)
{
    *ptr = 1;
}

int x = 5;
ptr = &x;
func(x++);
This is very questionable code, and x will be 1 after the line func(x++); rather than 6 as one might have expected. This is because the argument expression, including its side effect, is evaluated and completed before the function itself runs; the function then overwrites x through the pointer.
3) Even if it is ok, is it considered a good coding practice for C?
It will work ok in your case but it is bad practice. Specifically, mixing the ++ or -- operators together with other operators in the same expression is bad (although common) practice, since it has a high potential for bugs and tends to make code less readable.
Your original code with pQueue->size++; on a line of its own is superior in every way - stick with that. Contrary to popular belief, when writing C, you get no bonus points for "most operators on a single line". You may however get bugs and maintenance problems.
Example:
for (int i = 0; i < a[index]; i++) {
    // do stuff
}
Would a[index] be read every time? If no, what if someone wanted to change the value at a[index] in the loop? I've never seen it myself, but does the compiler make such an assumption?
If the condition was instead i < val-2, would it be evaluated every time?
The compiler can only cache the condition's value when it can prove that nothing in the loop changes it. So if you modify a[index] (or index, or a) inside the loop, the compiler will not optimize the read away.
As mentioned, in your snippet the compiler must behave as if the array is read and checked before each iteration. You can make the single read explicit as follows; then the array is read only once for the loop-condition check:
int cond = a[index];
for (int i = 0; i < cond; i++) {
    // do stuff
}
Well, maybe.
A standards-compliant compiler will produce code that behaves as if a[index] is read every time.
If index and/or a are declared volatile, they will be re-evaluated every time.
If they are not, and the loop body doesn't use them in a way that could modify their values, the optimiser may decide to use a cached result instead.
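For instance, a volatile qualifier forces the re-read on every iteration (a sketch; do_stuff is a hypothetical loop body):
extern void do_stuff(int i);

void consume(volatile int *a, int index)
{
    for (int i = 0; i < a[index]; i++) {
        do_stuff(i); /* a[index] must be re-read from memory each time around */
    }
}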
C does not store results of expressions in temporary variables, so all expressions are re-evaluated in place. Note that any for loop can be changed to a while loop:
for ( def_or_expr1 ; expr2 ; expr3 ) {
    ...
}
becomes:
def_or_expr1;
while ( expr2 ) {
    ...
cont:
    expr3;
}
Update: continue in the for loop would be the same as goto cont; in the while loop above, i.e. expr3 is evaluated for every iteration.
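A concrete instance of that rewrite, with a continue in the body (skip and work are hypothetical helpers):
extern int skip(int i);
extern void work(int i);

void example_for(int n)
{
    for (int i = 0; i < n; i++) {
        if (skip(i)) continue;
        work(i);
    }
}

void example_while(int n) /* equivalent form */
{
    int i = 0;
    while (i < n) {
        if (skip(i)) goto cont;
        work(i);
    cont:
        i++;
    }
}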
The compiler can basically apply any optimization it can prove does not change the program's observable behavior. Describing the full details would go too far here, but in general, it can (and will) optimize:
a[index] is not changed in the loop: read once before loop and keep in a temp (e.g. register).
a[index] is changed in the loop: update the temp (register) with the new value, avoiding memory access (and the index calculations).
For this, the compiler must assume the array is not changed outside the visible control flow. This is typically the file being compiled (with all included files). For modern systems using link time optimization (LTO), this can be the whole final program - minus dynamic libraries.
Note this is a very brief description. Actually, the C standard defines pretty clearly how a program has to be executed, and therefore what and how the compiler may optimize.
If the array is changed, for example by an interrupt handler or another thread, things become complicated. Depending on your target, you need anything from volatile and atomic operations (stdatomic.h, since C11) up to locks/mutexes/semaphores/etc. to control accesses to the shared resource.
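As a taste of the C11 route (a sketch; shared_limit and the thread writing it are hypothetical):
#include <stdatomic.h>

atomic_int shared_limit; /* written by another thread */

void drain(void)
{
    /* atomic_load re-reads the current value each iteration, without a data race */
    for (int i = 0; i < atomic_load(&shared_limit); i++) {
        /* ... */
    }
}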
This is a function to get sum of the digits of a number:
int sumOfDigits(int n)
{
    int sum = 0; //line 1
    if (n == 0)
        return sum;
    else
    {
        sum = (n % 10) + sumOfDigits(n / 10); //line 2
        // return sum; //line 3
    }
}
While writing this code, I realized the scope of the local variables is local to each individual recursion of the function. So am I right in saying that if n=11111, 5 sum variables are created and pushed on the stack with each recursion? If this is correct then what is the benefit of using recursion when I can do it in normal function using loops, thus overwriting only one memory location? If I use pointers, recursion will probably take similar memory as a normal function.
Now my second question, even though this function gives me the correct result each time, I don't see how the recursions (other than the last one which returns 0) return values without uncommenting line 3. (using geany with gcc)
I'm new to programming, so please pardon any mistakes
So am I right in saying that if n=11111, 5 sum variables are created and pushed on the stack with each recursion?
Conceptually, but compilers may turn some forms of recursion into jumps/loops. E.g. a compiler that does tail call optimization may turn
void rec(int i)
{
    if (i > 0) {
        printf("Hello, level %d!\n", i);
        rec(i - 1);
    }
}
into the equivalent of
void loop(int i)
{
    for (; i > 0; i--)
        printf("Hello, level %d!\n", i);
}
because the recursive call is in tail position: when the call is made, the current invocation of rec has no more work to do except a return to its caller, so it might as well reuse its stack frame for the next recursive call.
If this is correct then what is the benefit of using recursion when I can do it in normal function using loops, thus overwriting only one memory location? If I use pointers, recursion will probably take similar memory as a normal function.
For this problem, recursion is a pretty bad fit, at least in C, because a loop is much more readable. There are problems, however, where recursion is easier to understand. Algorithms on tree structures are the prime example.
(Although every recursion can be emulated by a loop with an explicit stack, and then stack overflows can be more easily caught and handled.)
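For this particular task, a loop version might look like this (a sketch; it also sidesteps the missing-return problem discussed below):
int sumOfDigitsLoop(int n)
{
    int sum = 0;
    while (n != 0) {
        sum += n % 10; /* take the lowest digit... */
        n /= 10;       /* ...then drop it */
    }
    return sum;
}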
I don't understand the remark about pointers.
I don't see how the recursions (other than the last one which returns 0) return values without uncommenting line 3.
By chance. The program exhibits undefined behavior, so it may do anything, even return the correct answer.
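For completeness, the recursive version only needs the return on every path (i.e. uncomment line 3, or more simply):
int sumOfDigits(int n)
{
    if (n == 0)
        return 0;
    return (n % 10) + sumOfDigits(n / 10);
}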
So am I right in saying that if n=11111, 5 sum variables are created and pushed on the stack with each recursion?
The recursion is 5 levels deep, so traditionally 5 stack frames will be eventually created (but read below!), each one of which will have space to hold a sum variable. So this is mostly correct in spirit.
If this is correct then what is the benefit of using recursion when I can do it in normal function using loops, thus overwriting only one memory location?
There are several reasons, which include:
it might be more natural to express an algorithm recursively; if the performance is acceptable, maintainability counts for a lot
simple recursive solutions typically do not keep state, which means they are trivially parallelizable, which is a major advantage in the multicore era
compiler optimizations frequently negate the drawbacks of recursion (see the sketch after this list)
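As an example of that last point: rewritten with an accumulator, the digit sum becomes tail-recursive, and an optimizing compiler can typically turn it into a loop (a sketch, not the OP's original code):
int sumOfDigitsAcc(int n, int acc)
{
    if (n == 0)
        return acc;
    /* tail call: nothing remains to do afterwards, so the stack frame can be reused */
    return sumOfDigitsAcc(n / 10, acc + (n % 10));
}
Call it as sumOfDigitsAcc(n, 0).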
I don't see how the recursions (other than the last one which returns 0) return values without uncommenting line 3.
It's undefined behavior to comment out line 3. Why would you do that?
Yes, the parameters and local variables are local to each invocation, and this is usually achieved by creating a copy of each invocation's variable set on the program stack. Yes, that consumes more memory compared to an implementation with a loop, but only if the problem can be solved with a loop and constant memory usage. Consider traversing a tree - you will have to store the tree elements somewhere, be it on the stack or in some other structure. The advantage of recursion is that it is often easier to implement (but not always easier to debug).
If you comment out return sum; in the second branch, the behavior is undefined - anything can happen, including the expected behavior. That's not something you should rely on.
Would it be possible to write an if that checks for -1 and, if the result is not -1, assigns the value - but without having to call the function twice, or saving the return value to a local variable? I know this is possible in assembly, but is there a C way to express it?
int i, x = -10;
if (func1(x) != -1) i = func1(x);
saving the return value to a local variable
In my experience, avoiding local variables is rarely worth the clarity forfeited. Most compilers, most of the time, can avoid the corresponding loads/stores and just use registers for those locals. So don't avoid it, embrace it! The maintainer's sanity that gets preserved just might be your own.
I know this is possible in assembly, but is there a c implementation?
If it turns out your case is one where assembly is actually appropriate, make a declaration in a header file and link against the assembly routine.
Suggestion:
const int x = -10;
const int y = func1(x);
const int i = y != -1
    ? y
    : 0 /* You didn't really want an uninitialized value here, right? */ ;
It depends whether or not func1 generates any side-effects. Consider rand(), or getchar() as examples. Calling these functions twice in a row might result in different return values, because they generate side effects; rand() changes the seed, and getchar() consumes a character from stdin. That is, rand() == rand() will usually¹ evaluate to false, and getchar() == getchar() can't be predicted reliably. Supposing func1 were to generate a side-effect, the return value might differ for consecutive calls with the same input, and hence func1(x) == func1(x) might evaluate to false.
If func1 doesn't generate any side-effect, and the output is consistent based solely on the input, then I fail to see why you wouldn't settle with int i = func1(x);, and base logic on whether or not i == -1. Writing the least repetitive code results in greater legibility and maintainability. If you're concerned about the efficiency of this, don't be. Your compiler is most likely smart enough to eliminate dead code, so it'll do a good job at transforming this into something fairly efficient.
¹ ... at least in any sane standard library implementation.
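In code, that recommendation is simply (a sketch; the fallback is whatever your program needs):
int i = func1(x); /* exactly one call */
if (i == -1) {
    /* handle the sentinel here instead of calling func1 again */
}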
int c;
if ((c = func1(x)) != -1) i = c;
The best implementation I could think of would be:
int i = 0; // initialize to something
const int x = -10;
const int y = func1(x);
if (y != -1)
{
    i = y;
}
The const lets the compiler make any optimizations it thinks best (perhaps inlining func1). Notice that func1 is only called once, which is probably best. The const y would also allow y to be kept in a register (which it would need to be anyway in order to perform the if). If you wanted to give more of a hint, you could do:
register const int y = func1(x);
However, the compiler is not required to honor your register keyword suggestion, so it's probably best to leave it out.
EDIT BASED ON INSPIRATION FROM BRIAN'S ANSWER:
int i = ((func1(x) + 1) ?: 0) - 1;
BTW, I probably wouldn't suggest using this, but it does answer the question. This is based on the SO question here. To me, it still isn't clear why this would be needed; it seems more like a puzzle or job-interview question than something that would be encountered in a "real" program. I'd certainly like to hear why this would be needed.
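To spell out what the one-liner does (?: with an empty middle operand is a GNU extension that reuses the first operand, evaluating it only once; a rough expansion):
int t = func1(x) + 1;    /* maps -1 to 0 and every other result to non-zero */
int i = (t ? t : 0) - 1; /* so i is func1(x), or -1 when func1 returned -1 */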
Compare the two:
if (strstr(a, "earth")) // A1
return x;
if (strstr(a, "ear")) // A2
return y;
and
if (strstr(a, "earth")) // B1
return x;
else if (strstr(a, "ear")) // B2
return y;
Personally, I feel that the else is redundant and prevents the CPU from doing branch prediction.
In the first one, while executing A1, it's possible to pre-decode A2. In the second one, B2 will not be interpreted until B1 has evaluated to false.
I've found a lot of (maybe most?) sources using the latter form.
Admittedly, the latter form is easier to understand: without the else clause, it's not so obvious that return y runs only if a matches /ear(?!th)/.
Your compiler probably knows that both these examples mean exactly the same thing. CPU branch prediction doesn't come into it.
I usually would choose the first option for symmetry.
(The following answers the original version of the question.)
Do you realize that the two code snippets are NOT semantically equivalent???
Consider what happens if a is "earth".
The first snippet calls foo() and then bar().
The second snippet calls foo() and skips the bar() call.
And this explains why the generated machine code is different. It has to be to implement the different semantics of the respective code fragments!
Personally, I feel that else is redundant ...
Unfortunately, your feeling is incorrect.
Lesson - write your code simply and clearly and leave optimization to the compiler ... which is going to do a far more accurate job than you can achieve.
FOLLOWUP
The snippets in the updated version of the question are now semantically identical, and the else is redundant. However:
any half decent optimizing compiler will generate identical code for the two snippets, and
it is a matter of opinion (i.e. subjective) which of the snippets is easier to understand.
Use else if to state your intentions clearly. Code is meant to be read by humans.
Let the compiler optimize this, and don't worry about optimization until your code is 1) working 2) crystal clear 3) profiled (do this in that order). When doing step 3, you'll notice that the bottlenecks are not where you supposed they would be.
Any attempt to control branch prediction or other low-level stuff is silly: compilers are very good at optimizing, and they use sophisticated methods to yield fast code on your particular machine.
Look at output from LLVM based compilers to see what I mean: sometimes you can't even remotely understand what it does.
Usually it's better to use the second form (with else) when you want exactly one of the branches to run for a given a, since it narrows down the options: with two separate ifs you can get two different outcomes. Note that in your exact snippets each body returns, so the first match exits the function either way; the difference appears when the bodies don't return. For example, with a = -2 and assignments instead of returns:
A: if (a < 0)
       r = x; // -2 is less than 0, so r is set to x and the else-if is skipped
   else if (a < 100)
       r = y;
B: if (a < 0)
       r = x; // -2 is less than 0, so r is set to x...
   if (a < 100)
       r = y; // ...but -2 is also less than 100, so r is overwritten with y
Why not simply write
char *str = strstr(a, "ear");
if (str != NULL)
{
    foo();
    if (strstr(str, "earth") != NULL)
    {
        bar();
    }
}