What is faster in a for loop comparison? - c

My guess would be that in C89 Version 1 is faster, because sizeof is a compile-time operator, so we will be comparing against a constant. But in C99 we can take the sizeof of a VLA, so sizeof can be a run-time operator.
So which one is faster in c99?
And which one is faster in c89?
One define and array for both of them:
#define NUM_ROWS(x) (int) (sizeof(x) / sizeof((x)[0]))
int x[5] = { 0 };
Version 1:
int i;
for (i = 0; i < NUM_ROWS(x); i++) {
// code
}
Version 2:
const int length = NUM_ROWS(x);
int i;
for (i = 0; i < length; i++) {
// code
}

The only true answer to what is faster is: measure.
That said, in version 1 you evaluate the end condition at each iteration of your loop, while in version 2 you only evaluate it once.
Even if sizeof is a constant: if your compiler can put the constant value directly in a register for the comparison in version 1, it can probably do the same in version 2.
So version 2 is in theory either faster than or at worst the same speed as version 1 (the same speed being the most likely outcome for a constant expression).

sizeof is only evaluated at run-time if a VLA is part of the expression.
Since that's not the case here, it'll just be a compile-time constant, and you will get the same performance.
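The one case where this changes is a C99 VLA; here is a minimal sketch (the function name is made up). With a VLA, sizeof really is computed at run time, so caching the length as in version 2 states the intent directly, although a compiler can usually hoist the computation anyway, since the array's size cannot change inside the loop.
void fill_vla(int n)
{
    int vla[n];                        /* VLA: sizeof(vla) is evaluated at run time */
    const int length = NUM_ROWS(vla);  /* evaluate the run-time sizeof once */
    int i;
    for (i = 0; i < length; i++)
        vla[i] = i;
}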

Related

Behavior when a for-loop variable overflows, and compiler optimization

While reading http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html about undefined behavior in C, I had a question about this example:
for (i = 0; i <= N; ++i) { ... }
In this loop, the compiler can assume that the loop will iterate
exactly N+1 times if "i" is undefined on overflow, which allows a
broad range of loop optimizations to kick in. On the other hand, if
the variable is defined to wrap around on overflow, then the compiler
must assume that the loop is possibly infinite (which happens if N is
INT_MAX) - which then disables these important loop optimizations.
This particularly affects 64-bit platforms since so much code uses
"int" as induction variables.
This example is meant to show that the C compiler can take advantage of the undefined behavior to assume that the loop will execute exactly N+1 times. But I don't understand why this assumption is valid.
I can understand that if the variable is defined to wrap around on overflow and N is INT_MAX, then the for loop will be infinite, because i will go from 0 to INT_MAX, overflow to INT_MIN, climb back up to INT_MAX, restart from INT_MIN, and so on. So the compiler cannot make this assumption about the iteration count and cannot optimize on that basis.
But what about when i is undefined on overflow? In this case, i loops normally from 0 to INT_MAX, then i will be assigned INT_MAX+1, which would overflow to an undefined value such as between 0 and INT_MAX. If so, the condition i <= INT_MAX would still hold, so shouldn't the for loop continue and also be infinite?
… then i will be assigned INT_MAX+1, which would overflow to an undefined value such as between 0 and INT_MAX.
No, that is not correct. That is written as if the rule were:
If ++i overflows, then i will be given some int value, although it is not specified which one.
However, the rule is:
If ++i overflows, the entire behavior of the program is undefined by the C standard.
That is, if ++i overflows, the C standard allows any of these things to happen:
i stays at INT_MAX.
i changes to INT_MIN.
i changes to zero.
i changes to 37.
The processor generates a trap, and the operating system terminates your process.
Some other variable changes value.
Program control jumps out of the loop, as if it had ended normally.
Anything.
Now consider this assumption used in optimization by the compiler:
… the compiler can assume that the loop will iterate exactly N+1 times…
If ++i can only set i to some int value, then the loop will not terminate, as you conclude. On the other hand, if the compiler generates code that assumes the loop will iterate exactly N+1 times, then something else will happen in the case when ++i overflows. Exactly what happens depends on the contents of the loop and what the compiler does with them. But it does not matter what: Generating this code is allowed by the C standard because whatever happens when ++i overflows is allowed by the C standard.
Let's consider an actual case:
#include <limits.h>
#include <stdio.h>
unsigned long long test_int(unsigned long long L, int N) {
    for (int i = 0; i <= N; ++i) {
        L++;
    }
    return L;
}

unsigned long long test_unsigned(unsigned long long L, unsigned N) {
    for (unsigned i = 0; i <= N; ++i) {
        L++;
    }
    return L;
}

int main() {
    fprintf(stderr, "int: %llu\n", test_int(0, INT_MAX));
    fprintf(stderr, "unsigned: %llu\n", test_unsigned(0, UINT_MAX));
    return 0;
}
The point of the blog article is the possible behavior of the compiler for the above code:
for test_int() the compiler can determine that for argument values from INT_MIN to -1, the function should return L unchanged; for values between 0 and INT_MAX-1, the return value should be L + N + 1; and for INT_MAX the behavior is undefined, so returning L + N + 1 is OK too. Hence the code can be simplified to:
unsigned long long test_int(unsigned long long L, int N) {
    if (N >= 0)
        L += N + 1;
    return L;
}
for test_unsigned(), the same analysis yields: for argument values below UINT_MAX, the return value is L + N + 1 and for UINT_MAX there is an infinite loop:
unsigned long long test_unsigned(unsigned long long L, unsigned N) {
    if (N != UINT_MAX)
        return L + N + 1;
    for (;;);
}
As can be seen at https://godbolt.org/z/abafdE8P4, both gcc and clang perform this optimisation for test_int, taking advantage of undefined behavior on overflow, but generate iterative code for test_unsigned.
Signed integer overflow invokes undefined behaviour. A programmer cannot assume that a portable program will behave in any particular way.
On the other hand, a program compiled for a particular platform with a particular version of the compiler, and using the same versions of the libraries, will behave in a deterministic way. But you do not know whether the behaviour will remain the same if any of those (compiler, compiler version, etc.) change.
So your assumptions can be valid for a particular build and execution environment, but are invalid in general.
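As a concrete illustration of "valid for the particular build": GCC and Clang provide the -fwrapv option, which defines signed overflow as wrapping. A small sketch (the function name is made up) of how that changes what the compiler may assume:
unsigned long long count_iterations(int N)
{
    unsigned long long L = 0;
    for (int i = 0; i <= N; ++i)   /* overflow of i is UB by default; wraps with -fwrapv */
        L++;
    return L;
}
With default options the compiler may assume this loop runs exactly N+1 times; built with -fwrapv it must allow for the infinite loop when N == INT_MAX, just like the unsigned case above.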

Don't fully understand custom-written 'memcpy' function in C

So I was browsing the Quake engine source code earlier today and stumbled upon some written utility functions. One of them was 'Q_memcpy':
void Q_memcpy (void *dest, void *src, int count)
{
    int i;

    if (( ( (long)dest | (long)src | count) & 3) == 0 )
    {
        count>>=2;
        for (i=0 ; i<count ; i++)
            ((int *)dest)[i] = ((int *)src)[i];
    }
    else
        for (i=0 ; i<count ; i++)
            ((byte *)dest)[i] = ((byte *)src)[i];
}
I understand the whole premise of the function, but I don't quite understand the reason for the bitwise OR between the source and destination addresses. My questions are as follows:
Why does 'count' get used in the same bitwise arithmetic?
Why are the last two bits of that result checked?
What purpose does this whole check serve?
I'm sure it's something obvious but please excuse my ignorance because I haven't really delved into the more low level side of things when it comes to programming. I just find it interesting and want to learn more.
It is finding out whether the source and destination pointers are int-aligned, and whether the count is an exact multiple of the int size.
If those three things are all true, the least-significant 2 bits of all of them will be 0 (assuming pointers and int are 4 bytes). So the algorithm ORs the three values and isolates the least-significant 2 bits.
In that case, it copies int by int. Otherwise it copies char by char.
If the test fails, a more sophisticated algorithm would copy some of the leading and trailing bytes char by char and the intermediate bytes int by int.
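A rough sketch of that idea (not from the Quake source; the name and structure here are made up, and it only helps when source and destination can reach int alignment together):
void memcpy_sketch(void *dest, void *src, int count)
{
    unsigned char *d = dest;
    unsigned char *s = src;

    /* Copy leading bytes one at a time until the destination is int-aligned (or we run out). */
    while (count > 0 && ((long)d & 3) != 0)
    {
        *d++ = *s++;
        count--;
    }

    /* If the source happens to be aligned as well, copy the middle int by int. */
    if (((long)s & 3) == 0)
    {
        while (count >= 4)
        {
            *(int *)d = *(int *)s;
            d += 4;
            s += 4;
            count -= 4;
        }
    }

    /* Copy whatever trailing bytes remain char by char. */
    while (count-- > 0)
        *d++ = *s++;
}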
The bitwise ORing and ANding with 3 is to check whether the source, destination and count are divisible by 4. If they are, the operation can work with 4-byte words, while this code is assuming int as 4 bytes. Otherwise the operation is performed bytewise.
It first tests if all 3 arguments are divisible by 4. If - and only if - they all are, it proceeds with copying 4 bytes at a time.
I.e., decoded, this would be:
if ((long) src % 4 == 0 && (long) dest % 4 == 0 && count % 4 == 0)
{
    count = count / 4;
    for (i = 0; i < count; i++)
        ((int *)dest)[i] = ((int *)src)[i];
}
Perhaps they tested their compiler and it generated bad code even for a test written that way, and therefore they decided to write it in such a convoluted way. In any case, x | y | z guarantees that bit n is set in the result if it is set in any of x, y or z. Therefore, if (x | y | z) & 3 results in 0, none of the numbers had either of the 2 lowest bits set, and therefore all of them are divisible by 4.
Of course it would be rather silly to use now: the standard library memcpy in recent library implementations is almost certainly better than this.
Therefore, on recent compilers you can optimize all calls to Q_memcpy by switching them to memcpy. GCC can generate things like 64-bit or SIMD moves for memcpy, depending on the size of the area to be copied.
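A minimal sketch of that replacement, keeping the existing call sites and simply forwarding to the library (it assumes count is never negative):
#include <string.h>

void Q_memcpy (void *dest, void *src, int count)
{
    memcpy(dest, src, (size_t)count);  /* let the library pick the widest safe copy strategy */
}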

Advantage of using compound assignment

What is the real advantage of using compound assignment in C/C++ (or may be applicable to many other programming languages as well)?
#include <stdio.h>
int main()
{
    int exp1 = 20;
    int b = 10;
    // exp1 = exp1 + b;
    exp1 += b;
    return 0;
}
I looked at a few links, like the Microsoft site, SO post1, and SO Post2.
But the stated advantage is that exp1 is evaluated only once in the compound case. How is exp1 really evaluated twice in the first case? I understand that the current value of exp1 is read first, then the new value is added, and the updated value is written back to the same location. How does this really happen at a lower level in the compound case? I tried to compare the assembly code of the two cases, but I did not see any difference between them.
For simple expressions involving ordinary variables, the difference between
a = a + b;
and
a += b;
is syntactical only. The two expressions will behave exactly the same, and might well generate identical assembly code. (You're right; in this case it doesn't even make much sense to ask whether a is evaluated once or twice.)
Where it gets interesting is when the left-hand side of the assignment is an expression involving side effects. So if you have something like
*p++ = *p++ + 1;
versus
*p++ += 1;
it makes much more of a difference! The former tries to increment p twice (and is therefore undefined). But the latter evaluates p++ precisely once, and is well-defined.
As others have mentioned, there are also advantages of notational convenience and readability. If you have
variable1->field2[variable1->field3] = variable1->field2[variable2->field3] + 2;
it can be hard to spot the bug. But if you use
variable1->field2[variable1->field3] += 2;
it's impossible to even have that bug, and a later reader doesn't have to scrutinize the terms to rule out the possibility.
A minor advantage is that it can save you a pair of parentheses (or from a bug if you leave those parentheses out). Consider:
x *= i + 1; /* straightforward */
x = x * (i + 1); /* longwinded */
x = x * i + 1; /* buggy */
Finally (thanks to Jens Gustedt for reminding me of this), we have to go back and think a little more carefully about what we meant when we said "Where it gets interesting is when the left-hand side of the assignment is an expression involving side effects." Normally, we think of modifications as being side effects, and accesses as being "free". But for variables qualified as volatile (or, in C11, as _Atomic), an access counts as an interesting side effect, too. So if variable a has one of those qualifiers, a = a + b is not a "simple expression involving ordinary variables", and it may not be so identical to a += b, after all.
Evaluating the left side once can save you a lot if it's more than a simple variable name. For example:
int x[5] = { 1, 2, 3, 4, 5 };
x[some_long_running_function()] += 5;
In this case some_long_running_function() is only called once. This differs from:
x[some_long_running_function()] = x[some_long_running_function()] + 5;
Which calls the function twice.
This is what the standard says in 6.5.16.2:
A compound assignment of the form E1 op= E2 is equivalent to the simple assignment expression E1 = E1 op (E2), except that the lvalue E1 is evaluated only once
So the "evaluated once" is the difference. This mostly matters in embedded systems where you have volatile qualifiers and don't want to read a hardware register several times, as that could cause unwanted side-effects.
That's not really possible to reproduce here on SO, so instead here's an artificial example to demonstrate why multiple evaluations could lead to different program behavior:
#include <string.h>
#include <stdio.h>

typedef enum { SIMPLE, COMPOUND } assignment_t;

int index;

int get_index (void)
{
    return index++;
}

void assignment (int arr[3], assignment_t type)
{
    if(type == COMPOUND)
    {
        arr[get_index()] += 1;
    }
    else
    {
        arr[get_index()] = arr[get_index()] + 1;
    }
}

int main (void)
{
    int arr[3];

    for(int i=0; i<3; i++) // init to 0 1 2
    {
        arr[i] = i;
    }
    index = 0;
    assignment(arr, COMPOUND);
    printf("%d %d %d\n", arr[0], arr[1], arr[2]); // 1 1 2

    for(int i=0; i<3; i++) // init to 0 1 2
    {
        arr[i] = i;
    }
    index = 0;
    assignment(arr, SIMPLE);
    printf("%d %d %d\n", arr[0], arr[1], arr[2]); // 2 1 2 or 0 1 2
}
The simple assignment version not only gave a different result, it also introduced unspecified behavior into the code, so that two different results are possible depending on the compiler.
Not sure what you're after. Compound assignment is shorter, and therefore simpler (less complex) than using regular operations.
Consider this:
player->geometry.origin.position.x += dt * player->speed;
versus:
player->geometry.origin.position.x = player->geometry.origin.position.x + dt * player->speed;
Which one is easier to read and understand, and verify?
This, to me, is a very very real advantage, and is just as true regardless of semantic details like how many times something is evaluated.
Advantage of using compound assignment
There is a disadvantage too.
Consider the effect of types.
long long exp1 = 20;
int b=INT_MAX;
// All additions use `long long` math
exp1 = exp1 + 10 + b;
// The 10 + b addition below uses int math and overflows (undefined behavior):
exp1 += 10 + b; // UB
// That is like the below,
exp1 = (10 + b) + exp1;
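A hedged aside: if you want to keep the long long math with the compound form, one way is to force the right-hand side to long long, for example:
exp1 += 10LL + b;  /* 10LL promotes b to long long, so the addition cannot overflow int */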
A language like C is always going to be an abstraction of the underlying machine opcodes. In the case of addition, the compiler would first move the left operand into the accumulator, and add the right operand to it. Something like this (pseudo-assembler code):
move 1,a
add 2,a
This is what 1+2 would compile to in assembler. Obviously, this is perhaps over-simplified, but you get the idea.
Also, compilers tend to optimise your code, so exp1 = exp1 + b would very likely compile to the same opcodes as exp1 += b.
And, as @unwind remarked, the compound statement is a lot more readable.

Iterating an array backward in C using an unsigned index

Is the following code, safe to iterate an array backward?
for (size_t s = array_size - 1; s != -1; s--)
array[s] = <do something>;
Note that I'm comparing s, which is unsigned, against -1;
Is there a better way?
This code is surprisingly tricky. If my reading of the C standard is correct, then your code is safe if size_t is at least as big as int. This is normally the case because size_t is usually implemented as something like unsigned long int.
In this case -1 is converted to size_t (the type of s). -1 can't be represented by an unsigned type, so we apply modulo arithmetic to bring it in range. This gives us SIZE_MAX (the largest possible value of type size_t). Similarly, decrementing s when it is 0 is done modulo SIZE_MAX+1, which also results in SIZE_MAX. Therefore your loop ends exactly where you want it to end, after processing the s = 0 case.
On the other hand, if size_t were something like unsigned short (and int bigger than short), then int could represent all possible size_t values and s would be converted to int. In other words, the comparison would be done as (int)SIZE_MAX != -1, which is always true, so the loop would never terminate, thus breaking your code. But I've never seen a system where this could happen.
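A tiny sketch of the conversion being discussed, for the usual case where size_t is at least as wide as int:
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    size_t s = 0;
    s--;                            /* wraps modulo SIZE_MAX + 1, giving SIZE_MAX */
    printf("%d\n", s == SIZE_MAX);  /* prints 1 */
    printf("%d\n", s != -1);        /* prints 0 here: -1 is converted to SIZE_MAX */
    return 0;
}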
You can avoid any potential problems by using SIZE_MAX (which is provided by <stdint.h>) instead of -1:
for (size_t s = array_size - 1; s != SIZE_MAX; s--)
...
But my favorite solution is this:
for (size_t s = array_size; s--; )
...
Strictly speaking, s will never be -1; the test only terminates the loop because -1 is converted to the unsigned type of s and compares equal to SIZE_MAX, which is easy to misread. A clearer solution is to start at the array size and subtract one everywhere you use the index:
for (size_t s = array_size; s > 0; s--)
    array[s-1] = <do something>;
Or you can combine this functionality into the for loop's syntax:
for (size_t s = array_size; s--;)
array[s] = <do something>;
This decrements s before entering the loop body, but the test is done on the value before the decrement, so the body sees indices array_size-1 down to 0 and the loop stops once s has reached 0.
IMO, use a large enough signed type for the iteration variable; it is easier for humans to read.
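A minimal sketch of that suggestion, using ptrdiff_t as the "large enough" signed type (the names are made up, and it assumes array_size fits in ptrdiff_t):
#include <stddef.h>

void clear_backward(int *array, size_t array_size)
{
    for (ptrdiff_t s = (ptrdiff_t)array_size - 1; s >= 0; s--)
        array[s] = 0;  /* <do something> */
}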

Why an expression instead of a constant, in a C for-loop's conditional?

In many programming competitions I have seen people write this type of for-loop
for(i = 0; i < (1 << 7); i++)
Unless I am missing something, that's the same as
for(i = 0; i < 128; i++)
Why use the (1 << 7) version?
Isn't calculating the condition every time an unnecessary overhead?
Yes, they are equivalent in behavior.
Then why do people use the (1 << 7) version?
I guess they use it to document that it is a power of 2.
Calculating the condition every time must be an overhead! I am unable to find the reason behind this!
Not really; any normal compiler will replace 1 << 7 with 128, so both loops will have the same performance.
(C11, 6.6p2) "A constant expression can be evaluated during translation rather than runtime, and accordingly may be used in any place that a constant may be."
Let's translate each one of these options into plain English:
for(i = 0; i < (1 << 7); i++) // For every possible combination of 7 bits
for(i = 0; i < 128; i++) // For every number between 0 and 127
Runtime behavior should be identical in both cases.
In fact, assuming a decent compiler, even the assembly code should be identical.
So the first option is essentially used just in order to "make a statement".
You could just as well use the second option and add a comment above.
1 << 7 is a constant expression; the compiler treats it like 128, so there's no run-time overhead.
Without the loop body, it's hard to say why the author uses it. Possibly it's a loop that iterates over something associated with 7 bits, but that's just my guess.
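For example, a made-up sketch of that kind of loop, treating the counter as a bit mask over 7 items:
#include <stdio.h>

/* Print every subset of 7 items by treating the loop counter as a bit mask. */
void print_subsets(const int items[7])
{
    int mask, bit;
    for (mask = 0; mask < (1 << 7); mask++)   /* every combination of 7 bits */
    {
        for (bit = 0; bit < 7; bit++)
            if (mask & (1 << bit))
                printf("%d ", items[bit]);
        printf("\n");
    }
}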
Then why do people use the (1 << 7) version?
It is a form of documentation: it is not a magic number but 2^7 (two to the seventh power), which is meaningful to whoever wrote the code. A modern optimizing compiler should generate exactly the same code for both examples, so there is no cost to using this form, and there is the benefit of added context.
Using godbolt we can verify this is indeed the case, at least for several versions of gcc, clang and icc. Using a simple example with side effects to ensure that the code is not totally optimized away:
#include <stdio.h>

void forLoopShift()
{
    for (int i = 0; i < (1 << 7); i++)
    {
        printf("%d ", i);
    }
}

void forLoopNoShift()
{
    for (int i = 0; i < 128; i++)
    {
        printf("%d ", i);
    }
}
For the relevant part of the code we can see they both generate the following (see it live):
cmpl $128, %ebx
What we have is an integer constant expression as defined in the draft C11 standard section 6.6 Constant expressions which says:
An integer constant expression117) shall have integer type and shall only have operands
that are integer constants, enumeration constants, character constants, sizeof
expressions whose results are integer constants,[...]
and:
Constant expressions shall not contain assignment, increment, decrement, function-call,
or comma operators, except when they are contained within a subexpression that is not
evaluated.115)
and we can see that a constant expression is allowed to be evaluated during translation:
A constant expression can be evaluated during translation rather than runtime, and
accordingly may be used in any place that a constant may be.
for(i = 0; i < (1 << 7); i++)
and
for(i = 0; i < 128; i++)
give the same performance, but the developer can take real advantage of the shift form when it appears in a nested loop, as in:
for (int k = 0; k < 8; k++)
{
    for (int i = 0; i < (1 << k); i++)
    {
        // your code
    }
}
Now the inner loop's upper limit, (1 << k), changes as a power of 2 at run time. Of course, this is only applicable if your algorithm requires such logic.
The compiler outputs the same code for both cases. You probably want to use different forms depending on the context.
You can use NUM_STEPS or NUM_ELEMENTS_IN_NETWORK_PACKET when it's a constant part or a design choice in your algorithm that you want to make clear.
Or you can write 128, to make clear it's 128, a constant.
Or write 1 << 7 if you're at a competition and the test said something like "run it 2^7 times".
Or, you can be showing off that you know bit operations!
In my humble opinion, programming is like writing a letter for two people, the compiler and the person that will have to read it. What you're meaning should be clear for both.
It's evaluated at compile time by the compiler (not the preprocessor), since both operands are constants.
But if you're going to use the number instead of the bit shift, shouldn't it be 0x80?
