Here is the piece of code from GNU C reference manual Pg 74:
If your code uses a signed loop index, make sure that the index cannot
overflow, along with all signed expressions derived from the index.
Here is a contrived example of problematic code with two instances of
overflow.
for (i = INT_MAX - 10; i <= INT_MAX; i++)
  if (i + 1 < 0) // first overflow
    {
      report_overflow();
      break;
    }
Because of the two overflows, a compiler might optimize away or
transform the two comparisons in a way that is incompatible with the
wraparound assumption.
What the GNU C reference manual means is that there are two possible overflows. One is the i++ expression in
for (i = INT_MAX - 10; i <= INT_MAX; i++)
and the other is i+1 in
if (i + 1 < 0) // first overflow
The example C code tries to avoid an endless loop with the
if (i + 1 < 0) // first overflow
  {
    report_overflow();
    break;
  }
piece of code, and to do that it relies on signed wraparound behaviour.
However, appendix A.3 tells you that you shouldn't rely on signed wraparound, because signed overflow is undefined behaviour and the optimizer may exploit that, generating code that behaves differently from what you expect. This is the case with the if (i + 1 < 0) test, which only works if i + 1 wraps around to a negative value when i is INT_MAX.
In conclusion, the code above could fail after being optimized by the compiler.
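One way to write such a loop without depending on wraparound is to test for the limit before the increment can overflow. The following is only a sketch: count_to_int_max is a hypothetical helper that is not in the manual, and its counter stands in for whatever the real loop body would do.

```c
#include <limits.h>

/* Hypothetical helper (not from the manual): visits every int from
   start up to INT_MAX inclusive and returns how many were visited. */
long long count_to_int_max(int start)
{
    long long visited = 0;

    for (int i = start; ; i++) {
        visited++;            /* stands in for the real loop body */
        if (i == INT_MAX)     /* checked before i++ could overflow */
            break;
    }
    return visited;
}
```

Because the loop leaves as soon as i equals INT_MAX, neither i++ nor any expression such as i + 1 is ever evaluated at the limit, so there is no overflow for the optimizer to reason about.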
Converted from comments:
i <= INT_MAX is always true, so the loop condition can never end the loop. That is a bug, because i++ will eventually overflow.
Because the condition is always true, the compiler may optimize it out, which is obviously not what was intended.
due to the break, there should be none
without the break this would be an eternal loop, and overflow on ++i
since i <= INT_MAX is true for all values of i (assuming i is an integer)
In an if statement I want to include a range, e.g.:
if(10 < a < 0)
but by doing so, I get a warning "Pointless comparison". However, this works fine without any warning:
if(a<10 && a>0)
Is the first case possible to implement in C?
Note that the original version if(10 < a < 0) is perfectly legal. It just doesn't do what you might (reasonably) think it does. You're fortunate that the compiler recognized it as a probable mistake and warned you about it.
The < operator associates left-to-right, just like the + operator. So just as a + b + c really means (a + b) + c, a < b < c really means (a < b) < c. The < operator yields an int value of 0 if the condition is false, 1 if it's true. So you're either testing whether 0 is less than c, or whether 1 is less than c.
In the unlikely case that that's really what you want to do, adding parentheses will probably silence the warning. It will also reassure anyone reading your code later that you know what you're doing, so they don't "fix" it. (Again, this applies only in the unlikely event that you really want (a < b) < c.)
The way to check whether a is less than b and b is less than c is:
a < b && b < c
(There are languages, including Python, where a < b < c means a<b && b<c, as it commonly does in mathematics. C just doesn't happen to be one of those languages.)
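The difference between the two forms can be sketched as a pair of small functions. The names chained_as_parsed and strictly_between are made up for this illustration; the first spells out the grouping C applies to 10 < a < 0, the second is the split check.

```c
#include <stdbool.h>

/* What C evaluates for 10 < a < 0: (10 < a) is 0 or 1, and neither
   of those is less than 0, so the result is always false. */
bool chained_as_parsed(int a)
{
    return (10 < a) < 0;
}

/* The split form: true when a lies strictly between lo and hi. */
bool strictly_between(int lo, int a, int hi)
{
    return lo < a && a < hi;
}
```

A compiler may still warn that the first function's comparison is always false, which is the same diagnosis as the "Pointless comparison" warning in the question.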
It's not possible, you have to split the check as you did in case 2.
No it is not possible.
You have to use the second way by splitting the two conditional checks.
The first does one comparison, then compares the result of the first to the second value. In this case, the operators group left to right, so it's equivalent to (10<a) < 0. The warning it's giving you is really because < will always yield 0 or 1. The warning is telling you that the result of the first comparison can never be less than 0, so the second comparison will always yield false.
Even though the compiler won't complain about it, the second isn't really much of an improvement. How can a number be simultaneously less than 0 and greater than 10? Ideally, the compiler would give you a warning that the condition is always false. Presumably you meant 0 < a < 10 in the mathematical sense, which in C is written a > 0 && a < 10.
You can get almost the same effect using only a single comparison: if ((unsigned)a < 10) is true only if a is in the range 0..9, because a negative a converts to a very large unsigned value. A range comparison can normally be reduced to a single comparison with code like:
if ((unsigned)(x - range_start) < (range_end - range_start)) {
    // in range
} else {
    // out of range
}
At one time this was a staple of decent assembly language programming. I doubt many people do it any more though (I certainly don't as a rule).
As stated above, you have to split the check. Think about it from the compiler's point of view, which looks at one operator at a time. First, 10 < a evaluates to true or false, that is, 1 or 0. Then it evaluates that 1 or 0 < 0, which is never true and is not the range check you meant.
No, this does not do what you want. The condition of an if statement is an expression, which may contain logical operators, and the body is executed only when the expression in the parentheses evaluates to true, i.e. a nonzero value. 10 < a < 0 is accepted by the compiler, but it does not express a range check.
While reading http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html about undefined behavior in C, I have a question about this example.
for (i = 0; i <= N; ++i) { ... }
In this loop, the compiler can assume that the loop will iterate
exactly N+1 times if "i" is undefined on overflow, which allows a
broad range of loop optimizations to kick in. On the other hand, if
the variable is defined to wrap around on overflow, then the compiler
must assume that the loop is possibly infinite (which happens if N is
INT_MAX) - which then disables these important loop optimizations.
This particularly affects 64-bit platforms since so much code uses
"int" as induction variables.
This example shows that a C compiler can take advantage of undefined behavior to assume that the loop executes exactly N+1 times. But I don't understand why this assumption is valid.
I can understand that if the variable is defined to wrap around on overflow and N is INT_MAX, then the for loop will be infinite, because i will go from 0 to INT_MAX, overflow to INT_MIN, climb back to INT_MAX, restart from INT_MIN, and so on. So the compiler cannot make this assumption about the number of iterations and cannot apply the optimization.
But what about when i is undefined on overflow? In this case, i goes normally from 0 to INT_MAX, then i is assigned INT_MAX+1, which would overflow to an undefined value, such as one between 0 and INT_MAX. If so, the condition i <= INT_MAX still holds, so shouldn't the for loop continue and also be infinite?
… then i will be assigned INT_MAX+1, which would overflow to an undefined value such as between 0 and INT_MAX.
No, that is not correct. That is written as if the rule were:
If ++i overflows, then i will be given some int value, although it is not specified which one.
However, the rule is:
If ++i overflows, the entire behavior of the program is undefined by the C standard.
That is, if ++i overflows, the C standard allows any of these things to happen:
i stays at INT_MAX.
i changes to INT_MIN.
i changes to zero.
i changes to 37.
The processor generates a trap, and the operating system terminates your process.
Some other variable changes value.
Program control jumps out of the loop, as if it had ended normally.
Anything.
Now consider this assumption used in optimization by the compiler:
… the compiler can assume that the loop will iterate exactly N+1 times…
If ++i can only set i to some int value, then the loop will not terminate, as you conclude. On the other hand, if the compiler generates code that assumes the loop will iterate exactly N+1 times, then something else will happen in the case when ++i overflows. Exactly what happens depends on the contents of the loop and what the compiler does with them. But it does not matter what: Generating this code is allowed by the C standard because whatever happens when ++i overflows is allowed by the C standard.
Let's consider an actual case:
#include <limits.h>
#include <stdio.h>

unsigned long long test_int(unsigned long long L, int N) {
    for (int i = 0; i <= N; ++i) {
        L++;
    }
    return L;
}

unsigned long long test_unsigned(unsigned long long L, unsigned N) {
    for (unsigned i = 0; i <= N; ++i) {
        L++;
    }
    return L;
}

int main() {
    fprintf(stderr, "int: %llu\n", test_int(0, INT_MAX));
    fprintf(stderr, "unsigned: %llu\n", test_unsigned(0, UINT_MAX));
    return 0;
}
The point of the blog article concerns the possible behavior of the compiler for the above code:
for test_int() the compiler can determine that for argument values from INT_MIN to -1, the function should return L unchanged, for values between 0 and INT_MAX-1, the return value should be L + N + 1 and for INT_MAX the behavior is undefined, so returning L + N + 1 is OK too, hence the code can be simplified as
unsigned long long test_int(unsigned long long L, int N) {
    if (N >= 0)
        L += N + 1;
    return L;
}
for test_unsigned(), the same analysis yields: for argument values below UINT_MAX, the return value is L + N + 1 and for UINT_MAX there is an infinite loop:
unsigned long long test_unsigned(unsigned long long L, unsigned N) {
    if (N != UINT_MAX)
        return L + N + 1;
    for (;;);
}
As can be seen on https://godbolt.org/z/abafdE8P4 both gcc and clang perform this optimisation for test_int, taking advantage of undefined behavior on overflow but generate iterative code for test_unsigned.
Signed integer overflow invokes undefined behaviour. A programmer cannot assume that a portable program will behave in any particular way when it occurs.
On the other hand, a program compiled for a particular platform, with a particular version of the compiler and the same versions of the libraries, will typically behave in a deterministic way. But if any of those change (the compiler, the compiler version, and so on), you do not know that the behaviour will remain the same.
So your assumptions can be valid for a particular build and execution environment, but they are invalid in general.
I have a loop that has to go from N to 0 (inclusively). My i variable is of type size_t which is usually unsigned. I am currently using the following code:
for (size_t i = N; i != (size_t) -1; --i) {
...
}
Is that correct? Is there a better way to handle the condition?
Thanks,
Vincent.
Yes, it's correct and it is a very common approach. I wouldn't consider changing it.
Arithmetic on unsigned integer types is guaranteed to use modulo 2^N arithmetic (where N is the number of value bits in the type) and behaviour on overflow is well defined. The result is converted into the range 0 to 2^N - 1 by adding or subtracting multiples of 2^N (i.e. modulo 2^N arithmetic).
-1 converted to an unsigned integer type (of which size_t is one) converts to 2^N - 1. -- also uses modulo 2^N arithmetic for unsigned types so an unsigned type with value 0 will be decremented to 2^N - 1. Your loop termination condition is correct.
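The loop from the question can be exercised with a small sketch. visit_down_to_zero is a hypothetical helper, and out is assumed to have room for n + 1 entries.

```c
#include <stddef.h>

/* Hypothetical helper: runs the loop from the question and stores each
   index visited in out, which must have room for n + 1 entries.
   Returns the number of indices visited. */
size_t visit_down_to_zero(size_t n, size_t *out)
{
    size_t count = 0;

    /* When i is 0, --i wraps to (size_t)-1 and the condition fails. */
    for (size_t i = n; i != (size_t)-1; --i)
        out[count++] = i;
    return count;
}
```

One caveat: if n itself equals (size_t)-1, the condition is false on entry and the body never runs, so this form cannot start from the maximum value of size_t.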
Just because for has a convenient place to put a test at the beginning of each iteration doesn't mean you have to use it. To handle N to 0 inclusive, the test should be at the end, at least if you care about handling the maximum value. Don't let the convenience suck you in to putting the test in the wrong place.
for (size_t i = N;; --i) {
...
if (i == 0) break;
}
A do-while loop would also work but then you'd additionally give up i being scoped to the loop.
You can use this:
for (size_t i = n + 1; i-- > 0;)
{
}
Hope that helps.
Personally, I would just use a different loop construct, but to each their own:
size_t i = N;
do {
...
} while (i --> 0);
(you could just use (i--) as the loop condition, but one should never pass up a chance to use the --> "operator").
for ( size_t i = N ; i <= N ; i-- ) { .... }
This works because size_t is an unsigned integer type. Its width is implementation-defined; the illustration below assumes 32 bits. When the variable i has the value 0, you still want the loop body to execute. When i-- is then performed, the computer computes
00000000000000000000000000000000
-00000000000000000000000000000001
which wraps around, giving 111111111...1. For a signed two's complement integer, this bit pattern would be negative. However, i is unsigned, so the computer interprets 111111...1 as a very large positive value. That makes i <= N false, provided N is less than that maximum, and the loop ends.
So you have a few options:
1) Do as above and make the loop terminate when overflow occurs.
2) Make the loop run from i = 0 to i <= N but use (N-i) instead of i everywhere in your loop. For example, myArray[i] would become myArray[N-i] (off by one depending on what the value of N actually represents).
3) Make the condition of your for loop exploit the postfix -- operator, which yields the old value of i and then decrements it. As another user posted,
for ( size_t i = N + 1 ; i-- > 0 ; ) { ... }
This sets i to N+1 and checks whether the condition N+1 > 0 holds (assuming N+1 does not wrap to 0). It does, and as a side effect of i-- the value of i is decremented to N before the body runs. This continues until i = 1: the condition is tested, 1 > 0 is true, the side effect makes i = 0, and the body executes one last time. On the next test, 0 > 0 is false and the loop ends.
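Option 3 is often used to walk an array backwards. The sketch below is one way that might look. reverse_copy is a hypothetical helper, and count plays the role of N + 1 (the number of elements rather than the last index).

```c
#include <stddef.h>

/* Hypothetical helper: copies src[0..count-1] into dst in reverse order.
   The condition i-- > 0 compares the old value of i with 0, then
   decrements, so the body sees i = count-1 down to 0. */
void reverse_copy(const int *src, int *dst, size_t count)
{
    size_t out = 0;

    for (size_t i = count; i-- > 0; )
        dst[out++] = src[i];
}
```

Starting from the element count instead of the last index also makes the empty case work: with count equal to 0 the condition is false immediately and nothing is copied.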
You can use a second variable as the loop counter to make the range of iteration clear to a future reviewer.
for (size_t j=0, i=N; j<=N; ++j, --i) {
// code here ignores j and uses i which runs from N to 0
...
}
for (i=N; i+1; i--)
Since an unsigned integer wraps around to its maximum value when decremented from zero, you can try the following, provided N is less than that maximum value (someone please correct me if this is UB):
for ( size_t i = N; i <= N; i-- ) { /* ... */ }