I just came across an interesting case but I can't find any information about it and I was wondering if anyone here might know the answer.
So I have the macro INT_MAX, which is the largest value an int can store on my system.
The following if statement has some weird behavior:
#include <stdio.h>
#include <limits.h>

int main(int argc, const char* argv[]) {
    int maxValue = INT_MAX;
    printf("INT_MAX: %d\n", maxValue);
    printf("INT_MAX + 1: %d\n", maxValue + 1);
    if (INT_MAX < maxValue + 1) {
        printf("no overflow\n");
    } else {
        printf("overflow\n");
    }
    return 0;
}
Running this program, we get the value of INT_MAX and the overflowed value of INT_MAX + 1, followed by overflow.
If I replace INT_MAX in the condition with the variable maxValue, the first branch is executed instead and "no overflow" is printed. I assume this means that the if statement, or the < operator, checks whether the left and right operands refer to the same variable; instead of doing the actual calculation, it simply returns 1 because it sees that the right-hand side adds a positive value to that same variable.
So is this what is actually happening or is it something else entirely?
Thanks!
edit: INT_MAX not MAX_INT
I assume this means that the if statement, or the < operator, checks whether the left and right operands refer to the same variable; instead of doing the actual calculation, it simply returns 1 because it sees that the right-hand side adds a positive value to that same variable.
So is this what is actually happening or is it something else entirely?
That is an optimization that compilers commonly make, and it is likely what is occurring in your example, although proving that this particular optimization is responsible rather than some other behavior in the compiler would require diving into compiler internals. Godbolt does show that Clang and GCC compile the following code to a constant return value of 1:
int foo(int x)
{
    return x < x + 1;
}
The assembly generated by Clang is:
foo:                    # @foo
        mov     eax, 1
        ret
When the calculation result goes beyond the range that the result type can represent, undefined behavior is invoked and anything is allowed to happen.
Quote from N1570 6.5 Expressions 5:
If an exceptional condition occurs during the evaluation of an expression (that is, if the
result is not mathematically defined or not in the range of representable values for its
type), the behavior is undefined.
The result of maxValue + 1 when maxValue = INT_MAX will go beyond the range of int, so undefined behavior is invoked here.
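If you need to test whether an addition would overflow, the comparison has to happen before the addition, while every value involved is still representable. A minimal sketch of that idea (my own example, not from the question):

#include <limits.h>
#include <stdio.h>

int main(void) {
    int maxValue = INT_MAX;

    /* Compare against INT_MAX first, so the addition that would
       overflow is never evaluated. */
    if (maxValue < INT_MAX) {
        printf("no overflow: %d\n", maxValue + 1);
    } else {
        printf("maxValue + 1 would overflow\n");
    }
    return 0;
}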
On the other hand, if you use unsigned integers, calculating UINT_MAX + 1 won't invoke undefined behavior, because the result of unsigned integer arithmetic is defined to wrap modulo one more than the maximum value the type can represent.
Quote from N1570 6.2.5 Types 9:
A computation involving unsigned operands can never overflow,
because a result that cannot be represented by the resulting unsigned integer type is
reduced modulo the number that is one greater than the largest value that can be
represented by the resulting type.
Therefore, this code will print overflow even after replacing maxValue < maxValue + 1 with UINT_MAX < maxValue + 1.
#include <stdio.h>
#include <limits.h>

int main(int argc, const char* argv[]) {
    unsigned int maxValue = UINT_MAX;
    printf("UINT_MAX: %u\n", maxValue);
    printf("UINT_MAX + 1: %u\n", maxValue + 1);
    if (maxValue < maxValue + 1) {
        printf("no overflow\n");
    } else {
        printf("overflow\n");
    }
    return 0;
}
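On a platform where unsigned int is 32 bits wide, this prints UINT_MAX: 4294967295, then UINT_MAX + 1: 0 (the wrapped-around value), and finally overflow. The compiler cannot fold this comparison to a constant 1, because the wraparound is well defined.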
Related
While reading http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html about undefined behavior in C, I had a question about this example.
for (i = 0; i <= N; ++i) { ... }
In this loop, the compiler can assume that the loop will iterate
exactly N+1 times if "i" is undefined on overflow, which allows a
broad range of loop optimizations to kick in. On the other hand, if
the variable is defined to wrap around on overflow, then the compiler
must assume that the loop is possibly infinite (which happens if N is
INT_MAX) - which then disables these important loop optimizations.
This particularly affects 64-bit platforms since so much code uses
"int" as induction variables.
This example shows that the C compiler can take advantage of undefined behavior to assume that the loop executes exactly N+1 times. But I don't understand why this assumption is valid.
I can understand that if the variable is defined to wrap around on overflow and N is INT_MAX, then the for loop will be infinite, because i will go from 0 to INT_MAX, overflow to INT_MIN, count up to INT_MAX again, restart from INT_MIN, and so on. So the compiler cannot make this assumption about the iteration count and cannot optimize on this point.
But what about when i is undefined on overflow? In this case, i loops normally from 0 to INT_MAX, then i will be assigned INT_MAX+1, which would overflow to an undefined value such as between 0 and INT_MAX. If so, the condition i <= INT_MAX would still hold, so shouldn't the for loop continue and also be infinite?
… then i will be assigned INT_MAX+1, which would overflow to an undefined value such as between 0 and INT_MAX.
No, that is not correct. That is written as if the rule were:
If ++i overflows, then i will be given some int value, although it is not specified which one.
However, the rule is:
If ++i overflows, the entire behavior of the program is undefined by the C standard.
That is, if ++i overflows, the C standard allows any of these things to happen:
i stays at INT_MAX.
i changes to INT_MIN.
i changes to zero.
i changes to 37.
The processor generates a trap, and the operating system terminates your process.
Some other variable changes value.
Program control jumps out of the loop, as if it had ended normally.
Anything.
Now consider this assumption used in optimization by the compiler:
… the compiler can assume that the loop will iterate exactly N+1 times…
If ++i can only set i to some int value, then the loop will not terminate, as you conclude. On the other hand, if the compiler generates code that assumes the loop will iterate exactly N+1 times, then something else will happen in the case when ++i overflows. Exactly what happens depends on the contents of the loop and what the compiler does with them. But it does not matter what: Generating this code is allowed by the C standard because whatever happens when ++i overflows is allowed by the C standard.
Let's consider an actual case:
#include <limits.h>
#include <stdio.h>
unsigned long long test_int(unsigned long long L, int N) {
    for (int i = 0; i <= N; ++i) {
        L++;
    }
    return L;
}

unsigned long long test_unsigned(unsigned long long L, unsigned N) {
    for (unsigned i = 0; i <= N; ++i) {
        L++;
    }
    return L;
}

int main() {
    fprintf(stderr, "int: %llu\n", test_int(0, INT_MAX));
    fprintf(stderr, "unsigned: %llu\n", test_unsigned(0, UINT_MAX));
    return 0;
}
The point of the blog article is the possible behavior of the compiler for the above code:
for test_int(), the compiler can determine that for argument values from INT_MIN to -1 the function should return L unchanged; for values between 0 and INT_MAX-1 the return value should be L + N + 1; and for INT_MAX the behavior is undefined, so returning L + N + 1 is OK too. Hence the code can be simplified as:
unsigned long long test_int(unsigned long long L, int N) {
    if (N >= 0)
        L += N + 1;
    return L;
}
for test_unsigned(), the same analysis yields: for argument values below UINT_MAX the return value is L + N + 1, and for UINT_MAX there is an infinite loop:
unsigned long long test_unsigned(unsigned long long L, unsigned N) {
    if (N != UINT_MAX)
        return L + N + 1;
    for (;;);
}
As can be seen on https://godbolt.org/z/abafdE8P4, both gcc and clang perform this optimisation for test_int, taking advantage of undefined behavior on overflow, but generate iterative code for test_unsigned.
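As an aside, both gcc and clang accept the -fwrapv flag, which defines signed overflow as two's-complement wraparound. Compiled that way, test_int would have to be treated like test_unsigned, since the loop really would be infinite for N equal to INT_MAX, so the closed-form simplification should no longer be performed.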
Signed integer overflow invokes undefined behaviour. A programmer cannot assume that a portable program will behave in any particular way.
On the other hand, a program compiled for a particular platform with a particular version of the compiler and particular versions of the libraries will behave in a deterministic way. But you do not know whether the behaviour will stay the same if any of those (i.e. platform, compiler, compiler version, etc.) change.
So your assumptions may hold for a particular build and execution environment, but they are invalid in general.
So, let's say I have a number N that's guaranteed to be a power of 2 and is always greater than 0. Now, I wrote two C methods to find what power of 2 N is, based on bitwise operators -
Method A -
int whichPowerOf2(long long num) {
    int ret = -1;
    while (num) {
        num >>= 1;
        ret += 1;
    }
    return ret;
}
Method B -
int whichPowerOf2(long long num) {
    int idx = 0;
    while (!(num & (1<<idx))) idx += 1;
    return idx;
}
Intuitively, the two methods seem to be one and the same, and they also return the same values for different (smaller) values of N. However, Method B doesn't work for me when I try to submit my solution to a coding problem.
Can anyone tell me what's going on here? Why is Method A right and Method B wrong?
The problem is with this subexpression:
1<<idx
The constant 1 has type int. If idx becomes greater than or equal to the bit width of an int, you invoke undefined behavior. This is specified in section 6.5.7p3 of the C standard regarding bitwise shift operators:
The integer promotions are performed on each of the operands. The type
of the result is that of the promoted left operand. If the value of
the right operand is negative or is greater than or equal to the width
of the promoted left operand, the behavior is undefined.
Change the constant to 1LL to give it type long long, matching the type of num.
while (!(num & (1LL<<idx))) idx += 1;
In your Method B, the following line can cause undefined behaviour:
while (!(num & (1<<idx))) idx += 1;
Why? Well, the expression 1<<idx is evaluated as an int, because the constant 1 is an int. Further, since num is a long long (which we'll assume has more bits than an int), you could end up left-shifting by more than the number of bits in an int.
To fix the issue, use the LL suffix on the constant:
while (!(num & (1LL<<idx))) idx += 1;
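Putting the fix together, a corrected Method B might look like the sketch below. The idx < 63 guard is an extra precaution I've added (it is not in the original): it assumes a 64-bit long long and stops the loop before the shift amount itself becomes undefined.

int whichPowerOf2(long long num) {
    int idx = 0;
    // 1LL makes the shift happen in long long, matching num's width.
    // The idx < 63 bound stops before shifting by the full width of
    // a 64-bit long long, which would also be undefined.
    while (idx < 63 && !(num & (1LL << idx)))
        idx += 1;
    return idx;
}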
I was working on Exercise 2-1 of K&R. The goal is to calculate the range of different variable types; below is my function to calculate the maximum value a short int can contain:
short int max_short(void) {
    short int i = 1, j = 0, k = 0;
    while (i > k) {
        k = i;
        if (((short int)2 * i) > (short int)0)
            i *= 2;
        else {
            j = i;
            while (i + j <= (short int)0)
                j /= 2;
            i += j;
        }
    }
    return i;
}
My problem is that the value returned by this function is -32768, which is obviously wrong since I'm expecting a positive value. I can't figure out where the problem is; I used the same function (with changes to the variable types) to calculate the maximum value an int can contain, and it worked.
I thought the problem could be caused by the comparisons inside the if and while statements, hence the typecasting, but that didn't help.
Any ideas what is causing this? Thanks in advance!
EDIT: Thanks to Antti Haapala for his explanations: overflow into the sign bit results in undefined behavior, NOT in negative values.
You can't use calculations like this to deduce the range of signed integers, because signed integer overflow has undefined behaviour, and narrowing conversion at best results in an implementation-defined value, or a signal being raised. The proper solution is to just use SHRT_MAX, INT_MAX ... of <limits.h>. Deducing the maximum value of signed integers via arithmetic is a trick question in standardized C language, and has been so ever since the first standard was published in 1989.
Note that the original edition of K&R predates the standardization of C by 11 years, and even the 2nd one - the "ANSI-C" version - predates the finalized standard and differs from it somewhat. They were written for a language that was almost, but not quite, entirely unlike the C language of this day.
You can do it easily for unsigned integers though:
unsigned int i = -1;
// i now holds the maximum value of `unsigned int`.
By definition, you cannot calculate the maximum value of a type in C using variables of that very same type. It simply doesn't make any sense: the type will overflow when it goes "over the top". In the case of signed integer overflow, the behavior is undefined, meaning you will get a major bug if you attempt it.
The correct way to do this is to simply check SHRT_MAX from limits.h.
An alternative, somewhat more questionable way would be to create the maximum of an unsigned short and then divide that by 2. We can create the maximum by taking the bitwise inversion of the value 0.
#include <stdio.h>
#include <limits.h>

int main(void)
{
    printf("%hd\n", SHRT_MAX); // best way

    unsigned short ushort_max = ~0u;
    short short_max = ushort_max / 2;
    printf("%hd\n", short_max);

    return 0;
}
One note about your code:
Casts such as ((short int)2 * i) > (short int)0 are completely superfluous. Most binary operators in C, such as * and >, apply something called "the usual arithmetic conversions", which is a way to implicitly convert and balance the types of an expression. These implicit conversion rules will silently make both operands type int despite your casts.
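You can watch the promotion happen with sizeof, since the size of an expression follows its type. A small sketch (the printed sizes assume a typical platform with a 2-byte short and a 4-byte int):

#include <stdio.h>

int main(void)
{
    short a = 2, b = 3;

    // Both operands are promoted to int before the multiplication,
    // so a * b has type int even though a and b are short.
    printf("%zu\n", sizeof(a * b));               // typically 4
    // Casting the operands to short does not help: the usual
    // arithmetic conversions promote them right back to int.
    printf("%zu\n", sizeof((short)a * (short)b)); // typically 4 again
    return 0;
}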
You forgot to cast to short int during comparison
OK, here I assume that the computer handles integer overflow by wrapping around into negative integers, as I believe you assumed when writing this program.
Code that outputs 32767:
#include <stdlib.h>
#include <stdio.h>
#include <malloc.h>

short int max_short(void)
{
    short int i = 1, j = 0, k = 0;
    while (i > k)
    {
        k = i;
        if (((short int)(2 * i)) > (short int)0)
            i *= 2;
        else
        {
            j = i;
            while ((short int)(i + j) <= (short int)0)
                j /= 2;
            i += j;
        }
    }
    return i;
}

int main() {
    printf("%d", max_short());
    while (1);
}
I added 2 casts.
I have the following error when compiling a C file:
ft_memmove.c: In function ‘ft_memmove’:
ft_memmove.c:19: warning: comparison of unsigned expression >= 0 is always true
Here's the full code, via cat ft_memmove.c:
#include "libft.h"
#include <string.h>
void *ft_memmove(void *s1, const void *s2, size_t n)
{
char *s1c;
char *s2c;
size_t i;
if (!s1 || !s2 || !n)
{
return s1;
}
i = 0;
s1c = (char *) s1;
s2c = (char *) s2;
if (s1c > s2c)
{
while (n - i >= 0) // this triggers the error
{
s1c[n - i] = s2c[n - i];
++i;
}
}
else
{
while (i < n)
{
s1c[i] = s2c[i];
++i;
}
}
return s1;
}
I do understand that size_t is unsigned and that both integers will be >= 0 because of that. But since I'm subtracting one from the other, I don't get it. Why does this error come up?
If you subtract two unsigned integers in C, the result will be interpreted as unsigned; it isn't automatically treated as signed just because you subtracted. One way to fix that is to use n >= i instead of n - i >= 0.
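Note that even with the n >= i condition, the first iteration indexes s1c[n] and s2c[n], one element past the end of the buffers. A sketch of a backward copy that avoids both the warning and the out-of-bounds access, keeping the question's variable names:

if (s1c > s2c)
{
    i = n;
    while (i > 0)   // i runs from n down to 1, so the indices go n-1 .. 0
    {
        --i;
        s1c[i] = s2c[i];
    }
}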
Consider this loop:
for (unsigned int i = 5; i >= 0; i--)
{
}
This loop will be infinite, because when i would become -1 it is instead interpreted as a very large positive value; there is no sign bit in an unsigned int.
This is the reason a warning is generated here.
According to section 6.3.1.8 Usual arithmetic conversions of the draft C99 standard, since both operands have the same type, the result will also be size_t. The section states:
[...]Unless explicitly stated otherwise, the common real type is also the corresponding real type of the result[...]
and later on says:
If both operands have the same type, then no further conversion is needed.
Mathematically, you can just move the i over to the other side of the expression, like so:
n >= i
Arithmetic on unsigned operands yields an unsigned result, and that's why you are getting this warning. Better to change n - i >= 0 to n >= i.
Operations with unsigned operands are performed in the domain of the unsigned type. Unsigned arithmetic follows the rules of modular arithmetic. This means that the result will never be negative, even if you are subtracting something from something. For example, 1u - 5u does not produce -4. It produces UINT_MAX - 3, which is a huge positive value congruent to -4 modulo UINT_MAX + 1.
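A quick way to see this for yourself (the exact number printed assumes a 32-bit unsigned int):

#include <stdio.h>

int main(void)
{
    unsigned int a = 1u, b = 5u;

    // Unsigned subtraction wraps modulo UINT_MAX + 1, so 1u - 5u
    // yields UINT_MAX - 3 instead of -4.
    printf("%u\n", a - b);   // 4294967292 with a 32-bit unsigned int
    return 0;
}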
What is wrong with this code? Can anyone explain?
#include <stdio.h>
#include <malloc.h>

#define TOTAL_ELEMENTS (sizeof(array) / sizeof(array[0]))

int array[] = {23, 34, 12, 17, 204, 99, 16};

int main()
{
    int num;
    int d;
    int size = TOTAL_ELEMENTS - 2;

    printf("%d\n", (TOTAL_ELEMENTS - 2));
    for (d = -1; d <= (TOTAL_ELEMENTS - 2); d++)
        printf("%d\n", array[d + 1]);
    return 0;
}
When I print TOTAL_ELEMENTS-2 it gives 5, but what is happening inside the for loop?
The sizeof operator returns a value of type size_t, which is an unsigned value. In your for loop condition test:
d <= (TOTAL_ELEMENTS-2)
you are comparing a signed value (d) with an unsigned value (TOTAL_ELEMENTS-2). This is usually a warning condition, and you should turn up the warning level on your compiler so you'll properly get a warning message.
The compiler can only generate code for either a signed or an unsigned comparison, and in this case the comparison is unsigned. The integer value in d is converted to an unsigned value, which on a 2's complement architecture ends up being 0xFFFFFFFF or similar. This is not less than your TOTAL_ELEMENTS-2 value, so the comparison is false.
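One way to keep the loop's structure and still get a signed comparison is to cast the unsigned size expression to int, which is safe here because the element count easily fits in an int. A sketch:

// The cast makes both operands int, so d = -1 compares as expected
// and the loop prints all seven elements.
for (d = -1; d <= (int)(TOTAL_ELEMENTS - 2); d++)
    printf("%d\n", array[d + 1]);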
You're starting the loop by setting d = -1 and then indexing with d + 1, which is needlessly confusing; start at d = 0 instead.
If you do that, you can change your printf to be
printf("%d\n",array[d]);
As you've also marked this as homework, I'd advise you to also take a look at your loop's terminating condition.
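Spelled out, the cleaned-up loop this answer is hinting at might look like this sketch, with the condition cast to int so the comparison is signed:

// Start at 0, index directly with d, and stop after the last element.
for (d = 0; d < (int)TOTAL_ELEMENTS; d++)
    printf("%d\n", array[d]);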