I need help understanding something about overflow with signed integers.
I have read in this post Wrap around explanation for signed and unsigned variables in C? that C language (or at least some C compilers) have something called "Undefined behaviour" as a result of overflow with signed integers.
In this post, people said "The GCC compiler assumes that overflow for signed integers never occur so that the compiler can optimize"; other people said "You can't rely on wraparound at the time when working with signed integers".
I have used Dev cpp, but I'm not sure if this IDE works with GCC so I installed Code Blocks and now I'm sure it works with GCC (at least in my config), and I overflowed a signed integer variable to experiment with the things the people said, but I found that when it overflows, the IDE doesn't show an error or a warning and the signed integer shows wraparound behaviour. So, can you help me to clarify this situation?
Also I want to ask you for help about the concept "Strict overflow" and the "option" -Wstrict-overflow.
… the signed integer shows a wrap around behaviour…
Here is an example where GCC and Clang do not show wraparound behavior:
#include <limits.h>
#include <stdio.h>
void foo(int x)
{
if (x - INT_MAX <= 0)
printf("True.\n");
else
printf("False.\n");
}
If x - INT_MAX wrapped around, and this routine were called with −2 for x, then x - INT_MAX would wrap around to INT_MAX. (For example, if INT_MAX is 231−1, then −2 − (231−1) = −231−1, and wrapping it modulo 232 gives −231−1 + 232 = 231−1. Then x - INT_MAX would be positive, so x - INT_MAX <= 0 would be false.) So the routine could print “False.” some of the times it is called.
However, when we compile it with GCC and -O3, we see the compiler has optimized it to code that only prints “True.” This shows the compiler is not assuming that arithmetic wraps.
The compiler, or its writers, can reason:
If x - INT_MAX does not overflow, then it must give a result less than or equal to zero, because there is no int value for x that is greater than INT_MAX. In this case, we must execute printf("True.\n");.
If x - INT_MAX does overflow, then the behavior is not defined by the C standard. In this case, we can execute any code we desire, and it is easier for optimization to execute the same code as the other case, printf("True.\n");.
This is equivalent to reasoning:
x - INT_MAX does not overflow. Therefore, it is less than or equal to zero, so x - INT_MAX <= 0 is always true, and printf("True.\n"); is always executed. So we can discard the else case.
GCC and Clang have a switch -fwrapv, that extends the C standard by defining addition, subtraction, and multiplication of signed integers to wrap. When we compile with this switch, we can see the above reasoning no longer applies. It is possible for x - INT_MAX <= 0 to be false, and so the compiler generates both code paths.
Related
I am solving one of the lab exercises from the CS:APP course as a self-study.
In the CS:APP course the maximum positive number, that can be represented with 4 bytes in two's complement, is marked as Tmax (which is equal to the 0x7fffffff).
Likewise, the most negative number is marked as Tmin (which is equal to the 0x80000000).
The goal of the exercise was to implement a isTmax() function which should return 1, when given an Tmax, otherwise it should return 0. This should be done only with a restricted set of operators which are: ! ~ & ^ | +, the maximum number of operators is 10.
Below you can see my implementation of isTmax() function, with comments explaining how it should work.
#include <stdio.h>
int isTmax(int x)
{
/* Ok, lets assume that x really is tMax.
* This means that if we add 1 to it we get tMin, lets call it
* possible_tmin. We can produce an actual tMin with left shift.
* We can now xor both tmins, lets call the result check.
* If inputs to xor are identical then the check will be equal to
* 0x00000000, if they are not identical then the result will be some
* value different from 0x00000000.
* As a final step we logicaly negate check to get the requested behaviour.
* */
int possible_tmin = x + 1;
int tmin = 1 << 31;
int check = possible_tmin ^ tmin;
int negated_check = !check;
printf("input =\t\t 0x%08x\n", x);
printf("possible_tmin =\t 0x%08x\n", possible_tmin);
printf("tmin =\t\t 0x%08x\n", tmin);
printf("check =\t\t 0x%08x\n", check);
printf("negated_check =\t 0x%08x\n", negated_check);
return negated_check;
}
int main()
{
printf("output: %i", isTmax(0x7fffffff));
return 0;
}
The problem that I am facing is that I get different output whether I set an optimization flag when compiling the program. I am using gcc 11.1.0.
With no optimizations I get this output, which is correct for the given input:
$ gcc main.c -lm -m32 -Wall && ./a.out
input = 0x7fffffff
possible_tmin = 0x80000000
tmin = 0x80000000
check = 0x00000000
negated_check = 0x00000001
output: 1
With optimization enabled I get this output, which is incorrect.
gcc main.c -lm -m32 -Wall -O1 && ./a.out
input = 0x7fffffff
possible_tmin = 0x80000000
tmin = 0x80000000
check = 0x00000000
negated_check = 0x00000000
output: 0
For some reason the logical negation is not applied to the check variable when optimization is enabled.
The problem persists with any other level of optimization (-O2, -O3, -Os).
Even if I write the expressions as a one-liner return !((x + 1) ^ (1 << 31)); nothing changes.
I can "force" a correct behavior If I declare check as a volatile.
I am using the same level of optimization as is used by the automated checker that came with the exercise, If I turn it off my code passes all checks.
Can anyone shed on some light why is this happening? Why doesn't the logical negation happen?
EDIT: I have added a section with the extra guidelines and restrictions connected to the exercise that I forgot to include with the original post. Specifically, I am not allowed to use any other data type instead of int. I am not sure if that also includes literal suffix U.
Replace the "return" statement in each function with one
or more lines of C code that implements the function. Your code
must conform to the following style:
int Funct(arg1, arg2, ...) {
/* brief description of how your implementation works */
int var1 = Expr1;
...
int varM = ExprM;
varJ = ExprJ;
...
varN = ExprN;
return ExprR;
}
Each "Expr" is an expression using ONLY the following:
1. Integer constants 0 through 255 (0xFF), inclusive. You are
not allowed to use big constants such as 0xffffffff.
2. Function arguments and local variables (no global variables).
3. Unary integer operations ! ~
4. Binary integer operations & ^ | + << >>
Some of the problems restrict the set of allowed operators even further.
Each "Expr" may consist of multiple operators. You are not restricted to
one operator per line.
You are expressly forbidden to:
1. Use any control constructs such as if, do, while, for, switch, etc.
2. Define or use any macros.
3. Define any additional functions in this file.
4. Call any functions.
5. Use any other operations, such as &&, ||, -, or ?:
6. Use any form of casting.
7. Use any data type other than int. This implies that you
cannot use arrays, structs, or unions.
You may assume that your machine:
1. Uses 2s complement, 32-bit representations of integers.
2. Performs right shifts arithmetically.
3. Has unpredictable behavior when shifting an integer by more
than the word size.
The specific cause is most likely in 1 << 31. Nominally, this would produce 231, but 231 is not representable in a 32-bit int. In C 2018 6.5.7 4, where the C standard specifies the behavior of <<, it says the behavior in this case is not defined.
When optimization is disabled, the compiler may generate a processor instruction that gives 1 left 31 bits. This produces the bit pattern 0x80000000, and subsequent instructions interpret that as −231.
In contrast, with optimization enabled, the optimization software recognizes that 1 << 31 is not defined and does not generate a shift instruction for it. It may replace it with a compile-time value. Since the behavior is not defined by the C standard, the compiler is allowed to use any value for that. It might use zero, for example. (Since the entire behavior is not defined, not just the result, the compiler is actually allowed toreplace this part of your program with anything. It could use entirely different instructions or just abort.)
You can start to fix that by using 1u << 31. That is defined because 231 fits in the unsigned int type. However, there is a problem when assigning that to tmin, because tmin is an int, and the value still does not fit in an int. However, for this conversion, the behavior is implementation-defined, not undefined. Common C implementations define the conversion to wrap modulo 232, which means that the assignment will store −231 in tmin. However, an alternative is to change tmin from int to unsigned int (which may also be written just as unsigned) and then work with unsigned integers. That will give fully defined behavior, rather than undefined or implementation-defined, except for assuming the int width is 32 bits.
Another problem is x + 1. When x is INT_MAX, that overflows. That is likely not the cause of the behavior you observe, as common compilers simply wrap the result. Nonetheless, it can be corrected similarly, by using x + 1u and changing the type of possible_tmin to unsigned.
That said, the desired result can be computed with return ! (x ^ ~0u >> 1);. This takes zero as an unsigned int, complements it to produce all 1 bits, and shifts it right one bit, which gives a single 0 bit followed by all 1 bits. That is the INT_MAX value, and it works regardless of the width of int. Then this is XORed with x. The result of that has all zero bits if and only if x is also INT_MAX. Then ! either changes that zero into 1 or changes a non-zero value into 0.
Change the type of the variables from int to unsigned int (or just unsigned) because bitwise operations with signed values cause undefined behavior.
#Voo made a correct observation, x+1 created an undefined behavior, which was not apparent at first as the printf calls did not show anything weird happening.
In reference to C11 draft, section 3.4.3 and C11 draft, section H.2.2, I'm looking for "C" implementations that implement behaviour other than modulo arithmetic for signed integers.
Specifically, I am looking for instances where this is the default behaviour, possibly due to the underlying machine architecture.
Here's a code sample and terminal session that illustrates modulo arithmetic behaviour for signed integers:
overflow.c:
#include <stdio.h>
#include <limits.h>
int main(int argc, char *argv[])
{
int a, b;
printf ( "INT_MAX = %d\n", INT_MAX );
if ( argc == 2 && sscanf(argv[1], "%d,%d", &a, &b) == 2 ) {
int c = a + b;
printf ( "%d + %d = %d\n", a, b, c );
}
return 0;
}
Terminal session:
$ ./overflow 2000000000,2000000000
INT_MAX = 2147483647
2000000000 + 2000000000 = -294967296
Even with a "familiar" compiler like gcc, on a "familiar" platform like x86, signed integer overflow can do something other than the "obvious" twos-complement wraparound behavior.
One amusing (or possibly horrifying) example is the following (see on godbolt):
#include <stdio.h>
int main(void) {
for (int i = 0; i >= 0; i += 1000000000) {
printf("%d\n", i);
}
printf("done\n");
return 0;
}
Naively, you would expect this to output
0
1000000000
2000000000
done
And with gcc -O0 you would be right. But with gcc -O2 you get
0
1000000000
2000000000
-1294967296
-294967296
705032704
...
continuing indefinitely. The arithmetic is twos-complement wraparound, all right, but something seems to have gone wrong with the comparison in the loop condition.
In fact, if you look at the assembly output, you'll see that gcc has omitted the comparison entirely, and made the loop unconditionally infinite. It is able to deduce that if there were no overflow, the loop could never terminate, and since signed integer overflow is undefined behavior, it is free to have the loop not terminate in that case either. The simplest and "most efficient" legal code is therefore to never terminate at all, since that avoids an "unnecessary" comparison and conditional jump.
You might consider this either cool or perverse, depending on your point of view.
(For extra credit: look at what icc -O2 does and try to explain it.)
On many platforms, requiring that a compiler perform precise integer-size truncation would cause many constructs to run less efficiently than would be possible if they were allowed to use looser truncation semantics. For example, given int muldiv(int x, ind y) { return x*y/60; }, a compiler that was allowed to use loose integer semantics could replace muldiv(x,240); with x<<2, but one which was required to use precise semantics would need to actually perform the multiplication and division. Such optimizations are useful, and generally won't pose problems if casting operators are used in cases where programs need mod-reduced arithmetic, and compilers process a cast to a particular size as implying truncation to that size.
Even when using unsigned values, the presence of a cast in (uint32_t)(uint32a-uint32b) > uint32c will make the programmer's intention clearer, and would be necessary to ensure that code will operate the same on systems with 64-bit int as on those with 32-bit int, so if one wants to test for integer wraparound, even on a compiler that would define the behavior, I would regard (int)(x+someUnsignedChar) < x as superior to `x+someUnsignedChar < x because the cast would let a human reader know the code was deliberately treating values as something other than normal mathematical integers.
The big problem is that some compilers are prone to generate code which behaves nonsensically in case of integer overflow. Even a construct like unsigned mul_mod_65536(unsigned short x, unsigned short y) { return (x*y) & 0xFFFFu; } which the authors of the Standard expected commonplace implementations to process as in a way indistinguishable from unsigned math, will sometimes cause gcc to generate nonsensical code in cases where x would exceed INT_MAX/y.
According to this link, performing INT_MIN / -1 division operation will result in the program to terminate in i386 CPUs. My processor is of 32-bit architecture and I use GCC compiler. I have done the following experiments to check it.
int a = INT_MIN;
int b = -1;
int c = a / b;
printf("%d\n",c);
As per the information specified in the link mentioned above this program gets terminated throwing a Floating point exception. But it wasn't the same when I tried it in a different manner.
int c = INT_MIN / -1;
printf("%d\n",c);
The compiler threw the following warning after compiling this program.
iso.c: In function ‘main’:
iso.c:6:18: warning: integer overflow in expression [-Woverflow]
int c = INT_MIN / -1;
_____________^
But I got the output -2147483648. Once again I did more two experiments.
int a = INT_MIN;
int b = -1;
printf("%d\n",a / b);
This was a floating point exception.
printf("%d\n",INT_MIN / -1);
This threw the following compiler warning.
iso.c: In function ‘main’:
iso.c:6:24: warning: integer overflow in expression [-Woverflow]
printf("%d\n",INT_MIN / -1);
__________________^
And the output of this program was again -2147483648.
After doing all these experiments, I have noticed that result of division operation done directly on constants differs from the result of the division operation done on variables. So what exactly is making this difference?
Both results are acceptable according to standard. Draft n1256 for C99 says (emphasize mine):
6.5 Expressions...
5 If an exceptional condition occurs during the evaluation of an expression (that is, if the
result is not mathematically defined or not in the range of representable values for its
type), the behavior is undefined.
In 2's complement integer representation, INT_MIN/-1 is INT_MAX + 1 so the operation invokes Undefined Behaviour, so any result (including crash) is acceptable
As explained by #tilz0R in his comment, when the values are passed in variables, the operation is executed at run time and raises a SIGFPE signal. But when the operation only involves compile time constants, the operation is executed by the compiler at compiler time. In gcc implementation, the compiler protect itself against the error and simply uses its best represention for INT_MAX + 1. In a 32 bit 2's complement implementation, INT_MAX is 0x7fffffff, so INT_MAX + 1 is (after signed overflow) 0x80000000 or INT_MIN again (-2147483648)
Apple Secure Coding Guide says the following (page 27):
Also, any bits that overflow past the length of an integer variable (whether signed or unsigned) are dropped.
However, regards to signed integer overflow C standard (89) says:
An example of undefined behavior is the behavior on integer overflow.
and
If an exception occurs during the evaluation of an expression (that is, if the result is not mathematically defined or not representable), the behavior is undefined.
Is the Coding Guide wrong? Is there something here that I don't get? I am not convinced myself that Apple Secure Coding Guide could get this wrong.
Here is a second opinion, from a static analyzer described as detecting undefined behavior:
int x;
int main(){
x = 0x7fffffff + 1;
}
The analyzer is run so:
$ frama-c -val -machdep x86_32 t.c
And it produces:
[kernel] preprocessing with "gcc -C -E -I. t.c"
[value] Analyzing a complete application starting at main
...
t.c:4:[kernel] warning: signed overflow. assert 0x7fffffff+1 ≤ 2147483647;
...
[value] Values at end of function main:
NON TERMINATING FUNCTION
This means that the program t.c contains undefined behavior, and that no execution of it ever terminates without causing undefined behavior.
Let's take this example:
1 << 32
If we assume 32-bit int, C clearly says it is undefined behavior. Period.
But any implementation can define this undefined behavior.
gcc for example says (while not very explicit in defining the behavior):
GCC does not use the latitude given in C99 only to treat certain aspects of signed '<<' as undefined, but this is subject to change.
http://gcc.gnu.org/onlinedocs/gcc/Integers-implementation.html
I don't know for clang but I suspect that as for gcc, the evaluation of an expression like 1 << 32 would give no surprise (that is, evaluate to 0).
But even if it is defined on implementations running in Apple operating systems, a portable program should not make use of expressions that invoke undefined behavior in the C language.
EDIT: I thought the Apple sentence was dealing only with bitwise << operator. It looks like it's more general and in that case for C language, they are utterly wrong.
The two statements are not mutually incompatible.
The standard does not define what behaviour each implementation is required to provide (so different implementations can do different things and still be standard conformant).
Apple is allowed to define the behaviour of its implementation.
You as a programmer would be well advised to treat the behaviour as undefined since your code may need to be moved to other platforms where the behaviour is different, and perhaps because Apple could, in theory, change its mind in the future and still conform to the standard.
Consider the code
void test(int mode)
{
int32_t a = 0x12345678;
int32_t b = mode ? a*0x10000 : a*0x10000LL;
return b;
}
If this method is invoked with a mode value of zero, the code will compute the long long value 0x0000123456780000 and store it into a. The behavior of this is fully defined by the C standard: if bit 31 of the result is clear, it will lop off all but the bottom 32 bits and store the resulting (positive) integer into a. If bit 31 were set and the result were being stored to a 32-bit int rather than a variable of type int32_t, the implementation would have some latitude, but implementations are only allowed to define int32_t if they would perform such narrowing conversions according to the rules of two's-complement math.
If this method were invoked with a non-zero mode value, then the numerical computation would yield a result outside the range of the temporary expression value, and as such would cause Undefined Behavior. While the rules dictate what should happen if a calculation performed on a longer type is stored into a shorter one, they do not indicate what should happen if calculations don't fit in the type with which they are performed. A rather nasty gap in the standard (which should IMHO be plugged) occurs with:
uint16_t multiply(uint16_t x, uint16_t y)
{
return x*y;
}
For all combinations of x and y values where the Standard says anything about what this function should do, the Standard requires that it compute and return the product mod 65536. If the Standard were to mandate that for all combinations of x and y values 0-65535 this method must return the arithmetical value of (x*y) mod 65536, it would be mandating behavior with which 99.99% of standards-compliant compilers would already be in conformance. Unfortunately, on machines where int is 32 bits, the Standard presently imposes no requirements with regard to this function's behavior in cases where the arithmetical product would be larger than 2147483647. Even though any portion of the intermediate result beyond the bottom 16 bits will ignored, the code will try to evaluate the result using a 32-bit signed integer type; the Standard imposes no requirements on what should happen if a compiler recognizes that the product will overflow that type.
This question already has answers here:
for every int x: x+1 > x .... is this always true?
(4 answers)
Closed 9 years ago.
for (i = 0; i <= N; ++i) { ... }
This particular statement will cause an infinite loop if N is INT_MAX.
Having known that Unsigned Overflows are wrapping overflows, assuming i and N to unsigned, compiler can assume that the loop will iterate exactly N+1 times if i is undefined on overflow.
The thing to note here is: if I make the loops as,
for (i = 0; i < N; ++i) { ... }
Will this still be undefined behav?
Why INT_MAX + 1 is not surely equal to INT_MIN in case of signed integers?
INT_MAX + 1
this operation invokes undefined behavior. Signed integer overflow is undefined behavior in C.
It can result to INT_MIN or the implementation can consider this expression to be positive or the program can crash. Do not let a portable program compute this expression.
Why INT_MAX + 1 is not surely equal to INT_MIN in case of signed integers?
First, the behaviour on integer overflow is undefined by the C standard.
Most implementations seem to let the number just overflow silently, so let's assume that is the case.
Second, the C standard does not assume twos's complement integers. Most platforms use it, especially newer ones. There are (were) older platforms that use other integer representations, for instance one's complement. Overflow in one's complement results in negative zero.
Relying on undefined behaviour to work in any particular way is really bad programming practice as it makes the program so much less portable. Even OS or compiler upgrades may change undefined behaviour, so it might not even be portable between different versions of the same OS.