This question already has answers here:
for every int x: x+1 > x .... is this always true?
(4 answers)
Closed 9 years ago.
for (i = 0; i <= N; ++i) { ... }
This particular statement will cause an infinite loop if N is INT_MAX.
Knowing that unsigned overflow wraps, whereas signed overflow is undefined, the compiler can assume that the loop will iterate exactly N+1 times when i and N are signed ints, because the behaviour of i on overflow is undefined.
The thing to note here: if I instead write the loop as
for (i = 0; i < N; ++i) { ... }
Will this still be undefined behaviour?
Why is INT_MAX + 1 not guaranteed to be equal to INT_MIN for signed integers?
INT_MAX + 1
this operation invokes undefined behavior. Signed integer overflow is undefined behavior in C.
It can result in INT_MIN, or the implementation can consider this expression to be positive, or the program can crash. Do not let a portable program compute this expression.
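A minimal sketch (with a hypothetical helper name) of checking before the increment instead of computing INT_MAX + 1:
#include <limits.h>

/* hypothetical helper: returns x + 1 when that cannot overflow, otherwise saturates */
int increment_saturating(int x)
{
    if (x < INT_MAX)
        return x + 1;   /* safe: x + 1 cannot exceed INT_MAX here */
    return INT_MAX;     /* pick whatever policy fits the program */
}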
Why is INT_MAX + 1 not guaranteed to be equal to INT_MIN for signed integers?
First, the behaviour on integer overflow is undefined by the C standard.
Most implementations seem to let the number just overflow silently, so let's assume that is the case.
Second, the C standard does not assume two's complement integers. Most platforms use it, especially newer ones. There are (were) older platforms that use other integer representations, for instance one's complement, which behaves differently on overflow (one's complement even has a negative zero).
Relying on undefined behaviour to work in any particular way is really bad programming practice as it makes the program so much less portable. Even OS or compiler upgrades may change undefined behaviour, so it might not even be portable between different versions of the same OS.
Related
I need help understanding something about overflow with signed integers.
I have read in this post, Wrap around explanation for signed and unsigned variables in C?, that the C language (or at least some C compilers) has something called "undefined behaviour" as a result of overflow with signed integers.
In that post, people said "the GCC compiler assumes that overflow for signed integers never occurs, so that the compiler can optimize"; other people said "you can't rely on wraparound when working with signed integers".
I have used Dev-C++, but I wasn't sure whether that IDE works with GCC, so I installed Code::Blocks, where I'm sure it does (at least in my configuration). I then overflowed a signed integer variable to experiment with what people said, and I found that when it overflows, the IDE shows no error or warning and the signed integer shows wraparound behaviour. So, can you help me clarify this situation?
I would also like help with the concept of "strict overflow" and the -Wstrict-overflow option.
… the signed integer shows a wrap around behaviour…
Here is an example where GCC and Clang do not show wraparound behavior:
#include <limits.h>
#include <stdio.h>
void foo(int x)
{
    if (x - INT_MAX <= 0)
        printf("True.\n");
    else
        printf("False.\n");
}
If x - INT_MAX wrapped around, and this routine were called with −2 for x, then x - INT_MAX would wrap around to INT_MAX. (For example, if INT_MAX is 2^31−1, then −2 − (2^31−1) = −2^31−1, and wrapping it modulo 2^32 gives −2^31−1 + 2^32 = 2^31−1. Then x - INT_MAX would be positive, so x - INT_MAX <= 0 would be false.) So the routine could print “False.” some of the times it is called.
However, when we compile it with GCC and -O3, we see the compiler has optimized it to code that only prints “True.” This shows the compiler is not assuming that arithmetic wraps.
The compiler, or its writers, can reason:
If x - INT_MAX does not overflow, then it must give a result less than or equal to zero, because there is no int value for x that is greater than INT_MAX. In this case, we must execute printf("True.\n");.
If x - INT_MAX does overflow, then the behavior is not defined by the C standard. In this case, we can execute any code we desire, and it is easier for optimization to execute the same code as the other case, printf("True.\n");.
This is equivalent to reasoning:
x - INT_MAX does not overflow. Therefore, it is less than or equal to zero, so x - INT_MAX <= 0 is always true, and printf("True.\n"); is always executed. So we can discard the else case.
GCC and Clang have a switch, -fwrapv, which extends the C standard by defining addition, subtraction, and multiplication of signed integers to wrap. When we compile with this switch, the above reasoning no longer applies: it is possible for x - INT_MAX <= 0 to be false, and so the compiler generates both code paths.
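For example, a minimal sketch (assuming GCC or Clang): compiled with -fwrapv the addition below is defined to wrap; without that switch it is undefined behaviour:
#include <limits.h>
#include <stdio.h>

int main(void)
{
    int x = INT_MAX;
    /* with -fwrapv this is defined and prints INT_MIN;
       without it the addition is undefined behaviour */
    printf("%d\n", x + 1);
    return 0;
}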
Here is the program, using a while loop:
#include <stdio.h>

int main()
{
    int i;
    i = 1;
    while (i <= 32767)
    {
        printf("%d", i);
        i = i + 1;
    }
}
Do you think the loop would execute indefinitely?
Well, i is a signed integer. If int is 16 bits wide, it will overflow precisely at the point where its value is INT_MAX, i.e. 32767. At that point the behavior is undefined.
It is undefined behavior when int is 16 bits wide. As the behavior is undefined, we can't say whether it will run forever in that case.
If int on your system is 32 bits or wider, then the behavior of this program is not undefined.
From the standard:
.... Their implementation-defined values shall be equal or
greater in magnitude (absolute value) to those shown, with the same
sign.
In your case, if sizeof(int) is 4 or higher, the loop will stop. The only way to know whether the behavior is undefined is to know the size of int.
To summarise:
If int is 32 bits or wider, then this will stop.
If int is 16 bits, then this is undefined behavior. It may loop indefinitely or it may not; the standard does not define it. A version that terminates on either width is sketched below.
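A minimal sketch of such a version, which tests before incrementing so that i never overflows regardless of the width of int:
#include <stdio.h>

int main()
{
    int i = 1;
    for (;;)
    {
        printf("%d", i);
        if (i >= 32767)   /* test before the increment, so i never exceeds 32767 */
            break;
        i = i + 1;
    }
    return 0;
}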
If you disregard the specific numbers and instead write
while (i <= INT_MAX)
the compiler sees this as "loop while i is less than or equal to the largest value it can ever have".
As i - by definition - can never be larger than the largest value, this condition will always be true and the loop would be infinite.
However, as the code tries to compute i + 1 even when i cannot possibly become any larger, there is an error in the program. The language standard explicitly states that if the program tries this - overflow on a signed variable - the result is undefined.
Undefined behavior can have any result according to the language standard. This includes getting some other value for i (perhaps a negative one despite trying to add 1), having the OS trap and terminate the program, or possibly even terminate a loop that would otherwise be infinite. We just don't know.
I don't know your book. And in fact the book is correct if you assume that the type int is a 16 bit signed integer. The range of a 16 bit integer goes from -32768 to +32767. So in this case the condition i<=32767 will always be true.
But in your program I think int is a 32-bit integer, whose range goes from -2147483648 to +2147483647.
If you replace int i with short i, the loop should be an infinite loop.
Consider the following program (C99):
#include <stdio.h>
#include <stdlib.h>
#include <inttypes.h>

int main(void)
{
    printf("Enter int in range %jd .. %jd:\n > ", INTMAX_MIN, INTMAX_MAX);
    intmax_t i;
    if (scanf("%jd", &i) == 1)
        printf("Result: |%jd| = %jd\n", i, imaxabs(i));
}
Now as I understand it, this contains easily triggerable undefined behaviour, like this:
Enter int in range -9223372036854775808 .. 9223372036854775807:
> -9223372036854775808
Result: |-9223372036854775808| = -9223372036854775808
Questions:
Is this really undefined behaviour, as in "the code is allowed to trigger any code path, whichever code strikes the compiler's fancy", when the user enters the bad number? Or is it some other flavor of not-completely-defined?
How would a pedantic programmer go about guarding against this, without making any assumptions not guaranteed by standard?
(There are a few related questions, but I didn't find one which answers question 2 above, so if you suggest duplicate, please make sure it answers that.)
If the result of imaxabs cannot be represented, which can happen when two's complement is used, then the behavior is undefined.
7.8.2.1 The imaxabs function
The imaxabs function computes the absolute value of an integer j. If the result cannot
be represented, the behavior is undefined. 221)
221) The absolute value of the most negative number cannot be represented in two’s complement.
The check that makes no assumptions and is always defined is:
intmax_t i = ... ;

if (i < -INTMAX_MAX)
{
    // handle error
}
(This if statement cannot be taken when using one's complement or sign-magnitude representation, so the compiler might give an unreachable-code warning. The code itself is still defined and valid.)
How would a pedantic programmer go about guarding against this, without making any assumptions not guaranteed by standard?
One method is to use unsigned integers. The overflow behaviour of unsigned integers is well defined, as is the behaviour when converting from a signed to an unsigned integer.
So I think the following should be safe (it turns out it's horribly broken on some really obscure systems; see later in the post for an improved version):
uintmax_t j = i;
if (j > (uintmax_t)INTMAX_MAX) {
    j = -j;
}
printf("Result: |%jd| = %ju\n", i, j);
So how does this work?
uintmax_t j = i;
This converts the signed integer into an unsigned one. If it's positive, the value stays the same; if it's negative, the value increases by 2^n (where n is the number of bits), which turns it into a large number (larger than INTMAX_MAX).
if (j > (uintmax_t)INTMAX_MAX) {
If the original number was positive (and hence less than or equal to INTMAX_MAX) this does nothing. If the original number was negative the inside of the if block is run.
j = -j;
The number is negated. The result of the negation is clearly negative and so cannot be represented as an unsigned integer. So it is increased by 2^n.
So algebraically the result for negative i looks like
j = -(i + 2^n) + 2^n = -i
Clever, but this solution makes assumptions. It fails if INTMAX_MAX == UINTMAX_MAX, which is allowed by the C Standard.
Hmm, let's look at this (I'm reading https://busybox.net/~landley/c99-draft.html, which is apparently the last C99 draft prior to standardisation; if anything changed in the final standard, please do tell me).
When typedef names differing only in the absence or presence of the initial u are defined, they shall denote corresponding signed and unsigned types as described in 6.2.5; an implementation shall not provide a type without also providing its corresponding type.
In 6.2.5 I see
For each of the signed integer types, there is a corresponding (but different) unsigned integer type (designated with the keyword unsigned) that uses the same amount of storage (including sign information) and has the same alignment requirements.
In 6.2.6.2 I see
#1
For unsigned integer types other than unsigned char, the bits of the object representation shall be divided into two groups: value bits and padding bits (there need not be any of the latter). If there are N value bits, each bit shall represent a different power of 2 between 1 and 2^(N-1), so that objects of that type shall be capable of representing values from 0 to 2^N - 1 using a pure binary representation; this shall be known as the value representation. The values of any padding bits are unspecified.39)
#2
For signed integer types, the bits of the object representation shall be divided into three groups: value bits, padding bits, and the sign bit. There need not be any padding bits; there shall be exactly one sign bit. Each bit that is a value bit shall have the same value as the same bit in the object representation of the corresponding unsigned type (if there are M value bits in the signed type and N in the unsigned type, then M<=N). If the sign bit is zero, it shall not affect the resulting value.
So yes, it seems you are right: while the signed and unsigned types have to be the same size, it does seem to be valid for the unsigned type to have one more padding bit than the signed type.
OK, based on the analysis above revealing a flaw in my first attempt, I've written a more paranoid variant. This has two changes from my first version.
I use i < 0 rather than j > (uintmax_t)INTMAX_MAX to check for negative numbers. This means that the algorithm produces correct results for numbers greater than or equal to -INTMAX_MAX even when INTMAX_MAX == UINTMAX_MAX.
I add handling for the error case where INTMAX_MAX == UINTMAX_MAX, INTMAX_MIN == -INTMAX_MAX - 1 and i == INTMAX_MIN. This will result in j == 0 inside the if, which we can easily test for.
It can be seen from the requirements in the C standard that INTMAX_MIN cannot be smaller than -INTMAX_MAX -1 since there is only one sign bit and the number of value bits must be the same or lower than in the corresponding unsigned type. There are simply no bit patterns left to represent smaller numbers.
uintmax_t j = i;
if (i < 0) {
    j = -j;
    if (j == 0) {
        printf("your platform sucks\n");
        exit(1);
    }
}
printf("Result: |%jd| = %ju\n", i, j);
@plugwash I think 2501 is correct. For example, the value -UINTMAX_MAX becomes 1: (-UINTMAX_MAX + (UINTMAX_MAX + 1)), and is not caught by your if. – hyde
Umm,
assuming INTMAX_MAX == UINTMAX_MAX and i = -INTMAX_MAX
uintmax_t j = i;
after this command j = -INTMAX_MAX + (UINTMAX_MAX + 1) = 1
if (i < 0) {
i is less than zero so we run the commands inside the if
j = -j;
after this command j = -1 + (UINTMAX_MAX + 1) = UINTMAX_MAX
which is the correct answer, so no need to trap it in an error case.
On two's complement systems, getting the absolute value of the most negative value is indeed undefined behavior, as the absolute value would be out of range. And it's nothing the compiler can help you with, as the UB happens at run time.
The only way to protect against that is to compare the input against the most negative value for the type (INTMAX_MIN in the code you show).
So calculating the absolute value of an integer invokes undefined behaviour in a single case. And while the undefined behaviour can be avoided, it is impossible to give the correct result in that one case.
Now consider multiplication of an integer by 3: Here we have a much more serious problem. This operation invokes undefined behaviour in 2/3rds of all cases! And for two thirds of all int values x, finding an int with the value 3x is just impossible. That's a much more serious problem than the absolute value problem.
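A minimal sketch of guarding such a multiplication before it happens (the helper name is only for illustration):
#include <limits.h>
#include <stdbool.h>

/* stores 3*x in *out and returns true only when the product is representable */
bool times_three(int x, int *out)
{
    if (x > INT_MAX / 3 || x < INT_MIN / 3)
        return false;    /* 3*x would overflow */
    *out = 3 * x;
    return true;
}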
You may want to use some bit hacks:
#include <limits.h>   // for CHAR_BIT

int v;                // we want to find the absolute value of v
unsigned int r;       // the result goes here
int const mask = v >> (sizeof(int) * CHAR_BIT - 1); // all ones if v < 0, else 0 (right-shifting a negative value is implementation-defined)
r = (v + mask) ^ mask;
This works well when INT_MIN < v <= INT_MAX. In the case where v == INT_MIN, typical two's complement machines leave the INT_MIN bit pattern in r (which, read as unsigned, is the correct magnitude), but note that v + mask overflows there, so strictly speaking that case still involves undefined behavior.
You can also use bitwise operations to handle this on one's complement and sign-magnitude systems.
Reference: https://graphics.stanford.edu/~seander/bithacks.html#IntegerAbs
According to this: http://linux.die.net/man/3/imaxabs
Notes
Trying to take the absolute value of the most negative integer is not defined.
To handle the full range you could add something like this to your code
if (i != INTMAX_MIN) {
    printf("Result: |%jd| = %jd\n", i, imaxabs(i));
} else { /* Code around undefined abs(INTMAX_MIN) */
    printf("Result: |%jd| = %jd%jd\n", i, -(i/10), -(i%10));
}
edit: As abs(INTMAX_MIN) cannot be represented on a 2's complement machine, two values within the representable range are concatenated on output as a string.
Tested with gcc, though printf required %lld as %jd was not a supported format.
Is this really undefined behaviour, as in "the code is allowed to trigger any code path, whichever code strikes the compiler's fancy", when the user enters the bad number? Or is it some other flavor of not-completely-defined?
The behaviour of the program is only undefined when the bad number is successfully read and passed to imaxabs(), which on a typical 2's complement system returns a negative result, as you observed.
That is the undefined behaviour in this case; the implementation would also be allowed to terminate the program with an overflow error if the ALU set status flags.
The reason for "undefined behaviour" in C is so compiler writers don't have to guard against overflow, and programs can run more efficiently. Whilst it is within the C standard for every C program using abs() to try to kill your firstborn just because you call it with too negative a value, writing such code into the object file would simply be perverse.
The real problem with these undefined behaviours is that an optimising compiler can reason away naive checks, as in code like:
r = (i < 0) ? -i : i;
if (r < 0) { // This code may be pointless
    // Do overflow recovery
    doRecoveryProcessing();
} else {
    printf("%jd", r);
}
As the compiler's optimiser can reason that negative values are negated, it could in principle determine that (r < 0) is always false, so the attempt to trap the problem fails.
How would a pedantic programmer go about guarding against this, without making any assumptions not guaranteed by standard?
By far the best way, is simply to ensure that the program works on a valid range, so in this case validating the input suffices (disallow INTMAX_MIN).
Programs printing tables of abs() ought to avoid INT*_MIN and so on.
if (i != INTMAX_MIN) {
    printf("Result: |%jd| = %jd\n", i, imaxabs(i));
} else { /* Code around undefined abs(INTMAX_MIN) */
    printf("Result: |%jd| = %jd%jd\n", i, -(i/10), -(i%10));
}
It appears to write out abs(INTMAX_MIN) by fakery, allowing the program to live up to its promise to the user.
This question already has answers here:
Why is unsigned integer overflow defined behavior but signed integer overflow isn't?
(6 answers)
Closed 7 years ago.
A C programming book that I'm reading (C Programming: A Modern Approach, 2nd edition) says that when an "overflow occurs during an operation on unsigned integers, though, the result is defined."
Here is a small code example
#include <stdio.h>

int main()
{
    unsigned short int x = 65535; // the maximum value when unsigned short is 16 bits wide
    x += 1;                       // wraps around to zero (unsigned overflow is defined)
    printf("%u", x);              // prints 0, or 1 if we add one to x again
    return 0;
}
He then goes on to say that "for signed integers, the behaviors for these integers are not defined", meaning the program can either print an incorrect result or crash.
Why is this so?
It comes down to hardware representation, and there being more than one way to represent signed integral types in binary (sign-magnitude, one's complement, two's complement) and operations on them. Those have quite different implications when an overflow occurs (e.g. triggering a hardware trap, wrapping modulo, etc.).
All of the obvious means of representing unsigned integral values in binary, and of implementing numerical operations on such values, have the same consequence: essentially that numeric operations in hardware work with modulo arithmetic.
For basic types (and other things) the standard generally allows freedom to compiler vendors when there is more than one feasible way of implementing something, and those options have different consequences. There are multiple ways with signed integral types, and real-world hardware that uses each approach. They are different enough to warrant the behaviour being undefined (as that term is defined in the standard).
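As a minimal illustration of that guaranteed modulo behaviour for unsigned types (signed types get no such guarantee):
#include <limits.h>
#include <stdio.h>

int main(void)
{
    unsigned int u = UINT_MAX;
    u += 1;             /* guaranteed to wrap to 0 (arithmetic is modulo UINT_MAX + 1) */
    printf("%u\n", u);  /* prints 0 on every conforming implementation */
    return 0;
}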
This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
Best way to detect integer overflow in C/C++
This is probably a rookie question, but how can I check whether overflow affected the value of my numbers in C? For example, when multiplying integers and expecting an integer result, if the actual result is bigger than the maximum integer value, the stored result is altered (right?). So how can I tell if something like this occurred?
Signed integer overflow is like division by zero - it leads to undefined behaviour, so you have to check if it would occur before executing the potentially-overflowing operation. Once you've overflowed, all bets are off - your code could do anything.
The *_MAX and *_MIN macros defined in <limits.h> come in handy for this, but you need to be careful not to invoke undefined behaviour in the tests themselves. For example, to check if a * b will overflow given int a, b;, you can use:
if ((b > 0 && a <= INT_MAX / b && a >= INT_MIN / b) ||
    (b == 0) ||
    (b == -1 && a >= -INT_MAX) ||
    (b < -1 && a >= INT_MAX / b && a <= INT_MIN / b))
{
    result = a * b;
}
else
{
    /* calculation would overflow */
}
(Note that one subtle pitfall this avoids is that you can't calculate INT_MIN / -1 - such a number isn't guaranteed to be representable and indeed causes a fatal trap on common platforms).
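The same check-before-the-operation style works for addition (with the same int a, b; and result as above); a minimal sketch:
if ((b > 0 && a > INT_MAX - b) ||
    (b < 0 && a < INT_MIN - b))
{
    /* calculation would overflow */
}
else
{
    result = a + b;
}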
The C99 standard has this section explaining what undefined behavior is:
3.4.3
undefined behavior
behavior, upon use of a nonportable or erroneous program construct or of erroneous data,
for which this International Standard imposes no requirements
NOTE Possible undefined behavior ranges from ignoring the situation completely with unpredictable
results, to behaving during translation or program execution in a documented manner characteristic of the
environment (with or without the issuance of a diagnostic message), to terminating a translation or
execution (with the issuance of a diagnostic message).
EXAMPLE
An example of undefined behavior is the behavior on integer overflow.
So you're pretty much out of luck; there is no portable way of detecting that in the general case, after the fact.
Your compiler/implementation might have extensions/support for it though, and there are techniques to avoid these situations.
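For example, GCC and Clang offer __builtin_add_overflow and __builtin_mul_overflow (compiler extensions, not standard C), which compute the result and report whether it overflowed; a minimal sketch:
#include <stdio.h>

int main(void)
{
    int a = 1000000, b = 1000000, result;
    if (__builtin_mul_overflow(a, b, &result))   /* GCC/Clang extension */
        printf("a * b would overflow int\n");
    else
        printf("a * b = %d\n", result);
    return 0;
}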
See this question for excellent advice: Best way to detect integer overflow in C/C++.
If you mean while you're programming, you can debug the code.
If you mean at run time, you can add conditionals so that if a value would exceed the limit, you do something about it.
C doesn't define what happens when a calculation's result would be out of range. You must avoid this by testing the operands beforehand.
Check this: http://www.fefe.de/intof.html. It shows you how to check whether the actual result would be bigger than the maximum integer value.
If the resulting number is smaller than one of the inputs, there was an overflow:
a + b = c; if c < a => overflow.
edit: too fast, this only works for addition on unsigned integers.
You cannot know, in the general case, whether overflow occurred just by staring at the result. What you can do, however, is check separately whether the operation would overflow. E.g. if you want to check whether a*b overflows, where a and b are ints, you need to check the inequality
a * b <= INT_MAX
That is, if a > INT_MAX / b (for positive a and b), then the multiplication would overflow.
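A minimal sketch of that test, with made-up example values and assuming a >= 0 and b > 0:
#include <limits.h>
#include <stdio.h>

int main(void)
{
    int a = 100000, b = 50000;          /* example values; a >= 0 and b > 0 */
    if (a > INT_MAX / b)
        printf("a * b would overflow\n");
    else
        printf("a * b = %d\n", a * b);  /* safe: the product fits in an int */
    return 0;
}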
As long as you do your arithmetic in unsigned integers, or else can rely on implementation-specific guarantees about how signed integer overflow behaves, there are various tricks you can use.
In the case of unsigned multiplication, the simplest is:
unsigned int lhs = something, rhs = something_else;
unsigned int product = lhs * rhs;
if (lhs != 0 && product/lhs != rhs) { /* overflow occurred */ }
It's unlikely to be fast, but it's portable. The unsigned overflow check for addition is also quite simple -- pick either one of the operands, then overflow occurred if and only if the sum is less than that.
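A minimal sketch of that addition check (with made-up values in place of something and something_else):
unsigned int lhs = 4000000000u, rhs = 1000000000u;
unsigned int sum = lhs + rhs;   /* wraps modulo UINT_MAX + 1 if the true sum doesn't fit */
if (sum < lhs) { /* overflow occurred */ }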