Possible Duplicate:
Best way to detect integer overflow in C/C++
This is probably a rookie question, but how can I check whether overflow affected the value of my numbers in C? For example, when multiplying integers and expecting an integer result, if the true result is bigger than the maximum integer value, the stored result is altered (right?). So how can I tell if something like this occurred?
Signed integer overflow is like division by zero - it leads to undefined behaviour, so you have to check if it would occur before executing the potentially-overflowing operation. Once you've overflowed, all bets are off - your code could do anything.
The *_MAX and _MIN macros defined in <limits.h> come in handy for this, but you need to be careful not to invoke undefined behaviour in the tests themselves. For example, to check if a * b will overflow given int a, b;, you can use:
if ((b > 0 && a <= INT_MAX / b && a >= INT_MIN / b) ||
    (b == 0) ||
    (b == -1 && a >= -INT_MAX) ||
    (b < -1 && a >= INT_MAX / b && a <= INT_MIN / b))
{
    result = a * b;
}
else
{
    /* calculation would overflow */
}
(Note that one subtle pitfall this avoids is that you can't calculate INT_MIN / -1 - such a number isn't guaranteed to be representable and indeed causes a fatal trap on common platforms).
The C99 standard has this section explaining what undefined behavior is:
3.4.3
undefined behavior
behavior, upon use of a nonportable or erroneous program construct or of erroneous data,
for which this International Standard imposes no requirements
NOTE Possible undefined behavior ranges from ignoring the situation completely with unpredictable
results, to behaving during translation or program execution in a documented manner characteristic of the
environment (with or without the issuance of a diagnostic message), to terminating a translation or
execution (with the issuance of a diagnostic message).
EXAMPLE
An example of undefined behavior is the behavior on integer overflow.
So you're pretty much out of luck: there is no portable way of detecting that in the general case, after the fact.
Your compiler/implementation might have extensions/support for it though, and there are techniques to avoid these situations.
See this question for excellent advice: Best way to detect integer overflow in C/C++.
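One such technique, for addition, is to test the operands against the limits in <limits.h> before performing the operation. A minimal sketch (checked_add is an illustrative name chosen here, not a standard function):

```c
#include <limits.h>
#include <stdbool.h>

/* Stores a + b in *result and returns true if the sum is representable;
   returns false (without adding) if the addition would overflow. */
bool checked_add(int a, int b, int *result)
{
    if ((b > 0 && a > INT_MAX - b) ||   /* sum would exceed INT_MAX */
        (b < 0 && a < INT_MIN - b))     /* sum would fall below INT_MIN */
        return false;
    *result = a + b;
    return true;
}
```

Note that the tests themselves only ever compute INT_MAX - b for positive b and INT_MIN - b for negative b, so they cannot overflow.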
If you mean while you're developing, you can step through the code in a debugger.
If you mean at runtime, you can add conditionals so that if a value would exceed the limit, you do something about it before the operation.
C doesn't define what happens when a signed calculation's result would be out of range, so you must head this off by testing the operands beforehand.
Check out http://www.fefe.de/intof.html. It shows how to check whether the true result would be bigger than the maximum integer value.
Check whether the resulting number is smaller than one of the inputs: for a + b = c, if c < a, overflow occurred.
edit: too fast — this only works for addition on unsigned integers.
You cannot know, in the general case, if overflow occurred just by staring at the result. What you can do, however, is to check whether the operation would overflow separately. E.g. if you want to check whether a*b overflows, where a and b are int's, you need to solve the inequality
a * b <= INT_MAX
That is, the multiplication overflows exactly when a > INT_MAX / b (for positive a and b); if a <= INT_MAX / b, it is safe.
As long as you do your arithmetic in unsigned integers, or else can rely on implementation-specific guarantees about how signed integer overflow behaves, there are various tricks you can use.
In the case of unsigned multiplication, the simplest is:
unsigned int lhs = something, rhs = something_else;
unsigned int product = lhs * rhs;
if (lhs != 0 && product / lhs != rhs) { /* overflow occurred */ }
It's unlikely to be fast, but it's portable. The unsigned overflow check for addition is also quite simple -- pick either one of the operands, then overflow occurred if and only if the sum is less than that.
Related
I need help understanding something about overflow with signed integers.
I have read in the post Wrap around explanation for signed and unsigned variables in C? that the C language (or at least some C compilers) treats signed integer overflow as "undefined behaviour".
In that post, some people said "the GCC compiler assumes that overflow of signed integers never occurs, so that the compiler can optimize"; others said "you can't rely on wraparound when working with signed integers".
I had been using Dev-C++, but I wasn't sure whether that IDE uses GCC, so I installed Code::Blocks, where I'm sure it does (at least in my configuration). I then overflowed a signed integer variable to experiment with what people said, and found that when it overflows, the IDE shows no error or warning and the signed integer exhibits wraparound behaviour. So, can you help me clarify this situation?
Also, I want to ask for help understanding the concept of "strict overflow" and the option -Wstrict-overflow.
… the signed integer shows a wrap around behaviour…
Here is an example where GCC and Clang do not show wraparound behavior:
#include <limits.h>
#include <stdio.h>
void foo(int x)
{
    if (x - INT_MAX <= 0)
        printf("True.\n");
    else
        printf("False.\n");
}
If x - INT_MAX wrapped around, and this routine were called with −2 for x, then x - INT_MAX would wrap around to INT_MAX. (For example, if INT_MAX is 2^31 − 1, then −2 − (2^31 − 1) = −2^31 − 1, and wrapping it modulo 2^32 gives −2^31 − 1 + 2^32 = 2^31 − 1. Then x - INT_MAX would be positive, so x - INT_MAX <= 0 would be false.) So the routine could print "False." some of the times it is called.
However, when we compile it with GCC and -O3, we see the compiler has optimized it to code that only prints “True.” This shows the compiler is not assuming that arithmetic wraps.
The compiler, or its writers, can reason:
If x - INT_MAX does not overflow, then it must give a result less than or equal to zero, because there is no int value for x that is greater than INT_MAX. In this case, we must execute printf("True.\n");.
If x - INT_MAX does overflow, then the behavior is not defined by the C standard. In this case, we can execute any code we desire, and it is easier for optimization to execute the same code as the other case, printf("True.\n");.
This is equivalent to reasoning:
x - INT_MAX does not overflow. Therefore, it is less than or equal to zero, so x - INT_MAX <= 0 is always true, and printf("True.\n"); is always executed. So we can discard the else case.
GCC and Clang have a switch -fwrapv, that extends the C standard by defining addition, subtraction, and multiplication of signed integers to wrap. When we compile with this switch, we can see the above reasoning no longer applies. It is possible for x - INT_MAX <= 0 to be false, and so the compiler generates both code paths.
There was this range checking function that required two signed integer parameters:
range_limit(long int lower, long int upper)
It was called with range_limit(0, controller_limit). I needed to expand the range check to also include negative numbers up to the 'controller_limit' magnitude.
I naively changed the call to
range_limit(-controller_limit, controller_limit)
Although it compiled without warnings, this did not work as I expected.
I missed that controller_limit was unsigned integer.
In C, simple integer calculations can lead to surprising results. For example these calculations
0u - 1;
or more relevant
unsigned int ui = 1;
-ui;
result in 4294967295 of type unsigned int (aka UINT_MAX, on a platform with 32-bit int). As I understand it, this is due to the integer conversion rules and the modulo arithmetic of unsigned operands; see here.
By definition, unsigned arithmetic does not overflow but rather "wraps around". This behavior is well defined, so the compiler will not issue a warning (at least not gcc) if you use these expressions when calling a function:
#include <stdio.h>
void f_l(long int li) {
    printf("%li\n", li); // outputs: 4294967295
}

int main(void)
{
    unsigned int ui = 1;
    f_l(-ui);
    return 0;
}
Try this code for yourself!
So instead of passing a negative value I passed a ridiculously high positive value to the function.
My fix was to cast from unsigned integer into int:
range_limit(-(int)controller_limit, controller_limit);
Obviously, integer modulo behavior in combination with the integer conversion rules allows for subtle mistakes that are hard to spot, especially as the compiler does not help in finding them.
As the compiler does not emit any warnings and you can come across this kind of calculation any day, I'd like to know:
If you have to deal with unsigned operands, how do you best avoid the unsigned integers modulo arithmetic pitfall?
Note:
While gcc does not provide any help in detecting unsigned modulo arithmetic (at the time of writing), clang does: the compiler flag -fsanitize=unsigned-integer-overflow enables detection of wraparound (using -Wconversion is not sufficient), however not at compile time but at runtime. Try it for yourself!
Further reading:
Seacord: Secure Coding in C and C++, Chapter 5, Integer Security
Using signed integers does not change the situation at all.
A C implementation is under no obligation to raise a run-time warning or error as a response to Undefined Behaviour. Undefined Behaviour is undefined, as it says; the C standard provides absolutely no requirements or guidance about the outcome. A particular implementation can choose any mechanism it sees fit in response to Undefined Behaviour, including explicitly defining the result. (If you rely on that explicit definition, your program is no longer portable to other compilers with different or undocumented behaviour. Perhaps you don't care.)
For example, GCC defines the result of out-of-bounds integer conversions and some bitwise operations in Implementation-defined behaviour section of its manual.
If you're worried about integer overflow (and there are lots of times you should be worried about it), it's up to you to protect yourself.
For example, instead of allowing:
unsigned_counter += 5;
to overflow, you could write:
if (unsigned_counter > UINT_MAX - 5) {
    /* Handle the error */
}
else {
    unsigned_counter += 5;
}
And you should do that in cases where integer overflow will get you into trouble. A common example, which can lead (and has led!) to buffer-overflow exploits, comes from checking whether a buffer has enough room for an addition:
if (buffer_length + added_length >= buffer_capacity) {
    /* Reallocate buffer or fail */
}
memcpy(buffer + buffer_length, add_characters, added_length);
buffer_length += added_length;
buffer[buffer_length] = 0;
If buffer_length + added_length overflows -- in either signed or unsigned arithmetic -- the necessary reallocation (or failure) won't trigger and the memcpy will overwrite memory or segfault or do something else you weren't expecting.
It's easy to fix, so it's worth getting into the habit:
if (added_length >= buffer_capacity
    || buffer_length >= buffer_capacity - added_length) {
    /* Reallocate buffer or fail */
}
memcpy(buffer + buffer_length, add_characters, added_length);
buffer_length += added_length;
buffer[buffer_length] = 0;
Another similar case where you can get into serious trouble is when you are using a loop and your increment is more than one.
This is safe:
for (i = 0; i < limit; ++i) ...
This could lead to an infinite loop:
for (i = 0; i < limit; i += 2) ...
The first one is safe -- assuming i and limit are the same type -- because i + 1 cannot overflow if i < limit. The most it can be is limit itself. But no such guarantee can be made about i + 2, since limit could be INT_MAX (or whatever is the maximum value for the integer type being used). Again, the fix is simple: compare the difference rather than the sum.
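One way to sketch that fix in code (count_step2 is an illustrative helper that just counts how many times the body runs; the difference limit - i never overflows because both operands stay on the same side of limit):

```c
#include <limits.h>

/* Count iterations of a step-2 loop over [0, limit) without risking
   overflow of the index: compare the remaining difference, not i + 2. */
long count_step2(int limit)
{
    long n = 0;
    int i = 0;
    while (i < limit) {
        ++n;                    /* loop body would go here */
        if (limit - i <= 2)     /* i + 2 would reach or pass limit,  */
            break;              /* and computing it might overflow   */
        i += 2;
    }
    return n;
}
```

This visits i = 0, 2, 4, … exactly as the naive loop would, but terminates correctly even when limit is INT_MAX.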
If you're using GCC and you don't care about full portability, you can use the GCC overflow-detection builtins to help you. They're also documented in the GCC manual.
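As a sketch of how those builtins are used (this assumes GCC 5+ or a recent Clang; __builtin_mul_overflow is a compiler extension, not portable C):

```c
#include <stdbool.h>
#include <limits.h>

/* Returns true if a * b overflowed; the wrapped result is stored in
   *res either way. No undefined behaviour is invoked. */
bool mul_overflows(int a, int b, int *res)
{
    return __builtin_mul_overflow(a, b, res);
}
```

There are matching __builtin_add_overflow and __builtin_sub_overflow builtins, and they typically compile down to a multiply plus a flags check, so they are also fast.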
This is tricking my mind a little bit, because I can't seem to find any problem with this code.
Anyways, here's the riddle:
Suppose someone injects a random value into a, b:
int foo(int a, int b)
{
    return b ? (a / b) : 0;
}
b != 0 always!
Is it possible for an integer zero division exception to occur?
I'm starting to think this is a prank, yet....
NOTE:
This question was posed at a conference; there is a possibility that the author had in mind a specific compiler/architecture where this is problematic.
No, divide by zero is not possible here.
Quoting C11, chapter §6.5.15, Conditional operator (emphasis mine):
The first operand is evaluated; there is a sequence point between its evaluation and the
evaluation of the second or third operand (whichever is evaluated). The second operand
is evaluated only if the first compares unequal to 0; the third operand is evaluated only if
the first compares equal to 0; [...]
So, in case b is 0, the expression (a / b) will not be evaluated.
That said, just a couple of notes:
The division is integer division.
If you have a wrapper that ensures b != 0, then you could cut out the whole function call and simply write someVar = a / b;
Also, I don't know of any architecture that changes the aforesaid behavior.
There is no possibility of a division by 0 in your example, yet there is another special case you should check for: dividing INT_MIN by -1 causes a division overflow, and usually a fatal exception on Intel hardware; this is surprising, yet consistent with the C Standard, which leaves the behavior of integer overflow undefined.
If you need to protect against such unwanted behavior, you must special case these operands and handle them specifically:
#include <limits.h>
int foo(int a, int b) {
    if (b == 0) return 0;
    if (a == INT_MIN && b == -1) return INT_MAX;
    return a / b;
}
Since the values of a and b can be crafted by an external source, you definitely need to worry about division overflow. It is a fun game to try to crash unsafe calculators by feeding them exactly these values.
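As a quick check of the edge cases against the guarded version described above (the function is reproduced here so the snippet is self-contained):

```c
#include <limits.h>

/* Division guarded against b == 0 and the INT_MIN / -1 overflow case. */
int foo(int a, int b)
{
    if (b == 0) return 0;
    if (a == INT_MIN && b == -1) return INT_MAX;
    return a / b;
}
```

With those two guards in place, foo(x, 0) yields 0 and foo(INT_MIN, -1) yields INT_MAX instead of trapping; every other operand pair takes the plain a / b path.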
Consider the following program (C99):
#include <stdio.h>
#include <stdlib.h>
#include <inttypes.h>
int main(void)
{
    printf("Enter int in range %jd .. %jd:\n > ", INTMAX_MIN, INTMAX_MAX);
    intmax_t i;
    if (scanf("%jd", &i) == 1)
        printf("Result: |%jd| = %jd\n", i, imaxabs(i));
}
Now as I understand it, this contains easily triggerable undefined behaviour, like this:
Enter int in range -9223372036854775808 .. 9223372036854775807:
> -9223372036854775808
Result: |-9223372036854775808| = -9223372036854775808
Questions:
Is this really undefined behaviour, as in "the code is allowed to take any code path that strikes the compiler's fancy", when the user enters the bad number? Or is it some other flavor of not-completely-defined?
How would a pedantic programmer go about guarding against this, without making any assumptions not guaranteed by standard?
(There are a few related questions, but I didn't find one which answers question 2 above, so if you suggest duplicate, please make sure it answers that.)
If the result of imaxabs cannot be represented (which can happen when using two's complement), the behavior is undefined.
7.8.2.1 The imaxabs function
The imaxabs function computes the absolute value of an integer j. If the result cannot
be represented, the behavior is undefined. 221)
221) The absolute value of the most negative number cannot be represented in two’s complement.
The check that makes no assumptions and is always defined is:
intmax_t i = ... ;
if (i < -INTMAX_MAX)
{
    // handle error
}
(This if statement cannot be taken when using one's complement or sign-magnitude representation, so the compiler might give an unreachable-code warning. The code itself is still defined and valid.)
How would a pedantic programmer go about guarding against this, without making any assumptions not guaranteed by standard?
One method is to use unsigned integers. The overflow behaviour of unsigned integers is well-defined as is the behaviour when converting from a signed to an unsigned integer.
So I think the following should be safe. (It turns out it's horribly broken on some really obscure systems; see later in the post for an improved version.)
uintmax_t j = i;
if (j > (uintmax_t)INTMAX_MAX) {
    j = -j;
}
printf("Result: |%jd| = %ju\n", i, j);
So how does this work?
uintmax_t j = i;
This converts the signed integer into an unsigned one. If it's positive, the value stays the same; if it's negative, the value increases by 2^n (where n is the number of bits). This converts it to a large number (larger than INTMAX_MAX).
if (j > (uintmax_t)INTMAX_MAX) {
If the original number was positive (and hence less than or equal to INTMAX_MAX) this does nothing. If the original number was negative the inside of the if block is run.
j = -j;
The number is negated. The mathematical result of the negation is negative and so cannot be represented as an unsigned integer, so it is increased by 2^n.
So algebraically the result for negative i looks like
j = -(i + 2^n) + 2^n = -i
Clever, but this solution makes assumptions. This fails if INTMAX_MAX == UINTMAX_MAX, which is allowed by C Standard.
Hmm, let's look at this. (I'm reading https://busybox.net/~landley/c99-draft.html, which is apparently the last C99 draft prior to standardisation; if anything changed in the final standard, please do tell me.)
When typedef names differing only in the absence or presence of the initial u are defined, they shall denote corresponding signed and unsigned types as described in 6.2.5; an implementation shall not provide a type without also providing its corresponding type.
In 6.2.5 I see
For each of the signed integer types, there is a corresponding (but different) unsigned integer type (designated with the keyword unsigned) that uses the same amount of storage (including sign information) and has the same alignment requirements.
In 6.2.6.2 I see
#1
For unsigned integer types other than unsigned char, the bits of the object representation shall be divided into two groups: value bits and padding bits (there need not be any of the latter). If there are N value bits, each bit shall represent a different power of 2 between 1 and 2^(N−1), so that objects of that type shall be capable of representing values from 0 to 2^N − 1 using a pure binary representation; this shall be known as the value representation. The values of any padding bits are unspecified.39)
#2
For signed integer types, the bits of the object representation shall be divided into three groups: value bits, padding bits, and the sign bit. There need not be any padding bits; there shall be exactly one sign bit. Each bit that is a value bit shall have the same value as the same bit in the object representation of the corresponding unsigned type (if there are M value bits in the signed type and N in the unsigned type, then M<=N). If the sign bit is zero, it shall not affect the resulting value.
So yes, it seems you are right: while the signed and unsigned types have to be the same size, it does seem to be valid for the unsigned type to have one more padding bit than the signed type.
Ok, based on the analysis above revealing a flaw in my first attempt, I've written a more paranoid variant. This has two changes from my first version:
I use i < 0 rather than j > (uintmax_t)INTMAX_MAX to check for negative numbers. This means the algorithm produces correct results for numbers greater than or equal to -INTMAX_MAX, even when INTMAX_MAX == UINTMAX_MAX.
I add handling for the error case where INTMAX_MAX == UINTMAX_MAX, INTMAX_MIN == -INTMAX_MAX - 1, and i == INTMAX_MIN. This will result in j == 0 inside the if, which we can easily test for.
It can be seen from the requirements in the C standard that INTMAX_MIN cannot be smaller than -INTMAX_MAX -1 since there is only one sign bit and the number of value bits must be the same or lower than in the corresponding unsigned type. There are simply no bit patterns left to represent smaller numbers.
uintmax_t j = i;
if (i < 0) {
    j = -j;
    if (j == 0) {
        printf("your platform sucks\n");
        exit(1);
    }
}
printf("Result: |%jd| = %ju\n", i, j);
@plugwash I think 2501 is correct. For example, the value -UINTMAX_MAX becomes 1: (-UINTMAX_MAX + (UINTMAX_MAX + 1)), and is not caught by your if. – hyde
Umm,
assuming INTMAX_MAX == UINTMAX_MAX and i = -INTMAX_MAX
uintmax_t j = i;
after this command j = -INTMAX_MAX + (UINTMAX_MAX + 1) = 1
if (i < 0) {
i is less than zero so we run the commands inside the if
j = -j;
after this command j = -1 + (UINTMAX_MAX + 1) = UINTMAX_MAX
which is the correct answer, so no need to trap it in an error case.
On two's-complement systems, taking the absolute value of the most negative value is indeed undefined behavior, as the absolute value would be out of range. And it's nothing the compiler can help you with, as the UB happens at run time.
The only way to protect against that is to compare the input against the most negative value for the type (INTMAX_MIN in the code you show).
So calculating the absolute value of an integer invokes undefined behaviour in one single case. And while the undefined behaviour itself can be avoided, returning the correct int result in that one case is impossible.
Now consider multiplication of an integer by 3: Here we have a much more serious problem. This operation invokes undefined behaviour in 2/3rds of all cases! And for two thirds of all int values x, finding an int with the value 3x is just impossible. That's a much more serious problem than the absolute value problem.
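Guarding that multiplication follows the same precondition pattern as the earlier answers (a sketch; triple_fits is an illustrative name, and it relies on C99's truncation-toward-zero division):

```c
#include <stdbool.h>
#include <limits.h>

/* True iff 3 * x is representable as an int, tested without ever
   computing 3 * x itself. */
bool triple_fits(int x)
{
    return x <= INT_MAX / 3 && x >= INT_MIN / 3;
}
```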
You may want to use some bit hacks:
int v;           // we want to find the absolute value of v
unsigned int r;  // the result goes here
int const mask = v >> (sizeof(int) * CHAR_BIT - 1);  // CHAR_BIT is in <limits.h>
r = (v + mask) ^ mask;
This works well when INT_MIN < v <= INT_MAX. Beware, though, that for v == INT_MIN the intermediate v + mask itself overflows a signed int, so to keep the computation fully defined you would perform it on unsigned operands.
You can also use bitwise operation to handle this on ones' complement and sign-magnitude systems.
Reference: https://graphics.stanford.edu/~seander/bithacks.html#IntegerAbs
According to http://linux.die.net/man/3/imaxabs:
Notes
Trying to take the absolute value of the most negative integer is not defined.
To handle the full range you could add something like this to your code
if (i != INTMAX_MIN) {
    printf("Result: |%jd| = %jd\n", i, imaxabs(i));
} else { /* Code around undefined abs(INTMAX_MIN) */
    printf("Result: |%jd| = %jd%jd\n", i, -(i / 10), -(i % 10));
}
edit: As abs(INTMAX_MIN) cannot be represented on a 2's-complement machine, two values within the representable range are concatenated on output as a string.
Tested with gcc, though printf required %lld as %jd was not a supported format.
Is this really undefined behaviour, as in "the code is allowed to take any code path that strikes the compiler's fancy", when the user enters the bad number? Or is it some other flavor of not-completely-defined?
The behaviour of the program is only undefined when the bad number is successfully entered and passed to imaxabs(), which on a typical 2's-complement system returns a negative result, as you observed.
That is the undefined behaviour in this case; the implementation would also be allowed to terminate the program with an overflow error if the ALU set status flags.
The reason for "undefined behaviour" in C is so compiler writers don't have to guard against overflow, so programs can run more efficiently. While it is within the C standard for every C program using abs() to try to kill your firstborn just because you call it with too negative a value, writing such code into the object file would simply be perverse.
The real problem with these undefined behaviours is that an optimising compiler can reason away naive checks, so code like this fails:
r = (i < 0) ? -i : i;
if (r < 0) { // This code may be pointless
    // Do overflow recovery
    doRecoveryProcessing();
} else {
    printf("%jd", r);
}
Since the compiler's optimiser can reason that negated negative values are positive, it could in principle determine that (r < 0) is always false, so the attempt to trap the problem fails.
How would a pedantic programmer go about guarding against this, without making any assumptions not guaranteed by standard?
By far the best way, is simply to ensure that the program works on a valid range, so in this case validating the input suffices (disallow INTMAX_MIN).
Programs printing tables of abs() ought to avoid INT*_MIN and so on.
if (i != INTMAX_MIN) {
    printf("Result: |%jd| = %jd\n", i, imaxabs(i));
} else { /* Code around undefined abs(INTMAX_MIN) */
    printf("Result: |%jd| = %jd%jd\n", i, -(i / 10), -(i % 10));
}
This appears to write out abs(INTMAX_MIN) by fakery, allowing the program to live up to its promise to the user.
Possible Duplicate:
for every int x: x+1 > x .... is this always true?
for (i = 0; i <= N; ++i) { ... }
This particular statement will cause an infinite loop if N is INT_MAX.
Knowing that unsigned overflow wraps, if i and N were unsigned the loop would simply wrap around; but since signed overflow is undefined, the compiler can assume that i never overflows and hence that the loop iterates exactly N+1 times.
The thing to note here is: if I make the loops as,
for (i = 0; i < N; ++i) { ... }
Will this still be undefined behaviour?
Why INT_MAX + 1 is not surely equal to INT_MIN in case of signed integers?
INT_MAX + 1
this operation invokes undefined behavior. Signed integer overflow is undefined behavior in C.
It can result in INT_MIN, or the implementation can consider this expression to be positive, or the program can crash. Do not let a portable program compute this expression.
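A portable program tests before incrementing instead, along these lines (a sketch; checked_inc is an illustrative name):

```c
#include <stdbool.h>
#include <limits.h>

/* Increments *x only if x + 1 is representable; reports whether it did. */
bool checked_inc(int *x)
{
    if (*x == INT_MAX)
        return false;   /* x + 1 would overflow: undefined behaviour */
    ++*x;
    return true;
}
```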
Why INT_MAX + 1 is not surely equal to INT_MIN in case of signed integers?
First, the behaviour on integer overflow is undefined by the C standard.
Most implementations seem to let the number just overflow silently, so let's assume that is the case.
Second, the C standard does not assume two's-complement integers. Most platforms use it, especially newer ones, but there are (were) older platforms that use other integer representations, such as one's complement. Overflow in one's complement results in negative zero.
Relying on undefined behaviour to work in any particular way is really bad programming practice as it makes the program so much less portable. Even OS or compiler upgrades may change undefined behaviour, so it might not even be portable between different versions of the same OS.