Related
Unsigned integer overflow is well defined by both the C and C++ standards. For example, the C99 standard (§6.2.5/9) states
A computation involving unsigned operands can never overflow,
because a result that cannot be represented by the resulting unsigned integer type is
reduced modulo the number that is one greater than the largest value that can be
represented by the resulting type.
However, both standards state that signed integer overflow is undefined behavior. Again, from the C99 standard (§3.4.3/1)
An example of undefined behavior is the behavior on integer overflow
Is there an historical or (even better!) a technical reason for this discrepancy?
The historical reason is that most C implementations (compilers) just used whatever overflow behaviour was easiest to implement with the integer representation it used. C implementations usually used the same representation used by the CPU - so the overflow behavior followed from the integer representation used by the CPU.
In practice, it is only the representations for signed values that may differ according to the implementation: one's complement, two's complement, sign-magnitude. For an unsigned type there is no reason for the standard to allow variation because there is only one obvious binary representation (the standard only allows binary representation).
Relevant quotes:
C99 6.2.6.1:3:
Values stored in unsigned bit-fields and objects of type unsigned char shall be represented using a pure binary notation.
C99 6.2.6.2:2:
If the sign bit is one, the value shall be modified in one of the following ways:
— the corresponding value with sign bit 0 is negated (sign and magnitude);
— the sign bit has the value −(2N) (two’s complement);
— the sign bit has the value −(2N − 1) (one’s complement).
Nowadays, all processors use two's complement representation, but signed arithmetic overflow remains undefined and compiler makers want it to remain undefined because they use this undefinedness to help with optimization. See for instance this blog post by Ian Lance Taylor or this complaint by Agner Fog, and the answers to his bug report.
Aside from Pascal's good answer (which I'm sure is the main motivation), it is also possible that some processors cause an exception on signed integer overflow, which of course would cause problems if the compiler had to "arrange for another behaviour" (e.g. use extra instructions to check for potential overflow and calculate differently in that case).
It is also worth noting that "undefined behaviour" doesn't mean "doesn't work". It means that the implementation is allowed to do whatever it likes in that situation. This includes doing "the right thing" as well as "calling the police" or "crashing". Most compilers, when possible, will choose "do the right thing", assuming that is relatively easy to define (in this case, it is). However, if you are having overflows in the calculations, it is important to understand what that actually results in, and that the compiler MAY do something other than what you expect (and that this may very depending on compiler version, optimisation settings, etc).
First of all, please note that C11 3.4.3, like all examples and foot notes, is not normative text and therefore not relevant to cite!
The relevant text that states that overflow of integers and floats is undefined behavior is this:
C11 6.5/5
If an exceptional condition occurs during the evaluation of an
expression (that is, if the result is not mathematically defined or
not in the range of representable values for its type), the behavior
is undefined.
A clarification regarding the behavior of unsigned integer types specifically can be found here:
C11 6.2.5/9
The range of nonnegative values of a signed integer type is a subrange
of the corresponding unsigned integer type, and the representation of
the same value in each type is the same. A computation involving
unsigned operands can never overflow, because a result that cannot be
represented by the resulting unsigned integer type is reduced modulo
the number that is one greater than the largest value that can be
represented by the resulting type.
This makes unsigned integer types a special case.
Also note that there is an exception if any type is converted to a signed type and the old value can no longer be represented. The behavior is then merely implementation-defined, although a signal may be raised.
C11 6.3.1.3
6.3.1.3 Signed and unsigned integers
When a value with integer
type is converted to another integer type other than _Bool, if the
value can be represented by the new type, it is unchanged.
Otherwise, if the new type is unsigned, the value is converted by
repeatedly adding or subtracting one more than the maximum value that
can be represented in the new type until the value is in the range of
the new type.
Otherwise, the new type is signed and the value
cannot be represented in it; either the result is
implementation-defined or an implementation-defined signal is raised.
In addition to the other issues mentioned, having unsigned math wrap makes the unsigned integer types behave as abstract algebraic groups (meaning that, among other things, for any pair of values X and Y, there will exist some other value Z such that X+Z will, if properly cast, equal Y and Y-Z will, if properly cast, equal X). If unsigned values were merely storage-location types and not intermediate-expression types (e.g. if there were no unsigned equivalent of the largest integer type, and arithmetic operations on unsigned types behaved as though they were first converted them to larger signed types, then there wouldn't be as much need for defined wrapping behavior, but it's difficult to do calculations in a type which doesn't have e.g. an additive inverse.
This helps in situations where wrap-around behavior is actually useful - for example with TCP sequence numbers or certain algorithms, such as hash calculation. It may also help in situations where it's necessary to detect overflow, since performing calculations and checking whether they overflowed is often easier than checking in advance whether they would overflow, especially if the calculations involve the largest available integer type.
Perhaps another reason for why unsigned arithmetic is defined is because unsigned numbers form integers modulo 2^n, where n is the width of the unsigned number. Unsigned numbers are simply integers represented using binary digits instead of decimal digits. Performing the standard operations in a modulus system is well understood.
The OP's quote refers to this fact, but also highlights the fact that there is only one, unambiguous, logical way to represent unsigned integers in binary. By contrast, Signed numbers are most often represented using two's complement but other choices are possible as described in the standard (section 6.2.6.2).
Two's complement representation allows certain operations to make more sense in binary format. E.g., incrementing negative numbers is the same that for positive numbers (expect under overflow conditions). Some operations at the machine level can be the same for signed and unsigned numbers. However, when interpreting the result of those operations, some cases don't make sense - positive and negative overflow. Furthermore, the overflow results differ depending on the underlying signed representation.
The most technical reason of all, is simply that trying to capture overflow in an unsigned integer requires more moving parts from you (exception handling) and the processor (exception throwing).
C and C++ won't make you pay for that unless you ask for it by using a signed integer. This isn't a hard-fast rule, as you'll see near the end, but just how they proceed for unsigned integers. In my opinion, this makes signed integers the odd-one out, not unsigned, but it's fine they offer this fundamental difference as the programmer can still perform well-defined signed operations with overflow. But to do so, you must cast for it.
Because:
unsigned integers have well defined overflow and underflow
casts from signed -> unsigned int are well defined, [uint's name]_MAX - 1 is conceptually added to negative values, to map them to the extended positive number range
casts from unsigned -> signed int are well defined, [uint's name]_MAX - 1 is conceptually deducted from positive values beyond the signed type's max, to map them to negative numbers)
You can always perform arithmetic operations with well-defined overflow and underflow behavior, where signed integers are your starting point, albeit in a round-about way, by casting to unsigned integer first then back once finished.
int32_t x = 10;
int32_t y = -50;
// writes -60 into z, this is well defined
int32_t z = int32_t(uint32_t(y) - uint32_t(x));
Casts between signed and unsigned integer types of the same width are free, if the CPU is using 2's compliment (nearly all do). If for some reason the platform you're targeting doesn't use 2's Compliment for signed integers, you will pay a small conversion price when casting between uint32 and int32.
But be wary when using bit widths smaller than int
usually if you are relying on unsigned overflow, you are using a smaller word width, 8bit or 16bit. These will promote to signed int at the drop of a hat (C has absolutely insane implicit integer conversion rules, this is one of C's biggest hidden gotcha's), consider:
unsigned char a = 0;
unsigned char b = 1;
printf("%i", a - b); // outputs -1, not 255 as you'd expect
To avoid this, you should always cast to the type you want when you are relying on that type's width, even in the middle of an operation where you think it's unnecessary. This will cast the temporary and get you the signedness AND truncate the value so you get what you expected. It's almost always free to cast, and in fact, your compiler might thank you for doing so as it can then optimize on your intentions more aggressively.
unsigned char a = 0;
unsigned char b = 1;
printf("%i", (unsigned char)(a - b)); // cast turns -1 to 255, outputs 255
I was working with integers in C, trying to explore more on when and how overflow happens.
I noticed that when I added two positive numbers, the sum of which overflows, I always got a negative number.
On the other hand, if I added two negative numbers, the sum of which overflows, I always got a positive number (including 0).
I made few experiments, but I would like to know if this is true for every case.
Integer overflows are undefined behavior in C.
C says an expression involving integers overflows, if its result after the usual arithmetic conversions is of a signed typed and cannot be represented in the type of the result. Assignment and cast expressions are an exception as they are ruled by the integer conversions.
Expressions of unsigned type cannot overflow, they wrap, e. g., 0U - 1 is UINT_MAX.
Examples:
INT_MAX + 1 // integer overflow
UINT_MAX + 1 // no overflow, the resulting type is unsigned
(unsigned char) INT_MAX // no overflow, integer conversion occurs
Never let any integer expression overflows, modern compilers (like gcc) take advantage of integer overflows being undefined behavior to perform various types of optimizations.
For example:
a - 10 < 20
when a is of type int after promotion, the expression is reduced in gcc (when optimization are enabled) to:
a < 30
It takes advantage of the expression being undefined behavior when a is in the range INT_MIN + 10 - 1 to INT_MIN.
This optimization could not be done when a is unsigned int because if a is 0, then a - 10 has to be evaluated as UINT_MAX - 9 (no undefined behavior). Optimizing a - 10 < 20 to a < 30 would then lead to a different result than the required one when a is 0 to 9.
Overflow of signed integers is undefined behaviour in C, so there are no guarantees.
That said, wrap around, or arithmetic modulo 2N, where N is the number of bits in the type, is a common behaviour. For that behaviour, indeed if a sum overflows, the result has the opposite sign of the operands.
Formally, the behaviour of signed arithmetic on overflow is undefined; anything can happen and it is 'correct'. This contrasts with unsigned arithmetic, where overflow is completely defined.
In practice, many older compilers used signed arithmetic which overflowed as you describe. However, modern GCC is making changes to the way it works, and you'd be very ill-advised to rely on the behaviour. It may change at any time when anything in the environment where your code is compiled changes — the compiler, the platform, ...
Overflow in C is a godawful mess.
Overflow during unsigned arithmetic or conversion to an unsigned type results in wraping modulo 2n
Overflow during conversion to a signed type is implementation defined, most implementations will wrap modulo 2n but some may not.
Overflow during signed arithmetic is undefined behaviour, according to the standard anything might happen. In practice sometimes it will do what you wan't, sometimes it will cause strange issues later in yoir code as the compiler optimises out important tests.
What makes things even worse is how this interacts with integer promotion. Thanks to promotion you can be doing signed arithmetic when it looks like you are doing unsigned arithmetic. For example consider the following code
uint16_t a = 65535;
uint16_t b = a * a;
On a system with 16-bit int this code is well-defined. However on a system with 32-bit int the multiplication will take place as signed int and the resulting overflow will be undefined behavior!
I was working with integers in C, trying to explore more on when and how overflow happens.
I noticed that when I added two positive numbers, the sum of which overflows, I always got a negative number.
On the other hand, if I added two negative numbers, the sum of which overflows, I always got a positive number (including 0).
I made few experiments, but I would like to know if this is true for every case.
Integer overflows are undefined behavior in C.
C says an expression involving integers overflows, if its result after the usual arithmetic conversions is of a signed typed and cannot be represented in the type of the result. Assignment and cast expressions are an exception as they are ruled by the integer conversions.
Expressions of unsigned type cannot overflow, they wrap, e. g., 0U - 1 is UINT_MAX.
Examples:
INT_MAX + 1 // integer overflow
UINT_MAX + 1 // no overflow, the resulting type is unsigned
(unsigned char) INT_MAX // no overflow, integer conversion occurs
Never let any integer expression overflows, modern compilers (like gcc) take advantage of integer overflows being undefined behavior to perform various types of optimizations.
For example:
a - 10 < 20
when a is of type int after promotion, the expression is reduced in gcc (when optimization are enabled) to:
a < 30
It takes advantage of the expression being undefined behavior when a is in the range INT_MIN + 10 - 1 to INT_MIN.
This optimization could not be done when a is unsigned int because if a is 0, then a - 10 has to be evaluated as UINT_MAX - 9 (no undefined behavior). Optimizing a - 10 < 20 to a < 30 would then lead to a different result than the required one when a is 0 to 9.
Overflow of signed integers is undefined behaviour in C, so there are no guarantees.
That said, wrap around, or arithmetic modulo 2N, where N is the number of bits in the type, is a common behaviour. For that behaviour, indeed if a sum overflows, the result has the opposite sign of the operands.
Formally, the behaviour of signed arithmetic on overflow is undefined; anything can happen and it is 'correct'. This contrasts with unsigned arithmetic, where overflow is completely defined.
In practice, many older compilers used signed arithmetic which overflowed as you describe. However, modern GCC is making changes to the way it works, and you'd be very ill-advised to rely on the behaviour. It may change at any time when anything in the environment where your code is compiled changes — the compiler, the platform, ...
Overflow in C is a godawful mess.
Overflow during unsigned arithmetic or conversion to an unsigned type results in wraping modulo 2n
Overflow during conversion to a signed type is implementation defined, most implementations will wrap modulo 2n but some may not.
Overflow during signed arithmetic is undefined behaviour, according to the standard anything might happen. In practice sometimes it will do what you wan't, sometimes it will cause strange issues later in yoir code as the compiler optimises out important tests.
What makes things even worse is how this interacts with integer promotion. Thanks to promotion you can be doing signed arithmetic when it looks like you are doing unsigned arithmetic. For example consider the following code
uint16_t a = 65535;
uint16_t b = a * a;
On a system with 16-bit int this code is well-defined. However on a system with 32-bit int the multiplication will take place as signed int and the resulting overflow will be undefined behavior!
The output comes to be the 32-bit 2's complement of 128 that is 4294967168. How?
#include <stdio.h>
int main()
{
char a;
a=128;
if(a==-128)
{
printf("%u\n",a);
}
return 0;
}
Compiling your code with warnings turned on gives:
warning: overflow in conversion from 'int' to 'char' changes value from '128' to '-128' [-Woverflow]
which tell you that the assignment a=128; isn't well defined on your plat form.
The standard say:
6.3.1.3 Signed and unsigned integers
1 When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.
2 Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.
3 Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
So we can't know what is going on as it depends on your system.
However, if we do some guessing (and note this is just a guess):
128 as 8 bit would be 0b1000.0000
so when you call printf where you get a conversion to int there will be a sign extension like:
0b1000.0000 ==> 0b1111.1111.1111.1111.1111.1111.1000.0000
which - printed as unsigned represents the number 4294967168
The sequence of steps that got you there is something like this:
You assign 128 to a char.
On your implementation, char is signed char and has a maximum value of 127, so 128 overflows.
Your implementation interprets 128 as 0x80. It uses two’s-complement math, so (int8_t)0x80 represents (int8_t)-128.
For historical reasons (relating to the instruction sets of the DEC PDP minicomputers on which C was originally developed), C promotes signed types shorter than int to int in many contexts, including variadic arguments to functions such as printf(), which aren’t bound to a prototype and still use the old argument-promotion rules of K&R C instead.
On your implementation, int is 32 bits wide and also two’s-complement, so (int)-128 sign-extends to 0xFFFFFF80.
When you make a call like printf("%u", x), the runtime interprets the int argument as an unsigned int.
As an unsigned 32-bit integer, 0xFFFFFF80 represents 4,294,967,168.
The "%u\n" format specifier prints this out without commas (or other separators) followed by a newline.
This is all legal, but so are many other possible results. The code is buggy and not portable.
Make sure you don’t overflow the range of your type! (Or if that’s unavoidable, overflow for unsigned scalars is defined as modular arithmetic, so it’s better-behaved.) The workaround here is to use unsigned char, which has a range from 0 to (at least) 255, instead of char.
First of all, as I hope you understand, the code you've posted is full of errors, and you would not want to depend on its output. If you were trying to perform any of these manipulations in a real program, you would want to do so in some other, more well-defined, more portable way.
So I assume you're asking only out of curiosity, and I answer in the same spirit.
Type char on your machine is probably a signed 8-bit quantity. So its range is from -128 to +127. So +128 won't fit.
When you try to jam the value +128 into a signed 8-bit quantity, you probably end up with the value -128 instead. And that seems to be what's happening for you, based on the fact that your if statement is evidently succeeding.
So next we try to take the value -128 and print it as if it was an unsigned int, which on your machine is evidently an 32-bit type. It can hold numbers in the range 0 to 4294967295, which obviously does not include -128. But unsigned integers typically behave pretty nicely modulo their range, so if we add 4294967296 to -128 we get 4294967168, which is precisely the number you saw.
Now that we've worked through this, let's resolve in future not to jam numbers that won't fit into char variables, or to print signed quantities with the %u format specifier.
Say I want to print some values. I am assuming that I should get Integer Overflow if my signed variable exceeds from TMin and TMax (in this case, using 4 byte int, 0x7FFFFFFF as Tmax and 0x80000000 as Tmin) but in these example I am not getting what I expect (explained in comments):
// Tmax = 0x7FFFFFFF == 2147483647
// Tmin = 0x80000000 == -2147483648
printf("1- overflow test: %d\n", 0x7FFFFFFF+1 ); // overflow - how compiler finds out it's a signed value, not an Unsigned one
printf("2- overflow test: %d\n", 0x80000000-1 ); // in contrast, why I am not getting an integer overflow here
printf("3- overflow test: %d\n", (int) 0x80000000-1 ); // overflow (Expected)
printf("4- overflow test: %d\n",(int) (0x7FFFFFFF+1)); // overflow (Expected)
First of all, let me tell you, (signed) integer overflow invokes undefined behavior.
In that scenario, anything can happen. You can neither trust nor reasonify the output of a code having UB.
Just to clarify, even
printf("2- overflow test: %d\n", 0x80000000-1 );
is UB. Though 0x80000000-1 is unsigned and not an overflow in itself, using %d will lead to a type mismatch which will technically lead to UB.
Regarding the Undefined behavior, from C11, annex §J.2,
Conversion to or from an integer type produces a value outside the range that can be
represented.
OP is not always experiencing signed integer overflow - which is undefined behavior.
The following is unsigned math as 0x80000000 is likely an unsigned integer. Hexadecimal constants are of the type that first fits them int, unsigned, long, unsigned long, ...
printf("2- overflow test: %d\n", 0x80000000-1 );
0x80000000-1 is an unsigned type as 0x80000000 first fits in an unsigned type, likely unsigned with the value of 2147483648u. 2147483648u - 1 --> 2147483647u.
0x7FFFFFFF+1 is a signed type as 0x7FFFFFFF first fits in a signed type, likely int with the value of INT_MAX.
int + int --> int and INT_MAX + 1 --> overflow.
OP said "0x80000000 as Tmin" is certainly a mis-understanding. In C, with 32-bit int/unsigned, 0x80000000 is a hexadecimal constant with the value of 2147483648. For OP, Tmin is more likely -INT_MAX - 1.
Trying to invoke integer overflow is undefined behavior. As per standard, in that case anything, including outwardly appearing non-overflow can happen. What this outputs is uninteresting and irrelevant.
It may be that your compiler optimizes it out. It may be that your system refuses to be used in such a way, it may be that a sack of rice falling in china caused a butterfly effect that lead to this. Its anything but defined what happens there.
It may also be that your system decided that its integers are bigger. Only the lower limit of integer sizes (2 byte) is mandated in c (at least 4 is typical for standard computers). Your system could have 8 byte integers, or even bigger ones.
C is kind of inconsistent about numeric exceptions.
There are a few things you can do that almost always cause problems. If you divide by 0, your program will usually crash pretty hard, just as if you'd accessed an invalid pointer.
There are a few things you can do that are guaranteed not to cause a problem. If you say
unsigned int i = UINT_MAX;
and then add 1 to it, it's guaranteed to wrap around to 0.
And then there are a number of things where the behavior is undefined or unspecified. Signed integer overflow is one of these. Strictly speaking it's undefined (anything can happen, you can't depend on it). In practice, most computers quietly wrap around, just like for unsigned arithmetic.
Now, everything I've said has been about the run-time behavior of a program, but in this question's posted code fragment, all the arithmetic happens at compile time. Compile-time arithmetic operates mostly according to the same rules as run-time, but not always. Modern compilers tend to warn you about problematic compile-time arithmetic (my copy of gcc emits three warnings for the posted fragment), but not always.