Unsigned integer underflow in C - c

I've seen multiple questions on the site addressing unsigned integer overflow/underflow.
Most of the questions about underflow ask about assigning a negative number to an unsigned integer; what's unclear to me is what happens when an unsigned int is subtracted from another unsigned int e.g. a - b where the result is negative. The relevant part of the standard is:
A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type.
In this context how do you interpret "reduced"? Does it mean that UINT_MAX+1 is added to the negative result until it is >= 0?
I see that the main point is addressed by this question (which basically says that the standard chooses to talk about overflow but the main point about modulo holds for underflow too) but it's still unclear to me:
Say the result of a-b is -1; According to the standard, the operation -1%(UINT_MAX+1) will return -1 (as is explained here); so we're back to where we started.
This may be overly pedantic, but does this modulo mean a mathematical modulo as opposed to C's computational modulo?

Firstly, a result that is below the minimum value of the given integer type is not called "underflow" in C. The term "underflow" is reserved for floating-point types and means something completely different. Going out of range of an integer type is always overflow, regardless of which end of the range you cross. So the fact that you don't see the language specification talking about "underflow" does not really mean anything in this case.
Secondly, you are absolutely right about the meaning of the word "reduced". The final value is defined by adding (or subtracting) UINT_MAX+1 from the "mathematical" result until it returns into the range of unsigned int. This is also the same thing as Euclidean "modulo" operation.
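As a minimal sketch of what "reduced" means here, the two hypothetical helpers below compute a - b once with plain unsigned arithmetic and once "by hand" in a wider signed type, adding UINT_MAX+1 when the mathematical result is negative. The function names are made up for illustration, and the second helper assumes long long is wider than unsigned int, which holds on common platforms but is not guaranteed by the standard.

```c
#include <limits.h>

/* Plain unsigned subtraction: the result is reduced modulo UINT_MAX + 1. */
unsigned int wrap_sub(unsigned int a, unsigned int b)
{
    return a - b;
}

/* The same value computed by hand.
 * Assumption: long long is wider than unsigned int. */
unsigned int wrap_sub_by_hand(unsigned int a, unsigned int b)
{
    long long modulus = (long long)UINT_MAX + 1;
    long long diff = (long long)a - (long long)b;  /* "mathematical" result */

    /* diff lies strictly between -modulus and modulus, so a single
     * addition is enough to bring it back into range. */
    if (diff < 0)
        diff += modulus;
    return (unsigned int)diff;
}
```

With these definitions, wrap_sub(1u, 2u) and wrap_sub_by_hand(1u, 2u) should both give UINT_MAX.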

The part of the standard you posted talks about overflow, not underflow.
"Does it mean that UINT_MAX+1 is added to the negative result until it is >= 0?"
You can think that's what happens. Abstractly the result will be the same. A similar question has already been asked about it. Check this link: Question about C behaviour for unsigned integer underflow for more details.
Another way to think about it: -1 as an int (typically 4 bytes, two's complement) has all bits set to 1. When you tell the program to interpret those same bits as an unsigned int, the value is UINT_MAX.

Under the hood, addition and subtraction are bitwise and independent of sign. The generated code can use the same instructions whether the operands are signed or not. It is other operators that interpret the result, for example a > 0. Do the bitwise add or subtract and that tells you the answer: in 8 bits, 0b00000000 - 0b00000001 = 0b11111111, and the answer is the same regardless of signedness. It is only other operators that see the answer as -1 for signed types and 0xFF for unsigned types. The standard describes this behaviour, but I always find it easiest to remember how it works and deduce the consequences for the code I am writing.
signed int adds(signed int a, signed int b)
{
return a + b;
}
unsigned int addu(unsigned a, unsigned b)
{
return a + b;
}
int main() {
return 0;
}
->
adds(int, int):
lea eax, [rdi+rsi]
ret
addu(unsigned int, unsigned int):
lea eax, [rdi+rsi]
ret
main:
xor eax, eax
ret
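The claim that the signed and unsigned additions produce the same bits can also be checked from C itself. The sketch below uses a hypothetical helper, same_bits, and assumes a two's complement int with the same size as unsigned int, as well as inputs whose signed sum does not overflow (signed overflow would be undefined behaviour).

```c
#include <string.h>

/* Returns nonzero when a + b has the same object representation
 * whether it is computed as int or as unsigned int.
 * Assumptions: two's complement, sizeof(int) == sizeof(unsigned int),
 * and a + b does not overflow int. */
int same_bits(int a, int b)
{
    int s = a + b;
    unsigned int u = (unsigned int)a + (unsigned int)b;
    return memcmp(&s, &u, sizeof s) == 0;
}
```

For example, same_bits(-4, -5) is expected to be nonzero on such a platform: the int result is -9 and the unsigned result is a large positive number, but the stored bytes are identical.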

Related

Using Type Modifier (signed) Comparisons

This prints "signed comparison" https://onlinegdb.com/eA87wKQkU
#include <stdio.h>
#include <stdint.h>
int main()
{
uint64_t A = -1, B = 1;
if ((signed)A < (signed)B)
{
printf("signed comparison");
}
return 0;
}
To ensure an overall signed comparison, looks like the (signed) type modifier must be applied to A and B.
Is this correct?
Also, I haven't seen any C code using ((signed)A < (signed)B) and was wondering if it's valid C89/99?
Perhaps ((int64_t)A < (int64_t)B) is a better approach?
Thanks.
The answer to both questions is yes:
if you only convert A or B as (signed), which means (signed int), the comparison will still be performed as uint64_t because the converted value will be converted to the larger type uint64_t. Converting both A and B is hence necessary.
converting to int64_t is probably a better idea as this signed type is larger, but it should not matter in this particular example: converting A, whose value is UINT64_MAX to int or int64_t is implementation defined and may or may not produce -1. The C Standard allows for an implementation defined signal to be raised by this out of range conversion.
on most current architectures, no signal will be raised and the conversion of A will indeed produce -1 and the code will print signed comparison. Yet you should end the output with a newline for proper operation.
This is a slightly unusual solution, irrelevant for all practical purposes, but it does have the one advantage of avoiding the following conversion rule from the C standard:
Otherwise, the new type is signed and the value cannot be represented in it; either the
result is implementation-defined or an implementation-defined signal is raised.
The idea of this solution is to invert the top bit of the unsigned type in the values being compared before the comparison.
Converting -1 to uint64_t produces the value 0xffffffffffffffff in A.
Converting 1 to uint64_t produces the value 0x0000000000000001 in B.
So the comparison A < B is false.
Denoting A and B with the top bit inverted as Ax and Bx respectively, then Ax has the value 0x7fffffffffffffff and Bx has the value 0x8000000000000001. The comparison Ax < Bx is true.
One way to invert the top bit of a uint64_t value is to add or subtract INT64_MIN and convert the result back to uint64_t. Converting INT64_MIN to uint64_t produces the value 0x8000000000000000 and adding or subtracting that to another uint64_t value will invert the top bit. This also works for any other exact-width unsigned type with the 64 changed to the exact width in question.
So the following will do a "signed comparison" of the uint64_t values A and B:
if ((uint64_t)(A - INT64_MIN) < (uint64_t)(B - INT64_MIN))
printf("A signed less than B\n");
Type-casting the adjusted values back to the unsigned integer type as shown above is only necessary for unsigned types whose values can all be represented by int. For example, it is necessary when A and B are of type uint8_t. A would have the value 255, B would have the value 1, Ax would have value 383 (for subtraction) or 127 (for addition), Bx would have the value 129 (for subtraction) or -127 (for addition), and Ax < Bx would be false. The type-cast would convert Ax to 127 and Bx to 129 so the comparison Ax < Bx would be true.
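The top-bit trick described above can be wrapped in small functions. This is only a sketch; the names signed_less_u64 and signed_less_u8 are invented for illustration and are not from any library.

```c
#include <stdint.h>

/* "Signed" comparison of two uint64_t values by inverting the top bit.
 * INT64_MIN converts to 0x8000000000000000 in the unsigned subtraction. */
int signed_less_u64(uint64_t a, uint64_t b)
{
    return (uint64_t)(a - INT64_MIN) < (uint64_t)(b - INT64_MIN);
}

/* Same idea for uint8_t. Here the cast back to uint8_t matters, because
 * the operands are promoted to int before the subtraction. */
int signed_less_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a - INT8_MIN) < (uint8_t)(b - INT8_MIN);
}
```

With A holding the converted value of -1 and B holding 1, signed_less_u64(A, B) is nonzero while the plain A < B is zero.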

signed and unsigned integer in C

I wrote this program as an exercise to understand how signed and unsigned integers work in C.
This code should simply print -9, the sum of -4 and -5, stored in variable c
#include <stdio.h>
int main (void) {
unsigned int a=-4;
unsigned int b=-5;
unsigned int c=a+b;
printf("result is %u\n",c);
return 0;
}
When this code runs it gives me an unexpected result, 4294967287.
I have also cast c from unsigned to signed integer: printf ("result is %u\n",(int)c);
but that doesn't work either.
Can someone please explain why the program doesn't give the expected result?
If this is an exercise in C and signed vs unsigned, you should start by thinking: what does this mean?
unsigned int a=-4;
Should it even compile? It seems like a contradiction.
Use a debugger to inspect the memory stored at the location of a. Do you think it will be the same in this case?
int a=-4;
Does the compiler do different things when it's asked to add unsigned x to unsigned y as opposed to signed x and signed y? Ask the compiler to show you the machine code it generated in each case, and read up on what the instructions do.
Explore, investigate, verify: you have the opportunity to get really interesting insights into how computers really work.
You expect this:
printf("result is %u\n",c);
to print -9. That's impossible. c is of type unsigned int, and %u prints a value of type unsigned int (so good work using the right format string for the argument). An unsigned int object cannot store a negative value.
Going back a few lines in your program:
unsigned int a=-4;
4 is of type (signed) int, and has the obvious value. Applying unary - to that value yields an int value of -4.
So far, so good.
Now what happens when you store this negative int value in an unsigned int object?
It's converted.
The language specifies what happens when you convert a signed int value to unsigned int: the value is adjusted so that it's within the range of unsigned int. If unsigned int is 32 bits, this is done by adding or subtracting 2^32 as many times as necessary. In this case, the result is -4 + 2^32, or 4294967292. (That number makes a bit more sense if you show it in hexadecimal: 0xfffffffc.)
(The generated code isn't really going to repeatedly add or subtract 2^32; it's going to do whatever it needs to do to get the same result. The cool thing about using two's-complement to represent signed integers is that it doesn't have to do anything. The int value -4 and the unsigned int value 4294967292 have exactly the same bit representation. The rules are defined in terms of values, but they're designed so that they can be easily implemented using bitwise operations.)
Similarly, b will have the value -5 + 2^32, or 4294967291.
Now you add them together. The mathematical result is 8589934583, but that won't fit in an unsigned int. Using rules similar to those for conversion, the result is reduced to a value that's within the range of unsigned int, yielding 4294967287 (or, in hex, 0xfffffff7).
You also tried a cast:
printf ("result is %u\n",(int)c);
Here you're passing an int argument to printf, but you've told it (by using %u) to expect an unsigned int. You've also tried to convert a value that's too big to fit in an int, and for an out-of-range value the unsigned-to-signed conversion rules make the result implementation-defined (or allow an implementation-defined signal to be raised). So don't do that.
That answer is precisely correct for 32-bit ints.
unsigned int a = -4;
sets a to the bit pattern 0xFFFFFFFC, which, interpreted as unsigned, is 4294967292 (2^32 - 4). Likewise, b is set to 2^32 - 5. When you add the two, you get 0x1FFFFFFF7 (8589934583), which is wider than 32 bits, so the extra bits are dropped, leaving 4294967287, which, as it happens, is 2^32 - 9. So if you had done this calculation on signed ints, you would have gotten exactly the same bit patterns, but printf would have rendered the answer as -9.
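The arithmetic above can be reproduced with fixed-width types, so the numbers do not depend on the width of int. The helper names to_u32 and add_u32 are made up for this sketch.

```c
#include <stdint.h>

/* Converting a negative value to uint32_t adds 2^32 to it. */
uint32_t to_u32(int32_t v)
{
    return (uint32_t)v;
}

/* The sum is reduced modulo 2^32, i.e. the bits above bit 31 are dropped. */
uint32_t add_u32(uint32_t a, uint32_t b)
{
    return (uint32_t)(a + b);
}
```

Here to_u32(-4) is 4294967292, to_u32(-5) is 4294967291, and add_u32 of the two is 4294967287, which is the same value as to_u32(-9).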
Using google, one finds the answer in two seconds..
http://en.wikipedia.org/wiki/Signedness
For example, 0xFFFFFFFF gives −1, but 0xFFFFFFFFU gives 4,294,967,295
for 32-bit code
Therefore, your 4294967287 is expected in this case.
However, what exactly do you mean by "cast from unsigned to signed does not work?"

is it safe to subtract between unsigned integers?

The following C code displays the result correctly, -1.
#include <stdio.h>
main()
{
unsigned x = 1;
unsigned y=x-2;
printf("%d", y );
}
But in general, is it always safe to do subtraction involving
unsigned integers?
The reason I ask the question is that I want to do some conditioning
as follows:
unsigned x = 1; // x was defined by someone else as unsigned,
// which I had better not to change.
for (int i=-5; i<5; i++){
if (x+i<0) continue;
f(x+i); // f is a function
}
Is it safe to do so?
How are unsigned integers and signed integers different in
representing integers? Thanks!
1: Yes, it is safe to subtract unsigned integers. The definition of arithmetic on unsigned integers includes that if an out-of-range value would be generated, then that value should be adjusted modulo the maximum value for the type, plus one. (This definition is equivalent to truncating high bits).
Your posted code has a bug though: printf("%d", y); causes undefined behaviour because %d expects an int, but you supplied unsigned int. Use %u to correct this.
2: When you write x+i, the i is converted to unsigned. The result of the whole expression is a well-defined unsigned value. Since an unsigned can never be negative, your test will always fail.
You also need to be careful using relational operators because the same implicit conversion will occur. Before I give you a fix for the code in section 2, what do you want to pass to f when x is UINT_MAX or close to it? What is the prototype of f ?
3: Unsigned integers use a "pure binary" representation.
Signed integers have three options. Two can be considered obsolete; the most common one is two's complement. All options require that a positive signed integer value has the same representation as the equivalent unsigned integer value. In two's complement, a negative signed integer is represented the same as the unsigned integer generated by adding UINT_MAX+1, etc.
If you want to inspect the representation, then do unsigned char *p = (unsigned char *)&x; printf("%02X%02X%02X%02X", p[0], p[1], p[2], p[3]);, depending on how many bytes are needed on your system.
It's always safe to subtract unsigned as in
unsigned x = 1;
unsigned y=x-2;
y will take on the value of -1 mod (UINT_MAX + 1) or UINT_MAX.
It is always safe to do subtraction, addition, multiplication, involving unsigned integers: no UB. The answer will always be the expected mathematical result modded by UINT_MAX+1.
But do not do printf("%d", y ); - that is UB. Instead printf("%u", y);
C11 §6.2.5 9 "A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type."
When unsigned and int are used in +, the int is converted to an unsigned. So x+i has an unsigned result and never is that sum < 0. Safe, but now if (x+i<0) continue is pointless. f(x+i); is safe, but need to see f() prototype to best explain what may happen.
Unsigned integers are always 0 to power(2,N)-1 and have well defined "overflow" results. Signed integers are 2's complement, 1's complement, or sign-magnitude and have UB on overflow. Some compilers take advantage of that and assume it never occurs when making optimized code.
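Since x+i is computed as unsigned and can never compare less than zero, the test has to be phrased differently. One possible sketch, using hypothetical helper names, checks the sign of i and compares its magnitude against x without ever forming a negative value:

```c
/* What x + i actually computes: i is converted to unsigned first,
 * so the result wraps instead of going negative. */
unsigned int naive_sum(unsigned int x, int i)
{
    return x + i;
}

/* Nonzero when the mathematical value of x + i would be negative. */
int sum_is_negative(unsigned int x, int i)
{
    if (i >= 0)
        return 0;
    /* Magnitude of i, computed in unsigned arithmetic so that it is
     * well defined even when i is the most negative int. */
    unsigned int magnitude = 0u - (unsigned int)i;
    return magnitude > x;
}
```

In the loop from the question, if (sum_is_negative(x, i)) continue; would skip the iterations the original if (x+i<0) was meant to skip. What to pass to f in the remaining cases still depends on f's prototype.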
Rather than really answering your questions directly, which has already been done, I'll make some broader observations that really go to the heart of your questions.
The first is that using unsigned in loop bounds where there's any chance that a signed value might crop up will eventually bite you. I've done it a bunch of times over 20 years and it has ultimately bit me every time. I'm now generally opposed to using unsigned for values that will be used for arithmetic (as opposed to being used as bitmasks and such) without an excellent justification. I have seen it cause too many problems when used, usually with the simple and appealing rationale that “in theory, this value is non-negative and I should use the most restrictive type possible”.
I understand that x, in your example, was decided to be unsigned by someone else, and you can't change it, but you want to do something involving x over an interval potentially involving negative numbers.
The “right” way to do this, in my opinion, is first to assess the range of values that x may take. Suppose that the length of an int is 32 bits. Then the length of an unsigned int is the same. If it is guaranteed to be the case that x can never be larger than 2^31-1 (as it often is), then it is safe in principle to cast x to a signed equivalent and use that, i.e. do this:
int y = (int)x;
// Do your stuff with *y*
x = (unsigned)y;
If you have a long that is longer than unsigned, then even if x uses the full unsigned range, you can do this:
long y = (long)x;
// Do your stuff with *y*
x = (unsigned)y;
Now, the problem with either of these approaches is that before assigning back to x (e.g. x=(unsigned)y; in the immediately preceding example), you really must check that y is non-negative. However, these are exactly the cases where working with the unsigned x would have bitten you anyway, so there's no harm at all in something like:
long y = (long)x;
// Do your stuff with *y*
assert( y >= 0L );
x = (unsigned)y;
At least this way, you'll catch the problems and find a solution, rather than having a strange bug that takes hours to find because a loop bound is four billion unexpectedly.
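A variation on the same idea, sketched with fixed-width types so that the wider type is guaranteed to be wider (long is only 32 bits on some platforms). The function name add_offset_checked is hypothetical, and it reports failure through its return value instead of using assert.

```c
#include <stdint.h>

/* Adds a signed offset to an unsigned value using a wider signed type.
 * Returns 1 and stores the result in *out when it fits in uint32_t,
 * returns 0 and leaves *out untouched otherwise. */
int add_offset_checked(uint32_t x, int32_t offset, uint32_t *out)
{
    int64_t y = (int64_t)x + offset;   /* cannot overflow in 64 bits */

    if (y < 0 || y > (int64_t)UINT32_MAX)
        return 0;
    *out = (uint32_t)y;
    return 1;
}
```

For instance, adding -5 to 1 is rejected, while adding -5 to 10 stores 5.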
No, it's not safe.
Integers are usually 4 bytes long, which equals 32 bits. Their difference in representation is:
As far as signed integers are concerned, the most significant bit is used for the sign, so they can represent values between -2^31 and 2^31 - 1
Unsigned integers don't use any bit for sign, so they represent values from 0 to 2^32 - 1.
Part 2 isn't safe either for the same reason as Part 1. As int and unsigned types represent integers in a different way, in this case where negative values are used in the calculations, you can't know what the result of x + i will be.
No, it's not safe. Trying to represent negative numbers with unsigned ints smells like a bug. Also, you should use %u to print unsigned ints.
If we slightly modify your code to put %u in printf:
#include <stdio.h>
main()
{
unsigned x = 1;
unsigned y=x-2;
printf("%u", y );
}
The number printed is 4294967295
The reason the result is correct is because C doesn't do any overflow checks and you are printing it as a signed int (%d). This, however, does not mean it is safe practice. If you print it as it really is (%u) you won't get the correct answer.
An Unsigned integer type should be thought of not as representing a number, but as a member of something called an "abstract algebraic ring", specifically the equivalence class of integers congruent modulo (MAX_VALUE+1). For purposes of examples, I'll assume "unsigned int" is 16 bits for numerical brevity; the principles would be the same with 32 bits, but all the numbers would be bigger.
Without getting too deep into the abstract-algebraic nitty-gritty, when assigning a number to an unsigned type [abstract algebraic ring], zero maps to the ring's additive identity (so adding zero to a value yields that value), one means the ring's multiplicative identity (so multiplying a value by one yields that value). Adding a positive integer N to a value is equivalent to adding the multiplicative identity, N times; adding a negative integer -N, or subtracting a positive integer N, will yield the value which, when added to +N, would yield the original value.
Thus, assigning -1 to a 16-bit unsigned integer yields 65535, precisely because adding 1 to 65535 will yield 0. Likewise -2 yields 65534, etc.
Note that in an abstract algebraic sense, every integer can be uniquely assigned into algebraic rings of the indicated form, and a ring member can be uniquely assigned into a smaller ring whose modulus is a factor of its own [e.g. a 16-bit unsigned integer maps uniquely to one 8-bit unsigned integer], but ring members are not uniquely convertible to larger rings or to integers. Unfortunately, C sometimes pretends that ring members are integers, and implicitly converts them; that can lead to some surprising behavior.
Subtracting a value, signed or unsigned, from an unsigned value which is no smaller than int, and no smaller than the value being subtracted, will yield a result according to the rules of algebraic rings, rather than the rules of integer arithmetic. Testing whether the result of such computation is less than zero will be meaningless, because ring values are never less than zero. If you want to operate on unsigned values as though they are numbers, you must first convert them to a type which can represent numbers (i.e. a signed integer type). If the unsigned type can be outside the range that is representable with the same-sized signed type, it will need to be upcast to a larger type.
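The 16-bit examples above can be tried out with uint16_t, which sidesteps the question of how wide unsigned int is on a given machine. The helper names are invented for this sketch.

```c
#include <stdint.h>

/* Map any integer into the ring of integers modulo 65536. */
uint16_t to_ring16(long n)
{
    return (uint16_t)n;
}

/* Map a member of the 16-bit ring into the 8-bit ring; this is unique
 * because 256 is a factor of 65536. */
uint8_t ring16_to_ring8(uint16_t v)
{
    return (uint8_t)v;
}
```

to_ring16(-1) gives 65535 and to_ring16(-2) gives 65534; adding 1 to the former and converting back to uint16_t gives 0, and ring16_to_ring8(65535) gives 255, the 8-bit counterpart of -1.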

Subtracting 0x8000 from an int

I am reverse engineering some old C, running under Win95 (yes, in production) appears to have been compiled with a Borland compiler (I don't have the tool chain).
There is a function which does (among other things) something like this:
static void unknown(int *value)
{
int v = *value;
v-=0x8000;
*value = v;
}
I can't quite work out what this does. I assume 'int' in this context is signed 32 bit. I think 0x8000 would be unsigned 32bit int, and outside the range of a signed 32 bit int. (edit - this is wrong, it is outside of a signed 16 bit int)
I am not sure if one of these would be cast first, and how the casting would handle overflows, and/or how the subtraction would handle the overflow.
I could try on a modern system, but I am also unsure if the results would be the same.
Edit for clarity:
1: 'v-=0x8000;' is straight from the original code, this is what makes little sense to me. v is defined as an int.
2: I have the code, this is not from asm.
3: The original code is very, very bad.
Edit: I have the answer! The answer below wasn't quite right, but it got me there (fix up and I'll mark it as the answer).
The data in v is coming from an ambiguous source, which actually seems to be sending unsigned 16 bit data, but it is being stored as a signed int. Later on in the program all values are converted to floats and normalised to an average 0 point, so the actual value doesn't matter, only the order. Because we are looking at an unsigned int as a signed one, values over 32767 are incorrectly placed below 0, so this hack leaves the value as signed, but swaps the negative and positive numbers around (not changing order). The end result is that all numbers have the same order (but different values) as if they were unsigned in the first place.
(...and this is not the worst code example in this program)
In Borland C 3.x, int and short were the same: 16 bits. long was 32-bits.
A hex literal has the first type in which the value can be represented: int, unsigned int, long int or unsigned long int.
In the case of Borland C, 0x8000 is a decimal value of 32768 and won't fit in an int, but will in an unsigned int. So unsigned int it is.
The statement v -= 0x8000 ; is identical to v = v - 0x8000 ;
On the right-hand side, the int value v is implicitly cast to unsigned int, per the rules, the arithmetic operation is performed, yielding an rval that is an unsigned int. That unsigned int is then, again per the rules, implicitly cast back to the type of the lval.
So, by my estimation, the net effect is to toggle the sign bit — something that could be more easily and clearly done via simple bit-twiddling: *value ^= 0x8000 ;.
There is possibly a clue on this page http://www.ousob.com/ng/borcpp/nga0e24.php - Guide to Borland C++ 2.x ( with Turbo C )
There is no such thing as a negative numeric constant. If
a minus sign precedes a numeric constant it is treated as
the unary minus operator, which, along with the constant,
constitutes a numeric expression. This is important with
-32768, which, while it can be represented as an int,
actually has type long int, since 32768 has type long. To
get the desired result, you could use (int) -32768,
0x8000, or 0177777.
This implies the use of two's complement for negative numbers. Interestingly, the two's complement of 0x8000 is 0x8000 itself (as the value +32768 does not fit in the range for signed 2 byte ints).
So what does this mean for your function? Bit wise, this has the effect of toggling the sign bit, here are some examples:
f(0) = f(0x0000) = 0x8000 = -32768
f(1) = f(0x0001) = 0x8001 = -32767
f(0x8000) = 0
f(0x7fff) = 0xffff
It seems like this could be represented as val ^= 0x8000, but perhaps the XOR operator was not implemented in Borland back then?
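On a modern compiler with 32-bit int, the 16-bit behaviour can be approximated by working on uint16_t bit patterns. This is a sketch for experimentation, not a reproduction of what the Borland compiler emits; sub_8000 and xor_8000 are hypothetical names.

```c
#include <stdint.h>

/* Subtract 0x8000 modulo 2^16, working on the raw 16-bit pattern. */
uint16_t sub_8000(uint16_t bits)
{
    return (uint16_t)(bits - 0x8000u);
}

/* Toggle the top bit of the 16-bit pattern. */
uint16_t xor_8000(uint16_t bits)
{
    return (uint16_t)(bits ^ 0x8000u);
}
```

Both functions give 0x8000 for 0, 0x8001 for 1, 0 for 0x8000 and 0xffff for 0x7fff, matching the table above.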

How to cast or convert an unsigned int to int in C?

My apologies if the question seems weird. I'm debugging my code and this seems to be the problem, but I'm not sure.
Thanks!
It depends on what you want the behaviour to be. An int cannot hold many of the values that an unsigned int can.
You can cast as usual:
int signedInt = (int) myUnsigned;
but this will cause problems if the unsigned value is past the max int can hold. This means half of the possible unsigned values will result in erroneous behaviour unless you specifically watch out for it.
You should probably reexamine how you store values in the first place if you're having to convert for no good reason.
EDIT: As mentioned by ProdigySim in the comments, the maximum value is platform dependent. But you can access it with INT_MAX and UINT_MAX.
For the usual 4-byte types:
4 bytes = (4*8) bits = 32 bits
If all 32 bits are used, as in unsigned, the maximum value will be 2^32 - 1, or 4,294,967,295.
A signed int effectively sacrifices one bit for the sign, so the maximum value will be 2^31 - 1, or 2,147,483,647. Note that this is half of the other value.
Unsigned int can be converted to signed (or vice-versa) by simple expression as shown below :
unsigned int z;
int y=5;
z= (unsigned int)y;
Though not targeted to the question, you would like to read following links :
signed to unsigned conversion in C - is it always safe?
performance of unsigned vs signed integers
Unsigned and signed values in C
What type-conversions are happening?
IMHO this question is an evergreen. As stated in various answers, the assignment of an unsigned value that is not in the range [0,INT_MAX] is implementation defined and might even raise a signal. If the unsigned value is considered to be a two's complement representation of a signed number, the probably most portable way is IMHO the way shown in the following code snippet:
#include <limits.h>
unsigned int u;
int i;
if (u <= (unsigned int)INT_MAX)
i = (int)u; /*(1)*/
else if (u >= (unsigned int)INT_MIN)
i = -(int)~u - 1; /*(2)*/
else
i = INT_MIN; /*(3)*/
Branch (1) is obvious and cannot invoke overflow or traps, since it
is value-preserving.
Branch (2) goes through some pains to avoid signed integer overflow
by taking the one's complement of the value by bit-wise NOT, casts it
to 'int' (which cannot overflow now), negates the value and subtracts
one, which can also not overflow here.
Branch (3) provides the poison we have to take on one's complement or
sign/magnitude targets, because the signed integer representation
range is smaller than the two's complement representation range.
This is likely to boil down to a simple move on a two's complement target; at least I've observed such with GCC and CLANG. Also branch (3) is unreachable on such a target -- if one wants to limit the execution to two's complement targets, the code could be condensed to
#include <limits.h>
unsigned int u;
int i;
if (u <= (unsigned int)INT_MAX)
i = (int)u; /*(1)*/
else
i = -(int)~u - 1; /*(2)*/
The recipe works with any signed/unsigned type pair, and the code is best put into a macro or inline function so the compiler/optimizer can sort it out. (In which case rewriting the recipe with a ternary operator is helpful. But it's less readable and therefore not a good way to explain the strategy.)
And yes, some of the casts to 'unsigned int' are redundant, but they might help the casual reader, and some compilers issue warnings on signed/unsigned compares, because the implicit cast causes some non-intuitive behavior by language design.
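Put into a function, as the answer suggests, the three-branch recipe could look like the sketch below. The name unsigned_to_int is made up for illustration; the branch numbering follows the snippet above.

```c
#include <limits.h>

/* Interpret u as the two's complement encoding of an int without
 * relying on the implementation-defined out-of-range conversion. */
int unsigned_to_int(unsigned int u)
{
    if (u <= (unsigned int)INT_MAX)
        return (int)u;               /* (1) value-preserving */
    else if (u >= (unsigned int)INT_MIN)
        return -(int)~u - 1;         /* (2) ~u fits in int here */
    else
        return INT_MIN;              /* (3) unreachable on two's complement */
}
```

On a two's complement target, unsigned_to_int(UINT_MAX) yields -1 and unsigned_to_int((unsigned int)INT_MAX + 1u) yields INT_MIN.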
If you have a variable unsigned int x;, you can convert it to an int using (int)x.
It's as simple as this:
unsigned int foo;
int bar = 10;
foo = (unsigned int)bar;
Or vice versa...
If an unsigned int and a (signed) int are used in the same expression, the signed int gets implicitly converted to unsigned. This is a rather dangerous feature of the C language, and one you therefore need to be aware of. It may or may not be the cause of your bug. If you want a more detailed answer, you'll have to post some code.
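A small sketch of the implicit conversion described above, with one possible way to compare the mathematical values instead. Both function names are hypothetical, and compilers may warn about the mixed comparison in the first one, which is the point of the example.

```c
/* Mixed comparison: a is implicitly converted to unsigned int,
 * so a negative a becomes a very large value. */
int mixed_less(int a, unsigned int b)
{
    return a < b;
}

/* Compares the mathematical values: any negative a is less than
 * every unsigned b. */
int safe_less(int a, unsigned int b)
{
    return a < 0 || (unsigned int)a < b;
}
```

mixed_less(-1, 1u) is 0, because -1 is converted to UINT_MAX before the comparison, while safe_less(-1, 1u) is 1.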
Some explanation from C++ Primer, 5th edition, page 35:
If we assign an out-of-range value to an object of unsigned type, the result is the remainder of the value modulo the number of values the target type can hold.
For example, an 8-bit unsigned char can hold values from 0 through 255, inclusive. If we assign a value outside the range, the compiler assigns the remainder of that value modulo 256.
unsigned char c = -1; // assuming 8-bit chars, c has value 255
If we assign an out-of-range value to an object of signed type, the result is undefined. The program might appear to work, it might crash, or it might produce garbage values.
Page 160:
If any operand is an unsigned type, the type to which the operands are converted depends on the relative sizes of the integral types on the machine.
...
When the signedness differs and the type of the unsigned operand is the same as or larger than that of the signed operand, the signed operand is converted to unsigned.
The remaining case is when the signed operand has a larger type than the unsigned operand. In this case, the result is machine dependent. If all values in the unsigned type fit in the large type, then the unsigned operand is converted to the signed type. If the values don't fit, then the signed operand is converted to the unsigned type.
For example, if the operands are long and unsigned int, and int and long have the same size, the long will be converted to unsigned int. If the long type has more bits, then the unsigned int will be converted to long.
I found reading this book very helpful.
