I was reading the question "Why is the sum of two large integers a negative number in the C language?". Then I tried the following code:
#include <stdio.h>
int main(void) {
    unsigned int x, y, s;
    x = 4294967295;
    y = 4294967295;
    s = x + y;
    printf("%u", s);
    return 0;
}
Output : 4294967294
1 ) How does it calculate sum (s)? I am a bit confused by the explanation given in the link.
When I increase the values of x and y beyond the range of unsigned int, the result always seems to stay within the range of unsigned int. In fact, it seems like the result keeps decreasing. The compiler does give the following warning:
sample.c:7:9: warning: large integer implicitly truncated to unsigned type [-Woverflow]
2 ) Can I brute force this program so that whenever value of x and y exceeds unsigned int range program throws an error.
The C standard has well-defined behavior for unsigned integer overflow. When this happens, the mathematical result is reduced modulo the maximum representable value + 1 (UINT_MAX + 1 for unsigned int). In layman's terms, this means that the values wrap around.
In the case of adding 4294967295 and 4294967295, this wraparound behavior results in 4294967294.
Throwing an error would be in violation of the standard.
1 ) How does it calculate sum (s)?
See @dbush's good answer.
2 ) Can I brute force this program so that whenever value of x and y exceeds unsigned int range program throws an error.
Code could detect the mathematical overflow easily with unsigned math. Math overflow occurs if sum is less than an operand of the addition. Testing against only one of x or y is sufficient.
unsigned int x,y,s;
x = 4294967295;
y = 4294967295;
s = x+y;
printf("%u\n",s);
if (s < x) puts("Math overflow");
// or
if (s < y) puts("Math overflow");
return 0;
For signed int tests, see Test if arithmetic operation will cause undefined behavior
In your case unsigned int is 32 bits, and 4294967295 + 4294967295 == 8589934590, which has a 33-bit binary value:
1 1111 1111 1111 1111 1111 1111 1111 1110
^
Carry bit
The carry bit is lost because the representation has only 32 bits, and the resulting value:
1111 1111 1111 1111 1111 1111 1111 1110 = 4294967294 decimal
You must either detect an overflow before it happens, or store the result in a larger type and test its value.
if( UINT_MAX - x < y )
{
puts("Math overflow") ;
}
else
{
s = x + y ;
printf( "%u\n", s ) ;
}
Or:
unsigned long long s = (unsigned long long)x + y ; /* widen before adding; x + y alone wraps first */
if( s > UINT_MAX )
{
puts("Math overflow") ;
}
else
{
printf( "%u\n", (unsigned int)s ) ;
}
According to kias.dyndns.org:
Most modern computers store memory in units of 8 bits, called a "byte"
(also called an "octet"). Arithmetic in such computers can be done in
bytes, but is more often done in larger units called "(short)
integers" (16 bits), "long integers" (32 bits) or "double integers"
(64 bits). Short integers can be used to store numbers between 0 and
2^16 - 1, or 65,535. Long integers can be used to store numbers between
0 and 2^32 - 1, or 4,294,967,295, and double integers can be used to
store numbers between 0 and 2^64 - 1, or 18,446,744,073,709,551,615.
(Check these!)
[...]
When a computer performs an unsigned integer arithmetic operation,
there are three possible problems which can occur:
if the result is too large to fit into the number of bits assigned to
it, an "overflow" is said to have occurred. For example if the result
of an operation using 16 bit integers is larger than 65,535, an
overflow results.
in the division of two integers, if the result is not itself an
integer, a "truncation" is said to have occurred: 10 divided by 3 is
truncated to 3, and the extra 1/3 is lost. This is not a problem, of
course, if the programmer's intention was to ignore the remainder!
any division by zero is an error, since division by zero is not
possible in the context of arithmetic.
[Original emphasis removed; emphasis added.]
I want to know if x - y overflows.
Below is my code.
#include <stdio.h>
/* Determine whether arguments can be subtracted without overflow */
int tsub_ok(int x, int y)
{
return (y <= 0 && x - y >= x) || (y >= 0 && x - y <= x);
}
int main()
{
printf("0x80000000 - 1 : %d\n", tsub_ok(0x80000000, 1));
}
Why can't I get the result I expect?
You can't check for overflow of signed integers by performing the offending operation and seeing if the result wraps around.
First, the value 0x80000000 passed to the function is outside the range of a 32-bit int, so it undergoes an implementation-defined conversion. On most systems that use two's complement, this results in the value with the same representation, which is -2147483648 and also happens to be the value of INT_MIN.
Then you attempt to execute x - y which results in signed integer overflow which triggers undefined behavior, giving you an unexpected result.
The proper way to handle this is to perform some algebra to ensure the overflow does not happen.
If x and y are both non-negative or both negative, then subtracting won't overflow.
If the signs differ and x is positive, one might naively try this:
INT_MAX >= x - y
But this could overflow. Instead change it to the mathematically equivalent:
INT_MAX + y >= x
Because y is negative, INT_MAX + y doesn't overflow.
A similar check can be done when x is negative with INT_MIN. The full check:
if (x>=0 && y>=0) {
return 1;
} else if (x<0 && y<0) {  /* strict: x == 0 with y == INT_MIN would overflow */
return 1;
} else if (x>=0 && INT_MAX + y >= x) {
return 1;
} else if (x<0 && INT_MIN + y <= x) {
return 1;
} else {
return 0;
}
Yes, x - y overflows.
We assume int and unsigned int are 32 bits in the C implementation you are using, as indicated in the title, and that two’s complement is used for int. Then the range of values for int is −2^31 to +2^31−1.
In tsub_ok(0x80000000, 1), the constant 0x80000000 has the value 2^31, and its type is unsigned int since it will not fit in an int. Then this value is passed to tsub_ok. Since the first parameter of tsub_ok has type int, the value is converted to int.
By C 2018 6.3.1.3 3, the conversion is implementation-defined. Many C implementations “wrap” the value modulo 2^32. Assuming your C implementation does this, the result of converting 2^31 to int is −2^31.
Then, inside the function, x - y is −2^31 − 1, and the result of that overflows the int type. The C standard does not define the behavior of the program when signed integer overflow occurs, and so any test that relies on comparing x - y when it may overflow is not supported by the C standard.
Here an int is 32 bits, which means it has a total of 2^32 possible values. In hex, the maximum is 0xFFFFFFFF when unsigned, but a signed int has a maximum of 0x7FFFFFFF. Thus, you cannot store 0x80000000 in an int here and have everything work.
In computer programming, signed and unsigned numbers are both represented only as sequences of bits. Bit 31 is the sign bit of a 32-bit signed int, so the highest value a 32-bit int can store is 0x7FFFFFFF; 0x80000000 does not fit in a signed int.
Remember, a signed int is an integer that can be both positive and negative, as opposed to an unsigned int, which can only hold a non-negative integer.
What you are trying to do is make a signed int variable hold an unsigned value that is out of its range, which causes the overflow.
For more information, check Signed number representations or refer to any beginner-level book on digital number systems and programming.
Suppose, I'm trying to subtract 2 unsigned integers:
247 = 1111 0111
135 = 1000 0111
If we subtract these 2 binary numbers we get = 0111 0000
Is this an underflow, since we only need 7 bits now? Or how does that work?
Underflow in unsigned subtraction c = a - b occurs whenever b is larger than a.
However, that's somewhat of a circular definition, because many kinds of machines perform the a < b comparison by subtracting the operands using wraparound arithmetic and then detecting the overflow from the two operands and the result.
Note also that in C we don't speak about "overflow", because there is no error condition: C unsigned integers provide that wraparound arithmetic that is commonly found in hardware.
So, given that we have the wraparound arithmetic, we can detect whether wraparound (or overflow, depending on point of view) has taken place in a subtraction.
What we need is the most significant bits from a, b and c. Let's call them A, B and C. From these, the overflow V is calculated like this:
A B C | V
------+--
0 0 0 | 0
0 0 1 | 1
0 1 0 | 1
0 1 1 | 1
1 0 0 | 0
1 0 1 | 0
1 1 0 | 0
1 1 1 | 1
This simplifies to
A'B + A'C + BC
In other words, overflow in the unsigned subtraction c = a - b happens whenever:
the msb of a is 0 and that of b is 1;
or the msb of a is 0 and that of c is 1;
or the msb of b is 1 and that of c is also 1.
Subtracting 247 - 135 = 112 is clearly not overflow, since 247 is larger than 135. Applying the rules above, A = 1, B = 1 and C = 0. The 1 1 0 row of the table has a 0 in the V column: no overflow.
Generally, “underflow” means the ideal mathematical result of a calculation is below what the type can represent. If 7 is subtracted from 5 in unsigned arithmetic, the ideal mathematical result would be −2, but an unsigned type cannot represent −2, so the operation underflows. Or, in an eight-bit signed type that can represent numbers from −128 to +127, subtracting 100 from −100 would ideally produce −200, but this cannot be represented in the type, so the operation underflows.
In C, unsigned arithmetic is said not to underflow or overflow because the C standard defines the operations to be performed using modulo arithmetic instead of real-number arithmetic. For example, with 32-bit unsigned arithmetic, subtracting 7 from 5 would produce 4,294,967,294 (in hexadecimal, FFFFFFFE₁₆), because it has wrapped modulo 2^32 = 4,294,967,296. People may nonetheless use the terms “underflow” or “overflow” when discussing these operations, intended to refer to the mathematical issues rather than the defined C behavior.
In other words, for whatever type you are using for arithmetic there is some lower limit L and some upper limit U that the type can represent. If the ideal mathematical result of an operation is less than L, the operation underflows. If the ideal mathematical result of an operation is greater than U, the operation overflows. “Underflow” and “overflow” mean the operation has gone out of the bounds of the type. “Overflow” may also be used to refer to any exceeding of the bounds of the type, including in the low direction.
It does not mean that fewer bits are needed to represent the result. When 10000111₂ is subtracted from 11110111₂, the result, 01110000₂ = 1110000₂, is within bounds, so there is no overflow or underflow. The fact that it needs fewer bits to represent is irrelevant.
(Note: For integer arithmetic, “underflow” or “overflow” is defined relative to the absolute bounds L and U. For floating-point arithmetic, these terms have somewhat different meanings. They may be defined relative to the magnitude of the result, neglecting the sign, and they are defined relative to the finite non-zero range of the format. A floating-point format may be able to represent 0, then various finite non-zero numbers, then infinity. Certain results between 0 and the smallest non-zero number the format can represent are said to underflow even though they are technically inside the range of representable numbers, which is from 0 to infinity in magnitude. Similarly, certain results above the greatest representable finite number are said to overflow even though they are inside the representable range, since they are less than infinity.)
Long story short, this is what happens when you have:
unsigned char n = 255; /* highest possible value for an unsigned char */
n = n + 1; /* now n is "overflowing" (although the terminology is not correct) to 0 */
printf("overflow: 255 + 1 = %u\n", n);
n = n - 1; /* n will now "underflow" from 0 to 255; */
printf("underflow: 0 - 1 = %u\n", n);
n *= 2; /* n will now be (255 * 2) % 256 = 254; */
/* when the result is too high, modulo with 2 to the power of 8 is used */
/* for an 8 bit variable such as unsigned char; */
printf("large overflow: 255 * 2 = %u\n", n);
n = n * (-2) + 100; /* n should now be -408 which is 104 in terms of unsigned char. */
/* (Logic is this: 408 % 256 = 152; 256 - 152 = 104) */
printf("large underflow: 255 * 2 = %u\n", n);
The result of that is (compiled with gcc 11.1, flags -Wall -Wextra -std=c99):
overflow: 255 + 1 = 0
underflow: 0 - 1 = 255
large overflow: 255 * 2 = 254
large underflow: 255 * 2 = 104
Now the scientific version: The comments above represent just a mathematical model of what is going on. To better understand what is actually happening, the following rules apply:
Integer promotion:
Integer types smaller than int are promoted when an operation is
performed on them. If all values of the original type can be
represented as an int, the value of the smaller type is converted to
an int; otherwise, it is converted to an unsigned int.
So what actually happens in memory when the computer does n = 255; n = n + 1; is this:
First, the right side is evaluated as an int (signed), because every unsigned char value fits in a signed int, so the rule of integer promotion converts n to int. So the right side of the expression becomes in binary: 0b00000000000000000000000011111111 + 0b00000000000000000000000000000001 = 0b00000000000000000000000100000000 (a 32 bit int).
Truncation
The 32-bit int loses the most significant 24 bits when being assigned
back to an 8-bit number.
So, when 0b00000000000000000000000100000000 is assigned to variable n, which is an unsigned char, the 32-bit value is truncated to an 8-bit value (only the right-most 8 bits are copied) => n becomes 0b00000000.
The same thing happens for each operation. The expression on the right side evaluates to a signed int, then it is truncated to 8 bits.
I recently came across this question and the answer given by @chux - Reinstate Monica.
Quoting lines from their answer, "This is implementation-defined behavior. The assigned value could have been 0 or 1 or 2... Typically, the value is wrapped around ("modded") by adding/subtracting 256 until in range. 100 + 100 -256 --> -56."
Code:
#include <stdio.h>
int main(void)
{
char a = 127;
a++;
printf("%d", a);
return 0;
}
Output: -128
With most C compilers, the char type takes 1 byte; strictly speaking, I'm assuming a 16-bit system where char takes 1 byte.
When a = 127, its binary representation inside the computer is 0111 1111, increasing it with 1 should yield the value
0111 1111 + 0000 0001 = 1000 0000
which is equal to -0(considering, signed-number representation, where left-most bit represents 0 = + and 1 = -) then why the output is equal to -128?
Is it because of the "INTEGER PROMOTION RULE"? I mean, for this expression a + 1, a gets converted to int (2 Bytes) before the + operation and then its binary representation in the memory becomes 1111 1111 1000 0000 which is equal to -128 and makes sense to the output -128. But then this assumption of mine conflicts with the quoted lines of Chux-Reinstate-Monica about wrapping the values.
1000 0000 which is equal to -0...
Ones' complement has a -0, but most computers use two's complement which does not.
In two's complement notation the left-most bit represents -(coefficient_bit * 2^(N-1)), i.e. in your case, 1000 0000, the left-most bit represents -(1 * 2^(8-1)), which is equal to -128, and that's why the output is the same.
Your char is an 8 bit signed integer in which case 1000 0000 is -128. We can test what 1000 0000 is conveniently using the GNU extension which allows binary constants.
char a = 0b10000000;
printf("%d\n", a); // -128
char, in this implementation, is a signed 8-bit integer. Adding 1 to 127 causes integer overflow to -128.
What about integer promotion? Integer promotion happens during the calculation, but the result is still a char. 128 can't fit in our signed 8-bit char, so it overflows to -128.
Integer promotion is demonstrated by this example.
char a = 30, b = 40;
char c = (a * b);
printf("%d\n", c); // -80
char d = (a * b) / 10;
printf("%d\n", d); // 120
char c = (a * b); is -80, but char d = (a * b) / 10; is 120. Why? Shouldn't it be -8? The answer here is integer promotion. The math is done as native integers, but the result must still be stuffed into an 8-bit char. (30 * 40) is 1200 which is 0100 1011 0000. Then it must be stuffed back into an 8 bit signed integer; that's 1011 0000 or -80.
For the other calculation, (30 * 40) / 10 == 1200 / 10 == 120 which fits just fine.
y is promoted to unsigned int and compared with x here. Does binary number comparison happen every time? Then when if(12 == -4) is evaluated, why can't it promote the LHS to unsigned and print "same" (considering 12 = 1100, -4 = 1100)? Please correct me if I am wrong.
#include<stdio.h>
int main()
{
unsigned int x = -1;
int y = ~0;
if(x == y)//1.what happens if( y == x) is given?.O/P is "same" then also.
printf("same");//output is "same"
else
printf("not same");
printf("%d",x);//2.output is -1.Won't x lose it's sign when unsigned is given?My hunch is x should become +1 here.
getchar();
return 0;
}
Please also provide the binary working for the above code and answers to 1. and 2. in the code comments. Thank you.
First, check the size of unsigned int on your system.
On my machine: printf("%zu\n", sizeof(unsigned int)); // 4 bytes
Since we have 4 bytes to store an unsigned int, we can say
unsigned int x;
x: Range [0, Max_number_with_4byte]
Max_number_with_4byte: (2^32) - 1 = 0xFFFFFFFF
Obviously x can hold only non-negative numbers because it is unsigned.
But you assign x = -1;. Think of the range as circular: stepping back one from 0, x lands on the last point, Max_number_with_4byte.
So printing x with %u or %x shows that value: 0xFFFFFFFF. Printing it with %d, as in your code, is a format mismatch and merely happens to show -1 on typical systems.
See the hex equivalent of x with printf("%x\n", (unsigned int)x);
and use printf("%x\n", (unsigned int)y); to see that x and y are equal.
Consider y = ~0;. y has 32 bits, and the ~ operator flips every bit of 0 to 1, so in hex form we see FFFFFFFF. (printf cannot print binary numbers, so we use the equivalent hex representation.)
An online two's complement calculator shows how -1 converts to 0xFFFFFFFF.
Answer to your Question
In x == y, y is converted to unsigned int by the usual arithmetic conversions. Its bits are already all 1 (~0 flipped every 0 to 1), so the converted value is 0xFFFFFFFF, the same as x. Writing y == x gives the same result.
Does binary number comparison happen every time?
Yes, in every condition. For example, in if(10 > 20) both 10 and 20 are first converted to their corresponding binary numbers and then compared.
if (12 == -4): see my explanation above. Both operands are int, so neither is converted to unsigned.
-4 is not equal to 1100 inside the computer (your variable).
-4 = 0xFFFFFFFC
An unsigned int assigned -1 is actually interpreted as the maximum unsigned int (4294967295); it is certainly not transformed into 1.
When you do something like 0x01AE1 - 0x01AEA = fffffff7. I only want the last 3 digits. So I used the modulus trick to remove the extra digits. The displacement gets filled with hex values.
int extra_crap = 0;
int extra_crap1 = 0;
int displacement = 0;
int val1 = 0;
int val2 = 0;
displacement = val1 - val2;
extra_crap = displacement % 0x100;
extra_crap1 = displacement % 256;
printf(" extra_crap is %x \n", extra_crap);
printf(" extra_crap1 is %x \n", extra_crap1);
Unfortunately this is having no effect at all. Is there another way to remove all but the last 3 digits?
'Unfortunately this is having no effect at all.'
That's probably because you do your calculations on signed int. Try casting the value to unsigned, or simply forget the remainder operator % and use bitwise masking:
displacement & 0xFF;
displacement & 255;
for two hex digits or
displacement & 0xFFF;
displacement & 4095;
for three digits.
EDIT – some explanation
A detailed answer would be quite long... You need to learn about data types used in C (esp. int and unsigned int, which are two of most used Integral types), the range of values that can be represented in those types and their internal representation in Two's complement code. Also about Integer overflow and Hexadecimal system.
Then you will easily get what happened to your data: subtracting 0x01AE1 - 0x01AEA, that is 6881 - 6890, gave the result of -9, which in 32-bit signed integer encoded with 2's complement and printed in hexadecimal is FFFFFFF7. That MINUS NINE divided by 256 gave a quotient ZERO and Remainder MINUS NINE, so the remainder operator % gave you a precise and correct result. What you call 'no effect at all' is just a result of your lack of understanding what you were actually doing.
My answer above (variant 1) is not any kind of magic, but just a way to enforce calculation on positive numbers. Casting values to an unsigned type makes the program interpret 0xFFFFFFF7 as 4294967287, which divided by 256 (0x100 in hex) results in quotient 16777215 (0xFFFFFF) and remainder 247 (0xF7). Variant 2 does no division at all and just 'masks' the necessary bits: numbers 255 and 4095 have their 8 and 12 low-order bits equal to 1 (in hexadecimal 0xFF and 0xFFF, respectively), so bitwise AND does exactly what you want: it removes the higher part of the value, leaving just the required two or three low-order hex digits.