I came across some buggy C code that was meant to check whether an addition overflows. It works fine with char, but gives an incorrect answer when the arguments are int, and I couldn't figure out why.
Here's the code with short arguments.
short add_ok( short x, short y ){
short sum = x+y;
return (sum-x==y) && (sum-y==x);
}
This version works fine; the problem arises when you change the arguments to int (you can check it with INT_MAX).
Can you see what's wrong here?
Because in 2s complement, the integers can be arranged into a circle (in the sense of modulo arithmetic). Adding y and then subtracting y always gets you back where you started (undefined behaviour notwithstanding).
In your code, the addition does not overflow unless int is the same size as short. Due to default promotions, x+y is performed on the values of x and y promoted to int, and then the result is truncated to short in an implementation-defined manner.
Why not do simply: return x+y<=SHRT_MAX && x+y>=SHRT_MIN;
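The range check suggested above can be sketched as a complete function. This is only an illustration: the name short_add_ok is made up here, and the code assumes (as on most mainstream platforms) that int is wider than short, so the promoted sum itself cannot overflow.

```c
#include <limits.h>

/* Returns 1 if x + y fits in a short, 0 otherwise.
   Assumption: int is wider than short, so the sum below, which is
   computed in int after integer promotion, cannot overflow. */
int short_add_ok(short x, short y) {
    int sum = x + y;  /* both operands are promoted to int first */
    return sum >= SHRT_MIN && sum <= SHRT_MAX;
}
```

Because the sum is never stored back into a short, no implementation-defined narrowing conversion is involved.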
In C, converting a signed integer to a smaller signed integer type, say char (for the sake of simplicity), that cannot represent the value gives an implementation-defined result. Even though many systems and programmers assume wrap-around overflow, the standard does not guarantee it. So what is wrap-around overflow?
Wrap-around overflow in two's complement systems means that when a value can no longer be represented in the current type, it wraps around to the highest or lowest number that can be represented. So what does this mean? Take a look.
In a signed 8-bit char, the highest value that can be represented is 127 and the lowest is -128. So when we do "char i = 128", the value stored in i becomes -128. Because the value was too large for the signed integral type, it wrapped around to the lowest value, and if it was "char i = 129", then i will contain -127. Can you see it? Whenever one end is exceeded, the value wraps around to the other end (sign). Vice versa, if "char i = -129", then i will contain 127, and if it is "char i = -130", it will contain 126, because it went past the lowest value and wrapped around to the highest.
(highest) 127, 126, 125, ... , -126, -127, -128 (lowest)
If the value is very large, it keeps wrapping around until it reaches a value that can be represented in its range.
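The wrap-around described above can be simulated without relying on the implementation-defined narrowing conversion. In this sketch the helper name wrap_to_8bit is hypothetical; it goes through unsigned arithmetic, whose modular behaviour the C standard does define.

```c
/* Simulates storing v into an 8-bit two's complement type with
   wrap-around. The conversion to unsigned is defined by the standard
   (reduction modulo UINT_MAX + 1), so nothing here depends on how a
   particular compiler narrows an out-of-range value. */
int wrap_to_8bit(int v) {
    unsigned low = (unsigned)v & 0xFFu;  /* keep the low 8 bits */
    return low >= 128u ? (int)low - 256 : (int)low;
}
```

For example, wrap_to_8bit(128) gives -128 and wrap_to_8bit(-129) gives 127, matching the circle of values shown above.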
UPDATE: the reason int doesn't work, as opposed to char and short, is that adding the two numbers may overflow (regardless of whether they are int, short, or char, and not forgetting integral promotion). Because short and char are smaller than int and are promoted to int in expressions, their values are represented again without truncation in this line:
return (sum-x==y) && (sum-y==x);
So any overflow is detected, as explained in detail below. An int, however, is not promoted to anything, so the addition itself overflows. For instance, if INT_MAX+1 wraps to INT_MIN (typical on two's complement machines, though formally undefined behaviour), then testing for overflow with INT_MIN-1 == INT_MAX gives TRUE! This is because short and char get promoted to int, evaluated, and only then truncated (overflowed). An int, however, overflows first and is then evaluated, because it is not promoted to a larger type.
Think of the char type without promotion, and try to produce overflows and check them using the illustration above. You will find that adding or subtracting the value that caused the overflow returns you to where you were. However, this is not what happens in C, because char and short are promoted to int, so the overflow is detected. That is not true for int, because it is not promoted to a larger type.
END OF UPDATE
For your question, I checked your code in MinGW and Ubuntu 12.04, and it seems to work fine. I found later that the code actually works on systems where short is smaller than int, and when the values don't exceed the int range. This line:
return (sum-x==y) && (sum-y==x);
gives the right answer because "sum-x" and "y" are evaluated as int, so no wrap-around happens there, whereas it did happen in the previous line (at the assignment):
short sum = x+y;
Here is a test. If I entered 32767 for the first and 2 for the second, then when:
short sum = x+y;
sum will contain -32767, because of the wrap-around. However, when:
return (sum-x==y) && (sum-y==x);
"sum-x" (-32767 - 32767) would only be equal to y (2), making the check buggy, if wrap-around occurred. Because of integral promotion it never happens that way: "sum-x" becomes -65534, which is not equal to y, and that leads to a correct detection.
Here is the code I used:
#include <stdio.h>
short add_ok( short x, short y ){
short sum = x+y;
return (sum-x==y) && (sum-y==x);
}
int main(void) {
short i, ii;
scanf("%hd %hd", &i, &ii);
getchar();
printf("%hd", add_ok(i, ii));
return 0;
}
You need to provide the architecture you are working on, and what are the experimental values you tested, because not everyone faces what you say, and because of the implementation-defined nature of your question.
Reference: C99 6.3.1.3, and the GNU C Manual.
The compiler probably just replaces all calls to this expression with 1 because it's true in every case. The optimizing routine will perform copy propagation on sum and get
return (y==y) && (x==x);
and then:
return 1;
It's true in every case because signed integer overflow is undefined behavior; hence, the compiler is free to assume that x+y-y == x and y+x-x == y.
If this were an unsigned operation it would fail similarly: since overflow is just performed as a modulo operation, it is fairly easy to prove that
((x + y) mod (USHRT_MAX + 1) - y) mod (USHRT_MAX + 1) == x
and similarly for the reverse case.
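To illustrate the unsigned case mentioned above, here is a small sketch with two hypothetical helpers (roundtrip_holds and uadd_wraps are names invented for this example). The first applies the round-trip test from the question to unsigned operands; the second shows one common way to detect unsigned wrap-around instead.

```c
#include <limits.h>  /* UINT_MAX, handy when calling these helpers */

/* The round-trip test from the question, applied to unsigned operands.
   Unsigned arithmetic is modular, so this returns 1 for every input,
   whether or not the addition wrapped. */
int roundtrip_holds(unsigned x, unsigned y) {
    unsigned sum = x + y;
    return (sum - x == y) && (sum - y == x);
}

/* A check that does detect wrap-around: a wrapped unsigned sum is
   always smaller than either operand. */
int uadd_wraps(unsigned x, unsigned y) {
    return x + y < x;
}
```

With x = UINT_MAX and y = 1 the sum wraps to 0, yet roundtrip_holds still returns 1, while uadd_wraps reports the wrap.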
I always wonder why C manages the memory the way it does.
Take a look at the following codes:
int main(){
int x = 10000000000;
printf("%d", x);
}
Of course, overflow occurs and it returns the following number:
1410065408
Or:
int main(){
int x = -10;
printf("%u", x);
}
Here x is signed and I am using the unsigned conversion specifier "%u".
Returns:
4294967286
Or take a look at this one:
int main(){
char string_ = 'abc';
printf("%d", string_);
}
This returns:
99
That being said, I mainly have two questions:
Why does the program return these specific numbers for specific inputs? I don't think it is a simple malfunction, because it produces the same result for the same input. So there is a deterministic way for it to calculate these numbers. What is going on under the hood when I pass these obviously invalid numbers?
Most of these problems occur because C is not a memory-safe language. Wikipedia says:
In general, memory safety can be safely assured using tracing garbage
collection and the insertion of runtime checks on every memory access
Then besides historical reasons, why are some languages not memory-safe? Is there any advantage of not being memory-safe rather than being memory-safe?
Of course, overflow occurs and it returns the following number:
There is no overflow in int x = 10000000000;. Overflow in the C standard is when the result of an operation is not representable in the type. However, in int x = 10000000000;, 10,000,000,000 is converted to type int, and this conversion is defined to produce an implementation-defined result (that is implicitly representable in int) or an implementation-defined signal (C 2018 6.3.1.3 3). So there is no result that is not representable in int.
You did not say which C implementation you are using (particularly the compiler), so we cannot be sure what the implementation defines for this conversion. For a 32-bit int, it is common that an implementation wraps the number modulo 2^32. The remainder of 10,000,000,000 when divided by 2^32 is 1,410,065,408, and that matches the result you observed.
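The remainder quoted above can be checked with a few lines of C. The helper name low_32_bits is made up for this sketch; it works on unsigned long long, which is at least 64 bits wide, so no implementation-defined conversion is involved.

```c
/* Reduces v modulo 2^32 by keeping only its low 32 bits. */
unsigned long long low_32_bits(unsigned long long v) {
    return v & 0xFFFFFFFFULL;
}
```

Calling low_32_bits(10000000000ULL) yields 1410065408.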
4294967286
In this case, you passed an int where printf expected an unsigned int. The C standard does not define the behavior, but a common result is that the bits of an int are reinterpreted as an unsigned int. When two’s complement is used for a 32-bit int value of −10, the bits are FFFFFFF6 in hexadecimal. When the bits of an unsigned int have that value, they represent 4,294,967,286, and that matches the result you observed.
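If the intent is to see the unsigned counterpart of a negative int, an explicit conversion avoids the mismatched printf argument altogether. A minimal sketch, with to_unsigned as a hypothetical helper name:

```c
#include <limits.h>  /* UINT_MAX, useful for checking the result */

/* Converting a negative int to unsigned is well defined: the result
   is the value plus UINT_MAX + 1. */
unsigned to_unsigned(int x) {
    return (unsigned)x;
}
```

Calling printf("%u\n", to_unsigned(-10)); prints 4294967286 when unsigned int is 32 bits wide, and the argument type now matches the %u specifier.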
char string_ = 'abc';
'abc' is a character constant with more than one character. Its value is implementation defined (C 2018 6.4.4.4 10). Again, since you did not tell us which implementation you are using, we cannot be sure what the definition is.
One behavior of such constants is that 'abc' will have the value ('a'*256 + 'b')*256 + 'c'. When ASCII is used, this is (97*256 + 98)*256 + 99 = 6,382,179. Then char string_ = 'abc'; converts this value to char. If char is unsigned and is eight bits, the C standard defines this to wrap modulo 256, that is 2^8 (C 2018 6.3.1.3 2). If it is signed, it is implementation-defined, and a common behavior is to wrap modulo 256. With either of those two methods, the result is 99, as the remainder of 6,382,179 when divided by 256 is 99, and this matches the result you observed.
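The arithmetic above can be reproduced directly. In this sketch, pack3 and low_byte are hypothetical helpers, and the packing order is the common behaviour described above, not something the standard guarantees.

```c
/* Packs three character codes the way many compilers evaluate a
   multi-character constant such as 'abc'. The packing order is an
   assumption; the standard leaves the value implementation-defined. */
long pack3(int c1, int c2, int c3) {
    return ((long)c1 * 256 + c2) * 256 + c3;
}

/* Remainder modulo 256, i.e. the low 8 bits of a non-negative value. */
int low_byte(long v) {
    return (int)(v % 256);
}
```

With the ASCII codes 97, 98 and 99, pack3 gives 6382179 and low_byte of that value is 99.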
Most of these problems occur because C is not a memory-safe language.
None of the above has anything to do with memory safety. None of the constants or the conversions access memory, so they are not affected by memory safety.
I am trying to add 1 to a character that holds the maximum positive value it can hold. It gives 0 as output instead of -256.
#include <stdio.h>
int main() {
signed char c = 255;
printf("%d\n", c + 1 );
}
O / P : 0
c + 2 = 1;
c + 3 = 2;
As per my understanding, it should give negative numbers once it reaches the maximum limit (). Is this correct? I am testing on Ubuntu.
A signed char is very often an 8-bit type encoding values in [-128...127].
signed char c = 255; is attempting to initialize c to a value outside the signed char range.
What happens next is implementation-defined behavior. Very commonly 255 is converted "mod" 256 to the value of -1.
signed char c = 255;
printf("%d\n", c ); // -1 expected
printf("%d\n", c + 1 ); // 0 expected
As per my understanding, it should give negative numbers once it reaches the maximum limit (). Is this correct?
No. Adding 1 to the maximum int value is undefined behavior. There is no should. It might result in a negative number, might not, might exit code - it is not defined.
Had code been
signed char c = 127;
printf("%d\n", c + 1 );
c + 1 would be 128 and "128\n" would be printed as c + 1 is an int operation with an in range int sum.
There are several implicit conversions to keep track of here:
signed char c = 255; is a conversion of the constant 255, which has type int, into the smaller type signed char. This is a conversion through assignment (initialization follows the rules of assignment), where the right operand gets converted to the type of the left.
The actual conversion from a large signed type to a small signed type follows this rule:
Otherwise, the new type is signed and the value cannot be represented in it; either the
result is implementation-defined or an implementation-defined signal is raised.
In practice, the very likely conversion to happen on a two's complement computer is that you end up with the signed char having the decimal value equivalent to 0xFF, which is -1.
c + 1 is an operation with two operands of types signed char and int respectively. For the + operator, it means that the usual arithmetic conversions are performed, see Implicit type promotion rules.
Meaning c gets converted to int and the operation is carried out on int type, which is also the type of the result.
printf("%d\n", stuff ); The functions like printf accepting a variable number of arguments undergo an oddball conversion rule called the default argument promotions. In case of integers, it means that the integer promotions (see link above) are carried out. If you pass c + 1 as parameter, then the type is int and no promotion takes place. But if you had just passed c, then it gets implicitly promoted to int as per these rules. Which is why using %d together with character type actually works, even though it's the wrong conversion specifier for printing characters.
As per my understanding, it should give negative numbers once it reaches the maximum limit (). Is this correct?
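If you simply do signed char c = 127; c++; then c is promoted to int, the sum 128 is computed without overflow, and converting 128 back to signed char gives an implementation-defined result (or raises an implementation-defined signal), as in the rule quoted above. A genuine signed overflow, which is undefined behavior with no predictable outcome, needs a type that is not promoted, for example int c = INT_MAX; c++;.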
If you do signed char c = 127; ... c + 1 then there's no overflow because of the implicit promotion to int.
If you do unsigned char c = 255; c++; then there is a well-defined wrap around since this is an unsigned type. c will become zero. Signed types do not have such a well-defined wrap around - they overflow instead.
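The promoted addition and the unsigned wrap-around described above can be sketched as follows. The function names are invented for this example, and the first function assumes int is wider than signed char, which holds on mainstream platforms.

```c
#include <limits.h>  /* SCHAR_MAX and UCHAR_MAX, for trying the limits */

/* c is promoted to int before the addition, so the result is an int.
   Assumption: int is wider than signed char, so this cannot overflow. */
int plus_one(signed char c) {
    return c + 1;
}

/* Incrementing an unsigned char wraps modulo UCHAR_MAX + 1. */
unsigned char next_uchar(unsigned char c) {
    c++;
    return c;
}
```

Here plus_one(SCHAR_MAX) is SCHAR_MAX + 1 (128 for an 8-bit char), and next_uchar(UCHAR_MAX) is 0.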
In practice, signed number overflow is artificial nonsense invented by the C standard. All well-known computers just set an overflow and/or carry bit when you do overflow on assembler level, properly documented and well-defined by the core manual. The reason it turns "undefined behavior" in C is mainly because C allows for nonsensical signedness formats like one's complement or signed magnitude, that may have padding bits, trap representations or other such exotic, mostly fictional stuff.
Though nowadays, optimizing compilers take advantage of overflow not being allowed to happen, in order to generate more efficient code. Which is unfortunate, since we could have had both fast and 100% deterministic code if 2's complement was the only allowed format.
If I declare two max integers in C:
int a = INT_MAX;
int b = INT_MAX;
and sum them into the another int:
int c = a+b;
I know there is a buffer overflow but I am not sure how to handle it.
This causes undefined behavior since you are using signed integers (which cause undefined behavior if they overflow).
You will need to find a way to avoid the overflow, or if possible, switch to unsigned integers (which use wrapping overflow).
One possible solution is to switch to a wider integer type, such as long long, so that no overflow occurs (long is not always wider than int).
Another possibility is checking for the overflow first:
if (b > 0 && a > INT_MAX - b) {
    // a + b would exceed INT_MAX: will overflow, do something else
}
Note: I'm assuming here that you don't actually know the exact values of a and b.
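A fuller version of the check-first idea, covering overflow in both directions, might look like the sketch below. The function name checked_add and its out-parameter interface are assumptions made for illustration, not part of the answer above.

```c
#include <limits.h>

/* Stores a + b in *result and returns 1 if the sum fits in an int;
   returns 0 and leaves *result untouched otherwise. The tests are done
   before adding, so no overflowing addition is ever evaluated. */
int checked_add(int a, int b, int *result) {
    if (b > 0 && a > INT_MAX - b) return 0;  /* would exceed INT_MAX */
    if (b < 0 && a < INT_MIN - b) return 0;  /* would go below INT_MIN */
    *result = a + b;
    return 1;
}
```

With a and b both equal to INT_MAX, checked_add returns 0 and the caller can fall back to a wider type or report an error.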
For the calculation to be meaningful, you would have to use a type large enough to hold the result. Apart from that, overflow is only a problem for signed int. If you use unsigned types, then you don't get undefined overflow, but well-defined wrap-around.
In this specific case the solution is trivial:
unsigned int c = (unsigned int)a + (unsigned int)b; // 4.29 bil
Otherwise, if you truly wish to know the signed equivalent of the raw binary value, you can do:
int c = (unsigned int)a + (unsigned int)b;
As long as the calculation is carried out on unsigned types there's no danger (and the value will fit in this case - it won't wrap-around). The result of the addition is implicitly converted through assignment to the signed type of the left operand of =. This conversion is implementation-defined, as in the result depends on signed format used. On 2's complement mainstream computers you will very likely get the value -2.
I am learning the characteristics of the different data types. For example, this program prints increasing powers of 2 in four different formats: integer, unsigned integer, hexadecimal, and octal.
#include<stdio.h>
int main(int argc, char *argv[]){
int i, val = 1;
for (i = 1; i < 35; ++i) {
printf("%15d%15u%15x%15o\n", val, val, val, val);
val *= 2;
}
return 0;
}
It works. unsigned goes up to 2147483648. integer goes up to -2147483648. But why does it become negative?
I have a theory: is it because the maximum signed integer we can represent on a 32 bit machine is 2147483647? If so, why does it return the negative number?
First of all, you should understand that this program is undefined. It causes signed integer overflow, and this is declared undefined in the C Standard.
The technical reason is that no behavior can be predicted as different representations are allowed for negative numbers and there could even be padding bits in the representation.
The most probable reason you see a negative number in your case is that your machine uses 2's complement (look it up) to represent negative numbers, while arithmetic operates on bits without overflow checks. Therefore, the highest bit is the sign bit, and if your value overflows into this bit, it turns negative.
What you describe is UB caused by integer overflow. Since the behavior is undefined, anything could happen (“When the compiler encounters [a given undefined construct] it is legal for it to make demons fly out of your nose”), BUT, what actually happens on some machines (I suspect yours included) is this:
You start with int val = 1;. That is represented as 0b00...1 in binary form. Each time you do val *= 2; the value is multiplied by 2, so the representation changes to 0b00...10, then to 0b00...100, and so on (the 1 bit moves one step each time). The last time you do val *= 2; you get 0b100.... Now, using 2's complement (which is what I guess your machine uses, as it is very common), the value is actually -1 * 0b1000..., which is -2147483648.
Note that even though this might be what's really going on in your machine, it's not to be trusted or thought of as the "right" thing to happen since, as mentioned before, this is UB.
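One way to keep a doubling loop like this within defined behaviour is to test before multiplying. The following is only a sketch, with largest_power_of_two as a made-up name:

```c
#include <limits.h>

/* Doubles val only while the next doubling is known to fit in an int,
   so no signed overflow is ever evaluated. Returns the largest power
   of two representable in int. */
int largest_power_of_two(void) {
    int val = 1;
    while (val <= INT_MAX / 2) {
        val *= 2;
    }
    return val;
}
```

With a 32-bit int this returns 1073741824; the next doubling would be the one that overflows.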
In this program, val will overflow on a machine with a 32-bit int, because an int is 4 bytes there. In math we have both positive and negative values, so to do calculations involving negative results we use signed representations, i.e. int or signed char in C.
Let's take the example of char: signed char has the range -128 to 127, unsigned char the range 0 to 255.
This shows that the range is divided into two halves for the signed representation. So, on typical two's complement machines, if a signed variable goes past the top of its positive range, it ends up in the negative values (formally, signed overflow is undefined). In the case of a signed char, a value above 127 becomes negative when stored. And if you add 300 to an unsigned char variable, it rolls over and starts again from zero.
char a=2;
a+=300;
What is the value? An 8-bit char has 256 distinct values (for unsigned char, 0 to 255), so the result is 2 + 300 - 256 = 46.
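The roll-over can be reproduced with unsigned char, where the standard defines the result. The helper name add_and_wrap is hypothetical, and the figure 46 assumes an 8-bit char.

```c
#include <limits.h>  /* UCHAR_MAX, for trying the upper limit */

/* Adds n to a and converts the result back to unsigned char.
   The sum is computed in int; the conversion back to unsigned char
   reduces it modulo UCHAR_MAX + 1 (256 for an 8-bit char). */
unsigned char add_and_wrap(unsigned char a, int n) {
    return (unsigned char)(a + n);
}
```

On a platform with 8-bit char, add_and_wrap(2, 300) is 46.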
Hope this helps
The following C code displays the result correctly, -1.
#include <stdio.h>
main()
{
unsigned x = 1;
unsigned y=x-2;
printf("%d", y );
}
But in general, is it always safe to do subtraction involving
unsigned integers?
The reason I ask the question is that I want to do some conditioning
as follows:
unsigned x = 1; // x was defined by someone else as unsigned,
// which I had better not to change.
for (int i=-5; i<5; i++){
if (x+i<0) continue;
f(x+i); // f is a function
}
Is it safe to do so?
How are unsigned integers and signed integers different in
representing integers? Thanks!
1: Yes, it is safe to subtract unsigned integers. The definition of arithmetic on unsigned integers includes that if an out-of-range value would be generated, then that value should be adjusted modulo the maximum value for the type, plus one. (This definition is equivalent to truncating high bits).
Your posted code has a bug though: printf("%d", y); causes undefined behaviour because %d expects an int, but you supplied unsigned int. Use %u to correct this.
2: When you write x+i, the i is converted to unsigned. The result of the whole expression is a well-defined unsigned value. Since an unsigned can never be negative, your test will always fail.
You also need to be careful using relational operators because the same implicit conversion will occur. Before I give you a fix for the code in section 2, what do you want to pass to f when x is UINT_MAX or close to it? What is the prototype of f ?
3: Unsigned integers use a "pure binary" representation.
Signed integers have three options. Two can be considered obsolete; the most common one is two's complement. All options require that a positive signed integer value has the same representation as the equivalent unsigned integer value. In two's complement, a negative signed integer is represented the same as the unsigned integer generated by adding UINT_MAX+1, etc.
If you want to inspect the representation, then do unsigned char *p = (unsigned char *)&x; printf("%02X%02X%02X%02X", p[0], p[1], p[2], p[3]);, depending on how many bytes are needed on your system.
It's always safe to subtract unsigned as in
unsigned x = 1;
unsigned y=x-2;
y will take on the value of -1 mod (UINT_MAX + 1) or UINT_MAX.
It is always safe to do subtraction, addition, or multiplication involving unsigned integers: there is no UB. The answer will always be the expected mathematical result reduced modulo UINT_MAX+1.
But do not do printf("%d", y ); - that is UB. Instead printf("%u", y);
C11 §6.2.5 9 "A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type."
When unsigned and int are used in +, the int is converted to an unsigned. So x+i has an unsigned result and never is that sum < 0. Safe, but now if (x+i<0) continue is pointless. f(x+i); is safe, but need to see f() prototype to best explain what may happen.
Unsigned integers are always 0 to power(2,N)-1 and have well defined "overflow" results. Signed integers are 2's complement, 1's complement, or sign-magnitude and have UB on overflow. Some compilers take advantage of that and assume it never occurs when making optimized code.
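The conversion of i to unsigned in x+i, described above, and one possible way around it can be sketched like this. Both function names are made up for the example, and signed_sum assumes long long is wider than unsigned int, as on mainstream platforms.

```c
#include <limits.h>  /* UINT_MAX, for checking the wrapped value */

/* i is converted to unsigned before the addition, so the result is an
   unsigned value and is never negative. */
unsigned unsigned_sum(unsigned x, int i) {
    return x + i;
}

/* Widening x to a larger signed type first keeps the mathematical
   value. Assumption: long long is wider than unsigned int. */
long long signed_sum(unsigned x, int i) {
    return (long long)x + i;
}
```

For x = 1 and i = -5, unsigned_sum gives UINT_MAX - 3 while signed_sum gives -4, so a test such as signed_sum(x, i) < 0 behaves the way the loop in the question intended.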
Rather than really answering your questions directly, which has already been done, I'll make some broader observations that really go to the heart of your questions.
The first is that using unsigned in loop bounds where there's any chance that a signed value might crop up will eventually bite you. I've done it a bunch of times over 20 years and it has ultimately bit me every time. I'm now generally opposed to using unsigned for values that will be used for arithmetic (as opposed to being used as bitmasks and such) without an excellent justification. I have seen it cause too many problems when used, usually with the simple and appealing rationale that “in theory, this value is non-negative and I should use the most restrictive type possible”.
I understand that x, in your example, was decided to be unsigned by someone else, and you can't change it, but you want to do something involving x over an interval potentially involving negative numbers.
The “right” way to do this, in my opinion, is first to assess the range of values that x may take. Suppose that the length of an int is 32 bits. Then the length of an unsigned int is the same. If it is guaranteed to be the case that x can never be larger than 2^31-1 (as it often is), then it is safe in principle to cast x to a signed equivalent and use that, i.e. do this:
int y = (int)x;
// Do your stuff with *y*
x = (unsigned)y;
If you have a long that is longer than unsigned, then even if x uses the full unsigned range, you can do this:
long y = (long)x;
// Do your stuff with *y*
x = (unsigned)y;
Now, the problem with either of these approaches is that before assigning back to x (e.g. x=(unsigned)y; in the immediately preceding example), you really must check that y is non-negative. However, these are exactly the cases where working with the unsigned x would have bitten you anyway, so there's no harm at all in something like:
long y = (long)x;
// Do your stuff with *y*
assert( y >= 0L );
x = (unsigned)y;
At least this way, you'll catch the problems and find a solution, rather than having a strange bug that takes hours to find because a loop bound is four billion unexpectedly.
No, it's not safe.
Integers usually are 4 bytes long, which equals to 32 bits. Their difference in representation is:
As far as signed integers are concerned, the most significant bit is used for the sign, so they can represent values between -2^31 and 2^31 - 1.
Unsigned integers don't use any bit for sign, so they represent values from 0 to 2^32 - 1.
Part 2 isn't safe either for the same reason as Part 1. As int and unsigned types represent integers in a different way, in this case where negative values are used in the calculations, you can't know what the result of x + i will be.
No, it's not safe. Trying to represent negative numbers with unsigned ints smells like bug. Also, you should use %u to print unsigned ints.
If we slightly modify your code to put %u in printf:
#include <stdio.h>
main()
{
unsigned x = 1;
unsigned y=x-2;
printf("%u", y );
}
The number printed is 4294967295
The reason the result is correct is because C doesn't do any overflow checks and you are printing it as a signed int (%d). This, however, does not mean it is safe practice. If you print it as it really is (%u) you won't get the correct answer.
An Unsigned integer type should be thought of not as representing a number, but as a member of something called an "abstract algebraic ring", specifically the equivalence class of integers congruent modulo (MAX_VALUE+1). For purposes of examples, I'll assume "unsigned int" is 16 bits for numerical brevity; the principles would be the same with 32 bits, but all the numbers would be bigger.
Without getting too deep into the abstract-algebraic nitty-gritty, when assigning a number to an unsigned type [abstract algebraic ring], zero maps to the ring's additive identity (so adding zero to a value yields that value), one means the ring's multiplicative identity (so multiplying a value by one yields that value). Adding a positive integer N to a value is equivalent to adding the multiplicative identity, N times; adding a negative integer -N, or subtracting a positive integer N, will yield the value which, when added to +N, would yield the original value.
Thus, assigning -1 to a 16-bit unsigned integer yields 65535, precisely because adding 1 to 65535 will yield 0. Likewise -2 yields 65534, etc.
Note that in an abstract algebraic sense, every integer can be uniquely assigned into algebraic rings of the indicated form, and a ring member can be uniquely assigned into a smaller ring whose modulus is a factor of its own [e.g. a 16-bit unsigned integer maps uniquely to one 8-bit unsigned integer], but ring members are not uniquely convertible to larger rings or to integers. Unfortunately, C sometimes pretends that ring members are integers, and implicitly converts them; that can lead to some surprising behavior.
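The mappings described above can be tried out with the fixed-width types from <stdint.h>. This is a sketch only; to_ring16 and ring16_to_ring8 are names invented here, and uint16_t and uint8_t are optional types that mainstream platforms provide.

```c
#include <stdint.h>

/* Maps an integer into the ring of integers modulo 65536 by
   converting it to a 16-bit unsigned type. */
uint16_t to_ring16(long v) {
    return (uint16_t)v;
}

/* Maps a member of the mod-65536 ring into the smaller mod-256 ring.
   256 divides 65536, so each 16-bit value maps to exactly one 8-bit
   value. */
uint8_t ring16_to_ring8(uint16_t v) {
    return (uint8_t)v;
}
```

Here to_ring16(-1) is 65535 and to_ring16(-2) is 65534, while ring16_to_ring8(0x1234) is 0x34.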
Subtracting a value, signed or unsigned, from an unsigned value which is no smaller than int, and no smaller than the value being subtracted, will yield a result according to the rules of algebraic rings, rather than the rules of integer arithmetic. Testing whether the result of such computation is less than zero will be meaningless, because ring values are never less than zero. If you want to operate on unsigned values as though they are numbers, you must first convert them to a type which can represent numbers (i.e. a signed integer type). If the unsigned type can be outside the range that is representable with the same-sized signed type, it will need to be upcast to a larger type.