int:
The 32-bit int data type can hold integer values in the range of
−2,147,483,648 to 2,147,483,647. You may also refer to this data type
as signed int or signed.
unsigned int :
The 32-bit unsigned int data
type can hold integer values in the range of 0 to 4,294,967,295. You
may also refer to this data type simply as unsigned.
Ok, but, in practice:
int x = 0xFFFFFFFF;
unsigned int y = 0xFFFFFFFF;
printf("%d, %d, %u, %u", x, y, x, y);
// -1, -1, 4294967295, 4294967295
no difference, O.o. I'm a bit confused.
Hehe. You have an implicit cast here, because you're telling printf what type to expect.
Try this on for size instead:
unsigned int x = 0xFFFFFFFF;
int y = 0xFFFFFFFF;
if (x < 0)
printf("one\n");
else
printf("two\n");
if (y < 0)
printf("three\n");
else
printf("four\n");
Yes, because in your case they use the same representation.
The bit pattern 0xFFFFFFFF happens to look like -1 when interpreted as a 32b signed integer and as 4294967295 when interpreted as a 32b unsigned integer.
It's the same as char c = 65. If you interpret it as a signed integer, it's 65. If you interpret it as a character it's A.
As R and pmg point out, technically it's undefined behavior to pass arguments that don't match the format specifiers. So the program could do anything (from printing random values to crashing, to printing the "right" thing, etc).
The standard points it out in 7.19.6.1-9
If a conversion specification is invalid, the behavior is undefined. If
any argument is not the correct type for the corresponding conversion
specification, the behavior is undefined.
There is no difference between the two in how they are stored in memory and registers: there are no signed and unsigned versions of the integer registers, and no sign information is stored alongside the int. The difference only becomes relevant when you perform arithmetic. The CPU has signed and unsigned versions of some operations, and the signedness of the type tells the compiler which version to use.
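A minimal sketch of that point, assuming 32-bit fixed-width types from <stdint.h>; the function names are hypothetical. The same bit pattern goes in, and only the declared type decides which division the compiler emits:

```c
#include <stdint.h>

/* Signed type: the compiler selects a signed divide for these bits. */
int32_t halve_signed(int32_t v)
{
    return v / 2;
}

/* Unsigned type: the same bits go through an unsigned divide (or shift). */
uint32_t halve_unsigned(uint32_t v)
{
    return v / 2u;
}
```

Calling halve_signed(-8) and halve_unsigned((uint32_t)-8) feeds the pattern 0xFFFFFFF8 to both, yet the results differ because the operation differs.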
The problem is that you invoked Undefined Behaviour.
When you invoke UB anything can happen.
The assignments are ok; there is an implicit conversion in the first line
int x = 0xFFFFFFFF;
unsigned int y = 0xFFFFFFFF;
However, the call to printf, is not ok
printf("%d, %d, %u, %u", x, y, x, y);
It is UB to mismatch the % specifier and the type of the argument.
In your case you specify 2 ints and 2 unsigned ints in this order, but provide 1 int, 1 unsigned int, 1 int, and 1 unsigned int.
Don't do UB!
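One way to keep the specifiers and arguments in agreement is to convert explicitly at the call site. This is only a sketch: format_both_ways is a hypothetical helper, and the expected text assumes a 32-bit int on a two's complement target.

```c
#include <stdio.h>
#include <stddef.h>

/* Formats x and y once as signed and once as unsigned, with every
   argument converted to the exact type its specifier expects. */
int format_both_ways(char *buf, size_t size, int x, unsigned int y)
{
    /* (int)y is implementation-defined when y > INT_MAX; common
       two's complement targets simply keep the bit pattern.
       (unsigned int)x is always well defined (modulo 2^N). */
    return snprintf(buf, size, "%d, %d, %u, %u",
                    x, (int)y, (unsigned int)x, y);
}
```

With x = -1 and y = 0xFFFFFFFF this produces the same text as the question's output, but without mismatched arguments.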
The binary representation is the key. An Example:
Unsigned int in HEX
0xFFFFFFFF = translates to = 1111 1111 1111 1111 1111 1111 1111 1111
Which represents 4,294,967,295 in a base-ten positive number.
But we also need a way to represent negative numbers.
So the brains decided on twos complement.
In short, they took the leftmost bit and decided that when it is a 1 the number will be negative.
And when the leftmost bit is set to 0 the number is zero or positive.
Now let's look at what happens
0000 0000 0000 0000 0000 0000 0000 0011 = 3
Adding to the number we finally reach.
0111 1111 1111 1111 1111 1111 1111 1111 = 2,147,483,647
the highest positive number with a signed integer.
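Let's add 1 more (binary addition carries to the left; here every lower bit is set to one, so the carry lands on the leftmost bit)
1000 0000 0000 0000 0000 0000 0000 0000 = -2,147,483,648
the most negative number. Keep adding and we eventually reach
1111 1111 1111 1111 1111 1111 1111 1111 = -1
(This is what the bit pattern does; in C itself, overflowing a signed int is undefined behavior.)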
So I guess in short we could say the difference is the one allows for negative numbers the other does not.
Because of the sign bit or leftmost bit or most significant bit.
The internal representation of int and unsigned int is the same.
Therefore, when you pass the same format string to printf it will be printed as the same.
However, there are differences when you compare them.
Consider:
int x = 0x7FFFFFFF;
int y = 0xFFFFFFFF;
x < y // false
x > y // true
(unsigned int) x < (unsigned int) y // true
(unsigned int) x > (unsigned int) y // false
This can also be a caveat, because when comparing a signed and an unsigned integer, one of them will be implicitly converted to match the types.
He is asking about the real difference.
When you are talking about undefined behavior you are on the level of guarantee provided by language specification - it's far from reality.
To understand the real difference please check this snippet (strictly speaking, right-shifting a negative int is implementation-defined and %X expects an unsigned argument, but the result is predictable on your favorite compiler):
#include <stdio.h>
int main()
{
int i1 = ~0;
int i2 = i1 >> 1;
unsigned u1 = ~0;
unsigned u2 = u1 >> 1;
printf("int : %X -> %X\n", i1, i2);
printf("unsigned int: %X -> %X\n", u1, u2);
}
The type just tells you what the bit pattern is supposed to represent. The bits are only what you make of them. The same sequences can be interpreted in different ways.
The printf function interprets the value that you pass it according to the format specifier in a matching position. If you tell printf that you pass an int, but pass unsigned instead, printf would re-interpret one as the other, and print the results that you see.
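The shift difference that the snippet above relies on can be written so that no negative value is ever shifted. This is a sketch with hypothetical function names, assuming 32-bit two's complement types from <stdint.h>:

```c
#include <stdint.h>

/* Logical shift: an unsigned right shift always fills with zero bits. */
uint32_t shift_right_logical(uint32_t v)
{
    return v >> 1;
}

/* Arithmetic shift without shifting a negative value: for v < 0,
   ~v is non-negative, so ~v >> 1 is fully defined, and the outer ~
   restores the sign (two's complement assumed). */
int32_t shift_right_arithmetic(int32_t v)
{
    return v < 0 ? ~(~v >> 1) : v >> 1;
}
```

The unsigned version turns 0xFFFFFFFF into 0x7FFFFFFF, while the signed version keeps -1 at -1, which is the sign-extending behavior most compilers give for a plain >> on int.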
Related
I came across this question.
What is the output of this C code?
#include <stdio.h>
int main()
{
unsigned int a = 10;
a = ~a;
printf("%d\n", a);
}
I know what the tilde operator does. 10 can be represented as 1010 in binary, and if I bitwise-NOT it I get 0101, so I do not understand the output -11. Can anyone explain?
The bitwise negation will not result in 0101. Note that an int contains at least 16 bits. So, for 16 bits, it will generate:
a = 0000 0000 0000 1010
~a = 1111 1111 1111 0101
So we expect to see a large number (with 16 bits that would be 65,525), but you use %d as the format specifier. This means the value is interpreted as a signed integer. Signed integers use the two's complement representation, which means that every integer whose highest bit is set is negative, and in that case the value equals -1-(~x), so -11. Had the specifier been %u, the value would be printed as an unsigned integer.
EDIT: as @R. says, %d is only well defined for unsigned integers if the value is also in the range of the signed type; outside that range the result depends on the implementation.
It's undefined behaviour, since "%d" is for signed integers; for unsigned ones, use "%u".
Otherwise, note that negative values are often represented as a two's complement; So -a == (~a)+1, or the other way round: (~a) == -a -1. Hence, (~10) is the same as -10-1, which is -11.
The format specifier for an unsigned decimal integer is %u. %d is for a signed decimal integer.
printf("%d\n", a) is interpreting a as a signed int. You want printf("%u\n", a).
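The two readings of ~10 can be checked without printf at all. A small sketch with hypothetical helper names; the signed result assumes two's complement:

```c
#include <limits.h>

/* ~ on an unsigned value flips every bit: the result is UINT_MAX - a. */
unsigned int not_unsigned(unsigned int a)
{
    return ~a;
}

/* ~ on a signed value: on two's complement this equals -a - 1. */
int not_signed(int a)
{
    return ~a;
}
```

So not_unsigned(10u) is UINT_MAX - 10 (4294967285 with a 32-bit unsigned int), and not_signed(10) is -11, the value the question saw when the unsigned result was printed with %d.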
#include <stdio.h>
int main()
{
unsigned int x=1;
char y=-1;
if (x>y)
{
printf("x>y");
}
else if(x==y)
printf("x=y");
else
printf("x<y");
return 0;
}
When I run code above, it does the last else's printf, which is really embarrassing, because x is 1 and y is -1.
I think there's something with the comparison, 'x>y', with hierarchical promotion, cause when I change x's type into 'int', not 'unsigned int', it does just right.
This problem is really interesting.. Any answer/thinking/suggestion is welcome.
It is actually correct, according to the standard.
Firstly, it is implementation defined whether char is signed or unsigned.
If char is unsigned, the initialisation will use modulo arithmetic, so initialising to -1 will initialise to the maximum value of an unsigned char - which is guaranteed to be greater than 1. The comparison will convert that char to unsigned (which doesn't change the value) before doing the comparison.
If char is signed, the comparison will convert the char with value -1 to be of type unsigned (since x is of type unsigned). That conversion, again, uses modulo arithmetic, except with respect to the unsigned type (so the -1 will convert to the maximum value an unsigned can represent). That results in a value that exceeds 1.
In practice, turning up warning levels on your compiler will trigger warnings on this sort of thing. That is a good idea in practice since the code arguably behaves in a manner that is less than intuitive.
For the comparison, y is promoted from type char to type unsigned int. However, an unsigned type cannot represent a negative value; instead, that -1 gets interpreted as UINT_MAX, which is most definitely not less than 1.
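If the mathematically expected ordering is what you want, the sign has to be checked before the conversion happens. A sketch with hypothetical function names:

```c
/* Plain comparison: a is converted to unsigned int first,
   so a negative a becomes a very large value. */
int less_naive(int a, unsigned int b)
{
    return a < b;
}

/* Sign-aware comparison: a negative a is smaller than any unsigned b;
   otherwise the conversion to unsigned int preserves the value. */
int less_sign_aware(int a, unsigned int b)
{
    return a < 0 || (unsigned int)a < b;
}
```

less_naive(-1, 1u) is false for the reason given above, while less_sign_aware(-1, 1u) is true.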
This is called Type Promotions
The rules, then (which you can also find on page 44 of K&R2, or in section 6.2.1 of the newer ANSI/ISO C Standard) are approximately as follows:
1, First, in most circumstances, values of type char and short int are converted to int right off the bat.
2, If an operation involves two operands, and one of them is of type long double, the other one is converted to long double.
3, If an operation involves two operands, and one of them is of type double, the other one is converted to double.
4, If an operation involves two operands, and one of them is of type float, the other one is converted to float.
5, If an operation involves two operands, and one of them is of type long int, the other one is converted to long int.
6, If an operation involves both signed and unsigned integers, the situation is a bit more complicated. If the unsigned operand is smaller (perhaps we're operating on unsigned int and long int), such that the larger, signed type could represent all values of the smaller, unsigned type, then the unsigned value is converted to the larger, signed type, and the result has the larger, signed type. Otherwise (that is, if the signed type can not represent all values of the unsigned type), both values are converted to a common unsigned type, and the result has that unsigned type.
7, Finally, when a value is assigned to a variable using the assignment operator, it is automatically converted to the type of the variable if (a) both the value and the variable have arithmetic type (that is, integer or floating point), or (b) both the value and the variable are pointers, and one or the other of them is of type void *.
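The rules above can be observed directly with C11's _Generic, which selects on the type of an expression. This is a sketch, assuming a C11 compiler; enum type_id, TYPE_ID and the four functions are hypothetical names made up for the illustration.

```c
enum type_id { T_INT, T_UINT, T_LONG, T_ULONG,
               T_FLOAT, T_DOUBLE, T_LDOUBLE, T_OTHER };

/* Maps the type of an expression to a small code (C11 _Generic). */
#define TYPE_ID(expr) _Generic((expr),                      \
    int: T_INT, unsigned int: T_UINT,                       \
    long: T_LONG, unsigned long: T_ULONG,                   \
    float: T_FLOAT, double: T_DOUBLE, long double: T_LDOUBLE, \
    default: T_OTHER)

/* Rule 1: char and short are promoted to int before the addition. */
enum type_id char_plus_short(char c, short s) { return TYPE_ID(c + s); }

/* Rule 6: int and unsigned int have the same rank, unsigned wins. */
enum type_id int_plus_unsigned(int i, unsigned int u) { return TYPE_ID(i + u); }

/* Rule 3: the other operand is converted to double. */
enum type_id int_plus_double(int i, double d) { return TYPE_ID(i + d); }

/* Rule 4: the other operand is converted to float. */
enum type_id float_plus_long(float f, long l) { return TYPE_ID(f + l); }
```

The unsigned int plus long case is deliberately left out, because its result type depends on whether long is wider than int on the platform.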
According to the C Standard (6.5.8 Relational operators)
3 If both of the operands have arithmetic type, the usual arithmetic
conversions are performed.
And further (6.3.1.1 Boolean, characters, and integers, #2)
If an int can represent all values of the original type (as restricted
by the width, for a bit-field), the value is converted to an int;
otherwise, it is converted to an unsigned int. These are called the
integer promotions.
And at last (6.3.1.8 Usual arithmetic conversions)
Otherwise, the integer promotions are performed on both operands. Then
the following rules are applied to the promoted operands
...
Otherwise, both operands are converted to the unsigned integer type
corresponding to the type of the operand with signed integer type.
Thus in the expression
x > y
The char y is promoted to type int. Since unsigned int (the type of x) and int have the same rank, according to the last quote y is converted to unsigned int. The value -1 becomes a pattern with all bits set, which corresponds to the maximum value that can be stored in type unsigned int.
Thus you have
1 > UINT_MAX
^   ^^^^^^^^
x   y
which is false.
Just run this:
#include <stdio.h>
int main()
{
unsigned int x=1;
char y=-1;
printf("x : %#010x\n", x);
printf("y : %#010x\n", y);
return 0;
}
which will output the hexa values of your variables:
x : 0x00000001
y : 0xffffffff
Do I need to go any further...?
The problem is comparing a signed type with an unsigned type.
A signed variable such as char y (where char is signed) is generally stored in two's complement: the top bit acts as the sign, and a negative value is decoded by inverting all bits and adding one.
Thus, char y = -1; gives you a y with this representation:
 vvvvvvv value bits: 1111111
11111111
^ sign: negative
two's complement: invert all bits and add one = (0000000 + 1) = 1, so the value is -1
Once y has been converted to unsigned int for the comparison, you are effectively testing if (binary 1 > binary 1111...1111), which is false.
The compiler converts the operands to a common type when they do not match (as in our example: obviously a char is not an unsigned int).
Unfortunately, the direction of such a conversion goes from the smaller type to the larger one, and from signed to unsigned when the ranks are equal, not vice versa. A char is promoted to int, and the int is then converted to unsigned int.
The code still compiles; at most, a compiler such as GCC may warn about comparing signed and unsigned values when -Wsign-compare is enabled.
Negative numbers are represented with the rules governing the so-called two's complement numbers.
To represent the char = -1 invert all bits and add one :
0000 0001
1111 1110
+ 1
1111 1111
Now when the promotion occurs the char is implicitly widened to the size of an int. It is the same procedure as expressing -1 in a wider integer (shown here with 16 bits for brevity; a 4-byte int has 32):
0000 0000 0000 0001
1111 1111 1111 1110
+ 1
1111 1111 1111 1111
This value is now treated as unsigned, as in the code below:
#include <iostream>
int main(){
    int y = -1;
    std::cout << y << std::endl; //if x is an int: x > y
    std::cout << unsigned(y) << std::endl; //if x is an unsigned int y is now treated as UINT_MAX: x < y
    return 0;
}
which prints:
-1
4294967295
Thus, the evaluation of x < y will be true.
Here are some further details:
typecasting to unsigned in C
why unsigned int 0xFFFFFFFF is equal to int -1?
I read about twos complement on wikipedia and on stack overflow, this is what I understood but I'm not sure if it's correct
signed int
the leftmost bit is interpreted as -2^31, and this is how we can have negative numbers
unsigned int
the leftmost bit is interpreted as +2^31, and this is how we achieve large positive numbers
update
What will the compiler see when we store 3 vs -3?
I thought 3 is always 00000000000000000000000000000011
and -3 is always 11111111111111111111111111111101
example for 3 vs -3 in C:
unsigned int x = -3;
int y = 3;
printf("%d %d\n", x, y); // -3 3
printf("%u %u\n", x, y); // 4294967293 3
printf("%x %x\n", x, y); // fffffffd 3
Two's complement is a way to represent negative integers in binary.
First of all, here are the standard 32-bit integer ranges:
Signed = -(2 ^ 31) to ((2 ^ 31) - 1)
Unsigned = 0 to ((2 ^ 32) - 1)
In two's complement, a negative is represented by inverting the bits of its positive equivalent and adding 1:
10 which is 00001010 becomes -10 which is 11110110 (if the numbers were 8-bit integers).
Also, the binary representation is only important if you plan on using bitwise operators.
If you're doing basic arithmetic, then this is unimportant.
The one case outside of those that may give unexpected results is taking the absolute value of the most negative signed value, -(2 ^ 31): its positive counterpart is not representable, so abs() of it is undefined behavior and in practice usually returns the same negative number.
Your problem does not have to do with the representation, but the type.
A negative number stored in an unsigned integer has the same bit pattern; the difference is that it reads as a very large number, because an unsigned value must be non-negative and the top bit counts as an ordinary value bit rather than a sign.
You should also realize that ((2^32) - 5) is the exact same thing as -5 if the value is unsigned, etc.
Therefore, the following holds true:
unsigned int x = 4294967291u; /* (2 ^ 32) - 5, assuming a 32-bit unsigned int */
unsigned int y = -5;
if (x == y) {
printf("Negative values wrap around in unsigned integers on underflow.");
}
else {
printf( "Unsigned integer underflow is undefined!" );
}
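The wrap-around in that comparison does not depend on the width of unsigned int: converting a negative value adds UINT_MAX + 1 until the result fits. A sketch, with to_unsigned as a hypothetical helper:

```c
#include <limits.h>

/* Conversion to unsigned int is defined modulo UINT_MAX + 1,
   so -k always lands on UINT_MAX - (k - 1). */
unsigned int to_unsigned(int v)
{
    return (unsigned int)v;
}
```

For example, to_unsigned(-5) equals UINT_MAX - 4, which is 4294967291 when unsigned int has 32 bits.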
The numbers don't change, just the interpretation of the numbers. For most two's complement processors, add and subtract do the same math, but set a carry / borrow status assuming the numbers are unsigned, and an overflow status assuming the number are signed. For multiply and divide, the result may be different between signed and unsigned numbers (if one or both numbers are negative), so there are separate signed and unsigned versions of multiply and divide.
For 32-bit integers, for both signed and unsigned numbers, the n-th bit is always interpreted as +2^n.
For signed numbers with the 31st bit set, the result is adjusted by -2^32.
Example:
1111 1111 1111 1111 1111 1111 1111 1111 (binary) as unsigned int is interpreted as 2^31 + 2^30 + ... + 2^1 + 2^0. The interpretation of this as a signed int would be the same MINUS 2^32, i.e. 2^31 + 2^30 + ... + 2^1 + 2^0 - 2^32 = -1.
(Well, it can be said that for signed numbers with the 31st bit set, this bit is interpreted as -2^31 instead of +2^31, like you said in the question. I find this way a little less clear.)
Your representation of 3 and -3 is correct: 3 = 0x00000003, -3 + 2^32 = 0xFFFFFFFD.
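The weighting described in this answer translates almost word for word into code. A sketch, assuming a 32-bit pattern held in a uint32_t and a 64-bit accumulator; both function names are hypothetical:

```c
#include <stdint.h>

/* Sum of +2^n over every set bit n: the unsigned interpretation. */
int64_t value_as_unsigned(uint32_t bits)
{
    int64_t sum = 0;
    for (int n = 0; n < 32; n++) {
        if ((bits >> n) & 1u) {
            sum += (int64_t)1 << n;
        }
    }
    return sum;
}

/* Same sum, adjusted by -2^32 when bit 31 is set: the signed
   (two's complement) interpretation. */
int64_t value_as_signed(uint32_t bits)
{
    int64_t sum = value_as_unsigned(bits);
    if (bits & 0x80000000u) {
        sum -= (int64_t)1 << 32;
    }
    return sum;
}
```

Feeding 0xFFFFFFFD to both functions gives 4294967293 and -3 respectively, matching the printf output in the question.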
Yes, you are correct, allow me to explain a bit further for clarification purposes.
The difference between int and unsigned int is how the bits are interpreted. The machine processes signed and unsigned values in largely the same way, and no extra bits are added for the sign: the most significant bit itself carries the sign information. Two's complement notation is very readable when dealing with related subjects.
Example:
In 4 bits, the two's complement of the number 5, 0101, is 1011.
In C++, when you should use each data type depends on the situation. You should use unsigned values when functions or operators return those values. ALUs handle signed and unsigned variables very similarly.
The rules for writing a 32-bit number in two's complement are as follows:
If the number is positive, write it in plain binary, counting up to 2^(32-1) - 1
If it is 0, use all zeroes
For negatives, flip all the 1's and 0's of the positive value, then add 1.
Example 2(The beauty of Two's complement):
-2 + 2 = 0 is displayed as 1110 + 0010, and that is 10000. Discarding the carry out of the top bit, we have our result as 0000.
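The 4-bit examples in this answer can be reproduced by masking an ordinary unsigned int down to four bits. A sketch only; negate4 and add4 are hypothetical names:

```c
/* Two's complement negation in 4 bits: flip the bits, add one,
   keep only the low four bits. */
unsigned int negate4(unsigned int v)
{
    return (~v + 1u) & 0xFu;
}

/* 4-bit addition: the carry out of bit 3 is simply discarded. */
unsigned int add4(unsigned int a, unsigned int b)
{
    return (a + b) & 0xFu;
}
```

negate4(5) yields 1011 and negate4(2) yields 1110, and add4(2, negate4(2)) comes back to 0000 once the carry is dropped.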
int main()
{
unsigned char a = -1;
printf("%d",a);
printf("%u",a);
}
When I executed the above program I got 255 255 as the answer.
We know negative numbers are stored in 2's complement,
so the representation would be
1111 1111 --> 2's complement.
But above we are printing with %d (int), and an integer is four bytes.
My assumption is that even though it is a character, we are forcing the compiler to treat it as an integer,
so it internally uses the sign extension concept:
1111 1111 1111 1111 1111 1111 1111 1111.
According to the above representation it has to be -1 in the first case, since %d is signed.
In the second case it has to print (2^31 - 1), but it is printing 255 and 255.
Why is it printing 255 in both cases?
Tell me if my assumption is wrong and give me the real interpretation.
Your assumption is wrong; the character will "roll over" to 255, then be padded to the size of an integer. Assuming a 32-bit integer:
11111111
would be padded to:
00000000 00000000 00000000 11111111
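Up to the representation of a, you are correct. However, a is passed through printf()'s variable argument list, so it undergoes the default argument promotions and is converted to int before the call; %d then reads that int (and %u reads the same non-negative value). That is, your code is the same as if you had written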
int main() {
unsigned char a = -1;
printf("%d", (int)a);
printf("%u", (int)a);
}
In the moment you have assigned -1 to a you have lost the information that it once was a signed value, the logical value of a is 255. Now, when you convert an unsigned char to an int, the compiler preserves the logical value of a and the code prints 255.
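The value-preserving promotion can be checked in isolation. A sketch assuming 8-bit chars, with hypothetical function names; the return statement performs the same char-to-int conversion that the argument promotion does:

```c
/* unsigned char: -1 wraps to 255 on assignment, and widening to int
   keeps the value 255 (zero extension). */
int promote_unsigned_char(void)
{
    unsigned char a = (unsigned char)-1;
    return a;
}

/* signed char: the value -1 is kept, so widening sign-extends. */
int promote_signed_char(void)
{
    signed char a = -1;
    return a;
}
```

Only the signed variant produces the all-ones 32-bit pattern that the question expected.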
The compiler is not required to check the extra parameters of printf against the format string, which is only interpreted at run time. It does, however, apply the default argument promotions to them: a char is converted to int, value preserved, before the call, and printf then reads an int from its argument list.
So printf does not receive a pointer to a and reinterpret the surrounding bytes. That would be a different and broken operation, which is worth seeing for contrast.
It would look roughly like this:
char a = -1;
int * p = (int*)&a; // BAD CAST
int numberToPrint = *p; // Accesses 3 extra bytes from somewhere on the stack
Since you're likely running on a little endian CPU, the 4-byte int 0x12345678 is arranged in memory as | 0x78 | 0x56 | 0x34 | 0x12 |
If the 3 bytes on the stack following a are all 0x00 (they probably are due to stack alignment, but it's NOT GUARANTEED), the memory looks like this:
&a: | 0xFF |
(int*)&a: | 0xFF | 0x00 | 0x00 | 0x00 |
which evaluates to *(int*)&a == 0x000000FF.
unsigned char runs from 0-255 So the negative number -1 will print 255 -2 will print 254 and so on...
signed char runs from -128 to +127 so you get -1 for the same printf() which is not the case with unsigned char
Once you make an assignment to a char, the value is fixed at 8 bits; when it is later widened, an unsigned char is padded with zeros, so your assumption of 2^31 is wrong.
The negative number is represented using 2's complement(Implementation dependent)
So
1 = 0000 0001
So in order to get -1 we do
----------------------------------------
2's complement = 1111 1111 = (255) |
-----------------------------------------
It is printing 255, simply because this is the purpose from ISO/IEC9899
H.2.2 Integer types
1 The signed C integer types int, long int, long long int, and the corresponding
unsigned types are compatible with LIA−1. If an implementation adds support for the
LIA−1 exceptional values ‘‘integer_overflow’’ and ‘‘undefined’’, then those types are
LIA−1 conformant types. C’s unsigned integer types are ‘‘modulo’’ in the LIA−1 sense
in that overflows or out-of-bounds results silently wrap. An implementation that defines
signed integer types as also being modulo need not detect integer overflow, in which case,
only integer divide-by-zero need be detected.
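If this is given, printing 255 is absolutely what LIA-1 would expect.
Even if your implementation doesn't support the LIA-1 annex of C99, the result is still defined: conversion of an out-of-range value to an unsigned type is always reduced modulo one more than the type's maximum, so -1 becomes 255 for an 8-bit unsigned char.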
From C traps and pitfalls
If a and b are two integer variables, known to be non-negative then to
test whether a+b might overflow use:
if ((int) ((unsigned) a + (unsigned) b) < 0 )
complain();
I didn't get how comparing the sum of both integers with zero will let you know that there is an overflow?
The code you saw for testing for overflow is just bogus.
For signed integers, you must test like this:
if ((a^b) < 0) overflow=0; /* opposite signs can't overflow */
else if (a>0) overflow=(b>INT_MAX-a);
else overflow=(b<INT_MIN-a);
Note that the cases can be simplified a lot if one of the two numbers is a constant.
For unsigned integers, you can test like this:
overflow = (a+b<a);
This is possible because unsigned arithmetic is defined to wrap, unlike signed arithmetic which invokes undefined behavior on overflow.
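Wrapped up as functions, the two tests might look like the sketch below. The names are hypothetical, and the signed check branches on the sign of b instead of using the XOR shortcut; it is an equivalent pre-check that never performs the overflowing addition itself.

```c
#include <limits.h>

/* Returns 1 if a + b would overflow int, without computing a + b. */
int add_would_overflow(int a, int b)
{
    if (b > 0) {
        return a > INT_MAX - b;   /* room left above a is too small */
    }
    if (b < 0) {
        return a < INT_MIN - b;   /* room left below a is too small */
    }
    return 0;
}

/* Unsigned addition wraps, so a wrapped sum is smaller than an operand. */
int uadd_wraps(unsigned int a, unsigned int b)
{
    return a + b < a;
}
```

A caller would test add_would_overflow(a, b) first and only then evaluate a + b.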
When an overflow occurs, the sum exceeds the representable range (for a 32-bit signed int, this one):
-2,147,483,648 <= sum <= 2,147,483,647
So when the sum overflows on two's complement hardware, it wraps around and goes back to the beginning:
2,147,483,647 + 1 = -2,147,483,648
If the sum is negative and you know the two numbers are positive, then the sum overflowed.
If a and b are known to be non negative integers, the sequence (int) ((unsigned) a + (unsigned) b) will return indeed a negative number on overflow.
Lets assume a 4 bit (max positive integer is 7 and max unsigned integer is 15) system with the following values:
a = 6
b = 4
a + b = 10 (overflow if performed with integers)
While if we do the addition using the unsigned conversion, we will have:
int((unsigned)a + (unsigned)b) = (int) ((unsigned)(10)) = -6
To understand why, we can quickly check the binary addition:
a = 0110 ; b = 0100 - first bit is the sign bit for signed int.
0110 +
0100
------
1010
For unsigned int, 1010 = 10. While the same representation in signed int means -6.
So the result of the operation is indeed < 0.
If the integers are unsigned and you're assuming IA32 with GCC-style inline assembly, you can read the carry flag (CF) right after the addition.
int of(unsigned int a, unsigned int b)
{
    unsigned int sum = a;
    unsigned char carry;
    __asm__("addl %2,%0\n\t"   /* sum += b, sets CF on unsigned overflow */
            "setc %1"          /* carry = CF */
            : "+r"(sum), "=q"(carry)
            : "r"(b)
            : "cc");
    return carry;
}
There are some good explanations on this page.
Here's the simple way from that page that I like:
Do the addition normally, then check the result (e.g. if (a+23<23) overflow).
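As we know, the addition of two numbers might overflow.
So we can use the following way to combine the two numbers.
Adder Concept
Suppose we have 2 numbers "a" and "b"
(a^b)+2*(a&b);
this expression equals a+b: the XOR gives the sum bits without carries and the AND gives the carries. Halving it term by term, (a&b)+((a^b)>>1), gives the average of a and b without the intermediate sum overflowing.
And this is said to be patented by Samsung.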
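For reference, the adder identity behind this answer is a + b == (a ^ b) + 2 * (a & b): XOR yields the sum bits without carries and AND yields the carries. A sketch with hypothetical function names, using unsigned operands so that every step is well defined:

```c
/* Rebuilds a + b from the carry-less sum and the carries. */
unsigned int add_by_parts(unsigned int a, unsigned int b)
{
    return (a ^ b) + ((a & b) << 1);
}

/* Halving the identity term by term gives floor((a + b) / 2)
   without ever forming the possibly wrapping sum a + b. */
unsigned int average_floor(unsigned int a, unsigned int b)
{
    return (a & b) + ((a ^ b) >> 1);
}
```

average_floor stays correct even when both inputs are the largest unsigned value, where (a + b) / 2 would wrap first.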
Assuming two's complement representation and 8-bit integers, the most significant bit holds the sign (1 for negative and 0 for positive). Since we know the integers are non-negative, the most significant bit is 0 for both. If adding the unsigned representations of these numbers results in a 1 in the most significant bit, then the signed addition has overflowed. To check whether an unsigned integer has a 1 in its most significant bit, test whether it exceeds the range of the signed integer, or convert it to the signed type, where it will be negative (because the most significant bit is 1).
example 8 bit signed integers (range -128 to 127):
two's complement of 127 = 0111 1111
two's complement of 1 = 0000 0001
unsigned 127 = 0111 1111
unsigned 1 = 0000 0001
unsigned sum = 1000 0000
sum is 128, which is not an overflow for an unsigned integer but is an overflow for a signed integer; the most significant bit gives it away.
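The 8-bit reasoning can be written out as a check on the top bit of the unsigned sum. A sketch assuming both inputs are non-negative values that fit in 7 bits; sum_sets_sign_bit8 is a hypothetical name:

```c
#include <stdint.h>

/* For inputs in 0..127, the 8-bit unsigned sum has its most
   significant bit set exactly when the signed 8-bit sum would
   exceed 127. */
int sum_sets_sign_bit8(uint8_t a, uint8_t b)
{
    uint8_t sum = (uint8_t)(a + b);
    return (sum & 0x80u) != 0;
}
```

With 127 and 1 the sum is 1000 0000 and the function reports an overflow; with 100 and 27 the sum is 0111 1111 and it does not.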