I came across this question.
What is the output of this C code?
#include <stdio.h>
int main()
{
unsigned int a = 10;
a = ~a;
printf("%d\n", a);
}
I know what the tilde operator does. 10 can be represented as 1010 in binary, and if I bitwise NOT it, I get 0101, so I do not understand the output -11. Can anyone explain?
The bitwise negation will not result in 0101. Note that an int contains at least 16 bits. So, for 16 bits, it will generate:
a = 0000 0000 0000 1010
~a = 1111 1111 1111 0101
So we expect to see a large number (with 16 bits that would be 65'525), but you use %d as the format specifier. This means you interpret the integer as a signed integer. Signed integers typically use the two's complement representation. This means that every integer whose highest bit is set is negative, and in that case the value equals -1-(~x), so here -11. Had the specifier been %u, the value would have been printed as an unsigned integer.
EDIT: as R. says, %d is only well defined for unsigned integers that are also in the range of the signed integers; outside that range it depends on the implementation.
It's undefined behaviour, since "%d" is for signed integers; for unsigned ones, use "%u".
Otherwise, note that negative values are often represented as a two's complement; So -a == (~a)+1, or the other way round: (~a) == -a -1. Hence, (~10) is the same as -10-1, which is -11.
The format specifier for an unsigned decimal integer is %u. %d is for a signed decimal integer.
printf("%d\n", a) is interpreting a as a signed int. You want printf("%u\n", a).
Related
I have written code in C as
#include<stdio.h>
int main()
{
char a = 128;
char b = -128;
printf("%c",a);
printf("%c",b);
}
The output of above code is ÇÇ
Using 128 or -128 output is coming out to be same. Why? Please explain using binary if possible.
A signed char type typically has a range of -128 to 127. Since 128 is outside of this range, your compiler is converting it to a value with the same 8-bit bit pattern, and this is -128.
The literal -128 has type int and on a 32bit 2's complement representation has the bit pattern:
1111 1111 1111 1111 1111 1111 1000 0000
In this case, when you assign it to a char there is an implicit conversion such that only the least significant byte is used: 1000 0000 or, in decimal, 128. Hence the result is the same.
Strictly the behaviour is implementation defined if char is signed, and the standard defines the behaviour in somewhat arcane "as-if" terms for unsigned char. Whether char itself is signed or unsigned is itself implementation defined as is the actual width and therefore range of a char. In practice though the above explanation is what is happening in this case and is the most likely behaviour for any implementation with 8-bit char, it makes no difference whether char is signed or unsigned.
I tried the following piece of code, expecting the output to be positive 64:
char val = 0x80;
printf("%d",val>>1);
My understanding of what happens is(please correct me if i'm wrong as i probably am):
Referring to the ASCII table, there is no mapping of 0x80 to any character so i assume this is stored as an unsigned integer.
This is represented as 1000 0000 in bitwise format, so a right shift of 1 would result in 0100 0000
When printed as an integer value, this will then show as positive 64.
However it shows -64.
In contrast:
char val = 0x40;
printf("%d",val>>1);
gives positive 32.
Is the value implicitly converted to a signed integer in the first case and not in the second?
Your C implementation uses an eight-bit signed char. (The C standard permits char to be signed or unsigned.) In char val = 0x80;, a char cannot represent the value you initialize it with, 128. In this case, the value 128 is converted to char which, per C 2018 6.3.1.3 3, yields either an implementation-defined value or a trap. Your implementation likely produces −128. (This is a common result because 128 in binary is 10000000, and converting an out-of-range result to an eight-bit two’s complement integer often simply reinterprets the low eight bits of the value as eight-bit two’s complement. In two’s complement, 10000000 represents −128.)
So val>>1 asks to shift −128 right one bit. Per C 2018 6.5.7 5, shifting a negative value right yields an implementation defined value. Producing −64 is a common result.
(In detail, in val>>1, val is automatically promoted from char to int. It has the same value, −128. However, with a 32-bit int, it would then be represented as 11111111111111111111111110000000 instead of 10000000. Then shifting right “arithmetically,” which propagates the sign bit, yields 11111111111111111111111111000000, which is −64, the result you got. Some C implementations might shift right “logically,” which sets the sign bit to zero, yielding 01111111111111111111111111000000. In this case, the printf would show “2147483584”, which is 2^31−64.)
Whether ASCII has any character with code 0x80 is irrelevant. The C rules apply to the values involved, regardless of what character encoding scheme is used.
Right shift of the signed integer is implementation-defined. In most modern systems signed integers are two's complement and the shift will be translated by the compiler to the arithmetic shift.
after the shift the binary value of val is 0xc0 which is -64 in the two's complement encoding.
The val is converted first to the signed integer then passed to the function. If you put some effort into your question and add a few more lines to your code you would discover it yourself.
#include <stdio.h>
int main(void)
{
char c = 0x80;
printf("%d\n", c >> 1);
printf("%x\n", c >> 1);
printf("%hhd\n", c >> 1);
printf("%hhx\n", c >> 1);
c >>= 1;
printf("%d\n", c);
printf("%x\n", c);
printf("%hhd\n",c);
printf("%hhx\n",c);
}
https://godbolt.org/z/YsaGos
You can also see if the MSB bit is 0 arithmetic shift behaves exactly as the binary shift, thus 0x40 >> 1 == 0x20
I read about twos complement on wikipedia and on stack overflow, this is what I understood but I'm not sure if it's correct
signed int
the left most bit is interpreted as -2^31 and this how we can have negative numbers
unsigned int
the left most bit is interpreted as +2^31 and this is how we achieve large positive numbers
update
What will the compiler see when we store 3 vs -3?
I thought 3 is always 00000000000000000000000000000011
and -3 is always 11111111111111111111111111111101
example for 3 vs -3 in C:
unsigned int x = -3;
int y = 3;
printf("%d %d\n", x, y); // -3 3
printf("%u %u\n", x, y); // 4294967293 3
printf("%x %x\n", x, y); // fffffffd 3
Two's complement is a way to represent negative integers in binary.
First of all, here are the standard 32-bit integer ranges:
Signed = -(2 ^ 31) to ((2 ^ 31) - 1)
Unsigned = 0 to ((2 ^ 32) - 1)
In two's complement, a negative is represented by inverting the bits of its positive equivalent and adding 1:
10 which is 00001010 becomes -10 which is 11110110 (if the numbers were 8-bit integers).
Also, the binary representation is only important if you plan on using bitwise operators.
If you're doing basic arithmetic, then this is unimportant.
The only other time this may give unexpected results is when taking the absolute value of the most negative signed value, -(2 ^ 31): its positive counterpart is not representable, so the result typically stays negative (formally, the behaviour of abs is undefined there).
Your problem does not have to do with the representation, but the type.
A negative number in an unsigned integer is represented the same, the difference is that it becomes a super high number since it must be positive and the sign bit works as normal.
You should also realize that ((2^32) - 5) is the exact same thing as -5 if the value is unsigned, etc.
Therefore, the following holds true:
unsigned int x = (1ULL << 32) - 5; /* 2^32 - 5; assumes a 32-bit unsigned int */
unsigned int y = -5;
if (x == y) {
printf("Negative values wrap around in unsigned integers on underflow.");
}
else {
printf( "Unsigned integer underflow is undefined!" );
}
The numbers don't change, just the interpretation of the numbers. For most two's complement processors, add and subtract do the same math, but set a carry / borrow status assuming the numbers are unsigned, and an overflow status assuming the number are signed. For multiply and divide, the result may be different between signed and unsigned numbers (if one or both numbers are negative), so there are separate signed and unsigned versions of multiply and divide.
For 32-bit integers, for both signed and unsigned numbers, the n-th bit is always interpreted as +2^n.
For signed numbers with bit 31 set, the result is adjusted by -2^32.
Example:
1111 1111 1111 1111 1111 1111 1111 1111 (base 2) as unsigned int is interpreted as 2^31+2^30+...+2^1+2^0. The interpretation of this as a signed int would be the same MINUS 2^32, i.e. 2^31+2^30+...+2^1+2^0-2^32 = -1.
(Well, it can be said that for signed numbers with bit 31 set, this bit is interpreted as -2^31 instead of +2^31, like you said in the question. I find this way a little less clear.)
Your representation of 3 and -3 is correct: 3 = 0x00000003, -3 + 2^32 = 0xFFFFFFFD.
Yes, you are correct, allow me to explain a bit further for clarification purposes.
The difference between int and unsigned int is how the bits are interpreted. The machine processes unsigned and signed bits the same way; no extra bits are added for the sign, the most significant bit simply carries it. Two's complement notation is very readable when dealing with related subjects.
Example:
The two's complement negation of the number 5, 0101, is 1011.
In C++, which data type to use depends on the situation. You should use unsigned values when functions or operators return those values. ALUs handle signed and unsigned variables very similarly.
The exact rules for writing in two's complement are as follows:
If the number is positive, count up to 2^(32-1) - 1
If it is 0, use all zeroes
For negatives, flip all the 1's and 0's of the positive value and add 1.
Example 2(The beauty of Two's complement):
-2 + 2 = 0 is displayed as 0010 + 1110; and that is 10000. With overflow at the end, we have our result as 0000;
int:
The 32-bit int data type can hold integer values in the range of
−2,147,483,648 to 2,147,483,647. You may also refer to this data type
as signed int or signed.
unsigned int :
The 32-bit unsigned int data
type can hold integer values in the range of 0 to 4,294,967,295. You
may also refer to this data type simply as unsigned.
Ok, but, in practice:
int x = 0xFFFFFFFF;
unsigned int y = 0xFFFFFFFF;
printf("%d, %d, %u, %u", x, y, x, y);
// -1, -1, 4294967295, 4294967295
no difference, O.o. I'm a bit confused.
Hehe. You have an implicit cast here, because you're telling printf what type to expect.
Try this on for size instead:
unsigned int x = 0xFFFFFFFF;
int y = 0xFFFFFFFF;
if (x < 0)
printf("one\n");
else
printf("two\n");
if (y < 0)
printf("three\n");
else
printf("four\n");
Yes, because in your case they use the same representation.
The bit pattern 0xFFFFFFFF happens to look like -1 when interpreted as a 32b signed integer and as 4294967295 when interpreted as a 32b unsigned integer.
It's the same as char c = 65. If you interpret it as a signed integer, it's 65. If you interpret it as a character, it's A.
As R and pmg point out, technically it's undefined behavior to pass arguments that don't match the format specifiers. So the program could do anything (from printing random values to crashing, to printing the "right" thing, etc).
The standard points it out in 7.19.6.1-9
If a conversion specification is invalid, the behavior is undefined. If
any argument is not the correct type for the corresponding conversion
specification, the behavior is undefined.
There is no difference between the two in how they are stored in memory and registers. There are no signed and unsigned versions of the integer registers, and no sign information is stored alongside the int. The difference only becomes relevant when you perform maths operations: the CPU has signed and unsigned versions of some operations, and the signedness tells the compiler which version to use.
The problem is that you invoked Undefined Behaviour.
When you invoke UB anything can happen.
The assignments are ok; there is an implicit conversion in the first line
int x = 0xFFFFFFFF;
unsigned int y = 0xFFFFFFFF;
However, the call to printf, is not ok
printf("%d, %d, %u, %u", x, y, x, y);
It is UB to mismatch the % specifier and the type of the argument.
In your case you specify 2 ints and 2 unsigned ints, in this order, but provide 1 int, 1 unsigned int, 1 int, and 1 unsigned int.
Don't do UB!
The binary representation is the key. An Example:
Unsigned int in HEX
0xFFFFFFFF = translates to = 1111 1111 1111 1111 1111 1111 1111 1111
Which represents 4,294,967,295 in a base-ten positive number.
But we also need a way to represent negative numbers.
So the brains decided on twos complement.
In short, they took the leftmost bit and decided that when it is a 1 the number will be negative.
And when the leftmost bit is set to 0 the number is zero or positive.
Now let's look at what happens
0000 0000 0000 0000 0000 0000 0000 0011 = 3
Adding to the number we finally reach.
0111 1111 1111 1111 1111 1111 1111 1111 = 2,147,483,647
the highest positive number with a signed integer.
Let's add 1 more (binary addition carries the overflow to the left; in this case all the lower bits are set to one, so the carry lands on the leftmost bit)
1000 0000 0000 0000 0000 0000 0000 0000 = -2,147,483,648
and if we keep counting up from there we eventually reach
1111 1111 1111 1111 1111 1111 1111 1111 = -1
So I guess in short we could say the difference is the one allows for negative numbers the other does not.
Because of the sign bit or leftmost bit or most significant bit.
The internal representation of int and unsigned int is the same.
Therefore, when you pass the same format string to printf it will be printed as the same.
However, there are differences when you compare them.
Consider:
int x = 0x7FFFFFFF;
int y = 0xFFFFFFFF;
x < y // false
x > y // true
(unsigned int) x < (unsigned int) y // true
(unsigned int) x > (unsigned int) y // false
This can also be a caveat, because when comparing a signed and an unsigned integer, one of them will be implicitly converted to match the types.
He is asking about the real difference.
When you are talking about undefined behavior you are on the level of guarantee provided by language specification - it's far from reality.
To understand the real difference please check this snippet (of course this is UB but it's perfectly defined on your favorite compiler):
#include <stdio.h>
int main()
{
int i1 = ~0;
int i2 = i1 >> 1;
unsigned u1 = ~0;
unsigned u2 = u1 >> 1;
printf("int : %X -> %X\n", i1, i2);
printf("unsigned int: %X -> %X\n", u1, u2);
}
The type just tells you what the bit pattern is supposed to represent. The bits are only what you make of them. The same sequences can be interpreted in different ways.
The printf function interprets the value that you pass it according to the format specifier in a matching position. If you tell printf that you pass an int, but pass unsigned instead, printf would re-interpret one as the other, and print the results that you see.
In the C programming language, unsigned int is used to store positive values only. However, when I run the following code:
unsigned int x = -12;
printf("%d", x);
The output is still -12. I thought it should have printed out: 12, or am I misunderstanding something?
The -12 to the right of your equals sign is set up as a signed integer (probably 32 bits in size) and will have the hexadecimal value 0xFFFFFFF4. The compiler generates code to move this signed integer into your unsigned integer x which is also a 32 bit entity. The compiler assumes you only have a positive value to the right of the equals sign so it simply moves all 32 bits into x. x now has the value 0xFFFFFFF4 which is 4294967284 if interpreted as a positive number. But the printf format of %d says the 32 bits are to be interpreted as a signed integer so you get -12. If you had used %u it would have printed as 4294967284.
In either case you don't get what you expected since C language "trusts" the writer of code to only ask for "sensible" things. This is common in C. If you wanted to assign a value to x and were not sure whether the value on the right side of the equals was positive you could have written unsigned int x = abs(-12); and forced the compiler to generate code to take the absolute value of a signed integer before moving it to the unsigned integer.
The int is unsigned, but you've told printf to look at it as a signed int.
Try
unsigned int x = -12; printf("%u", x);
It won't print "12", but will print the max value of an unsigned int minus 11.
Exercise to the reader is to find out why :)
Passing %d to printf tells printf to treat the argument as a signed integer, regardless of what you actually pass. Use %u to print as unsigned.
It all has to do with interpretation of the value.
If you assume 16 bit signed and unsigned integers, then here some examples that aren't exactly correct, but demonstrate the concept.
0000 0000 0000 1100 unsigned int, and signed int value 12
1000 0000 0000 1100 signed int value -12, and a large unsigned integer.
For signed integers, the bit on the left is the sign bit.
0 = positive
1 = negative
For unsigned integers, there is no sign bit.
the left hand bit, lets you store a larger number instead.
So the reason you are not seeing what you are expecting is that.
unsigned int x = -12, takes -12 as an integer, and stores it into x. x is unsigned, so
what was a sign bit, is now a piece of the value.
printf lets you tell the compiler how you want a value to be displayed.
%d means display it as if it were a signed int.
%u means display it as if it were an unsigned int.
C lets you do this kind of stuff. You the programmer are in control.
Kind of like a firearm.
It's a tool.
You can use it correctly to deal with certain situations,
or incorrectly to remove one of your toes.
one possibly useful case is the following
unsigned int allBitsOn = -1;
That particular value sets all of the bits to 1
1111 1111 1111 1111
that can be useful sometimes.
printf("%d", x);
Means print a signed integer. You'll have to write this instead:
printf("%u", x);
Also, it'll still not print "12", it's going to be "4294967284".
They do store positive values. But you're outputting the (very high) positive value as a signed integer, so it gets re-interpreted again (in an implementation-defined fashion, I might add).
Use the format specifier "%u" instead.
Your program has undefined behavior because you passed the wrong type to printf (you told it you were going to pass an int but you passed an unsigned int). Consider yourself lucky that the "easiest" thing for the implementation to do was just silently print the wrong value and not jump to some code that does something harmful...
What you are missing is that the printf("%d",x) expects x to be signed, so although you assign -12 to x it is interpreted as 2's complement which would be a very large number.
However when you pass this really large number to printf it interprets it as signed thus correctly translating it back to -12.
The correct specifier to print an unsigned value with printf is "%u" - try this and see what it does!
The assignment of a negative value to an unsigned int does not compute the absolute value of the negative: it interprets as an unsigned int the binary representation of the negative value, i.e., 4294967284 (2^32 - 12).
printf("%d") performs the opposite interpretation. This is why your program displays -12.
int and unsigned int are used to allocate a number of bytes to store a value nothing more.
The compiler should give warnings about signed mismatching but it really does not affect the bits in the memory that represent the value -12.
%x, %d, %u etc. tell printf how to interpret the bits when you print them.
When you are trying to display the value with %d you are passing it where an (int) argument is expected and not an (unsigned int) argument, and that causes it to print -12 and not 4294967284. Integers are stored in binary, and -12 as an int has the same bit pattern as 4294967284 as an unsigned int.
That is why "%u" prints the value you want and "%d" does not. It depends on your argument type. GOOD LUCK!
The -12 is in 16-bit 2's complement format. So do this:
if (x & 0x8000) { x = ~x+1; }
This will convert the 2's complement -ve number to the equivalent +ve number. Good luck.
When the compiler implicitly converts -12 to an unsigned integer, the underlying binary representation remains unaltered. This conversion is purely semantic. The sign bit of the two's complement integer becomes the most significant bit of the unsigned integer. Thus when printf treats the unsigned integer as a signed integer with %d, it will see -12.
In general context when only positive numbers can be stored, negative numbers are not stored explicitly but their 2's complement is stored. In the same way here, the 2's complement of -12 will be stored in 'x' and you use %u to get it.