How does C store negative numbers in signed vs unsigned integers? - c

Here is the example:
#include <stdio.h>

int main()
{
    int x = 35;
    int y = -35;
    unsigned int z = 35;
    unsigned int p = -35;
    signed int q = -35;
    printf("Int(35d)=%d\n"
           "Int(-35d)=%d\n"
           "UInt(35u)=%u\n"
           "UInt(-35u)=%u\n"
           "UInt(-35d)=%d\n"
           "SInt(-35u)=%u\n", x, y, z, p, p, q);
    return 0;
}
Output:
Int(35d)=35
Int(-35d)=-35
UInt(35u)=35
UInt(-35u)=4294967261
UInt(-35d)=-35
SInt(-35u)=4294967261
Does it really matter whether I declare the value as signed or unsigned int? It seems C only cares about how I read the value back from memory. Please help me understand this; I hope you can prove me wrong.

Representation of signed integers is up to the underlying platform, not the C language itself. The language definition is mostly agnostic with regard to signed integer representations. Two's complement is probably the most common, but there are other representations such as one's complement and signed magnitude.
In a two's complement system, you negate a value by inverting the bits and adding 1. To get from 5 to -5, you'd do:
5 == 0101 => 1010 + 1 == 1011 == -5
To go from -5 back to 5, you follow the same procedure:
-5 == 1011 => 0100 + 1 == 0101 == 5
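As a quick sanity check, here is a minimal sketch (assuming a two's complement platform, which covers virtually all current hardware) showing that inverting the bits and adding 1 negates a value:
#include <stdio.h>

int main(void)
{
    int x = 5;
    /* On two's complement, ~x + 1 == -x (where -x is representable). */
    printf("%d\n", ~x + 1);    /* prints -5 */
    printf("%d\n", ~(-x) + 1); /* prints 5 */
    return 0;
}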
Does it really matter if I declare the value as signed or unsigned int?
Yes, for the following reasons:
1. It affects the values you can represent: unsigned integers can represent values from 0 to 2^N - 1, whereas signed integers can represent values between -2^(N-1) and 2^(N-1) - 1 (two's complement).
2. Overflow is well-defined for unsigned integers; UINT_MAX + 1 will "wrap" back to 0. Overflow is not well-defined for signed integers, and INT_MAX + 1 may "wrap" to INT_MIN, or it may not (see the sketch after this list).
3. Because of 1 and 2, it affects arithmetic results, especially if you mix signed and unsigned variables in the same expression (in which case the result may not be well-defined if there's an overflow).
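A minimal sketch of the well-defined unsigned wraparound (the signed counterpart is deliberately left out, since evaluating INT_MAX + 1 would be undefined behaviour):
#include <stdio.h>
#include <limits.h>

int main(void)
{
    unsigned int u = UINT_MAX;
    u = u + 1;         /* well-defined: wraps modulo UINT_MAX + 1 */
    printf("%u\n", u); /* prints 0 */
    return 0;
}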

An unsigned int and a signed int take up the same number of bytes in memory. They can store the same byte values. However the data will be treated differently depending on if it's signed or unsigned.
See http://en.wikipedia.org/wiki/Two%27s_complement for an explanation of the most common way to represent integer values.
Since you can cast in C, you can effectively force the compiler to treat an unsigned int as a signed int and vice versa, but beware that this doesn't mean it will do what you think, or that the resulting value will be meaningful. (Overflowing a signed integer invokes undefined behaviour in C.)
(As pointed out in comments, there are other ways to represent integers than two's complement, however two's complement is the most common way on desktop machines.)

Does it really matter if I declare the value as signed or unsigned int?
Yes.
For example, have a look at
#include <stdio.h>

int main()
{
    int a = -4;
    int b = -3;
    unsigned int c = -4;
    unsigned int d = -3;
    printf("%f\n%f\n%f\n%f\n", 1.0 * a / b, 1.0 * c / d, 1.0 * a / d, 1.0 * c / b);
}
and its output
1.333333
1.000000
-0.000000
-1431655764.000000
which clearly shows that it makes a huge difference if I have the same byte representation interpreted as signed or unsigned.

#include <stdio.h>

int main()
{
    int x = 35, y = -35;
    unsigned int z = 35, p = -35;
    signed int q = -35;
    printf("x=%d\tx=%u\ty=%d\ty=%u\tz=%d\tz=%u\tp=%d\tp=%u\tq=%d\tq=%u\t",
           x, x, y, y, z, z, p, p, q, q);
}
the result is:
x=35 x=35 y=-35 y=4294967261 z=35 z=35 p=-35 p=4294967261 q=-35 q=4294967261
The stored number is no different: it is kept in two's complement form in memory. In hexadecimal, 35 is 0x00000023 and -35 is 0xFFFFFFDD; the bits are the same whether you declare the variable signed or unsigned. Only the output style differs. For positive values, %d and %u print the same thing; for a negative pattern, the first bit acts as the sign. Printed with %u, 0xFFFFFFDD is 4294967261, but with %d the same pattern 0xFFFFFFDD is read as -0x00000023, which equals -35.
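For instance, a minimal sketch that prints the same object in hexadecimal (assuming 32-bit int; the cast keeps the %x conversion well-defined, since %x expects an unsigned argument):
#include <stdio.h>

int main(void)
{
    int y = -35;
    unsigned int p = -35;
    /* Both lines print 0xffffffdd: identical bit patterns. */
    printf("%#x\n", (unsigned int)y);
    printf("%#x\n", p);
    return 0;
}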

The most fundamental thing that a variable's type defines is the way it is stored (that is, read from and written to) in memory and how its bits are interpreted, so your statement can be considered "valid".
You can also look at the problem in terms of conversions. When you store a signed, negative value in an unsigned variable, it gets converted to unsigned. It so happens that this conversion is reversible: signed -35 converts to unsigned 4294967261, which, when you request it, can be converted back to signed -35. That's how two's complement encoding (see the link in the other answer) works.
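A minimal sketch of that round trip (note that converting a value above INT_MAX back to int is implementation-defined; on common two's complement platforms it yields -35 again):
#include <stdio.h>

int main(void)
{
    unsigned int u = -35; /* converted modulo UINT_MAX + 1: 4294967261 */
    int back = u;         /* implementation-defined; typically -35 again */
    printf("%u %d\n", u, back);
    return 0;
}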

Related

How does hexadecimal to %x work?

I am learning in C and I got a question regarding this conversion.
short int x = -0x52ea;
printf("%x", x);
output:
ffffad16
I would like to know how this conversion works because it's supposed to be on a test and we won't be able to use any compilers. Thank you
I would like to know how this conversion works
It is undefined behavior (UB)
short int x = -0x52ea;
0x52ea is a hexadecimal constant. Its value is 21,226 in decimal. It has type int, as it fits in an int even if int were only 16-bit. OP's int is evidently 32-bit.
- negates the value to -21,226.
The value is assigned to a short int which can encode -21,226, so no special issues with assigning this int to a short int.
printf("%x", x );
short int x is passed to a variadic (...) function, so it goes through the default argument promotions and becomes an int. So an int with the value -21,226 is passed.
"%x" used with printf(), expects an unsigned argument. Since the type passed is not an unsigned (and not an int with a non-negative value - See exception C11dr §6.5.2.2 6), the result is undefined behavior (UB). Apparently the UB on your machine was to print the hex pattern of a 32-bit 2's complement of -21,226 or FFFFAD16.
If the exam result is anything but UB, just smile and nod and realize the curriculum needs updating.
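If the goal is simply to print the short's bit pattern without invoking UB, one hedged option is the h length modifier, or an explicit cast to unsigned short (a small sketch):
#include <stdio.h>

int main(void)
{
    short int x = -0x52ea;
    printf("%hx\n", x);                /* value converted to unsigned short: ad16 */
    printf("%x\n", (unsigned short)x); /* promotes to a non-negative int: ad16 */
    return 0;
}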
The point here is that when a number is negative, its bits are structured in a different way.
1 in 16-bit hexadecimal is 0001; -1 is ffff. The most significant bit (8000) indicates a negative number (assuming a signed integer), and that's why a 16-bit signed value can only go as high as 32767 (7fff) and as low as -32768 (8000).
Basically, to transform a positive value into a negative one, you invert all the bits and add 1. 0001 inverted is fffe; plus 1 gives ffff.
This convention is called two's complement, and it's used because it makes arithmetic trivial to implement with bitwise operations.

Variable conversion mixup in C

I have the following C code and I have to understand why the result is a fairly big positive number:
int k;
unsigned int l;
float f;
f = 4; l = 1; k = -2;
printf("\n %f", (unsigned int)l + k + f);
The result is a very large number (around 4 billion, max for 32 bit integers), so I suspect it has something to do with the representation of signed negative integers (two's complement) that looks fairly big if we look at it as unsigned.
However, I don't quite understand why it behaves this way and what the float has to do with the behavior (if I remove it, the effect disappears). Could someone explain how this happens? What is the process, when adding the numbers, that leads to this result?
The problem is that when you add a signed int to an unsigned int, C converts the signed operand to unsigned before the addition, even if it is negative. Since k is negative, it gets re-interpreted as a large positive number. After that, f is added, but it is small in comparison to -2 re-interpreted as a large unsigned value.
Here is a short illustration of the problem:
int k = -2;
unsigned int l = 1;
printf("\n %u", l+k);
This prints 4294967295 on a 32-bit system, because -2 in two's complement representation is 0xFFFFFFFE, and 1 + 0xFFFFFFFE is 0xFFFFFFFF.
The answer is that when the compiler evaluates l+k, it converts k to unsigned int, which turns it into that big number. If you change the order of the operands to l+f+k, this behavior will not occur, as the sketch below shows.
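A minimal sketch contrasting the two orders (assuming 32-bit unsigned int and IEEE floats):
#include <stdio.h>

int main(void)
{
    int k = -2;
    unsigned int l = 1;
    float f = 4;
    printf("%f\n", l + k + f); /* (l+k) wraps to 4294967295, then f is added */
    printf("%f\n", l + f + k); /* (l+f) is the float 5.0, then k gives 3.0 */
    return 0;
}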
What happens is that the negative integer is converted to an unsigned integer by repeatedly adding one more than the maximum unsigned value (i.e. UINT_MAX + 1) until the value is representable. This is why you see such a large value: -2 + 4294967296 = 4294967294.
With two's complement, it's the same as truncating or sign extending the value and interpreting it as unsigned which is simple for a computer to do.

Printing declared char value in C

I understand that a char variable holds values from -128 to 127 (signed) or from 0 to 255 (unsigned).
char x;
x = 128;
printf("%d\n", x);
But how does it work? Why do I get -128 for x?
printf is a variadic function, only providing an exact type for the first argument.
That means the default argument promotions are applied to the following arguments, so all integers of rank less than int are promoted to int or unsigned int, and float values are promoted to double.
If your implementation has CHAR_BIT of 8, plain char is signed, and you have an obliging two's complement implementation, you thus get
128 (literal) to -128 (char/signed char) to -128 (int) printed as int => -128
If all the listed conditions but the obliging two's complement implementation are fulfilled, you get a signal or some implementation-defined value.
Otherwise, you get output of 128, because 128 fits in char / unsigned char.
Standard quote for case 2 (Thanks to Matt for unearthing the right reference):
6.3.1.3 Signed and unsigned integers
1 When a value with integer type is converted to another integer type other than _Bool, if
the value can be represented by the new type, it is unchanged.
2 Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or
subtracting one more than the maximum value that can be represented in the new type
until the value is in the range of the new type.60)
3 Otherwise, the new type is signed and the value cannot be represented in it; either the
result is implementation-defined or an implementation-defined signal is raised.
This all has nothing to do with variadic functions, default argument promotions etc.
Assuming your system has signed chars, then x = 128; is performing an out-of-range assignment. The behaviour of this is implementation-defined, meaning that the compiler may choose an action, but it must document what it does (and therefore do it reliably). This action is allowed to include raising a signal.
The usual behaviour that modern compilers do for out-of-range assignment is to truncate the representation of the value to fit in the destination type.
In binary representation, 128 is 000...010000000.
Truncating this into a signed char gives a signed char with the binary representation 10000000. In two's complement representation, which is used by all modern C systems for negative numbers, this is the representation of the value -128. (For historical curiosity: in one's complement this is -127, and in sign-magnitude it is -0, which may be a trap representation and thus raise a signal.)
Finally, printf accurately prints out this char's value of -128. The %d specifier works for char because of the default argument promotions and the facts that INT_MIN <= CHAR_MIN and INT_MAX >= CHAR_MAX; this behaviour is guaranteed except on systems that have plain char unsigned together with sizeof(int) == 1 (such systems exist, but you'd know about it if you were on one).
Lets look at the binary representation of 128 when stored into 8 bits:
1000 0000
And now let's look at the binary representation of -128 when stored into 8 bits:
1000 0000
Whether plain char is signed or unsigned is implementation-defined (it is not fixed by the C standard), and with your current setup it looks to be signed. So when you assign the value 128 to x, you are assigning it the bit pattern 1000 0000, and when you compile and print it, you get the signed interpretation of that binary representation, i.e. -128.
My environment behaves the same way, with char being a signed char. As expected, if I cast x to unsigned char then I get the expected output of 128:
#include <stdio.h>
#include <stdlib.h>
int main() {
    char x;
    x = 128;
    printf("%d %d\n", x, (unsigned char)x);
    return 0;
}
gives me the output of -128 128
Hope this helps!

why the result is "2" of unsigned int (1) - unsigned int (0xFFFFFFFF)

Please look at the following codes:
#include <stdlib.h>
#include <stdio.h>
int main()
{
    unsigned int a = 1;
    unsigned int b = -1;
    printf("0x%X\n", (a - b));
    return 0;
}
The result is 0x2.
I think integer promotion should not happen here, because both "a" and "b" have type unsigned int. But the result beats me; I don't know the reason.
By the way, I know the arithmetic result should be 2, because 1 - (-1) = 2. But the type of b is unsigned int. When (-1) is assigned to b, the value of b is actually 0xFFFFFFFF, the maximum value of unsigned int. When a big value is subtracted from a small unsigned one, the result is not what I expected.
From the answer below, I think wraparound (overflow) is a good explanation.
Now I have written another test; it confirms the wraparound explanation.
#include <stdlib.h>
#include <stdio.h>
int main()
{
    unsigned int c = 1;
    unsigned int d = -1;
    printf("0x%llx\n", (unsigned long long)c - (unsigned long long)d);
    return 0;
}
The result is "0xffffffff00000002", which is what I expected.
unsigned int a = 1;
This initializes a to 1. (Actually, since 1 is of type int, there's an implicit int-to-unsigned conversion, but it's a trivial conversion that doesn't change the value or representation.)
unsigned int b = -1;
This is more interesting. -1 is of type int, and the initialization implicitly converts it from int to unsigned int. Since the mathematical value -1 cannot be represented as an unsigned int, the conversion is non-trivial. The rule in this case is (quoting section 6.3.1.3 of the C standard):
the value is converted by repeatedly adding or subtracting one more
than the maximum value that can be represented in the new type until
the value is in the range of the new type.
Of course it doesn't actually have to do it that way, as long as the result is the same. Adding UINT_MAX+1 ("one more than the maximum value that can be represented in the new type") to -1 yields UINT_MAX. (That addition is defined mathematically; it's not itself subject to any type conversions.)
In fact, assigning -1 to an object of any unsigned type is a good way to get the maximum value of that type without having to refer to the *_MAX macros defined in <limits.h>.
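For instance (a small sketch of that idiom):
#include <stdio.h>

int main(void)
{
    unsigned int u = -1;   /* UINT_MAX, portably */
    unsigned long ul = -1; /* ULONG_MAX */
    printf("%u %lu\n", u, ul);
    return 0;
}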
So, assuming a typical 32-bit system, we have a == 1 and b == 0xFFFFFFFF.
printf("0x%X\n", (a-b));
The mathematical result of the subtraction is 1 - 0xFFFFFFFF = -0xFFFFFFFE, which is obviously outside the range of unsigned int. The rules for unsigned arithmetic are similar to the rules for unsigned conversion: adding UINT_MAX + 1 (i.e. 0x100000000) to -0xFFFFFFFE brings it into range, so the result is 2.
Who says you're suffering integer promotion? Let's pretend that your integers are two's complement (likely, though not mandated by the standard) and they're only four bits wide (not possible according to the standard, I'm just using this to simplify things).
 int   unsigned-int   bit-pattern
 ---   ------------   -----------
   1         1            0001
  -1        15            1111
                         ------
(subtract with borrow)  1 0010
(only have four bits)     0010 -> 2
You can see here that you can get the result you see without any promotion to signed or wider types.
There should be a compiler warning that you probably ignored or turned off, but it's still possible to store -1 in an unsigned integer. Internally, -1 is stored on a 32-bit machine as 0xffffffff. So if you subtract 0xffffffff from 1, you mathematically end up with -0xfffffffe, which wraps modulo 2^32 to 2. (There are no negative numbers in unsigned arithmetic; a negative value is represented as the maximum integer value plus one, minus the magnitude of the number.)
Bottom line: for addition and subtraction on a two's complement machine, the bit-level result is the same whether the operands are signed or unsigned; signedness really comes into play when you compare, divide, shift, or convert values.
And mathematically speaking, 1 - (-1) = 1+1.
If you subtract a negative number, it is the equivalent of adding a positive number.
a = 1
b = -1
(a-b) = ?
((1)-(-1)) = ?
(1-(-1)) = ?
(1+1) = ?
2 = ?
At first you might think that this isn't allowed, since you specified an unsigned int; however, the signed int constant -1 is converted to an unsigned int, so you are effectively storing the bit pattern 0xFFFFFFFF into b.
Then, in the expression, a - b is evaluated entirely in unsigned (modulo 2^32) arithmetic: 1 - 0xFFFFFFFF wraps around to 2. In effect, you are circumventing the unsigned directive at every step.

Why do unsigned int x = -1 and int y = ~0 have the same binary representation?

In the following code segment what will be:
the result of function
value of x
value of y
{
    unsigned int x = -1;
    int y;
    y = ~0;
    if (x == y)
        printf("same");
    else
        printf("not same");
}
a. same, MAXINT, -1
b. not same, MAXINT, -MAXINT
c. same, MAXUINT, -1
d. same, MAXUINT, MAXUINT
e. not same, MAXINT, MAXUINT
Can someone explain how this works, or just explain the snippet?
I know it's about two's complement etc.
What is the significance of MAXINT and -1?
It is because of the unsigned int vs int difference; am I right?
unsigned int x=-1;
1 is an integer literal and has type int (because it fits in an int). Unary - applied to an int causes no further promotion, so -1 is an int with value -1.
When it is converted to unsigned int, modulo 2^N arithmetic is used, where N is the number of value bits in an unsigned int. x has the value 2^N - 1, which is UINT_MAX (the MAXUINT of the question's options).
int y;
y = ~0;
Again, 0 has type int; in C, all allowed representations of int must represent the value 0 with all value bits 0. Again, no promotion happens for unary ~, so ~0 is an int with all value bits set to 1. Its exact value is implementation-dependent, but it is negative (the sign bit will be set), so it is definitely neither UINT_MAX nor INT_MAX. This value is stored in y unchanged.
if(x == y)
printf("same");
else
printf("not same");
In this comparison, y will be converted to unsigned int in order to be compared with x, which is already an unsigned int. As y has an implementation-dependent value, the value after conversion to unsigned int is still implementation-dependent (although the conversion itself is modulo 2^N and fully specified). The result of the comparison is therefore implementation-dependent.
So in conclusion:
implementation defined, UINT_MAX, implementation defined
In practice on ones' complement:
not same, UINT_MAX, -0 (aka 0)
sign plus magnitude:
not same, UINT_MAX, INT_MIN
two's complement:
same, UINT_MAX, -1
It's pretty easy. The two's complement representation of -1 is 0xFFFFFFFF; hence that is what x contains.
The complement operator (~) flips all the bits, so the complement of 0 is a 32-bit number with all bits set to 1, i.e. 0xFFFFFFFF.
Edit: As pointed out in the comments, the answer is not a. If it were, it would be saying that 0x7FFFFFFF and 0xFFFFFFFF are the same, and they're not. The real answer is c (assuming MAXUNIT is a typo for MAXUINT ;)).
If you run this program, you will see that answer a is wrong and c is the correct answer:
#include <stdio.h>
#include <limits.h>
int main() {
    unsigned int x = -1;
    int y;
    y = ~0;
    if (x == y)
        printf("same\n");
    if (x == INT_MAX)
        printf("INT_MAX\n");
    if (x == UINT_MAX)
        printf("UINT_MAX\n");
    else
        printf("not same");
    return 0;
}
It's because of two's complement.
The flipping of the bits results in a bit pattern that matches -1
In the first case you put -1 into an unsigned int, and from the unsigned int point of view it is 0xFFFFFFFF (this is how negative integers are stored in a computer). In the second case, the bitwise NOT does not look at the type at all; it turns 0 bits into 1 and vice versa, so all the zero bits of 0 become 1 and you again obtain 0xFFFFFFFF.
The next question is: why does comparing an unsigned integer with a signed integer not distinguish them? (Numerically, 4294967295 is of course not equal to -1, even though their representation in a computer is the same.) It could, but C does not mandate such a distinction, and that is "natural": processors generally cannot make this distinction on their own (this may not always be true, but it is for most processors); to take it into account in assembly, you would have to add extra code.
From the C point of view, you have to decide whether to convert the int to unsigned int, or the unsigned int to int. But a negative number cannot be represented as an unsigned one, of course, and on the other hand, a large unsigned number could overflow a signed type (e.g. to hold 4294967295 and still be able to compute its negative, you would need a 64-bit register, or a 33-bit one!).
So the most natural thing is to avoid strange conversions and permit a "CPU-like" comparison, which in this case compares 0xFFFFFFFF (-1 on 32 bits) with 0xFFFFFFFF (~0 on 32 bits), and they are the same. More generally, one can be considered MAXUINT (the maximum unsigned integer that can be held) and the other -1. (Take a look at your machine's limits.h header to check it.) The same conversion rule shows up in a classic one-liner, sketched below.
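A minimal sketch of that rule (the conversion of -1 to unsigned int is fully defined by the standard, which is exactly what makes the result surprising):
#include <stdio.h>

int main(void)
{
    /* -1 is converted to unsigned int (UINT_MAX) before the comparison */
    printf("%d\n", -1 > 0u); /* prints 1 */
    return 0;
}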
