Why is unsigned treated as signed? - c

I know there was a very similar question already answered, but I believe it doesn't address my problem.
unsigned char aaa = -10;
unsigned int bbb = (unsigned int)-5;
unsigned int ccc = (unsigned int)20 + (unsigned int)bbb;
printf("%d\n", aaa);
printf("%d\n", ccc);
The code above prints aaa = 246 (which is what I would expect), but ccc = 15, which suggests the unsigned int was treated as signed all the way through. I can't find an explanation for this, even after trying what is probably redundant casting.

unsigned char aaa = -10;
The int value -10 will be converted to unsigned char by repeatedly adding UCHAR_MAX + 1 until the result is in the range [0, UCHAR_MAX]. Most probably char has 8 bits on your system, which means UCHAR_MAX = 2**8 - 1 = 255. So -10 + UCHAR_MAX + 1 is 246, which is in range. So aaa = 246.
unsigned int bbb = (unsigned int)-5;
UINT_MAX + 1 is added to the -5; assuming int has 32 bits, this results in bbb = 4294967291.
unsigned int ccc = (unsigned int)20 + (unsigned int)bbb;
Unsigned integer overflow "wraps around". So 20 + 4294967291 = 4294967311 is greater than UINT_MAX = 2**32 - 1 = 4294967295, so we subtract UINT_MAX + 1 until the result is in the range [0, UINT_MAX]. 4294967311 - (UINT_MAX + 1) = 15.
printf("%d\n", aaa);
The code is most probably fine. Most probably, on your platform unsigned char is promoted to int before being passed as a variadic function argument. For reference about promotions, you can read cppreference's page on implicit conversions. Because %d expects an int and unsigned char is promoted to int, the code is fine. [1]
printf("%d\n", ccc);
This line results in undefined behavior. ccc has the type unsigned int, while the %d printf format specifier expects a signed int. Because your platform uses two's complement to represent numbers, printf just interprets the bits as a signed value, which is 15 anyway.
[1]: There is a theoretical possibility of unsigned char having as many bits as int, in which case unsigned char would be promoted to unsigned int instead of int, and that line would be undefined behavior too.
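For completeness, here is a minimal sketch of the same program with matching format specifiers (assuming a 32-bit unsigned int); it prints the values derived above without relying on the signed reinterpretation:

#include <stdio.h>

int main(void)
{
    unsigned char aaa = -10;              /* converted to 246 */
    unsigned int bbb = (unsigned int)-5;  /* converted to 4294967291 */
    unsigned int ccc = 20 + bbb;          /* wraps around to 15 */

    printf("%d\n", aaa);  /* aaa is promoted to int, so %d is fine: 246 */
    printf("%u\n", bbb);  /* 4294967291 */
    printf("%u\n", ccc);  /* 15 */
    return 0;
}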

C variable types
C is a quite low-level programming language, and it does not preserve variable types after compilation. For example, if one had a signed and an unsigned variable of equal size (say int64_t and uint64_t), after compilation each of them would be represented as just an 8-byte piece of memory. All addition/subtraction is performed modulo the corresponding power of 2 (2^64 for 64-bit variables and 2^32 for 32-bit ones).
The main difference that remains after compilation is in comparison. For example, (unsigned) -5 > 1, but -5 < 1. I will explain why below.
Your -5 will be stored modulo 2^32, just like every 32-bit value; -5 will be represented as 0xfffffffb in actual RAM. The rule is simple: if a variable is signed, its first bit is called the sign bit and indicates whether the value is positive or negative. The first bit of 0xfffffffb is 1, so in a signed comparison the value is negative and is less than 1. But when compared as an unsigned integer, this value is actually a huge one, 2^32 - 5. So, in general, the unsigned representation of a negative signed number is greater than the signed one by 2^N, where N is the number of bits in the type. You can read more about this binary arithmetic here.
So all that happened is that you got an unsigned number equal to 0xfffffffb + 0x14 modulo 2^32, which is 0x10000000f (mod 2^32), and that is 0xf = 15.
In conclusion, "%d" assumes its argument is signed, but that's not the main reason you got this result. In any case, I advise using %u for unsigned numbers.
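As a quick illustration of that comparison difference (a sketch, assuming 32-bit int): the same bit pattern 0xfffffffb orders differently depending on the declared type.

#include <stdio.h>

int main(void)
{
    int s = -5;
    unsigned int u = (unsigned int)-5;  /* 0xfffffffb = 4294967291 */

    printf("%d\n", s < 1);    /* 1: signed comparison, -5 < 1 */
    printf("%d\n", u > 1);    /* 1: unsigned comparison, 4294967291 > 1 */
    printf("%u\n", u + 0x14); /* 15: 0xfffffffb + 0x14 modulo 2^32 */
    return 0;
}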

Related

Converting non-ASCII characters to int in C, the extra bits are filled with 1 rather than 0

When coding in C, I accidentally found that for non-ASCII characters, after they are converted from char (1 byte) to int (4 bytes), the extra bits (3 bytes) are filled with 1 rather than 0. (For ASCII characters, the extra bits are filled with 0.) For example:
char c[] = "ā";
int i = c[0];
printf("%x\n", i);
And the result is ffffffc4, rather than c4 itself. (The UTF-8 code for ā is \xc4\x81.)
Another related issue is that when performing right-shift operations (>>) on a non-ASCII character, the extra bits on the left end are also filled with 1 rather than 0, even though the char variable is explicitly converted to unsigned int (for a signed int, the extra bits are filled with 1 on my OS). For example:
char c[] = "ā";
unsigned int u_c;
int i = c[0];
unsigned int u_i = c[0];
c[0] = (unsigned int)c[0] >> 1;
u_c = (unsigned int)c[0] >> 1;
i = i >> 1;
u_i = u_i >> 1;
printf("c=%x\n", (unsigned int)c[0]); // result: ffffffe2. The same with the signed int i.
printf("u_c=%x\n", u_c); // result: 7fffffe2.
printf("i=%x\n", i); // result: ffffffe2.
printf("u_i=%x\n", u_i); // result: 7fffffe2.
Now I am confused by these results... Are they related to the representations of char, int and unsigned int, to my operating system (Ubuntu 14.04), or to the ANSI C requirements? I have tried compiling this program with both gcc (4.8.4) and clang (3.4), and there is no difference.
Thank you so much!
It is implementation-defined whether char is signed or unsigned. On x86 computers, char is customarily a signed integer type; and on ARM it is customarily an unsigned integer type.
A signed integer will be sign-extended when converted to a larger signed type;
a signed integer converted to unsigned integer will use the modulo arithmetic to wrap the signed value into the range of the unsigned type as if by repeatedly adding or subtracting the maximum value of the unsigned type + 1.
The solution is to use/cast to unsigned char if you want the value to be portably zero-extended, or for storing small integers in the range 0..255.
Likewise, if you want to store signed integers in the range -127..127 (-128..127 on most platforms), use signed char.
Use char if the signedness doesn't matter - the implementation will probably have chosen the type that is the most efficient for the platform.
Likewise, for the assignment
unsigned int u_c; u_c = (uint16_t)c[0];
Since -0x3c, i.e. -60, is not in the range of uint16_t, the actual value is the value (mod UINT16_MAX + 1) that falls in the range of uint16_t; in other words, we add or subtract UINT16_MAX + 1 (notice that the integer promotions can play tricks here, so you might need casts in actual C code) until the value is in range. UINT16_MAX is naturally always 0xFFFF; add 1 to it to get 0x10000. 0x10000 - 0x3C is the 0xFFC4 that you saw. The uint16_t value is then zero-extended to the uint32_t value.
Had you run this on a platform where char is unsigned, the result would have been 0xC4!
BTW in i = i >> 1;, i is a signed integer with a negative value; C11 says that the value is implementation-defined, so the actual behaviour can change from compiler to compiler. The GCC manuals state that
Signed >> acts on negative numbers by sign extension.
However a strictly-conforming program should not rely on this.
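Putting the advice into practice, here is a sketch of the question's first snippet with the cast to unsigned char (assuming the same UTF-8 source encoding); the value is zero-extended and the shift brings in zeros:

#include <stdio.h>

int main(void)
{
    char c[] = "ā";  /* UTF-8: 0xC4 0x81 */

    int i = (unsigned char)c[0];          /* zero-extended: 0xC4 */
    unsigned int u_i = (unsigned char)c[0] >> 1;

    printf("%x\n", i);    /* c4 */
    printf("%x\n", u_i);  /* 62: zeros shifted in from the left */
    return 0;
}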

Output of a short int and an unsigned short?

So I'm trying to interpret the following output:
short int v = -12345;
unsigned short uv = (unsigned short) v;
printf("v = %d, uv = %u\n", v, uv);
Output:
v = -12345, uv = 53191
So the question is: why is this exact output generated when this program is run on a two's complement machine?
What operations lead to this result when casting the value to unsigned short?
My answer assumes 16-bit two's complement arithmetic.
To find the value of -12345, take 12345, complement it, and add 1.
12345 is 0x3039 is 0011000000111001.
Complementing means changing all the 1's to 0's and all the 0's to 1's:
1100111111000110 is 0xcfc6 is 53190.
Add one: 53191.
So internally, -12345 is represented by 0xcfc7 = 53191.
But if you interpret it as an unsigned number, it's obviously just 53191. (And when you assign a signed value to an unsigned integer of the same size, what typically ends up happening is that you assign the exact bit pattern, without converting anything. Later, however, you will typically interpret that value differently, such as when you print it with %u.)
Another, perhaps easier way to think about this is that 16-bit arithmetic "wraps around" at 2^16 = 65536. So you can think of 65536 as being another name for 0 (just like 0:00 and 24:00 are both names for midnight). So -12345 is 65536 - 12345 = 53191.
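Both views give the same number, which a short sketch can confirm (assuming a 16-bit unsigned short):

#include <stdio.h>

int main(void)
{
    short v = -12345;
    unsigned short uv = (unsigned short)v;

    printf("%u\n", uv);             /* 53191 */
    printf("%d\n", 65536 - 12345);  /* 53191: the wrap-around view */
    printf("%x\n", uv);             /* cfc7: the bit-pattern view */
    return 0;
}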
The conversion rules defined by the C standard require that, when converting a signed integer to an unsigned type, one more than the maximum value of the new type is repeatedly added or subtracted until the value is in range.
From 6.3.1.3 Signed and unsigned integers:
Otherwise, if the new type is unsigned, the value is converted by
repeatedly adding or subtracting one more than the maximum value that
can be represented in the new type until the value is in the range of
the new type.
With USHRT_MAX being 65535, adding 65535 + 1 to -12345 gives 53191.
The output seen does not depend on 2's complement, nor on 16- or 32-bit int. The output seen is entirely defined and would be the same on a rare 1's complement or sign-magnitude machine. The result does depend on unsigned short being 16 bits.
-12345 is within the minimum range of a short, so there are no issues with that assignment. When a short is passed as a ... argument, it goes through the usual promotion to an int with no change in value, as all short values are in the range of int. "%d" expects an int, so the output is "-12345".
short int v = -12345;
printf("%d\n", v); // output "-12345\n"
Assigning a negative number to an unsigned type is well defined. With a 16-bit unsigned short, the value of uv is -12345 plus the minimum multiple of USHRT_MAX+1 (65536 in this case) needed to bring it into range, for a final value of 53191.
Passing an unsigned short as a ... argument, the value is converted to int or unsigned, whichever type contains the entire range of unsigned short. In any case, the value does not change. "%u" matches an unsigned. It also matches an int whose value is expressible as either an int or an unsigned.
short int v = -12345;
unsigned short uv = (unsigned short) v;
printf("%u\n", v); // output "53191\n"
What operations lead to this result when casting the value to unsigned short?
The cast did not affect the final outcome; the same result would have occurred without it. The cast may be useful to quiet warnings.
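A sketch of that last point: the implicit conversion yields the same value as the explicit cast (the cast merely silences a possible warning).

#include <stdio.h>

int main(void)
{
    short v = -12345;
    unsigned short with_cast = (unsigned short)v;
    unsigned short without_cast = v;  /* same well-defined conversion */

    printf("%u %u\n", with_cast, without_cast);  /* 53191 53191 */
    return 0;
}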

What's wrong with this C code?

My sourcecode:
#include <stdio.h>

int main()
{
    char myArray[150];
    int n = sizeof(myArray);
    for (int i = 0; i < n; i++)
    {
        myArray[i] = i + 1;
        printf("%d\n", myArray[i]);
    }
    return 0;
}
I'm using Ubuntu 14 and gcc to compile it, what it prints out is:
1
2
3
...
125
126
127
-128
-127
-126
-125
...
Why doesn't it just count up to 150?
The int value of a char can range from 0 to 255 or from -128 to 127, depending on the implementation.
Therefore, once the value goes past 127 in your case, it wraps around and you get negative values as output.
The signedness of a plain char is implementation defined.
In your case, a char is a signed char, which can hold values in the range -128 to +127.
As you increment i beyond the limit that a signed char can hold and assign it to myArray[i], you run into implementation-defined behaviour.
To quote C11, chapter §6.3.1.4,
Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
Because a char is a SIGNED BYTE. That means its value range is -128 -> 127.
EDIT: Due to all the comments below suggesting this is wrong / not the issue / about signedness / what not...
Running this code:
char a, b;
unsigned char c, d;
int si, ui, t;
t = 200;
a = b = t;
c = d = t;
si = a + b;
ui = c + d;
printf("Signed:%d | Unsigned:%d", si, ui);
Prints: Signed:-112 | Unsigned:400
Try yourself
The reason is the same. a and b are signed chars (signed byte-sized variables, 8 bits); c and d are unsigned. Assigning 200 to the signed variables wraps them around and they end up with the value -56. In memory, a, b, c and d all hold the same bit pattern, but when used, each type's signedness dictates how the value is interpreted, and in this case it makes a big difference.
Note about standard
It has been noted (in the comments to this answer, as well as in other answers) that the standard doesn't mandate that char is signed. That is true. However, in the case presented by the OP, as well as in the code above, char IS signed.
It seems that your compiler by default treats type char as type signed char. In this case CHAR_MIN equals SCHAR_MIN, which is -128, while CHAR_MAX equals SCHAR_MAX, which is 127 (see header <limits.h>).
According to the C Standard (6.2.5 Types)
15 The three types char, signed char, and unsigned char are
collectively called the character types. The implementation shall
define char to have the same range, representation, and behavior as
either signed char or unsigned char
For signed types one bit is used as the sign bit. So for the type signed char the maximum value corresponds to the following representation in hexadecimal notation
0x7F
and is equal to 127. The most significant bit is the sign bit and is equal to 0.
For negative values the sign bit is set to 1; for example, -128 is represented as
0x80
When the value stored in the char in your program reaches its positive maximum 0x7F and is increased, it becomes 0x80, which in decimal notation is equal to -128.
You should explicitly use type unsigned char instead of char if you want the result of the program's execution not to depend on compiler settings.
Or in the printf statement you could explicitly cast type char to type unsigned char. For example
printf("%d\n", ( unsigned char )myArray[i]);
Or to compare results you could write in the loop
printf("%d %d\n", myArray[i], ( unsigned char )myArray[i]);

How does C store negative numbers in signed vs unsigned integers?

Here is the example:
#include <stdio.h>

int main()
{
    int x = 35;
    int y = -35;
    unsigned int z = 35;
    unsigned int p = -35;
    signed int q = -35;

    printf("Int(35d)=%d\n\
Int(-35d)=%d\n\
UInt(35u)=%u\n\
UInt(-35u)=%u\n\
UInt(-35d)=%d\n\
SInt(-35u)=%u\n", x, y, z, p, p, q);

    return 0;
}
Output:
Int(35d)=35
Int(-35d)=-35
UInt(35u)=35
UInt(-35u)=4294967261
UInt(-35d)=-35
SInt(-35u)=4294967261
Does it really matter whether I declare the value as signed or unsigned int? Because C actually only cares about how I read the value from memory. Please help me understand this; I hope you can prove me wrong.
Representation of signed integers is up to the underlying platform, not the C language itself. The language definition is mostly agnostic with regard to signed integer representations. Two's complement is probably the most common, but there are other representations such as one's complement and signed magnitude.
In a two's complement system, you negate a value by inverting the bits and adding 1. To get from 5 to -5, you'd do:
5 == 0101 => 1010 + 1 == 1011 == -5
To go from -5 back to 5, you follow the same procedure:
-5 == 1011 => 0100 + 1 == 0101 == 5
Does it really matter if I declare the value as signed or unsigned int?
Yes, for the following reasons:
It affects the values you can represent: unsigned integers can represent values from 0 to 2^N - 1, whereas signed integers can represent values between -2^(N-1) and 2^(N-1) - 1 (two's complement).
Overflow is well-defined for unsigned integers; UINT_MAX + 1 will "wrap" back to 0. Overflow is not well-defined for signed integers, and INT_MAX + 1 may "wrap" to INT_MIN, or it may not.
Because of 1 and 2, it affects arithmetic results, especially if you mix signed and unsigned variables in the same expression (in which case the result may not be well defined if there's an overflow; see the sketch below).
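Here is the sketch promised in point 3 (assuming 32-bit int): mixing a negative int with an unsigned operand converts the signed value to unsigned before the comparison.

#include <stdio.h>

int main(void)
{
    int s = -1;
    unsigned int u = 1;

    /* s is converted to unsigned int (4294967295), so the test succeeds */
    if (s > u)
        printf("-1 > 1u under the usual arithmetic conversions\n");
    return 0;
}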
An unsigned int and a signed int take up the same number of bytes in memory. They can store the same byte values. However, the data will be treated differently depending on whether it's signed or unsigned.
See http://en.wikipedia.org/wiki/Two%27s_complement for an explanation of the most common way to represent integer values.
Since you can typecast in C you can effectively force the compiler to treat an unsigned int as signed int and vice versa, but beware that it doesn't mean it will do what you think or that the representation will be correct. (Overflowing a signed integer invokes undefined behaviour in C).
(As pointed out in comments, there are other ways to represent integers than two's complement, however two's complement is the most common way on desktop machines.)
Does it really matter if I declare the value as signed or unsigned int?
Yes.
For example, have a look at
#include <stdio.h>

int main()
{
    int a = -4;
    int b = -3;
    unsigned int c = -4;
    unsigned int d = -3;

    printf("%f\n%f\n%f\n%f\n", 1.0 * a / b, 1.0 * c / d, 1.0 * a / d, 1. * c / b);
}
and its output
1.333333
1.000000
-0.000000
-1431655764.000000
which clearly shows that it makes a huge difference if I have the same byte representation interpreted as signed or unsigned.
#include <stdio.h>

int main()
{
    int x = 35, y = -35;
    unsigned int z = 35, p = -35;
    signed int q = -35;

    printf("x=%d\tx=%u\ty=%d\ty=%u\tz=%d\tz=%u\tp=%d\tp=%u\tq=%d\tq=%u\t", x, x, y, y, z, z, p, p, q, q);
}
the result is:
x=35 x=35 y=-35 y=4294967261 z=35 z=35 p=-35 p=4294967261 q=-35 q=4294967261
The way the int is stored is no different: it is kept in two's complement form in memory.
Writing the values in hex, 35 is stored as 0x00000023 and -35 as 0xFFFFFFDD, whether you declare the variable signed or unsigned; only the output style differs. %d and %u make no difference for positive values, but for negative values the most significant bit is the sign bit: printed with %u, 0xFFFFFFDD equals 4294967261, while with %d the same 0xFFFFFFDD is read as -0x00000023, which equals -35.
The most fundamental thing that a variable's type defines is the way it is stored in (that is, read from and written to) memory and how the bits are interpreted, so your statement can be considered "valid".
You can also look at the problem in terms of conversions. When you store a signed, negative value in an unsigned variable, it gets converted to unsigned. It so happens that this conversion is reversible, so signed -35 converts to unsigned 4294967261, which, when you request it, can be converted back to signed -35. That's how 2's complement encoding (see the link in the other answer) works.
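That round trip can be demonstrated directly (a sketch; converting an out-of-range unsigned value back to int is implementation-defined, but yields -35 on common two's complement platforms):

#include <stdio.h>

int main(void)
{
    unsigned int p = -35;  /* well-defined conversion: 4294967261 */
    int back = (int)p;     /* implementation-defined, typically -35 */

    printf("%u\n", p);     /* 4294967261 */
    printf("%d\n", back);  /* -35 */
    return 0;
}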

unsigned char rotate

I'm a little bit confused as to what an unsigned char is. A signed char is the representation of the char in bit form, right? A sample problem has us rotating the bits of an unsigned char to the right by n bit positions, with this solution:
unsigned char rotate(unsigned char x, int n) {
unsigned char temp = x << (8 - n);
x = x >> n;
return (x | temp);
}
If anyone could explain with char examples and their respective bits, it would be greatly appreciated. Thanks so much.
signed char, char and unsigned char are all integer types. For the sake of simplicity I'll assume that CHAR_BIT is 8 and that signed types are 2's complement. So:
signed char is a number from -128 to +127
unsigned char is a number from 0 to 255
char is either the same range as signed char, or the same range as unsigned char, depending on your C implementation.
As far as C is concerned, a character is just a number within the range of the char type (although various character functions like tolower require the value to be cast to an unsigned type on the way in, even if char is signed).
So, signed char and unsigned char are both representation of the character in bit form. For numbers in the range 0 to +127 they both use the same representation (there's only one way to represent positive numbers in binary). For numbers outside that range, the signed representation of a negative number n is the same bits as the unsigned representation of n + 256 (definition of 2's complement).
The reason this code uses unsigned char is that right-shift with a negative signed value has implementation-defined result. Left shift with a negative signed value has undefined behavior. Usually left-shift behaves the same as for unsigned values, which is OK, but right shift inserts bits at the left-hand-side with value 1, a so-called "arithmetic shift", which isn't what's wanted here. Unsigned values always shift in zeros, and it's the shifting in of zero that lets this code build the two parts of the rotated result and or them together.
So, assuming an input value of x = 254 (11111110), and n = 1, we get:
x << 7 is 0111111100000000
x >> 1 is 01111111
| is 0111111101111111
convert to unsigned char to return is 01111111
If we used a signed type instead of unsigned char, we'd quite possibly get:
x is -2 11111110
x << 7 is 11111111111111111111111100000000 (assuming 32-bit int, since
smaller types are always promoted to int for arithmetic ops)
x >> 1 is implementation-defined, possibly
11111111111111111111111111111111
| is 11111111111111111111111111111111
convert to signed char to return is -1
So the bit-manipulation with the unsigned char results in the correct answer, rotated by 1 bit to move the 0 from the end to the start. Bit-manipulation with the signed char, probably gives the wrong result, might give the right result if negative signed values do a logical right shift, but on really unusual implementations could do anything.
Pretty much always for bit-manipulation tasks like rotate, you want to use unsigned types. It removes the implementation-dependence (other than on the width of the type), and avoids you having to reason about negative and non-negative values separately.
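For reference, a small usage sketch of the rotate function from the question, checked against the bit patterns discussed in this thread:

#include <stdio.h>

unsigned char rotate(unsigned char x, int n)
{
    unsigned char temp = x << (8 - n);  /* the bits that fall off the right end */
    x = x >> n;                         /* logical shift: zeros come in from the left */
    return x | temp;
}

int main(void)
{
    printf("%x\n", rotate(0xFE, 1));  /* 7f: 11111110 -> 01111111 */
    printf("%x\n", rotate(0xD1, 3));  /* 3a: 11010001 -> 00111010 */
    return 0;
}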
Declaring a variable as unsigned char tells the compiler to treat the underlying bit pattern as a number from 0 (00000000) to 255 (11111111). Declaring it a signed char (which is what a plain char is on many platforms) tells the compiler to apply two's complement and treat it as a number from -128 (10000000) to 127 (01111111).
Consider a 3-bit number. If it is unsigned, you have:
000 = 0
001 = 1
010 = 2
011 = 3
100 = 4
101 = 5
110 = 6
111 = 7
If it is signed you have:
100 = -4
101 = -3
110 = -2
111 = -1
000 = 0
001 = 1
010 = 2
011 = 3
What is neat with respect to arithmetic (as that link mentions) is that you don't have to treat signed binary numbers differently than unsigned ones. You just do the actual binary math without regard to signed or unsigned. But you do have to apply the signed/unsigned interpretation to the inputs and to the output.
In the signed realm you might have:
2 + (-3) = 010 + 101 = 111 = -1
But in the unsigned realm this is:
2 + 5 = 010 + 101 = 111 = 7
So it's all a matter of interpretation, since the actual bit patterns being added and the bit pattern of the sum are the same in both cases.
An unsigned char is just an 8-bit integer type that can take values between 0 and 255, and a signed char can take values between -128 and 127. In the actual machine code there is no real difference, except one: when you do a right shift on a signed type using >> (be it char, short or int), it is carried out as an arithmetic shift, meaning that for negative values (which have a 1 as MSB) a 1 is shifted in instead of a 0, and the above code will not work as expected.
EDIT: Your code example above, rotating by 3 bits, applied to unsigned and signed values:
00110101 rotated unsigned and signed is 10100110.
but for a number with a 1 in front you get an arithmetic shift, and thus
11010001 rotated unsigned is 00111010.
11010001 rotated signed is 11111010.
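A sketch that makes the difference visible (the signed result is implementation-defined, but an arithmetic shift is what you will typically observe):

#include <stdio.h>

int main(void)
{
    unsigned char u = 0xD1;  /* 11010001 */
    signed char s = -47;     /* same bit pattern, 11010001, on 2's complement */

    printf("%x\n", (unsigned char)(u >> 3));  /* 1a: 00011010, zeros shifted in */
    printf("%x\n", (unsigned char)(s >> 3));  /* fa: 11111010, ones shifted in */
    return 0;
}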
