Converting unsigned char to signed int - c

Consider the following program
#include <stdio.h>

int main(void)
{
    char t = 179;
    printf("%d ", t);
    return 0;
}
Output is -77.
But the binary representation of 179 is
10110011
So shouldn't the output be -51, considering the 1st bit is the sign bit?
The binary representation of -77 is
11001101
It seems the bit order is reversed. What's going on? Can somebody please advise?

You claimed that the binary representation of -77 is 11001101, but the reversal is yours. The binary representation of -77 is 10110011.
Binary 10110011 unsigned is decimal 179.
Binary 10110011 signed is decimal -77.
You assigned the out-of-range value 179 to a signed char. Strictly speaking, the result of that conversion is implementation-defined (an implementation-defined signal may even be raised), but it would be a very poor compiler that placed anything but that 8-bit value in the signed char.
But when printed, it's interpreted as a negative number because bit 7, the sign bit, is set.
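To see both interpretations side by side, here is a minimal sketch (assuming 8-bit chars and two's complement, as on virtually all mainstream platforms; the result of the signed-char cast is strictly implementation-defined):

#include <stdio.h>

int main(void)
{
    unsigned char u = 0xB3;          /* 10110011 = 179 when read unsigned */
    signed char s = (signed char)u;  /* implementation-defined; -77 on
                                        two's complement machines */
    printf("unsigned: %d\n", u);     /* prints 179 */
    printf("signed:   %d\n", s);     /* prints -77 on typical systems */
    return 0;
}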

Looks like char is a signed type on your system. The valid range for a char would then be [-128, 127].
By using
char t = 179;
the compiler converts 179 to a value in that range (most likely 179 - 256 = -77 on a two's complement machine) and assigns that value to t.

To convert between a positive and a negative number in 2's complement you invert all the bits and then you add 1.
10110011
01001100 (invert all bits)
01001101 (add one)
That is decimal 77, so the original pattern represents -77.
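The same negation rule written out in C, as a sketch (assuming two's complement; ~x inverts all bits after x is promoted to int):

#include <stdio.h>

int main(void)
{
    signed char x = -77;     /* bit pattern 10110011 on an 8-bit two's complement char */
    printf("%d\n", ~x + 1);  /* invert all bits, then add one: prints 77 */
    return 0;
}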

Whether char is signed or unsigned is up to your compiler.
If signed, the representation could be two's complement, ones' complement, or sign-and-magnitude.
Furthermore, char could be larger than 8 bits, although sizeof(char) is defined to be 1.
So it's inadvisable to rely on a specific representation.
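If you want to know what your particular implementation does, <limits.h> answers these questions directly; a minimal probe:

#include <stdio.h>
#include <limits.h>

int main(void)
{
    printf("CHAR_BIT = %d\n", CHAR_BIT);  /* bits per char, at least 8 */
    printf("CHAR_MIN = %d\n", CHAR_MIN);  /* 0 if plain char is unsigned */
    printf("CHAR_MAX = %d\n", CHAR_MAX);  /* 255 if plain char is unsigned */
    return 0;
}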

Please note that on many systems char is signed. Hence, when you assign 179 (which is of type int) to a char, this value is outside the range of char, and the result of the conversion is implementation-defined.
6.3.1.3 Signed and unsigned integers:
When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged. Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type. Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
If you change the type to unsigned char your program will perform correctly.
Also note that char, signed char and unsigned char are 3 distinct types, unlike int and signed int. The signedness of char is implementation-defined.
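For reference, a sketch of the corrected program with unsigned char:

#include <stdio.h>

int main(void)
{
    unsigned char t = 179;  /* 179 fits in [0, 255], so no conversion surprise */
    printf("%d\n", t);      /* prints 179 */
    return 0;
}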

Related

What does "if the value can be represented by the new type" mean?

In the C11 standard:
6.3.1.3 Signed and unsigned integers
1 When a value with integer type is converted to another integer type
other than _Bool, if the value can be represented by the new type, it
is unchanged.
2 Otherwise, if the new type is unsigned, the value is converted by
repeatedly adding or subtracting one more than the maximum value that
can be represented in the new type until the value is in the range of
the new type. 60)
3 Otherwise, the new type is signed and the value cannot be
represented in it; either the result is implementation-defined or an
implementation-defined signal is raised.
I was wondering what "if the value can be represented by the new type" means in the first point. Two integer types might have different ranges of integers, but can have the same range of bit representations (for example, unsigned int and int).
Are both the bit representation and the integer value unchanged before and after the conversion?
Thanks.
The C standard does not dictate one specific representation for integers. While most implementations you are likely to come across use two's complement for signed integers, that is not guaranteed. It also allows for one's complement where negating a number means inverting all bits (and not adding 1 as two's complement does), or sign-and-magnitude where negating a number means inverting only the high order bit.
As such, the language of the standard talks about what happens to the value of an integer when it is converted, not the representation.
As an example, suppose an int has range -2^31 to 2^31 - 1 and unsigned int has range 0 to 2^32 - 1. If you have an int with the value 45 that is converted to unsigned int, that value can be represented in both types, so now you have an unsigned int with value 45.
Now suppose your int has value -1000. That value can not be represented in an unsigned int, so it has to be converted as per the rule in clause 2, i.e. 2^32 is added to -1000 to result in 2^32 - 1000 == 4294966296. Now in two's complement representation, an unsigned int with value 4294966296 and an int with value -1000 happen to have the same representation, which is FFFFFC18 in hex. This equivalence does not hold true for one's complement or sign-and-magnitude.
So converting a number to a different type may or may not change the representation, depending on the implementation.
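A short sketch of that example in code (assuming the typical 32-bit int; on two's complement hardware the int -1000 shares the FFFFFC18 bit pattern):

#include <stdio.h>

int main(void)
{
    int n = -1000;
    unsigned int u = (unsigned int)n;  /* value-based conversion: 2^32 is added */
    printf("%u\n", u);                 /* prints 4294966296 */
    printf("%X\n", u);                 /* prints FFFFFC18 */
    return 0;
}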

The range of a character is positive (0-255), so if I store an integer in it, shouldn't it print a value between 0 and 255? Why is the output -24?

#include <stdio.h>

int main(void)
{
    int i = 1000;
    char c = 'A';
    c = i;
    printf("%d", c);
    return 0;
}
Output is -24
Why this output, when the range of a character is (0-255)?
When explaining that behaviour, the following things need to be considered:
First, in an assignment like c=i in
int i=1000;
char c='A';
c=i;
we need to consider that i is converted to the type of c before assignment. Integral conversion is defined here in an online C99 standard draft:
6.3.1.3 Signed and unsigned integers
1 When a value with integer type is converted to another integer type other than _Bool, if the value
can be represented by the new type, it is unchanged.
2 Otherwise, if the new type is unsigned, the value is converted by
repeatedly adding or subtracting one more than the maximum value that
can be represented in the new type until the value is in the range of
the new type.60)
3 Otherwise, the new type is signed and the value cannot be
represented in it; either the result is implementation-defined or an
implementation-defined signal is raised.
So it is necessary to know whether char is a signed integral type or not. This can differ from compiler to compiler, but it seems that your compiler considers type char as signed char by default.
In case of signed char, the "result is implementation-defined", and we would need to have a look at the compiler's specification.
A common implementation is that the integral value 1000, which in binary is 00000011 11101000 is truncated to 8 bits and stored in the char value.
What 11101000 then means for a signed char is defined in the representation of types:
6.2.6.2 Integer types
2 For signed integer types, the bits of the object representation shall be divided into three groups: value bits, padding bits, and the sign bit. There need not be any padding bits; signed char shall not have any padding bits. There shall be exactly one sign bit. Each bit that is a value bit shall have the same value as the same bit in the object representation of the corresponding unsigned type (if there are M value bits in the signed type and N in the unsigned type, then M <= N). If the sign bit is zero, it shall not affect the resulting value. If the sign bit is one, the value shall be modified in one of the following ways:
— the corresponding value with sign bit 0 is negated (sign and magnitude);
— the sign bit has the value -(2^M) (two's complement);
— the sign bit has the value -(2^M - 1) (ones' complement).
Which of these applies is implementation-defined, as ...
Again, the result is defined by the implementation of the compiler; but a common interpretation is two's complement, with the most significant bit as the sign bit:
In your case, the 8 bits 11101000 consist of one sign bit (set) and 7 value bits; the value bits 1101000 are 104; the sign bit contributes -(2^7) = -128, so the two's complement value is -128 + 104 = -24.
Note that it is not clear that the result is -24; other compilers might yield different results.
There is even one more step to consider, as you print the signed character value using the format specifier "%d":
printf("%d", c);
This means that the negative signed char value gets promoted to type int; but this yields the same negative value. I omit the explanation of "promotion" and of why arguments to printf are promoted at all.
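Putting the steps together, a sketch that reproduces the result (the -24 assumes an 8-bit two's complement signed char, as discussed above):

#include <stdio.h>

int main(void)
{
    int i = 1000;                      /* 00000011 11101000 in binary */
    char c = i;                        /* implementation-defined; typically keeps
                                          the low 8 bits: 11101000 */
    printf("%d\n", c);                 /* typically prints -24 */
    printf("%d\n", (unsigned char)c);  /* the same bits read unsigned: 232 */
    return 0;
}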
As @JohnBollinger said, the typical range of type char, which is usually signed, is -128 to 127. When you assign 1000 (1111101000 in binary) to the char, only the 8 least significant bits (assuming your chars are 8 bits) are kept, leading to 11101000 in binary. This translates to -24 when it's printed as a signed integer.
Your compiler by default considers the type char as a signed integer type similarly to the type signed char.
The range of values for the type signed char is
— minimum value for an object of type signed char
SCHAR_MIN    -127    // -(2^7 - 1)
— maximum value for an object of type signed char
SCHAR_MAX    +127    // 2^7 - 1
From the C Standard (5.2.4.2.1 Sizes of integer types <limits.h>)
2 If the value of an object of type char is treated as a signed
integer when used in an expression, the value of CHAR_MIN shall be the
same as that of SCHAR_MIN and the value of CHAR_MAX shall be the same
as that of SCHAR_MAX. Otherwise, the value of CHAR_MIN shall be 0 and
the value of CHAR_MAX shall be the same as that of UCHAR_MAX. 20) The value UCHAR_MAX shall equal 2^CHAR_BIT - 1.
Here is a demonstrative program where there is used the type unsigned char instead of the type char.
#include <stdio.h>

int main(void)
{
    int i = 1000;
    unsigned char c;
    printf("i = %#x\n", i);
    c = i;
    printf("c = %d\n", c);
    return 0;
}
The program output is
i = 0x3e8
c = 232
As you can see, an object of type signed char cannot hold a value as large as 232. If you subtract 256 from 232 you get -24. That is, the bit pattern that stores 232 in an unsigned char represents -24 when interpreted as a signed char: 232 and -24 are congruent modulo 256.
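A sketch of that congruence (the signed result is implementation-defined, but two's complement platforms give -24):

#include <stdio.h>

int main(void)
{
    unsigned char u = 232;           /* bit pattern 11101000 */
    signed char s = (signed char)u;  /* implementation-defined; on two's
                                        complement: 232 - 256 = -24 */
    printf("%d\n", s);               /* typically prints -24 */
    return 0;
}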

Int to char conversion rule in C when int is outside the range of char

Please see the following C code.
#include <stdio.h>

int main(void)
{
    char c1 = 3000;
    char c2 = 250;
    printf("%d\n", c1);
    printf("%d\n", c2);
}
The output of the above code is
-72
-6
Please explain the integer-to-char conversion rule applied here, as both 3000 and 250 are outside the range of char (-128 to 127).
Please explain the integer to char conversion rule applied here as both 3000 and 250 are outside the range of char(-128 to 127).
Note first that C does not specify whether char is signed or unsigned. That is left to implementations to decide, and they are not consistent on that. On implementations where char is unsigned, 250 is within its range.
Supposing, however, that your chars are signed, which indeed seems consistent with your results, the C rule for the conversions implicit in the assignment statements will not satisfy you:
the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
(C2011 6.3.1.3/3)
Evidently no signal was raised, so the result is implementation-defined. Among the possibilities is that the least-significant CHAR_BIT bits of each assigned value are stored in the target variable.
There is then an additional conversion when you call printf(). The arguments are promoted from char to int, and since int can represent all values of type char, that one is value-preserving. That allows us to conclude that it is indeed plausible that your implementation converts int to char by keeping only the least-significant bits, and interpreting them as 8-bit two's complement.
An int uses 4 bytes and a char uses 1 byte. Numbers in C are signed by default, which means the first bit from the left is the sign bit and the rest encode the number in two's complement. So the number 3000 is represented in a 32-bit int like this: 00000000 00000000 00001011 10111000. Because char is only 1 byte, the last 8 bits, 10111000, are what gets stored in the char variable. Interpreted as a signed 8-bit value, that is -72.
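The truncation can be made explicit by masking off the low 8 bits and reinterpreting them by hand; a sketch assuming CHAR_BIT == 8 and two's complement:

#include <stdio.h>

int main(void)
{
    int low1 = 3000 & 0xFF;  /* 10111000 = 184 */
    int low2 = 250 & 0xFF;   /* 11111010 = 250 */
    /* reinterpret each 8-bit pattern as signed: subtract 256 if bit 7 is set */
    printf("%d\n", low1 >= 128 ? low1 - 256 : low1);  /* prints -72 */
    printf("%d\n", low2 >= 128 ? low2 - 256 : low2);  /* prints -6 */
    return 0;
}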

What will happen if I assign negative value to an unsigned char?

In C++ Primer it says that "if we assign an out-of-range value to an object of unsigned type, the result is the remainder of the value modulo the number of values the target type can hold."
It gives the example:
int main(){
    unsigned char i = -1;
    // As per the book the value of i is 255.
}
Can anybody please explain to me how this works?
the result is the remainder of the value modulo the number of values the target type can hold
Start with "the number of values the target type can hold". For unsigned char, what is this? The range is from 0 to 255, inclusive, so there are a total of 256 values that can be represented (or "held").
In general, the number of values that can be represented in a particular unsigned integer representation is given by 2^n, where n is the number of bits used to store that type.
An unsigned char is an 8-bit type, so 2^8 == 256, just as we already knew.
Now, we need to perform a modulo operation. In your case of assigning -1 to unsigned char, you would have -1 MOD 256 == 255.
In general, the formula is: x MOD 2^n, where x is the value you're attempting to assign and n is the bit width of the type to which you are trying to assign.
More formally, this is laid out in the C++11 language standard (§ 3.9.1/4). It says:
Unsigned integers, declared unsigned, shall obey the laws of arithmetic modulo 2^n where n is the number of bits in the value representation of that particular size of integer.*
* This implies that unsigned arithmetic does not overflow because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting unsigned integer type.
Perhaps an easier way to think about modulo arithmetic (and the description that you'll most commonly see used) is that overflow and underflow wrap around. You started with -1, which underflowed the range of an unsigned char (which is 0–255), so it wrapped around to the maximum representable value (which is 255).
The C rule is equivalent to the C++ one, though worded differently:
6.3.1.3 Signed and unsigned integers
1 When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.
2 Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented by the new type until the value is in the range of the new type.
3 Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
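Unlike the signed direction, conversion to an unsigned type is fully defined, so the wrap-around can be demonstrated portably; a quick sketch:

#include <stdio.h>

int main(void)
{
    unsigned char i = -1;    /* -1 + 256 = 255: well-defined modulo wrap */
    unsigned char j = -300;  /* -300 + 2*256 = 212 */
    printf("%d %d\n", i, j); /* prints 255 212 */
    return 0;
}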
The literal 1 is of type int. For this explanation, let's assume that sizeof(int) == 4 as it most probably is. So then 1 in binary would look like this:
00000000 00000000 00000000 00000001
Now let's apply the unary minus operator to get the -1. We're assuming two's complement is used as it most probably is (look up two's complement for more explanation). We get:
11111111 11111111 11111111 11111111
Note that in the above numbers the first bit is the sign bit.
As you try to assign this number to unsigned char, for which sizeof(unsigned char) == 1 holds, the value is truncated to:
11111111
Now if you convert this to decimal, you'll get 255. Here the first bit is not seen as a sign bit, as the type is unsigned.
In Stroustrup's words:
If the destination type is unsigned, the resulting value is simply as many bits from the source as will fit in the destination (high-order bits are thrown away if necessary). More precisely, the result is the least unsigned integer congruent to the source integer modulo 2 to the nth, where n is the number of bits used to represent the unsigned type.
Excerpt from C++ standard N3936:
For each of the standard signed integer types, there exists a corresponding (but different) standard unsigned integer type: "unsigned char", "unsigned short int", "unsigned int", "unsigned long int", and "unsigned long long int", each of which occupies the same amount of storage and has the same alignment requirements (3.11) as the corresponding signed integer type; that is, each signed integer type has the same object representation as its corresponding unsigned integer type.
I was going through the excerpt from C++ Primer myself and I think I have figured out a way to mathematically derive how those values come out (feel free to correct me if I'm wrong :) ). Take the particular code below as an example.
unsigned char c = -4489;
std::cout << +c << std::endl; // will yield 119 as its output
So how does this answer of 119 come out?
Well, take 4489 and divide it by the total number of values an unsigned char can hold, i.e. 2^8 = 256, which gives 137 as the remainder.
4489 % 256 = 137.
Now just subtract that 137 from 256.
256 - 137 = 119.
That's how we derive the wrapped value for a negative source value. Do try it yourself on other values as well. It has worked perfectly accurately for me!
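The same derivation in code; the ((x % 256) + 256) % 256 form handles negative x, since the % operator can return a negative remainder for negative operands:

#include <stdio.h>

int main(void)
{
    int x = -4489;
    /* reduce modulo 256; the extra +256 compensates for the
       negative remainder % gives for negative operands */
    printf("%d\n", ((x % 256) + 256) % 256);  /* prints 119 */

    unsigned char c = -4489;  /* the language applies the same modulo rule */
    printf("%d\n", c);        /* prints 119 */
    return 0;
}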

Printing declared char value in C

I understand that a char variable holds values from -128 to 127 (signed) or 0 to 255 (unsigned).
char x;
x = 128;
printf("%d\n", x);
But how does it work? Why do I get -128 for x?
printf is a variadic function, only providing an exact type for the first argument.
That means the default argument promotions are applied to the following arguments, so all integers of rank less than int are promoted to int or unsigned int, and all floating values of rank smaller than double are promoted to double.
If your implementation has CHAR_BIT of 8, plain char is signed, and you have an obliging two's complement implementation, you thus get
128 (literal) to -128 (char/signed char) to -128 (int), printed as int => -128
If all the listed conditions except the obliging two's complement implementation are fulfilled, you get a signal or some implementation-defined value.
Otherwise you get output of 128, because 128 fits in char / unsigned char.
Standard quote for case 2 (Thanks to Matt for unearthing the right reference):
6.3.1.3 Signed and unsigned integers
1 When a value with integer type is converted to another integer type other than _Bool, if
the value can be represented by the new type, it is unchanged.
2 Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or
subtracting one more than the maximum value that can be represented in the new type
until the value is in the range of the new type.60)
3 Otherwise, the new type is signed and the value cannot be represented in it; either the
result is implementation-defined or an implementation-defined signal is raised.
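Which of the cases applies can be checked at compile time, since the <limits.h> macros are usable in #if directives; a sketch:

#include <stdio.h>
#include <limits.h>

int main(void)
{
    char x = 128;  /* implementation-defined result if plain char is signed */
#if CHAR_MIN < 0
    puts("plain char is signed: the stored value is implementation-defined");
#else
    puts("plain char is unsigned: 128 fits, so x is 128");
#endif
    printf("%d\n", x);
    return 0;
}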
This all has nothing to do with variadic functions, default argument promotions etc.
Assuming your system has signed chars, x = 128; is performing an out-of-range assignment. The behaviour of this is implementation-defined, meaning that the compiler may choose an action but it must document what it does (and therefore, do it reliably). This action is allowed to include raising a signal.
The usual behaviour that modern compilers do for out-of-range assignment is to truncate the representation of the value to fit in the destination type.
In binary representation, 128 is 0...0 10000000 (a 1 in bit 7, zeros elsewhere).
Truncating this into a signed char gives the signed char of binary representation 10000000. In two's complement representation, which is used by all modern C systems for negative numbers, this is the representation of the value -128. (For historical curiosity: in one's complement this is -127, and in sign-magnitude, this is -0, which may be a trap representation and thus raise a signal.)
Finally, printf accurately prints out this char's value of -128. The %d modifier works for char because of the default argument promotions and the facts that INT_MIN <= CHAR_MIN and INT_MAX >= CHAR_MAX; this behaviour is guaranteed except on systems which have plain char as unsigned and sizeof(int) == 1 (which do exist, but you'd know about it if you were on one).
Let's look at the binary representation of 128 when stored into 8 bits:
1000 0000
And now let's look at the binary representation of -128 when stored into 8 bits:
1000 0000
The default for char with your current setup looks to be signed char (note that the signedness of plain char isn't fixed by the C standard), and thus when you assign the value 128 to x you're assigning it the bit pattern 1000 0000; when you compile and print it out, it prints the signed value of that binary representation (meaning -128).
It turns out my environment likewise treats char as signed char. As expected, if I cast x to unsigned char then I get the expected output of 128:
#include <stdio.h>
#include <stdlib.h>

int main() {
    char x;
    x = 128;
    printf("%d %d\n", x, (unsigned char)x);
    return 0;
}
gives me the output of -128 128
Hope this helps!
