As far as I know, the range of unsigned char in C is 0-255, but when I executed the code below it printed 256 as output. How is this possible? I got this code from the "Test Your C Skills" book, which says the size of char is one byte.
#include <stdio.h>

int main(void)
{
    unsigned char i = 0x80;
    printf("\n %d", i << 1);
    return 0;
}
Because the operands to <<* undergo integer promotion. It's effectively equivalent to (int)i << 1.
* This is true for most operators in C.
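To see the promotion at work, you can compare the size of the operand with the size of the whole shift expression (a minimal sketch; %zu requires a C99-conforming printf):
#include <stdio.h>

int main(void)
{
    unsigned char i = 0x80;

    /* The shift expression has the type of its promoted left operand (int),
       so its size is sizeof(int), not sizeof(unsigned char). */
    printf("sizeof i        = %zu\n", sizeof i);
    printf("sizeof (i << 1) = %zu\n", sizeof (i << 1));
    printf("i << 1          = %d\n", i << 1);   /* 256 */
    return 0;
}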
Several things are happening.
First, the expression i << 1 has type int, not unsigned char: the integer promotions are applied to the operands of <<, so i is "promoted" to int before the shift, and the result 0x100 is well within the range of a signed int.
Secondly, the %d conversion specifier expects its corresponding argument to have type int. So the argument is being interpreted as an integer.
If you want to print the numeric value of a signed char, use the conversion specifier %hhd. If you want to print the numeric value of an unsigned char, use %hhu.
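For example, a small sketch showing both specifiers (the hh length modifier requires C99 or later):
#include <stdio.h>

int main(void)
{
    signed char   sc = -5;
    unsigned char uc = 250;

    /* The arguments are still promoted to int, but hhd/hhu tell printf
       they originated from a signed char / unsigned char respectively. */
    printf("sc = %hhd\n", sc);   /* -5  */
    printf("uc = %hhu\n", uc);   /* 250 */
    return 0;
}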
For arithmetic operations, char is promoted to int before the operation is performed. Simplified: the "smaller" type is first brought up to the "larger" type before the operation is performed. For the shift operators, the resulting type is that of the promoted left-hand operand, while for + and other "combining" operators it is the larger of the two, but at least int. The latter means that char and short (and their unsigned counterparts) are always promoted to int, with the result being int, too. (Simplified; for the details please read the standard.)
Note also that %d takes an int argument, not a char.
Additional notes:
unsigned char does not necessarily have the range 0..255. Check limits.h; you will find UCHAR_MAX there.
char and "byte" are used synonymously in the standard, but neither is necessarily 8 bits wide (though that is very likely on modern general-purpose CPUs).
As others have already explained, the statement printf("\n %d", i << 1); performs integer promotion, so left-shifting the integer value 128 by one results in 256. You can try the following code to print the maximum value of unsigned char. The maximum value of unsigned char has all bits set, so a bitwise NOT using "~" gives you the maximum value of 255 (assuming an 8-bit unsigned char).
#include <stdio.h>

int main()
{
    unsigned char ch = ~0;   /* ~0 is an int with all bits set; converting it to unsigned char yields 255 */
    printf("ch = %d\n", ch);
    return 0;
}
Output:
M-40UT:Desktop$ ./a.out
ch = 255
Related
#include <stdio.h>

int main()
{
    char ch1 = 128;
    unsigned char ch2 = 128;
    printf("%d\n", (int)ch1);
    printf("%d\n", (int)ch2);
}
The first printf statement outputs -128 and the second 128. As I understand it, both ch1 and ch2 have the same binary representation of the stored number: 10000000. So when I cast both values to int, how do they end up being different values?
First of all, a char can be signed or unsigned; that depends on the compiler implementation. But since you got different results, your compiler treats char as signed.
A signed char can only hold values from -128 to 127, so a value of 128 assigned to a signed char wraps around to -128 (strictly speaking, the conversion is implementation-defined).
An unsigned char can hold values from 0 to 255, so a value of 128 remains the same.
An unsigned char can have a value of 0 to 255. A signed char can have a value of -128 to 127. Setting a signed char to 128 in your compiler probably wrapped around to the lowest possible value, which is -128.
Your fundamental error here is a misunderstanding of what a cast (or any conversion) does in C. It does not reinterpret bits. It's purely an operation on values.
Assuming plain char is signed, ch1 has value -128 and ch2 has value 128. Both -128 and 128 are representable in int, and therefore the cast does not change their value. (Moreover, writing it is redundant since the default promotions automatically convert variadic arguments of types lower-rank than int up to int.) Conversions can only change the value of an expression when the original value is not representable in the destination type.
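If you really do want to reinterpret the stored byte rather than convert the value, read it through an unsigned char pointer. A minimal sketch of the difference, assuming an 8-bit two's-complement signed plain char as in the question:
#include <stdio.h>

int main(void)
{
    char ch1 = 128;   /* implementation-defined; typically stores -128 (bit pattern 0x80) */

    int converted     = (int)ch1;               /* value conversion: -128         */
    int reinterpreted = *(unsigned char *)&ch1; /* raw byte read as unsigned: 128 */

    printf("converted     = %d\n", converted);
    printf("reinterpreted = %d\n", reinterpreted);
    return 0;
}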
For starters, these casts
printf("%d\n", (int)ch1);
printf("%d\n", (int)ch2);
are redundant. You could just write
printf("%d\n", ch1);
printf("%d\n", ch2);
because, due to the default argument promotions, an integer type whose rank is less than the rank of int is promoted to int, provided int can represent all values of the original type.
The type char can behave either as the type signed char or unsigned char depending on compiler options.
From the C Standard (5.2.4.2.1 Sizes of integer types <limits.h>)
2 If the value of an object of type char is treated as a signed
integer when used in an expression, the value of CHAR_MIN shall be the
same as that of SCHAR_MIN and the value of CHAR_MAX shall be the same
as that of SCHAR_MAX. Otherwise, the value of CHAR_MIN shall be 0 and
the value of CHAR_MAX shall be the same as that of UCHAR_MAX. 20) The
value UCHAR_MAX shall equal 2^CHAR_BIT − 1.
So it seems that by default your compiler treats the type char as signed char.
As a result, in the first declaration
char ch1 = 128;
unsigned char ch2 = 128;
the internal representation 0x80 of the value 128 was interpreted as a signed value because the sign bit is set. And this value is equal to -128.
So the first call of printf output the value -128
printf("%d\n", (int)ch1);
while the second call of printf where there is used an object of the type unsigned char
printf("%d\n", (int)ch2);
outputted the value 128.
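A simple way to check which choice your compiler made, without relying on any casts (a small sketch using limits.h):
#include <limits.h>
#include <stdio.h>

int main(void)
{
    /* CHAR_MIN is 0 when plain char behaves like unsigned char,
       and equals SCHAR_MIN (typically -128) when it behaves like signed char. */
    if (CHAR_MIN == 0)
        printf("plain char is unsigned (CHAR_MAX = %d)\n", CHAR_MAX);
    else
        printf("plain char is signed (CHAR_MIN = %d)\n", CHAR_MIN);
    return 0;
}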
Starting with a (pseudo-)code snippet:
char a = 0x80;
unsigned short b;
b = (unsigned short)a;
printf ("0x%04x\r\n", b); // => 0xff80
To my current understanding, "char" is by definition neither a signed char nor an unsigned char but a sort of third kind of signedness.
How does it come about that 'a' is first sign-extended from (maybe platform-dependent) 8 bits of storage to (again maybe platform-specific) 16 bits of a signed short and then converted to an unsigned short?
Is there a C standard rule that determines the order of expansion?
Does the standard give any guidance on how to deal with this third kind of signedness of a "pure" char (I once called it an X-char, X for undetermined signedness), so that results are at least deterministic?
PS: if I insert an "(unsigned char)" cast in front of the 'a' in the assignment line, the result in the printing line does change to 0x0080. Thus only two casts in a row will provide what might be the intended result for certain intentions.
The type char is not a "third" signedness. It is either signed char or unsigned char, and which one it is is implementation defined.
This is dictated by section 6.2.5p15 of the C standard:
The three types char , signed char , and unsigned char are
collectively called the character types. The implementation
shall define char to have the same range, representation, and
behavior as either signed char or unsigned char.
It appears that on your implementation, char is the same as signed char, so because the value is negative and because the destination type is unsigned it must be converted.
Section 6.3.1.3 dictates how conversions between integer types occur:
1 When a value with integer type is converted to another integer type
other than
_Bool, if the value can be represented by the new type, it is unchanged.
2 Otherwise, if the new type is unsigned, the value is
converted by repeatedly adding or subtracting one more than
the maximum value that can be represented in the new type
until the value is in the range of the new type.
3 Otherwise, the new type is signed and the value cannot be
represented in it; either the result is implementation-defined or
an implementation-defined signal is raised.
Since the value 0x80 == -128 cannot be represented in an unsigned short, the conversion described in paragraph 2 occurs.
char has implementation-defined signedness. It is either signed or unsigned, depending on the compiler. It is true, in a way, that char is a third character type, see this. char has implementation-defined (and therefore non-portable) signedness and so should never be used for storing raw numbers.
But that doesn't matter in this case.
On your compiler, char is signed.
char a = 0x80; forces a conversion from the type of 0x80, which is int, to char, in a compiler-specific manner. Normally on 2's complement systems, that will mean that the char gets the value -128, as seems to be the case here.
b = (unsigned short)a; forces a conversion from char to unsigned short 1). C17 6.3.1.3 Signed and unsigned integers then says:
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or
subtracting one more than the maximum value that can be represented in the new type
until the value is in the range of the new type.
One more than the maximum value would be 65536, assuming a 16-bit unsigned short. So you can think of this as -128 + 65536 = 65408.
The unsigned hex representation of 65408 is 0xFF80. No sign extension takes place anywhere!
1) The cast is not needed. When both operands of = are arithmetic types, as in this case, the right operand is implicitly converted to the type of the left operand (C17 6.5.16.1 §2).
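If the intent was to get the raw byte 0x0080 rather than the sign-extended 0xff80, convert through unsigned char first, as the question's PS already discovered. A sketch assuming an 8-bit char and a 16-bit unsigned short:
#include <stdio.h>

int main(void)
{
    char a = 0x80;   /* typically -128 when plain char is signed */

    unsigned short sign_extended = (unsigned short)a;                /* -128 + 65536 = 0xff80 */
    unsigned short raw_byte      = (unsigned short)(unsigned char)a; /* 0x0080                */

    printf("0x%04x\n", (unsigned)sign_extended);
    printf("0x%04x\n", (unsigned)raw_byte);
    return 0;
}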
I just executed the following code
#include <stdio.h>

int main(void)
{
    char a = 0xfb;
    unsigned char b = 0xfb;

    printf("a=%c,b=%c", a, b);

    if (a == b) {
        printf("\nSame");
    }
    else {
        printf("\nNot Same");
    }
    return 0;
}
For this code I got the output
a=? b=?
Not Same
Why don't I get Same, and what are the values of a and b?
The line if (a == b)... promotes the characters to integers before comparison, so the signedness of the character affects how that happens. The unsigned character 0xFB becomes the integer 251; the signed character 0xFB becomes the integer -5. Thus, they are unequal.
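You can make those promoted values visible by printing them as ints (a small sketch; the -5 assumes plain char is signed and 8 bits wide, as on the asker's system):
#include <stdio.h>

int main(void)
{
    char a = 0xfb;            /* implementation-defined; typically -5 */
    unsigned char b = 0xfb;   /* always 251                           */

    /* Both operands of == are promoted to int before the comparison. */
    printf("(int)a = %d\n", (int)a);    /* -5  */
    printf("(int)b = %d\n", (int)b);    /* 251 */
    printf("a == b = %d\n", a == b);    /* 0   */
    return 0;
}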
There are 2 cases to consider:
if the char type is unsigned by default, both a and b are assigned the value 251 and the program will print Same.
if the char type is signed by default, which is alas the most common case, the definition char a = 0xfb; has implementation defined behavior as 0xfb (251 in decimal) is probably out of range for the char type (typically -128 to 127). Most likely the value -5 will be stored into a and a == b evaluates to 0 as both arguments are promoted to int before the comparison, hence -5 == 251 will be false.
The behavior of printf("a=%c,b=%c", a, b); is also system dependent as the non ASCII characters -5 and 251 may print in unexpected ways if at all. Note however that both will print the same as the %c format specifies that the argument is converted to unsigned char before printing. It would be safer and more explicit to try printf("a=%d, b=%d\n", a, b);
With gcc or clang, you can try recompiling your program with -funsigned-char to see how the behavior will differ.
According to the C Standard (6.5.9 Equality operators)
4 If both of the operands have arithmetic type, the usual arithmetic
conversions are performed....
The usual arithmetic conversions include the integer promotions.
From the C Standard (6.3.1.1 Boolean, characters, and integers)
2 The following may be used in an expression wherever an int or
unsigned int may be used:
...
If an int can represent all values of the original type (as restricted
by the width, for a bit-field), the value is converted to an int;
otherwise, it is converted to an unsigned int. These are called the
integer promotions.58) All other types are unchanged by the integer
promotions.
So in this equality expression
a == b
both operands are converted to the type int. The signed operand (provided that the type char behaves as the type signed char) is converted to the type int by propagating the sign bit.
As a result, the operands have different values after the promotion, even though their byte representations were the same.
If the type char behaves as the type unsigned char (for example by setting a corresponding option of the compiler) then evidently the operands will be equal.
A (signed) char typically stores values from -128 to 127 and an unsigned char from 0 to 255,
and 0xfb represents 251 in decimal, which is beyond the limit of char a.
In the snippet below, shouldn't the output be 1? Why am I getting -1 and 4294967295 as output?
What I understand is that the variable c here is of signed type, so shouldn't its value be 1?
char c=0xff;
printf("%d %u",c,c);
c is of signed type. A char is 8 bits, so you have an 8-bit signed quantity with all bits set to 1. On a two's complement machine, that evaluates to -1.
Some compilers will warn you when you do that sort of thing. If you're using gcc/clang, switch on all the warnings.
Pedant note: On some machines it could have the value 255, should the compiler treat 'char' as unsigned.
You're getting the correct answer.
The %u format specifier indicates that the value will be an unsigned int. The compiler automatically promotes your 8-bit char to a 32-bit int. However, you have to remember that char is a signed type here, so a value of 0xff is in fact -1.
When the conversion from char to int occurs, the value is still -1, but now it's the 32-bit representation, which in binary is 11111111 11111111 11111111 11111111, or in hex 0xffffffff.
When that is interpreted as an unsigned integer, all of the bits are obviously preserved because the length is the same, but now it's handled as an unsigned quantity.
0xffffffff = 4294967295 (unsigned)
0xffffffff = -1 (signed)
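You can also look at the promoted bit pattern in hex to see the effect (a sketch assuming a 32-bit int and a signed 8-bit plain char, as on the asker's system):
#include <stdio.h>

int main(void)
{
    char c = 0xff;   /* typically -1 when plain char is signed */

    printf("%d\n", c);              /* -1                           */
    printf("%u\n", (unsigned)c);    /* 4294967295 with a 32-bit int */
    printf("%#x\n", (unsigned)c);   /* 0xffffffff                   */
    return 0;
}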
There are three character types in C, char, signed char, and unsigned char. Plain char has the same representation as either signed char or unsigned char; the choice is implementation-defined. It appears that plain char is signed in your implementation.
All three types have the same size, which is probably 8 bits (CHAR_BIT, defined in <limits.h>, specifies the number of bits in a byte). I'll assume 8 bits.
char c=0xff;
Assuming plain char is signed, the value 0xff (255) is outside the range of type char. Since you can't store the value 255 in a char object, the value is implicitly converted. The result of this conversion is implementation-defined, but is very likely to be -1.
Keep this carefully in mind: 0xff is simply another way to write 255, and 0xff and -1 are two distinct values. You cannot store the value 255 in a char object; its value is -1. Integer constants, whether they're decimal, hexadecimal, or octal, specify values, not representations.
If you really want a one-byte object with the value 0xff, define it as an unsigned char, not as a char.
printf("%d %u",c,c);
When a value of an integer type narrower than int is passed to printf (or to any variadic function), it's promoted to int if that type can hold the type's entire range of values, or to unsigned int if it can't. For type char, it's almost certainly promoted to int. So this call is equivalent to:
printf("%d %u", -1, -1);
The output for the "%d" format is obvious. The output for "%u" is less obvious. "%u" tells printf that the corresponding argument is of type unsigned int, but you've passed it a value of type int. What probably happens is that the representation of the int value is treated as if it were of type unsigned int, most likely yielding UINT_MAX, which happens to be 4294967295 on your system. If you really want to do that, you should convert the value to type unsigned int. This:
printf("%d %u", -1, (unsigned int)-1);
is well defined.
Your two lines of code are playing a lot of games with various types, treating values of one type as if they were of another type, and doing implicit conversions that might yield results that are implementation-defined and/or depend on the choices your compiler happens to make.
Whatever you're trying to do, there's undoubtedly a cleaner way to do it (unless you're just trying to see what your implementation does with this particular code).
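One such cleaner way, if the goal is simply a one-byte object holding the value 0xff (a sketch, since the actual intent isn't stated in the question):
#include <stdio.h>

int main(void)
{
    unsigned char c = 0xff;   /* 255 fits in unsigned char; no implementation-defined conversion */

    printf("%hhu\n", c);            /* 255  */
    printf("%#x\n", (unsigned)c);   /* 0xff */
    return 0;
}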
Let us start from OP's assumption that "c, here is of signed type".
char c=0xff; // Implementation defined behavior.
0xff is a hexadecimal constant with the value of 255 and type of int.
... the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised. §6.3.1.3 ¶3
So right off, the value of c is implementation defined (ID). Let us assume the common ID behavior of 8-bit wrap-around, so c --> -1.
A signed char is promoted to int when passed as a variadic argument, so printf("%d %u",c,c); is the same as printf("%d %u",-1,-1);. Printing the -1 with "%d" is not an issue, and "-1" is printed.
Printing an int -1 with "%u" is undefined behavior (UB), as it is a mismatched specifier/type and does not fall under the exception of being representable in both types. The common behavior is to print the value as if it had been converted to unsigned before being passed. When UINT_MAX == 4294967295 (4 bytes), that prints the value as -1 + (UINT_MAX + 1), or "4294967295".
So with ID and UB, you get a result, but robust code would be re-written to depend on neither.
I have a binary value stored in a char in C, and I want to transform this byte into a signed int.
Currently I have something like this:
char a = 0xff;
int b = a;
printf("value of b: %d\n", b);
The result on standard output is "255"; the desired output is "-1".
According to the C99 standard,
6.3.1.3 Signed and unsigned integers
When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or
subtracting one more than the maximum value that can be represented in the new type
until the value is in the range of the new type.
Otherwise, the new type is signed and the value cannot be represented in it; either the
result is implementation-defined or an implementation-defined signal is raised.
You need to cast your char to a signed char before assigning to int, as any value a char could take is directly representable as an int.
#include <stdio.h>

int main(void) {
    char a = 0xff;
    int b = (signed char) a;
    printf("value of b: %d\n", b);
    return 0;
}
Quickly testing shows it works here:
C:\dev\scrap>gcc -std=c99 -oprint-b print-b.c
C:\dev\scrap>print-b
value of b: -1
Be aware that the C99 standard leaves it implementation-defined whether char is treated as signed or unsigned.
6.2.5 Types
An object declared as type char is large enough to store any member of the basic
execution character set. If a member of the basic execution character set is stored in a
char object, its value is guaranteed to be positive. If any other character is stored in a char object, the resulting value is implementation-defined but shall be within the range of values that can be represented in that type.
...
The three types char, signed char, and unsigned char are collectively called
the character types. The implementation shall define char to have the same range,
representation, and behavior as either signed char or unsigned char.
Replace:
char a = 0xff;
with
signed char a = 0xff; // or more explicitly: = -1
to have printf print -1.
If you don't want to change the type of a, then, as #veer noted in the comments, you can simply cast a to (signed char) before assigning its value to b.
Note that in both cases, this integer conversion is implementation-defined but this is the commonly seen implementation-defined behavior.
You are already wrong from the start:
char a = 0xff;
If char is signed, which you seem to assume, you already have a value that is out of range here: 0xFF is an unsigned quantity with the value 255. If you want to use char for signed numbers, use signed char and assign -1 to it. If you want to see it as a bit pattern, use unsigned char and assign 0xFF to it. Your initialization of the int will then do what you expect it to do.
char, signed char, and unsigned char are, by the standard's definition, three different types. Reserve char itself for characters and printing human-readable text.
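Putting that advice into code, a small sketch of the two intents described above:
#include <stdio.h>

int main(void)
{
    signed char   as_number = -1;     /* "I want a small signed number"    */
    unsigned char as_bits   = 0xFF;   /* "I want a raw byte / bit pattern" */

    int b1 = as_number;   /* value-preserving conversion: -1  */
    int b2 = as_bits;     /* value-preserving conversion: 255 */

    printf("%d %d\n", b1, b2);
    return 0;
}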