What is the reasoning behind char to int conversion output? - c

For example:
int x = 65535;
char y = x;
printf("%d\n", y);
This will output -1. Is there any way to derive this by hand?

In order to derive this by hand you need to know several implementation-defined aspects of your system - namely
Whether char is signed or not
If char is signed, what representation scheme is used for negative values
How your system treats narrowing conversions of values that cannot be represented exactly in the narrower type.
Although the standard allows implementations to decide, a very common approach to narrowing conversions is to truncate the bits that do not fit in the narrow type. Assuming that this is the approach taken by your system, the first part of figuring out the output is to find the last eight bits of the int value being converted. In your case, 65535 is 1111111111111111 in binary, so the last eight bits are all ones.
Now you need to decide the interpretation of 11111111. On your system char is signed, and the system uses two's complement representation of negative values, so this pattern is interpreted as an eight-bit -1 value.
When you call printf, the eight-bit signed char value is promoted to int, so the value is preserved and -1 is printed.
On systems where char is unsigned by default the same pattern would be interpreted as 255.
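A minimal sketch of that by-hand derivation (assuming truncation to eight bits and a two's complement signed char):

#include <stdio.h>

int main(void) {
    int x = 65535;
    int low8 = x & 0xFF;               // keep the last eight bits: 0xFF
    // With a two's complement signed char, a set top bit means the value
    // is low8 - 256; with an unsigned char it is just low8.
    int as_signed_char   = (low8 & 0x80) ? low8 - 256 : low8;  // -1
    int as_unsigned_char = low8;                               // 255
    printf("%d %d\n", as_signed_char, as_unsigned_char);
    return 0;
}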

65535 is 0xffff
When converting to char, the upper bits are dropped:
0xffff AND 0xff is 0xff
When passing a char to a function, it is expanded to int. As the left-most bit of the char is a 1, this will be sign-extended, so it will become 0xffffffff (32 bits)
This is -1, so -1 is printed.
As Dasblinkenlight points out, it matters whether the char is signed or unsigned (whether as a default in the implementation or through the declaration unsigned char y). For unsigned char my last lines would read:
When passing an unsigned char to a function, it is expanded to unsigned int. As it is unsigned, just zeroes are added at the left, so it will become 0x000000ff (32 bits).
This is 255, so 255 is printed.
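A short demonstration of the two cases (a sketch, assuming 8-bit two's complement chars):

#include <stdio.h>

int main(void) {
    signed char   s = 0xff;  // 255 is out of range: implementation-defined, typically -1
    unsigned char u = 0xff;  // stores 255
    printf("%d\n", s);       // typically prints -1
    printf("%d\n", u);       // prints 255
    return 0;
}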

65535 when converted to binary is 1111111111111111,
and when you assign this to a character variable it gets trimmed to the least significant 8 bits, which are 11111111.
In two's complement this is -1: taking the two's complement of a number gives its negative, so the two's complement of 00000001 is
11111110 + 1 = 11111111, which is -1.

int x = 65535;
The value of 'x', 65535, is 0xffff in hexadecimal (binary 1111 1111 1111 1111), which is a 2-byte (16-bit) number.
Then the value of 'x' is converted into the character 'y'. The size of the character datatype is 1 byte (8 bits), so the upper bits of x are cut off when assigning to 'y', because it only has 1 byte of memory space.
So the value of 'y' is 0xff (binary 1111 1111), which is equal to -1 in signed integer representation.
And we display the character value of 'y' as an integer with '%d'.

Related

Why does printing a character variable with %d give a negative value in c?

I tried the following piece of code, expecting the output to be positive 64:
char val = 0x80;
printf("%d",val>>1);
My understanding of what happens is (please correct me if I'm wrong, as I probably am):
Referring to the ASCII table, there is no mapping of 0x80 to any character, so I assume this is stored as an unsigned integer.
This is represented as 1000 0000 in bitwise format, so a right shift of 1 would result in 0100 0000
When printed as an integer value, this will then show as positive 64.
However it shows -64.
In contrast:
char val = 0x40;
printf("%d",val>>1);
gives positive 32.
Is the value implicitly converted to a signed integer in the first case and not in the second?
Your C implementation uses an eight-bit signed char. (The C standard permits char to be signed or unsigned.) In char val = 0x80;, a char cannot represent the value you initialize it with, 128. In this case, the value 128 is converted to char which, per C 2018 6.3.1.3 3, yields either an implementation-defined result or raises an implementation-defined signal. Your implementation likely produces −128. (This is a common result because 128 in binary is 10000000, and converting an out-of-range result to an eight-bit two's complement integer often simply reinterprets the low eight bits of the value as eight-bit two's complement. In two's complement, 10000000 represents −128.)
So val>>1 asks to shift −128 right one bit. Per C 2018 6.5.7 5, shifting a negative value right yields an implementation-defined value. Producing −64 is a common result.
(In detail, in val>>1, val is automatically promoted from char to int. It has the same value, −128. However, with a 32-bit int, it would then be represented as 11111111111111111111111110000000 instead of 10000000. Then shifting right "arithmetically," which propagates the sign bit, yields 11111111111111111111111111000000, which is −64, the result you got. Some C implementations might shift right "logically," which sets the sign bit to zero, yielding 01111111111111111111111111000000. In this case, the printf would show "2147483584", which is 2^31 − 64.)
Whether ASCII has any character with code 0x80 is irrelevant. The C rules apply to the values involved, regardless of what character encoding scheme is used.
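If you want the zero-filling (logical) behaviour regardless of what the implementation does with negative values, one option is to do the shift on an unsigned value; a small sketch:

#include <stdio.h>

int main(void) {
    char val = 0x80;                         // typically -128 where char is signed
    unsigned int bits = (unsigned char)val;  // 0x80, with no sign extension
    printf("%u\n", bits >> 1);               // prints 64
    return 0;
}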
Right shift of a negative signed integer is implementation-defined. On most modern systems signed integers are two's complement and the shift will be translated by the compiler into an arithmetic shift.
After the shift, the value of val is 0xc0, which is -64 in two's complement encoding.
val is first promoted to signed int and then passed to the function. If you put some effort into your question and add a few more lines to your code you would discover it yourself.
#include <stdio.h>

int main(void)
{
char c = 0x80;
printf("%d\n", c >> 1);
printf("%x\n", c >> 1);
printf("%hhd\n", c >> 1);
printf("%hhx\n", c >> 1);
c >>= 1;
printf("%d\n", c);
printf("%x\n", c);
printf("%hhd\n",c);
printf("%hhx\n",c);
}
https://godbolt.org/z/YsaGos
You can also see that if the MSB is 0, the arithmetic shift behaves exactly like the logical shift, thus 0x40 >> 1 == 0x20.

Why is it printing 255?

#include <stdio.h>

int main()
{
unsigned char a = -1;
printf("%d",a);
printf("%u",a);
}
When I executed the above program I got 255 255 as the answer.
We know negative numbers are stored in 2's complement.
Since it is 2's complement the representation would be
1111 1111 --> 2's complement.
But in the above we are printing with %d (int), and an integer is four bytes.
My assumption is that even though it is a character, we are forcing the compiler to treat it as an integer,
so it internally uses the sign extension concept:
1111 1111 1111 1111 1111 1111 1111 1111.
According to the above representation it has to be -1 in the first case, since it is %d (signed).
In the second case it has to print (2^32 - 1), but it is printing 255 and 255.
Why is it printing 255 in both cases?
Tell me if my assumption is wrong and give me the real interpretation.
Your assumption is wrong; the character will "roll over" to 255, then be padded to the size of an integer. Assuming a 32-bit integer:
11111111
would be padded to:
00000000 00000000 00000000 11111111
Up to the representation of a, you are correct. However, the %d and %u conversions of the printf() function both take an int as an argument. That is, your code is the same as if you had written
#include <stdio.h>

int main() {
unsigned char a = -1;
printf("%d", (int)a);
printf("%u", (int)a);
}
The moment you assign -1 to a, you lose the information that it once was a signed value; the logical value of a is 255. Now, when you convert an unsigned char to an int, the compiler preserves the logical value of a, and the code prints 255.
The compiler doesn't know what type the extra parameters in printf should be, since the only thing that specifies it should be treated as a 4-byte int is the format string, which is irrelevant at compile time.
What actually happens behind the scenes is that printf reinterprets each argument it receives according to the conversion specifiers in the format string.
Roughly the same result as this:
char a = -1;
int * p = (int*)&a; // BAD CAST
int numberToPrint = *p; // Accesses 3 extra bytes from somewhere on the stack
Since you're likely running on a little endian CPU, the 4-byte int 0x12345678 is arranged in memory as | 0x78 | 0x56 | 0x34 | 0x12 |
If the 3 bytes on the stack following a are all 0x00 (they probably are due to stack alignment, but it's NOT GUARANTEED), the memory looks like this:
&a: | 0xFF |
(int*)&a: | 0xFF | 0x00 | 0x00 | 0x00 |
which evaluates to *(int*)&a == 0x000000FF.
unsigned char runs from 0 to 255, so the negative number -1 will print 255, -2 will print 254, and so on...
signed char runs from -128 to +127, so you get -1 for the same printf(), which is not the case with unsigned char.
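A short check of both cases (a sketch, assuming an 8-bit char):

#include <stdio.h>

int main(void) {
    unsigned char u = -2;    // converted modulo 256, becomes 254
    signed char   s = -2;    // representable, stays -2
    printf("%d %d\n", u, s); // prints: 254 -2
    return 0;
}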
Once you assign the value to a char, only the char-sized value is kept; when it is later widened to an int the remaining bytes are padded accordingly, so your assumption of 2^31 is wrong.
The negative number is represented using 2's complement (implementation-dependent).
So
1 = 0000 0001
To get -1 we take the 2's complement:
~0000 0001 + 1 = 1111 1111, which as an unsigned char is 255.
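A quick check of that arithmetic in C (a sketch, assuming an 8-bit char; conversion to unsigned char is defined to wrap modulo 256):

#include <stdio.h>

int main(void) {
    unsigned char pattern = (unsigned char)(~1u + 1u); // two's complement of 1, kept to 8 bits
    printf("%d\n", pattern);                           // prints 255
    return 0;
}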
It is printing 255 simply because that is what ISO/IEC 9899 intends:
H.2.2 Integer types
1 The signed C integer types int, long int, long long int, and the corresponding
unsigned types are compatible with LIA−1. If an implementation adds support for the
LIA−1 exceptional values ‘‘integer_overflow’’ and ‘‘undefined’’, then those types are
LIA−1 conformant types. C’s unsigned integer types are ‘‘modulo’’ in the LIA−1 sense
in that overflows or out-of-bounds results silently wrap. An implementation that defines
signed integer types as also being modulo need not detect integer overflow, in which case,
only integer divide-by-zero need be detected.
If this is given, printing 255 is absolutely what LIA-1 would expect.
Otherwise, if your implementation doesn't support C99's LIA-1 annex part, then it's simply undefined behaviour.

Left shift operator

If I have the following:
char v = 32; // 0010 0000
then I do:
v << 2
the number becomes negative. // 1000 0000 = -128
I read the standard but it is only written:
If E1 has a signed type and nonnegative value, and E1 × 2^E2 is
representable in the result type, then that is the resulting value;
otherwise, the behavior is undefined.
so I don't understand whether it is a rule that when a bit is shifted into the leftmost position the
number must become negative.
I'm using GCC.
Left shifting it twice would give 1000 0000 in binary, which is 128 in decimal.
If 128 is representable in char, i.e. you're on some machine (with a supporting compiler) that provides a char of size > 8 bits, then 128 would be the value you get (since it's representable in such a type).
Otherwise, if the size of a char is just 8 bits like most common machines, for a signed character type that uses two's complement for negative values, [-128, 127] is the representable range. You're in undefined behaviour land since it's not representable as-is in that type.
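One way to keep the value 128 is to store the shifted result in an unsigned char instead of a plain char (a sketch, assuming an 8-bit char):

#include <stdio.h>

int main(void) {
    char v = 32;                                // 0010 0000
    unsigned char r = (unsigned char)(v << 2);  // the shift happens in int; 128 fits in unsigned char
    printf("%d\n", r);                          // prints 128
    return 0;
}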
Signed data primitives like char typically use two's complement (http://en.wikipedia.org/wiki/Twos_complement) to encode values. What you are probably looking for is unsigned char, which won't encode the value using two's complement (no negatives).
Try using unsigned char instead: a plain char reserves a bit for the sign, while unsigned char uses all of its bits for the value:
unsigned char var = 32;
var = var << 2; // var is now 128

Printing declared char value in C

I understand that a character variable holds from -128 to 127 (signed) or from 0 to 255 (unsigned).
char x;
x = 128;
printf("%d\n", x);
But how does it work? Why do I get -128 for x?
printf is a variadic function, only providing an exact type for the first argument.
That means the default promotions are applied to the following arguments, so all integers of rank less than int are promoted to int or unsigned int, and all floating values of rank smaller than double are promoted to double.
If your implementation has CHAR_BIT of 8, and plain char is signed, and you have an obliging 2's complement implementation, you thus get
128 (literal) to -128 (char/signed char) to -128 (int) printed as int => -128
If all the listed conditions but the obliging 2's complement implementation are fulfilled, you get a signal or some implementation-defined value.
Otherwise you get output of 128, because 128 fits in char / unsigned char.
Standard quote for case 2 (Thanks to Matt for unearthing the right reference):
6.3.1.3 Signed and unsigned integers
1 When a value with integer type is converted to another integer type other than _Bool, if
the value can be represented by the new type, it is unchanged.
2 Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or
subtracting one more than the maximum value that can be represented in the new type
until the value is in the range of the new type.60)
3 Otherwise, the new type is signed and the value cannot be represented in it; either the
result is implementation-defined or an implementation-defined signal is raised.
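A quick way to see which case your implementation falls into (a sketch; on a typical 8-bit, two's complement, signed-char system this prints -128):

#include <limits.h>
#include <stdio.h>

int main(void) {
    char x = 128;  // out of range if char is signed and 8 bits wide
    printf("CHAR_MIN=%d CHAR_MAX=%d x=%d\n", CHAR_MIN, CHAR_MAX, x);
    return 0;
}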
This all has nothing to do with variadic functions, default argument promotions etc.
Assuming your system has signed chars, then x = 128; is performing an out-of-range assignment. The behaviour of this is implementation-defined; meaning that the compiler may choose an action, but it must document what it does (and therefore, do it reliably). This action is allowed to include raising a signal.
The usual behaviour that modern compilers do for out-of-range assignment is to truncate the representation of the value to fit in the destination type.
In binary representation, 128 is 000....00010000000.
Truncating this into a signed char gives the signed char of binary representation 10000000. In two's complement representation, which is used by all modern C systems for negative numbers, this is the representation of the value -128. (For historical curiosity: in one's complement this is -127, and in sign-magnitude, this is -0, which may be a trap representation and thus raise a signal.)
Finally, printf accurately prints out this char's value of -128. The %d modifier works for char because of the default argument promotions and the facts that INT_MIN <= CHAR_MIN and INT_MAX >= CHAR_MAX; this behaviour is guaranteed except on systems which have plain char as unsigned and sizeof(int)==1 (which do exist, but you'd know about it if you were on one).
Lets look at the binary representation of 128 when stored into 8 bits:
1000 0000
And now let's look at the binary representation of -128 when stored into 8 bits:
1000 0000
The default for char with your current setup looks to be signed char (note that this isn't fixed by the C standard), and thus when you're assigning the value of 128 to x you're assigning it the bit pattern 1000 0000, and when you compile and print it out it's printing the signed value of that binary representation (meaning -128).
It turns out my environment does the same, treating char as signed char. As expected, if I cast x to unsigned char then I get the expected output of 128:
#include <stdio.h>
#include <stdlib.h>
int main() {
char x;
x = 128;
printf("%d %d\n", x, (unsigned char)x);
return 0;
}
gives me the output of -128 128
Hope this helps!

Why does storing 255 in a char variable give it the value -1 in C?

I am reading a C book, and there is a text the author mentioned:
"if ch (a char variable) is a signed type, then storing 255 in the ch variable gives it the value -1".
Can anyone elaborate on that?
Assuming 8-bit chars, that is actually implementation-defined behaviour. The value 255 cannot be represented as a signed 8-bit integer.
However, most implementations simply store the bit-pattern, which for 255 is 0xFF. With a two's-complement interpretation, as a signed 8-bit integer, that is the bit-pattern of -1. On a rarer ones'-complement architecture, that would be the bit pattern of negative zero or a trap representation; with sign-and-magnitude, it would be -127.
If either of the two assumptions (signedness and 8-bit chars) doesn't hold, the value will be¹ 255, since 255 is representable as an unsigned 8-bit integer or as a signed (or unsigned) integer with more than 8 bits.
¹ The standard guarantees that CHAR_BIT is at least 8, it may be greater.
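A minimal sketch of what a typical 8-bit, two's complement implementation does (the signed case is implementation-defined, as described above):

#include <stdio.h>

int main(void) {
    signed char   sc = 255;    // out of range: implementation-defined, typically -1
    unsigned char uc = 255;    // exactly representable
    printf("%d %d\n", sc, uc); // typically prints: -1 255
    return 0;
}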
Try it in decimal. Suppose we can only have 3 digits. So our unsigned range is 0 - 999.
Let's see if 999 can actually behave as -1 (signed):
42 + 999 = 1041
Because we can only have 3 digits, we drop the highest order digit (the carry):
041 = 42 - 1
This is a general rule that applies to any number base.
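The same rule in base 256, using unsigned char arithmetic (a sketch; conversion to unsigned char is defined to wrap modulo 256):

#include <stdio.h>

int main(void) {
    unsigned char x = 42;
    unsigned char minus_one = 255;                      // plays the role of -1
    unsigned char sum = (unsigned char)(x + minus_one); // 297 wraps to 41
    printf("%d\n", sum);                                // prints 41, i.e. 42 - 1
    return 0;
}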
That is not guaranteed behavior. To quote ANSI/ISO/IEC 9899:1999 §6.3.1.3 (converting between signed and unsigned integers) clause 3:
Otherwise, the new type is signed and the value cannot be represented in it;
either the result is implementation-defined or an implementation-defined signal
is raised.
I'll leave the bitwise/2's complement explanations to the other answers, but a standards-compliant signed char isn't even guaranteed to be too small to hold 255; it might work just fine (giving the value 255).
That's how two's complement works. Read all about it in the Wikipedia article on two's complement.
You have the classical explanation in the other answers. I'll give you a rule:
In a signed type with n bits, a most significant bit set to 1 must be interpreted as -2^(n-1).
For this concrete question, assuming char is 8 bits (1 byte) long, 255 in binary is equal to:
1*2^(7) +
1*2^(6) +
1*2^(5) +
1*2^(4) +
1*2^(3) +
1*2^(2) +
1*2^(1) +
1*2^(0) = 255
255 is equivalent to 1111 1111.
For unsigned char you get 255, but if you are dealing with a signed char (which plain char is on your platform), the MSB represents a negative magnitude:
-1*2^(7) +
1*2^(6) +
1*2^(5) +
1*2^(4) +
1*2^(3) +
1*2^(2) +
1*2^(1) +
1*2^(0) = -1
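A minimal sketch of that rule (assuming an 8-bit pattern), reconstructing both interpretations from the individual bits:

#include <stdio.h>

int main(void) {
    unsigned char bits = 0xFF;  // the pattern 1111 1111

    int as_unsigned = 0, as_signed = 0;
    for (int i = 0; i < 8; i++) {
        int bit = (bits >> i) & 1;
        int weight = 1 << i;                               // 2^i
        as_unsigned += bit * weight;
        as_signed   += bit * (i == 7 ? -weight : weight);  // MSB weighs -2^7
    }
    printf("%d %d\n", as_unsigned, as_signed);  // prints: 255 -1
    return 0;
}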
