Explanation for the given code output in C

I have written code in C as
#include<stdio.h>
int main()
{
char a = 128;
char b = -128;
printf("%c",a);
printf("%c",b);
}
The output of the above code is ÇÇ.
Using 128 or -128, the output comes out the same. Why? Please explain using binary if possible.

A signed char type typically has a range of -128 to 127. Since 128 is outside of this range, your compiler is converting it to a value with the same 8-bit bit pattern, and this is -128.

The literal -128 has type int and, in a 32-bit two's complement representation, has the bit pattern:
1111 1111 1111 1111 1111 1111 1000 0000
In this case, when you assign it to a char there is an implicit conversion such that only the least significant byte is kept: 1000 0000, which read as a signed char is -128 (or 128 read as raw unsigned bits). Hence the result is the same.
Strictly, the behaviour is implementation-defined if char is signed, and the standard defines the conversion for unsigned char in modulo ("wrap-around") terms. Whether char itself is signed or unsigned is also implementation-defined, as are its actual width and therefore its range. In practice, though, the above explanation is what is happening in this case and is the most likely behaviour for any implementation with an 8-bit char; it makes no difference whether char is signed or unsigned.
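A minimal sketch of that explanation, assuming the common case of an 8-bit two's-complement signed char; viewing each variable through unsigned char shows that both end up with the same bit pattern 1000 0000:

#include <stdio.h>

int main(void)
{
    char a = 128;   /* out of range for an 8-bit signed char: implementation-defined, commonly -128 */
    char b = -128;

    /* Read the same 8 bits as unsigned values: both print 128 (binary 1000 0000). */
    printf("bits of a: %d\n", (unsigned char)a);
    printf("bits of b: %d\n", (unsigned char)b);

    /* Read as signed values: both are commonly -128. */
    printf("a as signed: %d\n", a);
    printf("b as signed: %d\n", b);
    return 0;
}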

Related

Why does printing a character variable with %d give a negative value in C?

I tried the following piece of code, expecting the output to be positive 64:
char val = 0x80;
printf("%d",val>>1);
My understanding of what happens is (please correct me if I'm wrong, as I probably am):
Referring to the ASCII table, there is no mapping of 0x80 to any character, so I assume this is stored as an unsigned integer.
This is represented as 1000 0000 in bitwise format, so a right shift of 1 would result in 0100 0000
When printed as an integer value, this will then show as positive 64.
However it shows -64.
In contrast:
char val = 0x40;
printf("%d",val>>1);
gives positive 32.
Is the value implicitly converted to a signed integer in the first case and not in the second?
Your C implementation uses an eight-bit signed char. (The C standard permits char to be signed or unsigned.) In char val = 0x80;, a char cannot represent the value you initialize it with, 128. In this case, the value 128 is converted to char which, per C 2018 6.3.1.3 3, yields either an implementation-defined value or a trap. Your implementation likely produces −128. (This is a common result because 128 in binary is 10000000, and converting an out-of-range result to an eight-bit two’s complement integer often simply reinterprets the low eight bits of the value as eight-bit two’s complement. In two’s complement, 10000000 represents −128.)
So val>>1 asks to shift −128 right one bit. Per C 2018 6.5.7 5, shifting a negative value right yields an implementation defined value. Producing −64 is a common result.
(In detail, in val>>1, val is automatically promoted from char to int. It has the same value, −128. However, with a 32-bit int, it would then be represented as 11111111111111111111111110000000 instead of 10000000. Then shifting right “arithmetically,” which propagates the sign bit, yields 11111111111111111111111111000000, which is −64, the result you got. Some C implementations might shift right “logically,” which sets the sign bit to zero, yielding 01111111111111111111111111000000. In this case, the printf would show “2147483584”, which is 2^31 − 64.)
Whether ASCII has any character with code 0x80 is irrelevant. The C rules apply to the values involved, regardless of what character encoding scheme is used.
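A short sketch of those steps, under the same assumptions (8-bit signed char, 32-bit int, arithmetic right shift, all of which are common but not guaranteed):

#include <stdio.h>

int main(void)
{
    char val = 0x80;               /* commonly stores -128 (implementation-defined) */

    int promoted = val;            /* integer promotion: still -128, now 32 bits wide */
    printf("value: %d  bits: %08x\n", promoted, (unsigned)promoted);   /* -128  ffffff80 */

    int shifted = promoted >> 1;   /* implementation-defined for negative values; commonly arithmetic */
    printf("value: %d  bits: %08x\n", shifted, (unsigned)shifted);     /* -64   ffffffc0 */
    return 0;
}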
Right shift of a signed integer is implementation-defined. On most modern systems signed integers are two's complement, and the compiler translates the shift into an arithmetic shift.
After the shift, the low byte of val is 0xc0, which is -64 in the two's complement encoding.
val is first promoted to a signed int and then passed to the function. If you add a few more lines to your code, you can see this for yourself:
#include <stdio.h>
int main(void)
{
char c = 0x80;
printf("%d\n", c >> 1);   /* c is promoted to int before the shift */
printf("%x\n", c >> 1);
printf("%hhd\n", c >> 1);
printf("%hhx\n", c >> 1);
c >>= 1;                  /* shift and store back into the char */
printf("%d\n", c);
printf("%x\n", c);
printf("%hhd\n", c);
printf("%hhx\n", c);
}
https://godbolt.org/z/YsaGos
You can also see that if the MSB is 0, an arithmetic shift behaves exactly like a logical shift; thus 0x40 >> 1 == 0x20.

tilde operator query in C working differently

I came across this question.
What is the output of this C code?
#include <stdio.h>
int main()
{
unsigned int a = 10;
a = ~a;
printf("%d\n", a);
}
I know what the tilde operator does. Now 10 can be represented as 1010 in binary, and if I bitwise-NOT it, I get 0101, so I do not understand the output -11. Can anyone explain?
The bitwise negation will not result in 0101. Note that an int contains at least 16 bits. So, for 16 bits, it will generate:
a = 0000 0000 0000 1010
~a = 1111 1111 1111 0101
So we expect to see a large number (with 16 bits that would be 65,525), but you use %d as the format specifier. This means you interpret the integer as a signed integer. Signed integers use the two's complement representation, which means that any integer whose highest bit is set is negative, and its value is equal to -1-(~x), here -1-10 = -11. If the specifier were %u, the value would be formatted as an unsigned integer.
EDIT: as @R. says, %d is only well defined for unsigned integer values that are also in the range of signed int; outside that range it depends on the implementation.
It's undefined behaviour, since "%d" is for signed integers; for unsigned ones, use "%u".
Otherwise, note that negative values are usually represented in two's complement, so -a == ~a + 1, or the other way round: ~a == -a - 1. Hence ~10 is the same as -10 - 1, which is -11.
The format specifier for an unsigned decimal integer is %u. %d is for a signed decimal integer.
printf("%d\n", a) is interpreting a as a signed int. You want printf("%u\n", a).

How to determine size limits of variable?

I'm struggling with determining edge-sizes of each variable. I can't understand the following problem.
To get maximum value of char for example, I use: ~ 0 >> 1
Which should work like this:
convert 0 to binary: 0000 0000 (I assume that char is stored on 8 bits)
negate it: 1111 1111 (now I'm out of char max size)
move one place right: 0111 1111 (I get 127 which seems to be correct)
Now I want to present this result using printf function.
Why exactly do I have to use cast like this:
printf("%d\n", (unsigned char)(~0) >> 1)?
I just don't get it. I assume that it has something to do with point 2 when I get out of char range, but I'm not sure.
I would be grateful for a more detailed explanation of this problem.
Please don't use these kinds of tricks. They might work on ordinary machines but they are possibly unportable and hard to understand. Instead, use the symbolic constants from the header file limits.h which contains the size limits for each of the basic types. For instance, CHAR_MAX is the upper bound for a char, CHAR_MIN is the lower bound. Further limits for the numeric types declared in stddef.h and stdint.h can be found in stdint.h.
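For instance, a small sketch of the limits.h approach (all macro names below are the standard ones):

#include <stdio.h>
#include <limits.h>

int main(void)
{
    /* The limits come from the implementation; no bit tricks required. */
    printf("char : %d .. %d\n", CHAR_MIN, CHAR_MAX);
    printf("schar: %d .. %d\n", SCHAR_MIN, SCHAR_MAX);
    printf("uchar: 0 .. %d\n", UCHAR_MAX);
    printf("int  : %d .. %d\n", INT_MIN, INT_MAX);
    printf("uint : 0 .. %u\n", UINT_MAX);
    return 0;
}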
Now for your question: arithmetic is done on values of type int by default, unless you cause the operands involved to have a different type. This happens for various reasons, such as one of the variables involved having a different type, or your using a literal of a different type (like 1.0 or 1L or 1U). Even more importantly, the type of an arithmetic expression promotes from the inside to the outside. Thus, in the statement
char c = 1 + 2 + 3;
The expression 1 + 2 + 3 is evaluated as type int and only converted to char immediately before assigning. Even more important is that in the C language, you can't do arithmetic on types smaller than int. For instance, in the expression c + 1 where c is of type char, the compiler inserts an implicit conversion from char to int before adding one to c. Thus, a statement like
c = c + 1;
actually behaves like this in C:
c = (char)((int)c + 1);
Thus, ~0 >> 1 actually evaluates to 0xffffffff (-1) on a usual 32-bit architecture, because the type int usually has 32 bits and right-shifting of signed types usually propagates the sign bit, so the most significant bit stays a one. Casting to unsigned char causes truncation, with the result being 0xff (255). All arguments to printf but the first are part of a variable argument list, which is a bit complicated but basically means that all types smaller than int are converted to int, float is converted to double, and all other types are left unchanged.
Now, how can we get this right? On an ordinary machine with two's complement and no padding bits one could use expressions like these to compute the largest and smallest char, assuming sizeof (char) < sizeof (int):
(1 << CHAR_BIT - 1) - 1; /* largest char */
-(1 << CHAR_BIT - 1); /* smallest char */
For other types, this is going to be slightly more difficult since we need to avoid overflow. Here is an expression that works for all signed integer types on an ordinary machine, where type is the type you want to have the limits of:
(type)(((uintmax_t)1 << sizeof (type) * CHAR_BIT - 1) - 1) /* largest */
(type)-((uintmax_t)1 << sizeof (type) * CHAR_BIT - 1) /* smallest */
For an unsigned type type, you could use this to get the maximum:
~(type)0
Please notice that all these tricks should not appear in portable code.
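A sketch of those expressions wrapped in hypothetical macros (SIGNED_MAX, SIGNED_MIN, and UNSIGNED_MAX are names invented here), valid only under the stated assumptions of two's complement and no padding bits; limits.h is used purely as a cross-check:

#include <stdio.h>
#include <stdint.h>
#include <limits.h>

/* Non-portable: assumes two's complement and no padding bits. */
#define SIGNED_MAX(type)   ((type)(((uintmax_t)1 << (sizeof (type) * CHAR_BIT - 1)) - 1))
#define SIGNED_MIN(type)   ((type)-((uintmax_t)1 << (sizeof (type) * CHAR_BIT - 1)))
#define UNSIGNED_MAX(type) ((type)~(type)0)

int main(void)
{
    printf("short max: %d (limits.h: %d)\n", SIGNED_MAX(short), SHRT_MAX);
    printf("short min: %d (limits.h: %d)\n", SIGNED_MIN(short), SHRT_MIN);
    printf("int   max: %d (limits.h: %d)\n", SIGNED_MAX(int), INT_MAX);
    printf("int   min: %d (limits.h: %d)\n", SIGNED_MIN(int), INT_MIN);
    printf("unsigned short max: %u (limits.h: %u)\n",
           (unsigned)UNSIGNED_MAX(unsigned short), (unsigned)USHRT_MAX);
    return 0;
}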
The exact effect of your actions is different from what you assumed.
0 is not 0000 0000. 0 has type int, which means that it is most likely 0000 0000 0000 0000 0000 0000 0000 0000, depending on how many bits int has on your platform. (I will assume 32-bit int.)
Now, ~0 is, expectedly, 1111 1111 1111 1111 1111 1111 1111 1111, which still has type int and is a negative value.
When you shift it to the right, the result is implementation-defined. Right-shifting negative signed integer values in C does not guarantee that you will obtain 0 in the sign bit. Quite the opposite, most platforms will actually replicate the sign bit when right-shifting. Which means that ~0 >> 1 will still give you 1111 1111 1111 1111 1111 1111 1111 1111.
Note, that even if you do this on a platform that shifts-in a 0 into the sign bit when right-shifting negative values, you will still obtain 0111 1111 1111 1111 1111 1111 1111 1111, which is in general case not the maximum value of char you were trying to obtain.
If you want to make sure that a right-shift operation shifts-in 0 bits from the left, you have to either 1) shift an unsigned bit-pattern or 2) shift a signed, but positive bit-pattern. With negative bit patterns you risk running into the sign-extending behavior, meaning that for negative values 1 bits would be shifted-in from the left instead of 0 bits.
Since the C language does not have shifts that operate in the domain of the [unsigned/signed] char type (the operand is promoted to int anyway before the shift), what you can do is make sure that you are shifting a positive int value and that your initial bit-mask has the correct number of 1s in it. That is exactly what you achieve by using (unsigned char) ~0 as the initial mask. (unsigned char) ~0 will participate in the shift as a value of type int equal to 0000 0000 0000 0000 0000 0000 1111 1111 (assuming 8-bit char). After the shift you will obtain 0000 0000 0000 0000 0000 0000 0111 1111, which is exactly what you were trying to obtain.
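A small sketch of that sequence, assuming an 8-bit char and a 32-bit int:

#include <stdio.h>

int main(void)
{
    int all_ones = ~0;                /* all bits set: -1 on a two's-complement machine */
    printf("%d\n", all_ones);

    int mask = (unsigned char)~0;     /* truncated to 0xFF, then promoted back to int */
    printf("%d\n", mask);             /* 255 */
    printf("%d\n", mask >> 1);        /* 127: the shift now works on a positive value */
    return 0;
}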
That only works with unsigned integers. For signed integers, right-shifting a negative number is implementation-defined, and so is the outcome of bit-wise inversion of negative values: it depends not only on the representation of negative values, but also on which CPU instruction the compiler uses to perform the right shift (some CPUs do not have an arithmetic right shift, for instance).
So, unless you make additional constraints for your implementation, it is not possible to determine the limits of signed integers. This implies there is no completely portable way (for signed integers).
Note that whether char is signed or unsigned is also implementation-defined, and that (unsigned char)(~0) >> 1 is subject to the integer promotions, so it will not yield a character result but an int (which happens to make the format specifier correct, although presumably unintended).
Use limits.h to get macros for your implementation's integer limits. This file has to be provided by any standard-compliant C compiler.

why is it printing 255

#include <stdio.h>
int main()
{
unsigned char a = -1;
printf("%d",a);
printf("%u",a);
}
When I executed the above program I got 255 255 as the answer.
We know negative numbers are stored in 2's complement.
Since it is 2's complement, the representation would be
1111 1111 --> 2's complement.
But in the above we are printing with %d (int), and an integer is four bytes.
My assumption is that even though it is a character, we are forcing the compiler to treat it as an integer,
so it internally uses the sign-extension concept:
1111 1111 1111 1111 1111 1111 1111 1111.
According to the above representation it has to be -1 in the first case, since %d is signed,
and in the second case it has to print (2^31 - 1), but it is printing 255 and 255.
Why is it printing 255 in both cases?
Tell me if my assumption is wrong and give me the real interpretation.
Your assumption is wrong; the character will "roll over" to 255, then be padded to the size of an integer. Assuming a 32-bit integer:
11111111
would be padded to:
00000000 00000000 00000000 11111111
Up to the representation of a, you are correct. However, the %d and %u conversions of the printf() function both take an int as an argument. That is, your code is the same as if you had written
#include <stdio.h>
int main() {
unsigned char a = -1;
printf("%d", (int)a);
printf("%u", (int)a);
}
The moment you assigned -1 to a, you lost the information that it was ever a signed value; the logical value of a is 255. Now, when an unsigned char is converted to int, the compiler preserves its logical value, and the code prints 255.
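A short sketch of that conversion chain, assuming an 8-bit unsigned char and a 32-bit int:

#include <stdio.h>

int main(void)
{
    unsigned char a = -1;        /* -1 converted to unsigned char wraps to 255 (well-defined) */

    /* Default argument promotion passes an int with the value 255,
       so both lines print 255. */
    printf("%d\n", a);
    printf("%u\n", (unsigned)a);

    printf("%hhu\n", a);         /* printed as the 8-bit quantity it is: still 255 */
    return 0;
}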
The compiler doesn't know what type the extra parameters in printf should be, since the only thing that specifies it should be treated as a 4-byte int is the format string, which is irrelevant at compile time.
What actually happens behind the scenes is the callee (printf) receives a pointer to each parameter, then casts to the appropriate type.
Roughly the same result as this:
char a = -1;
int * p = (int*)&a; // BAD CAST
int numberToPrint = *p; // Accesses 3 extra bytes from somewhere on the stack
Since you're likely running on a little endian CPU, the 4-byte int 0x12345678 is arranged in memory as | 0x78 | 0x56 | 0x34 | 0x12 |
If the 3 bytes on the stack following a are all 0x00 (they probably are due to stack alignment, but it's NOT GUARANTEED), the memory looks like this:
&a: | 0xFF |
(int*)&a: | 0xFF | 0x00 | 0x00 | 0x00 |
which evaluates to *(int*)&a == 0x000000FF.
unsigned char runs from 0 to 255, so the negative number -1 will print as 255, -2 will print as 254, and so on.
signed char runs from -128 to +127, so you would get -1 from the same printf(), which is not the case with unsigned char.
Once you assign the value to a char, any later widening merely pads it, so your assumption of 2^31 is wrong.
Negative numbers are represented using 2's complement (implementation-dependent). So:
1 = 0000 0001
2's complement of 1 = 1111 1111 = 255
which is why -1 prints as 255.
It is printing 255 simply because that is the intent of ISO/IEC 9899:
H.2.2 Integer types
1 The signed C integer types int, long int, long long int, and the corresponding
unsigned types are compatible with LIA−1. If an implementation adds support for the
LIA−1 exceptional values ‘‘integer_overflow’’ and ‘‘undefined’’, then those types are
LIA−1 conformant types. C’s unsigned integer types are ‘‘modulo’’ in the LIA−1 sense
in that overflows or out-of-bounds results silently wrap. An implementation that defines
signed integer types as also being modulo need not detect integer overflow, in which case,
only integer divide-by-zero need be detected.
If this is given, printing 255 is absolutely what LIA-1 would expect.
Otherwise, if your implementation doesn't support C99's LIA-1 annex, then it is simply undefined behaviour.

Left shift operator

If I have the following:
char v = 32; // 0010 0000
then I do:
v << 2
the number becomes negative. // 1000 0000 = -128
I read the standard, but it only says:
If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.
so I don't understand whether it is a rule that, when a bit is shifted into the leftmost bit, the number must become negative.
I'm using GCC.
Left shifting it by two gives 1000 0000 in binary, which is 128 in decimal.
If 128 is representable in char, i.e. you're on some machine (with a supporting compiler) that provides a char wider than 8 bits, then 128 is the value you get, since it is representable in such a type.
Otherwise, if the size of a char is just 8 bits, as on most common machines, a signed character type that uses two's complement for negative values can represent only [-128, 127]. The shift itself is performed on the promoted int, so v << 2 evaluates to 128; the trouble starts when that 128 is converted back to the signed char, where it is not representable, and the result of that conversion is implementation-defined (commonly wrapping around to -128).
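A hedged sketch of what typically happens, assuming an 8-bit two's-complement signed char and a 32-bit int:

#include <stdio.h>

int main(void)
{
    char v = 32;                       /* 0010 0000 */

    /* v is promoted to int before the shift, so the shift itself yields 128. */
    printf("%d\n", v << 2);            /* 128 */

    /* Converting 128 back to a signed 8-bit char is implementation-defined;
       it commonly wraps to -128. */
    char c = (char)(v << 2);
    printf("%d\n", c);                 /* commonly -128 */

    /* With unsigned char, 128 is representable and nothing surprising happens. */
    unsigned char u = (unsigned char)(v << 2);
    printf("%d\n", u);                 /* 128 */
    return 0;
}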
Signed data primitives like char use two's complement (http://en.wikipedia.org/wiki/Twos_complement) to encode values. What you are probably looking for is unsigned char, which doesn't encode values using two's complement (no negatives).
Try using unsigned char instead: a signed char spends one bit on the sign, whereas unsigned char makes all of its bits available for the value:
unsigned char var = 32;
var = var << 2;   /* 128 fits in an unsigned char */
