ASCII value is greater than 127 in C

#include <stdio.h>

int main() {
    char c = 125;
    c = c + 10;
    printf("%d", c);
    return 0;
}
The output of this code is -121.
How is the output -121? Can you please explain?
If I add 100 instead of 10 to c, the output is -31.
Why?

In the C language the char type is an integer type able to represent all the characters required by the language itself. The standard does not specify whether it is a signed or unsigned type. After seeing the output, I can guess that your system uses a signed char of size 8 bits (7 value bits and one sign bit) and represents negative values in two's complement.
So (as int) 125 + 10 = 135. But 135 > 127, so the actual stored value is 135 - 256 = -121.
(In two's complement, numbers simply wrap around modulo 2**SIZE_IN_BITS...)
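To make that wrap-around concrete, here is a minimal sketch; wrap8 is a hypothetical helper that just models the narrowing of an int to an 8-bit two's complement char:

#include <stdio.h>

/* Model narrowing an int to an 8-bit two's complement char. */
int wrap8(int v) {
    v &= 0xFF;                      /* keep only the low 8 bits */
    return v >= 128 ? v - 256 : v;  /* bit 7 set means a negative value */
}

int main() {
    printf("%d\n", wrap8(125 + 10));   /* -121 */
    printf("%d\n", wrap8(125 + 100));  /* -31, the second case asked about */
    return 0;
}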

How it works is that the type char in your compiler is a signed type which is 8 bits wide, and represents integer values using a system called two's complement.
In this system, the positive values go from 0 to 127 in binary like this:
00000000   0
00000001   1
00000010   2
00000011   3
...
01111111   127
Upon the next increment, the highest bit flips to 1: we get 10000000. Then the subsequent binary values are interpreted as negative under two's complement:
10000000 128 -128
10000001 129 -127
10000010 130 -126
...
11111111 255 -1
The middle column, continuing to count up from 128, shows the unsigned interpretation of the 8 bit datum; it just keeps counting toward 255. That would be the behavior of the type unsigned char.
The right column, showing negative numbers, is the two's complement interpretation of exactly the same bit patterns. This instead covers a negative range of integers from -128 to -1.
Now what happens in C when we do c + 10?
Firstly, according to the rules of the C language, the c value of type char gets promoted to the type int, which has a much larger range. Thus the int value 125 is being added to the int value 10, resulting in the int value 135.
What happens next is that c is assigned that value, exactly as if by c = 135. But 135 is not in range; char only goes up to 127. Implementation-defined behavior ensues: the 135 value is somehow forced to fit, by discarding bits.
C compilers for two's complement machines (i.e. virtually all compilers on the planet) convert a wider integer value to a narrower one simply by keeping the bottom bits that fit the smaller type and discarding the rest.
So, the binary representation of 135 is this.
10000111
More precisely, suppose int is 32 bits wide. Then the representation of the int value 135 is:
00000000000000000000000010000111
This is converted to char by chopping off the top 24 bits, leaving the bottom 8:
------------------------10000111
So we end up with
10000111
in the char type. But what is that? Referring to our original table, we can add a few more rows:
10000000 128 -128
10000001 129 -127
10000010 130 -126
10000011 131 -125
10000100 132 -124
10000101 133 -123
10000110 134 -122
10000111 135 -121 <----
...
11111111 255 -1
And there is the -121.
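You can watch that truncation happen with a bit mask. A minimal sketch, assuming a 32-bit int and an 8-bit two's complement char:

#include <stdio.h>

int main() {
    int wide = 135;
    printf("0x%08X\n", (unsigned)wide);        /* 0x00000087: all 32 bits */
    printf("0x%02X\n", (unsigned)wide & 0xFF); /* 0x87: the bottom 8 bits, 10000111 */
    char narrow = (char)wide;                  /* implementation-defined narrowing */
    printf("%d\n", narrow);                    /* -121 on such systems */
    return 0;
}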

The char type is represented in memory as a byte (most of the time 8 bits), so it can represent values from -128 to +127. If you try to store a bigger value in it, the value overflows and the result is not what you expect.

A char is one byte, i.e. 8 bits: one bit is for the sign and the remaining 7 bits are all you can use for the value.
The char range is:
char ==> -128 ~ 127
(char)(c + 1) == 126
(char)(c + 2) == 127
(char)(c + 3) == -128
If you use unsigned char instead, the positive range increases:
unsigned char ==> 0 ~ 255
(unsigned char)(c + 3) == 128
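A runnable sketch of those casts, assuming c = 125 as in the question and an 8-bit two's complement char:

#include <stdio.h>

int main() {
    char c = 125;
    /* c + n is computed as int, then narrowed again by the cast */
    printf("%d %d %d\n", (char)(c + 1), (char)(c + 2), (char)(c + 3));
    /* typically prints: 126 127 -128 */
    printf("%d\n", (unsigned char)(c + 3));  /* 128: no wrap into negatives */
    return 0;
}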


Addition of char value in c

#include <stdio.h>

int main()
{
    char c = 118, a = 117;
    c = c + 10;
    a = a + 10;
    printf("c:%d, a:%d\n", c, a);
}
The answer is c:-128, a:127.
Can someone explain to me why c+10 is -128 and a+10 is 127?
Thanks in advance.
117 + 10 = 127, which is in the range of the char type (-128 to 127).
If you open the Windows calculator in programmer mode, you can see that 118 is represented as 1110110 in binary and 10 as 1010. If we add these two, the result is 10000000. That is not in the range of the char type; this bit pattern is equivalent to -128, so -128 is printed.
because char is 8-bit signed in your compiler. So 118+10 is out of range (max is 127).
The implementation of your compiler "wraps" it around and you get -128 instead.
Signed 8-bit values range from -128 to 127. The bits for 127 are 0111 1111, and the bits for 128 would be 1000 0000. The problem is that in a signed number the high-order (leftmost) bit is the sign flag (0 is +, 1 is -). So, because the type is signed, the computer interprets the result as a negative number: -128. (This is called signed overflow, if I remember correctly, and everyone runs into it at one point or another when programming; check out two's complement to see why 1000 0000 means -128 and not -0.) You can get around this "problem" by declaring c and a as unsigned char instead of char.
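For example, here is a minimal sketch of that fix, i.e. the same program with unsigned char (assuming 8-bit chars):

#include <stdio.h>

int main()
{
    unsigned char c = 118, a = 117;
    c = c + 10;   /* 128 fits comfortably in the 0..255 range */
    a = a + 10;
    printf("c:%d, a:%d\n", c, a);   /* prints c:128, a:127 */
    return 0;
}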
BTW, you could save a variable this way:
char a=117;
printf("c:%d, a:%d\n", a+11,a+10);
In your case, a char is being represented by the compiler on your particular platform as a signed 8-bit value in two's complement representation. That means the highest bit is the sign bit (if it's a 1, the number is negative). So the range for non-negative values is (in binary) 00000000-01111111, which is 0-127 in decimal. The negative values range (in binary) from 10000000 to 11111111, which is -128 to -1 in decimal.
If you start with 118, in binary that's:
01110110
If I add decimal 10, that's adding 1010 in binary:
01110110
+00001010
---------
10000000
You can see now the highest bit is set, meaning the number overflowed (became greater than the maximum 127 in decimal) and now represents a negative number. The 8-bit binary value 10000000 happens to represent -128 in decimal.
So adding (in decimal) 10 to the char value 118 yields -128.
Your value of a is 117, so 117 + 10 = 127 still fits in the 7 bits available for a positive value. You can do the same binary analysis as an exercise to see how that works.

Printing Integer value of character variable in C

Recently in my institute I was given a piece of code and was asked to find out the answer. The code looks like this.
#include <stdio.h>

int main()
{
    char ch = 500;
    printf("%d\n", ch);
}
The output comes out as -12.
My question is: how can I calculate the value for this kind of code? Is there a formula or process for finding such values?
The code is syntactically correct but not doing what you think.
As Paul Hankin suggests, your compiler should give you a warning, since you are trying to put a number that does not fit into a char into a char variable.
A char is 1 byte, so it can store a number up to 127 if signed or up to 255 if unsigned.
The value overflows and only the lower 8 bits are taken into account.
500 = 0b111110100
Take only the lower 8 bits: 0b11110100
The MSB is 1, so it's a negative number.
Its one's complement plus 1 is 0b00001100, which is 12.
That's why you get -12.
Replacing char with a short or an int would make it correctly print 500.
You can't know, unless you know the specifics of the given system.
char is typically only 8 bits wide and can't hold the value of 500. Furthermore, the char type is unsuitable for storing integer values, since it has implementation-defined signedness.
Meaning you can't know if it can contain values from 0 to 255 or from -128 to 127 (two's complement). In theory it can even have other constraints and other signedness formats.
Also, the conversion from a larger integer type to a smaller signed one is implementation-defined.
I would guess that your specific system has a signed 8-bit char type using two's complement. The raw value 500 is 0x1F4. Upon initialization, your particular compiler truncates this to fit an 8-bit variable, meaning you end up with only the least significant byte, 0xF4. Since you have an 8-bit signed variable in two's complement format, 0xF4 equals -12.
(The implicit promotion to int when the char is passed to printf preserves the sign.)
None of this behavior is guaranteed across different systems.
Needless to say, the code is completely non-portable. You should not write code like this, which heavily relies on numerous forms of poorly defined behavior.
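That said, a small sketch can mimic what the initialization does on a typical system (assuming an 8-bit two's complement char):

#include <stdio.h>

int main(void)
{
    int value = 500;                      /* 0x1F4 */
    unsigned low = value & 0xFF;          /* the low 8 bits: 0xF4 = 244 */
    printf("low byte: %u (0x%02X)\n", low, low);
    signed char sc = (signed char)value;  /* implementation-defined narrowing */
    printf("as signed char: %d\n", sc);   /* -12 on such systems */
    return 0;
}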
Here ch is a signed character type, apparently 8 bits wide,
so its range is from -128 to +127 (256 values in total).
Now let's cycle over this range, starting from 0, counting up to 127, then wrapping to -128 and climbing back toward 0, and so on.
The total integer value to count is 500.
From 0 to 127 and -128 back to 0 (a whole circle),
the integer value left is 500 - 256 = 244.
From 0 to 127 again,
the integer value left is 244 - 128 = 116.
From -128 to -12 (116 steps),
the integer value left is 116 - 116 = 0.
So after counting all 500 we are at -12 of the range.
(Sorry for the long, non-mathematical explanation; the effort is to make you understand.)
When char is used to encode integers, the values go from -128 to 127; the value here is overflowing. Counting it down:
500 - 127 = 373
373 - 128 = 245
245 - 127 = 118
118 - 128 = -10
Since we pass over 0 twice and that counts, it gives us -12. If you try to print it as a character you will see that it is invalid; there is no character for -12. You can also check man ascii.

Confusion about the output of this program

I am new to C programming and I am currently learning the Data Types Revised chapter. In the program below, my expected output is 36, but the compiler shows 35.
#include <stdio.h>

int main()
{
    char ch = 291;
    printf("\n%d %c", ch, ch);
}
Can anyone explain to me why the output comes out as 35? I am currently using a 32-bit GCC compiler.
Your system apparently has an 8-bit char type. That means 291 is too big to fit - the compiler reduces it modulo 256 (2^8) and you end up with 35.
In this case, Clang provides a great warning:
example.c:3:11: warning: implicit conversion from 'int' to 'char' changes value
from 291 to 35 [-Wconstant-conversion]
char ch = 291 ;
~~ ^~~
You should probably avoid relying on this behaviour, since it may vary from implementation to implementation. The C99 and C11 specs (Section 6.3.1.3) say about signed integer conversions:
Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
Since you're using GCC, you might be interested to read this excerpt from the documentation:
The result of, or the signal raised by, converting an integer to a signed integer type when the value cannot be represented in an object of that type (C90 6.2.1.2, C99 6.3.1.3).
For conversion to a type of width N, the value is reduced modulo 2^N to be within range of the type; no signal is raised.
And there you have your full explanation of the reduction modulo 256.
You're actually causing an overflow. A signed character can only hold values from -128 to 127 (256 values = 2^8) on 8-bit-character systems, which are pretty much everywhere.
So we end up with a character whose value equals 291 % 256 = 35.
Don't forget that the first character is 0, not 1.
Here is how a char is actually represented in the two's complement system:

unsigned:  0 ------- 127   128 ------- 255
signed:    0 ------- 127  -128 -------  -1

So a signed char c1 = -128 has exactly the same bit pattern as an unsigned char c2 = 128.
But here this distinction is irrelevant. We're talking about modulo because only the last eight bits are taken into account (where would the other bits be stored, when there are only eight bits available in memory?).
291 = % 1 0010 0011
(% means binary representation)
Only the low eight bits, % 0010 0011, are kept, which equals 35, and which will be read exactly the same whether you take it as signed or not.
Because a char can only contain 8 bits of information and 291 requires more than that to be stored, the higher bits are discarded and only what fits in the variable is kept.
You can simulate that with either a modulo or a bitwise AND operation:
291 % 256 = 35
291 & 0xFF = 35
An 8-bit char can contain values from -128 to 127 or from 0 to 255, depending on whether it is signed or unsigned.
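A quick sketch confirming both calculations (assuming an 8-bit char where the narrowing wraps):

#include <stdio.h>

int main(void)
{
    printf("%d %d\n", 291 % 256, 291 & 0xFF);  /* 35 35 */
    char ch = (char)291;        /* implementation-defined narrowing; 35 here */
    printf("%d %c\n", ch, ch);  /* 35 and '#' (the character with code 35) */
    return 0;
}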

Simple Character Interpretation In C

Here is my code:

#include <stdio.h>

int main(void)
{
    char ch = 129;
    printf("%d", ch);
}
I get the output as -127. What does it mean?
It means that char is an 8-bit variable that can only hold 2^8 = 256 values. Since the declaration is char ch, ch is (on your system) a signed variable, which means it can store values from -128 to 127. When you go over 127, the value starts over from -128.
Think of it like some arcade games where you go from one side of the screen to the other:
ch = 50;
                                 -----> 50 is stored
|________________________________|__________|      since it fits
-128             0              50         127     between -128
                                                   and 127

ch = 129;
 ---                                               129 goes over
 -->                                               127 by 2, so
|__|_______________________________________|       it 'lands' in
-128 -127            0                    127      -127
BUT!! you shouldn't rely on this, since the result of the out-of-range conversion is implementation-defined!
In honor of Luchian Grigore here's the bit representation of what's happening:
A char is a variable that holds 8 bits, or a byte. So we have eight 0's and 1's struggling to represent whatever value you desire. If the char is signed, it must also represent whether the value is positive or negative. You probably read about one bit representing the sign; that's an abstraction of the true process, and in fact only one of the first solutions implemented in electronics. Such a trivial method had a problem: you would have two ways of representing 0 (+0 and -0):

0 0000000 -> +0             1 0000000 -> -0
^                           ^
|_ sign bit 0: positive     |_ sign bit 1: negative

Inconsistencies guaranteed!! So some very smart folks came up with a system called ones' complement, which represents a negative number as the negation (NOT operation) of its positive counterpart:

01010101 -> +85
10101010 -> -85

This system... had the same problem: 0 could be represented as 00000000 (+0) and 11111111 (-0). Then came some smarter folks who created two's complement, which keeps the negation part of the earlier method and then adds 1, thereby removing that pesky -0 and giving a shiny new number to our range: -128! So how does our range look now?
00000000 +0
00000001 +1
00000010 +2
...
01111110 +126
01111111 +127
10000000 -128
10000001 -127
10000010 -126
...
11111110 -2
11111111 -1
So, this should give an idea of what's happening when our little processor tries to add numbers to our variable:
  00110010   50        01111111   127
 +00000010 +  2       +00000010  +  2
 ---------  ---       ---------  ----
  00110100   52        10000001  -127
      ^                    ^        ^
      |_ 1 + 1 = 10        |        |_ wait, what?!
                           |_ 129 in binary
Yep, if you review the range table above you can see that up to 127 (01111111) the binary was fine and dandy, nothing weird happening; but after the 8th bit is set, at -128 (10000000), the interpreted number no longer follows its binary magnitude but the two's complement representation. This means the binary representation, the bits in your variable, the 1's and 0's, the heart of our beloved char, really does hold a 129... it's there, look at it! But the processor reads it as a measly -127, because the variable HAD to be signed, undermining all its positive potential for a shift down the real number line.
It means you ran into behavior the standard does not pin down; the result depends on your implementation.
char ch = 129; is not portable, because 129 is not a representable value for a char on your specific setup, and the result of the conversion is implementation-defined.
Your char is most likely an 8-bit signed integer stored using two's complement. Such a variable can only represent numbers between -128 and 127. If you do "127 + 1" it wraps around to -128, so 129 is equivalent to -127.
This comes from the fact that a char is coded on one byte, so 8 bits of data.
A plain (signed) char has its value coded on 7 bits plus one bit for the sign, while an unsigned char has all 8 bits of data for its value.
This means:
Taking abcdefgh as 8 bits respectively (a being the leftmost bit, and h the rightmost), the value is encoded with a for the sign and bcdefgh in binary format for the real value:
42(decimal) = 101010(binary)
stored as :
abcdefgh
00101010
When using this value from the memory :
a is 0 : the number is positive, bcdefgh = 0101010 : the value is 42
What happens when you put 129 :
129(decimal) = 10000001(binary)
stored as :
abcdefgh
10000001
When using this value from the memory :
a is 1 : the number is negative; we should subtract one and invert all bits of the value, so (bcdefgh - 1) inverted = 1111111 : the magnitude is 127
The number is -127
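Here is a hedged sketch of that decoding rule: treat the stored byte as unsigned, and if the sign bit is set, subtract 256 (this is exactly the two's complement reinterpretation, assuming 8-bit bytes):

#include <stdio.h>

int main(void)
{
    unsigned char bits = 0x81;  /* 10000001: the stored pattern for 129 */
    /* two's complement: patterns >= 128 stand for (value - 256) */
    int value = (bits >= 128) ? (int)bits - 256 : bits;
    printf("%d\n", value);      /* prints -127 */
    return 0;
}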
On your system: char 129 has the same bits as the 8-bit signed integer -127.
An unsigned 8-bit integer goes from 0 to 255, a signed one from -128 to 127.
Related (C++):
You may also be interested in reading the nice top answer to What is an unsigned char?
As #jmquigley points out, strictly speaking this is not well-defined behavior and you should not rely on it.
Allowing signed integer overflows in C/C++
The char type here is an 8-bit signed integer. If you interpret the representation of the unsigned byte 129 in the two's complement signed representation, you get -127.
The type char can be either signed or unsigned; it's up to the compiler. Most compilers have it as signed.
In your case, the compiler silently converts the integer 129 to its signed variant, and puts it in an 8-bit field, which yields -127.
char is 8 bits and signed on your platform. It can only hold values from -128 to 127. When you try to assign 129 to it, you get the result you see because the sign bit ends up set. Another way to think of it is that the number "wraps around".
Whether a plain char is signed or unsigned, is implementation-defined behavior. This is a quite stupid, obscure rule in the C language. int, long etc are guaranteed to be signed, but char could be signed or unsigned, it is up to the compiler implementation.
On your particular compiler, char is apparently signed. This means, assuming that your system uses two's complement, that it can hold values of -128 to 127.
You attempt to store the value 129 in such a variable. Strictly speaking, the result of this out-of-range conversion is implementation-defined, and the standard even allows an implementation-defined signal to be raised, so you should not assume anything about the result. In practice, most (all?) compilers implement it as "wrap around", as described in other answers.
To sum it up, your code relies on two different behaviors that aren't well defined by the standard. Understanding how the result of such unpredictable code ends up in a certain way has limited value. The important thing here is to recognize that the code is obscure, and learn how to write it in a way that isn't obscure.
The code could for example be rewritten as:
unsigned char ch = 129;
Or even better:
#include <stdint.h>
...
uint8_t ch = 129;
As a rule of thumb, make sure to follow these rules in MISRA-C:2004:
6.1 The plain char type shall be used only for the storage and use of character values.
6.2 signed and unsigned char type shall be used only for the storage and use of numeric values.

unsigned char rotate

I'm a little bit confused as to what an unsigned char is. A signed char is the representation of the char in bit form, right? A sample problem has us rotating the bits of an unsigned char to the right by n bit positions, with this solution:
unsigned char rotate(unsigned char x, int n) {
    unsigned char temp = x << (8 - n);  /* the n bits pushed off the right end */
    x = x >> n;
    return (x | temp);
}
If anyone could explain with char examples and their respective bits, it would be greatly appreciated. Thanks so much.
signed char, char and unsigned char are all integer types. For the sake of simplicity I'll assume that CHAR_BIT is 8 and that signed types are 2's complement. So:
signed char is a number from -128 to +127
unsigned char is a number from 0 to 255
char is either the same range as signed char, or the same range as unsigned char, depending on your C implementation.
As far as C is concerned, a character is just a number within the range of the char type (although various character functions like tolower require the value to be cast to an unsigned type on the way in, even if char is signed).
So, signed char and unsigned char are both representation of the character in bit form. For numbers in the range 0 to +127 they both use the same representation (there's only one way to represent positive numbers in binary). For numbers outside that range, the signed representation of a negative number n is the same bits as the unsigned representation of n + 256 (definition of 2's complement).
The reason this code uses unsigned char is that right-shifting a negative signed value has an implementation-defined result, and left-shifting a negative signed value has undefined behavior. Usually left shift behaves the same as for unsigned values, which is OK, but right shift of a negative value typically inserts bits with value 1 at the left-hand side, a so-called "arithmetic shift", which isn't what's wanted here. Unsigned values always shift in zeros, and it's the shifting in of zero that lets this code build the two parts of the rotated result and OR them together.
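A small sketch of the difference; the signed result is implementation-defined, and the values shown are what common compilers produce:

#include <stdio.h>

int main(void)
{
    signed char s = -2;      /* bit pattern 11111110 on two's complement */
    unsigned char u = 0xFE;  /* the same bit pattern, unsigned */
    printf("%d\n", s >> 1);  /* often -1: a 1 is shifted in (arithmetic shift) */
    printf("%d\n", u >> 1);  /* always 127: a 0 is shifted in (logical shift) */
    return 0;
}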
So, assuming an input value of x = 254 (11111110), and n = 1, we get:
x << 7 is 0111111100000000
x >> 1 is 01111111
| is 0111111101111111
convert to unsigned char to return is 01111111
If we used a signed type instead of unsigned char, we'd quite possibly get:
x is -2 11111110
x << 7 is 11111111111111111111111100000000 (assuming 32-bit int, since
smaller types are always promoted to int for arithmetic ops)
x >> 1 is implementation-defined, possibly
11111111111111111111111111111111
| is 11111111111111111111111111111111
convert to signed char to return is -1
So the bit-manipulation with the unsigned char gives the correct answer, rotated by 1 bit to move the 0 from the end to the start. Bit-manipulation with the signed char probably gives the wrong result; it might give the right result if negative signed values do a logical right shift, but on really unusual implementations it could do anything.
Pretty much always for bit-manipulation tasks like rotate, you want to use unsigned types. It removes the implementation-dependence (other than on the width of the type), and avoids you having to reason about negative and non-negative values separately.
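To check that concretely, here is the rotate function from the question with a couple of spot checks (assuming an 8-bit unsigned char):

#include <stdio.h>

unsigned char rotate(unsigned char x, int n) {
    unsigned char temp = x << (8 - n);  /* the n bits pushed off the right end */
    x = x >> n;
    return x | temp;
}

int main(void)
{
    printf("%02X\n", rotate(0xFE, 1));  /* 11111110 -> 01111111 = 7F */
    printf("%02X\n", rotate(0xD1, 3));  /* 11010001 -> 00111010 = 3A */
    return 0;
}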
Declaring a variable as unsigned char tells the compiler to treat the underlying bit pattern as a number from 0 (00000000) to 255 (11111111). Declaring it signed char (or plain char, where that type is signed) tells the compiler to apply two's complement to the underlying bit pattern and treat it as a number from -128 (10000000) to 127 (01111111).
Consider a 3-bit number. If it is unsigned, you have:
000 = 0
001 = 1
010 = 2
011 = 3
100 = 4
101 = 5
110 = 6
111 = 7
If it is signed you have:
100 = -4
101 = -3
110 = -2
111 = -1
000 = 0
001 = 1
010 = 2
011 = 3
What is neat with respect to arithmetic is that you don't have to treat signed binary numbers differently from unsigned ones: you just do the actual binary math without regard to signed or unsigned. But you do have to apply the signed/unsigned interpretation to the inputs and to the output.
In the signed realm you might have:
2 + (-3) = 010 + 101 = 111 = -1
But in the unsigned realm this is:
2 + 5 = 010 + 101 = 111 = 7
So it's all a matter of interpretation, since the actual bit patterns being added and the bit pattern of the sum are the same in both cases.
An unsigned char is just an 8-bit integer type that can take values between 0 and 255, and a signed char can take values between -128 and 127. In the actual machine code there is no real difference, except one: when you do a right shift on a signed type using >> (be it char, short or int), it is typically carried out as an arithmetic shift, meaning that for negative values (which have a 1 as MSB) a 1 is shifted in instead of a 0, and the above code would not work as expected.
EDIT: Taking your code example and rotating by 3 bits, signed and unsigned:
00110101 rotated is 10100110, whether unsigned or signed.
But for a number with a 1 as its leading bit, a signed right shift is arithmetic, and thus:
11010001 rotated unsigned is 00111010
11010001 rotated signed is 11111010
