Confusion regarding getchar() in c [duplicate] - c

Can someone explain why this works?
char c = '9';
int x = (int)(c - '0');
Why does subtracting '0' from the ASCII code of a char result in the number that the char represents?

Because every char is represented by a number, and the digit characters have consecutive codes starting with '0'.
In the ASCII table you see that:
'0' => 48
'1' => 49
'9' => 57.
As a result: ('9' - '0') = (57 − 48) = 9
Source: http://www.asciitable.com

char is an integer type, just like int and family. An object of type char has some numerical value. The mapping between characters that you type in a character literal (like '0') and the value that the char object has is determined by the encoding of that character in the execution character set:
C++11 §2.14.3:
An ordinary character literal that contains a single c-char representable in the execution character set has type char, with value equal to the numerical value of the encoding of the c-char in the execution character set.
C99 §6.4.4.4:
An integer character constant is a sequence of one or more multibyte characters enclosed in single-quotes, as in 'x'.
[...]
An integer character constant has type int.
Note that the int can be converted to a char.
The choice of execution character set is up to the implementation. More often than not, the choice is ASCII compatible, so the tables posted in other answers have the appropriate values. However, the character set does not need to be ASCII compatible. There are some restrictions, though. One of them is as follows (C++11 §2.3, C99 §5.2.1):
a b c d e f g h i j k l m n o p q r s t u v w x y z
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
0 1 2 3 4 5 6 7 8 9
_ { } [ ] # ( ) < > % : ; . ? * + - / ^ & | ~ ! = , \ " '
[...]
In both the source and execution basic character sets, the value of each character after 0 in the above list of decimal digits shall be one greater than the value of the previous.
This means that whatever value the character '0' has, the character '1' has value one more than '0', and character '2' has value one more than that, and so on. The numeric characters have consecutive values. You can summarise the mapping like so:
Character: 0 1 2 3 4 5 6 7 8 9
Corresponding value: X X+1 X+2 X+3 X+4 X+5 X+6 X+7 X+8 X+9
All of the digit characters have values offset from the value of '0'.
That means, if you have a character, let's say '9' and subtract '0' from it, you get the "distance" between the value of '9' and the value of '0' in the execution character set. Since they are consecutive, the distance will be 9.
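The guarantee above can be turned into a small helper. This is only a sketch: the name digit_value and the choice of -1 as the "not a digit" result are assumptions made for illustration, not something taken from the standard or from the answer.

```c
/* digit_value is a hypothetical helper name.
   Returns the numeric value 0-9 of a decimal digit character,
   or -1 if c is not one of '0'..'9'. */
int digit_value(char c)
{
    if (c < '0' || c > '9')
        return -1;
    /* The digits are consecutive in every conforming character set,
       so the distance from '0' is the digit's value. */
    return c - '0';
}
```

The range check relies on the same property: because the ten digits are consecutive, anything between '0' and '9' inclusive must be a digit.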

Because the C standard guarantees that the characters 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 are always in this order regarding their numerical character code. So, if you subtract the char code of '0' from another digit, it will give its position relative to 0, which is its value...
From the C standard, Section 5.2.1 Character sets:
In both the source and execution basic character sets, the
value of each character after 0 in the above list of decimal digits shall be one greater than
the value of the previous

Because the digit characters are arranged in sequence.
In ASCII, '0' is 48, '1' is 49, '2' is 50, and so on. x therefore contains the ASCII value of '9' minus the ASCII value of '0', that is, 57 - 48 = 9.
Also, char is an integral type.

The ASCII codes of the numeric chars are ordered '0' '1' '2' '3' '4' '5' '6' '7' '8' '9', as the ASCII table shows,
so the difference between the ASCII code of '9' and the ASCII code of '0' is 9.

In the ASCII table the digits are arranged sequentially, starting with the lowest code for '0'. If you subtract the code of '0' from the code of a higher digit, you get the difference of the two ASCII values.
So, '9' has value 57 and '0' has 48, and if you subtract 48 from 57 you get 9.
Just have a look at the ASCII-table.

Look at the ASCII TABLE:
'9' in ASCII = 57 //in Decimal
'0' in ASCII = 48 //in Decimal
57 - 48 = 9

First, try:
cout << (int)'0' << endl;
now try:
cout << (int)'9' << endl;
The characters represent numbers in text form, but each has a different value when taken as a number.
The character encoding, not the operating system, decides which number stands for which character. In ASCII, the number 0x30 represents the character 0 and the number 0x39 represents the character 9. After all, all a computer can recognize is numbers; it doesn't know what a "char" is.
Unfortunately, (int)('f' - '0') does not equal 15, though.
The following page lists the Windows virtual-key codes. These are keyboard codes rather than character codes, although for the digit keys they happen to match the ASCII values 0x30 to 0x39.
http://msdn.microsoft.com/en-us/library/windows/desktop/dd375731(v=vs.85).aspx
For another OS, you can search for "Virtual Key Codes <OSname>" to see what codes it uses.
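Since 'f' - '0' does not yield 15, hexadecimal digits need slightly more work. The following is a minimal sketch, assuming a lookup over the six letters is acceptable; hex_digit_value is a hypothetical name, and the lookup is used because the C standard only guarantees that the decimal digits are contiguous, not the letters.

```c
/* hex_digit_value is a hypothetical helper name.
   Returns 0-15 for a hexadecimal digit character, or -1 otherwise. */
int hex_digit_value(char c)
{
    static const char lower[] = "abcdef";
    static const char upper[] = "ABCDEF";
    int i;

    if (c >= '0' && c <= '9')
        return c - '0';        /* decimal digits are guaranteed contiguous */
    for (i = 0; i < 6; ++i)    /* letters are not, so look them up */
        if (c == lower[i] || c == upper[i])
            return 10 + i;
    return -1;
}
```

With this helper, hex_digit_value('f') gives 15, which is what the plain subtraction could not deliver.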

Related

String to Int (using loop) in C [duplicate]

The following code is from K&R textbook, page number 71:
val =10.0*val+s[i] -'0'
What does s[i] -'0' mean here?
It seems that s is a character array or a pointer to the first element of a character array, and the element s[i] contains a character that represents a digit, for example '5'. In ASCII this character has the internal code 53, while the character '0' has the internal code 48. To convert the character to the number it represents, the code uses the expression
s[i] -'0'
which is equivalent to
53 - 48
and equal to the number 5.
According to the C Standard (5.2.1 Character sets)
3 ... In both the source and execution basic character sets, the value of each character after 0 in the above list of decimal digits shall be one greater than the value of the previous.
So in ASCII there is a relation
Character Code value
'0' - 48
'1' - 49
'2' - 50
'3' - 51
'4' - 52
'5' - 53
'6' - 54
'7' - 55
'8' - 56
'9' - 57
In EBCDIC, for example, the internal codes of the characters that represent digits are
240 - 249 (0 - 9), or F0 - F9 in hexadecimal notation.
So it is a standard way to get the numeric digit from a character, independent of the character set used.
It converts a digit in char form into the corresponding int.
For example, if s[i] is '9' then s[i] - '0' will produce 9.
Probably the code is used to convert a string with decimal digits into the represented number (e.g. "1234" into 1234).
s[i] is the current digit, s[i]-'0' is the numerical value of the current digit (e.g. '9' becomes 9).
The rest of the C code is just how positional numeral systems work.
Suppose s[i] contains one of the characters '0' to '9'; the expression then converts it to the corresponding number.
For example, s[0]='1';
so val=s[0]-'0';
will reduce to val=49-48; //ascii values
so val = 1;
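Putting the answers above together, the whole conversion loop can be sketched as follows. This is not the K&R function itself: parse_decimal is a hypothetical name, it uses long instead of K&R's double, and it assumes the number fits in a long.

```c
/* parse_decimal is a hypothetical name. It converts the leading run of
   decimal digits in s to a number and stops at the first non-digit.
   Assumption: the result fits in a long (no overflow check). */
long parse_decimal(const char *s)
{
    long val = 0;
    int i;

    for (i = 0; s[i] >= '0' && s[i] <= '9'; ++i)
        val = 10 * val + (s[i] - '0');  /* move one decimal place left, add the digit */
    return val;
}
```

For "1234" the loop computes 1, then 12, then 123, then 1234, which is the positional-notation step mentioned above.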

Converting char to int using typecasting makes 1s into 49 and 0s into 48 in C?

I have a one byte char array consisting of a binary value and I am trying to split it into a two-dimensional int array (low nybble and high nybble). This is my code:
int nybbles[2][4]; //[0][] is low nybble, [1][] is high nybble.
for (int i = 0; i < 4; i++) {
    nybbles[0][i] = (int)binarr[i];
    nybbles[1][i] = (int)binarr[4 + i];
    printf("%c%d ", binarr[i], nybbles[0][i]);
    printf("%c%d\n", binarr[4 + i], nybbles[1][i]);
}
The output of this is:
048
149
048
048
149
048
048
048
I can easily fix this by adding a "- 48" to the end of both lines of code, as such:
nybbles[0][i] = (int)binarr[i] - 48;
nybbles[1][i] = (int)binarr[4 + i] - 48;
However, I see this as a very brute force solution. Why does this problem exist anyway? Are there better fixes than mine?
The values 48 and 49 are the ASCII codes for the characters '0' and '1'.
Rather than subtracting 48, subtract '0'. This makes it more clear what you're doing.
nybbles[0][i] = binarr[i] - '0';
nybbles[1][i] = binarr[4 + i] - '0';
Characters are encoded using numeric values. In ASCII, the code for '0' is 48, the code for '1' is 49, and so on. You are using this to deduce the integer values of these characters.
Example:
'0' - '0' is equal to 0 because 48 - 48 is equal to 0, and '1' - '0' is equal to 1 because 49 - 48 is equal to 1. You see the pattern.
There is no brute force, just character arithmetic, and it's quite common. I would just use:
nybbles[0][i] = binarr[i] - '0';
Besides being clearer, it's more portable: ASCII is not the only encoding in existence, but the C standard requires every execution character set to encode the digits contiguously.
The number zero is not the same as the digit "0". On your platform, the code for the digit "0" happens to be 48.
Fundamentally, values and representations are different. Ten is the number of fingers I have. It can be represented as "X", "10", "0x0A", "ten", or "..........".
The digit "0" can be used to represent the value zero. Or it can be used for other purposes. Regardless of what you use it for, on your platform that character is represented by the character code whose value is forty-eight.
It is absolutely vital for programmers to understand that values and representations are different things. Use '0' when you need the value your platform uses to represent the character "0".
This is because of the difference between ASCII codes and the values they stand for. The computer just sees bytes (numbers), and those numbers mean different things depending on the context. When we're talking about characters and strings we're normally talking about ASCII text encoding. It has its own system. For instance, the value 0 is the null character in ASCII. The value 48 is the character 0, 57 is the character 9, and 65 is the character A. Interesting, right? So, to get the numeric value of an ASCII digit, you must subtract the ASCII offset (the digits start at 48, therefore subtract 48). The capital letters, on the other hand, start at 65.
Doing a search with terms such as "ascii byte binary table values comparison" will bring you to pages like this IBM-provided table.
For whole strings of digits, the C standard library provides conversion functions such as strtol and atoi. For a single digit character, the arithmetic is the usual approach in C.
You experience this behavior because the digits you get from the original string are characters encoded in ASCII format.
So, 48 = 0x30 and 49 = 0x31 are the ASCII values to represent '0' and '1' characters respectively.
For the record, the "brute force" subtraction is how digit characters are usually converted to the corresponding integer values. The following expression just shows how it works:
char charDigit = '4';
int digit = charDigit - 48; // digit is equal to integer 4, because '4' is encoded as 52 (0x34 in hex)
Even better (it is the expression that is actually commonly used):
char charDigit = '4';
int digit = charDigit - '0'; // digit is equal to integer 4
It is better because it works not only for ASCII encoding.
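Applied to the question's array, the fix might look like the sketch below. split_nybbles is a hypothetical helper name, and the sketch assumes binarr holds at least eight characters, each of them '0' or '1'; it keeps the question's layout, where nybbles[0] takes the first four characters and nybbles[1] the last four.

```c
/* split_nybbles is a hypothetical helper name.
   Assumption: binarr points to at least 8 characters, each '0' or '1'.
   nybbles[0] receives binarr[0..3], nybbles[1] receives binarr[4..7]. */
void split_nybbles(const char *binarr, int nybbles[2][4])
{
    int i;

    for (i = 0; i < 4; i++) {
        nybbles[0][i] = binarr[i] - '0';      /* '0' -> 0, '1' -> 1 */
        nybbles[1][i] = binarr[4 + i] - '0';
    }
}
```

No cast is needed: the subtraction is already carried out in int, so the result is the integer 0 or 1 rather than the character code 48 or 49.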

K&R 1.6 Arrays // Digit representation in an array construct

I found this example code on using arrays in the C language.
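#include <stdio.h>

/* count how many times each digit occurs in the input */
int main(void)
{
    int c, i;
    int ndigit[10];

    for (i = 0; i < 10; ++i)
        ndigit[i] = 0;
    while ((c = getchar()) != EOF)
        if (c >= '0' && c <= '9')
            ++ndigit[c - '0'];   /* c - '0' is the digit's numeric value */
    printf("digits =");
    for (i = 0; i < 10; ++i)
        printf(" %d", ndigit[i]);
    printf("\n");
    return 0;
}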
I never saw arrays before, but I think I got it.
Still, I'm not sure why the digit values have to be inserted in '..' nor why the assignment of i has to be expressed as c-'0'.
This is a passage of the book that should clarify my doubts:
This particular program relies on the properties of the character representation of the digits. For example, the test if (c >= '0' && c <= '9') determines whether the character in c is a digit. If it is, the numeric value of that digit is c - '0'.
I don't understand how can these values be used in arithmetical expressions if they are characters, is it because they are mapped to numerical values?
Then why does the whole program not work if they are written as numbers, as in if (c >= 0 && c <= 9), and why does it not work if c isn't written in that way (which to my understanding is just "whatever number c is, minus 0")?
TL;DR: a "char" is just a one-byte-long integer.
I don't understand how can these values be used in arithmetical expressions if they are characters, is it because they are mapped to numerical values?
In C, a char is the "smallest addressable unit of the machine that can contain basic character set. It is an integer type." [1]. Normally, char is a one-byte (8-bit) integer, so it can hold 2^8 = 256 distinct values: [0,255] if char is unsigned, or [-128,127] if it is signed, which is up to the implementation.
That being said, when you write
char c = '9';
You are saying "c is a one-byte-long integer whose value is the character-set representation of the character 9". By looking at the most common character set, the ASCII table [2], we see that the character 9 has an integer value of 57, so the above expression is equivalent to
char c = 57;
To convert a digit's character-set value to the digit itself (e.g. '9' to 9, or 57 to 9), you can rely on a property of character sets that digits are always stored sequentially and increasingly, and just subtract by the value of '0', which in ASCII is 48, so:
char c;
c = '9' - '0'; /* = 9 In any character set */
c = 57 - 48; /* = 9 */
c = '9' - 48; /* = 9 In ASCII */
c = 57 - '0'; /* = 9 In ASCII */
Keep in mind that while ASCII is the most common character set, this is actually machine-dependent.
[1] http://en.wikipedia.org/wiki/C_data_types#Basic_types
[2] http://www.asciitable.com/
if you see the man page of getchar() it says
....reads the next character from stdin and returns it as an unsigned char cast to an int....
So, an input of a digit [example, 9] is treated as a char input and the corresponding encoded [Usually ASCII] value is returned by getchar().
Now coming to your question(s),
why the digit values have to be inserted in '..'
A digit [or any other character, for that matter] written between single quotes, as in '9', represents the corresponding character code, which is the ASCII value on most systems. Check the ASCII table.
To illustrate, 9 is the number 9, whereas '9' represents the corresponding ASCII value 57.
why the assignment of i has to be expressed as c-'0'.
If you look at the ASCII table closely, you can see that the values of '0' to '9' are in sequence. So, to get a particular digit as an int value, we can compute c - '0', which in ASCII is the same as c - 48.
I don't understand how can these values be used in arithmetical
expressions if they are characters, is it because they are mapped to
numerical values?
getchar() returns the character read. Its prototype is
int getchar(void)
When a character is read, getchar() returns the character code of that character, which is its ASCII value on most systems.
The ASCII values for the characters 0 to 9 are contiguous. Making use of that, if we have
char ch = '5';
int i = ch - '0'; /* 53 - 48 = 5 */
will give you the integer value 5. Converting character to integer. The arithmetic is performed by implicit conversion.
If you have the character '8', it does not hold the integer value 8 but the ASCII value 56. In the arithmetic ch - '0', ch is promoted to int and '0' already has type int in C, so the respective ASCII values are used and the subtraction is performed on integers.
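To see the indexing trick in isolation, here is a sketch of the same counting loop that reads from a string instead of getchar(). count_digits is a hypothetical name, and reading from a string is an assumption made only so that the function can be called directly.

```c
/* count_digits is a hypothetical variant of the K&R loop.
   It counts how often each decimal digit occurs in the string s
   and stores the counts in ndigit[0] .. ndigit[9]. */
void count_digits(const char *s, int ndigit[10])
{
    int i;

    for (i = 0; i < 10; ++i)
        ndigit[i] = 0;
    for (; *s != '\0'; ++s)
        if (*s >= '0' && *s <= '9')
            ++ndigit[*s - '0'];   /* '7' - '0' is 7, so this bumps ndigit[7] */
}
```

Writing the test as *s >= 0 && *s <= 9 would compare against the numbers 0 to 9 rather than the character codes of the digits, so in ASCII it would match control characters instead of '0' to '9'.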

what does string - '0' do (string is a char)

what does this do
while(*string) {
    i = (i << 3) + (i<<1) + (*string -'0');
    string++;
}
the *string -'0'
does it remove the character value or something?
This subtracts from the character to which string is pointing the ASCII code of the character '0'. So, '0' - '0' gives you 0 and so on and '9' - '0' gives you 9.
The entire loop is basically calculating "manually" the numerical value of the decimal integer in the string string points to.
That's because i << 3 is equivalent to i * 8 and i << 1 is equivalent to i * 2 and (i << 3) + (i<<1) is equivalent to i * 8 + i * 2 or i * 10.
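The loop can be wrapped in a function to check that the shifts really multiply by ten. This is a sketch with two assumptions that are not in the question: parse_with_shifts is a hypothetical name, and the loop stops at the first non-digit instead of running to the end of the string. An unsigned type is used so that the shifts are well defined.

```c
/* parse_with_shifts is a hypothetical name.
   Assumption: the value fits in an unsigned int (no overflow check). */
unsigned parse_with_shifts(const char *string)
{
    unsigned i = 0;

    while (*string >= '0' && *string <= '9') {
        /* (i << 3) + (i << 1) is i * 8 + i * 2, that is, i * 10 */
        i = (i << 3) + (i << 1) + (unsigned)(*string - '0');
        string++;
    }
    return i;
}
```

Modern compilers generally choose shifts or multiplication on their own, so writing i * 10 is usually just as fast and easier to read.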
Since the digits 0-9 are guaranteed to be stored contiguously in the character set, subtracting '0' gives the integer value of whichever character digit you have.
Let's say you're using ASCII:
char digit = '6'; //value of 54 in ASCII
int actual = digit - '0'; //'0' is 48 in ASCII, therefore `actual` is 6.
No matter which values the digits have in the character set, since they're contiguous, subtracting the beginning ('0') from the digit will give the digit you're looking for. Note that the same is NOT necessarily true for the letters. Look at EBCDIC, for example.
It converts the ascii value of 0-9 characters to its numerical value.
ASCII value of '0' (character) is 48 and '1' is 49.
So to convert 48-57 ('0'-'9') to 0-9, you just need to subtract 48 from the ASCII value.
that is what your code line [ *string -'0' ] is doing.

Resources