K&R 1.6 Arrays // Digit representation in an array construct - c

I found this example code on using arrays in the C language.
#include <stdio.h>
int main(void) {
int c, i;
int ndigit[10];
for (i = 0; i < 10; ++i)
ndigit[i]=0;
while ((c = getchar()) != EOF)
if (c >= '0' && c <= '9')
++ndigit[c - '0'];
printf("digits =");
for (i = 0; i < 10; ++i)
printf(" %d", ndigit[i]);
}
I never saw arrays before, but I think I got it.
Still, I'm not sure why the digit values have to be inserted in '..' nor why the assignment of i has to be expressed as c-'0'.
This is a passage of the book that should clarify my doubts:
This particular program relies on the properties of the character representation of the digits. For example, the test if (c >= '0' && c <= '9') determines whether the character in c is a digit. If it is, the numeric value of that digit is c - '0'.
I don't understand how can these values be used in arithmetical expressions if they are characters, is it because they are mapped to numerical values?
Then why does the whole program stop working if they are written as numbers, as in if (c >= 0 && c <= 9), or if c isn't written that way (which to my understanding is just "whatever number c is, minus 0")?

TL;DR: a "char" is just a one-byte-long integer.
I don't understand how can these values be used in arithmetical expressions if they are characters, is it because they are mapped to numerical values?
In C, a char is the "smallest addressable unit of the machine that can contain basic character set. It is an integer type." [1]. Normally a char is one 8-bit byte, so it can hold 256 distinct values: 0 to 255 if char is unsigned, or -128 to 127 if it is signed (which of the two applies is implementation-defined).
That being said, when you write
char c = '9';
You are saying "c is a one-byte-long integer whose value is the character-set representation of the character 9". By looking at the most common character set, the ASCII table [2], we see that the character 9 has an integer value of 57, so the above expression is equivalent to
char c = 57;
To convert a digit's character-set value to the digit itself (e.g. '9' to 9, or 57 to 9), you can rely on the fact that the digits are always encoded sequentially and in increasing order, and just subtract the value of '0', which in ASCII is 48, so:
char c;
c = '9' - '0'; /* = 9 In any character set */
c = 57 - 48; /* = 9 */
c = '9' - 48; /* = 9 In ASCII */
c = 57 - '0'; /* = 9 In ASCII */
Keep in mind that while ASCII is the most common character set, this is actually machine-dependent.
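As a rough sketch of the conversion in both directions (the helper names digit_to_int, int_to_digit and print_digit_table are made up for illustration, and the helpers assume they are only given valid digits):

```c
#include <stdio.h>

/* Hypothetical helper: numeric value of a digit character.
   Assumes c is one of '0'..'9'. */
int digit_to_int(char c)
{
    return c - '0';   /* distance from '0', the same in any conforming character set */
}

/* Hypothetical helper: the inverse, the digit character for a value 0..9. */
char int_to_digit(int d)
{
    return (char)('0' + d);
}

/* Prints each digit character next to its character code and its numeric value. */
void print_digit_table(void)
{
    for (char c = '0'; c <= '9'; ++c)
        printf("'%c' has code %d and digit value %d\n", c, c, digit_to_int(c));
}
```

Calling print_digit_table() from main shows the codes your own machine uses; on an ASCII system they run from 48 to 57.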
[1] http://en.wikipedia.org/wiki/C_data_types#Basic_types
[2] http://www.asciitable.com/

If you look at the man page of getchar(), it says
....reads the next character from stdin and returns it as an unsigned char cast to an int....
So, an input of a digit [example, 9] is treated as a char input and the corresponding encoded [Usually ASCII] value is returned by getchar().
Now coming to your question(s),
why the digit values have to be inserted in '..'
A digit [or any other character, for that matter] written in single quotes, as in '9', represents the corresponding encoded value of that character, usually its ASCII value. Check the ASCII table.
To illustrate, 9 is the number 9, whereas '9' represents the corresponding ASCII value 57.
why the assignment of i has to be expressed as c-'0'.
If you notice the ASCII table closely, you can see, the corresponding values of 0 to 9 are in sequence. So, to get the particular digit as an int value, we can do c - '0' which is same as c - 48 which will give us the digit as an int.
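Here is a minimal sketch of the same indexing trick applied to a string instead of stdin, which makes it easier to experiment with. The function name count_digits is an assumption of mine, not something from the book:

```c
/* Hypothetical helper: tallies how often each digit occurs in s.
   ndigit must have room for 10 counters; they are reset first. */
void count_digits(const char *s, int ndigit[10])
{
    for (int i = 0; i < 10; ++i)
        ndigit[i] = 0;

    for (; *s != '\0'; ++s)
        if (*s >= '0' && *s <= '9')
            ++ndigit[*s - '0'];   /* '0' maps to index 0, ..., '9' to index 9 */
}
```

With the quotes removed from the test (c >= 0 && c <= 9), only characters whose codes are 0 to 9 would match, and in ASCII those are control characters such as tab, not digits.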

I don't understand how can these values be used in arithmetical
expressions if they are characters, is it because they are mapped to
numerical values?
getchar() returns the character read. Its prototype is
int getchar(void)
When a character is read, getchar() returns the ASCII value of that character.
The ASCII values of the characters 0 to 9 are contiguous. So, making use of that, if we have
char ch = '5';
int i = ch - '0'; /* 53 - 48 = 5 */
this gives you the integer value 5, converting the character to an integer. The arithmetic is performed after implicit conversion to int.
If you have the character '8', it does not hold the integer value 8 but the ASCII value 56. So in the arithmetic ch - '0', ch is promoted to int, the respective ASCII values are used, and the subtraction is performed on them.
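To see the implicit conversion at work, here is a small sketch (the function names are hypothetical). It relies only on the usual integer promotion rule: a char operand is converted to int before the subtraction, so the result is an int:

```c
#include <stddef.h>

/* Size of the result of ch - '0'. Because ch is promoted to int before
   the subtraction, this is sizeof(int), not sizeof(char). */
size_t size_of_difference(char ch)
{
    return sizeof(ch - '0');
}

/* The value of the same expression, returned as an ordinary int. */
int difference(char ch)
{
    return ch - '0';
}
```

So nothing special happens for characters: the subtraction is plain integer arithmetic on their codes.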

Related

Issue with turning a character into an integer in C

I am having issues with converting character variables into integer variables. This is my code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
char string[] = "A2";
char letter = string[0];
char number = string[1];
char numbers[] = "12345678";
char letters[] = "ABCDEFGH";
int row;
int column;
for(int i = 0; i < 8; i++){
if(number == numbers[i]){
row = number;
}
}
}
When I try to convert the variable row into the integer value of the variable number, instead of 2 I get 50. The goal so far is to convert the variable row into the accurate value of the character variable number, which in this case is 2. I'm a little confused as to why the variable row is 50 and not 2. Can any one explain to me why it is not converting accurately?
'2' != 2. The '2' character, in ASCII, is 50 in decimal (0x32 in hex). See http://www.asciitable.com/
If you're sure they're really numbers you can just use (numbers[i] - '0') to get the value you're looking for.
The 2 in your case is a character, and that character's value is 50 because that is the decimal value of the byte that represents the character 2 in ASCII. Remember, C is very low level and characters are essentially the same thing as any other value: a sequence of bytes. Just as letters are represented as bytes, so are the characters that stand for digits in our base 10 system. It might seem that 2 should have been represented by the value 2, but it wasn't.
If you use the atoi function, it will look at the string and compute the decimal value represented by the characters in your string.
However, if you're only converting one character to the decimal value it represents, you can take a shortcut: subtract the value of '0' from the digit. Though the digits are not represented by the base 10 value they have for us humans, they are ordered sequentially in the ASCII code. And since in C the characters are simply byte values, the difference between a digit character ('0' through '9') and '0' is the value of the digit.
char c = '2';
int i = c - '0';
If you understand why that would work, you get what I'm saying.
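Applied to the "A2" example from the question, a sketch could look like the following. The function parse_square and its return convention are my own assumptions, not part of the original code; the letter is looked up in a string rather than computed, because only the digits are guaranteed to be contiguous:

```c
#include <string.h>

/* Hypothetical helper: converts a square such as "A2" into a zero-based
   column (from the letter) and a row number (from the digit).
   Returns 1 on success, 0 if the input is not a letter A-H followed by 1-8. */
int parse_square(const char *s, int *column, int *row)
{
    static const char letters[] = "ABCDEFGH";
    const char *p;

    if (s[0] == '\0' || s[1] < '1' || s[1] > '8' || s[2] != '\0')
        return 0;

    p = strchr(letters, s[0]);
    if (p == NULL)
        return 0;

    *column = (int)(p - letters);   /* 'A' -> 0, ..., 'H' -> 7 */
    *row = s[1] - '0';              /* '2' -> 2, not 50 */
    return 1;
}
```

For strings with more than one digit, atoi or strtol is the better tool.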

Confusion regarding getchar() in c [duplicate]

Can someone explain why this works?
char c = '9';
int x = (int)(c - '0');
Why does subtracting '0' from the ASCII code of a char result in the number that the char represents?
Because every char is represented by a number, and among the digits '0' has the lowest one.
On the table below you see that:
'0' => 48
'1' => 49
'9' => 57.
As a result: ('9' - '0') = (57 − 48) = 9
Source: http://www.asciitable.com
char is an integer type, just like int and family. An object of type char has some numerical value. The mapping between characters that you type in a character literal (like '0') and the value that the char object has is determined by the encoding of that character in the execution character set:
C++11 §2.14.3:
An ordinary character literal that contains a single c-char representable in the execution character set has type char, with value equal to the numerical value of the encoding of the c-char in the execution character set.
C99 §6.4.4.4:
An integer character constant is a sequence of one or more multibyte characters enclosed in single-quotes, as in 'x'.
[...]
An integer character constant has type int.
Note that the int can be converted to a char.
The choice of execution character set is up to the implementation. More often than not, the choice is ASCII compatible, so the tables posted in other answers have the appropriate values. However, the character set does not need to be ASCII compatible. There are some restrictions, though. One of them is as follows (C++11 §2.3, C99 §5.2.1):
a b c d e f g h i j k l m n o p q r s t u v w x y z
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
0 1 2 3 4 5 6 7 8 9
_ { } [ ] # ( ) < > % : ; . ? * + - / ^ & | ~ ! = , \ " '
[...]
In both the source and execution basic character sets, the value of each character after 0 in the above list of decimal digits shall be one greater than the value of the previous.
This means that whatever value the character '0' has, the character '1' has value one more than '0', and character '2' has value one more than that, and so on. The numeric characters have consecutive values. You can summarise the mapping like so:
Character: 0 1 2 3 4 5 6 7 8 9
Corresponding value: X X+1 X+2 X+3 X+4 X+5 X+6 X+7 X+8 X+9
All of the digit characters have values offset from the value of '0'.
That means, if you have a character, let's say '9' and subtract '0' from it, you get the "distance" between the value of '9' and the value of '0' in the execution character set. Since they are consecutive, the distance will be 9.
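If you want to convince yourself on your own machine, a small check like this one works (a sketch; the function name digits_are_consecutive is made up). On any conforming implementation it should return 1:

```c
/* Returns 1 if the ten digit characters have consecutive codes,
   as the standard requires, and 0 otherwise. */
int digits_are_consecutive(void)
{
    static const char digits[] = "0123456789";

    for (int i = 0; i < 10; ++i)
        if (digits[i] != '0' + i)   /* character i must have value X + i */
            return 0;
    return 1;
}
```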
Because the C standard guarantees that the characters 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 are always in this order regarding their numerical character code. So, if you subtract the char code of '0' from another digit, it will give its position relative to 0, which is its value...
From the C standard, Section 5.2.1 Character sets:
In both the source and execution basic character sets, the
value of each character after 0 in the above list of decimal digits shall be one greater than
the value of the previous
Because the digit characters are arranged in sequence.
So if '0' is 48, '1' will be 49, '2' will be 50 and so on in ASCII. x then contains the ASCII value of '9' minus the ASCII value of '0'. Since the ASCII value of '9' is 57, x contains 57 - 48 = 9.
Also, char is an integral type.
The ASCII codes of the numeric chars are ordered '0' '1' '2' '3' '4' '5' '6' '7' '8' '9', as shown in the ASCII table,
so the difference between the ASCII code of '9' and the ASCII code of '0' is 9.
In the ASCII table the digits are arranged sequentially, starting with the lowest code for 0. If you subtract the code of 0 from the code of a higher digit, you get the difference of the two ASCII values.
So, 9 has value 57 and 0 has 48, so if you subtract 48 from 57 you get 9.
Just have a look at the ASCII table.
Look at the ASCII TABLE:
'9' in ASCII = 57 //in Decimal
'0' in ASCII = 48 //in Decimal
57 - 48 = 9
First, try:
cout << (int)'0' << endl;
now try:
cout << (int)'9' << endl;
The characters represent numbers in text form, but have a different value when taken as a number.
Windows uses a number to decide which character to print. So the number 0x30 represents the character 0 in the Windows OS. The number 0x39 represents the character 9. After all, all a computer can recognize is numbers; it doesn't know what a "char" is.
Unfortunately (int)('f' - '0') does not equal 15, though.
This gives you the various characters and the number Windows uses to represent them.
http://msdn.microsoft.com/en-us/library/windows/desktop/dd375731(v=vs.85).aspx
If you need to find that for another OS, you can search for "Virtual Key Codes <OSname>" in Google to see what other OSes use as their codes.
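Regarding the remark that 'f' - '0' is not 15: the subtraction trick only covers the ten decimal digits. For hexadecimal digits, one possible approach is a lookup in a string of all sixteen digits. This is only a sketch, and hex_digit_value is a name I made up:

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: value of a hexadecimal digit (0..15),
   or -1 if c is not a hexadecimal digit. Accepts upper and lower case. */
int hex_digit_value(char c)
{
    static const char hex[] = "0123456789abcdef";
    const char *p;

    if (c == '\0')          /* strchr would otherwise match the terminator */
        return -1;

    p = strchr(hex, tolower((unsigned char)c));
    return p != NULL ? (int)(p - hex) : -1;
}
```

The position of the character inside the string is its value, so no assumption about the codes of the letters is needed.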

what is difference between 9-'0' and '9'-'0' in C?

In my following code:
main(){
int c;
char c1='0';
int x=9-c1;
int y='9'-c1;
}
Now in this program I'm getting value of x as some arbitrary value, but the value of y is 0, which is the value that I expect. Why this difference?
Here is a good explanation. Just compile it and run:
#include <stdio.h>
int main(){
int c;
char c1='0';
int x=9-c1;
int y='9'-c1;
printf("--Code and Explanation--\n");
printf("int c;\n");
printf("char c1='0';\n");
printf("int x=9-c1;\n");
printf("int y='9'-c1;\n");
printf("c1 as char '0' has decimal value: %d\n", c1);
printf("decimal 9 - decimal %d or c1 = %d or x\n", c1, x);
printf("char '9' has decimal value %d - decimal %d or c1 = %d\n", '9', c1, y);
printf("Your Welcome :)\n");
return 0;
}
First, chars are integers.
Second, chars might have a printable representation or an output-controlling function (in ASCII, for example: TAB, CR, LF, FF, BEL ...) depending on the character set in use.
For ASCII
char c = 'A';
is the same as
char c = 65;
is the same as
char c = 0x41;
Another widely used character set, for example, is EBCDIC. It uses a different mapping from a character's integer value to its printable/controlling representation.
Internally always the same integer value is used/stored.
The printable representation (often but not always ASCII) of, for example, 65 or 0x41, which is A, is only used when
either printing out using the printf()-family along with the conversion specifiers %s or %c, or puts()
or scanning in using the scanf()-family along with the conversion specifiers %s or %c, or fgets()
or when coding literals like 'A' or "ABC".
In all other operations only the char's integer value is used.
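A short sketch of that difference: the same integer is formatted once with %c and once with %d. The function show_both and its buffer-based interface are my own choices for illustration:

```c
#include <stdio.h>

/* Hypothetical helper: writes the value v into buf twice, first as a
   character (%c) and then as a number (%d), separated by a space. */
void show_both(int v, char *buf, size_t size)
{
    snprintf(buf, size, "%c %d", v, v);
}
```

On an ASCII system, passing 'A' produces the text "A 65": one stored value, two ways of showing it.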
When you do calculations with chars, you have to keep in mind that to you it looks like a '0' or '9', but the compiler interprets it as its ASCII value, which is 48 for '0' and 57 for '9'.
So when you do:
int x=9-c1;
the result is 9 - 48 = -39. And for
int y='9'-c1;
the result is 57 - 48 = 9.
According to the C Standard (5.2.1 Character sets)
...In both the source and execution basic character sets, the value of
each character after 0 in the above list of decimal digits shall be
one greater than the value of the previous.
Thus the expression '9' - '0' has the same value as 9 - 0 and is equal to 9 whether you are using, for example, the ASCII table of characters or EBCDIC.
The expression 9 - '0' is implementation-defined and depends on the coding table you are using. But in any case the value of the internal representation of the character '0' is greater than 9. (9 is the value of the tab character representation '\t')
For example in the ASCII the value of the code of character '0' is equal to 48.
In the EBCDIC the value of '0' is equal to 240.
So you will get that 9 - '0' is some negative number.
For example it is equal to -39 if the character representations are based on the ASCII table or -231 if the character representations are based on the EBCDIC table.
You can see this yourself running this simple program
#include <stdio.h>
int main( void )
{
printf( "%d\n", 9 - '0' );
}
You could write the printf statement also in the following way;)
printf( "%d\n", '\t' - '0' );
because 9 as I mentioned is the value of the internal representation of the escape character '\t' (tab).

Get a integer from a string in c

char str[]="abcde1fgh";
int i;
i=str[5];
return;
After this process, the integer i must be 1. But it isn't. Why not?
Your code does not work because in your example 1 is a char.
Try the following instead:
int i = str[5] - '0';
Here is why it works: Based on Jamal's explanation from his comment below
The numerical value is obtained by subtracting the character '0' from some character, e.g. str[5]. The numeric value of each character is found in the ASCII table. In this example, we are subtracting 48 (corresponding to '0') from 49 (corresponding to '1'), which equals 1.
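If you do not know in advance at which index the digit sits, a sketch like the following scans for it. The name first_digit and the -1 return value are assumptions of mine:

```c
#include <ctype.h>

/* Hypothetical helper: numeric value of the first decimal digit in s,
   or -1 if s contains no digit. */
int first_digit(const char *s)
{
    for (; *s != '\0'; ++s)
        if (isdigit((unsigned char)*s))
            return *s - '0';   /* '1' - '0' == 1 */
    return -1;
}
```

For the string from the question, first_digit("abcde1fgh") gives 1.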

what does string - '0' do (string is a char)

what does this do
while(*string) {
i = (i << 3) + (i<<1) + (*string -'0');
string++;
}
the *string -'0'
does it remove the character value or something?
This subtracts the ASCII code of the character '0' from the character to which string is pointing. So, '0' - '0' gives you 0 and so on, and '9' - '0' gives you 9.
The entire loop is basically calculating "manually" the numerical value of the decimal integer in the string string points to.
That's because i << 3 is equivalent to i * 8 and i << 1 is equivalent to i * 2 and (i << 3) + (i<<1) is equivalent to i * 8 + i * 2 or i * 10.
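Written with a plain multiplication, the same loop could be sketched as below. Note two deliberate differences from the code in the question, both my own assumptions: the function name parse_decimal, and the loop stopping at the first non-digit instead of running to the end of the string:

```c
/* Hypothetical helper: value of the leading run of decimal digits in s.
   Stops at the first non-digit; does not handle signs or overflow. */
unsigned parse_decimal(const char *s)
{
    unsigned i = 0;

    while (*s >= '0' && *s <= '9') {
        /* i * 10 is what (i << 3) + (i << 1) computes */
        i = i * 10 + (unsigned)(*s - '0');
        ++s;
    }
    return i;
}
```

For "1234" the value of i goes 1, 12, 123, 1234. A modern compiler will usually pick the fastest form of i * 10 itself, so the shifts are not needed for speed.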
Since the digits 0-9 are guaranteed to be stored contiguously in the character set, subtracting '0' gives the integer value of whichever character digit you have.
Let's say you're using ASCII:
char digit = '6'; //value of 54 in ASCII
int actual = digit - '0'; //'0' is 48 in ASCII, therefore `actual` is 6.
No matter which values the digits have in the character set, since they're contiguous, subtracting the beginning ('0') from the digit will give the digit you're looking for. Note that the same is NOT necessarily true for the letters. Look at EBCDIC, for example.
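Since c - 'A' is not guaranteed to work for letters, one portable alternative is to look the letter up in a string. This is a sketch only, and letter_index is a made-up name:

```c
#include <string.h>

/* Hypothetical helper: zero-based position of an uppercase letter in the
   alphabet, or -1 if c is not an uppercase letter. Does not assume that
   the letter codes are contiguous. */
int letter_index(char c)
{
    static const char letters[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    const char *p;

    if (c == '\0')          /* strchr would otherwise match the terminator */
        return -1;

    p = strchr(letters, c);
    return p != NULL ? (int)(p - letters) : -1;
}
```

In ASCII this gives the same result as c - 'A', but it also gives the right answer in EBCDIC, where the letter codes have gaps.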
It converts the ASCII value of the characters 0-9 to their numerical value.
The ASCII value of '0' (the character) is 48 and that of '1' is 49.
So to convert 48-57 ('0' through '9') to 0-9, you just need to subtract 48 from the ASCII value.
That is what your code line [ *string -'0' ] is doing.

Resources