Numeric value of digit characters in C - c

I have just started reading through The C Programming Language and I am having trouble understanding one part. Here is an excerpt from page 24:
#include <stdio.h>
/* count digits, white space, others */
main()
{
    int c, i, nwhite, nother;
    int ndigit[10];
    nwhite = nother = 0;
    for (i = 0; i < 10; ++i)
        ndigit[i] = 0;
    while ((c = getchar()) != EOF)
        if (c >= '0' && c <= '9')
            ++ndigit[c-'0']; //THIS IS THE LINE I AM WONDERING ABOUT
        else if (c == ' ' || c == '\n' || c == '\t')
            ++nwhite;
        else
            ++nother;
    printf("digits =");
    for (i = 0; i < 10; ++i)
        printf(" %d", ndigit[i]);
    printf(", white space = %d, other = %d\n",
        nwhite, nother);
}
The output of this program run on itself is
digits = 9 3 0 0 0 0 0 0 0 1, white space = 123, other = 345
The declaration
int ndigit[10];
declares ndigit to be an array of 10 integers. Array subscripts always start at zero in C, so the elements are
ndigit[0], ndigit[1], ..., ndigit[9]
This is reflected in the for loops that initialize and print the array. A subscript can be any integer expression, which includes integer variables like i, and integer constants. This particular program relies on the properties of the character representation of the digits. For example, the test
if (c >= '0' && c <= '9')
determines whether the character in c is a digit. If it is, the numeric value of that digit is
c-'0'
This works only if '0', '1', ..., '9' have consecutive increasing values. Fortunately, this is true for all character sets. By definition, chars are just small integers, so char variables and constants are identical to ints in arithmetic expressions. This is natural and convenient; for example
c-'0'
is an integer expression with a value between 0 and 9 corresponding to the character '0' to '9' stored in c, and thus a valid subscript for the array ndigit.
The part I am having trouble understanding is why the -'0' part is necessary in the expression c-'0'. If a character is a small integer as the author says, and the digit characters correspond to their numeric values, then what is -'0' doing?

Digit characters don't correspond to their numeric values. They correspond to their encoding values (in this case, ASCII).
IIRC, ASCII '0' is the value 48. And, luckily for this example, the values of '0' through '9' are stored in order in the character set.
So, subtracting the ASCII value for '0' from any ASCII digit returns its "true" value of 0-9.

The numeric value of a character is (on most systems) its ASCII value. The ASCII value of '0' is 48, '1' is 49, etc.
By subtracting 48 from the value of the character, '0' becomes 0, '1' becomes 1, etc. By writing it as c - '0' you don't actually need to know what the ASCII value of '0' is (or that the system is using ASCII - it could be using EBCDIC). The only thing that matters is that the values are consecutive increasing integers.

It converts from the character code of the '0' key on your keyboard (48 in ASCII) to the value zero.
If you did int x = '0' + '0'; the result would not be zero.

In most character encodings, all of the digits are placed consecutively in the character set. In ASCII for example, they start with '0' at 0x30 ('1' is 0x31, '2' is 0x32, etc.). If you want the numeric value of a given digit, you can just subtract '0' from it and get the right value. The advantage of using '0' instead of the specific value is that your code can be portable to other character sets with much less effort.

If you access a character string by its characters you'll get the ASCII values back, even if the characters happen to be digits.
Fortunately the guys who designed that character table made sure that the characters for 0 to 9 are sequential, so you can simply convert from ASCII to a number by subtracting the ASCII-value of '0'.
That's what the code does. I have to admit that it is confusing when you see it the first time, but it's not rocket science.
The ASCII-character value of '0' is 48, '1' is 49, '2' is 50 and so on.
For reference here is a nice ASCII-chart:
http://www.sciencelobby.com/ascii-table/images/ascii-table1.gif

Related

How does atof.c work? Subtracting an ASCII zero from an ASCII digit makes it an int? Am I missing something?

So as part of my C classes, for our first homework we are supposed to implement our own atof.c function, and then use it for some tasks. So, being the smart stay-at-home student I am, I decided to look at the atof.c source code and adapt it to meet my needs. I think I'm on board with most of the operations that this function does, like counting the digits before and after the decimal point; however, there is one line of code that I do not understand. I'm assuming this is the line that actually converts the ASCII digit into a digit of type int. Posting it here:
frac1 = 10*frac1 + (c - '0');
In the source code, c is the digit being processed, and frac1 is an int that stores some of the digits from the incoming ASCII string. But why does c - '0' work? And as a follow-up, is there another way of achieving the same result?
There is no such thing as "text" in C. Just APIs that happen to treat integer values as text information. char is an integer type, and you can do math with it. Character literals are actually ints in C (in C++ they're char, but they're still usable as numeric values even there).
'0' is a nice way for humans to write "the ordinal value of the character for zero"; in ASCII, that's the number 48. Since the digits appear in order from 0 to 9 in all encodings I'm aware of, you can convert from the ordinal value in the encoding (e.g. ASCII) to actual numeric values by subtracting away '0' to get actual int values from 0 to 9.
You could just as easily subtract 48 directly (when compiled, it would be impossible to tell which option you used; 48 and ASCII '0' are indistinguishable), it would just be less obvious what you were doing to other people reading your source code.
In ASCII, and in code page 437 (the IBM PC default character set), '0' has code 48; similarly, '1' has code 49, and so on. Subtracting '0' instead of a magic number such as 48 is much clearer as far as self-documentation goes.

What does it mean to subtract '0' from a variable in C?

void push(float[],float);
Here, st[] is float data-type stack and exp[] is char data-type array storing postfix expression.
push(st,(float)(exp[i]-'0'));
I couldn't figure out the purpose of (exp[i]-'0') section though. Why are we subtracting '0'?
A character is basically nothing more than an integer, whose value is the encoding of the character.
In the most common encoding scheme, ASCII, the value for e.g. the character '0' is 48, and the value for e.g. '3' is 51. Now, if we have a variable someChar containing the character '3' and you do someChar - '0' it's the same as doing 51 - 48 which will result in the value 3.
So if you have a digit read as a character from somewhere, then you subtract '0' to get the integer value of that digit.
This also works on other encodings, not only ASCII, because the C specification says that all encodings must have the digits in consecutive order.
Note that this "trick" is not guaranteed to work for any non-digit character.

Can I always assume the characters '0' to '9' appear sequentially in any C character encoding

I'm writing a program in C that converts some strings to integers. The way I've implemented this before is like so
int number = (character - '0');
This always works perfectly for me, but I started thinking, are there any systems using some obscure character encoding in which the characters '0' to '9' don't appear one after another in that order? This code assumes '1' follows '0', '2' follows '1' and so on, but is there ever a case when this is not true?
Yes, this is guaranteed by the C standard.
N1570 5.2.1 paragraph 3 says:
In both the source and execution basic character sets, the value of
each character after 0 in the above list of decimal digits shall be
one greater than the value of the previous.
This guarantee was possible because both ASCII and EBCDIC happen to have this property.
Note that there's no corresponding guarantee for letters; in EBCDIC, the letters do not have contiguous codes.

Confused with C type casting with characters

I am a beginner in C programming and I am writing a simple program to encrypt a string with the ROT13 Caesar cipher. Now, I know about casting floats and ints in C, but I really don't know what's happening below with characters:
I do this char test = 'A' + 13;
and I get N as output. Fine! But how? What's going on underneath? My guess is that 'A' is cast to an integer, both are added, and then finally the answer is cast to a char again.
Why is this so?
In C, types such as char are numbers, and 'A' is a roundabout way to write the int value 65 (provided your character set is ASCII, which is the case on all modern platforms). So, the expression 'A' + 13 is equivalent to 65 + 13, and its result gets cast to char, the type on the left-hand-side of the assignment operator.
In other words, it's not that 'A' is cast to int; it's the int sum of 65 and 13 that gets converted to char.
C treats characters as small integers. For example in ASCII, character 'a' has the value 97 and character 'A' has the value 65.
When a character appears in the source code, C simply uses its integer value.
char ch = 65; // ch is 'A' now
ch = ch + 2; // ch is 'C' now
or
char ch = 'A'; // ch has value 65 now
ch = ch + 2; // ch is 'C' now
test = 65 (the ASCII code for A) + 13 = 78 = N (converted to a character)
Characters are interpreted as integers within C, so this is what happens.
Each letter has its numerical representation. The ASCII character set (excluding the extended characters defined by IBM) is divided into four groups of 32 characters. The first 32 characters, ASCII codes 0 through 1Fh (31), form a special set of non-printing characters called the control characters. We call them control characters because they perform various printer/display control operations rather than displaying symbols. Examples include carriage return (which positions the cursor to the left side of the current line of characters), line feed (which moves the cursor down one line on the output device), and back space (which moves the cursor back one position to the left). Unfortunately, different control characters perform different operations on different output devices. There is very little standardization among output devices. To find out exactly how a control character affects a particular device, you will need to consult its manual.
The second group of 32 ASCII character codes comprise various punctuation symbols, special characters, and the numeric digits. The most notable characters in this group include the space character (ASCII code 20h) and the numeric digits (ASCII codes 30h..39h). Note that the numeric digits differ from their numeric values only in the H.O. nibble. By subtracting 30h from the ASCII code for any particular digit you can obtain the numeric equivalent of that digit.
The third group of 32 ASCII characters is reserved for the upper case alphabetic characters. The ASCII codes for the characters "A".."Z" lie in the range 41h..5Ah (65..90). Since there are only 26 different alphabetic characters, the remaining six codes hold various special symbols.
The fourth, and final, group of 32 ASCII character codes are reserved for the lower case alphabetic symbols, five additional special symbols, and another control character (delete). Note that the lower case character symbols use the ASCII codes 61h..7Ah. If you convert the codes for the upper and lower case characters to binary, you will notice that the upper case symbols differ from their lower case equivalents in exactly one bit position.
You must not confuse ' with ". A letter between single quotes, such as 'a', means one character. Its representation in computer memory is 97 (in ASCII), and by adding 1 we get 98, which is b.
Consider the following example:
#include <stdio.h>
int main()
{
    char ch = 'a';
    // numeric representation of a
    printf("%c = %d\n", ch, ch);
    ch = ch + 15;
    // numeric representation of ch + 15
    printf("%c = %d\n", ch, ch);
    return 0;
}

C - Convert char to int

I know that to convert any given char to int, this code is possible [apart from atoi()]:
int i = '2' - '0';
but I never understood how it worked, what is significance of '0' and I don't seem to find any explanation on the net about that.
Thanks in advance!!
In C, a character literal has type int. [Character Literals/IBM]
In your example, the numeric value of '0' is 48, the numeric value of '2' is 50. When you do '2' - '0' you get 50 - 48 = 2. This works for ASCII numbers from 0 to 9.
See ASCII table to get a better picture.
Edit: Thanks to #ouah for correction.
All the chars in C are represented with an integer value, the ASCII code of the character.
For instance '0' corresponds to 48 and '2' corresponds to 50, so '2'-'0' gets you 50-48 = 2
Link to an ASCII table: http://www.robelle.com/smugbook/ascii.html
When you use single quotes ' ' you are treating the digit as a character, and if this is assigned to an int, the int will take the value of the ASCII code of that character.
Any character literal enclosed in single quotes corresponds to a number that represents the ASCII code of that character. In fact, such literals evaluate not to char, but to int, so they are perfectly interchangeable with other number literals.
Within your expression, '2' is interchangeable with 50, and '0' is interchangeable with 48.
Have a look at the ASCII table.
'0' is represented as 0x30, '2' is represented as 0x32.
This results in
0x32 - 0x30 = 2
It's all about the ASCII codes of the corresponding characters.
In C, all the digits (0 to 9) are encoded in ASCII by values 48 to 57, sequentially. So '0' actually gets value 48, and '2' has the value 50. So when you write int i = '2' - '0';, you're actually subtracting 48 from 50, and get 2.
'0' to '9' are guaranteed to be sequential values in C in all character sets. This is not limited to ASCII, and C is not limited to the ASCII character set.
So sequential here means that '2' value is '0' + 2.
Regarding int and char note that '0' and '9' values are of type int in C and not of type char. A character literal is of type int.
Both terms are internally represented by the ASCII code of the number, and as numeric digits have consecutive ASCII codes subtracting them gives you the difference between the two numbers.
You can do similar tricks with characters as well, eg shift lowercase to uppercase by subtracting 32 from a lowercase character
'a' - 32 = 'A'
This works only because the character set assigns codes to the digits in order, i.e. '2' has a character code that is 2 bigger than the character code of '0'.
In an encoding without that property it wouldn't work, but the C standard requires it of every conforming character set.
When you cast a char to an int it actually maps each character to the appropriate number in the ascii table.
This means that '2' - '0' is translated to 50 - 48.
So you could also find out the numeric distance of two letters in the same way, e.g.
'z' - 'a' equals 122 - 97 equals 25
You can look up the numeric representations of each ASCII character in this table:
http://www.asciitable.com/
Actually a char is just a one-byte integer type (whether it is signed or unsigned is implementation-defined); C just treats it differently for different operations. For example printf("%d", 97) yields 97 as output, but printf("%c", 97) will give you a as output.

Resources