C - Convert char to int - c

I know that to convert any given char to int, this code is possible [apart from atoi()]:
int i = '2' - '0';
but I never understood how it works. What is the significance of '0'? I can't seem to find any explanation of this on the net.
Thanks in advance!!

In C, a character literal has type int. [Character Literals/IBM]
In your example, the numeric value of '0' is 48, the numeric value of '2' is 50. When you do '2' - '0' you get 50 - 48 = 2. This works for the digit characters '0' to '9'.
See ASCII table to get a better picture.
Edit: Thanks to #ouah for correction.

All the chars in C are represented with an integer value, the ASCII code of the character.
For instance '0' corresponds to 48 and '2' corresponds to 50, so '2'-'0' gets you 50-48 = 2
Link to an ASCII table: http://www.robelle.com/smugbook/ascii.html

When you use single quotes ' ' you are treating the digit as a char, and if this is assigned to an int, the int takes the value of that character's code (its ASCII code on an ASCII system).

Any character literal enclosed in single quotes corresponds to a number that represents the ASCII code of that character. In fact, such literals evaluate not to char, but to int, so they are perfectly interchangeable with other number literals.
Within your expression, '2' is interchangeable with 50, and '0' is interchangeable with 48.

Have a look at the ASCII table.
'0' is represented as 0x30, '2' is represented as 0x32.
This results in
0x32 - 0x30 = 2

It's all about the ASCII codes of the corresponding characters.
In C, all the digits (0 to 9) are encoded in ASCII by values 48 to 57, sequentially. So '0' actually gets value 48, and '2' has the value 50. So when you write int i = '2' - '0';, you're actually subtracting 48 from 50, and get 2.

'0' to '9' are guaranteed to be sequential values in C in all character sets. This is not limited to ASCII, and C is not limited to the ASCII character set.
So sequential here means that the value of '2' is '0' + 2.
Regarding int and char note that '0' and '9' values are of type int in C and not of type char. A character literal is of type int.

Both terms are internally represented by the ASCII code of the number, and as numeric digits have consecutive ASCII codes subtracting them gives you the difference between the two numbers.
You can do similar tricks with letters as well, e.g. in ASCII you can shift lowercase to uppercase by subtracting 32 from a lowercase character
'a' - 32 = 'A'

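This works because the character set assigns codes to the digits in order, i.e. '2' has a character code that is 2 greater than the character code of '0'.
ASCII does this, and the C standard requires it of every character set an implementation uses, so the digit trick is portable. The same trick with letters is not: in an encoding such as EBCDIC the letters are not contiguous.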

When you cast a char to an int it actually maps each character to the appropriate number in the ascii table.
This means that '2' - '0' is translated to 50 - 48.
So you could also find out the numeric distance of two letters in the same way, e.g.
'z' - 'a' equals 122 - 97 equals 25
You can look up the numeric representations of each ASCII character in this table:
http://www.asciitable.com/
Actually a char is just a small integer, typically one 8-bit byte (whether plain char is signed or unsigned is implementation-defined); C just treats it differently for different operations. For example printf("%d", 97) yields 97 as output, but printf("%c", 97) will give you a as output.

Related

What does it mean to subtract '0' from a variable in C?

void push(float[],float);
Here, st[] is float data-type stack and exp[] is char data-type array storing postfix expression.
push(st,(float)(exp[i]-'0'));
I couldn't figure out the purpose of (exp[i]-'0') section though. Why are we subtracting '0'?
A character is basically nothing more than an integer, whose value is the encoding of the character.
In the most common encoding scheme, ASCII, the value for e.g. the character '0' is 48, and the value for e.g. '3' is 51. Now, if we have a variable someChar containing the character '3' and you do someChar - '0' it's the same as doing 51 - 48 which will result in the value 3.
So if you have a digit read as a character from somewhere, then you subtract '0' to get the integer value of that digit.
This also works on other encodings, not only ASCII, because the C specification says that all encodings must have the digits in consecutive order.
Note that this "trick" is not guaranteed to work for any non-digit character.

Confused with C type casting with characters

I am a beginner in C programming and I am writing a simple program to encrypt a string with the rot13 Caesar cipher. Now, I know about casting floats and ints in C, but I really don't know what's happening below with characters:
I do this char test = 'A' + 13;
and I get N as output. Fine!! But how? What's going on underneath? My guess is that 'A' is cast to an integer, both are added, and then finally the answer is cast back to a char.
Why is this so?
In C, types such as char are numbers, and 'A' is a roundabout way to write the int value 65 (provided your character set is ASCII, which is the case on all modern platforms). So, the expression 'A' + 13 is equivalent to 65 + 13, and its result gets cast to char, the type on the left-hand-side of the assignment operator.
In other words, it's not that 'A' that is cast to int, it's the int sum of 65 and 13 that gets cast to char.
C treats characters as small integers. For example in ASCII, character 'a' has the value 97 and character 'A' has the value 65.
When a character appears in the source code, C simply uses its integer value.
char ch = 65; // ch is 'A' now
ch = ch + 2; // ch is 'C' now
or
char ch = 'A'; // ch has value 65 now
ch = ch + 2; // ch is 'C' now
test = 65(ASCII corresponding to A) + 13 = 78 = N (casted to character)
Characters are interpreted as integers within C, so this happens.
Each letter has its numerical representation. The ASCII character set (excluding the extended characters defined by IBM) is divided into four groups of 32 characters. The first 32 characters, ASCII codes 0 through 1Fh (31), form a special set of non-printing characters called the control characters. We call them control characters because they perform various printer/display control operations rather than displaying symbols. Examples include carriage return (which positions the cursor to the left side of the current line of characters), line feed (which moves the cursor down one line on the output device), and backspace (which moves the cursor back one position to the left). Unfortunately, different control characters perform different operations on different output devices. There is very little standardization among output devices. To find out exactly how a control character affects a particular device, you will need to consult its manual.
The second group of 32 ASCII character codes comprise various punctuation symbols, special characters, and the numeric digits. The most notable characters in this group include the space character (ASCII code 20h) and the numeric digits (ASCII codes 30h..39h). Note that the numeric digits differ from their numeric values only in the H.O. nibble. By subtracting 30h from the ASCII code for any particular digit you can obtain the numeric equivalent of that digit.
The third group of 32 ASCII characters is reserved for the upper case alphabetic characters. The ASCII codes for the characters "A".."Z" lie in the range 41h..5Ah (65..90). Since there are only 26 different alphabetic characters, the remaining six codes hold various special symbols.
The fourth, and final, group of 32 ASCII character codes are reserved for the lower case alphabetic symbols, five additional special symbols, and another control character (delete). Note that the lower case character symbols use the ASCII codes 61h..7Ah. If you convert the codes for the upper and lower case characters to binary, you will notice that the upper case symbols differ from their lower case equivalents in exactly one bit position.
You must not confuse ' and ". A letter between single quotes, such as 'a', is one character. Its representation in computer memory is 97 (in ASCII), and by adding 1
we get 98, which is 'b'.
consider the following example
#include <stdio.h>

int main(void)
{
    char ch = 'a';
    // print the character and its numeric representation
    printf("%c = %d\n", ch, ch);
    ch = ch + 15;
    // print the character 15 positions later and its numeric representation
    printf("%c %d\n", ch, ch);
    return 0;
}

Character to integer conversion off for alphabet characters

I'm converting a character string character by character into integers. So 'A' - '0' should be 10. However even though the numbers come out fine, alphabetical characters (i.e. A-F) come out as being off by 7. For instance, here's my line of code for conversion:
result = result + (((int) (*new - '0')) * pow(16, bases));
If I print that line piece by piece for a hex string like "A2C9" then for some reason my A is converted to 17 and my C is converted to 19. However the numbers 2 and 9 come out correctly. I'm trying to figure out if I'm missing something somewhere.
You are subtracting ASCII values. That is fine for A-Z and for 0-9, but not if you start mixing them. Read about the ASCII table to better understand the issue.
Here is the table:
http://www.asciitable.com/index/asciifull.gif
The ASCII code for 'A' is 65; for 'Z', it is 90.
The ASCII code for '0' is 48; for '9', it is 57. These codes are also used in Unicode (UTF-8), 8859-x, and many other codesets.
When you calculate 'A' - '0', you get 65 - 48 = 17, which is the 'off-by-seven' you are seeing.
To convert the alphabetic characters 'A' to 'F' to their hex equivalents, you need some variation on:
c - 'A' + 10;
Remembering that 'a' to 'f' are also allowed and for them you'd need:
c - 'a' + 10;
Or you'd need to convert to upper-case first. Or you can use:
const char hexdigits[] = "0123456789ABCDEF";
int digit = strchr(hexdigits, toupper(c)) - hexdigits;
or any of a myriad other techniques. This last fragment assumes that c is known to contain a valid hex digit. It fails horribly if that is not the case.
Note that C does guarantee that the codes for the digits 0-9 are consecutive, but does not guarantee that the codes for the letters A-Z are consecutive. In particular, if the codeset is EBCDIC (mainly but not solely used on IBM mainframes), the codes for the letters are not contiguous.

Converting Decimal Literals to ASCII Equivalent for putchar in C

I am trying to understand why the following statement works:
putchar( 1 + '0' );
It seems that the + '0' expression converts the literal to the respective ASCII version (49 in this particular case) that putchar likes to be given.
My question was why does it do this? Any help is appreciated. I also apologize if I have made any incorrect assumptions.
This has nothing to do with ASCII. Nobody even mentioned ASCII.
What this code does assume is that in the system's character encoding all the numerals appear as a contiguous range from '0' to '9', and so if you add an offset to the character '0', you get the character for the corresponding numeral.
All character encodings that could possibly be used by a C or a C++ compiler must have this property (e.g. 2.3/3 in C++), so this code is portable.
Characters '0' to '9' are consecutive. The C standard guarantees this.
In ASCII:
'0' = 48
'1' = 49
'2' = 50
etc.
The '0' is simply seen as an offset.
'0' + 0 = 48, which is '0'.
'0' + 1 = 49, which is '1'.
etc.

Numeric value of digit characters in C

I have just started reading through The C Programming Language and I am having trouble understanding one part. Here is an excerpt from page 24:
#include <stdio.h>

/* count digits, white space, others */
main()
{
    int c, i, nwhite, nother;
    int ndigit[10];

    nwhite = nother = 0;
    for (i = 0; i < 10; ++i)
        ndigit[i] = 0;

    while ((c = getchar()) != EOF)
        if (c >= '0' && c <= '9')
            ++ndigit[c-'0']; //THIS IS THE LINE I AM WONDERING ABOUT
        else if (c == ' ' || c == '\n' || c == '\t')
            ++nwhite;
        else
            ++nother;

    printf("digits =");
    for (i = 0; i < 10; ++i)
        printf(" %d", ndigit[i]);
    printf(", white space = %d, other = %d\n",
        nwhite, nother);
}
The output of this program run on itself is
digits = 9 3 0 0 0 0 0 0 0 1, white space = 123, other = 345
The declaration
int ndigit[10];
declares ndigit to be an array of 10 integers. Array subscripts always start at zero in C, so the elements are
ndigit[0], ndigit[1], ..., ndigit[9]
This is reflected in the for loops that initialize and print the array. A subscript can be any integer expression, which includes integer variables like i, and integer constants. This particular program relies on the properties of the character representation of the digits. For example, the test
if (c >= '0' && c <= '9')
determines whether the character in c is a digit. If it is, the numeric value of that digit is
c-'0'
This works only if '0', '1', ..., '9' have consecutive increasing values. Fortunately, this is true for all character sets. By definition, chars are just small integers, so char variables and constants are identical to ints in arithmetic expressions. This is natural and convenient; for example
c-'0'
is an integer expression with a value between 0 and 9 corresponding to the character '0' to '9' stored in c, and thus a valid subscript for the array ndigit.
The part I am having trouble understanding is why the -'0' part is necessary in the expression c-'0'. If a character is a small integer as the author says, and the digit characters correspond to their numeric values, then what is -'0' doing?
Digit characters don't correspond to their numeric values. They correspond to their encoding values (in this case, ASCII).
IIRC, ascii '0' is the value 48. And, luckily for this example and most character sets, the values of '0' through '9' are stored in order in the character set.
So, subtracting the ASCII value for '0' from any ASCII digit returns its "true" value of 0-9.
The numeric value of a character is (on most systems) its ASCII value. The ASCII value of '0' is 48, '1' is 49, etc.
By subtracting 48 from the value of the character '0' becomes 0, '1' becomes 1, etc. By writing it as c - '0' you don't actually need to know what the ASCII value of '0' is (or that the system is using ASCII - it could be using EBCDIC). The only thing that matters is that the values are consecutive increasing integers.
It converts from the ASCII code of the '0' key on your keyboard to the value zero.
if you did int x = '0' + '0' the result would not be zero.
In most character encodings, all of the digits are placed consecutively in the character set. In ASCII for example, they start with '0' at 0x30 ('1' is 0x31, '2' is 0x32, etc.). If you want the numeric value of a given digit, you can just subtract '0' from it and get the right value. The advantage of using '0' instead of the specific value is that your code can be portable to other character sets with much less effort.
If you access a character string by their characters you'll get the ASCII values back, even if the characters happen to be numbers.
Fortunately the guys who designed that character table made sure that the characters for 0 to 9 are sequential, so you can simply convert from ASCII to a number by subtracting the ASCII-value of '0'.
That's what the code does. I have to admit that it is confusing when you see it the first time, but it's not rocket science.
The ASCII-character value of '0' is 48, '1' is 49, '2' is 50 and so on.
For reference here is a nice ASCII-chart:
http://www.sciencelobby.com/ascii-table/images/ascii-table1.gif

Resources