Char to int conversion in C

If I want to convert a single numeric char to its numeric value, for example, if:
char c = '5';
and I want c to hold 5 instead of '5', is it 100% portable doing it like this?
c = c - '0';
I heard that all character sets store the digits in consecutive order, so I assume so, but I'd like to know if there is a standard library function to do this conversion, and how it is done conventionally. I'm a real beginner :)

Yes, this is a safe conversion. C requires it to work. This guarantee is in section 5.2.1 paragraph 3 of the ISO C standard (C11); a freely available draft of C11 is N1570:
Both the basic source and basic execution character sets shall have the following
members:
[...]
the 10 decimal digits
0 1 2 3 4 5 6 7 8 9
[...]
In both the source and execution basic character sets, the
value of each character after 0 in the above list of decimal digits shall be one greater than
the value of the previous.
Both ASCII and EBCDIC, and character sets derived from them, satisfy this requirement, which is why the C standard was able to impose it. Note that letters are not contiguous in EBCDIC, and C doesn't require them to be.
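A quick way to see the guarantee in action is a small check that walks the digit characters and confirms each code is one greater than the previous. This is only an illustrative sketch; the helper name `digits_are_contiguous` is hypothetical, and on any conforming implementation it should return 1.

```c
/* Hypothetical helper: returns 1 if '0'..'9' have consecutive codes,
   which the C standard requires for every character set. */
int digits_are_contiguous(void)
{
    const char digits[] = "0123456789";
    for (int i = 1; i < 10; i++) {
        /* Each digit's code must be exactly one more than the previous one. */
        if (digits[i] != digits[i - 1] + 1)
            return 0;
    }
    return 1;
}
```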
There is no standard library function that converts a single char; to use one, you would need to build a string first:
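#include <stdlib.h>

int digit_to_int(char d)
{
    char str[2];
    str[0] = d;     /* the digit character */
    str[1] = '\0';  /* terminate so strtol() sees a one-character string */
    return (int) strtol(str, NULL, 10);
}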
You could also use the atoi() function to do the conversion, once you have a string, but strtol() is better and safer.
As commenters have pointed out, though, it is extreme overkill to call a function to do this conversion; your initial approach of subtracting '0' is the proper way of doing this. I just wanted to show how the recommended standard approach of converting a numeric string to a "true" number would be used here.
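To make the "safer" claim concrete: strtol() can report where parsing stopped through its endptr argument, so a caller can reject input that is not entirely a number, while atoi() offers no way to detect failure. This is a minimal sketch; the name `parse_int` and the 0/-1 return convention are illustrative assumptions, and it ignores range errors that strtol() reports through errno. Leading whitespace and a sign are still accepted, as strtol() allows them.

```c
#include <stdlib.h>

/* Hypothetical helper: parses the whole string s as a base-10 number.
   Returns 0 and stores the value in *out on success, or returns -1 if
   nothing was parsed or there are trailing non-digit characters. */
int parse_int(const char *s, long *out)
{
    char *end;
    long v = strtol(s, &end, 10);
    if (end == s || *end != '\0')
        return -1;   /* nothing parsed, or junk after the number */
    *out = v;
    return 0;
}
```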

Try this:
char c = '5' - '0';

int i = c - '0';
You should be aware that this doesn't perform any validation on the character. For example, if the character was 'a' then you would get 97 - 48 = 49 (in ASCII). Especially if you are dealing with user or network input, you should probably perform validation to avoid bad behavior in your program. Just check the range:
if ('0' <= c && c <= '9') {
i = c - '0';
} else {
/* handle error */
}
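Wrapping the range check in a function keeps the validation in one place. A minimal sketch, assuming -1 is an acceptable "not a digit" result for your program; the name `checked_digit` is hypothetical.

```c
/* Hypothetical helper: returns the value of a decimal digit character,
   or -1 if c is not one of '0'..'9'. */
int checked_digit(int c)
{
    if ('0' <= c && c <= '9')
        return c - '0';
    return -1;   /* not a digit: the caller decides how to handle the error */
}
```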
Note that if you want your conversion to handle hex digits you can check the range and perform the appropriate calculation.
if ('0' <= c && c <= '9') {
i = c - '0';
} else if ('a' <= c && c <= 'f') {
i = 10 + c - 'a';
} else if ('A' <= c && c <= 'F') {
i = 10 + c - 'A';
} else {
/* handle error */
}
That will convert a single hex character, in either upper or lower case, into an integer.
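Packaged as a function, the hex conversion might look like the sketch below (the name `hex_digit_value` is hypothetical). Unlike the digits, the C standard does not require 'a'..'f' or 'A'..'F' to be consecutive; they are consecutive in both ASCII and EBCDIC, and this code assumes so.

```c
/* Hypothetical helper: converts one hex digit character (either case)
   to its value 0..15, or returns -1 if c is not a hex digit.
   Assumes 'a'..'f' and 'A'..'F' are contiguous (true in ASCII and EBCDIC). */
int hex_digit_value(int c)
{
    if ('0' <= c && c <= '9')
        return c - '0';
    if ('a' <= c && c <= 'f')
        return 10 + (c - 'a');
    if ('A' <= c && c <= 'F')
        return 10 + (c - 'A');
    return -1;   /* not a hex digit */
}
```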

You can use atoi, which is part of the standard library.

Since you're only converting one character, the function atoi() is overkill. atoi() is useful if you are converting string representations of numbers, and the other posts have given examples of this. If I read your post correctly, you are only converting one numeric character, that is, a character in the range '0' to '9'. In that case, your suggestion to subtract '0' will give you the result you want. This works because the codes of the digits are consecutive (like you said). So, subtracting the code of '0' (48 in ASCII) from a numeric character gives the value of the digit. In your example of c = c - '0' where c = '5', what is really happening (in ASCII) is 53 (the code of '5') - 48 (the code of '0') = 5.
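The arithmetic in the example can be checked directly by printing the codes as integers. The printed codes (53 and 48) assume an ASCII-based system; the difference of 5 does not depend on the character set. The name `show_subtraction` is hypothetical.

```c
#include <stdio.h>

/* Prints the codes behind c and '0' and the result of subtracting them.
   For c = '5' on an ASCII system this prints "53 - 48 = 5". */
int show_subtraction(char c)
{
    int value = c - '0';   /* char operands are promoted to int */
    printf("%d - %d = %d\n", c, '0', value);
    return value;
}
```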
When I first posted this answer, I didn't take into consideration your comment about being 100% portable between different character sets. I did some further looking around, and it seems like your answer is still mostly correct. The problem is that you are using a char, which is typically an 8-bit data type and wouldn't work with all character types. Read this article by Joel Spolsky on Unicode for a lot more information on Unicode. In this article, he says that he uses wchar_t for characters. This has worked well for him, and he publishes his web site in 29 languages. So, you would need to change your char to a wchar_t. Other than that, he says that characters with values of 127 and below are basically the same across encodings. This includes the characters that represent digits, which means the basic math you proposed should work for what you are trying to achieve.

Yes. This is safe as long as you are using standard ASCII characters, like you are in this example.

Normally, if there's no guarantee that your input is in the '0'..'9' range, you'd have to perform a check like this:
if (c >= '0' && c <= '9') {
int v = c - '0';
// safely use v
}
An alternative is to use a lookup table. You get simple range checking and conversion with less (and possibly faster) code:
// one-time setup of an array of 256 integers;
// all slots set to -1 except for ones corresponding
// to the numeric characters
static const int CHAR_TO_NUMBER[] = {
-1, -1, -1, ...,
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, // '0'..'9'
-1, -1, -1, ...
};
// Now, all you need is:
int v = CHAR_TO_NUMBER[(unsigned char)c]; // the cast avoids negative indexes when char is signed
if (v != -1) {
// safely use v
}
P.S. I know that this is overkill. I just wanted to present it as an alternative solution that may not be immediately evident.
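Because the table above elides most of its entries, here is one runnable way to build it: fill it once at startup instead of spelling out every slot. This is a sketch under assumptions: the names `build_table`, `lookup_digit` and `char_to_number` are hypothetical, and the table is sized with UCHAR_MAX + 1 (256 with 8-bit chars) so every unsigned char value has a slot.

```c
#include <limits.h>

/* One slot per possible unsigned char value. */
static int char_to_number[UCHAR_MAX + 1];

/* Hypothetical one-time setup: -1 everywhere except the digit slots. */
void build_table(void)
{
    for (int i = 0; i <= UCHAR_MAX; i++)
        char_to_number[i] = -1;
    for (int d = 0; d <= 9; d++)
        char_to_number['0' + d] = d;   /* relies on '0'..'9' being contiguous */
}

/* Returns the digit value of c, or -1 if c is not a digit. */
int lookup_digit(char c)
{
    /* The cast keeps negative char values from indexing outside the table. */
    return char_to_number[(unsigned char)c];
}
```

Call build_table() once before the first lookup_digit().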

As others have suggested, but wrapped in a function:
int char_to_digit(char c) {
return c - '0';
}
Now just use the function. If, down the line, you decide to use a different method (for performance, charset differences, whatever), you just need to change the implementation; you won't need to change the callers.
This version assumes that c contains a char which represents a digit. You can check that before calling the function, using ctype.h's isdigit function.
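A caller-side check with isdigit() might look like the sketch below. One subtlety: the <ctype.h> functions expect a value representable as unsigned char (or EOF), so a plain char should be cast before the call to avoid undefined behavior for negative values. The wrapper name `digit_or_minus_one` and its -1 convention are hypothetical.

```c
#include <ctype.h>

int char_to_digit(char c) {
    return c - '0';
}

/* Hypothetical wrapper: validates with isdigit() before converting. */
int digit_or_minus_one(char c)
{
    /* Cast to unsigned char: passing a negative char to isdigit() is undefined. */
    if (isdigit((unsigned char)c))
        return char_to_digit(c);
    return -1;
}
```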

Since the ASCII codes for '0', '1', '2', ... run from 48 to 57, they are contiguous. Arithmetic on char operands first promotes them to int, so what you are basically doing is:
53 - 48, and hence c stores the value 5, with which you can do any integer operations. Note that when converting back from int to char the compiler gives no error. If char is unsigned, the value is reduced modulo 256 (with 8-bit chars) to fit its range; if char is signed and the value does not fit, the result is implementation-defined.
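Both points can be seen in a few lines: the subtraction produces an int, and converting an out-of-range int to unsigned char keeps the value modulo UCHAR_MAX + 1 (256 with 8-bit chars). The helper names are hypothetical; conversion to plain or signed char is left out because its result for out-of-range values is implementation-defined.

```c
#include <limits.h>

/* The char operands are promoted, so the difference has type int. */
int digit_from_char(char c)
{
    int v = c - '0';
    return v;
}

/* Converting to unsigned char reduces v modulo UCHAR_MAX + 1. */
unsigned char wrap_to_uchar(int v)
{
    return (unsigned char)v;
}
```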

You can simply use the atol() function:
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
    const char *c = "5";
    long d = atol(c);   /* atol() returns a long */
    printf("%ld\n", d);
    return 0;
}

Related

Comparing single quote numbers instead of regular numbers (numbers without quotes)? C Programming Language K&R

This is part of a code to count white spaces, numbers, or other from the K&R "C programming book." I am confused why it compares "int c" to digits using '0' and '9' instead of 0 and 9. I realize the code doesn't work if I use 0 and 9 without quotes. I am just trying to understand why. Does this have to do with c being equal to getchar()?
while ((c = getchar()) != EOF)
if (c >= '0' && c <= '9')
++ndigit[c-'0'];
else if (c == ' ' || c == '\n' || c == '\t')
++nwhite;
else
++nother;
Looking at the man page for getchar, we see that it returns the character read as an unsigned char cast to an int. So the value stored is not the digit's numeric value but its character code (its ASCII value on most systems), which can be compared with character constants such as '0' and '9'.
A char is usually just an integer whose meaning is given by some character set, for example ASCII.
So for example we could store "Hello" as the sequence 72, 101, 108, 108 and 111.
Using single quotes (as in '9') we say that we mean the number which represents the character '9'. Behind the scenes the computer only knows numbers, so in ASCII this ends up as the code 57 (the ASCII table maps the character '9' to code 57).
The same applies to the characters in our input data: they are also encoded as numbers according to the character set we're using.
In contrast, if we just used a plain 9 we would ask for exactly the code 9, not "the code which represents the character 9". That's the difference.
BTW: There's another "trick" used in the code sample. It is c-'0', which asks to subtract "the code behind the character '0'" from our current character c. If we do this, we end up with the digit not as the character, but as the number behind it. Example:
Assume c is the character '4'.
So in c it is stored as the code 52 (see ASCII table)
If we now want the numeric value 4 in place of the character '4' we just subtract the character '0' from it (code 48 in ASCII)
So 52 - 48 will end up as 4 (not a char but the number behind it)
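Putting the loop back together, a self-contained version of the K&R counter might look like the sketch below. To keep it easy to test, it reads from a string instead of calling getchar(); the function name `count_chars` and its parameters are assumptions, not part of the book's code.

```c
/* Counts digits (per digit value), whitespace and other characters in s,
   mirroring the K&R loop but reading from a string instead of getchar(). */
void count_chars(const char *s, int ndigit[10], int *nwhite, int *nother)
{
    for (int i = 0; i < 10; i++)
        ndigit[i] = 0;
    *nwhite = *nother = 0;

    for (; *s != '\0'; s++) {
        int c = (unsigned char)*s;   /* like getchar(), work with a non-negative int */
        if (c >= '0' && c <= '9')
            ++ndigit[c - '0'];       /* '0' -> index 0, ..., '9' -> index 9 */
        else if (c == ' ' || c == '\n' || c == '\t')
            ++*nwhite;
        else
            ++*nother;
    }
}
```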
getchar() returns an int rather than a char so that it can return EOF (a negative value, typically -1). If it returned a char, EOF could not be distinguished from a valid character.
Moreover, '9' is a character constant whose value is the character set code for the digit character '9', not the integer value 9. In C (but not C++) it has type int, so there is in any case no type mismatch in an expression such as c <= '9': it is an int comparison.
Even if that were not the case, and character constants had type char (as in C++), there would be an implicit integer promotion to int before the comparison.
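A one-line check confirms the type of a character constant when compiled as C (a C++ compiler would report sizeof(char), which is 1). The function name `show_constant_type` is hypothetical.

```c
#include <stdio.h>

/* In C a character constant such as '9' has type int, so both numbers
   printed are the same (for example "4 4" where int is 4 bytes). */
void show_constant_type(void)
{
    printf("%zu %zu\n", sizeof '9', sizeof(int));
}
```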
Also, you need to understand that a char is not specifically a character, but rather simply an integer type that is the:
Smallest addressable unit of the machine that can contain basic character set.

Check if array is ASCII

How do I check in C if an array of uint8 contains only ASCII elements?
If possible please refer me to the condition that checks if an element is ASCII or not
Your array elements are uint8, so must be in the range 0-255
For standard ASCII character set, bytes 0-127 are used, so you can use a for loop to iterate through the array, checking if each element is <= 127.
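If you're treating the array as a string, be aware of the 0 byte (null character), which marks the end of the string.
From your example comment, this could be implemented like this (using the standard uint8_t type and passing the length explicitly):
#include <stddef.h>
#include <stdint.h>

int checkAscii(const uint8_t *array, size_t len) {
    for (size_t i = 0; i < len; i++) {
        if (array[i] > 127) return 0;  /* non-ASCII byte found */
    }
    return 1;  /* every byte is in 0..127 */
}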
It returns as soon as it finds the first element greater than 127.
All valid ASCII characters have value 0 to 127, so the test is simply a value check or 7-bit mask. For example given the inclusion of stdbool.h:
bool is_ascii = (ch & ~0x7f) == 0 ;
Possibly however you intended only printable ASCII characters (excluding control characters). In that case, given inclusion of ctype.h:
bool is_printable_ascii = (ch & ~0x7f) == 0 &&
(isprint(ch) || isspace(ch)) ;
Your intent may be slightly different in terms of which characters you want to include in your set, in which case other functions in ctype.h may be applied, or you can simply test for specific values or ranges to include or exclude.
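Combining the mask test and the <ctype.h> classification into one function might look like the sketch below. It assumes the "printable" definition above (isprint or isspace) is the one you want, and `is_printable_ascii` as a function name is hypothetical.

```c
#include <ctype.h>
#include <stdbool.h>

/* True if ch is a 7-bit value that is either printable or whitespace. */
bool is_printable_ascii(unsigned char ch)
{
    /* The mask test comes first, so the ctype calls only see values 0..127. */
    return (ch & ~0x7f) == 0 && (isprint(ch) || isspace(ch));
}
```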
Note also that the ASCII set is very restricted in international terms. The ANSI or "extended ASCII" set uses locale-specific code pages to define the glyphs associated with codes 128 to 255. That is to say, the set changes depending on language/locale settings to accommodate different language characters, accents and alphabets. In modern systems it is common instead to use a multi-byte Unicode encoding (of which there are several, with either fixed- or variable-length codes). UTF-8 is a variable-width encoding in which every single-byte encoding is also an ASCII code. As such, while it is trivial to determine whether data is entirely within the ASCII set, it does not follow that the data is therefore text. If the test is intended to distinguish binary data from text, it will fail in a great many scenarios unless you can guarantee a priori that all text is restricted to the ASCII set, and that is application specific.
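The UTF-8 point is easy to demonstrate: any character outside ASCII is encoded with bytes that all have the high bit set, so a byte-wise ASCII check rejects it. A small sketch; the bytes for "é" (0xC3 0xA9) are written out as escapes so the example does not depend on the source file's encoding, and `all_ascii`, `ascii_word` and `utf8_word` are hypothetical names.

```c
#include <stddef.h>

/* Returns 1 if every byte in buf[0..len) is in the 7-bit ASCII range. */
int all_ascii(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (buf[i] > 0x7f)
            return 0;
    return 1;
}

/* "cafe" is plain ASCII; "caf\xc3\xa9" is "café" encoded as UTF-8. */
static const unsigned char ascii_word[] = "cafe";
static const unsigned char utf8_word[] = "caf\xc3\xa9";
```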
You cannot check if something is "ASCII" with standard C.
Because C does not specify which character set (symbol table) a compiler uses. Various other, more or less exotic, character sets exist or have existed.
UTF-8, for example, is a superset of ASCII. Older, dysfunctional 8-bit symbol tables have existed, such as EBCDIC and "Extended ASCII". Telling whether something is, for example, ASCII or EBCDIC can't be done trivially, without a long line of value checks.
With standard C, you can only do the following:
You can check if a character is printable, with the function isprint() from ctype.h.
Or you can check whether it has only the low 7 bits set: if((ch & 0x7F)==ch).
In C programming, a character variable holds the character's code (on ASCII systems, an integer between 0 and 127) rather than the character itself.
The ASCII values of the lowercase letters are 97 to 122, and the ASCII values of the uppercase letters are 65 to 90.
Rather than writing the actual code for you, I am giving you an example.
You can assign an int to a char directly.
int a = 47;
char c = a;
printf("%c", c);
And this will also work.
printf("%c", a); // a is in valid range
Another approach.
An integer can be assigned directly to a character. A character is different mostly just because how it is interpreted and used.
char c = atoi("47");
Try to implement this once you understand the logic properly.
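A compact runnable version of the assignments above, as a sketch with hypothetical helper names `code_to_char` and `char_from_string`. The character printed for 47 depends on the execution character set: it is '/' in ASCII.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: stores an int code in a char and prints it as a character. */
char code_to_char(int code)
{
    char c = code;        /* fine as long as code fits in a char */
    printf("%c\n", c);    /* prints '/' for 47 on an ASCII system */
    return c;
}

/* Same idea, starting from a number held in a string. */
char char_from_string(const char *s)
{
    return code_to_char(atoi(s));
}
```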

Clear explanation of ndigit[c - '0']

I've finally started to read K&R, and I've just arrived at the Array part.
But in the example of this section, there's a piece of code I really don't understand completely, so I'd like to ask you for a clear explanation of it, as I want to understand every concept of C, since I know it's the foundation for learning C++ quickly.
By the way, I already have a decent knowledge of Java; I hope this will help you set up your explanation.
Question:
In this piece of code, ndigit[c - '0'], I don't understand what it's trying to do. I know from other Stack Overflow questions that '0' should refer to the ASCII standard and should be 48, but I still don't understand what the relationship between c and that '0' is.
Short answer:
By ndigit[c-'0'] you are not doing normal arithmetic like [5+5] = [10]. Rather, you are first converting your characters into character codes, then doing arithmetic with those character codes.
To put it simply: we represent five in English with the sign "5". Other languages have their own sign for five; for example, 五 is five in Japanese. Likewise, a computer represents the character five as 53 (assuming ASCII; ASCII is like a language). So with ndigit[c-'0'], the character is first converted to its character code, and then the arithmetic is done.
Long answer:
Lets go through ndigit[c - '0'] first.
ndigit = name of the array
[ ] = subscript; here it selects an element by index (in a declaration, the number in brackets gives the total number of elements).
c - '0' = arithmetic operation. c is a variable, - is minus, and
'0' is not 0 but 48 (in ASCII).
Now let me add additional code from the book:
int ndigit[10];
...//fill in the array with 0s
while((c = getchar()) != EOF)
if(c >= '0' && c <= '9')
++ndigit[c - '0']; //<== unable to understand this part
Something to note here is getchar(). getchar() returns data of type int. Even though it returns an int, it doesn't return the digit's value; rather, it returns the character code. Let me give an example:
#include <stdio.h>
int main(){
int c;
c = getchar();
printf("%d\n", c);
}
output:
#mix:~
$ cc test.c
#mix:~
$ ./a.out
5 ;wrote 5 in terminal
53 ; printed 53 instead of 5
5 is the character. And 53 is the character code of 5.
Now let's get back to our main topic, ndigit[c - '0']. c gets its value from getchar(), and getchar() reads from input. Let's say the input is 5. Because of how getchar() behaves, instead of 5, c will contain 53 (in ASCII). So
ndigit[c - '0'] == ndigit[53 - '0'] != ndigit[5 - '0']
Also notice we are not using 0, but rather '0'.
If it were a plain 0, the arithmetic would be:
ndigit[53 - 0]
= ndigit[53] ;**wrong**
Using '0' (with single quotes) means we mean the character code. As described above, '0' = 48 (in ASCII), so
ndigit[53 - '0']
= ndigit[53 - 48]
= ndigit[5]
Now we get back the numeric value of the digit that was read by getchar(). But why do we need the numeric value? Would ndigit[c] work instead of ndigit[c - '0']?
ndigit[c] won't work because at the start of our code we declared ndigit[10]. That array has only 10 elements, so the valid indexes are 0 through 9. ndigit[53] would be far outside the array, which is undefined behavior. That's why we use ndigit[c - '0']: the character code subtraction gives a value from 0 to 9.
If it's still unclear, search for and learn about the following:
character encoding
array in c
In this code c holds a character, presumably representing a digit. In C a char is an integral type, so you can perform arithmetic operations on it.
Digits are encoded with numbers from a consecutive range: if the code for '0' is k, the code for '1' is k+1, the code for '2' is k+2, and so on. That is why by subtracting '0' from a character representing a digit you get the numeric value of that digit.
For example, by computing '5' - '0' you get the numeric 5 instead of the character '5'.
If you make an array ndigit[10], then ndigit[c - '0'] lets you access an array element corresponding to the digit. This can be used, for example, to count the number of different digits in the input.
As you said, '0' is equal to 48 (assuming ASCII encoding). The other digits are then 49 through 57 respectively, so '1' is equal to 49, '2' to 50, and so on. Thus '1' - '0' is equal to 49 - 48, which is 1, '2' - '0' is equal to 50 - 48, which is 2, and so on.
In other words c - '0' converts a digit like '5' to its integer equivalent (which would be 5 for '5').

What is the size of a given char?

I've got some code in C that I cannot understand.
char c is a string that is supposed to be randomized.
Here is the question, however: 26 is supposed to be the range of values starting from 97. That is easy to understand for an integer, but in the case of a char I have no clue what it is supposed to be.
char c = (char) rand() % 26 + 97;
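That is generating a random character. In ASCII, the lowercase alphabetical characters start at 97. So the intent is to take a random number between 0 and 25 and add it to 97, which generates a random lowercase letter. Note, however, that a cast binds more tightly than %, so as written the cast is applied to rand() first; if char is signed, (char) rand() can be negative, and then so can the remainder. (char) (rand() % 26 + 97) expresses the intent.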
97 = ascii 'a'
It generates a random character between 'a' and 'z' inclusive.
Ref: ASCII values
It's a bad, non-portable way to generate a random character by directly computing the ASCII code.
A better, more portable, way is to randomize the index into a table of characters. This pushes the responsibility for what code is used to represent each character into the compiler, where it belongs:
#include <stdlib.h>

char random_char(void)
{
    const char alpha[] = "abcdefghijklmnopqrstuvwxyz";
    /* sizeof alpha includes the terminating '\0', so subtract 1 */
    return alpha[rand() % (sizeof alpha - 1)];
}
Any decent compiler will very likely inline the above.
NOTE Using % to range-limit the return value of rand() is generally frowned upon, but that's not the focus here.
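As a usage sketch, the table-based function can fill a buffer with random letters. The helper `random_word` is hypothetical; note that the terminating '\0' is excluded from the index range, otherwise the function could occasionally return it. Seeding once with something like srand((unsigned)time(NULL)) is an assumption left to the caller.

```c
#include <stddef.h>
#include <stdlib.h>

char random_char(void)
{
    const char alpha[] = "abcdefghijklmnopqrstuvwxyz";
    /* sizeof alpha counts the terminating '\0', so subtract 1 */
    return alpha[rand() % (sizeof alpha - 1)];
}

/* Hypothetical helper: fills buf with n random lowercase letters and
   terminates it; buf must have room for n + 1 chars. */
void random_word(char *buf, size_t n)
{
    for (size_t i = 0; i < n; i++)
        buf[i] = random_char();
    buf[n] = '\0';
}
```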

C programming - integers and characters

I ripped this from an ebook on C programming.
I understand that the ASCII representations of the characters '0' and '9' are integers, so I understand the compatibility with the integer array. I am simply not sure how the shown output is computed. The input is the code itself.
What does this statement mean?
++ndigit[c-'0'];
So, is the program essentially checking if the input is one of the first 10 entries of the ASCII code table?
No, it doesn't.
c - '0' subtracts the (not necessarily ASCII) character code of the character 0 from that of c. This will yield a number between 0 and 9 if c is a digit. Then, the resulting integer is used to index the zero-initialized ndigit array using the [] operator, and the prefix increment operator (++) is then used to increment the element at that particular index.
By the way, the code is erroneous at multiple places. I suggest you switch to another book because this one appears to be either outdated and/or encouraging the use of several types of bad programming practice.
First, main() doesn't have a return type, which is an error in modern C. It needs to be declared as int main(), int main(void) or int main(int, char **). Older compilers assumed an implicit int return type if it was omitted, but implicit int was removed from the language in C99.
Second, it would be better to initialize the ndigit array, like this:
int ndigit[10] = { 0 };
The for loop is superfluous because we can use initialization instead, and it's less readable than the initialization syntax. It's also dangerous: the author doesn't calculate the count of the array using sizeof(ndigit) / sizeof(ndigit[0]) but hardcodes its length, which may cause a buffer overrun if the length of the array is later decreased and the hard-coded value in the for loop is forgotten.
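Both suggestions can be combined: zero-initialize with = { 0 } and, wherever the length is still needed (for example to print the counts), derive it from the array with sizeof. A sketch; `count_digit` and its -1 result for an out-of-range request are hypothetical conveniences for checking the result.

```c
#include <stdio.h>

/* Counts the digits in s using a zero-initialized array whose length is
   computed with sizeof rather than hard-coded, prints every count, and
   returns the count for the requested digit (or -1 if which is out of range). */
int count_digit(const char *s, int which)
{
    int ndigit[10] = { 0 };                               /* no init loop needed */
    const size_t len = sizeof ndigit / sizeof ndigit[0];  /* 10 */

    for (; *s != '\0'; s++)
        if (*s >= '0' && *s <= '9')
            ++ndigit[*s - '0'];

    for (size_t i = 0; i < len; i++)   /* loop bound follows the array size */
        printf("%zu: %d\n", i, ndigit[i]);

    return (which >= 0 && (size_t)which < len) ? ndigit[which] : -1;
}
```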
The program computes the number of times a digit between 0 and 9 was introduced as input, how many white spaces and how many other characters were in the input.
++ndigit[c-'0'];
'0' - as integer is the ASCII code for 0.
c - is the read character (its ASCII code)
c - '0' = the actual digit (between 0 and 9) represented by the ASCII code c.
For example, '3' (ASCII) would be 3 (digit = integer) + '0' (ASCII).
So that's how you obtain the index in the array for your digit and you increment the number of times that digit showed up.
