Why is '0' subtracted when converting string to number? - c

I am new to C. I was looking for a custom C function that converts a string to an integer, and I came across this algorithm, which makes perfect sense except for one part: what exactly is the - '0' doing in the line n = n * 10 + a[c] - '0';?
int toString(char a[]) {   // despite its name, this converts a string to an int
    int c, sign = 1, offset, n;   // sign starts at 1 so the checks below are well defined
    if (a[0] == '-') { // Handle negative integers
        sign = -1;
    }
    if (sign == -1) { // Set starting position to convert
        offset = 1;
    }
    else {
        offset = 0;
    }
    n = 0;
    for (c = offset; a[c] != '\0'; c++) {
        n = n * 10 + a[c] - '0';
    }
    if (sign == -1) {
        n = -n;
    }
    return n;
}
The place where I found the algorithm (here) did not explain it.

The reason subtracting '0' works is that character code points for decimal digits are arranged sequentially starting from '0' up, without gaps. In other words, the character code for '5' is greater than the character code for '0' by 5; character code for '6' is greater than the character code for '0' by 6, and so on. Therefore, subtracting the code of zero '0' from a code of another digit produces the value of the corresponding digit.
This arrangement holds for the decimal digits in ASCII, EBCDIC, Unicode, and so on. For ASCII, the numeric codes look like this:
'0' 48
'1' 49
'2' 50
'3' 51
'4' 52
'5' 53
'6' 54
'7' 55
'8' 56
'9' 57

Assuming x has a value in the range '0' to '9', x - '0' yields a value between 0 and 9. So x - '0' converts a decimal digit character to its numeric value (e.g., '5' to 5).
C says the values of '0' to '9' are implementation-defined, but it also guarantees that '0' to '9' are consecutive values.
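A minimal sketch of the question's conversion with the uninitialized `sign` fixed, assuming the input is an optional '-' followed by digits. `parse_decimal` is a hypothetical name, and this version stops at the first non-digit instead of folding every character into `n`. It also parenthesizes `(a[c] - '0')`: written without parentheses, `n * 10 + a[c]` is computed first, adding the raw character code (48..57) before subtracting, which can overflow sooner near `INT_MAX`. There is still no overflow check for very long inputs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: converts an optional '-' followed by decimal digits
   into an int. Stops at the first character that is not '0'..'9'.
   No overflow detection: inputs must fit in an int. */
int parse_decimal(const char *a)
{
    assert(a != NULL);
    int sign = 1;
    size_t c = 0;
    if (a[0] == '-') {        /* optional leading minus sign */
        sign = -1;
        c = 1;
    }
    int n = 0;
    for (; a[c] >= '0' && a[c] <= '9'; c++) {
        /* a[c] - '0' turns the digit character into its value 0..9;
           n * 10 shifts the digits seen so far one decimal place left. */
        n = n * 10 + (a[c] - '0');
    }
    return sign * n;
}
```

For example, `parse_decimal("-42")` skips the '-', builds 4 and then 42, and returns -42.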

Related

About atoi function

#include <stdio.h>
int ft_atoi(char *str)
{
    int c;
    int sign;
    int result;
    c = 0;
    sign = 1;
    result = 0;
    while ((str[c] >= '\t' && str[c] <= '\r') || str[c] == ' ')
    {
        c++;
    }
    while (str[c] == '+' || str[c] == '-')
    {
        if (str[c] == '-')
            sign *= -1;
        c++;
    }
    while (str[c] >= '0' && str[c] <= '9')
    {
        result = (str[c] - '0') + (result * 10);
        c++;
    }
    return (result * sign);
}
int main(void)
{
    char *s = " ---+--+1234ab567";
    printf("%d\n", ft_atoi(s));
    return 0;
}
This line: result = (str[c] - '0') + (result * 10);
Why do we subtract '0' and multiply by 10? How do these operations convert ASCII to an int?
Some background before answering your question:
Internally everything is a number, and a char is no exception.
In C, char is an integer type, so characters are integers in C; in arithmetic expressions a char is promoted to int. The values are those of the execution character set, which on most systems is ASCII.
For example:
Capital Letter Range
65 => 'A' to 90 => 'Z'
Small Letter Range
97 => 'a' to 122 => 'z'
Number Range
48 => '0' to 57 => '9'
To answer your question:
Subtracting the character '0' from any digit character ('0' to '9') yields that digit's integer value.
For Example
'9'(57) - '0'(48) = 9 (int)
'8'(56) - '0'(48) = 8 (int)
Remember that a char is an integer in C and is promoted to int in arithmetic; see the background above.
Conversely, adding the character '0' to any integer in the range 0-9 results in the corresponding digit character.
For Example
9 + '0'(48) = '9'(57) (char)
8 + '0'(48) = '8'(56) (char)
See an ASCII table for the full list of codes.
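The two directions described above can be sketched as a pair of helpers. The names `digit_to_char` and `char_to_digit` are hypothetical, and each one asserts its precondition rather than handling out-of-range input. Because C guarantees that '0'..'9' are consecutive, these work in any conforming encoding, not only ASCII.

```c
#include <assert.h>

/* 0..9 -> '0'..'9': adding '0' offsets the value to the digit's code. */
char digit_to_char(int d)
{
    assert(d >= 0 && d <= 9);
    return (char)('0' + d);
}

/* '0'..'9' -> 0..9: subtracting '0' removes that offset again. */
int char_to_digit(char ch)
{
    assert(ch >= '0' && ch <= '9');
    return ch - '0';
}
```

The functions are inverses of each other, so `char_to_digit(digit_to_char(d)) == d` for every d from 0 to 9.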
The ASCII code for '0' is 48, not zero. Therefore, to convert a digit character to its decimal value you need to subtract 48, which is exactly what subtracting '0' does.
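The `result * 10` part of the ft_atoi line gives each earlier digit its place value: multiplying by 10 shifts the digits read so far one decimal place to the left, and `(str[c] - '0')` fills the now-empty ones place. The sketch below records the running value after each digit so the shifting is visible. `trace_digits` is a hypothetical helper that mirrors only the digit loop of ft_atoi.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: runs ft_atoi's digit loop over the leading digits of
   str, storing the running result after each digit in steps[] (at most
   max_steps entries). Returns the number of digits consumed. */
size_t trace_digits(const char *str, int steps[], size_t max_steps)
{
    assert(str != NULL && steps != NULL);
    int result = 0;
    size_t c = 0;
    while (c < max_steps && str[c] >= '0' && str[c] <= '9') {
        /* shift previous digits left one place, then add the new digit */
        result = (str[c] - '0') + (result * 10);
        steps[c] = result;
        c++;
    }
    return c;
}
```

For "1234ab" the recorded values are 1, 12, 123 and 1234, and the loop stops at 'a', just as ft_atoi stops before "ab567".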

C program that sums a char with int

I have an exercise that asks me to find the uppercase letter that is K places after a given letter, stored in a char variable named C. The range is the uppercase letters from A to Z.
For example, if the input is B 3, the output should be E. For this input it's simple: you just add the values and get the answer. But what if we go out of range? For example, for F 100 the program should output B, because once the value passes Z the program wraps around to A.
If anything is unclear I will try to explain more. Here are some test cases, and my code, which only works if we don't leave the range.
Input Output
B 3 E
X 12345 S
F 100 B
T 0 T
#include <stdio.h>
int main(){
    int K;
    char C,rez;
    scanf("%c %d",&C,&K);
    int ch;
    for(ch = 'A';ch <= 'Z';ch++){
        if(C>='A' && C<='Z'){
            rez = C+K;
        }
    }
    printf("%c",rez);
    return 0;
}
Think of the letters [A-Z] as base-26 digits, where zero is A, one is B and 25 is Z.
When we add the letter (as a base-26 digit) and the offset, only the least significant base-26 digit of the sum interests us, so use % 26 to find it, much like one uses % 10 to find the least significant decimal digit.
scanf(" %c %d",&C,&K);
//     ^ space added to consume any white-space
if (C >= 'A' && C <= 'Z') {
    int base26 = C - 'A';
    base26 = base26 + K;
    base26 %= 26;
    int output = base26 + 'A';
    printf("%c %-8d %c\n", C, K, output);
}
For negative offsets we need a little more work, because % in C is not the mathematical mod operator but the remainder operator. The two differ when an operand is negative (for example, -3 % 26 is -3, not 23).
base26 %= 26;
if (base26 < 0) base26 += 26; // bring a negative remainder into 0..25
int output = base26 + 'A';
Pedantically, C + K may overflow with extreme K values. To account for that, reduce K before adding.
// base26 = base26 + K;
base26 = base26 + K%26;
We could be a little sneaky and add 26 to ensure the sum is not negative.
if (C >= 'A' && C <= 'Z') {
    int base26 = C - 'A';
    base26 = base26 + K%26 + 26; // base26 >= 0, even when K < 0
    base26 %= 26;                // base26 >= 0 and < 26
    int output = base26 + 'A';
    printf("%c %-8d %c\n", C, K, output);
}
... or make a complex one-liner:
printf("%c %-8d %c\n", C, K, (C - 'A' + K%26 + 26)%26 + 'A');
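The same arithmetic can be packaged as a function and checked against the test cases from the question. `shift_upper` is a hypothetical name; it asserts that its input is 'A'..'Z' and, like the answer above, assumes the uppercase letters are contiguous (true in ASCII, not in EBCDIC).

```c
#include <assert.h>

/* Hypothetical helper: returns the uppercase letter k places after c,
   wrapping from 'Z' back to 'A'. k may be negative or very large.
   Assumes 'A'..'Z' are contiguous (ASCII). */
char shift_upper(char c, int k)
{
    assert(c >= 'A' && c <= 'Z');
    int base26 = c - 'A';          /* 'A' -> 0 ... 'Z' -> 25 */
    base26 = base26 + k % 26 + 26; /* k % 26 is in -25..25, so the sum is >= 1 */
    base26 %= 26;                  /* now 0..25 */
    return (char)(base26 + 'A');
}
```

With this, `shift_upper('X', 12345)` reduces 12345 to 21, computes (23 + 21 + 26) % 26 = 18, and returns 'S'; a negative offset such as `shift_upper('A', -1)` gives 'Z'.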
This can be accomplished by using 2 concepts.
ASCII value
Modulus operator (%)
On ASCII-based systems every character has an ASCII value, which ranges from 0 to 127.
The character 'A' has the value of 65
The character 'B' has the value of 66 (65 + 1)
and so on...
Until Z which is 65 + 25 = 90
The second concept is modular arithmetic: if you always want to map a number into a certain range, you can use the modulus operator.
The modulus is the remainder you get after dividing one number by another.
In our case there are 26 letters, so taking % 26 always gives a number between 0 and 25.
For the example you gave:
100 % 26 = 22
But you have to consider the starting point too.
So we always subtract the value of 'A' (65) from the starting letter, so that 'A' maps to 0 and 'Z' maps to 25.
So, if we start with 'F' and need to go 100 places:
Subtract the value of 'A' from the value of 'F'. Characters behave like numbers, so you can actually store 'F' - 'A' in an integer.
In this case 'F' - 'A' = 5
Next we add the offset to this.
5 + 100 = 105
Then we perform modulus with 26
105 % 26 = 1
Finally add the value of 'A' back to the result
'A' + 1 = 'B'
And you are done
Get the remainder of the input number divided by 26 using the modulo operator. If the input character plus that remainder is still less than or equal to Z, that is the answer; otherwise, take the sum (measured from A) modulo 26 again, and that will be the answer (taking care of the offset, because the ASCII decimal value of the letter A is 65).
Roughly the implementation will be:
#include <stdio.h>
int main(){
    int K;
    char C, rez;
    scanf("%c %d",&C,&K);
    // Validate the user input
    int rem = K % 26;
    if ((rem + C) - 'A' < 26) {
        rez = rem + C;
    } else {
        rez = ((rem + C - 'A') % 26) + 'A';
    }
    printf("%c\n",rez);
    return 0;
}
Note that I know there is scope for improvement in the implementation (for example, a negative K is not handled). This is just to give the OP an idea of how it can be done.
Output:
# ./a.out
B 3
E
# ./a.out
X 12345
S
# ./a.out
F 100
B
# ./a.out
T 0
T

c-changing string to int

I'm trying to change a string of chars into a number.
For example, the characters '5','3','9' into 539.
What I did is:
for (j = 0; j < len_of_str; j++)
    num = num + ((str[j] - 48) * (10 ^ (len_of_str - j)));
printf("%d", num);
num is the variable that should end up holding the number as an int; the minus 48 changes each ASCII code into the digit's actual value,
and the (10 ^ (len_of_str - j)) is meant to scale the values to hundreds, thousands, etc.
Several issues:
First, ^ is not an exponentiation operator in C - it's a bitwise exclusive-OR operator. Instead of getting 10 to the power N, you'll get 10 XOR N, which is not what you want. C does not have an exponentiation operator (ironic for a language that defines eleventy billion operators, but there you go) - you'll need to use the library function pow instead. Or you can avoid the whole issue and do this instead:
num = 0;
for ( j = 0; j < len_of_str; j++ )
{
num *= 10;
num += str[j] - 48;
}
Second, str[j] - 48 assumes ASCII encoding. To make it generic, use str[j] - '0' instead; the C standard guarantees that the digit characters are sequential in every encoding,
so '9' - '0' equals 9.
Finally, is there a reason you're not using one of the built-in library functions such as atoi or strtol?
num = (int) strtol( str, NULL, 10 ); // base 10; with base 0 a leading 0 would select octal
printf( "num = %d\n", num );
As pointed out in the comments above, ^ does not actually calculate a power; it performs a bitwise XOR (see Wikipedia). For instance, 0101 ^ 0111 == 0010 in binary, as XOR sets a bit to one only where the inputs differ in that bit.
To calculate 10 to the power of something in C, use pow(double x, double y) from <math.h>. See this post for more information.
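If you want to keep the question's positional approach without going through `double`, an integer power function is enough. The sketch below uses hypothetical helpers `pow10i` and `digits_to_int`, does no overflow checking, and assumes `str` holds only digits. Note that the exponent also has to be `len - j - 1`, not `len_of_str - j` as in the question: the last digit has place value 10 to the power 0, otherwise every digit lands one place too high.

```c
#include <assert.h>
#include <string.h>

/* Integer 10^exp by repeated multiplication (no overflow check:
   keep exp <= 9 for a 32-bit int). */
int pow10i(int exp)
{
    assert(exp >= 0);
    int p = 1;
    while (exp-- > 0)
        p *= 10;
    return p;
}

/* The question's loop with ^ replaced by pow10i and the exponent
   corrected so the rightmost digit is multiplied by 10^0. */
int digits_to_int(const char *str)
{
    int len = (int)strlen(str);
    int num = 0;
    for (int j = 0; j < len; j++)
        num += (str[j] - '0') * pow10i(len - j - 1);
    return num;
}
```

For comparison, the original expression `10 ^ 3` evaluates to 9 (binary 1010 XOR 0011 = 1001), whereas `pow10i(3)` is 1000. The `num *= 10` loop shown above reaches the same result with fewer multiplications.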
Converting a sequence of digits into an integer is a special case of the more general problem of parsing a number (integer or real) into a binary integer or double value.
One approach is to describe the number using a pattern, which you can either describe iteratively, or recursively as follows,
An integer_string is composed of:
an optional '+' or '-' (sign)
followed by a digit_sequence
a digit_sequence is composed of:
digit ('0', '1', '2', '3', ..., '9')
followed by an optional (recursive) digit_sequence
This can be written using Backus-Naur formalism as,
integer_string := { '+' | '-' } digit_sequence
digit_sequence := digit { digit_sequence }
digit := [ '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' ]
Should you desire, you can extend the above to recognize a real number,
real_number := integer_string { '.' { digit_sequence } }
{ [ 'e' | 'E' ] integer_string }
Although the above is not quite correct, as it forces a digit before the decimal (fix is left as an exercise for the reader).
Once you have the Backus-Naur formalism, it is easy to recognize the symbols that comprise the pattern, and the semantic action of the actual conversion to integer
#include <ctype.h>
long int
atol_self(const char* str)
{
    if (!str) return 0;
    //accumulator for value
    long int accum = 0; //nothing yet
    //index through the string
    int ndx = 0;
    //handle the optional sign
    int sign = 1;
    if ( str[ndx] == '+' ) { sign = 1; ndx += 1; }
    else if ( str[ndx] == '-' ) { sign = -1; ndx += 1; }
    //digit_sequence: accumulate each digit's value
    for ( ; str[ndx] && isdigit((unsigned char)str[ndx]); ) {
        int digval = str[ndx] - '0';
        accum = accum*10 + digval;
        ++ndx;
    }
    return accum*sign;
}

Converting ascii hex string to byte array

I have a char array say char value []={'0','2','0','c','0','3'};
I want to convert this into a byte array like unsigned char val[]={'02','0c','03'}
This is in an embedded application, so I can't use string.h functions. How can I do this?
Since you talk about an embedded application, I assume that you want to save the numbers as values and not as strings/characters. So if you just want to store your character data as numbers (for example in an integer), you can use sscanf.
This means you could do something like this:
#include <stdio.h>
#include <stdint.h>
int main(void) {
    const char source_val[] = "0A03B7"; // 0x0A, 0x03, 0xB7; NUL-terminated so sscanf can read it
    uint8_t dest_val[3];                // We want to save 3 numbers
    for (int i = 0; i < 3; i++)
        sscanf(&source_val[i*2], "%2hhx", &dest_val[i]); // read exactly two hex chars into one byte
    // Now dest_val contains 0x0A, 0x03 and 0xB7
    return 0;
}
However, if you want to store it as a string (as in your example), you can't use unsigned char,
since this type is typically just 8 bits long, which means it can only store one character. Storing 'B3' in a single (unsigned) char does not work.
Edit: OK, according to the comments, the goal is to save the passed data as a numerical value. Unfortunately, the asker's compiler does not support sscanf, which would be the easiest way to do so. Anyhow, since this is (in my opinion) the simplest approach, I will leave this part of the answer as it is and try to add a more custom approach in this edit.
Regarding the data type, it doesn't actually matter whether you use uint8. Even though I would advise using some kind of integer data type, you can also store your data in an unsigned char. The problem here is that the data you get passed is a character/letter that you want to interpret as a numerical value, but the internal value of a character differs from the number it depicts. You can check the ASCII table for the internal value of every character.
For example:
char letter = 'A'; // Internally 0x41
char number = 0x61; // Internally 0x61 - represents the letter 'a'
As you can see, there is also a difference between upper and lower case.
If you do something like this:
int myVal = letter;
myVal won't represent the value 0xA (decimal 10); it will have the value 0x41.
The fact that you can't use sscanf means you need a custom function. So first of all we need a way to convert one letter into an integer:
int charToInt(char letter)
{
    int myNumerical;
    // First we want to check if it's 0-9, A-F, or a-f --> See ASCII Table
    if(letter > 47 && letter < 58)
    {
        // 0-9
        myNumerical = letter-48;
        // The Letter "0" is in the ASCII table at position 48 -> meaning if we subtract 48 we get 0 and so on...
    }
    else if(letter > 64 && letter < 71)
    {
        // A-F
        myNumerical = letter-55;
        // The Letter "A" (dec 10) is at Pos 65 --> 65-55 = 10 and so on..
    }
    else if(letter > 96 && letter < 103)
    {
        // a-f
        myNumerical = letter-87;
        // The Letter "a" (dec 10) is at Pos 97 --> 97-87 = 10 and so on...
    }
    else
    {
        // Not supported letter...
        myNumerical = -1;
    }
    return myNumerical;
}
Now we have a way to convert every single character into a number. The other problem is to always combine two characters into one value, but this is rather easy:
int appendNumbers(int higherNibble, int lowerNibble)
{
    int myNumber = higherNibble << 4;
    myNumber |= lowerNibble;
    return myNumber;
    // Example: higherNibble = 0x0A, lowerNibble = 0x03; -> myNumber = 0xA3
    // Of course you have to ensure that the parameters are not bigger than 0x0F
}
Now everything together would be something like this:
char source_val[] = {'0','A','0','3','B','7'}; // Represents the numbers 0x0A, 0x03 and 0xB7
int dest_val[3]; // We want to save 3 numbers
int temp_low, temp_high;
for(int i = 0; i<3; i++)
{
    temp_high = charToInt(source_val[i*2]);
    temp_low = charToInt(source_val[i*2+1]);
    dest_val[i] = appendNumbers(temp_high , temp_low);
}
I hope that I understood your problem correctly and that this helps.
If you have a "proper" array, like value as declared in the question, then you loop over its size to get each character. If you're on a system that uses ASCII (which is most likely), you can convert a hexadecimal digit in character form to its value by subtracting '0' for digits (see the linked ASCII table to understand why), and for letters by subtracting 'A' or 'a' (make sure no letter is higher than 'F', of course) and adding ten.
When you have the value of the first hexadecimal digit, convert the second hexadecimal digit the same way. Multiply the first value by 16 and add the second value. You now have a single byte value corresponding to two hexadecimal digits in character form.
Time for some code examples:
/* Function which converts a hexadecimal digit character to its integer value */
int hex_to_val(const char ch)
{
    if (ch >= '0' && ch <= '9')
        return ch - '0'; /* Simple ASCII arithmetic */
    else if (ch >= 'a' && ch <= 'f')
        return 10 + ch - 'a'; /* Because hex-digit a is ten */
    else if (ch >= 'A' && ch <= 'F')
        return 10 + ch - 'A'; /* Because hex-digit A is ten */
    else
        return -1; /* Not a valid hexadecimal digit */
}
...
/* Source character array */
char value []={'0','2','0','c','0','3'};
/* Destination "byte" array; unsigned so values 0x80-0xff print as two digits */
unsigned char val[3];
/* `i < sizeof(value)` works because `sizeof(char)` is always 1 */
/* `i += 2` because there are two digits per value */
/* NOTE: This loop can only handle an array of even number of entries */
for (size_t i = 0, j = 0; i < sizeof(value); i += 2, ++j)
{
    int digit1 = hex_to_val(value[i]); /* Get value of first digit */
    int digit2 = hex_to_val(value[i + 1]); /* Get value of second digit */
    if (digit1 == -1 || digit2 == -1)
        continue; /* Not a valid hexadecimal digit */
    /* The first digit is multiplied with the base */
    /* Cast to the destination type */
    val[j] = (unsigned char) (digit1 * 16 + digit2);
}
for (size_t i = 0; i < 3; ++i)
    printf("Hex value %zu = %02x\n", i + 1, val[i]);
The output from the code above is
Hex value 1 = 02
Hex value 2 = 0c
Hex value 3 = 03
A note about the ASCII arithmetic: The ASCII value for the character '0' is 48, and the ASCII value for the character '1' is 49. Therefore '1' - '0' will result in 1.
It's easy with strtol():
#include <stdlib.h>
#include <assert.h>
void parse_bytes(unsigned char *dest, const char *src, size_t n)
{
    /** size 3 is important to make sure tmp is \0-terminated and
        the initialization guarantees that the array is filled with zeros */
    char tmp[3] = "";
    while (n--) {
        tmp[0] = *src++;
        tmp[1] = *src++;
        *dest++ = strtol(tmp, NULL, 16);
    }
}
int main(void)
{
    unsigned char d[3];
    parse_bytes(d, "0a1bca", 3);
    assert(d[0] == 0x0a);
    assert(d[1] == 0x1b);
    assert(d[2] == 0xca);
    return EXIT_SUCCESS;
}
If that is not available (even though it is NOT from string.h), you could do something like:
int ctohex(char c)
{
    if (c >= '0' && c <= '9') {
        return c - '0';
    }
    switch (c) {
    case 'a':
    case 'A':
        return 0xa;
    case 'b':
    case 'B':
        return 0xb;
    /**
     * and so on
     */
    }
    return -1;
}
void parse_bytes(unsigned char *dest, const char *src, size_t n)
{
    while (n--) {
        *dest = ctohex(*src++) * 16;
        *dest++ += ctohex(*src++);
    }
}
Assuming 8-bit bytes (not actually guaranteed by the C standard, but ubiquitous), the range of `unsigned char` is 0..255, and the range of `signed char` is -128..127. ASCII was developed as a 7-bit code using values in the range 0-127, so the same value can be represented by both `char` types.
For the now-discovered task of converting a counted hex string from ASCII to unsigned bytes, here's my take:
#include <stdlib.h>
unsigned int atob(char a){
    register int b;
    b = a - '0'; // subtract '0' so '0' goes to 0 .. '9' goes to 9
    if (b > 9) b = b - ('A' - '0') + 10; // too high! try 'A'..'F'
    if (b > 15) b = b - ('a' - 'A'); // too high! try 'a'..'f'
    return b;
}
void myfunc(const char *in, int n){
    int i;
    unsigned char *ba;
    ba = malloc(n/2);
    if (ba == NULL) return;
    for (i=0; i+1 < n; i+=2){
        ba[i/2] = (atob(in[i]) << 4) | atob(in[i+1]);
    }
    // ... do something with ba
    free(ba);
}
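Since the asker cannot use string.h (or sscanf), the ideas from the answers above can be combined into one validating routine that needs no library headers beyond assert.h and stddef.h. This is a sketch: `parse_hex_bytes` and `hex_digit` are hypothetical names, and the letter ranges assume 'a'..'f' and 'A'..'F' are contiguous, which holds in ASCII and EBCDIC. Unlike the ctohex-based loop, it stops and reports an error on the first invalid character instead of folding -1 into a byte.

```c
#include <assert.h>
#include <stddef.h>

/* Value of one hex digit character, or -1 if it is not a hex digit. */
static int hex_digit(char ch)
{
    if (ch >= '0' && ch <= '9') return ch - '0';
    if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
    if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
    return -1;
}

/* Hypothetical helper: parses n bytes (2*n hex characters) from src into
   dest. Returns 0 on success, or -1 at the first invalid character
   (bytes after that point are left unwritten). */
int parse_hex_bytes(unsigned char *dest, const char *src, size_t n)
{
    assert(dest != NULL && src != NULL);
    for (size_t i = 0; i < n; i++) {
        int hi = hex_digit(src[2 * i]);
        if (hi < 0)               /* also stops at a premature '\0' */
            return -1;
        int lo = hex_digit(src[2 * i + 1]);
        if (lo < 0)
            return -1;
        dest[i] = (unsigned char)((hi << 4) | lo); /* same as hi * 16 + lo */
    }
    return 0;
}
```

Checking the high digit before reading the low one means a string that ends early is never read past its terminating '\0'.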

what does putchar('0' + num); do?

I am trying to understand how putchar('0' + r); works. The function below takes an integer and prints it in binary.
void to_binary(unsigned long n)
{
    int r;
    r = n % 2;
    if (n >= 2)
        to_binary(n / 2);
    putchar('0' + r);
}
I googled the definition of putchar but didn't find this. To test it, I added a printf to see the value of r:
void to_binary(unsigned long n)
{
    int r;
    r = n % 2;
    if (n >= 2)
        to_binary(n / 2);
    printf("r = %d and putchar printed ", r);
    putchar('0' + r);
    printf("\n");
}
and I ran it (typed 5) and got this output:
r = 1 and putchar printed 1
r = 0 and putchar printed 0
r = 1 and putchar printed 1
So I suppose that the putchar('0' + r); prints 0 if r=0, else prints 1 if r=1, or something else happens?
In C '0' + digit is a cheap way of converting a single-digit integer into its character representation, like ASCII or EBCDIC. For example if you use ASCII think of it as adding 0x30 ('0') to a digit.
The one assumption is that the character encoding has a contiguous area for digits - which holds for both ASCII and EBCDIC.
As pointed out in the comments this property is required by both the C++ and C standards. The C standard says:
5.2.1 - 3
In both the source and execution basic character sets, the value of
each character after 0 in the above list of decimal digits shall be
one greater than the value of the previous.
'0' represents an integer equal to 48 in decimal and is the ASCII code for the character 0 (zero). The ASCII code for the character for 1 is 49 in decimal.
'0' + r is the same as 48 + r. When r = 0, the expression evaluates to 48 so a 0 is outputted. On the other hand, when r = 1, the expression evaluates to 49 so a 1 is outputted. In other words, '0' + 1 == '1'
Basically, it's a nice way to convert decimal digits to their ASCII character representations. On ASCII systems it also works with letters (e.g. 'A' + 2 is the same as 'C').
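The same `'0' + r` trick works when the digits go into a buffer instead of straight to putchar, which also makes the result easy to check. This is a sketch with a hypothetical name, `to_binary_str`. It generates the remainders iteratively (least significant first) and then reverses them, which is the order the recursive to_binary achieves by printing after the recursive call.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: writes the binary digits of n into buf, which must
   hold at least sizeof(unsigned long) * CHAR_BIT + 1 chars, and returns buf. */
char *to_binary_str(unsigned long n, char *buf)
{
    assert(buf != NULL);
    char tmp[sizeof(unsigned long) * CHAR_BIT];
    size_t len = 0;
    do {
        int r = (int)(n % 2);
        tmp[len++] = (char)('0' + r); /* remainder 0 -> '0', 1 -> '1' */
        n /= 2;
    } while (n > 0);
    /* Remainders come out least significant first, so reverse them. */
    for (size_t i = 0; i < len; i++)
        buf[i] = tmp[len - 1 - i];
    buf[len] = '\0';
    return buf;
}
```

For n = 5 the remainders are 1, 0, 1, so the buffer holds "101", the same characters the question's program printed one per line.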
It's a common technique used for char handling.
char a = '0' + r (r in [0,9]) converts an integer to its char form relative to the given base char ('0' in this case); you get '0'...'9'.
Similarly, char a = 'a' + r or char a = 'A' + r (r in [0,25]) converts an integer to its char form; you get 'a'...'z' or 'A'...'Z' (except on EBCDIC systems, which have gaps in their letter ranges).
Edit:
You can also do the other way around, for example:
char myChar = 'c';
int b = myChar - 'a'; // b will be 2
Similar idea is used to convert a lowercase char to uppercase:
char myChar = 'c';
char newChar = myChar - 'a' + 'A'; // newChar will be 'C'
You are adding to the ASCII value of '0'.
'0' has the ASCII value 48,
'1' -> 49, and so on (check an ASCII table for the complete list).
So when you add one to 48 you get 49, and the putchar function prints the character whose code it is given. When you do
putchar('0' + r)
with r = 1, that is putchar(48 + 1) in ASCII,
i.e. putchar(49), which prints 1.
