This question already has answers here:
Why does subtracting '0' in C result in the number that the char is representing?
(8 answers)
Closed 1 year ago.
On page 24 of K and R - C programming book,
there is this code from a program to count digits and other input.
while((c = getchar()) != EOF)
if(c >= '0' && c <= '9')
++ndigit[c - '0'];
I am not understanding the last line of the code. What does [c - '0'] mean?
Entire program from the book: https://imgur.com/a/4WhIOsz
Every character has a numeric value, so when you input the character "0" you are not actually inputting the number 0. You are inputting the character "0", whose value is 48 in the ASCII table (it may be different in other character sets...)
So when you input the character "0", subtracting '0' gives you the actual number zero, as follows (according to the ASCII table):
c - '0' means c - 48
'0' - '0' means 48 - 48 = 0
'1' - '0' means 49 - 48 = 1
'2' - '0' means 50 - 48 = 2
and so on...
Notice that the code is designed to work with different character sets, not just ASCII.
Search "ASCII table" in google to see the chart in order to understand it better.
'0' is the ASCII character 0, with a decimal value of 48. (See https://www.asciitable.com for a full listing.)
Since the numerals 0-9 are all guaranteed by the C standard to have consecutive values (in this case, decimal values from 48 to 57), by subtracting the character 0 from the input character, you arrive at the corresponding integer value, that you can then use in further processing.
'0' - '0' = 0
'1' - '0' = 1
'2' - '0' = 2
...
'9' - '0' = 9
In this case, the further processing is then used to index into the ndigit array.
The C language mandates that the codes for the decimal digits be consecutive. That means that even if you do not use ASCII (EBCDIC used to be another common charset) you can be sure that the code for '5' will be the code for '0' + 5.
So the idiom c - '0' is guaranteed to be portable across any conformant system, while c - 48 (or c - 0x30) will only work on ASCII (or ASCII derivatives like latin1, cp1252, utf8, etc.) systems.
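As a minimal sketch of the idiom in use, the function below counts digits in a string rather than reading from getchar() as the book's program does. The name count_digits and its parameters are made up for this illustration.

```c
/* Count how often each decimal digit occurs in the string s.
   counts must point to an array of 10 ints; it is zeroed first.
   (count_digits is a hypothetical helper, not code from K&R.) */
void count_digits(const char *s, int counts[10])
{
    for (int i = 0; i < 10; i++)
        counts[i] = 0;

    for (; *s != '\0'; s++) {
        if (*s >= '0' && *s <= '9')
            ++counts[*s - '0'];   /* e.g. '7' - '0' == 7, so this bumps counts[7] */
    }
}
```

Because only the distance from '0' is used, the same code should behave identically on ASCII and EBCDIC systems.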
"0123456789abcdef"[num % base]
Putting num%base = 0
"0123456789abcdef"[0]
It gives 48, but how?
When I use 0 as the index, it should give me 0.
"0123456789abcdef" is a string literal, which defines an array filled with those characters (plus a terminating null character). [num % base] is a subscript operator with the subscript num % base.
So "0123456789abcdef"[num % base] selects one of the characters from the array. It is intended to select the character for the last digit of num when it is represented as a numeral in base base.
When num % base is zero, it selects the “0” character. Your C implementation uses ASCII codes for some characters. The ASCII code for “0” is 48.
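To see the lookup in context, here is a rough sketch of a base conversion routine built around the same string literal. The function name to_base and its buffer convention are assumptions for this example, not part of the original question.

```c
#include <limits.h>
#include <stddef.h>

/* Write num in the given base (2..16) into out, most significant digit first.
   out must have room for sizeof(unsigned int) * CHAR_BIT + 1 characters.
   (to_base is a hypothetical helper written for this illustration.) */
void to_base(unsigned int num, unsigned int base, char *out)
{
    char tmp[sizeof(unsigned int) * CHAR_BIT];
    size_t n = 0;

    do {
        /* num % base is a value 0..15; indexing the literal turns it
           into the matching digit character */
        tmp[n++] = "0123456789abcdef"[num % base];
        num /= base;
    } while (num != 0);

    /* digits were produced least significant first, so reverse them */
    while (n > 0)
        *out++ = tmp[--n];
    *out = '\0';
}
```

Printing the selected character with %c shows the digit itself, while printing it with %d shows its character code, which is where a value like 48 comes from on an ASCII system.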
m=1e9 + 7;
inline ll rem(char S[],ll m)
{
ll sum=0 , i;
for(i=0;S[i]!='\0';i++)
{
if(sum>=m)
sum %= m;
sum=(sum * 10 + S[i] - '0');
}
return sum%m;
}
Here S is a string of digit characters. My question is:
what does - '0' do here, and can a character (here S[i]) be automatically converted to integer form in the above
sum=(sum * 10 + S[i] - '0');
equation?
First, you have to remember that characters in C are represented as tiny integers corresponding to the character's value in the machine's character set, which is typically ASCII.
For example, 'A' in ASCII is 65, and '0' is 48.
So if you're converting a string of digits to an integer, you want to do something like
int digit = c - 48;
That converts '0' to 0, '1' to 1, etc.
But that magic number 48 is mystifying, and it's theoretically also wrong on a machine using a character set other than ASCII. So the easier (because you don't have to remember that value 48), self-documenting (as long as your reader understands the idiom), and more portable way is to do
int digit = c - '0';
This works because, as I said, '0' is 48 in ASCII. But, more importantly, even on a non-ASCII machine, '0' is whatever value the character '0' has in that machine's character set, so it's always the right value to subtract, no matter what kind of machine you're using.
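Applied to the function in the question, a cleaned-up sketch might look like the following. The name rem_of_digits is made up, and it assumes the string holds only digit characters and that m is small enough for (m - 1) * 10 + 9 to fit in a long long.

```c
/* Remainder of the decimal number written in s when divided by m.
   Assumes s holds only digit characters and (m - 1) * 10 + 9 fits
   in long long. (rem_of_digits is a hypothetical name.) */
long long rem_of_digits(const char *s, long long m)
{
    long long sum = 0;

    for (; *s != '\0'; s++) {
        /* *s is a char; in arithmetic it is promoted to int automatically,
           so subtracting '0' yields the digit's numeric value */
        int digit = *s - '0';
        sum = (sum * 10 + digit) % m;
    }
    return sum;
}
```

So yes, the character takes part in the arithmetic as an integer without any cast; the subtraction is what turns the character code into the digit value.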
If c is the numerical value of an uppercase character (i.e. B is 66) and for the sake of argument, k is a key value of 2? I'm new to programming and don't understand how the modulo works in this. I know it takes the value of the remainder, but then wouldn't it simplify like this?
c = B = 66
k = 2
I imagine the result should be 'D'
(66 - 65 +2)%26 +65
(3)%26 +65
0 + 65
65 = 'A'
I must not understand the way % works.
Key Fact - The ASCII code of the letter "A" is 65.
Here is how your cypher works - the original expression in the question title.
Take the ASCII value of a letter, subtract the value of "A" from it giving you a 0 based number.
Add the key value to this number shifting it by k places.
Now divide the number you got above by 26, discard the quotient and use the remainder. This is the modulo operator %. This always keeps your numbers in the 0-25 range, since dividing by 26 will never have a remainder greater than 25.
Add 65 to it to convert it into an "encrypted" uppercase letter.
This allows the key to be ANY number and still keeps the "encrypted" output within the ASCII range of A-Z.
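The four steps above can be sketched as one small function. The name shift_upper is invented for this example, and it assumes a non-negative key, an uppercase input, and contiguous letters as in ASCII.

```c
/* Shift an uppercase letter c by k places, wrapping past 'Z' back to 'A'.
   Assumes k >= 0, c in 'A'..'Z', and contiguous letters (true for ASCII).
   (shift_upper is a hypothetical helper for illustration.) */
char shift_upper(char c, int k)
{
    int offset  = c - 'A';          /* step 1: 'A' -> 0, 'B' -> 1, ... */
    int shifted = offset + k;       /* step 2: move k places along */
    int wrapped = shifted % 26;     /* step 3: remainder keeps it in 0..25 */
    return (char)(wrapped + 'A');   /* step 4: back to a letter */
}
```

With the values from the question, c = 'B' and k = 2, the intermediate value is 3, 3 % 26 is still 3, and the result is 'D'.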
You are interpreting the % operator as division. In reality, it's the modulo, or forget-the-quotient-I-want-the-remainder, operator.
Example
0%2 is 0
1%2 is 1
2%2 is 0
3%2 is 1
And so on. Modulo is cyclic.
Modulus is not int division. Modulus gives you the remainder of a division, so 3 / 26 is 0 with a remainder of 3. Therefore, 3 % 26 is 3.
3 % 26 is 3, not 0. Modulus is the remainder. Think of modulus 12 on a clock. If it is ten o'clock, and you add 4 hours, 10 + 4 = 14. But on the clock, the hand now points to 2, not 14. No matter how many hours you add, the hand always points to a number from 1 to 12. This is how modulus works.
10 + 4 = 14
14 % 12 = 2 (14 divided by 12 is 1 with remainder 2)
10 + 100 = 110
110 % 12 = 2 (110 divided by 12 is 9 with remainder 2)
If it is 10 o'clock, and you wait 100 hours, the hand now points to 2.
(Using the remainder of a division, dividing by 12 always gives a number from 0 to 11, so think of 12 o'clock as 0 o'clock.)
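The clock analogy can be written down directly. This is only a sketch; clock_hand is a made-up name and it assumes a non-negative number of hours.

```c
/* Position of the hour hand on a 12-hour dial after `hours` hours,
   starting from `start` (1..12). Assumes hours >= 0.
   (clock_hand is a hypothetical helper for illustration.) */
int clock_hand(int start, int hours)
{
    int pos = (start + hours) % 12;   /* remainder is always in 0..11 */
    return pos == 0 ? 12 : pos;       /* the dial shows 0 as 12 */
}
```

Calling clock_hand(10, 4) walks through the first example: 14 % 12 leaves 2.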
((c - 65 + k) % 26) + 65 works, but is non-portable and unnecessarily obfuscated.
65 is the ASCII code for 'A', the character constant representing the letter A. c - 65, or better c - 'A', evaluates to the distance of the uppercase letter stored in c from A, hence 1 for the letter B.
Adding k operates a shift in the alphabet, but can produce offsets greater than 25, hence the modulo operation to compute the remainder of the division by 26. (c - 65 + k) % 26 gives the offset of the encoded letter.
Adding 65 or more appropriately 'A' converts the offset back to an uppercase letter.
This expression makes the silent assumption that all uppercase letters are consecutive in the execution character set, which is true for ASCII, but not for older character sets such as EBCDIC.
Note also that the above expressions only work for non-negative values of k. If k is negative, (c - 'A' + k) % 26 may be negative too, because in C the remainder takes the sign of the dividend, and the final result would then fall before 'A'. Hence k should be changed to a non-negative value first with this code:
k = k % 26;
if (k < 0)
k = k + 26;
Here is a more readable alternative:
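char encode_letter(char c, int k) {
    /* normalize k into the range 0..25 so the % below never goes negative */
    k = k % 26;
    if (k < 0)
        k = k + 26;
    if (c >= 'A' && c <= 'Z')
        return (c - 'A' + k) % 26 + 'A';
    else
    if (c >= 'a' && c <= 'z')
        return (c - 'a' + k) % 26 + 'a';
    else
        return c;   /* leave non-letters unchanged */
}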
This macro is found in "dosname.h" from coreutils.
# define _IS_DRIVE_LETTER(C) (((unsigned int) (C) | ('a' - 'A')) - 'a' \
<= 'z' - 'a')
Could someone explain it for me?
It checks, if the given character is a valid letter to be used to name a drive (e.g. the "C" in "C:\Something" is usually used to name the main hard drive) in MS-DOS-based operating systems.
It's an efficient way to check, if the character matches the regular expressions [a-zA-Z] (English alphabet letters, both upper-case and lower-case).
It does so by first eliminating the differences between lower-case and upper-case letters using the bitwise "or" operation |. In the general case a conditional addition would be necessary: add the distance between the lower-case 'a' and the upper-case 'A' if the character is an upper-case letter. But since in ASCII the difference between a lower-case character and its corresponding upper-case character is always exactly 32 (= 2^5 = 100000b) and the range of letters of one case is smaller than 32, the only difference in the binary representation of a lower-case letter and its corresponding upper-case letter is the 6th bit, e.g. 1000001b == A and 1100001b == a. So we can convert upper-case letters to lower-case letters by setting that 6th bit using | with 32 (== 'a' - 'A') as the second operand. This will do weird stuff to all other characters that aren't letters, but it won't ever convert them to a letter. This trick is more efficient because it needs no condition.
Then it is checked whether the character (that is now lower-case, if it was an upper-case letter before) is between 'a' and 'z'. This is done by subtracting 'a' from the character and checking whether the result is at most 'z' - 'a' (25). Because the value is unsigned, anything below 'a' wraps around to a very large number and fails the test, so a single comparison covers both bounds. That works because in ASCII the English characters of the same case are defined as a continuous sequence.
# define _IS_DRIVE_LETTER(C) (((unsigned int) (C) | ('a' - 'A')) - 'a' \
<= 'z' - 'a')
('a' - 'A') is the lowercase flag.
(C) | ('a' - 'A') sets the lowercase flag in C (now C can be between 'a' and 'z', but it can also be anything else).
'z' - 'a' computes the range between the letters 'a' and 'z' (so 25, for letter 'a'(0) to letter 'z'(25)).
The comparison <= checks whether C, converted to lowercase and offset so that the letter 'a' becomes 0, is between 0 and 25, that is, between the letters 'a' and 'z' (so, whether it is a valid drive letter for MS-DOS).
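For readers who find the macro hard to follow, here is the same test spelled out as a function, as a sketch only. The name is_drive_letter is made up, and like the macro it assumes an ASCII-compatible character set.

```c
#include <stdbool.h>

/* Same test as the macro, written as a function. Assumes an ASCII-compatible
   character set, where 'a' - 'A' is 32 and each case is contiguous.
   (is_drive_letter is a hypothetical name, not part of coreutils.) */
bool is_drive_letter(int c)
{
    /* set the case bit: 'C' becomes 'c', 'c' stays 'c' */
    unsigned int lowered = (unsigned int) c | ('a' - 'A');

    /* unsigned subtraction: values below 'a' wrap to a huge number,
       so one comparison checks both ends of the range */
    return lowered - 'a' <= 'z' - 'a';
}
```

For example, '[' becomes '{' after the "or", which is 26 past 'a' and therefore rejected, while '1' stays below 'a' and wraps around to a large unsigned value.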
I came across this C program:
int main() {
printf("Enter your address, (e.g. 51 Anzac Road) ");
gets(address);
number = 0;
i = 0;
while (address[i] != ' ') {
number = number * 10 + (address[i] - 48);
i++;
}
}
I understand number = number * 10 + (address[i] - 48); is to get the number from input, but can anybody explain to me how this works? How does that produce the number from the input?
C requires the digits 0 through 9 to be stored contiguously, in that order, in the execution character set. 48 is the ASCII value of '0', so, for instance:
'3' - 48 == 3
for any digit.
ASCII is not required for C, so better is:
'3' - '0'
because while 48 is right for ASCII, '0' is by definition right for any character set.
If address contains "456 ", then:
when i == 0 and number == 0, number * 10 + (address[0] - 48) equals 0 * 10 + 4, or 4.
when i == 1, number * 10 + (address[1] - 48) is 4 * 10 + 5, or 45.
when i == 2, number * 10 + (address[2] - 48) is 45 * 10 + 6, or 456
and you're done.
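The walk-through above corresponds to a loop like the one below. This is a sketch with a made-up name, leading_number; unlike the original it stops at the first non-digit instead of searching for a space, and it does not check for overflow.

```c
/* Parse the leading run of digits in s, e.g. "51 Anzac Road" gives 51.
   Stops at the first non-digit; does not check for overflow.
   (leading_number is a hypothetical helper for illustration.) */
int leading_number(const char *s)
{
    int number = 0;

    while (*s >= '0' && *s <= '9') {
        number = number * 10 + (*s - '0');   /* shift left one decimal place, add digit */
        s++;
    }
    return number;
}
```

Stopping at the first non-digit also means the loop cannot run past the end of a string that contains no space.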
Never use gets(): it's dangerous, and it isn't even part of C anymore.
In ASCII, the digit characters '0' through '9' occupy code points 48 through 57 (in hex, 0x30 through 0x39) so, to turn a digit character into a value, you just subtract 48.
As an aside, you should really subtract '0' since the standard doesn't guarantee ASCII, though it does guarantee that the digit characters are contiguous and ordered. C under z/OS, for example, uses EBCDIC which places the digits at code points 0xf0 through 0xf9.
The loop itself is a simple shift-and-add type, to create a number from multiple digit characters. Say you have the string "123", and number is initially zero.
You multiply number (zero) by ten to get zero then add digit character '1' (49) and subtract 48. This gives you one.
You then multiply number (one) by ten to get ten and add digit character '2' (50), again subtracting 48. This gives you twelve.
Finally, you multiply number (twelve) by ten to get a hundred and twenty then add digit character '3' (51) and subtract 48. This gives you a hundred and twenty three.
There are better ways to do this in the C standard library, atoi or the more robust strtol-type functions, all found in stdlib.h. The latter allow you to better detect if there was "rubbish" at the end of the number, for assistance with validation (atoi cannot tell the difference between 123 and 123xyzzy).
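As a rough illustration of the strtol approach, the wrapper below rejects input with trailing characters. The name parse_long_strict and its return convention are assumptions for this sketch; only strtol, errno and ERANGE come from the standard library.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Parse s as a decimal long. Returns true only if s holds a number
   with nothing after it and the value fits in a long.
   (parse_long_strict is a hypothetical wrapper for illustration.) */
bool parse_long_strict(const char *s, long *out)
{
    char *end;

    errno = 0;
    long value = strtol(s, &end, 10);

    if (end == s)          /* no digits were consumed */
        return false;
    if (*end != '\0')      /* trailing "rubbish" such as the xyzzy in 123xyzzy */
        return false;
    if (errno == ERANGE)   /* value out of range for long */
        return false;

    *out = value;
    return true;
}
```

The end pointer is what atoi lacks: it tells the caller where parsing stopped, so 123 and 123xyzzy can be told apart.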
And, as yet another aside, you should avoid gets() like the plague. It, like the "naked" scanf("%s"), is not suitable for user input, and opens your code to buffer overflow problems. In fact, unlike scanf(), there is no safe way to use gets(), which is undoubtedly why it has been removed from C11, the latest standard. A more robust user input function can be found here.
There's also a large class of addresses for which that code will fail miserably, such as:
3/28 Tivoli Rd
57a Smith Street
Flat 2, 12 Xyzzy Lane