What does the _IS_DRIVE_LETTER macro do? - c

This macro is found in "dosname.h" from coreutils.
# define _IS_DRIVE_LETTER(C) (((unsigned int) (C) | ('a' - 'A')) - 'a' \
<= 'z' - 'a')
Could someone explain it for me?

It checks whether the given character is a valid letter for naming a drive (e.g. the "C" in "C:\Something" usually names the main hard drive) in MS-DOS-based operating systems.
It's an efficient way to check whether the character matches the regular expression [a-zA-Z] (English alphabet letters, both upper-case and lower-case).
It does so by first eliminating the difference between lower-case and upper-case letters using the bitwise "or" operation |. In the general case a conditional addition would be necessary: add the distance between the lower-case 'a' and the upper-case 'A' if the character is an upper-case letter. But in ASCII the difference between an upper-case letter and its corresponding lower-case letter is always exactly 32 (= 2^5 = 100000b), and the range of letters of one case is smaller than 32, so the only difference between the binary representations of the two is the 6th bit (counting from 1), e.g. 1000001b == 'A' and 1100001b == 'a'. So we can convert upper-case letters to lower-case letters by setting that 6th bit using | with 32 (== 'a' - 'A') as the second operand. This also changes some characters that aren't letters, but it never turns a non-letter into a letter. The trick is efficient because it needs no condition.
Then it checks whether the character (now lower-case if it was an upper-case letter before) lies between 'a' and 'z'. This is done by subtracting 'a' from the character and checking whether the result is at most 'z' - 'a', i.e. 25. Because of the cast to unsigned int, the subtraction wraps around to a very large value for anything below 'a', so a single comparison covers both bounds. This works because in ASCII the English letters of the same case form a contiguous sequence.
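The two steps can be written out one at a time. This is only a sketch: the function name is_drive_letter_steps is made up for the illustration, and like the macro it assumes an ASCII execution character set.

```c
/* Step-by-step version of the _IS_DRIVE_LETTER test (assumes ASCII).
   is_drive_letter_steps is a hypothetical name, not part of coreutils. */
int is_drive_letter_steps(int c)
{
    unsigned int u = (unsigned int) c;       /* negative values become very large */
    unsigned int lowered = u | ('a' - 'A');  /* set the bit worth 32: 'C' -> 'c' */
    unsigned int offset = lowered - 'a';     /* 'a' -> 0 ... 'z' -> 25; wraps around below 'a' */
    return offset <= (unsigned int) ('z' - 'a'); /* one unsigned comparison covers both bounds */
}
```

For example, '@' (0x40) becomes '`' (0x60) after the "or", the subtraction wraps around, and the test fails; '[' (0x5B) becomes '{' (0x7B), giving an offset of 26, which is also rejected.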

# define _IS_DRIVE_LETTER(C) (((unsigned int) (C) | ('a' - 'A')) - 'a' \
<= 'z' - 'a')
('a' - 'A') is the lowercase flag.
(C) | ('a' - 'A') sets the lowercase flag in C (now the result can be between 'a' and 'z', but it can also be anything else).
'z' - 'a' computes the distance between the letters 'a' and 'z' (so 25, for letter 'a' (0) to letter 'z' (25)).
The comparison <= verifies that C, converted to lowercase and offset so that 'a' becomes 0, is between 0 and 25, i.e. between the letters 'a' and 'z' (so it is a valid drive letter for MS-DOS).
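One way to convince yourself that the macro is equivalent to the obvious range checks is to compare the two over a range of values. The sketch below assumes ASCII; the macro is copied under the made-up name IS_DRIVE_LETTER_DEMO because identifiers starting with an underscore and an upper-case letter are reserved, and is_letter_plain and count_mismatches are hypothetical helpers.

```c
/* Copy of the macro under a non-reserved, hypothetical name. */
#define IS_DRIVE_LETTER_DEMO(C) \
    (((unsigned int) (C) | ('a' - 'A')) - 'a' <= 'z' - 'a')

/* The straightforward spelling of the same test: two range checks. */
static int is_letter_plain(int c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

/* Counts the values in [-128, 255] on which the two tests disagree. */
int count_mismatches(void)
{
    int mismatches = 0;
    for (int c = -128; c <= 255; c++) {
        if ((IS_DRIVE_LETTER_DEMO(c) != 0) != (is_letter_plain(c) != 0))
            mismatches++;
    }
    return mismatches;
}
```

On an ASCII system count_mismatches() should return 0.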

Related

What does this mean: "0123456789abcdef"[num % base]

"0123456789abcdef"[num % base]
Putting num % base = 0:
"0123456789abcdef"[0]
It gives 48, but how?
When I use 0 as the index, it should give me 0.
"0123456789abcdef" is a string literal, which defines an array filled with those characters (plus a terminating null character). [num % base] is a subscript operator with the subscript num % base.
So "0123456789abcdef"[num % base] selects one of the characters from the array. It is intended to select the character for the last digit of num when it is represented as a numeral in base base.
When num % base is zero, it selects the “0” character. Your C implementation uses ASCII codes for some characters. The ASCII code for “0” is 48.
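To see the idiom in context, here is a minimal sketch of a number-to-string conversion built around it. The function name to_base and its interface are made up for this example, and the temporary buffer assumes unsigned long long is 64 bits wide.

```c
#include <stddef.h>

/* Writes num in the given base (2..16) into out and returns out.
   out needs room for the digits plus the terminating '\0'; 65 bytes
   is enough for a 64-bit value in base 2. Hypothetical helper. */
char *to_base(unsigned long long num, unsigned base, char *out)
{
    char tmp[64];   /* assumes unsigned long long has 64 bits */
    size_t n = 0;

    if (base < 2 || base > 16) {
        out[0] = '\0';          /* unsupported base: return an empty string */
        return out;
    }
    do {
        tmp[n++] = "0123456789abcdef"[num % base]; /* character for the last digit */
        num /= base;                                /* drop that digit */
    } while (num != 0);

    /* Digits were produced least significant first, so reverse them. */
    size_t i = 0;
    while (n > 0)
        out[i++] = tmp[--n];
    out[i] = '\0';
    return out;
}
```

With a 65-byte buffer buf, to_base(255, 16, buf) produces "ff" and to_base(10, 2, buf) produces "1010".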

Wrapping a value in C

I really wanted to know how do you wrap a value or a letter in C. Say if I were to increment 'z', how can I make it turn to 'a' and vice versa? The same goes with integers. If I have 9, how can I make it turn to 0 upon incrementation?
The operator you are looking for is modulo %.
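Wrapping integers is straightforward:
int my_int = 9 + 1; /* some value */
int wrapped_my_int = my_int % 10; /* 0 */
Wrapping ASCII characters is a little more complicated because the start value is not 0. You need to rebase to 0 first and then back to 'a':
char my_char = 'z' + 1; /* some value */
char wrapped_my_char = (my_char - 'a') % ('z' - 'a' + 1) + 'a'; /* 'a' */
Note that in C the result of % is negative when the left operand is negative, so wrapping downward (decrementing 'a' to get 'z', or 0 to get 9) needs an extra step: add the range size to a negative remainder.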
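A sketch that wraps in both directions might look like the following. The names wrap_index and shift_lower are made up for this example, and shift_lower assumes the lower-case letters are contiguous, as they are in ASCII.

```c
/* Wraps value into [0, n) for n > 0, also when value is negative. */
int wrap_index(int value, int n)
{
    int r = value % n;          /* in C, r takes the sign of value */
    return r < 0 ? r + n : r;
}

/* Moves a lower-case letter by step positions, wrapping between 'a' and 'z'.
   Assumes the letters are contiguous, as in ASCII. */
char shift_lower(char c, int step)
{
    return (char) ('a' + wrap_index(c - 'a' + step, 'z' - 'a' + 1));
}
```

Here shift_lower('z', 1) gives 'a', shift_lower('a', -1) gives 'z', and wrap_index(-1, 10) gives 9.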

What does [c - '0'] mean in array? [duplicate]

On page 24 of K and R - C programming book,
there is this code from a program to count digits and other input.
while((c = getchar()) != EOF)
    if(c >= '0' && c <= '9')
        ++ndigit[c - '0'];
I am not understanding the last line of the code. What does [c - '0'] mean?
Entire program from the book: https://imgur.com/a/4WhIOsz
Every character has a numeric value. That means if you input the character "0", you are not actually inputting the number 0. You are inputting the character "0", whose value is 48 in the ASCII table (it may be different in other character sets).
So when you subtract '0' from the character "0", you get the actual number zero (according to the ASCII table):
c - '0' means c - 48
'0' - '0' means 48 - 48 = 0
'1' - '0' means 49 - 48 = 1
'2' - '0' means 50 - 48 = 2
and so on...
Notice that the code is designed to work with different character sets, not just ASCII.
Search "ASCII table" in google to see the chart in order to understand it better.
'0' is the ASCII character 0, with a decimal value of 48. (See https://www.asciitable.com for a full listing.)
Since the numerals 0-9 are all guaranteed by the C standard to have consecutive values (in this case, decimal values from 48 to 57), by subtracting the character 0 from the input character, you arrive at the corresponding integer value, that you can then use in further processing.
'0' - '0' = 0
'1' - '0' = 1
'2' - '0' = 2
...
'9' - '0' = 9
In this case, the further processing is then used to index into the ndigit array.
The C language mandates that the codes for the decimal digits be consecutive. That means that even if you do not use ASCII (EBCDIC used to be another common charset), you can be sure that the code for '5' will be the code for '0' + 5.
So the idiom c - '0' is guaranteed to be portable across any conformant system, while c - 48 (or c - 0x30) will only work on ASCII (or ASCII derivatives like latin1, cp1252, utf8, etc.) systems.
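The same indexing idiom can be tried on a string instead of standard input. This is a sketch rather than the K&R program itself; the function name count_digits is made up for the example.

```c
/* Counts how often each decimal digit occurs in text.
   ndigit must have 10 elements; ndigit[d] receives the count for digit d. */
void count_digits(const char *text, int ndigit[10])
{
    for (int d = 0; d < 10; d++)
        ndigit[d] = 0;
    for (; *text != '\0'; text++) {
        char c = *text;
        if (c >= '0' && c <= '9')
            ++ndigit[c - '0'];   /* '0' -> index 0, ..., '9' -> index 9 */
    }
}
```

For the input "a1b22c909", ndigit[2] and ndigit[9] end up as 2, ndigit[0] and ndigit[1] as 1, and all other counts as 0.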

can someone please tell why " -'0' "is being done in the fifth line of following function [duplicate]

m=1e9 + 7;
inline ll rem(char S[],ll m)
{
    ll sum=0 , i;
    for(i=0;S[i]!='\0';i++)
    {
        if(sum>=m)
            sum %= m;
        sum=(sum * 10 + S[i] - '0');
    }
    return sum%m;
}
Here S is a string of digit characters. My question is:
what does - '0' do here, and can a character (here S[i]) be automatically converted to integer form in the above
sum=(sum * 10 + S[i] - '0');
expression?
First, you have to remember that characters in C are represented as tiny integers corresponding to the character's value in the machine's character set, which is typically ASCII.
For example, 'A' in ASCII is 65, and '0' is 48.
So if you're converting a string of digits to an integer, you want to do something like
int digit = c - 48;
That converts '0' to 0, '1' to 1, etc.
But that magic number 48 is mystifying, and it's theoretically also wrong on a machine using a character set other than ASCII. So the easier (because you don't have to remember that value 48), self-documenting (as long as your reader understands the idiom), and more portable way is to do
int digit = c - '0';
This works because, as I said, '0' is 48 in ASCII. But, more importantly, even on a non-ASCII machine, '0' is whatever value the character '0' has in that machine's character set, so it's always the right value to subtract, no matter what kind of machine you're using.
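Applied to the question's task, a cleaned-up sketch of the remainder computation could look like this. The name rem_decimal is made up, and the code assumes m is positive and small enough that m * 10 fits in long long.

```c
/* Remainder of the decimal number written in s, modulo m.
   Assumes m > 0 and m * 10 fits in long long. Stops at the first non-digit. */
long long rem_decimal(const char *s, long long m)
{
    long long sum = 0;
    for (; *s >= '0' && *s <= '9'; s++) {
        int digit = *s - '0';          /* character -> value 0..9 */
        sum = (sum * 10 + digit) % m;  /* keep sum below m at every step */
    }
    return sum;
}
```

Because the remainder is reduced after every digit, the string may describe a number far larger than any built-in integer type can hold; rem_decimal("1000000008", 1000000007) gives 1.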

Converting string to a number

I came across this C program:
int main() {
    printf("Enter your address, (e.g. 51 Anzac Road) ");
    gets(address);
    number = 0;
    i = 0;
    while (address[i] != ' ') {
        number = number * 10 + (address[i] - 48);
        i++;
    }
}
I understand number = number * 10 + (address[i] - 48); is to get the number from input, but can anybody explain to me how this works? How does that produce the number from the input?
C requires the digits 0 through 9 to be stored contiguously, in that order, in the execution character set. 48 is the ASCII value of '0', so subtracting 48 from any digit character yields that digit's value, for instance:
'3' - 48 == 3
ASCII is not required for C, so better is:
'3' - '0'
because while 48 is right for ASCII, '0' is by definition right for any character set.
If address contains "456 ", then:
when i == 0 and number == 0, number * 10 + (address[0] - 48) equals 0 * 10 + 4, or 4.
when i == 1, number * 10 + (address[1] - 48) is 4 * 10 + 5, or 45.
when i == 2, number * 10 + (address[2] - 48) is 45 * 10 + 6, or 456
and you're done.
Never use gets(), it's dangerous, and isn't even part of C anymore.
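Put together, the loop might be sketched as a small function that uses '0' instead of 48 and stops at the first non-digit rather than at the first space. The name leading_number is made up, and there is no overflow check.

```c
/* Reads the decimal digits at the start of s and returns their value.
   Stops at the first non-digit, so "456 " yields 456.
   No overflow check: a sketch, not a replacement for strtol. */
int leading_number(const char *s)
{
    int number = 0;
    while (*s >= '0' && *s <= '9') {
        number = number * 10 + (*s - '0');  /* shift one decimal place, add the digit */
        s++;
    }
    return number;
}
```

Stopping at the first non-digit also means the loop ends at the terminating '\0' when the input contains no space at all.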
In ASCII, the digit characters '0' through '9' occupy code points 48 through 57 (in hex, 0x30 through 0x39) so, to turn a digit character into a value, you just subtract 48.
As an aside, you should really subtract '0' since the standard doesn't guarantee ASCII, though it does guarantee that the digit characters are contiguous and ordered. C under z/OS, for example, uses EBCDIC which places the digits at code points 0xf0 through 0xf9.
The loop itself is a simple shift-and-add type, to create a number from multiple digit characters. Say you have the string "123", and number is initially zero.
You multiply number (zero) by ten to get zero then add digit character '1' (49) and subtract 48. This gives you one.
You then multiply number (one) by ten to get ten and add digit character '2' (50), again subtracting 48. This gives you twelve.
Finally, you multiply number (twelve) by ten to get a hundred and twenty then add digit character '3' (51) and subtract 48. This gives you a hundred and twenty three.
There are better ways to do this in the C standard library, atoi or the more robust strtol-type functions, all found in stdlib.h. The latter allow you to better detect if there was "rubbish" at the end of the number, for assistance with validation (atoi cannot tell the difference between 123 and 123xyzzy).
And, as yet another aside, you should avoid gets() like the plague. It, like the "naked" scanf("%s"), is not suitable for user input, and opens your code to buffer overflow problems. In fact, unlike scanf(), there is no safe way to use gets(), which is undoubtedly why it was removed from the language in C11. A more robust user input function can be found here.
There's also a large class of addresses for which that code will fail miserably, such as:
3/28 Tivoli Rd
57a Smith Street
Flat 2, 12 Xyzzy Lane
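As a rough sketch of the strtol-based validation mentioned above, the following accepts a line only if it starts with digits that are followed by a space or the end of the string. The function name parse_house_number and the acceptance rule are assumptions made for this example; the line itself would be read with something like fgets(buf, sizeof buf, stdin) instead of gets().

```c
#include <errno.h>
#include <stdlib.h>

/* Returns 1 and stores the number in *out if line starts with decimal digits
   followed by a space or the end of the string; returns 0 otherwise.
   Hypothetical helper, not a full address parser. */
int parse_house_number(const char *line, long *out)
{
    char *end;

    if (line[0] < '0' || line[0] > '9')
        return 0;                  /* rejects "Flat 2, ..." and leading signs or spaces */
    errno = 0;
    long value = strtol(line, &end, 10);
    if (errno == ERANGE)
        return 0;                  /* too large for long */
    if (*end != ' ' && *end != '\0')
        return 0;                  /* rejects "3/28 ..." and "57a ..." */
    *out = value;
    return 1;
}
```

All three problem addresses listed above are rejected by this check, whereas "51 Anzac Road" is accepted with the value 51. Whether rejecting them is the right behaviour depends on the application.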

Resources