C array push, why subtract '0'?

I'm learning C from The C Programming Language, Second Edition. In it, there is the following code:
#include <stdio.h>
/* count digits, white space, others */
int main(void) {
    int c, i, nwhite, nother;
    int ndigit[10];
    nwhite = nother = 0;
    for (i=0; i<10; ++i) {
        ndigit[i] = 0;
    }
    while ((c = getchar()) != EOF) {
        if (c >= '0' && c <= '9') {
            ++ndigit[c-'0'];
        }
        else if (c == ' ' || c == '\n' || c == '\t') {
            ++nwhite;
        }
        else {
            ++nother;
        }
    }
    printf("digits =");
    for (i=0; i<10; ++i) {
        printf(" %d", ndigit[i]);
    }
    printf(", white space = %d, other = %d\n", nwhite, nother);
    return 0;
}
Now, I can understand what this code is doing. It counts how many times each digit appears in the input and stores that count at the index of the digit, i.e. the input 11123 gives digits = 0 3 1 1 0 0 0 0 0 0. I'm just curious about one line of it:
++ndigit[c-'0'];
This adds 1 to the index c of the array, but why does it subtract 0 from c? Surely that's pointless, right?

The expression c - '0' converts from the character representation of a digit to the actual integer value of that digit. For example, it converts the char '1' to the int 1. This works because the C standard guarantees that the characters '0' through '9' have consecutive values.
I think it makes a bit more sense to look at complete examples here:
int charToInt(char c) {
    return c - '0';
}
charToInt('4'); // returns 4
charToInt('9'); // returns 9

It's not subtracting zero; it's subtracting the ASCII value of the character '0', which is 48.
Doing so gives you an ordinal value for the digit, rather than its ASCII representation. In other words, it converts the characters '0' through '9' to the numbers 0 through 9, respectively.

Related

How do I get my integer arithmetic into a long long?

As part of exercise 2-3 in Kernighan and Ritchie's The C Programming Language, I've written a program that converts hexadecimal input into decimal output. I want it to handle larger numbers, but it seems to be doing integer arithmetic somewhere: when I enter something like "DECAFCAB", it prints a large negative int. I figured out that I need to add the "LL" suffix to my literals, which I did, but it still doesn't work. Any help, please? Sorry if this is a dumb question or a typo, but I've been at it for an hour and can't figure it out. :(
#include <stdio.h>
#define MAX_LINE 1000
void getline(char s[])
{
    int i;
    int c;   // int, not char, so that EOF can be detected reliably
    for(i = 0; i < MAX_LINE-1 && (c=getchar()) != EOF && c != '\n'; ++i)
        s[i] = c;
    s[i] = '\0';
    printf("\n%s", s);
}
long long htoi(char s[]) // convert the hex string to dec
{
    long long n = 0;
    int i = 0;
    if(s[i] == '0') // eat optional leading 0x or 0X
        ++i;
    if(s[i] == 'x' || s[i] == 'X')
        ++i;
    while(s[i] != '\0')
    {
        if((s[i] >= '0' && s[i] <= '9'))
            n = 16LL * n + (s[i] - '0'); // here is the arithmetic in question
        else if(s[i] >= 'A' && s[i]<= 'F')
            n = 16LL * n + (s[i] - 'A' + 10LL);
        else if(s[i] >= 'a' && s[i] <= 'f')
            n = 16LL * n + (s[i] - 'a' + 10LL);
        else {
            printf("\nError: Encountered a non-hexadecimal format: the '%c' character was unexpected.", s[i]);
            printf("\nHexadecimal numbers can begin with an optional 0x or 0X only, and contain 0-9, A-F, and a-f.\n\n");
            return -1;
        }
        ++i;
    }
    return n;
}
int main(void)
{
    char input[MAX_LINE];
    long long hex_output;
    while(1){
        getline(input);
        hex_output = htoi(input);
        if(hex_output >= 0)
            printf("\nThe value of the hexadecimal %s is %d in decimal.\n\n", input, hex_output);
    }
}
You told printf to expect an int when you used the placeholder %d. To make it expect (and therefore read the whole of) a long long, change it to %lld.
The reason it looks like a plain int is that with variadic functions like printf, the function does not know the argument types, and the format string is the only way for it to find them out. Strictly speaking, passing a long long where %d expects an int is undefined behavior. In practice, when you say to expect a plain int, it reads sizeof(int) bytes' worth of the argument rather than sizeof(long long) bytes (it's not necessarily byte-oriented, but that's how much data is read), so (on a little-endian system with 4-byte int and 8-byte long long) you see, roughly, the argument with its top 4 bytes masked off.
The problem you are experiencing comes from treating a (conventionally) "unsigned" hexadecimal integer value as "signed". Resorting to a larger built-in data type will get you past the problem of going from 31 to 32 bits, but this masks the actual problem. (If you extend to 64 bits, you will encounter the same problem and be back asking, "Why doesn't this work?")
Better is to write code that doesn't require ever-wider registers. There will always be a maximum width, but the answer for the OP is to use an "unsigned long" (at least 32 bits wide, which is enough for 0xDECAFCAB).
#include <stdio.h>
unsigned long htoi( const char s[] ) { // convert the hex string to dec
    unsigned long n = 0;
    int i = 0;
    if(s[i] == '0') // eat optional leading 0x or 0X
        ++i;
    if(s[i] == 'x' || s[i] == 'X')
        ++i;
    for( ; s[i]; i++ ) {
        unsigned int dVal = 0; // don't copy/paste complex statements.
        if((s[i] >= '0' && s[i] <= '9'))
            dVal = s[i] - '0'; // simple
        else if(s[i] >= 'A' && s[i]<= 'F')
            dVal = s[i] - 'A' + 10; // simple
        else if(s[i] >= 'a' && s[i] <= 'f')
            dVal = s[i] - 'a' + 10; // simple
        else {
            // less verbose
            printf("\nError: '%c' unexpected.", s[i] );
            return 0; // NB: Notice change!!
        }
        n = (16 * n) + dVal; // simple...
    }
    return n;
}
int main() {
    // simplified, stripping out user input.
    const char *hexStr = "0xDECAFCAB";
    unsigned long hex_output = htoi( hexStr );
    // Notice the format specifier to print an unsigned long: %lu
    printf( "\nThe value of the hexadecimal %s is %lu in decimal.\n\n", hexStr, hex_output );
    return 0;
}
The value of the hexadecimal 0xDECAFCAB is 3737844907 in decimal.
When K&R wrote the original book, there was no such thing as "long long", but there was "unsigned long".

Parse char from int input in C

I am trying to display a matrix by taking input from a user. Here, the input is a lower triangular matrix and the user may enter the 'x' character which has to be replaced with INT_MAX.
The program below does not work correctly: its output does not match the expected output.
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int read_int() {
    char input[30] = {0};
    int number;
    for (int i = 0; i < sizeof(input) - 1; i++){
        char c = (char)getc(stdin);
        if (c == 'x' || c == 'X')
            return INT_MAX;
        if (c < '0' || '9' < c){
            if (i == 0) continue;
            input[i] = 0;
            return atoi(input);
        }
        input[i] = c;
    }
    input[29] = 0;
    return atoi(input);
}
int main() {
    int N = read_int();
    int matrix[N][N];
    memset(matrix, 0, N * N * sizeof(int));
    for(int i = 0; i < N; ++i){
        for(int j = 0; j <= i; ++j){
            int distance = read_int();
            matrix[i][j] = distance;
            matrix[j][i] = distance;
        }
    }
    printf("\n");
    for(int i = 0; i < N; ++i){
        for(int j = 0; j < N; ++j){
            printf("%d\t", matrix[i][j]);
        }
        printf("\n");
    }
    printf("\n");
    return 0;
}
For input:
3
x 2
x x 2
The above program prints:
3 2147483647 2147483647
2147483647 32 2147483647
2147483647 2147483647 32
which is not what I expect.
It should be:
3 2147483647 2147483647
2147483647 2 2147483647
2147483647 2147483647 2
Update: The answers below don't work for all cases (except the accepted one).
One such case is:
5
10
50 20
30 5 30
100 20 50 40
10 x x 10 50
With this input, the program just keeps waiting for more input.
Your logic for skipping whitespace is broken: when you eventually store a character after skipping position 0, you always write the "wanted" character at position i, never at position 0. That means anything already in position 0 remains.
In your case, input[0] was filled with '3' on the first call, where no whitespace was skipped. On later calls, position 0 is skipped, so it keeps whatever it held before. With the = {0} initializer shown, that is '\0' and atoi returns 0; without an initializer it is indeterminate, which is undefined behavior. The "32" you see matches the uninitialized case: by pure chance the array from the previous call has not been overwritten on the stack (and the stack layout is the same), so after a 2 is written into input[1], input holds the string "32".
What you need to do is have some way to count the actual required characters so that you write them into the array at the correct position. One naive approach would be:
int pos = 0;
for(...) {
    // other logic...
    // Actually write a character we want
    input[pos++] = c;
}
Another way that is more like how integer input works is:
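#include <ctype.h>   // isspace, isdigit
#include <limits.h>  // INT_MAX
#include <stdio.h>
#include <stdlib.h>  // atoi

int read_int(void) {
    char input[30];
    int c;   // int, so EOF can be distinguished from real characters
    int pos = 0;
    while(pos < sizeof(input) - 1 && (c = getc(stdin)) != EOF)
    {
        if (c == 'x' || c == 'X')
            return INT_MAX;
        else if (pos == 0 && isspace(c))
            continue;   // skip leading whitespace without advancing pos
        else if (!isdigit(c) && !(pos == 0 && (c == '-' || c == '+')))
            break;      // first character that cannot be part of the number
        input[pos++] = c;
    }
    input[pos] = '\0';
    return atoi(input);
}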
I think the problem is this part of the loop:
if (c < '0' || '9' < c){
    if (i == 0) continue;
    input[i] = 0;
    return atoi(input);
}
If you enter 3, Enter, x 2 as your input, then the 3 is read successfully and the x is returned as INT_MAX as intended, but in the next call to read_int the next character in the input sequence is a space (i.e. c == ' '), so it takes this branch. Since i == 0 at this point, the loop continues, which means i is incremented to 1, but it also means that input[0] is never changed. If input were not initialized, input[0] would most likely still contain the 3 from an earlier call to read_int, but that is undefined behaviour; with the = {0} initializer it contains '\0', so atoi returns 0. Either way, the result is wrong.
As a quick alternative, you can simply change this condition to:
if (c != ' ' && (c < '0' || '9' < c)){
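This will mean input[0] will be set to a space character, which atoi will ignore. Note, however, that a space after a number is then stored too, so for a line such as 50 20 a single call reads both values and atoi returns only 50; the 20 is lost, and the program ends up waiting for more input than it should.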
An alternative solution could be to read in an entire line at once and tokenise the line.

Why do I have to write c - '0' instead of just c? [duplicate]

This question already has answers here:
Use of s[i] - '0' [duplicate]
(3 answers)
Closed 5 years ago.
I can't understand why my code doesn't print anything when I use just ++ndigit[c] instead of ++ndigit[c - '0'], while ++nchar[c] works fine.
If you have any tutorial on this, I'd be really interested!
#include <stdio.h>
int main()
{
    int c, i, y, ns;
    int ndigit[10];
    int nchar[26];
    ns = 0;
    for(i = 0; i >= 0 && i<= 9; ++i) {
        ndigit[i] = 0;
    }
    for(y = 'a'; y <= 'z'; ++y) {
        nchar[y] = 0;
    }
    while((c = getchar()) != EOF) {
        if(c == ' ' || c == '\t') {
            ++ns;
        }
        if(c >= 'a' && c <= 'z') {
            ++nchar[c];
        }
        if(c >= '0' && c <= '9') {
            ++ndigit[c];
            //++ndigit[c-'0'];
        }
        if(c == '\n') {
            printf("digits: ");
            for(i=0;i<10;++i) {
                printf("%d:%d ", i, ndigit[i]);
            }
            printf("letters: ");
            for(y='a';y<='z';++y) {
                printf("%d:%d ", y, nchar[y]);
            }
            printf("space: %d\n", ns);
        }
    }
}
When c holds the character '0', its value is the character code of '0', which is 48 in ASCII.
So for the digit '0' your code accesses ndigit[48], but the array has only 10 elements. Indexing past the end of an array is undefined behavior in C: there is no runtime exception, and the program may crash or silently overwrite other variables, which can explain the missing output.
Remember that '0' is a character constant, so storing it in an int gives that character's code, not the number 0. If you want the number zero, write 0; to turn a digit character into its number, subtract '0'.
Because the character '4' (for example) is usually not equal to the integer 4. I.e. '4' != 4.
Using the most common character encoding, ASCII, the character '4' has the value 52 and the character '0' has the value 48. That means if you do e.g. '4' - '0', you in practice compute 52 - 48 and get the integer 4.

Check whether the input is digit or not in C programming

I am currently reading The C Programming Language by Kernighan and Ritchie (second edition), and I am having trouble understanding how one of the examples checks whether the input is a digit. The example is on page 22, in the section on arrays.
Below is the example.
#include <stdio.h>
/* count digits, white space, others */
int main(void)
{
    int c, i, nwhite, nother;
    int ndigit[10];
    nwhite = nother = 0;
    for (i = 0; i < 10; ++i)
    {
        ndigit[i] = 0;
    }
    while ((c = getchar()) != EOF)
    {
        if (c >= '0' && c <= '9')
        {
            ++ndigit[c-'0'];
        }
        else if (c == ' ' || c == '\n' || c == '\t')
        {
            ++nwhite;
        }
        else
        {
            ++nother;
        }
    }
    printf("digits =");
    for (i = 0; i < 10; ++i)
    {
        printf(" %d", ndigit[i]);
    }
    printf(", white space = %d, other = %d\n",nwhite, nother);
    return 0;
}
For this example, what confused me is that the author says the line ++ndigit[c-'0'] checks whether the input character in c is a digit. However, I believe only the if statement (if (c >= '0' && c <= '9')) is necessary, and it will check whether c is a digit. Plus, I do not understand how [c-'0'] checks whether the input c is a digit when c has the string-casting ('0') subtracted from it.
Any suggestions/explanations would be really appreciated.
Thanks in advance :)
The if statement checks whether the character is a digit, and the ++ndigit[c-'0'] statement updates the count for that digit. When c is a character between '0' and '9', then c-'0' is a number between 0 and 9. To put it another way, the ASCII value for '0' is 48 decimal, '1' is 49, '2' is 50, etc. So c-'0' is the same as c-48, and converts 48,49,50,... to 0,1,2...
One way to improve your understanding is to add a printf to the code, e.g. replace
if (c >= '0' && c <= '9')
    ++ndigit[c-'0'];
with
if (c >= '0' && c <= '9')
{
    ++ndigit[c-'0'];
    printf( "is digit '%c' ASCII=%d array_index=%d\n", c, c, c-'0' );
}
I'll try to explain with an example.
Suppose the input is abc12323.
Then the frequency of 1 = 1,
the frequency of 2 = 2,
and the frequency of 3 = 2.
if (c >= '0' && c <= '9') //checks whether c is a digit
    ++ndigit[c-'0'];
Now, if you do printf("%d", c), you will get the ASCII value of the character:
for c='0' the ASCII value is 48, for c='1' it is 49, and so on up to 57 for c='9'.
In your program you are keeping a frequency count of the digits in the input, so every time you read a digit you need to update that digit's element in the array.
If you do ndigit[c]++, it will update ndigit[48] for c='0', ndigit[49] for c='1', and so on, which are far past the end of a 10-element array.
So either you can do ndigit[c-'0']++, since the ASCII value of '0' is 48 in decimal,
or you can simply do ndigit[c-48]++, so that for c='0' ndigit[0] is updated, for c='1' ndigit[1] is updated, and so on. (c-'0' is preferable, because it does not depend on ASCII.)
You can check the refactored code here: http://ideone.com/nWZxL1
Hope it helps you. Happy coding!

How does getchar_unlocked() work?

My question is based on a CodeChef problem called Lucky Four.
This is my code:
#include <stdio.h>
int count_four() {
    int count = 0;
    char c = getchar_unlocked();
    while (c < '0' || c > '9')
        c = getchar_unlocked();
    while (c >= '0' && c <= '9') {
        if (c == '4')
            ++count;
        c = getchar_unlocked();
    }
    return count;
}
int main() {
    int i, tc;
    scanf("%d", &tc);
    for (i = 0; i < tc; ++i) {
        printf("%d\n", count_four());
    }
    return 0;
}
Let's say I make a slight change to count_four():
int count_four() {
    int count = 0;
    char c = getchar_unlocked();
    while (c >= '0' && c <= '9') {
        if (c == '4')
            ++count;
        c = getchar_unlocked();
    }
    while (c < '0' || c > '9') // I moved this `while` loop
        c = getchar_unlocked();
    return count;
}
This is my output after moving the while loop below the other one:
0
3
0
1
0
instead of:
4
0
1
1
0
The input used to test the program:
5
447474
228
6664
40
81
Why is this happening? How do getchar() and getchar_unlocked() work?
getchar_unlocked is just a lower level function to read a byte from the stream without locking it. In a single thread program, it behaves exactly like getchar().
Your change in the count_four function changes its behavior completely.
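The original function reads the standard input. It skips non-digits (this loops forever at end of file, because the EOF value stored in a char is never a digit). It then reads the run of digits that follows, counting the '4's, until it reaches a non-digit. The count is returned.
Your version reads the input and first goes through the digits, counting occurrences of '4'; it then skips non-digits, with the same bug on EOF, and finally returns the count. But that skipping loop only stops after it has already read the first digit of the next number, so that digit is lost. On the first call, the first character is the newline left behind by scanf, so no digits are counted and 0 is returned, but the leading 4 of 447474 is consumed; the second call then counts only 47474 and returns 3, and so on.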
