C programming - integers and characters


I ripped this from an ebook on C programming.
I understand that the ASCII representations of the characters '0' and '9' are integers, so I understand why they work with the integer array. I am simply not sure how the shown output is computed. The input is the code itself.
What does this statement mean?
++ndigit[c-'0'];
So, is the program essentially checking whether the input is one of the first 10 entries of the ASCII code table?

No, it doesn't.
c - '0' subtracts the (not necessarily ASCII) character code of the character 0 from that of c. This will yield a number between 0 and 9 if c is a digit. Then, the resulting integer is used to index the zero-initialized ndigit array using the [] operator, and the prefix increment operator (++) is then used to increment the element at that particular index.
By the way, the code has errors in several places. I suggest you switch to another book, because this one appears to be outdated and/or to encourage several bad programming practices.
First, main() doesn't have a return type, which is not valid in modern C. It needs to be declared as int main(void), int main(int argc, char **argv), or int main(). Older compilers assumed an implicit int return type if it was omitted, but this "implicit int" rule was removed from the language in C99.
Second, it would be better to initialize the ndigit array, like this:
int ndigit[10] = { 0 };
The for loop is superfluous because we can use an initializer instead. It is also less readable than the initialization syntax, and it is dangerous: the author hardcodes the length of the array instead of computing it with sizeof(ndigit) / sizeof(ndigit[0]). If the array is later made smaller and the hardcoded length in the for loop is forgotten, the loop will write past the end of the array (a buffer overrun).

The program counts how many times each digit from 0 to 9 appears in the input, how many white-space characters there are, and how many other characters there are.
++ndigit[c-'0'];
'0', as an integer, is the character code of the digit 0 (48 in ASCII).
c is the character that was read (its character code).
c - '0' is the actual digit (between 0 and 9) represented by the character code c.
For example, the code of '3' is the code of '0' plus 3 (51 = 48 + 3 in ASCII), so '3' - '0' == 3.
So that's how you obtain the array index for your digit, and then you increment the count of how many times that digit has shown up.

Related

What happens when we make an array defined using characters instead of integers in C?

This is a code I have used to define an array:
int characters[126];
following which I wanted to get a record of the frequencies of all the characters recorded for which I used the while loop in this format:
while((a=getchar())!=EOF){
characters[a]=characters[a]+1;
}
Then using a for loop I print the values of integers in the array.
How exactly is this working?
Does C assign a specific number for letters ie. a,b,c, etc in the array?

What happens when we make an array defined using characters instead of integers in C?
Let's be sure we are clear: you are using integer values returned by getchar() as indexes into your array. This is not defining the array, it is just accessing its elements.
Does C assign a specific number for letters ie. a,b,c, etc in the array?
There are no letters in the array. There are ints. However, yes, the characters read by getchar() are encoded as integer values, so they are, in principle, suitable array indexes. Thus, this line ...
characters[a]=characters[a]+1;
... reads the int value then stored at index a in array characters, adds 1 to it, and then assigns the result back to element a of the array, provided that the value of a is a valid index into the array.
More generally, it is important to understand that although one of its major uses is to represent characters, type char is an integer type. Its values are numbers. The mapping from characters to numbers is implementation and context dependent, but it is common enough for the mapping to be consistent with the ASCII code that you will often see programs that assume such a mapping.
Indeed, your code makes exactly such an assumption (and others) by allowing only for character codes less than 126.
You should also be aware that if your characters array is declared inside a function (without static), it is not initialized. The code depends on all elements initially being zero. I would recommend this declaration instead:
int characters[UCHAR_MAX + 1] = {0};
That upper bound (UCHAR_MAX is defined in <limits.h>) will be sufficient for all the non-EOF values returned by getchar(), and the explicit zero-initialization will ensure the needed initial values regardless of where the array is declared.

I have realized that the characters getchar() reads are part of the ASCII table and are returned as int values. I used the following code to find that out:
#include <stdio.h>
int main(){
int a[128];
a['b']=4;
printf("%d",a[98]); //it is 98 as according to the table 'b' is assigned the value of 98
}
Executing this code, I get the output 4.
I am really new to coding so feel free to correct me.

Character values are represented using some kind of integer encoding: ASCII (very common), EBCDIC (mostly IBM mainframes), UTF-8 (backward-compatible with ASCII), etc.
The character value 'a' maps to some integer value - 97 in ASCII and UTF-8, 129 in EBCDIC. So yes, you can use a character value to index into an array - arr['a']++ would be equivalent to arr[97]++ if you were using ASCII or UTF-8.
The C language does not dictate this - it's determined by the underlying platform.

Clear explanation of ndigit[c - '0']

I've finally started reading K&R, and I've just reached the section on arrays.
But in the example in this section, there's a piece of code I don't completely understand, so I'd like to ask you for a clear explanation of it. I want to understand every concept of C, because I know it's the foundation for learning C++ quickly.
By the way, I already have a decent knowledge of Java; I hope that helps you frame the explanation.
Question:
In this piece of code, ndigit[c - '0'], I don't understand what it's trying to do. I know from other Stack Overflow questions that '0' refers to the ASCII standard and should be 48, but I still don't understand what c and that '0' have to do with each other.

Short answer:
With ndigit[c-'0'] you are not doing arithmetic on the digits as you see them (as in [5+5] = [10]). Rather, you are doing arithmetic on character codes.
To put it simply: in English we write five with the sign "5". Other languages have their own sign for five; for example, 五 is five in Japanese. Likewise, a computer represents the character 5 with the number 53 (assuming ASCII; ASCII is like a language). So ndigit[c-'0'] does its arithmetic on character codes.
Long answer:
Lets go through ndigit[c - '0'] first.
ndigit = name of the array
[ ] = in a declaration such as int ndigit[10], the number of elements; in an expression like this one, the index of the element to access.
c - '0' = an arithmetic operation. c is a variable, - is minus, and
'0' is not 0 but 48 (in ASCII).
Now let me add more code from the book:
int ndigit[10];
...//fill in the array with 0s
while((c = getchar()) != EOF)
if(c >= '0' && c <= '9')
++ndigit[c - '0']; //<== unable to understand this part
Something to note here is getchar(). getchar() returns an int, but that int is not the character as you typed it; it is the character's code. Let me give an example:
#include <stdio.h>
int main(){
int c;
c = getchar();
printf("%d\n", c);
}
output:
#mix:~
$ cc test.c
#mix:~
$ ./a.out
5 ;wrote 5 in terminal
53 ; printed 53 instead of 5
5 is the character. And 53 is the character code of 5.
Now let's get back to our main topic, ndigit[c - '0']. c gets its value from getchar(), and getchar() reads from the input. Let's say the input is 5. Because of how getchar() behaves, c will contain 53 instead of 5. So
ndigit[c - '0'] == ndigit[53 - '0'] != ndigit[5 - '0']
Also notice that we are not subtracting 0, but '0'. If we subtracted the number 0, the arithmetic would be
ndigit[53 - 0]
= ndigit[53] ;**wrong**
Using '0' (single quotes) means we want the character code of 0. As described above, '0' = 48 (according to ASCII), so
ndigit[53 - '0']
= ndigit[53 - 48]
= ndigit[5]
Now we are back to the numeric value of the digit that getchar() read. But why do we need that value? Wouldn't ndigit[c] work instead of ndigit[c - '0']?
ndigit[c] won't work, because at the start of our code we declared ndigit[10]. That array holds 10 elements, so its valid indexes are 0 through 9. ndigit[53] would be far past the end of the array, which is undefined behavior. That's why we use ndigit[c - '0']: subtracting the character codes gives a value between 0 and 9.
If it's still unclear, search for and learn about the following:
character encoding
arrays in C

In this code, c holds a character, presumably representing a digit (K&R declares c as int, because getchar() returns int). In C, char is an integral type, so you can perform arithmetic operations on characters.
Digits are encoded with numbers from a consecutive range, which the C standard guarantees for '0' through '9': if the code for '0' is k, the code for '1' is k+1, the code for '2' is k+2, and so on. That is why subtracting '0' from a character representing a digit gives you the numeric value of that digit.
For example, '5' - '0' gives you the number 5 instead of the character '5'.
If you make an array ndigit[10], then ndigit[c - '0'] lets you access an array element corresponding to the digit. This can be used, for example, to count the number of different digits in the input.

As you said, '0' is equal to 48 (assuming ASCII encoding). The other digits are then 49 through 57: '1' is 49, '2' is 50, and so on. Thus '1' - '0' is 49 - 48, which is 1, '2' - '0' is 50 - 48, which is 2, and so on.
In other words c - '0' converts a digit like '5' to its integer equivalent (which would be 5 for '5').

Differences between int/char arrays/strings

I'm still new to the forum so I apologize in advance for forum - etiquette issues.
I'm having trouble understanding the differences between int arrays and char arrays.
I recently wrote a program for a Project Euler problem that originally used a char array to store a string of numbers, and later called specific characters and tried to use int operations on them to find a product. When I used a char string I got a ridiculously large product, clearly incorrect. Even if I converted what I thought would be compiled as a character (str[n]) to an integer in-line ((int)str[n]) it did the exact same thing. Only when I actually used an integer array did it work.
Code is as follows
for the char string
char str[21] = "73167176531330624919";
This did not work. I got an answer of about 1.5 trillion for an answer that should have been about 40k.
for the int array
int str[] = {7,3,1,6,7,1,7,6,5,3,1,3,3,0,6,2,4,9,1,9};
This is what did work. I took off the in-line type casting too.
Any explanation as to why these things worked/did not work and anything that can lead to a better understanding of these ideas will be appreciated. Links to helpful stuff are as well. I have researched strings and arrays and pointers plenty on my own (I'm self taught as I'm in high school) but the concepts are still confusing.
Side question, are strings in C automatically stored as arrays or is it just possible to do so?

To elaborate on WhozCraig's answer, the trouble you are having does not have to do with strings, but with the individual characters.
Strings in C are stored by and large as arrays of characters (with the caveat that there exists a null terminator at the end).
The characters themselves are usually encoded in ASCII, which assigns codes 0 to 127 to the characters used in English. Thus '7' is not stored as 7 but as its ASCII code, which is 55.
I think now you can see why your product got so large.
One elegant way to fix would be to convert
int num = (int) str[n];
to
int num = str[n] - '0';
Note that single quotes ' ' are used for characters and double quotes " " for strings.
This solution subtracts the code of '0' from the code of your character, say '7'. Since the digits are encoded consecutively, this works for single-digit numbers. For larger numbers, you should use atoi or strtol from <stdlib.h>.

Strings are just character arrays with a null terminating byte.
There is no separate string data type in C.
When you use a char as an integer, its numeric code is used. For example, printf("%d\n", (int)'a'); prints 97 (the ASCII code of 'a') on an ASCII system.
You cannot do numeric calculations on a string of digits directly; you first have to convert each digit character to its numeric value. To convert a digit character into its integer form, you can do something like this:
char a = '2';
int a_num = a - '0';
//a_num now stores integer 2
This subtracts the code of '0' (48 in ASCII) from the code of '2' (50), leaving 2.

char str[21] = "73167176531330624919";
this code is equivalent to
char str[21] = {'7','3','1','6','7','1','7','6','5',
'3','1','3','3','0','6','2','4','9','1','9'};
(the 21st element, which the list does not cover, is set to '\0').
So what is stored in str is not the numbers 7, 3, 1, ... but the characters '7', '3', '1', ..., whose codes (55, 51, 49, ... in ASCII) are different.
Side question answer: yes and no. A string is automatically stored as a char array, but a string also has an extra character ('\0') as its last element, whereas a char array does not need to have one.

Looping through an array in C

Just wondering if someone could explain this to me? I have a program that asks a user to input a sentence. The program then reads the user input into an array and changes all of the vowels to a $ sign. My question is how does the for loop work? When initialising char c = 0; does that not mean that the array element is an int? I can't understand how it functions.
#include <stdio.h>
#include <string.h>
int main(void)
{
char words[50];
char c;
printf("Enter any number of words: \n");
fgets(words, 50, stdin);
for(c = 0; words[c] != '\n'; c++)
{
if(words[c] =='a'||words[c]=='e'||words[c]=='i'||words[c]=='o'||words[c]=='u')
{
words[c] = '$';
}
}
printf("%s", words);
return 0;
}

The code treats c as an integer variable (in C, char is basically a very narrow integer). In my view it would be cleaner to declare it as int (perhaps unsigned int). However, given that words is at most 50 characters long, char c works fine.
As to the loop:
c = 0 initializes c to zero.
words[c] != '\n' checks -- right at the start and also after each iteration -- whether the current character (words[c]) is a newline, and stops if it is.
c++ increments c after each iteration.

An array is like a building: there are several floors, each with a number.
On floor 1 lives John.
On floor 2 lives Michael.
If you want to go to John's apartment, you press 1 in the elevator. If you want to go to Michael's, you press 2.
That's the same with arrays. Every position in the array stores a value, in this case a letter.
Every position has an index associated with it. The first position is 0.
When you want to access a position of the array, you use array[position], where position is the index you want to access.
The variable c holds the position to be accessed. When you write words[c], you are actually accessing position c of the array and retrieving its value.
Suppose the word is cool:
words[1] results in 'o',
words[0] results in 'c'.
To mark the end of the input, fgets stores the newline character '\n' right after the last character typed (if it fits in the array), followed by the terminating '\0'; the loop stops when it reaches the '\n'.

Not really, char and int are implicitly converted.
You can think of char here as a smaller int. sizeof(char) == 1, so it is smaller than an int, which is probably why it was used. Programmatically, there's no difference in this case, unless the input string is very long, in which case the char will overflow before an int would.

Number literals (such as 0 in your case) are compatible with variables of type char. In fact, even a character literal enclosed in single quotes (for example '\n') is of type int. It is converted to char when assigned to a char; when it is compared with a char, the char is promoted to int instead.
Number literals are interchangeable with character literals, as long as the former do not exceed the range of a character.
The following should result in a compiler warning:
char c = 257;
whereas this will not:
char c = 127;

A char in C is an integral type, as are short, int, long, and long long (and many other types):
It is defined as the smallest addressable unit on the machine you are compiling for and is usually 8 bits, which means that, where plain char is signed (this is implementation-defined), it can hold values from -128 to 127. An unsigned char can then hold values 0 to 255.
It works as the loop index above because the loop will always stop before 50, and it can hold values up to 127. An int can usually hold values up to 2,147,483,647, but it typically takes up 4 times the space of an 8-bit char. An int is only guaranteed to be at least 16 bits in C, which means values from at least -32,767 to 32,767, or 0 to 65,535 for an unsigned int.
So your loop is just accessing the elements of your array one after the other: words[0] at the beginning to look at the first character, then words[1] to look at the next character, and so on. You use a char, which I'm assuming is 8 bits and signed on your machine, as that is very common, so it is enough to store the index for your loop as long as it stays at or below 127. If you read more than 127 characters (instead of just 50) and used a char as the index, you would run into strange problems, because such a char can't hold 128 and will typically wrap around to -128 (the conversion is implementation-defined). You would then access words[-128], which is undefined behavior and may corrupt memory or crash the program.

Usage of function putc

I am working on a C program that I did not write and integrating it with my C++ code. This C program has a character array and prints its contents like this:
printf("%c\n","01"[b[i]]);
This is a bit array: each element holds either the value 0 or the value 1 (NOT the character codes 48 and 49 of '0' and '1', PLEASE NOTE). This call prints "0" and "1" perfectly. However, I don't understand the use of "01" in that call. I can also print the contents like this:
printf("%d\n",b[i]);
Hence I was just curious. Thanks.
Newbie

The "01" is a string literal, which for all intents and purposes is an array. It's a bit weird-looking... you could write:
const char *characters = "01";
printf("%c\n", characters[b[i]]);
or maybe even better:
const char *characters = "01";
int bit = b[i];
printf("%c\n", characters[bit]);
And it would be a little easier to understand at first glance.

Nasty way of doing the work, but whoever wrote this was using the contents of b as an array dereference into the string, "01":
"foo"[0] == 'f'
"bar"[2] == 'r'
"01"[0] == '0'
"01"[1] == '1'
Your array, b, contains 0s and 1s, and the author wanted a quick way to turn those into '0's and '1's. He could just as easily have done:
'0' + b[i]
But that's another criminal behavior. =]

The string "01" is a character array (which is what strings are in C), and b[i] holds either 0 or 1, so the "decomposed" view of it would be:
"01"[0]
or
"01"[1]
Which selects the "right" character from the char array. This works because, in an expression, the array "01" is converted to a pointer to its first character, and the [...] operator is defined in terms of pointer arithmetic: a[i] means *(a + i), an offset of i elements of the pointed-to type (here, i chars).
Yes, your printf would be much better, as it requires less knowledge of obscure C tricks.

This line says: take the array of characters "01" and access one of its elements, using the value at b[i] as the index.
Thus "01"[0] returns the character '0' and "01"[1] returns the character '1'.

Do the statement you understand.
Simplifying the other one, by replacing b[i] with index, we get
"01"[index]
The string literal ("01") is of type char[3]. Accessing its index 0 or 1 (or 2) is fine and yields the character '0' or '1' (or '\0').
