getop() function K&R book p 78 - c

I'm studying the K&R book and am currently reading the getop() function on p. 78.
I understand the code, but I need clarification on two things.
The code of getop() is as follows:
int getch(void);
void ungetch(int);
/* getop: get next character or numeric operand */
int getop(char s[])
{
    int i, c;
    while ((s[0] = c = getch()) == ' ' || c == '\t')
        ;
    s[1] = '\0';
    if (!isdigit(c) && c != '.')
        return c; /* not a number */
    i = 0;
    if (isdigit(c)) /* collect integer part */
        while (isdigit(s[++i] = c = getch()))
            ;
    if (c == '.') /* collect fraction part */
        while (isdigit(s[++i] = c = getch()))
            ;
    s[i] = '\0';
    if (c != EOF)
        ungetch(c);
    return NUMBER;
}
My question is about: s[0] in:
while ((s[0] = c = getch()) == ' ' || c == '\t')
The idea behind the while loop is to skip spaces and horizontal tabs, so why are we saving c in s[0]? Why didn't the authors simply write:
while ((c = getch()) == ' ' || c == '\t')
We are not going to use the spaces and tabs later on, so why do we need to save c in s[0]? What is the need for s[0] here?
My second question is about:
s[1] = '\0';
Why are we assigning '\0' (end of string) to s[1] here?
I have read some of the previous answers posted on stackoverflow.com about it, but I'm not totally convinced!
The accepted answer about the above question is: "Because the function might return before the remaining input is read, and then s needs to be a complete (and terminated) string."
Ok. But what if input has one white space at the beginning and followed by an operand or operator? In this case, s[1] = '\0' will close the string too early? isn't it?

In answer to your first question, the assignment to s[0] in this case is a convenient coding shortcut. The value of c is copied to s[0] for every character read by getch(), regardless of whether it will be used or discarded. If it is to be discarded, no big deal; it will be overwritten on the next iteration of the while() loop. If it is to be used, then it has already been copied into its necessary location in the destination array s[].
In answer to your second question,
But what if input has one white space at the beginning and followed by
an operand or operator?
Note that the previous while() loop prevents white space characters (spaces and tabs) from appearing in s[0] after exit from the loop. Therefore, after execution of
s[1] = '\0';
the s[] string will consist of a single character that is neither a space nor a tab, followed by a string terminator.
In the next statement
if (!isdigit(c) && c != '.')
return c; /* not a number */
the function will return if the character is anything but a digit or a decimal point. This is why it was necessary to terminate the string.
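Both points can be observed with a minimal sketch. It assumes string-backed stand-ins for getch() and ungetch() (set_input, src and pushed are hypothetical names, not from K&R) and defines NUMBER as '0', as the book does; getop() itself is unchanged from the question.

```c
#include <ctype.h>
#include <stdio.h>   /* EOF */

#define NUMBER '0'   /* signal that a number was found, as in K&R */

/* Hypothetical stand-ins for K&R's getch/ungetch: they read from a
   string instead of stdin so the behaviour is easy to observe. */
static const char *src = "";   /* remaining input */
static int pushed = EOF;       /* one character of push-back */

void set_input(const char *text) { src = text; pushed = EOF; }

int getch(void)
{
    if (pushed != EOF) {
        int c = pushed;
        pushed = EOF;
        return c;
    }
    return *src ? (unsigned char)*src++ : EOF;
}

void ungetch(int c) { pushed = c; }

/* getop: same logic as the function in the question */
int getop(char s[])
{
    int i, c;

    while ((s[0] = c = getch()) == ' ' || c == '\t')
        ;               /* a blank stored in s[0] is simply overwritten */
    s[1] = '\0';        /* s is now a valid one-character string */
    if (!isdigit(c) && c != '.')
        return c;       /* not a number: s already holds the character */
    i = 0;
    if (isdigit(c))     /* integer part, stored from s[1] onward */
        while (isdigit(s[++i] = c = getch()))
            ;
    if (c == '.')       /* fraction part */
        while (isdigit(s[++i] = c = getch()))
            ;
    s[i] = '\0';
    if (c != EOF)
        ungetch(c);
    return NUMBER;
}
```

After set_input("  12.5 +"), the first getop(s) call returns NUMBER and leaves "12.5" in s; the second returns '+' and leaves "+" in s. The leading blanks never survive in s[0], and the early s[1] = '\0' is overwritten as soon as more digits arrive.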

But what if input has one white space at the beginning and followed by an operand or operator? In this case, s[1] = '\0' will close the string too early? isn't it?
Nope,
i = 0;
if (isdigit(c)) /* collect integer part */
while (isdigit(s[++i] = c = getch()))
This ensures that if there are more characters to read, the \0 gets overwritten: since i starts at 0, s[++i] stores into s[1], which is where the \0 was placed.

For your first question, about s[0] in:
while ((s[0] = c = getch()) == ' ' || c == '\t')
Saving c in s[0] stores the first character of the number in advance, so that the following code can simply continue from index 1.
i = 0;
if (isdigit(c)) /* collect integer part */
while (isdigit(s[++i] = c = getch()))
The code above stores the subsequent characters of the string, starting at index i = 1.
About your second question:
We cannot write
s[0] = '\0';
because by that point the first character is already stored in s[0]; see
(s[0] = c = getch())

The answers given here are already good, though I would like to add one more point on the second question.
"Ok. But what if input has one white space at the beginning and followed by an operand or operator? In this case, s[1] = '\0' will close the string too early? isn't it?"
In this case we do not care about the string at all (it is overwritten anyway if a number is encountered), because the string is used only when a number is encountered; all other characters, such as '+', '-' or '\n', are returned directly.

Related

why is getchar() inserting additional characters in my input

I want my code to receive an endless stream of input until the user (or text file) returns EOF. The problem is that in both input cases the first character registered is some additional one.
Case in point - the input was Applepie, why is the first character registered 'p'?
Code:
string = (char *) malloc(allocated);
while (c != EOF){
c = getchar();
if (counter >= allocated -1){
allocated *= 2;
new_buffer = realloc(string, allocated );
if (! new_buffer){
return (char *) string;
}
string = new_buffer;
}
/*store the char*/
string[counter] = c;
counter++;
/*printf("current char: %c\n", string[counter]);*/
}
string[counter] = '\0';
return (char*)string;
}
Preliminary: why there is a -1 in your string
This has been discussed extensively in comments. It is because getString() writes the EOF indicating the end of the input into the string. It should avoid doing that by checking for EOF before putting the character just read in the string.
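A minimal sketch of that fix follows. The full getString() is not shown in the question, so read_all is a hypothetical replacement that takes a FILE * (pass stdin for the original behaviour). It tests for EOF before storing anything, and declares c as int so that EOF can be told apart from every valid character.

```c
#include <stdio.h>
#include <stdlib.h>

/* read_all: hypothetical replacement for the getString() discussed above.
   Reads everything from fp into a heap-allocated, NUL-terminated string.
   Returns NULL if memory runs out; the caller frees the result. */
char *read_all(FILE *fp)
{
    size_t allocated = 16, counter = 0;
    char *string = malloc(allocated);
    int c;                /* int, so EOF is distinguishable from any char */

    if (string == NULL)
        return NULL;
    while ((c = getc(fp)) != EOF) {       /* test for EOF before storing */
        if (counter + 1 >= allocated) {   /* keep room for the terminator */
            char *new_buffer = realloc(string, allocated * 2);
            if (new_buffer == NULL) {
                free(string);
                return NULL;
            }
            string = new_buffer;
            allocated *= 2;
        }
        string[counter++] = (char)c;
    }
    string[counter] = '\0';
    return string;
}
```

Because the loop condition rejects EOF before the store, the -1 never reaches the string, and the terminator lands directly after the last real character.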
Unexpected first character
Case in point - the input was Applepie, why is the first character registered 'p'?
Because of this:
int counter = 1; /*prints new line every 10 characters*/
[...]
char c = string[counter];
printAndSum() then proceeds to print c, provided that its value is not -1, so if it prints anything at all then the first character it prints will be the one at index 1. After that, it updates c on each loop iteration with
c = string[index++];
, which has the effect of starting at the beginning of the string on the second iteration. There are various ways you could fix that, but one of the least disruptive would be to change the initialization of c to
char c = string[0]; // or string[index]
, and to change the update of c to use preincrement instead of postincrement:
c = string[++index];

What index the i-th element of any array refer to in c?

I've stumbled upon an assignment in a piece of code in which a null character is added to an array, line[i] = '\0', to explicitly make it a string. That raised a question for me: since the null character sits exactly at the end of any string, how do we know that storing \0 in the i-th element of line puts it in the last position? As I see it, i could be any index into line. So does the i-th index of an array always refer to the last position, or what?
Code like this usually appears just after code that has used the same index variable i to construct the string.
For example:
char string[10];
int i = 0;
string[i++] = 'a';
string[i++] = 'b';
string[i++] = 'c';
string[i] = '\0';
Or, more realistically:
char line[100];
int i = 0;
int c;
while((c = getchar()) != EOF && c != '\n')
line[i++] = c;
line[i] = '\0';
This second example reads one line of text from standard input and stores it in the line array as a proper, null-terminated string.
(In real code, of course, you also have to worry about the possibility of overflowing the array.)
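As a rough illustration of that overflow concern, here is a sketch of a bounded version of the same loop. read_line is a hypothetical helper (not a standard function); it reads from a FILE * so it can be tried on stdin or any other stream, and it assumes size is at least 1.

```c
#include <stdio.h>

/* read_line: hypothetical helper. Stores at most size - 1 characters of
   the next line from fp in line[], always NUL-terminates, and returns the
   number of characters stored. Characters that do not fit are left unread. */
size_t read_line(FILE *fp, char line[], size_t size)
{
    size_t i = 0;
    int c;

    /* The bound is checked first, so no character is read unless
       there is still room for it plus the terminator. */
    while (i + 1 < size && (c = getc(fp)) != EOF && c != '\n')
        line[i++] = (char)c;
    line[i] = '\0';   /* i is still the count of stored characters */
    return i;
}
```

The terminating line[i] = '\0' works for the same reason as before: i counts the characters stored so far, and the bound guarantees that index i is still inside the array.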
To make things really clear, you can imagine writing code like this more explicitly, with a separate variable to hold the length of the string. For example:
i = 0;
while((c = getchar()) != EOF && c != '\n')
line[i++] = c;
int length_of_string = i;
line[length_of_string] = '\0';
When you see that line
line[length_of_string] = '\0';
it makes it more obvious that the \0 terminator is being stored at a spot in the string that someone has actually determined to be the length of the string. But as you can see, since the variable length_of_string has just been set based on the value of i after the loop, it's perfectly equivalent to just write
line[i] = '\0';
There's sort of an academic-sounding term called loop invariant, but code like this ends up being a perfect example of what it means, and it's worth thinking about for a moment. A loop invariant is something you can say about a loop that's true at all times, for every trip through the loop, at the beginning or the end or in the middle of the loop. For the read-a-line loop I've just shown, the loop invariant is:
i always contains the number of characters that have been read into the string line.
Let's look at all of the ways this "loop invariant" is true. To make things very clear, I'm going to write the loop again, with some comments to make it clear what I mean by the "top" and "bottom" of the loop:
i = 0;
while((c = getchar()) != EOF && c != '\n') {
/* top of loop */
line[i++] = c;
/* bottom of loop */
}
Before the loop runs, the string is empty, so i starts out as 0.
At the top of the loop, before the line[i++] = c step, i still has the value it did last time through the loop.
In the middle of the loop, the line line[i++] = c simultaneously stores the character c into the line array (and at the right spot!), and increments i.
At the bottom of the loop, after the line[i++] = c step, i contains the updated number of characters in the string.
After the loop (and this was your question), since i still contains the number of characters that have been read and stored into line, it's precisely the right index to use to null-terminate the string, with the line line[i] = '\0'.
The other thing that's worth paying attention to here is that the line in the middle of the loop, that simultaneously stores the next character into the line array, at the right spot, and increments i at the same time, is, once again:
line[i++] = c;
My question for you to think about is, what if I had instead written
line[++i] = c; /* WRONG */
It can be hard, at first, to really understand the difference between i++ and ++i, to understand why you would care, to understand why you might pick one over the other. This code here, I think, is an example that really makes the point.
(For extra credit, think about this: What if arrays in C were 1-based, instead of 0-based? What parts of the read-one-line loop would change, and is it still possible to maintain all facets of the loop invariant?)
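One way to explore the i++ versus ++i question is to run both variants side by side. The sketch below uses two hypothetical helpers that copy from a string rather than from getchar(), purely so the result is easy to inspect; neither checks the destination size, so the buffer must be large enough.

```c
#include <stddef.h>

/* Post-increment: store at index i, then advance. i always equals the
   number of characters stored, so line[i] is the right place for '\0'. */
size_t fill_post(char line[], const char *src)
{
    size_t i = 0;
    while (*src != '\0')
        line[i++] = *src++;
    line[i] = '\0';
    return i;
}

/* Pre-increment: advance first, then store. line[0] is never written,
   and every character lands one slot too far to the right. */
size_t fill_pre(char line[], const char *src)
{
    size_t i = 0;
    while (*src != '\0')
        line[++i] = *src++;
    line[i] = '\0';   /* overwrites the last character that was stored */
    return i;
}
```

Copying "abc" with fill_post yields "abc". With fill_pre, line[0] keeps whatever it held before, line[1] and line[2] hold 'a' and 'b', and the terminator replaces the 'c' at line[3], so the invariant no longer holds.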
If you have an already existing string and you just want it to be terminated with \0 at the last+1 index, write a function to determine this position. E.g. check the char at the current position and check whether the next position contains a legitimate value. You can then go through the whole string and determine the last position, then return a pointer to the last position+1 and set your terminator. If you work with a variety of predefined strings, this would be the most scalable approach for me.

How can I make my program recognize tabs and turn them into normal spaces?

I am currently in the process of learning/ re-learning C. I had some trouble learning it last year in school. I am using a book and one of the exercises is asking to get excessive spaces and turn them into just one space. The problem is that I can't make it recognize tabs. I have looked for some on here already but they all deal with arrays. I can't use arrays. This is what I have so far.
int main()
{
int c;
c = getchar();
while (c != EOF)
{
if (c == ' ')
{
putchar(c);
for (c = getchar(); c == ' '; c = getchar())
;
}
else if (c == '\t')
putchar(' ');
putchar(c);
c = getchar();
}
}
So basically, the code starts off by getting a char value and putting it into "c". Then, while c is not EOF, if c equals the ASCII value of SPACE, it outputs it and enters a for loop. The for loop tests for additional spaces: c is set to the following char, and while c equals the ASCII value of SPACE, the for loop does nothing and keeps iterating until all additional spaces are gone. The if statement works perfectly, by the way.
Then I go into an else if which I am certain is wrong but it's my latest attempt. So for this I said else if c equals the ASCII value of tab (is that a thing? If not that might be what my error is) put down a space by using putchar(' '). I feel like that command might be wrong as well. After that statement it then exits the conditional, puts out the value then c now equals a new char and the loop continues.
Thanks!
EDIT: So right after posting this I realized, at least I think, my error is the putchar(c) at the bottom which is still printing out the tab regardless? Although I am still not sure how to approach the problem. One more thing is that these are the only commands I can use. The book still assumes I don't know how arrays and such work in C yet.
Although it's a bit unclear how subsequent spaces/tabs are to be handled, consider simply saving the previous character.
int previous = EOF;
int ch;
while ((ch = getchar()) != EOF) {
    if (ch == '\t') { // or if (isblank(ch)) ...
        ch = ' ';
    }
    if (ch != ' ' || previous != ' ') { // drop a space that follows a space
        putchar(ch);
    }
    previous = ch;
}

C duplicate character,character by character

I am trying to detect when two consecutive characters are the same. For example, given the input "The oldd woman", I want to print "duplicate" next to the second d. I can't work out how to code this. Here is my code:
void main()
{
char ch,ch2;
ch = getchar(); // getting the line of text
int i = 1;
ch2 = ch;
while (ch != '\n') // the loop will close if nothing entered
{
if (ch == ch2[&i]) {
printf("%c-duplicate", ch);
}
i++;
if (ch == 'A' || ch == 'E' || ch == 'I' || ch == 'O' || ch == 'U') { // checking for uppaercase vowel
printf("%c-upper case vowel", ch);
}
else if (ch == 'a' || ch == 'e' || ch == 'i' || ch == 'o' || ch == 'u') { // checking for lowecase vowel
printf("%c-lower case vowel", ch);
}
else if (ispunct(ch)) { // checking for punctuation
printf("%c-punctuation", ch);
}
else
putchar(ch);
printf("\n");
ch = getchar();
}
printf("\n");
}
}
I am setting the character to another variable and then checking them with the first if statement. The program should run character by character.
Below is an example that I believe is what you intended. c is the current character being read (note: it is type int) and c1 is the previous character read (initialized to -1 to ensure it does not match the first test).
While you can compare A and E..., the string library provides strchr that easily allows testing if a single character is included within a larger string.
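As a small illustration of the strchr idea, the sketch below wraps the test in two hypothetical helpers (is_upper_vowel and is_lower_vowel are illustrative names, not library functions). The extra check for '\0' is there because strchr treats the terminating null character as part of the string, so searching for '\0' always succeeds.

```c
#include <string.h>

/* Hypothetical helpers: return nonzero if c is a vowel of the given case.
   The c != '\0' guard matters because strchr("AEIOU", '\0') returns a
   pointer to the terminator rather than NULL. */
int is_upper_vowel(int c)
{
    return c != '\0' && strchr("AEIOU", c) != NULL;
}

int is_lower_vowel(int c)
{
    return c != '\0' && strchr("aeiou", c) != NULL;
}
```

Each helper replaces a five-way chain of == comparisons, and adding another character to the set is a one-character change to the string.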
Rather than call printf for each duplicate or vowel, etc., why not use sprintf to build a string containing all the criteria applicable to any one character? That way you only call printf once at the end of each iteration (depending on whether you are printing all or just those that match criteria). s is used as the buffer that holds the match information for each character; offset is simply the number of characters previously written to s. (You should check to ensure you don't exceed the number of characters available in s, but that was unneeded here.)
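If you do want the bounds check mentioned in the parenthetical, one possible sketch uses snprintf, which never writes more than the size it is given. append is a hypothetical helper, not part of the answer's program; it tracks the offset for you and reports truncation.

```c
#include <stdio.h>

/* append: hypothetical helper that appends text to buf without writing
   past buf[size - 1]. *offset is the current string length and is
   advanced by the number of characters actually stored.
   Returns 1 if text fit completely, 0 if it was truncated. */
int append(char *buf, size_t size, size_t *offset, const char *text)
{
    if (*offset >= size)
        return 0;                        /* no room left at all */
    int n = snprintf(buf + *offset, size - *offset, "%s", text);
    if (n < 0)
        return 0;                        /* encoding error */
    if ((size_t)n >= size - *offset) {   /* output was truncated */
        *offset = size - 1;
        return 0;
    }
    *offset += (size_t)n;
    return 1;
}
```

With a 32-byte buffer, appending " duplicate" and then " lower-case vowel" succeeds and leaves the offset at 27; a further " punctuation" does not fit, so append stores what it can, keeps the buffer terminated, and returns 0.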
It is unclear whether you want to print each character in the input string, or just those you have matched so far. The code below will only output those characters that match one of the criteria (if no argument is given). If you would like to see all characters, then pass any value as the first argument to the program.
#include <stdio.h>
#include <string.h>
#include <ctype.h>
#define MAXL 32
int main (int argc, char **argv) {
char *uvowel = "AEIOU", *lvowel = "aeiou";
int c, c1 = -1;
while ((c = getchar()) != '\n' && c != EOF) {
char s[MAXL] = ""; /* description buffer */
int offset = 0; /* offset */
if (c == c1) /* new == old */
offset = sprintf (s, " duplicate");
if (strchr (uvowel, c)) /* upper-case? */
sprintf (s + offset, " upper-case vowel");
else if (strchr (lvowel, c)) /* lower-case? */
sprintf (s + offset, " lower-case vowel");
else if (ispunct (c)) /* punctuation? */
sprintf (s + offset, " punctuation");
else if (argc > 1) /* if printing all */
sprintf (s + offset, " other");
if (*s) /* print c and s */
printf (" %c - %s\n", c, s);
c1 = c; /* save last char read */
}
return 0;
}
Example Use/Output
$ echo "The oldd woman" | ./bin/classdup
e - lower-case vowel
o - lower-case vowel
d - duplicate
o - lower-case vowel
a - lower-case vowel
Pass any value as the first argument to print all characters:
$ echo "The oldd woman" | ./bin/classdup 1
T - other
h - other
e - lower-case vowel
- other
o - lower-case vowel
l - other
d - other
d - duplicate other
- other
w - other
o - lower-case vowel
m - other
a - lower-case vowel
n - other
Duplicate vowels
$ echo "womaan" | ./bin/classdup
o - lower-case vowel
a - lower-case vowel
a - duplicate lower-case vowel
Look things over and let me know if you have any questions. There are many ways to do this, this is just one that seemed close to your intent.
(note: you will want to pass the -Wno-unused-parameter compiler option to eliminate the warning about argv being unused, or just do a stub test somewhere in the code, e.g. if (argv[1]) {})
Worth answering to try and help understand variables and pointers I think.
To try and answer . . . as simply as possible . . . NOTE #1: the main problem/issue is that ch and ch2 are declared as single char variables. They can be 'a' or 'b' or '\n' or 0x20 or any other single char. They are NOT char arrays or pointers. You have a comment where you read one char 'ch = getchar() // getting the line of text', that comment is incorrect (although you do have good comments in showing what you are thinking in your example), anyway, this 'ch = getchar()' just gets a single char. Later you treat ch2 as an array.
char ch,ch2;
. . . then later:
if (ch == ch2[&i]) { // ouch, does that even compile? yes. oh dear. how do we explain this one?!
ouch! This is wrong because it treats ch2 as an array/pointer.
The way your loop is working now is ch2 is set once to the very first char read. And it never changes.
It may compile okay BUT does it give a warning? Actually, in fairness to you. I do not get a warning. gcc 4.8.2 is perfectly happy with ch2 being a char and doing (ch == ch2[&i]). Now, ch2[&i] may be syntactically valid code, it will compile ok. It will even run ok. BUT what does it mean? Is it semantically valid? Let's forget about this weird thing until later.
Note that you can have c compiling fine BUT it can be quite full of pointer errors and can crash/hang. So . . . be careful :-).
Try making a change like this:
ch = getchar(); // does not get the line of text, just gets one char
. . .
ch2 = 0; // for the first iteration of loop
if (ch == ch2) { // change from ch2[&i] - wow :-) very confused!
. . .
ch2 = ch; // set ch2 to the last char read before reading new
ch = getchar(); // read one new char
This makes the code work just using 2 chars. ch and ch2. You do not use i. You do not use an array or string or char pointer.
NOTE #1.1: ch2[&i] compiles and runs. BUT IT IS WRONG, OHHHH SOOOOOO WRONG. And weird. How does array access work in c? The syntax of c[&i] is "correct" (maybe depends on compiler). BUT please do not use this syntax! What does it mean? It is semantically dubious. Looks like perhaps intent was to use array of chars together with i. Quick example showing assigning and reading from array of chars correctly:
char s[100]; // string of 100 chars, accessing index below 0 and above 99 is bad
i=0;
s[i]='H'; // assign char 'H' to first letter of string (s[0])
i++; // increment i, i is now 1.
s[i]='i';
i++;
s[i]=0; // end string
printf("s[0]=%c s[1]=%c s[2]=%02x string:%s",s[0],s[1],s[2],s);
NOTE #1.2: ch2[&i] compiles and runs. How and why does it compile?
&i means the pointer to the variable i in memory
%p in printf will show pointer value
So try adding this to code sample:
printf("%p %p %p\n", &ch, &ch2, &i);
// ch2[i] will not compile for me, as ch2 is not an array. syntax error
// ch2[&i] DOES compile for me. But . . what is the value ?
// What does it mean. I do not know! Uncharted territory.
printf("ch2[&i]:%p:%02x\n",&i,ch2[&i]);
printf("ch2[&ch]:%p:%02x\n",&ch,ch2[&ch]);
printf("ch2[&ch2]:%p:%02x\n",&ch2,ch2[&ch2]);
I'm getting something like this each run the pointers change:
ch2[&i]:0xbfa0c54c:bfa0de71
ch2[&ch]:0xbfa0c54a:08
ch2[&ch2]:0xbfa0c54b:00
The discovered explanation:
Usually we declare an array e.g. 'int array[100];' where 100 is the size of array. array[0] is first element and array[99] is the last. We index the array using integer. Now, in most expressions an array name is converted to a pointer to its first element. SO *array is the same as array[0]. *(array+1) is the same as array[1].
So far so good.
Now *(1+array) is also the same as array[1].
We can say int i=7; And use array[i] or *(array+7) or *(7+array) OR i[array] to access element 7 (the eighth element) of the array. i[array] to any programmer should look very VERY WROOOONG (not syntactically wrong BUT philosophically/semantically/morally wrong!)
Okay. Fine. Calm down. Jeez. Now with 'char ch2;' ch2 is a single char. It is NOT an array. ch2[&i] works(works as in compiles and sometimes/mostly runs ok!!!) because the last(WROOOONG) i[array] notation is valid. Looking at TYPES is interesting:
i[array] === <type int>[<type int *>].
ch2[&i] === <type char>[<type int *>].
C happily and merrily promotes char to int, and an int can be added to a pointer. SO FINALLY IN CONCLUSION: the expression ch2[&i] evaluates to the int found at the address &i (pointer to integer i) offset by the value of char ch2, counted in int-sized elements. There is no good reason to use this syntax! It's DANGEROUS. You end up accessing a memory location which may or may not be valid, and since &i points to a single variable, any nonzero offset from it does not refer to a valid object.
See here: Why 'c[&i]' compiles while 'c[i]' doesn't?
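The equivalence described above can be checked with a short sketch. The three function names are hypothetical; each returns the same element, because a[i] is defined as *(a + i) and the addition can be written in either order.

```c
/* Three hypothetical helpers that all return the same element. */
int by_subscript(const int *a, int i) { return a[i]; }
int by_pointer(const int *a, int i)   { return *(a + i); }
int by_reversed(const int *a, int i)  { return i[a]; }   /* legal, but avoid */
```

For an array holding 10, 20, 30, 40, all three return 30 when called with index 2. The reversed form only demonstrates why the compiler accepts ch2[&i]; it is not something to write on purpose.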
NOTE #2: watch the bracketing {}. while and main closing } and indentation do not match in example. The program functions okay with this error. The putchar(ch) runs as part of the last else. The commands after that run at end of while loop.
NOTE #3: main should return int, not void; it takes either nothing '()' or '(int argc, char **argv)'.
Per other suggestions: Change ch == ch2[&i] This makes no sense here
Since you set ch = ch2 before the loop then the line
if(ch == ch2 )
(After you fix it) will always evaluate to true the first time around
Your else is very confusing; if you have more than one line of code there, you have to put it in braces.
Keep in mind, when you enter your input you are actually submitting two characters eg ("e" AND "\n") because you press enter after you type the character and that counts
Try a little harder, meaning put an error message, put down the results of your attempts at a solution. It helps both us and you. It seems a little like you wrote this and just immediately want a fix. Programming gets harder, if you can't work through problems (with suggestions) then it's going to hurt a lot more, but it doesn't have to.
For a quick and dirty proof of concept, add another ch=getchar(); immediately after the one under your else. Note that the code below should run but doesn't do exactly what you want yet; you'll need to do some further debugging.
Edit 10/19/2016
Fixed the char2 issue you guys pointed out
Moved ch2 = ch to above the lines which get the new character
#include <stdio.h>
#include <ctype.h>
int main(){
char ch,ch2;
ch = getchar(); // getting the line of text
int i = 1;
ch2 = 0;
while (ch != '\n') // the loop will close if nothing entered
{
if (ch == ch2) {
printf("%c-duplicate", ch);
}
i++;
if (ch == 'A' || ch == 'E' || ch == 'I' || ch == 'O' || ch == 'U') { // checking for uppaercase vowel
printf("%c-upper case vowel", ch);
}
else if (ch == 'a' || ch == 'e' || ch == 'i' || ch == 'o' || ch == 'u') { // checking for lowecase vowel
printf("%c-lower case vowel", ch);
}
else if (ispunct(ch)) { // checking for punctuation
printf("%c-punctuation", ch);
}
printf("\n");
ch2 = ch;
ch = getchar();
ch = getchar();
}
printf("\n");
return 0;
}

misunderstanding the working mechanism of getint function

In K&R, at the beginning of chapter 5, is presented the function getint that
performs free-format input conversion by breaking a stream of characters into integer values, one integer per call.
The function is pretty simple, but I can't actually understand why c is pushed back into the buffer in the first if-statement. Because of this, every time you call getint it will do the same thing, because it will read the next character in the buffer, which is c.
Is this strategy intended to be a kind of security mechanism?
int getint(int *pn) {
    int c, sign;
    while(isspace(c = getch()))
        ;
    if(!isdigit(c) && c != EOF && c != '+' && c != '-') {
        ungetch(c);
        return 0;
    }
    sign = (c == '-') ? -1 : 1;
    if(c == '+' || c == '-')
        c = getch();
    for(*pn = 0; isdigit(c); (c = getch()))
        *pn = 10 * *pn + (c - '0');
    *pn *= sign;
    if(c != EOF)
        ungetch(c);
    return c;
}
The code you ask about causes getint to stop if the next character in the stream is not part of a numeral (or a space) because it is not a digit or sign character.
The notion is that as long as you call getint while there are acceptable numerals in the input, it will convert those numerals to int values and return them. When you call getint while there is not an acceptable numeral next in the input, it will not perform a conversion. Since it is not performing a conversion, it leaves the character it is not using in the stream.
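The return value tells the caller which case occurred: getint returns 0 when the next input is not a number, EOF at end of input, and otherwise the character that followed the numeral, with the converted value stored in *pn. A “0” in the input therefore yields a nonzero return with *pn set to 0, while a non-numeral yields a return of 0. What this tutorial code does not do is consume the offending character, so the caller has to remove it (for example with getch()) before calling getint again; otherwise every call sees the same character.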
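To make the behaviour concrete, here is a sketch that pairs the getint from the question with string-backed stand-ins for getch() and ungetch(), plus a caller that skips whatever getint refuses. set_input, src, pushed and read_ints are hypothetical names introduced for this illustration; only getint comes from the question.

```c
#include <ctype.h>
#include <stdio.h>   /* EOF */

/* Hypothetical string-backed getch/ungetch with one char of push-back. */
static const char *src = "";
static int pushed = EOF;

void set_input(const char *text) { src = text; pushed = EOF; }

int getch(void)
{
    if (pushed != EOF) {
        int c = pushed;
        pushed = EOF;
        return c;
    }
    return *src ? (unsigned char)*src++ : EOF;
}

void ungetch(int c) { pushed = c; }

/* getint: as in the question */
int getint(int *pn)
{
    int c, sign;

    while (isspace(c = getch()))
        ;
    if (!isdigit(c) && c != EOF && c != '+' && c != '-') {
        ungetch(c);          /* not a number: leave it for the caller */
        return 0;
    }
    sign = (c == '-') ? -1 : 1;
    if (c == '+' || c == '-')
        c = getch();
    for (*pn = 0; isdigit(c); c = getch())
        *pn = 10 * *pn + (c - '0');
    *pn *= sign;
    if (c != EOF)
        ungetch(c);
    return c;
}

/* read_ints: hypothetical caller. Stores up to max integers in out[],
   discarding one character each time getint reports "not a number". */
int read_ints(int out[], int max)
{
    int n = 0, r;

    while (n < max && (r = getint(&out[n])) != EOF) {
        if (r == 0)
            getch();         /* consume the character getint pushed back */
        else
            n++;
    }
    return n;
}
```

With the input "0 x", the first getint call returns a nonzero value and stores 0, while every later call returns 0 until something consumes the 'x'. read_ints on "12 x -7\n" collects 12 and -7 because it discards the 'x' itself.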
