Role of else if (state == OUT) in C word count - c

I'm learning C following the book "The C Programming Language" (K&R).
I'm stuck on understanding the role of else if (state == OUT):
#define IN 1
#define OUT 0
main ()
{
int c, nw, state;
state = OUT;
nw = 0;
while ((c = getchar()) != EOF) {
if (c == ' ' || c == '\n' || c == '\t')
state = OUT;
else if (state == OUT) {
state = IN;
++nw;
}
}
printf ("%d", nw);
}
In the word counting program, I must be reading something wrong, because I fail to understand why this makes a difference compared with a simple else, since state = OUT is already the default condition. In practice I observe that it does: if I write just else,
then the statements state = IN; ++nw; count characters and not words.
The way I read it, the loop says that for each input character (stored in the variable c), if it is a space, a newline, or a tab, then its value is zero, and everything else will be 1. So I fail to see how it groups characters into words: state was already OUT before the loop, so how does else if (state == OUT) get the program to put the characters into one word?
I have been thinking about it the whole night, but I couldn't find an answer, either in my own thoughts or in the book.

To count words, we want to increment the count, with ++nw, only once per word.
If we write:
if (c == ' ' || c == '\n' || c == '\t')
state = OUT;
else {
state = IN;
++nw;
}
then ++nw will be executed every time c is not one of the white-space characters space, new-line, or tab. However, by writing:
if (c == ' ' || c == '\n' || c == '\t')
state = OUT;
else if (state == OUT) {
state = IN;
++nw;
}
then ++nw will not be executed when we are already in a word (state is IN). It will be executed only when we are out of a word (state is OUT) and are going into a new word (because c is not one of the white-space characters). Thus, ++nw is executed only when we start a new word, not for each character in the word.
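To see the difference in numbers, here is a minimal sketch that runs both variants over a string instead of stdin. The function name count_words and the plain_else flag are made up for this illustration; they are not from the book.

```c
#define IN  1  /* inside a word */
#define OUT 0  /* outside a word */

/* Counts "words" in s.
   plain_else == 0: K&R's "else if (state == OUT)" version.
   plain_else != 0: the plain "else" variant from the question. */
int count_words(const char *s, int plain_else)
{
    int state = OUT;
    int nw = 0;

    for (; *s != '\0'; ++s) {
        if (*s == ' ' || *s == '\n' || *s == '\t') {
            state = OUT;
        } else if (plain_else || state == OUT) {
            /* With plain_else set, this runs for every non-blank character. */
            state = IN;
            ++nw;
        }
    }
    return nw;
}
```

For "hello world", count_words(s, 0) gives 2 (one increment per word start), while count_words(s, 1) gives 10 (one increment per non-blank character).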

cccc cc c c cccccc cc
^    ^  ^ ^ ^      ^
|    |  | | |      |
These are the two ways to count words - only the transitions count.
Geometrically this is easy and intuitive: every c that has a white space to its left.
But if you step (once, blindly) through each element, you only have to store the last character, which corresponds to the "left". This minimal memory makes this algorithm a state machine.
When you hit a letter, you only count it if you come from OUTside a word, but at the same time set the state to INside, so the next letter does not get counted. The next space then will set the trigger to "OUT".

This statement is checking character (c) values:
if (c == ' ' || c == '\n' || c == '\t')
and if any character (c) is a space or newline or tab then the state value is changed to OUT.
This statement else if (state == OUT) is checking the value of state whenever a character (c) is not a space, newline or tab.
The program assumes that words are separated by a space, newline or tab character and that each word is at least one character long.
Punctuation characters such as :;,.?! etc. are included as legitimate characters for words.
Each call to getchar reads one character, or returns EOF when there is nothing left to read.
The program completes when getchar returns EOF, i.e. there is no further input. (EOF is a special return value, not a character in the input.)
The word count is increased on the first character that is not space, newline or tab.

This program counts words by modeling a very simple state machine with two states:
OUT: We have not read a character that is part of a word
IN: We have read a character that is part of a word
The number of words is determined by the number of times we transition from the OUT to the IN state. Given an input like "This   is a     test" (note the runs of several spaces), the transitions work like this:
This   is a     test
^+++v--^+v^v----^+++
where + represents the IN state, - represents OUT state, ^ represents the transition from OUT to IN, and v represents the transition from IN to OUT. The number of words is equal to the number of times we see ^.
In order to know whether or not we're making the transition from the OUT to the IN state, we have to check the current value of state when we see a non-whitespace character; hence, the
else if (state == OUT)
as the alternate branch instead of a plain else. Otherwise we'd increment nw for every non-whitespace character, rather than on the transition from OUT to IN.
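The marker line above can be reproduced with a small sketch of the same state machine. The function name trace_states and the output buffer are assumptions made for this illustration only.

```c
#include <stddef.h>

#define IN  1
#define OUT 0

/* Writes one marker per input character into out and returns the number
   of '^' markers, i.e. the word count.
   out must have room for strlen(s) + 1 bytes. */
int trace_states(const char *s, char *out)
{
    int state = OUT;
    int nw = 0;
    size_t i;

    for (i = 0; s[i] != '\0'; ++i) {
        int is_sep = (s[i] == ' ' || s[i] == '\n' || s[i] == '\t');

        if (is_sep) {
            out[i] = (state == IN) ? 'v' : '-';  /* leaving a word, or still outside */
            state = OUT;
        } else if (state == OUT) {
            out[i] = '^';                        /* OUT -> IN: a new word starts */
            state = IN;
            ++nw;
        } else {
            out[i] = '+';                        /* already inside a word */
        }
    }
    out[i] = '\0';
    return nw;
}
```

Called on "This", three spaces, "is", one space, "a", five spaces, "test", it fills out with ^+++v--^+v^v----^+++ and returns 4.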

Related

How can I make my program recognize tabs and turn them into normal spaces?

I am currently in the process of learning/ re-learning C. I had some trouble learning it last year in school. I am using a book and one of the exercises is asking to get excessive spaces and turn them into just one space. The problem is that I can't make it recognize tabs. I have looked for some on here already but they all deal with arrays. I can't use arrays. This is what I have so far.
int main()
{
int c;
c = getchar();
while (c != EOF)
{
if (c == ' ')
{
putchar(c);
for (c = getchar(); c == ' '; c = getchar())
;
}
else if (c == '\t')
putchar(' ');
putchar(c);
c = getchar();
}
}
So basically, the code starts off by getting a char value and putting it into "c". Then, while c is not EOF, if c equals the ASCII value of SPACE it is going to output it and enter a for loop. The for loop tests for additional spaces: c takes the following char, and while c equals the ASCII value of SPACE the for loop does nothing and iterates until all additional spaces are gone. The if statement works perfectly, by the way.
Then I go into an else if which I am certain is wrong but it's my latest attempt. So for this I said else if c equals the ASCII value of tab (is that a thing? If not that might be what my error is) put down a space by using putchar(' '). I feel like that command might be wrong as well. After that statement it then exits the conditional, puts out the value then c now equals a new char and the loop continues.
Thanks!
EDIT: So right after posting this I realized, at least I think, my error is the putchar(c) at the bottom which is still printing out the tab regardless? Although I am still not sure how to approach the problem. One more thing is that these are the only commands I can use. The book still assumes I don't know how arrays and such work in C yet.
Although it's a bit unclear how subsequent spaces/tabs are to be handled, consider simply saving the previous character.
int previous = EOF;
int ch;
while ((ch = getchar()) != EOF) {
    if (ch == '\t') { // or if (isblank(ch)) ...
        ch = ' ';
    }
    // Print everything except a space that directly follows another space.
    if (ch != ' ' || previous != ' ') {
        putchar(ch);
    }
    previous = ch;
}
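The previous-character idea can also be tried out on a string, which makes it easy to check without typing input. This is only a sketch: the function name squeeze_blanks and the destination buffer are assumptions, and a function like this does use arrays, so it is for checking the logic rather than for the exercise itself.

```c
#include <stddef.h>

/* Copies src to dst, turning each tab into a space and collapsing every
   run of spaces/tabs into a single space.
   dst must have room for strlen(src) + 1 bytes. */
void squeeze_blanks(const char *src, char *dst)
{
    int previous = '\0';   /* last character written to dst */
    size_t j = 0;

    for (; *src != '\0'; ++src) {
        int ch = (*src == '\t') ? ' ' : *src;

        /* Skip a space only when the last written character was a space. */
        if (ch != ' ' || previous != ' ') {
            dst[j++] = (char)ch;
            previous = ch;
        }
    }
    dst[j] = '\0';
}
```

For the input "a \t  b\t\tc" the result is "a b c".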

How to check and validate user input is one of two valid choices

I have the following code asking for user input either A or P. I have the same sort of setup for hour and minutes, where hour would be between 1 and 12 and minutes would be between 0 and 59. That part of my code is thoroughly working.
My issue is that I don't know how to check what the timePeriod variable is and ensure that it is either A or P and to print an error message and prompt again if it is anything else including lowercase a and p. User input has to be in uppercase and ONLY A or P.
I've only put the function code here. I added the clean_stdin code as well so the while statement inside getTimePeriod might be easier to understand. As I said before, I'm using a similar set up for both the hour and minutes and that's working.
char getTimePeriod(void)
{
char timePeriod, term;
while ( (((scanf("%c%c",&timePeriod,&term)!=2 || term !='\n') && clean_stdin()) || timePeriod != "A" || timePeriod != "P") && printf("Invalid value entered. Must be A or P. Please re-enter: ") );
return timePeriod;
}
int clean_stdin()
{
while (getchar()!='\n');
return 1;
}
Edit: For those getting their panties in a twist about this being bad code, it works for me based on my assignment requirements for an Intro to C course. Hope that clarifies the noob-ness of this question as well.
Also note that
timePeriod != 'A'
does not work. I don't know why but it doesn't work.
Recommend separating user input from validation.
scanf() tries to do both at once. It is easier to handle potential wrong user input if a line of input is simply read (with fgets(), which is standard, or getline(), which is common, as noted by @Jonathan Leffler) and then parsed by various means.
#include <stdio.h>
#include <string.h>
// return choice or EOF
int GetChoice(const char *prompt, const char *reprompt, const char *choices) {
char buf[10];
puts(prompt);
while (fgets(buf, sizeof buf, stdin)) {
buf[strcspn(buf, "\n")] = 0; // drop potential trailing \n
char *p = strchr(choices, buf[0]);
if (buf[0] != '\0' && p && buf[1] == '\0') { // an empty line would otherwise match the terminator of choices
// Could fold upper/lower case here if desired.
return *p;
}
puts(reprompt);
}
return EOF;
}
int timePeriod = GetChoice("TimePeriod A or P", "Try Again", "AP");
switch (timePeriod) {
case 'A' : ...
case 'P' : ...
default: ...
Additional checks could be added. That is the best part about rolling this off to a helper function: it can be used in multiple places in code and be improved as needed in a localized manner.
OP code comments:
If user input is not as expected, it is unclear that OP's complex while() condition will properly empty the user's line of input. It certainly has trouble if EOF is encountered or if the first char is a '\n'.
timePeriod != "A", as commented by @Alan Au, is not the needed code. That compares timePeriod to the address of the string "A". Use timePeriod != 'A'.
clean_stdin() should be clean_stdin(void). It is an infinite loop on EOF. Consider:
int ch;
while ((ch = getchar()) != '\n' && ch != EOF);
The problem you're having is here:
timePeriod != "A" || timePeriod != "P"
First, as was mentioned before, you can't compare a character to a string. You need to use single quotes instead of double quotes. Assuming that's been corrected, this conditional will always be true since timePeriod will always either not be 'A' or not be 'P'. This needs to be a logical AND:
(timePeriod != 'A' && timePeriod != 'P')
Note also that an extra set of parentheses was added to make sure that the order of operations in your while expression is preserved.
Regarding the "cute" comment, what that means is that cramming a bunch of statements in the while expression and leaving the body blank makes your code difficult to read and therefore more prone to bugs. Had you broken that up into multiple statements each doing one logical thing you would have been more likely to find this bug yourself. Olaf made the comment he did primarily to warn other users who come across your code about just that.
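A small sketch of the two rejection tests side by side may make the OR/AND point concrete. The function names here are invented for the illustration and are not part of the OP's program.

```c
/* Returns 1 when timePeriod is exactly 'A' or 'P', 0 otherwise. */
int is_valid_period(char timePeriod)
{
    return timePeriod == 'A' || timePeriod == 'P';
}

/* The OR-based rejection test from the question, with the quotes fixed.
   No character equals both 'A' and 'P', so this returns 1 for every input. */
int rejects_with_or(char timePeriod)
{
    return timePeriod != 'A' || timePeriod != 'P';
}

/* The AND-based rejection test: 1 only when the input is neither letter. */
int rejects_with_and(char timePeriod)
{
    return timePeriod != 'A' && timePeriod != 'P';
}
```

With 'A' as input, rejects_with_or returns 1 (the valid answer is rejected), while rejects_with_and returns 0. Note that rejects_with_and(c) is simply !is_valid_period(c).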

How does this else-if loop track number of words entered in C

I'm learning C using the K&R book, on a windows machine. I am trying out the program(bare bones Unix word count) which counts lines, characters, and words. Although this program correctly counts the number of characters, the no. of lines and words in my output are always 0 and 1, irrespective of what I enter. I also am somewhat stumped by one part of the program, which I'll get to next-
#include<stdio.h>
#define IN 1
#define OUT 0
int main()
{
int c,state, nc,nw,nl;
nl=nw=nc=0;
state=OUT;
while(c=getchar()!=EOF)
{
++nc;
if(c=='\n')
++nl;
if(c=='\n'||c=='\t'||c==' ')
state=OUT;
else if(state==OUT)
{
state=IN;
++nw;
}
}
printf("\n No. of characters, lines and words are : %d,%d,%d\n",nc,nl,nw);
return 0;
}
From what it looks, this program is using nc, nl and nw, respectively, to count the number of characters, lines and words entered in the input stream. My understanding of the program logic, thus far, is -
IN and OUT are two variables used to indicate the current state of the program. IN indicates that the program is currently 'inside' a word- in other words- no space, newline or tab has been encountered so far in the characters entered. Or so I think.
At the very beginning, before the while loop, the STATE is set to out. This indicates that right now, there is no word encountered.
When the while loop begins, with every character entered(unless it is EOF-Ctrl+Z), the number of character nc is incremented. In the first if statement, if the character is a newline '\n', nl is incremented. This should keep track of the number of lines encountered.
The second if statement is used to keep track of whether the program is currently inside a word or not, by setting the STATE to 0, whenever there is a blank, newline or tab. I've understood the logic thus far.
However, I am completely stumped coming to the else-if. Here, the program checks if STATE is OUT. Now, STATE will be out in two conditions:
when the program runs for the first time, and STATE is set to 0 before the while loop. Example- Consider the input WORD. Here, before W is encountered, STATE is set to 0.
Now that STATE is 0, and input is W we come to the else if statement. The next input after W is O. So, STATE is set to 1(indicating the program is inside a word), and the word count is incremented.
But, since the original input was WORD, what happens when R is encountered? What is the value of STATE now? Is it still 1 because it was set to 1 inside the last else-if statement? But then again, if that is 1, there is no condition for when STATE is 1.
Lastly, it's obvious that the program is flawed in some way, because in my sample output below, the number of lines and words are always fixed(0 and 1).
hello word
good morning
^Z
No. of characters, lines and words are : 24,0,1
I understand that my question is very long, but I'm really stumped and looking for answers to two major points:
How does the else-if statement logic work.
Why is the program throwing incorrect output.
Many thanks for your help
You are getting wrong output because you are missing parentheses:
while((c=getchar())!=EOF)
^ ^
Without them you always compare the return value of getchar() with EOF and assign the result of this comparison to c. That is, c will always be either 1 or 0.
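Here is a minimal sketch of what each form stores in c. To keep it testable it reads from a string rather than stdin; next_char is a hypothetical stand-in for getchar(), and the other function names are also made up.

```c
#include <stdio.h>

/* Hypothetical stand-in for getchar(): returns the next character of *p,
   or EOF at the end of the string. */
static int next_char(const char **p)
{
    if (**p == '\0')
        return EOF;
    return (unsigned char)*(*p)++;
}

/* What "c = getchar() != EOF" computes: the comparison happens first,
   so c receives 1 or 0, never the character itself. */
int read_without_parens(const char **p)
{
    int c;
    c = (next_char(p) != EOF);   /* same parse as: c = next_char(p) != EOF */
    return c;
}

/* The intended form: assign the character first, then compare it. */
int read_with_parens(const char **p)
{
    int c;
    if ((c = next_char(p)) != EOF)
        return c;
    return EOF;
}
```

When the next input character is '\n', read_without_parens yields 1 while read_with_parens yields '\n'. That fits the output in the question: c is never equal to '\n', so nl stays 0, and c is never a blank, so state goes IN once and nw stops at 1.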
How does the else-if statement logic work.
The if statement checks for a newline, a space or a tab, any of which ends a word; if it finds one, it sets the "state" variable to OUT.
On a later loop turn, if the "c" variable is not a newline, tab or space, and the "state" variable is OUT, the else if branch runs.
The else if branch increments nw, because a character that follows a space, a tab or a newline (and is not another one of them) starts a new word. It also sets the "state" variable to IN, so the remaining characters of the same word are not counted again.
EXAMPLE:
"WORD" => "W" -> nc++ nw++ state=IN, "O" -> nc++ state=IN, "R" -> nc++ state=IN, "D" -> nc++ state=IN
"WO RD" => "W" -> nc++ nw++ state=IN, "O" -> nc++ state=IN, " " -> nc++ state=OUT, "R" -> nc++ nw++ state=IN, "D" -> nc++ state=IN
And if you want to understand it more easily, add this just after the while statement:
while((c=getchar())!=EOF)
{
printf("number of char = %d, number of words = %d, number of lines = %d, state = %d\n",nc,nw,nl,state);
So you'll see what the code does after each loop turn.
Here is a very basic walk-through the fixed code. I hope it will answer all of the original questions.
The only other suggestion is to enable and check compiler warning messages, as they often have clues about potential sources of errors. In fact, gcc and clang will warn about the original program and suggest the correct fix.
Include the standard (std) Input/Output header file
#include <stdio.h>
Use the pre-processor to define two (constant) macros, which are used to represent the state of either being IN-side a word, or OUT-side a word. In this program, "outside" means that the current character (c) is a white space.
White space being a character that does not display anything, but may modify the output, such as moving to the next character location (space), to the next tab stop (tab), or advancing to the next line (newline).
#define IN 1
#define OUT 0
Being a simple program, the program is located in the main function. That is okay for a short program like this one, but not a good idea in larger, more complex programs.
int main(int argc, char* argv[])
{
int c; /* This is a 'current' character being read from input */
int state; /* The state of being either IN- or OUT-side of a word. */
int nc; /* Count of number of characters read */
int nw; /* Count of number of "words" */
int nl; /* Line count */
nl = nw = nc = 0; /* Initialize the counts to zero */
state = OUT; /* Begin with the word 'state' being OUT-side of a word */
Get a single character from standard input (stdin), assign it to
the variable c. This is done first because of the (added) parenthesis enclosing the expression c = getchar(). Then the result of this assignment (which is equal to c) is compared to EOF (end of file).
While the contents of c are not equal to EOF, the while loop's body executes repeatedly, until the getchar() does assign an EOF to c.
while ( EOF != (c = getchar()) )
{
Since you have a new character increment the character count, nc, variable by one.
++nc;
If c is a newline, increment the number of lines, nl, count.
if (c == '\n')
++nl;
If the variable c is a newline, tab, or space, then set the state variable to OUT, because they indicate that c is not part of a "word."
if (c == '\n' || c== '\t' || c == ' ') {
state = OUT;
}
If the previous if statement did not evaluate to true, follow the else statement.
The else statement consists of a second if statement which evaluates whether state is equal to OUT. If so, then execute the next block.
else if (state == OUT)
{
This block contains the two statements, set state to IN, and increment the value of nw (word count).
state = IN;
++nw;
} /* end of "else if" block */
} /* end of while loop block */
After getchar() returns an EOF (end of file), and the while loop ends, the program prints this summary output before returning zero to the parent process (don't worry about that here, it's just house-keeping) and ending the program.
printf("\n No. of characters, lines and words are : %d, %d, %d\n", nc, nl, nw);
return 0;
} /* end of main */

C Programming Language Exercise 1.5.4 (editing error found in code?)

I'm continuing to work through the classic K&R book "The C Programming Language", Second Edition.
SCENE.
I was having problems with the word counting example on page 22: I was not getting the desired output.
QUESTION.
I think I have finally found a mistake in the code syntax?
CODE FROM ORIGINAL PDF.
The section says (copied and pasted from the PDF):
1.5.4 Word Counting The fourth in our series of useful programs counts lines, words, and characters, with the loose definition that a word is
any sequence of characters that does not contain a blank, tab or
newline. This is a bare-bones version of the UNIX program wc.
#include <stdio.h>
#define IN 1 /* inside a word */
#define OUT 0 /* outside a word */
/* count lines, words, and characters in input */
main()
{
int c, nl, nw, nc, state;
state = OUT;
nl = nw = nc = 0;
while ((c = getchar()) != EOF) {
++nc;
if (c == '\n')
++nl;
if (c == ' ' || c == '\n' || c = '\t')
state = OUT;
else if (state == OUT) {
state = IN;
++nw;
}
}
printf("%d %d %d\n", nl, nw, nc);
}
CHANGE NEEDED TO WORK
Where it says:
if (c == ' ' || c == '\n' || c = '\t')
Shouldn't it say:
if (c == ' ' || c == '\n' || c == '\t')
Please.
If my assessment is correct, the same error appears later on page 23, where the line is explained.
CONFIRMATION NEEDED.
The logic tells me exactly that, but it seems incredible that such an error is in a book like this. I need confirmation; this is freaking me out.
Thanks so much in advance for your comments.
ADDED AT USER Rohan's REQUEST:
What's difference in 2 lines? Make question more clear. – Rohan 33 mins ago.
ANSWER to Rohan:
Hello, and thanks for the question. In C,
x = 5 assigns the value 5 to the variable x, for example.
However, if what you want is to check for equality, the correct symbol is ==:
if (x == 5) says "if x is equal to 5 ...".
As you can see, the book is assigning c = '\t',
which makes no sense and was giving me errors.
"If c is equal to tab" is if (c == '\t'), where \t is the TAB character.
I hope I have cleared up any doubt.
And thanks for asking.
You can check http://www.tutorialspoint.com/cprogramming/c_operators.htm
Where you can see:
== Checks if the values of two operands are equal or not, if yes then condition becomes true.
(A == B) is not true.
Yes, you are correct in your assessment that the = needs to be replaced with a ==.
Typos happen, even in books and even in K&R.
However, you mention that you copied the code straight from a PDF. That PDF was likely OCRed from a book, and OCR software is far from perfect, so mistakes can and do happen.
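For what it's worth, || binds more tightly than =, so the mistyped line as a whole is read as an assignment to the result of the || expression. A compiler such as gcc is expected to reject that (something like "lvalue required"), which would explain the errors. The sketch below shows the intended test and, separately, what a lone = does inside a condition. The function names are invented for this illustration.

```c
/* The test as intended: 1 when c is a blank, newline or tab. */
int is_separator(int c)
{
    return c == ' ' || c == '\n' || c == '\t';
}

/* What '=' does inside a condition: it stores '\t' in *c, and the value
   of the expression is '\t' (non-zero), so the branch is always taken. */
int assigns_instead_of_comparing(int *c)
{
    if ((*c = '\t'))   /* extra parentheses mark the assignment as deliberate */
        return 1;
    return 0;
}
```

After calling assigns_instead_of_comparing(&x), x holds '\t' no matter what it held before, which is why an accidental = in a test is worth hunting down.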

Counting words in a string?

Hello, for this program I am supposed to count the number of words in a string. So far, I have found out how to find the number of characters in a string, but I am unable to figure out how to treat the letters that make up a word as a unit and count it as 1 word.
My function is:
int wordcount( char word[MAX] ){
int i, num, counter, j;
num = strlen( word );
counter = 0;
for (i = 0; i < num; i++)
{
if (word[i] != ' ' || word[i] != '\t' || word[i] != '\v' || word[i] != '\f')
{
}
}
return counter;
}
I tried some variations, but the middle part of the if statement is where I am confused. How can I count the number of words in a string? Testing for this tests if the string has multiple spaces like "Hello this is a string"
Hints only since this is probably homework.
What you're looking to count is the number of transitions between 'word' characters and whitespace. That will require remembering the last character and comparing it to the current one.
If one is whitespace and the other is not, you have a transition.
With more detail, initialise the lastchar to whitespace, then loop over every character in your input. Where the lastchar was whitespace and the current character is not, increase the word count.
Don't forget to copy the current character to lastchar at the end of each loop iteration. And it should hopefully go without saying that the word count should be initialised to 0.
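If the hints are not enough, the lastchar idea could be sketched as follows. It assumes that isspace() from <ctype.h> is an acceptable definition of whitespace (it covers space, \t, \n, \v, \f and \r); the function name is made up.

```c
#include <ctype.h>

/* Counts words by counting whitespace -> non-whitespace transitions. */
int count_words_lastchar(const char *s)
{
    int lastchar = ' ';   /* start as if a space came before the string */
    int count = 0;

    for (; *s != '\0'; ++s) {
        unsigned char ch = (unsigned char)*s;

        /* A word starts where a non-space follows a space. */
        if (isspace(lastchar) && !isspace(ch))
            ++count;
        lastchar = ch;    /* remember this character for the next iteration */
    }
    return count;
}
```

Runs of several spaces are handled naturally: "Hello   this is   a string" gives 5, and a string of only spaces gives 0.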
There is a linux util 'wc' that can count words.
have a look (it includes some explanation and a sample):
http://en.literateprograms.org/Word_count_(C)
and a link to the source
http://en.literateprograms.org/index.php?title=Special:DownloadCode/Word_count_(C)&oldid=15634
When you're in the if part, it means you're inside a word. So you can flag this inword and look whether you change from out of word (which would be your else part) to inword and back.
This is a quick suggestion — there could be better ways, but I like this one.
First, be sure to "know" what a word is made of. Let us suppose it's made of letters only. All the rest, being punctuation or "blanks", can be considered as a separator.
Then, your "system" has two states: 1) completing a word, 2) skipping separator(s).
You begin in the "skipping separators" state. When you meet a character that can be part of a word, you enter the "completing a word" state, which you keep until the next separator or the end of the whole string (in this case, you exit). When that happens, you have completed a word, so you increment your word counter by 1 and go back to the "skipping separators" state. And the loop continues.
Pseudo C-like code:
char *str;
/* someone will assign str correctly */
word_count = 0;
state = SKIPPING;
for(; (c = *str) != '\0'; str++) /* re-read c on every iteration */
{
if (state == SKIPPING && can_be_part_of_a_word(c)) {
state = CONSUMING;
/* if you need to accumulate the letters,
here you have to push c somewhere */
}
else if (state == SKIPPING) continue; // unneeded - just to show the logic
else if (state == CONSUMING && can_be_part_of_a_word(c)) {
/* continue accumulating pushing c somewhere
or, if you don't need, ... else if kept as placeholder */
}
else if (state == CONSUMING) {
/* separator found while consuming a word:
the word ended. If you accumulated chars, you can ship
them out as "the word" */
word_count++;
state = SKIPPING;
}
}
// if the state on exit is CONSUMING you need to increment word_count:
// you can rearrange things to avoid this when the loop ends,
// if you don't like it
if (state == CONSUMING) { word_count++; /* plus ship out last word */ }
the function can_be_part_of_a_word returns true if the read char is in [A-Za-z_] for example, false otherwise.
(It should work If I have not done some gross error with the abetment of the tiredness)
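A compilable sketch of the same two-state idea, for anyone who wants to run it. It assumes, as in the answer's example, that a word consists of letters and '_' only; the function name count_words_fsm and the enum are invented here, and the empty placeholder branches are dropped.

```c
#include <ctype.h>

enum scan_state { SKIPPING, CONSUMING };

/* Assumption for this sketch: a word is made of letters and '_' only. */
static int can_be_part_of_a_word(int c)
{
    return isalpha(c) || c == '_';
}

int count_words_fsm(const char *str)
{
    int word_count = 0;
    enum scan_state st = SKIPPING;
    int c;

    for (; (c = (unsigned char)*str) != '\0'; str++) {
        if (st == SKIPPING && can_be_part_of_a_word(c)) {
            st = CONSUMING;             /* a word starts here */
        } else if (st == CONSUMING && !can_be_part_of_a_word(c)) {
            word_count++;               /* a separator ends the word */
            st = SKIPPING;
        }
    }
    if (st == CONSUMING)                /* the string ended inside a word */
        word_count++;
    return word_count;
}
```

Because punctuation and digits count as separators here, "Hello, world!" gives 2 and "123" gives 0. This version counts a word when it ends, whereas the K&R program counts it when it starts; the totals are the same as long as the final check after the loop is kept.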

Resources