Counting words in a string? - c

For this program I am supposed to count the number of words in a string. So far, I have found out how to find the number of characters in a string, but I am unable to figure out how to treat the run of letters that makes up a word as 1 word.
My function is:
int wordcount( char word[MAX] ){
int i, num, counter, j;
num = strlen( word );
counter = 0;
for (i = 0; i < num; i++)
{
if (word[i] != ' ' || word[i] != '\t' || word[i] != '\v' || word[i] != '\f')
{
}
}
return counter;
}
I tried some variations, but the body of the if statement is where I am confused. How can I count the number of words in a string? The tests for this check strings that contain multiple consecutive spaces, like "Hello this is a string".

Hints only since this is probably homework.
What you're looking to count is the number of transitions between 'word' characters and whitespace. That will require remembering the last character and comparing it to the current one.
If one is whitespace and the other is not, you have a transition.
With more detail, initialise the lastchar to whitespace, then loop over every character in your input. Where the lastchar was whitespace and the current character is not, increase the word count.
Don't forget to copy the current character to lastchar at the end of each loop iteration. And it should hopefully go without saying that the word count should be initialised to 0.
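The hints above could be turned into something like the following minimal sketch. The helper name is_separator is hypothetical, and treating '\n' as a separator is an assumption that goes beyond the characters listed in the question. Note that the test is written with == and ||, then negated, which avoids the always-true chain of != and || from the question.

```c
/* Hypothetical helper: returns 1 if c is one of the separators we care about. */
static int is_separator(char c)
{
    return c == ' ' || c == '\t' || c == '\n' || c == '\v' || c == '\f';
}

/* Counts separator -> non-separator transitions, i.e. word starts. */
int wordcount(const char *word)
{
    int counter = 0;
    char lastchar = ' ';    /* start as if a separator preceded the string */
    int i;

    for (i = 0; word[i] != '\0'; i++) {
        if (is_separator(lastchar) && !is_separator(word[i]))
            counter++;      /* the current character begins a new word */
        lastchar = word[i]; /* remember it for the next iteration */
    }
    return counter;
}
```

Because only transitions are counted, runs of several separators and leading or trailing separators need no special handling.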

There is a linux util 'wc' that can count words.
have a look (it includes some explanation and a sample):
http://en.literateprograms.org/Word_count_(C)
and a link to the source
http://en.literateprograms.org/index.php?title=Special:DownloadCode/Word_count_(C)&oldid=15634

When you're in the if part, it means you're inside a word. So you can flag this inword and look whether you change from out of word (which would be your else part) to inword and back.

This is a quick suggestion — there could be better ways, but I like this one.
First, be sure to "know" what a word is made of. Let us suppose it's made of letters only. All the rest, being punctuation or "blanks", can be considered as a separator.
Then, your "system" has two states: 1) completing a word, 2) skipping separator(s).
You begin in the "skipping separators" state. When you meet a character that can be part of a word, you enter the "completing a word" state, which you keep until the next separator or the end of the whole string (in this case, you exit the loop). When a separator ends the word, you have completed a word, so you increment your word counter by 1 and go back to the "skipping separators" state. And the loop continues.
Pseudo C-like code:
char *str;
/* someone will assign str correctly */
word_count = 0;
state = SKIPPING;
for (; (c = *str) != '\0'; str++) /* re-read c on every iteration */
{
if (state == SKIPPING && can_be_part_of_a_word(c)) {
state = CONSUMING;
/* if you need to accumulate the letters,
here you have to push c somewhere */
}
else if (state == SKIPPING) continue; // unneeded - just to show the logic
else if (state == CONSUMING && can_be_part_of_a_word(c)) {
/* continue accumulating pushing c somewhere
or, if you don't need, ... else if kept as placeholder */
}
else if (state == CONSUMING) {
/* separator found while consuming a word:
the word ended. If you accumulated chars, you can ship
them out as "the word" */
word_count++;
state = SKIPPING;
}
}
// if the state on exit is CONSUMING you need to increment word_count:
// you can rearrange things to avoid this when the loop ends,
// if you don't like it
if (state == CONSUMING) { word_count++; /* plus ship out last word */ }
The function can_be_part_of_a_word returns true if the character read is in [A-Za-z_], for example, and false otherwise.
(It should work, unless tiredness has led me into some gross error.)
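As a rough sketch, the helper described above might look like this, assuming the [A-Za-z_] definition of a word character. It relies on isalpha from <ctype.h>, which matches exactly A-Z and a-z in the default "C" locale; in other locales it may accept more characters.

```c
#include <ctype.h>
#include <stdbool.h>

/* Returns true if c may appear inside a word: a letter or an underscore. */
bool can_be_part_of_a_word(char c)
{
    /* The cast keeps the argument in the range isalpha() is defined for. */
    return isalpha((unsigned char)c) || c == '_';
}
```

Changing the definition of a word (for example, to allow digits or apostrophes) then only requires touching this one function.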

Related

Displaying a string one word per line using IN/OUT flag

I have written a program that first stores an arbitrary number of lines of text from the user. After that, it checks when a new word has come and if it does, then it prints it in a new line.
Below is my code:
#include<stdio.h>
#define IN 1 //inside a word
#define OUT 0 //outside a word
int main()
{
char s[100]; //storing the string entered by user
int c; //getchar reading variable
int i=0; //iterating variable
while((c=getchar())!=EOF)
{
s[i++]=c;
}
s[i]='\0'; //end of string
i=0;
int p=0; //stores the start of the word
int current=OUT; //flag to indicate if program is inside or outside a word
while(s[i]!='\0')
{
if(current==OUT) //when program is outside a word
{
if(s[i]!=' ' || s[i]!='\n' || s[i]!='\t') //word found
{
p=i; //store starting position of word
current=IN;
}
}
else if(current==IN) //program is inside a word
{
if(s[i]==' ' || s[i]=='\n' || s[i]=='\t') //end of word found
{
current=OUT; //flag now outside the word
for(int j=p;j<i;j++) //print from starting position of word
{
printf("%c",s[j]);
}
printf("\n"); //go next line
}
}
++i; //increment the iterator variable
}
return 0;
}
My program works well if I just enter the string in a proper manner, i.e. without any extra spaces or new lines.
But if I enter a line as follows ( notice the extra spaces and new lines):
*I am a boy
I went to Japan */
Then it prints those extra newlines and spaces along with word too, which according to me should not happen because of the IN and OUT flags.
The output was originally shown as a screenshot, which is not reproduced here.
I request you to please help me out.
I know I can do this easily with the putchar() method checking one character at one time, but I am just curious as to what I am doing wrong in this implementation.
First bug that jumps out at me:
if(s[i]!=' ' || s[i]!='\n' || s[i]!='\t')
is always true: whatever s[i] is, it differs from at least two of the three characters. You want && instead, or else wrap the condition you use in the other branch in !( ... ), for symmetry.
Or better yet, factor that out into a function, or use isspace from <ctype.h>.
Your filtering condition for determining whether a character is white space is not correct. The || operator means OR. A chain of != comparisons joined by OR evaluates to true every time, because no character can equal all of the listed values at once. You need the AND operator &&, which yields false (0 in C) as soon as one operand evaluates to false.
Besides that, there are better ways to check for white space. One idea is the isspace function from <ctype.h>. It takes the character as an int whose value must be representable as an unsigned char (or be EOF), and in the default "C" locale it reports whether that character is any of ' ', '\t', '\n', '\v', '\f' or '\r'. You can also do character checking with a switch statement:
switch(ch) {
case ' ':
// do something
break;
case '\n':
//do something
break;
}
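Putting these fixes together, a corrected version of the IN/OUT approach might look like the sketch below. It keeps the idea of remembering where the word started and printing it once the word ends. The function name print_words and its return value are assumptions made for illustration, and unlike the original it also prints a word that runs up to the end of the string.

```c
#include <ctype.h>
#include <stdio.h>

#define IN  1   /* inside a word */
#define OUT 0   /* outside a word */

/* Prints each word of s on its own line and returns the number of words. */
int print_words(const char *s)
{
    int state = OUT;
    int words = 0;
    size_t start = 0;   /* index where the current word began */
    size_t i;

    for (i = 0; ; i++) {
        int at_end = (s[i] == '\0');
        /* Treat the terminator like whitespace so the last word is flushed. */
        int is_ws = at_end || isspace((unsigned char)s[i]);

        if (state == OUT && !is_ws) {
            start = i;          /* word found: store its starting position */
            state = IN;
        } else if (state == IN && is_ws) {
            /* End of word: print the characters from start up to i. */
            printf("%.*s\n", (int)(i - start), s + start);
            words++;
            state = OUT;
        }
        if (at_end)
            break;
    }
    return words;
}
```

With the condition fixed, extra spaces and blank lines never switch the state to IN, so they are no longer printed.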

explain the logic behind the structure of code needed to account for extra spaces, in order to calculate the correct average word length

This is the question on my assignment: Write a program that prompts the user to enter a sentence (assume that a sentence can have a maximum of 50 characters). It then counts the vowels and consonants in it. It also calculates the average word length of the input sentence. Word length is the total number of alphabetic characters in the sentence divided by the total number of words in it. Words are separated by one or more spaces. All the results are displayed at the end.
So far I have been able to complete all aspects of the question but I am running into a logical error on my part. When the user inputs more than a normal amount of spaces, it messes up the answer given for average word length.
Here is my code calculating average word length:
for(i = 1; sent[i] != '\0'; i++){
if( sent[i] == ' '){
++spaceCount;
}
else if((sent[i] != ' ') && (sent[i] != '\n')){
++charCount;
}
}
avgWordLength = (charCount / (spaceCount+1)) ;
Could someone help explain the logic behind the structure of code needed to account for extra spaces, in order to calculate the correct average word length
Here is a link to a previously already answered question:
Average word length for a sentence
But my school has not taught the "getchar" function yet and I would not like to use it unless I have to. To be more clear, is there a way to complete the question without using the "getchar" function?
Here is an example of the problem when compiling and running
// Everything works good when
string: Thursday is ok
Average word length: 4.00 characters
// this is where my code falls apart
string: Thursday is ok
Average word length: 1.86 characters
Well, if you think about it, what you want to do is treat any uninterrupted series of whitespace characters as one for the purpose of computing the word count. You can include ctype.h and use the isspace function to test for all possible whitespace characters, or, if you are supposed to do it manually, at least check for space and tab characters (e.g. a mixed sequence of spaces and tabs such as " \t \t " should still be counted as a single separator).
To handle multiple whitespace characters and count the sequence as one, just set a flag (e.g. ws for whitespace) and only increment spaceCount when you encounter the first whitespace, and reset the flag if another non-whitespace character is encountered.
Putting those pieces together, you could do something like the following:
int ws = 0; /* flag to treat multiple whitespace as 1 */
for(i = 0; sent[i]; i++){
if (sent[i] == ' ' || sent[i] == '\t') {
if (!ws) {
spaceCount++;
ws = 1;
}
}
else {
charCount++; /* non-whitespace character count */
ws = 0;
}
}
(note: begin your check at i = 0 to protect against Undefined Behavior in the event sent is the empty-string.)
(note2: you can check charCount before setting your first spaceCount and check ws after leaving the loop to handle leading and trailing whitespace -- and adjust spaceCount as necessary. That is left as an exercise)
Look things over and let me know if you have any further questions.
Could someone help explain the logic behind the structure of code needed to account for extra spaces, in order to calculate the correct average word length
You could use a state machine. You have two states:
1) Looking for the end of a word.
2) Looking for the end of a space sequence.
Look at the first character in the sentence. It is either the beginning of a word or a space. This tells you if you are in state 1 or 2.
If in state 1, then look for a space or the end of the sentence. If you find a space, set your state to 2.
If in state 2, then look for a non-space or the end of the sentence. If you find a non-space then set your state to 1.
counts the vowels and consonants in it. It also calculates the average word length of the input sentence.
Could someone help explain the logic behind the structure of code needed to account for extra spaces
There really is no need to count spaces. Instead, all that is needed is to count the number of times a letter begins a word, that is, when it follows a non-letter or is the first character.
// pseudo code
sentence_stats(const char *s) {
vowels = 0;
consonants = 0;
word_count = 0;
previous = 0;
while (*s) {
if (isletter(*s)) { // OP to make isletter(), isvowel()
if (!isletter(previous)) {
word_count++; // start of word
}
if (isvowel(*s)) vowels++;
else consonants++;
} else if (*s == ' ') {
; // nothing to do
} else {
TBD_CODE_Handle_non_letter_non_space();
}
previous = *s;
s++;
}
average = (vowels + consonants)/word_count
}
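A runnable sketch of the pseudo code above could look as follows. The struct, the function name compute_sentence_stats and the helper is_vowel are hypothetical names. As a simplifying assumption, every non-letter (not only a space) is treated as a word separator, so "don't" would count as two words. The average is left at 0 when there are no words, which avoids dividing by zero.

```c
#include <ctype.h>

struct sentence_stats {
    int vowels;
    int consonants;
    int words;
    double avg_word_length;
};

/* Hypothetical helper: 1 if c is one of a, e, i, o, u in either case. */
static int is_vowel(int c)
{
    switch (tolower(c)) {
    case 'a': case 'e': case 'i': case 'o': case 'u':
        return 1;
    default:
        return 0;
    }
}

struct sentence_stats compute_sentence_stats(const char *s)
{
    struct sentence_stats st = {0, 0, 0, 0.0};
    int prev_was_letter = 0;    /* nothing precedes the first character */

    for (; *s != '\0'; s++) {
        unsigned char c = (unsigned char)*s;
        int letter = isalpha(c) != 0;

        if (letter) {
            if (!prev_was_letter)
                st.words++;     /* a letter after a non-letter starts a word */
            if (is_vowel(c))
                st.vowels++;
            else
                st.consonants++;
        }
        prev_was_letter = letter;
    }
    if (st.words > 0)
        st.avg_word_length = (double)(st.vowels + st.consonants) / st.words;
    return st;
}
```

Since spaces are never counted, the number of spaces between words has no effect on the result.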

Making a more efficient function - Word counter

I'm going through some C programming questions and I want to make sure I got the fundamentals down. Currently I'm on a word counter question:
Q: Write a function which will determine how many words are in a given string. You can assume that one or more
consecutive white spaces is a delimiter between words, and that the string you pass to your function is null terminated.
I got the thing working, but efficiency is important, and I'm wondering how it can be improved. I have to use pointers and no library other than <stdio.h>. Thanks!
#include <stdio.h>
int word_counter(char string[])
{
//We start with first word unless we have a empty string then we have no words
int count;
if(*string!='\0'){
count=1;
}
else{
count=0;
return 0;
}
//while we dont reach the end of the string
while(*string!='\0'){
//if we detect a whitespace
if(*string==' '){
//get previous character
string--;
// If previous character is not a space we increase the count
// Otherwise we dont since we already counted a word
if(*string!=' '){
count++;
}
//return pointer to current character
string++;
}
// set pointer to next character
string++;
}
return count;
}
//just to test if it works
int main(void)
{
char str[] = "Hello World!";
printf("How many words? = %i\n", word_counter(str));
return 0;
}
Looking at your code, I see there's a special case for the initial condition of an empty string. Sometimes getting the initial condition out of the way early simplifies the rest of the algorithm, sometimes you can eliminate it by changing how you look at the problem. This time it's the second one.
If you think about this as counting the boundaries between words, the algorithm becomes simpler. There are two ways to define a word boundary: from the front and from the back.
" Prestidigitation \n"
  ^               ^
  front           back
Are we looking for a non-whitespace character after a whitespace character? Or are we looking for a whitespace character after a non-whitespace character?
You also have code that looks backwards in the string (string--). That's often not safe: what if the string starts with whitespace? Then you've walked backwards off the front of the string and read memory before its first character, which is undefined behavior, so moving backwards should be avoided.
Finally, there's the problem of whether or not there's any whitespace at the end of the string. We'd have to special case the end of the string.
So looking at the first word boundary is the way to go: a non-whitespace character after a whitespace character. Instead of looking backwards, we'll track the state of the previous character (last_was_space below).
That's a non-whitespace character after a whitespace character. What if the string doesn't start with whitespace?
"Basset hounds got long ears."
 ^
 What about this?
Since we have last_was_space, we can initialize it to true and pretend the start of the string starts with whitespace. This also handles leading whitespace like " this is four words".
Finally, there are more types of whitespace than just the space character, like tab and newline and other exotic stuff. Instead of writing if( *string == ' ' || *string == '\n' || ... ) we can use switch to make things tidy and efficient. This is one of those rare cases where you want to take advantage of its "fall through" mechanic to do the same thing for multiple cases.
#include <stdio.h>
// Note that it's `const` since we don't touch the string memory.
int word_counter(const char string[]) {
// Start with no words.
int count = 0;
// Pretend the string starts with a space.
short last_was_space = 1;
// Using a for loop to make the movement of the pointer more apparent
for( ; *string!='\0'; string++ ) {
// A switch can be faster than an if/else if.
switch( *string ) {
// There's more than one type of whitespace.
// These are from isspace().
// It takes advantage of switch's fall through.
case ' ':
case '\t':
case '\n':
case '\r':
case '\v':
case '\f':
// Remember we saw a space.
last_was_space = 1;
break;
default:
if( last_was_space ) {
// Non-whitespace after space, count it
count++;
// Remember we didn't see a space.
last_was_space = 0;
}
break;
}
}
return count;
}
Normally I'd use bool from stdbool.h and isspace from ctype.h, but your exercise can only use stdio.h.
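For comparison, here is a sketch of what that variant might look like when <stdbool.h> and <ctype.h> are allowed. It is an illustration under that assumption, not part of the exercise's permitted solution.

```c
#include <ctype.h>
#include <stdbool.h>

int word_counter(const char *string)
{
    int count = 0;
    bool last_was_space = true;     /* pretend whitespace precedes the string */

    for (; *string != '\0'; string++) {
        /* isspace() expects a value in unsigned char range. */
        bool is_space = isspace((unsigned char)*string) != 0;

        if (last_was_space && !is_space)
            count++;                /* non-whitespace after whitespace */
        last_was_space = is_space;
    }
    return count;
}
```

The switch disappears because isspace covers the same six characters in the default "C" locale.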

Word count in C?

I have a problem with counting words read from standard input. The same method works OK when I count words in files.
My method is as follows: We read until ctrl+d. If the next character is a line return, increase new_lines. Otherwise, we increase the words because the next method (last if) doesn't read until first space and I lost first word. In the end If the current character is a space and next element is something other than a space, increase words.
Now I'm going to explain about problem. If I have empty line program increase words but why I use second if for this. If I don't have empty lines program work.
int status_read=1;
while (status_read > 0){ // read to end of file
status_read = read(STDOUT_FILENO, buff, 9999); // read from std
for (i = 0; i < status_read ; i++) { // until i<status_read
if (buff[i] == '\n') {
new_lines++;
if (buff[i+1]!='\n')
wordcounter++;
}
if (buff[i] == ' ' && buff[i+1]!=' ')
wordcounter++;
}
}
As @FredLarson commented, you are trying to read from standard output, not standard input (that is, you should be using STDIN_FILENO, not STDOUT_FILENO).
If I have empty line program increase words but why I use second if
for this. If I don't have empty lines program work.
That's due to
if (buff[i] == '\n') {
new_lines++;
if (buff[i+1]!='\n')
wordcounter++;
}
- to solve this problem, just don't increment wordcounter here - replace the above with
if (buff[i] == '\n') ++new_lines;
Otherwise,
we increase the words because the next method (last if) doesn't read
until first space and I lost first word.
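To avoid the problem of losing the first word on a line, as well as the problem with buff[i+1] pointed out in M Oehm's comments (when i is the last valid index, buff[i+1] reads a byte that the current read call did not fill in), I suggest changing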
if (buff[i] == ' ' && buff[i+1]!=' ')
wordcounter++;
to
if (wasspace && !isspace(buff[i])) ++wordcounter;
wasspace = isspace(buff[i]);
- with wasspace defined and initialized as int wasspace = 1; before the file read loop, and <ctype.h> included for isspace.
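One way to package the suggestion above is to keep the counters and the wasspace flag in a small struct, so that a word split across two read calls is still counted once. This is only a sketch: the names wc_state and wc_feed are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

struct wc_state {
    long new_lines;
    long words;
    int wasspace;   /* 1 if the previous character was whitespace */
};

/* Processes n bytes from buf, updating the counters in st. */
void wc_feed(struct wc_state *st, const char *buf, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        int space = isspace((unsigned char)buf[i]) != 0;

        if (buf[i] == '\n')
            st->new_lines++;
        if (st->wasspace && !space)
            st->words++;        /* first character of a new word */
        st->wasspace = space;   /* carried over to the next chunk as well */
    }
}
```

In the read loop, the state would start as { 0, 0, 1 }, and each chunk returned by read(STDIN_FILENO, buff, sizeof buff) with a positive length would be passed to wc_feed. Only buff[i] is ever examined, so nothing is read beyond the bytes that read actually returned.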

How to erase every occurrence of vowels in a string

In a school assignment we are asked to remove every occurrence of a vowel from a string.
So:
"The boy kicked the ball" would result in
"Th by kckd th bll"
Whenever a vowel is found, all the subsequent characters somehow have to shift left, or at least that's my approach. Being that I just started learning C, it may very well be that it's a ridiculous approach.
What I'm trying to do is: when I hit the first vowel, I "shift" the next char ([i+1]) to the current position (i). The shifting then has to continue for every subsequent character, so int startshift is set to 1 so that the first if block executes on every subsequent iteration.
The first if block also tests whether the next char is a vowel. Without such a test, any character preceding a vowel would "transform" into the adjacent vowel, and every vowel except the first would still be present. However, this resulted in every vowel being replaced by the preceding char, hence the else if block.
Anyway, this ugly code is what I've come up with so far. (The names used for the char* pointers make no sense (I just don't know what to call them), and having two sets of them is probably redundant.)
char line[70];
char *blank;
char *hlp;
char *blanktwo;
char *hlptwo;
strcpy(line, temp->data);
int i = 0;
int j;
while (line[i] != '\n') {
if (startshift && !isvowel(line[i+1])) { // need a test for [i + 1] is vowel
blank = &line[i+1]; // blank is set to point to the value of line[i+1]
hlp = &line[i]; // hlp is set to point to the value of line[i]
*hlp = *blank; // shifting left
} else if (startshift && isvowel(line[i+1])) {
blanktwo = &line[i+1];
hlptwo = &line[i];
*hlptwo = *blanktwo;
//*hlptwo = line[i + 2]; // LAST MOD, doesn't work
}
for (j = 0; j < 10; j++) { // TODO: j < NVOWELS
if (line[i] == vowels[j]) { // TODO: COULD TRY COPY EVERYTHING EXCEPT VOWELS
blanktwo = &line[i+1];
hlptwo = &line[i];
*hlptwo = *blanktwo;
startshift = 1;
}
}
i++;
}
printf("%s", line);
The code doesn't work.
with text.txt:
The boy kicked the ball
He kicked it hard
./oblig1 remove test.txt produces:
Th boy kicked the ball
e kicked it hard
NB. I've omitted the outer while loop used for iterating the lines in the text file.
Just some food for thought, since this is homework and I don't want to spoil the fun:
You might also tackle this problem without using a second 'temp->data' buffer. If the given input string is in a modifiable memory chunk, like
char data[] = "The boy kicked the ball";
You could also write a program which maintains two pointers into the buffer:
One pointer points to the position in the string where the next non-vowel would need to be written; this pointer is advanced whenever a non-vowel was written.
The second pointer points to the position in the string where the next character to consider is read from; this pointer is advanced whenever a character is read.
If you think about it, you can see that the first pointer will not advance as fast as the second pointer (since every character is read, but not every character is written out - vowels are skipped).
If you go for this route, consider that you may need to terminate the string properly.
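A minimal sketch of the two-pointer idea described above, assuming the string lives in modifiable memory and that both upper- and lower-case a, e, i, o, u count as vowels. The names remove_vowels, is_vowel, src and dst are hypothetical.

```c
#include <string.h>

/* Hypothetical helper: 1 if c is an upper- or lower-case vowel. */
static int is_vowel(char c)
{
    /* strchr() would also match the terminator, so exclude it explicitly. */
    return c != '\0' && strchr("aeiouAEIOU", c) != NULL;
}

/* Removes all vowels from s in place. */
void remove_vowels(char *s)
{
    const char *src;    /* next character to read */
    char *dst = s;      /* position where the next kept character goes */

    for (src = s; *src != '\0'; src++) {
        if (!is_vowel(*src))
            *dst++ = *src;  /* keep it; vowels are simply skipped */
    }
    *dst = '\0';        /* terminate the shortened string */
}
```

The write pointer never overtakes the read pointer, so no character is overwritten before it has been read.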
Try using the standard containers and objects (note that this is C++, not C):
#include <iostream>
#include <string>
#include <vector>
std::string editStr = "qweertadoi";
std::vector<char> vowels{'i', 'o', 'u', 'e', 'a'};
int main() {
for(unsigned int i = 0; i<editStr.size(); i++){
for(char c: vowels){
if(editStr.at(i) == c){
editStr.erase(i--,1);
break;
}
}
}
std::cout << editStr << std::endl;
return 0;
}

Resources