Counting words program in C, ignoring double spaces and punctuation

I am doing a course on the basics of C programming. I've been given a task to create a program that counts the number of words in a sentence, and I've achieved this. However, I have a secondary task: stop the program from counting punctuation, and if I type consecutive spaces, the program needs to ignore them. I don't know how to get round it. Could anyone point me in the right direction? I am not looking for anyone to write the code for me.
Here is my code:
#include <stdio.h>
int main()
{
const char end = '.';
int words = 1;
printf("please enter a sentence: \n");
char c = getchar();
while (c != end)
{
c = getchar();
if (c == ' ')
words++;
}
printf("the total number of words is %d", words);
getchar();
getchar();
}

Working from the code you give (i.e. counting words by finding delimiting spaces), you can solve the multiple-spaces problem by remembering the last character and ignoring subsequent spaces:
while (c != end)
{
c = getchar();
if (c == ' ' && previous_c != ' ')
words++;
previous_c = c;
}
Note, though, that if the user begins the input with a single space, the program will still count it as one word. To prevent this, initialise previous_c to some known value (e.g. 0) and check for that case too. The if condition then becomes (c == ' ' && previous_c != ' ' && previous_c != 0). The 0 case has to be excluded with &&: OR-ing previous_c == 0 with previous_c != ' ' changes nothing, because 0 is already different from ' '.
As Cool Guy commented, the program you've shown already ignores punctuation as-is.
As another improvement, I would suggest using a do...while loop instead of the while loop, to reduce the number of places where you need to call getchar().
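Putting the previous-character idea and the do...while suggestion together gives something like the minimal sketch below. It is a variant, not the answer's exact code: the function name count_words_until is hypothetical, it reads from a FILE * so it can be driven by stdin, and it counts a word at each space-to-non-space transition instead of counting spaces, so leading spaces need no special case.

```c
#include <stdio.h>

/* Hypothetical helper: counts words read from `in`, stopping at `end`
   or EOF. A word is counted when a non-space character follows a space
   (or starts the input), so runs of spaces are ignored. */
int count_words_until(FILE *in, int end)
{
    int words = 0;
    int c;
    int previous_c = ' '; /* pretend the input starts right after a space */

    do {
        c = getc(in); /* the only place a character is read */
        if (c != ' ' && c != end && c != EOF && previous_c == ' ')
            words++;  /* first character of a new word */
        previous_c = c;
    } while (c != end && c != EOF);

    return words;
}
```

From main, call count_words_until(stdin, '.') after printing the prompt. Punctuation glued to a word (such as "hello,") stays part of that word, but a comma standing alone between spaces would still be counted as a word.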

I would split it into 2 loops: one that skips all non-word characters and a second that skips the word characters. A word starts between these two loops, so that is where it gets counted.
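#include <stdio.h>
#include <ctype.h> /* isalpha */
int main(void)
{
    int c; /* int, not char, so that EOF can be detected */
    int words = 0;
    printf("please enter a sentence: \n");
    for (;;) {
        /* skip everything that is not part of a word */
        while ((c = getchar()) != EOF && !isalpha(c));
        if (c == EOF) break;
        words++; /* first letter of a new word */
        /* skip the rest of the word */
        while ((c = getchar()) != EOF && isalpha(c));
        if (c == EOF) break;
    }
    printf("the total number of words is %d\n", words);
    return 0;
}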

Related

C program to count symbols in a file (without \n \t ' ') and print the line with the most of the symbols and the line with the least amount of symbols

So I have this big assignment for university. I am a beginner in programming and I need a little bit of help with this program I am writing. Basically, I have to write a program which counts the symbols and characters on every line in a file (without '\n', '\t', ' ') and prints out the line with the most symbols and characters and the one with the least.
For example if I have in the file
1.Hello my name is Martin
2.I love meatballs
3.Bacon is great
It should print out for example:
"Line number 1 has the most characters and line number 3 has the least characters".
for (c = getc(fp); c != EOF; c = getc(fp))
{
if(c=='\n')
{
lines++;
}
if (c != '\n' && c != '\t' && c!= ' ')
{
count++;
}
}
I made the symbol count and the line count, but everything else is darkness. Please help, thank you.
** Edit ** My apologies, thanks for pointing out the inadequacies.
From what the assignment description sounds like, you don't need to count the total number of symbols in the entire text file, but the total number of symbols per line. Have a few variables for line number with least symbols, the least amount of symbols, the line number with the most symbols, and the largest amount of symbols. Each time as you read a line, have a temporary variable that counts the non-whitespace symbols as long as the character is not '\n'. Once you reach '\n', check if 1) the current line count is either smaller than the smallest count or 2) the current line count is larger than the largest count. If so, update the new counts and the new line numbers.
I'd code it as this:
int lineNO = 1;
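int leastLine = 0, mostLine = 0, currCount = 0;
int leastCount = INT_MAX, mostCount = 0; /* INT_MAX needs <limits.h>; it lets the first line become the minimum */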
int c;
while ((c = getc(fp)) != EOF) {
if (c == '\n') {
/* check if currCount < leastCount or if currCount > mostCount */
/* if so, update the necessary lineNO's */
currCount = 0;
lineNO++;
}
if (c != '\n' && c != '\t' && c!= ' ') {
currCount++;
}
}
Thanks so much for your help. I was able to write the program and find the leastLine; here is the code:
int lineNO = 1;
int mostLine, currCount, leastCount[200], mostCount = 0;
int c;
int len=0;
int leastLine[200];
while ((c = getc(fp)) != EOF) {
if (c == '\n') {
currCount = 0;
lineNO++;
leastLine[lineNO]=lineNO;
}
if (c != '\n' && c != '\t' && c!= ' ') {
currCount++;
leastCount[lineNO]=currCount;
}
}
int i;
for(i=0;i<=lineNO;i++){
if(leastCount[0]>leastCount[i]){
leastCount[0]=leastCount[i];
leastLine[0]=leastLine[i];
}
}
printf("%d",leastLine[0]);
But there are some problems:
1. If I change this line so that leastCount[0]>leastCount[i] uses less than instead of greater than, I get something like "-327806".
2. In my "text.txt" file (the file where I test my program), if line 1 is the leastLine, the program doesn't count it as the leastLine.
I get this when I modify the program to print all numbers and lines: https://pasteboard.co/GY2wLq7.jpg
Instead of using the arrays to keep track of the line numbers and their lengths and then iterating over the arrays to check which line number has the least characters, I think it would be easier to just keep track of the counts with the variables that I used above: leastLine, mostLine, currCount, leastCount, mostCount.
What I would do in the commented area is:
if (currCount < leastCount) {
leastCount = currCount;
leastLine = lineNO;
}
if (currCount > mostCount) { /* a separate if, not else if, so one line can update both */
mostCount = currCount;
mostLine = lineNO;
}
You would also need to do this check at the end, after the while loop, because you wouldn't be able to check the last line otherwise. (Unless you slightly modify the while loop.)
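Putting these pieces together, a minimal sketch of the whole loop might look like this. It assumes the symbol definition from the question (everything except '\n', '\t' and ' ') and, as a hypothetical packaging choice, returns the results in a made-up struct line_stats from a made-up function line_extremes(FILE *). Instead of repeating the check after the loop unconditionally, it uses a pending flag so the last line is checked only when it has no trailing newline, and it uses two separate ifs so the first line can set both extremes.

```c
#include <stdio.h>

/* Hypothetical result type for this sketch. */
struct line_stats {
    int lines;      /* number of lines seen */
    int leastLine;  /* line number with the fewest symbols */
    int leastCount;
    int mostLine;   /* line number with the most symbols */
    int mostCount;
};

/* Records one finished line. Line 1 always initialises both extremes;
   ties keep the earlier line because the comparisons are strict. */
static void finish_line(struct line_stats *s, int lineNO, int currCount)
{
    s->lines = lineNO;
    if (lineNO == 1 || currCount < s->leastCount) {
        s->leastCount = currCount;
        s->leastLine = lineNO;
    }
    if (lineNO == 1 || currCount > s->mostCount) {
        s->mostCount = currCount;
        s->mostLine = lineNO;
    }
}

struct line_stats line_extremes(FILE *fp)
{
    struct line_stats s = {0, 0, 0, 0, 0};
    int lineNO = 1, currCount = 0;
    int pending = 0; /* characters read since the last '\n'? */
    int c;

    while ((c = getc(fp)) != EOF) {
        if (c == '\n') {
            finish_line(&s, lineNO, currCount);
            currCount = 0;
            pending = 0;
            lineNO++;
        } else {
            pending = 1;
            if (c != '\t' && c != ' ')
                currCount++;
        }
    }
    if (pending) /* the last line had no trailing '\n' */
        finish_line(&s, lineNO, currCount);
    return s;
}
```

Open the file with fopen as in the question, pass the FILE * in, and read s.leastLine and s.mostLine. With the three example lines from the question, line 1 has 21 symbols and line 3 has 14, so line 1 is reported as the most and line 3 as the least.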
If you really want to use an array to keep track of the lengths, I'd suggest an array declared like this: int lineLengths[200], where the index represents the line number and the value at that index is the number of characters. Use indices 1-199 rather than 0-199: line numbers start at 1, so leaving element 0 empty avoids confusion. Then you can iterate over this array to find the smallest and largest counts and store the line numbers for both.
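For the array variant, a sketch could look like this. MAX_LINES, read_line_lengths and find_extremes are hypothetical names; index 0 is left unused as suggested, and lines beyond the array's capacity are simply not recorded.

```c
#include <stdio.h>

#define MAX_LINES 200 /* as in the answer: usable indices 1..199 */

/* Fills lengths[1..n] with each line's count of characters other than
   '\n', '\t' and ' ', and returns n (0 for empty input). */
int read_line_lengths(FILE *fp, int lengths[MAX_LINES])
{
    int lineNO = 1, c, pending = 0;

    lengths[lineNO] = 0;
    while ((c = getc(fp)) != EOF) {
        if (c == '\n') {
            pending = 0;
            if (lineNO + 1 >= MAX_LINES)
                return lineNO;          /* array full: stop recording */
            lengths[++lineNO] = 0;      /* start the next line */
        } else {
            pending = 1;
            if (c != '\t' && c != ' ')
                lengths[lineNO]++;
        }
    }
    /* a trailing '\n' opened a slot that holds no real line */
    return pending ? lineNO : lineNO - 1;
}

/* Scans lengths[1..n]; both results are 0 when n is 0. */
void find_extremes(const int lengths[], int n, int *leastLine, int *mostLine)
{
    int i;

    *leastLine = *mostLine = (n > 0) ? 1 : 0;
    for (i = 2; i <= n; i++) {
        if (lengths[i] < lengths[*leastLine])
            *leastLine = i;
        if (lengths[i] > lengths[*mostLine])
            *mostLine = i;
    }
}
```

Keeping the scan in its own function means the array has been completely filled before any comparison happens, which avoids reading uninitialised elements.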

C: Creating a program that counts how many words are in a sentence and also capitalizes all the letters in the sentence.

So I have this code, but when I run it, it always says the number of words is 1 no matter how many I enter; hopefully it is an easy fix. I tried changing the scanf to just %s, but that didn't work: it only printed out the first word, although it got the number of words right.
#include <stdio.h>
int main()
{
int words = 0;
char ch,sen[100]="", i;
printf("Enter a sentence ended by a '.', a '?', or a '!':");
scanf("%[^\n]", sen);
while ((ch = getchar()) != '\n') {
if (ch == ' ')
words++;
}
words++;
for(i=0;sen[i];i++)
{
if( (sen[i]>=97) && (sen[i]<=122) )
sen[i]-=32;
}
printf("Capitalized sentence: %s\n", sen);
printf("Total number of words:%d\n", words);
return 0;
}
Your program has a major bug. scanf("%[^\n]", sen) does not read or store the newline, so the newline is left in the input and is then read by getchar(). This loop therefore stops at its very first test and its body never runs:
while ((ch = getchar()) != '\n') {
if (ch == ' ')
words++;
}
Hence you always get 1 word (from the words++ after the loop). Why are you using two methods to take input?
Either use scanf() and work on the variable sen, or use getchar() and store the characters one by one in sen:
// don't use scanf() in this case
i = 0;
while ((ch = getchar()) != '\n' && i < 99) { // leave room for the '\0'
    if (ch == ' ')
        words++;
    sen[i++] = ch;
}
sen[i] = '\0';
The recommended way to read this kind of input is fgets(); it is worth learning about.
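A sketch of the fgets() route follows. It assumes a line fits in the buffer, strips the newline with strcspn(), and swaps the 97..122 arithmetic for toupper() from <ctype.h>. The helper names read_sentence and capitalize_and_count are hypothetical. Counting a word at each whitespace-to-non-whitespace transition, rather than counting spaces, also keeps repeated spaces from inflating the count.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Reads one line with fgets() and removes the trailing newline, if any.
   Returns 0 on EOF or read error. */
int read_sentence(char *buf, int size, FILE *in)
{
    if (fgets(buf, size, in) == NULL)
        return 0;
    buf[strcspn(buf, "\n")] = '\0';
    return 1;
}

/* Capitalizes `sen` in place and returns the number of words, where a
   word is a run of non-whitespace characters. */
int capitalize_and_count(char *sen)
{
    int words = 0, in_word = 0;
    size_t i;

    for (i = 0; sen[i] != '\0'; i++) {
        unsigned char ch = (unsigned char)sen[i]; /* ctype needs unsigned char values */
        if (isspace(ch)) {
            in_word = 0;
        } else {
            if (!in_word)
                words++; /* a new word starts here */
            in_word = 1;
        }
        sen[i] = (char)toupper(ch);
    }
    return words;
}
```

In main this would be used roughly as: char sen[100]; if (read_sentence(sen, sizeof sen, stdin)) { int words = capitalize_and_count(sen); /* print sen and words */ }.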

K&R C Exercise 1-9 *almost* solved

K&R C Exercise 1-9 states:
Write a program to copy its input to its output, replacing each string of one or more blanks by a single blank.
I have nearly solved this exercise, but the code I've written (see below) always prints an extra space before the first nonspace character. So input that looks like this
X(space)(space)X(space)(space)X(space)(space)X
results in output that looks like this
(space)X(space)X(space)X(space)X
#include <stdio.h>
int main()
{
int c; //current input character
int s; //consecutive input space counter
c = getchar();
s = 0;
while ((c = getchar()) != EOF){
if (c == ' '){
++s;
if (s == 1) //uses the counter to print only the
putchar(' '); //first space in each string of spaces
}
else {
putchar(c);
if (s != 0) //resets the space counter when it
s = 0; //encounters a non-space input character
}
}
return 0;
}
Why does my code always print a leading space when I run it?
How can I modify this code to print the first input character first instead of a leading space?
Do not throw away the first char, as David Hoelzer pointed out:
// Commented out
//c = getchar();
s = 0;
while ((c = getchar()) != EOF){
Also note unbalanced } near return 0;
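With the first getchar() removed, the whole exercise could look like the sketch below. The loop is moved into a hypothetical squeeze_blanks(FILE *in, FILE *out) function so it can be tested; calling squeeze_blanks(stdin, stdout) from main gives the exercise's behaviour. The if (s != 0) test before the reset is dropped, because setting s to 0 unconditionally has the same effect.

```c
#include <stdio.h>

/* Copies `in` to `out`, replacing each run of one or more blanks with a
   single blank. Every character, including the first, is read inside
   the loop, so nothing is lost before the loop starts. */
void squeeze_blanks(FILE *in, FILE *out)
{
    int c;     /* current input character */
    int s = 0; /* consecutive input space counter */

    while ((c = getc(in)) != EOF) {
        if (c == ' ') {
            ++s;
            if (s == 1)        /* print only the first space of a run */
                putc(' ', out);
        } else {
            putc(c, out);
            s = 0;             /* a non-space ends the run */
        }
    }
}
```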

counting the number of sentences in a paragraph in c

As part of my course, I have to learn C using Turbo C (unfortunately).
Our teacher asked us to make a piece of code that counts the number of characters, words and sentences in a paragraph (only using printf, getch() and a while loop; he doesn't want us to use any other commands yet). Here is the code I wrote:
#include <stdio.h>
#include <conio.h>
void main(void)
{
clrscr();
int count = 0;
int words = 0;
int sentences = 0;
char ch;
while ((ch = getch()) != '\n')
{
printf("%c", ch);
while ((ch = getch()) != '.')
{
printf("%c", ch);
while ((ch = getch()) != ' ')
{
printf("%c", ch);
count++;
}
printf("%c", ch);
words++;
}
sentences++;
}
printf("The number of characters are %d", count);
printf("\nThe number of words are %d", words);
printf("\nThe number of sentences are %d", sentences);
getch();
}
It does work (counts the number of characters and words at least). However when I compile the code and check it out on the console window I can't get the program to stop running. It is supposed to end as soon as I input the enter key. Why is that?
Here you have the solution to your problem:
#include <stdio.h>
#include <conio.h>
void main(void)
{
clrscr();
int count = 0;
int words = 0;
int sentences = 0;
char ch;
ch = getch();
while (ch != '\n')
{
while (ch != '.' && ch != '\n')
{
while (ch != ' ' && ch != '\n' && ch != '.')
{
count++;
ch = getch();
printf("%c", ch);
}
words++;
while(ch == ' ') {
ch = getch();
printf("%c", ch);
}
}
sentences++;
while(ch == '.' || ch == ' ') { // with && this could never be true, and a '.' would loop forever
ch = getch();
printf("%c", ch);
}
}
printf("The number of characters are %d", count);
printf("\nThe number of words are %d", words);
printf("\nThe number of sentences are %d", sentences);
getch();
}
The problem with your code is that the innermost while loop was consuming all the characters. Once you are inside it, typing a dot or a newline keeps you there, because ch is still different from a blank. And when you do exit the innermost loop, you risk getting stuck in the second loop, because ch will be a blank and therefore always different from '.' and '\n'. Since my solution reads a character only in the innermost loop, the other loops need to "eat" the blank and the dot in order to move on to the following characters.
Checking these conditions in the two inner loops makes the code work.
Notice that I removed some of your prints.
Hope it helps.
Edit: I added the instructions to print what you type and a last check in the while loop after sentences++ to check the blank, otherwise it will count one word more.
int ch;
int flag;
while ((ch = getch()) != '\r'){
++count;
flag = 1;
while(flag && (ch == ' ' || ch == '.')){
++words;//no good E.g Contiguous space, Space at the beginning of the sentence
flag = 0;
}
flag = 1;
while(flag && ch == '.'){
++sentences;
flag=0;
}
printf("%c", ch);
}
printf("\n");
I think the problem is your outer while loop's condition. It checks for a newline character '\n', and as soon as it finds one the loop terminates. You can try wrapping your code in a while loop with the following condition:
while((c=getchar())!=EOF)
This will stop taking input when the user presses Ctrl+Z (the end-of-file key on DOS/Windows consoles).
Hope this helps.
You can implement with ease an if statement using while statement:
bool flag = true;
while(IF_COND && flag)
{
//DO SOMETHING
flag = false;
}
Then just plug this into a simple solution that uses if statements.
For example:
#include <stdio.h>
#include <conio.h>
void main(void)
{
int count = 0;
int words = 1;
int sentences = 1;
char ch;
int if_flag; /* no <stdbool.h> here (and Turbo C has no bool), so use an int flag */
while ((ch = getch()) != '\n')
{
count++;
if_flag = 1;
while (ch==' ' && if_flag)
{
words++;
if_flag = 0;
}
if_flag = 1;
while (ch=='.' && if_flag)
{
sentences++;
if_flag = 0;
}
}
printf("The number of characters are %d", count);
printf("\nThe number of words are %d", words);
printf("\nThe number of sentences are %d", sentences);
getch();
}
#include <stdio.h>
#include <ctype.h>
int main(void){
int sentence=0,characters =0,words =0,c=0,inside_word = 0,temp =0;
// while ((c = getchar()) != EOF)
while ((c = getchar()) != '\n') {
//a word is complete when we arrive at a space after we
// are inside a word or when we reach a full stop
while(c == '.'){
sentence++;
temp = c;
c = 0;
}
while (isalnum(c)) {
inside_word = 1;
characters++;
c =0;
}
while ((isspace(c) || temp == '.') && inside_word == 1){
words++;
inside_word = 0;
temp = 0;
c =0;
}
}
printf(" %d %d %d",characters,words,sentence);
return 0;
}
This should do it.
isalnum checks whether the character is alphanumeric, i.e. a letter or a digit; I don't expect random ASCII characters in the sentences for this program.
isspace, as the name says, checks for whitespace.
You need the ctype.h header for these. Alternatively, you could use
while (c == ' ') and while ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'))
if you don't want to use isspace and isalnum. Your choice, but it will be less elegant :)
The trouble with your code is that each of your loops consumes characters.
A '\n' will be consumed either by the loop that scans for words or by the one that scans for sentences, so the outer loop will never see it.
Here is a possible solution to your problem:
int sentences = 0;
int words = 0;
int characters = 0;
int in_word = 0; // state of our parser
int ch;
do
{
int end_word = 1; // assume by default that a word will end here
ch = getch();
characters++; // count characters
switch (ch)
{
case '.':
sentences++; // any dot is considered end of a sentence and a word
break;
case ' ': // a space is the end of a word
break;
default:
in_word = 1; // any non-space non-dot char is considered part of a word
end_word = 0; // cancel word ending
}
// handle word termination
if (in_word && end_word) // plain C has no 'and' keyword without <iso646.h>
{
in_word = 0;
words++;
}
} while (ch != '\n');
A general approach to these parsing problems is to write a finite-state machine that will read one character at a time and react to all the possible transitions this character can trigger.
In this example, the machine has to remember if it is currently parsing a word, so that one new word is counted only the first time a terminating space or dot is encountered.
This piece of code uses a switch for concision. You can replace it with an if...else if sequence to please your teacher :).
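Following that suggestion, here is a sketch of the same state machine written with if...else if. It is not a drop-in replacement: it reads with getc() from a FILE * (pass stdin) instead of the DOS-only getch(), stops at '\n' or EOF, does not count the terminating newline as a character, and returns a hypothetical struct text_counts. It also counts a word that runs right up to the newline, which the switch version above misses because '\n' falls into its default case.

```c
#include <stdio.h>

/* Hypothetical result type for this sketch. */
struct text_counts {
    int characters, words, sentences;
};

struct text_counts count_text(FILE *in)
{
    struct text_counts n = {0, 0, 0};
    int in_word = 0; /* parser state: are we inside a word? */
    int ch;

    while ((ch = getc(in)) != EOF && ch != '\n') {
        int end_word = 1; /* a word ends unless ch is a word character */
        n.characters++;
        if (ch == '.') {
            n.sentences++; /* a dot ends a sentence and a word */
        } else if (ch == ' ') {
            /* a space only ends a word */
        } else {
            in_word = 1;  /* any other character is part of a word */
            end_word = 0;
        }
        if (in_word && end_word) { /* count each word once, when it ends */
            in_word = 0;
            n.words++;
        }
    }
    if (in_word) /* a word that runs up to the newline or EOF */
        n.words++;
    return n;
}
```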
If your teacher forced you to use only while loops, then your teacher has done a stupid thing. The equivalent code without other conditional expressions will be heavier, less understandable and redundant.
Since some people seem to think it's important, here is one possible solution:
int sentences = 0;
int words = 0;
int characters = 0;
int in_word = 0; // state of our parser
int ch;
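// read initial character
ch = getch();
// do it with only while loops; ch = -1 marks the character as handled
while (ch != '\n')
{
    // count characters
    characters++;
    // a space or a dot ends the current word, if there is one
    while (in_word && (ch == ' ' || ch == '.'))
    {
        in_word = 0;
        words++;
    }
    // detect sentences
    while (ch == '.')
    {
        sentences++;
        ch = -1;
    }
    // skip spaces
    while (ch == ' ')
    {
        ch = -1;
    }
    // any other character starts or continues a word
    while (ch != -1)
    {
        in_word = 1;
        ch = -1;
    }
    // read next character
    ch = getch();
}
// count a word that runs up to the end of the line
while (in_word)
{
    in_word = 0;
    words++;
}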
Basically, you can replace if (c == xxx) ... with while (c == xxx) { c = -1; ... }, which is an artificial, contrived way of programming.
An exercise should not promote stupid ways of doing things, IMHO.
That's why I suspect you misunderstood what the teacher asked.
Obviously if you can use while loops you can also use if statements.
Trying to do this exercise with only while loops is futile and results in something that has little or nothing to do with real parser code.
All these solutions are incorrect. The only way you can solve this is by creating an AI program that uses Natural Language Processing which is not very easy to do.
Input:
"This is a paragraph about the Turing machine. Dr. Allan Turing invented the Turing Machine. It solved a problem that has a .1% change of being solved."
Check out OpenNLP:
https://sourceforge.net/projects/opennlp/
http://opennlp.apache.org/

Average word length for a sentence

I want to calculate average word length for a sentence.
For example, given input abc def ghi, the average word length would be 3.0.
The program works but i want to ignore extra spaces between the words. So, given the following sentence:
abc  def
(two spaces between the words), the average word length is calculated to be 2.0 instead of 3.0.
How can I take into account extra spaces between words? These are to be ignored, which would give average word length of 3.0 in the example above, instead of the erroneously calculated 2.0.
#include <stdio.h>
#include <conio.h>
int main()
{
char ch,temp;
float avg;
int space = 1,alphbt = 0,k = 0;
printf("Enter a sentence: ");
while((ch = getchar()) != '\n')
{
temp = ch;
if( ch != ' ')
{
alphbt++;
k++; // To ignore spaces before first word!!!
}
else if(ch == ' ' && k != 0)
space++;
}
if (temp == ' ') //To ignore spaces after last word!!!
printf("Average word lenth: %.1f",avg = (float) alphbt/(space-1));
else
printf("Average word lenth: %.1f",avg = (float) alphbt/space);
getch();
}
The counting logic is awry. This code seems to work correctly with both leading and trailing blanks, and multiple blanks between words, etc. Note the use of int ch; so that the code can check for EOF accurately (getchar() returns an int).
#include <stdio.h>
#include <stdbool.h>
int main(void)
{
int ch;
int numWords = 0;
int numLetters = 0;
bool prevWasASpace = true; //spaces at beginning are ignored
printf("Enter a sentence: ");
while ((ch = getchar()) != EOF && ch != '\n')
{
if (ch == ' ')
prevWasASpace = true;
else
{
if (prevWasASpace)
numWords++;
prevWasASpace = false;
numLetters++;
}
}
if (numWords > 0)
{
double avg = numLetters / (float)(numWords);
printf("Average word length: %.1f (C = %d, N = %d)\n", avg, numLetters, numWords);
}
else
printf("You didn't enter any words\n");
return 0;
}
Various example runs, using # to indicate where Return was hit.
Enter a sentence: A human in Algiers#
Average word length: 3.8 (C = 15, N = 4)
Enter a sentence: A human in Algiers #
Average word length: 3.8 (C = 15, N = 4)
Enter a sentence: A human in Algiers #
Average word length: 3.8 (C = 15, N = 4)
Enter a sentence: #
You didn't enter any words
Enter a sentence: A human in AlgiersAverage word length: 3.8 (C = 15, N = 4)
Enter a sentence: You didn't enter any words
In the last but one example, I typed Control-D twice (the first to flush the 'A human in Algiers' to the program, the second to give EOF), and once in the last example. Note that this code counts tabs as 'not space'; you'd need #include <ctype.h> and if (isspace(ch)) (or if (isblank(ch))) in place of if (ch == ' ') to handle tabs better.
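The isspace() variant mentioned above could look like the following sketch. count_line is a hypothetical name; it only counts, leaving the average and the "no words" message to the caller as in the program above.

```c
#include <ctype.h>
#include <stdio.h>

/* Reads one line from `in` (up to '\n' or EOF) and counts words and
   non-whitespace characters. isspace() treats tabs like spaces. */
void count_line(FILE *in, int *numLetters, int *numWords)
{
    int ch;                /* int, so EOF is distinguishable */
    int prevWasASpace = 1; /* leading blanks are ignored */

    *numLetters = *numWords = 0;
    while ((ch = getc(in)) != EOF && ch != '\n') {
        if (isspace(ch)) { /* safe: ch is an unsigned char value here */
            prevWasASpace = 1;
        } else {
            if (prevWasASpace)
                (*numWords)++; /* space-to-non-space transition */
            prevWasASpace = 0;
            (*numLetters)++;
        }
    }
}
```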
getchar() returns an int
I am confused why you have used int ch and EOF!
There are several parts to this answer.
The first reason for using int ch is that the getchar() function returns an int. It can return any valid character plus a separate value EOF; therefore, its return value cannot be a char of any sort because it has to return more values than can fit in a char. It actually returns an int.
Why does it matter? Suppose the value from getchar() is assigned to char ch. Now, for most characters, most of the time, it works OK. However, one of two things will happen. If plain char is a signed type, a valid character (often ÿ, y-umlaut, 0xFF, formally Unicode U+00FF, LATIN SMALL LETTER Y WITH DIAERESIS) is misrecognized as EOF. Alternatively, if plain char is an unsigned type, then you will never detect EOF.
Why does detecting EOF matter? Because your input code can get EOF when you aren't expecting it to. If your loop is:
int ch;
while ((ch = getchar()) != '\n')
...
and the input reaches EOF, the program is going to spend a long time doing nothing useful. The getchar() function will repeatedly return EOF, and EOF is not '\n', so the loop will try again. Always check for error conditions in input functions, whether the function is getchar(), scanf(), fread(), read() or any of their myriad relatives.
Obviously, counting non-space characters is easy; your problem is counting words. Why count words by counting spaces, as you're doing? Or, more importantly, what defines a word?
IMO a word is defined as the transition from space character to non-space character. So, if you can detect that, you can know how many words you have and your problem is solved.
I have an implementation; there are many possible ways to implement it, and I don't think you'll have trouble coming up with one. I may post my implementation later as an edit.
*Edit: my implementation
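#include <stdio.h>
int main(void)
{
    int ch; /* int, not char, so that EOF can be detected */
    int words = 0;
    int letters = 0;
    int in_word = 0;
    printf("Enter a sentence: ");
    while ((ch = getchar()) != EOF && ch != '\n')
    {
        if (ch != ' ') {
            if (!in_word) { /* space -> non-space transition: a new word */
                words++;
                in_word = 1;
            }
            letters++;
        }
        else {
            in_word = 0;
        }
    }
    if (words > 0) /* avoid dividing by zero */
        printf("Average word length: %.1f\n", (float) letters / words);
    else
        printf("You didn't enter any words\n");
    return 0;
}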
Consider the following input: (hyphens represent spaces)
--Hello---World--
You currently ignore the initial spaces and the ending spaces, but you count each of the middle spaces, even though they are next to each other. With a slight change to your program, in particular to 'k' we can deal with this case.
#include <stdio.h>
#include <conio.h>
#include <stdbool.h>
int main()
{
    int ch; // int, not char, so that EOF can be detected
    float avg;
    int numWords = 0;
    int numLetters = 0;
    bool prevWasASpace = true; //spaces at beginning are ignored
    printf("Enter a sentence: ");
    while ((ch = getchar()) != EOF && ch != '\n')
    {
        if (ch != ' ')
        {
            prevWasASpace = false;
            numLetters++;
        }
        else if (!prevWasASpace)
        {
            numWords++; // a space after letters ends a word
            prevWasASpace = true;
        }
    }
    if (!prevWasASpace)
        numWords++; // the last word has no space after it
    if (numWords > 0)
    {
        avg = numLetters / (float)(numWords);
        printf("Average word length: %.1f", avg);
    }
    getch();
    return 0;
}
You may need to modify the preceding slightly (haven't tested it).
However, counting words in a sentence based on only spaces between words, might not be everything you want. Consider the following sentences:
John said, "Get the phone...Now!"
The TV announcer just offered a buy-1-get-1-free deal while saying they are open 24/7.
It wouldn't cost them more than $100.99/month (3,25 euro).
I'm calling (555) 555-5555 immediately on his/her phone.
A(n) = A(n-1) + A(n-2) -- in other words the sequence: 0,1,1,2,3,5, . . .
You will need to decide what constitutes a word, and that is not an easy question (btw, y'all, none of the examples included all varieties of English). Counting spaces is a pretty good estimate in English, but it won't get you all of the way.
Take a look at the Wikipedia page on Text Segmentation. The article uses the phrase "non-trivial" four times.
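To make the "decide what constitutes a word" point concrete, here is one sketch under an arbitrary, assumed definition: a word is a run of letters or digits, and a single apostrophe or hyphen between two such characters keeps the word going. It is not a text-segmentation algorithm; it only shows where such a decision lives in the code. Under this rule "I'm", "555-5555" and "buy-1-get-1-free" are one word each, "his/her" is two, and free-standing punctuation such as "--" is not a word at all.

```c
#include <ctype.h>

/* Assumed definition: letters and digits are word characters. */
static int is_word_char(int c)
{
    return isalnum((unsigned char)c);
}

/* Counts words in a NUL-terminated string under the rule above. */
int count_words(const char *s)
{
    int words = 0, in_word = 0;

    for (; *s != '\0'; s++) {
        if (is_word_char(*s)) {
            if (!in_word)
                words++;  /* a new word starts here */
            in_word = 1;
        } else if (in_word && (*s == '\'' || *s == '-') && is_word_char(s[1])) {
            /* joiner between two word characters: stay in the word */
        } else {
            in_word = 0;  /* anything else ends the current word */
        }
    }
    return words;
}
```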
