Count words from a string - c

while( (ch = fgetc( infile )) != EOF )
if(ch ==' ') words++;
It works nicely, but if the string contains blank lines, how are we supposed to detect them and still count the words correctly?

Your code does not count words, it counts spaces. In many cases the two counts would be different - for example, when words are separated by more than one space.
You need to change the logic so that you set a boolean flag, "I'm inside a word", when you see a character that belongs to a word, and run the following logic when you see a whitespace character (a space, a tab, or a newline character):
if (isspace(ch)) {
if (sawWordFlag) {
words++;
sawWordFlag = false;
}
}
One way to detect if a character belongs to a word is to call isalnum on it. Both the isalnum and isspace functions require you to include the <ctype.h> header.
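The flag logic described above can be sketched as a complete function. This is a minimal sketch, not the answerer's exact code: the name count_words is made up for illustration, it works on a string rather than a file, and it assumes a "word" is any run of non-whitespace characters. It also counts a last word that is not followed by whitespace, which the snippet above would miss.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical helper: counts words in a NUL-terminated string.
   Assumption: a word is any run of non-whitespace characters. */
int count_words(const char *s)
{
    int words = 0;
    bool in_word = false;   /* the "I'm inside a word" flag */

    for (; *s != '\0'; s++) {
        if (isspace((unsigned char)*s)) {
            if (in_word) {  /* first whitespace after a word ends it */
                words++;
                in_word = false;
            }
        } else {
            in_word = true;
        }
    }
    if (in_word)            /* last word had no whitespace after it */
        words++;
    return words;
}
```

Blank lines and repeated spaces need no special handling here, because only the first whitespace character after a word changes the count.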

So sscanf already does what you need, it will eat any number of whitespaces before a string including tabs and newlines. This algorithm works with leading or trailing spaces as well.
int words = 0;
int i = 0;
while(sscanf(inFile, "%*s%n", &i) != EOF){
inFile += i;
words++;
}
sscanf is extremely versatile; you can easily read out each word as follows:
int words = 0;
int size = strlen(inFile);
if(size > 0){
char* word = (char*)malloc((size + 1) * sizeof(char));
for(int i = 0; sscanf(inFile, "%s%n", word, &i) > 0; inFile += i){
// Do what you want with word here
words++;
}
free(word);
}

char prev = 'x'; // anything but space
while((ch = fgetc(infile)) != EOF)
{
if(ch == ' ' && ch == prev)
continue;
else if(ch == ' ' && ch != prev)
words++;
prev = ch;
}

Related

The answer outputs blanks

Program task -
Enter a string, display it word for word on the screen.
The problem is that if you type a lot of spaces between words, they will show up when you check. How can this be fixed?
#include <stdio.h>
int main()
{
int inw = 0, i = 0, count = 0;
char s[10000];
printf("Print string (max 10000 sb):\n");
gets(s);
while (s[i] != '\0') {
if (s[i] != ' ' && s[i] != '\t') {
putchar(s[i]);
}
else if (s[i] == ' ') {
printf("\n");
}
i++;
}
return 0;
}
Ugly, but this gets the job done. You just need a flag to keep track of whether or not you just printed a newline. I also cleaned up the unused variables and switched to fgets.
#include <stdio.h>
#include <stdbool.h>
int main()
{
int i = 0;
char s[10000];
bool justPrintedNewline = false;
printf("Print string (max 10000 sb):\n");
fgets(s, sizeof s, stdin);
while (s[i] != '\0') {
if (s[i] != ' ' && s[i] != '\t') {
putchar(s[i]);
justPrintedNewline = false;
}
else if (s[i] == ' ' && justPrintedNewline == false) {
printf("\n");
justPrintedNewline = true;
}
i++;
}
return 0;
}
Your algorithm is nearly right; it just needs one small fix.
Create a flag and set it to 1 after you print the line break for a space.
That way you know you have already printed one separator.
When you reach a character that is not a space, print it and reset the flag to 0.
While the flag is 1, don't print anything for further spaces; just wait for the next valid character.
Take care,
Ori
Only print a line feed when starting a word and after all is done.
Change code to:
If a space
-- print a '\n' when the prior character is a non-white-space.
Else
-- if (prior character is white-space) print a '\n'
-- print it
char prior = 'a';
while (s[i]) {
char ch = s[i];
if (ch != ' ' && ch != '\t') {
if (prior == ' ' || prior == '\t') {
putchar('\n');
}
putchar(ch);
}
prior = ch;
i++;
}
putchar('\n');
There is a bit of a trick to it: use a second, inside loop to skip past spaces and another to print words. The outer loop should only terminate if you have reached the end of the string.
while (s[i] != '\0')
{
// skip all spaces
while ((s[i] != '\0') && isspace( s[i] )) ++i;
// print the word
while ((s[i] != '\0') && !isspace( s[i] ))
{
putchar( s[i] );
++i; // advance, otherwise this loop never ends
}
// print the newline after a word
putchar( '\n' );
}
By the way, gets() is a really, really dangerous function. It should never have been included in the language. You are OK to use it for a homework, but in reality you should use fgets().
char s[1000];
fgets( s, sizeof(s), stdin );
The fgets() function is a bit more fiddly to use than gets(), but the above snippet will work for you.
Your other option for solving this homework is to use scanf() to read a word at a time from user input, and print it each time through the loop. I’ll leave that to you to look up. Don’t forget to specify your max string length in your format specifier. For example, a 100 char array would be a maximum 99-character string, so you would use "%99s" as your format specifier.
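For reference, the word-at-a-time idea could look roughly like this. It is only a sketch: print_words is a made-up name, the stream parameters are there to make it easy to try out, and it uses fscanf (scanf is the same call reading from stdin). Note that it reads until end of input rather than stopping after one line.

```c
#include <stdio.h>

/* Hypothetical helper: reads whitespace-separated words from `in`,
   prints each one on its own line to `out`, and returns the word count. */
int print_words(FILE *in, FILE *out)
{
    char word[100];     /* 99 characters plus the terminating NUL */
    int count = 0;

    /* %99s skips leading whitespace and stores at most 99 characters */
    while (fscanf(in, "%99s", word) == 1) {
        fprintf(out, "%s\n", word);
        count++;
    }
    return count;
}
```

Calling print_words(stdin, stdout) gives the behaviour the task asks for. Runs of spaces or tabs never show up in the output, because %s skips them before each word.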

Printf for string goes down a line unwillingly

This is my program (school exercise, should be receiving a string from the user, change it and return the original and new string in a certain format):
#include <stdio.h>
#define MAX_STRING_LENGTH 50
char switchChar(char c) {
if ((c >= 'A') && (c <= 'Z')) {
c = c + 32;
} else
if ((c >= 'a') && (c <= 'z')) {
c = c - 32;
}
if ((c > '5') && (c <= '9')) {
c = 56;
}
if ((c >= '0') && (c < '5')) {
c = 48;
}
return c;
}
int main(void) {
char temp;
int i = 0;
char stringInput[MAX_STRING_LENGTH + 1];
printf("Please enter a valid string\n");
fgets(stringInput, 50, stdin);
char newString[MAX_STRING_LENGTH + 1];
while ((i != MAX_STRING_LENGTH + 1) && (stringInput[i] != '\0')) {
temp = switchChar(stringInput[i]);
newString[i] = temp;
i++;
}
printf( "\"%s\"", stringInput);
printf("->");
printf( "\"%s\"", newString);
return 0;
}
When running, the output goes down a line after the string and before the last " character, although it should all be printed in the same line.
I would appreciate any directions.
There are several issues in your code:
fgets() reads and leaves the newline character at the end of the destination array if present and if enough space is available. For consistency with your algorithm, you should strip this newline. You can do this safely with stringInput[strcspn(stringInput, "\n")] = '\0'; or use a little more code if you cannot use <string.h>. The presence of this newline character explains the observed undesirable behavior.
You read a line with fgets(), but the buffer size you pass might be incorrect: it is hard-coded to 50 while the array size is MAX_STRING_LENGTH + 1. With MAX_STRING_LENGTH defined as 50 it is not a problem, but if you later change the definition of the macro, you might forget to update the size argument to fgets(). Use sizeof stringInput for consistency.
You forget to set the null terminator in newString. Testing the boundary value for i is not necessary, as stringInput is null-terminated within the array boundaries.
In switchChar(), you should not hard-code character values from the ASCII charset: it reduces portability and, most importantly, readability.
Here is a corrected and simplified version:
#include <stdio.h>
#define MAX_STRING_LENGTH 50
char switchChar(char c) {
if ((c >= 'A') && (c <= 'Z')) {
c = c + ('a' - 'A');
} else
if ((c >= 'a') && (c <= 'z')) {
c = c - ('a' - 'A');
} else
if ((c > '5') && (c <= '9')) {
c = '8';
} else
if ((c >= '0') && (c < '5')) {
c = '0';
}
return c;
}
int main(void) {
char stringInput[MAX_STRING_LENGTH + 1];
char newString[MAX_STRING_LENGTH + 1];
int i;
printf("Please enter a valid string\n");
if (fgets(stringInput, sizeof stringInput, stdin) != NULL) {
// strip the newline character if present
//stringInput[strcspn(stringInput, "\n")] = '\0';
char *p;
for (p = stringInput; *p != '\0' && *p != '\n'; p++)
continue;
*p = '\0';
for (i = 0; stringInput[i] != '\0'; i++) {
newString[i] = switchChar(stringInput[i]);
}
newString[i] = '\0';
printf("\"%s\"", stringInput);
printf("->");
printf("\"%s\"", newString);
printf("\n");
}
return 0;
}
It's because fgets() also reads in the newline character if there's room in the buffer, so it ends up in stringInput and is then copied into newString.
You can remove it with:
fgets(stringInput,50,stdin);
stringInput[strcspn(stringInput, "\n")] = 0; /* removes the trailing newline if any */
From fgets():
fgets() reads in at most one less than size characters from stream
and stores them into the buffer pointed to by s. Reading stops after
an EOF or a newline. If a newline is read, it is stored into the
buffer. A terminating null byte ('\0') is stored after the last character in the buffer.
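If you read lines in several places, the two steps can be wrapped in one small function. This is only a sketch, and read_line is a name invented here, not a standard function.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line into buf and removes the trailing
   newline, if any. Returns 1 on success, 0 on end of file or read error. */
int read_line(char *buf, int size, FILE *in)
{
    if (fgets(buf, size, in) == NULL)
        return 0;
    /* strcspn gives the index of the first '\n', or of the terminating
       NUL when there is no newline, so this write is always in bounds */
    buf[strcspn(buf, "\n")] = '\0';
    return 1;
}
```

In the program above it would be called as read_line(stringInput, sizeof stringInput, stdin).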
Your requirements contain:
get only one string
no special processing for blank characters
In that case, scanf is probably better suited than fgets, because it skips any leading blanks (spaces or tabs) and stops before the first trailing blank (space, tab, CR or newline). Remark: as scanf stops before the first blank, the string cannot contain spaces or tabs. If that is a problem, use fgets.
Just replace the line:
fgets(stringInput, 50, stdin);
with:
i = scanf("%50s", stringInput);
if (i != 1) { /* always control input function return code */
perror("Could not get input string");
return 1;
}
If you prefer to use fgets for any reason, you should remove the (optional) trailing newline:
if (NULL == fgets(stringInput, 50, stdin)) { /* control input */
perror("Could not get input string");
return 1;
}
int l = strlen(stringInput);
if ((l > 0) && (stringInput[l - 1] == '\n')) { /* test for a trailing newline */
stringInput[l - 1] = '\0'; /* remove it if found */
}

c Creating a program that counts how many words are in a sentence and also capitalizes all the letters in the sentence.

So I have this code, but when I run it, it always says that the number of words is 1 no matter how many I put in; hopefully it is an easy fix. I tried changing the scanf to just %s, but that didn't work because it only printed out the first word, although it got the number of words right.
#include <stdio.h>
int main()
{
int words = 0;
char ch,sen[100]="", i;
printf("Enter a sentence ended by a '.', a '?', or a '!':");
scanf("%[^\n]", sen);
while ((ch = getchar()) != '\n') {
if (ch == ' ')
words++;
}
words++;
for(i=0;sen[i];i++)
{
if( (sen[i]>=97) && (sen[i]<=122) )
sen[i]-=32;
}
printf("Capitalized sentence: %s\n", sen);
printf("Total number of words:%d\n", words);
return 0;
}
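Your program has a major bug. scanf() with "%[^\n]" stores the whole line in sen but leaves the newline unread. That newline is the first character getchar() returns, so the condition of this loop is false on its first check and the body never runs: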
while ((ch = getchar()) != '\n') {
if (ch == ' ')
words++;
}
Hence words is only incremented by the words++ after the loop, and you always get 1. Why are you using two methods to take input?
Either use scanf() and process the variable sen, or use getchar() and store the characters one by one in sen.
// don't use scanf() in this case
i=0;
while ((ch = getchar()) != '\n') {
if (ch == ' ')
words++;
sen[i++] = ch;
}
The recommended approach is to use fgets() to read such input. Learn about it.
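Once the whole sentence is in one buffer, a single pass can do both jobs. The sketch below is one possible way, with assumptions: capitalize_and_count is a made-up name, a word is taken to be any run of non-whitespace characters, and toupper() from <ctype.h> replaces the hard-coded 97/122/32 arithmetic.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: upper-cases `sen` in place and returns its word count.
   Assumption: a word is any run of non-whitespace characters. */
int capitalize_and_count(char *sen)
{
    int words = 0;
    int in_word = 0;

    for (size_t i = 0; sen[i] != '\0'; i++) {
        unsigned char ch = (unsigned char)sen[i];
        if (isspace(ch)) {
            in_word = 0;
        } else {
            if (!in_word)   /* first character of a new word */
                words++;
            in_word = 1;
            sen[i] = (char)toupper(ch);
        }
    }
    return words;
}
```

A caller would read the line with fgets(sen, sizeof sen, stdin) and then pass sen to the function. The trailing newline left by fgets() is whitespace, so it does not affect the count.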

How to count characters except double or single quotes?

I'm new to programming and learning C through a book.
The author of the book explains about logical operators (AND, NOT and OR) by giving the following example which counts the number of characters except the double or single quotes and period character.
I couldn't understand how it counts the number of characters except the quotes and period character. I understand that with the AND operator both conditions should be true.
#include <stdio.h>
#define PERIOD '.'
int main(void)
{
char ch;
int charcount = 0;
while ((ch = getchar()) != PERIOD)
{
if (ch != '"' && ch != '\'')
charcount++;
}
printf("There are %d non-quote characters.\n", charcount);
return 0;
}
I will try to explain you the main part of the code :
while ((ch = getchar()) != PERIOD)
{
Here, it reads every character of your text for as long as the character differs from PERIOD, which is a dot, so it simply checks all of the characters in the sentence.
if (ch != '"' && ch != '\'')
charcount++;
}
Here, it adds 1 to charcount if the condition is true. For the if condition to be true, both ch != '"' and ch != '\'' must be true. The && operator is a logical AND, and for an AND to yield true (1), both conditions must evaluate to 1. So if the character is equal to " or ', the AND yields 0, and we don't add 1 to charcount.
The condition if (ch != '"' && ch != '\'') checks whether the entered character is " or '; if it is neither, the character count is incremented, otherwise it is not. If the user entered d or #, the condition is satisfied because the character is not equal to " or ', and the count gets incremented.
Well, for each char returned by getchar() and stored in ch (from an input stream, like a keyboard or a file) it will test if it's not a double quote (ch != '"') and if it's not a quote (ch != '\'')
\ is an escape character, which means '\'' is the char '
If it's neither of them, then it will increment the counter (charcount++;).
And this will go on as long as getchar() doesn't return a period ((ch = getchar()) != PERIOD) (if it does, the PERIOD won't be counted, as the code will step out of the while loop immediately).
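It may help to see the same test written the other way round. In this sketch (count_non_quotes is a name invented here, and it scans a string instead of calling getchar()), the "is it a quote?" question is asked with OR and then negated, which is exactly the AND condition from the book.

```c
/* Hypothetical helper: counts the characters of `text` that come before the
   first '.' (or the end of the string) and are not quote characters. */
int count_non_quotes(const char *text)
{
    int count = 0;

    for (; *text != '\0' && *text != '.'; text++) {
        int is_quote = (*text == '"' || *text == '\'');
        /* !is_quote is the same test as (*text != '"' && *text != '\'') */
        if (!is_quote)
            count++;
    }
    return count;
}
```

For the text ab"c'. this returns 3: the letters a, b and c are counted, while the two quote characters and the period are not.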
You can also skip the quote characters using continue:
#include<stdio.h>
#define PERIOD '.'
int main(void)
{
char ch;
int charcount = 0;
while ((ch = getchar()) != PERIOD)
{
if (ch == '"' || ch == '\''){
continue; // quote character: skip the increment below
}
charcount++;
}
printf("There are %d non-quote characters.\n", charcount);
return 0;
}
Output:
./program
Michi"""LoL'''Another'"LoL.
There are 18 non-quote characters.

counting the number of sentences in a paragraph in c

As part of my course, I have to learn C using Turbo C (unfortunately).
Our teacher asked us to make a piece of code that counts the number of characters, words and sentences in a paragraph (only using printf, getch() and a while loop.. he doesn't want us to use any other commands yet). Here is the code I wrote:
#include <stdio.h>
#include <conio.h>
void main(void)
{
clrscr();
int count = 0;
int words = 0;
int sentences = 0;
char ch;
while ((ch = getch()) != '\n')
{
printf("%c", ch);
while ((ch = getch()) != '.')
{
printf("%c", ch);
while ((ch = getch()) != ' ')
{
printf("%c", ch);
count++;
}
printf("%c", ch);
words++;
}
sentences++;
}
printf("The number of characters are %d", count);
printf("\nThe number of words are %d", words);
printf("\nThe number of sentences are %d", sentences);
getch();
}
It does work (counts the number of characters and words at least). However when I compile the code and check it out on the console window I can't get the program to stop running. It is supposed to end as soon as I input the enter key. Why is that?
Here you have the solution to your problem:
#include <stdio.h>
#include <conio.h>
void main(void)
{
clrscr();
int count = 0;
int words = 0;
int sentences = 0;
char ch;
ch = getch();
while (ch != '\n')
{
while (ch != '.' && ch != '\n')
{
while (ch != ' ' && ch != '\n' && ch != '.')
{
count++;
ch = getch();
printf("%c", ch);
}
words++;
while(ch == ' ') {
ch = getch();
printf("%c", ch);
}
}
sentences++;
while(ch == '.' || ch == ' ') {
ch = getch();
printf("%c", ch);
}
}
printf("The number of characters are %d", count);
printf("\nThe number of words are %d", words);
printf("\nThe number of sentences are %d", sentences);
getch();
}
The problem with your code is that the innermost while loop consumes all the characters. Once you are inside it and type a dot or a newline, you stay inside that loop because ch is different from a blank. And when you do exit from the innermost loop, you risk remaining stuck in the second loop because ch will be a blank, and so always different from '.' and '\n'. Since in my solution a character is only read in the innermost loop, the other loops need to "eat" the blanks and the dots in order to move on to the following characters.
Checking these conditions in the two inner loops makes the code work.
Notice that I removed some of your prints.
Hope it helps.
Edit: I added the instructions to print what you type and a last check in the while loop after sentences++ to check the blank, otherwise it will count one word more.
int ch;
int flag;
while ((ch = getch()) != '\r'){
++count;
flag = 1;
while(flag && (ch == ' ' || ch == '.')){
++words;//no good E.g Contiguous space, Space at the beginning of the sentence
flag = 0;
}
flag = 1;
while(flag && ch == '.'){
++sentences;
flag=0;
}
printf("%c", ch);
}
printf("\n");
I think the problem is because of your outer while loop's condition. It checks for a newline character '\n', as soon as it finds one the loop terminates. You can try to include your code in a while loop with the following condition
while((c=getchar())!=EOF)
This will stop taking input when the user signals end of file, for example with Ctrl+Z on Windows.
Hope this helps..
You can easily emulate an if statement with a while statement:
bool flag = true;
while(IF_COND && flag)
{
//DO SOMETHING
flag = false;
}
Just plug it into a simple solution that uses if statements.
For example:
#include <stdio.h>
#include <conio.h>
void main(void)
{
int count = 0;
int words = 1;
int sentences = 1;
char ch;
bool if_flag;
while ((ch = getch()) != '\n')
{
count++;
if_flag = true;
while (ch==' ' && if_flag)
{
words++;
if_flag = false;
}
if_flag = true;
while (ch=='.' && if_flag)
{
sentences++;
if_flag = false;
}
}
printf("The number of characters are %d", count);
printf("\nThe number of words are %d", words);
printf("\nThe number of sentences are %d", sentences);
getch();
}
#include <stdio.h>
#include <ctype.h>
int main(void){
int sentence=0,characters =0,words =0,c=0,inside_word = 0,temp =0;
// while ((c = getchar()) != EOF)
while ((c = getchar()) != '\n') {
//a word is complete when we arrive at a space after we
// are inside a word or when we reach a full stop
while(c == '.'){
sentence++;
temp = c;
c = 0;
}
while (isalnum(c)) {
inside_word = 1;
characters++;
c =0;
}
while ((isspace(c) || temp == '.') && inside_word == 1){
words++;
inside_word = 0;
temp = 0;
c =0;
}
}
printf(" %d %d %d",characters,words,sentence);
return 0;
}
This should do it.
isalnum checks whether the character is alphanumeric, that is, a letter or a digit; I don't expect other ASCII characters in my sentences in this program.
isspace, as the name says, checks for whitespace.
You need the ctype.h header for these. Alternatively, you could use
while(c == ' ') and while((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'))
if you don't want to use isspace and isalnum; your choice, but it will be less elegant :)
The trouble with your code is that you consume the characters in each of your loops.
A '\n' will be consumed either by the loop that scans for words or by the one that scans for sentences, so the outer loop will never see it.
Here is a possible solution to your problem:
int sentences = 0;
int words = 0;
int characters = 0;
int in_word = 0; // state of our parser
int ch;
do
{
int end_word = 1; // assume by default that the word ends here
ch = getch();
characters++; // count characters
switch (ch)
{
case '.':
sentences++; // any dot is considered end of a sentence and a word
break;
case ' ': // a space is the end of a word
break;
default:
in_word = 1; // any non-space non-dot char is considered part of a word
end_word = 0; // cancel word ending
}
// handle word termination
if (in_word && end_word)
{
in_word = 0;
words++;
}
} while (ch != '\n');
A general approach to these parsing problems is to write a finite-state machine that will read one character at a time and react to all the possible transitions this character can trigger.
In this example, the machine has to remember if it is currently parsing a word, so that one new word is counted only the first time a terminating space or dot is encountered.
This piece of code uses a switch for concision. You can replace it with an if...else if sequence to please your teacher :).
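For completeness, the same state machine with if...else instead of switch might look like the sketch below. The struct text_stats and the function count_text are names made up for this illustration, and it scans a string up to the first newline instead of calling getch(), so it is not a drop-in replacement for the loop above. Unlike that loop, it does not count the terminating newline as a character, and it counts a word that is still open at the end of the input.

```c
/* Hypothetical result type for the sketch below. */
struct text_stats {
    int characters;
    int words;
    int sentences;
};

/* Hypothetical helper: scans `text` up to the first '\n' or the end of the
   string and counts characters, words and sentences. */
struct text_stats count_text(const char *text)
{
    struct text_stats st = {0, 0, 0};
    int in_word = 0;    /* state of the parser */

    for (; *text != '\0' && *text != '\n'; text++) {
        char ch = *text;
        st.characters++;
        if (ch == '.' || ch == ' ') {
            if (ch == '.')
                st.sentences++; /* any dot ends a sentence */
            if (in_word) {      /* first separator after a word ends it */
                in_word = 0;
                st.words++;
            }
        } else {
            in_word = 1;        /* anything else is part of a word */
        }
    }
    if (in_word)                /* word still open at end of input */
        st.words++;
    return st;
}
```

For the input "Hi there.  How are you. ok" it reports 26 characters, 6 words and 2 sentences; the double space does not produce an extra word because only the first separator after a word is counted.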
If your teacher forced you to use only while loops, then your teacher has done a stupid thing. The equivalent code without other conditional expressions will be heavier, less understandable and redundant.
Since some people seem to think it's important, here is one possible solution:
int sentences = 0;
int words = 0;
int characters = 0;
int in_word = 0; // state of our parser
int ch;
// read initial character
ch = getch();
// do it with only while loops
while (ch != '\n')
{
// count characters
characters++;
// count words: a space or a dot ends the word in progress
while (in_word && (ch == ' ' || ch == '.'))
{
in_word = 0;
words++;
}
// skip spaces
while (ch == ' ')
{
ch = -1;
}
// detect sentences
while (ch == '.')
{
sentences++;
ch = -1;
}
// detect words: anything not consumed above is part of a word
while (ch != -1)
{
in_word = 1;
ch = -1;
}
// read next character
ch = getch();
}
Basically you can replace if (c== xxx) ... with while (c== xxx) { c = -1; ... }, which is an artificial, contrived way of programming.
An exercise should not promote stupid ways of doing things, IMHO.
That's why I suspect you misunderstood what the teacher asked.
Obviously if you can use while loops you can also use if statements.
Trying to do this exercise with only while loops is futile and results in something that has little or nothing to do with real parser code.
All these solutions are incorrect. The only way you can solve this is by creating an AI program that uses Natural Language Processing which is not very easy to do.
Input:
"This is a paragraph about the Turing machine. Dr. Allan Turing invented the Turing Machine. It solved a problem that has a .1% change of being solved."
Check out OpenNLP
https://sourceforge.net/projects/opennlp/
http://opennlp.apache.org/
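To make the objection concrete, here is a small sketch of the rule most of the answers above rely on, "every dot ends a sentence" (count_dots is a name invented for this illustration):

```c
/* Hypothetical helper: the naive rule, one sentence per '.' character. */
int count_dots(const char *text)
{
    int dots = 0;

    for (; *text != '\0'; text++) {
        if (*text == '.')
            dots++;
    }
    return dots;
}
```

On the input above it returns 5, although the paragraph has three sentences: the dots in "Dr." and ".1%" are counted as sentence ends too.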

Resources