Word count in C?

I have a problem counting words read from standard input. I use the same method to count words in files, and there it works OK.
My method is as follows: we read until Ctrl+D. If the current character is a newline, we increase new_lines. We also increase the word count there, because the next check (the last if) does not fire until the first space, so I would lose the first word. Finally, if the current character is a space and the next one is something other than a space, we increase the word count.
Now to the problem. If the input has an empty line, the program increases the word count, even though I added the second if to prevent this. If there are no empty lines, the program works.
int status_read=1;
while (status_read > 0){ // read to end of file
status_read = read(STDOUT_FILENO, buff, 9999); // read from std
for (i = 0; i < status_read ; i++) { // until i<status_read
if (buff[i] == '\n') {
new_lines++;
if (buff[i+1]!='\n')
wordcounter++;
}
if (buff[i] == ' ' && buff[i+1]!=' ')
wordcounter++;
}
}

As @FredLarson commented, you are trying to read from standard out, not standard in (that is, you should be using STDIN_FILENO, not STDOUT_FILENO).

If I have empty line program increase words but why I use second if
for this. If I don't have empty lines program work.
That's due to
if (buff[i] == '\n') {
new_lines++;
if (buff[i+1]!='\n')
wordcounter++;
}
- to solve this problem, just don't increment wordcounter here - replace the above with
if (buff[i] == '\n') ++new_lines;
Otherwise,
we increase the words because the next method (last if) doesn't read
until first space and I lost first word.
To avoid the problem of losing the first word on a line, as well as that with buff[i+1] (see M Oehm's comments above), I suggest changing
if (buff[i] == ' ' && buff[i+1]!=' ')
wordcounter++;
to
if (wasspace && !isspace(buff[i])) ++wordcounter;
wasspace = isspace(buff[i]);
- wasspace being defined and initialized to int wasspace = 1; before the file read loop.
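Putting both changes together, a minimal sketch of the whole counter might look like the following. The struct counts type and the count_chunk and count_fd names are made up for this illustration and are not from the question. The wasspace flag lives in the struct so that a word split across two read() calls is still counted once.

```c
#include <ctype.h>
#include <stddef.h>
#include <unistd.h>

/* Running totals plus the "previous character was whitespace" flag.
   Hypothetical helper type; initialise it with { 0, 0, 1 }. */
struct counts {
    long new_lines;
    long words;
    int wasspace;
};

/* Process one chunk of n bytes, updating the totals in *c. */
void count_chunk(const char *buff, size_t n, struct counts *c)
{
    for (size_t i = 0; i < n; i++) {
        unsigned char ch = (unsigned char)buff[i];
        if (ch == '\n')
            ++c->new_lines;
        /* A word starts where whitespace is followed by non-whitespace. */
        if (c->wasspace && !isspace(ch))
            ++c->words;
        c->wasspace = isspace(ch) != 0;
    }
}

/* Read from fd until end of input (or an error) and accumulate totals. */
void count_fd(int fd, struct counts *c)
{
    char buff[4096];
    ssize_t got;
    while ((got = read(fd, buff, sizeof buff)) > 0)
        count_chunk(buff, (size_t)got, c);
}
```

To count standard input, declare struct counts c = { 0, 0, 1 }; and call count_fd(STDIN_FILENO, &c). Because no element past the end of the data is ever inspected, the buff[i+1] problem does not arise.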

Related

C program to read specific lines from a file

I'm trying to make a program that counts the lines in a file and keeps a separate count of specific lines (i.e. lines that start with a # should not be counted).
while(fgets(tempstring,sizeof(tempstring),fptr)){
lines++;
if(tempstring[0] != '#' || tempstring[0]!='\n'|| tempstring[0]!=' '){
++count;
}
Now what am I doing wrong here?
Also, I have noticed that the first time I call fgets I get ∩ as the output for tempstring[0]. Why is that?
Your condition is always true - you wanted to either use &&, or negate the overall ||:
if (tempstring[0] != '#' && tempstring[0]!='\n' && tempstring[0]!=' ')
or
if(!(tempstring[0] == '#' || tempstring[0] == '\n' || tempstring[0] == ' '))
which is equivalent. Note that you can remove the if altogether, because in C a comparison or logical expression evaluates to the int value 1 when true and 0 when false:
count += (tempstring[0] != '#' && tempstring[0]!='\n' && tempstring[0]!=' ');
Also note that fgets may or may not give you the beginning of a line, depending on sizeof(tempstring). If tempstring is not long enough to hold a whole line from the file, the next call returns a chunk from the middle of that line, and checking tempstring[0] on it gives incorrect results. This is harder to fix, because you now need to check whether the last character of the string returned by fgets is '\n' and treat the following chunk as a continuation when it is not.
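One way to handle over-long lines is to remember whether the previous chunk ended in '\n'. The sketch below assumes that approach. The count_lines name and the at_line_start flag are invented for the example, and the buffer is deliberately tiny to exercise the continuation case.

```c
#include <stdio.h>
#include <string.h>

/* Counts every line in *lines, and in *count only those lines whose first
   character is not '#', '\n' or ' '. Hypothetical helper, not from the post. */
void count_lines(FILE *fptr, long *lines, long *count)
{
    char tempstring[16];      /* small on purpose: long lines arrive in pieces */
    int at_line_start = 1;    /* is the next chunk the start of a line? */

    *lines = 0;
    *count = 0;
    while (fgets(tempstring, sizeof tempstring, fptr)) {
        if (at_line_start) {
            ++*lines;
            *count += (tempstring[0] != '#' && tempstring[0] != '\n'
                       && tempstring[0] != ' ');
        }
        /* The line is complete only if this chunk ends with a newline. */
        size_t len = strlen(tempstring);
        at_line_start = (len > 0 && tempstring[len - 1] == '\n');
    }
}
```

A chunk from the middle of a line is skipped by the at_line_start test, so a '#' or space that happens to land at tempstring[0] mid-line does not affect either count.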

How to move a file pointer to the next word in a text file?

I have a program that requires that I start from word N, hash the next N+M words (concatenated and done through another function so the original pointer is not moved), and then increment the FILE pointer that is pointing at N to the next word.
The only way I thought to do this was to increment the FILE pointer until a space is found, then increment further until we found the first character of the next word. This is necessary because the file I am reading may have multiple spaces between words which would not result in a matching string compared to a file that has the same word content but single spaces.
This method would then require ungetc(), because we would have already taken the first character of the next word from the stream.
Any ideas on a different implementation or am I pretty well restricted to this method?
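/* skip the rest of the current word (c must be an int to hold EOF) */
while ((c = fgetc(fileToHash)) != EOF && c != ' ')
;
/* skip the run of spaces that follows */
while (c != EOF && (c = fgetc(fileToHash)) == ' ')
;
if (c != EOF)
ungetc(c, fileToHash); /* put back the first character of the next word */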
Yes, if you insist on using the file pointer as your index, that's pretty much what you've got. A better solution would probably be to read part or all of the file into a buffer and manipulate your pointer into the buffer, unless you intend to do random-access overwriting of the file's contents -- which is generally completely impractical with text files.
How about this.
void findWord(FILE *f, int n) {
int c = 0;
while (n-- > 0 && c != EOF) {
do c = fgetc(f); while (c != EOF && !isalpha(c));
while (c != EOF && isalpha(c)) c = fgetc(f);
}
}
You can use fscanf to read words delimited by whitespace. This example will read each word from standard input and print each of them on a new line:
char buf[128];
/* the width 127 keeps a long word from overflowing buf */
while (fscanf(stdin, "%127s", buf) == 1)
puts(buf);
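For the original fgetc()/ungetc() approach, a small helper along these lines may be enough. This is only a sketch: skip_to_next_word is a made-up name, and it treats any whitespace (not only ' ') as a separator, which is an assumption about the input.

```c
#include <ctype.h>
#include <stdio.h>

/* Advance f past the rest of the current word and the whitespace after it,
   leaving the stream positioned at the first character of the next word.
   Returns 1 if a next word exists, 0 if end of file was reached first. */
int skip_to_next_word(FILE *f)
{
    int c;

    /* Consume the remainder of the current word. */
    while ((c = fgetc(f)) != EOF && !isspace(c))
        ;
    /* Consume the separator run, however long it is. */
    while (c != EOF && isspace(c))
        c = fgetc(f);
    if (c == EOF)
        return 0;
    ungetc(c, f);   /* one character of pushback is guaranteed by the standard */
    return 1;
}
```

Since only one character is ever pushed back, this stays within what ungetc() guarantees. Hashing the next M words without moving the position would still need ftell()/fseek() or a buffer, as suggested above.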

Counting words in a string?

Hello, for this program I am supposed to count the number of words in a string. So far I have worked out how to find the number of characters in a string, but I cannot figure out how to treat the letters that make up a word as a single word.
My function is:
int wordcount( char word[MAX] ){
int i, num, counter, j;
num = strlen( word );
counter = 0;
for (i = 0; i < num; i++)
{
if (word[i] != ' ' || word[i] != '\t' || word[i] != '\v' || word[i] != '\f')
{
}
}
return counter;
}
I tried some variations, but the body of the if statement is where I am confused. How can I count the number of words in a string? The tests check strings with multiple spaces between words, like "Hello this is a string".
Hints only since this is probably homework.
What you're looking to count is the number of transitions between 'word' characters and whitespace. That will require remembering the last character and comparing it to the current one.
If one is whitespace and the other is not, you have a transition.
With more detail, initialise the lastchar to whitespace, then loop over every character in your input. Where the lastchar was whitespace and the current character is not, increase the word count.
Don't forget to copy the current character to lastchar at the end of each loop iteration. And it should hopefully go without saying that the word count should be initialised to 0.
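For readers who want to check their own attempt against the hints, here is one possible rendering. It assumes a word is any run of non-whitespace characters, and it takes a const char * instead of the char word[MAX] parameter from the question.

```c
#include <ctype.h>

/* Count whitespace-to-word transitions, as described in the hints. */
int wordcount(const char *word)
{
    int counter = 0;
    char lastchar = ' ';   /* pretend the string is preceded by whitespace */

    for (int i = 0; word[i] != '\0'; i++) {
        if (isspace((unsigned char)lastchar) && !isspace((unsigned char)word[i]))
            counter++;     /* transition from whitespace into a word */
        lastchar = word[i];
    }
    return counter;
}
```

Runs of several spaces, and leading or trailing whitespace, produce no extra transitions, so they do not inflate the count.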
There is a Linux utility, wc, that can count words.
Have a look at this page (it includes some explanation and a sample):
http://en.literateprograms.org/Word_count_(C)
and a link to the source
http://en.literateprograms.org/index.php?title=Special:DownloadCode/Word_count_(C)&oldid=15634
When you're in the if part, it means you're inside a word. So you can flag this inword and look whether you change from out of word (which would be your else part) to inword and back.
This is a quick suggestion — there could be better ways, but I like this one.
First, be sure to "know" what a word is made of. Let us suppose it's made of letters only. All the rest, being punctuation or "blanks", can be considered as a separator.
Then, your "system" has two states: 1) completing a word, 2) skipping separator(s).
You begin your code with a free run of the skip separator(s) code. Then you enter the "completing a word" state, which you keep until the next separator or the end of the whole string (in this case, you exit). When that happens, you have completed a word, so you increment your word counter by 1 and go to the "skipping separators" state. And the loop continues.
Pseudo C-like code:
char *str;
/* someone will assign str correctly */
word_count = 0;
state = SKIPPING;
for (; (c = *str) != '\0'; str++) /* reload c on every iteration */
{
if (state == SKIPPING && can_be_part_of_a_word(c)) {
state = CONSUMING;
/* if you need to accumulate the letters,
here you have to push c somewhere */
}
else if (state == SKIPPING) continue; // unneeded - just to show the logic
else if (state == CONSUMING && can_be_part_of_a_word(c)) {
/* continue accumulating pushing c somewhere
or, if you don't need, ... else if kept as placeholder */
}
else if (state == CONSUMING) {
/* separator found while consuming a word:
the word ended. If you accumulated chars, you can ship
them out as "the word" */
word_count++;
state = SKIPPING;
}
}
// if the state on exit is CONSUMING you need to increment word_count:
// you can rearrange things to avoid this when the loop ends,
// if you don't like it
if (state == CONSUMING) { word_count++; /* plus ship out last word */ }
The function can_be_part_of_a_word returns true if the character read is in [A-Za-z_], for example, and false otherwise.
(It should work, provided tiredness has not led me into some gross error.)
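A compilable reduction of the pseudocode above, keeping only the counting and dropping the letter accumulation, could look like this. The function name count_words_fsm is invented for the example, and can_be_part_of_a_word uses the [A-Za-z_] definition mentioned above.

```c
#include <ctype.h>

enum state { SKIPPING, CONSUMING };

/* A word character is a letter or underscore (illustrative choice). */
static int can_be_part_of_a_word(int c)
{
    return isalpha(c) || c == '_';
}

int count_words_fsm(const char *str)
{
    int word_count = 0;
    enum state state = SKIPPING;

    for (; *str != '\0'; str++) {
        int c = (unsigned char)*str;
        if (state == SKIPPING && can_be_part_of_a_word(c)) {
            state = CONSUMING;          /* a word has started */
        } else if (state == CONSUMING && !can_be_part_of_a_word(c)) {
            word_count++;               /* separator found: the word ended */
            state = SKIPPING;
        }
    }
    if (state == CONSUMING)             /* the string ended inside a word */
        word_count++;
    return word_count;
}
```

With this definition, punctuation and digits are separators, so "don't" counts as two words. Change can_be_part_of_a_word if that is not what you want.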

remove trailing blanks

I'm working through k&r and I'm working on problem 1-18. Write a program to remove TRAILING blanks and tabs from each line of input and to delete entirely blank lines. My idea is to read in each line and have a count for the number of spaces. If it is the first blank print it using putchar. If there is a second blank do not print it and reset spaces to 0. Then continue to read through and remove spaces.
At the moment I have it to just print anything else it reads because I am attempting to do it bit by bit. Once I run this program I get two of these �� in the terminal. I think I am having problems nesting the if statement or the else statement incorrectly, I had some errors on there earlier. Am I going about the logic of removing blanks the correct way? If someone could point me in the right direction to fix the code I would be grateful.
#include <stdio.h>
main()
{
int c, i, spaces; /*c for input, i for counting*/
i = 0;
c = 0;
spaces = 0;
while ((c = getchar())!=EOF)
if(spaces = 0 && c == ' ')
++spaces;
putchar(c);
if(spaces >= 1)
spaces = 0;
else {
putchar(c);
}
}
Try with:
if(spaces == 0 && c == ' ')
{
++spaces;
putchar(c);
if(spaces >= 1)
spaces = 0;
}
else
{
putchar(c);
}
Your indentation suggests you want the else branch to go with the first if, but as it is now, it pairs with the inner if.
Also, spaces = 0 is an assignment, spaces == 0 is a comparison.
Read a string.
Find the end of the string using strlen() or similar method.
for(i=string_length-1; i>=0; i--) or use pointers if you prefer.
Break loop upon first non-space found.
Insert a null termination at first non-space+1.
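The steps above can be sketched as follows. The names rtrim and trim_stream are invented for the example, and the 1000-character line limit is an assumption: a longer line would be split by fgets and trimmed at the split point.

```c
#include <stdio.h>
#include <string.h>

/* Strip trailing blanks, tabs and the newline from s in place.
   Returns the new length. */
size_t rtrim(char *s)
{
    size_t len = strlen(s);

    /* Walk back from the end until a non-blank character is found. */
    while (len > 0 && (s[len - 1] == ' ' || s[len - 1] == '\t'
                       || s[len - 1] == '\n'))
        len--;
    s[len] = '\0';   /* terminate just after the last non-blank character */
    return len;
}

/* Copy in to out, trimming every line and dropping entirely blank ones. */
void trim_stream(FILE *in, FILE *out)
{
    char line[1000];   /* assumed maximum line length */

    while (fgets(line, sizeof line, in))
        if (rtrim(line) > 0)
            fprintf(out, "%s\n", line);
}
```

Calling trim_stream(stdin, stdout) from main gives the behaviour the exercise asks for, under the line-length assumption noted above.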

Trying to convert morse code to english. struggling

I'm trying to create a function to read Morse code from one file, convert it to English text, print the converted text to the terminal, and write it to an output file. Here's a rough start...
#define TOTAL_MORSE 91
#define MORSE_LEN 6
void
morse_to_english(FILE* inputFile, FILE* outputFile, char morseStrings[TOTAL_MORSE][MORSE_LEN])
{ int i = 0, compare = 0;
char convert[MORSE_LEN] = {'\0'}, *buffer = '\0';
//read in a line of morse string from file
// fgets(buffer, //then what?
while(((convert[i] = fgetc(inputFile)) != ' ') && (i < (MORSE_LEN - 1)))
{ i++;
}
if (convert[i + 1] == ' ')
convert[i + 1] = '\0';
//compare read-in string w/morseStrings
for (i = 48, compare = strcmp(convert, morseStrings[i]); //48 is '0'
i < (TOTAL_MORSE - 1) && compare != 0;
i++)
{ compare = strcmp(convert, morseStrings[i]);
}
printf("%c", (char)i);
}
I have initialized morseStrings to the morse code.
That's my function right now. It does not work, and I'm not really sure what approach to take.
My original algorithm plan was something like this:
1. Scan Morse code in from file, character by character, until a space is reached
1.1 save to a temporary buffer (convert)
2. loop while i < 91 && compare != 0
compare = strcmp(convert, morseString[i])
3. if (test ==0) print ("%c", i);
4. loop through this until eof
but.. I can't seem to think of a good way to test if the next char in the file is a space. So this has made it very difficult for me.
I got pretty frustrated and googled for ideas, and found a suggestion to use this algorithm
Read a line
Loop
-strchr() for a SPACE or EOL
-copy characters before the space to another string
-Use strcmp() and loop to find the letter
-Test the next character for SPACE.
-If so, output another space
-Skip to next morse character
Endloop
But this loop is kind of confusing. I would use fgets() (I think), but I don't know what to put in the length argument.
Anyways, I'm tired and frustrated. I would appreciate any help or insight for this problem. I can provide more code if necessary.
Your original plan looks fine. You're off by 1 when you check for the ' ' in the buffer, though. It's at convert[i], not convert[i + 1]. The i++ inside the loop doesn't happen when a space is detected.
I wouldn't use strchr(); it is too complicated.
Loop through the input file, reading one line at a time.
Tokenize each line with strtok.
Loop through the tokens and save (or better, append) the decoded letters to a buffer.
Close the loops and print.
A bit of pseudocode for you:
while (there is a next line) {
tokens = strtok(line, " ");
int i = 0;
while (tokens has next) {
save to buffer
}
}
If you are concerned about CPU time, you can write a lookup table to find the values. Note that a C switch cannot compare strings (and '.-' is a multi-character constant, not a string), so a chain of strcmp() calls is one way to write it:
if (strcmp(token, ".-") == 0) code = 'A';
else if (strcmp(token, "-...") == 0) code = 'B';
else if (strcmp(token, "-.-.") == 0) code = 'C';
Split the Morse code on spaces first, then pass each . and - combination through the lookup to get the original character.
I hope this helps.
Best regards.
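Combining the suggestions above (strtok for splitting, a table for the lookup), a rough sketch for the letters A to Z might look like this. The names MORSE, morse_to_char and decode_line are invented for the example and are not the asker's morseStrings layout. Unknown codes decode to '?', and 'A' + i assumes the letters are contiguous in the character set, as they are in ASCII.

```c
#include <string.h>

/* International Morse code for A-Z; index 0 corresponds to 'A'. */
static const char *const MORSE[26] = {
    ".-", "-...", "-.-.", "-..", ".", "..-.", "--.", "....", "..",
    ".---", "-.-", ".-..", "--", "-.", "---", ".--.", "--.-", ".-.",
    "...", "-", "..-", "...-", ".--", "-..-", "-.--", "--.."
};

/* Linear search of the table; returns '?' for an unknown code. */
char morse_to_char(const char *code)
{
    for (int i = 0; i < 26; i++)
        if (strcmp(code, MORSE[i]) == 0)
            return (char)('A' + i);
    return '?';
}

/* Decode one line of space-separated Morse codes into out.
   strtok modifies line, so pass a writable buffer. */
void decode_line(char *line, char *out, size_t outsz)
{
    size_t n = 0;

    if (outsz == 0)
        return;
    for (char *tok = strtok(line, " \t\r\n");
         tok != NULL && n + 1 < outsz;
         tok = strtok(NULL, " \t\r\n"))
        out[n++] = morse_to_char(tok);
    out[n] = '\0';
}
```

Because strtok collapses runs of delimiters, this version loses the gaps between words. Preserving them would need a convention for word breaks in the input, which the question does not specify.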
