I am trying to write a program that prints the number of words found in a text file. Words are defined as sequences of characters separated by any amount of whitespace.
However, I am having a problem when there are multiple whitespaces because then it doesn't report the right number of words.
Here is my code so far:
#include <stdio.h>
int main()
{
FILE *fp;
char str;
int i=0;
/* opening file for reading */
fp = fopen("myfile.txt" , "r");
if(fp == NULL) {
perror("Error opening file");
return(-1);
}
while(( str = fgetc(fp)) != EOF ) {
if (str == ' ')
++i;
}
printf("%d\n", i);
fclose(fp);
return(0);
}
myfile.txt is:
Let's do this! You can do it. Believe in yourself.
I'm not sure whether I should use fgets, fscanf, or fgetc.
Let's say I define whitespace the way the fscanf function does when reading a string.
It prints 14, which is not right. I'm not sure how to account for multiple whitespace characters. In this case, whitespace is any number of spaces between words.
Counting a whitespace only if it is not preceded by any other white space will do the trick.
#include <stdio.h>
int main()
{
FILE *fp;
int str; /* int, not char, so EOF can be told apart from a valid character */
int prevchar; //tracks the previous character
int i=0;
/* opening file for reading */
fp = fopen("myfile.txt" , "r");
if(fp == NULL) {
perror("Error opening file");
return(-1);
}
prevchar='x'; //initialize prevchar to anything except a space
while(( str = fgetc(fp)) != EOF ) {
if (str == ' ' && prevchar!=' ') // update the count only if previous character encountered was not a space
++i;
prevchar=str;
}
printf("%d\n", i+1);
fclose(fp);
return(0);
}
Edit: The code assumes that words are separated by one or more spaces. It does not cover corner cases such as sentences that spread over multiple lines, words separated by commas instead of spaces, or leading and trailing spaces. These cases can be covered by adding more conditions.
Just use a small state machine with two states: either you are inside a word, or you are outside a word.
#include <stdio.h>
int main()
{
FILE *fp;
int str; /* int, not char, so EOF can be told apart from a valid character */
int i=0,inside_word =0;
/* opening file for reading */
fp = fopen("myfile.txt" , "r");
if(fp == NULL) {
perror("Error opening file");
return(-1);
}
inside_word =0;
while(( str = fgetc(fp)) != EOF ) {
if (str == ' ' || str == '\n' || str == '\t')
inside_word = 0;
else if(inside_word == 0){
i++;
inside_word=1;
}
}
printf("%d\n", i);
fclose(fp);
return(0);
}
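The two-state idea above can be extended to every whitespace character that fscanf would skip by testing with isspace() instead of listing characters by hand. The following is only a sketch: the helper name count_words and the choice to take an already opened FILE * are assumptions made for illustration, not part of the original answer.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: counts whitespace-separated words on an open stream. */
long count_words(FILE *fp)
{
    long words = 0;
    int in_word = 0;  /* 1 while the previous character belonged to a word */
    int ch;           /* int, so EOF is distinguishable from every character */

    while ((ch = fgetc(fp)) != EOF) {
        if (isspace(ch)) {
            in_word = 0;          /* any whitespace ends the current word */
        } else if (!in_word) {
            in_word = 1;          /* first character of a new word */
            ++words;
        }
    }
    return words;
}
```

Because a word is counted when it starts rather than when a separator is seen, leading, trailing, and repeated whitespace should not change the result.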
The first thing that comes to mind is to add another while loop right after ++i to consume the remaining space characters.
By the way, be careful with your terminology: you are not handling whitespace in general, only space characters. \t and \n are whitespace too!
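The inner-loop suggestion could look roughly like the sketch below. The function name count_space_runs is made up for this example, and, as in the question, only ' ' is treated as a separator.

```c
#include <stdio.h>

/* Hypothetical helper: counts runs of ' ' characters, so "a   b" has one run. */
long count_space_runs(FILE *fp)
{
    long runs = 0;
    int ch;

    while ((ch = fgetc(fp)) != EOF) {
        if (ch == ' ') {
            ++runs;
            /* Consume the rest of this run of spaces. */
            while ((ch = fgetc(fp)) == ' ')
                ;
            if (ch == EOF)
                break;
        }
    }
    return runs;
}
```

Note that this counts separators, not words. The number of words equals runs + 1 only when the text has no leading or trailing spaces, which is an assumption the caller would have to check.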
How about using a regular expression such as '!\s+!' to replace each run of whitespace with a single space ' ', and then continuing with your code?
My code is here:
int main(){
FILE *fp;
fp = fopen("dic.txt", "r");
while(getc(fp) != EOF){
if(getc(fp) == ' '){
printf("up ");
}
}
}
My dic.txt is here:
dic.txt
I expected the output to be "up up up up " because the file contains four spaces, but it printed "up " only once.
What is the problem?
You are calling getc twice per iteration of the loop; one of these two calls compares the character to EOF, while the other call compares the character to ' '.
This has two consequences:
Your program will only print "up" for spaces at even positions, and will miss all spaces at odd positions;
Your program might make one extra call to getc after reaching EOF the first time.
How to fix
You need to make a single call to getc per iteration of the loop. Save the character returned by getc to a local variable; then use this variable to check for spaces in the body of the loop, and to check for EOF in the condition of the loop.
You want this:
#include <stdio.h>
int main() {
FILE* fp;
fp = fopen("dic.txt", "r");
if (fp == NULL)
{
printf("Can't open file\n");
return 1;
}
int ch; // int is needed here, not char!
while ((ch = getc(fp)) != EOF) { // read one char and check if it's EOF in one go
if (ch == ' ') {
printf("up ");
}
}
fclose(fp);
return 0;
}
You need to call getc once only in the loop, otherwise you skip one out of two characters.
Bonus: you need to check if fopen fails.
Try Out This Code:
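FILE *fp;
fp = fopen("dic.txt", "r");
if (fp == NULL) {
return 1; /* could not open the file */
}
int ch; /* int, so EOF can be detected */
while ((ch = getc(fp)) != EOF) { /* exactly one getc call per iteration */
if (ch == ' ') {
printf("up ");
}
}
fclose(fp);
return 0;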
I have a file containing columns of text, but I just want to fetch the first (top) row. How do I do that?
#include <stdio.h>
int main()
{
FILE *fr;
char c;
fr = fopen("prog.txt", "r");
while( c != EOF)
{
c = fgetc(fr); /* read from file*/
printf("%c",c); /* display on screen*/
}
fclose(fr);
return 0;
}
Your stop condition is EOF, so everything up to the end of the file is read. What you need is to read until a newline character is found. Furthermore, EOF is a negative int (commonly -1), so the value returned by fgetc() should be stored in an int, not a char, before it is compared.
You'll need something like:
#include <stdio.h>
#include <stdlib.h>
int main()
{
FILE *fr;
int c;
if(!(fr = fopen("prog.txt", "r"))){ //check file opening
perror("File error");
return EXIT_FAILURE;
}
while ((c = fgetc(fr)) != EOF && c != '\n')
{
printf("%c",c); /* display on screen*/
}
fclose(fr);
return EXIT_SUCCESS;
}
This keeps your approach of reading the line character by character. There are also library functions that read a whole line: fgets() for portable code, or getline() on POSIX systems (it is not available on Windows, though portable implementations exist). You can, of course, also write your own.
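For the fgets() route mentioned above, a minimal sketch might look like this. The helper name read_first_line and the decision to strip the line ending are assumptions for illustration; a line longer than the buffer is simply truncated here.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies the first line of fp into buf (size bytes),
   without its line ending. Returns 1 on success, 0 if nothing was read. */
int read_first_line(FILE *fp, char *buf, int size)
{
    if (size <= 0 || fgets(buf, size, fp) == NULL)
        return 0;
    /* fgets() keeps the newline; cut the string at the first '\r' or '\n'. */
    buf[strcspn(buf, "\r\n")] = '\0';
    return 1;
}
```

A caller would open prog.txt with fopen(), check the result, pass the stream and a local char array to the helper, and print the buffer if it returns 1.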
For whatever it's worth, here's an example that uses getline
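#include <stdio.h>
#include <stdlib.h>
int main()
{
FILE *fr;
char *line = NULL;
size_t len = 0;
ssize_t nread;
if (!(fr = fopen("prog.txt", "r"))) {
perror("Unable to open file");
return 1;
}
nread = getline(&line, &len, fr);
if (nread == -1) {
fprintf(stderr, "No line could be read\n");
} else {
printf("line: %s, nread: %zd\n", line, nread);
}
free(line); /* getline() allocated the buffer, so release it */
fclose(fr);
return 0;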
}
Some notes:
getline() can automatically allocate your read buffer, if you wish.
getline() keeps the end-of-line delimiter in the buffer it fills. You can always strip it off, if you don't want it.
It's ALWAYS a good idea to check the status of I/O calls like "fopen()".
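Stripping the delimiter, as the second note suggests, can be done with the length that getline() reports. The sketch below assumes the caller has already checked that getline() did not return -1; the name strip_line_ending is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: given a buffer and the length reported by getline(),
   removes one trailing "\n" or "\r\n" and returns the new length. */
size_t strip_line_ending(char *line, size_t len)
{
    if (len > 0 && line[len - 1] == '\n')
        line[--len] = '\0';
    if (len > 0 && line[len - 1] == '\r')
        line[--len] = '\0';
    return len;
}
```

In the example above this would be called as strip_line_ending(line, (size_t)nread) before printing. Checking len > 0 first keeps the function safe for empty strings.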
Just test for '\n' (the newline character) in addition to EOF. Then your code will read until it reaches the first newline. Here is what it looks like:
#include <stdio.h>
int main()
{
FILE *fr;
int c = ' '; /* int, so that EOF can be compared reliably */
fr = fopen("prog.txt", "r");
while(c != EOF && c != '\n')
{
c = fgetc(fr); /* read from file*/
if(c != EOF){
printf("%c",c); /* display on screen*/
}
}
fclose(fr);
return 0;
}
I have not tested it yet, but it should work. Please let me know if there is a problem with the code and I will edit it.
Edit1: c in line 5 is initialized to ' ' to avoid undefined behavior (reading an uninitialized variable).
Edit2: Added the condition (c != EOF) to the while loop in line 7 to prevent an infinite loop.
Edit3: Added the if statement in line 10 so that EOF is not printed, which could cause odd results.
FILE *fp = fopen("story.txt", "r");
if(fp == NULL){
printf("\nError opening file.\nExiting program.\n");
exit(1);
}
char text[100];
while(fgets(text, 100, fp) != NULL){
printf("%s", text);
}
printf("\n");
fclose(fp);
I'm trying to print the first 100 characters of a text file, including new lines, however when I use the code above it presents some weird behavior. First of all, it only prints the very last line of the text file, which itself is under 100 characters. Also, if I include two print statements in the while loop i.e.
while(fgets(text, 100, fp) != NULL){
printf("%s", text);
printf("%s", text);
}
It prints a lot more than 125 chars of the text file (somewhere in the thousands, it's a big text file), and the contents of said text is a bunch of seemingly random segments from the file in one constant stream, no new lines or anything.
So I guess my question is: is there any way to use fgets so that it prints the text in the file, starting from the top, and includes new lines? I eventually have to use this to turn a text file into a character array, so that I can make a new, modified character array based on that array, which will be printed to a new text file. So if there is a better way to approach that end goal, that would be appreciated.
EDIT: after some discussion in the comments I've realized that the text I am using is just one big block of text with carriage returns and no newlines. I guess at this point my main problem is how to turn this text file with carriage returns into a character array.
If the goal is to read a text file in lines of 100 characters, and to do away with the carriage returns, you can still use fgets() as long as you remember that fgets() will take characters up to and including the next newline, or until one less than the specified number of characters has been read.
The code below reads a line of text, up to BUFFER_SZ-1 characters, increases the memory allocation to hold a new line of text, and copies the line into the allocated space, removing carriage returns and any trailing newlines. Note that the address of the reallocated space is first stored in a temporary pointer. realloc() returns a null pointer in the event of an allocation error, and this step avoids a potential memory leak.
Since the lines are broken at BUFFER_SZ-1 characters, words may be split across lines, and you may want to develop additional logic to handle this. It would be more efficient to reallocate in larger chunks and less frequently, rather than once for every line, as is done here. Also note that it may be useful to open the file in binary mode to more closely parse line endings.
#include <stdio.h>
#include <stdlib.h>
#define BUFFER_SZ 100
int main(void)
{
FILE *fp = fopen("story.txt", "r");
if (fp == NULL) {
fprintf(stderr, "Unable to open file\n");
exit(EXIT_FAILURE);
}
char buffer[BUFFER_SZ];
char (*text)[sizeof buffer] = NULL;
char (*temp)[sizeof buffer] = NULL;
size_t numlines = 0;
while (fgets(buffer, sizeof buffer, fp) != NULL) {
++numlines;
/* Allocate space for next line */
temp = realloc(text, sizeof(*text) * numlines);
if (temp == NULL) {
fprintf(stderr, "Error in realloc()\n");
exit(EXIT_FAILURE);
}
text = temp;
/* Copy buffer to text, removing carriage returns and newlines */
char *c = buffer;
char *line = text[numlines-1];
while (*c != '\n' && *c != '\0') {
if (*c != '\r') {
*line++ = *c;
}
++c;
}
*line = '\0'; /* terminate the copied line, not the source buffer */
}
if (fclose(fp) != 0) {
fprintf(stderr, "Unable to close file\n");
}
for (size_t i = 0; i < numlines; i++) {
printf("%s\n", text[i]);
}
free(text);
return 0;
}
Another option would be to replace the carriage returns with newlines. This may be what OP had in mind. The above program is easily modified to accomplish this. Note that the \n is removed from the printf() statement that displays the results, since newlines are now included in the strings.
...
/* Copy buffer to text, converting carriage returns to newlines */
char *c = buffer;
char *line = text[numlines-1];
while (*c != '\n' && *c != '\0') {
if (*c == '\r') {
*line++ = '\n';
} else {
*line++ = *c;
}
++c;
}
*line = '\0'; /* terminate the copied line, not the source buffer */
}
if (fclose(fp) != 0) {
fprintf(stderr, "Unable to close file\n");
}
for (size_t i = 0; i < numlines; i++) {
printf("%s", text[i]);
}
...
fgets() doesn't append one line to the end of another. It simply overwrites the buffer you keep passing it. If you want multiple lines stored, copy each one to another buffer and concatenate them. (See: strcat)
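If the end goal is a single character array holding the whole file, one alternative to concatenating fgets() chunks is to read character by character into a buffer that doubles when it fills up. This is only a sketch under the assumption stated in the question's edit (bare carriage returns, no newlines); a file with "\r\n" endings would end up with doubled newlines. The name slurp_cr_as_newline and the initial capacity of 256 are arbitrary choices for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads all of fp into one malloc'd, NUL-terminated
   string, turning each '\r' into '\n'. Returns NULL on allocation failure;
   the caller frees the result. */
char *slurp_cr_as_newline(FILE *fp, size_t *out_len)
{
    size_t cap = 256, len = 0;
    char *text = malloc(cap);
    int ch;

    if (text == NULL)
        return NULL;
    while ((ch = fgetc(fp)) != EOF) {
        if (len + 1 >= cap) {              /* keep one byte free for '\0' */
            char *temp = realloc(text, cap * 2);
            if (temp == NULL) {            /* old block is still valid */
                free(text);
                return NULL;
            }
            text = temp;
            cap *= 2;
        }
        text[len++] = (ch == '\r') ? '\n' : (char)ch;
    }
    text[len] = '\0';
    if (out_len != NULL)
        *out_len = len;
    return text;
}
```

Doubling the capacity keeps the number of realloc() calls small, and storing the result of realloc() in a temporary pointer avoids losing the original block on failure.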
So I've tried searching for a solution to this extensively but can only really find posts where the new line or null byte is missing from one of the strings. I'm fairly sure that's not the case here.
I am using the following function to compare a word to a file containing a list of words with one word on each line (dictionary in the function). Here is the code:
int isWord(char * word,char * dictionary){
FILE *fp;
fp = fopen(dictionary,"r");
if(fp == NULL){
printf("error: dictionary cannot be opened\n");
return 0;
}
if(strlen(word)>17){
printf("error: word cannot be >16 characters\n");
return 0;
}
char longWord[18];
strcpy(longWord,word);
strcat(longWord,"\n");
char readValue[50] = "a\n";
while (fgets(readValue,50,fp) != NULL && strcmp(readValue,longWord) != 0){
printf("r:%sw:%s%d\n",readValue,longWord,strcmp(longWord,readValue));//this line is in for debugging
}
if(strcmp(readValue,longWord) == 0){
return 1;
}
else{
return 0;
}
}
The code compiles with no errors and the function reads the dictionary file fine and will print the list of words as they appear in there. The issue I am having is that even when the two strings are identical, strcmp is not returning 0 and so the function will return false for any input.
eg I get:
r:zymoscope
w:zymoscope
-3
Any ideas? I feel like I must be missing something obvious but have been unable to find anything in my searches.
I see you are appending a newline to your test strings to try to deal with the problem of fgets() retaining the line endings. Much better to fix this at source. You can strip all trailing stuff like this, immediately after reading from file.
readValue [ strcspn(readValue, "\r\n") ] = '\0'; // remove trailing newline etc
The string you are reading contains trailing character(s), and hence is not the same as the string you are comparing it against.
Remove the trailing newline (and CR if that is there); then you do not need to add any newline or carriage return to the string being compared:
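#include <stdio.h>
#include <string.h>
int isWord(char *word, char *dictionary){
if (strlen(word) > 16){ /* check the argument before opening the file */
fprintf(stderr, "error: word cannot be >16 characters\n");
return 0;
}
FILE *fp = fopen(dictionary, "r");
if (fp == NULL){
fprintf(stderr, "error: dictionary cannot be opened\n");
return 0;
}
char readValue[50];
int found = 0;
while (!found && fgets(readValue, sizeof readValue, fp) != NULL){
size_t len = strlen(readValue);
/* strip trailing '\n' and '\r' without stepping before the start of the buffer */
while (len > 0 && (readValue[len-1] == '\n' || readValue[len-1] == '\r')){
readValue[--len] = '\0';
}
found = (strcmp(readValue, word) == 0);
}
fclose(fp); /* close the file on every path that opened it */
return found;
}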
[Solved] Writing parsing code is a trap. A line with 15 spaces will have 15 words. Blank lines will also count as a word. Back to flex and bison for me.
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[]) {
FILE *fp = NULL;
int iChars =0, iWords =0, iLines =0;
int ch;
/* if there is a command line arg, then try to open it as the file
otherwise, use stdin */
fp = stdin;
if (argc == 2) {
fp = fopen(argv[1],"r");
if (fp == NULL) {
fprintf(stderr,"Unable to open file %s. Exiting.\n",argv[1]);
exit(1);
}
}
/* read until the end of file, counting chars, words, lines */
while ((ch = fgetc(fp)) != EOF) {
if (ch == '\n') {
iWords++;
iLines++;
}
if (ch == '\t' || ch == ' ') {
iWords++;
}
iChars++;
}
/* all done. If the input file was not stdin, close it*/
if (fp != stdin) {
fclose(fp);
}
printf("chars: %d,\twords: %d,\tlines: %d.\n",iChars,iWords,iLines);
}
TEST DATA foo.sh
#!/home/ojblass/source/bashcrypt/a.out
This is line 1
This is line 2
This is line 3
ojblass#linux-rjxl:~/source/bashcrypt> wc foo.sh
5 13 85 foo.sh
ojblass#linux-rjxl:~/source/bashcrypt> a.out foo.sh
chars: 85, words: 14, lines: 5.
Your algorithm is wrong. If the test file has two blank characters in succession, the word counter is incremented twice, but it should be incremented only once.
A solution is to remember the last character read. If the character read is a separator (blank, newline, ...) and the previous character is alphanumeric, then you increment the word counter.
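The previous-character idea could be sketched as follows. Two deliberate deviations from the answer's wording are assumptions of this example: the previous character only has to be non-whitespace rather than alphanumeric (closer to how wc counts), and a final word with no separator after it is counted at EOF. The struct and function names are hypothetical.

```c
#include <ctype.h>
#include <stdio.h>

struct counts { long chars, words, lines; };  /* hypothetical result type */

/* Hypothetical helper: counts characters, words, and lines on a stream. */
struct counts count_stream(FILE *fp)
{
    struct counts c = {0, 0, 0};
    int prev = ' ';   /* pretend the stream is preceded by whitespace */
    int ch;

    while ((ch = fgetc(fp)) != EOF) {
        ++c.chars;
        if (ch == '\n')
            ++c.lines;
        /* A word ends where a separator follows a non-separator. */
        if (isspace(ch) && !isspace(prev))
            ++c.words;
        prev = ch;
    }
    if (!isspace(prev))   /* last word was not followed by whitespace */
        ++c.words;
    return c;
}
```

With this rule a blank line or a run of spaces adds nothing to the word count, because the separator is preceded by another separator.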
You are counting \n as a word even for a blank line.