Counting the number of words using C from a text file - c

Hey, I have been trying to count the number of words in my text file from C, to load up a bunch of words for a Hangman game, but I am hitting a brick wall. I am using this piece of code:
FILE *infile;
char buffer[MAXWORD];
int iwant, nwords;
iwant = rand() %nwords;
// Open the file
infile = fopen("words.txt", "r");
// If the file cannot be opened
if (infile ==NULL) {
printf("The file can not be opened!\n");
exit(1);
}
// The Word count
while (fscanf(infile, "%s", buffer) == 1) {
++nwords;
}
printf("There are %i words. \n", nwords);
fclose(infile);
}
If anyone has any suggestions on how to fix this I would be very grateful.
The text file has 1 word per line, with 850 words.
Applied the buffer suggestion, however the word count still came out at 1606419282.
The correction of putting
int nwords = 0;
Worked!! Thank you very much!

So the words are one entry per line?
while (fscanf(infile, "%s", &nwords) == 1); {
++nwords;
}
This doesn't do what you think it does. It reads a string into nwords, which isn't a string. The stray semicolon after the while condition also leaves the loop with an empty body.
If you want to do it like this, then you need to allocate a string, i.e. char buffer[XXX], which is long enough to contain the longest line in your data file, and use:
while (fscanf(infile, "%s", buffer) == 1) {
++nwords;
}

The variable nwords is never initialized. You cannot assume it to start out as zero.
If it were, you'd get a crash ("divide by zero") on the next line, whose purpose eludes me:
iwant = rand() %nwords;
So, replace
int iwant, nwords;
iwant = rand() %nwords;
by
int nwords = 0;
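Putting the fixes from these answers together, a minimal sketch might look like the following. The helper names count_words and nth_word and the value of MAXWORD are assumptions for illustration; the question does not show them.

```c
#include <stdio.h>

#define MAXWORD 64   /* assumed buffer size; the question does not show its value */

/* Count whitespace-separated words from the current position of fp.
 * A word longer than MAXWORD - 1 characters is counted in several pieces. */
long count_words(FILE *fp)
{
    char buffer[MAXWORD];
    long nwords = 0;                 /* must start at zero */

    /* %63s limits each read to MAXWORD - 1 characters plus the terminator. */
    while (fscanf(fp, "%63s", buffer) == 1)
        ++nwords;
    return nwords;
}

/* Copy word number `index` (0-based) into out.
 * Returns 1 on success, 0 if the file has fewer words than that. */
int nth_word(FILE *fp, long index, char out[MAXWORD])
{
    rewind(fp);                      /* always search from the start of the file */
    for (long i = 0; fscanf(fp, "%63s", out) == 1; ++i)
        if (i == index)
            return 1;
    return 0;
}
```

With these, the Hangman code could count the words once and then call nth_word(infile, rand() % nwords, buffer), but only after checking that nwords is greater than zero, since the modulo would otherwise divide by zero.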

After reading the first word, fscanf leaves the whitespace that follows it in the input stream. The %s conversion skips leading whitespace on the next call, so this does not produce an empty word, but you can also consume the whitespace explicitly.
Change proposed:
fscanf(infile, "%s ", buffer) // notice the trailing space; no & is needed before buffer
The space discards ALL whitespace up to the next word.
P.S. Better not use [f]scanf :-)

Related

getline() in C creating an infinite loop and skipping first word?

I'm a beginner in C and I'm trying to create a simple todo list program. I'm trying to use getline in a while loop, as I saw on another stack overflow answer and I thought I understood it but it's just creating an infinite loop. Also it seems to be skipping the first word for some reason. Here is my code so far:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
FILE *list;
int i = 0;
int item = 0;
char items[10];
char chars[1000];
char * line = NULL;
size_t len = 0;
ssize_t read;
int main() {
list = fopen("/Users/bendavies/Documents/C/list.txt", "r");
int letterCount = fscanf(list,"%s",chars);
printf("Welcome to the to-do list. It can hold up to 10 items.\n");
printf("%d\n", letterCount);
if (letterCount == -1) {
printf("The list currently has no items!\n");
} else {
while ((read = getline(&line, &len, list)) != 1) {
item += 1;
printf("%d. %s", item, line);
}
}
fclose(list);
return 0;
}
The output I'm currently getting with the following list.txt:
Eat food
Drink water
Breath air
Is:
1. food
2. Drink water
3. Breath air
4. 18446744073709551615
5. 18446744073709551615
and so on and so forth.
Thank you in advance! :)
but it's just creating an infinite loop.
This line of your code: while ((read = getline(&line, &len, list)) != 1)
Unless a line contains 1 character (just a newline), it will be an infinite loop. The POSIX getline() function will return -1 (not EOF, even though EOF is usually -1) when the file is completely read. So change that line to:
while ((read = getline(&line, &len, list)) != -1)
But, I don't see you using the value of read inside that loop, so this would be better:
Fix 1: while (getline(&line, &len, list) != -1)
And inside that loop, I see: printf("%d. %s", item, line);
You might find very old implementations of getline() that don't include the newline, in which case, if you want your output in separate lines, you need to put a \n:
Fix 2: printf("%d. %s\n", item, line);
However, if you use a more modern implementation, it will preserve the newline in accordance with the POSIX specification.
Also, if the very last 'line' in the file is not terminated with a newline, you might still want to add one. In that case, you could keep the read length and use that to detect whether there is a newline at the end of the line.
Also it seems to be skipping the first word for some reason
Because of int letterCount = fscanf(list,"%s",chars);
That fscanf reads the first word of your file. Now the file pointer is at that position (end of the first word) and further reading of the file will happen from that place.
So, reposition the file pointer to the beginning of the file after reading the first word from the file:
Fix 3:
int letterCount = fscanf(list,"%s",chars);
fseek(list, 0, SEEK_SET); // <-- this will reposition the file pointer as required
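As a rough sketch of how Fix 1 and the trailing-newline remark fit together, the loop could be wrapped in a helper like the one below. The name print_numbered is hypothetical, and the code assumes a POSIX system that provides getline().

```c
#define _POSIX_C_SOURCE 200809L   /* asks the headers for getline() and ssize_t */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Print every line of `in` to `out` with a 1-based number.
 * Returns the number of lines printed. */
int print_numbered(FILE *in, FILE *out)
{
    char *line = NULL;   /* getline allocates and grows this buffer */
    size_t cap = 0;
    ssize_t len;
    int item = 0;

    while ((len = getline(&line, &cap, in)) != -1) {   /* -1 means no more lines */
        ++item;
        fprintf(out, "%d. %s", item, line);
        /* Add a newline only when the last line of the file lacked one. */
        if (len == 0 || line[len - 1] != '\n')
            fputc('\n', out);
    }
    free(line);
    return item;
}
```

A return value of 0 then means the list is empty, which would make the separate fscanf probe at the top of main (and the fseek that undoes it) unnecessary.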

Confusion when dealing with files

I'm writing a program that takes a string from one file and a string from another file, and writes them to a third file in separate columns. I really have two questions. If I use fgets to get the string from the files and it stops at the end of the line. Will it automatically know to start from the next line for the next string. Also, how can I format the input to make two columns. An example would be...
first string is "John" from the first file.
second string is "Appleseed" from the second file.
third file would have in it "John______________________Appleseed"
The second line of the third file would have "Benny__________________________ Backburner"
Just to format columns.
Do you want underscores, or will spaces suffice? It's much simpler with spaces. You can read the specification of printf() to see the details of what the format strings do.
while (fgets(buffer1, sizeof(buffer1), fp1) != 0 &&
fgets(buffer2, sizeof(buffer2), fp2) != 0)
{
buffer1[strcspn(buffer1, "\n")] = '\0';
buffer2[strcspn(buffer2, "\n")] = '\0';
fprintf(fp3, "%-25s   %s\n", buffer1, buffer2);
}
This reads one line from each of the first two files, removes the newlines from the buffer, and then formats them with the first column left-justified in a width of 25, and the second printed after 3 spaces.
If you must use underscores instead of spaces, then you need something like this:
char uscore[256];
memset(uscore, '_', sizeof(uscore)-1);
uscore[sizeof(uscore)-1] = '\0';
while (fgets(buffer1, sizeof(buffer1), fp1) != 0 &&
fgets(buffer2, sizeof(buffer2), fp2) != 0)
{
buffer1[strcspn(buffer1, "\n")] = '\0';
buffer2[strcspn(buffer2, "\n")] = '\0';
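int len1 = (int)strlen(buffer1); /* strlen returns an unsigned size_t, so 25 - strlen(...) can never be negative */
len1 = (len1 < 25) ? 25 - len1 : 0; /* number of underscores needed */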
fprintf(fp3, "%s%*.*s%s\n", buffer1, len1, len1, uscore, buffer2);
}
Putting this together, illustrating both techniques at once:
#include <stdio.h>
#include <string.h>
int main(int argc, char **argv)
{
if (argc != 4)
{
fprintf(stderr, "Usage: %s infile-1 infile-2 outfile\n", argv[0]);
return 1;
}
FILE *fp1 = fopen(argv[1], "r");
FILE *fp2 = fopen(argv[2], "r");
FILE *fp3 = fopen(argv[3], "w");
if (fp1 == 0 || fp2 == 0 || fp3 == 0)
{
fprintf(stderr, "%s: failed to open one of the files %s, %s or %s\n",
argv[0], argv[1], argv[2], argv[3]);
return 1;
}
char uscore[256];
memset(uscore, '_', sizeof(uscore)-1);
uscore[sizeof(uscore)-1] = '\0';
char buffer1[1024];
char buffer2[1024];
while (fgets(buffer1, sizeof(buffer1), fp1) != 0 &&
fgets(buffer2, sizeof(buffer2), fp2) != 0)
{
buffer1[strcspn(buffer1, "\n")] = '\0';
buffer2[strcspn(buffer2, "\n")] = '\0';
fprintf(fp3, "%-25s   %s\n", buffer1, buffer2);
int len1 = strlen(buffer1);
if (len1 < 28)
len1 = 28 - len1;
else
len1 = 0;
fprintf(fp3, "%s%*.*s%s\n", buffer1, len1, len1, uscore, buffer2);
}
fclose(fp1);
fclose(fp2);
fclose(fp3);
return 0;
}
Sample input file data.1:
California
Esoteric
Mismatch
Unexpected
Non-sequitur
Extra-long word list from file 1
Sample input file data.2:
Drought
Persecution
Preliminary
Adequate
Pusillanimous
Rather long word from file.2 too
Example output:
California                  Drought
California__________________Drought
Esoteric                    Persecution
Esoteric____________________Persecution
Mismatch                    Preliminary
Mismatch____________________Preliminary
Unexpected                  Adequate
Unexpected__________________Adequate
Non-sequitur                Pusillanimous
Non-sequitur________________Pusillanimous
Extra-long word list from file 1   Rather long word from file.2 too
Extra-long word list from file 1Rather long word from file.2 too
There are endless tweaks you can make depending on a more precise definition of the format you want. Amongst other things, you can make sure there's a minimum of 3 underscores between the first and second words in the 'must have underscores' example. You could limit the lengths of the strings that are printed.
The code should check that it gets a newline within the first 1023 bytes; it doesn't.
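Two of the tweaks mentioned above, a guaranteed minimum number of underscores and a limit on the printed length of the first string, could be sketched as follows. The function name format_row and its parameters are hypothetical, and the sketch assumes width and min_fill are not negative.

```c
#include <stdio.h>
#include <string.h>

/*
 * Write first, then underscores, then second into out.
 * The first column is cut to at most `width` characters and is always
 * followed by at least `min_fill` underscores.
 * Returns what snprintf returns: the length that was, or would have been, written.
 */
int format_row(char *out, size_t outsz, const char *first,
               const char *second, int width, int min_fill)
{
    char fill[256];
    int len1 = (int)strlen(first);
    if (len1 > width)
        len1 = width;                        /* limit the first column */

    int nfill = width - len1 + min_fill;     /* pad to width, plus the minimum */
    if (nfill > (int)sizeof(fill) - 1)
        nfill = (int)sizeof(fill) - 1;       /* never overrun the fill buffer */
    memset(fill, '_', (size_t)nfill);
    fill[nfill] = '\0';

    /* %.*s prints at most len1 characters of first. */
    return snprintf(out, outsz, "%.*s%s%s", len1, first, fill, second);
}
```

Calling format_row(line, sizeof line, buffer1, buffer2, 25, 3) would then keep the second column starting at the same position for every row, even when the first string is longer than 25 characters.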
If I use fgets to get the string from the files and it stops at the end of the line. Will it automatically know to start from the next line for the next string.
If the line can be completely stored in the buffer, then yes, it will (see below for further explanation).
However, there are not really any lines in the data read from the file; it is more like a continuous stream of bytes. If you have a file in your editor that looks like:
a
b
c
the data that fgets sees is more like this byte stream:
a\nb\nc\n
The first call to fgets will read the a and the \n leaving the remaining input as
b\nc\n
The next call to fgets will read the b and the \n and thereby it works as if it starts from the "next line" but it really just continues from where the last call stopped.
Also notice what happens if the line is longer than your buffer. If the file is
abcd
efgh
and you do
fgets(buffer, 3, f)
then the first call to fgets will give ab\0 and the next call will continue reading cd\0.
In other words, if the line is too long to be completely stored in the buffer, fgets will not continue from "the next line". If you always want to continue from the next line, you must add code to read from the file until you read a \n.
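One possible shape for that extra code is sketched below. The helper name read_line and its truncated flag are assumptions for illustration; the sketch also assumes the buffer size is at least 2.

```c
#include <stdio.h>
#include <string.h>

/*
 * Read one line into buf, without its newline. If the line is longer than
 * the buffer, the excess is discarded so the next call starts on the next line.
 * Returns 1 if a line was read, 0 at end of file or on error.
 * If truncated is not NULL, *truncated is set to 1 when characters were discarded.
 */
int read_line(FILE *fp, char *buf, size_t size, int *truncated)
{
    if (truncated)
        *truncated = 0;
    if (fgets(buf, (int)size, fp) == NULL)
        return 0;

    size_t len = strcspn(buf, "\n");
    if (buf[len] == '\n') {          /* the whole line fitted in the buffer */
        buf[len] = '\0';
        return 1;
    }

    /* No newline stored: skip the rest of the line, up to '\n' or end of file. */
    int c;
    while ((c = fgetc(fp)) != EOF && c != '\n')
        if (truncated)
            *truncated = 1;
    return 1;
}
```

With the abcd/efgh file above and a buffer of size 3, the first call would yield ab and the second ef, each with the truncated flag set.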
Also, how can I format the input to make two columns.
Well, your question doesn't include sufficient details to come up with exact code, e.g. what should be the spacing between columns, what to do if the input is larger than the spacing, etc.
In any case - see https://stackoverflow.com/a/45295262/4386427 (by Jonathan Leffler) which gives you some good hints.

how to copy and write a file from a specific line number

I would like to copy a huge txt file and 'shrink' it. This is my code, but it still takes a lot of time reading the file. Is there a way to read from a specific line number to EOF? For instance, the first 1 million lines are not useful to me; how can I start reading from line 1 million? Or is there any way to read from EOF?
#include<stdio.h>
#include<stdlib.h>
void main() {
FILE *fp1, *fp2;
char ch;
int i = 1;
int n = 0;
int k;
fp1 = fopen("co.data", "r"); /* open a file to read*/
fp2 = fopen("Output.txt", "w"); /* open a file to write*/
printf("please enter how many lines do not need to be copied\n");
scanf ("%d", &k);
while (1) {
ch = fgetc(fp1); /* a loop to read/copy the file*/
if (ch == '\n') /* record the number of lines*/
i++;
if (ch == EOF)
break;
else if (i>k)
putc(ch, fp2);
}
printf("File copied Successfully!\n");
printf("number of lines read is %d\n",i-1);
printf("number of lines copied is %d\n",i-1-k);
fclose(fp1);
fclose(fp2);
}
There are two potential answers to your question, depending on if your file has known line lengths or not.
is there a way to read from a specific line number to EOF
In a file whose line lengths are completely arbitrary (variable), no.
For example, if line 1 is 10 characters, and line 2 is 20 characters, then there is no way to calculate where line 3 is going to start without iterating through lines 1 and 2.
Operating systems aren't magic; if this kind of functionality was supported, they'd have to iterate through the file first as well. Either way, you're going to be looping through the contents.
Now, if the line lengths are guaranteed to be the same, that's a different story.
Say you have a text file like so:
AAAAAAA
BBBBBBB
CCCCCCC
Each line in the above text file is 7 characters. Assuming your line terminator is \n, each line takes up exactly 8 bytes.
In this case, you can safely fread() 8 bytes at a time and know that you're getting exactly one line. In order to jump to a particular byte in a file, you would use fseek().
Since you know the length of the lines in this scenario, you could jump to line N by simply doing
fseek(fp1, S * N, SEEK_SET);
where N is the line number (starting at 0) and S is the length of the line (as mentioned above, 8 bytes in our example file).
Note that the second solution will break if the file uses a variable-width multi-byte encoding such as UTF-8, where lines with the same number of characters can differ in byte length. Keep that in mind.
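For the fixed-length case, the seek-then-read idea might be sketched like this. The function name read_fixed_line and the stride parameter are hypothetical; the sketch assumes the file was opened in binary mode ("rb") so that offsets are plain byte counts, and that buf has room for stride bytes.

```c
#include <stdio.h>

/*
 * Read line number n (0-based) from a file in which every line occupies
 * exactly `stride` bytes, terminator included.
 * Returns 1 on success, 0 if the seek or the read fails.
 */
int read_fixed_line(FILE *fp, long n, size_t stride, char *buf)
{
    if (fseek(fp, n * (long)stride, SEEK_SET) != 0)
        return 0;
    if (fread(buf, 1, stride, fp) != stride)
        return 0;                 /* past the end, or a short final line */
    buf[stride - 1] = '\0';       /* overwrite the '\n' with a terminator */
    return 1;
}
```

For the AAAAAAA/BBBBBBB/CCCCCCC file, read_fixed_line(fp1, 2, 8, buf) would leave CCCCCCC in buf without touching the first two lines.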
Using fgets() I made this program; try it.
#include<stdio.h>
#include<stdlib.h>
int main(void)
{
    FILE *fp1, *fp2;
    char str[500]; /* line buffer; fgets needs real storage, not an uninitialized pointer */
    int i = 0;     /* number of fgets reads so far; a line longer than 499 characters counts more than once */
    int l;
    fp1 = fopen("co.data", "r");
    fp2 = fopen("Output.txt", "w+");
    if (fp1 == NULL || fp2 == NULL)
    {
        printf("Cannot open file\n");
        return 1;
    }
    printf("please enter how many lines do not need to be copied\n");
    if (scanf("%d", &l) != 1)
        return 1;
    while (fgets(str, sizeof str, fp1) != NULL)
    { /* a loop to read/copy the file */
        i++;
        if (i > l)
            fputs(str, fp2);
    }
    printf("File copied Successfully!\n");
    printf("number of lines read is %d\n", i);
    printf("number of lines copied is %d\n", i > l ? i - l : 0);
    fclose(fp1);
    fclose(fp2);
    return 0;
}
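If lines may be longer than any fixed buffer, a character-by-character variant avoids miscounting them. The following is only a sketch under that assumption, with the hypothetical name copy_after_lines; note that the character is held in an int so that EOF can be told apart from data.

```c
#include <stdio.h>

/* Copy everything after the first `skip` lines of in to out.
 * Returns the number of lines copied. */
long copy_after_lines(FILE *in, FILE *out, long skip)
{
    long line = 0;      /* completed lines seen so far */
    long copied = 0;
    int pending = 0;    /* 1 while a copied line has not yet seen its newline */
    int c;              /* int, not char, so that EOF is distinguishable */

    while ((c = getc(in)) != EOF) {
        if (line >= skip) {
            putc(c, out);
            pending = 1;
        }
        if (c == '\n') {
            if (line >= skip) {
                ++copied;
                pending = 0;
            }
            ++line;
        }
    }
    if (pending)
        ++copied;       /* the last line had no trailing newline */
    return copied;
}
```

This still reads the skipped lines, as the first answer explains it must; it only avoids writing them.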

How to read in a text file of tab-separated integers in C?

I have a file of simply tab-separated integers (a .txt file) and I wish to read them in with just C, line by line. So, say each line has 5 integers. How can I accomplish this?
My first attempt was as follows. It was just to read in a single integer, but even that didn't work:
FILE *fp;
char blah[255];
int *some_int;
fp = fopen("test.txt", "rt");
while (fgets(blah, 255, fp) != NULL)
{
sscanf(blah, "%d", some_int);
printf("%d\n", *some_int);
}
Here's a way no one else suggested, that doesn't use fscanf so you can have sane error handling:
char buffer[BUFSIZE];
size_t size = 5;
int *data = malloc(size * sizeof *data);
if(data == NULL) error();
while(fgets(buffer, sizeof buffer, fp))
{
size_t i = 0;
char *next = buffer;
while(*next && *next != '\n')
{
data[i++] = strtol(next, &next, 0);
// check for errors
}
}
Basically, instead of trying to use *scanf's "%d" to read characters, use the function it (probably) calls to do the conversion: strtol. Where *scanf goes through the string to match the format string but doesn't let you "save your place" in between function calls, strtol does, which is what you need to read an arbitrary number of integers.
I haven't written all your code for you - you have to do the hard error handling. Possible errors include:
i == size, in which case you can try to make data bigger with realloc. Alternately, you could loop through the buffer and count how many numbers there are beforehand, then allocate that many so you don't need to reallocate later.
fgets didn't read the entire line (check that the last character before '\0' is '\n'). In this case you'll probably want to refill the buffer and keep reading numbers. Be careful in this case - you'll likely need to go back and recalculate the last number - fgets might have cut it off. (This is one disadvantage to using fgets.)
Erroneous input - handle however you like.
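A minimal sketch of what the inner loop could look like with those checks filled in is shown below. The function name parse_ints and its return convention are assumptions; it uses base 10 where the snippet above passes 0, and it reports a full array as an error instead of calling realloc.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Parse whitespace-separated integers from line into out, at most max of them.
 * Returns the number parsed, or -1 on bad input, overflow, or too many values. */
int parse_ints(const char *line, int *out, size_t max)
{
    size_t n = 0;
    const char *p = line;

    for (;;) {
        char *end;
        errno = 0;
        long v = strtol(p, &end, 10);   /* skips leading whitespace itself */
        if (end == p)
            break;                      /* no digits consumed: stop converting */
        if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
            return -1;                  /* value does not fit in an int */
        if (n == max)
            return -1;                  /* more numbers than the array holds */
        out[n++] = (int)v;
        p = end;                        /* "save your place" for the next call */
    }

    /* Whatever is left must be whitespace, otherwise the input was malformed. */
    while (isspace((unsigned char)*p))
        ++p;
    return *p == '\0' ? (int)n : -1;
}
```

It would be called once per fgets buffer, for example parse_ints(buffer, data, size), after confirming that the buffer holds a complete line.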
#include <stdio.h>
int main(){
FILE *fp;
int scanned = 0;
int some_ints[5];
fp = fopen("test.txt", "r");
while ((scanned = fscanf(fp, "%d %d %d %d %d", some_ints, some_ints+1, some_ints+2, some_ints+3, some_ints+4)) != EOF) {
if(scanned ==5){
printf("%d %d %d %d %d\n", some_ints[0], some_ints[1], some_ints[2], some_ints[3], some_ints[4]);
}
else {
printf("Whoops! Input format is incorrect!\n");
break;
}
}
}
I'd do something like this:
int storedVals[MAX_STORED_VALS];
int bf;
int ii=0;
// loop on fscanf's return value: it is 1 only when an integer was actually read
while (ii<MAX_STORED_VALS && fscanf(fp," %d",&bf) == 1) {
    storedVals[ii++]=bf;
}
The %d conversion skips leading white space, so fscanf gets rid of zero or more spaces, \t (tabs) and \n (newlines) to find the next integer; the space in the scan string is harmless but not required. The loop stops at end of file or at the first token that is not an integer. Of course, this doesn't do much by way of error correction.

Weird Scanf Issue

I am trying to finish a homework program that compares a string with a text file, so the user can essentially search the text file for the search term (string) in the file. I'm getting there :)
However today I'm running into a very weird issue. When it asks for the term to search for I input the text, but it never ends. I could type all day long and it still asks for input. What weird issue(s) am I overlooking? Fresh pair of eyes might help :)
/*
ask the user for a word
convert user word to LOWER CASE
open output file
open input file
test to be sure input file is open
search for target word and keep count --> how??
print results to monitor
write results to file
close files
*/
#include<stdio.h>
#include<stdlib.h>
int main (void)
{
//declare
int i =0;
int count = 0;
/*************************************************************
working with arrays and strings
*************************************************************/
char mystring[50]; //what user puts in
char target[50]; //the word in the file we are looking for
printf("input your message ");
scanf("%s", mystring);
//printf("%s", mystring);
/*************************************************************
find file, write to it, output the string, end and close file
**************************************************************/
//define text file to use
FILE *cfile;
//name of file == file
cfile = fopen("./thanksgiving_proclamation.txt", "a");
//error handling if file does not exist
if(cfile == NULL) printf("Cannot open file");
/*************************************************************
parse through file and search for string
**************************************************************/
//convert string to lowercase
for(i = 0; i < /*strlen(mystring)*/ 500; i++)//convert to string length
{
if(target[i] >= 'A' && target[i] <='Z')
//convert char between a and z into lowercase
target[i] = target[i] + 32; //makes uppercase char
}
//compare our strings
do{
//scan through file
fscanf(cfile, "%s", mystring);
//convert string to lowercase
for(i = 0; i < /*strlen(mystring)*/ 300; i++)//convert to string length
{
if(mystring[i] >= 'A' && mystring[i] <='Z')
//convert char between a and z into lowercase
mystring[i] = mystring[i] + 32; //makes uppercase char
}
if(strcmp(mystring, target) == 0)
count++;
}while(!feof(cfile));
//while(strcmp(target,"quit")!=0)//end loop
//print to file
fprintf(cfile, "%s", mystring);
//close file
fclose(cfile);
//show user file has been written
printf("\nSuccess. File has been written\n");
printf("Press Enter to Continue...");
getchar();
return 0;
}
You open the file in append mode:
cfile = fopen("...", "a");
and then you try to read from it.
fscanf(cfile, "%s", mystring);
For a first attempt at solving the problem, I'd try to open the file for reading, read from it inside the loop and close the file. Then open it again, this time for appending to add the mystring there (and fclose it).
Once that works, if you want to, try to see if opening in "reading and appending mode" works ...
cfile = fopen("...", "a+");
You don't need "&mystring", "mystring" is already the address of the array.
It would be better to use fgets or getline; gets cannot limit how much it reads and was removed from the language in C11.
You are reading the search string into mystring, but then you are also reading the file contents into mystring.
I think pmg has hit on the actual problem; you've opened the file in append mode, and according to my copy of H&S reading from an append stream is not permitted. You'd have to open it "a+" (append/update) in order to read and write the stream.
You should always check the result of the *scanf() call (fscanf(), sscanf(), scanf(), etc.) for success before checking feof() or ferror(), and you should never make feof() the loop test condition (since it won't return true until after you've attempted to read past the end of the file, your loop will always execute once too many times).
I'd change your loop to something like this:
for(;;)
{
if (fscanf(cfile, "%s", mystring) != 1)
{
if (feof(cfile))
{
fprintf(stderr, "Reached end of file!\n");
break; // exit loop
}
if (ferror(cfile))
{
fprintf(stderr, "Error while reading from file!\n");
break;
}
}
/**
* continue as before
*/
}
It ends when you hit Enter and only stores characters till a whitespace.
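Pulling the advice from these answers together (separate buffers for the target and the file words, a bounded %s, and a loop that tests the return value of fscanf instead of feof), the search itself might be sketched as below. The names lowercase and count_matches are hypothetical, and the stream is assumed to have been opened for reading ("r"), not appending.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Convert a string to lower case in place, stopping at its terminator. */
static void lowercase(char *s)
{
    for (; *s; ++s)
        *s = (char)tolower((unsigned char)*s);
}

/* Count the words in fp that equal target, ignoring case.
 * Punctuation attached to a word counts as part of that word. */
long count_matches(FILE *fp, const char *target)
{
    char want[50];   /* lower-cased copy of the search term */
    char word[50];   /* separate buffer for words read from the file */
    long count = 0;

    snprintf(want, sizeof want, "%s", target);
    lowercase(want);

    /* %49s leaves room for the terminator; the loop ends when no word is read. */
    while (fscanf(fp, "%49s", word) == 1) {
        lowercase(word);
        if (strcmp(word, want) == 0)
            ++count;
    }
    return count;
}
```

The result could then be written to a second file opened with "w" or "a", which keeps reading and writing on separate streams.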

Resources