fgets to read line by line in files - c

I understand that fgets reads until EOF or a newline.
I wrote a sample code to read lines from a file and I observed that the fgets gets executed more than the desired number of times. It is a very simple file with exactly two blank lines, ie. I pressed enter once and saved the file.
Below is the code:
fp = fopen("sample.txt","r");
while (!feof(fp)) {
fgets(line,150,fp);
i++;
printf("%s",line);
}
printf("%d",i);
Why is the while loop getting executed three times instead of 2, as there are only two blank lines in the file?

In your case, the last line is seemingly read twice, except that it isn't. The last call to fgets returns NULL to indicate that the end of the file has been reached. You don't check that, and you print the old contents of the buffer again, because the buffer wasn't updated.
It is usually better not to use feof at all and check return values from the f... family of functions:
fp = fopen("sample.txt", "r");
while (1) {
if (fgets(line,150, fp) == NULL) break;
i++;
printf("%3d: %s", i, line);
}
printf("%d\n",i);
The function feof returns true after trying to read beyond the end of the file, which only happens after your last (unsuccessful) call to fgets, which tries to read at or rather just before the end of the file. The answers in this long SO post explain more.
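To make the difference concrete, here is a minimal sketch that counts loop iterations both ways. The function names count_with_feof and count_with_return_value are made up for this illustration, and the sketch assumes the stream is readable without I/O errors (on a read error the feof-controlled loop would never terminate).

```c
#include <assert.h>
#include <stdio.h>

/* Counts iterations the way the question's loop does: feof() is tested
   before each read, so the final, failed fgets() is counted as well. */
int count_with_feof(FILE *fp)
{
    char line[150] = "";
    int i = 0;

    assert(fp != NULL);
    while (!feof(fp)) {
        fgets(line, sizeof line, fp);   /* result never checked: the bug */
        i++;
    }
    return i;
}

/* Counts only the reads that actually produced data. */
int count_with_return_value(FILE *fp)
{
    char line[150];
    int i = 0;

    assert(fp != NULL);
    while (fgets(line, sizeof line, fp) != NULL)
        i++;
    return i;
}
```

For a file that consists of two newline characters, the first function reports one iteration more than the second, because the end-of-file indicator is only set by the read that fails.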

Related

C99: Is it standard that fscanf() sets eof earlier than fgetc()?

I tried with VS2017 (32 Bit Version) on a 64 bit Windows PC and it seems to me that fscanf() sets the eof flag immediately after successfully reading the last item within a file. This loop terminates immediately after fscanf() has read the last item in the file related to stream:
while(!feof(stream))
{
fscanf(stream,"%s",buffer);
printf("%s",buffer);
}
I know this is insecure code... I just want to understand the behaviour. Please forgive me ;-)
Here, stream is related to an ordinary text file containing strings like "Hello World!". The last character in that file is not a newline character.
However, fgetc(), having processed the last character, tries to read yet another one in this loop, which leads to c=0xff (EOF):
while (!feof(stream))
{
c = fgetc(stream);
printf("%c", c);
}
Is this behaviour of fscanf() and fgetc() standardized, implementation dependent or something else? I am not asking why the loop terminates or why it does not terminate. I am interested in the question if this is standard behaviour.
In my experience, when working with <stdio.h> the precise semantics of the "eof" and "error" bits are very, very subtle, so much so that it's not usually worth it (it may not even be possible) to try to understand exactly how they work. (The first question I ever asked on SO was about this, although it involved C++, not C.)
I think you know this, but the first thing to understand is that the intent of feof() is very much not to predict whether the next attempt at input will reach the end of the file. The intent is not even to say that the input stream is "at" the end of the file. The right way to think about feof() (and the related ferror()) is that they're for error recovery, to tell you a bit more about why a previous input call failed.
And that's why writing a loop involving while(!feof(fp)) is always wrong.
But you're asking about precisely when fscanf hits end-of-file and sets the eof bit, versus getc/fgetc. With getc and fgetc, it's easy: they try to read one character, and they either get one or they don't (and if they don't, it's either because they hit end-of-file or encountered an i/o error).
But with fscanf it's trickier, because depending on the input specifier being parsed, characters are accepted only as long as they're appropriate for the input specifier. The %s specifier, for example, stops not only if it hits end-of-file or gets an error, but also when it hits a whitespace character. (And that's why people were asking in the comments whether your input file ended with a newline or not.)
I've experimented with the program
#include <stdio.h>
int main()
{
char buffer[100];
FILE *stream = stdin;
while(!feof(stream)) {
fscanf(stream,"%s",buffer);
printf("%s\n",buffer);
}
}
which is pretty close to what you posted. (I added a \n in the printf so that the output was easier to see, and better matched the input.) I then ran the program on the input
This
is
a
test.
and, specifically, where all four of those lines ended in a newline. And the output was, not surprisingly,
This
is
a
test.
test.
The last line is repeated because that's what (usually) happens when you write while(!feof(stream)).
But then I tried it on the input
This\n
is\n
a\n
test.
where the last line did not have a newline. This time, the output was
This
is
a
test.
This time, the last line was not repeated. (The output was still not identical to the input, because the output contained four newlines while the input contained three.)
I think the difference between these two cases is that in the first case, when the input contains a newline, fscanf reads the last line, reads the last \n, notices that it's whitespace, and returns, but it has not hit EOF and so does not set the EOF bit. In the second case, without the trailing newline, fscanf hits end-of-file while reading the last line, and so does set the eof bit, so feof() in the while() condition is satisfied, and the code does not make an extra trip through the loop, and the last line is not repeated.
We can see a bit more clearly what's going on if we look at fscanf's return value. I modified the loop like this:
while(!feof(stream)) {
int r = fscanf(stream,"%s",buffer);
printf("fscanf returned %2d: %5s (eof: %d)\n", r, buffer, feof(stream));
}
Now, when I run it on a file that ends with a newline, the output is:
fscanf returned 1: This (eof: 0)
fscanf returned 1: is (eof: 0)
fscanf returned 1: a (eof: 0)
fscanf returned 1: test. (eof: 0)
fscanf returned -1: test. (eof: 1)
We can clearly see that after the fourth call, feof(stream) is not true yet, meaning that we'll make that last, extra, unnecessary, fifth trip through the loop. But we can see that during the fifth trip, fscanf returns -1, indicating (a) that it did not read a string as expected and (b) it reached EOF.
If I run it on input not containing the trailing newline, on the other hand, the output is like this:
fscanf returned 1: This (eof: 0)
fscanf returned 1: is (eof: 0)
fscanf returned 1: a (eof: 0)
fscanf returned 1: test. (eof: 1)
Now, feof is true immediately after the fourth call to fscanf, and the extra trip is not made.
Bottom line: the moral is (the morals are):
Don't write while(!feof(stream)).
Do use feof() and ferror() only to test why a previous input call failed.
Do check the return value of scanf and fscanf.
And we might also note: Do beware of files not ending in newline! They can behave surprisingly differently.
Addendum: Here's a better way to write the loop:
while((r = fscanf(stream,"%s",buffer)) == 1) {
printf("%s\n", buffer);
}
When you run this, it always prints exactly the strings it sees in the input. It doesn't repeat anything; it doesn't do anything significantly differently depending on whether the last line does or doesn't end in a newline. And -- significantly -- it doesn't (need to) call feof() at all!
Footnote: In all of this I've ignored the fact that %s with *scanf reads strings, not lines. Also that %s tends to behave very badly if it encounters a string that's larger than the buffer that's to receive it.
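As a rough sketch of how the footnote's buffer concern can be handled, the loop from the addendum can be given a field width. The function name count_words and the last parameter are invented for this example; the buffer size of 100 matches the test program above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Reads whitespace-separated words from fp. Copies the most recent word
   into last (which must hold at least 100 bytes) and returns the count. */
int count_words(FILE *fp, char *last)
{
    char buffer[100];
    int n = 0;

    assert(fp != NULL && last != NULL);
    last[0] = '\0';
    /* %99s stores at most 99 characters plus the terminating '\0'. */
    while (fscanf(fp, "%99s", buffer) == 1) {
        strcpy(last, buffer);
        n++;
    }
    return n;
}
```

With the width in place, a word longer than 99 characters is split across several iterations instead of overflowing the buffer. The count is the same whether or not the input ends with a newline.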
Both of your loops are incorrect: feof(f) is only set after an unsuccessful attempt to read past the end of file. In your code, you do not test for fgetc() returning EOF nor if fscanf() returns 0 or EOF.
Indeed fscanf() can set the end of file condition of a stream if it reaches the end of file, which it does for %s if the file does not contain a trailing newline, whereas fgets() would not set this condition if the file ends with a newline. fgetc() sets the condition only when it returns EOF.
Here is a modified version of your code that illustrates this behavior:
#include <stdio.h>
int main() {
FILE *fp = stdin;
char buf[100];
char *p;
int c, n, eof;
for (;;) {
c = fgetc(fp);
eof = feof(fp);
if (c == EOF) {
printf("c=EOF, feof()=%d\n", eof);
break;
} else {
printf("c=%d, feof()=%d\n", c, eof);
}
}
rewind(fp); /* clears end-of-file and error indicators */
for (;;) {
n = fscanf(fp, "%99s", buf);
eof = feof(fp);
if (n == 1) {
printf("fscanf() returned 1, buf=\"%s\", feof()=%d\n", buf, eof);
} else {
printf("fscanf() returned %d, feof()=%d\n", n, eof);
break;
}
}
rewind(fp); /* clears end-of-file and error indicators */
for (;;) {
p = fgets(buf, sizeof buf, fp);
eof = feof(fp);
if (p == buf) {
printf("fgets() returned buf, buf=\"%s\", feof()=%d\n", buf, eof);
} else
if (p == NULL) {
printf("fgets() returned NULL, feof()=%d\n", eof);
break;
} else {
printf("fgets() returned %p, buf=%p, feof()=%d\n", (void*)p, (void*)buf, eof);
break;
}
}
return 0;
}
When run with standard input redirected from a file containing Hello world without a trailing newline, here is the output:
c=72, feof()=0
c=101, feof()=0
c=108, feof()=0
c=108, feof()=0
c=111, feof()=0
c=32, feof()=0
c=119, feof()=0
c=111, feof()=0
c=114, feof()=0
c=108, feof()=0
c=100, feof()=0
c=EOF, feof()=1
fscanf() returned 1, buf="Hello", feof()=0
fscanf() returned 1, buf="world", feof()=1
fscanf() returned -1, feof()=1
fgets() returned buf, buf="Hello world", feof()=1
fgets() returned NULL, feof()=1
The C Standard specifies the behavior of the stream functions in terms of individual calls to fgetc. fgetc sets the end-of-file condition when it cannot read a byte from the stream because the stream is at end of file.
The behavior illustrated above conforms to the Standard and shows how testing feof() is not a good approach to validate input operations. feof() can return non-zero after successful operations and can return 0 before unsuccessful operations. feof() should only be used to distinguish end of file from input error after an unsuccessful input operation. Very few programs make this distinction, hence feof() is almost never used on purpose and almost always indicates a programming error. For extra explanations, read this: Why is “while ( !feof (file) )” always wrong?
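For what it's worth, the one legitimate use described above might look like the following sketch. The function name count_lines_checked and the return convention (0 for end of file, -1 for a read error) are assumptions made for this illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Reads fp to the end and stores the number of successful fgets() calls
   in *reads (a line longer than 255 characters is counted once per chunk).
   Returns 0 if the loop ended at end of file, -1 if it ended on a read error. */
int count_lines_checked(FILE *fp, long *reads)
{
    char buf[256];

    assert(fp != NULL && reads != NULL);
    *reads = 0;
    /* The return value of fgets() controls the loop. */
    while (fgets(buf, sizeof buf, fp) != NULL)
        ++*reads;
    /* Only now, after a failed read, ask why it failed. */
    return ferror(fp) ? -1 : 0;
}
```

The loop itself never consults feof(). The indicators are only inspected once fgets() has already reported failure.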
If I might offer a tl;dr to both the comprehensive answers here, formatted input reads characters until it has reason to stop. Since you say
The last character in that file is not a newline character
and the %s directive reads a string of non-whitespace characters, after it reads the ! in World! it has to read another character. There isn't one, which lights eof.
Put whitespace (space, newline, whatever) at the end of the phrase, and your printf will print the last word twice: once because it read it, and again because the scanf failed to find a string to read before hitting eof, so the %s conversion never happened leaving the buffer untouched.

C Programming - type in the program that copies a file using line-at-a-time I/O (fgets and fputs) but use a MAXLINE of 4

I apologize in advance because I just started learning C. This is my code so far.
int main ()
{
char buf [4];
// open the file
FILE *fp = fopen("readme","r");
// Return if could not open file
if (fp == NULL)
return 0;
while(fgets(buf,4,fp) != NULL){
// Checking for end of file
if (feof(fp))
break ;
fputs(buf,stdout);
} while(1);
fclose(fp);
return(0);
}
I'm having trouble getting the results I need. I assume that setting a buffer to 4 would likely return the first four characters of each line. I could be wrong but I've been stuck on this a few hours trying to figure out how to overflow the buffer. Appreciate any help, thanks!
You are not understanding how fgets works.
fgets will read up to 3 characters (and not 4, because fgets '\0'-terminates the buffer). If the newline '\n' is among these 3,
fgets stops reading, stores the newline in the buffer and returns; otherwise it
continues until the buffer is full.
Let's say the input is: This is a very long line\nThis is another line\n,
then fgets will read the first 3 characters and store 'T', 'h', 'i',
'\0' in the buffer. The next time, fgets reads the next 3 characters,
's', ' ', 'i', and stores them in the buffer. Eventually it reaches
the first newline and stops reading there.
In general fgets will try to read the max number of characters that the buffer can hold.
If the newline is among these characters, it stops reading and writes the newline
in the buffer as well. If the newline is not found among them, then fgets will
stop when the buffer is full.
Whether fgets reads a whole line depends on the size of the buffer, so even
with two consecutive calls of fgets, you still might not get the whole
line, as I showed you with the example above. That's why your assumption
"I assume that setting a buffer to 4 would likely return the first four
characters of each line" is wrong.
The only way to know if fgets read a whole line is looking whether the newline was written in the buffer.
Also note that
if (feof(fp))
break ;
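is not needed here, because fgets will return NULL when the end-of-file is
reached and no more characters can be read. The loop will end anyway, so doing
the extra check is pointless. It can even be harmful: if the file does not end
with a newline, the last successful fgets may already set the end-of-file
indicator, and the break then discards that final chunk.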
And this
while(fgets(buf,4,fp) != NULL){
...
} while(1);
is the same as this:
while(fgets(buf,4,fp) != NULL){
...
}
while(1);
So after reading the whole file, you are entering in an endless loop.
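Putting these points together, a copy loop with a 4-byte buffer could be sketched as follows. The function name copy_in_chunks and its lines parameter are made up for this example; the sketch assumes both streams are already open.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAXLINE 4

/* Copies in to out through a MAXLINE-byte buffer.
   Returns the number of fgets() calls that delivered data and stores
   the number of newline-terminated lines in *lines. */
long copy_in_chunks(FILE *in, FILE *out, long *lines)
{
    char buf[MAXLINE];
    long chunks = 0;

    assert(in != NULL && out != NULL && lines != NULL);
    *lines = 0;
    while (fgets(buf, sizeof buf, in) != NULL) {
        fputs(buf, out);
        chunks++;
        /* A chunk that contains '\n' completes a line. */
        if (strchr(buf, '\n') != NULL)
            ++*lines;
    }
    return chunks;
}
```

Nothing overflows and nothing is lost: a long line simply arrives in several chunks of at most 3 characters, and the output is identical to the input.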

how to make fscanf, scan a single word instead of whole line

There is a text file called 1.txt like below that contains certain names, I want to add them to a linked list however the code doesn't scan only one name but the whole line, how can I make fscanf so that it only scans a single name?
Example 1.txt:
ana,bob,max,dolores
My code:
FILE *fp = fopen("1.txt", "rt");
while (!feof(fp)) {
char name_in[100];
fscanf(fp, "%s,", name_in);
printf("%s", name_in);
addnewnode(head, name_in);
}
fclose(fp);
The problem is that with the "%s" format, scanf will not stop scanning until it hits the end of the input or a whitespace character. A comma is not whitespace, so "%s" swallows the whole line. That means you can't use scanf with "%s" alone to parse your input.
Instead I suggest you read your whole line into a buffer, for example using fgets. Once you have it then you can use strtok in a loop to tokenize the line.
Not using scanf also sidesteps a big problem with your format string: the trailing comma in it can never match. By the time scanf gets to it, "%s" has already consumed every comma up to the next whitespace, so the literal comma is compared against whitespace or the end of the input and the match fails. scanf then stops and still returns 1, so the failure goes unnoticed. Checking what scanf returns is crucial.
Also, I strongly suggest you read Why is “while ( !feof (file) )” always wrong?.
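The tokenizing step could be sketched like this, assuming the line has already been read with fgets. The function name split_names and the fixed-size pointer array are made up for this example.

```c
#include <assert.h>
#include <string.h>

/* Splits line in place on commas and line endings.
   Stores up to max token pointers in names and returns how many were stored. */
size_t split_names(char *line, char *names[], size_t max)
{
    size_t n = 0;
    char *tok;

    assert(line != NULL && names != NULL);
    for (tok = strtok(line, ",\r\n"); tok != NULL && n < max;
         tok = strtok(NULL, ",\r\n"))
        names[n++] = tok;
    return n;
}
```

Each pointer in names refers into the original buffer, so the names have to be copied (for example inside addnewnode) before the buffer is reused for the next line. Note that strtok modifies its input, skips empty fields, and keeps internal state between calls.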
What's in a name?
A name is usually thought of as containing letters, maybe spaces and some other characters. Code needs to be told what char make up a name, what are valid delimiters and handle other unexpected char.
"%s" only distinguishes between white-space and non-white-space. It treats , the same as letters.
"%width[A-Za-z' ]" will define a scanset accepting letters, ' and space. It will read/save up to width characters before appending a null character.
It is always a good idea to check the return value of an input function before using the populated objects.
FILE *fp = fopen("1.txt", "rt");
if (fp == NULL) Handle_Error();
// end-of-file signal is not raised until after a read attempt.
// while (!feof(fp)) {
char name_in[100];
char delimiter[2];
int count;
while ((count = fscanf(fp, "%99[A-Za-z' ]%1[,\n]", name_in, delimiter)) == 2) {
printf("<%s>%s", name_in, delimiter);
addnewnode(head, name_in);
if (delimiter[0] == '\n') {
; // Maybe do something special at end-of-line
}
}
// Loop stopped unexpectedly
if (count != EOF || !feof(fp)) {
puts("Oops");
}
// Close only after the last feof(fp) test; using fp after fclose() is undefined
fclose(fp);
More robust code would read the line like with fgets() and then process the string. Could use similar code as above but with sscanf()
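A sketch of that sscanf() variant, assuming the line is already in a string. The function name count_names is made up, and "%n" is used here to find out how far each call got.

```c
#include <assert.h>
#include <stdio.h>

/* Counts comma-separated names in line, using the scanset shown above.
   Stops at the first thing that is not a name followed by a comma. */
int count_names(const char *line)
{
    char name[100];
    int used = 0;
    int count = 0;

    assert(line != NULL);
    while (sscanf(line, "%99[A-Za-z' ]%n", name, &used) == 1) {
        count++;              /* name holds one complete name here */
        line += used;         /* skip the characters that were consumed */
        if (*line != ',')
            break;            /* newline, end of string, or unexpected char */
        line++;               /* skip the comma */
    }
    return count;
}
```

In real code the body of the loop would hand name to something like addnewnode() instead of just counting. "%n" does not contribute to the return value, so a result of 1 still means exactly one name was read.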
To include - in a scanset so code can handle hyphenated names, list it first. You may want to allow other characters too.
"%width[-A-Za-z' .]"

Read empty file and the output is symbol

I am new to C. At my university we have just learned about files in C, and I got a task. If I put an empty file in my project directory and read it, the output is symbols (I don't know what symbols they are). Here is the code, please help.
player dota[100];
FILE *fp;
fp = fopen("soal09.txt", "r");
if(fp == NULL)
{
printf("Error Opening The File!!\n");
return 0;
}
else
{
while(!feof(fp))
{
fscanf(fp, "%[^ ] %d %d\n", &dota[idx].name, &dota[idx].score, &dota[idx].num);
idx++;
}
}
fclose(fp);
do
{
enter();
menu();
printf("Input your choice [1..5]: ");
scanf("%d", &choose); fflush(stdin);
if(choose == 1)
{
system("cls");
enter();
printf("%-20s %-15s %s\n", "Player Name", ": Average Score", ": Number of Playing");
printf("====================================================================\n");
for(int i = 0; i < idx; i++)
{
printf("%-20s %-15d %d\n", dota[i].name, dota[i].score, dota[i].num);
}
printf("\nPress Enter to Continue...");
getchar();
}
getchar();
return 0;
}
and the output is ╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠╠ -858993460
Thank you ^^
The end-of-file indicator that is checked by feof() is only set after a previous file I/O operation has failed. You must attempt an I/O operation to find out if you have reached the end of the file. So, with an empty file, your code attempts to read the file, the end-of-file indicator is set, no values are read into the first struct, but idx is incremented, so it looks like you have added data for a player. But the fields of the first struct are uninitialized, so you are seeing garbage. Also note that dota[idx].name is presumably an array of chars, so it decays to a pointer to char when passed to fscanf(). Using &dota[idx].name, as you have, is wrong, though it might appear to work. It does cause the compiler to emit a warning, and you should have these enabled (I always use at least gcc -Wall -Wextra -pedantic).
You should not use feof() to control file I/O loops. One simple solution is to use the return value of fscanf() to control the loop:
while(fscanf(fp, "%[^ ] %d %d\n",
dota[idx].name, &dota[idx].score, &dota[idx].num) == 3) {
idx++;
}
This will only update a player struct if three assignments are made by the call to fscanf(). But, the problem with this simple solution is that it doesn't handle malformed input well. If a line of the data file is missing a field, the struct will be incorrectly filled, and the loop will terminate, even if there are more lines in the file to read. Also, since no field width is specified for the string conversion, a long name could crash your program.
A better solution uses fgets() to read the lines of the file into a buffer, and then uses sscanf() to extract the information from the buffer:
#include <string.h> // for strcpy()
...
char buffer[1000];
int line = 0;
char temp_name[100];
int temp_score, temp_num;
while(fgets(buffer, sizeof(buffer), fp)) {
++line;
if (sscanf(buffer, "%99[^ ] %d %d\n",
temp_name, &temp_score, &temp_num) == 3) {
strcpy(dota[idx].name, temp_name);
dota[idx].score = temp_score;
dota[idx].num = temp_num;
++idx;
} else if (buffer[0] != '\n') {
fprintf(stderr, "Bad input in line %d\n", line);
}
}
Here, a generous buffer is declared to accept a line of text from the file, and temporary variables are declared to hold the values to be stored in the struct fields. I have chosen a size of 100 for temp_name, but this should match the size of the string array in your actual struct. Note that the string conversion in sscanf() has a field width of 99, so that at most 99 non-space (not non-whitespace, and why aren't you just using %99s here?) characters are matched, leaving space for the '\0' to be added.
fgets() will return NULL when it reaches the end of the file, so the loop will continue until that happens. For each line that is read, a line counter is incremented. Then sscanf() is used to read data into the temporary variables. The value returned from sscanf() is checked to be sure that 3 assignments were made, and if so, then the data is copied into the struct, and idx is incremented. Note that strcpy() is needed to copy the string from temp_name to dota[idx].name.
If the value returned from sscanf() indicates that something other than 3 assignments were made, there is a check to see if buffer holds an empty line. If not, an error message is printed to stderr providing the line number of the bad input in the file.
A couple of further comments. Your do loop appears to be missing the associated while(). And you use fflush(stdin) after the scanf() inside the do loop. fflush() is meant to flush output streams. The behavior of fflush() on input streams is explicitly undefined in the C Standard (ISO/IEC 9899:2011 7.21.5.2/2), though I believe that Microsoft deviates from the Standard here. Nevertheless, you should not use fflush(stdin) in portable C code. Instead, use something like this:
int c;
...
scanf("%d", &choose);
while ((c = getchar()) != '\n' && c != EOF)
continue; // discard extra characters
This code reads characters from the input stream until either a '\n' or EOF is reached, clearing any characters left behind by the previous call to scanf() from the input stream.
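Another option, in the same spirit as the fgets()/sscanf() approach used for the data file, is to read the whole menu line with fgets() and parse it afterwards, so nothing is left behind in the input stream. A minimal sketch, where the function name parse_choice is invented and the range 1..5 is taken from the menu prompt:

```c
#include <assert.h>
#include <stdio.h>

/* Parses a menu choice in the range 1..5 from one line of text.
   Returns 1 and stores the value in *choice on success, 0 otherwise. */
int parse_choice(const char *line, int *choice)
{
    int value;
    char extra;

    assert(line != NULL && choice != NULL);
    /* " %c" only converts if non-whitespace text follows the number,
       so a clean line yields exactly one conversion. */
    if (sscanf(line, "%d %c", &value, &extra) != 1)
        return 0;
    if (value < 1 || value > 5)
        return 0;
    *choice = value;
    return 1;
}
```

It would be called on a buffer filled by fgets(buffer, sizeof buffer, stdin). For input that may contain very large numbers, strtol() gives more predictable overflow handling than "%d".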

Reading in data from a file, using fscanf (following a specific pattern)

I am trying to read in from a file, and I can't get the pattern of it right. Can someone tell me what I can do to get it working?
int main()
{
char name[20];
int age;
float highbp, lowbp, risk;
FILE *fp;
fp = fopen("data.dat", "r");
if(fp == NULL){
printf("cannot open file\n\n");
}
while(fscanf(fp, "name:%s\nage:%d\nbp:%f\nrisk:%f", name, &age, &highbp, &risk) != EOF){
}
printf("Name: %s\n", name);
printf("%d\n", age);
printf("%f\n", highbp);
printf("%f\n", risk);
}
data.dat:
name:tom
age:32
bp:43.00
risk:0.0
If it can't open the file it prints a message, but then continues. Instead it should return from main.
if (fp == NULL) {
printf("cannot open file\n\n");
return 1;
}
fscanf will return the number of items parsed, so it's probably safer to stop reading when the number returned < 4 (not all the items could be read).
Presumably "data.dat" contains multiple records and each line has a line ending. This means that after reading the first record the next character in the file is the line ending for the "risk:0.0" line. You should end the fscanf template with \n.
This is because the second time it tries to parse the file, fscanf will see that character, which it isn't expecting (the fscanf template starts "name:"), so it will stop reading, and you'll get only the first record.
You should change the "name" format specifier from %s to %19s to make it read at most 19 characters (+terminating '\0'). The way you have it now is a guaranteed failure in case someone gives you 20+ character name.
Can someone tell me what I can do to get it working?
I suggest you separate the functionality in different statements.
Don't try to cram all of the program functionality in 1 statement.
Your big statement is doing 3 things:
it is reading data from file
it is comparing the return value of scanf with EOF
it is controlling when to stop reading
I suggest you do (at least) 3 different statements for the 3 different actions.
Hint: comparing the return value of scanf only with EOF is not enough.
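One way to separate the three actions is sketched below. The struct record type, the function name read_record, and its return convention are made up for this example; the field names follow the data.dat sample, which has a single bp value per record.

```c
#include <assert.h>
#include <stdio.h>

struct record {
    char name[20];
    int age;
    float bp;
    float risk;
};

/* Reads one four-field record.
   Returns 1 on success, 0 at end of input, -1 if the record is malformed. */
int read_record(FILE *fp, struct record *r)
{
    int n;

    assert(fp != NULL && r != NULL);
    /* Action 1: read. White space in the format matches any run of white
       space, including the line ending left over from the previous record. */
    n = fscanf(fp, " name:%19s age:%d bp:%f risk:%f",
               r->name, &r->age, &r->bp, &r->risk);
    /* Action 2: classify the return value. */
    if (n == 4)
        return 1;
    if (n == EOF)
        return 0;
    return -1;
}
```

Action 3, deciding when to stop, is then left to the caller, for example while (read_record(fp, &rec) == 1) { ... }. A return of -1 tells the caller that the file ended in a partial or malformed record rather than cleanly.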

Resources