Reading a text file until EOF using fgets in C

What is the correct way to read a text file until EOF using fgets in C? Right now I have this (simplified):
char line[100 + 1];
while (fgets(line, sizeof(line), tsin) != NULL) { // tsin is FILE* input
    ... // doing stuff with line
}
Specifically, I'm wondering whether the while-condition should be something else. Does the read from the text file into "line" have to be carried out in the while-condition?

According to the reference:
On success, the function returns str. If the end-of-file is encountered while attempting to read a character, the eof indicator is set (feof). If this happens before any characters could be read, the pointer returned is a null pointer (and the contents of str remain unchanged). If a read error occurs, the error indicator (ferror) is set and a null pointer is also returned (but the contents pointed by str may have changed).
So checking whether the returned value is NULL is enough, and the call to fgets does belong in the while-condition. The parsing of each line goes into the while-body.

What you have done is 100% OK, but you can also simply rely on the return of fgets as the test itself, e.g.
char line[100 + 1] = ""; /* initialize all to 0 ('\0') */
while (fgets(line, sizeof(line), tsin)) { /* tsin is FILE* input */
    /* ... doing stuff with line */
}
Why? fgets will return a pointer to line on success, or NULL on failure (end-of-file before any character was read, or a read error). A valid pointer will test true and, of course, NULL will test false.
(Note: you must ensure that line is a character array declared in scope to use sizeof line as the length. If line is simply a pointer to an array, then sizeof line is sizeof (char *), and fgets reads at most sizeof (char *) - 1 characters per call.)
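Putting the pieces together, here is a minimal sketch of the loop plus the check that belongs after it. The helper name count_lines is made up for illustration. The NULL return ends the loop, and only afterwards does ferror tell a read error apart from a normal end-of-file.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads `in` to the end with fgets and returns the
 * number of lines seen, or -1 if the loop stopped because of a read error. */
long count_lines(FILE *in)
{
    char line[100 + 1];
    long lines = 0;
    int at_line_start = 1;   /* does the next chunk begin a new line? */

    while (fgets(line, sizeof line, in) != NULL) {
        if (at_line_start)
            lines++;
        /* A chunk without '\n' means the line was longer than the buffer
         * (or it is the last line of a file lacking a final newline). */
        at_line_start = (strchr(line, '\n') != NULL);
    }

    /* fgets returned NULL: only now ask why. */
    if (ferror(in))
        return -1;
    return lines;            /* otherwise the loop ended at end-of-file */
}
```

Called as count_lines(tsin), it returns the line count once the loop has run into end-of-file. A line longer than the buffer arrives in several chunks but is still counted once.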

I had the same problem and solved it this way:
while (fgets(line, sizeof(line), tsin) != 0) { // 0 acts as a null pointer constant here, so this is the same test as != NULL
    ... // doing stuff with line
}

Related

Return value of fgets()

I have just recently started working with I/O in C. Here is my question: I have a file from which I read my input. I use fgets() to get strings into a buffer, which I then use in some way. What happens if the input is too short for the buffer, i.e. if the first read by fgets() reaches EOF? Should fgets() return NULL (as I have read in the fgets() documentation)? It seems that it doesn't, and I get my input properly. Besides, even feof(input) does not report that we have reached EOF. Here is my code snippet:
char buf[BUFSIZ];
FILE *input,
     *output;
input = fopen(argv[--argc], "r");
output = fopen(argv[--argc], "w");
/**
 * If either of the input or output were unable to be opened
 * we exit
 */
if (input == NULL) {
    fprintf(stdout, "Failed to open file - %s.\n", argv[argc + 1]);
    exit(EXIT_FAILURE);
}
if (output == NULL) {
    fprintf(stdout, "Failed to open file - %s.\n", argv[argc + 0]);
    exit(EXIT_FAILURE);
}
if (fgets(buf, sizeof(buf), input) != NULL) {
    ....
}
/**
 * After the fgets() condition exits it is because, either -
 * 1) The EOF was reached.
 * 2) There is a read error.
 */
if (feof(input)) {
    fprintf(stdout, "Reached EOF.\n");
}
else if (ferror(input)) {
    fprintf(stdout, "Error while reading the file.\n");
}
The documentation for fgets() does not say what you think it does:
From my man page:
fgets() reads in at most one less than size characters from stream and stores them into the buffer pointed to by s. Reading
stops after an EOF or a newline. If a newline is read, it is stored into the buffer. A terminating null byte ('\0') is stored
after the last character in the buffer.
And later:
gets() and fgets() return s on success, and NULL on error or when end of file occurs while no characters have been read.
I don't read that as saying an EOF will be treated as an error condition and return NULL. Indeed it says a NULL would only occur where EOF occurs when no characters have been read.
The POSIX standard (which defers to the less accessible C standard) is here: http://pubs.opengroup.org/onlinepubs/009695399/functions/fgets.html and states:
Upon successful completion, fgets() shall return s. If the stream is at end-of-file, the end-of-file indicator for the stream shall be set and fgets() shall return a null pointer. If a read error occurs, the error indicator for the stream shall be set, fgets() shall return a null pointer, and shall set errno to indicate the error.
This clearly indicates that it's only going to return NULL because of end-of-file if the stream is already at EOF when fgets() is called, i.e. if any bytes are read before end-of-file is hit, it won't return NULL.
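A small experiment makes this concrete. The sketch below (the struct and function names are invented for illustration) calls fgets twice and records the return value and feof() after each call. For a file holding a single short line that ends in a newline, the first call succeeds without setting the end-of-file indicator, which matches what the question observed. Only the second call returns NULL and sets it.

```c
#include <stdio.h>

/* Hypothetical record of what two consecutive fgets calls did. */
struct fgets_probe {
    int first_ok;          /* first fgets returned non-NULL */
    int eof_after_first;   /* feof() was set right after the first call */
    int second_ok;         /* second fgets returned non-NULL */
    int eof_after_second;  /* feof() was set right after the second call */
};

struct fgets_probe probe_two_reads(FILE *in)
{
    char buf[BUFSIZ];
    struct fgets_probe p;

    p.first_ok = fgets(buf, sizeof buf, in) != NULL;
    p.eof_after_first = feof(in) != 0;
    p.second_ok = fgets(buf, sizeof buf, in) != NULL;
    p.eof_after_second = feof(in) != 0;
    return p;
}
```

If the single line has no trailing newline, fgets has to run into end-of-file to find where the line stops, so the indicator is already set after the first call, yet that call still returns the buffer because characters were read.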

Extra EOF character

I have a program that reads a file into a buffer structure. The problem I'm having is that when I look at the output of the file, there's an extra EOF character at the end. I'll post the related functions. (NOTE: I removed parameter checks and only posted the code in each function related to the issue.)
b_load
int b_load(FILE * const fi, Buffer * const pBD){
    unsigned char character; /* Variable to hold the character read from the file */
    Buffer * tempBuffer;     /* Temporary Buffer * to prevent destruction of the main Buffer */
    short num_chars = 0;     /* Counter of the number of characters read into the buffer */
    /* Assigns main Buffer to tempBuffer */
    tempBuffer = pBD;
    /* Infinite loop that breaks after EOF is read */
    while(1){
        /* calls fgetc() and stores the returned char in the character variable */
        character = (unsigned char)fgetc(fi);
        if(!feof(fi)){
            tempBuffer = b_addc(pBD,character);
            if(tempBuffer == NULL)
                return LOAD_FAIL;
            ++num_chars;
        }else{
            break;
        }
    }
    return num_chars;
}
b_print
int b_print(Buffer * const pBD){
    int num_chars = 0;
    if(pBD->addc_offset == 0)
        printf("The buffer is empty\n");
    /* Sets getc_offset to 0 */
    b_set_getc_offset(pBD, 0);
    pBD->eob=0;
    /* b_eob returns the structure's eob field */
    while (!b_eob(pBD)){
        printf("%c",b_getc(pBD));
        ++num_chars;
    }
    printf("\n");
    return num_chars;
}
b_getc
char b_getc(Buffer * const pBD){
    if(pBD->getc_offset == pBD->addc_offset){
        pBD->eob = 1;
        return R_FAIL_1;
    }
    pBD->eob = 0;
    return pBD->ca_head[(pBD->getc_offset)++];
}
At the end I end up with:
"a catÿ"
(the ÿ is the EOF character)
It prints an EOF character, but that character is never added to the buffer. When the driver code adds an EOF character to the end of the buffer, two appear. Any idea what is causing this? I might be using feof() wrong, so that may be it, but it is required in the code.
There is no "EOF character". EOF is a value returned by getchar() and related functions to indicate that they have no more input to read. It's a macro that expands to a negative integer constant expression, typically (-1).
(For Windows text files, an end-of-file condition may be triggered by a Control-Z character in a file. If you read such a file in text mode, you won't see that character; it will just act like it reached the end of the file at that point.)
Don't use the feof() function to detect that there's no more input to read. Instead, look at the value returned by whatever input function you're using. Different input functions use different ways to indicate that they weren't able to read anything; read the documentation for whichever one you're using. For example, fgets() returns a null pointer, getchar() returns EOF, and scanf() returns the number of items it was able to read.
getchar(), for example, returns either the character it just read (treated as an unsigned char and converted to int) or the value EOF to indicate that it wasn't able to read anything. The negative value of EOF is chosen specifically to avoid colliding with any valid value of type unsigned char. Which means you need to store the value returned by getchar() in an int object; if you store it in a char or unsigned char instead, you can lose information, and an actual character with the value 0xff can be mistaken for EOF.
The feof() function returns the value of the end-of-file indicator for the file you're reading from. That indicator becomes true after you've tried and failed to read from the file. And if you ran out of input because of an error, rather than because of an end-of-file condition, feof() will never become true.
You can use feof() and/or ferror() to determine why there was no more input to be read, but only after you've detected it by other means.
Recommended reading: Section 12 of the comp.lang.c FAQ, which covers stdio. (And the rest of it.)
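To illustrate the point about int versus char, here is a minimal sketch (both function names are made up). The first function uses the int idiom and therefore counts a 0xFF byte like any other character. The second shows what remains of EOF once it is forced into an unsigned char: on implementations where EOF is -1 and bytes have 8 bits, that is 0xFF, which displays as ÿ in Latin-1. That would explain a stray ÿ whenever such a value ends up being printed.

```c
#include <stdio.h>

/* Counts the bytes in `in`, keeping the result of fgetc in an int so that
 * EOF stays distinguishable from every valid character value. */
long count_bytes(FILE *in)
{
    int c;   /* int, NOT char or unsigned char */
    long n = 0;

    while ((c = fgetc(in)) != EOF)
        n++;
    return n;
}

/* What is left of EOF after it has been converted to unsigned char. */
unsigned char eof_as_unsigned_char(void)
{
    return (unsigned char)EOF;
}
```

After count_bytes returns, feof() and ferror() can be used to find out why the input ended, which is the order of operations recommended above.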
UPDATE:
I haven't seen enough of your code to understand what you're doing with the Buffer objects. Your input loop actually looks (almost) correct, though it's written in a clumsy way.
The usual idiom for reading characters from a file is:
int c; /* `int`, NOT `char` or `unsigned char` */
while ((c = fgetc(fi)) != EOF) {
    /* process character in `c` */
}
But your approach, which I might rearrange like this:
while (1) {
    c = fgetc(fi);
    if (feof(fi) || ferror(fi)) {
        /* no more input */
        break;
    }
    /* process character in c */
}
should actually work. Note that I've added a check for ferror(fi). Could it be that you have an error on input (which you're not detecting)? That would cause c to contain EOF, or the value of EOF converted to the type of c. That's doubtful, though, since it would probably give you an infinite loop.
Suggested approach: Using either an interactive debugger or added printf calls, show the value of character every time through the loop. If your input loop is working correctly, then build a stripped-down version of your program with a hard-wired sequence of calls to b_addc(), and see if you can reproduce the problem that way.
There you go ...
int b_load(FILE * const fi, Buffer * const pBD){
    int character;       /* int, so that EOF can be told apart from every valid character */
    Buffer * tempBuffer; /* Temporary Buffer * to prevent destruction of the main Buffer */
    short num_chars;     /* Counter of the number of characters read into the buffer */
    /* Loop that breaks WHEN EOF is read */
    for (num_chars = 0; ; num_chars++) {
        character = fgetc(fi);
        if (character == EOF || feof(fi)) break; // since you insist on the silly feof() ...
        tempBuffer = b_addc(pBD, (unsigned char) character);
        if (tempBuffer == NULL) return LOAD_FAIL;
    }
    return num_chars;
}

Elegant way to determine EOF?

I am reading from a text file, iterating with a while(!feof) loop,
but whenever I use this condition the loop iterates an extra time.
I solved the problem with this 'patchy' code:
while (stop == FALSE)
{
    ...
    terminator = fgetc(input);
    if (terminator == EOF)
        stop = TRUE;
    else
        fseek(input, -1, SEEK_CUR);
}
But it looks and feels very bad.
You can take advantage of the fact that an assignment gets evaluated as the value being assigned, in this case to the character being read:
while((terminator = fgetc(input))!= EOF) {
// ...
}
Here is an idiomatic example:
FILE *fp = fopen("datafile.txt", "r"); // error check this!
int c; // int, not char, so that EOF can be told apart from every valid character
// this while-statement assigns into c, and then checks against EOF:
while((c = fgetc(fp)) != EOF) {
    /* ... */
}
fclose(fp);
Similarly you can read line-by-line:
char buf[MAXLINE];
// ...
while((fgets(buf,MAXLINE,stdin)) != NULL) {
    do_something(buf);
}
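Since fgets copies the detected newline character into the buffer, you can tell whether a complete line was read by checking the last character before the terminating '\0', i.e. buf[strlen(buf) - 1]. If it is not a newline and the buffer is full, the line was longer than the buffer. In that case you can use realloc to grow the buffer (be sure to keep a pointer to the beginning of the buffer, but pass buf+n to the next fgets, where n is the number of characters read so far). From the reference documentation for fgets: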
Reads characters from stream and stores them as a C string into str until (num-1) characters have been read or either a newline or the end-of-file is reached, whichever happens first. A newline character makes fgets stop reading, but it is considered a valid character by the function and included in the string copied to str.
Alternatively, you could read the whole file in one go using fread() (see example following the link).
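The realloc approach described above could look roughly like this. It is only a sketch: read_whole_line is a made-up name, the starting capacity of 16 is arbitrary, and it assumes lines stay far below INT_MAX bytes because fgets takes an int size.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns one whole line of any length (newline kept)
 * in a malloc'd buffer the caller must free. Returns NULL at end-of-file,
 * on a read error, or when memory runs out. */
char *read_whole_line(FILE *in)
{
    size_t cap = 16;   /* deliberately small so the buffer has to grow */
    size_t len = 0;    /* number of characters read so far */
    char *buf = malloc(cap);

    if (buf == NULL)
        return NULL;

    /* Each call appends at buf + len, right after what was already read. */
    while (fgets(buf + len, (int)(cap - len), in) != NULL) {
        len += strlen(buf + len);
        if (len > 0 && buf[len - 1] == '\n')
            return buf;            /* a complete line */

        /* No newline yet: grow the buffer and keep reading the same line. */
        char *bigger = realloc(buf, cap * 2);
        if (bigger == NULL) {
            free(buf);
            return NULL;
        }
        buf = bigger;
        cap *= 2;
    }

    /* fgets returned NULL: keep a final line that lacks a newline,
     * but discard everything after a read error. */
    if (len > 0 && !ferror(in))
        return buf;
    free(buf);
    return NULL;
}
```

A caller would loop with while ((line = read_whole_line(fp)) != NULL), process the line, and free(line) at the end of each iteration.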

Does fgets() always null-terminate the string it returns?

Is this safe to do? Does fgets terminate the buffer with null or should I be setting the 20th byte to null after the call to fgets and before I call clean?
// strip new lines
void clean(char *data)
{
    while (*data)
    {
        if (*data == '\n' || *data == '\r') *data = '\0';
        data++;
    }
}
// for this, assume that the file contains 1 line no longer than 19 bytes
// buffer is freed elsewhere
char *load_latest_info(char *file)
{
    FILE *f;
    char *buffer = (char*) malloc(20);
    if (f = fopen(file, "r"))
        if (fgets(buffer, 20, f))
        {
            clean(buffer);
            return buffer;
        }
    free(buffer);
    return NULL;
}
Yes, fgets() always null-terminates the buffer when it succeeds. From the man page:
The fgets() function reads at most one less than the number of characters specified by n from the given stream and stores them in the string s. Reading stops when
a newline character is found, at end-of-file or error. The newline, if any, is retained. If any characters are read and there is no error, a '\0' character is
appended to end the string.
If there is an error, fgets() may or may not store any zero bytes anywhere in the buffer. Code which doesn't check the return value of fgets() won't be safe unless it ensures there's a zero in the buffer somewhere; the easiest way to do that is to unconditionally store a zero to the last spot. Doing that will mean that an unnoticed error may (depending upon implementation) cause a bogus extra line of data to be read, but won't fall off into Undefined Behavior.
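As a sketch of how both answers combine in practice: check the return value of fgets, make the buffer a valid empty string when the call fails, and strip the line ending with strcspn instead of a hand-written loop. The helper name read_trimmed is invented for this example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line into buf (capacity `size`) and
 * removes the line ending. Returns buf on success. If fgets returns NULL,
 * buf is set to the empty string and NULL is returned, so buf is a valid
 * string either way. */
char *read_trimmed(char *buf, int size, FILE *in)
{
    if (size <= 0)
        return NULL;
    if (fgets(buf, size, in) == NULL) {
        buf[0] = '\0';
        return NULL;
    }
    /* strcspn returns the length of the leading part of buf that contains
     * neither '\r' nor '\n', which is where the string should end. */
    buf[strcspn(buf, "\r\n")] = '\0';
    return buf;
}
```

In load_latest_info this would replace the fgets call and the clean call with a single read_trimmed(buffer, 20, f).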
