I have just recently started working with I/O in C. Here is my question: I have a file from which I read my input, and I use fgets() to read strings into a buffer that I then process. What happens if the input is shorter than the buffer, i.e. if the first read by fgets() reaches EOF? Should fgets() return NULL (as I have read in the fgets() documentation)? It seems that it doesn't, and I get my input properly. Besides, even feof(input) does not report that we have reached EOF. Here is my code snippet.
char buf[BUFSIZ];
FILE *input,
*output;
input = fopen(argv[--argc], "r");
output = fopen(argv[--argc], "w");
/**
* If either of the input or output were unable to be opened
* we exit
*/
if (input == NULL) {
fprintf(stdout, "Failed to open file - %s.\n", argv[argc + 1]);
exit(EXIT_FAILURE);
}
if (output == NULL) {
fprintf(stdout, "Failed to open file - %s.\n", argv[argc + 0]);
exit(EXIT_FAILURE);
}
if (fgets(buf, sizeof(buf), input) != NULL) {
....
}
/**
* After the fgets() condition exits it is because, either -
* 1) The EOF was reached.
* 2) There is a read error.
*/
if (feof(input)) {
fprintf(stdout, "Reached EOF.\n");
}
else if (ferror(input)) {
fprintf(stdout, "Error while reading the file.\n");
}
The documentation for fgets() does not say what you think it does:
From my manpage
fgets() reads in at most one less than size characters from stream and stores them into the buffer pointed to by s. Reading
stops after an EOF or a newline. If a newline is read, it is stored into the buffer. A terminating null byte ('\0') is stored
after the last character in the buffer.
And later
gets() and fgets() return s on success, and NULL on error or when end of file occurs while no characters have been read.
I don't read that as saying an EOF will be treated as an error condition and return NULL. Indeed it says a NULL would only occur where EOF occurs when no characters have been read.
The POSIX standard (which defers to the less accessible C standard) is here: http://pubs.opengroup.org/onlinepubs/009695399/functions/fgets.html and states:
Upon successful completion, fgets() shall return s. If the stream is at end-of-file, the end-of-file indicator for the stream shall be set and fgets() shall return a null pointer. If a read error occurs, the error indicator for the stream shall be set, fgets() shall return a null pointer, and shall set errno to indicate the error.
This clearly indicates that fgets() returns NULL only if the stream is already at EOF when it is called (or if a read error occurs); if any bytes are read before EOF is hit, it does not return NULL.
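The behaviour described in the quoted documentation can be checked with a small sketch. Everything here is illustrative: probe_fgets and struct fgets_probe are hypothetical names, and the temporary file stands in for the asker's input file.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record of what two consecutive fgets() calls did. */
struct fgets_probe {
    char first[64];       /* text stored by the first call, if any */
    int first_is_null;    /* 1 if the first fgets() returned NULL */
    int eof_after_first;  /* 1 if feof() was set after the first call */
    int second_is_null;   /* 1 if the second fgets() returned NULL */
};

/* Writes `text` (shorter than 64 bytes) to a temporary file,
   then reads it back with two fgets() calls. */
struct fgets_probe probe_fgets(const char *text)
{
    struct fgets_probe r;
    char scratch[64];
    FILE *fp = tmpfile();

    assert(fp != NULL && text != NULL);
    memset(&r, 0, sizeof r);
    fputs(text, fp);
    rewind(fp);           /* also clears the EOF and error indicators */

    r.first_is_null = (fgets(r.first, sizeof r.first, fp) == NULL);
    r.eof_after_first = (feof(fp) != 0);
    r.second_is_null = (fgets(scratch, sizeof scratch, fp) == NULL);

    fclose(fp);
    return r;
}
```

With the input "abc" (no trailing newline) the first call succeeds and sets the end-of-file indicator, and only the second call returns NULL. With "abc\n" the first call stops at the newline, so feof() is still false afterwards, which matches what the question observed. With an empty input the very first call returns NULL.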
I have a question about fgets():
char *fgets(char *str, int strsize, FILE *stream);
The fgets() documentation says:
On success, the function returns the same str parameter. If the End-of-File is encountered and no characters have been read, the contents of str remain unchanged and a null pointer is returned.
If an error occurs, a null pointer is returned.
How do you differentiate between above two situations - fgets reaching EOF(END OF FILE) & error whilst reading file?
Also, when an error occurs whilst fgets reads file, does fgets keep track of whatever has been read up to that point in the str?
How do you check if fgets immediately reaches EOF?
Regarding your first question:
How do you differentiate between above two situations - fgets reaching EOF(END OF FILE) & error whilst reading file?
If fgets() returned NULL, call ferror() on the same file pointer that was just passed to fgets(). If ferror() returns a nonzero value, fgets() failed; otherwise it reached the end of the file.
Example:
#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
#define LINE_LEN_MAX (42)
int main(int argc, char ** argv)
{
if (1 >= argc)
{
errno = EINVAL;
perror("main() failed");
exit(EXIT_FAILURE);
}
{
FILE * fp = fopen(argv[1], "r");
if (NULL == fp)
{
perror("fopen() failed");
exit(EXIT_FAILURE);
}
for (char line[LINE_LEN_MAX];
NULL != fgets(line, LINE_LEN_MAX, fp);)
{
printf("%s", line);
}
if (0 != ferror(fp))
{
perror("fgets() failed");
exit(EXIT_FAILURE);
}
fclose(fp);
}
return EXIT_SUCCESS;
}
The other two questions can be answered straight from the docs:
Question 3:
How do you check if fgets immediately reaches EOF?
Answer:
If end-of-file is encountered and no characters have been read into the array, the contents of the array remain unchanged and a null pointer is returned.
Question 2:
when an error occurs whilst fgets reads file, does fgets keep track of whatever has been read up to that point in the str?
Answer:
If a read error occurs during the operation, the array contents are indeterminate and a null pointer is returned.
What is the correct way to read a text file until EOF using fgets in C? Now I have this (simplified):
char line[100 + 1];
while (fgets(line, sizeof(line), tsin) != NULL) { // tsin is FILE* input
... //doing stuff with line
}
Specifically I'm wondering if there should be something else as the while-condition? Does the parsing from the text-file to "line" have to be carried out in the while-condition?
According to the reference
On success, the function returns str.
If the end-of-file is encountered while attempting to read a character, the eof indicator is
set (feof). If this happens before any characters could be read, the
pointer returned is a null pointer (and the contents of str remain
unchanged). If a read error occurs, the error indicator (ferror) is
set and a null pointer is also returned (but the contents pointed by
str may have changed).
So checking whether the returned value is NULL is enough. The parsing itself goes into the while body.
What you have done is 100% OK, but you can also simply rely on the return of fgets as the test itself, e.g.
char line[100 + 1] = ""; /* initialize all to 0 ('\0') */
while (fgets(line, sizeof(line), tsin)) { /* tsin is FILE* input */
/* ... doing stuff with line */
}
Why? fgets will return a pointer to line on success, or NULL on failure (for whatever reason). A valid pointer will test true and, of course, NULL will test false.
(note: you must ensure that line is a character array declared in scope to use sizeof line as the length. If line is simply a pointer to an array, then sizeof line is sizeof (char *), and fgets will read at most sizeof (char *) - 1 characters)
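The sizeof pitfall from the note can be sketched as follows. The function names are hypothetical, and read_line is just one common way around the problem (passing the capacity explicitly), not something the answer itself proposes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* sizeof applied to a real array that is declared in scope. */
size_t size_via_array(void)
{
    char line[100 + 1];
    return sizeof line;            /* 101: the whole array */
}

/* sizeof applied after the array has decayed to a pointer parameter. */
size_t size_via_pointer(char *line)
{
    return sizeof line;            /* sizeof (char *), typically 4 or 8 */
}

/* Safer pattern: the caller passes the capacity along with the pointer. */
char *read_line(char *line, size_t cap, FILE *in)
{
    assert(line != NULL && cap > 0 && in != NULL);
    return fgets(line, (int)cap, in);
}
```

A caller that owns the array would write read_line(line, sizeof line, tsin), so sizeof is evaluated where the array type is still visible.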
I had the same problem and solved it this way:
while (fgets(line, sizeof(line), tsin) != 0) { // 0 is a null pointer constant here, so this is the same test as != NULL
... //doing stuff with line
}
The program I tried to write should read a string no longer than 8 characters and check whether that string is present in the file. I decided to use the read system call for it, but I've come across strange behavior in this function. According to the manual, it must return 0 when the end of file is reached, but in my case, when there were no more characters to read, it still read a '\n' and returned 1 (the number of bytes read). (I've checked the ASCII code of the character read, and it is indeed 10, which is '\n'.) Taking this into account, I changed my code and it worked, but I still can't understand why it behaves this way. Here is the code of my function:
int is_present(int fd, char *string)
{
int i;
char ch, buf[9];
if (!read(fd, &ch, 1)) //file is empty
return 0;
while (1) {
i = 0;
while (ch != '\n') {
buf[i++] = ch;
read(fd, &ch, 1);
}
buf[i] = '\0';
if (!strncmp(string, buf, strlen(buf))) {
close(fd);
return 1;
}
if(!read(fd, &ch, 1)) //EOF reached
break;
}
close(fd);
return 0;
}
I think that your problem is in the inner read() call. There you are not checking the return of the function.
while (ch != '\n') {
buf[i++] = ch;
read(fd, &ch, 1);
}
If the file reaches EOF while ch is not '\n' (for example, when the last line has no trailing newline), this becomes an infinite loop, because read() returns 0 and does not modify ch. BTW, you are also not checking the bounds of buf.
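A minimal sketch of a loop that addresses both points, assuming a POSIX system. read_line_fd is a hypothetical helper, not part of the original function; it checks every read() result and never writes past the buffer. It does not retry on EINTR, which real code may want to do.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Reads one line (without its '\n') from fd into buf, which holds cap bytes.
   Overlong lines are truncated to cap - 1 characters.
   Returns 1 if a line was read, 0 at EOF with nothing read, -1 on error. */
int read_line_fd(int fd, char *buf, size_t cap)
{
    size_t i = 0;
    char ch;
    ssize_t n;

    assert(buf != NULL && cap > 0);
    while ((n = read(fd, &ch, 1)) == 1) {
        if (ch == '\n') {
            buf[i] = '\0';
            return 1;
        }
        if (i + 1 < cap)           /* keep room for the terminator */
            buf[i++] = ch;
    }
    buf[i] = '\0';
    if (n < 0)
        return -1;                 /* read() failed; errno says why */
    return i > 0;                  /* last line without trailing '\n' */
}
```

With a 9-byte buffer, as in the question, a caller can loop with while (read_line_fd(fd, buf, sizeof buf) == 1) and compare buf against the search string inside the loop body.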
I'm assuming the question is 'why does read() work this way' and not 'what is wrong with my program?'.
This is not an error. From the manual page:
On success, the number of bytes read is returned (zero indicates end of file), and the file position is advanced by this number. It is not an error if this number is smaller than the number of bytes requested; this may happen for example because fewer bytes are actually available right now (maybe because we were close to end-of-file, or because we are reading from a pipe, or from a terminal), or because read() was interrupted by a signal. On error, -1 is returned, and errno is set appropriately. In this case it is left unspecified whether the file position (if any) changes.
If you think about it read must work this way. If it returned 0 to indicate an end of file was reached when some data had been read, you would have no idea how much data had been read. Therefore read returns 0 only when no data is read because of an end-of-file condition.
Therefore in this case, where there is only a \n available, read() will succeed and return 1. The next read will return a zero to indicate end of file.
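The sequence of return values can be observed with a small sketch, assuming a POSIX system. trace_reads is a hypothetical helper that uses a pipe in place of the file from the question.

```c
#include <assert.h>
#include <unistd.h>

/* Writes len bytes of data into a pipe, closes the write end, then calls
   read() one byte at a time and stores each return value in results.
   Returns the number of read() calls made, or -1 if the setup failed. */
int trace_reads(const char *data, size_t len, ssize_t *results, int max_calls)
{
    int p[2];
    char ch;
    int calls = 0;

    assert(results != NULL && max_calls > 0);
    if (pipe(p) != 0)
        return -1;
    if (len > 0 && write(p[1], data, len) != (ssize_t)len) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    close(p[1]);                   /* reader sees EOF once the data is consumed */

    while (calls < max_calls) {
        ssize_t n = read(p[0], &ch, 1);
        results[calls++] = n;
        if (n <= 0)                /* 0 is end of file, -1 is an error */
            break;
    }
    close(p[0]);
    return calls;
}
```

For an input consisting of a single "\n", the recorded values are 1 followed by 0: the newline is ordinary data, and end of file is reported only by the next call.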
Unless it encounters EOF, read() keeps reading bytes and places them in the buffer. In this case \n is just another character, so read() reads it too. Your loop would have ended after it read the \n, since nothing else was left before EOF. Only EOF ends the input for read(); every other character is treated as ordinary data. Cheers!
Test
In order to find the behaviour of getline() when confronted with EOF, I wrote the following test:
int main (int argc, char *argv[]) {
size_t max = 100;
char *buf = malloc(sizeof(char) * 100);
ssize_t len = getline(&buf, &max, stdin); /* getline() returns ssize_t, -1 on failure */
printf("length %zd: %s", len, buf);
}
And input1 is:
abcCtrl-DEnter
Result:
length 4: abc //notice that '\n' is also taken into consideration and printed
Input2:
abcEnter
Exactly same output:
length 4: abc
It seems that the EOF is left out by getline()
Source code
So I found the source code of getline(), and the following is the relevant snippet (I left out some comments and irrelevant code for conciseness):
while ((c = getc (stream)) != EOF)
{
/* Push the result in the line. */
(*lineptr)[indx++] = c;
/* Bail out. */
if (c == delim) //delim here is '\n'
break;
}
/* Make room for the null character. */
if (indx >= *n)
{
*lineptr = realloc (*lineptr, *n + line_size);
if (*lineptr == NULL)
return -1;
*n += line_size;
}
/* Null terminate the buffer. */
(*lineptr)[indx++] = 0;
return (c == EOF && (indx - 1) == 0) ? -1 : indx - 1;
Question
So my question is:
Why is the length here 4? (As far as I can see it should be 5.) (As the wiki says, it won't be an EOF if it is not at the beginning of a line.)
A similar question: EOF behavior when accompanied by other values. But notice that getline() in that question is different from GNU-getline.
I use GCC: (Ubuntu 4.8.2-19ubuntu1) 4.8.2
Ctrl-D causes your terminal to flush the input buffer if it isn’t already flushed. Otherwise, the end-of-file indicator for the input stream is set. A newline also flushes the buffer.
So you didn't close the stream, but only flushed the input buffer, which is why getline doesn't see an end-of-file indicator.
In neither of these cases does getline receive a literal EOT character (ASCII 0x04, ^D); to send one, you can type Ctrl-VCtrl-D.
Type
abcCtrl-DCtrl-D
or
abcEnterCtrl-D
to actually set the end-of-file indicator.
From POSIX:
Special characters
EOF
Special character on input, which is recognized if the ICANON flag is set. When received, all the bytes waiting to be read are immediately passed to the process without waiting for a <newline>, and the EOF is discarded. Thus, if there are no bytes waiting (that is, the EOF occurred at the beginning of a line), a byte count of zero shall be returned from the read(), representing an end-of-file indication. If ICANON is set, the EOF character shall be discarded when processed.
FYI, the ICANON flag is specified here.
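Independently of the terminal, the lengths getline() reports can be checked with a sketch like the one below, assuming a POSIX.1-2008 system. nth_getline_result is a hypothetical helper, and a temporary file replaces the interactive input.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Stores text in a temporary file and returns what the call_no-th
   getline() call (counting from 1) on that file returns. */
ssize_t nth_getline_result(const char *text, int call_no)
{
    char *buf = NULL;              /* getline() allocates and grows this */
    size_t cap = 0;
    ssize_t len = -1;
    FILE *fp = tmpfile();

    assert(fp != NULL && text != NULL && call_no > 0);
    fputs(text, fp);
    rewind(fp);

    for (int i = 0; i < call_no; i++)
        len = getline(&buf, &cap, fp);

    free(buf);
    fclose(fp);
    return len;
}
```

For "abc\n" the first call returns 4 (the three letters plus the newline) and the second returns -1. For "abc" without a newline the first call returns 3. End of file is a condition, not a stored character, so it never adds to the count.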
I have a program that reads a file into a buffer structure. The problem I'm having is that when I look at the output of the file, there's an extra EOF character at the end. I'll post the related functions. (NOTE: I removed parameter checks and only posted the code in each function related to the issue.)
b_load
int b_load(FILE * const fi, Buffer * const pBD){
unsigned char character; /*Variable to hold read character from file*/
Buffer * tempBuffer; /*Temporary Buffer * to prevent destruction of main Buffer*/
short num_chars = 0; /*Counter of the amount of characters read into the buffer*/
/*Assigns main Buffer to tempBuffer*/
tempBuffer = pBD;
/*Infinite loop that breaks after EOF is read*/
while(1){
/*calls fgetc() and returns the char into the character variable*/
character = (unsigned char)fgetc(fi);
if(!feof(fi)){
tempBuffer = b_addc(pBD,character);
if(tempBuffer == NULL)
return LOAD_FAIL;
++num_chars;
}else{
break;
}
}
return num_chars;
}
b_print
int b_print(Buffer * const pBD){
int num_chars = 0;
if(pBD->addc_offset == 0)
printf("The buffer is empty\n");
/*Sets getc_offset to 0*/
b_set_getc_offset(pBD, 0);
pBD->eob=0;
/*b_eob returns the structures eob field*/
while (!b_eob(pBD)){
printf("%c",b_getc(pBD));
++num_chars;
}
printf("\n");
return num_chars;
}
b_getc
char b_getc(Buffer * const pBD){
if(pBD->getc_offset == pBD->addc_offset){
pBD->eob = 1;
return R_FAIL_1;
}
pBD->eob = 0;
return pBD->ca_head[(pBD->getc_offset)++];
}
at the end I end up with:
"a catÿ"
(the ÿ is the EOF character)
An EOF character is printed even though it is never added to the buffer. When the driver code adds an EOF character to the end of the buffer, two appear. Any idea what is causing this? I might be using feof() wrong, so that may be it, but it is required in the code.
There is no "EOF character". EOF is a value returned by getchar() and related functions to indicate that they have no more input to read. It's a macro that expands to a negative integer constant expression, typically (-1).
(For Windows text files, an end-of-file condition may be triggered by a Control-Z character in a file. If you read such a file in text mode, you won't see that character; it will just act like it reached the end of the file at that point.)
Don't use the feof() function to detect that there's no more input to read. Instead, look at the value returned by whatever input function you're using. Different input functions use different ways to indicate that they weren't able to read anything; read the documentation for whichever one you're using. For example, fgets() returns a null pointer, getchar() returns EOF, and scanf() returns the number of items it was able to read.
getchar(), for example, returns either the character it just read (treated as an unsigned char and converted to int) or the value EOF to indicate that it wasn't able to read anything. The negative value of EOF is chosen specifically to avoid colliding with any valid value of type unsigned char. Which means you need to store the value returned by getchar() in an int object; if you store it in a char or unsigned char instead, you can lose information, and an actual character with the value 0xff can be mistaken for EOF.
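A short sketch of why the result must be kept in an int. Both function names are hypothetical; the second one is deliberately wrong and narrows the result to unsigned char before testing it, so a real 0xFF byte and EOF become the same value.

```c
#include <assert.h>
#include <stdio.h>

/* Counts the bytes left in fp, holding each fgetc() result in an int. */
long count_bytes_int(FILE *fp)
{
    int c;                         /* int, so EOF stays distinct from 0xFF */
    long n = 0;

    assert(fp != NULL);
    while ((c = fgetc(fp)) != EOF)
        n++;
    return n;
}

/* Deliberately broken variant: narrows the result before testing it. */
long count_bytes_narrowed(FILE *fp)
{
    long n = 0;

    assert(fp != NULL);
    for (;;) {
        unsigned char c = (unsigned char)fgetc(fp);
        if (c == (unsigned char)EOF)   /* also true for a real 0xFF byte */
            break;
        n++;
    }
    return n;
}
```

On a stream holding the three bytes 'a', 0xFF, 'b', the first function counts 3 and the second stops after 1. On systems where 0xFF is displayed as ÿ (Latin-1), a narrowed EOF that gets printed shows up as exactly that character.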
The feof() function returns the value of the end-of-file indicator for the file you're reading from. That indicator becomes true after you've tried and failed to read from the file. And if you ran out of input because of an error, rather than because of an end-of-file condition, feof() will never become true.
You can use feof() and/or ferror() to determine why there was no more input to be read, but only after you've detected it by other means.
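The timing of the end-of-file indicator can be illustrated with a sketch like this one. trace_feof and struct feof_trace are hypothetical names, and a one-byte temporary file serves as the input.

```c
#include <assert.h>
#include <stdio.h>

/* feof() state sampled at three points while reading a one-byte file. */
struct feof_trace {
    int before_read;        /* before any read attempt */
    int after_last_byte;    /* after the only byte was read successfully */
    int after_failed_read;  /* after a read attempt that found no data */
};

struct feof_trace trace_feof(void)
{
    struct feof_trace t;
    FILE *fp = tmpfile();

    assert(fp != NULL);
    fputc('x', fp);
    rewind(fp);

    t.before_read = (feof(fp) != 0);
    (void)fgetc(fp);                       /* reads 'x' */
    t.after_last_byte = (feof(fp) != 0);
    (void)fgetc(fp);                       /* returns EOF */
    t.after_failed_read = (feof(fp) != 0);

    fclose(fp);
    return t;
}
```

The indicator is still clear after the last byte has been read and becomes set only after the following read attempt fails, which is why a loop that tests feof() before checking the read result processes one bogus value.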
Recommended reading: Section 12 of the comp.lang.c FAQ, which covers stdio. (And the rest of it.)
UPDATE :
I haven't seen enough of your code to understand what you're doing with the Buffer objects. Your input loop actually looks (almost) correct, though it's written in a clumsy way.
The usual idiom for reading characters from a file is:
int c; /* `int`, NOT `char` or `unsigned char` */
while ((c = fgetc(fi)) != EOF) {
/* process character in `c` */
}
But your approach, which I might rearrange like this:
while (1) {
c = fgetc(fi);
if (feof(fi) || ferror(fi)) {
/* no more input */
break;
}
/* process character in c */
}
should actually work. Note that I've added a check for ferror(fi). Could it be that you have an error on input (which you're not detecting)? That would cause c to contain EOF, or the value of EOF converted to the type of c. That's doubtful, though, since it would probably give you an infinite loop.
Suggested approach: Using either an interactive debugger or added printf calls, show the value of character every time through the loop. If your input loop is working correctly, then build a stripped-down version of your program with a hard-wired sequence of calls to b_addc(), and see if you can reproduce the problem that way.
There you go ...
int b_load(FILE * const fi, Buffer * const pBD){
int character; /*Variable to hold read character from file; int so that EOF is representable*/
Buffer * tempBuffer; /*Temporary Buffer * to prevent destruction of main Buffer*/
short num_chars; /*Counter of the amount of characters read into the buffer*/
/*Infinite loop that breaks WHEN EOF is read*/
for(num_chars = 0; 1; num_chars++ ) {
character = fgetc(fi);
if (character == EOF || feof(fi)) break; // since you insist on the silly feof() ...
tempBuffer = b_addc(pBD, (unsigned char) character);
if(tempBuffer == NULL) return LOAD_FAIL;
}
return num_chars;
}