This question already has answers here:
Can sizeof(int) ever be 1 on a hosted implementation?
(8 answers)
Closed 7 years ago.
When using fgetc to read the next character of a stream, you usually check that end-of-file has not been reached with
if ((c = fgetc (stream)) != EOF)
where c is of type int. Then either end-of-file has been reached and the condition fails, or c is an unsigned char converted to int, which is expected to differ from EOF, since EOF is guaranteed to be negative. Fine... apparently.
But there is a small problem... Usually the char type has no more than 8 bits, while int must have at least 16 bits, so every unsigned char is representable as an int. Nevertheless, if char had 16 or 32 bits (I know, this is never the case in practice...), there is no reason one could not have sizeof(int) == 1, so it would be (theoretically!) possible for fgetc (stream) to return EOF (or another negative value) even though end-of-file has not been reached...
Am I mistaken? Is there something in the C standard that prevents fgetc from returning EOF when end-of-file has not been reached? (If so, I could not find it!) Or is the if ((c = fgetc (stream)) != EOF) idiom not fully portable?...
EDIT: Indeed, this was a duplicate of Question #3860943. I did not find that question on my first search. Thanks for your help! :-)
You asked:
Is there something in the C standard that prevents fgetc from returning EOF if end-of-file has not been reached?
On the contrary, the standard explicitly allows EOF to be returned when an error occurs.
If a read error occurs, the error indicator for the stream is set and the fgetc function returns EOF.
In the footnotes, I see:
An end-of-file and a read error can be distinguished by use of the feof and ferror functions.
You also asked:
Or is the if ((c = fgetc (stream)) != EOF) syntax not fully portable?
On the theoretical platform where CHAR_BIT is more than 8 and sizeof(int) == 1, that won't be a valid way to check that end-of-file has been reached. For that, you'll have to resort to feof and ferror.
c = fgetc (stream);
if ( !feof(stream) && !ferror(stream) )
{
    // Got valid input in c.
}
I think you need to rely on the stream's indicators.
ch = fgetc(stream);
if ((ch == EOF) && feof(stream)) /* end of file */;
From the standard
If a read error occurs, the error indicator for the stream is set and the fgetc function returns EOF.
Edit for better version
ch = fgetc(stream);
if (ch == EOF) {
    if (ferror(stream)) /* error reading */;
    else if (feof(stream)) /* end of file */;
    else /* read valid character with value equal to EOF */;
}
If you are reading a stream that is standard ASCII only, there is no risk of receiving the char equivalent of EOF before the real end-of-file, because valid ASCII character codes only go up to 127. But it could happen when reading a binary file: the byte would need to be 255 (unsigned) to correspond to a signed char value of -1, and nothing prevents it from appearing in a binary file.
But about your specific question (whether there is something in the standard): not exactly... but notice that fgetc returns the character as an unsigned char converted to int, so it will never be negative in this case anyway. The only risk would be if you explicitly or implicitly converted the return value down to signed char (for instance, if your c variable were of type signed char).
NOTE: as @Ulfalizer mentioned in the comments, there is one rare case in which you may need to worry: if sizeof(int) == 1 and you are reading a file that contains non-ASCII characters, then you may get a -1 return value that is not the real EOF. Environments in which this happens are quite rare (to my knowledge, compilers for low-end 8-bit microcontrollers, like the 8051). In such a case, the safe option is to test feof() as @pmg suggested.
I agree with your reading.
C Standard says (C11, 7.21.7.1 The fgetc function p3):
If the end-of-file indicator for the stream is set, or if the stream is at end-of-file, the end-of-file indicator for the stream is set and the fgetc function returns EOF. Otherwise, the fgetc function returns the next character from the input stream pointed to by stream. If a read error occurs, the error indicator for the stream is set and the fgetc function returns EOF.
There is nothing in the Standard (assuming UCHAR_MAX > INT_MAX) that prevents fgetc in a hosted implementation from returning a value equal to EOF that indicates neither an end-of-file nor an error condition.
Certainly fgetc() returns EOF when end-of-file or an input error occurs.
Is that all and does that mean no more data is available?
FILE *inf = ...;
int ch;
while ((ch = fgetc(inf)) != EOF) {
;
}
if (feof(inf)) puts("End-of-file");
else if (ferror(inf)) puts("Error");
else puts("???");
Is testing with feof(), ferror() sufficient?
Note: EOF here is a macro that evaluates to some negative int, often -1. It is not a synonym for end-of-file.
I have found some questions and more that are close to this issue, yet none that enumerate all possibilities.
Is that all, and does that mean no more data is available?
No, there are more ways to end up with EOF.
An EOF does not necessarily mean no more data; it depends.
The C library lists three cases where fgetc() returns EOF.
If the end-of-file indicator for the stream is set, or if the stream is at end-of-file, the end-of-file indicator for the stream is set and the fgetc function returns EOF. Otherwise, the fgetc function returns the next character from the input stream pointed to by stream. If a read error occurs, the error indicator for the stream is set and the fgetc function returns EOF. C17dr § 7.21.7.1 3
Recall each stream, like stdin, has an end-of-file indicator and error indicator.
stream just encountered the end-of-file
(Most common) An attempt has been made to get more data, but there was none.
end-of-file indicator for the stream is set
The stream first examines its end-of-file indicator. If it sees that the indicator is set, it returns EOF. No attempt is made to see if more data exists. Some types of streams will report EOF, but data will have arrived after the prior EOF report. Until the end-of-file indicator is cleared as with clearerr(), the return remains EOF. Example 1. Example 2.
Input error
The stream error indicator is not examined here. The function simply failed, for some reason other than end-of-file, to read data. A common example is calling fputc() on stdin. Input errors are often persistent, but some are not, so more data may still be available. The common strategy is to end the input.
// Example where ferror() is true, yet fgetc() does not return EOF
FILE *inf = stdin;
printf("end-of-file:%d error:%d\n", feof(inf), ferror(inf));
printf("fputc():%d\n", fputc('?', inf)); // EOF reported
printf("end-of-file:%d error:%d\n", feof(inf), ferror(inf));
printf("fgetc():%d\n", fgetc(inf)); // User typed in `A`, 'A' reported
printf("end-of-file:%d error:%d\n", feof(inf), ferror(inf));
Output
end-of-file:0 error:0
fputc():-1
end-of-file:0 error:1
fgetc():65
end-of-file:0 error:1
When ferror() is true, it does not mean the error just occurred, only that an error occurred sometime in the past.
Other cases
Apparent EOF due to improperly saving as char
fgetc() returns an int with a value in the unsigned char range and EOF - a negative value.
When fgetc() reads character code 255, yet saves that as a char on a system where char is signed, that commonly results in the char having the same value as EOF, yet end-of-file did not occur.
FILE *f = fopen("t", "w");
fputc(EOF & 255, f);
fclose(f);
f = fopen("t", "r");
char ch = fgetc(f); // Should be int ch
printf ("%d %d\n", ch == EOF, ch);
printf("end-of-file:%d error:%d\n", feof(f), ferror(f));
fclose(f);
Output
1 -1 // ch == EOF !
end-of-file:0 error:0
Systems where UCHAR_MAX == UINT_MAX. Rare.
(I have only come across this in some older graphics processors, but it is still something C allows.) In that case, fgetc() may read an unsigned char outside the int range and so convert it to EOF on the function return. Thus fgetc() returns a character code that happens to equal EOF. This is mostly an oddity of C history. A way to mostly handle this is:
while ((ch = fgetc(inf)) != EOF && !feof(inf) && !ferror(inf)) {
;
}
Such pedantic code is rarely needed.
Undefined behavior
Of course when UB occurs, anything is possible.
FILE * f = fopen("Some_non_existent_file", "r");
// Should have tested f == NULL here
printf("%d\n", fgetc(f) == EOF); // Result may be 1
A robust way to handle the return from fgetc().
FILE *inf = ...;
if (inf) { // Add test
    int ch; // USE int !
    // Pedantic considerations, usually can be ignored
#if UCHAR_MAX > INT_MAX
    clearerr(inf); // Clear history of prior flags
    while ((ch = fgetc(inf)) != EOF && !feof(inf) && !ferror(inf)) {
        ;
    }
#else
    while ((ch = fgetc(inf)) != EOF) {
        ;
    }
#endif
    if (feof(inf)) puts("End-of-file");
    else puts("Error");
}
If code needs to look for data after end-of-file or error, call clearerr() and repeat the if() block.
Another case where EOF doesn't necessarily mean 'no more data' was (rather than 'is') reading magnetic tapes. You could have multiple files on a single tape, with the end of each marked with EOF. When you encountered EOF, you used clearerr(fp) to reset the EOF and error states on the file stream, and you could then continue reading the next file on the tape. However, magnetic tapes have (for the most part) gone the way of the dodo, so this barely counts any more.
Here's one obscure reason:
On Windows, reading byte 0x1A in text mode causes EOF.
By "Windows" I mean both MSVC and MinGW (so it's probably a quirk of Microsoft's CRT). This doesn't happen on Cygwin.
This question already has answers here:
fgetc, checking EOF
(2 answers)
Closed 6 years ago.
fgetc reads a character from a file at a time, and returns the character as type of int. If the file ends, then fgetc returns EOF ("probably" (int)(-1)). However, I have seen this so frequently:
char ch;
while ((ch = fgetc(fp)) != EOF) { /* Do something. */ }
|<-- PARN -->|
This is what concerns me:
fgetc reads 0xFF and returns 0x000000FF.
0xFF is assigned to ch (a conversion from int to char).
If PARN is of type char, it will be sign-extended to 0xFFFFFFFF when promoted for the comparison.
We break out of the while loop because PARN == EOF (and stop reading from the file).
How can we tell reading 0xFF and getting EOF apart?
The whole reason the return type of fgetc is int is because it can return either a value of an unsigned char, or EOF. This is clearly documented in the specification and in any good resource on learning C. If you use an object of type char to store the result, this is programmer error.
Yes, your concern is correct that the value gets typecast to char before the comparison is made.
The easiest solution is:
int chread;
while ((chread = fgetc(fp)) != EOF) {
    char ch = chread;
    /* Do something. */
}
fgetc() returns the obtained character on success or EOF on failure.
If the failure has been caused by the end-of-file condition, it additionally sets the eof indicator (see feof()) on stream. If the failure has been caused by some other error, it sets the error indicator (see ferror()) on stream.
I was given an assignment in C language about reading and writing in a file.
I have read different code samples on different websites, along with their explanations, but there is one question that remained unanswered! Following is the general code I found on different sites:
#include <stdio.h>
int main(void)
{
    FILE *fp;
    int c;

    fp = fopen("E:\\maham work\\CAA\\TENLINES.TXT", "r");
    c = getc(fp);
    while (c != EOF)
    {
        putchar(c);
        c = getc(fp);
    }
    fclose(fp);
    return 0;
}
My questions are simple and straight.
In the line c = getc(fp);, what is it that c receives? An address? A character? An ASCII code? What?
and
while (c!= EOF)
{
putchar(c);
c = getc(fp);
}
Here, how is c able to read the file character by character? There is no increment operator... does the pointer fp help in reading the file?
Lastly, why is putchar(c); used for printing? Why not use printf("%c", c);?
getc() returns the integer value of the byte at the current position in the file handle, then advances that position by one byte.
putchar() is simpler than printf.
1 minute googling got me this.
C++ reference
tutorial points
wikipedia
Quoting reference documentation (C++ here, but probably very similar in C).
int getc ( FILE * stream );
Get character from stream Returns the character currently pointed by the internal file position indicator of the specified stream. The internal file position indicator is then advanced to the next character.
If the stream is at the end-of-file when called, the function returns EOF and sets the end-of-file indicator for the stream (feof).
If a read error occurs, the function returns EOF and sets the error indicator for the stream (ferror).
getc and fgetc are equivalent, except that getc may be implemented as a macro in some libraries. See getchar for a similar function that reads directly from stdin.
Further reading gives us:
On success, the character read is returned (promoted to an int value).
The return type is int to accommodate for the special value EOF, which indicates failure: If the position indicator was at the end-of-file, the function returns EOF and sets the eof indicator (feof) of stream.
If some other reading error happens, the function also returns EOF, but sets its error indicator (ferror) instead.
Here we read
This function returns the character read as an unsigned char cast to an int or EOF on end of file or error.
And on wikipedia
Reads a byte/wchar_t from a file stream
This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
fgetc does not identify EOF
fgetc, checking EOF
I have created a file and named it "file.txt" in Unix. I tried to read the file's content from my C program, but I am not able to receive the EOF character. Does Unix not store an EOF character on file creation? If so, what is the alternative way to detect the end of a Unix-created file when reading it from C?
Here's the code sample
int main(){
    File *fp;
    int nl, c;

    nl = 0;
    fp = fopen("file.txt", "r");
    while((c = fgetc(fp)) != EOF){
        if (c=='\n')
            nl++;
    }
    return 0;
}
If I explicitly give CTRL + D the EOF is detected even when I use char c.
This can happen if the type of c is char (and char is unsigned in your compiler; you can check this by examining the value of CHAR_MIN in <limits.h>) and not int.
The value of EOF is negative according to the C standard.
So, implicitly casting EOF to unsigned char will lose the true value of EOF and the comparison will always fail.
UPDATE: There's a bigger problem that has to be addressed first. In the expression c = fgetc(fp) != EOF, fgetc(fp) != EOF is evaluated first (to 0 or 1) and then the value is assigned to c. If there's at least one character in the file, fgetc(fp) != EOF will evaluate to 0 and the body of the while loop will never execute. You need to add parentheses, like so: (c = fgetc(fp)) != EOF.
Missing parentheses. Should be:
while((c = fgetc(fp)) != EOF)
Remember: fgetc() returns an int, not a char. It has to return an int because its set of return values includes all possible valid characters plus a separate (negative) EOF indicator.
There are two possible traps if you use type char for c instead of int:
If the type char is signed with your compiler, you will detect a valid character as EOF. Often, the character ÿ (y-umlaut, officially known in Unicode as LATIN SMALL LETTER Y WITH DIAERESIS, U+00FF, hex code 0xFF in the ISO 8859-1 aka Latin 1 code set) will be detected as equivalent to EOF, even though it is a valid character.
If the type char is unsigned, then the comparison will never be true.
Both problems are serious, and both are avoided by using the correct type:
FILE *fp = fopen("file.txt", "r");
if (fp != 0)
{
    int c;
    int nl = 0;
    while ((c = fgetc(fp)) != EOF)
        if (c == '\n')
            nl++;
    printf("Number of lines: %d\n", nl);
}
Note that the type is FILE and not File. Note that you should check that the file was opened before trying to read via fp.
If I explicitly give CTRL + D, the EOF is detected even when I use char c.
This means that your compiler provides you with char as a signed type. It also means you will not be able to count lines accurately in files which contain ÿ.
Unlike CP/M and DOS, Unix does not use any character to indicate EOF; you reach EOF when there are no more characters to read. What confuses many people is that if you type a certain key combination at the terminal, programs detect EOF. What actually happens is that the terminal driver recognizes the character and sends any unread characters to the program. If there are no unread characters, the program gets 0 bytes returned, which is the same result you get when you've reached the end of file. So, the character combination (often, but not always, Ctrl-D) appears to 'send EOF' to the program. However, the character is not stored in a file if you are using cat >file; further, if you read a file which contains a control-D, that is a perfectly fine character with byte value 0x04. If a program generates a control-D and sends that to a program, that does not indicate EOF to the program. It is strictly a property of Unix terminals (tty and pty — teletype and pseudo-teletype — devices).
You do not show how you declare the variable c; it should be of type int, not char.
Is EOF always negative?
I'm thinking of writing a function that reads the next word in the input and returns the line number the word was found in or EOF if the end of the input has been reached. If EOF is not necessarily negative, the function would be incorrect.
EOF is always == EOF. Don't assume anything else.
On a second reading of the standard (and as per some other comments here) it seems EOF is always negative - and for the use specified in this question (line number or EOF) it would work. What I meant to warn against (and still do) is assuming characters are positive and EOF is negative.
Remember that it's possible for a standard conforming C implementation to have negative character values - this is even mentioned in 'The C programming language' (K&R). Printing characters are always positive, but on some architectures (probably all ancient), control characters are negative. The C standard does not specify whether the char type is signed or unsigned, and the only character constant guaranteed to have the same value across platforms, is '\0'.
Yes, EOF is always negative.
The Standard says:
7.19 Input/output
7.19.1 Introduction
3 The macros are [...] EOF which expands to an integer constant expression, with type int and a negative value, that is returned by several functions to indicate end-of-file, that is, no more input from a stream;
Note that there's no problem with "plain" char being signed. The <stdio.h> functions which deal with chars, specifically cast the characters to unsigned char and then to int, so that all valid characters have a positive value. For example:
int fgetc(FILE *stream)
7.19.7.1
... the fgetc function obtains that character as an unsigned char converted to an int ...
Have that function return
the line number the word was found in
or -1 in case the end of the input has been reached
Problem solved, without a need for relying on any EOF values. The caller can easily test for greater-or-equal-to-zero for a successful call, and assume EOF/IO-error otherwise.
From the online draft n1256, 7.19.1p3:
EOF
which expands to an integer constant expression, with type int and a negative value, that is returned by several functions to indicate end-of-file, that is, no more input from a stream;
EOF is always negative, though it may not always be -1.
For issues like this, I prefer separating error conditions from data by returning an error code (SUCCESS, END_OF_FILE, READ_ERROR, etc.) as the function's return value, and then writing the data of interest to separate parameters, such as
int getNextWord (FILE *stream, char *buffer, size_t bufferSize, int *lineNumber)
{
    if (!fgets(buffer, bufferSize, stream))
    {
        if (feof(stream))
            return END_OF_FILE;
        else
            return READ_ERROR;
    }
    else
    {
        // figure out the line number
        *lineNumber = ...;
    }
    return SUCCESS;
}
EOF is a condition, rather than a value. The exact value of this sentinel is implementation defined. In a lot of cases, it is a negative number.
From wikipedia :
The actual value of EOF is a system-dependent negative number, commonly -1, which is guaranteed to be unequal to any valid character code.
But no references ...
From Secure Coding : Detect and handle input and output errors
EOF is always negative; what holds only when sizeof(int) > 1 is the guarantee that EOF is distinct from every unsigned char value fgetc() can return for an actual character.