What will happen if fgetc reads 0xFF? [duplicate] - c

fgetc reads one character at a time from a file and returns it as an int. At end of file, fgetc returns EOF (typically (int)(-1)). However, I frequently see code like this:
char ch;
while ((ch = fgetc(fp)) != EOF) { /* Do something. */ }
|<-- PARN -->|
This is my concern:
fgetc reads 0xFF and returns 0x000000FF.
0xFF is assigned to ch (the int is converted to char).
If PARN has type char (and char is signed), it is promoted to 0xFFFFFFFF for the comparison.
The while loop exits because PARN == EOF, so reading from the file stops.
How can we tell reading 0xFF apart from a returned EOF?

The whole reason the return type of fgetc is int is because it can return either a value of an unsigned char, or EOF. This is clearly documented in the specification and in any good resource on learning C. If you use an object of type char to store the result, this is programmer error.

Yes, your concern is correct: the value is converted to char before the comparison is made.
The easiest solution is:
int chread;
while ((chread = fgetc(fp)) != EOF) { char ch = chread; /* Do something. */ }

fgetc() returns the character read on success, or EOF on failure.
If the failure was caused by an end-of-file condition, it also sets the end-of-file indicator (see feof()) on the stream. If the failure was caused by some other error, it sets the error indicator (see ferror()) on the stream.
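As a rough illustration of the difference, the sketch below counts bytes twice: once keeping the result in an int, and once narrowing it on purpose. The function names are made up for this example, and the narrow version uses signed char explicitly so the behavior does not depend on whether plain char is signed. It assumes the common case where converting 255 to signed char yields -1 and EOF is -1.

```c
#include <stdio.h>

/* Correct: keep fgetc()'s result in an int so EOF stays distinct from 0xFF. */
long count_bytes_int(FILE *fp)
{
    long n = 0;
    int c;
    while ((c = fgetc(fp)) != EOF)
        n++;
    return n;
}

/* Buggy on purpose: after narrowing to signed char, byte 0xFF typically
   becomes -1 and compares equal to EOF, so the loop stops early. */
long count_bytes_narrow(FILE *fp)
{
    long n = 0;
    signed char c;
    while ((c = (signed char)fgetc(fp)) != EOF)
        n++;
    return n;
}
```

For a stream holding the three bytes 'A', 0xFF, 'B', count_bytes_int reports 3, while count_bytes_narrow typically reports 1.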

Related

What are all the reasons `fgetc()` might return `EOF`?

Certainly fgetc() returns EOF when end-of-file or an input error occurs.
Is that all and does that mean no more data is available?
FILE *inf = ...;
int ch;
while ((ch = fgetc(inf)) != EOF) {
;
}
if (feof(inf)) puts("End-of-file");
else if (ferror(inf)) puts("Error");
else puts("???");
Is testing with feof(), ferror() sufficient?
Note: EOF here is a macro that evaluates to some negative int, often -1. It is not a synonym for end-of-file.
I have found some questions and more that are close to this issue, yet none that enumerate all possibilities.
Is that all and does that mean no more data available?
No, there are more ways to get EOF.
An EOF does not necessarily mean there is no more data; it depends.
The C standard lists three cases where fgetc() returns EOF.
If the end-of-file indicator for the stream is set, or if the stream is at end-of-file, the end-of-file indicator for the stream is set and the fgetc function returns EOF. Otherwise, the fgetc function returns the next character from the input stream pointed to by stream. If a read error occurs, the error indicator for the stream is set and the fgetc function returns EOF. C17dr § 7.21.7.1 3
Recall each stream, like stdin, has an end-of-file indicator and error indicator.
stream just encountered the end-of-file
(Most common) An attempt has been made to get more data, but there was none.
end-of-file indicator for the stream is set
The stream first examines its end-of-file indicator. If the indicator is set, it returns EOF without attempting to see whether more data exists. On some types of streams, data may have arrived after the prior EOF report, yet until the end-of-file indicator is cleared, as with clearerr(), the return remains EOF.
Input error
The stream's error indicator is not examined beforehand; EOF is returned because the read itself failed for some reason other than end-of-file. The indicator can also be set by an unrelated failure, such as calling fputc() on stdin. Input errors are often persistent, but some are not, and more data may be available. The common strategy is to end the input.
// Example where ferror() is true, yet fgetc() does not return EOF
FILE *inf = stdin;
printf("end-of-file:%d error:%d\n", feof(inf), ferror(inf));
printf("fputc():%d\n", fputc('?', inf)); // EOF reported
printf("end-of-file:%d error:%d\n", feof(inf), ferror(inf));
printf("fgetc():%d\n", fgetc(inf)); // User typed in `A`, 'A' reported
printf("end-of-file:%d error:%d\n", feof(inf), ferror(inf));
Output
end-of-file:0 error:0
fputc():-1
end-of-file:0 error:1
fgetc():65
end-of-file:0 error:1
When ferror() is true, it does not mean that the error just occurred, only that one occurred at some point in the past.
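Because both indicators are sticky, one way to make feof() and ferror() describe only the most recent read is to clear them first. The helper below is a hypothetical sketch of that idea, not part of the standard library; only clearerr() and fgetc() are standard calls.

```c
#include <stdio.h>

/* Hypothetical helper: forget earlier end-of-file/error history, then try
   one more read. Returns the byte read (as an unsigned char value) or EOF. */
int fgetc_fresh(FILE *in)
{
    clearerr(in);      /* resets both the end-of-file and error indicators */
    return fgetc(in);  /* the indicators now describe only this attempt */
}
```

If fgetc_fresh() returns EOF, then feof() and ferror() report why this particular attempt failed rather than something left over from earlier.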
Other cases
Apparent EOF due to improperly saving as char
fgetc() returns an int that is either a value in the unsigned char range or EOF, a negative value.
When fgetc() reads character code 255, yet saves that as a char on a system where char is signed, that commonly results in the char having the same value as EOF, yet end-of-file did not occur.
FILE *f = fopen("t", "w");
fputc(EOF & 255, f);
fclose(f);
f = fopen("t", "r");
char ch = fgetc(f); // Should be int ch
printf ("%d %d\n", ch == EOF, ch);
printf("end-of-file:%d error:%d\n", feof(f), ferror(f));
fclose(f);
Output
1 -1 // ch == EOF !
end-of-file:0 error:0
Systems where UCHAR_MAX == UINT_MAX. Rare.
(I have only come across this in some older graphics processors, but it is still something C allows.) In that case, fgetc() may read an unsigned char outside the int range and so convert it to EOF on the function return. Thus fgetc() returns a character code that happens to equal EOF. This is mostly an oddity of C history. One way to mostly handle it is:
while ((ch = fgetc(inf)) != EOF && !feof(inf) && !ferror(inf)) {
;
}
Such pedantic code is rarely needed.
Undefined behavior
Of course when UB occurs, anything is possible.
FILE * f = fopen("Some_non_existent_file", "r");
// Should have tested f == NULL here
printf("%d\n", fgetc(f) == EOF); // Result may be 1
A robust way to handle the return from fgetc().
FILE *inf = ...;
if (inf) { // Add test
int ch; // USE int !
// Pedantic considerations, usually can be ignored
#if UCHAR_MAX > INT_MAX
clearerr(inf); // Clear history of prior flags
while ((ch = fgetc(inf)) != EOF && !feof(inf) && !ferror(inf)) {
;
}
#else
while ((ch = fgetc(inf)) != EOF) {
;
}
#endif
if (feof(inf)) puts("End-of-file");
else puts("Error");
}
If code needs to look for data after end-of-file or error, call clearerr() and repeat the if() block.
Another case where EOF doesn't necessarily mean 'no more data' was (rather than 'is') reading magnetic tapes. You could have multiple files on a single tape, with the end of each marked with EOF. When you encountered EOF, you used clearerr(fp) to reset the EOF and error states on the file stream, and you could then continue reading the next file on the tape. However, magnetic tapes have (for the most part) gone the way of the dodo, so this barely counts any more.
Here's one obscure reason:
On Windows, reading byte 0x1A in text mode causes EOF.
By "Windows" I mean both MSVC and MinGW (so it's probably a quirk of Microsoft's CRT). This doesn't happen on Cygwin.
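The usual way to sidestep this quirk is to open the file in binary mode. The sketch below, with a made-up helper name, counts the bytes of a file under whatever fopen mode the caller passes; on POSIX systems "r" and "rb" behave the same, so the difference should only show up with Microsoft's CRT.

```c
#include <stdio.h>

/* Counts the bytes in the file at `path`, opened with the given fopen mode.
   Returns -1 if the file cannot be opened. */
long count_file_bytes(const char *path, const char *mode)
{
    FILE *fp = fopen(path, mode);
    if (fp == NULL)
        return -1;
    long n = 0;
    int c;
    while ((c = fgetc(fp)) != EOF)
        n++;
    fclose(fp);
    return n;
}
```

For a file containing 'A', 0x1A, 'B', calling count_file_bytes(path, "rb") counts all three bytes, whereas "r" would be expected to stop before the 0x1A on the Windows runtimes mentioned above.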

*Might* an unsigned char be equal to EOF? [duplicate]

When using fgetc to read the next character of a stream, you usually check that the end-of-file was not attained by
if ((c = fgetc (stream)) != EOF)
where c is of int type. Then, either the end-of-file has been attained and the condition will fail, or c shall be an unsigned char converted to int, which is expected to be different from EOF —for EOF is ensured to be negative. Fine... apparently.
But there is a small problem... Usually the char type has no more than 8 bits, while int must have at least 16 bits, so every unsigned char will be representable as an int. Nevertheless, in the case char would have 16 or 32 bits (I know, this is never the case in practice...), there is no reason why one could not have sizeof(int) == 1, so that it would be (theoretically!) possible that fgetc (stream) returns EOF (or another negative value) but that end-of-file has not been attained...
Am I mistaken? Is there something in the C standard that prevents fgetc from returning EOF if end-of-file has not been reached? (If so, I could not find it!) Or is the if ((c = fgetc (stream)) != EOF) syntax not fully portable?
EDIT: Indeed, this was a duplicate of Question #3860943. I did not find that question at first search. Thanks for your help! :-)
You asked:
Is there something in the C standard that prevents fgetc from returning EOF if end-of-file has not been reached?
On the contrary, the standard explicitly allows EOF to be returned when an error occurs.
If a read error occurs, the error indicator for the stream is set and the fgetc function returns EOF.
In the footnotes, I see:
An end-of-file and a read error can be distinguished by use of the feof and ferror functions.
You also asked:
Or is the if ((c = fgetc (stream)) != EOF) syntax not fully portable?
On the theoretical platform where CHAR_BIT is more than 8 and sizeof(int) == 1, that won't be a valid way to check that end-of-file has been reached. For that, you'll have to resort to feof and ferror.
c = fgetc (stream);
if ( !feof(stream) && !ferror(stream) )
{
// Got valid input in c.
}
I think you need to rely on stream error.
ch = fgetc(stream);
if (ferror(stream) && (ch == EOF)) /* end of file */;
From the standard
If a read error occurs, the error indicator for the stream is set and the fgetc function returns EOF.
Edit for better version
ch = fgetc(stream);
if (ch == EOF) {
if (ferror(stream)) /* error reading */;
else if (feof(stream)) /* end of file */;
else /* read valid character with value equal to EOF */;
}
If you are reading a stream that is standard ASCII only, there's no risk of receiving the char equivalent to EOF before the real end-of-file, because valid ASCII char codes go up to 127 only. But it could happen when reading a binary file. The byte would need to be 255(unsigned) to correspond to a -1 signed char, and nothing prevents it from appearing in a binary file.
But as for your specific question (whether there is something in the standard): not exactly. Notice, though, that fgetc returns the character as an unsigned char converted to int, so it won't ever be negative in this case anyway. The only risk would be if you explicitly or implicitly converted the return value to signed char (for instance, if your c variable were a signed char).
NOTE: as @Ulfalizer mentioned in the comments, there's one rare case in which you may need to worry: if sizeof(int)==1, and you're reading a file that contains non-ASCII characters, then you may get a -1 return value that is not the real EOF. Notice that environments in which this happens are quite rare (to my knowledge, compilers for low-end 8-bit microcontrollers, like the 8051). In such a case, the safe option would be to test feof() as @pmg suggested.
I agree with your reading.
C Standard says (C11, 7.21.7.1 The fgetc function p3):
If the end-of-file indicator for the stream is set, or if the stream is at end-of-file, the end-of-file indicator for the stream is set and the fgetc function returns EOF. Otherwise, the fgetc function returns the next character from the input stream pointed to by stream. If a read error occurs, the error indicator for the stream is set and the fgetc function returns EOF.
There is nothing in the Standard (assuming UCHAR_MAX > INT_MAX) that disallows fgetc in a hosted implementation to return a value equal to EOF that is neither an end-of-file nor an error condition indicator.
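If code relies on the != EOF idiom, one hedged option is to make that assumption explicit at compile time. The sketch below uses C11's _Static_assert to reject platforms where UCHAR_MAX > INT_MAX; the helper read_byte is a hypothetical name introduced only for this example.

```c
#include <limits.h>
#include <stdio.h>

/* Refuse to compile on the exotic platforms discussed above, where an
   unsigned char value might not fit in an int. Requires C11. */
_Static_assert(UCHAR_MAX <= INT_MAX,
               "fgetc() result could not be told apart from EOF");

/* Hypothetical helper: stores the next byte in *out and returns 1,
   or returns 0 on end-of-file or read error. */
int read_byte(FILE *in, unsigned char *out)
{
    int c = fgetc(in);
    if (c == EOF)
        return 0;
    *out = (unsigned char)c;
    return 1;
}
```

With the assertion in place, a return of 0 from read_byte can only mean end-of-file or a read error, which feof() and ferror() can then tell apart.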

fgetc returns an unknown character

I have the following code:
FILE *f = fopen('/path/to/some/file', 'rb');
char c;
while((c = fgetc(f)) != EOF)
{
printf("next char: '%c', '%d'", c, c);
}
For some reason, when printing out the characters, at the end of the file, an un-renderable character gets printed out, along with the ASCII ordinal -1.
next char: '?', '-1'
What character is this supposed to be? I know it's not EOF because there's a check for that, and shortly after the character is printed, the program segfaults.
The trouble is that fgetc() and its relatives return an int, not a char:
If the end-of-file indicator for the input stream pointed to by stream is not set and a next character is present, the fgetc function obtains that character as an unsigned char converted to an int and advances the associated file position indicator for the stream (if defined).
If the end-of-file indicator for the stream is set, or if the stream is at end-of-file, the end-of-file indicator for the stream is set and the fgetc function returns EOF.
It has to return every possible valid character value and a distinct value, EOF (which is negative, and usually but not necessarily -1).
When you read the value into a char instead of an int, one of two undesirable things happens:
If plain char is unsigned, then you never get a value equal to EOF, so the loop never terminates.
If plain char is signed, then a legitimate character, 0xFF (often ÿ, y-umlaut, U+00FF, LATIN SMALL LETTER Y WITH DIAERESIS), is treated the same as EOF, so you detect EOF prematurely.
Either way, it is not good.
The Fix
The fix is to use int c; instead of char c;.
Incidentally, the fopen() call should not compile:
FILE *f = fopen('/path/to/some/file', 'rb');
should be:
FILE *f = fopen("/path/to/some/file", "rb");
Always check the result of fopen(); of all the I/O functions, it is more prone to failure than almost any other (not through its own fault, but because the user or programmer makes a mistake with the file name).
This is the culprit:
char c;
Please change it to:
int c;
The return type of fgetc is int, not char. You get strange behavior when you convert int to char on some platforms.
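Putting the fixes from these answers together, a corrected version of the loop might look like the following sketch. The function name dump_file and its return convention are assumptions made for the example; the output format is the one from the question, with a newline added.

```c
#include <stdio.h>

/* Prints each byte of the file at `path`; returns the number of bytes
   printed, or -1 if the file could not be opened. */
long dump_file(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL) {        /* passing NULL to fgetc() is undefined behavior */
        perror(path);
        return -1;
    }
    long n = 0;
    int c;                  /* int, so EOF stays distinct from byte 0xFF */
    while ((c = fgetc(f)) != EOF) {
        printf("next char: '%c', '%d'\n", c, c);
        n++;
    }
    fclose(f);
    return n;
}
```

The NULL check matters as much as the int: if fopen() fails and the result is used anyway, a crash like the one described in the question is a likely outcome.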

EOF missing for unix file [duplicate]

I created a file named "file.txt" on Unix and tried to read its content from my C program, but I never receive the EOF character. Does Unix not store an EOF character when a file is created? If so, what is the alternative way to detect EOF in a Unix-created file using C?
Here's the code sample
int main(){
File *fp;
int nl,c;
nl =0;
fp = fopen("file.txt", "r");
while((c = fgetc(fp)) != EOF){
if (c=='\n')
nl++;
}
return 0;
}
If I explicitly give CTRL + D the EOF is detected even when I use char c.
This can happen if the type of c is char rather than int, and char is unsigned in your compiler (you can check this by examining the value of CHAR_MIN in <limits.h>).
The value of EOF is negative according to the C standard.
So, implicitly casting EOF to unsigned char will lose the true value of EOF and the comparison will always fail.
UPDATE: There's a bigger problem that has to be addressed first. In the expression c = fgetc(fp) != EOF, fgetc(fp) != EOF is evaluated first (to 0 or 1) and then that value is assigned to c. As long as a character is read, fgetc(fp) != EOF evaluates to 1, so c holds 1 instead of the character and the test c == '\n' never succeeds. You need to add parentheses, like so: (c = fgetc(fp)) != EOF.
Missing parentheses. Should be:
while((c = fgetc(fp)) != EOF)
Remember: fgetc() returns an int, not a char. It has to return an int because its set of return values includes all possible valid characters plus a separate (negative) EOF indicator.
There are two possible traps if you use type char for c instead of int:
If the type char is signed with your compiler, you will detect a valid character as EOF. Often, the character ÿ (y-umlaut, officially known in Unicode as LATIN SMALL LETTER Y WITH DIAERESIS, U+00FF, hex code 0xFF in the ISO 8859-1 aka Latin 1 code set) will be detected as equivalent to EOF, when it is a valid character.
If the type char is unsigned, then the comparison will never be true.
Both problems are serious, and both are avoided by using the correct type:
FILE *fp = fopen("file.txt", "r");
if (fp != 0)
{
int c;
int nl = 0;
while ((c = fgetc(fp)) != EOF)
if (c == '\n')
nl++;
printf("Number of lines: %d\n", nl);
}
Note that the type is FILE and not File. Note that you should check that the file was opened before trying to read via fp.
If I explicitly give CTRL + D, the EOF is detected even when I use char c.
This means that your compiler provides you with char as a signed type. It also means you will not be able to count lines accurately in files which contain ÿ.
Unlike CP/M and DOS, Unix does not use any character to indicate EOF; you reach EOF when there are no more characters to read. What confuses many people is that if you type a certain key combination at the terminal, programs detect EOF. What actually happens is that the terminal driver recognizes the character and sends any unread characters to the program. If there are no unread characters, the program gets 0 bytes returned, which is the same result you get when you've reached the end of file. So, the character combination (often, but not always, Ctrl-D) appears to 'send EOF' to the program. However, the character is not stored in a file if you are using cat >file; further, if you read a file which contains a control-D, that is a perfectly fine character with byte value 0x04. If a program generates a control-D and sends that to a program, that does not indicate EOF to the program. It is strictly a property of Unix terminals (tty and pty — teletype and pseudo-teletype — devices).
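To see that end-of-file on Unix is a condition rather than a stored character, the sketch below reads a file descriptor with POSIX read() until it returns 0. It assumes a POSIX system (for unistd.h), and the function name drain_fd is made up for this example. A 0x04 byte in the data is counted like any other byte and does not end the input.

```c
#include <stddef.h>
#include <unistd.h>   /* POSIX: read(), ssize_t */

/* Reads from fd until read() returns 0 (end-of-file) or -1 (error).
   Returns the number of bytes read and stores in *ctrl_d how many of
   them had the value 0x04, the byte value of control-D. */
long drain_fd(int fd, long *ctrl_d)
{
    unsigned char buf[256];
    long total = 0;
    ssize_t got;

    *ctrl_d = 0;
    while ((got = read(fd, buf, sizeof buf)) > 0) {
        for (ssize_t i = 0; i < got; i++)
            if (buf[i] == 0x04)
                (*ctrl_d)++;
        total += got;
    }
    return total;
}
```

Fed the bytes 'a', 0x04, 'b' through a pipe whose write end is then closed, drain_fd returns 3 and reports one control-D byte; the loop ends only when read() returns 0.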
You do not show how you declare the variable c; it should be of type int, not char.

about getchar function in C

I am reading the K&R C language book and came across the following code fragment:
char c;
while ((c = getchar()) != EOF) ...
It was mentioned that EOF (I think it is -1) is an "out of band" return value from getchar, distinct from all possible values that getchar can return.
My questions are the following:
I ran my program with char and it ran successfully. My understanding is that signed char can store -127 to +127, so it can hold -1. How is EOF "out of band"?
Can anyone provide a simple example where the above program fragment fails if we use char c instead of int c?
Thanks!
You have a small mistake, getchar returns an int, not a char:
int c;
while ((c = getchar()) != EOF) ...
The valid values for ASCII chars are from 0 to 127; EOF is some other (int) value.
If you keep using char, you might get into trouble (as I did).
Well, your question is answered in the C FAQ.
Two failure modes are possible if, as in the fragment above, getchar's return value is assigned to a char.
If type char is signed, and if EOF is defined (as is usual) as -1, the character with the decimal value 255 ('\377' or '\xff' in C) will be sign-extended and will compare equal to EOF, prematurely terminating the input.
If type char is unsigned, an actual EOF value will be truncated (by having its higher-order bits discarded, probably resulting in 255 or 0xff) and will not be recognized as EOF, resulting in effectively infinite input.
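The two failure modes can be checked directly, without any input, by applying the conversions to specific values. The predicates below are hypothetical names for this sketch; they assume an ordinary platform where unsigned char promotes to int and where converting 255 to signed char yields -1.

```c
#include <stdio.h>

/* Nonzero if `value`, once stored in a signed char, compares equal to EOF.
   On typical implementations this holds for byte 255 as well as for EOF. */
int matches_eof_as_signed(int value)
{
    signed char c = (signed char)value;
    return c == EOF;
}

/* Nonzero if `value`, once stored in an unsigned char, compares equal to EOF.
   On ordinary platforms this is never true, because an unsigned char
   promotes to a non-negative int while EOF is negative. */
int matches_eof_as_unsigned(int value)
{
    unsigned char c = (unsigned char)value;
    return c == EOF;
}
```

So with a signed char, byte 255 is mistaken for EOF (input ends early), and with an unsigned char, even a real EOF is not recognized (the loop never ends).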
The actual value of EOF depends on your platform. Take a look at stdio.h to see its definition.

Resources