File operations with error indicator set - c

I have to individually read characters and substrings from a stream in C while parsing them. I also want to check for input errors. The obvious way to do this is something like:
c = fgetc(f);
if(ferror(f)) {
    puts(strerror(errno));
    exit(1);
}
/* do something with c */
c = fgetc(f);
if(ferror(f)) {
    puts(strerror(errno));
    exit(1);
}
/* do something with c */
Etc. However, it would be much more practical and fast (in the non-exceptional case when there's no error) if I could do all the input operations and check for the error indicator later:
c = fgetc(f);
/* do something with c */
c = fgetc(f);
/* do something with c */
if(ferror(f)) {
    puts(strerror(errno));
    exit(1);
}
This would be possible if input operations like fgetc(), scanf(), etc. were simple passthrough no-ops when the error indicator of f is set. Say an error occurs in the first fgetc(), and therefore the second fgetc() is a no-op that fails but changes neither the error indicator of f nor errno.
A similar question may be asked about output functions.
My question is: is this the behaviour of stdio functions? May I check ferror(f) after all operations and get errno then if I am sure that all those "do something with c" do not change errno?
Thanks!

No, those are not the defined semantics of errno.
Quoting this manual page:
Its value is significant only when the return value of the call indicated an error (i.e., -1 from most system calls; -1 or NULL from most library functions); a function that succeeds is allowed to change errno.
This implies that if you were to do two I/O operations where the first fails, and the second is a "no-op" (like read zero bytes) it could succeed and opt to clear errno, thus dropping the error set by the first call.

Answering my own question, it seems that the reliable way to implement what I was looking for is to write a wrapper function myfgetc(), as Michael Walz suggested, together with a global variable myerrno:
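#include <errno.h>
#include <stdio.h>
__thread int myerrno = 0;
int myfgetc(FILE *f) {
    int c;
    if(myerrno)
        return EOF;
    /* Record errno only for a read error, not for plain end-of-file. */
    if((c = fgetc(f)) == EOF && ferror(f))
        myerrno = errno;
    return c;
}
The storage class __thread (a GCC extension; C11 offers _Thread_local for the same purpose) is added to myerrno so that every thread has its own myerrno. It can be omitted if the program is single-threaded.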

...is this (an error occurs in the first fgetc() and therefore the second fgetc() is a no-op that fails) the behaviour of stdio functions?
No - not a no-op.
FILE has: "an error indicator that records whether a read/write error has occurred," (C11dr §7.21.1 2), not that an error just occurred. It is a flag that accumulates the history of read errors.
For fgetc() and friends,
If a read error occurs, the error indicator for the stream is set and the fgetc function returns EOF. C11dr § 7.21.7.1 3.
This return of EOF due to input error differs from EOF due to end-of-file. The latter has an "or": "If the end-of-file indicator for the stream is set, or if the stream is at end-of-file, the end-of-file indicator for the stream is set and the fgetc function returns EOF". The rule for EOF due to input error does not have an "or".
I interpret this to imply that the error indicator of the stream can be true and fgetc() does not return EOF as the byte just read was not in error.
The related question "How a stream error indicator affects following input code?" may be useful.
May I check ferror(f) after all operations and get errno then if I am sure that all those "do something with c" do not change errno?
errno is not that useful here. C does not require input functions such as fgetc() to set errno; that is an extension provided by some implementations. C does expressly prohibit standard library functions from setting errno to zero.
Yes, code can check ferror(f) to see if an error had occurred sometime in the past. Examination of errno is not needed.
To clear both the error indicator and the end-of-file indicator, research clearerr().
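A minimal sketch of the deferred check discussed in this answer: loop on the return value of fgetc() and consult ferror() once afterwards. The name count_bytes and its read_error out-parameter are hypothetical, chosen only for illustration, and errno is deliberately not consulted.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: consume every byte of f, then report whether the
 * loop stopped because of a read error (1) or end-of-file (0). */
long count_bytes(FILE *f, int *read_error)
{
    assert(f != NULL && read_error != NULL);

    long n = 0;
    int c;
    while ((c = fgetc(f)) != EOF) {
        n++;                      /* "do something with c" goes here */
    }
    /* The error indicator stays set once raised, so a single check
     * after the loop is enough to detect a read error. */
    *read_error = ferror(f) != 0;
    return n;
}
```

A caller would invoke count_bytes(f, &err) and treat a nonzero err as "some read failed", without needing to know which call it was.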

Related

Checking stdin and stdout using external function

Does scope affect checking for errors while obtaining input from stdin or outputting to stdout? For example, if I have code structured in the following way:
void streamCheck(){
    if (ferror(stdin)){
        fprintf(stderr, "stdin err");
        exit(1);
    }
    if (ferror(stdout)){
        fprintf(stderr, "stdout err");
        exit(2);
    }
}
int main(){
    int c = getchar();
    streamCheck();
    ...
    putchar(c);
    streamCheck();
}
are the return values of ferror(stdin) / ferror(stdout) impacted by the fact that I am checking them in a function rather than in the main? If there is a better way to do this, also let me know; I am quite new to C.
As long as you call ferror on a particular stream before calling any other function on that stream you should be fine.
It doesn't matter that ferror is being called from a different function than the one getchar or putchar was called from.
There is no problem in your function. ferror() checks the error indicator of the FILE * that is passed as argument. In other words, the error indicator is a property of the file object and is directly obtainable from the FILE * pointer. Therefore, no matter where you call ferror() from, it will be able to determine if an error happened with the FILE * that is passed as argument (that is, of course, if the argument is valid).
are the return values of ferror(stdin) / ferror(stdout) impacted by the fact that I am checking them in a function rather than in the main?
The return value of ferror() is characteristic of the current state of the stream that you provide as an argument. At any given time, the stdin provided by stdio.h refers to the same stream, with the same state, in every function, including main(). Therefore, you will obtain the same result by calling the ferror() function indirectly, via an intermediary function, as you would by calling it directly.
NEVERTHELESS, the approach you present in your example is poor. For the most part, C standard library functions indicate whether an error has occurred via their return values. In particular, getchar() returns a special value, represented by the macro EOF, if either the end of the file is encountered or an error occurs. This is typical of the stdio functions. You should consistently test functions' return values to recognize when exceptional conditions have occurred. For stream functions, you should call ferror() and / or feof() only after detecting such a condition, and only if you want to distinguish between the end-of-file case and the I/O error case (and the "neither" case for some functions). See also "Why is 'while ( !feof (file) )' always wrong?"
Personally, I probably would not write a generic function such as your streamCheck() at all, as error handling is generally situation specific. But if I did write one, then I'd certainly have it test just one stream that I specify to it. Something like this, for example:
#include <stdio.h>
#include <stdlib.h>
void streamCheck(FILE *stream, const char *message, int exit_code) {
    if (ferror(stream)) {
        fputs(message, stderr);
        exit(exit_code);
    }
}
int main(void) {
    int c = getchar();
    if (c == EOF) {
        streamCheck(stdin, "stdin err\n", 1);
    }
    // ...
    if (putchar(c) == EOF) {
        streamCheck(stdout, "stdout err\n", 2);
    }
}

EOF on macos and centos different result

I'm using EOF to jump out of a 'while' loop, and want to input some numbers with 'scanf'. The 'scanf' calls after the loop don't work on macOS.
I've tried running this code on macOS and CentOS. The result on CentOS is what I need.
#include <stdio.h>
#include <stdlib.h>
int main(){
    int i;
    //ctrl+d=EOF
    while(scanf("%d",&i) != EOF){
        printf("?");
    }
    printf("\nloopend\n");
    //those 'scanf' are ignored on macos.
    scanf("%d",&i);
    scanf(" %d",&i);
    scanf("%d ",&i);
    printf("\nend\n");
}
Input(without ','):
1,\n,ctrl+d
Output(centos):
1
?
loopend
//waiting for input here
Output(macos):
1
?
loopend
end
//the program ended directly
The MacOS behaviour is correct. According to the C standard §7.21.7.1/3 (the fgetc library function), the end-of-file indication is sticky; once fgetc sees an EOF, it must set the file's end-of-file indicator which will cause subsequent calls to return EOF until the end-of-file indicator is cleared, for example with clearerr():
If the end-of-file indicator for the stream is set, or if the stream is at end-of-file, the end-of-file indicator for the stream is set and the fgetc function returns EOF. Otherwise, the fgetc function returns the next character from the input stream pointed to by stream. If a read error occurs, the error indicator for the stream is set and the fgetc function returns EOF.
Since other input functions, including scanf, are supposed to act as if implemented by repeated calls to fgetc, EOF should be sticky for them, too. If you want to continue reading after you receive an EOF return, you should call clearerr() on the stream. (Or something else which resets the indicator, such as fseek() or rewind().)
For many years, the GNU implementation of the standard C library did not follow the standard. It only reported EOF once, leaving the next fgetc to wait for more input on devices like terminals and pipes. The bug, reported in 2006, was finally fixed in v2.28, released in August 2018, although that might not yet be part of the CentOS distro.
[Note: there is a longer discussion about this behaviour in this answer, including a now-outdated grump (by me) and some links to historic discussions about the issue.]
In any case, it has always been clear that portable code should call clearerr(), since BSD-derived standard library implementations (including MacOS) follow the standard as quoted above.
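As a rough sketch of the portable approach, the helper below reads integers until input runs out and then calls clearerr() so that later reads are attempted again on every implementation. The name drain_ints is hypothetical. Unlike the loop in the question, it tests for a return value of 1, so it also stops on a matching failure (non-numeric input) instead of spinning.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: read integers until fscanf() stops converting,
 * then reset the indicators if the stop was caused by end-of-file. */
int drain_ints(FILE *in)
{
    assert(in != NULL);

    int value;
    int count = 0;
    while (fscanf(in, "%d", &value) == 1) {
        count++;
    }
    if (feof(in)) {
        clearerr(in);   /* clears both the end-of-file and error indicators */
    }
    return count;
}
```

After drain_ints(stdin) returns, a following scanf() call should wait for fresh terminal input on both kinds of library, since the end-of-file indicator is no longer set.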

Why does an fread loop require an extra Ctrl+D to signal EOF with glibc?

Normally, to indicate EOF to a program attached to standard input on a Linux terminal, I need to press Ctrl+D once if I just pressed Enter, or twice otherwise. I noticed that the patch command is different, though. With it, I need to press Ctrl+D twice if I just pressed Enter, or three times otherwise. (Doing cat | patch instead doesn't have this oddity. Also, if I press Ctrl+D before typing any real input at all, it doesn't have this oddity.) Digging into patch's source code, I traced this back to the way it loops on fread. Here's a minimal program that does the same thing:
#include <stdio.h>
int main(void) {
    char buf[4096];
    size_t charsread;
    while((charsread = fread(buf, 1, sizeof(buf), stdin)) != 0) {
        printf("Read %zu bytes. EOF: %d. Error: %d.\n", charsread, feof(stdin), ferror(stdin));
    }
    printf("Read zero bytes. EOF: %d. Error: %d. Exiting.\n", feof(stdin), ferror(stdin));
    return 0;
}
When compiling and running the above program exactly as-is, here's a timeline of events:
1. My program calls fread.
2. fread calls the read system call.
3. I type "asdf".
4. I press Enter.
5. The read system call returns 5.
6. fread calls the read system call again.
7. I press Ctrl+D.
8. The read system call returns 0.
9. fread returns 5.
10. My program prints Read 5 bytes. EOF: 1. Error: 0.
11. My program calls fread again.
12. fread calls the read system call.
13. I press Ctrl+D again.
14. The read system call returns 0.
15. fread returns 0.
16. My program prints Read zero bytes. EOF: 1. Error: 0. Exiting.
Why does this means of reading stdin have this behavior, unlike the way that every other program seems to read it? Is this a bug in patch? How should this kind of loop be written to avoid this behavior?
UPDATE: This seems to be related to libc. I originally experienced it on glibc 2.23-0ubuntu3 from Ubuntu 16.04. @Barmar noted in the comments that it doesn't happen on macOS. After hearing this, I tried compiling the same program against musl 1.1.9-1, also from Ubuntu 16.04, and it didn't have this problem. On musl, the sequence of events has steps 12 through 14 removed, which is why it doesn't have the problem, but is otherwise the same (except for the irrelevant detail of readv in place of read).
Now, the question becomes: is glibc wrong in its behavior, or is patch wrong in assuming that its libc won't have this behavior?
I've managed to confirm that this is due to an unambiguous bug in glibc versions prior to 2.28 (commit 2cc7bad). Relevant quotes from the C standard:
The byte input/output functions — those functions described in this subclause that perform
input/output: [...], fread
The byte input functions read characters from the stream as if by successive
calls to the fgetc function.
If the end-of-file indicator for the stream is set, or if the stream is at end-of-file, the end-of-file indicator for the stream is set and the fgetc function returns EOF. Otherwise, the fgetc function returns the next character from the input stream pointed to by stream.
(emphasis on "or" mine)
The following program demonstrates the bug with fgetc:
#include <stdio.h>
int main(void) {
    while(fgetc(stdin) != EOF) {
        puts("Read and discarded a character from stdin");
    }
    puts("fgetc(stdin) returned EOF");
    if(!feof(stdin)) {
        /* Included only for completeness. Doesn't occur in my testing. */
        puts("Standard violation! After fgetc returned EOF, the end-of-file indicator wasn't set");
        return 1;
    }
    if(fgetc(stdin) != EOF) {
        /* This happens with glibc in my testing. */
        puts("Standard violation! When fgetc was called with the end-of-file indicator set, it didn't return EOF");
        return 1;
    }
    /* This happens with musl in my testing. */
    puts("No standard violation detected");
    return 0;
}
To demonstrate the bug:
1. Compile the program and execute it
2. Press Ctrl+D
3. Press Enter
The exact bug is that if the end-of-file stream indicator is set, but the stream is not at end-of-file, glibc's fgetc will return the next character from the stream, rather than EOF as the standard requires.
Since fread is defined in terms of fgetc, this is the cause of what I originally saw. It's previously been reported as glibc bug #1190 and has been fixed since commit 2cc7bad in February 2018, which landed in glibc 2.28 in August 2018.
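On the question of how such a loop could be written to sidestep the problem on affected glibc versions, one option (a sketch, not what patch actually does) is to stop as soon as fread returns a short count, because fread returns fewer items than requested only at end-of-file or on error. The name read_until_short is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: total the bytes available on `in`, never calling
 * fread() again once a short count has been returned. */
size_t read_until_short(FILE *in)
{
    assert(in != NULL);

    char buf[4096];
    size_t total = 0;
    for (;;) {
        size_t n = fread(buf, 1, sizeof buf, in);
        total += n;               /* process buf[0..n) here */
        if (n < sizeof buf) {
            break;                /* short count: end-of-file or error */
        }
    }
    return total;
}
```

The caller can still use ferror(in) afterwards to tell an error from end-of-file. In terms of the timeline above, the loop would end right after step 10, so steps 11 through 16 never happen.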

fread and ferror don't set errno

I'm trying to check when fread() raises an error, so I use ferror().
chunk = fread(buf, 1, 100, file);
if (ferror(file))
{
    return errno;
}
But, ferror() man page (man 3 ferror, or just man ferror) says:
ERRORS
These functions should not fail and do not set the external variable errno.
So, how can I know the error type occurred when file has been read, although fread() and ferror() didn't set errno?
You can't get there from here.
fread does not set errno (as you have discovered), and as such you cannot really determine much about a specific error state; only that there is one. The exact nature of the error is generally implementation-dependent. There is no portable way, based on the C standard library, to gather it.
For specific system-level errors, you can drop down to system calls, possibly suffering pitfalls like poor or nonexistent I/O buffering along the way. There POSIX can somewhat come to your rescue. Calls like read do set errno and have a fairly detailed set of possible outcomes. That may be an option for you if the platform you're working with is POSIX compliant and it is really that critical for the code to be in the know.
But from the C standard library, you're not going to find much beyond being told an error has happened. Generally you'll find you don't need more than that anyway.
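For illustration only, here is a sketch of what dropping down to POSIX read() might look like. It assumes a POSIX platform; the name read_some is hypothetical, and the choice to discard a partial read when an error follows it is a simplification made for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper (POSIX only): read up to `len` bytes from `fd`.
 * Returns the number of bytes read (fewer than `len` only at end-of-file),
 * or -1 with errno left as read() set it. */
ssize_t read_some(int fd, void *buf, size_t len)
{
    assert(buf != NULL);

    size_t done = 0;
    while (done < len) {
        ssize_t n = read(fd, (char *)buf + done, len - done);
        if (n > 0) {
            done += (size_t)n;
        } else if (n == 0) {
            break;                /* end-of-file */
        } else if (errno != EINTR) {
            return -1;            /* errno identifies the error */
        }
        /* n < 0 with EINTR: interrupted by a signal, so retry */
    }
    return (ssize_t)done;
}
```

When read_some() returns -1, the caller can pass errno to strerror() or call perror(), which is the detail that fread() does not portably provide.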
Those functions don't use errno, so you shouldn't either.
It is worth noting that you can tell whether everything went smoothly from the return value of fread(). If the return value of fread() differs from the nmemb argument you passed (100 in your case), then you either reached the end of the file or an error occurred while reading it (source). So test only in that case:
Just drop the use of errno altogether:
chunk = fread(buf, 1, 100, file);
if (chunk != 100) { // If fread() returns a number different from the nmemb parameter, either an error or EOF occurred
    if (ferror(file))
    {
        printf("Error occurred while reading file.");
        return -1; // Or whatever return value you use to indicate an error
    }
}

Does EOF set errno?

I always struggle with return values of system calls - they are just so inconsistent!
Normally I check if they are NULL or -1 and then call perror. However, for fgets, the man page says:
gets() and fgets() return s on success, and NULL on error or when end of file occurs while no characters have been read.
which means the return value NULL is not necessarily an error - it can also be EOF. Is errno set when the end of file is reached? Can I still call perror in this case?
If not, what is the common way to tell if the call returned an error versus EOF. I want to use perror with NULL string for errors and a custom string for EOF.
Use ferror and feof to distinguish between error and EOF. There's no general way to find out exactly what the error was, if there was an error, but you can tell that there was one.
Standard C (f)gets (and (f)getc) are not required to set errno, although a conforming library implementation can set errno to a non-zero value pretty much at will.
POSIX does require that (f)get{c,s} set errno on read errors, but that only helps you after you have determined that there was a read error (by calling ferror). It's also important to remember that library functions never set errno to 0, but may set errno to a non-zero value even if no error occurs. So you cannot test errno as a replacement for checking the error return of any library function, including fgets. And since end-of-file is not an error, errno will (probably) not be modified on EOF, so its value in that case is meaningless.
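A small sketch of the ferror/feof approach described above, applied to fgets. The enum line_status and the function read_line are hypothetical names introduced only for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical result codes for read_line(). */
enum line_status { LINE_OK, LINE_EOF, LINE_ERROR };

/* Read one line with fgets() and classify a NULL return. */
enum line_status read_line(FILE *in, char *buf, int size)
{
    assert(in != NULL && buf != NULL && size > 0);

    if (fgets(buf, size, in) != NULL) {
        return LINE_OK;
    }
    /* NULL return: consult the error indicator to learn why. */
    return ferror(in) ? LINE_ERROR : LINE_EOF;
}
```

On LINE_ERROR a POSIX program could call perror(NULL), since POSIX sets errno for read errors; on LINE_EOF it would print its own message instead, because errno is meaningless in that case.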
According to fputs's own documentation, yes, EOF does set errno. The man pages imply it indirectly rather than stating it outright, which hopefully will be amended. The function fputs returns an integer that is non-negative on success or EOF on failure. So the key to handling fputs errors is to set up a code block that checks the return value of fputs as it is being called. The following is a snippet of how I've been taught to handle fputs errors.
if (fputs(buffer, stdout) == EOF)
{
    fprintf(stderr, "fputs returned EOF: %s\n", strerror(errno));
    // .. and now do whatever cleanup you need to do.
    // or be lazy and exit(-1)
}
Here I am writing the contents of buffer to standard output and checking to see if fputs returns EOF. EOF indicates an error code was set, so as long as you follow the documentation on the man pages for fputs, you should be able to create a bunch of if statements to check the various error codes errno can be set to.
(1) What is buffer? Some character array I declared elsewhere.
(2) What does fprintf do? It prints output to the stream passed in, which in this case is standard error (stderr; it prints to the console like stdout, but is meant for errors).
(3) What is strerror? It is a function declared in the string.h header that returns a string describing the error code passed in. It has a message for every error code that errno can be set to. The header string.h should NOT be confused with strings.h, which is a BSD header file that does not declare strerror(3).
Edit: Ok, I messed up. You were looking for an answer on fgets, not fputs.
To check for an error on fgets, do the following
if (fgets(buffer, BUF_SIZE, myFile) == NULL)
{
    // print out error as a string to stderr
    fprintf(stderr, "fgets error occurred: %s\n", strerror(errno));
    // do cleanup
}
// special: you also need to check errno AFTER the if statement...
The thing is, the only way you are getting an error on this is if the stream becomes unreadable, which is either due to permissions or trying to read something that is in write mode. In the case of a network, it may be possible for something to cut off your connection in the middle of reading, in which case you need to check the error code after the fgets if statement as well. But it will set the error code if something went wrong.
At least, that is the case if the man pages are correct. See the Linux man pages for more details. The only error code that can be set is the "I can't read this thing" code, which is errno == EBADF.
