EOF on macos and centos different result - c

I'm using EOF to jump out of a 'while' loop, and want to input some numbers by 'scanf'.The 'scanf' outside the loop doesn't work on macos.
I've tried running this code on macos and centos.The result on centos is what I need.
#include <stdio.h>
#include <stdlib.h>
int main(){
int i;
//ctrl+d=EOF
while(scanf("%d",&i) != EOF){
printf("?");
}
printf("\nloopend\n");
//those 'scanf' are ignored on macos.
scanf("%d",&i);
scanf(" %d",&i);
scanf("%d ",&i);
printf("\nend\n");
}
Input(without ','):
1,\n,ctrl+d
Output(centos):
1
?
loopend
//waiting for input here
Output(macos):
1
?
loopend
end
//the program ended directly

The MacOS behaviour is correct. According to the C standard §7.21.7.1/3 (the fgetc library function), the end-of-file indication is sticky; once fgetc sees an EOF, it must set the file's end-of-file indicator which will cause subsequent calls to return EOF until the end-of-file indicator is cleared, for example with clearerr():
If the end-of-file indicator for the stream is set, or if the stream is at end-of-file, the end- of-file indicator for the stream is set and the fgetc function returns EOF. Otherwise, the fgetc function returns the next character from the input stream pointed to by stream. If a read error occurs, the error indicator for the stream is set and the fgetc function returns EOF.
Since other input functions, including scanf, are supposed to act as if implemented by repeated calls to fgetc, EOF should be sticky for them, too. If you want to continue reading after you receive an EOF return, you should call clearerr() on the stream. (Or something else which resets the indicator, such as seek().)
For many years, the Gnu implementation of the standard C library did not follow the standard. It only reported EOF once, leaving the next fgetc to wait for more input on devices like terminals and pipes. The bug, reported in 2006, was finally fixed in v2.28, released in August 2018, although that might not yet be part of the Centos distro.
[Note: there is a longer discussion about this behaviour in this answer, including a now-outdated grump (by me) and some links to historic discussions about the issue.]
In any case, it has always been clear that portable code should call clearerr(), since BSD-derived standard library implementations (including MacOS) follow the standard as quoted above.

Related

Why does an fread loop require an extra Ctrl+D to signal EOF with glibc?

Normally, to indicate EOF to a program attached to standard input on a Linux terminal, I need to press Ctrl+D once if I just pressed Enter, or twice otherwise. I noticed that the patch command is different, though. With it, I need to press Ctrl+D twice if I just pressed Enter, or three times otherwise. (Doing cat | patch instead doesn't have this oddity. Also, If I press Ctrl+D before typing any real input at all, it doesn't have this oddity.) Digging into patch's source code, I traced this back to the way it loops on fread. Here's a minimal program that does the same thing:
#include <stdio.h>
int main(void) {
char buf[4096];
size_t charsread;
while((charsread = fread(buf, 1, sizeof(buf), stdin)) != 0) {
printf("Read %zu bytes. EOF: %d. Error: %d.\n", charsread, feof(stdin), ferror(stdin));
}
printf("Read zero bytes. EOF: %d. Error: %d. Exiting.\n", feof(stdin), ferror(stdin));
return 0;
}
When compiling and running the above program exactly as-is, here's a timeline of events:
My program calls fread.
fread calls the read system call.
I type "asdf".
I press Enter.
The read system call returns 5.
fread calls the read system call again.
I press Ctrl+D.
The read system call returns 0.
fread returns 5.
My program prints Read 5 bytes. EOF: 1. Error: 0.
My program calls fread again.
fread calls the read system call.
I press Ctrl+D again.
The read system call returns 0.
fread returns 0.
My program prints Read zero bytes. EOF: 1. Error: 0. Exiting.
Why does this means of reading stdin have this behavior, unlike the way that every other program seems to read it? Is this a bug in patch? How should this kind of loop be written to avoid this behavior?
UPDATE: This seems to be related to libc. I originally experienced it on glibc 2.23-0ubuntu3 from Ubuntu 16.04. #Barmar noted in the comments that it doesn't happen on macOS. After hearing this, I tried compiling the same program against musl 1.1.9-1, also from Ubuntu 16.04, and it didn't have this problem. On musl, the sequence of events has steps 12 through 14 removed, which is why it doesn't have the problem, but is otherwise the same (except for the irrelevant detail of readv in place of read).
Now, the question becomes: is glibc wrong in its behavior, or is patch wrong in assuming that its libc won't have this behavior?
I've managed to confirm that this is due to an unambiguous bug in glibc versions prior to 2.28 (commit 2cc7bad). Relevant quotes from the C standard:
The byte input/output functions — those functions described in this subclause that perform
input/output: [...], fread
The byte input functions read characters from the stream as if by successive
calls to the fgetc function.
If the end-of-file indicator for the stream is set, or if the stream is at end-of-file, the end-of-file indicator for the stream is set and the fgetc function returns EOF. Otherwise, the fgetc function returns the next character from the input stream pointed to by stream.
(emphasis on "or" mine)
The following program demonstrates the bug with fgetc:
#include <stdio.h>
int main(void) {
while(fgetc(stdin) != EOF) {
puts("Read and discarded a character from stdin");
}
puts("fgetc(stdin) returned EOF");
if(!feof(stdin)) {
/* Included only for completeness. Doesn't occur in my testing. */
puts("Standard violation! After fgetc returned EOF, the end-of-file indicator wasn't set");
return 1;
}
if(fgetc(stdin) != EOF) {
/* This happens with glibc in my testing. */
puts("Standard violation! When fgetc was called with the end-of-file indicator set, it didn't return EOF");
return 1;
}
/* This happens with musl in my testing. */
puts("No standard violation detected");
return 0;
}
To demonstrate the bug:
Compile the program and execute it
Press Ctrl+D
Press Enter
The exact bug is that if the end-of-file stream indicator is set, but the stream is not at end-of-file, glibc's fgetc will return the next character from the stream, rather than EOF as the standard requires.
Since fread is defined in terms of fgetc, this is the cause of what I originally saw. It's previously been reported as glibc bug #1190 and has been fixed since commit 2cc7bad in February 2018, which landed in glibc 2.28 in August 2018.

File operations with error indicator set

I have to individually read characters and substrings from a stream in C while parsing them. I wish also to check for input error. The obvious way to do this is something like:
c = fgetc(f);
if(ferror(f)) {
puts(strerror(errno));
exit(1);
}
/* do something with c */
c = fgetc(f);
if(ferror(f)) {
puts(strerror(errno));
exit(1);
}
/* do something with c */
Etc. However, it would be much more practical and fast (in the non-exceptional case when there's no error) if I could do all the input operations and check for the error indicator later:
c = fgetc(f);
/* do something with c */
c = fgetc(f);
/* do something with c */
if(ferror(f)) {
puts(strerror(errno));
exit(1);
}
This would be possible if input operations like fgetc(), scanf() etc were simple passthrough no-ops when the error indicator of f is set. Say, an error occours in the first fgetc() and therefore the second fgetc() is a no-op that fails but change neither the error indicator of f nor errno.
A similiar question may be asked about output functions.
My question is: is this the behaviour of stdio functions? May I check ferror(f) after all operations and get errno then if I am sure that all those "do something with c" do not change errno?
Thanks!
No, those are not the defined semantics of errno.
Quoting this manual page:
Its value is significant only when the return value of the call indicated an error (i.e., -1 from most system calls; -1 or NULL from most library functions); a function that succeeds is allowed to change errno.
This implies that if you were to do two I/O operations where the first fails, and the second is a "no-op" (like read zero bytes) it could succeed and opt to clear errno, thus dropping the error set by the first call.
Answering my own question, it seems that the reliable way to implement what I was looking for is to write a wrapper function myfgetc(), as Michael Walz suggested, together with a global variable myerrno:
__thread int myerrno = 0;
int myfgetc(FILE *f) {
int c;
if(myerrno)
return EOF;
if((c = fgetc(f)) == EOF)
myerrno = errno;
return c;
}
The storage class __thread is added to myerrno so that every thread has its own myerrno. It can be ommited if the program is single threaded.
...is this (an error occurs in the first fgetc() and therefore the second fgetc() is a no-op that fails) the behaviour of stdio functions?
No - not a no-op.
FILE has: "an error indicator that records whether a read/write error has occurred," (C11dr § § 7.21.1 2), not that an error just occurred. It is a flag that accumulates the history of read errors.
For fgetc() and friends,
If a read error occurs, the error indicator for the stream is set and the fgetc function returns EOF. C11dr § 7.21.7.1 3.
This return of EOF due to input error differs from EOF due to end-of-file. The latter has an or "If the end-of-file indicator for the stream is set, or if the stream is at end-of-file, the end-of-file indicator for the stream is set and the fgetc function returns EOF". EOF due to input error does not have an or.
I interpret this to imply that the error indicator of the stream can be true and fgetc() does not return EOF as the byte just read was not in error.
How a stream error indicator affects following input code? may be useful.
May I check ferror(f) after all operations and get errno then if I am sure that all those "do something with c" do not change errno?
errno not that useful here. C does not specify any I/O functions as certainly setting errno - that is an extension of some compilers. C expressly prohibits standard functions from clearing errno.
Yes, code can check ferror(f) to see if an error had occurred sometime in the past. Examination of errno is not needed.
To clear both error indicator and end-of-file, research clearer().

getchar() continues to accept input after including Ctrl+Z in same line

Simple c program to accept and print the character.
int c;
while((c=getchar())!=EOF)
{
putchar(c);
}
I am not getting why it accept input when I press Ctrl+Z at the end of line
ex Hello(press Ctrl+Z)
hello (some symbol)
but it work properly after leaving a line then pressing Ctrl+Z.
And I am using Window 7
When you call getchar() it in turn ends up making a read() system call. The read() will block until it has some characters available. The terminal driver only makes characters available when you press the return key or the key signifying end of input. At least that is what it looks like. In reality, it's more involved.
On the assumption that by ctrl-Z you mean whatever keystroke combination means "end of input", the reason is the way that the read() system call works. The ctrl-D (it's ctrl-D on most Unixes) character doesn't mean "end of input", it means "send the current pending input to the application".
When you've typed something in before pressing ctrl-D, that input gets sent to the application which is probably blocked on the read() system call. read() fills a buffer with the input and returns the number of bytes it put in the buffer.
When you press ctrl-D without any input pending (i.e. the last thing you didwas hit return or ctrl-D, the same thing happens but there are no characters, so read() returns 0. read() returning 0 is the convention for end of input. When getchar() sees this, it returns EOF to the calling program.
This answer in Stack Exchange puts it a bit more clearly
https://unix.stackexchange.com/a/177662/6555
You have not said what system you are working on, [U|Li]nix or Windows. This answer is Windows specific. For [Li|U]nix, replace references to ctrl-z with ctrl-d.
While using a terminal, Ctrl-z will not produce an EOF (-1) (see good answers from Haccks & JeremyP for detailed whys), so the loop will not exit the way you have it written. However, you can put a test for ctrl-z in your while loop condition to exit...
int main ()
{
int c=0;
puts ("Enter text. ctrl-z to exit:");
while(c != 26) //(26 is the ASCII value for ctrl-z)
{
putchar(c);
c = getchar();
}
return 0;
}
By the way, here is a table showing the values for ASCII control characters.
I found the answer on wiki:
In Microsoft's DOS and Windows (and in CP/M and many DEC operating systems), reading from the terminal will never produce an EOF. Instead, programs recognize that the source is a terminal (or other "character device") and interpret a given reserved character or sequence as an end-of-file indicator; most commonly this is an ASCII Control-Z, code 26.
#include <stdio.h>
int main()
{
int c;
while((c=getchar())!=26)
{
putchar(c);
}
}
You can use ASCII value of CTRL-Z.Now it won't take input after pressing CTRL-Z.
getchar() fution read single character by Pressing Ctrl+Z sends the TSTP signal to your process, means terminate the process (unix/linux)

Issues with standard input in C

I'm making a simple program in C that reads an input. It then displays the number of characters used.
What I tried first:
#include <stdio.h>
int main(int argc, char** argv) {
int currentChar;
int charCount = 0;
while((currentChar = getchar()) != EOF) {
charCount++;
}
printf("Display char count? [y/n]");
int response = getchar();
if(response == 'y' || response == 'Y')
printf("Count: %d\n",charCount);
}
What happened:
I would enter some lines and end it with ^D (I'm on Mac). The program would not wait at int response = getchar();. I found online that this is because there is still content left in the input stream.
My first question is what content would that be? I don't enter anything after pressing ^D to input EOF and when I tried to print anything left in the stream, it would print a ?.
What I tried next:
Assuming there were characters left in the input stream, I made a function to clear the input buffer:
void clearInputBuffer() {
while(getchar() != '\n') {};
}
I called the function right after the while loop:
while((currentChar = getchar()) != EOF) {
charCount++;
}
clearInputBuffer();
Now I would assume if there is anything left after pressing ^D, it would be cleared up to the next \n.
But instead, I can't stop the input request. When I press ^D, rather than sending EOF to currentChar, a ^D is shown on the terminal.
I know there is a probably a solution to this online, but since I'm not sure what exactly my problem is, I don't really know what to look for.
Why is this happening? Can someone also explain exactly what is going on behind the scenes of this program and the Terminal?
man 3 termios - search for VEOF. That will tell you what it actually does.
If you need more explanation, I'll start by saying the ISO C stdin stream has a default buffer, so any bytes read are stored into that buffer unless this behavior is somehow overridden (e.g. setvbuf).
The getchar function will read from this default buffer unless the buffer has no characters in it left to read. In that case, it will call the read function to actually store new data into that buffer and return the number of bytes read.
However, your terminal has its own input buffer. It will wait for an input sequence recognized as an end-of-line (EOL) delimiter. This is where things get interesting. If ICANON is enabled, and you use Ctrl+D with bytes in the terminal's input buffer already, then you effectively will send all of that pending bytes to the program, as if you had entered an end-of-line delimiter. The read function will receive those bytes and store them in the input buffer used for stdin, resulting in getchar returning an appropriate value.
If Ctrl+D is pressed with no pending bytes in the terminal's input buffer, no data will be sent, read will return 0, and EOF gets returned by getchar after getchar sets the end-of-file indicator for the stdin stream.
Given the two behaviors of Ctrl+D, it follows that pressing it twice will send all pending bytes on the first key press, effectively emptying the terminal's input buffer, followed by the second key press sending 0 bytes to read, which means getchar returns EOF and the end-of-file indicator for stdin is set.
If an error occurs (e.g. stdin was closed), read itself will return -1, and getchar will return EOF after setting the error indicator for the stdin stream. The following may help to illustrate the idea of how it works, though there's likely a lot more going on behind the scenes with the TTY itself than just waiting for an EOL or VEOF and sending data after either one is detected:
Of course, if ICANON isn't set on the controlling terminal, then you will never receive EOF unless your input is not from a terminal because suddenly certain special key sequences like Ctrl+D won't be recognized as special key sequences since the feature is turned off.
For a bit more completeness, please note that the ICANON bit and termios stuff in general do not necessarily apply much on Windows. The Windows Command Prompt uses Ctrl+Z for one thing, and the Windows operating system has no concept of terminals other than things like the _isatty C runtime function that is used to detect whether a file descriptor points to a file description that involves a console handle.
Pressing Ctrl+Z with data pending will effectively cancel any remaining input that follows it, though an end-of-line character (Ctrl+M or Enter) still needs to be pressed for the data to be sent unless processed input was disabled by using the SetConsoleMode Windows API function.
If pressed with no input data pending and sent by entering an end-of-line character, it acts as EOF. For example, hello^Z1234^M results in hello^Z being read, and everything including the ^M end-of-line character is ignored. ^Z1234^M or just ^Z^M will trigger EOF.
Operating systems are weird.
Ctrl+D is a bit weird on Unix -- it's not actually an EOF character. Rather, it's a signal to the shell that stdin should be closed. As a result, the behavior can be somewhat unintuitive. Two Ctrl+Ds in a row, or a Return followed by a Ctrl+D, will give you the behavior you're looking for. I tested it with this code:
#include <stdio.h>
int main(void) {
size_t charcount = 0;
while (getchar() != EOF)
charcount++;
printf("Characters: %zu\n", charcount);
return 0;
}
Edited to include chux's format character suggestion.
You can do it (also) this way:
fseek(stdin,0,SEEK_END);
This works fine for me.

Confusion about how a getchar() loop works internally

I've included an example program using getchar() below, for reference (not that anyone probably needs it), and feel free to address concerns with it if you desire. But my question is:
What exactly is going on when the program calls getchar()?
Here is my understanding (please clarify or correct me):
When getchar is called, it checks the STDIN buffer to see if there is any input.
If there isn't any input, getchar sleeps.
Upon wake, getchar checks to see if there is any input, and if not, puts it self to sleep again.
Steps 2 and 3 repeat until there is input.
Once there is input (which by convention includes an 'EOF' at the end), getchar returns the first character of this input and does something to indicate that the next call to getchar should return the second letter from the same buffer? I'm not really sure what that is.
When there are no more characters left other than EOF, does getchar flush the buffer?
The terms I used are probably not quite correct.
#include <stdio.h>
int getLine(char buffer[], int maxChars);
#define MAX_LINE_LENGTH 80
int main(void){
char line[MAX_LINE_LENGTH];
int errorCode;
errorCode = getLine(line, sizeof(line));
if(errorCode == 1)
printf("Input exceeded maximum line length of %d characters.\n", MAX_LINE_LENGTH);
printf("%s\n", line);
return 0;
}
int getLine(char buffer[], int maxChars){
int c, i = 0;
while((c = getchar()) != EOF && c != '\n' && i < maxChars - 1)
buffer[i++] = c;
buffer[i++] = '\0';
if(i == maxChars)
return 1;
else
return 0;
}
Step 2-4 are slightly off.
If there is no input in the standard I/O buffer, getchar() calls a function to reload the buffer. On a Unix-like system, that normally ends up calling the read() system call, and the read() system call puts the process to sleep until there is input to be processed, or the kernel knows there will be no input to be processed (EOF). When the read returns, the code adjusts the data structures so that getchar() knows how much data is available. You description implies polling; the standard I/O system does not poll for input.
Step 5 uses the adjusted pointers to return the correct values.
There really isn't an EOF character; it is a state, not a character. Even though you type Control-D or Control-Z to indicate 'EOF', that character is not inserted into the input stream. In fact, those characters cause the system to flush any typed characters that are still waiting for 'line editing' operations (like backspace) to change them so that they are made available to the read() system call. If there are no such characters, then read() returns 0 as the number of available characters, which means EOF. Then getchar() returns the value EOF (usually -1 but guaranteed to be negative whereas valid characters are guaranteed to be non-negative (zero or positive)).
So basically, rather than polling, is it that hitting Return causes a certain I/O interrupt, and then when the OS receives this, it wakes up any processes that are sleeping for I/O?
Yes, hitting Return triggers interrupts and the OS kernel processes them and wakes up processes that are waiting for the data. The terminal driver is woken by the kernel when interrupt occurs, and decides what to do with the character(s) that were just received. They may be stashed for further processing (canonical mode) or made available immediately (raw mode), etc. Assuming, of course, that the input is a terminal; if the input is from a disk file, it is simpler in many ways — or if it is a pipe, or …
Nominally, it isn't the terminal app that gets woken by the interrupt; it is the kernel that wakes first, then the shell running in the terminal app that is woken because there's data for it to read, and only when there's output does the terminal app get woken.
I say 'nominally' because there's an outside chance that in fact the terminal app does mediate the I/O via a pty (pseudo-tty), but I think it happens at the kernel level and the terminal application is involved fairly late in the process. There's a huge disconnect really between the keyboard where you type and the display where what you type appears.
See also Canonical vs non-canonical terminal input.

Resources