C getchar() and EOF behavior [duplicate] - c

Why doesn't Ctrl+Z cause the loop to finish in the following small program?
#include <stdio.h>

int main(void)
{
    int c;

    while ((c = getchar()) != EOF)
    {
        // nothing
    }
    return 0;
}
If I enter test^Z followed by Enter, the program does not leave the loop.
I found related questions (here and here), but none explains this for C (not C++) under Windows.
Note: I use Visual Studio 2015 PRE on Windows 8.1.

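You need to press Enter, then Ctrl+Z, and then Enter again: on the Windows console, Ctrl+Z is treated as end-of-file only when it is the first thing typed on a line.
Alternatively, you can press F6 in place of Ctrl+Z.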

EOF, as you use it, is not a character. It is a status that the stream is in.
I mean, heck, you even link this question, so you might as well read the accepted answer:
The underlying form of an EOF is a zero-length read.
It's not an "EOF character".
http://www.c-faq.com/stdio/getcharc.html cites a different case than yours, where someone stored the return value of getchar in a char. The underlying problem still occurs occasionally: different runtimes implement different values for the EOF integer (which is why I said, it's not an EOF character), and things love to go wrong. Especially in Visual C++, which is not a "real" C compiler but a C++ compiler with a compatibility mode, it seems things can go wrong.

Related

Why does this loop forever if a non-numeric is entered, but works as intended for an out-of-range numeric [duplicate]

So a quick Google search for fflush(stdin) for clearing the input buffer reveals numerous websites warning against using it. And yet that's exactly how my CS professor taught the class to do it.
How bad is using fflush(stdin)? Should I really abstain from using it, even though my professor is using it and it seems to work flawlessly?
Simple: this is undefined behavior, since fflush is meant to be called on an output stream. This is an excerpt from the C standard:
int fflush(FILE *ostream);
If ostream points to an output stream or an update stream in which the most recent operation was not input, the fflush function causes any unwritten data for that stream to be delivered to the host environment to be written to the file; otherwise, the behavior is undefined.
So it's not a question of "how bad" this is. fflush(stdin) is simply not portable, so you should not use it if you want your code to be portable between compilers.
Converting comments into an answer.
TL;DR — Portable code doesn't use fflush(stdin)
The rest of this answer explains why portable code does not use fflush(stdin). It is tempting to add "reliable code doesn't use fflush(stdin)", which is also generally true.
Standard C and POSIX leave fflush(stdin) as undefined behaviour
The POSIX, C and C++ standards for fflush() explicitly state that the behaviour is undefined (because stdin is an input stream), but none of them prevent a system from defining it.
ISO/IEC 9899:2011 — the C11 Standard — says:
§7.21.5.2 The fflush function
¶2 If stream points to an output stream or an update stream in which the most recent operation was not input, the fflush function causes any unwritten data for that stream to be delivered to the host environment to be written to the file; otherwise, the behavior is undefined.
POSIX mostly defers to the C standard but it does mark this text as a C extension.
[CX] ⌦ For a stream open for reading, if the file is not already at EOF, and the file is one capable of seeking, the file offset of the underlying open file description shall be set to the file position of the stream, and any characters pushed back onto the stream by ungetc() or ungetwc() that have not subsequently been read from the stream shall be discarded (without further changing the file offset). ⌫
Note that terminals are not capable of seeking; neither are pipes or sockets.
Microsoft defines the behaviour of fflush(stdin)
In 2015, Microsoft and the Visual Studio runtime used to define the behaviour of fflush() on an input stream like this (but the link leads to different text in 2021):
If the stream is open for input, fflush clears the contents of the buffer.
M.M notes:
Cygwin is an example of a fairly common platform on which fflush(stdin) does not clear the input.
This is why this answer version of my comment notes 'Microsoft and the Visual Studio runtime' — if you use a non-Microsoft C runtime library, the behaviour you see depends on that library.
Weather Vane pointed out to me in a comment to another question that, at some time before June 2021, Microsoft changed its description of fflush() compared with what was originally specified when this answer was written in 2015. It now says:
If the stream was opened in read mode, or if the stream has no buffer, the call to fflush has no effect, and any buffer is retained. A call to fflush negates the effect of any prior call to ungetc for the stream.
Caveat Lector: it is probably best not to rely on fflush(stdin) on any platform.
Linux documentation and practice seem to contradict each other
Surprisingly, Linux nominally documents the behaviour of fflush(stdin) too, and even defines it the same way (miracle of miracles). This quote is from 2015.
For input streams, fflush() discards any buffered data that has been fetched from the underlying file, but has not been consumed by the application.
In 2021, the quote changes to:
For input streams, fflush() discards any buffered data that has been fetched from the underlying file, but has not been consumed by the application. The open status of the stream is unaffected.
And another source for fflush(3) on Linux agrees (give or take paragraph breaks):
For input streams associated with seekable files (e.g., disk files, but not pipes or terminals), fflush() discards any buffered data that has been fetched from the underlying file, but has not been consumed by the application.
Neither of these explicitly addresses the points made by the POSIX specification about ungetc().
In 2021, zwol commented that the Linux documentation has been improved.
It seems to me that there is still room for improvement.
In 2015, I was a bit puzzled and surprised at the Linux documentation saying that fflush(stdin) will work.
Despite that suggestion, it most usually does not work on Linux. I just checked the documentation on Ubuntu 14.04 LTS; it says what is quoted above, but empirically, it does not work — at least when the input stream is a non-seekable device such as a terminal.
demo-fflush.c
#include <stdio.h>

int main(void)
{
    int c;
    if ((c = getchar()) != EOF)
    {
        printf("Got %c; enter some new data\n", c);
        fflush(stdin);
    }
    if ((c = getchar()) != EOF)
        printf("Got %c\n", c);
    return 0;
}
Example output
$ ./demo-fflush
Alliteration
Got A; enter some new data
Got l
$
This output was obtained on both Ubuntu 14.04 LTS and Mac OS X 10.11.2. To my understanding, it contradicts what the Linux manual says. If the fflush(stdin) operation worked, I would have to type a new line of text to get information for the second getchar() to read.
Given what the POSIX standard says, maybe a better demonstration is needed, and the Linux documentation should be clarified.
demo-fflush2.c
#include <stdio.h>

int main(void)
{
    int c;
    if ((c = getchar()) != EOF)
    {
        printf("Got %c\n", c);
        ungetc('B', stdin);
        ungetc('Z', stdin);
        if ((c = getchar()) == EOF)
        {
            fprintf(stderr, "Huh?!\n");
            return 1;
        }
        printf("Got %c after ungetc()\n", c);
        fflush(stdin);
    }
    if ((c = getchar()) != EOF)
        printf("Got %c\n", c);
    return 0;
}
Example output
Note that /etc/passwd is a seekable file. On Ubuntu, the first line looks like:
root:x:0:0:root:/root:/bin/bash
On Mac OS X, the first 4 lines look like:
##
# User Database
#
# Note that this file is consulted directly only when the system is running
In other words, there is commentary at the top of the Mac OS X /etc/passwd file. The non-comment lines conform to the normal layout, so the root entry is:
root:*:0:0:System Administrator:/var/root:/bin/sh
Ubuntu 14.04 LTS:
$ ./demo-fflush2 < /etc/passwd
Got r
Got Z after ungetc()
Got o
$ ./demo-fflush2
Allotrope
Got A
Got Z after ungetc()
Got B
$
Mac OS X 10.11.2:
$ ./demo-fflush2 < /etc/passwd
Got #
Got Z after ungetc()
Got B
$
The Mac OS X behaviour ignores (or at least seems to ignore) the fflush(stdin) (thus not following POSIX on this issue). The Linux behaviour corresponds to the documented POSIX behaviour, but the POSIX specification is far more careful in what it says — it specifies a file capable of seeking, but terminals, of course, do not support seeking. It is also much less useful than the Microsoft specification.
Summary
Microsoft documents the behaviour of fflush(stdin), but that behaviour has changed between 2015 and 2021. Apparently, it works as documented on the Windows platform, using the native Windows compiler and C runtime support libraries.
Despite documentation to the contrary, it does not work on Linux when the standard input is a terminal, but it seems to follow the POSIX specification which is far more carefully worded. According to the C standard, the behaviour of fflush(stdin) is undefined. POSIX adds the qualifier 'unless the input file is seekable', which a terminal is not. The behaviour is not the same as Microsoft's.
Consequently, portable code does not use fflush(stdin). Code that is tied to Microsoft's platform may use it and it may work as expected, but beware of the portability issues.
POSIX way to discard unread terminal input from a file descriptor
The POSIX standard way to discard unread information from a terminal file descriptor (as opposed to a file stream like stdin) is illustrated at How can I flush unread data from a tty input queue on a Unix system. However, that is operating below the standard I/O library level.
According to the standard, fflush can only be used with output buffers, and obviously stdin isn't one. However, some standard C libraries provide the use of fflush(stdin) as an extension. In that case you can use it, but it will affect portability, so you will no longer be able to use any standards-compliant standard C library on earth and expect the same results.
I believe that you should never call fflush(stdin), and for the simple reason that you should never even find it necessary to try to flush input in the first place. Realistically, there is only one reason you might think you had to flush input, and that is: to get past some bad input that scanf is stuck on.
For example, you might have a program that is sitting in a loop reading integers using scanf("%d", &n). Soon enough you'll discover that the first time the user types a non-digit character like 'x', the program goes into an infinite loop.
When faced with this situation, I believe you basically have three choices:
Flush the input somehow (if not by using fflush(stdin), then by calling getchar in a loop to read characters until \n, as is often recommended).
Tell the user not to type non-digit characters when digits are expected.
Use something other than scanf to read input.
Now, if you're a beginner, scanf seems like the easiest way to read input, and so choice #3 looks scary and difficult. But #2 seems like a real cop-out, because everyone knows that user-unfriendly computer programs are a problem, so it'd be nice to do better. So all too many beginning programmers get painted into a corner, feeling that they have no choice but to do #1. They more or less have to do input using scanf, meaning that it will get stuck on bad input, meaning that they have to figure out a way to flush the bad input, meaning that they're sorely tempted to use fflush(stdin).
I would like to encourage all beginning C programmers out there to make a different set of tradeoffs:
During the earliest stages of your C programming career, before you're comfortable using anything other than scanf, just don't worry about bad input. Really. Go ahead and use cop-out #2 above. Think about it like this: You're a beginner, there are lots of things you don't know how to do yet, and one of the things you don't know how to do yet is: deal gracefully with unexpected input.
As soon as you can, learn how to do input using functions other than scanf. At that point, you can start dealing gracefully with bad input, and you'll have many more, much better techniques available to you, that won't require trying to "flush the bad input" at all.
Or, in other words, beginners who are still stuck using scanf should feel free to use cop-out #2, and when they're ready they should graduate from there to technique #3, and nobody should be using technique #1 to try to flush input at all -- and certainly not with fflush(stdin).
Using fflush(stdin) to flush input is kind of like dowsing for water using a stick shaped like the letter "S".
And helping people to flush input in some "better" way is kind of like rushing up to an S-stick dowser and saying "No, no, you're doing it wrong, you need to use a Y-shaped stick!".
In other words, the real problem isn't that fflush(stdin) doesn't work. Calling fflush(stdin) is a symptom of an underlying problem. Why are you having to "flush" input at all? That's your problem.
And, usually, that underlying problem is that you're using scanf, in one of its many unhelpful modes that unexpectedly leaves newlines or other "unwanted" text on the input. The best long-term solution, therefore, is to learn how to do input using better techniques than scanf, so that you don't have to deal with its unhandled input and other idiosyncrasies at all.
None of the existing answers point out a key aspect of the issue.
If you find yourself wanting to "clear the input buffer", you're probably writing a command-line interactive program, and it would be more accurate to say that what you want is to discard characters from the current line of input that you haven't already read.
This is not what fflush(stdin) does. The C libraries that support using fflush on an input stream document it as either doing nothing, or as discarding buffered data that has been read from the underlying file but not passed to the application. That can easily be either more or less input than the rest of the current line. It probably does work by accident in a lot of cases, because the terminal driver (in its default mode) supplies input to a command-line interactive program one line at a time. However, the moment you try to feed input to your program from an actual file on disk (perhaps for automated testing), the kernel and C library will switch over to buffering data in large "blocks" (often 4 to 8 kB) with no relationship to line boundaries, and you'll be wondering why your program is processing the first line of the file and then skipping several dozen lines and picking up in the middle of some apparently random line below. Or, if you decide to test your program on a very long line typed by hand, then the terminal driver won't be able to give the program the whole line at once and fflush(stdin) won't skip all of it.
So what should you do instead? The approach that I prefer is, if you're processing input one line at a time, then read an entire line all at once. The C library has functions specifically for this: fgets (in C90, so fully portable, but does still make you process very long lines in chunks) and getline (POSIX-specific, but will manage a malloced buffer for you so you can process long lines all at once no matter how long they get). There's usually a direct translation from code that processes "the current line" directly from stdin to code that processes a string containing "the current line".
Quote from POSIX:
For a stream open for reading, if the file is not already at EOF, and the file is one capable of seeking, the file offset of the underlying open file description shall be set to the file position of the stream, and any characters pushed back onto the stream by ungetc() or ungetwc() that have not subsequently been read from the stream shall be discarded (without further changing the file offset).
Note that a terminal is not capable of seeking.

K&R C Exercise 4-9: Why ignore EOF?

Just a little confusion I'm hoping someone can clear up - this question asks:
"Our getch and ungetch do not handle a pushed-back EOF correctly. Decide what their properties ought to be if an EOF is pushed back, then implement your design".
With the code as it is, an EOF is pushed back, refetched with getch(), which causes a loop such as:
while ((c = getch()) != EOF)
    putchar(c);
to terminate when it is encountered from the buffer. I fail to see how this behaviour is incorrect. Surely as an EOF will in theory (mostly) only ever be encountered once, if it is pushed back and then read from a buffer in this way, it doesn't really matter? I hope someone could clear up the purpose of this question for me - I get that most solutions involve programming ungetch() to ignore EOF, I just don't see the point.
I'm sure there is one, as Dennis Ritchie and Brian Kernighan are a lot brighter than little old me - just hoping someone could point it out. Thanks :-)
Regards,
Phil
In the book, buf is defined as char buf[BUFSIZE];. However, page 19 of the book says:
We must declare c to be a type big enough to hold any value that
getchar returns. We can't use char since c must be big enough to hold
EOF in addition to any possible char. Therefore we use int.
The same reasoning applies to the buffer: a char element cannot reliably hold a pushed-back EOF, so the answer is to declare it as int:
int buf[BUFSIZE];

C program stops accepting input at 1023 characters

I'm teaching myself C using K&R. Exercise 1-16 asks me to refactor some provided code to give the length of "arbitrarily long input lines".
Whilst working on the problem I found that my terminal ceases to accept input after 1023 characters; a very suspicious number I'm sure you'll agree!! I have tested on Mac OS X and OpenBSD and see the same behaviour. The program hasn't stopped responding because typing backspace and submitting the input works correctly.
I couldn't figure out how to debug this with gdb because the problem occurs during data entry, not after submission when stepping through with gdb.
I could see no reference to a limit in the getchar or bash manpages, and indeed it seems very little input anyway.
I reduced the problem to the following and see the same behaviour.
#include <stdio.h>

int main(void)
{
    int c, i = 0;

    while ((c = getchar()) != EOF && c != '\n')
        ++i;
    printf("%d\n", i);
    return 0;
}
Could people please explain:
Why this is happening
How I might debug this kind of issue myself
Many thanks.
As per the comments on my question, it would appear to be a terminal limitation. Piping a file into the program works as expected.

fseek(stdin,0,SEEK_SET) and rewind(stdin) REALLY do flush the input buffer "stdin".Is it OK to use them? [duplicate]

From the start I have wondered why fseek(stdin,0,SEEK_SET) and rewind(stdin) can't be used to flush the input buffer, since cplusplusreference clearly says that calling these two functions flushes the buffer (whether input or output). But since the whole idea seemed new, I asked about it in a clumsy question yesterday:
Can fseek(stdin,1,SEEK_SET) or rewind(stdin) be used to flush the input buffer instead of non-portable fflush(stdin)?
And I was skeptical about the answers I got, which seemed to suggest I couldn't do it. Frankly, I saw no reason why not. Today I tried it myself and it works! I mean, to deal with the problem of the newline lurking in stdin while using multiple scanf() statements, it seems I can use fseek(stdin,0,SEEK_SET) or rewind(stdin) in place of the non-portable and UB fflush(stdin).
Please tell me if this is a correct approach without any risk. Until now, I have been using the following code to deal with the newline in stdin: while((c = getchar()) != '\n' && c != EOF);. Here's my code:
#include <stdio.h>

int main()
{
    int a, b;
    char c;
    printf("Enter 2 integers\n");
    scanf("%d%d", &a, &b);
    printf("Enter a character\n");
    //rewind(stdin); //Works if activated
    fseek(stdin, 0, SEEK_SET); //Works fine
    scanf("%c", &c); //This scanf() is skipped without fseek() or rewind()
    printf("%d,%d,%c", a, b, c);
}
In my program, if I don't use either fseek(stdin,0,SEEK_SET) or rewind(stdin), the second scanf() is skipped and the newline is always taken as the character. The problem is solved if I use fseek(stdin,0,SEEK_SET) or rewind(stdin).
I'm not sure where you read on cplusplusreference (whatever that is) that flushing to end of line is the mandated behaviour.
The closest matches I could find, http://www.cplusplus.com/reference/cstdio/fseek/ and http://www.cplusplus.com/reference/cstdio/rewind, don't mention flushing at all, other than in reference to fflush().
In any case, there's nothing in the C standard which mandates this behaviour either. C11 7.21.9.2 fseek and 7.21.9.5 rewind (which is equivalent to fseek with a zero offset and SEEK_SET, except that it also clears the stream's error indicator) make no mention of flushing input.
All they state is that the file position indicator is moved to the relevant position in the stream.
So, to the extent this works in your environment, all we can say is that this works in your environment. It may not work elsewhere, and it may even stop working in your environment at an indeterminate point in the future.
If you really want robust input, you should be using a two-stage approach, fgets to retrieve a line followed by sscanf to get what you want from that line. Mixing the two paradigms of input (scanf and getchar) is frequently problematic.
A good (robust, error-checking, and clearing to end of line if needed) input function can be found here.
I tested it just now and found that fseek doesn't work on stdin. fseek() usually works on files on disk, so it seems that the kernel does not allow seeking on stdin, perhaps for security reasons. Anyway, I was happy to see someone who thought like me. Thanks for the good question.
