Why does fprintf not work after fgets? [duplicate]

Q: I'm trying to update a file in
place, by using fopen mode "r+",
reading a certain string, and writing
back a modified string, but it's not
working.
A: Be sure to call fseek before
you write, both to seek back to the
beginning of the string you're trying
to overwrite, and because an fseek
or fflush is always required between
reading and writing in the read/write
"+" modes.
My question is: why is an fseek or fflush always required between reading and writing in the read/write "+" modes? Section 5.2 of Andrew Koenig's C Traps and Pitfalls (1989) mentions that it is because of a backward compatibility issue. Can anyone explain in detail?

The library buffers input and output operations. Check out setvbuf() and the _IOFBF, _IOLBF parameters to that function.
A call to fseek() or fflush() requires the library to commit buffered operations.
The standard specifies a seek or flush operation (flushing the buffers) as mandatory prior to changing I/O direction to allow the library some shortcuts. Without this restriction, the library would have to check for every I/O operation if the previous operation was the same direction (reading / writing), and trigger a flush by itself if the I/O direction changed. With the restriction as-is, the library may assume the client did the seek / flush before changing I/O direction, and can omit the direction checks.

Because it keeps OS/library code simpler. A file stream may have separate read and write buffers, and extra effort would be required to make sure they are always synchronised. This would cost performance at times when it wasn't needed.
So instead, the programmer needs to do this explicitly when it is needed.

Read Plauger's "The Standard C Library" for some insights into why various features of the (C89) standard library are as they are - and in particular why parts of the standard I/O library are as they are. One reason is that C runs on very diverse systems and with diverse media; devices such as tapes may well need to be handled somewhat differently from the disk drive you're accustomed to thinking of. Also, on Unix, consider your 'tty' device - it connects a keyboard and a mouse to a screen - three quite different bits of hardware. Coordinating between those is tricky enough; the rules in the standard make it easier.
Note that the standard mandates this. This is from the C11 standard, ISO/IEC 9899:2011, but the wording was similar in prior editions:
§7.21.5.3 The fopen function
¶7 When a file is opened with update mode ('+' as the second or third character in the
above list of mode argument values), both input and output may be performed on the
associated stream. However, output shall not be directly followed by input without an
intervening call to the fflush function or to a file positioning function (fseek,
fsetpos, or rewind), and input shall not be directly followed by output without an
intervening call to a file positioning function, unless the input operation encounters end-of-file. Opening (or creating) a text file with update mode may instead open (or create) a
binary stream in some implementations.

Related

fseek/fsetpos may discard stream buffer?

In the C standard for fopen regarding files opened in update mode (C11 7.21.5.3/7), output followed by input requires an intervening call to fflush or a file positioning function (fseek, fsetpos, or rewind). However, none of the file positioning functions are required to do anything regarding the output buffer.
The POSIX standard maintains the same requirement for fopen and update mode. As with the C standard, fsetpos is not required to do anything with the output buffer. However, fseek is required to write the buffer to file.
In the case of both C and POSIX, a conforming implementation seems free to discard the write buffer when fsetpos is called, and C seems to allow fseek to do the same. My first question is whether I've missed something relevant in the standards. The implication here is that a portable application must call fflush (or fseek/rewind in the case of POSIX) to ensure buffered output is actually written before switching from output to input.
Obviously, discarding the write buffer goes against the intent of all of the write functions, and I'm not aware of any implementation that does this or anything comparably counter-intuitive. I'm also aware of my limited awareness, so my second question is whether there are any conforming implementations that don't ensure the buffered content eventually gets written in the proper place.
For context, the GNU documentation maintains the same requirement for fopen and update mode. As with C and POSIX, fsetpos says nothing about the output buffer, but my testing suggests my version does flush the buffer. However, fseek may either flush the buffer or remember enough about it to ensure its contents eventually get written properly.
TL;DR: Does C or POSIX disallow fsetpos from discarding the write buffer? Are there implementations that do this?
EDIT: Nobody has yet presented credible evidence that either standard prohibits fsetpos from discarding the write buffer. Similarly, nobody has mentioned any implementations that do this. However, this is not mentioned in the list of portability issues in the C standard (Annex J), suggesting it is an oversight and not an obscure portability concern. Furthermore, as mentioned by R.., there is no prohibition preventing completely unrelated functions from discarding buffers.
I don't see where you're getting that idea from. POSIX goes into a little bit more detail than the C standard about buffering behavior because it has to deal with interactions of stdio FILE streams with other means of accessing the same files. But there is nothing in the C standard that suggests the implementation is allowed to lose output when you call fsetpos. Logically the data has already been written.
Further, the specification (C11 7.21.9.3, ¶2) for fsetpos reads:
If a read or write error occurs, the error indicator for the stream is set and fsetpos fails.
The only plausible reason a write error could occur is some sort of write operation, and the only plausible write operation is flushing pending output.
I don't see anything requiring the flush in the fsetpos case, beyond this remark in the Errors section (twice):
or the stream's buffer needed to be flushed,
This looks like an omission in POSIX. Please file a clarification request in the Austin Group issue tracker.
The C standard does not seem to explicitly prohibit fsetpos (or any other function) from discarding the buffer, which seems to be an arguably pedantic deficiency. However, the C99 Rationale document (7.19.5.3) states that fsetpos, fseek, rewind, and fflush "assure that the I/O buffer has been flushed". It's not clear why such text was not included in the standard, although one could speculate about GNU and write-back caches and whether forcing disk I/O on seek operations is desirable.
In practice, this means that one should be able to assume writing, then seeking, then reading will return the expected data. Given that at least one implementation (GNU) may not always flush when seeking, though, one should not assume the data will have reached the kernel (let alone the underlying device) without an explicit flush request.

I can't make fflush() clear stdin [duplicate]

So a quick Google search for fflush(stdin) for clearing the input buffer reveals numerous websites warning against using it. And yet that's exactly how my CS professor taught the class to do it.
How bad is using fflush(stdin)? Should I really abstain from using it, even though my professor is using it and it seems to work flawlessly?
Simple: this is undefined behavior, since fflush is meant to be called on an output stream. This is an excerpt from the C standard:
int fflush(FILE *ostream);
If ostream points to an output stream or
an update stream in which the most
recent operation was not input, the
fflush function causes any unwritten
data for that stream to be delivered
to the host environment to be written
to the file; otherwise, the behavior
is undefined.
So it's not a question of "how bad" this is. fflush(stdin) is simply not portable, so you should not use it if you want your code to be portable between compilers.
Converting comments into an answer.
TL;DR — Portable code doesn't use fflush(stdin)
The rest of this answer explains why portable code does not use fflush(stdin). It is tempting to add "reliable code doesn't use fflush(stdin)", which is also generally true.
Standard C and POSIX leave fflush(stdin) as undefined behaviour
The POSIX, C and C++ standards for fflush() explicitly state that the behaviour is undefined (because stdin is an input stream), but none of them prevent a system from defining it.
ISO/IEC 9899:2011 — the C11 Standard — says:
§7.21.5.2 The fflush function
¶2 If stream points to an output stream or an update stream in which the most recent operation was not input, the fflush function causes any unwritten data for that stream to be delivered to the host environment to be written to the file; otherwise, the behavior is undefined.
POSIX mostly defers to the C standard but it does mark this text as a C extension.
[CX] ⌦ For a stream open for reading, if the file is not already at EOF, and the file is one capable of seeking, the file offset of the underlying open file description shall be set to the file position of the stream, and any characters pushed back onto the stream by ungetc() or ungetwc() that have not subsequently been read from the stream shall be discarded (without further changing the file offset). ⌫
Note that terminals are not capable of seeking; neither are pipes or sockets.
Microsoft defines the behaviour of fflush(stdin)
In 2015, Microsoft and the Visual Studio runtime used to define the behaviour of fflush() on an input stream like this (but the link leads to different text in 2021):
If the stream is open for input, fflush clears the contents of the buffer.
M.M notes:
Cygwin is an example of a fairly common platform on which fflush(stdin) does not clear the input.
This is why this answer version of my comment notes 'Microsoft and the Visual Studio runtime' — if you use a non-Microsoft C runtime library, the behaviour you see depends on that library.
Weather Vane pointed out to me in a comment to another question that, at some time before June 2021, Microsoft changed its description of fflush() compared with what was originally specified when this answer was written in 2015. It now says:
If the stream was opened in read mode, or if the stream has no buffer, the call to fflush has no effect, and any buffer is retained. A call to fflush negates the effect of any prior call to ungetc for the stream.
Caveat Lector: it is probably best not to rely on fflush(stdin) on any platform.
Linux documentation and practice seem to contradict each other
Surprisingly, Linux nominally documents the behaviour of fflush(stdin) too, and even defines it the same way (miracle of miracles). This quote is from 2015.
For input streams, fflush() discards any buffered data that has been fetched from the underlying file, but has not been consumed by the application.
In 2021, the quote changes to:
For input streams, fflush() discards any buffered data that has been fetched from the underlying file, but has not been consumed by the application. The open status of the stream is unaffected.
And another source for fflush(3) on Linux agrees (give or take paragraph breaks):
For input streams associated with seekable files (e.g., disk files, but not pipes or terminals), fflush() discards any buffered data that has been fetched from the underlying file, but has not been consumed by the application.
Neither of these explicitly addresses the points made by the POSIX specification about ungetc().
In 2021, zwol commented that the Linux documentation has been improved.
It seems to me that there is still room for improvement.
In 2015, I was a bit puzzled and surprised at the Linux documentation saying that fflush(stdin) will work.
Despite that suggestion, it usually does not work on Linux. I just checked the documentation on Ubuntu 14.04 LTS; it says what is quoted above, but empirically, it does not work — at least when the input stream is a non-seekable device such as a terminal.
demo-fflush.c
#include <stdio.h>

int main(void)
{
    int c;
    if ((c = getchar()) != EOF)
    {
        printf("Got %c; enter some new data\n", c);
        fflush(stdin);
    }
    if ((c = getchar()) != EOF)
        printf("Got %c\n", c);
    return 0;
}
Example output
$ ./demo-fflush
Alliteration
Got A; enter some new data
Got l
$
This output was obtained on both Ubuntu 14.04 LTS and Mac OS X 10.11.2. To my understanding, it contradicts what the Linux manual says. If the fflush(stdin) operation worked, I would have to type a new line of text to get information for the second getchar() to read.
Given what the POSIX standard says, maybe a better demonstration is needed, and the Linux documentation should be clarified.
demo-fflush2.c
#include <stdio.h>

int main(void)
{
    int c;
    if ((c = getchar()) != EOF)
    {
        printf("Got %c\n", c);
        ungetc('B', stdin);
        ungetc('Z', stdin);
        if ((c = getchar()) == EOF)
        {
            fprintf(stderr, "Huh?!\n");
            return 1;
        }
        printf("Got %c after ungetc()\n", c);
        fflush(stdin);
    }
    if ((c = getchar()) != EOF)
        printf("Got %c\n", c);
    return 0;
}
Example output
Note that /etc/passwd is a seekable file. On Ubuntu, the first line looks like:
root:x:0:0:root:/root:/bin/bash
On Mac OS X, the first 4 lines look like:
##
# User Database
#
# Note that this file is consulted directly only when the system is running
In other words, there is commentary at the top of the Mac OS X /etc/passwd file. The non-comment lines conform to the normal layout, so the root entry is:
root:*:0:0:System Administrator:/var/root:/bin/sh
Ubuntu 14.04 LTS:
$ ./demo-fflush2 < /etc/passwd
Got r
Got Z after ungetc()
Got o
$ ./demo-fflush2
Allotrope
Got A
Got Z after ungetc()
Got B
$
Mac OS X 10.11.2:
$ ./demo-fflush2 < /etc/passwd
Got #
Got Z after ungetc()
Got B
$
The Mac OS X behaviour ignores (or at least seems to ignore) the fflush(stdin) (thus not following POSIX on this issue). The Linux behaviour corresponds to the documented POSIX behaviour, but the POSIX specification is far more careful in what it says — it specifies a file capable of seeking, but terminals, of course, do not support seeking. It is also much less useful than the Microsoft specification.
Summary
Microsoft documents the behaviour of fflush(stdin), but that behaviour has changed between 2015 and 2021. Apparently, it works as documented on the Windows platform, using the native Windows compiler and C runtime support libraries.
Despite documentation to the contrary, it does not work on Linux when the standard input is a terminal, but it seems to follow the POSIX specification which is far more carefully worded. According to the C standard, the behaviour of fflush(stdin) is undefined. POSIX adds the qualifier 'unless the input file is seekable', which a terminal is not. The behaviour is not the same as Microsoft's.
Consequently, portable code does not use fflush(stdin). Code that is tied to Microsoft's platform may use it and it may work as expected, but beware of the portability issues.
POSIX way to discard unread terminal input from a file descriptor
The POSIX standard way to discard unread information from a terminal file descriptor (as opposed to a file stream like stdin) is illustrated at How can I flush unread data from a tty input queue on a Unix system. However, that is operating below the standard I/O library level.
According to the standard, fflush can only be used with output buffers, and obviously stdin isn't one. However, some standard C libraries provide the use of fflush(stdin) as an extension. In that case you can use it, but it will affect portability, so you will no longer be able to use any standards-compliant standard C library on earth and expect the same results.
I believe that you should never call fflush(stdin), and for the simple reason that you should never even find it necessary to try to flush input in the first place. Realistically, there is only one reason you might think you had to flush input, and that is: to get past some bad input that scanf is stuck on.
For example, you might have a program that is sitting in a loop reading integers using scanf("%d", &n). Soon enough you'll discover that the first time the user types a non-digit character like 'x', the program goes into an infinite loop.
When faced with this situation, I believe you basically have three choices:
1. Flush the input somehow (if not by using fflush(stdin), then by calling getchar in a loop to read characters until \n, as is often recommended).
2. Tell the user not to type non-digit characters when digits are expected.
3. Use something other than scanf to read input.
Now, if you're a beginner, scanf seems like the easiest way to read input, and so choice #3 looks scary and difficult. But #2 seems like a real cop-out, because everyone knows that user-unfriendly computer programs are a problem, so it'd be nice to do better. So all too many beginning programmers get painted into a corner, feeling that they have no choice but to do #1. They more or less have to do input using scanf, meaning that it will get stuck on bad input, meaning that they have to figure out a way to flush the bad input, meaning that they're sorely tempted to use fflush(stdin).
I would like to encourage all beginning C programmers out there to make a different set of tradeoffs:
During the earliest stages of your C programming career, before you're comfortable using anything other than scanf, just don't worry about bad input. Really. Go ahead and use cop-out #2 above. Think about it like this: You're a beginner, there are lots of things you don't know how to do yet, and one of the things you don't know how to do yet is: deal gracefully with unexpected input.
As soon as you can, learn how to do input using functions other than scanf. At that point, you can start dealing gracefully with bad input, and you'll have many more, much better techniques available to you, that won't require trying to "flush the bad input" at all.
Or, in other words, beginners who are still stuck using scanf should feel free to use cop-out #2, and when they're ready they should graduate from there to technique #3, and nobody should be using technique #1 to try to flush input at all -- and certainly not with fflush(stdin).
Using fflush(stdin) to flush input is kind of like dowsing for water using a stick shaped like the letter "S".
And helping people to flush input in some "better" way is kind of like rushing up to an S-stick dowser and saying "No, no, you're doing it wrong, you need to use a Y-shaped stick!".
In other words, the real problem isn't that fflush(stdin) doesn't work. Calling fflush(stdin) is a symptom of an underlying problem. Why are you having to "flush" input at all? That's your problem.
And, usually, that underlying problem is that you're using scanf, in one of its many unhelpful modes that unexpectedly leaves newlines or other "unwanted" text on the input. The best long-term solution, therefore, is to learn how to do input using better techniques than scanf, so that you don't have to deal with its unhandled input and other idiosyncrasies at all.
None of the existing answers point out a key aspect of the issue.
If you find yourself wanting to "clear the input buffer", you're probably writing a command-line interactive program, and it would be more accurate to say that what you want is to discard characters from the current line of input that you haven't already read.
This is not what fflush(stdin) does. The C libraries that support fflush on an input stream document it either as doing nothing or as discarding buffered data that has been read from the underlying file but not passed to the application. That can easily be either more or less input than the rest of the current line. It probably does work by accident in a lot of cases, because the terminal driver (in its default mode) supplies input to a command-line interactive program one line at a time. However, the moment you try to feed input to your program from an actual file on disk (perhaps for automated testing), the kernel and C library will switch over to buffering data in large "blocks" (often 4 to 8 kB) with no relationship to line boundaries, and you'll be wondering why your program is processing the first line of the file and then skipping several dozen lines and picking up in the middle of some apparently random line below. Or, if you decide to test your program on a very long line typed by hand, then the terminal driver won't be able to give the program the whole line at once and fflush(stdin) won't skip all of it.
So what should you do instead? The approach that I prefer is, if you're processing input one line at a time, then read an entire line all at once. The C library has functions specifically for this: fgets (in C90, so fully portable, but does still make you process very long lines in chunks) and getline (POSIX-specific, but will manage a malloced buffer for you so you can process long lines all at once no matter how long they get). There's usually a direct translation from code that processes "the current line" directly from stdin to code that processes a string containing "the current line".
Quote from POSIX:
For a stream open for reading, if the file is not already at EOF, and the file is one
capable of seeking, the file offset of the underlying open file description shall be set
to the file position of the stream, and any characters pushed back onto the stream by
ungetc() or ungetwc() that have not subsequently been read from the stream shall be
discarded (without further changing the file offset).
Note that a terminal is not capable of seeking.

Why is fflush() needed between input and output operations for file stream created by fopen()?

Here is a quote from IBM (also specified in the C99 standard):
When you open a file for update, you can perform both input and output
operations on the resulting stream. However, an output operation
cannot be directly followed by an input operation without an
intervening fflush subroutine call or a file positioning operation
(fseek, fseeko, fseeko64, fsetpos, fsetpos64 or rewind subroutine).
Also, an input operation cannot be directly followed by an output
operation without an intervening flush or file positioning operation,
unless the input operation encounters the end of the file.
Why is this necessary?
The operations are likely buffered to avoid writing every single byte individually to the disk. Before starting to read, we must ensure that all previous writes have actually been handed over to the file.
Also, a seek is likely needed anyway to locate the thing you want to read. So not a problem in practice.
It is feasible to write the standard stream functions in such a way as to allow input and output operations to be freely mixed. The C Standard does not impose such a constraint on the library authors for two main reasons:
restricting the direction change to certain operations allows for some optimisations, reducing the number of tests for basic input/output functions such as getc() and putc() which are commonly implemented as simple macros.
historical implementations took advantage of the above and already had restrictions as to how and when to allow a change of direction. The C Standard Committee just formalized these restrictions to allow existing code to be conformant.
Newer versions of the C library must lock the streams for all input/output operations, so an extra test for a direction change would have negligible cost but the C Standard is unlikely to remove the restriction.

What is the orientation of stdout in C?

The GNU C manual says that:
Being able to use the same stream for wide and normal operations comes
with a restriction: a stream can be used either for wide operations or
for normal operations.
[...]
It is important to never mix the use of wide and not wide
operations on a stream. There are no diagnostics issued. The
application behavior will simply be strange or the application will
simply crash. The fwide function can help avoiding this.
I tried this in Visual Studio 2012: a printf followed immediately by a wprintf, and the simple program works properly.
So my question is: what does the manual mean? When and why should we use the fwide function?
The manual says, more fully:
Being able to use the same stream for wide and normal operations comes
with a restriction: a stream can be used either for wide operations or
for normal operations. Once it is decided there is no way back. Only a
call to freopen or freopen64 can reset the orientation. The
orientation can be decided in three ways:
If any of the normal character functions is used (this includes the fread and fwrite functions) the stream is marked as not wide oriented.
If any of the wide character functions is used the stream is marked as wide oriented.
The fwide function can be used to set the orientation either way.
It is important to never mix the use of wide and not wide operations
on a stream. There are no diagnostics issued. The application behavior
will simply be strange or the application will simply crash. The fwide
function can help avoiding this.
Note that the Microsoft documentation says their fwide() is "not implemented" (it's a no-op) and it "does not comply with the standard."
My reading of all this is that programs using glibc must not use both narrow and wide character functions on a single stream without reopening it. Perhaps on Microsoft platforms there is no such restriction; perhaps even other libc implementations are more flexible.


Resources