What exactly does rewind() do?

I came across the rewind() function in C. I went through its description and example from here.
The description mentioned the following about the function:
The C library function void rewind(FILE *stream) sets the file position to the beginning of the file of the given stream.
I still don't have a clear picture. Can we think of it as a cursor moving through the file being read, with rewind() simply setting that cursor back to the beginning of the file?

From the man page:
The rewind() function sets the file position indicator for the stream pointed to by stream to the beginning of the file. It is equivalent to:
(void)fseek(stream, 0L, SEEK_SET)
except that the error indicator for the stream is also cleared (see clearerr(3)).
So the next time you read from a file after calling rewind, you start reading from the beginning. So your cursor analogy is a valid one.
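The cursor analogy can be tried out with a small sketch. The helper name first_char_after_rewind is hypothetical, and the example assumes tmpfile() is able to create a temporary file on the system:

```c
#include <assert.h>
#include <stdio.h>

/* Writes `text` to a temporary file, rewinds, and reads the first
   character back. Returns that character, or EOF on failure or if
   `text` is empty. The function name is made up for illustration. */
int first_char_after_rewind(const char *text)
{
    assert(text != NULL);

    FILE *fp = tmpfile();              /* temporary file opened for update */
    if (fp == NULL)
        return EOF;

    if (fputs(text, fp) == EOF) {
        fclose(fp);
        return EOF;
    }

    /* The "cursor" now sits just past the last byte written. */
    rewind(fp);                        /* back to byte 0, error indicator cleared */

    int c = fgetc(fp);                 /* reads from the beginning again */
    fclose(fp);
    return c;
}
```

Calling first_char_after_rewind("hello") should return 'h', because the read after rewind() starts at the beginning of the file rather than where the write left off.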

Related

How does fread know where it last stopped?

fread apparently knows where it last stopped. By that I mean this:
while(fread(buffer, 1, 1, file))
{
…
}
On each iteration, this loop continues from where the previous read stopped. I assume it just moves the file pointer forward, but could someone explain whether that is exactly what happens?
The function fread reads from a stream, which is not necessarily a file. Streams can also be linked to consoles/terminals. Some streams are seekable and have a file position indicator, some do not. Streams which are linked to actual files usually do have a file position indicator.
The function fread itself does not explicitly reposition the stream (it does not call fseek). It just reads from the stream.
If a stream has a file position indicator, then the runtime library will advance the file position indicator, whenever a read takes place on the stream. It does this for all reads on the stream, not just for fread.
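A minimal sketch of this behaviour, using ftell() to watch the position indicator move. The helper name count_bytes_one_at_a_time is hypothetical, and the example assumes a seekable temporary file from tmpfile():

```c
#include <assert.h>
#include <stdio.h>

/* Writes `n` bytes to a temporary file, then reads them back one at a
   time, checking that ftell() grows by one after every successful
   fread(). Returns the number of bytes read, or -1 on failure. */
long count_bytes_one_at_a_time(size_t n)
{
    FILE *fp = tmpfile();
    if (fp == NULL)
        return -1;

    for (size_t i = 0; i < n; i++) {
        if (fputc('x', fp) == EOF) {
            fclose(fp);
            return -1;
        }
    }
    rewind(fp);                        /* start reading at byte 0 */

    unsigned char buffer;
    long count = 0;
    while (fread(&buffer, 1, 1, fp) == 1) {
        count++;
        /* No fseek() call anywhere: the read itself moved the position. */
        assert(ftell(fp) == count);
    }

    fclose(fp);
    return count;
}
```

The loop never repositions the stream, yet each pass sees the next byte, because every read leaves the position indicator one byte further along.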

Can I read from the beginning a file open for append mode without an initial fseek()?

When an existing non empty file is successfully opened by fopen() in "a+" or "ab+" mode, I should be able to read from it or write to the end without an initial call to fseek() or rewind(). Does the C Standard specify that an initial read from this file will read from the beginning of the file or should I always set the file position before reading?
The C Standard seems ambiguous, as it states in 7.21.5.3 The fopen function:
6. Opening a file with append mode (a as the first character in the mode argument) causes all subsequent writes to the file to be forced to the then current end-of-file, regardless of intervening calls to the fseek function. In some implementations, opening a binary file with append mode (b as the second or third character in the above list of mode argument values) may initially position the file position indicator for the stream beyond the last data written, because of null character padding.
On those systems where the file position indicator would point at or beyond the last data written, would an initial reading operation fail?
The behavior is implementation defined:
7.21.3 Files
1 A stream is associated with an external file (which may be a physical device) by opening a file, which may involve creating a new file. Creating an existing file causes its former contents to be discarded, if necessary. If a file can support positioning requests (such as a disk file, as opposed to a terminal), then a file position indicator associated with the stream is positioned at the start (character number zero) of the file, unless the file is opened with append mode in which case it is implementation-defined whether the file position indicator is initially positioned at the beginning or the end of the file. The file position indicator is maintained by subsequent reads, writes, and positioning requests, to facilitate an orderly progression through the file.
So a call to rewind() or fseek(fp, 0L, SEEK_SET) is required before an initial read from the beginning of a file opened for reading and appending, that is, with a mode string starting with "a+" or "ab+".
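As a hedged sketch of that advice: the function name read_first_then_append and its parameters are hypothetical, and the file named by path is assumed to exist and be non-empty.

```c
#include <assert.h>
#include <stdio.h>

/* Opens `path` in "a+" mode, reads its first character after an
   explicit rewind(), then appends `text`. Returns the first character,
   or EOF on failure (or if the file is empty). */
int read_first_then_append(const char *path, const char *text)
{
    assert(path != NULL && text != NULL);

    FILE *fp = fopen(path, "a+");
    if (fp == NULL)
        return EOF;

    /* Do not rely on the implementation-defined initial position. */
    rewind(fp);
    int first = fgetc(fp);

    /* Switching from reading to writing on an update stream needs a
       positioning call; in append mode the write lands at the end
       of the file regardless of where we seek. */
    if (fseek(fp, 0L, SEEK_END) != 0 || fputs(text, fp) == EOF) {
        fclose(fp);
        return EOF;
    }

    if (fclose(fp) == EOF)
        return EOF;
    return first;
}
```

With the explicit rewind() the result does not depend on where a particular implementation places the initial position for append mode.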

Why the restrictions on C standard I/O streams that interact with sockets?

Section 10.9 of the book CSAPP says that there are two restrictions on standard I/O streams that interact badly with restrictions on sockets.
Restriction 1: Input functions following output functions. An input function cannot follow an output function without an intervening call to fflush, fseek, fsetpos, or rewind. The fflush function empties the buffer associated with a stream. The latter three functions use the Unix I/O lseek function to reset the current file position.
Restriction 2: Output functions following input functions. An output function cannot follow an input function without an intervening call to fseek, fsetpos, or rewind, unless the input function encounters an end-of-file.
But I cannot figure out why these restrictions are imposed. So my question is: what factors lead to the two restrictions?
It also says that "It is illegal to use the lseek function on a socket." If that is true, how can fseek, fsetpos and rewind use lseek to reset the current file position?
There is a similar question here, but my question is different from that one.
The stdio functions are for buffered file input and output. A socket is not a file. It doesn't even have a file position, and its buffer requirements are quite distinct from those of ordinary files: sockets can have independent input and output buffers, stdio file I/O cannot!
The problem is that file input and file output share the same file position, and the operating system might have (and on Unix indeed will have) a file position that differs from the one implied by the buffering in the C library.
Hence, from the C99 rationale:
A change of input/output direction on an update file is only allowed following a successful fsetpos, fseek, rewind, or fflush operation, since these are precisely the functions which assure that the I/O buffer has been flushed.
Note that all of this applies only to files opened with +. With files opened in any other standard mode, it is not possible to mix input and output.
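For an ordinary file, the rule can be illustrated with a short sketch. The function name overwrite_middle is hypothetical, and the temporary file from tmpfile() stands in for any stream opened with +:

```c
#include <assert.h>
#include <stdio.h>

/* Writes "hello" to an update stream, reads two bytes, overwrites the
   next two with "XY", and copies the final contents into `out`.
   Every change of direction goes through rewind() or fseek().
   Returns 0 on success, -1 on failure. */
int overwrite_middle(char *out, size_t out_size)
{
    char skip[2];
    size_t n;

    assert(out != NULL && out_size >= 6);

    FILE *fp = tmpfile();              /* temporary file opened for update */
    if (fp == NULL)
        return -1;

    if (fputs("hello", fp) == EOF)
        goto fail;
    rewind(fp);                        /* output -> input */

    if (fread(skip, 1, 2, fp) != 2)
        goto fail;
    if (fseek(fp, 0L, SEEK_CUR) != 0)  /* input -> output, position unchanged */
        goto fail;

    if (fputs("XY", fp) == EOF)
        goto fail;
    rewind(fp);                        /* output -> input */

    n = fread(out, 1, out_size - 1, fp);
    out[n] = '\0';
    fclose(fp);
    return 0;

fail:
    fclose(fp);
    return -1;
}
```

The fseek(fp, 0L, SEEK_CUR) call moves nothing; it is there only to satisfy the requirement for a positioning call between the read and the write.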
The C standard requires that, when switching from input to output on a FILE *, one of the functions fsetpos, rewind or fseek (which essentially invoke lseek) must succeed before an output function is attempted. (Mind you, calling fflush causes the buffered output to be written; it certainly does not discard the buffered input.) Yet a socket is not seekable, so lseek would always fail on it. This means that you cannot take a single FILE * opened for both reading and writing, wrap it around a socket, and actually use it for both reading from and writing to the socket.
It is possible to use fdopen to open a FILE * with stream sockets if you really need to: just open two files - one "rb" for input and another with "wb" for output.
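A sketch of the two-stream approach on a POSIX system. The helper names stream_on_dup and echo_through_socketpair are hypothetical, a socketpair() stands in for a real network connection, and each stream gets its own dup() of the descriptor so that each fclose() closes only its own copy:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

/* Opens a stream on a duplicate of `fd`, so the caller keeps `fd`. */
static FILE *stream_on_dup(int fd, const char *mode)
{
    int copy = dup(fd);
    if (copy < 0)
        return NULL;
    FILE *fp = fdopen(copy, mode);
    if (fp == NULL)
        close(copy);
    return fp;
}

/* Sends `line` (short, ending in '\n') through one end of a socket
   pair using a write-only stream, lets the peer end echo it back with
   raw read()/write(), and reads the echo through a separate read-only
   stream. Returns 0 on success, -1 on failure. */
int echo_through_socketpair(const char *line, char *reply, int reply_size)
{
    int sv[2];
    int ok = -1;

    assert(line != NULL && reply != NULL && reply_size > 0);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    FILE *in = stream_on_dup(sv[0], "rb");    /* input only */
    FILE *out = stream_on_dup(sv[0], "wb");   /* output only */

    if (in && out && fputs(line, out) != EOF && fflush(out) == 0) {
        char tmp[256];
        ssize_t n = read(sv[1], tmp, sizeof tmp);       /* peer receives */
        if (n > 0 && write(sv[1], tmp, (size_t)n) == n  /* peer echoes */
            && fgets(reply, reply_size, in) != NULL)
            ok = 0;
    }

    if (in) fclose(in);
    if (out) fclose(out);
    close(sv[0]);
    close(sv[1]);
    return ok;
}
```

Because each stream only ever moves data in one direction, neither of them needs a positioning call, which is what makes this arrangement workable on a descriptor that cannot seek.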
When it says "An input function cannot follow an output function without an intervening call to fflush, fseek, fsetpos, or rewind", what it means is that if you don't, it might not work as you expect. But they're mostly talking about i/o to/from ordinary files.
If you have a FILE * stream connected to a socket, and you want to switch back and forth between writing and reading, I would expect it to work just fine if you called fflush when switching from writing to reading. I would not expect it to be necessary to call anything when switching from reading to writing.
(When working with files, the call to fseek or one of its relatives is necessary in order to update the file position properly, but sockets don't have a file position to update.)
I think the reason is that, in the early days, most implementations shared a single buffer between reading and writing.
The rationale is simple: most streams are used in one direction only, and maintaining two separate buffers for reading and writing would waste space.
If you have only one buffer, then when you change the I/O direction you need to deal with the buffer. That's why you need fflush, fseek, fsetpos, or rewind: to either write the buffer to disk or empty it in preparation for the next I/O operation.
I checked one glibc implementation, which uses a single buffer for both reading and writing.
static void init_stream (register FILE *fp) {
...
fp->__buffer = (char *) malloc (fp->__bufsize);
if (fp->__bufp == NULL)
{
/* Set the buffer pointer to the beginning of the buffer. */
fp->__bufp = fp->__buffer;
fp->__put_limit = fp->__get_limit = fp->__buffer;
}
}
Take fseek, for example:
/* Move the file position of STREAM to OFFSET
bytes from the beginning of the file if WHENCE
is SEEK_SET, the end of the file is it is SEEK_END,
or the current position if it is SEEK_CUR. */
int
fseek (stream, offset, whence)
register FILE *stream;
long int offset;
int whence;
{
...
if (stream->__mode.__write && __flshfp (stream, EOF) == EOF)
return EOF;
...
/* O is now an absolute position, the new target. */
stream->__target = o;
/* Set bufp and both end pointers to the beginning of the buffer.
The next i/o will force a call to the input/output room function. */
stream->__bufp
= stream->__get_limit = stream->__put_limit = stream->__buffer;
...
}
This implementation flushes the buffer to the disk file if the stream is in write mode.
It also resets the buffer pointers for both reading and writing, which for reading is equivalent to resetting or discarding the buffer.
This matches the C99 rationale (credit to the previous answer):
A change of input/output direction on an update file is only allowed following a successful fsetpos, fseek, rewind, or fflush operation, since these are precisely the functions which assure that the I/O buffer has been flushed.
For more details, check here.

How do C functions like fscanf() and fgets() remember where in the file to start reading from?

How do C functions like fscanf() and fgets() remember where in the file to start reading from? For instance, when reading a file using fscanf(), it seems to remember where it last left off, rather than starting from the beginning of the file again. How exactly does this work?
The FILE * parameter points to a buffer and a file handle (see the fileno() function).
The actual position is remembered in the kernel, in the file structure.
There is a legend that the FILE * pointer points into the file. This is not literally true, but it might as well be true for the interpretation of the beginning programmer.
In fact what happens is as follows: Every process has an array in the kernel of type struct file (this type is not defined in userspace so don't go looking for it) that contains all of its open files. The open() syscall returns a handle that is merely an index into this array. The function fileno() retrieves the handle from the FILE * pointer returned by fopen(), and the handle can then be manipulated directly. This is usually a bad idea except for accessing ioctl() or fcntl(), as you will end up fighting with the internal buffer in the FILE object.
One of the members of struct file is loff_t f_pos, which is the exact location in bytes at which the kernel read() or write() stopped. The FILE object buffers on top of this and knows how many bytes it has read ahead, or is holding for a later write, on your behalf.
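The gap between the two positions can be observed on a POSIX system with a sketch like the following. The function name compare_positions is hypothetical; the exact kernel offset depends on the C library's buffer size, so the example only assumes it is at least as large as the stdio position:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Fills a temporary file with `size` bytes, reads a single byte through
   stdio, and reports the position as seen by stdio (ftell) and by the
   kernel (lseek on the underlying descriptor).
   Returns 0 on success, -1 on failure. */
int compare_positions(size_t size, long *stdio_pos, long *kernel_pos)
{
    assert(stdio_pos != NULL && kernel_pos != NULL);

    FILE *fp = tmpfile();
    if (fp == NULL)
        return -1;

    for (size_t i = 0; i < size; i++) {
        if (fputc('x', fp) == EOF) {
            fclose(fp);
            return -1;
        }
    }
    rewind(fp);                        /* flush pending writes, go to byte 0 */

    int c = fgetc(fp);                 /* stdio may read a whole buffer ahead */

    *stdio_pos = ftell(fp);            /* logical position: 1 after one byte */
    *kernel_pos = (long)lseek(fileno(fp), 0, SEEK_CUR);  /* query only */

    fclose(fp);
    return c == EOF ? -1 : 0;
}
```

With a file larger than one byte, the kernel offset is typically well ahead of the stdio position, since the library has already pulled a buffer's worth of data out of the file.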

A successful call to the fseek() function clears the end-of-file indicator for the stream

The manual for the standard C library function fseek says: "A successful call to the fseek() function clears the end-of-file indicator for the stream."
To me this sounds as if, when EOF is at 2 and I call fseek() to move the pointer to 4, that should work and EOF will then be at 5. But when I test this hypothesis, the pointer doesn't advance beyond the current EOF (2 in the above case), so my understanding of the line must be wrong. What does this line mean, then? Thanks!
You have to remember that the EOF flag is not set until you actually try to read from beyond the end of the file.
fseek clears the flag even if you seek beyond the end of the file. This works because the flag will be set again the next time you try to read.
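The life cycle of the flag can be traced with a small sketch. The function name eof_flag_history is hypothetical, and seeking past the end of the file is assumed to succeed, as it does on POSIX systems:

```c
#include <assert.h>
#include <stdio.h>

/* Records feof() at three moments on a two-byte temporary file:
   flags[0] after reading past the end,
   flags[1] after a successful fseek() beyond the end,
   flags[2] after the next read attempt.
   Returns 0 on success, -1 on failure. */
int eof_flag_history(int flags[3])
{
    assert(flags != NULL);

    FILE *fp = tmpfile();
    if (fp == NULL)
        return -1;
    if (fputs("ab", fp) == EOF) {
        fclose(fp);
        return -1;
    }
    rewind(fp);

    while (fgetc(fp) != EOF) {
        /* keep reading until a read attempt fails */
    }
    flags[0] = feof(fp) != 0;          /* set by the failed read */

    if (fseek(fp, 10L, SEEK_SET) != 0) {   /* offset 10 is past the end */
        fclose(fp);
        return -1;
    }
    flags[1] = feof(fp) != 0;          /* cleared by the successful fseek */

    (void)fgetc(fp);                   /* nothing to read at this position */
    flags[2] = feof(fp) != 0;          /* set again by the failed read */

    fclose(fp);
    return 0;
}
```

The seek does not move the end of the file; it only clears the indicator, which the next unsuccessful read sets again.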
That is why it's a bad idea to have loops such as while (!feof(...)), as those will loop once too many before detecting the actual end-of-file condition.
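The extra iteration is easy to count. In this sketch the function names are hypothetical and the temporary file from tmpfile() is assumed to read back without errors:

```c
#include <assert.h>
#include <stdio.h>

/* The problematic pattern: the flag is tested before the read that
   would set it, so the body runs one extra time. */
static long iterations_feof_loop(FILE *fp)
{
    long n = 0;
    while (!feof(fp)) {
        (void)fgetc(fp);               /* the last call returns EOF */
        n++;
    }
    return n;
}

/* The usual fix: test the result of the read itself. */
static long iterations_checked_loop(FILE *fp)
{
    long n = 0;
    while (fgetc(fp) != EOF)
        n++;
    return n;
}

/* Runs both loops over a temporary file containing `text`.
   Returns 0 on success, -1 on failure. */
int compare_loops(const char *text, long *feof_iters, long *checked_iters)
{
    assert(text != NULL && feof_iters != NULL && checked_iters != NULL);

    FILE *fp = tmpfile();
    if (fp == NULL)
        return -1;
    if (fputs(text, fp) == EOF) {
        fclose(fp);
        return -1;
    }

    rewind(fp);
    *feof_iters = iterations_feof_loop(fp);

    rewind(fp);                        /* also clears the end-of-file flag */
    *checked_iters = iterations_checked_loop(fp);

    fclose(fp);
    return 0;
}
```

For a three-byte file, the feof-controlled loop runs four times while the loop that checks the return value of fgetc runs three times.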

Resources