C input functions mechanism - c

How do input functions like scanf, getc, etc. work? When a program is invoked and execution reaches scanf, it stops and waits for input. Does it read from the input buffer until it sees the code for the Enter key, or does the function's implementation rely on a specific system call or interrupt mechanism? In other words, how is the OS involved here?
I searched online and read a few books, but none explains this very well.

Basically, the way the stdio functions work is that they maintain a buffer (a char array of some size, generally with a 'head' and 'tail' pointer tracking what part of the buffer is valid/in use). They operate on data in the buffer (input functions read from the buffer, output functions write to the buffer) and call lower-level OS functions to fill or flush the buffer when it is empty or full or when switching a FILE between input and output.
So in the case of getc, if the buffer is empty, it will call an OS function to get some data into the buffer, then will return the first character in the buffer (advancing the pointer, so subsequent getc calls will return subsequent characters from the buffer). In the case of scanf, it matches data in the buffer against the format string, consuming it (advancing the pointer) if it matches. If there's not enough data in the buffer, it will call the OS function to get more.
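To make the refill-then-serve idea concrete, here is a minimal sketch of a getc-like function built on POSIX read(). The names mini_in and mini_getc are made up for illustration, and a real FILE keeps more state than this (error flags, pushback, locking), so treat it as a picture of the mechanism rather than how any particular libc does it.

```c
#include <stddef.h>   /* size_t */
#include <unistd.h>   /* read, ssize_t */

/* Hypothetical miniature of a stdio input stream; real FILE structs differ. */
struct mini_in {
    int fd;           /* descriptor the buffer is refilled from */
    char buf[4096];   /* the buffer itself */
    size_t pos;       /* index of the next unread byte */
    size_t len;       /* number of valid bytes in buf */
};

/* Return the next byte as an unsigned char value, or -1 on EOF or error. */
int mini_getc(struct mini_in *in)
{
    if (in->pos == in->len) {                  /* buffer empty: ask the OS */
        ssize_t n = read(in->fd, in->buf, sizeof in->buf);
        if (n <= 0)
            return -1;
        in->len = (size_t)n;
        in->pos = 0;
    }
    return (unsigned char)in->buf[in->pos++];  /* serve from the buffer */
}
```

Only the first call (and each call after the buffer drains) reaches the kernel; every other call is just an array access and an increment.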
Similarly, when printf is called, data will be written to the buffer and the OS function to actually write the data will not be called until the buffer is flushed. You can call setvbuf to control the output buffering somewhat: you can set it to "unbuffered" (which doesn't actually eliminate the buffer, just causes the buffer to be flushed after every call) or "line buffered" (which flushes the buffer whenever a \n (newline) character is written).

Related

Under which condition does the buffer get full?

I am reading about streams and buffers.
I understand that if a stream is line buffered, the accumulated characters in the buffer are transferred as a block whenever a newline character is encountered.
If a stream is unbuffered, characters are intended to appear from the source or at the destination as soon as possible, without being stored in a buffer.
But if a stream is fully buffered, the accumulated characters are transferred as a block whenever the buffer is completely filled.
What I am unable to understand is under which condition the buffer becomes completely filled.
The C standard does not explicitly specify the details of stream buffering, but generally a stream has a buffer of fixed size. This is simply an array of bytes that is used for holding data for the stream.
Quite simply, the buffer is totally filled when the number of bytes that have been written to the stream since the last time the buffer was flushed equals the number of bytes in the buffer.
When <stdio.h> is included, it defines a macro BUFSIZ that provides the size of the buffer used by setbuf. You can print it with printf("%d\n", BUFSIZ);. Presumably that is the default size of a buffer for a stream, although the C standard does not explicitly say this. (It says that is the size of the buffer used by the setbuf function, which allows you to provide your own memory for the buffer.)
You can also use the newer setvbuf to request a different size for the buffer. Both setbuf and setvbuf may be used only after a stream has been opened and before any other operation is performed on it.

Why doesn't scanf consume the input buffer on an error?

As we know, if we try to read a value whose type does not match the format specifier (e.g. %d with the input z), scanf will not clear the input buffer after the failed read (and this can easily cause an infinite loop). But since scanf is a library function (so not directly an interface to the kernel), shouldn't it use an atomic system call like read? If so, why isn't the input cleared, as it is with read?
Also, when we read something with scanf, a '\n' (newline) character always remains in the input buffer. How does scanf know that it should not consume it?
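To illustrate the behaviour described in this question, here is a small sketch of the usual workaround: when a %d conversion fails, the offending character is still unread in the stdio buffer, so the caller discards the rest of the line before retrying. The function name next_int is made up for this example.

```c
#include <stdio.h>

/* Read the next int from fp, discarding the rest of any line that fails
 * to parse. Returns 1 on success, 0 on end of file. */
int next_int(FILE *fp, int *out)
{
    for (;;) {
        int r = fscanf(fp, "%d", out);
        if (r == 1)
            return 1;
        if (r == EOF)
            return 0;
        /* r == 0: the bad character was left in the buffer; skip the line */
        int c;
        while ((c = getc(fp)) != EOF && c != '\n')
            ;
    }
}
```

Without the inner loop, every retry would see the same unread character and fail again, which is the infinite loop mentioned above.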

How does the STDIN buffer and getchar() pointer change during successive calls?

Given input in the stdin buffer, when successive calls to getchar() are performed, does the pointer move along the memory address of the stdin buffer, allowing getchar() to retrieve the value at each address? If so, once they have been retrieved are the values removed and the pointer then incremented?
Generally, my understanding of getchar() in a loop follows this logic:
1. getchar() is called.
2. The stdin buffer is checked for input.
3. If the stdin buffer is empty, getchar() sleeps.
4. The user enters input and awakens getchar().
5. The stdin buffer is checked again for input.
6. The stdin buffer is not empty.
7. getchar() retrieves the value at the address at the start of the stdin buffer.
8. The value at that address is removed from the stdin buffer and the pointer is incremented.
9. Subsequent calls repeat steps 7-8 until EOF is encountered.
A similar question was asked before on stackoverflow but I had trouble understanding the responses.
Generally there is a stdio internal buffer. getchar() may trigger a line read into the buffer, and generally on subsequent calls, it will simply increment a pointer until the pointer reaches the end of the current data in the buffer. The implementation usually uses a simple internal char * to an underlying chunk of dynamic memory, with a few pointers and state variable(s).
Implementations vary. I don't recall the POSIX standard implying much about the internal implementation of getchar() or stdio streams in general, except that the given operations must be supported.
If I recall correctly, some implementations are unbuffered (I think the DOS compiler I used did not buffer), and there can be multiple standard library implementations for a given OS.
It is not uncommon to have two stdio libraries on the same system. For example, sysadmins managing AIX, Solaris, HP-UX, and other non-Linux/BSD UNIX platforms will frequently install the GNU stack to get tools like gcc, and that stack includes glibc (GNU libc).
You can download the source of a libc/stdio implementation online. See glibc.
If it helps, consider that stdio provides unget functionality (ungetc(), which also makes peeking possible), and the only way to do that is with an internal buffer between the terminal and the user program.
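As a small sketch of that last point: stdio has no dedicated peek call, but one can be built from getc and ungetc, which only works because the library keeps the pushed-back character in its own buffer. The helper name peek_char is made up for this example.

```c
#include <stdio.h>

/* Look at the next character of a stream without consuming it.
 * Returns the character, or EOF if there is nothing left to read. */
int peek_char(FILE *fp)
{
    int c = getc(fp);
    if (c != EOF)
        ungetc(c, fp);   /* one character of pushback is guaranteed */
    return c;
}
```

Calling peek_char(stdin) and then getchar() yields the same character twice, since the first call put it back.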

Misunderstanding line buffering in Unix

I'm reading Advanced Programming in the UNIX Environment, 3rd Edition, and I don't understand a passage in it (page 145, Section 5.4 Buffering, Chapter 5).
Line buffering comes with two caveats. First, the size of the buffer that the
standard I/O library uses to collect each line is fixed, so I/O might take place if
we fill this buffer before writing a newline. Second, whenever input is
requested through the standard I/O library from either (a) an unbuffered stream or (b) a line-buffered stream (that requires data to be requested from the kernel),
all line-buffered output streams are flushed. The reason for the qualifier on (b)
is that the requested data may already be in the buffer, which doesn’t require
data to be read from the kernel. Obviously, any input from an unbuffered
stream, item (a), requires data to be obtained from the kernel.
I can't get the bold lines. My English isn't good. So, could you clarify it for me? Maybe in an easier way. Thanks.
The point behind the machinations described is to ensure that prompts appear before the system goes into a mode where it is waiting for input.
If an input stream is unbuffered, every time the standard I/O library needs data, it has to go to the kernel for some information. (That's the last sentence.) That's because the standard I/O library does not buffer any data, so when it needs more data, it has to read from the kernel. (I think that even an unbuffered stream might buffer one character of data, because it would need to read up to a space character, for example, to detect when it has reached the end of a %s format string; it has to put back (ungetc()) the extra character it read so that the next time it needs a character, there is the character it put back. But it never needs more than the one character of buffering.)
If an input stream is line buffered, there may already be some data in its input buffer, in which case it may not need to go to the kernel for more data. In that case, it might not flush anything. This can occur if the scanf() format requested "%s" and you typed hello world; it would read the whole line, but the first scan would stop after hello, and the next scanf() would not need to go to the kernel for the world word because it is already in the buffer.
However, if there isn't any data in the buffer, it has to ask the kernel to read the data, and it ensures that any line-buffered output streams are flushed so that if you write:
printf("Enter name: ");
if (scanf("%63s", name) != 1)
…handle error or EOF…
then the prompt (Enter name:) appears. However, if you'd previously typed hello world and previously read just hello, then the prompt wouldn't necessarily appear because the world was already waiting in the (line buffered) input stream.
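If you would rather not depend on that implicit flush, a common approach is to flush the prompt explicitly before reading. The sketch below assumes a hypothetical helper ask_name that takes the input and output streams as parameters so that it works with any pair of streams, not only stdin and stdout.

```c
#include <stdio.h>

/* Print a prompt on out, then read one whitespace-delimited word of at
 * most 63 characters from in. Returns 1 on success, 0 on EOF or failure. */
int ask_name(FILE *in, FILE *out, char name[64])
{
    fputs("Enter name: ", out);
    fflush(out);                  /* make the prompt visible right now */
    return fscanf(in, "%63s", name) == 1;
}
```

Called as ask_name(stdin, stdout, name) with hello world typed on one line, the first call stores hello and the second call stores world without waiting for the user again, because that word is already in the input buffer.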
This may explain the point.
Let's imagine that you have a pipe in your program and you use it for communication between different parts of your program (single thread program writing and reading from this single pipe).
Suppose you write the letter 'A' to the writing end of the pipe and then call the read operation on the reading end. You would expect the letter 'A' to be read. However, the read operation is a system call to the kernel. For it to return the letter 'A', the letter must have been written to the kernel first. This means that the write of 'A' must be flushed; otherwise it would stay in your local write buffer and your program would block forever.
Consequently, before the standard I/O library asks the kernel for input, it flushes line-buffered output streams. This is what item (b) says.
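The pipe scenario can be sketched as follows. Note that a raw read() call does not flush any stdio buffers by itself, so this example calls fflush explicitly; the function name roundtrip_through_pipe is invented for illustration.

```c
#define _POSIX_C_SOURCE 200809L   /* expose fdopen */
#include <stdio.h>
#include <unistd.h>

/* Send one byte through a pipe via a stdio stream and read it back with
 * read(). Returns the byte, or -1 on error. */
int roundtrip_through_pipe(char c)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    FILE *out = fdopen(fds[1], "w");
    if (out == NULL) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    fputc(c, out);    /* lands in the stdio buffer, not yet in the pipe */
    fflush(out);      /* without this, the read below would block forever */
    char got;
    ssize_t n = read(fds[0], &got, 1);
    fclose(out);      /* also closes fds[1] */
    close(fds[0]);
    return n == 1 ? (unsigned char)got : -1;
}
```

Removing the fflush line reproduces the lock-up described above: the byte sits in the process's own buffer while read waits on an empty pipe.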
The size of the buffer that the standard I/O library is using to collect each line is fixed.
With fgets, for example, we read lines repeatedly; each call reads up to the specified buffer size or up to a newline, whichever comes first.
Second, whenever input is requested through the standard I/O library, it may come from an unbuffered stream or a line-buffered stream.
Unbuffered stream: characters are not stored in a buffer; each one is passed on immediately.
Line-buffered stream: characters are stored in the buffer, which is flushed when the operation completes.
For example, if we print with printf without a trailing \n, the content stays in the buffer until we flush it or print a newline. In the same way, the stream buffer is flushed internally when the operation completes.
(b) is that the requested data may already be in the buffer, which doesn't require data to be read from the kernel
In a line-buffered stream the requested data may already be in the buffer, because data can be buffered, so it does not have to be read from the kernel again.
(a) requires data to be obtained from the kernel.
Any input from an unbuffered stream has to be fetched from the kernel, because an unbuffered stream cannot store anything in a buffer.

unistd.h read() function: How to read a file line by line?

What I need to do is use the read function from unistd.h to read a file
line by line. I have this at the moment:
n = read(fd, str, size);
However, this reads to the end of the file, or up to size number of bytes.
Is there a way that I can make it read one line at a time, stopping at a newline?
The lines are all of variable length.
I am allowed only these two header files:
#include <unistd.h>
#include <fcntl.h>
The point of the exercise is to read in a file line by line, and
output each line as it's read in. Basically, to mimic the fgets()
and fputs() functions.
You can read character by character into a buffer and check for the linebreak symbols (\r\n for Windows and \n for Unix systems).
You'll want to create a buffer twice the length of your longest line you'll support, and you'll need to keep track of your buffer state.
Basically, each time you're called for a new line you'll scan from your current buffer position looking for an end-of-line marker. If you find one, good, that's your line. Update your buffer pointers and return.
If you hit your maxlength then you return a truncated line and change your state to discard. Next time you're called you need to discard up to the next end of line, and then enter your normal read state.
If you hit the end of what you've read in, then you need to read in another maxline chars, wrapping to the start of the buffer if you hit the bottom (ie, you may need to make two read calls) and then continue scanning.
All of the above assumes you can set a max line length. If you can't then you have to work with dynamic memory and worry about what happens if a buffer malloc fails. Also, you'll need to always check the results of the read in case you've hit the end of the file while reading into your buffer.
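A simplified sketch of this approach follows, using only unistd.h as the question requires. It uses a plain refill-when-empty buffer rather than the wrap-around scheme described above, and it truncates over-long lines by discarding the excess; struct line_reader, read_line and CHUNK are names invented for this example.

```c
#include <unistd.h>   /* read, ssize_t, size_t */

#define CHUNK 4096

/* Hypothetical reader state that must persist between calls. */
struct line_reader {
    int fd;
    char buf[CHUNK];
    size_t pos;       /* next unread byte in buf */
    size_t len;       /* number of valid bytes in buf */
};

/* Copy the next line, without its '\n', into out and NUL-terminate it.
 * Lines longer than cap-1 bytes are truncated; the excess is discarded.
 * Returns the stored length, or -1 if nothing could be read (EOF/error). */
ssize_t read_line(struct line_reader *lr, char *out, size_t cap)
{
    size_t n = 0;
    int saw_any = 0;
    for (;;) {
        if (lr->pos == lr->len) {              /* buffer drained: refill */
            ssize_t got = read(lr->fd, lr->buf, sizeof lr->buf);
            if (got <= 0)
                break;                         /* EOF or error */
            lr->len = (size_t)got;
            lr->pos = 0;
        }
        char c = lr->buf[lr->pos++];
        saw_any = 1;
        if (c == '\n')
            break;
        if (n + 1 < cap)
            out[n++] = c;                      /* drop bytes that don't fit */
    }
    if (cap > 0)
        out[n] = '\0';
    return saw_any ? (ssize_t)n : -1;
}
```

A caller would zero-initialise the struct apart from fd, then call read_line in a loop until it returns -1, writing each line out with write().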
Unfortunately the read function isn't really suitable for this sort of input. Assuming this is some sort of artificial requirement from interview/homework/exercise, you can attempt to simulate line-based input by reading the file in chunks and splitting it on the newline character yourself, maintaining state in some way between calls. You can get away with a static position indicator if you carefully document the function's use.
This is a good question, but allowing only the read function doesn't help! :P
Loop over read calls, fetching a fixed number of bytes each time, and search for the '\n' character. Return the part of the string up to the '\n', and store the rest (excluding the '\n') to prepend to the next chunk read from the file.
Use dynamic memory.
The larger the buffer, the fewer read calls are needed (read is a system call, so it is not cheap, although nowadays there are preemptive kernels).
...
Or simply fix a maximum line length, and use fgets, if you need to be quick...
If you need to read exactly 1 line (and not overstep) using read(), the only generally-applicable way to do that is by reading 1 byte at a time and looping until you get a newline byte. However, if your file descriptor refers to a terminal and it's in the default (canonical) mode, read will wait for a newline and return less than the requested size as soon as a line is available. It may however return more than one line, if data arrives very quickly, or less than 1 line if your program's buffer or the internal terminal buffer is shorter than the line length.
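The one-byte-at-a-time variant might look like the sketch below. It is slow (one system call per byte) but never consumes anything past the newline, so whatever follows remains available on the descriptor. The name read_line_exact is made up for this example.

```c
#include <unistd.h>   /* read, ssize_t, size_t */

/* Read one line from fd without consuming anything past its '\n'.
 * Stores at most cap-1 bytes plus a NUL; excess bytes are discarded.
 * Returns the number of bytes stored, or -1 if EOF or an error occurred
 * before anything was stored. */
ssize_t read_line_exact(int fd, char *out, size_t cap)
{
    size_t n = 0;
    char c;
    ssize_t got;
    while ((got = read(fd, &c, 1)) == 1) {
        if (c == '\n')
            break;                /* stop exactly at the end of the line */
        if (n + 1 < cap)
            out[n++] = c;
    }
    if (cap > 0)
        out[n] = '\0';
    return (got <= 0 && n == 0) ? -1 : (ssize_t)n;
}
```

After this returns, another process that inherits fd would start reading at the first byte of the next line.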
Unless you really need to avoid overstep (which is sometimes important, if you want another process/program to inherit the file descriptor and be able to pick up reading where you left off), I would suggest using stdio functions or your own buffering system. Using read for line-based or byte-by-byte IO is very painful and hard to get right.
Well, it will read line-by-line from a terminal.
Some choices you have are:
Write a function that uses read when it runs out of data but only returns one line at a time to the caller
Use the function in the library that does exactly that: fgets().
Read only one byte at a time, so you don't go too far.
If you open the file in text mode then Windows "\r\n" will be silently translated to "\n" as the file is read.
If you are on Unix you can use the GNU getline() function, which is not part of standard C [1].
[1] The getline() function is standard in POSIX 2008.
Convert file descriptor to FILE pointer.
FILE* fp = fdopen(fd, "r");
Then you can use getline().
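Put together, a minimal sketch could look like this (it goes beyond the two headers the question allows, since it relies on stdio). The function name count_lines is invented for the example; fdopen and getline are the POSIX calls named above.

```c
#define _POSIX_C_SOURCE 200809L   /* expose fdopen and getline */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Count the lines readable from an open descriptor. Takes ownership of
 * fd and closes it. Returns -1 if the descriptor cannot be wrapped. */
long count_lines(int fd)
{
    FILE *fp = fdopen(fd, "r");
    if (fp == NULL)
        return -1;
    char *line = NULL;            /* getline allocates and grows this */
    size_t cap = 0;
    long count = 0;
    while (getline(&line, &cap, fp) != -1)
        count++;                  /* line holds the text, '\n' included */
    free(line);
    fclose(fp);                   /* also closes fd */
    return count;
}
```

Because getline grows the buffer as needed, there is no maximum line length to pick in advance.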
