How does the stdin buffer work? - c

When using functions such as scanf you read bytes from a buffer where (usually) data coming from the keyboard is stored. How is this data stored? Is it stored inside a fixed size vector? Is there any way to access it directly from code?

The buffer used by the standard libraries input routines is private to the implementation of the standard library. You cannot access it other than through the published interface to the standard library.

The setvbuf() function lets you reconfigure the type of buffering for a stdio stream and replace the buffer with one you have allocated. That doesn't mean you should access the buffer behind the C library's back, but it does give you control over the size and whether the stream is unbuffered, line-buffered, or fully buffered.

You can't read the buffer directly. The best you can do is read keystrokes directly as they're typed, effectively enabling you to write your own scanf( ). To see the code for reading keystrokes, search for 'kbhit.c' on this page: http://pwilson.net/sample.html

Related

What is stdin in C language?

I want to build my own scanf function. Basic idea is data from a memory address and save it to another memory address.
What is stdin? Is it a memory-address like 000ffaa?
If it is a memory-address what is it so I can build my own scanf function. Thanks!.
No, stdin is not "a memory address".
It's an I/O stream, basically an operating-system level abstraction that allows data to be read (or written, in the case of stdout).
You need to use the proper stream-oriented I/O functions to read from the stream.
Of course you can read from RAM too, so it's best to write your own function to require a function that reads a character, then you can adapt that function to either read from RAM or from stdin.
Something like:
int my_scanf(int (*getchar_callback)(void *state), void *state, const char *fmt, ...);
Is usually reasonable. The state pointer is some user-defined state that is required by the getchar_callback() function, and passed to it by my_scanf().
stdin is an "input stream", which is an abstract term for something that takes input from the user or from a file. It is an abstraction layer sitting on top of the actual file handling and I/O. The purpose of streams is mainly to make your code portable between different systems.
Reading/writing to memory is much more low-level and has nothing to do with streams as such. In order to use a stream in a meaningful way, you would have to know how a certain compiler implements the stream internally, which may not be public information. In some cases, like in Windows, streams are defined by the OS itself and can get accessed through API calls.
If you are looking to build your own scanf function, you would have to look into specific API functions for a specific OS, then build your own abstraction layer on top of those.
On Unix everything is a file
https://en.wikipedia.org/wiki/Everything_is_a_file
Or like they notice
Everything is a file descriptor
You can find on unix system /dev/stdin who is a symbolic link to /dev/fd/0 who is a Character special file

Difference between int fpurge() and int fflush() in C

Can anyone please explain me about difference between fpurge(FILE *stream) and fflush(FILE *stream) in C?
Both fflush() and fpurge() will discard any unwritten or unread data in the buffer.
Please explain me the exact difference between these two and also their pros and cons.
"... both fflush and fpurge will discard any unwritten or unread data in the buffer..." : No.
fflush:
The function fflush forces a write of all buffered data for the given output or update stream via the stream's underlying write function. The open status of the stream is unaffected.
If the stream argument is NULL, fflush flushes all open output streams.
fpurge:
The function fpurge erases any input or output buffered in the given stream. For output streams this discards any unwritten output. For input streams this discards any input read from the underlying object but not yet obtained via getc. This includes any text pushed back via ungetc. (P.S.: there also exists __fpurge, which does the same, but without returning any value).
Besides the obvious effect on buffered data, one use where you would notice the difference is with input streams. You can fpurge one such stream (although it is usually a mistake, possibly conceptual). Depending on the environment, you might not fflush an input stream (its behaviour might be undefined, see man page). In addition to the above remarked differences: 1) the cases where they lead to errors are different, and 2) fflush can work on all output streams with a single statement, as said (this might be very useful).
As for pros and cons, I would not really quote any... they simply work different (mostly), so you should know when to use each.
In addition to the functional difference (what you were asking), there is a portability difference: fflush is a standard function, while fpurge is not (and __fpurge is not either).
Here you have the respective man pages (fflush, fpurge).
To start with, both the functions clear the buffers (type of operable buffers are discussed below), the major difference is what happens with the data present in the buffer.
For fflush(), the data is forced to be written to disk.
For fpurge(), data is discarded.
That being said, fflush() is a standard C function, mentioned in the C11, chapter ยง7.21.5.2.
Whereas, fpurge() is a non-portable and non-standard function. From the man page
These functions are nonstandard and not portable. The function
fpurge() was introduced in 4.4BSD and is not available under Linux.
The function __fpurge() was introduced in Solaris, and is present in
glibc 2.1.95 and later.
That said, the major usage-side difference is,
Calling fflush() with input stream is undefined behavior.
If stream points to an output stream or an update stream in which the most recent
operation was not input, the fflush function causes any unwritten data for that stream
to be delivered to the host environment to be written to the file; otherwise, the behavior is
undefined.
Calling fpurge() with input stream is defined.
For input streams
this discards any input read from the underlying object but not yet
obtained via getc(3); this includes any text pushed back via
ungetc(3).
Still, try to stick to fflush().

what is the point of using the setvbuf() function in c?

Why would you want to set aside a block of memory in setvbuf()?
I have no clue why you would want to send your read/write stream to a buffer.
setvbuf is not intended to redirect the output to a buffer (if you want to perform IO on a buffer you use sprintf & co.), but to tightly control the buffering behavior of the given stream.
In facts, C IO functions don't immediately pass the data to be written to the operating system, but keep an intermediate buffer to avoid continuously performing (potentially expensive) system calls, waiting for the buffer to fill before actually performing the write.
The most basic case is to disable buffering altogether (useful e.g. if writing to a log file, where you want the data to go to disk immediately after each output operation) or, on the other hand, to enable block buffering on streams where it is disabled by default (or is set to line-buffering). This may be useful to enhance output performance.
Setting a specific buffer for output can be useful if you are working with a device that is known to work well with a specific buffer size; on the other side, you may want to have a small buffer to cut down on memory usage in memory-constrained environments, or to avoid losing much data in case of power loss without disabling buffering completely.
In C files opened with e.g. fopen are by default buffered. You can use setvbuf to supply your own buffer, or make the file operations completely unbuffered (like to stderr is).
It can be used to create fmemopen functionality on systems that doesn't have that function.
The size of a files buffer can affect Standard library call I/O rates. There is a table in Chap 5 of Steven's 'Advanced Programming in the UNIX Environment' that shows I/O throughput increasing dramatically with I/O buffer size, up to ~16K then leveling off. A lot of other factor can influenc overall I/O throughtput, so this one "tuning" affect may or may not be a cureall. This is the main reason for "why" other than turning off/on buffering.
Each FILE structure has a buffer associated with it internally. The reason behind this is to reduce I/O, and real I/O operations are time costly.
All your read/write will be buffered until the buffer is full. All the data buffered will be output/input in one real I/O operation.
Why would you want to set aside a block of memory in setvbuf()?
For buffering.
I have no clue why you would want to send your read/write stream to a buffer.
Neither do I, but as that's not what it does the point is moot.
"The setvbuf() function may be used on any open stream to change its buffer" [my emphasis]. In other words it alread has a buffer, and all the function does is change that. It doesn't say anything about 'sending your read/write streams to a buffer". I suggest you read the man page to see what it actually says. Especially this part:
When an output stream is unbuffered, information appears on the destination file or terminal as soon as written; when it is block buffered many characters are saved up and written as a block; when it is line buffered characters are saved up until a newline is output or input is read from any stream attached to a terminal device (typically stdin).

character reading in C

I am struggling to know the difference between these functions. Which one of them can be used if i want to read one character at a time.
fread()
read()
getc()
Depending on how you want to do it you can use any of those functions.
The easier to use would probably be fgetc().
fread() : read a block of data from a stream (documentation)
read() : posix implementation of fread() (documentation)
getc() : get a character from a stream (documentation). Please consider using fgetc() (doc)instead since it's kind of saffer.
fread() is a standard C function for reading blocks of binary data from a file.
read() is a POSIX function for doing the same.
getc() is a standard C function (a macro, actually) for reading a single character from a file - i.e., it's what you are looking for.
In addition to the other answers, also note that read is unbuffered method to read from a file. fread provides an internal buffer and reading is buffered. The buffer size is determined by you. Also each time you call read a system call occurs which reads the amount of bytes you told it to. Where as with fread it will read a chunk in the internal buffer and return you only the bytes you need. For each call on fread it will first check if it can provide you with more data from the buffer, if not it makes a system call (read) and gets a chunk more data and returns you only the portion you wanted.
Also read directly handles the file descriptor number, where fread needs the file to be opened as a FILE pointer.
The answer depends on what you mean by "one character at a time".
If you want to ensure that only one character is consumed from the underlying file descriptor (which may refer to a non-seekable object like a pipe, socket, or terminal device) then the only solution is to use read with a length of 1. If you use strace (or similar) to monitor a shell script using the shell command read, you'll see that it repeatedly calls read with a length of 1. Otherwise it would risk reading too many bytes (past the newline it's looking for) and having subsequent processes fail to see the data on the "next line".
On the other hand, if the only program that should be performing further reads is your program itself, fread or getc will work just fine. Note that getc should be a lot faster than fread if you're just reading a single byte.

buffer confusion

Could anyone clarify on the types of buffers used by a program?
For eg:
I have a C program that reads from a stdin to stdout.
What are the buffers involved here? I'm aware that there are 2.
One provided by the kernel on which a user don't have any control.
One provided with standard streams namely stdout, stdin and stderr. Each having a separate buffer.
Is my understanding correct?
Thanks,
John
If you are working on linux/unix then you could more easily understand that there are three streams namely
1.STDIN: FILE DESCRIPTOR VALUE 0 (IN unix)
2.STDOUT :FILE DESCRIPTOR VALUE 1
3.STDERR :FILE DESCRIPTOR VALUE 2
By default these streams correspond to keyboard and monitor.In unix we can change these streams to read input from file instead of keyboard.To display output on a file rather than monitor using close(),dup() system calls.Yes there are 3 buffers involved.To clear the contents of input buffer in c we use fflush() function.
If you want to know more about handling these streams in UNIX then let me Know.
The kernel (or other underlying system) could have any number of layers of buffering, depending on what device is being read from and the details of the kernel implementation; in some systems there is no buffering at that level, with the data being read directly into the userspace buffer.
The stdio library allocates a buffer for stdin; the size is implementation-dependent but you can control the size and even use your own buffer with setvbuf. It also allows you to control whether I/O is fully buffered (as much data is read into the buffer as is available), line buffered (data is is only read until a newline is encountered), or unbuffered. The default is line buffering if the system can determine that the input is a terminal, else fully buffered.
The story is similar for stdout. stderr is by default unbuffered.

Resources